
Sunday, July 22, 2012

July Mozscape Update

It's time for another Mozscape index update. New data is now available in Open Site Explorer, the Mozbar, our tools and through the API. July's update comes with some good news, and potentially some bad news, too. As you're likely aware, the previous two indices, while huge in size (150B+ URLs each), suffered from a lack of freshness due to the additional processing time required to calculate our link graph and metrics over such phenomenally large numbers of links and pages. Today's index is relatively large by prior standards (~78B URLs, larger than almost anything we launched before April 2012). And it's slightly fresher: the link data in the index today was crawled almost entirely in May.

This index was originally scheduled to launch earlier, but ran into troubles, including Amazon's AWS outage and plenty of hardware failures, too. As we've mentioned in the past, SEOmoz is in the process of building a new private hybrid cloud datacenter that will replace AWS for Mozscape and should provide us with much greater reliability. We know how important it is to have regular data updates you can count on, and we're putting people and money to work as fast as possible to move away from the unreliability we've experienced with Amazon's systems.

Let's take a look at the full metrics for this index:

78,813,641,094 (78 billion) URLs
674,286,481 (674 million) Subdomains
165,476,769 (165 million) Root Domains
778,554,162,687 (778 billion) Links
Followed vs. Nofollowed
    2.33% of all links found were nofollowed
    57.62% of nofollowed links are internal
    42.38% are external
Rel Canonical - 12.5% of all pages now employ a rel=canonical tag
The average page has 74 links on it
    63.28 internal links on average
    10.72 external links on average
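For anyone who wants absolute counts rather than percentages, the figures above can be combined with a little arithmetic. The snippet below is only a sketch: the input numbers are copied from the list above, while the function and key names are made up for illustration and are not part of any Mozscape tool or API.

```python
# Figures copied from the index metrics listed above.
TOTAL_LINKS = 778_554_162_687
NOFOLLOW_SHARE = 0.0233            # 2.33% of all links found were nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.5762   # 57.62% of nofollowed links are internal
INTERNAL_PER_PAGE = 63.28          # average internal links per page
EXTERNAL_PER_PAGE = 10.72          # average external links per page


def derived_metrics():
    """Turn the published percentages into approximate absolute counts.

    Hypothetical helper for illustration only.
    """
    nofollowed = TOTAL_LINKS * NOFOLLOW_SHARE
    links_per_page = INTERNAL_PER_PAGE + EXTERNAL_PER_PAGE
    return {
        "nofollowed_links": nofollowed,
        "nofollowed_internal": nofollowed * NOFOLLOW_INTERNAL_SHARE,
        "nofollowed_external": nofollowed * (1 - NOFOLLOW_INTERNAL_SHARE),
        "links_per_page": links_per_page,
        "internal_link_share": INTERNAL_PER_PAGE / links_per_page,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:,.2f}")
```

Under these inputs, roughly 18.1 billion links in the index are nofollowed, and the internal and external per-page averages add up to the 74 links per page quoted above.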

And here are the latest correlations between Mozscape metrics and Google's search results:

Page Authority - 0.34
Domain Authority - 0.23
MozRank - 0.19
Linking Root Domains - 0.24
Total Links - 0.2
External Links - 0.24

Because this update is much smaller in total URL size (~50% of the prior 165 billion URL index), your link count totals will likely be much smaller, even if you've grown your link building efforts. Below is an example of the numbers for various Seattle startups across May's larger index and July's smaller one:

[Image: Mozscape data for Seattle startups from the May index update]

Above: May's 165 billion URL index data

[Image: July Mozscape data]

Above: July's smaller, 78 billion URL index data

Note that, as one might expect, link counts are between 50% and 75% of their former value. This percentage will be lower for sites that get many links from the far corners of the web (the less-traversed, less-popular pages and sites), and higher for sites with links from more popular, well-linked-to sites and pages.
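To get a rough feel for what this means for your own numbers, here is a small sketch. It assumes the 50% to 75% retention band observed in the example above carries over to your site, which it may not; the function names are hypothetical, and the May index size is the rounded "165 billion" figure from this post.

```python
MAY_INDEX_URLS = 165_000_000_000   # "165 billion", rounded as quoted in this post
JULY_INDEX_URLS = 78_813_641_094   # exact figure from the July metrics


def index_size_ratio():
    """July index size as a fraction of May's."""
    return JULY_INDEX_URLS / MAY_INDEX_URLS


def expected_july_range(may_link_count, low=0.50, high=0.75):
    """Rough (low, high) band for a July link count given a May count.

    The 0.50 and 0.75 defaults come from the range observed in the
    example sites; they are an assumption, not a guarantee.
    """
    return (round(may_link_count * low), round(may_link_count * high))


if __name__ == "__main__":
    print(f"July index is {index_size_ratio():.1%} the size of May's")
    print(expected_july_range(10_000))
```

A site with 10,000 links in May's index would land somewhere between 5,000 and 7,500 under this assumption. Sites whose links come mostly from obscure pages may fall below that band.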

We're working hard to grow the index back up to 100 billion+ URLs in the future. Our crawlers can already handle vastly more, and it's just the unreliability of Amazon's hardware that holds us back. Our engineers and sysops folks are working around the clock to get there as soon as we can.

We've also done some work recently to update the scoring systems for the Keyword Difficulty/SERPs Analysis Tool. You'll now see a more accurate and usable algorithm applied to results where very fresh pages are ranking, e.g. news, sports, trending topics, etc. Here's an example query that previously would have produced a keyword difficulty score of 1:

Libor Rate Scandal

Until a few days ago, Libor Rate Scandal was a SERP with virtually no traffic and very different results. All of the pages now ranking were produced in the last day or two, and thus don't have Page Authority scores. However, Domain Authority is now being used to help calculate keyword difficulty, which should seriously help those of you who analyze fresh results.
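The idea of falling back to Domain Authority when a page is too fresh to have a Page Authority score can be sketched as follows. This is purely illustrative: the actual Keyword Difficulty formula isn't described in this post, so the plain average used here is a stand-in assumption, and both function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def effective_authority(page_authority: Optional[float],
                        domain_authority: float) -> float:
    """Use Page Authority when it exists, otherwise fall back to Domain Authority."""
    return domain_authority if page_authority is None else page_authority


def illustrative_difficulty(
    results: Sequence[Tuple[Optional[float], float]]
) -> float:
    """Average the effective authority of the ranking results.

    ASSUMPTION: a simple mean stands in for the real (unpublished here)
    scoring; only the fallback behaviour reflects what the post describes.
    Each result is a (page_authority_or_None, domain_authority) pair.
    """
    if not results:
        raise ValueError("need at least one ranking result")
    scores = [effective_authority(pa, da) for pa, da in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Fresh news pages: no Page Authority yet, but strong domains.
    fresh_serp = [(None, 90.0), (None, 80.0), (40.0, 95.0)]
    print(illustrative_difficulty(fresh_serp))
```

Without the fallback, pages with no Page Authority would contribute nothing to the score, which is consistent with fresh SERPs previously showing a difficulty of 1.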

The next 2-4 Mozscape index updates will continue to be on AWS, but we're now running 3-4 indices in parallel (which costs a fortune, but gives us fallback options if/when Amazon's failures lose an index or massively delay it). In the next 3-4 months, we hope to be operating indices off our new hybrid cloud environment and see much greater reliability, which will enable us to produce larger, fresher and more consistent updates.

