Showing posts with label Mozscape. Show all posts

Thursday, August 16, 2012

400% Higher Throughput Mozscape API Now in Beta, And Seeking Testers

If you've used or considered using the Mozscape API to retrieve link metrics data, we've got something unique to share - a brand new beta of a much faster, more robust API. This beta version currently has just a few testers (and we're seeking more), but thus far, we're seeing remarkable results.

Carin, who manages the big data team here at Moz, helped share the story with me last week:

The current API is not able to support everyone's use case! Some people need to make a lot of calls in a really short period of time, and our API currently can't support more than 10 requests/second (even for paid users). Others have a large list of URLs they want to update metrics on with every new index release, and our current API doesn't support batching very well: it will time out with batch sizes larger than 50 URLs.

- The beta version has made some serious performance improvements with single-URL throughput and can handle 200 requests/second. The beta API is seeing a 400% throughput improvement, although response times will stay the same.
- To address batching users, we've developed a new batching model: online batching (available in the beta API) and offline batching (coming soon to the beta API).
  - Online batching: the maximum number of results we can process in a single POST without a timeout from S3. This has been raised from 50 URLs to 500 URLs in one batch request.
  - Offline batching (still in development): for batch sizes larger than 500 URLs, offline batching will process the entire list (probably up to a limit not yet decided) and return a downloadable CSV link on S3 where all the data will be available. Since this is still in development, the SLA for offline batching is not yet clear, but this feature will also be available for beta testing as soon as it is feature complete!
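The batching limits above can be sketched on the client side. This is a minimal, hypothetical Python sketch: it splits a URL list into chunks no larger than the 500-URL online batch limit and estimates the minimum time single-URL calls would take at the quoted 10 and 200 requests/second. The constant and function names are assumptions for illustration; the post does not document the beta API's endpoints or parameters, so no request is actually sent.

```python
from typing import Iterator, List, Sequence

# Assumed limits, copied from the figures quoted above. They are not
# constants from any official Mozscape client library.
ONLINE_BATCH_LIMIT = 500  # max URLs per POST in the beta API
OLD_RPS = 10              # request ceiling of the current API
BETA_RPS = 200            # requests/second quoted for the beta API


def chunk_urls(urls: Sequence[str], batch_size: int = ONLINE_BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size URLs, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        yield list(urls[start:start + batch_size])


def min_seconds(num_requests: int, requests_per_second: float) -> float:
    """Lower bound on wall-clock time when calls are paced at a fixed rate limit."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return num_requests / requests_per_second


# Usage: 1,234 hypothetical URLs split into POST-sized batches.
urls = [f"http://example.com/page-{i}" for i in range(1234)]
batch_sizes = [len(b) for b in chunk_urls(urls)]
print(batch_sizes)  # [500, 500, 234]
print(min_seconds(len(urls), OLD_RPS), min_seconds(len(urls), BETA_RPS))  # 123.4 6.17
```

Batching keeps the number of POSTs small (three here instead of 1,234 single-URL calls), while the rate estimate shows why the higher request ceiling matters for users who can't batch.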

Mozscape's API is pretty big today - we served 154,352,249 (154 million) requests in the first 10 days of August and returned 1,186,736,774 (1.2 billion) rows of link metrics data.

You can still sign up for free or try our paid API, but if you have serious demand for high-volume or large batches of link data, we'd love to have you in the beta for the new API. Just contact Andrew Dumont - andrew@seomoz.org - and he'll get you set up!


View the original article here

Two Mozscape Updates in August! And More Info on Why PA/DA Fluctuate

Just 14 short days ago, I wrote about the August Mozscape index update. Today, as part of our efforts to create shorter deltas between indices, I'm excited to announce that we have our fastest ever time between updates. There's new data right now in the Mozscape API (for which we're still seeking beta testers on the new version), in Open Site Explorer, through the Mozbar, and in your PRO web app.

This current index has the following metrics:

- 60,852,245,271 (60 billion) URLs
- 657,072,652 (657 million) Subdomains
- 153,355,227 (153 million) Root Domains
- 610,557,978,730 (610 billion) Links
- Followed vs. Nofollowed
  - 2.26% of all links found were nofollowed
  - 54.95% of nofollowed links are internal
  - 45.05% are external
- Rel Canonical - 13.46% of all pages now employ a rel=canonical tag
- The average page has 70 links on it
  - 59.91 internal links on average
  - 10.57 external links on average

And the following correlations with Google's US search results:

- Page Authority - 0.34
- Domain Authority - 0.24
- MozRank - 0.20
- Linking Root Domains - 0.24
- Total Links - 0.20
- External Links - 0.24
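The nofollow percentages in this index's metrics nest: 54.95% and 45.05% are shares of nofollowed links, not of all links. A short Python calculation, using only the figures listed for this index, makes that conversion explicit and checks that the internal and external averages add up to the stated ~70 links per page. The variable and function names are illustrative.

```python
# Figures copied from the August 14 index metrics listed above.
NOFOLLOW_SHARE = 0.0226             # share of all links that are nofollowed
INTERNAL_SHARE_OF_NOFOLLOW = 0.5495
EXTERNAL_SHARE_OF_NOFOLLOW = 0.4505
AVG_INTERNAL_LINKS = 59.91
AVG_EXTERNAL_LINKS = 10.57


def share_of_all_links(share_of_nofollow: float, nofollow_share: float = NOFOLLOW_SHARE) -> float:
    """Convert a share of nofollowed links into a share of all links."""
    return share_of_nofollow * nofollow_share


internal_nofollow = share_of_all_links(INTERNAL_SHARE_OF_NOFOLLOW)
external_nofollow = share_of_all_links(EXTERNAL_SHARE_OF_NOFOLLOW)
avg_links = AVG_INTERNAL_LINKS + AVG_EXTERNAL_LINKS

print(f"internal nofollowed links: {internal_nofollow:.2%} of all links")  # 1.24%
print(f"external nofollowed links: {external_nofollow:.2%} of all links")  # 1.02%
print(f"average links per page: {avg_links:.2f}")                          # 70.48
```

In other words, only about 1.24% of all links in the index are internal nofollowed links, and about 1.02% are external nofollowed links.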

Below is a histogram showing this update's crawling pattern:

2nd August Mozscape Index Crawl Histogram

Basically, this is very good news. We had an outage of our crawler in early June, but the large amounts of crawling performed in late July mean a lot of this index is extremely fresh - in fact, parts of this index are the freshest we've ever had (launched ~20 days after crawling - that's some speedy processing).

Every index, we get a lot of questions about why a site's/page's PA/DA goes up or down. The answer's not easy because the inputs vary quite a bit, but basically, four things can cause change in these metrics from index to index:

1. The site/page received more or fewer links, or links that are more or less powerful than before. Your site's link profile may even remain completely unchanged and still see fluctuation in DA/PA because the sites pointing to you have been recalculated to have better or worse metrics.
2. Google changed things in its ranking algorithm, and thus our models for DA/PA, which attempt to track correlation with Google's rankings, changed too.
3. The web's link graph changed, and what was "0" (the lowest possible score) is now lower/higher than before and/or what was "100" (the highest possible score) is now higher/lower than before. Essentially, think of this as the goalposts moving because the field's gotten bigger or smaller.
4. Our web index changed in size/structure as we toss out more spam/junk and crawl more/fewer webpages, potentially biasing against links we were counting or hadn't counted in prior indices.
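The "goalposts moving" point above can be illustrated with a simple min-max rescaling onto a 0-100 scale. This is a hypothetical Python sketch, not how Moz computes DA/PA (the post doesn't describe that model): it only shows that an unchanged raw score can land at a different point on the scale when the range of the index changes.

```python
def rescale(raw: float, lowest: float, highest: float) -> float:
    """Map a raw score onto 0-100 given the lowest and highest raw scores in an index.

    Illustrative min-max normalization only; not Moz's actual DA/PA model.
    """
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    clamped = min(max(raw, lowest), highest)  # keep out-of-range scores on the scale
    return 100 * (clamped - lowest) / (highest - lowest)


# The same raw strength in two hypothetical indices: the top of the scale moves
# from 50 to 80 because the strongest sites on the web got stronger.
before = rescale(20, lowest=0, highest=50)  # 40.0
after = rescale(20, lowest=0, highest=80)   # 25.0
print(before, after)
```

Nothing about the site changed between the two calls; only the "field" did, yet its 0-100 score dropped from 40 to 25.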

Thus, it's very hard to know for sure whether an increase in DA/PA for a particular page is entirely tied to your efforts, Google's changes or changes to the web as a whole. This is why I strongly, strongly recommend tracking your metrics against your competition. For example, in July, I compared several sites to show the delta between their scores across the May vs. July index like so:

Mozscape Data for Seattle Startups from the May Index Update

Above: May's 165 Billion URL index data

July Mozscape Data

Above: July's 78 Billion URL index data

Comparison of August 1st update data

Above: August 1st's 69 Billion URL index
(please ignore the SEOmoz.org numbers in this one - we had an error that affected our own site in the last index)

Above: August 14th's 61 Billion URL index
(again, please ignore SEOmoz.org numbers. Index error on our part)

This comparative process is done for you inside the PRO web app if/when you set up competitors: 

Domain Authority Over Time

Using the comparison data is a great way to get a sense of whether you're gaining or losing ground vs. the competition, and it removes a lot of the bias from the other, macro-index-level modifiers. More so than any other methodology, I recommend this technique over a raw historical view for getting a sense of how your site's metrics are performing.
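The competitive comparison can also be done by hand. A minimal Python sketch, assuming you have recorded a score (DA, say) for your site and a few competitors in two indices: subtracting the group's average change from each site's change strips out shifts that hit everyone at once, leaving movement relative to the competition. The site names and values below are made up for illustration, and this is not necessarily how the PRO web app computes its charts.

```python
from statistics import mean
from typing import Dict


def relative_deltas(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Each site's score change minus the average change across all tracked sites.

    Index-wide shifts (a resized index, a rescaled 0-100 range) move every site
    at once; subtracting the mean change removes them.
    """
    sites = sorted(old.keys() & new.keys())  # sorted for a stable output order
    if not sites:
        raise ValueError("no sites present in both indices")
    changes = {s: new[s] - old[s] for s in sites}
    baseline = mean(changes.values())
    return {s: round(changes[s] - baseline, 2) for s in sites}


# Made-up DA values for three hypothetical sites, for illustration only.
may = {"site-a.example": 60, "site-b.example": 55, "site-c.example": 48}
july = {"site-a.example": 57, "site-b.example": 49, "site-c.example": 45}
print(relative_deltas(may, july))
```

Here every site loses DA in raw terms, but site-a and site-c fall less than the group average (+1 each) while site-b falls more (-2), which is the signal worth paying attention to.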

As you can see, the past few indices have been falling in size. This is due to our efforts to make indices faster and more consistent. We hope to remain in the 60-70 billion URL range for the next few indices, and we're relatively close to having our first index produced on our new private cloud. It will take a while, possibly 6 months, to get back up to the 150 billion page indices we had this Spring (which were very, very slow and stale), but the goal is to have an index every 2 weeks that exceeds that size. Exciting stuff, but crazy hard. Luckily, we have a fantastic and growing team of engineers working on it. If you know great minds in the field, we still pay $12,000 referral and signing bonuses, so send 'em our way!

Thanks very much - looking forward to your feedback.



Sunday, July 22, 2012

July Mozscape Update

It's time for another Mozscape index update. New data is now available in Open Site Explorer, the Mozbar, our tools and through the API. July's update comes with some good news, and potentially some bad news, too. As you're likely aware, the previous two indices, while huge in size (150B+ URLs each), suffered from a lack of freshness due to the additional processing time required to calculate our link graph and metrics over such phenomenally big numbers of links & pages. Today's index is relatively large by prior standards (~72B URLs, larger than almost anything we launched before April 2012). And it's slightly fresher: the link data in the index today was crawled almost entirely in May.

This index was originally scheduled to launch earlier, but ran into trouble, including Amazon's AWS outage and plenty of hardware failures, too. As we've mentioned in the past, SEOmoz is in the process of building a new private hybrid cloud datacenter that will replace AWS for Mozscape and should provide us with much greater reliability. We know how important it is to have regular data updates you can count on, and we're putting people and money to work as fast as possible to get away from the unreliability that Amazon's systems have created.

Let's take a look at the full metrics for this index:

- 78,813,641,094 (78 billion) URLs
- 674,286,481 (674 million) Subdomains
- 165,476,769 (165 million) Root Domains
- 778,554,162,687 (778 billion) Links
- Followed vs. Nofollowed
  - 2.33% of all links found were nofollowed
  - 57.62% of nofollowed links are internal
  - 42.38% are external
- Rel Canonical - 12.5% of all pages now employ a rel=canonical tag
- The average page has 74 links on it
  - 63.28 internal links on average
  - 10.72 external links on average

And here are the latest correlations between Mozscape metrics and Google's search results:

- Page Authority - 0.34
- Domain Authority - 0.23
- MozRank - 0.19
- Linking Root Domains - 0.24
- Total Links - 0.20
- External Links - 0.24

Because this update is much smaller in total URL size (~50% of the prior, 165 billion URL index), your link count totals will likely be much smaller, even if you've grown your link building efforts. Below is an example of the numbers for various Seattle startups across May's larger index and July's smaller one:

Mozscape Data for Seattle Startups from the May Index Update

Above: May's 165 Billion URL index data

July Mozscape Data

Above: July's smaller, 78 Billion URL index data

Note that, as one might expect, link counts are between 50-75% of their former value. This percentage will be lower for sites that get many links from the far corners of the less-traversed, less-popular pages and sites on the web, and higher for sites with links from more popular/well-linked-to sites and pages.
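The size-adjusted reading of link counts can be expressed as a ratio. A small Python sketch, assuming you have your site's link count from both indices: it computes the fraction of links retained and sets it beside the index-level URL ratio (78.8 billion vs. ~165 billion, about 48%). By the reasoning above, a site retaining well above that ratio is likely linked from more popular, well-linked-to pages. The function name and the site's link counts are hypothetical.

```python
# July URL total and the approximate May total, both quoted above.
JULY_URLS = 78_813_641_094
MAY_URLS = 165_000_000_000  # "165 billion", rounded as stated in the post


def retention(old_count: int, new_count: int) -> float:
    """Fraction of a count carried over from one index to the next."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return new_count / old_count


index_ratio = retention(MAY_URLS, JULY_URLS)
# Hypothetical site: 10,000 links in May, 6,200 in July.
site_ratio = retention(10_000, 6_200)

print(f"index kept {index_ratio:.1%} of its URLs; site kept {site_ratio:.1%} of its links")
print("above index ratio" if site_ratio > index_ratio else "at or below index ratio")
```

A drop from 10,000 to 6,200 links looks alarming in isolation, but it sits inside the 50-75% range described above and well above the index's own ~48% shrinkage.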

We're working hard to grow index size in the future back up to 100 billion+ URLs. Our crawlers can already handle vastly more, and it's just the unreliability of Amazon's hardware that holds us back. Our engineers and sysops folks are working around the clock to get there as soon as we can.

We've also done some work recently to update the scoring systems for the Keyword Difficulty/SERPs Analysis Tool. You'll now see a more accurate and usable algorithm applied to results where very fresh pages are ranking, e.g. news, sports, trending topics, etc. Here's an example query that previously would have produced a keyword difficulty score of 1:

Libor Rate Scandal

Libor Rate Scandal was a SERP that, until a few days ago, had virtually no traffic and very different results. All of these pages have been produced in the last day or two, and thus don't have Page Authority scores. However, Domain Authority is now being used to help calculate KW difficulty, which should seriously help those of you who analyze fresh results.
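The change to Keyword Difficulty amounts to a fallback when Page Authority is missing. The post doesn't give the actual formula, so this Python sketch assumes, purely for illustration, a score equal to the average authority of the ranking results, using Domain Authority for any page that has no Page Authority yet. The Result class, the difficulty function, and the example domains are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    url: str
    page_authority: Optional[float]  # None for pages too new to have a score
    domain_authority: float


def difficulty(results: List[Result]) -> float:
    """Hypothetical difficulty: mean authority across ranking results,
    substituting Domain Authority where Page Authority is missing."""
    if not results:
        raise ValueError("need at least one ranking result")
    scores = [
        r.page_authority if r.page_authority is not None else r.domain_authority
        for r in results
    ]
    return sum(scores) / len(scores)


# Fresh news pages (no PA yet) on strong domains, plus one older page with PA.
serp = [
    Result("http://news-a.example/libor", None, 92),
    Result("http://news-b.example/libor", None, 88),
    Result("http://blog.example/libor-explained", 45, 60),
]
print(difficulty(serp))  # 75.0
```

Without the fallback, the two fresh pages would contribute nothing useful and the score would collapse toward the bottom of the scale, which is the kind of misleading result the update is meant to avoid.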

The next 2-4 Mozscape index updates will continue to be on AWS, but we're now running 3-4 indices in parallel (which costs a fortune, but gives us fallback options if/when Amazon's failures lose an index or massively delay it). In the next 3-4 months, we hope to be operating indices off our new hybrid cloud environment and see much greater reliability, which will enable us to produce larger, fresher and more consistent updates.


