How to Crawl Authority Sites for External Links


Yesterday I wrote that I like to crawl authority sites for external links. Some of you wondered how to do that. There are a bunch of ways to do this. This is one way.

What is an authority site?

Since we are trying to identify high-quality, authoritative sites for SEO, we turn to Google:

Of course, we aren’t disclosing the actual ranking signals used in our algorithms because we don’t want folks to game our search results; but if you want to step into Google’s mindset, the questions below provide some guidance on how we’ve been looking at the issue:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?

Obviously, not all of these will apply to every site, page, article, etc. That is why they are called guidelines.

You should also review Google’s Search Quality Rating Guidelines.

You should also use the best law firm SEO tool ever created: your brain.

You probably already know what sites are authoritative on your subject matter. A few examples that I usually check out:

  • Government sites
  • Education sites (universities, law schools, etc.)
  • Professional organization sites
  • State Bar sites
  • News sites
  • Real blogs

You can probably think of many more.

Once you have a good list of these sites (say 20 or so), you can crawl them for external links.

How to crawl your selected target sites

As previously mentioned, there are a bunch of ways to crawl sites. I prefer Screaming Frog.

Screaming Frog

Simply enter the domain you would like to crawl, click the “External” tab and click start. Screaming Frog will identify sites/pages that your target site is already linking to.

Export the list to a .csv and dive into new link opportunities.
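If you'd rather script it than use a desktop crawler, here is a minimal sketch of the same idea for a single page, using only Python's standard library. It is not how Screaming Frog works internally; the names `LinkCollector` and `external_links` are made up for illustration, and it treats any different hostname (ignoring a leading "www.") as external, so a subdomain of your target site would count as external too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _host(url):
    """Lowercased hostname with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def external_links(html, page_url):
    """Return absolute http(s) links in `html` that point off page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = _host(page_url)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, etc.
        if _host(absolute) != own_host and absolute not in found:
            found.append(absolute)
    return found
```

You would feed it the HTML of each page you fetch from the target site. Fetching, politeness (robots.txt, rate limits) and following internal links are left out here, which is a big part of why I prefer a dedicated tool.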

What to do with the crawl data

You should parse your crawl data for:

  • Links to other sites that are similar to yours
  • Broken links
  • Links to images
  • Links to specific pages (as opposed to home page links)

The idea here is to identify places your target site is already linking as an indication that they might also be willing to link to you.
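A rough sketch of that sorting step in Python, assuming the exported .csv has an "Address" column holding the linked URL and a "Status Code" column holding the HTTP response code. Check those column names against your own export. The function name `bucket_links`, the keyword list and the sample URLs are all made up for illustration, and keyword matching is only a crude stand-in for "similar to yours"; you still need to look at the results yourself.

```python
import csv
import io
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def bucket_links(rows, similar_keywords):
    """Sort crawl rows into the four buckets listed above.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    A URL can land in more than one bucket.
    """
    buckets = {"similar": [], "broken": [], "images": [], "deep_pages": []}
    for row in rows:
        url = (row.get("Address") or "").strip()  # assumed column name
        if not url:
            continue
        path = urlparse(url).path.lower()
        status = (row.get("Status Code") or "").strip()  # assumed column name
        if status.isdigit() and int(status) >= 400:
            buckets["broken"].append(url)  # 4xx and 5xx responses
        if path.endswith(IMAGE_EXTENSIONS):
            buckets["images"].append(url)
        elif path not in ("", "/"):
            buckets["deep_pages"].append(url)  # not a home page link
        if any(word.lower() in url.lower() for word in similar_keywords):
            buckets["similar"].append(url)
    return buckets


# Made-up sample data standing in for a real export.
sample = io.StringIO(
    "Address,Status Code\n"
    "https://smithlaw.test/,200\n"
    "https://courts.test/forms/old.pdf,404\n"
    "https://news.test/img/photo.JPG,200\n"
)
buckets = bucket_links(csv.DictReader(sample), ["law"])
```

For a real export, replace the sample with something like `open("external_links.csv", newline="", encoding="utf-8")`, where the filename is whatever you saved the export as.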

Hint: The fact that your target site links to CNN is not an indication that CNN is likely to link to you.

Generally speaking, the more relevant and local the linked site is to your law firm, the better the opportunities you are likely to uncover.

Once you’ve identified some good candidates, it’s time to start thinking about why the target site might link to you.

Does your site have a great page that could replace a broken link resource?

Has the other site published articles on subjects on which you have expertise?

Do you have something cool on your site that is pretty obviously interesting to the other site’s audience?

If the answer is yes to any of these, it might be time to introduce yourself and your site to the people at your target site.

This doesn’t mean mass emailing webmasters of a bunch of target sites. That’s spam, friend.

Instead, introduce yourself as you would in real life.

Do you go up to people at networking events and ask them for links? I hope not. So don’t do that online.

Start a conversation. If and when it seems appropriate, suggest something that would:

  • Improve their site (fixing a broken link)
  • Help their audience
  • Help them

This is one of the most frictionless ways to acquire links from high-quality sites.

But remember, high-quality sites don’t maintain their quality by linking to every Jack and Jill that comes along.

If you’re going to approach these folks, you better bring something worth linking to.