September 2, 2011
Remember the early days of SEO? The dark days? Crawling from the primordial ooze, advising clients to make good use of their meta keywords tags; helping them get better positions in Ask Jeeves and AltaVista and WiseNut; telling them that, sadly, all the great content they had in PDFs was invisible to the engines and, as such, useless.
We’ve known for some time now that copy contained in PDFs can be of great benefit to the overall relevance and visibility of a website. We’ve seen PDF copy, actual copy, cached in the engines. We’ve seen PDFs position, for Pete’s sake. And now, Google Webmaster Trends Analyst Gary Illyes has posted an extremely informative entry on the Webmaster Central Blog detailing how the engine deals with PDFs, how best to make them visible, and, alternatively, how to keep them from being indexed.
We absolutely advise you read the post, but here are a few takeaways:
• If you can copy the text from a PDF and paste it into another document, chances are Google can index it.
• Links in PDFs pass PageRank, but make sure you want them to, because they can’t be “nofollow”-ed.
• Avoid displaying the same copy in PDF and HTML formats. (This flies in the face of dark ages advice; before PDFs could be properly indexed, telling clients to also host the copy in HTML was fairly standard practice.) If, for whatever reason, you have to, make use of canonicalization to tell Google which version is the juicy one, the one you want showing up in results.
• The title shown in search results for a PDF is determined by two factors: the title in the PDF’s metadata and the anchor text of links pointing to the PDF. So avoid using “read PDF” or something similarly useless when linking to your collateral.
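A PDF has no HTML head to hold a canonical link or a robots meta tag, so these signals have to travel in the HTTP response that serves the file. The snippet below is a minimal sketch of what that might look like, assuming the rel="canonical" Link header and the X-Robots-Tag header described in Google's documentation. The function name and the example.com URL are hypothetical, and how you actually attach headers depends on your server or framework.

```python
def pdf_response_headers(canonical_url=None, allow_indexing=True):
    """Build HTTP response headers for serving a PDF (illustrative helper)."""
    headers = {"Content-Type": "application/pdf"}

    # Point the engines at the preferred version when the same copy
    # also lives at another URL (for example, an HTML page).
    if canonical_url:
        headers["Link"] = f'<{canonical_url}>; rel="canonical"'

    # Ask the engines to keep this PDF out of the index entirely.
    # In practice you would usually pick one signal or the other.
    if not allow_indexing:
        headers["X-Robots-Tag"] = "noindex"

    return headers


# Hypothetical usage: the HTML page is the version we want in results.
print(pdf_response_headers("https://www.example.com/whitepaper.html"))
```

Check the headers your server really sends (with your browser's developer tools, say) before assuming the engines see what you intended.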
Again, read the post. PDFs still aren’t as good as solid HTML pages, but we’ve come a long way, um, baby. And remember, as Google gets better at recognizing what’s important, what searchers are truly searching for, and how to get it to them, we need to take every opportunity to help our clients and ourselves show Google what we have to offer.