How to Prevent Content Scraping in WordPress

There’s no need to panic: content scraping is not the biggest problem WordPress bloggers face right now. However, it can quickly become one in certain circumstances. First of all, what is content scraping?

It’s when your content gets extracted from your blog (usually through your own RSS feed) and republished on a completely different site without your consent.

In other words, content scraping is content stealing … there’s no better name for it.

Why do people steal?

This one’s quite simple, actually. Creating content takes time. And if you want it to be half decent then it takes even more time.

I can’t remember who said it originally, but one of the best quotations related to creating (writing) content I’ve ever encountered is this:

Easy reading is damn hard writing.

There are hundreds of people out there who want to launch new sites, publish stuff, get popular, and earn some money through ads, affiliate links and so on. But none of this is possible if you don’t have content, hence content scraping/stealing.

What’s the big problem

In my opinion, the big problem with people taking your content is not that they do it, it’s that they don’t give you any credit pointing you out as the author. Really, if they only linked to you through an author’s box then everything would be (more or less) fine.

(If your content goes viral and spreads all around the internet then it’s obviously a great thing, but for you to be able to enjoy this popularity there has to be an indication that you are, indeed, the author.)

Additionally, if your site is not that authoritative yet, Google can mistake it for the scraper and penalize you instead of the actual culprit, which is obviously a bad BAD thing. That’s why, in essence, I advise you to act fast if you find anyone scraping your content regularly.

How to catch people red-handed

The absolute best method is Copyscape and their Copysentry service. Comes with a price tag though (about $5 a month to take care of one site).

What Copysentry does is it protects your site against theft by monitoring the web and searching for copies of your content. If it finds anything you get an email and can then take further action.

But Copyscape is not the only method, especially if you don’t want to spend any money. In this case, there are two sensible solutions:

  • Pingbacks and trackbacks. These native functionalities in WordPress are somewhat effective at discovering content scrapers, but only if you have a good internal linking structure within your posts (more on that in a minute). Just pay attention to the stats and take notice of every domain that sends you trackbacks or pingbacks regularly.
  • Google Webmaster Tools. Go to Traffic > Links to Your Site. This is an even better method because you get a clear table of every domain linking to yours, along with the number of links. Content scrapers tend to sit at the top of this list, since every post they copy carries your internal links with it. This is how I discovered my site was being scraped (mind that you do need a good interlinking structure for this to work too).
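
If you’d rather crunch the numbers yourself, the idea behind that table can be sketched in a few lines of Python. This is only a rough sketch: it assumes you already have a list of URLs that link to your posts (collected from pingbacks or copied out of a links report), and every domain below is a made-up placeholder.

```python
from collections import Counter
from urllib.parse import urlparse


def rank_linking_domains(backlink_urls, own_domain):
    """Count backlinks per external domain, most frequent first."""
    counts = Counter()
    for url in backlink_urls:
        host = (urlparse(url).hostname or "").lower()
        # Strip a leading "www." so both host variants count as one domain.
        if host.startswith("www."):
            host = host[4:]
        # Skip unparsable entries and links coming from your own site.
        if host and host != own_domain:
            counts[host] += 1
    return counts.most_common()


# Hypothetical data: URLs of pages that link to your posts.
backlinks = [
    "http://scraper.example/post-1",
    "http://www.scraper.example/post-2",
    "http://scraper.example/post-3",
    "http://friendly-blog.example/roundup",
    "http://myblog.example/older-post",
]

for domain, links in rank_linking_domains(backlinks, "myblog.example"):
    print(f"{domain}: {links}")
```

A domain that keeps climbing this list without ever being a site you recognize is the one worth a closer look.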

Fighting content scrapers

The first and most important thing to do is to pay attention to your interlinking structure, as mentioned above. In short, link to your posts from within your other posts, which has a lot more benefits than just content scraping protection (SEO, better readability, and better structure, to name a few).

Content thieves don’t spend much time tweaking your content and removing the links; they usually take things in their original form. So if your posts contain a lot of links, those links will then appear in your Google Webmaster Tools.

If you don’t do this, the only ways to find scraped content are manual Google searches for your post titles, which is highly ineffective, or Copyscape.
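
If you do end up searching by hand, a tiny helper can at least build the search links for you. This is a minimal sketch, assuming an exact-phrase search in quotes combined with the `-site:` operator to hide results from your own blog; the function name and the domain are placeholders.

```python
from urllib.parse import urlencode


def title_search_url(post_title, own_domain):
    """Build a search URL for an exact post title, excluding your own site."""
    # Quotes ask for an exact-phrase match; -site: filters out your own domain.
    query = f'"{post_title}" -site:{own_domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical usage with a placeholder domain.
print(title_search_url("How to Prevent Content Scraping in WordPress", "myblog.example"))
```

Run it over your list of post titles and open the resulting links one by one. It is still manual work, just a bit less of it.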

Additionally, you can do a number of other things to get even more links from scrapers. For example:

  • Affiliate links. If you’re into affiliate marketing, don’t forget to include affiliate links inside your posts, but make sure to redirect them through a plugin like Pretty Link. That way they look exactly like any other link on your site, so any scripts for auto-shaving affiliate links won’t pick them up.
  • RSS footer. If you’re using the WordPress SEO plugin, you can set a footer for your RSS and use it to promote some other projects of yours. For example, you can link to other pages within your site, to your email subscription form, or include even more affiliate links.

Now, let’s talk about some actions to take if you’ve decided to take the scraper down.

Being the nice guy

There are basically two paths you can take when fighting content scrapers. This is the first one – being the nice guy.

The nice guy tries to use legal methods and fight the scrapers “by the book.”

First of all, forget about contacting the scraper directly. Not worth the effort at all. Instead, contact their hosting provider and send a little something called the “DMCA takedown notice.”

All it takes to find out where the evil site is hosted is a quick who-is lookup.

Then, just visit their hosting provider and contact them directly via email, or via their own template for DMCA notices – most hosting companies have those set in place. (If you want to send such a notice manually, you can get a nice template message here.)

If all goes well, after a while the scraped content (or the whole site) should go down and your problem should be over.
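
As for the lookup step above, one way to start is to resolve the scraper’s domain to an IP address and then run that IP through a who-is service, which usually names the hosting company that owns the network. Here is a rough Python sketch of the first half. The `hosting_ip` helper, the stub resolver and the addresses are all illustrative assumptions, not part of any particular tool.

```python
import socket
from urllib.parse import urlparse


def hosting_ip(site_url, resolve=socket.gethostbyname):
    """Return the IPv4 address that serves the given site."""
    # Accept either a full URL or a bare domain name.
    host = urlparse(site_url).hostname or site_url
    return resolve(host)


# A stub resolver keeps this demo offline; leave out the second argument
# to perform a real DNS lookup instead.
def fake_resolver(hostname):
    return {"scraper.example": "203.0.113.7"}[hostname]


print(hosting_ip("http://scraper.example/feed", fake_resolver))
```

Keep in mind that a site sitting behind a CDN or proxy will resolve to that service’s address rather than the actual host, so the result is a starting point, not proof.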

Being the street-smart guy

This is the other approach. The thing with DMCA notices is that all the back and forth takes some time, and you can actually handle the whole issue much quicker.

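First, do a who-is lookup and get the scraper’s IP address (the address that actually requests your feed shows up in your server’s access log, and it can differ from the one their site is hosted on). Then you can take it and either ban it in your .htaccess file by using this line: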

Deny from

Or, which actually sounds way cooler than banning, redirect them to a completely different feed. Again, this goes in your .htaccess file:

RewriteCond %{REMOTE_ADDR} 
RewriteRule .* [R,L]

Now, the whole trick with the address you’re using here is to redirect the IP to a porn site’s feed or something equally inappropriate. That way, instead of your content, the scraper gets some naughty images and videos. The kinkier the better…

Of course, if they’re truly dedicated they will get a different IP, but it’s really REALLY unlikely. 99% of scrapers will just find another source of content and leave you alone right after you start feeding them porn pics.
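
To make the two .htaccess options above a bit more concrete, here is a small Python sketch that prints the lines for a given address. Everything specific in it is an assumption for illustration: `203.0.113.7` is a reserved documentation address, `http://example.com/feed` is a neutral placeholder target, and the helper names are made up. It also assumes `RewriteEngine On` is already present in your .htaccess (the standard WordPress rewrite block includes it), and that your server still accepts the older `Deny from` syntax, which on Apache 2.4 depends on mod_access_compat.

```python
import ipaddress
import re


def deny_rule(ip):
    """Return a 'Deny from' line for one IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return f"Deny from {ip}"


def redirect_rules(ip, target_url):
    """Return the RewriteCond/RewriteRule pair that redirects one address."""
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    # Escape the dots and anchor the pattern so only this exact address matches.
    pattern = "^" + re.escape(ip) + "$"
    return [
        f"RewriteCond %{{REMOTE_ADDR}} {pattern}",
        f"RewriteRule .* {target_url} [R,L]",
    ]


# Hypothetical values: a documentation-only IP and a placeholder feed URL.
scraper_ip = "203.0.113.7"
print(deny_rule(scraper_ip))
print("\n".join(redirect_rules(scraper_ip, "http://example.com/feed")))
```

Escaping the dots matters because an unescaped dot in a RewriteCond pattern matches any character, so a sloppy pattern could catch addresses you never meant to redirect.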

Selecting your path

To be honest, you can use both methods at the same time (DMCA notice and feed redirection).

You can start by redirecting the feed. Then, you can send a DMCA takedown notice which points out all the content that has been scraped. This way, no content will get scraped in the meantime.

What’s your take on this? Did you have to face any content scrapers on your WordPress site, and how did you handle it?

