Scraping along

Occasionally, a post that I write is deemed interesting enough by another blogger to be quoted. This generates an incoming link that appears among the article’s comments. I don’t mind this at all. In fact, I think it’s good when bloggers read one another’s posts and take up the discussion. So if you are an honest blogger, you may quote me on that and on anything else!

However (and there’s always a however in this wicked world), there are those who quote my blog – and yours, and anyone else’s – for less innocent reasons. This practice has become so prevalent as to rival spamming as a nuisance and has attracted the ire of the blogging community. As a result, it has been given a name, blog scraping, and the blogs based on it are referred to by the term splogs.

As far as I can make out, there are two main sorts of splogs. They all have one thing in common (with the odd exception), namely that their authors (if they deserve such a name) use automatic software to cruise the blogosphere and seize on certain articles according to the keywords in them.

Let me give you an example. A few weeks back, I wrote a couple of posts about my hearing aids. These were immediately “scraped” by a site claiming to aggregate information on hearing aids. However, the word “hearing” also appeared in a later post of mine, this time without any reference to hearing aids, and that post, too, was scraped by the same splog. This lack of finesse shows that automatic software is being used.
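
For the technically curious, here is a minimal sketch of the kind of crude keyword matching that would explain this behaviour. It is only my guess at how such software might work, not the scrapers' actual code: the keyword list, the sample posts and the function name `naive_match` are all invented for illustration.

```python
import re

# Hypothetical keyword list for a "hearing aids" splog (an assumption).
KEYWORDS = {"hearing"}

def naive_match(text, keywords=KEYWORDS):
    """Return True if any keyword appears as a word in the text.

    No attempt is made to check context, which is exactly the
    lack of finesse described above.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & set(keywords))

# Invented sample posts: only the first is really about hearing aids.
posts = {
    "hearing-aids": "My new hearing aids arrived today.",
    "court-visit": "We sat in on a hearing at the magistrates' court.",
    "museum-trip": "We spent the afternoon at the museum.",
}

# Both posts containing the word "hearing" get picked up.
scraped = [slug for slug, text in posts.items() if naive_match(text)]
print(scraped)
```

A matcher of this sort cannot tell a post about hearing aids from one that merely contains the word "hearing", so both would be scraped.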

The different sorts of splog that I know of (though there may be others) are, firstly, those that quote the whole of a post, often without acknowledging the source (i.e. you or your blog) or even while attributing it to a false author and, secondly, those that quote a few lines of an article, following these with a “Read more…” link to the original post. What these splogs tend to have in common, apart from a rather dull format, is a complete lack of any contact information, making it difficult to express your displeasure at their actions.

How do you know you have been “scraped”? An essential part of the scraping process is to generate an incoming link to your blog. You will see this either in the comments section at the end of a post or among the comments held for review as possible spam by the blogging software. The idea behind this, I think, is to try to generate traffic to the splog, which will raise its ranking on search engines such as Google.

Why would these parasites wish their sad, unoriginal and lack-lustre little splogs to achieve such favour? There might be several motives, I suppose. A splog with a high profile might attract advertising, for example, and thus generate income.

So does scraping really matter? After all, doesn’t it possibly give your blog extra publicity? There are several answers to that, depending on the different sorts of blogs that are scraped. In the first place, most honest people deprecate with various degrees of passion such parasitical use of their original work. If you are an artist or writer or a provider of information, you do not want your work to be stolen and credited to someone else. At the very least, scraping is a breach of copyright. Added to this, you, the author, have no control over where your work is exhibited and may find your posts appearing where you do not wish them to appear, for example on porn sites.

Are there any defences against scraping? None that I can see. A lot of bloggers now put copyright or Creative Commons licence announcements on their blogs. If you use WordPress, there is a plug-in* designed to insert a copyright notice in your text if this is quoted in its entirety. I don’t think these help much because machines don’t take any notice of copyright notices and scrapers probably pay no heed to them either.

Are there, then, any remedies once the offence has occurred? If you can find a contact address, you can try asking the splogger to remove your content. Some bloggers report success in this. If you cannot contact the splogger or if the latter does not respond, you can contact the Internet service provider who is hosting the splog. This too has met with some success.

This is all very reminiscent of the fight against spammers. It costs time and possibly money to go along this route and unless you are a business blogger and you feel your business is being jeopardized, it may not be worth the trouble.

In any case, such victories, if they are won, are piecemeal. For every splog that you persuade to delete your post or that is closed down, several more will appear to continue their parasitical activities. As with spammers, we are on a hiding to nothing in the absence of a global strategy to deal with the problem.

There are no copyright notices on my blog. This is not because I do not value my work but because I believe that the blogging community in general is honest and because I do not want to deter others from quoting me or linking to me for perfectly legitimate reasons. Those who rip off my content will ignore copyright notices anyway. Ending every post with “© 2008 SilverTiger” would, I feel, be giving in to hysteria. You may disagree with me and, if so, good luck to you. Let me know whether it makes any difference and I will perhaps change my mind.


*As emalyse points out in her comment, this plug-in is only for self-hosted WordPress (wordpress.org) blogs, not those hosted on wordpress.com.


About SilverTiger

I live in Islington with my partner, "Tigger". I blog about our life and our travels, using my own photos for illustration.
This entry was posted in Blogging.

5 Responses to Scraping along

  1. emalyse says:

    The only solace I derive from the daily event of spammers and sploggers is that the peddled dream of making big bucks from blogging, even by scraping others’ original content, is a complete illusion (most will barely cover their hosting and bandwidth costs before being shut down). Sadly, there’s a very shallow element out there that is drooling at the dangled carrot presented by the ‘lifestyle’ of blogging or peddling their product.

    On the whole I’ve found sploggers either copy just the first paragraph, which links back anyway, or cut and paste an entire post complete with unique identifiers intact. I think copyright is a much looser arrangement online and would require quite restrictive technologies to be in place to protect everyone’s personal and freely shared content. It’s more a question of establishing attribution and sources.

    I think sploggers are a fact of life. I take more seriously wholesale lifting of others’ content on a legitimate blogger’s site or merely commenting for links. I think it’s better to build your own identifiable writing style and ‘brand’ if anything, but on the whole sploggers are each a short-lived phenomenon chasing the illusion of a get-rich-quick scheme (most legitimate blogs will easily outlive them). Maybe in the future digital watermarking techniques may ensure that the attribution of source is easier to establish.

    The plugin you mention is only for self-hosted wordpress.org-based blogs (just in case any wordpress.com users are confused).

  2. SilverTiger says:

    Thanks, emalyse, for making the point about the plug-in. I think this sufficiently important to necessitate adding a note to the original post.

  3. Big John says:

    I am often ‘splogged’. Once by a site promoting Hillary Clinton’s campaign. BTW Thanks for the explanation Tiger, my old brain can’t always fathom these things. 😀

  4. Chris says:

    @emalyse – You say that most [sploggers] will barely cover their hosting and bandwidth costs before being shut down. This may be true, but honest bloggers like the rest of us would struggle to make Adsense work with just one (or maybe two) blogs. Sploggers, on the other hand, probably run hundreds of splogs. It’s an uncomfortable fact that we hate to admit: splogging, spamming and scraping are presumably profitable for those who engage in these activities.

    @SilverTiger – In general, there isn’t much we can do about splogs. As you say, for every one that closes, loads more are there to fill the space.

    Putting copyright notices all over your blog and RSS feed can look a bit heavy-handed and unfriendly. Sploggers take no notice of them anyway. And changing from a full feed to a summary feed won’t work either. Sploggers can easily bypass things like that. All it does is annoy your regular feed subscribers.

    Most people who read blogs can easily tell the difference between genuine search results and splogs on a search engine results page. And readers don’t all come via Google and Yahoo. So although it can be irritating to see splog results, they may not be such a big drain on your readership as you might imagine.

  5. SilverTiger says:

    To Big John: Maybe the Clinton Campaign thought your post had something useful to say or would serve as a talking point.

    To Chris: My guess (without arrogance) is that people are more likely to click on a splog link on my blog than they are to click on a link to my blog on a splog. I could be mistaken, of course. Anyway, with that thought in mind, I feel that deleting the link my end is good enough. Any traffic is then going to be one-way – in my direction.

Genuine comments are welcome. Spam and comments with commercial URLs will be deleted.
