Still scraping

I was under the impression that “blog scraping” (the technique of building a blog by copying other people’s content) had ceased. Apparently, I was wrong. Instead it seems to have acquired a new face.

I discovered this by chance this afternoon when I decided to do a Google search on my WordPress domain name, tigergrowl. The results of the search were interesting.

I discovered first of all that my blog figured in blog directories that I had never signed up to, some of them in countries speaking a language other than English. That doesn’t matter, I suppose, as they generally don’t quote me at length and they link back to my blog.

Less happily, I discovered a new form of “scraping”. In this, an entire post is ripped off and posted on a blog under someone else’s name, photos and all. That’s not all, however. The text seems to have been passed through some sort of filter which replaces some words by their synonyms. I assume this is intended to make the text less immediately recognizable. The result looks like something that has come out of the Google translator or was originally penned by Professor Stanley Unwin.

You will find an example here: We got equally far as Waterloo! and the original here: We got as far as Waterloo!
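The two titles above hint at why a synonym filter is a poor disguise: most of the wording survives in its original order. As a minimal sketch (my own illustration, not a tool I actually used), Python's standard `difflib` module can score how much of one text's word sequence reappears in another. The function name `word_similarity` and the unrelated comparison title are assumptions made up for the example.

```python
from difflib import SequenceMatcher


def word_similarity(original, suspect):
    """Return a score from 0 to 1 based on matching runs of words."""
    original_words = original.lower().split()
    suspect_words = suspect.lower().split()
    return SequenceMatcher(None, original_words, suspect_words).ratio()


original_title = "We got as far as Waterloo!"
scraped_title = "We got equally far as Waterloo!"
unrelated_title = "A quiet day in Islington"  # hypothetical, for contrast only

# The synonym-swapped title still scores high; the unrelated one does not.
print(round(word_similarity(original_title, scraped_title), 2))
print(round(word_similarity(original_title, unrelated_title), 2))
```

A high score between a post and a stranger's page would not prove copying by itself, but it would be a reasonable prompt to go and look.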

I discovered this because whoever posts the material, perhaps a machine, came unstuck. Perhaps my HTML was faulty or the host gagged on it, but the post did not display correctly and the HTML was revealed. This included the URLs of the photos in which tigergrowl appeared and was picked up by the Google search.
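The leftover photo URLs are what gave the copy away, and the same check can be made on any suspect page. The following is a rough sketch, assuming the page's HTML has already been saved as text; it lists the image addresses that mention a given name. The class and function names and the `.example` URLs are hypothetical, invented for the illustration.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def images_mentioning(html_text, marker):
    """Return the image URLs in html_text that contain marker."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    return [src for src in collector.sources if marker.lower() in src.lower()]


# Hypothetical page content standing in for a scraped post.
sample_page = """
<p>We got equally far as Waterloo!</p>
<img src="http://tigergrowl.example/photos/waterloo.jpg">
<img src="http://other.example/banner.png">
"""

print(images_mentioning(sample_page, "tigergrowl"))
```

A scraper that copies the HTML without re-hosting the photos leaves the original addresses in place, which is why a search on the domain name can turn the copy up.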

This is not the only example. I have found others. So far, they have all been on LiveJournal, to whom I have sent an email on the subject. I await their reply with interest. Is LiveJournal knowingly hosting scrapers or have they failed to spot what is going on? If the latter, I hope they will now take action on the matter.

I like to think that the world of blogs is a friendly easygoing world and if people quote my blog that’s fine. Ask me nicely, and I might even allow you to copy a picture as long as you acknowledge the source. But ripping off my content and posting it and my pictures under another name is not fine. It is theft and breach of copyright.

On the bus home the other evening, Tigger noticed that a woman had her hand in my coat pocket. All she got was a snotty tissue but the principle is the same: a pickpocket is a pickpocket is a thief. In the same way, blog scrapers are thieves with their grubby hands in our virtual pockets. They deserve to be detected and revealed for what they are.

Over to you, LiveJournal.

Update

24 hours have passed and I have not heard back from LiveJournal. My support request is still listed as “open”, which I take to mean that nothing has been done about it.

In retrospect, technical support was perhaps not the best department to approach. I have now found a page for the reporting of “abuse” and will resubmit my complaint on that.

About SilverTiger

I live in Islington with my partner, "Tigger". I blog about our life and our travels, using my own photos for illustration.

8 Responses to Still scraping

  1. Ancient Brit says:

    What scumbags (the ripoff merchant and the pickpocket).

    I had a look at the ripoff merchant and I can’t tell whether the guy that appears in the photo towards the bottom is the culprit or if that’s another ripoff from another site (the peculiar use of language seems to mark the person out as the same one).

    It never occurred to me that anyone would do that. Shows how old I am.

    Years ago when I owned a TI-99 I found a neat way to estimate the amount of memory a program needed to run. It was a simple two-line BASIC program that consisted of an incremental counter (A=A+8) on one line followed by a GOSUB to that line. The recursive call to a subroutine used up 8 bytes every time and there was no apparent limit to the number of times you could nest the call – it depended on the amount of available memory.

    Eventually the system would run out of memory and stop with an error. But the value contained in variable A gave the amount of memory that had been available to be used for the calls. Subtracting that from the known amount of RAM gave the total usage for the program. TI never saw fit to provide a function to calculate that so we had to be inventive 🙂
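The arithmetic behind the trick can be sketched as a small simulation. This is Python rather than TI BASIC, it only models the memory accounting, and the RAM size and program usage are hypothetical figures chosen for illustration; the 8 bytes per nested call is the value given above.

```python
BYTES_PER_CALL = 8  # cost of one nested GOSUB, as described in the comment


def estimate_free_memory(total_ram, program_usage):
    """Simulate nesting calls until the simulated free memory runs out.

    Returns the counter value (the role played by variable A).
    """
    free = total_ram - program_usage
    counter = 0
    while free >= BYTES_PER_CALL:
        free -= BYTES_PER_CALL      # each nested call consumes 8 bytes
        counter += BYTES_PER_CALL   # A=A+8
    return counter


TOTAL_RAM = 16 * 1024   # hypothetical amount of RAM, in bytes
actual_usage = 1000     # hypothetical size of the program being measured

a = estimate_free_memory(TOTAL_RAM, actual_usage)
estimated_usage = TOTAL_RAM - a  # known RAM minus the final counter value
print(a, estimated_usage)
```

Because memory is consumed in 8-byte steps, an estimate made this way can only be accurate to within one step.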

    A member of the same group in which I published the trick added an extra line that did nothing and then sent it for publication in a popular computer magazine without crediting its source (for which he received payment). He also put it in his book, again without crediting the source. I’ve never quite been able to forgive him for that…

    • SilverTiger says:

      Theft in all its forms is unfortunately rife but it still comes as a shock when we fall victim to a thief ourselves.

      I would definitely not forgive the person who stole an idea from me in the way you describe.

  2. Ancient Brit says:

    Well, waddya know. After commenting on your report, out of idle curiosity I did a quick search to see whether anything from EoC had found its way somewhere unexpected.

    What I did find was that some scumbag uploaded a document of mine (fortunately registered with WriteSafe.com) to a site called Docstoc. They appear to offer documents to order – not sure why mine would have been accepted (it even says “Competition Winner” on each page along with my name, which you’d think would be a red flag to any company vetting uploads).

    I’ve written to the site asking them to take the document down. Their public policy says they will if ownership is proved (and I’m sure mine is, since I formally registered it).

    We’ll see what happens…

    • SilverTiger says:

      I only did a brief search before finding the LiveJournal cases and now I am in two minds as to whether to continue the search. It would be annoying and frustrating to find others, knowing that it is hard or impossible to obtain redress.

      • Ancient Brit says:

        I had a good response from the Docstoc site – within 24 hours of my report they took my material down.

        They didn’t say whether they would take action against the offending user, but their policy does say that multiple offences will lead to the cancellation of the user’s account.

        They don’t say whether the materials uploaded by the scumbag will automatically be taken down as well, though…

  3. Reluctant Blogger says:

    Outrageous! I would be so angry if I found anything I had written had been stolen. It is not terribly likely in my case because my blog is almost private.

    What is the point though? Why has that person stolen your Waterloo post? He doesn’t appear to have any readers on his LiveJournal thing. And he has rather wrecked it anyway.

    I feel rather angry on your behalf.

    You must let us know what you hear back from LiveJournal (if anything).

    • SilverTiger says:

      There are various reasons why people rip off others’ work. On the Web it is often related to advertising: the scraper sets up a site with rapidly changing content in the hope of attracting visitors who will then click on adverts, earning the site owner a few cents with each click.

      In the LiveJournal cases, the scraper (or possibly two scrapers) was obviously incompetent and made a mess of reproducing my posts. Others make a better job of it and the process can be automated.

      I suspect that I will just have to swallow my bile over the LJ cases.

      I am looking at defensive strategies but I have no certainty of finding one that is effective.

      Scrapers may prefer certain types of content when the subject matters to them, but I think they will hijack any blog: they are hungry for material and, by quoting selectively, can make any blog seem interesting.

Genuine comments are welcome. Spam and comments with commercial URLs will be deleted.
