Mike Vardy is the Managing Editor at Lifehack. He is also an independent writer, speaker, podcaster and "productivityist"; you can read more of his writing at Vardy.me. He is @mikevardy on Twitter.
But right now you can read a guest post by Mike right here. It concerns something every writer posting on the internet needs to be aware of – “Scraping”.
Yep, I haven’t gone down this road in some time. Today I’m stepping away from productivity and such to talk pure tech.
And yep, the headline might be a bit misleading (but not entirely), especially when you consider my thoughts on Readability and its latest co-venture, Readlists. What they aren’t doing is scraping, per se.
But they might as well be.
I remember attending 604 FreelanceCamp in 2010 and watching Kemp Edmonds (at the time, my soon-to-be friend) deliver a talk on how to protect your content. He shared a story about how his own content was essentially stolen, and an audience member asked him about scraping. He talked a bit about scraping sites, and you can really dig into his thoughts over at his weblog.
Here’s what a scraping site is, courtesy of Wikipedia:
“A scraper site is a spam website that copies all of its content from other websites using web scraping.”
I can tell you that I’ve had to deal with scraping sites for some of the major websites I’ve worked for, and Lifehack content gets scraped a lot. A whole lot. I’ve even got a TextExpander snippet that I send to these sites to let them know they’ve repurposed and republished content without permission (and, in most cases, are making money off of that content via GoogleAds, etc.). I can count on exactly zero fingers the number of times I’ve received a reply or had a site comply.
I hate wasting time on these sites. After all, I’d rather be a dog breeder than a dog catcher. (Note: I am not saying that all the content I create or edit is a “dog”. Far from it.)
Now, let’s all talk about Readability and Readlists, and how what they do differs from what scraping sites do…and how what they do doesn’t.
Differences
1. They are reader-driven services. For anyone to profit from Readability and Readlists, a reader needs to take action on a specific post (or create a list). The service simply aggregates and compiles content for them. So unlike scraping sites, the work isn’t done entirely by the site itself, but by the person using it.
2. Marketing/Promotion. Scraping sites generally don’t market or promote themselves; the content does that for them via search engines. Readability and Readlists definitely do promote themselves. When Readlists launched this past week, I found out about it on a lot of technology sites. Even Lifehacker had a piece on it – and you know how much I love reading their stuff. Several of my online writing friends (Ben Brooks, Stephen Hackett) wrote about it, as did others whom I respect but don’t know personally (Kyle Baxter).
The old-style scraping sites never promote themselves. Does that mean they have more brains than guts? Probably. Because I’d have to say that Readability and Readlists seem to be displaying more guts than brains in how they work the system.
3. Writers can get paid. Unlike a pure scraping site, Readability does pay publishers who register, once they hit a certain payout threshold (much like how Google AdSense pays publishers). But you only get paid twice per year, which is, to be fair, two more times per year than old-school scraping sites pay.
Similarities
1. Profiting from the works of others. Sure, publishers can get paid (I haven’t been), and here’s how Readability itself describes the way that happens:
“As a web publisher or writer, you can register with Readability and start collecting contributions. Any time a Readability Subscriber uses Readability on a page of yours, a portion of that Subscriber’s monthly contribution is allocated to you. Here’s an example: Joe Subscriber pays $10.00 a month for the Readability service. Of the $10.00, $7.00 (70%) is allocated for publishers. If Joe reads 14 articles with Readability on 14 different domains in the month of February, each domain will receive $.50 ($7.00 divided by 14 pages) from Joe’s contribution pool.”
So Readability (although it doesn’t explicitly say so) keeps 30% of the monthly subscriber fee. But if a domain hasn’t registered with the service, the division changes. Well… what if none of the sites are registered? What then? Does Readability keep 100%? Sure, it might be unlikely that none of the sites a subscriber visits in a month is registered, but it is possible. Ben Brooks has talked about Readability’s money collection practices before, even back when he was advocating the service.
(In fact, you can check out Ben’s thoughts on Readability from the get-go by just searching his site with the term “Readability”. You’ll get every last one of ’em.)
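To make the arithmetic in Readability’s example concrete, here is a minimal Python sketch of the split as their quote describes it: 70% of the fee goes into a publisher pool, and that pool is divided evenly across the pages a subscriber reads. The names `allocate` and `PUBLISHER_SHARE` are hypothetical, and the sketch deliberately does not model what happens to the share of an unregistered domain, since Readability’s description doesn’t say.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# Publisher share taken from Readability's own example (70% of the fee).
PUBLISHER_SHARE = Decimal("0.70")


def allocate(monthly_fee: Decimal, pages_read: list) -> tuple:
    """Hypothetical model of the split in Readability's example.

    pages_read holds one domain name per page read that month.
    Returns (payout per domain, amount the service keeps).
    Unregistered domains are NOT modeled; that rule is unstated.
    """
    pool = monthly_fee * PUBLISHER_SHARE
    retained = monthly_fee - pool  # the implied 30% service cut
    if not pages_read:
        # Assumption: with nothing read, no publisher receives anything.
        return {}, monthly_fee
    per_page = pool / len(pages_read)  # pool split evenly per page
    counts = Counter(pages_read)  # a domain read twice earns two shares
    payouts = {
        domain: (per_page * n).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for domain, n in counts.items()
    }
    return payouts, retained


# Joe Subscriber: $10.00 a month, 14 articles on 14 different domains.
domains = [f"site{i}.example" for i in range(14)]
payouts, kept = allocate(Decimal("10.00"), domains)
print(payouts["site0.example"], kept)
```

Run on Joe’s numbers, each of the 14 domains comes out at $0.50 and $3.00 stays with Readability, matching the quote and the 30% cut implied above.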
2. Not asking permission first. Ben covered this as well, and the fact that money is being made off of my content without anyone asking first (all of my content on Vardy.me and Eventualism is licensed Creative Commons Attribution-NonCommercial Unported) is a problem for me. Yes, I did register my site when the service launched. And I didn’t opt out right away once all of this stuff started coming to light. But when they announced Readlists… well, that was it.
I’ve asked Readability to stop processing and storing my content. Same goes with Readlists.
Which brings me to another difference: at least they got back to me and seem to have complied. Can’t say the same for the other scraping sites out there.
So there’s that.
Photo credit: Greg Peverill-Conti (CC BY-NC-SA 2.0)