Why I share
I think openness in geoscience is very important, and I feel we all have a duty to be open with our work, data, and ideas when possible and practical. I certainly believe in sharing a good deal of the work I do in my spare time. So much so that when I started this blog there was no doubt in my mind I would include an agreement for people to use and modify freely what I published. Indeed, I venture to say I conceived the blog primarily as a vehicle for sharing.
Some of the reasons for sharing are also selfish (in the best sense): doing so gives me a sense of fulfillment and pleasure. As Matt Hall writes in Five things I wish I’d known (one of the essays in 52 Things You Should Know About Geophysics), you can find incredible opportunities for growth in writing, talking, and teaching. There is also the professional advantage of maintaining visibility in the marketplace, or as Sven Treitel puts it, Publish or perish, industrial style (again in 52 Things You Should Know About Geophysics).
How I used to share
At the beginning I chose an Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA) but soon removed the non-commercial limitation in favour of an Attribution-ShareAlike license (CC BY-SA).
A (very) cold shower
Unfortunately, one day last year I ‘woke up’ to an unpleasant surprise: within two days an online magazine had reposted all my content – literally, A to Z! I found this out easily because I received pingback approval requests for each post (thank you WP!). Quite shocked, I confess, the first thing I did was to check the site: indeed all my posts were there. The publisher included an attribution with my name at the top of each post, but I was not convinced this would be fair use. Quite the contrary, to me this was a clear example of content scraping. I say that because they republished even my Welcome post and my On a short blogging sabbatical post – in the science category! Please see the two screen captures of the pingbacks below (I removed their information):
If this were a legitimate endeavour, I reasoned, a magazine with thoughtful editing, those two posts would surely not have been republished. Also, I saw that posts from many other blogs were republished en masse daily.
Limitations of Creative Commons licenses
I asked for advice and help from my Twitter followers and on the WordPress Forums, and at the same time started doing some research. That is when I learned this is very common. Being in good company (Google returned about 9,310,000 results when searching ‘blog scraping’) did not feel like much consolation, though: I read that sites may get away with scraping content, or at least try. I will quote directly from the Plagiarism Today article Creative Commons: License to Splog?: “They can scrape an entire feed, offer token attribution to each full post lifted (often linking just to the original post) and rest comfortably knowing that they are within the bounds of the law. After all, they had permission …Though clearly there is a difference between taking and reposting a single work and reposting an entire site, the license offers a blanket protection that covers both behaviors”.
Fight or flight?
Yes, Creative Commons licenses have mechanisms for fighting this abuse, but their effectiveness is yet to be proven (read for example this other article by Plagiarism Today, Using Creative Commons to Stop Scraping). Notice these articles are a bit out of date, but as far as I could see things have not improved much. The way is still the hard way of tracking down the culprit and fighting through legal action, although social media support helps.
It is possible to switch to a more restrictive Creative Commons license like the Attribution-NonCommercial-NoDerivs (perhaps modified as a CC+), but that only lets you cut your losses, not fight the abuse, as it applies only on a going-forward basis (I read this in an article, and jotted down a note, but I unfortunately cannot track down the source – you may be luckier, or cleverer).
Then the site administrator, who had read my question on the WordPress forum, contacted me through my blog contact form (again I removed their information):
Your Name: ______
Your Email Address: ______
Your Website: ______
Your site is under a CC license. What’s the trouble in republishing your content?
Subject: Your license
Time: Thursday July 26, 2012 at 12:26 am
IP Address: ________
Contact Form URL: http://mycartablog.com/contact/
Sent by an unverified visitor to your site.
I responded with a polite letter, as suggested by @punkish on Twitter. I explained why I thought they were exceeding what was warranted under the Creative Commons license, that republishing the About page and Sabbatical posts was to me proof of scraping, and I threatened to pursue legal recourse, starting with a DMCA Notice of Copyright Infringement. Following my email they removed all my posts from their site, and notified me.
I think I was fortunate in this case, and decided to take matters into my own hands to prevent it from happening again. Following my research I saw two good, viable ways to better protect my blog from wholesale scraping while continuing to share my work. The first one involved switching to WordPress.org. This would allow more customization of the blog, and the use of such tools as the WP RSS footer plugin, which lets you Get credit for scraped posts, and WP DMCA website protection. Another benefit of switching to WordPress.org is that – if you are of belligerent inclination – you can try to actively fight content scraping with cloaking. Currently, although it is one of my goals for the future of this blog, I am not prepared to switch away from WordPress.com due to time constraints.
Having decided to stick with WordPress.com, the alternative I came up with was to remove the CC license, replacing it with a Copyright notice with what I would call a liberal attitude. A simple way to do that was to add the Konomark logo accompanied by a statement that encourages sharing, but without surrendering any rights upfront. Additionally, you can prevent content theft from your WordPress.com blog (or at least reduce the risk) by configuring the RSS feed so that it displays post summaries only, not full posts.
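One way to confirm that the feed setting took effect is to inspect the feed itself. The following is only a minimal sketch, and it rests on an assumption: that full posts appear in an RSS 2.0 feed inside a content:encoded element, while summary-only items carry just a description. The function name items_with_full_text and the sample feed are hypothetical, made up for illustration; in practice you would paste in the XML of your own feed.

```python
import xml.etree.ElementTree as ET

# Tag of the RSS "content" module element that usually holds full post text.
# Assumption: a summary-only feed omits this element or leaves it empty.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def items_with_full_text(feed_xml: str) -> list[str]:
    """Return the titles of feed items that expose a non-empty full-text body."""
    root = ET.fromstring(feed_xml)
    exposed = []
    for item in root.iter("item"):
        body = item.findtext(CONTENT_ENCODED, default="")
        if body.strip():
            exposed.append(item.findtext("title", default="(untitled)"))
    return exposed


# Hypothetical two-item feed: one full post, one summary only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Full post</title>
      <description>Short excerpt</description>
      <content:encoded><![CDATA[<p>The entire article body.</p>]]></content:encoded>
    </item>
    <item>
      <title>Summary only</title>
      <description>Short excerpt</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    # Lists the items a scraper could lift in full from this sample feed.
    print(items_with_full_text(SAMPLE_FEED))
```

If the list comes back empty for your own feed, the feed is (under the assumption above) only handing out summaries.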
How I share now
I customized my statement to reduce as much as possible the need for readers to ask for permission by allowing WordPress reblogging and by allowing completely open use of my published code and media. Below is a screen capture of my statement, which is located in the blog footer:
I hope this will be helpful for those who may have the same problem. Let me know what you think.
Whine Journalism and how to bring the splashback – a great story and a great step-by-step guide to fighting content theft
Content Scrapers – How to Find Out Who is Stealing Your Content & What to Do About It
Useful tools to detect stolen content
Copyscape – Search for copies of your page on the web
Google Alerts – (for example read this article)
TinEye – reverse image search engine. ‘It finds out where an image came from, how it is being used, if modified versions of the image exist…’
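For a quick manual check of a single suspicious page, a rough text comparison can also be done locally. The sketch below is not how any of the tools above work (I do not know their internals); it is just one common, simple approach: split both texts into overlapping three-word sequences ("shingles") and measure how many they share. The function names shingles and jaccard are made up for this example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles common to both texts: 0.0 (none) to 1.0 (identical sets)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        # Texts shorter than one shingle cannot be compared meaningfully.
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "I think openness in geoscience is very important"
    suspect = "I think openness in geoscience is very important"
    print(jaccard(original, suspect))
```

A score close to 1.0 suggests the suspect page reproduces the post nearly word for word; where to draw the line for a partial copy is a judgment call.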
They can try to steal your content, but they can’t steal your genius. Maybe I am too idealistic, but the reason we (should) blog / share / discuss things in the open, is to get them out there, out of our own heads. Ideas are cheap and plenty, executing is what matters. Perhaps spreading ideas and casting them outward allows the sharer to focus on the really important stuff. The emotion and the enthusiasm that can’t be imitated.
Interesting post, Matteo. I hadn’t heard of Konomark before. It seems to be not very different from an ordinary copyright notice, but suggests “I’ll probably say yes”, as opposed to the “I’ll probably say no” vibe I think the (c) mark transmits. It’s friendly, but it’s not open access — though I like your modification granting the right to re-use code. I also like the idea of getting people thinking more carefully about how they license their work. Creative Commons maybe encourages people not to think too hard, just choose something off the shelf.
Our site doesn’t seem to have been scraped, though it’s hard to know for sure. I guess I just hope there are reasons to come to the site itself, rather than a mirror. I don’t know who’d want to read a scrapy site, presumably covered in Flat Belly ads. Or maybe no-one is meant to read, and they are just link farms, but I don’t think those sites get good Google Juice anymore.
For myself, I will stick to open access (that is, freedom to share without permission), but am more wary about it after reading this.
Thank you very much for your comments. This is exactly the kind of response and discussion I was hoping to foster. As I mentioned when we ‘chatted’ yesterday, this post, and my decision to remove my Creative Commons license, was in large part a way to make a statement against theft of content. I don’t think you are too idealistic, Evan; complete openness is possible and good. I agree nobody can steal your genius, or your enthusiasm and passion. But for some, like writers-bloggers, who depend on their original work for their livelihood, it is not possible to share so openly, and content theft is a real threat. If my statement can help even one of them be more prepared, I will be happy. I like what you say, Matt, that perhaps Creative Commons encourages people not to think too hard, and I am glad you said that.
Your comments did give me pause for thought, and I realized I could push my statement further by preemptively sharing my media as well as my code. People who use Matlab or ImageJ could already reproduce the images using my code or ideas, but others could not, so this is also fairer. I have already updated my statement and post to reflect the change.
It’s always worth remembering, when it gets too frustrating dealing with sploggers and thieves, what Picasso said about inferior artists copying his work. He said that he just laughed, knowing it was the greatest thing they’d ever do, but the next day he’d wake up and HE’D BE PICASSO and they wouldn’t.
Thank you raincoaster, I’ve heard that before and forgotten. Awesome quote! Right to the point.
I’ve thought about publishing my posts in the public domain but govern them with some sort of license because I’ve had the bad experience of sites scraping my posts (and several Flickr pictures) and COPYRIGHTING those posts as their own. In essence, a license ensures that the work is shared using my (pretty open) terms and so that I don’t have to deal with copyright sharks later on down the road.
They COPYRIGHTED your work! There is really no shame. Here is a good example: http://manolofood.com/whine-journalism-and-how-to-bring-the-splashback/ Even the ‘best’ of us aren’t immune to the impulse to cheat.
Out of curiosity, were you able to enforce your copyright?
I began my blog a few days ago and I also have the philosophy of sharing all the content, as my blog is mostly aimed at students. But after reading your post, I have decided to include the konomark symbol to prevent possible problems in the future. Just wanted to thank you for the information.
Good to hear. Good luck with the new blog!