Data: can we have it all?

An interesting moral and legal dilemma hit me today, as I was deliberating how to scrape a large amount of data from a website.

Should I be doing this?

There has been plenty of debate, and campaigning, for free public data, and there are a host of websites that make excellent use of it for the public good (e.g. TheyWorkForYou.com, FixMyStreet). (See notes from Talk About Local ’09 on Open Data.) There is no moral or legal dilemma here – the data is public and has been made available for public use/manipulation.

However, what happens when the data is NOT public, and has not been released, but is located on a website in a “scrapeable” form?

There have been several attempted lawsuits in which a host company claimed another firm was breaking the law by scraping its data and potentially infringing on its business model (e.g. Cvent v. Eventbrite in the USA).

This debate has also reared its head in the issue of newspaper paywalls – is it OK for content to be scraped over the wall for free? Search engines claim their actions actually drive business to the pay-for site, but the bosses aren’t so sure.

Unfortunately it’s not as simple as companies keeping their valuable material under lock and key. Often the data has a certain value in being visible online (even if that is to attract payment at a later time), whilst another value comes from being able to sell the downloaded data to a third party.

If we then muscle in, with our scrapers and pipes at the ready, are we doing wrong?

Morally, many would say this comes down to a variety of issues: who the so-called “victim” is, the extent of the loss, and what the intentions are with the data (e.g. financial, malicious, educational).

There is another issue. Often a website will deem the “reproduction, transfer, transmission or dissemination” of the data as an infringement of their copyright.

Is this claim worth anything or does it have as much power as a “breakages must be paid for” sign in a local shop? (thanks @pigsonthewing)

  1. Legally this is a grey area, and of course it varies across jurisdictions. In the UK (and EU generally) there are two main legal issues to consider, ‘copyright’ and ‘database right’, both of which offer some level of protection to those publishing information and data.

    I’ve posted a summary of the issues as I see them at http://lists.w3.org/Archives/Public/public-lld/2011Mar/0136.html and http://lists.w3.org/Archives/Public/public-lld/2011Mar/0152.html

    In the US the issues are different as the ‘database right’ doesn’t exist, and case law makes it clear that ‘facts’ (and collections of facts no matter how hard to collect) are not protected by copyright.

    Some websites also post sets of ‘terms and conditions’ which they claim you accept by using the website. I’m sceptical that these have teeth, but if they are seen as valid contracts between you and the website owner then you would be legally bound by them. (Note that there are plenty of examples of this type of T&C that forbid linking to the site without written permission – something you’d hope would get laughed out of court, but that is far from guaranteed!)

      • carolinebeavon
      • August 31st, 2011

      Owen, apologies for not replying to your comment at the time. I just wanted to say thank you for posting and for those links – very helpful. I’m just writing my final report and will be sure to give you a mention in the thanks section.

      Caroline
      http://www.cbviz.co.uk
