Screen scraping / web crawling: legal admissibility and framework conditions

Are screen scraping and web crawling legally permitted? Both terms refer to the automated reading of content from third-party websites and the transfer of the data obtained in this way to a separate system for further use.

Example

Company S wants to set up a price comparison portal. To this end, S creates a script that automatically searches various third-party websites for products and their prices and saves the results in a database owned by S. S then offers the content on its own website, which customers can search. Customers may be redirected to the respective third-party offers via hyperlinks.
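
In technical terms, the script in the example can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the HTML markup, the class names "product", "name" and "price", the table name "offers" and the address shop.example are all hypothetical, and the page is read from a sample string rather than fetched over the network.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup of a third-party product page (assumption for illustration).
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">24.90</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">39.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from the assumed markup above."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) tuples
        self._field = None   # "name" or "price" while inside such a span
        self._current = {}   # fields of the product currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Text is only recorded while inside a name or price span.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # A list item is stored only if both fields were found.
            if {"name", "price"} <= self._current.keys():
                self.products.append(
                    (self._current["name"], float(self._current["price"]))
                )
            self._current = {}


def scrape_to_db(html, source_url, conn):
    """Parse one page and store its products with a link back to the source."""
    parser = ProductParser()
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offers (name TEXT, price REAL, source_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO offers VALUES (?, ?, ?)",
        [(name, price, source_url) for name, price in parser.products],
    )
    conn.commit()
    return len(parser.products)


# In-memory database standing in for the database owned by S.
conn = sqlite3.connect(":memory:")
stored = scrape_to_db(SAMPLE_HTML, "https://shop.example/products", conn)
cheapest = conn.execute(
    "SELECT name, price, source_url FROM offers ORDER BY price LIMIT 1"
).fetchone()
```

The stored source address corresponds to the hyperlink through which customers are redirected to the third-party offer. The sketch also shows why the legal questions arise: the third party's content is copied into a separate system in which it can be searched independently of the original website.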

What are the legal framework conditions?

Many legal aspects may be relevant. In particular, the following must be observed:

competition law in accordance with the UWG
the rights as a database producer pursuant to Sections 87a et seq. of the German Copyright Act (UrhG)
the terms and conditions or terms of use of the third-party sites
a "virtual domiciliary right" on websites
the circumvention of technical protection mechanisms
requirements of foreign law

Competition law in accordance with the UWG

Competition law is largely governed by the Unfair Competition Act (UWG), which contains numerous provisions prohibiting unfair conduct in the market. For example, under § 4 No. 4 UWG (§ 4 No. 10 UWG in the version in force until 2015), competitors may not be deliberately obstructed.

Whether such targeted obstruction exists in the case of screen scraping / web crawling depends, among other things, on the respective market situation. One question is whether, in the example, company S is deliberately seeking to crowd a particular third party out of the market. Another question, which deserves closer examination here, is whether the third party whose content is tapped via screen scraping / web crawling can no longer adequately promote its services on the market through its own efforts. The latter may need to be examined more closely if, in the example above, customers no longer visit the third-party website and instead visit only the new website of company S.

However, in a 2014 ruling, the Federal Court of Justice (BGH) held that the screen scraping / web crawling of an airline's website did not constitute targeted obstruction.

Rights as a database producer pursuant to Sections 87a et seq. of the German Copyright Act (UrhG)

A database is independently protected by copyright law. A database exists, among other things, if

there is a collection of data
which are arranged systematically or methodically and
which are individually accessible by electronic means and if furthermore
the procurement, verification or presentation of the data requires a substantial investment in terms of type or scope.

It is important to note that the data does not have to be of a particular quality or characteristic. Rather, the central point is that the compilation required a certain investment (in particular a financial investment). The object of protection is therefore not so much the data itself as the investment in the compilation.

The data taken from the third-party websites can be part of such a database. With regard to the permissibility of screen scraping / web crawling, the first step is therefore to check on the basis of the specific website whether there really is a database in this sense, e.g. whether significant investments have actually been made.

Furthermore, the right of the database producer is not unlimited. One limit that is particularly relevant in the example case is that the database producer can only prohibit the use of the database as a whole or of a part that is substantial in terms of type or scope. Conversely, the use of insignificant parts (in terms of type or scope) is permitted. So even if the first step shows that a database exists, it must be clarified in a second step whether the screen scraping / web crawling involves more than an insignificant use in terms of type or scope (more precisely: reproduction, distribution or communication to the public).

What is the boundary between insignificant use and substantial use? There are no clear legal boundaries in this respect. The starting point is that the use must be insignificant in terms of type or scope, which refers to a qualitative and a quantitative aspect. With the qualitative aspect in particular, it will always be difficult to find a clear limit, as the standard for quality must first be defined. An exact quantitative limit that applies across the board is also hard to find, if only because the scope of the database under consideration must first be clarified. For example, where a relational database is split into several linked tables or databases, is the reference point the individual table or the entire database?

However, case law has sometimes cited figures, in particular a limit of 10%. For example, in a 2011 ruling, the BGH assumed that the use of up to 10% of a database did not constitute a substantial use in quantitative terms. In another case, the Higher Regional Court of Cologne assumed that a use of 10% was qualitatively significant, at least on the facts of that case.
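
The effect of the reference point on the quantitative assessment can be illustrated with a short calculation. The following is only a sketch: the record counts are invented, the function names are hypothetical, and the 10% figure serves merely as the orientation value mentioned in case law, not as a fixed legal threshold. The qualitative aspect is not captured at all.

```python
# Orientation value taken from the case law cited above; not a fixed legal limit.
ORIENTATION_SHARE = 0.10


def extracted_share(extracted_records, total_records):
    """Return the share of the reference set that was extracted."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return extracted_records / total_records


def exceeds_orientation_value(extracted_records, total_records,
                              threshold=ORIENTATION_SHARE):
    """Purely quantitative comparison against the orientation value."""
    return extracted_share(extracted_records, total_records) > threshold


# Invented figures: 450 records were extracted from one table with
# 3,000 records, which belongs to a database with 60,000 records overall.
extracted = 450
share_of_table = extracted_share(extracted, 3_000)      # about 15 %
share_of_database = extracted_share(extracted, 60_000)  # about 0.75 %

table_exceeds = exceeds_orientation_value(extracted, 3_000)
database_exceeds = exceeds_orientation_value(extracted, 60_000)
```

With these assumed figures, the same extraction lies above the orientation value when measured against the individual table and far below it when measured against the entire database. The choice of reference point can therefore decide the quantitative assessment.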

General terms and conditions or terms of use of the third-party sites

Further framework conditions for screen scraping / web crawling may result from the general terms and conditions or other terms of use of the third-party website or, where they exist, from other contracts with the third party. These may contain contractual prohibitions, which may even prohibit evaluations where there is no database or where only insignificant use is made. They may also contain confidentiality obligations that must be observed. Such contractual provisions must therefore be examined in detail. The examination must also cover whether the general terms and conditions or terms of use have been effectively agreed at all, because merely reproducing contractual terms in the "Imprint" or under a "GTC" hyperlink, without further measures to incorporate them, does not generally lead to an effective agreement. In addition, it must be checked whether the contractual provisions are worded in a way that is legally effective.

"Virtual domiciliary rights" on websites

In the past, attempts have also been made to derive rights as a website operator from a "virtual domiciliary right". However, a special "virtual domiciliary right" does not exist in law and - as far as relevant here - it has not yet been possible to derive separate rights from this with regard to screen scraping / web crawling.

Circumvention of technical protection mechanisms

More important is whether technical protection measures are circumvented during screen scraping / web crawling. This aspect must be considered under competition law, under copyright law and where a contractual relationship exists. It should also be noted that, in legal terms, a protective measure may be deemed circumvented far sooner than a purely technical perspective would suggest. Based on previous case law, even the use of a "deep link" may require an examination of whether technical protection measures have been circumvented.

Foreign law

Screen scraping and web crawling can easily extend to websites of companies based abroad. Foreign law may then also have to be observed.

Conclusion

According to case law, screen scraping and web crawling are possible in principle. However, there is no blanket permission. Rather, each third-party website must be considered individually, even if this is difficult to reconcile with the technology, as the script or program runs in the same way in each case. Depending on the specific issue, however, it will always be possible to identify certain framework conditions or limits that can be taken into account when designing the business model in order to at least significantly reduce the legal risk. Procedural considerations can also play a decisive role here, for instance in clarifying how to respond appropriately to any warning letters, summary proceedings or lawsuits.

Update (2021): The UrhG has since been amended, and a new Section 44b UrhG (along with Section 87c (1) nos. 4 and 5 UrhG and Section 95b (1) no. 1 UrhG, among others) has been introduced for text and data mining, i.e. the automated analysis of individual or multiple digital or digitized works in order to obtain information, in particular about patterns, trends and correlations. Such text and data mining is generally permitted, provided the copies made for this purpose are deleted once they are no longer needed. The basic prerequisite, however, is that the rights holder has not reserved the right of use; for works accessible online, such a reservation is only effective if it is made in machine-readable form. Despite this restriction, the permissibility of automated text and data mining is now clearer. It remains unclear, however, to what extent prohibitions can arise under other aspects (as described in the article above), as Section 44b UrhG is merely a copyright provision. For purely scientific use, there are further possibilities.
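
How a script might check for a machine-readable signal before accessing a page can be sketched as follows. The article does not say which technical form a reservation of use must take. Using robots.txt here is purely an assumption for illustration, and nothing is claimed about whether robots.txt is legally sufficient or necessary. The file content, the user agent name "price-bot" and the address shop.example are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a third-party website (assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /prices/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow this agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the lines of the file, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_about = may_fetch(ROBOTS_TXT, "price-bot", "https://shop.example/about")
allowed_prices = may_fetch(ROBOTS_TXT, "price-bot", "https://shop.example/prices/kettle")
```

In this example the price pages are excluded for all user agents, while the rest of the site is not. A script could skip excluded paths and log the decision, which may also help to document the checks made for each third-party website.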

Date: 5. Aug 2020