Scrapy

Scrapy
Developer(s)	Zyte (formerly Scrapinghub)
Initial release	26 June 2008
Stable release	2.11.0[1] / 18 September 2023
Repository	github.com/scrapy/scrapy
Written in	Python
Operating system	Windows, macOS, Linux
Type	Web crawler
License	BSD License
Website	scrapy.org

Scrapy (/ˈskreɪpaɪ/[2] SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders": self-contained crawlers that are each given a set of instructions. Following the spirit of other don't-repeat-yourself (DRY) frameworks, such as Django,[4] it lets developers reuse their code, which makes large crawling projects easier to build and scale.
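
The spider idea can be illustrated with a minimal plain-Python sketch. It deliberately uses only the standard library and is not Scrapy's actual API: the names BaseSpider, TitleSpider, crawl, fetch, and PAGES, as well as the example.test URLs, are hypothetical and chosen for illustration. The shared crawl loop is written once in a base class, and each spider supplies only its instructions (where to start, what to extract, which links to follow).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageParser(HTMLParser):
    """Collects the <title> text and all <a href> targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class BaseSpider:
    """Reusable crawl loop (hypothetical; not Scrapy's API)."""

    start_urls = []

    def __init__(self, fetch):
        # fetch: callable mapping a URL to its HTML, or None if unavailable.
        self.fetch = fetch

    def parse(self, url, page):
        """Yield dicts (scraped items) or strings (links to visit next)."""
        raise NotImplementedError

    def crawl(self):
        seen, queue, items = set(), list(self.start_urls), []
        while queue:
            url = queue.pop(0)
            if url in seen:
                continue  # never visit the same page twice
            seen.add(url)
            html = self.fetch(url)
            if html is None:
                continue
            page = _PageParser()
            page.feed(html)
            for result in self.parse(url, page):
                if isinstance(result, dict):
                    items.append(result)
                else:
                    # Resolve relative links against the current page.
                    queue.append(urljoin(url, result))
        return items


class TitleSpider(BaseSpider):
    """Instructions only: record each page title and follow every link."""

    start_urls = ["http://example.test/"]

    def parse(self, url, page):
        yield {"url": url, "title": page.title.strip()}
        yield from page.links


# In-memory stand-in for the network so the sketch runs offline.
PAGES = {
    "http://example.test/": "<title>Home</title><a href='/about'>About</a>",
    "http://example.test/about": "<title>About</title><a href='/'>Home</a>",
}

if __name__ == "__main__":
    for item in TitleSpider(PAGES.get).crawl():
        print(item)
```

A second spider would subclass BaseSpider and override only start_urls and parse, reusing the queueing and de-duplication logic unchanged. This is the kind of code reuse the paragraph above describes; a real Scrapy project would additionally rely on the framework for downloading, scheduling, and exporting items.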

Well-known companies and products using Scrapy include Lyst,[5][6] Parse.ly,[7] Sayone Technologies,[8] Sciences Po Medialab,[9] and Data.gov.uk's World Government Data site.[10]

History

Scrapy originated at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license. In 2011, Zyte (then Scrapinghub) became the new official maintainer,[12][13] and the milestone 1.0 release followed in June 2015.[11]

References

[1] "Release 2.11.0". 18 September 2023. Retrieved 19 September 2023.
[2] Commit 975f150.
[3] "Scrapy at a glance".
[4] "Frequently Asked Questions". Scrapy 2.8.0 documentation. Retrieved 28 July 2015.
[5] Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Archived from the original on 4 June 2016. Retrieved 28 July 2015.
[6] "Companies using Scrapy". Scrapy.
[7] Montalenti, Andrew (27 October 2012). "Web Crawling & Metadata Extraction in Python". Speaker Deck. Retrieved 11 May 2015.
[8] "Scrapy Companies". Scrapy.
[9] "Hyphe v0.0.0: the first release of our new webcrawler is out!".
[10] Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
[11] Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list).
[12] Hoffman, Pablo (2013). "List of the primary authors & contributors". Retrieved 18 November 2013.
[13] "Interview Scraping Hub".


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
