This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.
pip install crawlab-ai
An API token is required to use this SDK. You can obtain one from the official Crawlab website.
from crawlab_ai import read_list
# Define the URL to scrape
url = "https://example.com"
# Get the data without specifying fields
df = read_list(url=url)
print(df)
# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)
# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
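Once the results are in list-of-dictionaries form, they can be handled with the standard library alone. The following is a minimal sketch, not part of the SDK: the sample records stand in for the output of `read_list(url=url, as_dataframe=False)`, and both the record shape (flat dictionaries keyed by field name) and the helper name `records_to_csv` are assumptions made for illustration.

```python
import csv
import io

# Stand-in for the result of read_list(url=url, as_dataframe=False).
# Assumption: each record is a flat dict keyed by field name.
data = [
    {"title": "First post", "content": "Hello"},
    {"title": "Second post", "content": "World"},
]


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text (hypothetical helper).

    Keys not listed in fieldnames are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv(data, ["title", "content"])
print(csv_text)
```

Passing the same field names used in the `fields` argument keeps the CSV columns in a predictable order.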
Create a Scrapy spider by extending ScrapyListSpider:
from crawlab_ai import ScrapyListSpider
class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]
Then run the spider:
scrapy crawl my_spider
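Scraped items can be cleaned up in a regular Scrapy item pipeline. The sketch below assumes that ScrapyListSpider yields dict-like items keyed by the configured fields, which this README does not confirm. The class name `StripWhitespacePipeline` is hypothetical.

```python
class StripWhitespacePipeline:
    """Hypothetical item pipeline that trims whitespace from string values."""

    def process_item(self, item, spider):
        # Scrapy calls this method once for every item the spider yields.
        # Assumption: items behave like dicts keyed by field name.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        return item


# Quick check outside Scrapy, using a hand-written item
cleaned = StripWhitespacePipeline().process_item(
    {"title": "  Hello  ", "content": "World\n"}, spider=None
)
print(cleaned)
```

To enable a pipeline like this, register it under Scrapy's `ITEM_PIPELINES` setting in the project's settings.py, using whatever module path the class lives at in your project.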