Apify

141 repositories

crawlee-python
Public
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
python crawler scraper automation web-crawler headless scraping crawling pip web-scraping
Python
•
Apache License 2.0
•342•5.2k•79•13•Updated Jan 31, 2025
tester-mcp-client
Public
Model Context Protocol (MCP) Client for Apify's Actors
•0•0•0•1•Updated Jan 31, 2025
crawlee
Public
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping
TypeScript
•
Apache License 2.0
•740•17k•130•21•Updated Jan 31, 2025
workflows
Public
Apify's reusable GitHub workflows
Python
•4•7•4•6•Updated Jan 30, 2025
actor-whitepaper-web
Public
Documentation site for the Actor Programming Model – a fresh take on serverless microapps. Built with Astro.
website actor-model astro apify
MDX
•
MIT License
•0•1•3•3•Updated Jan 30, 2025
apify-client-python
Public
Apify API client for Python
api client scraping apify python
Python
•
Apache License 2.0
•12•53•10•5•Updated Jan 30, 2025
apify-eslint-config
Public
Apify ESLint preset to be shared between projects
JavaScript
•
Apache License 2.0
•0•2•1•2•Updated Jan 30, 2025
actors-mcp-server
Public
Model Context Protocol (MCP) Server for Apify's Actors
TypeScript
•
Apache License 2.0
•2•3•0•2•Updated Jan 30, 2025
apify-sdk-js
Public
Apify SDK monorepo
actor apify nodejs javascript typescript sdk
TypeScript
•
Apache License 2.0
•41•128•11•10•Updated Jan 30, 2025
fingerprint-suite
Public
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
scraping fingerprinting playwright typescript puppeteer
TypeScript
•
Apache License 2.0
•119•1.2k•21•9•Updated Jan 29, 2025
apify-docs
Public
This project is the home of Apify's documentation.
API Blueprint
•
Apache License 2.0
•81•32•76•19•Updated Jan 29, 2025
apify-cli
Public
Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
command-line headless-chrome puppeteer serveless apify
TypeScript
•20•128•39•6•Updated Jan 29, 2025
apify-shared-js
Public
Utilities and constants shared across Apify projects.
TypeScript
•
Apache License 2.0
•11•12•5•1•Updated Jan 29, 2025
actor-cmd
Public
TypeScript
•0•1•0•3•Updated Jan 29, 2025
apify-sdk-python
Public
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
automation scraping apify python sdk
Python
•
Apache License 2.0
•10•122•13•2•Updated Jan 29, 2025
impit
Public
impit | rust library for browser impersonation
Rust
•0•13•1•3•Updated Jan 29, 2025
proxy-chain
Public
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
javascript-library headless-chrome proxy-server proxychains
JavaScript
•
Apache License 2.0
•147•875•7•11•Updated Jan 29, 2025
startupjobs2jazzHR
Public
JavaScript
•1•0•0•1•Updated Jan 28, 2025
rustls
Public
Patched fork of `rustls` for `impit`
Rust
•
Other
•674•0•0•0•Updated Jan 28, 2025
apify-client-js
Public
Apify API client for JavaScript / Node.js.
TypeScript
•
Apache License 2.0
•28•67•18•5•Updated Jan 28, 2025
actor-llmstxt-generator
Public
The /llms.txt Generator Actor 🕸️📄 extracts website content to create an llms.txt file for AI apps 🤖✨ like LLM fine-tuning and indexing. Output is available 📥 in the Key-Value Store for easy download and integration into workflows. 🚀
Python
•
Apache License 2.0
•1•3•1•1•Updated Jan 27, 2025
actor-vector-database-integrations
Public
Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)
Python
•
Apache License 2.0
•4•5•2•0•Updated Jan 25, 2025
actor-templates
Public
This project is the 🏠 home of Apify Actor templates to help users quickly get started. Contributions welcome!
Python
•18•26•10•1•Updated Jan 23, 2025
docusaurus-plugin-typedoc-api
Public
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
TypeScript
•28•0•0•0•Updated Jan 22, 2025
rag-web-browser
Public
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
scraper ai crawling serp rag llm
TypeScript
•
Apache License 2.0
•4•22•3•0•Updated Jan 17, 2025
homebrew-tap
Public
A Homebrew tap for Apify tools
Ruby
•1•8•0•4•Updated Jan 16, 2025
apify-actor-docker
Public
Base Docker images for Apify actors.
Dockerfile
•
Apache License 2.0
•24•71•9•4•Updated Jan 14, 2025
h2
Public
Patched fork of h2 for impit
Rust
•
MIT License
•289•0•0•0•Updated Jan 14, 2025
push-actor-action
Public
A GitHub Action to push an Actor to the Apify platform
Apache License 2.0
•0•15•0•0•Updated Jan 14, 2025
actor-whitepaper
Public
This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.
python automation serverless scraping node-js agents
Apache License 2.0
•0•7•7•2•Updated Jan 8, 2025