apify/crawlee

Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

What it does

Crawlee is an open-source toolkit that automatically visits websites, collects data from them, and saves that data for later use, while mimicking human browsing behavior to reduce the chance of being blocked. It is commonly used to gather large amounts of web content for AI systems, research pipelines, and competitive intelligence tools.
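
To make the visit, collect, and save loop concrete, here is a minimal dependency-free sketch of the general crawling pattern. It is not Crawlee's API: the in-memory `fakeSite`, the `crawl` function, and the `Page` and `Row` types are all hypothetical names invented for illustration, and a real crawler would fetch pages over the network instead of reading them from an object.

```typescript
// Hypothetical in-memory "website": URL -> page title and outgoing links.
// This stands in for real HTTP fetching so the sketch runs anywhere.
type Page = { title: string; links: string[] };
type Row = { url: string; title: string };

const fakeSite: Record<string, Page> = {
  "/": { title: "Home", links: ["/docs", "/blog"] },
  "/docs": { title: "Docs", links: ["/", "/docs/intro"] },
  "/docs/intro": { title: "Intro", links: ["/docs"] },
  "/blog": { title: "Blog", links: ["/", "/missing"] },
};

// Breadth-first crawl: visit each URL at most once, record one row per
// page, and enqueue any links that have not been seen yet.
function crawl(site: Record<string, Page>, startUrl: string, maxPages: number): Row[] {
  const queue: string[] = [startUrl];
  const seen = new Set<string>([startUrl]);
  const dataset: Row[] = [];

  while (queue.length > 0 && dataset.length < maxPages) {
    const url = queue.shift() as string;
    const page = site[url];
    if (page === undefined) continue; // dead link: skip it

    dataset.push({ url, title: page.title }); // "save" the collected data

    for (const link of page.links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return dataset;
}

const rows = crawl(fakeSite, "/", 10);
console.log(rows.map((r) => r.url).join(", "));
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, and `maxPages` caps the run. The parts this sketch leaves out (network requests, retries, browser rendering, proxy rotation) are the kind of work a library like Crawlee is meant to handle.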

Why it matters for PMs

As AI products increasingly depend on fresh, real-world data scraped from the web, a reliable collection tool that can avoid being blocked becomes a competitive advantage, and Crawlee's 21,000+ GitHub stars suggest it has become a go-to choice for teams building data pipelines. For founders and PMs, it is part of the growing infrastructure layer behind AI training sets, market monitoring tools, and automated research products.

Early Signal Score: 21

Early stage — limited signal data

Stars: 21.7k

Forks: 1.2k

Contributors: 110

Language: TypeScript
