apify/crawlee
Crawlee is a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. It extracts data for AI, LLMs, RAG, or GPTs and downloads HTML, PDF, JPG, PNG, and other files from websites. It works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, runs in both headful and headless mode, and supports proxy rotation.
What it does
Crawlee is an open-source toolkit that automatically visits websites, collects data from them, and saves that information for later use, all while mimicking human browsing behavior to reduce the chance of being blocked. It is commonly used to gather large amounts of web content for AI systems, research pipelines, or competitive intelligence tools.
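The visit, collect, and save loop described above can be sketched in a few lines of TypeScript. This is a minimal, dependency-free illustration and not Crawlee's own API: the in-memory `fakeSite` map, the `crawl` function, and the `Row` type are hypothetical names invented for the example, and a real crawler would fetch pages over HTTP or through a browser rather than reading them from a map.

```typescript
// Hypothetical in-memory "website": URL -> page title and outgoing links.
// A real crawler would download these pages instead of looking them up.
type Page = { title: string; links: string[] };

const fakeSite: Record<string, Page> = {
  "https://example.com/": {
    title: "Home",
    links: ["https://example.com/a", "https://example.com/b"],
  },
  "https://example.com/a": {
    title: "Page A",
    links: ["https://example.com/b", "https://example.com/"],
  },
  "https://example.com/b": {
    title: "Page B",
    links: ["https://example.com/missing"],
  },
};

// One saved record per visited page.
type Row = { url: string; title: string };

function crawl(startUrl: string, maxPages: number): Row[] {
  const queue: string[] = [startUrl];          // URLs waiting to be visited
  const seen = new Set<string>([startUrl]);    // URLs already queued, to avoid repeats
  const dataset: Row[] = [];                   // collected results

  while (queue.length > 0 && dataset.length < maxPages) {
    const url = queue.shift() as string;
    const page = fakeSite[url];
    if (page === undefined) continue;          // page not found: skip it

    // Collect: extract the data of interest and save it.
    dataset.push({ url, title: page.title });

    // Visit: enqueue links that have not been queued before.
    for (const link of page.links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return dataset;
}

const rows = crawl("https://example.com/", 10);
console.log(rows);
```

The queue plus the `seen` set is what keeps the crawl from looping forever when pages link back to each other. Production tools layer retries, concurrency, proxy rotation, and persistent storage on top of the same basic loop.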
Why it matters for PMs
As AI products increasingly depend on fresh, real-world data scraped from the web, a reliable collection tool that can avoid being blocked becomes a competitive advantage, and Crawlee's 21,000+ stars suggest it has become a go-to choice for teams building data pipelines. For founders and PMs, it represents the growing infrastructure layer behind AI training sets, market monitoring tools, and automated research products.
Early stage — limited signal data
Score updated Feb 18, 2026