Firecrawl
Crawls and scrapes websites into clean markdown or structured JSON, handling JavaScript rendering, PDFs, and dynamic content. The output is LLM-ready, so it is used as the data-ingestion layer for RAG pipelines and AI research agents.
LlamaIndex
Framework specialized in data ingestion, indexing, and retrieval for LLM applications. A common choice for complex RAG pipelines.
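
To make the ingestion, indexing, and retrieval stages mentioned above concrete, here is a minimal standard-library sketch. It does not use the Firecrawl or LlamaIndex APIs. The sample markdown, the function names (`split_markdown`, `build_index`, `retrieve`), and the term-frequency cosine scoring are all assumptions chosen for illustration; a real pipeline would typically use embeddings and a vector store instead.

```python
import math
import re
from collections import Counter


def split_markdown(markdown: str) -> list[str]:
    """Ingestion: split a markdown document into chunks at heading lines."""
    parts = re.split(r"\n(?=#{1,6} )", markdown.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks: list[str]) -> list[Counter]:
    """Indexing: one term-frequency vector per chunk (stand-in for embeddings)."""
    return [Counter(tokenize(chunk)) for chunk in chunks]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], index: list[Counter], top_k: int = 1) -> list[str]:
    """Retrieval: return the top_k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, index[i]), reverse=True)
    return [chunks[i] for i in ranked[:top_k]]


# Hypothetical scraped markdown, standing in for a crawler's output.
SAMPLE = (
    "# Pricing\nThe basic plan costs ten dollars per month.\n\n"
    "# Support\nEmail the help desk for support questions."
)

chunks = split_markdown(SAMPLE)
index = build_index(chunks)
best = retrieve("how much does the plan cost", chunks, index)
print(best[0])
```

The retrieved chunk would then be passed to an LLM as context alongside the user's question.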