
Open-source ETL platform that converts complex documents into structured data for LLMs and GenAI workflows.

Unstructured provides an open-source library and a hosted platform that ingest, parse, enrich, chunk, and embed documents so they are ready for large language models and GenAI applications. It supports a wide range of document formats, including PDF, HTML, Word, images, spreadsheets, and email, and exposes modular building blocks ("bricks") and SDKs for assembling transformation pipelines. The project includes local libraries for preprocessing and a hosted Unstructured API; the company also offers an enterprise Platform for production-grade continuous workflows with partitioning, enrichments, and monitoring. Its value lies in automating document ETL: it replaces hand-written, per-format parsing code with a single pipeline that produces structured, LLM-ready output at scale.
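To make the parse and chunk stages concrete, here is a minimal standard-library sketch of that kind of pipeline. It does not use the Unstructured library or reproduce its API: `Element`, `partition_text`, `chunk_by_section`, and `to_records` are hypothetical names, and the title heuristic is a deliberately crude assumption for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    """A typed piece of a document (hypothetical stand-in for a parsed element)."""
    kind: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw: str) -> list[Element]:
    """Split raw text into elements on blank lines.

    Toy heuristic (an assumption): a short single-line block that does not
    end with a period is treated as a title; everything else is body text.
    """
    elements = []
    for block in raw.split("\n\n"):
        text = " ".join(block.split())  # normalize whitespace
        if not text:
            continue
        single_line = "\n" not in block.strip()
        is_title = single_line and len(text) <= 60 and not text.endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", text))
    return elements


def chunk_by_section(elements: list[Element], max_chars: int = 200) -> list[str]:
    """Group elements into chunks, starting a new chunk at each title
    or when adding the next element would exceed max_chars."""
    chunks, current = [], ""
    for el in elements:
        candidate = f"{current}\n{el.text}" if current else el.text
        if current and (el.kind == "Title" or len(candidate) > max_chars):
            chunks.append(current)
            current = el.text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_records(chunks: list[str], source: str) -> list[dict]:
    """Attach minimal metadata so each chunk can be traced back to its source."""
    return [{"source": source, "chunk_id": i, "text": c} for i, c in enumerate(chunks)]


if __name__ == "__main__":
    raw = (
        "Intro\n\nUnstructured parses documents.\n\n"
        "Usage\n\nInstall the library. Then call it."
    )
    for record in to_records(chunk_by_section(partition_text(raw)), "example.txt"):
        print(record)
```

Typing each element before chunking is what lets the chunker break on section boundaries rather than at arbitrary character offsets, which tends to keep related text together in the records handed to an embedding step.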




