Now in Public Beta

Documents to Structured Data
in Seconds

Upload any PDF, image, or Office document. Get clean Markdown & JSON — optimized for RAG pipelines, AI agents, and LLM workflows.

Try Free Playground | View API Docs

quickstart.ts

const result = await datascrub.parse("./report.pdf", {
  output: "markdown",     // or "json" | "chunks"
  chunkSize: 512,         // auto-chunk for RAG
  extractTables: true,    // structured table output
  languages: ["en", "zh"] // English and Chinese (CJK supported)
});

// → Clean Markdown + metadata + chunks
console.log(result.markdown);

Everything you need to turn documents into AI fuel

Stop wrestling with PDF parsers. Start building your AI product.

⚡

Blazing Fast Parsing

Process a 100-page PDF in under 10 seconds. Parallel page extraction with GPU-accelerated OCR.

🧩

RAG-Ready Output

Auto-chunking, metadata enrichment, and embedding-ready JSON. Skip the preprocessing pipeline.
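
To make "auto-chunking with metadata" concrete, here is a minimal sketch of heading-aware chunking over parsed Markdown. It is an illustration only, not DataScrub's implementation: the `Chunk` shape, the `chunkMarkdown` name, and the choice to measure size in words are all assumptions.

```typescript
// Illustrative sketch, not DataScrub's implementation or API.
interface Chunk {
  index: number;   // position of the chunk in the document
  heading: string; // nearest preceding Markdown heading, "" if none
  text: string;    // chunk body
}

// Groups paragraphs into chunks of roughly `maxWords` words and tags each
// chunk with the heading it sits under. A single paragraph longer than
// `maxWords` is kept whole rather than split mid-sentence.
function chunkMarkdown(markdown: string, maxWords = 512): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let bufferHeading = "";
  let buffer: string[] = [];
  let words = 0;

  const flush = (): void => {
    if (buffer.length === 0) return;
    chunks.push({
      index: chunks.length,
      heading: bufferHeading,
      text: buffer.join("\n\n"),
    });
    buffer = [];
    words = 0;
  };

  // Blocks are separated by one or more blank lines.
  for (const block of markdown.split(/\n{2,}/)) {
    const trimmed = block.trim();
    if (trimmed === "") continue;

    // A heading on its own closes the current chunk and starts a new section.
    const match = /^#{1,6}\s+(.*)$/.exec(trimmed);
    if (match) {
      flush();
      heading = match[1];
      continue;
    }

    const count = trimmed.split(/\s+/).length;
    if (words + count > maxWords) flush();
    if (buffer.length === 0) bufferHeading = heading;
    buffer.push(trimmed);
    words += count;
  }
  flush();
  return chunks;
}
```

Keeping the heading on every chunk means a retriever can show where a passage came from without re-reading the source document.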

🇨🇳

Best CJK Support

Industry-leading Chinese, Japanese, and Korean parsing. Mixed-language documents handled natively.

📊

Table Extraction

Complex tables, merged cells, multi-page tables — all converted to structured Markdown or JSON arrays.
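
As a rough picture of how a table held as JSON rows maps to Markdown, the sketch below renders an array of rows as a pipe table. The row format (arrays of cell strings, first row as header) and the `tableToMarkdown` name are assumptions for illustration; the actual DataScrub output schema is not shown on this page.

```typescript
// Hypothetical shape: each row is an array of cell strings; row 0 is the header.
type TableRows = string[][];

function tableToMarkdown(rows: TableRows): string {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((r) => r.length));

  // Pad short rows to the full width, escape pipes, and flatten newlines
  // so cell text cannot break the table layout.
  const render = (row: string[]): string => {
    const cells = Array.from({ length: width }, (_, i) =>
      (row[i] ?? "").replace(/\|/g, "\\|").replace(/\n/g, " ")
    );
    return `| ${cells.join(" | ")} |`;
  };

  const [header, ...body] = rows;
  const divider = `| ${new Array<string>(width).fill("---").join(" | ")} |`;
  return [render(header), divider, ...body.map(render)].join("\n");
}
```

Padding ragged rows matters for extracted tables, where merged cells often leave some rows shorter than the header.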

🔌

3 Lines to Integrate

REST API + Node/Python SDKs. Drop-in replacement for LlamaParse or Unstructured.

🔒

SOC 2 & GDPR Ready

Documents processed in memory, never stored. Full audit trail. Enterprise-ready from day one.

Why developers switch to DataScrub

Tool	CJK / Chinese	Tables	RAG Output	Pricing	API DX
DataScrub	✅ Best	✅	✅ Built-in	$49/mo	✅ 3 lines
LlamaParse	⚠️ Weak	⚠️	✅ LlamaIndex only	$0.003/pg	✅
Unstructured	⚠️ Basic	⚠️	❌	$0.01/pg	⚠️ Complex
MinerU	✅ Best	✅	❌ No API	Self-host	❌
Reducto	❌	✅	❌	$0.01/pg	✅

Simple, predictable pricing

No per-page anxiety. Flat monthly plans with generous limits.

Free

$0 forever

100 pages/mo

✓ REST API access
✓ Markdown output
✓ Community support
✓ 1 file at a time

Start Free

Pro

$49/month

5,000 pages/mo

✓ All output formats (MD/JSON/Chunks)
✓ Table extraction
✓ RAG-optimized chunking
✓ Priority processing
✓ Email support

Start Pro Trial

Business

$149/month

25,000 pages/mo

✓ Everything in Pro
✓ Webhooks & batch API
✓ Custom parsing rules
✓ SSO & team management
✓ Dedicated support
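
For comparing the flat plans with per-page pricing, the arithmetic is simple: at full usage Pro works out to $49 / 5,000 = $0.0098 per page and Business to $149 / 25,000, about $0.006 per page. The sketch below encodes that calculation; the plan figures come from the pricing above, while the helper names are illustrative.

```typescript
// Plan figures are taken from the pricing above; names are illustrative.
interface Plan {
  name: string;
  monthlyUsd: number;
  pagesPerMonth: number;
}

const plans: Plan[] = [
  { name: "Pro", monthlyUsd: 49, pagesPerMonth: 5_000 },
  { name: "Business", monthlyUsd: 149, pagesPerMonth: 25_000 },
];

// Effective cost per page if the whole monthly allowance is used.
function costPerPage(plan: Plan): number {
  return plan.monthlyUsd / plan.pagesPerMonth;
}

// Smallest monthly page count at which a per-page price costs at least
// as much as the plan's flat fee.
function breakEvenPages(plan: Plan, perPageUsd: number): number {
  return Math.ceil(plan.monthlyUsd / perPageUsd);
}

for (const plan of plans) {
  console.log(plan.name, costPerPage(plan).toFixed(4));
}
```

Calling `breakEvenPages` with a competitor's per-page rate gives the monthly volume above which the flat fee is the cheaper option, provided that volume fits within the plan's page limit.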
