Now in Public Beta

Documents to Structured Data in Seconds

Upload any PDF, image, or Office document. Get clean Markdown & JSON — optimized for RAG pipelines, AI agents, and LLM workflows.

quickstart.ts
const result = await datascrub.parse("./report.pdf", {
  output: "markdown",     // or "json" | "chunks"
  chunkSize: 512,         // auto-chunk for RAG
  extractTables: true,    // structured table output
  languages: ["en", "zh"] // expected languages (English, Chinese)
});

// → Clean Markdown + metadata + chunks
console.log(result.markdown);

Everything you need to turn documents into AI fuel

Stop wrestling with PDF parsers. Start building your AI product.

Blazing Fast Parsing

Process a 100-page PDF in under 10 seconds. Parallel page extraction with GPU-accelerated OCR.

🧩

RAG-Ready Output

Auto-chunking, metadata enrichment, and embedding-ready JSON. Skip the preprocessing pipeline.
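
To make "auto-chunking" concrete, here is a minimal sketch of fixed-size chunking with basic metadata. It is an illustration only, not DataScrub's actual algorithm: the `Chunk` shape and the `chunkByWords` helper are hypothetical names, and whitespace-separated words stand in for tokens.

```typescript
// Hypothetical chunk shape for illustration; not DataScrub's output schema.
interface Chunk {
  index: number;     // position of the chunk within the document
  text: string;      // chunk content, words re-joined with single spaces
  wordCount: number; // number of words in this chunk
}

// Split text into chunks of at most `chunkSize` words.
// Words approximate tokens here; a real pipeline would use a tokenizer.
function chunkByWords(text: string, chunkSize: number): Chunk[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    const slice = words.slice(i, i + chunkSize);
    chunks.push({
      index: chunks.length,
      text: slice.join(" "),
      wordCount: slice.length,
    });
  }
  return chunks;
}

const demo = chunkByWords("one two three four five", 2);
console.log(JSON.stringify(demo.map((c) => c.text)));
// → ["one two","three four","five"]
```

Each chunk carries its own index and size, so it can be embedded and stored without a separate preprocessing step.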

🇨🇳

Best CJK Support

Industry-leading Chinese, Japanese, and Korean parsing. Mixed-language documents handled natively.

📊

Table Extraction

Complex tables, merged cells, multi-page tables — all converted to structured Markdown or JSON arrays.
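
As a rough picture of what "structured Markdown" means for a table, the sketch below renders rows of cells as a Markdown table. The `tableToMarkdown` helper is a hypothetical name used for illustration, not part of the DataScrub SDK. It pads short rows with empty cells, which is one simple way to represent a merged cell, and it does not stitch multi-page tables.

```typescript
// Render rows of cell strings as a Markdown table.
// The first row is treated as the header.
function tableToMarkdown(rows: string[][]): string {
  if (rows.length === 0) return "";
  // Widest row decides the column count.
  const width = Math.max(...rows.map((r) => r.length));
  // Pad short rows and escape literal pipes so cells cannot break the layout.
  const normalized = rows.map((r) =>
    Array.from({ length: width }, (_, i) =>
      (r[i] ?? "").trim().replace(/\|/g, "\\|")
    )
  );
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  const header = normalized[0] ?? [];
  const body = normalized.slice(1);
  const separator = line(Array.from({ length: width }, () => "---"));
  return [line(header), separator, ...body.map(line)].join("\n");
}

console.log(
  tableToMarkdown([
    ["Quarter", "Revenue"],
    ["Q1", "120"],
    ["Q2"], // short row: the missing cell is rendered empty
  ])
);
```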

🔌

3 Lines to Integrate

REST API + Node/Python SDKs. Drop-in replacement for LlamaParse or Unstructured.

🔒

SOC 2 & GDPR Ready

Documents processed in memory, never stored. Full audit trail. Enterprise-ready from day one.

Why developers switch to DataScrub

Tool | CJK / Chinese | Tables | RAG Output | Pricing | API DX
DataScrub | ✅ Best | | ✅ Built-in | $49/mo | ✅ 3 lines
LlamaParse | ⚠️ Weak | ⚠️ | ✅ LlamaIndex only | $0.003/pg |
Unstructured | ⚠️ Basic | ⚠️ | | $0.01/pg | ⚠️ Complex
MinerU | ✅ Best | | ❌ No API | Self-host |
Reducto | | | | $0.01/pg |

Simple, predictable pricing

No per-page anxiety. Flat monthly plans with generous limits.

Free

$0 forever

100 pages/mo

  • REST API access
  • Markdown output
  • Community support
  • 1 file at a time
Start Free

Most Popular

Pro

$49/month

5,000 pages/mo

  • All output formats (MD/JSON/Chunks)
  • Table extraction
  • RAG-optimized chunking
  • Priority processing
  • Email support
Start Pro Trial

Business

$149/month

25,000 pages/mo

  • Everything in Pro
  • Webhooks & batch API
  • Custom parsing rules
  • SSO & team management
  • Dedicated support
Contact Us
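
For a quick sense of what the flat plans work out to per page, the sketch below uses only the prices and page limits listed above. At full use, Pro comes to $49 / 5,000 = $0.0098 per page and Business to $149 / 25,000, roughly $0.0060 per page. The `Plan` type and helper names are hypothetical and exist only for this illustration.

```typescript
interface Plan {
  name: string;
  monthlyUsd: number;    // flat monthly price in USD
  pagesPerMonth: number; // included pages per month
}

// Figures taken from the pricing tiers on this page.
const plans: Plan[] = [
  { name: "Free", monthlyUsd: 0, pagesPerMonth: 100 },
  { name: "Pro", monthlyUsd: 49, pagesPerMonth: 5000 },
  { name: "Business", monthlyUsd: 149, pagesPerMonth: 25000 },
];

// Effective cost per page when the whole monthly allowance is used.
function costPerPageAtFullUse(plan: Plan): number {
  return plan.monthlyUsd / plan.pagesPerMonth;
}

// Cheapest listed plan whose allowance covers the given monthly volume,
// or undefined if no listed plan covers it.
function cheapestPlanFor(pages: number): Plan | undefined {
  return plans
    .filter((p) => p.pagesPerMonth >= pages)
    .sort((a, b) => a.monthlyUsd - b.monthlyUsd)[0];
}

for (const p of plans) {
  console.log(`${p.name}: $${costPerPageAtFullUse(p).toFixed(4)} per page at full use`);
}
console.log(cheapestPlanFor(3000)?.name); // → Pro
```

The effective rate rises when a plan is only partly used, so the per-page figures above are the best case for each tier.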

Stop parsing documents.
Start building AI.

100 free pages every month. No credit card required. See results in 10 seconds.

Try DataScrub Free