What is x-content

x-content is a content extraction assistant that pulls the content you need out of files. It supports a wide range of inputs, including but not limited to plain text, Markdown, JSON, audio, video, and URLs.

Currently Supported Types

PDF
PowerPoint
- .ppt
- .pptx
Word
- .docx
- .doc
Excel
- .xls
- .xlsx
MD
WPS
TXT
Images (EXIF metadata and OCR)
- .png
- .jpeg
- .tiff
- .bmp
MP3, WAV, MP4
HTML
Text-based formats (CSV, JSON, XML)
ZIP files
EPUB
URL

Why Choose x-content

In enterprise-level RAG and Agent application scenarios, are you also facing these pain points?

Legacy Office formats are poorly supported, so large language models produce suboptimal results on them 😢
Complex tables in Word and PDF files cannot be extracted effectively, which hurts model understanding and output quality 😢
Images embedded in documents are hard to recognize, so information is lost 😢
Audio and video content cannot be extracted and used directly, which limits multimodal applications 😢
When files are very large, content extraction tends to fail or return incomplete results 😢
Key information, such as invoices in financial systems, is identified with low accuracy, which undermines business intelligence automation 😢
Petabyte-scale data processing runs into resource bottlenecks, and systems are prone to OOM crashes 😢
Content chunking rules are complex and varied, requiring support for multiple splitting strategies such as by character count, by paragraph, and by recursive sentence splitting...
...
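The splitting strategies named in the last pain point can be sketched as follows. This is a minimal illustration in Python, not x-content's own implementation or API, which this README does not describe. The function names (`split_by_chars`, `split_by_paragraph`, `split_recursive`) and the separator order are assumptions made for the example.

```python
import re


def split_by_chars(text, size):
    """Character-count strategy: cut the text into fixed-size windows."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_paragraph(text):
    """Paragraph strategy: split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_recursive(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Recursive strategy (hypothetical separator order).

    Try the coarsest separator first, greedily merge neighbouring parts
    while they fit in max_chars, and recurse with finer separators on any
    part that is still too long. Fall back to fixed-size windows when no
    separator is left.
    """
    if len(text) <= max_chars:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            if len(part) > max_chars:
                # Part is still too long: recurse with the finer separators.
                chunks.extend(split_recursive(part, max_chars, separators[i + 1:]))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        return chunks
    return split_by_chars(text, max_chars)


if __name__ == "__main__":
    sample = "First paragraph. It has two sentences.\n\nSecond paragraph."
    print(split_by_chars(sample, 20))
    print(split_by_paragraph(sample))
    print(split_recursive(sample, 30))
```

The recursive variant keeps chunks aligned with natural boundaries (paragraphs, then lines, then sentences) whenever it can, and only falls back to hard character cuts for text that has no usable separator.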

x-content is built to solve these problems!

We have extensive experience processing petabyte-scale data, huge numbers of small files, and very large individual files. We offer flexible integration options, including HTTP, gRPC, and MCP, to fit different scenarios.

More importantly, we support enterprise-grade private deployment, which keeps your data secure, compliant, and under your control, so you can adopt advanced content processing technology with confidence.

简体中文版 | English Version | 日本語版 | 繁體中文版
