名称: 文档处理工具 描述: 使用Nutrient DWS API处理、转换、OCR识别、提取、脱敏、签名和填写文档。支持PDF、DOCX、XLSX、PPTX、HTML和图像格式。
Nutrient 文档处理
使用Nutrient DWS处理器API处理文档。转换格式、提取文本和表格、OCR扫描文档、脱敏个人身份信息、添加水印、数字签名和填写PDF表单。
设置
在**nutrient.io**获取免费API密钥
export NUTRIENT_API_KEY="pdf_live_..."
所有请求都以multipart POST方式发送到https://api.nutrient.io/build,包含instructionsJSON字段。
操作
文档转换
# DOCX转PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.docx=@document.docx" \
-F 'instructions={"parts":[{"file":"document.docx"}]}' \
-o output.pdf
# PDF转DOCX
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
-o output.docx
# HTML转PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "index.html=@index.html" \
-F 'instructions={"parts":[{"html":"index.html"}]}' \
-o output.pdf
支持的输入格式:PDF、DOCX、XLSX、PPTX、DOC、XLS、PPT、PPS、PPSX、ODT、RTF、HTML、JPG、PNG、TIFF、HEIC、GIF、WebP、SVG、TGA、EPS。
提取文本和数据
# 提取纯文本
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
-o output.txt
# 提取表格为Excel
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
-o tables.xlsx
OCR扫描文档
# OCR转可搜索PDF(支持100+种语言)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "scanned.pdf=@scanned.pdf" \
-F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
-o searchable.pdf
语言支持:通过ISO 639-2代码支持100+种语言(例如eng、deu、fra、spa、jpn、kor、chi_sim、chi_tra、ara、hin、rus)。完整语言名称如english或german也可用。查看完整OCR语言支持表获取所有支持代码。
脱敏敏感信息
# 基于模式(SSN、电子邮件)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
-o redacted.pdf
# 基于正则表达式
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
-o redacted.pdf
预设模式:social-security-number、email-address、credit-card-number、international-phone-number、north-american-phone-number、date、time、url、ipv4、ipv6、mac-address、us-zip-code、vin。
添加水印
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
-o watermarked.pdf
数字签名
# 自签名CMS签名
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
-o signed.pdf
填写PDF表单
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "form.pdf=@form.pdf" \
-F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
-o filled.pdf
MCP服务器(替代方案)
如需原生工具集成,使用MCP服务器替代curl:
{
"mcpServers": {
"nutrient-dws": {
"command": "npx",
"args": ["-y", "@nutrient-sdk/dws-mcp-server"],
"env": {
"NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
"SANDBOX_PATH": "/path/to/working/directory"
}
}
}
}
使用场景
- 文档格式转换(PDF、DOCX、XLSX、PPTX、HTML、图像)
- 从PDF提取文本、表格或键值对
- 扫描文档或图像的OCR识别
- 共享前脱敏个人身份信息
- 为草稿或机密文档添加水印
- 数字签名合同或协议
- 程序化填写PDF表单