Cohere v2 Python
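Overview
Cohere's v2 Chat API provides conversational AI capabilities, with a particular focus on structured output through JSON Schema mode. This skill covers entity extraction from text, data validation, and integration patterns for building production-ready systems that need consistent, validated responses from LLMs.
When to Use This Skill
Apply this skill when:
- Extracting structured entities (names, dates, locations, organizations) from unstructured text
- Building named entity recognition (NER) systems
- Implementing data extraction pipelines with validated output
- Requiring JSON responses that conform to a specific schema
- Processing documents to extract information
- Building classification systems with constrained output
- Integrating LLM responses with downstream databases or APIs
Core Capabilities
1. Basic Chat API
Initialize the Cohere client and use it for conversational tasks: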
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": "Summarize the key features of quantum computing."}
    ],
)
print(response.message.content[0].text)
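Available models:
- command-a-03-2025: latest-generation model
For comprehensive coverage of API parameters, streaming, RAG, and tool use, see references/chat_api.md.
2. Entity Extraction with JSON Schema Mode
A key strength of Cohere v2 is structured output via JSON Schema mode, which guarantees that responses conform to the schema you specify.
Simple entity extraction: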
import json

text = "Dr. Sarah Johnson from Stanford University will speak at the AI Conference in Seattle on March 15th."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all entities: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "person": {"type": "string"},
                "title": {"type": "string"},
                "organization": {"type": "string"},
                "event": {"type": "string"},
                "location": {"type": "string"},
                "date": {"type": "string", "format": "date"}
            },
            "required": ["person"]
        }
    }
)

entities = json.loads(response.message.content[0].text)
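Key principles:
- The top-level type must be "object"
- At least one field must be listed in the "required" array
- The schema is strictly enforced; invalid responses are regenerated
- The first request with a schema incurs latency overhead; subsequent requests use the cached schema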
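The first two principles can be checked locally before a schema is ever sent. The following is a minimal sketch, not part of the Cohere SDK: `check_schema_basics` is a hypothetical helper name, and the check that every required field is defined under "properties" is an extra sanity check added here as an assumption.

```python
def check_schema_basics(schema: dict) -> list[str]:
    """Return a list of problems found against the principles listed above."""
    problems = []
    # Principle: the top-level type must be "object".
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    # Principle: at least one field must be listed in "required".
    required = schema.get("required", [])
    if not required:
        problems.append('"required" must list at least one field')
    # Extra sanity check (assumption, not a documented rule):
    # every required field should be defined under "properties".
    properties = schema.get("properties", {})
    for name in required:
        if name not in properties:
            problems.append(f'required field "{name}" is not defined in "properties"')
    return problems


good = {"type": "object", "properties": {"person": {"type": "string"}}, "required": ["person"]}
bad = {"type": "array", "items": {"type": "string"}}

print(check_schema_basics(good))  # []
print(check_schema_basics(bad))   # two problems: wrong top-level type, empty "required"
```

Running a check like this at startup surfaces schema mistakes before any API call is made.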
3. Multi-Entity Extraction
Extract an array of entities when a text contains several of them:
text = """
John Smith works at Google as a Software Engineer in San Francisco.
Jane Doe is a Data Scientist at Meta in New York.
Bob Wilson leads the AI team at OpenAI in Seattle.
"""

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all people and their details: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "company": {"type": "string"},
                            "role": {"type": "string"},
                            "location": {"type": "string"}
                        },
                        "required": ["name", "company"]
                    }
                }
            },
            "required": ["people"]
        }
    }
)

result = json.loads(response.message.content[0].text)
for person in result["people"]:
    print(f"{person['name']} works at {person['company']}")
4. Classification with Enums
Use enums to restrict output to specific categories:
text = "I absolutely love this product! The quality is amazing and customer service was helpful."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Analyze sentiment and aspects: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "overall_sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral", "mixed"]
                },
                "aspects": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "aspect": {"type": "string"},
                            "sentiment": {
                                "type": "string",
                                "enum": ["positive", "negative", "neutral"]
                            }
                        },
                        "required": ["aspect", "sentiment"]
                    }
                }
            },
            "required": ["overall_sentiment", "aspects"]
        }
    }
)
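Benefits of enums:
- Guarantee valid category values
- Eliminate post-processing validation of category values
- Support direct database insertion
- Let downstream logic branch on categories without handling unexpected values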
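One way to use enum-constrained output downstream is to map the string onto a Python `Enum`. The sketch below assumes the response text has the shape defined by the sentiment schema above. The `raw` string is a made-up sample, not real model output, and `Sentiment` is a name introduced here for illustration.

```python
import json
from enum import Enum


class Sentiment(str, Enum):
    """Mirrors the "overall_sentiment" enum in the schema above."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


# Hypothetical sample; in practice this comes from response.message.content[0].text
raw = '{"overall_sentiment": "positive", "aspects": [{"aspect": "quality", "sentiment": "positive"}]}'

result = json.loads(raw)
# Sentiment(...) raises ValueError if the value is not one of the enum members,
# which acts as a cheap guard in case the schema and the class drift apart.
overall = Sentiment(result["overall_sentiment"])

if overall is Sentiment.POSITIVE:
    print("positive review")
```

Keeping the enum class and the schema's "enum" list in one place (for example, building the schema list from `[s.value for s in Sentiment]`) avoids the two drifting apart.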
Common Entity Extraction Patterns
Named Entity Recognition (NER)
schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {
                        "type": "string",
                        "enum": ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT", "PRODUCT"]
                    },
                    "context": {"type": "string"}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}
Resume/CV Parsing
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        "phone": {"type": "string"},
        "experience": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "role": {"type": "string"},
                    "start_date": {"type": "string", "format": "date"},
                    "end_date": {"type": "string", "format": "date"},
                    "description": {"type": "string"}
                },
                "required": ["company", "role"]
            }
        },
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "institution": {"type": "string"},
                    "degree": {"type": "string"},
                    "field": {"type": "string"},
                    "graduation_year": {"type": "integer"}
                },
                "required": ["institution"]
            }
        },
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name"]
}
Invoice/Receipt Extraction
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "format": "date"},
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "tax_id": {"type": "string"}
            },
            "required": ["name"]
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"}
                },
                "required": ["description", "total"]
            }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
    },
    "required": ["invoice_number", "vendor", "total"]
}
Medical Report Extraction
schema = {
    "type": "object",
    "properties": {
        "patient": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "gender": {
                    "type": "string",
                    "enum": ["male", "female", "other", "unknown"]
                }
            },
            "required": ["name"]
        },
        "diagnosis": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "condition": {"type": "string"},
                    "severity": {
                        "type": "string",
                        "enum": ["mild", "moderate", "severe"]
                    },
                    "notes": {"type": "string"}
                },
                "required": ["condition"]
            }
        },
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                    "frequency": {"type": "string"}
                },
                "required": ["name"]
            }
        },
        "visit_date": {"type": "string", "format": "date"}
    },
    "required": ["patient", "visit_date"]
}
Advanced Schema Features
Nested Objects with Validation
schema = {
    "type": "object",
    "properties": {
        "company": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "headquarters": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "country": {"type": "string"}
                    },
                    "required": ["city", "country"]
                }
            },
            "required": ["name"]
        }
    },
    "required": ["company"]
}
Schema Reuse with $ref
schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name"]
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"}
    },
    "required": ["primary_contact"]
}
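To see what a `$ref`-based schema expands to (for debugging, or to count fields), the references can be inlined locally. This is a minimal sketch under stated assumptions: `inline_refs` is a hypothetical helper, it only handles references of the form `#/$defs/<name>`, and it would recurse forever on self-referencing definitions. It does not reflect how Cohere resolves references internally.

```python
import copy


def inline_refs(node, defs):
    """Recursively replace {"$ref": "#/$defs/<name>"} with a copy of that definition."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            name = ref.split("/")[-1]
            # Copy so that each use site gets its own independent dict.
            return inline_refs(copy.deepcopy(defs[name]), defs)
        # Drop the "$defs" table itself; everything it defined is now inlined.
        return {key: inline_refs(value, defs) for key, value in node.items() if key != "$defs"}
    if isinstance(node, list):
        return [inline_refs(item, defs) for item in node]
    return node


schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"},
    },
    "required": ["primary_contact"],
}

expanded = inline_refs(schema, schema["$defs"])
print(expanded["properties"]["primary_contact"]["required"])  # ['name']
```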
Format Validation
schema = {
    "type": "object",
    "properties": {
        "created_at": {
            "type": "string",
            "format": "date-time"  # ISO 8601: 2024-01-01T12:00:00Z
        },
        "birth_date": {
            "type": "string",
            "format": "date"  # YYYY-MM-DD
        },
        "user_id": {
            "type": "string",
            "format": "uuid"
        },
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        }
    },
    "required": ["user_id"]
}
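If the extracted values feed a database, it can be worth re-checking the formats locally as a second line of defense. The sketch below uses only the standard library. `check_formats` is a hypothetical helper, the field names follow the schema above, and the sample record is made up. Note that `uuid.UUID` is lenient about hyphens and braces, so this is a sanity check rather than a strict validator.

```python
import uuid
from datetime import date, datetime


def check_formats(record: dict) -> dict:
    """Return {field: error message} for fields that fail a local format check."""
    checks = {
        # Older Python versions do not accept a trailing "Z", so normalize it first.
        "created_at": lambda value: datetime.fromisoformat(value.replace("Z", "+00:00")),
        "birth_date": date.fromisoformat,
        "user_id": uuid.UUID,
    }
    errors = {}
    for field, parse in checks.items():
        if field in record:
            try:
                parse(record[field])
            except (ValueError, AttributeError, TypeError) as exc:
                errors[field] = str(exc)
    return errors


record = {
    "created_at": "2024-01-01T12:00:00Z",
    "birth_date": "1990-05-17",
    "user_id": "not-a-uuid",
}
print(sorted(check_formats(record)))  # ['user_id']
```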
Workflow for Building an Entity Extraction Pipeline
Step 1: Define Your Schema
# Identify the entities you need to extract
entity_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {"type": "string", "enum": ["PERSON", "ORG", "LOCATION"]},
                    "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}
Step 2: Create the Extraction Function
def extract_entities(text, schema):
    response = co.chat(
        model="command-a-03-2025",
        messages=[
            {
                "role": "system",
                "content": "Extract entities accurately with appropriate confidence levels."
            },
            {
                "role": "user",
                "content": f"Extract all entities: {text}"
            }
        ],
        response_format={
            "type": "json_object",
            "schema": schema
        }
    )
    return json.loads(response.message.content[0].text)
Step 3: Batch Processing
documents = [
    "Text 1...",
    "Text 2...",
    "Text 3..."
]

results = []
for doc in documents:
    entities = extract_entities(doc, entity_schema)
    results.append({
        "document": doc,
        "entities": entities["entities"]
    })
Step 4: Store in a Database
# SurrealDB example; class and method names vary between SDK versions
from surrealdb import Surreal

async def store_entities(entities):
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("entities", "database")
        for entity in entities["entities"]:
            await db.create("entity", entity)
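Best Practices
Schema Design
- Start with only the required fields, then add optional fields incrementally
- Use enums for classification to ensure valid output
- Use format validation (date, uuid, email) to improve data quality
- Use $ref for repeated structures to keep schemas DRY
Prompting
- System messages take precedence over user instructions; use them for extraction guidelines
- State explicitly in the user message what to extract
- When using JSON mode without a schema, always instruct the model to generate JSON
- For complex extractions, provide examples in the system message
Performance
- Schemas are cached after the first request; reuse the same schema across calls
- Simple schemas add minimal latency overhead
- Complex nested schemas add moderate processing time
- Consider batch extraction when processing multiple documents
Error Handling
- Always wrap JSON parsing in a try-except block
- Verify that required fields are present, even with schema enforcement
- Handle API errors gracefully with exponential backoff
- Log failed extractions for debugging and reprocessing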
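The four error-handling points can be combined in one wrapper. The following is a minimal sketch, not a definitive implementation: `extract_with_retry` and its parameters are hypothetical, `call` stands in for any function that returns the raw JSON text of a response, and the broad `except Exception` should be narrowed to the SDK's own error types in real code. The `flaky_call` stub simulates one transient failure so the example runs without network access.

```python
import json
import logging
import time

logger = logging.getLogger("extraction")


def extract_with_retry(call, text, required=("entities",), max_attempts=3,
                       base_delay=1.0, sleep=time.sleep):
    """Run call(text), parse the JSON it returns, and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            data = json.loads(call(text))
            # Verify required fields even though the schema should guarantee them.
            missing = [key for key in required if key not in data]
            if missing:
                raise KeyError(f"missing required fields: {missing}")
            return data
        except Exception as exc:  # deliberately broad in this sketch
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                # Log the failed input so it can be reprocessed later.
                logger.error("giving up on document: %r", text[:80])
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...


attempts = {"n": 0}


def flaky_call(text):
    """Stub that fails once, then returns a made-up response body."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("simulated transient API error")
    return '{"entities": [{"text": "Seattle", "type": "LOCATION"}]}'


result = extract_with_retry(flaky_call, "Bob lives in Seattle.", sleep=lambda seconds: None)
print(result["entities"][0]["text"])  # Seattle
```

Passing `sleep` as a parameter keeps the backoff testable; production code would leave it at the default `time.sleep`.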
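Production Considerations
- Monitor token usage via response.usage.tokens
- Implement rate limiting and request queuing
- Cache common extractions to reduce API calls
- Choose a model that balances task complexity against cost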
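Caching common extractions can be as simple as keying results on a hash of the input text, schema, and model. The sketch below is an in-memory illustration under assumptions: `cache_key` and `cached_extract` are hypothetical names, `extract` is any function with the same `(text, schema)` signature as `extract_entities` above, and `fake_extract` is a stub that replaces the API call so the example runs offline.

```python
import hashlib
import json

_cache = {}


def cache_key(text, schema, model):
    """Build a stable key; sort_keys makes the hash independent of dict ordering."""
    payload = json.dumps({"text": text, "schema": schema, "model": model}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_extract(text, schema, extract, model="command-a-03-2025"):
    """Return a cached result when the same text, schema, and model were seen before."""
    key = cache_key(text, schema, model)
    if key not in _cache:
        _cache[key] = extract(text, schema)
    return _cache[key]


calls = []


def fake_extract(text, schema):
    """Stub standing in for the real API call; records each invocation."""
    calls.append(text)
    return {"entities": []}


schema = {"type": "object", "properties": {"entities": {"type": "array"}}, "required": ["entities"]}
first = cached_extract("Acme Corp hired Jane.", schema, fake_extract)
second = cached_extract("Acme Corp hired Jane.", schema, fake_extract)
print(len(calls))  # 1
```

A production cache would likely need an eviction policy and a shared store; those choices are outside this sketch.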
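Limitations
Unsupported Schema Features
- Numeric ranges (minimum/maximum)
- Array length constraints (minItems/maxItems)
- String length constraints (minLength/maxLength)
- Some complex regular expression patterns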
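Because these constraints cannot be expressed in the schema, they have to be enforced after parsing. Here is a minimal sketch for the invoice pattern shown earlier. `check_invoice_constraints` is a hypothetical helper, and the specific limits (100 items, 32 characters) are illustrative assumptions, not values from the Cohere documentation.

```python
def check_invoice_constraints(invoice: dict, max_items: int = 100,
                              max_number_len: int = 32) -> list[str]:
    """Apply range and length checks that the schema itself cannot express."""
    problems = []
    # Numeric range (stand-in for minimum/maximum).
    if invoice.get("total", 0) < 0:
        problems.append("total must be >= 0")
    # Array length (stand-in for minItems/maxItems).
    items = invoice.get("items", [])
    if not 1 <= len(items) <= max_items:
        problems.append(f"items must contain between 1 and {max_items} entries")
    # String length (stand-in for minLength/maxLength).
    number = invoice.get("invoice_number", "")
    if not 1 <= len(number) <= max_number_len:
        problems.append(f"invoice_number must be 1 to {max_number_len} characters")
    return problems


invoice = {"invoice_number": "INV-001", "items": [], "total": -5.0}
print(check_invoice_constraints(invoice))
# ['total must be >= 0', 'items must contain between 1 and 100 entries']
```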
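Current Limitations
- RAG is not supported in JSON mode
- Tools mode supports at most 200 fields
- Schema mode adds latency overhead
Reference Documentation
This skill includes comprehensive reference documentation:
- references/chat_api.md: complete Chat API reference covering parameters, streaming, tool use, RAG, and conversation management
- references/structured_outputs.md: in-depth structured output guide covering JSON Schema mode, validation, entity extraction patterns, and advanced features
Load these references when implementing specific features or troubleshooting.
Additional Resources
- API documentation: https://docs.cohere.com/v2/docs/chat-api
- Structured outputs: https://docs.cohere.com/v2/docs/structured-outputs
- Python SDK: https://github.com/cohere-ai/cohere-python
- PyPI package: https://pypi.org/project/cohere/
- JSON Schema specification: https://json-schema.org/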