Cohere v2 Python

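Overview

Cohere's v2 Chat API provides conversational AI capabilities, with a particular focus on structured output through JSON Schema mode. This skill covers entity extraction from text, data validation, and integration patterns for building production-ready systems that need consistent, validated responses from LLMs.

When to Use This Skill

Apply this skill when you are:

Extracting structured entities (names, dates, locations, organizations) from unstructured text
Building named entity recognition (NER) systems
Implementing data extraction pipelines with validated output
Requiring JSON responses that conform to a specific schema
Processing documents to extract information
Building classification systems with constrained outputs
Integrating LLM responses with downstream databases or APIs

Core Capabilities

1. Basic Chat API

Initialize the Cohere client and use it for conversational tasks: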

import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": "Summarize the key features of quantum computing."}
    ],
)

print(response.message.content[0].text)

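Available models:

command-a-03-2025 - latest-generation model

For full coverage of API parameters, streaming, RAG, and tool use, see references/chat_api.md.

2. Entity Extraction with JSON Schema Mode

The main strength of Cohere v2 is structured output through JSON Schema mode, which guarantees that responses conform to the schema you specify.

Simple entity extraction: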

text = "Dr. Sarah Johnson from Stanford University will speak at the AI Conference in Seattle on March 15th."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all entities: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "person": {"type": "string"},
                "title": {"type": "string"},
                "organization": {"type": "string"},
                "event": {"type": "string"},
                "location": {"type": "string"},
                "date": {"type": "string", "format": "date"}
            },
            "required": ["person"]
        }
    }
)

import json
entities = json.loads(response.message.content[0].text)

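Key principles:

The top-level type must be "object"
At least one field must appear in the "required" array
The schema is strictly enforced - invalid responses are regenerated
The first request with a new schema incurs latency overhead; later requests reuse the cached schema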
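
The first two principles can be checked locally before a schema is ever sent. The helper below is a minimal sketch; `check_schema_basics` is a hypothetical name introduced here, not part of the Cohere SDK, and it only covers the two structural rules listed above.

```python
def check_schema_basics(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means both basic rules hold."""
    problems = []
    # Rule 1: the top-level type must be "object"
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    # Rule 2: "required" must be a non-empty list
    required = schema.get("required")
    if not isinstance(required, list) or not required:
        problems.append('"required" must list at least one field')
    return problems


good = {
    "type": "object",
    "properties": {"person": {"type": "string"}},
    "required": ["person"],
}
bad = {"type": "array", "items": {"type": "string"}}

print(check_schema_basics(good))
print(check_schema_basics(bad))
```

Running a check like this in unit tests catches malformed schemas without spending an API call.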

3. Multi-Entity Extraction

Extract an array of entities in a single request, for example when processing text in bulk:

text = """
John Smith works at Google as a Software Engineer in San Francisco.
Jane Doe is a Data Scientist at Meta in New York.
Bob Wilson leads the AI team at OpenAI in Seattle.
"""

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all people and their details: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "company": {"type": "string"},
                            "role": {"type": "string"},
                            "location": {"type": "string"}
                        },
                        "required": ["name", "company"]
                    }
                }
            },
            "required": ["people"]
        }
    }
)

result = json.loads(response.message.content[0].text)
for person in result["people"]:
    print(f"{person['name']} works at {person['company']}")

4. Classification with Enums

Use enums to restrict output to a fixed set of categories:

text = "I absolutely love this product! The quality is amazing and customer service was helpful."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Analyze sentiment and aspects: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "overall_sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral", "mixed"]
                },
                "aspects": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "aspect": {"type": "string"},
                            "sentiment": {
                                "type": "string",
                                "enum": ["positive", "negative", "neutral"]
                            }
                        },
                        "required": ["aspect", "sentiment"]
                    }
                }
            },
            "required": ["overall_sentiment", "aspects"]
        }
    }
)

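Benefits of enums:

Guarantee valid category values
Reduce the need for post-processing validation
Support direct database insertion
Simplify downstream logic, which no longer has to handle unexpected category values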
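
On the consuming side, the enum values can be mirrored in a Python `Enum` so that downstream code works with typed members rather than raw strings. This is a sketch under assumptions: `Sentiment` and `sample_text` are illustrative names, and `sample_text` is a hand-written stand-in for `response.message.content[0].text`, not real model output.

```python
import json
from enum import Enum


class Sentiment(str, Enum):
    """Mirrors the enum values declared in the schema above."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


# Hypothetical response text standing in for response.message.content[0].text
sample_text = json.dumps({
    "overall_sentiment": "positive",
    "aspects": [
        {"aspect": "quality", "sentiment": "positive"},
        {"aspect": "customer service", "sentiment": "positive"},
    ],
})

result = json.loads(sample_text)
# Sentiment(...) raises ValueError if a value falls outside the declared set
overall = Sentiment(result["overall_sentiment"])
aspect_sentiments = {a["aspect"]: Sentiment(a["sentiment"]) for a in result["aspects"]}

print(overall.name, sorted(aspect_sentiments))
```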

Common Entity Extraction Patterns

Named Entity Recognition (NER)

schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {
                        "type": "string",
                        "enum": ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT", "PRODUCT"]
                    },
                    "context": {"type": "string"}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}

Resume/CV Parsing

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        "phone": {"type": "string"},
        "experience": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "role": {"type": "string"},
                    "start_date": {"type": "string", "format": "date"},
                    "end_date": {"type": "string", "format": "date"},
                    "description": {"type": "string"}
                },
                "required": ["company", "role"]
            }
        },
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "institution": {"type": "string"},
                    "degree": {"type": "string"},
                    "field": {"type": "string"},
                    "graduation_year": {"type": "integer"}
                },
                "required": ["institution"]
            }
        },
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name"]
}

Invoice/Receipt Extraction

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "format": "date"},
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "tax_id": {"type": "string"}
            },
            "required": ["name"]
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"}
                },
                "required": ["description", "total"]
            }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
    },
    "required": ["invoice_number", "vendor", "total"]
}

Medical Report Extraction

schema = {
    "type": "object",
    "properties": {
        "patient": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "gender": {
                    "type": "string",
                    "enum": ["male", "female", "other", "unknown"]
                }
            },
            "required": ["name"]
        },
        "diagnosis": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "condition": {"type": "string"},
                    "severity": {
                        "type": "string",
                        "enum": ["mild", "moderate", "severe"]
                    },
                    "notes": {"type": "string"}
                },
                "required": ["condition"]
            }
        },
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                    "frequency": {"type": "string"}
                },
                "required": ["name"]
            }
        },
        "visit_date": {"type": "string", "format": "date"}
    },
    "required": ["patient", "visit_date"]
}

Advanced Schema Features

Nested Objects with Validation

schema = {
    "type": "object",
    "properties": {
        "company": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "headquarters": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "country": {"type": "string"}
                    },
                    "required": ["city", "country"]
                }
            },
            "required": ["name"]
        }
    },
    "required": ["company"]
}

Schema Reuse with $ref

schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name"]
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"}
    },
    "required": ["primary_contact"]
}
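
To see what a `$ref` stands for, it can help to expand the references locally and inspect the resulting schema. The resolver below is only an illustration for debugging: `expand_refs` is a hypothetical helper, it handles only `#/$defs/<name>` references, and it would loop forever on recursive definitions. Nothing here implies the API needs schemas expanded before they are sent.

```python
import copy


def expand_refs(node, defs):
    """Recursively replace {"$ref": "#/$defs/<name>"} with a copy of the definition."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            name = ref[len("#/$defs/"):]
            return expand_refs(copy.deepcopy(defs[name]), defs)
        # Drop the "$defs" block itself; every reference to it is inlined
        return {k: expand_refs(v, defs) for k, v in node.items() if k != "$defs"}
    if isinstance(node, list):
        return [expand_refs(item, defs) for item in node]
    return node


contact_schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
            "required": ["name"],
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"},
    },
    "required": ["primary_contact"],
}

expanded = expand_refs(contact_schema, contact_schema["$defs"])
print(expanded["properties"]["primary_contact"]["required"])
```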

Format Validation

schema = {
    "type": "object",
    "properties": {
        "created_at": {
            "type": "string",
            "format": "date-time"  # ISO 8601: 2024-01-01T12:00:00Z
        },
        "birth_date": {
            "type": "string",
            "format": "date"  # YYYY-MM-DD
        },
        "user_id": {
            "type": "string",
            "format": "uuid"
        },
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        }
    },
    "required": ["user_id"]
}
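
Records produced under this schema can also be re-checked on the client with the standard library, which is useful as a defensive second pass. The sketch below assumes the field names from the schema above; `check_formats` and `CHECKS` are hypothetical names, and the checks are deliberately simple rather than a full JSON Schema validator.

```python
import re
import uuid
from datetime import date, datetime

EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def _check_email(value):
    if not EMAIL_PATTERN.match(value):
        raise ValueError("does not match the email pattern")


# One parser per field; each raises on malformed input
CHECKS = {
    # Replace a trailing "Z" so older Python versions accept the timestamp
    "created_at": lambda v: datetime.fromisoformat(v.replace("Z", "+00:00")),
    "birth_date": date.fromisoformat,
    "user_id": uuid.UUID,
    "email": _check_email,
}


def check_formats(record: dict) -> dict:
    """Return {field: error message} for every present field that fails its check."""
    errors = {}
    for field, parse in CHECKS.items():
        if field in record:
            try:
                parse(record[field])
            except (ValueError, TypeError, AttributeError) as exc:
                errors[field] = str(exc)
    return errors


record = {
    "created_at": "2024-01-01T12:00:00Z",
    "birth_date": "1990-02-30",  # invalid: February has no 30th day
    "user_id": "123e4567-e89b-12d3-a456-426614174000",
    "email": "sarah@example.com",
}
errors = check_formats(record)
print(sorted(errors))
```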

Workflow for Building an Entity Extraction Pipeline

Step 1: Define your schema

# Decide which entities you need to extract
entity_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {"type": "string", "enum": ["PERSON", "ORG", "LOCATION"]},
                    "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}

Step 2: Create an extraction function

def extract_entities(text, schema):
    response = co.chat(
        model="command-a-03-2025",
        messages=[
            {
                "role": "system",
                "content": "Extract entities accurately with appropriate confidence levels."
            },
            {
                "role": "user",
                "content": f"Extract all entities: {text}"
            }
        ],
        response_format={
            "type": "json_object",
            "schema": schema
        }
    )
    return json.loads(response.message.content[0].text)

Step 3: Process documents in batch

documents = [
    "Text 1...",
    "Text 2...",
    "Text 3..."
]

results = []
for doc in documents:
    entities = extract_entities(doc, entity_schema)
    results.append({
        "document": doc,
        "entities": entities["entities"]
    })

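Step 4: Store in a database

# SurrealDB is used here as an example. This assumes an SDK version in which
# Surreal is an async context manager; newer releases expose AsyncSurreal instead.
from surrealdb import Surreal

async def store_entities(entities):
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("entities", "database")

        # Insert each extracted entity as a record in the "entity" table
        for entity in entities["entities"]:
            await db.create("entity", entity)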

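Best Practices

Schema Design

Start with only the required fields, then add optional fields incrementally
Use enums for classification so that outputs are always valid
Use format and pattern validation (dates, UUIDs, emails) to improve data quality
Use $ref for repeated structures to keep schemas DRY

Prompting

System messages take precedence over user instructions - use them for extraction guidelines
State explicitly in the user message what to extract
In JSON mode without a schema, always instruct the model to generate JSON
For complex extractions, provide examples in the system message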
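
The prompting guidelines above can be combined into one message builder. This is a minimal sketch: `build_extraction_messages` is a hypothetical helper, and the exact wording of the prompts is an assumption to adapt per task. It only constructs the `messages` list and makes no API call.

```python
import json


def build_extraction_messages(text, guidelines, example=None):
    """Build a messages list with guidelines (and an optional example) in the system message."""
    system_parts = [guidelines]
    if example is not None:
        example_input, example_output = example
        system_parts.append("Example input:\n" + example_input)
        system_parts.append("Example output:\n" + json.dumps(example_output))
    return [
        {"role": "system", "content": "\n\n".join(system_parts)},
        # The user message states what to extract and asks for JSON explicitly
        {"role": "user", "content": f"Extract all entities and return them as JSON: {text}"},
    ]


messages = build_extraction_messages(
    text="Jane Doe is a Data Scientist at Meta in New York.",
    guidelines="Extract entities accurately with appropriate confidence levels.",
    example=(
        "John Smith works at Google.",
        {"entities": [{"text": "John Smith", "type": "PERSON"},
                      {"text": "Google", "type": "ORG"}]},
    ),
)
print(messages[0]["content"])
```

The returned list can be passed as the `messages` argument of `co.chat` alongside a `response_format`.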

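Performance

Schemas are cached after the first request - reuse the same schema across calls
Simple schemas add minimal latency overhead
Complex nested schemas add a moderate amount of processing time
Consider batch extraction when processing many documents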

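Error Handling

Always wrap JSON parsing in a try-except block
Verify that required fields are present, even with schema enforcement
Handle API errors gracefully with exponential backoff
Log failed extractions for debugging and reprocessing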
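
The four points above fit into two small helpers. This is a sketch, not the SDK's own retry mechanism: `parse_extraction` and `call_with_backoff` are hypothetical names, `flaky_call` stands in for a real `co.chat` request, and the broad `except Exception` should be narrowed to the SDK's actual error types in real code.

```python
import json
import logging
import time

logger = logging.getLogger("extraction")


def parse_extraction(raw_text, required_fields):
    """Parse model output; log and return None instead of raising on bad data."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        logger.warning("invalid JSON: %s", exc)
        return None
    if not isinstance(data, dict):
        logger.warning("expected a JSON object, got %s", type(data).__name__)
        return None
    missing = [f for f in required_fields if f not in data]
    if missing:
        logger.warning("missing required fields: %s", missing)
        return None
    return data


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry call(), waiting base_delay * 2**attempt seconds after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # assumption: narrow this to SDK error types
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            logger.warning("attempt %d failed (%s); retrying in %.1fs", attempt + 1, exc, delay)
            sleep(delay)


# Demo: a stand-in for the API call that fails twice before succeeding
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return '{"entities": []}'


delays = []
raw = call_with_backoff(flaky_call, sleep=delays.append)  # record delays instead of sleeping
parsed = parse_extraction(raw, ["entities"])
print(delays, parsed)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.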

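Production Considerations

Monitor token usage via response.usage.tokens
Implement rate limiting and request queuing
Cache common extractions to reduce API calls
Choose a model that balances task complexity against cost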
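
Caching common extractions can be as simple as keying results by a hash of the input text and schema. The class below is an in-memory sketch under assumptions: `ExtractionCache` is a hypothetical name, `fake_extract` stands in for a function such as `extract_entities`, and a production system would likely add expiry and persistent storage.

```python
import hashlib
import json


class ExtractionCache:
    """Memoize extraction results keyed by a hash of (text, schema)."""

    def __init__(self, extract_fn):
        self._extract_fn = extract_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text, schema):
        # sort_keys makes the key independent of dict ordering in the schema
        payload = json.dumps({"text": text, "schema": schema}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def extract(self, text, schema):
        key = self._key(text, schema)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._extract_fn(text, schema)
        return self._store[key]


# Demo with a stand-in extractor that counts how often it is really called
calls = []


def fake_extract(text, schema):
    calls.append(text)
    return {"entities": [{"text": text, "type": "PERSON"}]}


schema = {"type": "object", "properties": {"entities": {"type": "array"}}, "required": ["entities"]}
cache = ExtractionCache(fake_extract)
first = cache.extract("Jane Doe", schema)
second = cache.extract("Jane Doe", schema)  # served from the cache
print(cache.hits, cache.misses, len(calls))
```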

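Limitations

Unsupported Schema Features

Numeric ranges (minimum/maximum)
Array length constraints (minItems/maxItems)
String length constraints (minLength/maxLength)
Some complex regular expression patterns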
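
A schema can be scanned for the unsupported keywords listed above before it is sent. This is a minimal sketch: `find_unsupported` is a hypothetical helper, the keyword set is taken from the list above, and regular expression complexity is not checked because the text does not say which patterns are affected.

```python
UNSUPPORTED = {"minimum", "maximum", "minItems", "maxItems", "minLength", "maxLength"}


def find_unsupported(node, path="$"):
    """Return the paths of unsupported constraint keywords found in a schema."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in ("properties", "$defs") and isinstance(value, dict):
                # Keys at this level are field names, not keywords; only descend
                for name, sub in value.items():
                    found.extend(find_unsupported(sub, f"{child}.{name}"))
            else:
                if key in UNSUPPORTED:
                    found.append(child)
                found.extend(find_unsupported(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_unsupported(item, f"{path}[{index}]"))
    return found


risky_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "tags": {
            "type": "array",
            "items": {"type": "string", "maxLength": 20},
            "minItems": 1,
        },
    },
    "required": ["age"],
}
flagged = find_unsupported(risky_schema)
print(flagged)
```

Constraints flagged this way can be removed from the schema and enforced in application code after parsing.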

Current Limitations

RAG is not supported in JSON mode
Tool mode supports at most 200 fields
Schema mode adds latency overhead

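Reference Documentation

This skill includes comprehensive reference documentation:

references/chat_api.md - Complete Chat API reference, covering parameters, streaming, tool use, RAG, and conversation management
references/structured_outputs.md - In-depth structured outputs guide, covering JSON Schema mode, validation, entity extraction patterns, and advanced features

Load these references when implementing a specific feature or troubleshooting a problem.

Additional Resources

API documentation: https://docs.cohere.com/v2/docs/chat-api
Structured outputs: https://docs.cohere.com/v2/docs/structured-outputs
Python SDK: https://github.com/cohere-ai/cohere-python
PyPI package: https://pypi.org/project/cohere/
JSON Schema specification: https://json-schema.org/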