Cohere v2 Python Skill (cohere-v2-python)

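Master the Cohere v2 Chat API with a focus on entity extraction using JSON Schema mode. Suited to extracting structured entities from text, building NER systems, and implementing data extraction pipelines.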

NLP | Updated 3/1/2026

Cohere v2 Python

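Overview

Cohere's v2 Chat API provides conversational AI capabilities, with a particular focus on structured output through JSON Schema mode. This skill covers entity extraction from text, data validation, and integration patterns for building production-ready systems that need consistent, validated responses from LLMs.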

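When to use this skill

Apply this skill when you need to:

  • Extract structured entities (names, dates, locations, organizations) from unstructured text
  • Build named entity recognition (NER) systems
  • Implement data extraction pipelines with validated output
  • Obtain JSON responses that conform to a specific schema
  • Process documents to extract information
  • Build classification systems with constrained output
  • Integrate LLM responses with downstream databases or APIs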

Core capabilities

1. Basic Chat API

Initialize the Cohere client and use it for conversational tasks:

import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": "Summarize the key features of quantum computing."}
    ],
)

print(response.message.content[0].text)

Available models:

  • command-a-03-2025 - latest-generation model

For full coverage of API parameters, streaming, RAG, and tool use, see references/chat_api.md

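2. Entity extraction with JSON Schema mode

A key strength of Cohere v2 is structured output through JSON Schema mode, which guarantees that responses conform to the schema you specify.

Simple entity extraction: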

text = "Dr. Sarah Johnson from Stanford University will speak at the AI Conference in Seattle on March 15th."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all entities: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "person": {"type": "string"},
                "title": {"type": "string"},
                "organization": {"type": "string"},
                "event": {"type": "string"},
                "location": {"type": "string"},
                "date": {"type": "string", "format": "date"}
            },
            "required": ["person"]
        }
    }
)

import json
entities = json.loads(response.message.content[0].text)

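Key principles:

  • The top-level type must be "object"
  • At least one field must be listed in the "required" array
  • The schema is strictly enforced - invalid responses are regenerated
  • The first request with a given schema incurs latency overhead; the schema is cached for subsequent requests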
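The first two principles can be checked locally before a schema is ever sent. The sketch below is a minimal pre-flight check, not part of the Cohere SDK: `check_schema_basics` is a hypothetical helper name, and the third check (every required field is defined under "properties") is an extra sanity check assumed here rather than a rule stated above.

```python
def check_schema_basics(schema):
    """Return a list of problems found; an empty list means the basic checks pass."""
    problems = []

    # Principle 1: the top-level type must be "object".
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')

    # Principle 2: at least one field must be listed in "required".
    required = schema.get("required")
    if not isinstance(required, list) or not required:
        problems.append('"required" must list at least one field')
    else:
        # Extra sanity check (an assumption of this sketch): required names should be defined.
        properties = schema.get("properties", {})
        for name in required:
            if name not in properties:
                problems.append(f'required field "{name}" is not defined in "properties"')

    return problems


good_schema = {
    "type": "object",
    "properties": {"person": {"type": "string"}},
    "required": ["person"],
}
bad_schema = {"type": "array", "items": {"type": "string"}}

print(check_schema_basics(good_schema))
print(check_schema_basics(bad_schema))
```

Running a check like this in unit tests catches schema mistakes without spending an API call.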

3. Multi-entity extraction

Extract arrays of entities when a single text contains several records:

text = """
John Smith works at Google as a Software Engineer in San Francisco.
Jane Doe is a Data Scientist at Meta in New York.
Bob Wilson leads the AI team at OpenAI in Seattle.
"""

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Extract all people and their details: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "people": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "company": {"type": "string"},
                            "role": {"type": "string"},
                            "location": {"type": "string"}
                        },
                        "required": ["name", "company"]
                    }
                }
            },
            "required": ["people"]
        }
    }
)

result = json.loads(response.message.content[0].text)
for person in result["people"]:
    print(f"{person['name']} works at {person['company']}")

4. Classification with enums

Use enums to restrict output to specific categories:

text = "I absolutely love this product! The quality is amazing and customer service was helpful."

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": f"Analyze sentiment and aspects: {text}"}
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "overall_sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral", "mixed"]
                },
                "aspects": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "aspect": {"type": "string"},
                            "sentiment": {
                                "type": "string",
                                "enum": ["positive", "negative", "neutral"]
                            }
                        },
                        "required": ["aspect", "sentiment"]
                    }
                }
            },
            "required": ["overall_sentiment", "aspects"]
        }
    }
)

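Benefits of enums:

  • Guarantee valid category values
  • Remove the need to post-validate category values
  • Support direct database insertion
  • Let downstream logic skip error handling for unexpected category values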
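On the Python side, the same closed set of values can be mirrored with a standard-library `Enum`, so downstream code works with typed members instead of raw strings. This is a sketch under assumptions: the `Sentiment` class is a hypothetical name, and `sample_text` is a hand-written stand-in for `response.message.content[0].text`, not real model output.

```python
import json
from enum import Enum


class Sentiment(str, Enum):
    """Mirrors the enum values used in the sentiment schema."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


# Hand-written sample standing in for response.message.content[0].text.
sample_text = json.dumps({
    "overall_sentiment": "positive",
    "aspects": [
        {"aspect": "quality", "sentiment": "positive"},
        {"aspect": "customer service", "sentiment": "positive"},
    ],
})

result = json.loads(sample_text)

# Sentiment(...) raises ValueError if a value falls outside the enum.
overall = Sentiment(result["overall_sentiment"])
aspect_sentiments = {a["aspect"]: Sentiment(a["sentiment"]) for a in result["aspects"]}

print(overall.name, sorted(aspect_sentiments))
```

Because `Sentiment` subclasses `str`, its members still compare equal to the plain strings stored in a database column.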

Common entity extraction patterns

Named entity recognition (NER)

schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {
                        "type": "string",
                        "enum": ["PERSON", "ORGANIZATION", "LOCATION", "DATE", "EVENT", "PRODUCT"]
                    },
                    "context": {"type": "string"}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}
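A response that follows the NER schema is a flat list of entities, and many consumers want it grouped by type. The sketch below shows one way to do that; `group_by_type` is a hypothetical helper and `sample_payload` is hand-written illustrative data shaped like the schema, not real model output.

```python
from collections import defaultdict


def group_by_type(payload):
    """Group entity texts by their "type" value, preserving input order."""
    grouped = defaultdict(list)
    for entity in payload.get("entities", []):
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Hand-written payload shaped like the NER schema.
sample_payload = {
    "entities": [
        {"text": "Sarah Johnson", "type": "PERSON"},
        {"text": "Stanford University", "type": "ORGANIZATION"},
        {"text": "Seattle", "type": "LOCATION"},
        {"text": "AI Conference", "type": "EVENT"},
        {"text": "John Smith", "type": "PERSON"},
    ]
}

grouped = group_by_type(sample_payload)
print(grouped["PERSON"])
```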

Resume/CV parsing

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        "phone": {"type": "string"},
        "experience": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "role": {"type": "string"},
                    "start_date": {"type": "string", "format": "date"},
                    "end_date": {"type": "string", "format": "date"},
                    "description": {"type": "string"}
                },
                "required": ["company", "role"]
            }
        },
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "institution": {"type": "string"},
                    "degree": {"type": "string"},
                    "field": {"type": "string"},
                    "graduation_year": {"type": "integer"}
                },
                "required": ["institution"]
            }
        },
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name"]
}

Invoice/receipt extraction

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "format": "date"},
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "tax_id": {"type": "string"}
            },
            "required": ["name"]
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"}
                },
                "required": ["description", "total"]
            }
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"}
    },
    "required": ["invoice_number", "vendor", "total"]
}
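The schema constrains the shape of an invoice but not its arithmetic, so a post-extraction consistency check can be useful. The following is a minimal sketch: `check_invoice_totals` is a hypothetical helper, the sample invoice is hand-written, and the rule that total equals subtotal plus tax is an assumption that will not hold for every invoice layout.

```python
import math


def check_invoice_totals(invoice, tolerance=0.01):
    """Return a list of arithmetic inconsistencies found in an extracted invoice."""
    issues = []
    items = invoice.get("items", [])

    for index, item in enumerate(items):
        # quantity and unit_price are optional in the schema, so check only when both exist.
        if "quantity" in item and "unit_price" in item:
            expected = item["quantity"] * item["unit_price"]
            if not math.isclose(expected, item["total"], abs_tol=tolerance):
                issues.append(f"item {index}: quantity * unit_price != total")

    if items and "subtotal" in invoice:
        item_sum = sum(item["total"] for item in items)
        if not math.isclose(item_sum, invoice["subtotal"], abs_tol=tolerance):
            issues.append("sum of item totals != subtotal")

    # Assumption of this sketch: total = subtotal + tax.
    if "subtotal" in invoice and "tax" in invoice:
        if not math.isclose(invoice["subtotal"] + invoice["tax"], invoice["total"], abs_tol=tolerance):
            issues.append("subtotal + tax != total")

    return issues


# Hand-written sample shaped like the invoice schema.
sample_invoice = {
    "invoice_number": "INV-001",
    "vendor": {"name": "Acme"},
    "items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0, "total": 20.0},
        {"description": "Gadget", "total": 5.5},
    ],
    "subtotal": 25.5,
    "tax": 2.55,
    "total": 28.05,
}

print(check_invoice_totals(sample_invoice))
```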

Medical report extraction

schema = {
    "type": "object",
    "properties": {
        "patient": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "gender": {
                    "type": "string",
                    "enum": ["male", "female", "other", "unknown"]
                }
            },
            "required": ["name"]
        },
        "diagnosis": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "condition": {"type": "string"},
                    "severity": {
                        "type": "string",
                        "enum": ["mild", "moderate", "severe"]
                    },
                    "notes": {"type": "string"}
                },
                "required": ["condition"]
            }
        },
        "medications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "dosage": {"type": "string"},
                    "frequency": {"type": "string"}
                },
                "required": ["name"]
            }
        },
        "visit_date": {"type": "string", "format": "date"}
    },
    "required": ["patient", "visit_date"]
}

Advanced schema features

Nested objects with validation

schema = {
    "type": "object",
    "properties": {
        "company": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "headquarters": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "country": {"type": "string"}
                    },
                    "required": ["city", "country"]
                }
            },
            "required": ["name"]
        }
    },
    "required": ["company"]
}

Schema reuse with $ref

schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name"]
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"}
    },
    "required": ["primary_contact"]
}
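To see what a `$ref` stands for, it can help to expand the references locally and inspect the result. The resolver below is an illustrative sketch only: `resolve_local_refs` is a hypothetical helper, it handles only local pointers of the form "#/...", it ignores JSON Pointer escaping, and it would loop forever on recursive definitions. It is not how the API itself processes schemas.

```python
import copy


def resolve_local_refs(node, root):
    """Recursively replace {"$ref": "#/..."} nodes with a copy of their target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            target = root
            for part in ref[2:].split("/"):
                target = target[part]  # walk "$defs" -> "person"
            return resolve_local_refs(copy.deepcopy(target), root)
        # Drop "$defs" from the output because every reference is inlined.
        return {
            key: resolve_local_refs(value, root)
            for key, value in node.items()
            if key != "$defs"
        }
    if isinstance(node, list):
        return [resolve_local_refs(item, root) for item in node]
    return node


contact_schema = {
    "type": "object",
    "$defs": {
        "person": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
            "required": ["name"],
        }
    },
    "properties": {
        "primary_contact": {"$ref": "#/$defs/person"},
        "secondary_contact": {"$ref": "#/$defs/person"},
    },
    "required": ["primary_contact"],
}

expanded = resolve_local_refs(contact_schema, contact_schema)
print(expanded["properties"]["primary_contact"]["required"])
```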

Format validation

schema = {
    "type": "object",
    "properties": {
        "created_at": {
            "type": "string",
            "format": "date-time"  # ISO 8601: 2024-01-01T12:00:00Z
        },
        "birth_date": {
            "type": "string",
            "format": "date"  # YYYY-MM-DD
        },
        "user_id": {
            "type": "string",
            "format": "uuid"
        },
        "email": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        }
    },
    "required": ["user_id"]
}
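The same formats can be re-checked on the client with the standard library, for example before writing to a database. This is a sketch under assumptions: the helper names are hypothetical, and the standard-library parsers are somewhat more lenient than a strict format validator (for instance, `uuid.UUID` also accepts braces around the value).

```python
import re
import uuid
from datetime import date, datetime

# Same pattern as the "email" field in the schema above.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_date(value):
    """True if value parses as an ISO date such as 2024-01-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_date_time(value):
    """True if value parses as an ISO 8601 timestamp."""
    try:
        # Older Python versions reject a trailing "Z", so normalise it to an explicit offset.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def is_uuid(value):
    """True if value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def is_email(value):
    """True if value matches the schema's email pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


print(is_date("2024-01-01"), is_date_time("2024-01-01T12:00:00Z"))
```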

Workflow for building an entity extraction pipeline

Step 1: Define your schema

# Identify the entities you need to extract
entity_schema = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "type": {"type": "string", "enum": ["PERSON", "ORG", "LOCATION"]},
                    "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
                },
                "required": ["text", "type"]
            }
        }
    },
    "required": ["entities"]
}

Step 2: Create the extraction function

def extract_entities(text, schema):
    response = co.chat(
        model="command-a-03-2025",
        messages=[
            {
                "role": "system",
                "content": "Extract entities accurately with appropriate confidence levels."
            },
            {
                "role": "user",
                "content": f"Extract all entities: {text}"
            }
        ],
        response_format={
            "type": "json_object",
            "schema": schema
        }
    )
    return json.loads(response.message.content[0].text)

Step 3: Batch processing

documents = [
    "Text 1...",
    "Text 2...",
    "Text 3..."
]

results = []
for doc in documents:
    entities = extract_entities(doc, entity_schema)
    results.append({
        "document": doc,
        "entities": entities["entities"]
    })
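In a larger batch, one failing document should not abort the whole run. The sketch below isolates failures per document. `extract_batch` is a hypothetical helper, and `fake_extractor` is a stand-in used only so the example runs offline; in practice the extractor would be a call such as `lambda doc: extract_entities(doc, entity_schema)`.

```python
def extract_batch(documents, extractor):
    """Run extractor on each document, collecting successes and failures separately."""
    results, failures = [], []
    for index, doc in enumerate(documents):
        try:
            payload = extractor(doc)
            results.append({"document": doc, "entities": payload["entities"]})
        except Exception as exc:  # broad on purpose: record the failure and keep going
            failures.append({"index": index, "document": doc, "error": repr(exc)})
    return results, failures


def fake_extractor(text):
    """Offline stand-in for a real extraction call (illustration only)."""
    if not text.strip():
        raise ValueError("empty document")
    return {"entities": [{"text": text.split()[0], "type": "PERSON"}]}


results, failures = extract_batch(["Alice went home", "   ", "Bob stayed"], fake_extractor)
print(len(results), len(failures))
```

Keeping the failed index and document makes it straightforward to reprocess only the failures later.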

Step 4: Store in a database

from surrealdb import Surreal  # SurrealDB example; assumes an SDK version that exports the async Surreal client

async def store_entities(entities):
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": "root", "pass": "root"})
        await db.use("entities", "database")

        for entity in entities["entities"]:
            await db.create("entity", entity)

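Best practices

Schema design

  • Start with only the fields you need, then add optional fields incrementally
  • Use enums for classification to ensure valid output
  • Use format and pattern validation (dates, UUIDs, emails) to improve data quality
  • Use $ref for repeated structures to keep schemas DRY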

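Prompting

  • System messages take precedence over user instructions - use them for extraction guidelines
  • State explicitly in the user message what to extract
  • In JSON mode without a schema, always instruct the model to generate JSON
  • For complex extractions, provide examples in the system message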
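The prompting guidelines above can be sketched as a small message builder: guidelines and worked examples go in the system message, and the user message states what to extract and asks for JSON. `build_messages` and the example wording are hypothetical; only the `role`/`content` message shape comes from the snippets in this document.

```python
import json


def build_messages(text, guidelines, examples=()):
    """Build a messages list with guidelines and few-shot examples in the system message."""
    system_parts = [guidelines]
    for example_text, example_output in examples:
        system_parts.append(
            f"Example input: {example_text}\nExample output: {json.dumps(example_output)}"
        )
    return [
        {"role": "system", "content": "\n\n".join(system_parts)},
        # Explicit instruction in the user message, including the request for JSON.
        {"role": "user", "content": f"Extract all entities and respond in JSON: {text}"},
    ]


messages = build_messages(
    "Bob Wilson leads the AI team at OpenAI in Seattle.",
    guidelines="Extract entities accurately. Label each as PERSON, ORG, or LOCATION.",
    examples=[
        (
            "Jane Doe works at Meta.",
            {"entities": [
                {"text": "Jane Doe", "type": "PERSON"},
                {"text": "Meta", "type": "ORG"},
            ]},
        )
    ],
)

print(messages[0]["content"])
```

The resulting list can be passed as the `messages` argument of `co.chat`.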

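Performance

  • Schemas are cached after the first request - reuse the same schema across calls
  • Simple schemas add minimal latency overhead
  • Complex nested schemas add moderate processing time
  • Consider batch extraction when processing multiple documents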

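Error handling

  • Always wrap JSON parsing in a try-except block
  • Verify that required fields are present, even with schema enforcement
  • Handle API errors gracefully with exponential backoff
  • Log failed extractions for debugging and reprocessing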
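These four practices can be combined in two small helpers. The sketch below uses only the standard library; `parse_and_check` and `call_with_backoff` are hypothetical names, and retrying on every exception is a simplification, since a real pipeline would normally retry only transient API errors.

```python
import json
import logging
import random
import time

logger = logging.getLogger("extraction")


def parse_and_check(raw_text, required_fields):
    """Parse a JSON response and confirm the required top-level fields exist."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response JSON is not an object")
    missing = [name for name in required_fields if name not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return data


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying failures with exponentially growing delays plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %r", attempt, exc)
                raise
            # Delays grow as base_delay * 1, 2, 4, ... with a little random jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            logger.warning("attempt %d failed (%r); retrying in %.2fs", attempt, exc, delay)
            sleep(delay)
```

A typical use would wrap the API call and then validate its text, for example `call_with_backoff(lambda: co.chat(...))` followed by `parse_and_check(response.message.content[0].text, ["entities"])`.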

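Production considerations

  • Monitor token usage via response.usage.tokens
  • Implement rate limiting and request queuing
  • Cache common extractions to reduce API calls
  • Choose a model by weighing task complexity against cost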
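Caching common extractions can be as simple as keying results on a hash of the text and schema. The class below is an in-memory sketch with hypothetical names (`ExtractionCache`, `counting_extractor`); it assumes an extractor with the `(text, schema)` signature used by `extract_entities`, and a production system would more likely use a shared store with expiry.

```python
import hashlib
import json


class ExtractionCache:
    """In-memory cache keyed by a hash of (text, schema)."""

    def __init__(self, extractor):
        self._extractor = extractor
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text, schema):
        # sort_keys makes the key independent of dict ordering in the schema.
        payload = json.dumps({"text": text, "schema": schema}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def extract(self, text, schema):
        key = self._key(text, schema)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._extractor(text, schema)
        return self._store[key]


calls = []


def counting_extractor(text, schema):
    """Offline stand-in that records how often it is called (illustration only)."""
    calls.append(text)
    return {"entities": []}


cache = ExtractionCache(counting_extractor)
demo_schema = {"type": "object", "properties": {"entities": {"type": "array"}}, "required": ["entities"]}
cache.extract("same text", demo_schema)
cache.extract("same text", demo_schema)
print(cache.hits, cache.misses, len(calls))
```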

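Limitations

Unsupported schema features

  • Numeric ranges (minimum/maximum)
  • Array length constraints (minItems/maxItems)
  • String length constraints (minLength/maxLength)
  • Some complex regular expression patterns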
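A schema reused from another project may contain these keywords without anyone noticing. The sketch below walks a schema and reports where they occur, covering only the six keywords listed above. `find_unsupported_keywords` is a hypothetical helper, and treating a keyword as a constraint only when its value is not a dict is a heuristic for skipping properties that happen to share a keyword's name.

```python
UNSUPPORTED_KEYWORDS = {
    "minimum", "maximum",      # numeric ranges
    "minItems", "maxItems",    # array length constraints
    "minLength", "maxLength",  # string length constraints
}


def find_unsupported_keywords(node, path="$"):
    """Return the paths of unsupported constraint keywords found in a schema."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            # A constraint has a scalar value; a property named "minimum" would map to a dict.
            if key in UNSUPPORTED_KEYWORDS and not isinstance(value, dict):
                found.append(child_path)
            found.extend(find_unsupported_keywords(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_unsupported_keywords(item, f"{path}[{index}]"))
    return found


candidate_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 0},
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
    "required": ["age"],
}

print(find_unsupported_keywords(candidate_schema))
```

Constraints flagged this way can be removed from the schema and enforced in client code after parsing the response.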

Current limitations

  • RAG is not supported in JSON mode
  • Tools mode supports at most 200 fields
  • Schema mode adds latency overhead

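Reference documentation

This skill includes comprehensive reference documentation:

  • references/chat_api.md - complete Chat API reference, covering parameters, streaming, tool use, RAG, and conversation management
  • references/structured_outputs.md - in-depth structured outputs guide, covering JSON Schema mode, validation, entity extraction patterns, and advanced features

Load these references when implementing specific features or troubleshooting.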

Additional resources