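name: Dialogue Audio
description: "Create multi-speaker dialogue audio with Dia TTS. Covers speaker tags, emotion control, pacing, dialogue flow, and post-production. Use for: podcasts, audiobooks, explainers, character dialogue, conversational content. Triggers: dialogue audio, multi-speaker, conversation audio, dia tts, two speakers, podcast audio, character voices, voice acting, dialogue generation, conversation tts, multi-voice, speaker tags, dialogue recording"
allowed-tools: Bash(infsh *)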
Dialogue Audio
Create realistic multi-speaker dialogue with Dia TTS through the inference.sh CLI.
Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login
# Two-speaker dialogue
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
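Installation note: The install script only detects your OS and architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. It needs no elevated privileges and starts no background processes. Manual installation and verification are also available.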
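The inline JSON in the quick-start command is wrapped in single quotes, so an apostrophe inside the prompt (for example in a contraction such as "it's") would end the shell string early. The following is a minimal Python sketch of one way to avoid that, assuming the app takes only the `prompt` field shown above. `build_command` is a hypothetical helper, not part of the infsh CLI.

```python
import json
import shlex


def build_command(prompt: str, app: str = "falai/dia-tts") -> str:
    """Return a shell-safe `infsh app run` command line for one prompt.

    Assumption: the app input is a JSON object with a single "prompt" key,
    as in the quick-start example.
    """
    # json.dumps escapes double quotes and backslashes inside the prompt.
    payload = json.dumps({"prompt": prompt}, ensure_ascii=False)
    # shlex.quote makes the payload safe for a POSIX shell, apostrophes included.
    return f"infsh app run {app} --input {shlex.quote(payload)}"


if __name__ == "__main__":
    print(build_command("[S1] It's ready. [S2] Let's ship it!"))
```

The printed line can be pasted into a terminal or passed to a script. Quoting is delegated to the standard library rather than done by hand.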
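Speaker Tags
Dia TTS uses [S1] and [S2] to distinguish two speakers.
| Tag | Role | Voice |
|---|---|---|
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |
Rules:
- Always start each speaker turn with a tag
- Tags must be uppercase: [S1], not [s1]
- At most 2 speakers per generation
- Each speaker keeps a consistent voice within a session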
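The tag rules above can be checked before a prompt is sent. This is a rough sketch rather than an official validator: `check_tags` is a hypothetical helper, and it assumes that any text in square brackets is meant as a speaker tag.

```python
import re
from typing import List

# Matches anything written in square brackets, e.g. [S1], [s1], [S3].
BRACKETED = re.compile(r"\[([^\]]+)\]")


def check_tags(prompt: str) -> List[str]:
    """Return a list of rule violations; an empty list means the tags look fine."""
    problems = []
    # Rule: every turn, including the first, starts with a tag.
    if not re.match(r"\s*\[S[12]\]", prompt):
        problems.append("prompt does not start with [S1] or [S2]")
    # Rules: tags are uppercase and only two speakers are available.
    for tag in BRACKETED.findall(prompt):
        if tag not in ("S1", "S2"):
            problems.append(f"unsupported tag [{tag}]")
    return problems


if __name__ == "__main__":
    print(check_tags("[S1] Hi there. [S2] Hello!"))
    print(check_tags("[s1] Hi there. [S3] Hey."))
```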
Emotion and Expression Control
Dia TTS interprets punctuation and non-verbal cues to convey emotion.
Punctuation Effects
| Punctuation | Effect | Example |
|---|---|---|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, question | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |
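Non-Verbal Sounds
Dia TTS supports sound descriptions in parentheses:
- (laughs): laughter
- (sighs): exasperation or relief
- (clears throat): attention-getting pause
- (whispers): soft delivery
- (gasps): surprise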
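When a script is kept as structured data, tags and cues can be rendered in a consistent way. The sketch below assumes the five cues listed above are the ones you want to allow; whether Dia TTS accepts other cues is not stated here. `render_turn` and `render_dialogue` are hypothetical helper names.

```python
from typing import Iterable, Tuple

# Only the cues listed in this document; treat the set as an assumption.
CUES = {"laughs", "sighs", "clears throat", "whispers", "gasps"}


def render_turn(speaker: int, text: str, cue: str = "") -> str:
    """Render one turn as '[S1] (cue) text'."""
    if speaker not in (1, 2):
        raise ValueError("Dia TTS prompts use speaker 1 or 2 only")
    if cue and cue not in CUES:
        raise ValueError(f"unknown cue: {cue}")
    prefix = f"({cue}) " if cue else ""
    return f"[S{speaker}] {prefix}{text}"


def render_dialogue(turns: Iterable[Tuple]) -> str:
    """Join (speaker, text) or (speaker, text, cue) tuples into one prompt."""
    return " ".join(render_turn(*turn) for turn in turns)


if __name__ == "__main__":
    print(render_dialogue([
        (1, "We hit ten thousand users!"),
        (2, "No way! That is incredible!", "gasps"),
    ]))
```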
Examples with Emotion
# Excited dialogue
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'
# Serious, reflective dialogue
infsh app run falai/dia-tts --input '{
"prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'
# Teaching and explaining
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'
Pacing Control
Pause Hierarchy
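| Technique | Pause length | Use for |
|---|---|---|
| Comma , | ~0.3 s | Between clauses, list items |
| Period . | ~0.5 s | Between sentences |
| Ellipsis ... | ~1.0 s | Dramatic pauses, thinking, hesitation |
| New speaker tag | ~0.3 s | Natural turn-taking gap |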
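The approximate values in the table can be summed to get a rough idea of how much silence a script contains. This is only an estimate under the assumption that the listed pause lengths hold. `!` and `?` are not in the table, so the sketch ignores them. `estimate_pause_seconds` is a hypothetical helper.

```python
import re

# Approximate pause lengths in seconds, copied from the table above.
PAUSE_SECONDS = {"ellipsis": 1.0, "period": 0.5, "comma": 0.3, "turn": 0.3}


def estimate_pause_seconds(prompt: str) -> float:
    """Sum the approximate pauses implied by punctuation and speaker changes."""
    ellipsis_pattern = r"\.\.\.|…"
    ellipses = len(re.findall(ellipsis_pattern, prompt))
    # Remove ellipses first so their dots are not counted as periods.
    rest = re.sub(ellipsis_pattern, " ", prompt)
    periods = rest.count(".")
    commas = rest.count(",")
    # The first tag opens the dialogue; every later tag is a speaker change.
    turns = max(len(re.findall(r"\[S[12]\]", prompt)) - 1, 0)
    total = (
        ellipses * PAUSE_SECONDS["ellipsis"]
        + periods * PAUSE_SECONDS["period"]
        + commas * PAUSE_SECONDS["comma"]
        + turns * PAUSE_SECONDS["turn"]
    )
    return round(total, 2)


if __name__ == "__main__":
    print(estimate_pause_seconds("[S1] Well, I tried... and it failed. [S2] Okay."))
```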
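Speed Control
- Short sentences = faster perceived pace
- Long sentences with commas = measured, thoughtful pace
- A question followed by an answer = engaging back-and-forth rhythm
# Fast-paced, energetic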
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'
# Slow, contemplative
infsh app run falai/dia-tts --input '{
"prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'
Dialogue Structure Patterns
Interview Format
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'
Tutorial / Explainer
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'
Debate / Discussion
infsh app run falai/dia-tts --input '{
"prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'
Post-Production Tips
Volume Normalization
Both speakers should sound equally loud. If one is louder:
# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
"video": "talking-head.mp4",
"audio": "dialogue.mp3",
"audio_volume": 1.0
}'
Adding Background Music
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
"media": ["dialogue.mp3", "background-music.mp3"]
}'
Segmenting Long Dialogue
For dialogue longer than about 30 seconds, generate it in segments:
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome back to another episode..."
}'
# Segment 2: Main content
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So let us dive into today'\''s topic..."
}'
# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Great conversation today..."
}'
# Merge all segments
infsh app run infsh/media-merger --input '{
"media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
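A long script can be cut into segments at turn boundaries before generation. The sketch below is one possible approach, not a documented feature. It assumes a speaking rate of roughly 2.5 words per second, so about 75 words per 30-second segment; adjust this figure after listening to real output. `split_into_segments` is a hypothetical helper, and it assumes square brackets appear only in speaker tags.

```python
import re
from typing import List


def split_into_segments(prompt: str, max_words: int = 75) -> List[str]:
    """Group whole turns into segments of at most max_words spoken words.

    A single turn longer than max_words is kept intact in its own segment.
    """
    # Each match is one tag plus the text up to the next tag.
    turns = re.findall(r"\[S[12]\][^\[]*", prompt)
    segments, current, count = [], [], 0
    for turn in turns:
        words = len(turn.split()) - 1  # do not count the tag itself
        if current and count + words > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(turn.strip())
        count += words
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    script = "[S1] one two three [S2] four five [S1] six seven eight nine"
    for segment in split_into_segments(script, max_words=5):
        print(segment)
```

Each returned string starts with a speaker tag, so it can be used directly as a `prompt` value for one segment.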
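Script Writing Tips
| Do | Don't |
|---|---|
| Write the way people talk | Write the way people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal forms ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences the same length |
| Include reactions ("Exactly!", "Hmm.") | One-way monologue |
| Read aloud before generating | Assume it sounds right |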
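Two of the tips above are easy to check mechanically: sentence length and formal forms. The sketch below is a rough lint pass, not a substitute for reading the script aloud. It covers only the two formal forms named in the table, and it treats an ellipsis followed by a space as a sentence break. `lint_script` is a hypothetical helper.

```python
import re
from typing import List

# Formal forms from the table above and their suggested contractions.
FORMAL = {"cannot": "can't", "will not": "won't"}


def lint_script(prompt: str, max_words: int = 14) -> List[str]:
    """Flag sentences of 15 or more words and formal forms in a Dia prompt."""
    notes = []
    # Drop speaker tags and parenthesised cues; they are not spoken words.
    text = re.sub(r"\[S[12]\]|\([^)]*\)", " ", prompt).strip()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        n = len(sentence.split())
        if n > max_words:
            notes.append(f"long sentence ({n} words): {sentence[:40]}")
    lowered = text.lower()
    for formal, short in FORMAL.items():
        if re.search(rf"\b{formal}\b", lowered):
            notes.append(f'consider "{short}" instead of "{formal}"')
    return notes


if __name__ == "__main__":
    for note in lint_script("[S1] I cannot wait. [S2] Me neither!"):
        print(note)
```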
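Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Monologues longer than 3 sentences | Sounds like a lecture, not a conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-verbal cues |
| Missing speaker tags | Voices do not alternate | Always start each turn with [S1] or [S2] |
| Formal written language | Sounds unnatural when spoken | Use contractions and short sentences |
| No pauses between topics | Feels rushed | Use ... or a scene break |
| Same energy level throughout | Monotonous | Vary high-energy and low-energy moments |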
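The first mistake in the table, turns that run past three sentences, can be spotted automatically. This is a minimal sketch with simplified sentence splitting: it breaks on `.`, `!`, or `?` followed by whitespace, so an ellipsis mid-sentence counts as a break. `find_monologues` is a hypothetical helper.

```python
import re
from typing import List


def find_monologues(prompt: str, max_sentences: int = 3) -> List[int]:
    """Return 1-based indexes of turns with more than max_sentences sentences."""
    flagged = []
    turns = re.findall(r"\[S[12]\][^\[]*", prompt)
    for index, turn in enumerate(turns, start=1):
        body = re.sub(r"^\[S[12]\]", "", turn).strip()
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]
        if len(sentences) > max_sentences:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    print(find_monologues("[S1] One. Two. Three. Four. [S2] Short reply."))
```

A flagged turn is a candidate for splitting into an exchange, for example by giving the other speaker a short reaction between sentences.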
Related Skills
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video
Browse all apps: infsh app list