Dialogue Audio Skill: dialogue-audio

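The dialogue-audio skill uses Dia TTS to generate realistic multi-speaker dialogue audio from text. It supports speaker tags, emotion control, pacing, and dialogue structure, and suits podcast production, audiobook generation, character dialogue, and explainer content. Keywords: AI audio generation, multi-speaker dialogue, text-to-speech, speech synthesis, emotional speech, audio production.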

AIGC · Updated 3/12/2026

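name: dialogue-audio
description: "Create multi-speaker dialogue audio with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and post-production. Use for: podcasts, audiobooks, explainers, character dialogue, conversational content. Triggers: dialogue audio, multi-speaker, conversation audio, dia tts, two speakers, podcast audio, character voices, voice acting, dialogue generation, dialogue tts, multi-voice, speaker tags, conversation recording"
allowed-tools: Bash(infsh *)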

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS through the inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'

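Install note: the install script only detects your OS and architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. It requires no elevated privileges or background processes. Manual installation and verification are also available.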

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

| Tag | Role | Voice |
|-----|------|-------|
| [S1] | Speaker 1 | Auto-assigned voice A |
| [S2] | Speaker 2 | Auto-assigned voice B |

Rules:

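  • Always start each speaker turn with a tag
  • Tags must be uppercase: [S1], not [s1]
  • At most 2 speakers per generation
  • Each speaker keeps a consistent voice within a session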
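The tag rules above are easy to check before spending a generation. The sketch below is a hypothetical shell function, check_speaker_tags, that is not part of infsh or Dia; it assumes the prompt begins directly with a tag (no leading whitespace) and only covers the mechanical rules: a leading tag, uppercase tags, and no speakers beyond [S1] and [S2]. Voice consistency is up to the model and is not checked.

```shell
# Hypothetical pre-flight check (not part of infsh or Dia).
# Prints the first problem it finds and returns 1, or returns 0 if the
# prompt follows the tag rules. Voice consistency cannot be checked here.
check_speaker_tags() {
  local prompt="$1"

  # Every turn starts with a tag, so the prompt itself must begin with one.
  case "$prompt" in
    "[S1]"* | "[S2]"*) ;;
    *) echo "prompt must start with [S1] or [S2]"; return 1 ;;
  esac

  # Tags must be uppercase: [s1] and [s2] are rejected.
  if printf '%s' "$prompt" | grep -q '\[s[0-9]]'; then
    echo "lowercase speaker tag found"; return 1
  fi

  # At most two speakers per generation: any tag other than [S1]/[S2] fails.
  if printf '%s' "$prompt" | grep -oE '\[S[0-9]+]' | grep -qvE '^\[S[12]]$'; then
    echo "only [S1] and [S2] are supported"; return 1
  fi

  return 0
}

# Usage: check_speaker_tags "$prompt" || exit 1
```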

Emotion and Expression Control

Dia TTS interprets punctuation and nonverbal cues to convey emotion.

Punctuation Effects

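| Punctuation | Effect | Example |
|-------------|--------|---------|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, question | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| -- | Interruption or change of direction | "I was going to say -- never mind." |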

Nonverbal Sounds

Dia TTS supports sound descriptions in parentheses:

(laughs)         laughter
(sighs)          exasperation or relief
(clears throat)  a pause to get attention
(whispers)       soft delivery
(gasps)          surprise
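A misspelled cue such as (laugh) instead of (laughs) is easy to miss in a long script. As a hypothetical helper (not part of infsh or Dia), check_cues lists every parenthesized cue in a prompt and flags any that are not among the five documented above. Dia may well accept other cues; this sketch only compares against the list on this page.

```shell
# Hypothetical helper: print each parenthesized cue as "ok" or "unlisted",
# comparing only against the five cues documented above.
# Returns 1 if any unlisted cue is found, 0 otherwise.
check_cues() {
  printf '%s\n' "$1" | grep -o '([^()]*)' | awk '
    $0 == "(laughs)" || $0 == "(sighs)" || $0 == "(clears throat)" ||
    $0 == "(whispers)" || $0 == "(gasps)" { print "ok: " $0; next }
    { print "unlisted: " $0; bad = 1 }
    END { exit bad }'
}
```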

Examples with Emotion

# Excited dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious / thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching / explanation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'

Pacing Control

Pause Hierarchy

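| Technique | Pause length | Use for |
|-----------|--------------|---------|
| Comma , | ~0.3 s | Between clauses, list items |
| Period . | ~0.5 s | Between sentences |
| Ellipsis ... | ~1.0 s | Dramatic pauses, thinking, hesitation |
| New speaker tag | ~0.3 s | Natural gap between turns |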
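The approximate durations in the table can be turned into a rough pause budget for a script. estimate_pauses below is a hypothetical helper (not part of infsh or Dia) that counts commas, periods, ellipses, and speaker changes and multiplies them by the table's values. ? and ! have no entry in the table, so they are not counted, and the result is only an estimate, not a figure Dia reports.

```shell
# Hypothetical helper: rough total pause time in seconds, using the table's
# approximate values (comma 0.3, period 0.5, ellipsis 1.0, speaker change 0.3).
estimate_pauses() {
  local text ellipses commas periods tags
  # Normalize the Unicode ellipsis character to three ASCII dots.
  text=$(printf '%s' "$1" | sed 's/…/.../g')
  ellipses=$(printf '%s' "$text" | grep -o '\.\.\.' | wc -l)
  # Remove ellipses so their dots are not counted again as periods.
  text=$(printf '%s' "$text" | sed 's/\.\.\.//g')
  commas=$(printf '%s' "$text" | tr -cd ',' | wc -c)
  periods=$(printf '%s' "$text" | tr -cd '.' | wc -c)
  tags=$(printf '%s' "$text" | grep -oE '\[S[12]]' | wc -l)
  # Every tag after the first marks a speaker change.
  awk -v c="$commas" -v p="$periods" -v e="$ellipses" -v t="$tags" 'BEGIN {
    turns = (t > 0) ? t - 1 : 0
    printf "%.1f\n", c * 0.3 + p * 0.5 + e * 1.0 + turns * 0.3
  }'
}
```

For example, "[S1] Wait... really? [S2] Yes, really." has one ellipsis, one comma, one period, and one speaker change, for an estimated 2.1 seconds of pauses.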

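Speed Control

  • Short sentences = faster perceived pace
  • Long sentences with commas = measured, thoughtful pace
  • A question followed by an answer = engaging back-and-forth rhythm

# Fast-paced, energetic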
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'

# Slow, reflective
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'
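The speed guidelines above largely come down to sentence length. As a rough, hypothetical measure (not part of infsh or Dia), avg_words_per_sentence strips speaker tags and parenthesized cues and reports the mean number of words per sentence, treating ., !, ?, and ... as sentence ends. A lower number suggests a faster-feeling read.

```shell
# Hypothetical pacing check: average words per sentence in a prompt.
# Speaker tags and (cues) are removed first; runs of . ! ? end a sentence.
avg_words_per_sentence() {
  printf '%s\n' "$1" \
    | sed -e 's/\[S[12]]//g' -e 's/([^()]*)//g' \
    | awk '
      { text = text " " $0 }
      END {
        n = split(text, parts, /[.!?]+/)
        for (i = 1; i <= n; i++) {
          # A single-space separator means default whitespace splitting,
          # so m is the number of words in this sentence.
          m = split(parts[i], words, " ")
          if (m > 0) { total += m; sentences++ }
        }
        avg = (sentences > 0) ? total / sentences : 0
        printf "%.1f\n", avg
      }'
}
```

Applied to the two prompts above, the fast-paced example averages 2.1 words per sentence and the reflective one 5.5.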

Dialogue Structure Patterns

Interview Format

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Walkthrough

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

Debate / Discussion

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'

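Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder, rebalance the dialogue before merging; the command below then lays the dialogue track over a talking-head video, with audio_volume setting the level of the added audio:

# Merge the balanced dialogue onto the video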
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'

Adding Background Audio / Music

# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'

Segmenting Long Dialogues

For dialogue longer than about 30 seconds, generate it in segments:

# Segment 1: intro
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
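Writing three segment commands by hand works for short scripts; for longer ones, a hypothetical split_turns helper (not part of infsh or Dia) can cut a full script at speaker tags so each chunk still begins with [S1] or [S2]. The default of four turns per segment is an arbitrary assumption, not a Dia limit; adjust it so each segment stays under roughly 30 seconds. The commented usage loop assumes jq is installed to build the JSON input and does not cover how infsh saves each result.

```shell
# Hypothetical helper: split a single-line script into segments of at most
# N turns (default 4), cutting only at [S1]/[S2] tags so every segment
# still begins with a speaker tag. Prints one segment per line.
# Any text before the first tag is ignored.
split_turns() {
  printf '%s\n' "$1" | awk -v n="${2:-4}" '
    {
      rest = $0
      while (match(rest, /\[S[12]]/)) {
        # Text before this tag belongs to the previous turn.
        if (turns > 0) turn[turns] = turn[turns] substr(rest, 1, RSTART - 1)
        # Start a new turn with the tag itself.
        turn[++turns] = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
      }
      if (turns > 0) turn[turns] = turn[turns] rest
    }
    END {
      for (i = 1; i <= turns; i++) {
        t = turn[i]
        sub(/ +$/, "", t)  # drop trailing spaces before joining
        seg = (seg == "") ? t : (seg " " t)
        # Emit a segment every n turns, plus whatever remains at the end.
        if (i % n == 0 || i == turns) { print seg; seg = "" }
      }
    }'
}

# Usage sketch (assumes jq is installed):
# split_turns "$script" 4 | while IFS= read -r seg; do
#   infsh app run falai/dia-tts --input "$(jq -n --arg prompt "$seg" '{prompt: $prompt}')"
# done
```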

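Script Writing Tips

| Do | Don't |
|----|-------|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal forms ("cannot", "will not") |
| Natural filler words ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences the same length |
| Include reactions ("Exactly!", "Hmm.") | One-way monologue |
| Read it aloud before generating | Assume it sounds right |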
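Most examples on this page write "It is", "Let us", and "cannot" even though the table recommends contractions, most likely because the JSON sits inside a single-quoted shell argument, where a bare apostrophe ends the string. One way around this is a hypothetical dia_input helper (not part of infsh) that builds the {"prompt": ...} object from an ordinary double-quoted variable. It escapes backslashes and double quotes and assumes a single-line prompt; if jq is installed, jq -n --arg does the same job more robustly.

```shell
# Hypothetical helper (not part of infsh): wrap a prompt in the
# {"prompt": ...} JSON object that --input takes, escaping backslashes and
# double quotes. Assumes a single-line prompt with no other control characters.
dia_input() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

# Apostrophes are safe here because the prompt is double-quoted.
prompt="[S1] I can't believe it worked. [S2] Told you! It's not that hard."
# infsh app run falai/dia-tts --input "$(dia_input "$prompt")"
```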

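Common Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Monologues longer than 3 sentences | Sounds like a lecture, not a conversation | Break them into an exchange |
| No emotional variation | Flat, robotic delivery | Use punctuation and nonverbal cues |
| Missing speaker tags | Voices do not alternate | Always start each turn with [S1] or [S2] |
| Formal written language | Sounds stiff rather than conversational | Use contractions and short sentences |
| No pauses between topics | Feels rushed | Use ... or a scene break |
| Same energy level throughout | Monotonous | Vary high- and low-energy moments |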
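Some of these mistakes can be flagged mechanically. lint_script is a hypothetical checker (not part of infsh or Dia) that reports text before the first speaker tag, turns longer than three sentences, and sentences of 15 or more words (the threshold from the script-writing tips). It treats ., !, ?, and ... as sentence ends, expects a single-line script, and exits non-zero when it finds anything.

```shell
# Hypothetical script linter: prints one warning per problem found and
# exits 1 if there were any, 0 for a clean single-line script.
lint_script() {
  printf '%s\n' "$1" | awk '
    {
      n = split($0, turn, /\[S[12]]/)
      # turn[1] holds whatever precedes the first tag; it should be blank.
      if (turn[1] ~ /[^ \t]/) { print "text before the first speaker tag"; found = 1 }
      for (i = 2; i <= n; i++) {
        s = split(turn[i], sent, /[.!?]+/)
        count = 0
        for (j = 1; j <= s; j++) {
          w = split(sent[j], tok, " ")
          if (w == 0) continue
          count++
          if (w >= 15) { printf "turn %d: sentence with %d words\n", i - 1, w; found = 1 }
        }
        if (count > 3) { printf "turn %d: %d sentences, consider splitting\n", i - 1, count; found = 1 }
      }
    }
    END { exit found }'
}
```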

Related Skills

npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video

Browse all apps: infsh app list