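Cloudflare has announced Cloudflare Realtime Agents, a real-time voice AI platform that marks its formal entry into low-latency conversational AI. Built on Cloudflare's global edge network of more than 330 locations, the platform gives developers a complete toolkit for building voice-driven applications.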
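The platform's core components are Realtime Agents (a runtime that orchestrates voice AI pipelines), WebRTC audio transport, WebSocket-based real-time inference in Workers AI, and Deepgram's speech recognition and synthesis models. Together, these let developers quickly assemble voice agents that converse naturally.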
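The example below defines a TypeScript class extending `RealtimeAgent` that joins a RealtimeKit (RTK) meeting, transcribes participants' audio with Deepgram, generates replies with a custom text handler, and speaks those replies back into the meeting through ElevenLabs: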
```typescript
// Imports assume the @cloudflare/realtime-agents package; Env and
// DurableObjectState come from your Worker's type definitions.
import {
  RealtimeAgent,
  RealtimeKitTransport,
  DeepgramSTT,
  ElevenLabsTTS,
} from '@cloudflare/realtime-agents';
// Hypothetical module path: MyTextHandler is your own text component (not shown).
import { MyTextHandler } from './my-text-handler';

export class MyAgent extends RealtimeAgent<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
  }

  async init(
    agentId: string,
    meetingId: string,
    authToken: string,
    workerUrl: string,
    accountId: string,
    apiToken: string,
  ) {
    // Construct your text processor for generating responses to text
    const textHandler = new MyTextHandler(this.env);

    // Construct a RealtimeKit transport for the RTK meeting (microphone audio);
    // it also exposes the underlying meeting object
    const transport = new RealtimeKitTransport(meetingId, authToken, [
      {
        media_kind: 'audio',
        stream_kind: 'microphone',
      },
    ]);
    const { meeting } = transport;

    // Construct a pipeline to take in meeting audio, transcribe it using
    // Deepgram, and pass our generated responses through ElevenLabs to
    // be spoken in the meeting
    await this.initPipeline(
      [
        transport,
        new DeepgramSTT(this.env.DEEPGRAM_API_KEY),
        textHandler,
        new ElevenLabsTTS(this.env.ELEVENLABS_API_KEY),
        transport,
      ],
      agentId,
      workerUrl,
      accountId,
      apiToken,
    );

    // The RTK meeting object is accessible to us, so we can register handlers
    // on various events like participant joins/leaves, chat, etc.
    // This is optional
    meeting.participants.joined.on('participantJoined', (participant) => {
      textHandler.speak(`Participant Joined ${participant.name}`);
    });
    meeting.participants.joined.on('participantLeft', (participant) => {
      textHandler.speak(`Participant Left ${participant.name}`);
    });

    // Make sure to actually join the meeting after registering all handlers
    await meeting.rtkMeeting.join();
  }

  async deinit() {
    // Add any other cleanup logic required
    await this.deinitPipeline();
  }
}
```
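The array passed to `initPipeline` lists its stages in the order audio flows: meeting transport in, speech-to-text, text handler, text-to-speech, and transport out. The self-contained sketch below mimics that ordering with mock stages to show why each stage's output type has to match the next stage's input. It is a hypothetical illustration, not the Realtime Agents API: `Stage`, `pipe`, `AudioChunk`, `mockSTT`, `mockTextHandler`, and `mockTTS` are all made-up names.

```typescript
// Hypothetical sketch: NOT the @cloudflare/realtime-agents API.
// A stage asynchronously converts one kind of frame into another.
type Stage<I, O> = (input: I) => Promise<O>;

// Mock audio frame: raw PCM samples.
interface AudioChunk {
  samples: number[];
}

// Chain two stages so the first stage's output feeds the second.
function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return async (input: A) => second(await first(input));
}

// Stand-in for speech-to-text (Deepgram in the real pipeline).
const mockSTT: Stage<AudioChunk, string> = async (chunk) =>
  `heard ${chunk.samples.length} samples`;

// Stand-in for the text handler that generates a reply.
const mockTextHandler: Stage<string, string> = async (text) => `reply to: ${text}`;

// Stand-in for text-to-speech: one silent sample per reply character.
const mockTTS: Stage<string, AudioChunk> = async (text) => ({
  samples: new Array<number>(text.length).fill(0),
});

// Audio in -> transcript -> reply text -> audio out.
const voicePipeline: Stage<AudioChunk, AudioChunk> = pipe(
  pipe(mockSTT, mockTextHandler),
  mockTTS,
);

// Usage: 160 input samples yield the reply "reply to: heard 160 samples".
voicePipeline({ samples: new Array<number>(160).fill(0) }).then((out) => {
  console.log(out.samples.length); // prints 27, one sample per reply character
});
```

Because `pipe` is type-checked, swapping in a different mock (say, another speech-to-text stand-in) only compiles if it keeps the same input and output types, which mirrors the idea of interchangeable pipeline components.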
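Cloudflare notes that for voice interaction to feel like natural conversation, total latency must stay below 800 milliseconds, a demanding target that it says its distributed edge architecture is well suited to meet. The platform also works with a range of AI models and third-party services, so voice-processing pipelines can be composed from interchangeable components.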
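One way to reason about the 800 ms figure is as a budget that every stage of a single conversational turn (for example network transit, speech-to-text, response generation, and text-to-speech) has to share. The sketch below is a hypothetical helper, not part of any Cloudflare API: `checkLatencyBudget` and `StageLatency` are illustrative names, and the per-stage numbers in the usage are placeholders rather than measurements.

```typescript
// Hypothetical sketch: treat the stated 800 ms threshold as a shared budget.
interface StageLatency {
  stage: string;
  ms: number;
}

interface BudgetReport {
  totalMs: number;
  remainingMs: number; // negative when the budget is exceeded
  withinBudget: boolean;
  slowestStage: string | undefined;
}

const NATURAL_CONVERSATION_BUDGET_MS = 800;

function checkLatencyBudget(
  stages: StageLatency[],
  budgetMs: number = NATURAL_CONVERSATION_BUDGET_MS,
): BudgetReport {
  // Sum the latency of every stage in the turn.
  const totalMs = stages.reduce((sum, s) => sum + s.ms, 0);
  // Find the stage that consumes the largest share of the budget.
  const slowest = stages.reduce<StageLatency | undefined>(
    (max, s) => (max === undefined || s.ms > max.ms ? s : max),
    undefined,
  );
  return {
    totalMs,
    remainingMs: budgetMs - totalMs,
    withinBudget: totalMs < budgetMs, // "below 800 ms", so strictly less than
    slowestStage: slowest?.stage,
  };
}

// Placeholder numbers for illustration only, not measured values.
const report = checkLatencyBudget([
  { stage: 'network', ms: 50 },
  { stage: 'speech-to-text', ms: 200 },
  { stage: 'LLM response', ms: 300 },
  { stage: 'text-to-speech', ms: 150 },
]);
console.log(report.totalMs, report.remainingMs, report.withinBudget, report.slowestStage);
// prints: 700 100 true LLM response
```

Reporting the slowest stage alongside the total makes it easier to see where trimming latency would help most when a turn runs over budget.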
Cloudflare Realtime Agents is now in open beta, and developers can try it for free to build the next generation of real-time voice AI applications.