VoxCPM 2

VoxCPM 2 Readiness Checklist

VoxCPM 2 is the current major release of the VoxCPM stack. It is designed for multilingual synthesis, voice design from text descriptions, controllable voice cloning, and 48 kHz output. Production readiness depends on six things: model access, GPU capacity, reference-audio quality, transcript availability, review policy, and the serving target.

Best-fit use cases

  • A team is migrating from VoxCPM 1.x to VoxCPM 2 and wants a deployment checklist.
  • A creator tool needs voice design without requiring users to upload a reference clip.
  • A voice cloning feature needs strict review before customer-facing release.

Workflow steps

  1. Choose the intended generation mode: direct TTS, voice design, controllable cloning, ultimate cloning, or streaming.
  2. Check whether the workflow needs Hugging Face weights, ModelScope mirrors, NanoVLLM, or vLLM-Omni.
  3. Review target languages and audio-quality requirements.
  4. Confirm whether reference audio and transcripts meet the intended cloning mode.
  5. Export a deployment handoff with risks and next actions.
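
The steps above can be sketched as a small record that collects each decision and exports a handoff. This is a minimal illustration, not part of VoxCPM or VoxCPM Studio: the class name, field names, and the mapping of which modes need reference audio or a transcript are all assumptions and should be checked against the upstream documentation.

```python
import json
from dataclasses import dataclass
from enum import Enum


class GenerationMode(Enum):
    DIRECT_TTS = "direct_tts"
    VOICE_DESIGN = "voice_design"
    CONTROLLABLE_CLONING = "controllable_cloning"
    ULTIMATE_CLONING = "ultimate_cloning"
    STREAMING = "streaming"


# Assumption: these requirement sets are illustrative only.
# Verify them against the upstream VoxCPM documentation before relying on them.
NEEDS_REFERENCE_AUDIO = {
    GenerationMode.CONTROLLABLE_CLONING,
    GenerationMode.ULTIMATE_CLONING,
}
NEEDS_TRANSCRIPT = {GenerationMode.ULTIMATE_CLONING}


@dataclass
class ReadinessRecord:
    """Hypothetical container for the five workflow decisions."""
    mode: GenerationMode
    weights_source: str   # e.g. "huggingface" or "modelscope"
    serving_target: str   # e.g. "local", "nanovllm", "vllm-omni"
    languages: list[str]
    has_reference_audio: bool = False
    has_transcript: bool = False

    def blockers(self) -> list[str]:
        """Return unresolved items that should stop a release."""
        issues = []
        if not self.languages:
            issues.append("no target languages listed")
        if self.mode in NEEDS_REFERENCE_AUDIO and not self.has_reference_audio:
            issues.append(f"reference audio missing for {self.mode.value}")
        if self.mode in NEEDS_TRANSCRIPT and not self.has_transcript:
            issues.append(f"transcript missing for {self.mode.value}")
        return issues

    def handoff(self) -> str:
        """Serialize the decisions and open risks as JSON."""
        return json.dumps(
            {
                "mode": self.mode.value,
                "weights_source": self.weights_source,
                "serving_target": self.serving_target,
                "languages": self.languages,
                "risks": self.blockers(),
            },
            indent=2,
        )
```

A record with an empty `risks` list is ready to hand off; anything else becomes the list of next actions.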

Common risks

  • Very long or highly expressive text can produce unstable output, so it needs review before release.
  • Production serving needs different infrastructure decisions than a local Gradio demo.
  • Older 1.x assumptions may not match VoxCPM 2 voice-design controls.
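
As a rough illustration of the first risk, a script can be flagged for human review before synthesis. The sketch below is a crude heuristic, not a VoxCPM feature: the function name, the character limit, and the punctuation count used as a stand-in for "highly expressive" are all hypothetical and should be tuned against real output.

```python
# Hypothetical thresholds; neither value comes from VoxCPM documentation.
MAX_CHARS_WITHOUT_REVIEW = 400
MAX_EMPHATIC_MARKS = 3


def needs_review(
    script: str,
    max_chars: int = MAX_CHARS_WITHOUT_REVIEW,
    max_marks: int = MAX_EMPHATIC_MARKS,
) -> bool:
    """Flag a script that is long or heavy on emphatic punctuation."""
    emphatic_marks = script.count("!") + script.count("?")
    return len(script) > max_chars or emphatic_marks > max_marks
```

Flagged scripts would go to a reviewer rather than being blocked outright, since length and punctuation alone do not predict instability.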

How VoxCPM Studio connects

Run the same workflow through the VoxCPM Studio readiness console: capture the script and voice mode, score unresolved blockers, and export a receipt after checkout.

Independent source-aware workflow

Keep upstream VoxCPM references visible while adding product-grade review.

Use the open-source project and documentation as technical source material, then use VoxCPM Studio to document team-specific decisions, approvals, and paid production handoffs.