Official API Documentation

Developer Docs

Integrate realistic Khmer Text-to-Speech (TTS) and voice cloning capabilities into your apps via standard RESTful endpoints.

Overview

Khmer Vox TTS exposes audio endpoints backed by VoxCPM. For production apps and business customers, use the clone endpoint with a reference audio sample, because standard VoxCPM speech can change speaker between generations. Single calls return MP3 audio bytes. Batch calls create queued jobs that you can poll and then download as a ZIP of MP3 files.


Production path

Speech

POST

/api/v1/audio/speech

Fast single-call TTS for tests where speaker consistency is not required.

Clone

POST

/api/v1/audio/clone

Voice cloned from a reference audio sample, for real products and business users.

Batch Clone

POST

/api/v1/audio/batch/clone

Async high-volume jobs with external IDs, metadata, and webhook callbacks.

Base URL

https://voxtts.online

Authentication

Bearer API key

Output

MP3 audio (single calls); ZIP of MP3 files (batch jobs)

Get started

Quickstart


1. Create an account or sign in.
2. Buy Starter or Creator from Billing. Free accounts cannot create or use API keys.
3. Open Dashboard > API and create an API key. Setting daily and monthly limits on the key is optional.
4. For dubbing tools, start with the multipart clone endpoint, sending sample, input, and voice_consent=true. Use batch clone with items when you have many segments.

Dubbing integration

Use this setup for video dubbing tools

Dubbing systems should use clone voice with a reference speaker sample. Do not rely on a named preset voice for business work, because standard VoxCPM speech can change speaker between calls.

Test multipart clone

1. First live test

Call /api/v1/audio/clone with multipart/form-data. Send sample as a file, input as Khmer text, and voice_consent=true.

sample=@reference.wav
input=Khmer line text
voice_consent=true

2. Production segments

For many short lines, call /api/v1/audio/batch/clone with items. Put your segment ID in id or external_id so the ZIP filenames map back to video segments.

items[0].id=episode01_seg_0001
items[0].text=Khmer line text
output=MP3 ZIP
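A long episode can exceed the documented limit of 100 items per batch request. The sketch below is a minimal illustration, not an official client. It turns dubbing segments into one `items` JSON string per batch, using only the `id` and `text` item keys described on this page. The `Segment` class and the `build_item_batches` helper are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List

MAX_ITEMS_PER_BATCH = 100  # documented per-request maximum


@dataclass
class Segment:
    # Hypothetical representation of one dubbing line.
    segment_id: str
    text: str


def build_item_batches(segments: Iterable[Segment],
                       max_items: int = MAX_ITEMS_PER_BATCH) -> List[str]:
    """Return one JSON string per batch request, each holding at most max_items items."""
    # Skip empty lines, because every item must have text.
    items = [{"id": s.segment_id, "text": s.text} for s in segments if s.text.strip()]
    batches = []
    for start in range(0, len(items), max_items):
        # ensure_ascii=False keeps Khmer text readable in the JSON string.
        batches.append(json.dumps(items[start:start + max_items], ensure_ascii=False))
    return batches


if __name__ == "__main__":
    segs = [Segment(f"episode01_seg_{i:04d}", f"line {i}") for i in range(1, 251)]
    payloads = build_item_batches(segs)
    print(len(payloads))  # → 3 (100 + 100 + 50 items)
```

Send each returned string as the `items` form field of a separate /api/v1/audio/batch/clone request, together with the same reference sample.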

3. Timing work

The API returns MP3 audio. After downloading, your dubbing tool should trim leading silence and fit each clip to its target video timing.

trim leading silence
fit to start_ms/end_ms
mix into video
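The timing step happens in your own tool, not in the API. This minimal sketch assumes a separate audio library has already decoded the MP3 into floating-point PCM samples in the range -1 to 1. It finds where the leading silence ends, then computes how much the trimmed clip must be sped up or slowed down to fill the `start_ms`/`end_ms` slot. The helper names, the 0.01 silence threshold, and the 16 kHz sample rate in the demo are assumptions.

```python
from typing import Sequence


def leading_silence_samples(samples: Sequence[float], threshold: float = 0.01) -> int:
    """Index of the first sample whose absolute amplitude exceeds threshold."""
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            return i
    return len(samples)  # the whole clip is silent


def fit_ratio(audio_ms: float, start_ms: int, end_ms: int) -> float:
    """Speed factor that makes audio_ms fit the slot [start_ms, end_ms].

    A value above 1.0 means the clip must be sped up; below 1.0 it is shorter than the slot.
    """
    slot_ms = end_ms - start_ms
    if slot_ms <= 0:
        raise ValueError("end_ms must be greater than start_ms")
    return audio_ms / slot_ms


if __name__ == "__main__":
    sample_rate = 16_000  # hypothetical rate of the decoded MP3
    pcm = [0.0] * 800 + [0.3, -0.2] * 8000  # 50 ms of silence, then 1 s of "speech"
    cut = leading_silence_samples(pcm)
    trimmed_ms = (len(pcm) - cut) * 1000 / sample_rate
    print(cut, trimmed_ms, fit_ratio(trimmed_ms, 12_000, 12_800))  # → 800 1000.0 1.25
```

You can then pass the ratio to a time-stretch filter such as ffmpeg's `atempo`. A ratio far from 1.0 may sound unnatural, so it is worth flagging those segments for review.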

Live API test

Test Console

Paste a paid API key and send a real request from this browser. The key is kept only in the form state and is never saved. Requests consume tokens.


https://voxtts.online/api/v1/audio/clone

Recommended for dubbing tests. Sends multipart/form-data with sample, input, and voice_consent=true.


This console uses the production clone format: multipart/form-data with sample, input, model, voice_consent=true, and output_format=mp3.

POST /api/v1/audio/speech

Text to Speech

Generate speech with the built-in preset voices: Female (Sokha) or Male (Piseth). Cost is 1.5 tokens per second of estimated output audio.

cURL example

curl -X POST "https://voxtts.online/api/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  --data '{
    "model": "voxcpm2",
    "voice": "female",
    "input": "soursdey nih ku chea somleng khmer",
    "output_format": "mp3"
  }'

Python example

import requests

response = requests.post(
    "https://voxtts.online/api/v1/audio/speech",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "voxcpm2",
        "voice": "female",  # Optional: "female" (Sokha) or "male" (Piseth)
        "input": "soursdey nih ku chea somleng khmer",
        "output_format": "mp3",
    },
    timeout=600,
)
response.raise_for_status()
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print(response.headers.get("X-Remaining-Tokens"))

Field	Type	Description
input	string	Required. 1 to 5,000 characters.
voice	string	Optional. Set to 'female' (default preset, Sokha) or 'male' (Piseth) to select standard voices.
model	string	Optional. Default comes from the deployed worker, usually voxcpm2.
output_format	string	Optional. Only mp3 is supported by the public API.

POST /api/v1/audio/clone

Session Clone Voice

Recommended for real users and business API customers. Dubbing tools should start with the multipart upload format. The reference audio sets the speaker for the current request or session and is not stored as a permanent voice profile.

Multipart upload (recommended) example

curl -X POST "https://voxtts.online/api/v1/audio/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --output cloned.mp3 \
  -F "sample=@reference.wav;type=audio/wav" \
  -F "input=soursdey nih ku chea somleng khmer" \
  -F "model=voxcpm2" \
  -F "voice_consent=true" \
  -F "output_format=mp3"

Python multipart upload example

import requests

data = {
    "input": "soursdey nih ku chea somleng khmer",
    "model": "voxcpm2",
    "voice_consent": "true",
    "output_format": "mp3",
}

# Open the reference sample in a with-block so the file handle is closed after upload.
with open("reference.wav", "rb") as sample:
    response = requests.post(
        "https://voxtts.online/api/v1/audio/clone",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"sample": ("ref.wav", sample, "audio/wav")},
        data=data,
        timeout=600,
    )
response.raise_for_status()
with open("cloned.mp3", "wb") as f:
    f.write(response.content)

Field	Type	Description
sample	file (binary)	Required. WAV or MP3 file containing a clean voice sample, max 8 MB.
input	string	Required. 1 to 5,000 characters.
voice_consent	boolean	Required. Must be true, confirming that you have permission to use the reference voice.
model	string	Optional. Default is voxcpm2.
voice	string	Optional. Text description used to guide prosody and speaking style.

POST /api/v1/audio/batch/clone

Batch Clone Voice (Async)

Ideal for dubbing lines. Submit a reference voice sample and a list of text items. The request returns immediately with a job ID. Use that ID to poll progress until the ZIP download is ready.

cURL multipart batch example

curl -X POST "https://voxtts.online/api/v1/audio/batch/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "sample=@reference.wav;type=audio/wav" \
  -F "model=voxcpm2" \
  -F "voice_consent=true" \
  -F "webhook_url=https://your-app.com/callback" \
  -F "items=[{\"id\":\"seg01\",\"text\":\"soursdey segment one\"},{\"id\":\"seg02\",\"text\":\"somleng khmer segment two\"}]"

Python multipart batch example

import json
import requests

items = [
    {"id": "seg01", "text": "soursdey segment one"},
    {"id": "seg02", "text": "somleng khmer segment two"},
]
data = {
    "model": "voxcpm2",
    "voice_consent": "true",
    "webhook_url": "https://your-app.com/callback",
    "items": json.dumps(items),  # multipart fields are strings, so send items as JSON text
}

with open("reference.wav", "rb") as sample:
    response = requests.post(
        "https://voxtts.online/api/v1/audio/batch/clone",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"sample": ("ref.wav", sample, "audio/wav")},
        data=data,
        timeout=60,
    )
response.raise_for_status()
print(response.json())  # Contains job_id and status (e.g. "pending")

Field	Type	Description
sample	file (binary)	Required. Reference voice sample for the batch speaker, max 8 MB.
items	JSON array/string	Required. List of generation items (a JSON array, sent as a JSON-encoded string in multipart form). Each item must have text; id (or external_id) and title are optional. Maximum 100 items per batch request.
voice_consent	boolean	Required. Must be set to true.
webhook_url	string	Optional. Webhook URL to receive a POST event when the batch completes or fails.
model	string	Optional. Default is voxcpm2.
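This section does not document the batch job status endpoint, so the polling loop below takes a caller-supplied `fetch_status` function instead of a URL. It is a sketch built on assumptions. Only the `pending` status appears in the example response. The terminal values `completed` and `failed`, and the intermediate `processing` used in the demo, are guesses that you should replace with the real values. If you set `webhook_url`, you may not need to poll at all. `wait_for_job` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status values; replace them with the values the API actually returns.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[str], Dict[str, Any]],
                 job_id: str,
                 interval_s: float = 5.0,
                 timeout_s: float = 1800.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll fetch_status(job_id) until the job reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch job {job_id} still {job.get('status')!r}")
        sleep(interval_s)


if __name__ == "__main__":
    states = iter(["pending", "processing", "completed"])

    def fake_fetch(job_id: str) -> Dict[str, Any]:
        # Stand-in for a real HTTP call to the (undocumented here) status endpoint.
        return {"job_id": job_id, "status": next(states)}

    print(wait_for_job(fake_fetch, "job_123", sleep=lambda s: None)["status"])  # → completed
```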

Tokens and limits

Token Deductions

Standard speech: 1.5 tokens per second of generated audio
Standard voice cloning: 2.0 tokens per second of generated audio
Approved business clone: 1.5 tokens per second of generated audio
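These per-second rates convert directly into a cost estimate. The sketch below is minimal. Rounding rules and any minimum charge are not documented here, so treat the result as an estimate and rely on the `X-Credits-Charged` response header for the actual amount. The dictionary keys are illustrative labels, not API parameters.

```python
# Rates from the Token Deductions table, in tokens per second of generated audio.
# The keys are illustrative labels, not values accepted by the API.
TOKEN_RATES = {
    "speech": 1.5,          # standard speech
    "clone": 2.0,           # standard voice cloning
    "business_clone": 1.5,  # approved business clone
}


def estimate_tokens(kind: str, audio_seconds: float) -> float:
    """Estimate the token cost of audio_seconds of generated audio at the listed rate."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return TOKEN_RATES[kind] * audio_seconds


if __name__ == "__main__":
    # 300 seconds of dubbed dialogue at each rate.
    for kind in TOKEN_RATES:
        print(kind, estimate_tokens(kind, 300))
```

For 300 seconds of generated audio, the demo prints 450.0 tokens for standard speech, 600.0 for standard voice cloning, and 450.0 for an approved business clone.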

Platform Safety Default Limits

Max input characters: 5,000
Max reference audio size: 8 MB
Max concurrent API requests: 4
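Checking these limits on the client side avoids spending a request on input the server will reject. This sketch rests on several assumptions. It reads 8 MB as 8,000,000 bytes, the stricter interpretation. It counts characters as Python code points, and because Khmer text contains combining marks the server may count differently. The extension check does not inspect file contents. The concurrency limit of 4 is assumed to apply across all of your requests. `validate_clone_request` and `run_bounded` are hypothetical helpers.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_INPUT_CHARS = 5_000
MAX_SAMPLE_BYTES = 8_000_000  # assumption: the stricter decimal reading of "8 MB"
MAX_CONCURRENT_REQUESTS = 4
ALLOWED_SAMPLE_EXTENSIONS = {".wav", ".mp3"}


def validate_clone_request(text: str, sample_path: str, sample_size: int) -> List[str]:
    """Return a list of problems; an empty list means the request passes these checks."""
    problems = []
    # len() counts Unicode code points; how the server counts is an assumption.
    if not 1 <= len(text) <= MAX_INPUT_CHARS:
        problems.append(f"input must be 1 to {MAX_INPUT_CHARS} characters, got {len(text)}")
    if os.path.splitext(sample_path)[1].lower() not in ALLOWED_SAMPLE_EXTENSIONS:
        problems.append("sample must be a WAV or MP3 file")
    if sample_size > MAX_SAMPLE_BYTES:
        problems.append(f"sample is {sample_size} bytes; limit is {MAX_SAMPLE_BYTES}")
    return problems


def run_bounded(fn: Callable[[T], R], jobs: Iterable[T],
                max_workers: int = MAX_CONCURRENT_REQUESTS) -> List[R]:
    """Call fn on every job with at most max_workers calls in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, jobs))


if __name__ == "__main__":
    print(validate_clone_request("សួស្តី", "reference.wav", 1_200_000))  # → []
    print(validate_clone_request("", "reference.ogg", 9_000_000))
```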

Response Metadata

Successful speech and single clone requests return the raw MP3 audio bytes directly in the response body. Usage metadata is returned in HTTP headers.

Content-Type

audio/mpeg

X-Remaining-Tokens

Available prepaid token balance left on the user's account after deducting for this request.

X-Credits-Charged

Prepaid tokens deducted for this request, based on the generated audio duration.

X-Generation-Duration-Seconds

The exact duration of the generated audio in seconds (used to calculate X-Credits-Charged).
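A small helper can collect these headers after each call, for example to log spend per segment. This sketch assumes the header values are plain decimal numbers. With `requests`, pass `response.headers`, which is case-insensitive; a plain `dict` must use the exact header names shown here. `UsageInfo` and `parse_usage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class UsageInfo:
    remaining_tokens: Optional[float]
    credits_charged: Optional[float]
    duration_seconds: Optional[float]


def _to_float(value: Optional[str]) -> Optional[float]:
    """Convert a header value to float; missing or non-numeric values become None."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_usage(headers: Mapping[str, str]) -> UsageInfo:
    """Collect the usage headers returned by the speech and clone endpoints."""
    return UsageInfo(
        remaining_tokens=_to_float(headers.get("X-Remaining-Tokens")),
        credits_charged=_to_float(headers.get("X-Credits-Charged")),
        duration_seconds=_to_float(headers.get("X-Generation-Duration-Seconds")),
    )


if __name__ == "__main__":
    example = {  # made-up values for illustration
        "X-Remaining-Tokens": "985.0",
        "X-Credits-Charged": "15.0",
        "X-Generation-Duration-Seconds": "10.0",
    }
    print(parse_usage(example))
```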

Failure states

Errors

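HTTP Code	Cause / Recovery
401	Missing bearer token. Send Authorization: Bearer YOUR_API_KEY.
402	Insufficient tokens for this request. Buy more tokens from Billing.
403	Invalid bearer token or suspended account. Check the key under Dashboard > API.
429	Rate limit or concurrency limit reached. Wait, then retry with fewer parallel requests.
502	Upstream TTS worker failed. Retry the request after a short delay.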
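Of these errors, 429 and 502 are usually transient. 401, 402, and 403 need a fix on your side and should not be retried. The sketch below retries only 429 and 502, with exponential backoff and jitter. `send_with_retry` is a hypothetical helper, and the attempt count and delays are assumptions. This page does not say whether a failed request consumes tokens.

```python
import random
import time
from typing import Any, Callable

# 429 (rate/concurrency limit) and 502 (upstream worker failure) are treated as transient.
# Any other status, including 401, 402 and 403, is returned immediately.
RETRYABLE_STATUSES = {429, 502}


def send_with_retry(send: Callable[[], Any],
                    max_attempts: int = 5,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return an object with a status_code attribute, such as a requests.Response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        response = send()
        if response.status_code not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return response
        # Exponential backoff (1 s, 2 s, 4 s, ...) plus up to 0.5 s of random jitter.
        sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.5))
        attempt += 1


if __name__ == "__main__":
    class FakeResponse:
        def __init__(self, status_code: int) -> None:
            self.status_code = status_code

    codes = iter([429, 502, 200])
    result = send_with_retry(lambda: FakeResponse(next(codes)), sleep=lambda s: None)
    print(result.status_code)  # → 200
```

Wrap the real request in a zero-argument function, for example `lambda: requests.post(url, headers=headers, files={"sample": ("ref.wav", sample_bytes, "audio/wav")}, data=data, timeout=600)`. Read the sample into bytes once, up front. An open file object is consumed by the first attempt, so a retry would upload an empty sample.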

Payment readiness

Payment Webhooks

Checkout currently uses manual ABA PayWay QR receipt approval, so production PayWay webhooks are not active yet. Once ABA PayWay issues production credentials, PayWay callbacks will be sent to the URL below. There the app will verify the signature, amount, currency, and order status, and detect duplicate callbacks, before adding tokens automatically.

Webhook URL

https://voxtts.online/api/webhooks/payway

Current mode

Manual PayWay QR + admin receipt approval

Production mode

Dynamic PayWay QR + signed callback verification
