Integrate realistic Khmer Text-to-Speech (TTS) and voice cloning capabilities into your apps via standard RESTful endpoints.
Khmer Vox TTS exposes audio endpoints backed by VoxCPM. For production apps and business customers, use clone voice with a reference audio sample because normal VoxCPM speech can change speaker between generations. Single calls return MP3 audio bytes; batch calls create queued jobs that can be polled or downloaded as ZIP files containing MP3 files.
Production path
/api/v1/audio/speech
Fast single-call TTS for tests where speaker consistency is not required.
/api/v1/audio/clone
Reference-audio voice for real products and business users.
/api/v1/audio/batch/clone
Async high-volume jobs with external IDs, metadata, and webhook callbacks.
Base URL
https://voxtts.online
Authentication
Bearer API key
Output
MP3 audio + MP3 job ZIP
Get started
Dubbing integration
Dubbing systems should use clone voice with a reference speaker sample. Do not depend on a named voice for business work because normal VoxCPM speech can change speaker between calls.
Call /api/v1/audio/clone with multipart/form-data. Send sample as a file, input as Khmer text, and voice_consent=true.
sample=@reference.wav; input=Khmer line text; voice_consent=true
For many short lines, call /api/v1/audio/batch/clone with items. Put your segment ID in id or external_id so the ZIP filenames map back to video segments.
items[0].id=episode01_seg_0001; items[0].text=Khmer line text; output=MP3 ZIP
The API returns MP3 audio. Your dubber should trim leading silence and fit audio to the target video timing after download.
Trim leading silence, fit to start_ms/end_ms, then mix into video.
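The post-download steps above can be sketched in a few lines. This is a minimal illustration, not part of the API: it assumes the MP3 has already been decoded to signed 16-bit mono PCM samples by a tool of your choice, and the function names and the amplitude threshold are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def trim_leading_silence(samples: Sequence[int], threshold: int = 500) -> list[int]:
    """Drop samples from the start until one exceeds the amplitude threshold.

    `threshold` is an illustrative value for 16-bit PCM; tune it per recording.
    """
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return list(samples[index:])
    return []  # the whole clip is below the threshold


def tempo_factor(audio_ms: float, start_ms: float, end_ms: float) -> float:
    """Return the speed factor needed to fit the audio into [start_ms, end_ms].

    Above 1.0 the clip must be played faster; below 1.0 it is shorter than
    the slot and can be padded with silence instead.
    """
    slot_ms = end_ms - start_ms
    if slot_ms <= 0:
        raise ValueError("end_ms must be greater than start_ms")
    return audio_ms / slot_ms


if __name__ == "__main__":
    pcm = [0, 3, -2, 0, 1200, -900, 40]
    print(trim_leading_silence(pcm))           # [1200, -900, 40]
    print(tempo_factor(2400, 10_000, 12_000))  # 1.2
```

A factor close to 1.0 usually needs no time-stretching; how much speed-up is acceptable before the line should be re-generated with shorter text is a product decision.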
Live API test
Paste a paid API key and send a real request from this browser. The key is held only in this form's state and is not saved. Requests sent from here consume tokens.
https://voxtts.online/api/v1/audio/clone
Recommended for dubbing tests. Sends multipart/form-data with sample, input, and voice_consent=true.
This console uses the production clone format: multipart/form-data with sample, input, model, voice_consent=true, and output_format=mp3.
Generate speech using our built-in standard preset voices: Female (Sokha) or Male (Piseth). Token cost is 1.5 tokens per estimated output second.
curl -X POST "https://voxtts.online/api/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  --data '{
    "model": "voxcpm2",
    "voice": "female",
    "input": "soursdey nih ku chea somleng khmer",
    "output_format": "mp3"
  }'

import requests

response = requests.post(
    "https://voxtts.online/api/v1/audio/speech",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "voxcpm2",
        "voice": "female",  # Optional: "female" (Sokha) or "male" (Piseth)
        "input": "soursdey nih ku chea somleng khmer",
        "output_format": "mp3",
    },
    timeout=600,
)
response.raise_for_status()
# The response body is the raw MP3 audio.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print(response.headers.get("X-Remaining-Tokens"))

| Field | Type | Description |
|---|---|---|
| input | string | Required. 1 to 5,000 characters. |
| voice | string | Optional. Set to 'female' (default preset, Sokha) or 'male' (Piseth) to select standard voices. |
| model | string | Optional. Default comes from the deployed worker, usually voxcpm2. |
| output_format | string | Optional. Only mp3 is supported by the public API. |
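The stated rate of 1.5 tokens per estimated output second makes it possible to budget a speech request before sending it. The sketch below only multiplies the rate by a duration you supply; how the service estimates duration and whether it rounds the charge are not documented here, so treat the result as an approximation. The function names are illustrative.

```python
TOKENS_PER_SECOND = 1.5  # rate stated for /api/v1/audio/speech


def estimate_speech_tokens(estimated_seconds: float) -> float:
    """Approximate token cost for a speech request of the given duration."""
    if estimated_seconds < 0:
        raise ValueError("estimated_seconds must not be negative")
    return TOKENS_PER_SECOND * estimated_seconds


def can_afford(balance_tokens: float, estimated_seconds: float) -> bool:
    """Check a known token balance against the approximate cost."""
    return balance_tokens >= estimate_speech_tokens(estimated_seconds)


if __name__ == "__main__":
    print(estimate_speech_tokens(12))  # 18.0
    print(can_afford(10, 12))          # False
```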
Recommended for real users and business API customers. For dubbing tools, use the multipart upload format first. The reference audio controls the speaker for the request/session and is not saved as a lifetime voice profile.
curl -X POST "https://voxtts.online/api/v1/audio/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --output cloned.mp3 \
  -F "sample=@reference.wav;type=audio/wav" \
  -F "input=soursdey nih ku chea somleng khmer" \
  -F "model=voxcpm2" \
  -F "voice_consent=true" \
  -F "output_format=mp3"

import requests

data = {
    "input": "soursdey nih ku chea somleng khmer",
    "model": "voxcpm2",
    "voice_consent": "true",
    "output_format": "mp3",
}
# Keep the reference sample open only for the duration of the upload.
with open("reference.wav", "rb") as sample_file:
    response = requests.post(
        "https://voxtts.online/api/v1/audio/clone",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"sample": ("ref.wav", sample_file, "audio/wav")},
        data=data,
        timeout=600,
    )
response.raise_for_status()
# The response body is the raw MP3 audio.
with open("cloned.mp3", "wb") as f:
    f.write(response.content)

| Field | Type | Description |
|---|---|---|
| sample | file (binary) | Required. WAV or MP3 file containing a clean voice sample, max 8 MB. |
| input | string | Required. 1 to 5,000 characters. |
| voice_consent | boolean | Required. Must be true, confirming that the user has permission to use the reference voice. |
| model | string | Optional. Default is voxcpm2. |
| voice | string | Optional. Descriptive guide for prosody styling. |
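The field constraints in the table can be checked client-side before uploading, which avoids spending a round trip on a request that will be rejected. The following is a sketch under assumptions: "8 MB" is read as 8 × 1024 × 1024 bytes, characters are counted as Python code points (the server may count differently, which matters for Khmer combining marks), and `validate_clone_request` is a hypothetical helper, not part of the API.

```python
from __future__ import annotations

from pathlib import Path

MAX_SAMPLE_BYTES = 8 * 1024 * 1024  # assumption: "8 MB" interpreted as 8 MiB
MAX_INPUT_CHARS = 5000
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def validate_clone_request(
    sample_name: str, sample_size: int, text: str, voice_consent: bool
) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if Path(sample_name).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("sample must be a WAV or MP3 file")
    if sample_size > MAX_SAMPLE_BYTES:
        problems.append("sample exceeds 8 MB")
    if not 1 <= len(text) <= MAX_INPUT_CHARS:
        problems.append("input must be 1 to 5,000 characters")
    if not voice_consent:
        problems.append("voice_consent must be true")
    return problems


if __name__ == "__main__":
    print(validate_clone_request("reference.wav", 1_000_000, "soursdey", True))
    print(validate_clone_request("reference.ogg", 9 * 1024 * 1024, "", False))
```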
Ideal for dubbing lines. Submit a reference voice sample and a list of text items. The request returns immediately with a job ID, which you can poll for progress until the ZIP download is ready.
curl -X POST "https://voxtts.online/api/v1/audio/batch/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "sample=@reference.wav;type=audio/wav" \
  -F "model=voxcpm2" \
  -F "voice_consent=true" \
  -F "webhook_url=https://your-app.com/callback" \
  -F "items=[{\"id\":\"seg01\",\"text\":\"soursdey segment one\"},{\"id\":\"seg02\",\"text\":\"somleng khmer segment two\"}]"

import json

import requests

data = {
    "voice_consent": "true",
    "webhook_url": "https://your-app.com/callback",
    # items is sent as a JSON-encoded string inside the multipart form.
    "items": json.dumps([
        {"id": "seg01", "text": "soursdey segment one"},
        {"id": "seg02", "text": "somleng khmer segment two"},
    ]),
}
# Keep the reference sample open only for the duration of the upload.
with open("reference.wav", "rb") as sample_file:
    response = requests.post(
        "https://voxtts.online/api/v1/audio/batch/clone",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"sample": ("ref.wav", sample_file, "audio/wav")},
        data=data,
        timeout=60,
    )
response.raise_for_status()
print(response.json())  # Contains job_id, status: pending

| Field | Type | Description |
|---|---|---|
| sample | file (binary) | Required. Voice sample for the batch speaker, max 8 MB. |
| items | JSON array/string | Required. List of generation items. Each item must have text; id and title are optional. Maximum 100 items per batch request. |
| voice_consent | boolean | Required. Must be set to true. |
| webhook_url | string | Optional. Webhook URL to receive a POST event when the batch completes or fails. |
| model | string | Optional. Default is voxcpm2. |
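Because a batch accepts at most 100 items, a longer script has to be split across several requests. The sketch below builds the JSON string for the `items` field in chunks; the `(segment_id, text)` tuple shape and the function name are illustrative choices, and `ensure_ascii=False` is used only to keep Khmer script readable in the payload.

```python
from __future__ import annotations

import json
from typing import Iterator

MAX_ITEMS_PER_BATCH = 100  # limit stated in the items field description


def batch_item_payloads(
    segments: list[tuple[str, str]], max_items: int = MAX_ITEMS_PER_BATCH
) -> Iterator[str]:
    """Yield one JSON string per batch request, each with at most max_items items."""
    for start in range(0, len(segments), max_items):
        chunk = segments[start:start + max_items]
        yield json.dumps(
            [{"id": seg_id, "text": text} for seg_id, text in chunk],
            ensure_ascii=False,
        )


if __name__ == "__main__":
    # 250 illustrative segments, using the segment ID style shown earlier.
    segments = [(f"episode01_seg_{n:04d}", f"line {n}") for n in range(1, 251)]
    for payload in batch_item_payloads(segments):
        print(len(json.loads(payload)))  # 100, 100, 50
```

Each yielded string can be passed as the `items` form value of one batch request; keeping the segment ID in `id` is what lets the results be matched back to video segments.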
Successful requests to the speech and clone endpoints return the raw MP3 audio bytes directly in the response body. Usage metadata is provided in HTTP response headers.
| Header | Description |
|---|---|
| Content-Type | audio/mpeg |
| X-Remaining-Tokens | Available prepaid token balance left on the user's account after deducting for this request. |
| X-Credits-Charged | Prepaid tokens deducted for the generation duration. |
| X-Generation-Duration-Seconds | The exact duration of the generated audio in seconds (used to calculate X-Credits-Charged). |
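The usage headers can be collected into one structure for logging or budget tracking. This sketch assumes the header values are plain decimal numbers, which the documentation does not state explicitly; missing or non-numeric values become `None`. The `Usage` class and `parse_usage` function are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Usage:
    remaining_tokens: Optional[float]
    credits_charged: Optional[float]
    duration_seconds: Optional[float]


def _to_float(value: Optional[str]) -> Optional[float]:
    """Convert a header value to float, returning None if absent or malformed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_usage(headers: Mapping[str, str]) -> Usage:
    """Read the documented usage headers from a response header mapping."""
    return Usage(
        remaining_tokens=_to_float(headers.get("X-Remaining-Tokens")),
        credits_charged=_to_float(headers.get("X-Credits-Charged")),
        duration_seconds=_to_float(headers.get("X-Generation-Duration-Seconds")),
    )


if __name__ == "__main__":
    sample_headers = {
        "Content-Type": "audio/mpeg",
        "X-Remaining-Tokens": "982.5",
        "X-Credits-Charged": "18",
        "X-Generation-Duration-Seconds": "12.0",
    }
    print(parse_usage(sample_headers))
```

With the `requests` library, `response.headers` can be passed in directly; it is case-insensitive, whereas the plain dict used in the demo is not.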
Failure states
| HTTP Code | Cause / Recovery |
|---|---|
| 401 | Missing bearer token. |
| 403 | Invalid bearer token or suspended account. |
| 402 | Insufficient tokens for this request. |
| 429 | Rate limit or concurrency limit reached. |
| 502 | Upstream TTS worker failed. |
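One way a client might act on these codes is to retry only the transient ones. The policy below is an assumption, not documented behaviour: it treats 429 and 502 as retryable with exponential backoff, and 401, 403, and 402 as failures that need a configuration or billing fix. Whether the service sends a Retry-After header is not stated, so the sketch does not rely on one.

```python
from __future__ import annotations

RETRYABLE = {429, 502}  # assumption: rate limits and worker failures are transient
FATAL_REASONS = {
    401: "missing bearer token",
    403: "invalid bearer token or suspended account",
    402: "insufficient tokens",
}


def next_action(
    status_code: int, attempt: int, max_attempts: int = 5, base_delay: float = 1.0
) -> tuple[str, float]:
    """Decide what to do after a response.

    `attempt` is zero-based. Returns ("ok", 0.0), ("retry", delay_seconds),
    or ("fail", 0.0).
    """
    if 200 <= status_code < 300:
        return ("ok", 0.0)
    if status_code in RETRYABLE and attempt < max_attempts:
        return ("retry", base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return ("fail", 0.0)


if __name__ == "__main__":
    print(next_action(429, 0))  # ('retry', 1.0)
    print(next_action(502, 2))  # ('retry', 4.0)
    print(next_action(402, 0))  # ('fail', 0.0)
```

Adding random jitter to the delay is a common refinement when many workers share one API key.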
Payment readiness
Current checkout uses manual ABA PayWay QR receipt approval, so production PayWay webhooks are not active yet. After ABA PayWay issues production credentials, use the URL below so the app can check signature, amount, currency, order status, and duplicate callbacks before adding tokens automatically.
https://voxtts.online/api/webhooks/payway
Current mode
Manual PayWay QR + admin receipt approval
Production mode
Dynamic PayWay QR + signed callback verification
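The checks listed for production mode (signature, amount, currency, order status, duplicate callbacks) can be illustrated with a generic handler. Everything specific here is hypothetical: the payload field names, the `"approved"` status value, and the HMAC-SHA256 signing over a pipe-joined string are placeholders and do not describe ABA PayWay's actual callback format, which should be taken from its production documentation.

```python
from __future__ import annotations

import hashlib
import hmac


def sign(payload: dict, secret: bytes) -> str:
    """Hypothetical signing scheme: HMAC-SHA256 over pipe-joined fields."""
    message = "|".join(
        [payload["order_id"], payload["amount"], payload["currency"], payload["status"]]
    ).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_callback(
    payload: dict,
    signature_hex: str,
    secret: bytes,
    expected_amount: str,
    expected_currency: str,
    seen_order_ids: set,
) -> bool:
    """Return True only if tokens may be credited for this callback."""
    # 1. Signature: constant-time comparison against our own computation.
    if not hmac.compare_digest(sign(payload, secret), signature_hex):
        return False
    # 2. Amount and currency must match the order we created.
    if payload["amount"] != expected_amount or payload["currency"] != expected_currency:
        return False
    # 3. Order status must indicate a completed payment (placeholder value).
    if payload["status"] != "approved":
        return False
    # 4. Duplicate callbacks must not credit tokens twice.
    if payload["order_id"] in seen_order_ids:
        return False
    seen_order_ids.add(payload["order_id"])
    return True
```

In a real deployment the set of seen order IDs would live in persistent storage with a uniqueness constraint, so that duplicates are rejected across restarts and concurrent workers.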