Submit audio for asynchronous moderation and receive the results through a callback URL.
API Description
Asynchronous audio moderation API. Submit one audio clip per request and receive an immediate acknowledgement that echoes your btId; the full moderation result is delivered later via callback to the URL you provide. The API detects regulatory risks in audio content, including political content, pornography, advertising, and violence & terrorism, and can additionally identify minors, voice timbre, language, and other attributes based on the business scenarios enabled on your account.
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures — and an audio clip that can't be fetched can't be moderated.
Timeout Suggestion
Asynchronous request (this endpoint): recommended timeout of 5 seconds for the acknowledgement call
Callback delivery: results arrive separately; keep your callback handler fast (< 2 seconds) so DeepCleer doesn't retry unnecessarily
Note: The initial acknowledgement returns almost immediately once the request is validated. End-to-end moderation time is dominated by how long DeepCleer takes to fetch your audio and run the requested detection types, so keep your hosting fast and reliable. Actual duration varies with the request type and the audio size.
Callback Mechanism
Results are delivered to the callback URL you supply in the request. When DeepCleer calls your endpoint:
Your endpoint must respond with HTTP 200 OK within a few seconds. Any non-2xx response or timeout is treated as a delivery failure.
On failure, DeepCleer retries with exponential backoff. After repeated failures the result is dropped; we recommend monitoring your callback handler and reprocessing via btId if needed.
Your endpoint should be idempotent on btId — the same result may be delivered more than once if an earlier delivery succeeded but the response was lost in transit.
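The delivery rules above can be sketched as a small handler. This is a minimal sketch rather than a reference implementation: it assumes the callback arrives as an HTTP POST with a JSON body that carries btId (the source does not state the method or encoding), and the names process_callback, CallbackHandler, serve, and the in-memory set are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory dedup store; a real deployment would likely use a
# durable store shared across workers.
_seen_bt_ids = set()
_lock = threading.Lock()


def process_callback(body: bytes) -> tuple[int, str]:
    """Return (http_status, note) for one callback delivery."""
    try:
        bt_id = json.loads(body)["btId"]
    except (ValueError, KeyError, TypeError):
        return 400, "malformed"
    if not isinstance(bt_id, str):
        return 400, "malformed"
    with _lock:
        if bt_id in _seen_bt_ids:
            # Duplicate delivery: acknowledge again without reprocessing.
            return 200, "duplicate"
        _seen_bt_ids.add(bt_id)
    # Hand slow work to a queue or worker here so the 200 goes out quickly.
    return 200, "accepted"


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, _ = process_callback(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the sketch server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener. Keeping the deduplication check separate from the HTTP plumbing makes the idempotency rule easy to test on its own.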
Request
Request URL
| Cluster | Request URL |
| --- | --- |
| Silicon Valley | http://api-audio-gg.fengkongcloud.com/audio/v4 |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audio/v4 |
Request Parameters
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | API authentication key. The default accessKey is sent in your onboarding email. |
| appId | string | Yes | 64 | Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. |
| eventId | string | Yes | 64 | Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId. |
| type | string | Conditional | 64 | Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog. |
| businessType | string | Conditional | 128 | Business detection types to run: your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog. |
| contentType | string | Yes | - | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
| content | string | Yes | - | Audio content: either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
| btId | string | Yes | 128 | Client-side unique audio identifier used to correlate callback results. Echoed back in both the acknowledgement and the callback payload. Must be unique per request (truncated if the 128-character limit is exceeded). |
| callback | string | Yes | - | HTTP callback URL. DeepCleer sends the full moderation result to this URL once processing completes. See Callback Mechanism for delivery semantics. |
| translationTargetLang | string | No | - | Translation target language. When provided, the transcribed audio text is translated into this language and returned in the callback. Contact DeepCleer to enable this feature. zh: Chinese. en: English. |
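For RAW submissions, content holds the base64 encoding of the audio bytes. The following is a minimal sketch under two assumptions the source does not confirm: that the 15 MB limit is measured on the encoded string, and that 1 MB means 1024 * 1024 bytes. The helper name encode_raw_audio is hypothetical.

```python
import base64

# Assumption: the 15 MB limit applies to the base64 string, not the raw bytes.
MAX_BASE64_BYTES = 15 * 1024 * 1024


def encode_raw_audio(audio: bytes) -> dict:
    """Return the contentType/content pair for a RAW (base64) submission."""
    encoded = base64.b64encode(audio).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64 audio exceeds 15 MB; host the file and submit a URL")
    return {"contentType": "RAW", "content": encoded}
```

Base64 inflates data by roughly one third, so a raw file a little over 11 MB already crosses the limit under this reading.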
Risk Detection Types
Combine multiple types with underscores (for example, POLITY_EROTIC_MOAN). Recommended starter combination: POLITY_EROTIC_MOAN_ADVERT.
| Value | Description |
| --- | --- |
| AUDIOPOLITICAL | Top-leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| ADLAW | Advertising law violation detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |
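As an illustration of the underscore rule, the sketch below joins a list of values into a type string and rejects anything outside the catalog above or longer than the 64-character limit of the type parameter. The function name build_type is hypothetical, and rejecting unknown values client-side is a design choice of this sketch rather than documented API behavior.

```python
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "ADLAW", "BAN",
    "VIOLENT", "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}
TYPE_MAX_LENGTH = 64  # Max Length of the type parameter


def build_type(values):
    """Join risk detection types with underscores after validating them."""
    unknown = [v for v in values if v not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk detection types: {unknown}")
    joined = "_".join(values)
    if len(joined) > TYPE_MAX_LENGTH:
        raise ValueError("type exceeds 64 characters")
    return joined
```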
Business Detection Types
Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.
| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
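The GENDER dependency is easy to miss, so a client can enforce it when building businessType. A minimal sketch with a hypothetical helper name, build_business_type; appending GENDER automatically (instead of raising an error) is a choice made here for convenience.

```python
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}
BUSINESS_TYPE_MAX_LENGTH = 128  # Max Length of the businessType parameter


def build_business_type(values):
    """Join business detection types, adding GENDER when a dependent type needs it."""
    selected = list(dict.fromkeys(values))  # drop duplicates, keep order
    if NEEDS_GENDER.intersection(selected) and "GENDER" not in selected:
        selected.append("GENDER")
    joined = "_".join(selected)
    if len(joined) > BUSINESS_TYPE_MAX_LENGTH:
        raise ValueError("businessType exceeds 128 characters")
    return joined
```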
data Object
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| retryUrl | string | No | - | Fallback audio URL. Used when the primary content URL fails to download. Only applies when contentType is URL. |
| tokenId | string | Yes | 64 | User account identifier. Recommended to pass the user UID (can be encrypted). Used for behavioral risk detection (spam, advertising, etc.). Must be an alphanumeric string with underscores and hyphens, up to 64 characters. |
| receiveTokenId | string | Conditional | 64 | Message receiver's tokenId for private-chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message. |
| deviceId | string | No | 128 | DeepCleer device fingerprint identifier generated by the DeepCleer SDK. Used for device-level behavior analysis. |
| ip | string | No | 64 | Public IPv4 or IPv6 address of the user who submitted the audio. Used for IP-based behavior analysis. |
| dataId | string | No | - | Custom data identifier. Passed through by your application for your own record-keeping. |
| level | int32 | No | - | User level for configuring different interception strategies. See User Levels. |
| gender | int32 | No | - | User gender. 0: male. 1: female. 2: unknown. |
| nickname | string | No | 150 | User nickname. Max 150 characters (truncated if exceeded). |
| room | string | No | 64 | Live room / game room ID. When the scenario is a live audio room, strongly recommended to provide so per-room strategies can be applied and context recognition is enabled. |
| lang | string | No | - | Audio language used for ASR (audio-to-text transcription). Default: zh. For international audio that cannot be categorized, use auto for automatic language detection. Distinct from acceptLang (which controls the language of the returned labels). See Supported Languages. |
| formatInfo | string | Conditional | - | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
| rate | int32 | Conditional | - | Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000–32000. |
| track | int32 | Conditional | - | Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int32 | No | - | Controls which audio segments are returned in audioDetail. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level output; it does not affect the overall result. |
| audioDetectStep | int32 | No | - | Interval moderation step size (1–36). A value of 1 skips one 10-second audio segment between moderated segments, 2 skips two, and so on. When omitted, all audio content is moderated. When enabled, it is recommended to also set returnAllText to 1 and consume the ASR result from each moderated segment. |
| extra | object | No | - | Auxiliary parameters. |
| extra.passThrough | object | No | 1024 | Client pass-through field. DeepCleer does not process this field; all content under it is echoed back in the callback payload. |
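Putting the request fields together, the sketch below assembles a URL submission and sends it with the 5-second timeout suggested earlier. Several points are assumptions, because the source does not state them: that the endpoint accepts an HTTP POST with a JSON body, that the fields in this table are nested under a top-level data key (inferred from references such as data.extra.passThrough), and that the acknowledgement is JSON. The function names build_request and submit are hypothetical.

```python
import json
import re
import urllib.request

# Singapore cluster URL from the Request URL table.
ENDPOINT = "http://api-audio-xjp.fengkongcloud.com/audio/v4"
# tokenId: letters, digits, underscores, hyphens, up to 64 characters.
TOKEN_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_request(access_key, app_id, event_id, audio_url, bt_id,
                  callback_url, token_id,
                  risk_type="POLITY_EROTIC_MOAN_ADVERT", **data_fields):
    """Assemble the request body for a URL submission."""
    if not TOKEN_ID.fullmatch(token_id):
        raise ValueError("tokenId must be 1-64 letters, digits, '_' or '-'")
    if len(bt_id) > 128:
        raise ValueError("btId longer than 128 characters would be truncated")
    return {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_type,
        "contentType": "URL",
        "content": audio_url,
        "btId": bt_id,
        "callback": callback_url,
        # Optional data fields (room, lang, returnAllText, ...) pass through as given.
        "data": {"tokenId": token_id, **data_fields},
    }


def submit(payload, timeout=5.0):
    """POST the payload as JSON (assumed encoding) and parse the acknowledgement."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as build_request("your-access-key", "web", "voiceMessage", "https://cdn.example.com/clip.mp3", "clip-0001", "https://example.com/moderation/callback", "user_42", room="room-7") yields a payload ready for submit; the example.com URLs and identifiers are placeholders.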
Supported Languages
| Value | Language |
| --- | --- |
| en | English |
| zh | Chinese (default) |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact DeepCleer to enable) |
User Levels
| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
Synchronous Response
The synchronous response is an acknowledgement only. It confirms whether the request was accepted for asynchronous processing. The actual moderation result is delivered later to your callback URL — see Callback Parameters.
Response Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique DeepCleer request identifier. Strongly recommended to save for troubleshooting. |
| riskLevel | | | Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (route to manual review). REJECT: violation (block). During initial integration, we recommend tuning your interception thresholds before using this value for hard blocks. |
| audioTags | | | Legacy audio tags: gender, timbre, and singing detection. New integrations should use businessLabels inside audioDetail instead. See audioTags Object. |
| requestParams | object | Yes | Echo of all fields submitted under data in the original request. Useful for correlating callbacks with the original payload. |
| riskLabel1 | | | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". For reference only; do not use for programmatic logic. |
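Branching on riskLevel (never on riskDescription) could look like the sketch below. The action names and the hard_block_enabled switch are hypothetical; the switch reflects the advice to tune thresholds before using REJECT for hard blocks, and sending unexpected values to manual review is a conservative choice of this sketch.

```python
def decide(result: dict, hard_block_enabled: bool = False) -> str:
    """Map a moderation result to an application action based on riskLevel."""
    level = result.get("riskLevel")
    if level == "PASS":
        return "allow"
    if level == "REJECT" and hard_block_enabled:
        return "block"
    # REVIEW, REJECT while thresholds are still being tuned, or any
    # unexpected value all go to a human reviewer.
    return "manual_review"
```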
Note on field casing: audioStarttime and audioEndtime are preserved exactly as returned on the wire (lowercase t). Flag for v5 cleanup alongside other inconsistent casings such as face_num and b_advertise_risk_tokenid.
Segment allLabels
Each element in the allLabels array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
audioTags Object
Legacy compatibility field. New integrations should consume businessLabels inside audioDetail instead.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (e.g., Male, Female). |
| gender.probability | int32 | No | Gender probability on a legacy 0–100 scale (higher values indicate greater likelihood). Note that other probability fields in this API use the modern 0–1 scale. |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label (see Language Labels) and probability (modern responses) or confidence (legacy responses). |
Language Labels
| Label | Description |
| --- | --- |
| 0 | Mandarin Chinese |
| 1 | English |
| 2 | Cantonese |
| 3 | Tibetan |
| 4 | Uyghur |
| 5 | Mongolian |
| 6 | Korean |
| -1 | Other languages |
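Because the legacy audioTags fields mix scales and key names, a consumer may want to normalize them before use. The sketch below rescales gender.probability from 0–100 to 0–1, reads either probability or confidence from language entries, and maps language labels to names. Whether labels arrive as integers or numeric strings is not stated in the source, so the sketch accepts both; the function names are hypothetical.

```python
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def language_name(label):
    """Map a wire label (int or numeric string, an assumption) to its name."""
    try:
        return LANGUAGE_LABELS.get(int(label), "unknown")
    except (TypeError, ValueError):
        return "unknown"


def normalize_audio_tags(tags: dict) -> dict:
    """Return audioTags with probabilities on a 0-1 scale and named languages."""
    result = {}
    gender = tags.get("gender")
    if gender:
        raw = gender.get("probability")
        result["gender"] = {
            "label": gender.get("label"),
            # Legacy 0-100 scale converted to the 0-1 scale used elsewhere.
            "probability": None if raw is None else raw / 100,
        }
    result["language"] = [
        {
            "language": language_name(item.get("label")),
            # Modern responses use probability, legacy ones use confidence.
            "score": item.get("probability", item.get("confidence")),
        }
        for item in tags.get("language") or []
    ]
    return result
```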
auxInfo Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| errorCode | int32 | No | Processing-stage error code. 2003: audio download failure. 2007: no valid audio data to moderate. |
| passThrough | object | No | Client pass-through field. Same value as data.extra.passThrough in the request. |
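A consumer can translate the two documented error codes into readable messages while recovering the pass-through data. This is a minimal sketch that takes the auxInfo object directly, since its position in the callback payload is not shown here; summarize_aux_info is a hypothetical name.

```python
ERROR_MEANINGS = {
    2003: "audio download failure",
    2007: "no valid audio data to moderate",
}


def summarize_aux_info(aux_info: dict):
    """Return (error_message_or_None, pass_through_or_None) from an auxInfo object."""
    code = aux_info.get("errorCode")
    if code is None:
        message = None
    else:
        # Codes outside the documented pair are reported, not guessed at.
        message = ERROR_MEANINGS.get(code, f"unrecognized error code {code}")
    return message, aux_info.get("passThrough")
```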
Token Labels
Both tokenProfileLabels and tokenRiskLabels share the same structure:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. For reference only; do not use for programmatic logic. |
| timestamp | int64 | No | Label timestamp. 13-digit Unix timestamp in milliseconds (UTC). |
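The millisecond timestamp converts to a timezone-aware datetime as sketched below; label_time is a hypothetical helper name. Adding a timedelta to the epoch avoids the float rounding that dividing by 1000 can introduce.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def label_time(label: dict):
    """Return the label's timestamp as a UTC datetime, or None if absent."""
    millis = label.get("timestamp")
    if millis is None:
        return None
    return EPOCH + timedelta(milliseconds=millis)
```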