Async API
API Description
Asynchronous audio moderation API. Submit one audio clip per request and receive an immediate acknowledgement that echoes your btId; the full moderation result is delivered later via callback to the URL you provide. The API detects regulatory risks in audio content (including political content, pornography, advertising, and violence & terrorism) and can additionally identify minors, voice timbre, language, and other content based on the business scenarios enabled on your account.
Requirements
| Item | Specification |
|---|---|
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |
Audio Requirements
| Item | Specification |
|---|---|
| Audio types | URL, base64-encoded data (contentType RAW) |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Size limit | ≤ 18 MB |
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures — and an audio clip that can't be fetched can't be moderated.
Timeout Suggestion
- Asynchronous request (this endpoint): recommended timeout of 5 seconds for the acknowledgement call
- Callback delivery: results arrive separately; keep your callback handler fast (< 2 seconds) so DeepCleer doesn't retry unnecessarily
The initial acknowledgement returns almost immediately once the request is validated. End-to-end moderation time is dominated by how long DeepCleer takes to fetch your audio and run the requested detection types, so keep your hosting fast and reliable. Actual duration varies with the request type and the audio size.
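The timeout suggestion for the acknowledgement call can be sketched as follows, using only the Python standard library. The helper names `build_request` and `submit` are hypothetical and not part of any DeepCleer SDK, and the `Content-Type` header is an assumption based on the JSON requirement above. The URL is the Singapore cluster endpoint listed under Request URL.

```python
import json
import urllib.request

# Singapore cluster endpoint from the Request URL table.
API_URL = "http://api-audio-xjp.fengkongcloud.com/audio/v4"
ACK_TIMEOUT_SECONDS = 5  # recommended timeout for the acknowledgement call


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a UTF-8 JSON POST request (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumed header; the document only states that bodies are JSON.
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit(payload: dict, timeout: float = ACK_TIMEOUT_SECONDS) -> dict:
    """Send the request and return the parsed acknowledgement.

    Raises urllib.error.URLError or a timeout error if no acknowledgement
    arrives within `timeout` seconds.
    """
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Only the acknowledgement is bounded by this timeout; the moderation result itself still arrives later on the callback URL.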
Callback Mechanism
Results are delivered to the callback URL you supply in the request. When DeepCleer calls your endpoint:
- The request body is a JSON payload matching Callback Parameters.
- Your endpoint must respond with HTTP 200 OK within a few seconds. Any non-2xx response or timeout is treated as a delivery failure.
- On failure, DeepCleer retries with exponential backoff. After repeated failures the result is dropped; we recommend monitoring your callback handler and reprocessing via btId if needed.
- Your endpoint should be idempotent on btId: the same result may be delivered more than once if an earlier delivery succeeded but the response was lost in transit.
Request
Request URL
| Cluster | Request URL |
|---|---|
| Silicon Valley | http://api-audio-gg.fengkongcloud.com/audio/v4 |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audio/v4 |
Request Parameters
| Parameter | Type | Required | Max Length | Description |
|---|---|---|---|---|
accessKey | string | Yes | 20 | API authentication key. The default accessKey is sent in your onboarding email. |
appId | string | Yes | 64 | Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. |
eventId | string | Yes | 64 | Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId. |
type | string | Conditional | 64 | Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog. |
businessType | string | Conditional | 128 | Business detection types to run — your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog. |
contentType | string | Yes | — | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
content | string | Yes | — | Audio content — either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
btId | string | Yes | 128 | Client-side unique audio identifier used to correlate callback results. Echoed back in both the acknowledgement and the callback payload. Must be unique per request (truncated if the 128-character limit is exceeded). |
callback | string | Yes | — | HTTP callback URL. DeepCleer sends the full moderation result to this URL once processing completes. See Callback Mechanism for delivery semantics. |
translationTargetLang | string | No | — | Translation target language. When provided, the transcribed audio text is translated into this language and returned in the callback. Contact DeepCleer to enable this feature. zh: Chinese. en: English. |
acceptLang | string | Yes | — | Language for returned labels. Set en by default. |
data | object | Yes | 1 MB | Request data content. See data Object. |
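As an illustration of the required and conditional parameters in the table above, the following sketch assembles a request body for a URL submission. `build_url_payload` is a hypothetical helper, and rejecting an over-long btId locally (rather than letting it be truncated) is a design choice of this sketch, not documented behavior.

```python
def build_url_payload(access_key, app_id, event_id, bt_id, audio_url,
                      callback_url, token_id,
                      risk_types=(), business_types=()):
    """Assemble a request body for contentType URL (hypothetical helper)."""
    # Either type or businessType must be provided; both are allowed.
    if not risk_types and not business_types:
        raise ValueError("at least one of type or businessType is required")
    # Fail early instead of relying on truncation, which could make two
    # distinct identifiers collide.
    if len(bt_id) > 128:
        raise ValueError("btId exceeds 128 characters")

    payload = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "btId": bt_id,
        "contentType": "URL",
        "content": audio_url,
        "callback": callback_url,
        "acceptLang": "en",
        "data": {"tokenId": token_id},
    }
    # Multiple values are combined with underscores.
    if risk_types:
        payload["type"] = "_".join(risk_types)
    if business_types:
        payload["businessType"] = "_".join(business_types)
    return payload
```

For example, passing `risk_types=["POLITY", "EROTIC", "MOAN", "ADVERT"]` yields the recommended starter combination in the `type` field.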
Risk Detection Types
Combine multiple types with underscores (for example, POLITY_EROTIC_MOAN). Recommended starter combination: POLITY_EROTIC_MOAN_ADVERT.
| Value | Description |
|---|---|
AUDIOPOLITICAL | Top-leader voiceprint detection |
POLITY | Political content detection |
EROTIC | Pornographic content detection |
ADVERT | Advertising detection |
ADLAW | Advertising law violation detection |
BAN | Prohibited content detection |
VIOLENT | Violence & terrorism detection |
ANTHEN | National anthem detection |
MOAN | Moaning detection |
DIRTY | Abusive language detection |
BANEDAUDIO | Prohibited songs detection |
COPYRIGHTSONGS | Copyrighted songs detection |
Business Detection Types
Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.
| Value | Description |
|---|---|
SING | Singing detection |
LANGUAGE | Language detection |
GENDER | Gender detection |
TIMBRE | Voice timbre detection |
VOICE | Voice attributes |
MINOR | Minor detection |
AUDIOSCENE | Audio scene detection |
AGE | Age detection |
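The GENDER dependency described above is easy to miss when the list of business types is built dynamically. A small sketch of one way to enforce it client-side follows; `combine_business_types` is a hypothetical helper, and placing GENDER first is an arbitrary choice since the document does not state that order matters.

```python
# Types that, per the rule above, require GENDER to be present as well.
REQUIRES_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def combine_business_types(types):
    """Join business detection types with underscores, adding GENDER if needed."""
    ordered = list(dict.fromkeys(types))  # drop duplicates, keep first-seen order
    if REQUIRES_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.insert(0, "GENDER")
    return "_".join(ordered)
```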
data Object
| Parameter | Type | Required | Max Length | Description |
|---|---|---|---|---|
retryUrl | string | No | — | Fallback audio URL. Used when the primary content URL fails to download. Only applies when contentType is URL. |
tokenId | string | Yes | 64 | User account identifier. Recommended to pass the user UID (can be encrypted). Used for behavioral risk detection (spam, advertising, etc.). Must be an alphanumeric string with underscores and hyphens, up to 64 characters. |
receiveTokenId | string | Conditional | 64 | Message receiver's tokenId for private-chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message. |
deviceId | string | No | 128 | DeepCleer device fingerprint identifier generated by the DeepCleer SDK. Used for device-level behavior analysis. |
ip | string | No | 64 | Public IPv4 or IPv6 address of the user who submitted the audio. Used for IP-based behavior analysis. |
dataId | string | No | — | Custom data identifier. Passed through by your application for your own record-keeping. |
level | int32 | No | — | User level for configuring different interception strategies. See User Levels. |
gender | int32 | No | — | User gender. 0: male. 1: female. 2: unknown. |
nickname | string | No | 150 | User nickname. Max 150 characters (truncated if exceeded). |
room | string | No | 64 | Live room / game room ID. Strongly recommended for live audio room scenarios, so that per-room strategies can be applied and context recognition is enabled. |
lang | string | No | — | Audio language used for ASR (audio-to-text transcription). Default: zh. For international audio that cannot be categorized, use auto for automatic language detection. Distinct from acceptLang (which controls the language of the returned labels). See Supported Languages. |
formatInfo | string | Conditional | — | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
rate | int32 | Conditional | — | Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000–32000. |
track | int32 | Conditional | — | Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo. |
returnAllText | int32 | No | — | Controls which audio segments are returned in audioDetail. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level output; it does not affect the overall result. |
audioDetectStep | int32 | No | — | Interval moderation step size (1–36). A value of 1 skips one 10-second audio segment between moderated segments, 2 skips two, and so on. When omitted, all audio content is moderated. When enabled, it is recommended to also set returnAllText to 1 and consume the ASR result from each moderated segment. |
extra | object | No | — | Auxiliary parameters. |
extra.passThrough | object | No | 1024 | Client pass-through field. DeepCleer does not process this field; all content under it is echoed back in the callback payload. |
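When submitting base64 audio, the conditional fields formatInfo, rate, and track interact with contentType. The sketch below builds the relevant request fields for raw PCM. `build_raw_pcm_fields` is a hypothetical helper; applying the 15 MB limit to the encoded string and reading "MB" as 1024 × 1024 bytes are both assumptions, since the document does not say which size is measured.

```python
import base64

# Assumed interpretation of the 15 MB base64 limit (encoded length, MiB).
MAX_BASE64_BYTES = 15 * 1024 * 1024


def build_raw_pcm_fields(pcm_bytes, token_id, rate=16000, track=1):
    """Return the contentType/content/data fields for 16-bit little-endian PCM."""
    if not 8000 <= rate <= 32000:
        raise ValueError("rate must be between 8000 and 32000 Hz")
    if track not in (1, 2):
        raise ValueError("track must be 1 (mono) or 2 (stereo)")

    content = base64.b64encode(pcm_bytes).decode("ascii")
    if len(content) > MAX_BASE64_BYTES:
        raise ValueError("base64 content exceeds the 15 MB limit")

    return {
        "contentType": "RAW",
        "content": content,
        "data": {
            "tokenId": token_id,
            "formatInfo": "pcm",  # required when contentType is RAW
            "rate": rate,         # required when formatInfo is pcm
            "track": track,       # required when formatInfo is pcm
        },
    }
```

The returned dictionary would be merged with the remaining top-level parameters (accessKey, btId, callback, and so on) before sending.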
Supported Languages
| Value | Language |
|---|---|
en | English |
zh | Chinese (default) |
ar | Arabic |
hi | Hindi |
es | Spanish |
fr | French |
ru | Russian |
pt | Portuguese |
id | Indonesian |
de | German |
ja | Japanese |
tr | Turkish |
vi | Vietnamese |
it | Italian |
th | Thai |
tl | Filipino |
ko | Korean |
ms | Malay |
auto | Automatic language detection (contact DeepCleer to enable) |
User Levels
| Value | Description |
|---|---|
0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
1 | Lower-level user (e.g., low activity or low-level users) |
2 | Mid-level user (e.g., moderately active or mid-level users) |
3 | Higher-level user (e.g., highly active or high-level users) |
4 | Highest-level user (e.g., paying users, VIP users) |
Synchronous Response
The synchronous response is an acknowledgement only. It confirms whether the request was accepted for asynchronous processing. The actual moderation result is delivered later to your callback URL — see Callback Parameters.
Response Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | string | Yes | Unique DeepCleer request identifier. Strongly recommended to save for troubleshooting. |
code | int32 | Yes | Response code. See Response Codes. |
message | string | Yes | Response message corresponding to the code. |
btId | string | Yes | Client-side audio identifier echoed back. Returned when code is 1100. |
Response Codes
| Code | Message |
|---|---|
1100 | Success |
1901 | QPS limit exceeded |
1902 | Invalid parameters |
1903 | Service failure |
1904 | Download failure |
1905 | Decoding failure |
9101 | Unauthorized operation |
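One possible way to act on the acknowledgement codes is sketched below. `classify_ack` is a hypothetical helper, and treating 1901 and 1903 as retryable is an assumption of this sketch; the document does not specify which codes are safe to retry.

```python
# Codes assumed to be transient (not confirmed by the document).
ASSUMED_RETRYABLE = {1901, 1903}


def classify_ack(ack: dict) -> str:
    """Map a synchronous acknowledgement to 'accepted', 'retry', or 'rejected'."""
    code = ack.get("code")
    if code == 1100:
        return "accepted"   # wait for the callback
    if code in ASSUMED_RETRYABLE:
        return "retry"      # resubmit later, ideally with backoff
    return "rejected"       # fix the request or credentials before resubmitting
```

Whatever the outcome, logging the requestId alongside your btId makes later troubleshooting easier.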
Callback Parameters
The callback payload delivers the full moderation result for one audio clip.
Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | string | Yes | Unique DeepCleer request identifier for this audio clip. |
btId | string | Yes | Client-side audio identifier echoed back from the request. |
code | int32 | Yes | Response code. See Callback Response Codes. |
message | string | Yes | Response message corresponding to the code. |
riskLevel | string | Yes | Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (route to manual review). REJECT: violation (block). During initial integration, we recommend tuning your interception thresholds before using this value for hard blocks. |
audioText | string | Yes | Full audio-to-text transcription result. |
audioTime | int32 | Yes | Total audio duration in seconds. |
audioDetail | array | Yes | Per-segment moderation results. See audioDetail Array. |
audioTags | object | No | Legacy audio tags — gender, timbre, and singing detection. New integrations should use businessLabels inside audioDetail instead. See audioTags Object. |
requestParams | object | Yes | Echo of all fields submitted under data in the original request. Useful for correlating callbacks with the original payload. |
auxInfo | object | No | Auxiliary information. See auxInfo Object. |
tokenProfileLabels | array | No | Account attribute labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
tokenRiskLabels | array | No | Account risk labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
Callback Response Codes
| Code | Message |
|---|---|
1100 | Success |
1101 | Processing |
1901 | QPS limit exceeded |
1902 | Invalid parameters |
1903 | Service failure |
1904 | Download failure |
1905 | Decoding failure |
9100 | Insufficient balance |
9101 | Unauthorized operation |
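The callback code and riskLevel together determine what to do with a clip. A minimal routing sketch follows. The action names and the `decide` function are hypothetical, and sending an unrecognized riskLevel to manual review is a conservative assumption rather than documented behavior.

```python
# Illustrative mapping from riskLevel to an application-side action.
ACTIONS = {"PASS": "allow", "REVIEW": "manual_review", "REJECT": "block"}


def decide(callback: dict) -> str:
    """Return an action for one callback payload."""
    code = callback.get("code")
    if code == 1101:
        return "pending"  # still processing
    if code != 1100:
        return "error"    # only code, message, requestId are guaranteed here
    # Unknown or missing riskLevel falls back to manual review (assumption).
    return ACTIONS.get(callback.get("riskLevel"), "manual_review")
```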
audioDetail Array
Each element represents one audio segment:
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | string | Yes | Unique identifier for this audio segment. |
audioStarttime | float | Yes | Segment start time relative to the audio beginning, in seconds. |
audioEndtime | float | Yes | Segment end time relative to the audio beginning, in seconds. |
audioUrl | string | Yes | Audio segment URL (MP3 format). |
riskLevel | string | Yes | Segment risk level. PASS: normal. REVIEW: suspicious. REJECT: violation. |
riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic. |
riskDetail | object | No | Risk detail for this segment. See Segment riskDetail. |
allLabels | array | No | All risk labels matched in this segment. See Segment allLabels. |
businessLabels | array | No | All business labels matched in this segment. See Segment businessLabels. |
Note on field casing: audioStarttime and audioEndtime are preserved exactly as returned on the wire (lowercase t). Flag for v5 cleanup alongside other inconsistent casings such as face_num and b_advertise_risk_tokenid.
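To show how the per-segment fields fit together, the sketch below collects the time ranges of segments flagged REVIEW or REJECT. `risky_segments` is a hypothetical helper; note that it reads audioStarttime and audioEndtime with the exact wire casing.

```python
def risky_segments(callback: dict) -> list:
    """Return (start, end, riskLevel, riskLabel1) for each flagged segment."""
    flagged = []
    for seg in callback.get("audioDetail", []):
        if seg.get("riskLevel") in ("REVIEW", "REJECT"):
            flagged.append((
                seg["audioStarttime"],  # seconds from the start of the audio
                seg["audioEndtime"],
                seg["riskLevel"],
                seg.get("riskLabel1", ""),
            ))
    return flagged
```

With returnAllText left at its default of 0, audioDetail should already contain only risk segments, so the filter mainly matters when returnAllText is 1.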
Segment allLabels
Each element in the allLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
riskLabel1 | string | Yes | Level 1 risk label. |
riskLabel2 | string | Yes | Level 2 risk label. |
riskLabel3 | string | Yes | Level 3 risk label. |
riskDescription | string | Yes | Risk description. For reference only — do not use for programmatic logic. |
riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT. |
probability | float | No | Confidence score (0–1). Higher values indicate greater confidence. |
riskDetail | object | No | Risk detail. Same structure as Segment riskDetail. |
Segment riskDetail
| Parameter | Type | Required | Description |
|---|---|---|---|
audioText | string | No | Audio-to-text transcription result for this segment. |
riskSource | int32 | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio risk). |
matchedLists | array | No | Matched custom list information. Returned only when a custom list is hit. See Matched Lists. |
riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising-law content is detected. See Risk Segments. |
Matched Lists
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Name of the matched list. |
words | array | Yes | Sensitive word details. |
words[].word | string | Yes | The matched sensitive word. |
words[].position | array | Yes | Position of the sensitive word. |
Risk Segments
| Parameter | Type | Required | Description |
|---|---|---|---|
segment | string | No | High-risk content segment text. |
position | array | No | Position of the segment (0-indexed). |
Segment businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
businessLabel1 | string | Yes | Level 1 business label. |
businessLabel2 | string | Yes | Level 2 business label. |
businessLabel3 | string | Yes | Level 3 business label. |
businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
confidenceLevel | int32 | No | Confidence level (0–2). Higher values indicate greater confidence. |
probability | float | No | Confidence score (0–1). |
businessDetail | object | No | Detailed information. Reserved field. |
audioTags Object
Legacy compatibility field. New integrations should consume businessLabels inside audioDetail instead.
| Parameter | Type | Required | Description |
|---|---|---|---|
gender | object | No | Gender detection result. |
gender.label | string | Yes | Gender label name (e.g., Male, Female). |
gender.probability | int32 | No | Gender probability on a legacy 0–100 scale (higher values indicate greater likelihood). Note that other probability fields in this API use the modern 0–1 scale. |
timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
language | array | No | Language detection results. Each element contains label (see Language Labels) and probability (modern responses) or confidence (legacy responses). |
Language Labels
| Label | Description |
|---|---|
0 | Mandarin Chinese |
1 | English |
2 | Cantonese |
3 | Tibetan |
4 | Uyghur |
5 | Mongolian |
6 | Korean |
-1 | Other languages |
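For integrations that still read the legacy audioTags object, the following sketch maps the numeric language labels to names and rescales the gender probability from 0–100 to 0–1. Both helper names are hypothetical. The language score is returned as-is because the document does not state its scale.

```python
# Numeric language labels from the Language Labels table.
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def top_language(audio_tags: dict):
    """Return (language name, raw score) for the best-scoring entry, or None."""
    entries = audio_tags.get("language") or []
    if not entries:
        return None

    def score(entry):
        # Modern responses use probability; legacy ones use confidence.
        return entry.get("probability", entry.get("confidence", 0))

    best = max(entries, key=score)
    return LANGUAGE_LABELS.get(best["label"], "Unknown"), score(best)


def gender_probability(audio_tags: dict):
    """Rescale the legacy 0-100 gender probability to 0-1, or return None."""
    gender = audio_tags.get("gender")
    if not gender or "probability" not in gender:
        return None
    return gender["probability"] / 100
```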
auxInfo Object
| Parameter | Type | Required | Description |
|---|---|---|---|
errorCode | int32 | No | Processing-stage error code. 2003: audio download failure. 2007: no valid audio data to moderate. |
passThrough | object | No | Client pass-through field. Same value as data.extra.passThrough in the request. |
Token Labels
Both tokenProfileLabels and tokenRiskLabels share the same structure:
| Parameter | Type | Required | Description |
|---|---|---|---|
label1 | string | No | Level 1 label. |
label2 | string | No | Level 2 label. |
label3 | string | No | Level 3 label. |
description | string | No | Label description. For reference only — do not use for programmatic logic. |
timestamp | int64 | No | Label timestamp. 13-digit Unix timestamp in milliseconds (UTC). |
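Because the label timestamp is a 13-digit value in milliseconds, it needs to be split before conversion with most date libraries. A short sketch follows; `label_time` is a hypothetical helper, and the sample value in the usage note is illustrative rather than real label data.

```python
from datetime import datetime, timedelta, timezone


def label_time(label: dict):
    """Convert a token label's millisecond timestamp to an aware UTC datetime."""
    ts = label.get("timestamp")
    if ts is None:
        return None  # the field is optional
    seconds, millis = divmod(ts, 1000)
    # Integer arithmetic avoids float rounding in the millisecond part.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
```

For instance, `label_time({"timestamp": 1604311839040})` gives 2020-11-02 10:10:39.040 UTC.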
Examples
Request Example
{
"accessKey": "YOUR_ACCESS_KEY",
"appId": "default",
"eventId": "default",
"type": "POLITY_EROTIC_ADVERT_MOAN",
"businessType": "GENDER_TIMBRE_SING_LANGUAGE",
"btId": "test1",
"contentType": "URL",
"content": "https://example.com/audio/sample.mp3",
"callback": "http://www.example.com/callback",
"data": {
"returnAllText": 1,
"tokenId": "token-short"
}
}
Response Example
{
"code": 1100,
"message": "Success",
"requestId": "6a9cb980346dfea41111656a514e9109",
"btId": "1604311839040"
}
Callback Example
{
"requestId": "6a9cb980346dfea41111656a514e9109",
"btId": "1604311839040",
"code": 1100,
"message": "Success",
"riskLevel": "PASS",
"audioDetail": [
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0000",
"audioStarttime": 0,
"audioEndtime": 10,
"audioUrl": "http://example.com/audio_segment_a0000.mp3",
"businessLabels": [
{
"businessDescription": "Singing: Singing: Singing",
"businessDetail": {},
"businessLabel1": "sing",
"businessLabel2": "changge",
"businessLabel3": "changge",
"confidenceLevel": 2,
"probability": 0.858334402569294
}
],
"allLabels": [],
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
},
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0001",
"audioStarttime": 10,
"audioEndtime": 20,
"audioUrl": "http://example.com/audio_segment_a0001.mp3",
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
},
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0002",
"audioStarttime": 20,
"audioEndtime": 30,
"audioUrl": "http://example.com/audio_segment_a0002.mp3",
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
}
],
"audioTags": {
"gender": {
"label": "Female",
"probability": 95
},
"language": [
{
"confidence": 0,
"label": 2
},
{
"confidence": 99,
"label": 0
},
{
"confidence": 0,
"label": 1
}
],
"song": 0,
"timbre": [
{
"label": "Female",
"probability": 95
},
{
"label": "Queen",
"probability": 12
},
{
"label": "Mature Woman",
"probability": 37
},
{
"label": "Young Woman",
"probability": 56
},
{
"label": "Middle-aged Woman",
"probability": 67
},
{
"label": "Loli",
"probability": 24
}
]
}
}