Sync API
API Description
Synchronous audio moderation API. Submits one audio clip per request and returns the moderation result directly in the response. Detects regulatory risks in audio content — including political content, pornography, advertising, and violence & terrorism — and can additionally identify minors, voice timbre, language, and other content based on the business scenarios enabled on your account.
Requirements
| Item | Specification |
|---|---|
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |
Audio Requirements
| Item | Specification |
|---|---|
| Audio types | URL, BASE64 |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Duration limit | ≤ 60 seconds |
| Size limit | ≤ 18 MB |
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures — and an audio clip that can't be fetched can't be moderated.
Timeout Suggestion
- Synchronous request: recommended timeout of 10 seconds
- Asynchronous request: recommended timeout of 5 seconds
End-to-end response time is dominated by how long DeepCleer takes to fetch your audio, so keep your hosting fast and reliable. Once the clip is in hand, processing time varies with the request type and the audio size.
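As a rough illustration of applying the recommended 10-second synchronous timeout, the sketch below posts a JSON body with Python's standard library. The helper names `build_http_request` and `post_json`, the injectable `opener` argument, and the `Content-Type` header value are assumptions made for this example; they are not part of the DeepCleer API.

```python
import json
import urllib.request

# Recommended client-side timeout for synchronous requests, in seconds.
SYNC_TIMEOUT_SECONDS = 10.0


def build_http_request(url: str, payload: dict) -> urllib.request.Request:
    """Wrap a payload in a POST request with a UTF-8 JSON body."""
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the Content-Type header value is not specified in this document.
    headers = {"Content-Type": "application/json; charset=utf-8"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_json(url, payload, timeout=SYNC_TIMEOUT_SECONDS, opener=urllib.request.urlopen):
    """Send the request and decode the JSON response body.

    A timeout or network failure propagates as an exception from the opener.
    """
    with opener(build_http_request(url, payload), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing the opener as a parameter keeps the function testable without network access; production code would normally leave it at its default.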
Request
Request URL
| Cluster | Request URL |
|---|---|
| Singapore | http://api-audio-xjp.fengkongcloud.com/audiomessage/v4 |
Request Parameters
| Parameter | Type | Required | Max Length | Description |
|---|---|---|---|---|
accessKey | string | Yes | 20 | API authentication key. The default accessKey is sent in your onboarding email. |
appId | string | Yes | 64 | Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. |
eventId | string | Yes | 64 | Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId. |
type | string | Conditional | 64 | Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog. |
businessType | string | Conditional | 128 | Business detection types to run — your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog. |
contentType | string | Yes | — | Format of the audio payload supplied in content. URL: a publicly fetchable audio URL. RAW: base64-encoded audio data. |
content | string | Yes | — | Audio content — either a URL or base64-encoded data. Base64 payload limit: 15 MB; only PCM, WAV, and MP3 formats are accepted for base64. PCM must use 16-bit little-endian encoding. PCM and WAV are recommended. |
btId | string | Yes | 128 | Client-side unique identifier for this audio clip. Echoed back in the response so you can correlate inputs and outputs. Truncated if it exceeds 128 characters; must not be reused across concurrent requests. |
acceptLang | string | Yes | — | Language of the labels and descriptions in the response. Set en by default. |
data | object | Yes | 1 MB | Request payload. See data Object. |
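The parameter table can be turned into a small request builder. The following is a minimal sketch for URL-based submissions; the function name `build_url_request` and its keyword arguments are hypothetical, and the `default` values for appId and eventId are placeholders to be replaced with the values from your onboarding email.

```python
def build_url_request(access_key, audio_url, bt_id, *, app_id="default",
                      event_id="default", risk_type="", business_type="",
                      accept_lang="en", data=None):
    """Assemble a request body for an audio clip referenced by URL."""
    # Either type or businessType must be provided; both are allowed.
    if not risk_type and not business_type:
        raise ValueError("provide type, businessType, or both")
    # Reject an over-long btId up front instead of relying on server-side truncation.
    if len(bt_id) > 128:
        raise ValueError("btId must be at most 128 characters")
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "btId": bt_id,
        "contentType": "URL",
        "content": audio_url,
        "acceptLang": accept_lang,
        "data": data or {},
    }
    # Only include the detection-type fields that were actually requested.
    if risk_type:
        body["type"] = risk_type
    if business_type:
        body["businessType"] = business_type
    return body
```

For example, `build_url_request("YOUR_ACCESS_KEY", "https://example.com/audio/sample.mp3", "test1", risk_type="POLITY_EROTIC")` yields a body that omits businessType entirely.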
Risk Detection Types
Combine multiple values with underscores (for example, POLITY_EROTIC_MOAN). The recommended starting set for general moderation is POLITY_EROTIC_MOAN_ADVERT.
| Value | Description |
|---|---|
AUDIOPOLITICAL | Top-leader voiceprint detection |
POLITY | Political content detection |
EROTIC | Pornographic content detection |
ADVERT | Advertising detection |
BAN | Prohibited content detection |
VIOLENT | Violence & terrorism detection |
ANTHEN | National anthem detection |
MOAN | Moaning detection |
DIRTY | Abusive language detection |
BANEDAUDIO | Prohibited songs detection |
COPYRIGHTSONGS | Copyrighted songs detection |
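One way to avoid typos in the type string is to validate each value against the catalog before joining. This is a sketch only; `RISK_TYPES` and `join_risk_types` are hypothetical names, and the set simply mirrors the table above.

```python
RISK_TYPES = frozenset({
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
})


def join_risk_types(*values: str) -> str:
    """Validate risk detection types and combine them with underscores."""
    unknown = [value for value in values if value not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk detection types: {unknown}")
    # dict.fromkeys drops duplicates while preserving the caller's order.
    return "_".join(dict.fromkeys(values))
```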
Business Detection Types
Combine multiple values with underscores. To detect TIMBRE, SING, or LANGUAGE, GENDER must also be included in the same request.
| Value | Description |
|---|---|
SING | Singing detection |
LANGUAGE | Language detection |
GENDER | Gender detection |
TIMBRE | Voice timbre detection |
VOICE | Voice attributes |
MINOR | Minor detection |
AUDIOSCENE | Audio scene detection |
AGE | Age detection |
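The GENDER dependency is easy to miss, so a client can enforce it when building the businessType string. The sketch below adds GENDER automatically when TIMBRE, SING, or LANGUAGE is requested; raising an error instead would be an equally valid choice. `BUSINESS_TYPES`, `NEEDS_GENDER`, and `join_business_types` are hypothetical names.

```python
BUSINESS_TYPES = frozenset({
    "SING", "LANGUAGE", "GENDER", "TIMBRE", "VOICE", "MINOR", "AUDIOSCENE", "AGE",
})
# Types that only work when GENDER is part of the same request.
NEEDS_GENDER = frozenset({"TIMBRE", "SING", "LANGUAGE"})


def join_business_types(*values: str) -> str:
    """Validate business detection types and combine them with underscores."""
    unknown = [value for value in values if value not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business detection types: {unknown}")
    selected = list(dict.fromkeys(values))  # de-duplicate, keep order
    if NEEDS_GENDER.intersection(selected) and "GENDER" not in selected:
        selected.insert(0, "GENDER")
    return "_".join(selected)
```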
data Object
| Parameter | Type | Required | Max Length | Description |
|---|---|---|---|---|
tokenId | string | No | 64 | Stable identifier for the end user, typically your internal user UID (an encrypted UID is fine). Used for behavioral-risk signals such as spam and repeat-offender detection. Alphanumeric with underscores and hyphens, up to 64 characters. |
receiveTokenId | string | Conditional | 64 | tokenId of the message recipient in a one-to-one chat. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message. |
deviceId | string | No | 128 | Device-fingerprint identifier issued by DeepCleer. Generated by the DeepCleer SDK on the end user's device. |
ip | string | No | 64 | Public IP address of the user who submitted the audio. Accepts IPv4 or IPv6. |
dataId | string | No | 128 | Client-side identifier attached to the moderation call. DeepCleer echoes it back with the result, letting you correlate your source record (a message ID, voice-note ID, review ID, etc.) with the moderation verdict — typically used to look up historical decisions in your own database or in the DeepCleer console. |
level | int32 | No | — | User level. See User Levels. |
gender | int32 | No | — | User's gender. 0: male. 1: female. 2: unknown. |
formatInfo | string | Conditional | — | Audio data format. Required when contentType is RAW. Accepted values: pcm, wav, mp3. |
rate | int32 | Conditional | — | Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000–32000. |
track | int32 | Conditional | — | Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo. |
returnAllText | int32 | No | — | Controls which audio segments are included in audioDetail. 0 (default): return only risk-bearing segments (REVIEW and REJECT). 1: return all segments, including PASS. This only affects segment-level results in audioDetail; it does not change the overall riskLevel. |
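The conditional fields for base64 submissions (formatInfo, rate, track) can be checked on the client before sending. The following sketch assumes that the 15 MB limit applies to the encoded string and that 1 MB means 1024 × 1024 bytes; neither detail is stated in this document. `build_raw_fields` is a hypothetical helper.

```python
import base64

# Assumption: the limit is measured on the encoded payload, with 1 MB = 2**20 bytes.
MAX_BASE64_BYTES = 15 * 1024 * 1024


def build_raw_fields(audio: bytes, format_info: str, rate=None, track=None) -> dict:
    """Return the contentType, content, and data fields for a base64 submission."""
    if format_info not in {"pcm", "wav", "mp3"}:
        raise ValueError("formatInfo must be pcm, wav, or mp3")
    data = {"formatInfo": format_info}
    if format_info == "pcm":
        # rate and track are required only for headerless PCM audio.
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("rate must be between 8000 and 32000 Hz for pcm")
        if track not in (1, 2):
            raise ValueError("track must be 1 (mono) or 2 (stereo) for pcm")
        data["rate"] = rate
        data["track"] = track
    content = base64.b64encode(audio).decode("ascii")
    if len(content) > MAX_BASE64_BYTES:
        raise ValueError("base64 payload exceeds 15 MB")
    return {"contentType": "RAW", "content": content, "data": data}
```

The returned dictionary can be merged into the rest of the request body alongside accessKey, appId, and the other top-level parameters.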
User Levels
| Value | Description |
|---|---|
0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
1 | Lower-level user (e.g., low activity or low-level users) |
2 | Mid-level user (e.g., moderately active or mid-level users) |
3 | Higher-level user (e.g., highly active or high-level users) |
4 | Highest-level user (e.g., paying users, VIP users) |
Response
Response Parameters
Fields other than code, message, and requestId are guaranteed to be returned only when code is 1100.
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | string | Yes | Unique DeepCleer request identifier. Strongly recommended to save for troubleshooting and optimization. |
code | int32 | Yes | Response code. See Response Codes. |
message | string | Yes | Human-readable message corresponding to code. |
detail | object | No | Detailed moderation result. Returned when code is 1100. See detail Object. |
Response Codes
| Code | Description |
|---|---|
1100 | Success |
1101 | Processing |
1901 | QPS limit exceeded |
1902 | Invalid parameters |
1903 | Service failure |
1904 | Download failure |
1905 | Decoding failure |
9100 | Insufficient balance |
9101 | Unauthorized operation |
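Because detail is only guaranteed when code is 1100, it helps to check the code before reading anything else. The sketch below raises a custom exception for every other code; `ModerationError` and `extract_detail` are hypothetical names, and the document does not say which codes are safe to retry, so no retry logic is shown.

```python
RESPONSE_CODES = {
    1100: "Success",
    1101: "Processing",
    1901: "QPS limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Download failure",
    1905: "Decoding failure",
    9100: "Insufficient balance",
    9101: "Unauthorized operation",
}


class ModerationError(Exception):
    """Raised when the API returns any code other than 1100."""

    def __init__(self, code, message, request_id):
        super().__init__(f"code={code} message={message} requestId={request_id}")
        self.code = code
        self.request_id = request_id


def extract_detail(response: dict) -> dict:
    """Return the detail object, or raise if the call did not succeed."""
    code = response.get("code")
    if code != 1100:
        # Prefer the server's message; fall back to the table above.
        message = response.get("message") or RESPONSE_CODES.get(code, "Unknown code")
        raise ModerationError(code, message, response.get("requestId"))
    return response.get("detail", {})
```

Keeping requestId on the exception makes it available for logging when troubleshooting a failed call.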
detail Object
| Parameter | Type | Required | Description |
|---|---|---|---|
riskLevel | string | Yes | Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (manual review recommended). REJECT: violation (reject). During initial integration, do not wire results directly into automated blocking — tune your interception thresholds against historical traffic first. |
audioText | string | Yes | Full audio-to-text transcription of the clip. |
audioTime | int32 | Yes | Total audio duration, in seconds. |
audioDetail | array | Yes | Per-segment moderation results. See audioDetail Array. |
audioTags | object | No | Audio attribute tags including gender, timbre, and language. Legacy compatibility field — for new integrations, read these signals from businessLabels inside audioDetail instead. See audioTags Object. |
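The guidance against wiring results straight into automated blocking can be expressed as a flag on the decision function. In this sketch, REJECT is routed to manual review until `enforce` is switched on; the `Action` enum, the `decide` function, and the `enforce` flag are assumptions made for illustration, not part of the API.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    MANUAL_REVIEW = "manual_review"
    BLOCK = "block"


def decide(detail: dict, enforce: bool = False) -> Action:
    """Map the overall riskLevel to an action in the calling application."""
    level = detail["riskLevel"]
    if level == "PASS":
        return Action.ALLOW
    if level == "REVIEW":
        return Action.MANUAL_REVIEW
    if level == "REJECT":
        # Until thresholds are tuned, treat REJECT as a review signal only.
        return Action.BLOCK if enforce else Action.MANUAL_REVIEW
    raise ValueError(f"unexpected riskLevel: {level!r}")
```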
audioDetail Array
Each element is the moderation result for one audio segment:
| Parameter | Type | Required | Description |
|---|---|---|---|
requestId | string | Yes | Unique DeepCleer identifier for this audio segment. |
audioStarttime | float | Yes | Segment start time relative to the beginning of the clip, in seconds. |
audioEndtime | float | Yes | Segment end time relative to the beginning of the clip, in seconds. |
audioUrl | string | Yes | URL of the segment audio (MP3 format). |
riskLevel | string | Yes | Segment disposition. PASS: normal (allow). REVIEW: suspicious (manual review recommended). REJECT: violation (reject). |
riskLabel1 | string | Yes | Level-1 risk label. Returns normal when riskLevel is PASS. |
riskLabel2 | string | Yes | Level-2 risk label. Empty when riskLevel is PASS. |
riskLabel3 | string | Yes | Level-3 risk label. Empty when riskLevel is PASS. |
riskDescription | string | Yes | Risk description. Returns Normal when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". Human-readable summary intended for display only — do not parse for programmatic logic; branch on riskLabel1 / riskLabel2 / riskLabel3 instead. |
riskDetail | object | No | Detail backing this segment's risk decision. See Segment riskDetail. |
allLabels | array | No | All risk labels detected on this segment. See Segment allLabels. |
businessLabels | array | No | All business labels detected on this segment. See Segment businessLabels. |
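A typical consumer walks the audioDetail array and keeps the segments that need attention. The sketch below branches on riskLevel and the three label fields rather than on riskDescription, in line with the note in the table; `risky_segments` and the shape of its return value are hypothetical.

```python
def risky_segments(detail: dict) -> list:
    """Collect REVIEW and REJECT segments with their time range and labels."""
    found = []
    for segment in detail.get("audioDetail", []):
        if segment.get("riskLevel") not in ("REVIEW", "REJECT"):
            continue  # PASS segments appear only when returnAllText is 1
        found.append({
            "start": segment["audioStarttime"],
            "end": segment["audioEndtime"],
            "level": segment["riskLevel"],
            "labels": (
                segment.get("riskLabel1", ""),
                segment.get("riskLabel2", ""),
                segment.get("riskLabel3", ""),
            ),
        })
    return found
```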
Segment allLabels
Each element in the allLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
riskLabel1 | string | Yes | Level-1 risk label. |
riskLabel2 | string | Yes | Level-2 risk label. |
riskLabel3 | string | Yes | Level-3 risk label. |
riskDescription | string | Yes | Risk description. Display only — do not parse for programmatic logic. |
riskLevel | string | Yes | Disposition for this label: PASS, REVIEW, or REJECT. |
probability | float | No | Confidence score in the range 0–1. Higher values indicate greater confidence. |
riskDetail | object | No | Detail backing this label. See Segment riskDetail. |
Segment riskDetail
| Parameter | Type | Required | Description |
|---|---|---|---|
audioText | string | No | Audio-to-text transcription for this segment. |
riskSource | int32 | No | Where the risk was identified. 1000: no risk. 1001: text risk (transcribed text). 1003: audio risk (acoustic content). |
matchedLists | array | No | Custom-list match information. Returned only when a custom keyword list is hit. See Matched Lists. |
riskSegments | array | No | High-risk content segments inside the transcription. See Risk Segments. |
Matched Lists
| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Name of the matched custom list. |
words | array | Yes | Sensitive-word matches within the list. |
words[].word | string | Yes | The matched sensitive word. |
words[].position | array | Yes | Position of the matched word. |
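To show how riskSource and matchedLists fit together, the sketch below reduces a riskDetail object to the risk source and the matched words per custom list. Every field in riskDetail is optional, so each lookup falls back to an empty value. `RISK_SOURCES` and `summarize_risk_detail` are hypothetical names, and the sample list name in the usage note is invented.

```python
RISK_SOURCES = {
    1000: "no risk",
    1001: "text risk",
    1003: "audio risk",
}


def summarize_risk_detail(risk_detail: dict) -> dict:
    """Describe where the risk came from and which custom-list words matched."""
    source = RISK_SOURCES.get(risk_detail.get("riskSource"), "unknown")
    matched = {
        entry["name"]: [item["word"] for item in entry.get("words", [])]
        for entry in risk_detail.get("matchedLists", [])
    }
    return {"source": source, "matched": matched}
```

For instance, a riskDetail with riskSource 1001 and one matched list named `my_blocklist` would produce `{"source": "text risk", "matched": {"my_blocklist": [...]}}`.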
Risk Segments
| Parameter | Type | Required | Description |
|---|---|---|---|
segment | string | No | The high-risk content segment. |
position | array | No | Position of the segment in the source text (0-indexed). |
Segment businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
businessLabel1 | string | Yes | Level-1 business label. |
businessLabel2 | string | Yes | Level-2 business label. |
businessLabel3 | string | Yes | Level-3 business label. |
businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
confidenceLevel | int32 | No | Coarse confidence bucket in the range 0–2. Higher values indicate greater confidence. |
probability | float | No | Confidence score in the range 0–1. |
businessDetail | object | No | Detail backing the business label. Reserved for future use. |
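Since probability is optional, code that filters business labels by confidence should tolerate its absence. The following sketch keeps labels at or above a threshold; `confident_business_labels` is a hypothetical helper, and the 0.8 default is an arbitrary value chosen for illustration, not a recommended setting.

```python
def confident_business_labels(segment: dict, min_probability: float = 0.8) -> list:
    """Return (label1, label2, label3) tuples for sufficiently confident labels."""
    picked = []
    for label in segment.get("businessLabels", []):
        probability = label.get("probability")
        # Labels without a probability are skipped rather than assumed confident.
        if probability is not None and probability >= min_probability:
            picked.append((
                label["businessLabel1"],
                label["businessLabel2"],
                label["businessLabel3"],
            ))
    return picked
```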
audioTags Object
Legacy compatibility field. For new integrations, read the equivalent attributes from businessLabels inside audioDetail instead.
| Parameter | Type | Required | Description |
|---|---|---|---|
gender | object | No | Gender detection result. |
gender.label | string | Yes | Gender label name (for example, Male, Female). |
gender.probability | int32 | No | Gender probability in the range 0–100. Higher values indicate greater confidence. (Note: this legacy field uses a 0–100 scale, while modern probability fields use the 0–1 scale.) |
timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
language | array | No | Language detection results. Each element contains label (see Language Labels) and confidence. |
Language Labels
| Label | Description |
|---|---|
0 | Mandarin Chinese |
1 | English |
2 | Cantonese |
3 | Tibetan |
4 | Uyghur |
5 | Mongolian |
6 | Korean |
-1 | Other languages |
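For integrations that still read audioTags, the numeric language labels need to be mapped to names, and the legacy 0 to 100 gender probability may need rescaling to match the 0 to 1 fields used elsewhere. This sketch does both; `top_language` and `gender_probability` are hypothetical helpers, and picking the entry with the highest confidence is an assumption about how the list is meant to be read.

```python
LANGUAGE_LABELS = {
    0: "Mandarin Chinese",
    1: "English",
    2: "Cantonese",
    3: "Tibetan",
    4: "Uyghur",
    5: "Mongolian",
    6: "Korean",
    -1: "Other languages",
}


def top_language(audio_tags: dict):
    """Return the name of the language with the highest confidence, or None."""
    candidates = audio_tags.get("language") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item.get("confidence", 0))
    return LANGUAGE_LABELS.get(best["label"], "Unknown")


def gender_probability(audio_tags: dict):
    """Rescale the legacy 0-100 gender probability to the 0-1 range."""
    gender = audio_tags.get("gender")
    if not gender or "probability" not in gender:
        return None
    return gender["probability"] / 100
```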
Examples
Request Example
{
"accessKey": "YOUR_ACCESS_KEY",
"appId": "default",
"eventId": "default",
"type": "POLITY_EROTIC",
"businessType": "GENDER_TIMBRE",
"acceptLang": "en",
"btId": "test1",
"contentType": "URL",
"content": "https://example.com/audio/sample.mp3",
"data": {
"returnAllText": 1,
"tokenId": "token-short"
}
}
Response Example
{
"code": 1100,
"message": "Success",
"requestId": "817c8509359500c898a762ffe93a582b",
"btId": "1667392054643",
"detail": {
"audioDetail": [
{
"requestId": "817c8509359500c898a762ffe93a582b_a0000",
"audioStarttime": 0,
"audioEndtime": 10,
"audioUrl": "http://example.com/audio_segment_a0000.mp3",
"businessLabels": [
{
"businessDescription": "Singing: Singing: Singing",
"businessDetail": {},
"businessLabel1": "sing",
"businessLabel2": "changge",
"businessLabel3": "changge",
"confidenceLevel": 2,
"probability": 0.858334402569294
}
],
"allLabels": [],
"riskLevel": "REJECT",
"riskLabel1": "abuse",
"riskLabel2": "buwenmingyongyu",
"riskLabel3": "qingdubuwenmingyongyu",
"riskDescription": "Abuse: Uncivilized language: Mild uncivilized language",
"riskDetail": {
"audioText": "Recognized audio text content..."
}
}
],
"audioTags": {
"gender": {
"label": "Female",
"probability": 95
},
"language": [
{
"confidence": 0,
"label": 2
},
{
"confidence": 99,
"label": 0
},
{
"confidence": 0,
"label": 1
}
],
"song": 0,
"timbre": [
{
"label": "Female",
"probability": 95
},
{
"label": "Queen",
"probability": 12
},
{
"label": "Mature Woman",
"probability": 37
},
{
"label": "Young Woman",
"probability": 56
},
{
"label": "Middle-aged Woman",
"probability": 67
},
{
"label": "Loli",
"probability": 24
}
]
},
"audioText": "Recognized audio text content...",
"audioTime": 10,
"code": 1100,
"requestParams": {
"channel": "TEST",
"lang": "zh",
"returnAllText": 1,
"tokenId": "test01"
},
"riskLevel": "REJECT"
}
}