Submit audio for synchronous moderation and receive the results directly in the response.
API Description
Synchronous audio moderation API. Submits one audio clip per request and returns the moderation result directly in the response. Detects regulatory risks in audio content — including political content, pornography, advertising, and violence & terrorism — and can additionally identify minors, voice timbre, language, and other content based on the business scenarios enabled on your account.
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the URL you provide, so any outage or single point of failure in your hosting causes fetch failures, and a clip that can't be fetched can't be moderated.
Timeout Suggestion
Synchronous request: recommended timeout of 10 seconds
Asynchronous request: recommended timeout of 5 seconds
ℹ️
End-to-end response time is dominated by how long DeepCleer takes to fetch your audio — keep your hosting fast and reliable. Once the clip is in hand, processing time varies with the request type and the audio size.
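The timeout guidance can be applied with Python's standard library, as in the sketch below. It assumes the API accepts a JSON body over an HTTP POST. This page does not state the transport or the endpoint URL, so `endpoint` stands in for the URL from your onboarding materials. `build_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

# Recommended client-side timeouts listed above, in seconds.
SYNC_TIMEOUT_S = 10
ASYNC_TIMEOUT_S = 5


def build_request(endpoint: str, body: dict) -> urllib.request.Request:
    """Wrap a moderation request body in a JSON POST (assumed transport)."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = SYNC_TIMEOUT_S) -> dict:
    """Send the request and decode the JSON response.

    urllib applies `timeout` to each blocking socket operation (connect, reads),
    so it bounds stalls rather than total wall-clock time.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request(endpoint, body))` uses the 10-second synchronous default. If you need a hard cap on total elapsed time, you must enforce it separately, because the socket timeout resets on every successful read.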
Request
Request Parameters
Parameter
Type
Required
Max Length
Description
accessKey
string
Yes
—
API authentication key. The default accessKey is sent in your onboarding email.
appId
string
Yes
64
Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId.
eventId
string
Yes
64
Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId.
type
string
Conditional
64
Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog.
businessType
string
Conditional
128
Business detection types to run — your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog.
contentType
string
Yes
—
Format of the audio payload supplied in content. URL: a publicly fetchable audio URL. RAW: base64-encoded audio data.
content
string
Yes
—
Audio content: a URL when contentType is URL, or base64-encoded audio data when contentType is RAW. Base64 payloads are limited to 15 MB and must be PCM, WAV, or MP3. PCM must use 16-bit little-endian encoding. PCM and WAV are recommended.
btId
string
Yes
128
Client-side unique identifier for this audio clip. Echoed back in the response so you can correlate inputs and outputs. Truncated if it exceeds 128 characters; must not be reused across concurrent requests.
acceptLang
string
Yes
—
Language of the labels and descriptions in the response. The default is en (English).
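Here is a minimal Python sketch of assembling the top-level fields above. `build_body` is a hypothetical helper. The flat JSON layout is an assumption, because this page does not show a complete request body. The sketch enforces these documented rules:

* at least one of `type`/`businessType`
* exactly one content source
* the 128-character btId limit
* the 15 MB base64 limit, interpreted here as 15 × 1024 × 1024 base64 characters

```python
import base64
from typing import Optional

# Assumption: "15 MB" means 15 * 1024 * 1024 characters of base64 text.
MAX_BASE64_LEN = 15 * 1024 * 1024


def build_body(
    access_key: str,
    app_id: str,
    event_id: str,
    bt_id: str,
    *,
    audio_url: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
    risk_type: Optional[str] = None,
    business_type: Optional[str] = None,
    accept_lang: str = "en",
) -> dict:
    """Assemble the top-level request fields (hypothetical helper)."""
    if not (risk_type or business_type):
        raise ValueError("provide type, businessType, or both")
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")
    if len(bt_id) > 128:
        raise ValueError("btId longer than 128 characters would be truncated")

    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "btId": bt_id,
        "acceptLang": accept_lang,
    }
    if risk_type:
        body["type"] = risk_type
    if business_type:
        body["businessType"] = business_type

    if audio_url is not None:
        body["contentType"] = "URL"
        body["content"] = audio_url
    else:
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        if len(encoded) > MAX_BASE64_LEN:
            raise ValueError("base64 payload exceeds the 15 MB limit")
        body["contentType"] = "RAW"
        body["content"] = encoded
    return body
```

The sketch rejects an over-long btId instead of letting the API truncate it silently, so the echoed value always matches what you stored. RAW submissions also need formatInfo (and, for PCM, rate and track) in the data object, which this sketch leaves out.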
Risk Detection Types
Combine multiple values with underscores (for example, POLITY_EROTIC_MOAN). The recommended starting set for general moderation is POLITY_EROTIC_MOAN_ADVERT.
Value
Description
AUDIOPOLITICAL
Top-leader voiceprint detection
POLITY
Political content detection
EROTIC
Pornographic content detection
ADVERT
Advertising detection
BAN
Prohibited content detection
VIOLENT
Violence & terrorism detection
ANTHEN
National anthem detection
MOAN
Moaning detection
DIRTY
Abusive language detection
BANEDAUDIO
Prohibited songs detection
COPYRIGHTSONGS
Copyrighted songs detection
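Validating risk types against the table above before joining them catches typos early. For example, the catalog value is ANTHEN, not ANTHEM. This is a minimal sketch. `join_risk_types` is a hypothetical helper, and its catalog is copied from the table, so keep it in sync if DeepCleer adds types.

```python
# Catalog copied from the risk detection types table above.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_risk_types(*types: str) -> str:
    """Validate risk types and join them with underscores for the `type` field."""
    if not types:
        raise ValueError("no risk types given")
    unknown = [t for t in types if t not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(types))
```

For example, `join_risk_types("POLITY", "EROTIC", "MOAN", "ADVERT")` produces the recommended starting set, `POLITY_EROTIC_MOAN_ADVERT`.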
Business Detection Types
Combine multiple values with underscores. To detect TIMBRE, SING, or LANGUAGE, GENDER must also be included in the same request.
Value
Description
SING
Singing detection
LANGUAGE
Language detection
GENDER
Gender detection
TIMBRE
Voice timbre detection
VOICE
Voice attributes
MINOR
Minor detection
AUDIOSCENE
Audio scene detection
AGE
Age detection
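The GENDER dependency can be handled when the businessType string is built. In the sketch below, the hypothetical helper `join_business_types` appends GENDER automatically whenever TIMBRE, SING, or LANGUAGE is requested without it. Raising an error instead would be an equally valid design if you prefer explicit requests.

```python
# Catalog copied from the Business Detection Types table above.
BUSINESS_TYPES = {"SING", "LANGUAGE", "GENDER", "TIMBRE", "VOICE", "MINOR", "AUDIOSCENE", "AGE"}
# Types that only work when GENDER is in the same request.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_business_types(*types: str) -> str:
    """Validate business types, add GENDER when a dependent type needs it, and join."""
    if not types:
        raise ValueError("no business types given")
    unknown = [t for t in types if t not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business types: {unknown}")
    ordered = list(dict.fromkeys(types))  # de-duplicate, keep order
    if NEEDS_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.append("GENDER")
    return "_".join(ordered)
```
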
data Object
Parameter
Type
Required
Max Length
Description
tokenId
string
No
64
Stable identifier for the end user, typically your internal user UID (an encrypted UID is fine). Used for behavioral-risk signals such as spam and repeat-offender detection. Alphanumeric with underscores and hyphens, up to 64 characters.
receiveTokenId
string
Conditional
64
tokenId of the message recipient in a one-to-one chat. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message.
deviceId
string
No
128
Device-fingerprint identifier issued by DeepCleer. Generated by the DeepCleer SDK on the end user's device.
ip
string
No
64
Public IP address of the user who submitted the audio. Accepts IPv4 or IPv6.
dataId
string
No
128
Client-side identifier attached to the moderation call. DeepCleer echoes it back with the result, letting you correlate your source record (a message ID, voice-note ID, review ID, etc.) with the moderation verdict — typically used to look up historical decisions in your own database or in the DeepCleer console.
formatInfo
string
Conditional
—
Audio data format. Required when contentType is RAW. Accepted values: pcm, wav, mp3.
rate
int32
Conditional
—
Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000–32000.
track
int32
Conditional
—
Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo.
returnAllText
int32
No
—
Controls which audio segments are included in audioDetail. 0 (default): return only risk-bearing segments (REVIEW and REJECT). 1: return all segments, including PASS. This only affects segment-level results in audioDetail; it does not change the overall riskLevel.
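You can check the conditional rules in this table client-side before sending. This sketch assumes formatInfo, rate, and track travel inside the data object, since their rows sit in this table. `check_data` is a hypothetical helper. It returns a list of problems rather than raising, so all of them can be reported at once.

```python
import re

# Alphanumerics, underscores and hyphens, 1-64 characters (tokenId rules above).
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_data(event_id: str, content_type: str, data: dict) -> list:
    """Return human-readable problems with the data object; empty means OK."""
    problems = []
    for key in ("tokenId", "receiveTokenId"):
        value = data.get(key)
        if value is not None and not TOKEN_RE.fullmatch(value):
            problems.append(f"{key} must be 1-64 alphanumerics, underscores or hyphens")
    if event_id == "message" and not data.get("receiveTokenId"):
        problems.append("receiveTokenId is required when eventId is message")
    if content_type == "RAW":
        fmt = data.get("formatInfo")
        if fmt not in ("pcm", "wav", "mp3"):
            problems.append("formatInfo must be pcm, wav or mp3 when contentType is RAW")
        elif fmt == "pcm":
            rate = data.get("rate")
            if not isinstance(rate, int) or not 8000 <= rate <= 32000:
                problems.append("rate must be an integer from 8000 to 32000 for pcm")
            if data.get("track") not in (1, 2):
                problems.append("track must be 1 (mono) or 2 (stereo) for pcm")
    if data.get("returnAllText", 0) not in (0, 1):
        problems.append("returnAllText must be 0 or 1")
    return problems
```
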
User Levels
Value
Description
0
Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
1
Lower-level user (e.g., low activity or low-level users)
2
Mid-level user (e.g., moderately active or mid-level users)
3
Higher-level user (e.g., highly active or high-level users)
4
Highest-level user (e.g., paying users, VIP users)
Response
Response Parameters
ℹ️
Fields other than code, message, and requestId are guaranteed to be returned only when code is 1100.
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer request identifier. Storing it is strongly recommended for troubleshooting and optimization.
detail
object
No
Detailed moderation result. Returned when code is 1100. See detail Object.
Response Codes
Code
Description
1100
Success
1101
Processing
1901
QPS limit exceeded
1902
Invalid parameters
1903
Service failure
1904
Download failure
1905
Decoding failure
9100
Insufficient balance
9101
Unauthorized operation
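Response codes are easiest to handle through a single lookup. The mapping below is one possible client policy, not behavior prescribed by DeepCleer. For example, it treats 1904 (download failure) as a hosting or URL problem to investigate rather than something to retry blindly. `Action` and `action_for` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    USE_RESULT = "use the moderation result"
    RETRY_LATER = "retry after a delay"
    FIX_REQUEST = "fix the request; retrying unchanged will fail again"
    CHECK_AUDIO = "check the audio URL, hosting, or encoding"
    CHECK_ACCOUNT = "check balance or permissions with DeepCleer"


# One possible policy; the API does not prescribe client behavior per code.
CODE_ACTIONS = {
    1100: Action.USE_RESULT,     # Success
    1101: Action.RETRY_LATER,    # Processing
    1901: Action.RETRY_LATER,    # QPS limit exceeded: back off before retrying
    1902: Action.FIX_REQUEST,    # Invalid parameters
    1903: Action.RETRY_LATER,    # Service failure
    1904: Action.CHECK_AUDIO,    # Download failure
    1905: Action.CHECK_AUDIO,    # Decoding failure
    9100: Action.CHECK_ACCOUNT,  # Insufficient balance
    9101: Action.CHECK_ACCOUNT,  # Unauthorized operation
}


def action_for(response: dict) -> Action:
    """Map a response's code to a handling action; unknown codes raise."""
    code = response.get("code")
    if code not in CODE_ACTIONS:
        raise ValueError(f"unexpected code {code!r} (requestId={response.get('requestId')})")
    return CODE_ACTIONS[code]
```
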
detail Object
Parameter
Type
Required
Description
riskLevel
string
Yes
Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (manual review recommended). REJECT: violation (reject). During initial integration, do not wire results directly into automated blocking — tune your interception thresholds against historical traffic first.
audioTags
object
No
Audio attribute tags including gender, timbre, and language. Legacy compatibility field; for new integrations, read these signals from businessLabels inside audioDetail instead. See audioTags Object.
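The advice to avoid wiring riskLevel straight into automated blocking can be implemented as a "shadow mode" switch. Here is a minimal sketch with a hypothetical `route` helper and made-up decision names. With `enforce=False`, REJECT results are only logged, so you can tune thresholds against historical traffic first. REVIEW results still go to manual review, which is not automated blocking.

```python
RISK_LEVELS = ("PASS", "REVIEW", "REJECT")


def route(detail: dict, enforce: bool = False) -> str:
    """Turn riskLevel into a routing decision.

    With enforce=False (shadow mode for initial integration), REJECT is only
    logged, so verdicts can be compared with historical traffic before blocking.
    """
    level = detail.get("riskLevel")
    if level not in RISK_LEVELS:
        raise ValueError(f"unexpected riskLevel {level!r}")
    if level == "PASS":
        return "allow"
    if level == "REVIEW":
        return "manual_review"
    return "block" if enforce else "log_only"
```
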
audioDetail Array
Each element is the moderation result for one audio segment:
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer identifier for this audio segment.
audioStarttime
float
Yes
Segment start time relative to the beginning of the clip, in seconds.
audioEndtime
float
Yes
Segment end time relative to the beginning of the clip, in seconds.
riskLabel1
string
Yes
Level-1 risk label. Returns normal when riskLevel is PASS.
riskLabel2
string
Yes
Level-2 risk label. Empty when riskLevel is PASS.
riskLabel3
string
Yes
Level-3 risk label. Empty when riskLevel is PASS.
riskDescription
string
Yes
Risk description. Returns Normal when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". Human-readable summary intended for display only — do not parse for programmatic logic; branch on riskLabel1 / riskLabel2 / riskLabel3 instead.
Where the risk was identified. 1000: no risk. 1001: text risk (transcribed text). 1003: audio risk (acoustic content).
matchedLists
array
No
Custom-list match information. Returned only when a custom keyword list is hit. See Matched Lists.
riskSegments
array
No
High-risk content segments inside the transcription. See Risk Segments.
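With returnAllText set to 1, audioDetail also contains PASS segments, so a client usually filters them out before showing reviewers where the problems are. This sketch branches on riskLabel1 (documented as normal for PASS) rather than parsing riskDescription, which it carries only for display. `flagged_segments` is a hypothetical helper, and the label values in any sample data are invented.

```python
def flagged_segments(audio_detail: list) -> list:
    """Return (start_s, end_s, riskLabel1, riskDescription) for non-PASS segments."""
    flagged = []
    for seg in audio_detail or []:
        # PASS segments carry riskLabel1 == "normal"; skip them.
        if seg.get("riskLabel1") == "normal":
            continue
        flagged.append(
            (seg["audioStarttime"], seg["audioEndtime"], seg["riskLabel1"], seg["riskDescription"])
        )
    # Order by position in the clip.
    return sorted(flagged, key=lambda t: t[0])
```
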
Matched Lists
Parameter
Type
Required
Description
name
string
Yes
Name of the matched custom list.
words
array
Yes
Sensitive-word matches within the list.
words[].word
string
Yes
The matched sensitive word.
words[].position
array
Yes
Position of the matched word.
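Because matchedLists is returned only when a custom list is hit, client code should tolerate its absence. The sketch below groups matched words by list name and ignores `position`, whose exact format this page does not specify. `matched_words` is a hypothetical helper, and the list names and words in the test data are invented.

```python
def matched_words(matched_lists) -> dict:
    """Group matched words by custom-list name; a missing matchedLists yields {}."""
    grouped = {}
    for entry in matched_lists or []:
        words = [w["word"] for w in entry.get("words", [])]
        grouped.setdefault(entry["name"], []).extend(words)
    return grouped
```
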
Risk Segments
Parameter
Type
Required
Description
segment
string
No
The high-risk content segment.
position
array
No
Position of the segment in the source text (0-indexed).
Segment businessLabels
Each element in the businessLabels array:
Parameter
Type
Required
Description
businessLabel1
string
Yes
Level-1 business label.
businessLabel2
string
Yes
Level-2 business label.
businessLabel3
string
Yes
Level-3 business label.
businessDescription
string
Yes
Business label description. Format: "Level 1: Level 2: Level 3".
confidenceLevel
int32
No
Coarse confidence bucket in the range 0–2. Higher values indicate greater confidence.
probability
float
No
Confidence score in the range 0–1.
businessDetail
object
No
Detail backing the business label. Reserved for future use.
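When several business labels come back for the same attribute, a common pattern is to keep the most probable one. This sketch filters by businessLabel1 and a minimum probability on the 0–1 scale. Because probability is optional, a missing value counts as 0.0 here, which is an assumption. `best_label` is a hypothetical helper, and the label values in the test data are invented.

```python
from typing import Optional


def best_label(labels: list, label1: Optional[str] = None, min_probability: float = 0.0) -> Optional[dict]:
    """Pick the businessLabels entry with the highest probability.

    Optionally restrict to one businessLabel1 and drop entries below
    min_probability (0-1 scale). A missing probability counts as 0.0.
    """
    candidates = [
        lab for lab in labels or []
        if (label1 is None or lab.get("businessLabel1") == label1)
        and lab.get("probability", 0.0) >= min_probability
    ]
    return max(candidates, key=lambda lab: lab.get("probability", 0.0), default=None)
```
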
audioTags Object
⚠️
Legacy compatibility field. For new integrations, read the equivalent attributes from businessLabels inside audioDetail instead.
Parameter
Type
Required
Description
gender
object
No
Gender detection result.
gender.label
string
Yes
Gender label name (for example, Male, Female).
gender.probability
int32
No
Gender probability in the range 0–100. Higher values indicate greater confidence. (Note: this legacy field uses a 0–100 scale, while modern probability fields use the 0–1 scale.)
timbre
array
No
Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice.
language
array
No
Language detection results. Each element contains label (see Language Labels) and confidence.
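During a migration from audioTags to businessLabels, the main pitfall is the scale mismatch noted for gender.probability. The sketch below rescales the legacy 0–100 value to 0–1 so it can be compared with modern probability fields. `legacy_gender` is a hypothetical helper.

```python
from typing import Optional, Tuple


def legacy_gender(audio_tags: Optional[dict]) -> Optional[Tuple[str, Optional[float]]]:
    """Read audioTags.gender and rescale its 0-100 probability to 0-1."""
    gender = (audio_tags or {}).get("gender")
    if not gender:
        return None
    prob = gender.get("probability")
    # Divide by 100 so the value is comparable with the 0-1 probability in businessLabels.
    return gender["label"], (prob / 100 if prob is not None else None)
```
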