Use this API to moderate a specified audio stream.
DeepCleer Audio Stream Moderation pulls a live audio source from a public URL or a supported RTC provider, slices it into segments, and continuously pushes per-segment moderation results to your callback URL.
API Description
The Audio Stream Moderation API detects risks such as political sensitivity, pornography, advertising, terrorism, abuse, prohibited songs, and copyrighted songs in live or recorded audio streams. It can also identify business attributes such as gender, age, timbre, language, audio scene, singing, minors, and human-voice authenticity to support your business scenarios.
Submit a stream URL or RTC pull configuration once; DeepCleer maintains the pull, segments the audio, and continuously delivers moderation results to your callback URL until the stream ends or you stop the task.
Requirements
Item
Specification
Protocol
HTTP or HTTPS
Method
POST
Encoding
UTF-8
Format
All request and response parameters use JSON
Stream Pull Retry Mechanism
To guard against transient network failures, DeepCleer automatically retries failed stream pulls. The retry policy varies by stream type:
Stream Source
Retry Count
Retry Interval
Standard rtmp / http / hls
12
5s, 10s, 15s, … (incrementing by 5s, capped at 60s)
Agora SDK recording
2
0s (immediate retries)
Zego SDK recording
10
30s between each retry
If all retries fail, the task is closed and, when returnFinishInfo is 1, a stream-end callback is delivered with an auxInfo.errorCode describing the failure.
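To estimate how long a failed pull can take before it surfaces as a stream-end callback, the retry table can be expressed as per-retry delays. The sketch below uses the category names `standard`, `agora`, and `zego` as hypothetical labels (they are not API values) and counts only the waiting time between attempts, not the time each pull attempt itself takes.

```python
def retry_delays(stream_source: str) -> list:
    """Seconds waited before each retry, per the Stream Pull Retry Mechanism table."""
    if stream_source == "standard":  # rtmp / http / hls
        # 5s, 10s, 15s, ... incrementing by 5s, capped at 60s, 12 retries
        return [min(5 * n, 60) for n in range(1, 13)]
    if stream_source == "agora":  # Agora SDK recording: 2 immediate retries
        return [0, 0]
    if stream_source == "zego":  # Zego SDK recording: 10 retries, 30s apart
        return [30] * 10
    raise ValueError(f"unknown stream source: {stream_source}")


if __name__ == "__main__":
    for source in ("standard", "agora", "zego"):
        delays = retry_delays(source)
        print(f"{source}: {len(delays)} retries, {sum(delays)}s of waiting")
```

With 12 retries the 5-second increments reach exactly 60 seconds on the last retry, so the cap does not shorten the standard schedule: its waits add up to 390 seconds, versus 300 seconds for Zego and none for Agora.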
Timeout Suggestion
Recommended request timeout: 5 seconds.
ℹ️
This timeout applies to the synchronous acknowledgement only. Moderation results are delivered asynchronously through your callback URL once the stream pull has stabilized.
Callback Mechanism
When DeepCleer pushes a result to your callback URL and your endpoint responds with HTTP 200, the delivery is considered successful. If any other status code is returned (or the request fails), the system retries up to 20 times.
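Because any non-200 answer triggers redelivery, a callback receiver should acknowledge quickly and defer slow work. The following is a minimal sketch using only the Python standard library; the in-memory `RECEIVED` list, the 400 status for non-JSON bodies, and the default port are illustrative assumptions, not DeepCleer requirements.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory queue; a real receiver would use durable storage.
RECEIVED: list = []


def handle_callback(body: bytes) -> int:
    """Parse one callback body and return the HTTP status to answer with.

    200 acknowledges delivery. Any other status (or a failed request) makes
    DeepCleer redeliver, up to 20 times, so this path should stay fast.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400  # not JSON: reject (this also triggers redelivery)
    RECEIVED.append(payload)  # hand heavy processing to a background worker
    return 200


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status = handle_callback(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (8080 is an arbitrary example port)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Call `serve()` to start listening. Since the same payload can be delivered more than once, keeping downstream processing idempotent, for example keyed on the per-segment `requestId`, is a reasonable precaution.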
API authentication key. The default accessKey is sent in your onboarding email.
appId
string
Yes
64
Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId.
eventId
string
Yes
64
Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId.
type
string
Conditional
—
Risk detection types. Either type or businessType (or both) must be provided. See Detection Types. Combine multiple types with underscores, e.g. POLITY_EROTIC_MOAN.
businessType
string
Conditional
—
Business detection labels. Either type or businessType (or both) must be provided. See Business Detection Types. Combine multiple types with underscores. When detecting timbre, singing, or language, GENDER must be included.
Business Detection Types
Values for the businessType field. Combine multiple values with underscores (e.g. GENDER_TIMBRE_SING_LANGUAGE).
Value
Description
GENDER
Speaker gender
AGE
Speaker age
TIMBRE
Speaker timbre
SING
Singing detection
LANGUAGE
Language identification
VOICE
Human-voice attribute
AUDIOSCENE
Audio scene
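The businessType combination rules can be checked on the client before submitting. This sketch joins values with underscores and, as one possible policy, prepends GENDER when TIMBRE, SING, or LANGUAGE is requested without it; rejecting such a request instead would be equally valid. The helper name is hypothetical.

```python
BUSINESS_TYPES = {"GENDER", "AGE", "TIMBRE", "SING", "LANGUAGE", "VOICE", "AUDIOSCENE"}
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}  # documented to require GENDER


def build_business_type(labels: list) -> str:
    """Return an underscore-joined businessType string."""
    unknown = set(labels) - BUSINESS_TYPES
    if unknown:
        raise ValueError(f"unsupported businessType values: {sorted(unknown)}")
    ordered = list(dict.fromkeys(labels))  # drop duplicates, keep caller order
    if NEEDS_GENDER & set(ordered) and "GENDER" not in ordered:
        ordered.insert(0, "GENDER")  # add the required GENDER label up front
    return "_".join(ordered)
```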
data Object Parameters
Parameter
Type
Required
Max Length
Description
tokenId
string
Yes
64
User account identifier. Recommended to pass the user ID for behavioral risk detection.
btId
string
Yes
128
Unique audio identifier used to query a specific stream.
streamType
string
Yes
—
Stream source type. See Stream Types. When using an RTC SDK recording option (Agora, Zego, TRTC, Volc, Giants, Aliyun, NetEase Yunxin), additional recording fees may be charged on the RTC provider's side — please consult the provider for details.
url
string
Conditional
—
Live stream URL. Required when streamType is NORMAL.
zegoParam
object
Conditional
—
Zego pull configuration. Required when streamType is ZEGO. See zegoParam Object.
agoraParam
object
Conditional
—
Agora pull configuration. Required when streamType is AGORA. See agoraParam Object.
trtcParam
object
Conditional
—
TRTC pull configuration. Required when streamType is TRTC. See trtcParam Object.
volcParam
object
Conditional
—
Volcengine pull configuration. Required when streamType is VOLC. See volcParam Object.
ginParam
object
Conditional
—
Giants pull configuration. Required when streamType is GIN. See ginParam Object.
aliParam
object
Conditional
—
Aliyun pull configuration. Required when streamType is ALI. See aliParam Object.
yunxinParam
object
Conditional
—
NetEase Yunxin pull configuration. Required when streamType is YUNXIN. See yunxinParam Object.
returnPreText
int32
No
—
Whether to return the transcribed text of the segment immediately preceding a violating segment. 0 (default): do not return. 1: return.
returnPreAudio
int32
No
—
Whether to return the audio URL of the segment immediately preceding a violating segment. 0 (default): do not return. 1: return a 20-second clip combining the preceding and current segments.
returnFinishInfo
int32
No
—
Whether to push a stream-end callback. 0 (default): no end callback. 1: send an end callback with statCode set. Recommended: 1 — without it no callback will be produced when the stream ends.
returnAllText
int32
No
—
Callback granularity. 0 (default): only push when violations are detected. 1: push the latest 10-second result every 10 seconds regardless of risk level. Recommended: 1 — without it no callback will be produced during silent or risk-free periods.
extra
object
No
—
Auxiliary parameters.
passThrough
object
No
—
Client pass-through field. DeepCleer does not process this field; it is returned as-is in the callback.
liveTitle
string
No
—
Room title (used when human review is enabled).
anchorName
string
No
—
User nickname (used when human review is enabled).
audioDetectStep
int32
No
—
Segment-sampling step. Range 1–36; if omitted, every segment is reviewed. A value of N reviews one segment and then skips the next N: 1 reviews odd-numbered segments only, 2 reviews one of every three segments, and so on.
receiveTokenId
string
Conditional
64
Message receiver's tokenId. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message.
deviceId
string
No
128
DeepCleer device fingerprint identifier, generated by the DeepCleer SDK for user behavior analysis.
ip
string
No
64
Client public IP address (IPv4 or IPv6) for IP-based user behavior analysis.
level
int32
No
—
User level for configuring different interception strategies. See User Levels.
gender
int32
No
—
User gender. 0: unknown. 1: male. 2: female.
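The audioDetectStep examples follow one pattern: a step of N reviews one segment and skips the next N. The sketch below lists which segments would be reviewed, assuming segments are numbered from 1 and that the first segment is always reviewed; `None` stands for omitting the parameter. The function name is hypothetical.

```python
from typing import List, Optional


def reviewed_segments(total_segments: int, audio_detect_step: Optional[int] = None) -> List[int]:
    """1-based indices of the segments that would be moderated."""
    if audio_detect_step is None:  # parameter omitted: review every segment
        return list(range(1, total_segments + 1))
    if not 1 <= audio_detect_step <= 36:
        raise ValueError("audioDetectStep must be in the range 1-36")
    # Review one segment, then skip the next `audio_detect_step` segments.
    return list(range(1, total_segments + 1, audio_detect_step + 1))
```

For example, `reviewed_segments(7, 2)` yields `[1, 4, 7]`, which matches the documented "one of every three segments".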
Stream Types
Values for the streamType field.
Value
Description
NORMAL
Standard public URL pull. Supports rtmp, rtmps, hls, http, https protocols and flv, m3u8 and similar formats.
ZEGO
Zego SDK recording
AGORA
Agora SDK recording
TRTC
Tencent TRTC recording
VOLC
Volcengine recording
GIN
Giants recording
ALI
Aliyun recording
YUNXIN
NetEase Yunxin recording
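Each streamType pairs with exactly one source field in the data object. Below is a sketch of assembling that object, assuming the caller already has a tokenId and btId and following the recommendation to set returnFinishInfo and returnAllText to 1. Only the data fields documented above are used; the top-level request fields (accessKey, appId, eventId, type, businessType, and the callback address) are deliberately left out, and `build_data` is a hypothetical helper.

```python
from __future__ import annotations

# Source field required for each streamType, per the data object table.
STREAM_PARAM_FIELD = {
    "NORMAL": "url",
    "ZEGO": "zegoParam",
    "AGORA": "agoraParam",
    "TRTC": "trtcParam",
    "VOLC": "volcParam",
    "GIN": "ginParam",
    "ALI": "aliParam",
    "YUNXIN": "yunxinParam",
}


def build_data(token_id: str, bt_id: str, stream_type: str, source: str | dict) -> dict:
    """Assemble the data object.

    `source` is the stream URL for NORMAL, otherwise the provider's param object.
    """
    param_key = STREAM_PARAM_FIELD.get(stream_type)
    if param_key is None:
        raise ValueError(f"unknown streamType: {stream_type}")
    return {
        "tokenId": token_id,
        "btId": bt_id,
        "streamType": stream_type,
        param_key: source,
        "returnFinishInfo": 1,  # recommended: send a stream-end callback
        "returnAllText": 1,  # recommended: push a result every 10 seconds
    }
```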
Supported Languages
Values for the lang field. Default: zh.
Value
Language
zh
Chinese
en
English
ar
Arabic
hi
Hindi
es
Spanish
fr
French
ru
Russian
pt
Portuguese
id
Indonesian
de
German
ja
Japanese
tr
Turkish
vi
Vietnamese
it
Italian
th
Thai
tl
Filipino
ko
Korean
ms
Malay
auto
Automatic language detection (contact DeepCleer to enable)
User Levels
Value
Description
0
Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
1
Lower-level user (e.g., low activity or low-level users)
2
Mid-level user (e.g., moderately active or mid-level users)
3
Higher-level user (e.g., highly active or high-level users)
4
Highest-level user (e.g., paying users, VIP users)
zegoParam Object
Parameter
Type
Required
Description
tokenId
string
Yes
Zego identify_token used for login. See the Zego documentation. Each request must regenerate this token; it uniquely identifies the moderation request.
streamId
string
Conditional
Stream identifier (uniquely maps to one audio stream). At least one of streamId or roomId must be provided.
roomId
string
Conditional
Room identifier (uniquely maps to one room). At least one of streamId or roomId must be provided.
isMixingEnabled
boolean
No
Recording mode. true: mixed stream — all users in the room are merged into a single recorded stream. When both streamId and roomId are provided, streamId takes precedence. false: separated streams — each user is recorded individually. In this case roomId is required and streamId must not be provided.
initDomain
int32
Conditional
Required when the Zego client init uses an isolation domain or random userId. Values: 0 default version; 1 isolation domain only; 2 isolation domain + random userId; 3 SDK update with bug fixes; 4 custom SEI; 5 VAD silence detection (token uniqueness check, must regenerate per request); 6 per-stream submission control in room-scoped pull mode. Recommended: 6. Default: 0.
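The zegoParam constraints interact: at least one of streamId or roomId is always needed, and separated-stream mode narrows that to roomId only. A client-side check might look like the following hypothetical helper. It returns a list of problems rather than raising, and it applies the separated-stream rule only when isMixingEnabled is explicitly false, because the Zego default is not stated above.

```python
def validate_zego_param(p: dict) -> list:
    """Return human-readable problems with a zegoParam object (empty if none)."""
    errors = []
    if not p.get("tokenId"):
        errors.append("tokenId is required (regenerate it for every request)")
    has_stream, has_room = bool(p.get("streamId")), bool(p.get("roomId"))
    if not (has_stream or has_room):
        errors.append("provide at least one of streamId or roomId")
    if p.get("isMixingEnabled") is False:  # separated streams
        if not has_room:
            errors.append("separated streams require roomId")
        if has_stream:
            errors.append("separated streams must not include streamId")
    if p.get("initDomain", 0) not in range(0, 7):
        errors.append("initDomain must be one of 0-6")
    return errors
```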
agoraParam Object
Parameter
Type
Required
Description
appId
string
Yes
Agora-issued appId. Distinct from the DeepCleer appId.
channel
string
Yes
Agora channel name.
token
string
No
Optional Agora token for higher-security accounts. See the Agora documentation. Set the validity period longer than the channel duration to avoid expiry. Maximum Agora token validity is 24 hours; for longer channels enable returnFinishInfo: 1 and watch for auxInfo.errorCode = 3005 in the stream-end callback to know when to refresh the token.
uid
int32
Conditional
Unsigned 32-bit user ID. Required when token is provided and must match the uid used to generate the token. Must be different from any uid actually present in the room.
isMixingEnabled
boolean
No
true (default): mixed stream — one stream per room. false: separated streams — one stream per microphone slot.
channelProfile
int32
No
Channel profile. 0 (default): Communication (1-on-1 or group, all users may speak). 1: Live broadcast (host / audience roles).
subscribeMode
string
No
Subscription mode. AUTO (default): subscribe to all streams in the room. UNTRUSTED: with untrustedUserIdList, subscribe only to the listed users (separated streams only). TRUSTED: with trustedUserIdList, subscribe only to users not in the list (separated streams only).
trustedUserIdList
array
No
Trusted user list. Active when subscribeMode = TRUSTED. May be empty. Each element is a stringified uint32 value, e.g. ["123","456"].
untrustedUserIdList
array
No
Untrusted user list. Active when subscribeMode = UNTRUSTED. Must be non-empty. Each element is a stringified uint32 value, e.g. ["123","456"].
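For agoraParam, the uid and subscription fields carry the most rules. This hypothetical helper checks them before submission. It assumes AUTO and a mixed stream when subscribeMode and isMixingEnabled are omitted, as the table states, and it cannot verify that uid differs from users already present in the channel.

```python
UINT32_MAX = 2**32 - 1


def _is_uint32_string(value) -> bool:
    """True for strings such as "123" that encode an unsigned 32-bit integer."""
    return isinstance(value, str) and value.isascii() and value.isdigit() and int(value) <= UINT32_MAX


def validate_agora_param(p: dict) -> list:
    errors = []
    for key in ("appId", "channel"):
        if not p.get(key):
            errors.append(f"{key} is required")
    uid = p.get("uid")
    if p.get("token") and uid is None:
        errors.append("uid is required when token is provided")
    if uid is not None and not (isinstance(uid, int) and 0 <= uid <= UINT32_MAX):
        errors.append("uid must be an unsigned 32-bit integer")
    mode = p.get("subscribeMode", "AUTO")
    if mode not in ("AUTO", "TRUSTED", "UNTRUSTED"):
        errors.append(f"unknown subscribeMode: {mode}")
    if mode != "AUTO" and p.get("isMixingEnabled", True):
        errors.append(f"{mode} requires separated streams (isMixingEnabled=false)")
    if mode == "UNTRUSTED" and not p.get("untrustedUserIdList"):
        errors.append("untrustedUserIdList must be non-empty in UNTRUSTED mode")
    for key in ("trustedUserIdList", "untrustedUserIdList"):
        if not all(_is_uint32_string(v) for v in p.get(key, [])):
            errors.append(f"{key} elements must be stringified uint32 values")
    return errors
```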
trtcParam Object
Parameter
Type
Required
Description
sdkAppId
int32
Yes
Tencent-issued sdkAppId.
demoSences
int32
Yes
Recording type. 2: separated stream recording. 4: mixed stream recording. (Note: the source spelling demoSences is preserved for wire compatibility; this appears to be a typo of demoScenes and is a candidate for v5 cleanup.)
userId
string
Yes
Recording-side userId, up to 32 bytes. Allowed characters: a-z, A-Z, 0-9, underscore, hyphen.
userSig
string
Yes
Verification signature for the recording userId (functions as the login password).
roomId
int32
Conditional
Numeric room ID (range 1–4294967294). One of roomId or strRoomId must be provided. When both are present, roomId takes precedence.
strRoomId
string
Conditional
String room ID (allowed characters: a-z, A-Z, 0-9, underscore, hyphen). One of roomId or strRoomId must be provided. When both are present, roomId takes precedence.
uid
string
No
Specific user ID to moderate. If omitted, all publishing users in the room are pulled and moderated. To moderate a subset of users, submit multiple requests with different recording-side userId / userSig. Distinct from the recording userId.
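TRTC accepts either a numeric or a string room ID, with roomId taking precedence when both are sent. The sketch below applies that precedence and the character rules. The 32-character limit on userId assumes one byte per allowed ASCII character, and both helper names are hypothetical.

```python
import re

_TRTC_ID = re.compile(r"[A-Za-z0-9_-]+")


def resolve_trtc_room(p: dict):
    """Return the room identifier that will be used, applying roomId precedence."""
    if p.get("roomId") is not None:
        room_id = p["roomId"]
        if not 1 <= room_id <= 4294967294:
            raise ValueError("roomId must be in the range 1-4294967294")
        return room_id  # numeric roomId wins when both are present
    str_room_id = p.get("strRoomId")
    if str_room_id:
        if not _TRTC_ID.fullmatch(str_room_id):
            raise ValueError("strRoomId may only contain a-z, A-Z, 0-9, _ and -")
        return str_room_id
    raise ValueError("one of roomId or strRoomId is required")


def validate_trtc_param(p: dict) -> list:
    errors = []
    if p.get("demoSences") not in (2, 4):  # wire spelling preserved
        errors.append("demoSences must be 2 (separated) or 4 (mixed)")
    user_id = p.get("userId") or ""
    if not (_TRTC_ID.fullmatch(user_id) and len(user_id) <= 32):
        errors.append("userId must be 1-32 characters of a-z, A-Z, 0-9, _ or -")
    for key in ("sdkAppId", "userSig"):
        if not p.get(key):
            errors.append(f"{key} is required")
    try:
        resolve_trtc_room(p)
    except ValueError as exc:
        errors.append(str(exc))
    return errors
```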
volcParam Object
Parameter
Type
Required
Description
appId
string
Yes
Volcengine-issued appId. Distinct from the DeepCleer appId.
subscribeMode
string
No
Subscription mode. AUTO (default): subscribe to all streams in the room. UNTRUSTED: with untrustedUserIdList, subscribe only to the listed users; the list must be non-empty or the request fails. TRUSTED: with trustedUserIdList, subscribe only to users not in the list; if no qualifying user joins within a grace period, DeepCleer ends the moderation.
trustedUserIdList
array
No
Trusted user list. Active when subscribeMode = TRUSTED. May be empty.
untrustedUserIdList
array
No
Untrusted user list. Active when subscribeMode = UNTRUSTED. Must be non-empty.
ginParam Object
Parameter
Type
Required
Description
tokenId
string
Yes
Room token used by the pull endpoint to log in to the room. Provided by Giants.
roomId
string
Yes
Room number (uniquely maps to one room). The server pulls and records on a per-room basis.
isMixingEnabled
boolean
No
true (default): mixed stream — all users in the room merged into one stream. false: separated streams — each user recorded individually.
ip
string
Yes
Designated server IP address.
port
string
Yes
Designated port.
aliParam Object
Parameter
Type
Required
Description
token
string
Yes
Authentication token used by the pull endpoint to join the channel. See the Aliyun documentation. A new token must be generated for every moderation request.
room
string
Yes
Room ID. Non-empty, must exactly match the channelID used to generate the token. The server pulls and records on a per-room basis. The same room will not trigger duplicate pulls.
userId
string
Yes
Pull-bot user ID. Must exactly match the userId used to generate the token. Non-empty.
isMixingEnabled
boolean
No
true (default): mixed stream — all users in the room merged into one stream. false: separated streams — each user recorded individually.
yunxinParam Object
Parameter
Type
Required
Description
token
string
Yes
Authentication token used by the pull endpoint to join the channel. See the NetEase Yunxin documentation. A new token must be generated for every moderation request.
cname
string
Yes
Channel name. Non-empty, must exactly match the cname used to generate the token.
uid
int32
Yes
Pull-bot uid. Must exactly match the uid used to generate the token.
appKey
string
Yes
App key issued by NetEase Yunxin.
Response
The synchronous response is an acknowledgement only — it confirms that DeepCleer has accepted the moderation task. Per-segment moderation results are delivered asynchronously through the callback URL you provided.
Response Parameters
ℹ️
Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
Detailed information for error scenarios. See errorCode and dupRequestId below.
errorCode
int32
No
Detailed status code. 1001: duplicate stream submission.
dupRequestId
string
No
Returned when errorCode is 1001 (duplicate submission). If the response to the original request was lost after the stream had already entered moderation, the caller does not know the original requestId. Resubmit the same stream and use the returned dupRequestId to call the close-moderation endpoint.
Response Codes
Code
Message
1100
Success
1901
QPS or stream-count limit exceeded
1902
Invalid parameters
1903
Service failure
1904
Stream pull failure
9101
Unauthorized operation
⚠️
The synchronous response field is named errorCode (camelCase). This differs from the Video Stream Moderation sync response, which uses lowercase errorcode. Both are documented as-returned and are candidates for casing alignment in the v5 cleanup.
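A sketch of interpreting the synchronous acknowledgement, including recovery of the original task ID after a lost response. The name of the object that carries errorCode and dupRequestId is not shown in this section, so `detail_key` defaults to a hypothetical `detail` and should be checked against a real response. The duplicate case is checked first so it is honored whatever the top-level code.

```python
RESPONSE_MESSAGES = {
    1100: "Success",
    1901: "QPS or stream-count limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Stream pull failure",
    9101: "Unauthorized operation",
}


def task_request_id(ack: dict, detail_key: str = "detail") -> str:
    """Return the requestId to use for later close-moderation calls."""
    detail = ack.get(detail_key) or {}
    if detail.get("errorCode") == 1001 and detail.get("dupRequestId"):
        # Duplicate submission: the stream is already being moderated under
        # the original request, which dupRequestId identifies.
        return detail["dupRequestId"]
    code = ack.get("code")
    if code != 1100:
        message = RESPONSE_MESSAGES.get(code, "Unknown response code")
        raise RuntimeError(f"submission rejected: {code} {message}")
    return ack["requestId"]
```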
Callback Mechanism
Once the stream pull stabilizes, DeepCleer continuously pushes per-segment moderation results to your callback URL. The push cadence depends on returnAllText:
returnAllText = 0: a callback is sent only when a segment is found to contain a violation.
returnAllText = 1: a callback is sent every 10 seconds covering the most recent 10-second segment, regardless of risk level.
Payloads are delivered as JSON in the HTTP request body.
Callback Parameters
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer identifier for this stream segment.
btId
string
Yes
Client-side audio identifier (echoed from the request).
code
int32
Yes
Response code. 1100: success. Other codes match the synchronous response. Fields other than message and requestId are only present when code is 1100.
message
string
Yes
Response message corresponding to the code.
statCode
int32
No
Moderation lifecycle status. 0: in progress (regular per-segment result). 1: moderation finished (stream-end callback). Only present when returnFinishInfo is 1. Note: the semantics here differ from the Video Stream Moderation API, where statCode 0 means a regular result and 1 means a stream-end callback at the same parameter location. Treat the two APIs as having distinct callback lifecycles even if the field name matches.
requestParams
object
Yes
Echo of the original request parameters.
audioDetail
object
No
Per-segment audio moderation result. Returned when code is 1100 and statCode is 0. See audioDetail Object.
auxInfo
object
No
Stream-end auxiliary information. Returned when statCode is 1. See Stream-End auxInfo.
audioDetail Object
Parameter
Type
Required
Description
riskLabel1
string
Yes
Level 1 risk label. Returns normal when riskLevel is PASS.
riskLabel2
string
Yes
Level 2 risk label. Empty when riskLevel is PASS.
riskLabel3
string
Yes
Level 3 risk label. Empty when riskLevel is PASS.
riskDescription
string
Yes
Risk description. Returns "Normal" when riskLevel is PASS. Hits against custom lists return "Matched custom list". Otherwise format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
audioText
string
No
Transcribed text of the segment. When returnPreText is 1, contains both the preceding and current segment text; when 0, contains the current segment text only.
preAudioUrl
string
No
URL of a 20-second clip combining the preceding and current audio segments. Returned only when returnPreAudio is 1.
riskDetail
object
No
Detailed risk information. Returned when code is 1100. See riskDetail Object.
auxInfo
object
Yes
Auxiliary information for this segment. See Segment auxInfo.
businessLabels
array
No
Business labels for this audio segment (gender, timbre, singing, etc.). See businessLabels Array.
allLabels
array
No
All risk labels detected for this segment. See allLabels Array.
tokenProfileLabels
array
No
Account attribute labels. Returned only when the labeling service is enabled. See Token Labels.
tokenRiskLabels
array
No
Account risk labels. Returned only when the labeling service is enabled. See Token Labels.
speakers
array
No
Per-second speaker activity within this segment. See speakers Array. Currently only present in Agora mixed streams.
vadCode
int32
No
Silence flag for this segment. 0: silent segment. 1: non-silent segment.
audioTags
object
No
Legacy audio attribute labels (gender, timbre, language, singing). See audioTags Object. For new integrations prefer businessLabels.
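Regular segment results and the stream-end notice arrive at the same callback URL, so a receiver typically branches on code and statCode first. This hypothetical helper does that. It treats any riskLevel other than PASS as risky because the other riskLevel values are not listed in this section, and it assumes the level 1 label is named riskLabel1 by analogy with riskLabel2 and riskLabel3.

```python
def summarize_callback(cb: dict) -> str:
    """Classify one callback payload into a short human-readable summary."""
    if cb.get("code") != 1100:
        # Only message and requestId are guaranteed for non-success codes.
        return f"error {cb.get('code')}: {cb.get('message')}"
    if cb.get("statCode") == 1:  # stream-end callback (returnFinishInfo = 1)
        error_code = (cb.get("auxInfo") or {}).get("errorCode")
        return f"stream ended (errorCode={error_code})"
    detail = cb.get("audioDetail") or {}
    if detail.get("riskLevel", "PASS") == "PASS":
        return "segment passed"
    return f"risk: {detail.get('riskLabel1')}/{detail.get('riskLabel2')} in {detail.get('audioText', '')!r}"
```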
riskDetail Object
Parameter
Type
Required
Description
riskSource
int32
Yes
Risk source. 1000: no risk. 1001: text risk. 1003: audio risk.
audioText
string
No
Transcribed text used during moderation.
matchedLists
array
No
Matched custom list information. Returned only when a custom list is hit. See Matched Lists.
riskSegments
array
No
High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising-law content is detected. See Risk Segments.
Matched Lists
Parameter
Type
Required
Description
name
string
Yes
Name of the matched list.
words
array
Yes
Sensitive word details.
words[].word
string
Yes
The matched sensitive word.
words[].position
array
Yes
Position of the sensitive word (0-indexed).
Risk Segments
Parameter
Type
Required
Description
segment
string
No
High-risk content segment.
position
array
No
Position of the segment within the transcript (0-indexed).
Segment auxInfo
Parameter
Type
Required
Description
audioStartTime
string
Yes
Absolute start time of the violating content within the stream. (Note: this field uses uppercase T here. The standalone Audio Sync/Async/Query APIs return the same conceptual field as audioStarttime (lowercase t). The on-the-wire casing is preserved as-returned and is a candidate for v5 alignment.)
audioEndTime
string
Yes
Absolute end time of the violating content within the stream. (Same casing-inconsistency note as audioStartTime.)
beginProcessTime
int64
Yes
Processing start time. 13-digit Unix timestamp in milliseconds (UTC).
finishProcessTime
int64
Yes
Processing finish time. 13-digit Unix timestamp in milliseconds (UTC).
userId
int32
No
In-room user ID for the speaker. Present only for Agora separated streams. Distinct from the agoraParam.uid used for token generation.
strUserId
string
No
In-room user ID for the speaker. Present for separated streams of ALI, TRTC, ZEGO, VOLC, and GIN. Distinct from trtcParam.uid (TRTC separated) and aliParam.userId (Aliyun separated).
room
string
No
Room number.
seiInfo
array
No
SEI information. Contact DeepCleer to enable.
passThrough
object
No
Pass-through field. Same value as data.extra.passThrough from the request.
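Because audioStartTime is cased differently from audioStarttime in the standalone Audio APIs, code shared between the two can read either key. This hypothetical helper also derives processing latency from the two millisecond timestamps. The lowercase audioEndtime fallback is an assumption based on the casing note, and the start and end values are passed through unparsed because their string format is not specified here.

```python
def segment_timing(aux: dict) -> dict:
    """Extract timing and speaker information from a segment auxInfo object."""
    start = aux.get("audioStartTime", aux.get("audioStarttime"))
    end = aux.get("audioEndTime", aux.get("audioEndtime"))  # fallback key assumed
    processing_ms = aux["finishProcessTime"] - aux["beginProcessTime"]
    # strUserId (ALI, TRTC, ZEGO, VOLC, GIN) or userId (Agora), separated streams only.
    speaker = aux.get("strUserId") if aux.get("strUserId") is not None else aux.get("userId")
    return {"start": start, "end": end, "processingMs": processing_ms, "speaker": speaker}
```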
businessLabels Array
Each element in the array:
Parameter
Type
Required
Description
businessLabel1
string
No
Level 1 business label.
businessLabel2
string
No
Level 2 business label.
businessLabel3
string
No
Level 3 business label.
businessDescription
string
No
Business label description. Format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
Risk reason. For reference only — do not use for programmatic logic.
speakers Array
Per-second speaker uid + volume sampling for the audio segment, ordered chronologically. The outer array contains up to 10 elements (one per sampled second). Each inner element is itself an array describing every speaker active at that second.
Currently only populated in Agora mixed stream moderation.
Each inner object:
Parameter
Type
Required
Description
uid
int32
Yes
In-room speaker uid.
volume
int32
Yes
Volume level. Range 0–255.
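The nested speakers structure makes it straightforward to find who dominated each sampled second in an Agora mixed stream. A sketch, assuming ties go to the first speaker listed and that an empty inner array means no speaker was detected in that second:

```python
def loudest_speaker_per_second(speakers: list) -> list:
    """For each sampled second, return the uid with the highest volume (None if empty)."""
    result = []
    for second in speakers:  # outer array: up to 10 seconds, chronological
        if not second:
            result.append(None)
            continue
        loudest = max(second, key=lambda s: s["volume"])  # volume is 0-255
        result.append(loudest["uid"])
    return result
```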
audioTags Object
Legacy audio attribute labels. Returned when the corresponding type value is requested. For new integrations, prefer the businessLabels array; the structure here is preserved for backwards compatibility.
Parameter
Type
Required
Description
gender
object
No
Gender label. Returned when type includes GENDER. See Gender Label.
timbre
array
No
Timbre labels. Returned when type includes TIMBRE. See Timbre Labels.
song
int32
No
Singing label. Returned when type includes SING. 0: no singing detected. 1: singing detected.
language
object
No
Language label. Returned when type includes LANGUAGE. See Language Labels.
Gender Label
Parameter
Type
Required
Description
label
string
Yes
Gender label name. Possible values (Chinese as returned by the legacy API): 男性 (male), 女性 (female).
probability
int32
Yes
Confidence score on a 0–100 scale. Higher values indicate greater confidence. (Legacy 0–100 scale — modern endpoints use a 0–1 scale; this is a v5-cleanup candidate.)
Timbre Labels
Each element in the array:
Parameter
Type
Required
Description
label
string
Yes
Timbre category. Possible values (Chinese as returned by the legacy API): 大叔 (older male), 青年 (young male), 正太 (boy), 老年 (elderly), 女王 (mature woman), 御姐 (assertive woman), 少女 (young woman), 萝莉 (girl), 大妈 (older female).
probability
int32
Yes
Confidence score on a 0–100 scale. Higher values indicate greater confidence. (Legacy 0–100 scale — see Gender Label note.)
Confidence score on a 0–100 scale. (Legacy 0–100 scale — see Gender Label note.)
Language Codes
Value
Language
0
Mandarin Chinese
1
English
2
Cantonese
3
Tibetan
4
Uyghur
5
Mongolian
6
Korean
-1
Other
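Integrations that still consume audioTags usually translate the Chinese labels and rescale the legacy 0 to 100 confidence to the 0 to 1 range used by newer endpoints. A sketch using the values listed above; mapping unlisted language codes to Other is an assumption, and the structure of the language object itself is not covered here.

```python
GENDER_EN = {"男性": "male", "女性": "female"}
TIMBRE_EN = {
    "大叔": "older male", "青年": "young male", "正太": "boy",
    "老年": "elderly", "女王": "mature woman", "御姐": "assertive woman",
    "少女": "young woman", "萝莉": "girl", "大妈": "older female",
}
LANGUAGE_CODES = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other",
}


def normalize_label(tag: dict, names: dict) -> dict:
    """Translate a legacy label and rescale its 0-100 probability to 0-1."""
    return {"label": names.get(tag["label"], tag["label"]), "probability": tag["probability"] / 100}


def language_name(code: int) -> str:
    return LANGUAGE_CODES.get(code, LANGUAGE_CODES[-1])  # unknown codes -> "Other"
```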
Token Labels
Both tokenProfileLabels and tokenRiskLabels share the same structure:
Parameter
Type
Required
Description
label1
string
No
Level 1 label.
label2
string
No
Level 2 label.
label3
string
No
Level 3 label.
description
string
No
Label description. For reference only — do not use for programmatic logic.
timestamp
int64
No
Label assignment time. 13-digit Unix timestamp in milliseconds (UTC).
Stream-End auxInfo
Returned only in the stream-end callback (statCode = 1). Indicates why the moderation task ended.
Parameter
Type
Required
Description
errorCode
int32
Yes
Stream-end error code. 3001: stream URL access failure (e.g. HTTP 404 / 403). 3002: invalid stream data (e.g. "Invalid data found when processing input"). 3003: stream not found (e.g. Zego error 197612). 3004: stream returned no audio data. 3005: pull token invalid or expired — refresh the token and resubmit (e.g. expired Agora token, invalid TRTC userSig).
streamTime
int64
No
Submitted stream duration. Returned in the final stream-end callback. When audioDetectStep is configured this value may differ from the actual stream length.
ℹ️
These auxInfo.errorCode values (3001–3005) are specific to streaming pull failures. They are distinct from the 2003 / 2007 codes used by the standalone Audio Sync, Async, and Query APIs, which describe file-fetch and decode failures. Integrators using both interfaces should map the two namespaces separately.
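The stream-end error codes call for different follow-ups, and only 3005 has a documented remedy (refresh the token and resubmit). A sketch of routing them; the wording of the other actions is illustrative, and the 2003 / 2007 codes from the file-based APIs are deliberately not mapped because they belong to a separate namespace.

```python
STREAM_END_CODES = {
    3001: "stream URL access failure",
    3002: "invalid stream data",
    3003: "stream not found",
    3004: "stream returned no audio data",
    3005: "pull token invalid or expired",
}


def next_step(error_code) -> str:
    """Suggest a follow-up for a stream-end auxInfo.errorCode."""
    if error_code == 3005:
        return "refresh the pull token and resubmit"  # documented remedy
    if error_code in STREAM_END_CODES:
        return f"investigate: {STREAM_END_CODES[error_code]}"
    return "unrecognized stream-end code"
```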