Video stream moderation API. Submit a live or pre-recorded video stream URL (or an RTC channel); DeepCleer pulls the stream from the source, captures frames, extracts audio segments, and delivers per-frame and per-audio-segment results to the callback URLs you supply. The API detects regulatory risks in both the visual track (political, pornography, violence & terrorism, QR codes, advertisements, image-text violations) and the audio track (political, pornography, advertising, prohibited, profanity, moaning, top-leader voiceprint, national anthem, prohibited songs), and can additionally identify business-scenario attributes such as gender, voice timbre, language, age, and minor presence.
Requirements
| Item | Specification |
| --- | --- |
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |
Stream Requirements
| Item | Specification |
| --- | --- |
| Standard stream URLs | RTMP, RTMPS, HLS, HTTP, HTTPS protocols; FLV, M3U8, and other common container formats |
| RTC providers | Agora, Tencent (TRTC), Zego, Volcano Engine, Alibaba Cloud, each with provider-specific parameters (see Stream Type) |
Timeout Suggestion
Recommended timeout: 7 seconds for the submission call
Internal processing timeout: 3 seconds with one automatic retry; normal API response time is within 100 ms
Note: The submission call only registers the stream for moderation, so it returns almost immediately. Frame and audio results are delivered separately via the callback URLs you supply. Keep your callback handlers fast (under 2 seconds) so DeepCleer doesn't trigger unnecessary retries.
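The requirements above (POST, UTF-8, JSON, 7-second timeout) can be sketched with the Python standard library. This is a minimal sketch, not an official client: `SUBMIT_URL` is a placeholder because this section does not state the endpoint, and the payload is passed in as-is because the exact placement of request fields is not shown here.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from your DeepCleer onboarding material.
SUBMIT_URL = "https://api.example.invalid/videostream"
SUBMIT_TIMEOUT_SECONDS = 7  # recommended timeout for the submission call


def build_request(payload: dict, url: str = SUBMIT_URL) -> urllib.request.Request:
    """Build a UTF-8 encoded JSON POST request for the submission call."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def submit(payload: dict, url: str = SUBMIT_URL) -> dict:
    """Send the request and decode the JSON acknowledgement."""
    request = build_request(payload, url)
    with urllib.request.urlopen(request, timeout=SUBMIT_TIMEOUT_SECONDS) as response:
        return json.loads(response.read().decode("utf-8"))
```

Only the acknowledgement comes back from `submit`; moderation results arrive later at your callback URLs.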
Callback Mechanism
Results are delivered to the imgCallback and audioCallback URLs you supply in the request. When DeepCleer calls your endpoint:
Your endpoint must respond with HTTP 200 OK. Any non-200 response is treated as a delivery failure.
On failure, DeepCleer retries with the following intervals (in seconds): [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]. After 12 failed retries, the segment is dropped.
Your endpoint should be idempotent on requestId + imgUrl / audioUrl — the same segment may be delivered more than once if an earlier delivery succeeded but the response was lost in transit.
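One way to meet the idempotency rule above is to key each delivery on requestId plus the segment URL and acknowledge duplicates without reprocessing them. The sketch below is framework-agnostic and makes two assumptions: the in-memory `seen` set stands in for whatever shared store you use, and `process` is a hypothetical function of your own.

```python
def dedup_key(payload: dict) -> tuple:
    """Key a delivery on requestId plus the segment URL (imgUrl or audioUrl)."""
    if payload.get("contentType") == 1:
        segment_url = (payload.get("frameDetail") or {}).get("imgUrl")
    else:
        segment_url = (payload.get("audioDetail") or {}).get("audioUrl")
    return (payload.get("requestId"), segment_url)


def handle_callback(payload: dict, seen: set, process) -> int:
    """Return the HTTP status to send back; process each segment at most once."""
    key = dedup_key(payload)
    if key not in seen:
        seen.add(key)
        process(payload)  # hypothetical: your own business logic
    # Duplicates are acknowledged too, so DeepCleer stops retrying them.
    return 200
```

In production the `seen` set would normally live in a shared store with an expiry, and slow work would be queued so the handler can answer quickly.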
Stream Pull Retry Mechanism
To reduce failures caused by transient network issues, DeepCleer retries failed stream pulls as follows:
Standard streams and Zego / Tencent / Volcano streams: up to 12 retries. Each attempt lasts 5 minutes; intervals between attempts follow [5, 10, 15, 20, …, 60] seconds. For example, DeepCleer first attempts continuous stream pulling for 5 minutes; if unsuccessful, it waits 5 seconds and pulls again for another 5 minutes; if still unsuccessful, it waits 10 seconds and pulls again, and so on.
Agora streams: no retries. The connection is closed after a 5-minute stream pull timeout.
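To get a feel for how long DeepCleer may keep trying before it gives up on a non-Agora stream, the schedule above can be added up. This sketch assumes the initial 5-minute attempt is not counted among the 12 retries, which matches the worked example in the text but is not stated explicitly.

```python
ATTEMPT_SECONDS = 5 * 60             # each pull attempt lasts 5 minutes
RETRY_WAITS = list(range(5, 65, 5))  # [5, 10, ..., 60] seconds between attempts


def worst_case_pull_seconds(attempt_seconds: int = ATTEMPT_SECONDS,
                            waits: list = RETRY_WAITS) -> int:
    """Total time spent if every attempt fails: 1 initial attempt + 1 per wait."""
    attempts = len(waits) + 1
    return attempts * attempt_seconds + sum(waits)


print(worst_case_pull_seconds())  # 4290 seconds, about 71.5 minutes
```

Under the same assumption, an Agora stream is abandoned after a single 300-second attempt.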
API authentication key. The default accessKey is sent in your onboarding email.
appId
string
Yes
64
Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. Strictly validated.
eventId
string
Yes
64
Event identifier used to distinguish moderation scenarios in your application, such as liveStream for live broadcasts or rtcRoom for RTC channels. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId.
imgType
string
Conditional
64
Image (frame) risk detection types. At least one of imgType or imgBusinessType must be provided. Multiple values can be combined with underscores (for example, POLITY_QRCODE_ADVERT). See
Image Detection Types
Combine multiple types with underscores (for example, POLITY_QRCODE_ADVERT).
Combine multiple types with underscores (for example, POLITY_EROTIC). Use NONE to skip audio moderation entirely.
| Value | Description |
| --- | --- |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| DIRTY | Profanity / abusive language detection |
| ADLAW | Advertising law violation detection |
| MOAN | Moaning detection |
| AUDIOPOLITICAL | Top-leader voiceprint detection |
| ANTHEN | National anthem detection |
| BANEDAUDIO | Prohibited songs detection |
| NONE | Skip audio detection |
Audio Business Detection Types
Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.
| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection (Chinese, English, Cantonese, Tibetan, Uyghur, Korean, Mongolian, Other) |
| MINOR | Minor detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
| APPNAME | App name detection |
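Detection type values are joined with underscores, and timbre, singing, and language detection depend on GENDER. A small helper can enforce both rules before the request is sent. This is an illustrative sketch: the function names are hypothetical, and appending GENDER automatically (rather than raising an error) is a design choice of this example, not documented API behavior.

```python
# Values that, per the rule above, require GENDER to be present as well.
GENDER_DEPENDENT = {"TIMBRE", "SING", "LANGUAGE"}


def join_types(types) -> str:
    """Join type values with underscores, dropping duplicates but keeping order."""
    return "_".join(dict.fromkeys(types))


def join_audio_business_types(types) -> str:
    """Join audio business types, appending GENDER when a dependent type needs it."""
    ordered = list(dict.fromkeys(types))
    if GENDER_DEPENDENT & set(ordered) and "GENDER" not in ordered:
        ordered.append("GENDER")
    return "_".join(ordered)


print(join_types(["POLITY", "QRCODE", "ADVERT"]))  # POLITY_QRCODE_ADVERT
print(join_audio_business_types(["SING"]))         # SING_GENDER
```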
data Object
Parameter
Type
Required
Max Length
Description
streamType
string
Yes
—
Video stream source type. See
Stream Type
Value
Description
NORMAL
Standard stream URL (RTMP, RTMPS, HLS, HTTP, HTTPS protocols). Provide via data.url.
AGORA
Agora moderation. Provide
User Levels
| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
Supported Languages
| Value | Language |
| --- | --- |
| en | English |
| zh | Chinese (default) |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact DeepCleer to enable) |
data.extra Object
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| passThrough | object | No | 1024 | Client pass-through field. DeepCleer does not process this field; it is echoed back in the callback payload as-is. |
data.agoraParam Object
Required when streamType is AGORA.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| appId | string | Yes | 64 | Application identifier provided by Agora. |
| channel | string | Yes | 64 | Channel name provided by Agora. |
| channelProfile | int32 | No | - | Agora recording channel mode. 0 (default): communication mode (1-on-1 or group chats where any user can speak freely). 1: live broadcast mode (host / audience roles). |
| enableH265Support | boolean | No | - | Whether to support H.265 video stream recording. false (default): H.265 is not supported and remote users cannot send H.265 video. true: H.265 is supported. |
| enableIntraRequest | boolean | No | - | Whether to enable keyframe requests. true (default): the sender controls keyframe requests, which improves the audio/video experience under weak network conditions; individual-stream recordings do not support seeking. false: keyframe requests are disabled and all senders send a keyframe every 2 seconds; individual-stream recordings support seeking. |
| subscribeMode | string | No | - | Subscription mode. AUTO (default): subscribe to all streams in the room. UNTRUSTED: pair with untrustedUserIdList to subscribe only to streams from users in the list (an empty list returns a parameter error). TRUSTED: pair with trustedUserIdList to subscribe only to streams from users not in the list. In TRUSTED mode, if no users outside the list join the room within a certain time, DeepCleer proactively ends moderation. |
| token | string | No | 64 | Authentication token. See Agora documentation for generation: <https://docs.agora.io/cn/Recording/token_server?platform=CPP>. Set the token validity period to exceed the channel duration to prevent stream-pull failures due to token expiration. Agora currently caps token validity at 24 hours; for channels lasting longer, set returnFinishInfo to 1, watch for the end-callback statCode of 1 (invalid or expired pull token), generate a new token, and resubmit the channel for moderation if it still requires it. |
| uid | int32 | No | - | A 32-bit unsigned integer. When token is provided, supply the user ID used to generate the token. Must not collide with any actual user UID in the room; the recording-side UID must be unique. |
| trustedUserIdList | array | No | - | Trusted-user UID list. Effective when subscribeMode is TRUSTED. Must not be empty. DeepCleer will not subscribe to streams from users in this list. Comma-separated UID array (e.g., [1, 2]). Maximum: 17 users. |
| untrustedUserIdList | array | No | - | Untrusted-user UID list. Effective when subscribeMode is UNTRUSTED. Must not be empty. DeepCleer will only subscribe to streams from users in this list. Comma-separated UID array (e.g., [1, 2]). Maximum: 17 users. |
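The subscribeMode rules for Agora (a non-empty list for TRUSTED and UNTRUSTED, at most 17 users, and a uid whenever a token is used) are easy to get wrong, so it can help to validate them client-side. The following is a minimal sketch; `build_agora_param` and its argument names are hypothetical, and only the dictionary keys come from the agoraParam fields above.

```python
MAX_LIST_USERS = 17  # documented maximum for both UID lists


def build_agora_param(app_id, channel, subscribe_mode="AUTO",
                      user_ids=None, token=None, uid=None) -> dict:
    """Build data.agoraParam, rejecting combinations the API would refuse."""
    param = {"appId": app_id, "channel": channel, "subscribeMode": subscribe_mode}
    if subscribe_mode in ("TRUSTED", "UNTRUSTED"):
        if not user_ids:
            raise ValueError(f"{subscribe_mode} mode needs a non-empty UID list")
        if len(user_ids) > MAX_LIST_USERS:
            raise ValueError(f"UID list is limited to {MAX_LIST_USERS} users")
        key = "trustedUserIdList" if subscribe_mode == "TRUSTED" else "untrustedUserIdList"
        param[key] = list(user_ids)
    elif subscribe_mode != "AUTO":
        raise ValueError(f"unknown subscribeMode: {subscribe_mode}")
    if token is not None:
        if uid is None:
            raise ValueError("uid is required when token is provided")
        param["token"] = token
        param["uid"] = uid
    return param
```

For example, `build_agora_param("my-app", "room-1", "UNTRUSTED", [1, 2])` yields a parameter object that subscribes only to users 1 and 2.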
data.trtcParam Object
Required when streamType is TRTC.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| sdkAppId | int32 | Yes | - | The sdkAppId provided by Tencent. |
| roomId | int32 | Conditional | 10 | Numeric room ID. Range: 1–4294967294. One of roomId or strRoomId must be provided. If both are present, roomId takes priority. Currently a maximum of 8 users per room can be moderated. |
| strRoomId | string | Conditional | 128 | String room ID. Allowed characters: letters (a–z, A–Z), digits (0–9), underscores, and hyphens. One of roomId or strRoomId must be provided. If both are present, roomId takes priority. |
| userId | string | Yes | 32 | The userId assigned to the recording end. Allowed characters: letters (a–z, A–Z), digits (0–9), underscores, and hyphens. |
| userSig | string | Yes | 128 | Verification signature corresponding to the recording userId. Equivalent to a login password. |
Recording type. 2: individual stream recording. 4: mixed stream recording. Note: the field name on the wire is demoSences, a preserved misspelling of "demoScenes"; flagged for v5 cleanup.
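The TRTC room rules (one of roomId or strRoomId, numeric range 1–4294967294, restricted character set, roomId wins when both are present) can be checked before submission. This is a sketch under those documented constraints; the helper names are hypothetical.

```python
import re

# Letters, digits, underscores, and hyphens, up to the documented 128 characters.
STR_ROOM_ID = re.compile(r"[A-Za-z0-9_-]{1,128}")
MAX_ROOM_ID = 4294967294


def trtc_room_fields(room_id=None, str_room_id=None) -> dict:
    """Validate and return the room fields to place in data.trtcParam."""
    fields = {}
    if room_id is not None:
        if not 1 <= room_id <= MAX_ROOM_ID:
            raise ValueError("roomId must be between 1 and 4294967294")
        fields["roomId"] = room_id
    if str_room_id is not None:
        if not STR_ROOM_ID.fullmatch(str_room_id):
            raise ValueError("strRoomId contains disallowed characters or is too long")
        fields["strRoomId"] = str_room_id
    if not fields:
        raise ValueError("one of roomId or strRoomId is required")
    return fields


def effective_room(fields: dict):
    """Mirror the documented priority: roomId is used when both are present."""
    return fields["roomId"] if "roomId" in fields else fields["strRoomId"]
```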
data.zegoParam Object
Required when streamType is ZEGO.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| roomId | string | Yes | 64 | Zego room ID. |
| tokenId | string | Yes | 64 | Authentication information provided by Zego. Used to obtain identify_token for login. See Zego documentation for generation: <https://doc-zh.zego.im/article/15258>. Note: tokenId is a unique identifier; a new token must be generated for each moderation request. |
data.volcParam Object
Required when streamType is VOLC.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| appId | string | Yes | 64 | Application identifier provided by Volcano Engine. |
| roomId | string | Yes | 128 | Room ID. |
| userId | string | Yes | 32 | The userId assigned to the recording end. Allowed characters: letters (a–z, A–Z), digits (0–9), underscores, and hyphens. |
| token | string | Yes | 64 | Verification signature corresponding to the recording userId. Equivalent to a login password. |
data.aliParam Object
Required when streamType is ALI.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| room | string | Yes | 64 | Room ID. Must exactly match the channelID used to generate the token. DeepCleer pulls and records streams on a per-room basis. room is a unique identifier; duplicate rooms will not result in duplicate stream pulls. |
The synchronous response is an acknowledgement only: it confirms whether the stream was accepted for moderation. Frame and audio results are delivered later to the callback URLs you supplied; see Stream Segment Callback Parameters and (when returnFinishInfo is 1) Stream End Callback Parameters.
Response Parameters
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer request identifier. Save this — you will need it to cancel moderation, correlate callbacks, and troubleshoot.
code
int32
Yes
Response code. See
Response Codes
| Code | Message |
| --- | --- |
| 1100 | Success |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Stream count limit exceeded |
| 9101 | Unauthorized operation |
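A client typically checks the acknowledgement code before storing requestId. The sketch below uses only the codes listed above; the function name and the choice to raise `RuntimeError` on any non-1100 code are assumptions of this example.

```python
RESPONSE_CODES = {
    1100: "Success",
    1901: "QPS limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Stream count limit exceeded",
    9101: "Unauthorized operation",
}


def request_id_from_ack(ack: dict) -> str:
    """Return requestId for an accepted stream, or raise with the documented message."""
    code = ack.get("code")
    if code == 1100:
        return ack["requestId"]
    message = RESPONSE_CODES.get(code, "Unknown code")
    raise RuntimeError(f"submission not accepted: {code} ({message})")
```

Persist the returned requestId: it is needed to cancel moderation and to correlate callbacks.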
Stream Segment Callback Parameters
Per-segment results are delivered to imgCallback (when contentType is 1) and audioCallback (when contentType is 2). Both callback payloads share the envelope below.
Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer request identifier (same as the value returned in the synchronous acknowledgement).
Callback status code. 0: regular moderation result callback. 1: stream-end result callback (only when returnFinishInfo is 1).
contentType
int32
Yes
Distinguishes between image and audio callbacks. 1: image (frame) callback. 2: audio segment callback.
frameDetail
object
No
Frame moderation detail. Present when contentType is 1 and the frame has a risky label, or when returnAllImg is 1. See frameDetail Object.
audioDetail
object
No
Audio segment moderation detail. Present when contentType is 2 and the segment has a risky label, or when returnAllText is 1. See audioDetail Object.
auxInfo
object
No
Auxiliary information. Contains passThrough echoed from data.extra.passThrough in the original request.
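Because frame, audio, and stream-end callbacks can share an endpoint, a handler usually begins by classifying the payload. The sketch below relies on the code, statCode, and contentType fields described in this section; the category strings it returns are arbitrary labels chosen for this example.

```python
def classify_callback(payload: dict) -> str:
    """Classify a callback payload as error, stream_end, frame, audio, or unknown."""
    # Fields beyond code, message, and requestId are only guaranteed for code 1100.
    if payload.get("code") != 1100:
        return "error"
    # statCode 1 marks the stream-end callback (sent only when returnFinishInfo is 1).
    if payload.get("statCode") == 1:
        return "stream_end"
    content_type = payload.get("contentType")
    if content_type == 1:
        return "frame"
    if content_type == 2:
        return "audio"
    return "unknown"
```

Checking statCode before contentType matters because the stream-end callback also carries a contentType.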
frameDetail Object
Present when there are risky frames or when returnAllImg is 1.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| imgUrl | string | Yes | URL of the captured frame. |
| riskLevel | string | Yes | Disposition recommendation. PASS: normal (allow). REVIEW: suspicious (route to manual review). REJECT: violation (block). |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Returns "Hit custom list" when a customer-defined list is matched. Otherwise format: "Level 1: Level 2: Level 3". For reference only; do not use for programmatic logic. |
Identifier. The same person has the same ID across different labels.
person_ratio
float
No
Person-to-frame ratio (0–1).
probability
float
No
Confidence score (0–1).
location
array
No
Person position coordinates.
frameDetail.auxInfo
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| beginProcessTime | int64 | Yes | Processing start time. 13-digit Unix timestamp in milliseconds (UTC). |
| finishProcessTime | int64 | Yes | Processing end time. 13-digit Unix timestamp in milliseconds (UTC). |
| detectType | int32 | No | Whether the captured frame was actually moderated. Only returned when the detectStep request parameter is set. 1: the frame was moderated. 2: the frame was skipped (per detectStep). |
| imgTime | string | No | Time the frame was captured (absolute time of the frame in the video stream). |
| room | string | No | Room ID. |
| similarityDedup | int32 | No | Auxiliary parameter. Only returned when the similar-frame deduplication feature changed the outer riskLevel from REJECT / REVIEW to PASS. 1: similar-frame deduplication is active. |
| strUserId | string | No | User identifier for distinguishing violating users within a room. Unrelated to the userId request parameter; this is the individual stream user ID. Returned for: ZEGO room-level moderation, TRTC individual-stream moderation, VOLC, and ALI. |
| userId | int32 | No | Agora user account identifier. Only present in individual-stream Agora scenarios. The returned userId is the actual user ID in the room, unrelated to the uid request parameter. |
frameDetail.businessLabels
Each element in the array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
Business label details. May contain face_num, person_num, faces, objects, and persons with the same structure as frameDetail.riskDetail.
audioDetail Object
Present when there are risky audio segments or when returnAllText is 1.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioUrl | string | Yes | URL of the audio segment. |
| riskLevel | string | Yes | Disposition recommendation. PASS: normal (allow). REVIEW: suspicious (route to manual review). REJECT: violation (block). |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Returns "Hit custom list" when a customer-defined list is matched. Otherwise format: "Level 1: Level 2: Level 3". For reference only; do not use for programmatic logic. |
Audio-to-text transcription for this segment. When returnPreText is 1 and the current segment is REJECT, returns 20 seconds of transcript covering the previous 10 seconds plus the current 10 seconds. Otherwise returns only the current segment transcript.
preAudioUrl
string
No
Previous audio segment URL. When returnPreAudio is 1 and the current segment is REJECT, returns a 20-second audio clip covering the previous 10 seconds plus the current 10 seconds. Otherwise not returned.
Level 1 risk label. Returns normal when riskLevel is PASS.
riskLabel2
string
Yes
Level 2 risk label. Empty when riskLevel is PASS.
riskLabel3
string
Yes
Level 3 risk label. Empty when riskLevel is PASS.
riskDescription
string
Yes
Risk description. Returns "Normal" when riskLevel is PASS. Returns "Hit custom list" when a customer-defined list is matched. Otherwise format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
audioDetail.riskDetail
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskSource | int32 | Yes | Risk source. 1000: no risk. 1001: text risk. 1002: visual image risk. 1003: audio voice risk. |
| audioText | string | No | Audio-to-text transcription for this segment. |
| matchedLists | array | No | Matched custom list information. Returned only when a customer-defined list is hit. Same structure as Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitor, or advertising-law content is detected. Same structure as Risk Segments. |
audioDetail.auxInfo
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioStartTime | string | Yes | Violation content start time (absolute time). |
| audioEndTime | string | Yes | Violation content end time (absolute time). |
| beginProcessTime | int64 | Yes | Processing start time. 13-digit Unix timestamp in milliseconds (UTC). |
| finishProcessTime | int64 | Yes | Processing end time. 13-digit Unix timestamp in milliseconds (UTC). |
| room | string | No | Room ID. |
| strUserId | string | No | User identifier for distinguishing violating users within a room. Unrelated to the userId request parameter; this is the individual stream user ID. Returned for: ZEGO room-level moderation, TRTC individual-stream moderation, VOLC, and ALI. |
| userId | int32 | No | Agora user account identifier. Only present in individual-stream Agora scenarios. The returned userId is the actual user ID in the room, unrelated to the uid request parameter. |
Note on field casing: audioStartTime and audioEndTime use an uppercase T, while the standalone Audio Moderation APIs use a lowercase t (audioStarttime / audioEndtime). The casing is preserved exactly as returned on the wire; flagged for v5 cleanup.
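If one code path handles payloads from both this API and the standalone Audio Moderation APIs, a casing-tolerant reader avoids silent misses. This is a small sketch; `audio_time_range` is a hypothetical helper, and both spellings come from the note above.

```python
def audio_time_range(aux_info: dict) -> tuple:
    """Read the segment start and end times, accepting either field casing."""
    start = aux_info.get("audioStartTime", aux_info.get("audioStarttime"))
    end = aux_info.get("audioEndTime", aux_info.get("audioEndtime"))
    return start, end
```

The helper returns `(None, None)` when neither spelling is present, so callers can treat the times as optional.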
audioDetail.businessLabels
Each element in the array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
Business risk detail. Fields: riskSource (int32; 1000/1001/1003), audioText (string), matchedLists (array, same as above), riskSegments (array, same as above).
tokenProfileLabels — Account Attribute Labels
Returned only when tokenId is provided and the labeling service is enabled. Each element in the array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. For reference only; do not use for programmatic logic. |
| timestamp | int64 | No | Label timestamp. 13-digit Unix timestamp in milliseconds (UTC). |
tokenRiskLabels — Account Risk Labels
Returned only when tokenId is provided and the labeling service is enabled. Each element in the array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. For reference only; do not use for programmatic logic. |
| timestamp | int64 | No | Label timestamp. 13-digit Unix timestamp in milliseconds (UTC). |
Stream End Callback Parameters
Delivered when returnFinishInfo is 1 and moderation of the stream completes (either naturally, via stream-pull failure, or via the close-moderation API).
Parameter
Type
Required
Description
requestId
string
Yes
Unique DeepCleer request identifier (same as the value returned in the synchronous acknowledgement).
Total stream moderation duration. Returned in the final callback after the stream ends. When interval moderation (audioDetectStep) is applied, this may differ from the actual stream duration.
requestParams
object
No
Echo of all fields submitted under data in the original request. Returned when contentType is 2.
detail
object
No
Detail information. Returned when contentType is 1. Contains requestParams (object): echo of all fields submitted under data in the original request.
{
"audioDetail": {
"allLabels": [
{
"riskDescription": "Political involvement: Leader One: Leader One",
"riskLabel1": "politics",
"riskLabel2": "yihaolingdao",
"riskLabel3": "yihaolingdao",
"riskLevel": "REJECT"
}
],
"audioText": "Emphasized in important instructions that vocational education has broad prospects in the new journey of comprehensively building a modern socialist country. General Secretary of the CPC Central Committee and President of the State.",
"audioUrl": "http://bj-voice-mp3-1251671073.cos.ap-beijing.myqcloud.com/POST_VIDEOSTREAM%2FPOST_VIDEOSTREAM_AUDIO%2FMP3%2F20221027%2Fy28f8a4f1264085b321f12223wqed1121retestpvvvvv44321we12_3.mp3?q-sign-algorithm=sha1&q-ak=AKIDg9LHyOYSAcmfHekZ6NN6XidHflbASUHn&q-sign-time=1666876123%3B1669468123&q-key-time=1666876123%3B1669468123&q-header-list=host&q-url-param-list=&q-signature=f32da45be186fd4a8ed063e499d3f4e0f4f5fc19",
"auxInfo": {
"audioEndTime": "2022-10-27 21:08:42",
"audioStartTime": "2022-10-27 21:08:32",
"beginProcessTime": 1666876123332,
"finishProcessTime": 1666876123893,
"room": "y1123413312ewe24sv2"
},
"businessLabels": [],
"content": "",
"preAudioUrl": "http://bj-voice-mp3-1251671073.cos.ap-beijing.myqcloud.com/POST_VIDEOSTREAM%2FPOST_VIDEOSTREAM_AUDIO%2FMP3%2F20221027%2Fy28f8a4f1264085b321f12223wqed1121retestpvvvvv44321we12_3_pre.mp3?q-sign-algorithm=sha1&q-ak=AKIDg9LHyOYSAcmfHekZ6NN6XidHflbASUHn&q-sign-time=1666876123%3B1669468123&q-key-time=1666876123%3B1669468123&q-header-list=host&q-url-param-list=&q-signature=449fdcab8a3c11d5132f43f78c61e6663f5c08d6",
"riskDescription": "Political involvement: Leader One: Leader One",
"riskDetail": {
"audioText": "Stressed in important instructions that vocational education has broad prospects in the new journey of comprehensively building a modern socialist country. General Secretary of the CPC Central Committee and President of the State.",
"riskSource": 1001
},
"riskLabel1": "politics",
"riskLabel2": "yihaolingdao",
"riskLabel3": "yihaolingdao",
"riskLevel": "REJECT"
},
"code": 1100,
"contentType": 2,
"message": "Success",
"requestId": "y28f8a4f1264085b321f12223wqed1121retestpvvvvv44321we12_3",
"statCode": 0
}
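As a rough illustration of consuming an audio callback like the one above, the sketch below extracts the disposition, the segment length, and the processing latency. It embeds a trimmed copy of the sample so it runs on its own. The `YYYY-MM-DD HH:MM:SS` time format is inferred from the sample rather than specified, and `summarize_audio_callback` is a hypothetical name.

```python
from datetime import datetime

# Trimmed copy of the sample payload above (long URLs and transcript omitted).
SAMPLE = {
    "audioDetail": {
        "auxInfo": {
            "audioEndTime": "2022-10-27 21:08:42",
            "audioStartTime": "2022-10-27 21:08:32",
            "beginProcessTime": 1666876123332,
            "finishProcessTime": 1666876123893,
        },
        "riskLabel1": "politics",
        "riskLevel": "REJECT",
    },
    "code": 1100,
    "contentType": 2,
    "statCode": 0,
}

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: inferred from the sample values


def summarize_audio_callback(payload: dict) -> dict:
    """Pull out the fields most handlers act on from an audio callback."""
    detail = payload["audioDetail"]
    aux = detail["auxInfo"]
    start = datetime.strptime(aux["audioStartTime"], TIME_FORMAT)
    end = datetime.strptime(aux["audioEndTime"], TIME_FORMAT)
    return {
        "riskLevel": detail["riskLevel"],
        "riskLabel1": detail["riskLabel1"],
        "segmentSeconds": (end - start).total_seconds(),
        # Both process times are millisecond Unix timestamps, so the difference is ms.
        "processingMs": aux["finishProcessTime"] - aux["beginProcessTime"],
    }


print(summarize_audio_callback(SAMPLE))
# {'riskLevel': 'REJECT', 'riskLabel1': 'politics', 'segmentSeconds': 10.0, 'processingMs': 561}
```

Branch on riskLevel and the riskLabel fields in code; riskDescription is documented as reference-only text.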