Submit a video stream for real-time content moderation to detect regulatory risks and business-specific content in both captured frames and audio segments.
Frame detection identifies: political content, pornography, advertising, violence & terrorism, and other regulatory risks. It can also recognize faces, logos, flora & fauna, and other business-specific content based on your use case.
Audio detection identifies: political content, pornography, advertising, and other regulatory risks. It can also recognize gender, voice timbre, minors, and other business-specific content based on your use case.
API Description
Submit video stream information for moderation. Once stable stream pulling begins, detection results are continuously sent to the specified callback URL.
Requirements
| Item | Specification |
| --- | --- |
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |
Supported Protocols
Standard stream URLs currently support RTMP, RTMPS, HLS, HTTP, and HTTPS protocols, including FLV and M3U8 formats.
Callback Mechanism
When your callback endpoint receives a pushed result and responds with HTTP status code 200, the push is considered successful. Any other outcome triggers a retry, up to the maximum retry count. Retry intervals in seconds: [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]. After 12 failed attempts, no further retries are made.
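
To make the retry window concrete, the following minimal sketch computes when each retry would fire relative to the first failed push. It assumes each interval is counted from the previous failed attempt and ignores the time spent on the request itself. The names `RETRY_INTERVALS_S` and `retry_offsets` are illustrative and not part of the ISHUMEI API.

```python
from itertools import accumulate

# Retry intervals in seconds, copied from the callback mechanism description.
RETRY_INTERVALS_S = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]


def retry_offsets(intervals=RETRY_INTERVALS_S):
    """Return the cumulative delay (seconds after the first failed push) of each retry.

    Assumption: every interval starts when the previous attempt fails, and
    the duration of the HTTP request itself is ignored.
    """
    return list(accumulate(intervals))


if __name__ == "__main__":
    offsets = retry_offsets()
    print(f"{len(offsets)} retries, last one about {offsets[-1]} s after the first failure")
```

Under these assumptions the last retry fires roughly 390 seconds (6.5 minutes) after the first failure, so a callback endpoint that stays unavailable longer than that would miss the result.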
Stream Pull Retry Mechanism
To prevent stream pull failures caused by network issues, the ISHUMEI video stream service has a built-in retry mechanism:
Standard streams / ZEGO / TRTC / Volcano streams: 12 retries total, each lasting 5 minutes, with intervals of [5, 10, 15, 20, ..., 60] seconds. For example, ISHUMEI first attempts to pull the stream for 5 minutes continuously. If unsuccessful, it waits 5 seconds then pulls for another 5 minutes. If still unsuccessful, it waits 10 seconds then pulls for another 5 minutes, and so on.
Agora streams: No retry. The connection is closed after a 5-minute pull timeout.
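
The rules above allow a rough estimate of how long the service keeps trying before it gives up on a stream. The sketch below assumes one initial 5-minute pull followed by one further 5-minute pull after each of the 12 waits, which is one reading of the example; the helper `worst_case_pull_seconds` is hypothetical. ALI streams are not mentioned in the retry description, so the sketch refuses to guess for them.

```python
PULL_WINDOW_S = 5 * 60            # each pull attempt lasts 5 minutes
WAITS_S = list(range(5, 61, 5))   # waits of 5, 10, ..., 60 seconds between pulls
RETRYING_TYPES = {"NORMAL", "ZEGO", "TRTC", "VOLC"}


def worst_case_pull_seconds(stream_type):
    """Estimate the longest time spent pulling before the stream is abandoned."""
    if stream_type == "AGORA":
        # Agora: no retry, the connection closes after a single 5-minute timeout.
        return PULL_WINDOW_S
    if stream_type in RETRYING_TYPES:
        # Assumption: one initial pull plus one pull after every wait.
        pulls = len(WAITS_S) + 1
        return pulls * PULL_WINDOW_S + sum(WAITS_S)
    raise ValueError(f"retry behaviour is not documented for {stream_type!r}")
```

With this reading, a standard stream that never becomes available is abandoned after about 4290 seconds (71.5 minutes), compared with 300 seconds for an Agora stream.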
Timeout
Recommended timeout: 7s
Internal processing timeout is 3s with one automatic retry. Normal response time is within 100ms.
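
A 3-second internal timeout with one automatic retry adds up to about 6 seconds in the worst case, which may explain the 7-second recommendation. The sketch below shows one way to apply that timeout on the client using only the Python standard library. The endpoint URL is not given in this section, so `endpoint` is a placeholder supplied by the caller, and `build_request` and `submit` are illustrative names.

```python
import json
import urllib.request

CLIENT_TIMEOUT_S = 7  # recommended client-side timeout from this section


def build_request(endpoint, payload):
    """Build a UTF-8 encoded JSON POST request for the given payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit(endpoint, payload):
    """Send the request and decode the JSON response.

    `endpoint` is a placeholder; use the URL provided by ISHUMEI.
    """
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=CLIENT_TIMEOUT_S) as response:
        return json.loads(response.read().decode("utf-8"))
```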
Request Parameters
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | Company key for authentication, provided by ISHUMEI when the service is activated. |
| eventId | string | Yes | 64 | Event identifier. The value must be agreed upon with ISHUMEI in advance. |
| appId | string | Yes | 64 | Application identifier. This field is strictly validated and the value must be agreed upon with ISHUMEI in advance. |
| imgType | string | No | 64 | Regulatory detection types for video stream frames. At least one of imgType or imgBusinessType is required. See Image Detection Types. |
| audioType | string | No | 64 | Regulatory detection types for video stream audio. At least one of audioType or audioBusinessType is required. See Audio Detection Types. |
| imgBusinessType | string | No | 128 | Business detection types for video stream frames. At least one of imgType or imgBusinessType is required. See business label types for available values. |
| audioBusinessType | string | No | 128 | Business detection types for video stream audio. At least one of audioType or audioBusinessType is required. See Audio Business Types. |
| imgCallback | string | Yes | 1024 | Image callback URL. Frame capture detection results from the video stream are sent to this URL. |
| audioCallback | string | No | 1024 | Audio callback URL. Audio segment detection results from the video stream are sent to this URL. Required when audio detection is needed. |
| data | object | Yes | - | Request data content. Maximum size: 1 MB. See data Object. |
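
The constraints on the top-level parameters can be checked on the client before a request is sent. The following is a minimal sketch, not an official validator: `validate_request` is a hypothetical helper, and it treats audio detection as optional, requiring audioCallback only when audioType or audioBusinessType is set.

```python
# Maximum string lengths taken from the request parameter list.
MAX_LENGTH = {
    "accessKey": 20, "eventId": 64, "appId": 64,
    "imgType": 64, "audioType": 64,
    "imgBusinessType": 128, "audioBusinessType": 128,
    "imgCallback": 1024, "audioCallback": 1024,
}
REQUIRED = ("accessKey", "eventId", "appId", "imgCallback", "data")


def validate_request(params):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name in REQUIRED:
        if not params.get(name):
            problems.append(f"missing required field: {name}")
    for name, limit in MAX_LENGTH.items():
        value = params.get(name)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    if not (params.get("imgType") or params.get("imgBusinessType")):
        problems.append("set at least one of imgType or imgBusinessType")
    # Assumption: audio moderation is opt-in via audioType / audioBusinessType.
    wants_audio = params.get("audioType") or params.get("audioBusinessType")
    if wants_audio and not params.get("audioCallback"):
        problems.append("audioCallback is required when audio detection is requested")
    return problems
```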
Image Detection Types
Combine multiple types with underscores (e.g., POLITY_QRCODE_ADVERT).
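
As a small illustration of the underscore convention, the sketch below joins a list of type names into a single imgType value and checks it against the 64-character limit from the request parameter list. The helper name `join_detection_types` is hypothetical, and removing duplicates is a convenience of this sketch rather than a documented requirement.

```python
def join_detection_types(types, max_length=64):
    """Join detection type names with underscores, e.g. POLITY_QRCODE_ADVERT."""
    # dict.fromkeys removes duplicates while keeping the original order.
    names = list(dict.fromkeys(t.strip() for t in types if t.strip()))
    if not names:
        raise ValueError("at least one detection type is required")
    joined = "_".join(names)
    if len(joined) > max_length:
        raise ValueError(f"combined value exceeds {max_length} characters")
    return joined
```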
User account identifier. Pass the user ID for risk detection of spam, advertising, and other behavioral dimensions.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| url | string | No | 600 | URL of the standard video stream to be moderated. |
| anchorName | string | No | - | Anchor/host name. Typically used for manual review purposes. |
| audioDetectStep | int32 | No | - | Audio moderation step size. Integer from 1–36. A value of 1 skips one 10-second audio segment, 2 skips two, and so on. When not set, all audio content is moderated. |
| detectFrequency | int32 | No | - | Frame capture interval in seconds (1–60). Decimals are rounded down; values less than 1 are treated as 1. Default: 3 seconds. |
| detectStep | int32 | No | - | Frame image detection step size. Only one frame per step is actually detected. Value ≥ 1. When not set, all captured frames are moderated. |
| deviceId | string | No | 128 | ISHUMEI device fingerprint identifier, generated by the ISHUMEI SDK for user behavior analysis. |
| gender | string | No | - | User gender. Suggested values: male, female, ambiguity. |
| imgBusinessDetectStep | int32 | No | - | Image business label detection step size. Only one frame per step is detected for imgBusinessType. Value ≥ 1. Default: 1 (all frames are checked for business labels). |
| imgCompareBase | string | No | 1024 | Base image URL for face comparison. Required when businessType includes FACECOMPARE. Supported formats: jpg, jpeg, png, webp, gif, tiff, tif, heif. Recommended minimum resolution: 256×256. Animated image formats are not supported for base images. |
| ip | string | No | 64 | Client public IP address for IP-based user behavior analysis. |
| lang | string | No | - | Language type for text detection in captured frames and audio segments. Default: zh. See Supported Languages. |
| level | int32 | No | - | User level. Different interception strategies can be configured per level. See User Levels. |
| liveCover | string | No | - | Live stream cover image. Typically used for manual review purposes. |
| liveTitle | string | No | - | Live stream title. Typically used for manual review purposes. |
| receiveTokenId | string | No | 64 | Message receiver's tokenId. Alphanumeric string with underscores and hyphens, up to 64 characters. |
| returnAllImg | int32 | No | - | Controls which frames are returned in callbacks. 0 (default): return only frames whose risk level is not PASS. 1: return frames of all risk levels. |
| returnAllText | int32 | No | - | Controls which audio results are returned in callbacks. 0 (default): return only audio segments and text whose risk level is not PASS. 1: return audio segments and text of all risk levels. |
| returnFinishInfo | int32 | No | - | Stream end callback notification. 0 (default): do not send end notification. 1: send end notification when moderation finishes; callback includes statCode. |
| returnPreAudio | int32 | No | - | Whether to return the previous audio segment. 1: preAudioUrl contains a 20-second audio link (previous 10s + current 10s) when the current segment is rejected. 0: do not return previous segment info. |
| returnPreText | int32 | No | - | Whether to return the previous audio segment text. 1: content contains 20 seconds of text (previous 10s + current 10s) when the current segment is rejected. 0: do not return previous segment text. |
| room | string | No | 64 | Live room / game room ID. Different strategies can be applied per room. |
| streamName | string | No | 64 | Video stream name, displayed in the dashboard. Providing it is recommended. |
| agoraParam | object | No | - | Agora recording parameters. Required when streamType is AGORA. See agoraParam. |
| aliParam | object | No | - | Alibaba Cloud recording parameters. Required when streamType is ALI. See aliParam. |
| trtcParam | object | No | - | Tencent TRTC recording parameters. Required when streamType is TRTC. See trtcParam. |
| volcParam | object | No | - | Volcano Engine recording parameters. Required when streamType is VOLC. See volcParam. |
| zegoParam | object | No | - | ZEGO recording parameters. Required when streamType is ZEGO. See zegoParam. |
| extra | object | No | - | Extra parameters. |
| extra.passThrough | object | No | 1024 | Client pass-through field. ISHUMEI does not process this field; it is returned as-is with the result. |
| acceptLang | string | No | - | Language for returned labels. zh (default): Chinese. en: English. |
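
The interaction between detectFrequency and detectStep is easier to see with numbers. The sketch below applies the documented rounding rules to detectFrequency and then estimates the time between frames that are actually detected, assuming "one frame per step" means one in every detectStep captured frames. Both helper names are hypothetical, and rejecting values above 60 is a choice of this sketch because the source does not say how they are handled.

```python
import math


def effective_detect_frequency(value=None):
    """Apply the documented rules: default 3, decimals rounded down, minimum 1."""
    if value is None:
        return 3
    if value < 1:
        return 1
    seconds = math.floor(value)
    if seconds > 60:
        # Assumption: the source gives 1-60 as the range but not the behaviour above it.
        raise ValueError("detectFrequency must be at most 60")
    return seconds


def seconds_between_detected_frames(detect_frequency=None, detect_step=None):
    """Estimate the spacing of detected frames in seconds."""
    step = 1 if detect_step is None else detect_step
    if step < 1:
        raise ValueError("detectStep must be at least 1")
    # Assumption: one of every `step` captured frames is detected.
    return effective_detect_frequency(detect_frequency) * step
```

For example, a detectFrequency of 3 with a detectStep of 4 would mean a frame is captured every 3 seconds but only about one frame every 12 seconds is detected.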
Stream Types
| Value | Description |
| --- | --- |
| NORMAL | Standard stream URL. Supports RTMP, RTMPS, HLS, HTTP, HTTPS protocols. |
| AGORA | Agora moderation. |
| TRTC | Tencent TRTC moderation. |
| ZEGO | ZEGO moderation. |
| VOLC | Volcano Engine moderation. |
| ALI | Alibaba Cloud moderation. |
Note: When using an RTC SDK recording solution, additional recording fees may be incurred on the RTC provider side. Consult the relevant RTC provider for specific pricing.
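
Each stream type other than NORMAL comes with its own parameter object, so a client can check that the matching object is present before submitting. The sketch below is one way to do that. `missing_stream_fields` is a hypothetical helper, and treating url as mandatory for NORMAL streams is an assumption of this sketch (the parameter list marks url as optional overall).

```python
# Parameter object that must accompany each RTC stream type.
PARAM_FOR_STREAM_TYPE = {
    "AGORA": "agoraParam",
    "ALI": "aliParam",
    "TRTC": "trtcParam",
    "VOLC": "volcParam",
    "ZEGO": "zegoParam",
}


def missing_stream_fields(data):
    """Return the names of fields in `data` that the chosen streamType still needs."""
    stream_type = data.get("streamType")
    if stream_type == "NORMAL":
        # Assumption: a standard stream is identified by its url.
        return [] if data.get("url") else ["url"]
    key = PARAM_FOR_STREAM_TYPE.get(stream_type)
    if key is None:
        raise ValueError(f"unknown streamType: {stream_type!r}")
    return [] if data.get(key) else [key]
```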
Supported Languages
| Value | Language |
| --- | --- |
| zh | Chinese (default) |
| en | English |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact ISHUMEI to enable interception standards) |
User Levels
| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
agoraParam Object
Required when streamType is AGORA.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| appId | string | Yes | 64 | Application identifier provided by Agora. |
| channel | string | Yes | 64 | Channel name provided by Agora. |
| channelProfile | int32 | No | 32 | Channel mode. 0 (default): Communication (any user can speak freely). 1: Live broadcast (host and audience roles). |
| enableH265Support | boolean | No | - | Whether to support H.265 video stream recording. false (default): do not support. true: support H.265. |
| enableIntraRequest | boolean | No | - | Whether to enable keyframe requests. Default: true (improves audio/video experience on weak networks). Set to false to enable seeking in single-stream recordings. When true, single-stream recordings cannot seek to specific positions. |
| subscribeMode | string | No | - | Subscription mode. AUTO: automatically subscribe to all streams in the room (default). UNTRUSTED: only subscribe to users in untrustedUserIdList. TRUSTED: subscribe to all users except those in trustedUserIdList. |
| token | string | No | 64 | Authentication token for high-security scenarios. See Agora documentation for generation details. Set the token validity period longer than the channel duration. The maximum Agora token validity is 24 hours. For channels lasting longer than 24 hours, set returnFinishInfo to 1 and re-submit the channel with a new token when receiving an end callback (statCode = 1) due to token expiration. |
| uid | int32 | No | 64 | 32-bit unsigned integer. Required when token is provided; it must match the user ID used to generate the token. This UID must not belong to an actual user in the room. |
| trustedUserIdList | array | No | - | Trusted user list. Effective when subscribeMode is TRUSTED. Cannot be empty. ISHUMEI will not subscribe to streams from these users. Comma-separated UID array (e.g., [1,2]), up to 17 users. |
| untrustedUserIdList | array | No | - | Untrusted user list. Effective when subscribeMode is UNTRUSTED. Cannot be empty. ISHUMEI will only subscribe to streams from these users. Comma-separated UID array (e.g., [1,2]), up to 17 users. |
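
The subscription rules for Agora streams depend on one another: the list that applies is chosen by subscribeMode, must not be empty, and holds at most 17 users, and a token requires a matching uid. A minimal client-side check might look like the sketch below; `check_agora_param` is a hypothetical helper and is not part of any SDK.

```python
MAX_LISTED_USERS = 17
# Which user list each subscribeMode relies on (AUTO uses none).
LIST_FOR_MODE = {
    "AUTO": None,
    "TRUSTED": "trustedUserIdList",
    "UNTRUSTED": "untrustedUserIdList",
}


def check_agora_param(agora):
    """Return a list of problems found in an agoraParam dictionary."""
    problems = []
    mode = agora.get("subscribeMode", "AUTO")  # AUTO is the documented default
    if mode not in LIST_FOR_MODE:
        problems.append(f"unknown subscribeMode: {mode}")
    elif LIST_FOR_MODE[mode] is not None:
        key = LIST_FOR_MODE[mode]
        users = agora.get(key) or []
        if not users:
            problems.append(f"{key} cannot be empty when subscribeMode is {mode}")
        elif len(users) > MAX_LISTED_USERS:
            problems.append(f"{key} allows at most {MAX_LISTED_USERS} users")
    if agora.get("token") and agora.get("uid") is None:
        problems.append("uid is required when token is provided")
    return problems
```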
aliParam Object
Required when streamType is ALI.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| room | string | Yes | 64 | Room ID. Must exactly match the channelID used to generate the token. The service pulls and records streams per room. room is the unique identifier, so duplicate room values will not trigger duplicate stream pulls. |
| token | string | Yes | 64 | Token for joining the channel. See Alibaba Cloud documentation for generation details. A new token must be generated for each moderation request. |
| userId | int32 | No | 32 | Alibaba Cloud user account identifier. |
trtcParam Object
Required when streamType is TRTC.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| sdkAppId | int32 | Yes | 64 | SDK App ID provided by Tencent. |
| strRoomId | string | Yes | 128 | Room number. Only allows letters (a-zA-Z), digits (0-9), underscores, and hyphens. When both strRoomId and roomId are provided, roomId takes priority. |
| userId | string | Yes | 32 | User ID assigned to the recording client. Only allows letters (a-zA-Z), digits (0-9), underscores, and hyphens. Max 32 characters. |
| userSig | string | Yes | 128 | Authentication signature for the recording userId, equivalent to a login password. |
| appScene | int32 | Yes | 1 | Application scenario. 0: video call (default). 1: video live broadcast. See Tencent documentation. |
| demoSences | int32 | Yes | - | Recording type. 2: separate stream recording. 4: mixed stream recording. |
| roomId | int32 | No | 10 | Room number (1–4294967294). Either roomId or strRoomId is required. When both are provided, roomId takes priority. A maximum of 8 users can be moderated per room. |
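
Because roomId takes priority over strRoomId, it can help to resolve which identifier will actually be used before sending a TRTC request. The sketch below does this and applies the documented range and character rules; `resolve_trtc_room` is a hypothetical helper name.

```python
import re

# Letters, digits, underscores, and hyphens, as documented for strRoomId.
_ROOM_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def resolve_trtc_room(trtc):
    """Return (field_name, value) for the room identifier that takes effect."""
    room_id = trtc.get("roomId")
    if room_id is not None:
        # roomId wins whenever it is present.
        if not 1 <= room_id <= 4294967294:
            raise ValueError("roomId must be between 1 and 4294967294")
        return ("roomId", room_id)
    str_room_id = trtc.get("strRoomId")
    if str_room_id:
        if len(str_room_id) > 128 or not _ROOM_PATTERN.fullmatch(str_room_id):
            raise ValueError("strRoomId has invalid characters or is too long")
        return ("strRoomId", str_room_id)
    raise ValueError("either roomId or strRoomId is required")
```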
volcParam Object
Required when streamType is VOLC.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| appId | string | Yes | 64 | Application identifier provided by Volcano Engine. |
| roomId | string | Yes | 128 | Room number. |
| token | string | Yes | 64 | Authentication signature for the recording userId, equivalent to a login password. |
| userId | string | Yes | 32 | User ID assigned to the recording client. Only allows letters (a-zA-Z), digits (0-9), underscores, and hyphens. Max 32 characters. |
zegoParam Object
Required when streamType is ZEGO.
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| roomId | string | Yes | 64 | ZEGO room number. |
| tokenId | string | Yes | 64 | ZEGO authentication token (identify_token). See ZEGO documentation for generation details. tokenId is a unique identifier, so a new token must be generated for each moderation request. |
Response message corresponding to the code. See Response Codes.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| detail | object | No | Detail information. |
| detail.dupRequestId | string | No | Duplicate requestId. Returned when errorcode is 1001 (duplicate stream push). Use this dupRequestId to call the close stream API if the original requestId was not received. |
| detail.errorcode | int32 | No | 1001: duplicate stream push. |
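
When a stream is submitted twice, the response carries the identifier of the earlier request in detail.dupRequestId. The sketch below picks the identifier to keep for a later close-stream call. `request_id_to_close` is a hypothetical helper, and it assumes the response also carries a top-level requestId field, which is implied by the text but not shown in this section.

```python
DUPLICATE_STREAM_PUSH = 1001  # detail.errorcode value for a duplicate stream push


def request_id_to_close(response):
    """Return the requestId that identifies the running stream, if any."""
    detail = response.get("detail") or {}
    if detail.get("errorcode") == DUPLICATE_STREAM_PUSH:
        # The stream is already being moderated under an earlier request.
        return detail.get("dupRequestId")
    # Assumption: a normal response exposes its own requestId at the top level.
    return response.get("requestId")
```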
Response Codes
| Code | Message | Description |
| --- | --- | --- |
| 1100 | Success | The request completed successfully. |
| 1901 | QPS limit exceeded | The request rate limit has been exceeded. |
| 1902 | Invalid parameters | One or more request parameters are invalid. |
| 1903 | Service failure | An internal service error occurred. |
| 1904 | Stream count limit exceeded | The maximum number of concurrent streams has been reached. |
| 9101 | Unauthorized operation | The provided accessKey does not have permission for this operation. |
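
Client code often needs to decide whether a failed submission is worth retrying. The sketch below models the documented codes as an enumeration and marks two of them as retryable. That classification is an assumption of this sketch (rate limiting and internal errors look transient) and is not stated by the source; `ResponseCode` and `should_retry` are illustrative names.

```python
from enum import IntEnum


class ResponseCode(IntEnum):
    SUCCESS = 1100
    QPS_LIMIT_EXCEEDED = 1901
    INVALID_PARAMETERS = 1902
    SERVICE_FAILURE = 1903
    STREAM_COUNT_LIMIT_EXCEEDED = 1904
    UNAUTHORIZED_OPERATION = 9101


# Assumption: only these codes describe conditions that may clear on their own.
RETRYABLE = frozenset({ResponseCode.QPS_LIMIT_EXCEEDED, ResponseCode.SERVICE_FAILURE})


def should_retry(code):
    """Return True when resubmitting the same request might succeed later."""
    try:
        return ResponseCode(code) in RETRYABLE
    except ValueError:
        # Codes not listed in this section are treated as non-retryable.
        return False
```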
Stream Segment Callback Parameters
Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
Level 1 risk label. Returns normal when riskLevel is PASS.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". Returns "Hit custom list" when a user-defined list is matched. For reference only; do not use for programmatic logic. |
Whether the frame was actually detected. Only returned when detectStep is set. 1: frame was detected. 2: frame was not detected.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| imgTime | string | No | Absolute timestamp when the frame violation occurred. |
| room | string | No | Room number. |
| similarityDedup | int32 | No | Returned only when the similar frame deduplication feature changes the outer riskLevel from reject/review to pass. Value 1 indicates deduplication was applied. |
| strUserId | string | No | User identifier for distinguishing violating users within a room. Unrelated to the request userId. Returned for: ZEGO room-based moderation, TRTC separate stream moderation, VOLC moderation, ALI moderation. |
| userId | int32 | No | Agora user account identifier. Only present in separate stream scenarios. This is the actual user ID in the room, unrelated to the request uid. |
Names and positions of politically sensitive individuals in the image. Up to 10 entries (highest probability selected if more than 10). See Face Object.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| objects | array | No | Detected objects/logos with names and positions. See Object Info. |
| ocrText | object | No | OCR text content. Present when imgType includes IMGTEXTRISK or ADVERT. Contains text (string): recognized text in the image. |
| matchedLists | array | No | Matched custom list information. Returned only when a custom list is hit. See Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising law content is detected. See Risk Segments. |
| persons | array | No | Person names and positions. When "person - multiple persons" label is hit, the array contains multiple elements (up to 10, highest probability selected). See Person Object. |
Face Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person at the same position has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| name | string | No | Person name. |
| face_ratio | float | No | Face-to-image ratio (0–1). Higher values indicate a larger face proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Face position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. Example: [207, 522, 340, 567] where 207=top-left X, 522=top-left Y, 340=bottom-right X, 567=bottom-right Y. |
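
Since location stores two corners rather than a width and height, a consumer usually has to derive the box size itself. The following short sketch does that for the example coordinates above; `box_size` is a hypothetical helper name.

```python
def box_size(location):
    """Return (width, height) of a box given as [x1, y1, x2, y2].

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
    """
    x1, y1, x2, y2 = location
    return (x2 - x1, y2 - y1)


if __name__ == "__main__":
    print(box_size([207, 522, 340, 567]))  # prints (133, 45)
```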
Object Info
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Object/logo identifier. The same object at the same position has the same ID across different labels. |
| name | string | No | Object name. |
| probability | float | No | Confidence score (0–1). |
| qrContent | string | No | QR code URL detected in the image. |
| location | array | No | Object position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. |
Matched Lists
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Name of the matched list. |
| words | array | No | Sensitive word information from the matched list. |
| words[].word | string | No | The matched sensitive word. |
| words[].position | array | No | Position of the sensitive word. |
Risk Segments
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| segment | string | No | High-risk content segment. |
| position | array | No | Position of the high-risk content segment (0-indexed). |
Person Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| person_ratio | float | No | Person-to-image ratio (0–1). Higher values indicate a larger person proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Person position coordinates. |
Frame businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
Recognized text content of the current audio segment.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| content | string | No | Audio text content. When returnPreText is 1 and the current segment is rejected, contains 20 seconds of text (previous 10s + current 10s). Otherwise, contains only the current segment text. |
| preAudioUrl | string | No | Previous audio segment URL. When returnPreAudio is 1 and the current segment is rejected, contains a 20-second audio link (previous 10s + current 10s). Otherwise, not returned. |
User identifier for distinguishing violating users within a room. Returned for: ZEGO room-based moderation, TRTC separate stream moderation, VOLC moderation, ALI moderation.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| userId | int32 | No | Agora user account identifier. Only present in separate stream scenarios. |
| passThrough | object | No | Client pass-through field returned as-is. |
Audio businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
Response message corresponding to the code. See Response Codes.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | Yes | Overall stream disposition recommendation at the end of moderation. PASS: normal, REVIEW: suspicious, REJECT: violation. |
| statCode | int32 | Yes | Callback status code. 0: moderation result callback. 1: stream end result callback. |
| contentType | int32 | Yes | Distinguishes callback end type. 1: image moderation end callback. 2: audio moderation end callback. |
| pullStreamSuccess | bool | Yes | Whether stream pulling was successful. true: success. false: failure (no frames were captured successfully). |
| auxInfo | object | Yes | Auxiliary information. |
| auxInfo.streamTime | int32 | Yes | Stream moderation duration in seconds. Returned on the final callback after the stream ends. Represents the total moderation time (may differ from the actual stream duration if interval-based moderation is used). |
| requestParams | object | No | Returns all fields from the request data parameter. Returned when contentType is 2. |
| detail | object | No | Detail information. Returned when contentType is 1. |
| detail.requestParams | object | No | Returns all fields from the request data parameter. |
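
A callback receiver has to tell ordinary results from end-of-stream notifications, and the echoed request data sits in a different place for image and audio end callbacks. The sketch below shows one way to handle both. The function names and the returned labels ("result", "image_end", and so on) are illustrative and not defined by the API.

```python
def classify_callback(callback):
    """Label a callback as a moderation result or an end-of-stream notification."""
    if callback.get("statCode") == 1:
        # statCode 1 marks the stream end callback; contentType says which kind.
        kinds = {1: "image_end", 2: "audio_end"}
        return kinds.get(callback.get("contentType"), "unknown_end")
    return "result"


def request_params(callback):
    """Return the echoed request data, wherever this callback type stores it."""
    content_type = callback.get("contentType")
    if content_type == 2:
        # Audio end callback: requestParams is at the top level.
        return callback.get("requestParams")
    if content_type == 1:
        # Image end callback: requestParams is nested under detail.
        return (callback.get("detail") or {}).get("requestParams")
    return None
```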