Submit a video for content moderation to detect regulatory risks and business-specific content in both captured frames and audio segments.
Frame detection identifies: political content, pornography, advertising, violence & terrorism, and other regulatory risks. It can also recognize faces, logos, flora & fauna, and other business-specific content based on your use case.
Audio detection identifies: political content, pornography, advertising, and other regulatory risks. It can also recognize gender, voice timbre, minors, and other business-specific content based on your use case.
API Description
Submit video information for moderation with a configurable frame capture frequency. Results are produced asynchronously: receive them via callback to a URL you specify, or poll the query endpoint periodically. Processing time is approximately one-third of the video's duration.
Internal processing timeout is 3s with one automatic retry. Normal request latency is approximately 5ms.
Callback Mechanism
When your callback endpoint receives a pushed result and responds with HTTP status code 200, the push is considered successful. Otherwise, the system retries until the maximum retry count is reached. Retry intervals in seconds: [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 120, 120, 120, 120, 120, 120]. After 20 failed attempts (the initial push plus 19 retries), no further retries are made.
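The retry schedule above implies a bounded delivery window. The sketch below computes when each push attempt would occur. It assumes that the first attempt happens at t=0 and that each interval is the wait before the next attempt; the helper name `attempt_offsets` is illustrative and not part of the API.

```python
from itertools import accumulate

# Retry intervals in seconds, copied from the callback description above.
RETRY_INTERVALS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90,
                   100, 110, 120, 120, 120, 120, 120, 120, 120]


def attempt_offsets(intervals):
    """Return the offset in seconds of every push attempt.

    Assumption: attempt 1 happens at t=0 and each interval is the wait
    before the following attempt.
    """
    return [0, *accumulate(intervals)]


offsets = attempt_offsets(RETRY_INTERVALS)
print(len(offsets))  # 20 attempts: the initial push plus 19 retries
print(offsets[-1])   # 1505 seconds between the first and last attempt
```

Under these assumptions the service gives up after roughly 25 minutes, so an endpoint that may be unavailable for longer than that would need to poll for results instead.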
Request
Request URL
| Cluster | Request URL | Supported Products |
| --- | --- | --- |
| Beijing Video | http://api-video-bj.fengkongcloud.com/video/v4 | Chinese Video File |
| Shanghai Video | http://api-video-sh.fengkongcloud.com/video/v4 | Chinese Video File |
| Singapore Video | http://api-video-xjp.fengkongcloud.com/video/v4 | Chinese Video File, English Video File, Arabic Video File |
Request Parameters
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | Company key for authentication, provided by ISHUMEI when the service is activated. |
| eventId | string | Yes | 64 | Event identifier. The value must be agreed upon with ISHUMEI in advance. |
| appId | string | Yes | 64 | Application identifier. This field is strictly validated and the value must be agreed upon with ISHUMEI in advance. |
| imgType | string | No | 64 | Regulatory detection types for video frames. At least one of imgType or imgBusinessType is required. See Image Detection Types. |
| audioType | string | No | 64 | Regulatory detection types for video audio. At least one of audioType or audioBusinessType is required. See Audio Detection Types. |
| imgBusinessType | string | No | 128 | Business detection types for video frames. At least one of imgType or imgBusinessType is required. See business label types for available values. |
| audioBusinessType | string | No | 128 | Business detection types for video audio. At least one of audioType or audioBusinessType is required. See Audio Business Types. |
| callback | string | No | 500 | Callback URL. When this field is non-empty, the service sends moderation results to this URL (supports http/https). |
| data | object | Yes | - | Request data content. Maximum size: 1 MB. See data Object. |
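The "at least one of" rules and the length limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the request body is sent as JSON; `build_request` is a hypothetical helper, and the key, event, application, and URL values are placeholders rather than values issued by ISHUMEI.

```python
import json

# Maximum lengths copied from the request parameter table.
MAX_LENGTHS = {
    "accessKey": 20, "eventId": 64, "appId": 64,
    "imgType": 64, "audioType": 64,
    "imgBusinessType": 128, "audioBusinessType": 128,
    "callback": 500,
}


def build_request(access_key, event_id, app_id, data, *,
                  img_type=None, img_business_type=None,
                  audio_type=None, audio_business_type=None,
                  callback=None):
    """Assemble a request body, mirroring the documented constraints.

    These are client-side checks only; the service remains the
    authority on what it accepts.
    """
    if not (img_type or img_business_type):
        raise ValueError("set imgType or imgBusinessType")
    if not (audio_type or audio_business_type):
        raise ValueError("set audioType or audioBusinessType")

    body = {"accessKey": access_key, "eventId": event_id, "appId": app_id}
    optional = {
        "imgType": img_type, "audioType": audio_type,
        "imgBusinessType": img_business_type,
        "audioBusinessType": audio_business_type,
        "callback": callback,
    }
    body.update({k: v for k, v in optional.items() if v})

    for name, value in body.items():
        if len(value) > MAX_LENGTHS[name]:
            raise ValueError(f"{name} exceeds {MAX_LENGTHS[name]} characters")

    body["data"] = data
    return body


# Placeholder values for illustration only.
body = build_request(
    "YOUR_ACCESS_KEY", "video", "default",
    {"btId": "req-0001", "tokenId": "user-42",
     "url": "https://example.com/clip.mp4"},
    img_type="POLITY_EROTIC", audio_type="POLITY_EROTIC",
    callback="https://example.com/moderation/callback",
)
print(json.dumps(body, indent=2))
```

Omitting empty optional fields keeps the body small and avoids sending an empty callback, which the table treats as "no callback".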
Image Detection Types
Combine multiple types with underscores (e.g., POLITY_QRCODE_ADVERT).
Audio Detection Types
Combine multiple types with underscores (e.g., POLITY_EROTIC).
| Value | Description |
| --- | --- |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| DIRTY | Abusive language detection |
| ADLAW | Advertising law violation detection |
| MOAN | Moaning detection |
| AUDIOPOLITICAL | Top leader voiceprint detection |
| ANTHEN | National anthem detection |
| BANEDAUDIO | Prohibited songs detection |
| NONE | Do not detect audio |
Audio Business Types
Combine multiple types with underscores. To detect timbre, singing, or language, you must also include GENDER.
| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection (Chinese, English, Cantonese, Tibetan, Uyghur, Korean, Mongolian, Other) |
| MINOR | Minor detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
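The underscore-joined format and the GENDER dependency can be enforced when the audioBusinessType string is built. This is a small sketch; `join_audio_business_types` is a hypothetical helper, and rejecting the input (rather than silently adding GENDER) is a design choice made here, not documented behavior.

```python
# Business types that, per the note above, only work alongside GENDER.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_audio_business_types(types):
    """Join type names with underscores, enforcing the GENDER rule."""
    unique = list(dict.fromkeys(types))  # drop repeats, keep order
    if NEEDS_GENDER & set(unique) and "GENDER" not in unique:
        raise ValueError("TIMBRE, SING and LANGUAGE require GENDER")
    return "_".join(unique)


print(join_audio_business_types(["GENDER", "TIMBRE", "MINOR"]))
# GENDER_TIMBRE_MINOR
```

Raising an error makes the dependency visible to the caller, who may not expect gender detection to be switched on implicitly.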
data Object
| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| btId | string | Yes | 64 | Client-side unique request identifier. |
| tokenId | string | Yes | 64 | User account identifier. Pass the user ID for risk detection of spam, advertising, and other behavioral dimensions. |
| url | string | Yes | 600 | URL of the video to be moderated. |
| audioDetectStep | int32 | No | - | Audio moderation step size for video files. Integer from 1–36. A value of 1 skips one 10-second audio segment, 2 skips two, and so on. When not set, all audio content is moderated. |
| checkFrameCount | int32 | No | - | Fixed number of frames to capture. Includes the first and last frames by default; remaining positions are calculated as video_duration / frame_count (rounded to 3 decimal places, values > 0 are used). This parameter has the highest priority: checkFrameCount > advancedFrequency > detectFrequency. If the video duration cannot be determined, falls back to detectFrequency. |
| dataId | string | No | 128 | Custom data ID. Can be used for searching in the ISHUMEI SaaS dashboard. |
| detectFrequency | int32 | No | - | Frame capture interval in seconds (1–60). Default: 5 seconds. |
| deviceId | string | No | 128 | ISHUMEI device fingerprint identifier, generated by the ISHUMEI SDK for user behavior analysis. |
| gender | string | No | - | User gender. Suggested values: male, female, ambiguity. |
| ip | string | No | 64 | Client public IP address for IP-based user behavior analysis. |
| lang | string | No | - | Language type for text detection in captured frames and audio segments. Default: zh. See Supported Languages. |
| level | int32 | No | - | User level. Different interception strategies can be configured per level. See User Levels. |
| receiveTokenId | string | No | 64 | Message receiver's tokenId. Alphanumeric string with underscores and hyphens, up to 64 characters. |
| returnAllAudio | int32 | No | - | Controls which audio segments are returned. 0 (default): return only non-pass risk level segments. 1: return all risk level segments. |
| returnAllImg | int32 | No | - | Controls which video frames are returned. 0 (default): return only non-pass risk level frames. 1: return all risk level frames. |
| returnAllVideo | integer | No | - | Controls which video clips are returned. Only effective when detection types include DANCE. 0 (default): return only non-pass risk level clips. 1: return all risk level clips. |
| videoTitle | string | No | 128 | Video name. Displayed in the dashboard. |
| advancedFrequency | object | No | - | Advanced frame capture interval configuration. When set, the default capture strategy is overridden. See advancedFrequency Object. |
| extra | object | No | - | Extra parameters. |
| extra.passThrough | object | No | 1024 | Client pass-through field. ISHUMEI does not process this field; it is returned as-is with the result. |
| extra.acceptLang | string | No | - | Language for returned labels. zh (default): Chinese. en: English. |
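Three parameters in the data object control frame capture, and checkFrameCount documents their priority. The sketch below restates that priority as client-side logic, which can help when predicting which setting a request will use. `resolve_capture_strategy` is a hypothetical helper, and the case where only advancedFrequency is set and the duration is unknown is not covered by the table, so the sketch simply returns advancedFrequency there.

```python
DEFAULT_DETECT_FREQUENCY = 5  # seconds, per the detectFrequency row


def resolve_capture_strategy(data, duration_known=True):
    """Return (parameter_name, value) for the capture setting that applies.

    Priority: checkFrameCount > advancedFrequency > detectFrequency.
    """
    interval = data.get("detectFrequency", DEFAULT_DETECT_FREQUENCY)
    if data.get("checkFrameCount") is not None:
        if duration_known:
            return ("checkFrameCount", data["checkFrameCount"])
        # Documented fallback when the duration cannot be determined.
        return ("detectFrequency", interval)
    if data.get("advancedFrequency"):
        return ("advancedFrequency", data["advancedFrequency"])
    return ("detectFrequency", interval)


print(resolve_capture_strategy({"checkFrameCount": 10, "detectFrequency": 2}))
# ('checkFrameCount', 10)
print(resolve_capture_strategy({}))
# ('detectFrequency', 5)
```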
Supported Languages
| Value | Language |
| --- | --- |
| zh | Chinese (default) |
| en | English |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact ISHUMEI to enable interception standards) |
User Levels
| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
advancedFrequency Object
Configure dynamic frame capture rates based on video duration.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| durationPoints | int_array | No | Video duration interval breakpoints (in seconds). Maximum of 5 values. |
| frequencies | int_array | No | Frame capture frequencies corresponding to each duration interval (1–60 seconds). Maximum of 6 values. The frequencies array must have exactly one more element than durationPoints. Invalid or empty values return error code 1902. |
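The relationship between durationPoints and frequencies is easier to see in code. The sketch below validates a configuration against the documented limits and looks up the capture interval for a given video duration. Two points are assumptions rather than documented behavior: that durationPoints must be in ascending order, and that a duration exactly equal to a breakpoint falls into the later interval. Both function names are hypothetical.

```python
from bisect import bisect_right


def validate_advanced_frequency(duration_points, frequencies):
    """Raise ValueError if the configuration breaks a documented limit."""
    if len(duration_points) > 5:
        raise ValueError("durationPoints allows at most 5 values")
    if len(frequencies) > 6:
        raise ValueError("frequencies allows at most 6 values")
    if len(frequencies) != len(duration_points) + 1:
        raise ValueError("frequencies needs one more value than durationPoints")
    if any(not 1 <= f <= 60 for f in frequencies):
        raise ValueError("each frequency must be between 1 and 60 seconds")
    # Assumption: breakpoints are expected in ascending order.
    if list(duration_points) != sorted(duration_points):
        raise ValueError("durationPoints must be ascending")


def frequency_for(duration, duration_points, frequencies):
    """Return the capture interval (seconds) for a video of this duration."""
    validate_advanced_frequency(duration_points, frequencies)
    # Assumption: a duration equal to a breakpoint uses the later interval.
    return frequencies[bisect_right(duration_points, duration)]


points, freqs = [60, 600], [1, 5, 10]
for seconds in (45, 300, 3600):
    print(seconds, frequency_for(seconds, points, freqs))
```

With this example configuration, videos under one minute are sampled every second, videos up to ten minutes every 5 seconds, and longer videos every 10 seconds.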
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| frameDetail | array | No | Frame image risk details. Returned when risky frames exist or returnAllImg=1. See frameDetail Array. |
| audioDetail | array | No | Audio segment risk details. Returned when risky segments exist or returnAllAudio=1. See audioDetail Array. |
| tokenProfileLabels | array | No | Account attribute labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
| tokenRiskLabels | array | No | Account risk labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
Callback auxInfo Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| billingAudioDuration | float | Yes | Audio duration (in seconds) in the current video for billing purposes. If the audio track duration differs from the video duration, billing is based on the actual audio track duration (may be 0 if no audio track exists). |
| billingImgNum | int32 | Yes | Number of captured frame images in the current video for billing purposes. |
| frameCount | int32 | Yes | Number of returned video frames. When returnAllImg=0, this is the risk frame count; when returnAllImg=1, this is the total count. |
| time | float | Yes | Video duration in seconds. |
| passThrough | object | No | Client pass-through field returned as-is. |
frameDetail Array
Each element in the array represents a captured frame with the following fields:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1 label: Level 2 label: Level 3 label". Returns "Hit custom list" when a user-defined list is matched. For reference only; do not use this value for programmatic logic. |
| probability | float | No | Confidence score between 0 and 1. Higher values indicate greater confidence. |
Similarity between the current frame and the previous frame. The first frame is compared against a pure black background image. Value range: 0–1 (closer to 1 = more similar).
Names and positions of politically sensitive individuals in the image. Up to 10 entries (highest probability selected if more than 10). See Face Object.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| objects | array | No | Detected objects/logos with names and positions. See Object Info. |
| ocrText | object | No | OCR text content. Present when imgType includes IMGTEXTRISK or ADVERT. Contains text (string): recognized text in the image. |
| matchedLists | array | No | Matched custom list information. Returned only when a custom list is hit. See Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising law content is detected. See Risk Segments. |
| persons | array | No | Person names and positions. When "person - multiple persons" label is hit, the array contains multiple elements (up to 10, highest probability selected). See Person Object. |
Face Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person at the same position has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| name | string | No | Person name. |
| face_ratio | float | No | Face-to-image ratio (0–1). Higher values indicate a larger face proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Face position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. Example: [207, 522, 340, 567] where 207=top-left X, 522=top-left Y, 340=bottom-right X, 567=bottom-right Y. |
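A location array can be turned into a width and height for drawing or cropping. This is a minimal sketch using the example coordinates above; `box_size` is a hypothetical helper, and the result is in whatever units the coordinates use (the source does not state them).

```python
def box_size(location):
    """Return (width, height) for an [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = location
    if x2 < x1 or y2 < y1:
        raise ValueError("expected top-left corner, then bottom-right corner")
    return x2 - x1, y2 - y1


print(box_size([207, 522, 340, 567]))  # (133, 45)
```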
Object Info
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Object/logo identifier. The same object at the same position has the same ID across different labels. |
| name | string | No | Object name. |
| probability | float | No | Confidence score (0–1). |
| qrContent | string | No | QR code URL detected in the image. |
| location | array | No | Object position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. |
Matched Lists
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Name of the matched list. |
| words | array | No | Sensitive word information from the matched list. |
| words[].word | string | No | The matched sensitive word. |
| words[].position | array | No | Position of the sensitive word. |
Risk Segments
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| segment | string | No | High-risk content segment. |
| position | array | No | Position of the high-risk content segment (0-indexed). |
Person Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| person_ratio | float | No | Person-to-image ratio (0–1). Higher values indicate a larger person proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Person position coordinates. |
Frame businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |