DeepCleer Async Video Moderation moderates both the captured frames and the audio segments of a video file and delivers the results asynchronously to your callback URL.
API Description
The Async Video Moderation API detects regulatory risks and business-specific content in both captured frames and audio segments of a video file.
Frame detection identifies political content, pornography, advertising, violence & terrorism, and other regulatory risks. It can also recognize faces, logos, flora & fauna, and other business-specific content based on your use case.
Audio detection identifies political content, pornography, advertising, and other regulatory risks. It can also recognize gender, voice timbre, minors, and other business-specific content based on your use case.
Submit video information for moderation with configurable frame capture frequency. Results are delivered asynchronously through a callback URL or can be retrieved via the active query endpoint. Processing time is approximately one-third of the video file duration.
Internal processing timeout is 3 seconds with one automatic retry, and normal request latency is approximately 5 ms. The synchronous response is an acknowledgement only; moderation results arrive later through your callback URL or the active query endpoint.
Callback Mechanism
When DeepCleer pushes a result to your callback URL and your endpoint responds with HTTP 200, the delivery is considered successful. If any other status code is returned (or the request fails), the system retries on the following schedule (in seconds):
After 20 failed attempts, no further retries are made.
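The acknowledgement rule above can be sketched as a small receiver. This is a minimal illustration, not DeepCleer's reference implementation: it assumes the callback arrives as an HTTP POST with a JSON body, and the names handle_callback, CallbackHandler, and serve, as well as port 8080, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(raw_body: bytes) -> int:
    """Return the HTTP status code to send back for one callback delivery."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Any status other than 200 causes the delivery to be retried.
        return 400
    if not isinstance(payload, dict):
        return 400
    # Hypothetical hook: persist or enqueue the result here before acknowledging.
    print("received result for", payload.get("requestId"))
    return 200


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the result is delivered as a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_callback(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Call serve() to start listening. Returning 200 only after the result has been stored means a crash before storage leads to a retry rather than a lost result.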
Request
Request URL
Cluster
Endpoint
Singapore Video
http://api-video-xjp.fengkongcloud.com/video/v4
Request Parameters
Top-Level Parameters
Parameter
Type
Required
Max Length
Description
accessKey
string
Yes
20
API authentication key. The default accessKey is sent in your onboarding email.
appId
string
Yes
64
Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId.
eventId
string
Yes
64
Event identifier used to distinguish moderation scenarios in your application, such as promptMedia for attached media of prompts or liveVidio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId.
imgType
string
Conditional
64
Frame detection types. At least one of imgType or imgBusinessType must be provided. See Image Detection Types.
audioType
string
Conditional
64
Audio detection types. At least one of audioType or audioBusinessType must be provided. See Audio Detection Types.
imgBusinessType
string
Conditional
128
Frame business detection labels. At least one of imgType or imgBusinessType must be provided. See your business label catalog for available values.
audioBusinessType
string
Conditional
128
Audio business detection labels. At least one of audioType or audioBusinessType must be provided. See Audio Business Types.
callback
string
Yes
500
URL that receives asynchronous moderation results. Supports HTTP and HTTPS.
Image Detection Types
Combine multiple values with underscores (e.g. POLITY_QRCODE_ADVERT).
Value
Description
POLITY
Politically sensitive content
EROTIC
Pornographic & sexually suggestive content
VIOLENT
Violence, terrorism & prohibited content
QRCODE
QR code detection
ADVERT
Advertising content
IMGTEXTRISK
Image text violation detection
Audio Detection Types
Combine multiple values with underscores (e.g. POLITY_EROTIC).
Value
Description
POLITY
Politically sensitive content
EROTIC
Pornographic content
ADVERT
Advertising content
BAN
Prohibited content
VIOLENT
Violence & terrorism content
DIRTY
Verbal abuse
ADLAW
Advertising-law violations
MOAN
Sexual moaning
AUDIOPOLITICAL
Voiceprint of top political leaders
ANTHEN
National anthem detection
BANEDAUDIO
Prohibited songs
NONE
Skip audio detection
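Building the underscore-joined strings for imgType and audioType can be sketched as follows. The helper name join_types is hypothetical; the allowed values and the 64-character limit come from the tables above.

```python
IMG_TYPES = {"POLITY", "EROTIC", "VIOLENT", "QRCODE", "ADVERT", "IMGTEXTRISK"}
AUDIO_TYPES = {
    "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT", "DIRTY", "ADLAW",
    "MOAN", "AUDIOPOLITICAL", "ANTHEN", "BANEDAUDIO", "NONE",
}


def join_types(values: list[str], allowed: set[str], max_length: int = 64) -> str:
    """Join detection types with underscores after checking them against a table."""
    unknown = [value for value in values if value not in allowed]
    if unknown:
        raise ValueError(f"unknown detection types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    joined = "_".join(dict.fromkeys(values))
    if len(joined) > max_length:
        raise ValueError(f"joined value exceeds {max_length} characters")
    return joined
```

For example, join_types(["POLITY", "QRCODE", "ADVERT"], IMG_TYPES) produces the string shown in the imgType example.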
Audio Business Types
Combine multiple values with underscores. When detecting TIMBRE, SING, or LANGUAGE, you must also include GENDER.
Value
Description
SING
Singing detection
LANGUAGE
Language detection (Chinese, English, Cantonese, Tibetan, Uyghur, Korean, Mongolian, Other)
MINOR
Minor speaker detection
GENDER
Speaker gender
TIMBRE
Speaker timbre
VOICE
Human-voice attribute
AUDIOSCENE
Audio scene
AGE
Speaker age
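The GENDER dependency is easy to miss, so a client can add it automatically. The sketch below is one possible approach; the helper name join_audio_business_types is hypothetical, and appending GENDER silently (rather than raising an error) is a design choice, not documented behavior.

```python
REQUIRES_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_audio_business_types(values: list[str]) -> str:
    """Join audio business types, adding GENDER when a dependent type is present."""
    selected = list(dict.fromkeys(values))  # drop duplicates, keep order
    if REQUIRES_GENDER.intersection(selected) and "GENDER" not in selected:
        selected.append("GENDER")
    joined = "_".join(selected)
    if len(joined) > 128:  # Max Length of audioBusinessType
        raise ValueError("audioBusinessType exceeds 128 characters")
    return joined
```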
data Object Parameters
Parameter
Type
Required
Max Length
Description
btId
string
Yes
64
Client-side unique request identifier.
tokenId
string
Yes
64
User account identifier. We recommend passing the user ID to enable behavioral risk detection.
url
string
Yes
600
URL of the video to be moderated.
audioDetectStep
int32
No
—
Audio moderation sampling step. Range 1–36. 1 skips one 10-second segment between reviews, 2 skips two, and so on. When omitted, all audio content is moderated.
checkFrameCount
int32
No
—
Fixed number of frames to capture. Includes the first and last frames by default; remaining positions are calculated as video_duration / frame_count (rounded to 3 decimal places, values > 0 are used). Priority: checkFrameCount > advancedFrequency > detectFrequency. If the video duration cannot be determined, falls back to detectFrequency.
dataId
string
No
128
Custom data ID. Searchable in the DeepCleer SaaS dashboard.
detectFrequency
int32
No
—
Frame capture interval in seconds. Range 1–60. Default: 5.
deviceId
string
No
128
DeepCleer device fingerprint identifier, generated by the DeepCleer SDK for user behavior analysis.
gender
int32
No
—
User gender. 0: unknown. 1: male. 2: female.
ip
string
No
64
Client public IP address (IPv4 or IPv6) for IP-based user behavior analysis.
lang
string
No
—
Language for text detection in captured frames and audio segments. Default: zh. See Supported Languages.
level
int32
No
—
User level for configuring different interception strategies. See User Levels.
receiveTokenId
string
Conditional
64
Message receiver's tokenId. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message.
returnAllAudio
int32
No
—
Controls which audio segments are returned. 0 (default): return only segments with non-PASS risk levels. 1: return all segments regardless of risk level.
returnAllImg
int32
No
—
Controls which video frames are returned. 0 (default): return only frames with non-PASS risk levels. 1: return all frames regardless of risk level.
returnAllVideo
int32
No
—
Controls which video clips are returned. Only effective when detection types include DANCE. 0 (default): return only clips with non-PASS risk levels. 1: return all clips regardless of risk level.
videoTitle
string
No
128
Video name. Displayed in the dashboard.
advancedFrequency
object
No
—
Advanced duration-based frame capture configuration. When set, overrides the default capture strategy. See advancedFrequency Object.
extra
object
No
—
Auxiliary parameters.
extra.passThrough
object
No
1024
Client pass-through field. DeepCleer does not process this field; it is returned as-is in the callback.
extra.acceptLang
string
No
—
Language for returned labels. en (default): English. zh: Chinese.
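Putting the top-level and data parameters together, a request might be assembled as sketched below. Several details are assumptions rather than documented facts: that the request is an HTTP POST with a JSON body, and that the data object is nested under a top-level "data" key. The function names build_payload and submit are hypothetical.

```python
import json
import urllib.request

# Singapore cluster endpoint from the Request URL table.
ENDPOINT = "http://api-video-xjp.fengkongcloud.com/video/v4"


def build_payload(
    access_key: str,
    app_id: str,
    event_id: str,
    callback: str,
    bt_id: str,
    token_id: str,
    video_url: str,
    img_type: str = "",
    img_business_type: str = "",
    audio_type: str = "",
    audio_business_type: str = "",
    detect_frequency: int = 5,
) -> dict:
    """Assemble the request body, enforcing the conditional-parameter rules."""
    if not (img_type or img_business_type):
        raise ValueError("provide imgType or imgBusinessType")
    if not (audio_type or audio_business_type):
        raise ValueError("provide audioType or audioBusinessType")
    if not 1 <= detect_frequency <= 60:
        raise ValueError("detectFrequency must be between 1 and 60")
    payload = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "callback": callback,
        # Assumption: the data object is sent under a top-level "data" key.
        "data": {
            "btId": bt_id,
            "tokenId": token_id,
            "url": video_url,
            "detectFrequency": detect_frequency,
        },
    }
    optional = {
        "imgType": img_type,
        "imgBusinessType": img_business_type,
        "audioType": audio_type,
        "audioBusinessType": audio_business_type,
    }
    # Send only the detection-type fields the caller actually set.
    payload.update({key: value for key, value in optional.items() if value})
    return payload


def submit(payload: dict, timeout: float = 3.0) -> dict:
    """POST the payload as JSON (assumed) and return the parsed acknowledgement."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping build_payload separate from submit makes the validation testable without network access.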
Supported Languages
Value
Language
zh
Chinese (default)
en
English
ar
Arabic
hi
Hindi
es
Spanish
fr
French
ru
Russian
pt
Portuguese
id
Indonesian
de
German
ja
Japanese
tr
Turkish
vi
Vietnamese
it
Italian
th
Thai
tl
Filipino
ko
Korean
ms
Malay
auto
Automatic language detection (contact DeepCleer to enable)
User Levels
Value
Description
0
Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
1
Lower-level user (e.g., low activity or low-level users)
2
Mid-level user (e.g., moderately active or mid-level users)
3
Higher-level user (e.g., highly active or high-level users)
4
Highest-level user (e.g., paying users, VIP users)
advancedFrequency Object
Configure dynamic frame capture rates based on video duration.
Parameter
Type
Required
Description
durationPoints
array
No
Video duration interval breakpoints in seconds. Maximum of 5 elements (each int32).
frequencies
array
No
Frame capture frequencies (int32, range 1–60 seconds) corresponding to each duration interval. Maximum of 6 elements. The frequencies array must have exactly one more element than durationPoints. Invalid or empty values return error code 1902.
For example, with durationPoints set to [300, 600] and frequencies set to [1, 5, 10]:
Video duration ≤ 300s → capture 1 frame per second
300s < video duration ≤ 600s → capture 1 frame every 5 seconds
Video duration > 600s → capture 1 frame every 10 seconds
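The three ranges listed here correspond to durationPoints [300, 600] and frequencies [1, 5, 10] (values inferred from the ranges, not quoted from the API). The sketch below validates such a configuration client-side and looks up the interval for a given duration. It assumes that durationPoints must be strictly ascending and that a duration exactly equal to a breakpoint falls into the lower interval; both function names are hypothetical.

```python
from bisect import bisect_left


def validate_advanced_frequency(duration_points: list[int], frequencies: list[int]) -> None:
    """Raise ValueError if the configuration breaks the documented constraints."""
    if not duration_points or len(duration_points) > 5:
        raise ValueError("durationPoints needs between 1 and 5 elements")
    if len(frequencies) != len(duration_points) + 1:
        raise ValueError("frequencies needs exactly one more element than durationPoints")
    if any(not 1 <= frequency <= 60 for frequency in frequencies):
        raise ValueError("each frequency must be between 1 and 60 seconds")
    # Assumption: breakpoints are strictly ascending.
    if list(duration_points) != sorted(set(duration_points)):
        raise ValueError("durationPoints must be strictly ascending")


def frequency_for(duration_seconds: float, duration_points: list[int], frequencies: list[int]) -> int:
    """Return the capture interval (seconds) that applies to a video of this duration."""
    validate_advanced_frequency(duration_points, frequencies)
    # bisect_left places a duration equal to a breakpoint in the lower interval.
    return frequencies[bisect_left(duration_points, duration_seconds)]
```

Validating before sending avoids a round trip that would otherwise end in error code 1902.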
Response
The synchronous response is an acknowledgement only — it confirms that DeepCleer has accepted the moderation task. Per-video moderation results are delivered asynchronously through the callback URL you provided.
Response Parameters
Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
frameDetail
array
No
Frame image risk details. Returned when risky frames exist or returnAllImg is 1. See frameDetail Array.
audioDetail
array
No
Audio segment risk details. Returned when risky segments exist or returnAllAudio is 1. See audioDetail Array.
tokenProfileLabels
array
No
Account attribute labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels.
tokenRiskLabels
array
No
Account risk labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels.
Callback auxInfo Object
Parameter
Type
Required
Description
billingAudioDuration
float
Yes
Audio duration (in seconds) in the current video for billing purposes. If the audio track duration differs from the video duration, billing is based on the actual audio track duration (may be 0 if no audio track exists).
billingImgNum
int32
Yes
Number of captured frame images in the current video for billing purposes.
frameCount
int32
Yes
Number of returned video frames. When returnAllImg is 0, this is the risk-frame count; when returnAllImg is 1, this is the total count.
time
float
Yes
Video duration in seconds.
passThrough
object
No
Client pass-through field returned as-is.
frameDetail Array
Each element in the array represents a captured frame:
Parameter
Type
Required
Description
imgUrl
string
Yes
URL of the captured frame image.
requestId
string
Yes
Unique DeepCleer request identifier for this frame.
riskLabel1
string
Yes
Level 1 risk label. Returns normal when riskLevel is PASS.
riskLabel2
string
Yes
Level 2 risk label. Empty when riskLevel is PASS.
riskLabel3
string
Yes
Level 3 risk label. Empty when riskLevel is PASS.
riskDescription
string
Yes
Risk description. Returns "Normal" when riskLevel is PASS. Hits against custom lists return "Matched custom list". Otherwise format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
allLabels
array
Yes
All risk labels detected for this frame. See Frame allLabels.
Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
Similarity between the current frame and the previous frame. The first frame is compared against a pure black background image. Range: 0–1 (closer to 1 = more similar).
qrContent
string
No
QR code URL detected in the image.
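Since riskDescription is for reference only, client logic is better keyed on riskLevel and the three label fields. The sketch below is one way to do that. The sample frames are invented for illustration: the riskLevel value "REJECT" and the label strings are hypothetical, and only "PASS" is taken from this document.

```python
def risky_frames(frame_detail: list[dict]) -> list[dict]:
    """Keep only frames whose riskLevel is not PASS."""
    return [frame for frame in frame_detail if frame.get("riskLevel") != "PASS"]


def label_path(frame: dict) -> str:
    """Join the non-empty riskLabel1..3 values into one path-like string."""
    labels = [frame.get(key, "") for key in ("riskLabel1", "riskLabel2", "riskLabel3")]
    return "/".join(label for label in labels if label)


# Invented sample data; real label values depend on your configuration.
sample = [
    {"requestId": "f1", "riskLevel": "PASS", "riskLabel1": "normal",
     "riskLabel2": "", "riskLabel3": ""},
    {"requestId": "f2", "riskLevel": "REJECT", "riskLabel1": "ad",
     "riskLabel2": "contact", "riskLabel3": "qrcode"},
]

for frame in risky_frames(sample):
    print(frame["requestId"], label_path(frame))
```

When returnAllImg is 1 the array also contains PASS frames, which is why the filter is applied explicitly.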
Frame riskDetail
Parameter
Type
Required
Description
riskSource
int32
Yes
Risk source. 1000: no risk. 1001: text risk. 1002: visual image risk.
face_num
int32
No
Number of faces detected.
person_num
int32
No
Number of persons detected.
faces
array
No
Names and positions of politically sensitive individuals in the image. Up to 10 entries (highest probability selected if more than 10). See Face Object.
objects
array
No
Detected objects/logos with names and positions. See Object Info.
ocrText
object
No
OCR text content. Present when imgType includes IMGTEXTRISK or ADVERT. Contains text (string): recognized text in the image.
matchedLists
array
No
Matched custom list information. Returned only when a custom list is hit. See Matched Lists.
riskSegments
array
No
High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising-law content is detected. See Risk Segments.
persons
array
No
Person names and positions. When the "person — multiple persons" label is hit, the array contains multiple elements (up to 10, highest probability selected). See Person Object.
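Because most riskDetail fields are optional, readers of this object should tolerate missing keys. A minimal sketch, with a hypothetical helper name and invented sample values:

```python
# riskSource codes from the table above.
RISK_SOURCES = {1000: "no risk", 1001: "text risk", 1002: "visual image risk"}


def describe_risk_detail(risk_detail: dict) -> str:
    """Build a short human-readable summary from whichever fields are present."""
    parts = [RISK_SOURCES.get(risk_detail.get("riskSource"), "unknown source")]
    if "face_num" in risk_detail:
        parts.append(f"{risk_detail['face_num']} face(s)")
    # ocrText is an object whose text field holds the recognized string.
    text = (risk_detail.get("ocrText") or {}).get("text")
    if text:
        parts.append(f"OCR: {text}")
    return "; ".join(parts)
```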
Face Object
Parameter
Type
Required
Description
id
string
No
Identifier. The same person at the same position has the same ID across different labels. If the same person appears N times, N IDs are assigned.
name
string
No
Person name.
face_ratio
float
No
Face-to-image ratio (0–1). Higher values indicate a larger face proportion.
probability
float
No
Confidence score (0–1).
location
array
No
Face position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. Example: [207, 522, 340, 567] where 207=top-left X, 522=top-left Y, 340=bottom-right X, 567=bottom-right Y.
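The [x1, y1, x2, y2] layout can be turned into a width and height as sketched below, using the example coordinates from the location description. The Box type and parse_location helper are hypothetical names, and pixel units are an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: int  # top-left X
    y1: int  # top-left Y
    x2: int  # bottom-right X
    y2: int  # bottom-right Y

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1

    @property
    def area(self) -> int:
        return self.width * self.height


def parse_location(location: list[int]) -> Box:
    """Convert a location array into a Box, rejecting malformed input."""
    if len(location) != 4:
        raise ValueError("location must contain exactly four values")
    x1, y1, x2, y2 = location
    if x2 < x1 or y2 < y1:
        raise ValueError("bottom-right corner must not precede top-left corner")
    return Box(x1, y1, x2, y2)


box = parse_location([207, 522, 340, 567])
print(box.width, box.height)  # 133 45
```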
Object Info
Parameter
Type
Required
Description
id
string
No
Object/logo identifier. The same object at the same position has the same ID across different labels.
name
string
No
Object name.
probability
float
No
Confidence score (0–1).
qrContent
string
No
QR code URL detected in the image.
location
array
No
Object position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners.
Matched Lists
Parameter
Type
Required
Description
name
string
No
Name of the matched list.
words
array
No
Sensitive word information from the matched list.
words[].word
string
No
The matched sensitive word.
words[].position
array
No
Position of the sensitive word.
Risk Segments
Parameter
Type
Required
Description
segment
string
No
High-risk content segment.
position
array
No
Position of the high-risk content segment (0-indexed).
Person Object
Parameter
Type
Required
Description
id
string
No
Identifier. The same person has the same ID across different labels. If the same person appears N times, N IDs are assigned.
person_ratio
float
No
Person-to-image ratio (0–1). Higher values indicate a larger person proportion.
probability
float
No
Confidence score (0–1).
location
array
No
Person position coordinates.
Frame businessLabels
Each element in the businessLabels array:
Parameter
Type
Required
Description
businessLabel1
string
Yes
Level 1 business label.
businessLabel2
string
Yes
Level 2 business label.
businessLabel3
string
Yes
Level 3 business label.
businessDescription
string
Yes
Business label description. Format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
audioDetail Array
riskLabel1
string
Yes
Level 1 risk label. Returns normal when riskLevel is PASS.
riskLabel2
string
Yes
Level 2 risk label. Empty when riskLevel is PASS.
riskLabel3
string
Yes
Level 3 risk label. Empty when riskLevel is PASS.
riskDescription
string
Yes
Risk description. Format: "Level 1: Level 2: Level 3". Returns "Matched custom list" when a custom list is hit. For reference only — do not use for programmatic logic.
allLabels
array
Yes
All risk labels detected for this segment. See Audio allLabels.
audioText
string
No
Recognized text content of this audio segment.
audioStarttime
float
No
Audio segment start time relative to the audio beginning, in seconds. (Note: this field uses lowercase t here. The Audio Stream Moderation API returns the same conceptual field as audioStartTime (uppercase T). On-the-wire casing is preserved as-returned and is a candidate for v5 alignment.)
audioEndtime
float
No
Audio segment end time relative to the audio beginning, in seconds. (Same casing-inconsistency note as audioStarttime.)
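Given the casing difference noted above, a client that handles results from both APIs may want to read either spelling. The sketch below is one way to do that; the helper names are hypothetical, and the uppercase audioEndTime spelling is assumed by analogy with audioStartTime.

```python
from typing import Optional, Tuple


def _first_present(segment: dict, *keys: str) -> Optional[float]:
    """Return the first non-missing value among keys, converted to float."""
    for key in keys:
        if segment.get(key) is not None:
            return float(segment[key])
    return None


def segment_span(segment: dict) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds, or None if either time is absent."""
    # This API's lowercase-t spelling is tried first, then the uppercase variant.
    start = _first_present(segment, "audioStarttime", "audioStartTime")
    end = _first_present(segment, "audioEndtime", "audioEndTime")
    if start is None or end is None:
        return None
    return (start, end)
```

Both fields are optional, so callers should handle the None result.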