Request API

Submit a video for content moderation to detect regulatory risks and business-specific content in both captured frames and audio segments.

Frame detection identifies: political content, pornography, advertising, violence & terrorism, and other regulatory risks. It can also recognize faces, logos, flora & fauna, and other business-specific content based on your use case.

Audio detection identifies: political content, pornography, advertising, and other regulatory risks. It can also recognize gender, voice timbre, minors, and other business-specific content based on your use case.

API Description

Submit video information for moderation with configurable frame capture frequency. Retrieve results asynchronously via callback to a specified URL, or poll the active query endpoint periodically. Processing time is approximately one-third of the video file duration.

Requirements

| Item | Specification |
| --- | --- |
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |

Video Requirements

| Item | Specification |
| --- | --- |
| Supported formats | AVI, FLV, MP4, MPG, WMV, MOV, WMA, RMVB, M3U8, MKV, 3GP, WEBM |
| Size limit | ≤ 300 MB |
| Duration limit | ≤ 2 hours |
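
The limits above can be checked on the client before submitting. The following is a minimal sketch, not part of the API: the helper name precheck_video is hypothetical, the format is guessed from the URL's file extension (the service may inspect the actual container instead), and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
import posixpath
from urllib.parse import urlparse

# Values taken from the Video Requirements table.
SUPPORTED_FORMATS = {
    "avi", "flv", "mp4", "mpg", "wmv", "mov",
    "wma", "rmvb", "m3u8", "mkv", "3gp", "webm",
}
MAX_SIZE_BYTES = 300 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes
MAX_DURATION_SECONDS = 2 * 60 * 60


def precheck_video(url, size_bytes=None, duration_seconds=None):
    """Return a list of problems; an empty list means no limit was violated."""
    problems = []
    # Guess the format from the extension of the URL path (an assumption).
    extension = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 300 MB")
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("video longer than 2 hours")
    return problems
```

Size and duration are optional arguments because they are often unknown until the file has been probed.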

Timeout

  • Recommended timeout: 7s
  • Internal processing timeout is 3s with one automatic retry. Normal request latency is approximately 5ms.

Callback Mechanism

A push is considered successful when your callback endpoint receives the result and responds with HTTP status code 200. Any other outcome triggers a retry. Retry intervals in seconds: [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 120, 120, 120, 120, 120, 120]. The initial push plus these 19 retries give 20 attempts in total; after the 20th failed attempt, no further retries are made.
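
To see how long a callback endpoint can stay unavailable before a result is dropped, the retry schedule can be accumulated. This is a small sketch that assumes each interval is measured from the previous failed attempt; the documentation does not state this explicitly.

```python
from itertools import accumulate

# Retry intervals in seconds, copied from the list above.
RETRY_INTERVALS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90,
                   100, 110, 120, 120, 120, 120, 120, 120, 120]


def retry_offsets(intervals=RETRY_INTERVALS):
    """Seconds after the first failed push at which each retry would be sent."""
    return list(accumulate(intervals))


offsets = retry_offsets()
print(len(offsets))   # 19 retries
print(offsets[-1])    # 1505 seconds from the first push to the last retry
```

Under that assumption the last retry arrives roughly 25 minutes after the first push, so results that must not be lost are better fetched through the query endpoint as a fallback.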


Request

Request URL

| Cluster | Request URL | Supported Products |
| --- | --- | --- |
| Beijing Video | http://api-video-bj.fengkongcloud.com/video/v4 | Chinese Video File |
| Shanghai Video | http://api-video-sh.fengkongcloud.com/video/v4 | Chinese Video File |
| Singapore Video | http://api-video-xjp.fengkongcloud.com/video/v4 | Chinese Video File, English Video File, Arabic Video File |

Request Parameters

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | Company key for authentication, provided by ISHUMEI when the service is activated. |
| eventId | string | Yes | 64 | Event identifier. The value must be agreed upon with ISHUMEI in advance. |
| appId | string | Yes | 64 | Application identifier. This field is strictly validated and the value must be agreed upon with ISHUMEI in advance. |
| imgType | string | No | 64 | Regulatory detection types for video frames. At least one of imgType or imgBusinessType is required. See Image Detection Types. |
| audioType | string | No | 64 | Regulatory detection types for video audio. At least one of audioType or audioBusinessType is required. See Audio Detection Types. |
| imgBusinessType | string | No | 128 | Business detection types for video frames. At least one of imgType or imgBusinessType is required. See business label types for available values. |
| audioBusinessType | string | No | 128 | Business detection types for video audio. At least one of audioType or audioBusinessType is required. See Audio Business Types. |
| callback | string | No | 500 | Callback URL. When this field is non-empty, the service sends moderation results to this URL (supports http/https). |
| data | object | Yes | | Request data content. Maximum size: 1 MB. See data Object. |

Image Detection Types

Combine multiple types with underscores (e.g., POLITY_QRCODE_ADVERT).

| Value | Description |
| --- | --- |
| POLITY | Political content detection |
| EROTIC | Pornographic & sexually suggestive content detection |
| VIOLENT | Violence, terrorism & prohibited content detection |
| QRCODE | QR code detection |
| ADVERT | Advertising detection |
| IMGTEXTRISK | Image text violation detection |
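
A small helper can build the underscore-joined imgType value and reject typos before the request is sent. This is an illustrative sketch; join_img_types is a hypothetical name, not part of any ISHUMEI SDK.

```python
# Values copied from the Image Detection Types table.
IMG_TYPES = {"POLITY", "EROTIC", "VIOLENT", "QRCODE", "ADVERT", "IMGTEXTRISK"}


def join_img_types(types):
    """Join detection types with underscores, e.g. POLITY_QRCODE_ADVERT."""
    unknown = [t for t in types if t not in IMG_TYPES]
    if unknown:
        raise ValueError(f"unknown image detection types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    value = "_".join(dict.fromkeys(types))
    if not value:
        raise ValueError("at least one image detection type is needed")
    return value


print(join_img_types(["POLITY", "QRCODE", "ADVERT"]))  # POLITY_QRCODE_ADVERT
```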

Audio Detection Types

Combine multiple types with underscores (e.g., POLITY_EROTIC).

| Value | Description |
| --- | --- |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| DIRTY | Abusive language detection |
| ADLAW | Advertising law violation detection |
| MOAN | Moaning detection |
| AUDIOPOLITICAL | Top leader voiceprint detection |
| ANTHEN | National anthem detection |
| BANEDAUDIO | Prohibited songs detection |
| NONE | Do not detect audio |

Audio Business Types

Combine multiple types with underscores. To detect timbre, singing, or language, you must also include GENDER.

| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection (Chinese, English, Cantonese, Tibetan, Uyghur, Korean, Mongolian, Other) |
| MINOR | Minor detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
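
The GENDER dependency is easy to forget, so it can be enforced when the audioBusinessType string is built. The sketch below appends GENDER automatically; raising an error instead would be an equally valid choice. The function name is hypothetical.

```python
# Values copied from the Audio Business Types table.
AUDIO_BUSINESS_TYPES = {
    "SING", "LANGUAGE", "MINOR", "GENDER",
    "TIMBRE", "VOICE", "AUDIOSCENE", "AGE",
}
# Types that only take effect when GENDER is also requested.
REQUIRES_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def build_audio_business_type(types):
    """Return the underscore-joined value, adding GENDER when it is needed."""
    selected = list(dict.fromkeys(types))  # de-duplicate, keep order
    unknown = [t for t in selected if t not in AUDIO_BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown audio business types: {unknown}")
    if REQUIRES_GENDER & set(selected) and "GENDER" not in selected:
        selected.append("GENDER")
    return "_".join(selected)


print(build_audio_business_type(["SING", "LANGUAGE"]))  # SING_LANGUAGE_GENDER
```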

data Object

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| btId | string | Yes | 64 | Client-side unique request identifier. |
| tokenId | string | Yes | 64 | User account identifier. Pass the user ID for risk detection of spam, advertising, and other behavioral dimensions. |
| url | string | Yes | 600 | URL of the video to be moderated. |
| audioDetectStep | int32 | No | | Audio moderation step size for video files. Integer from 1–36. A value of 1 skips one 10-second audio segment, 2 skips two, and so on. When not set, all audio content is moderated. |
| checkFrameCount | int32 | No | | Fixed number of frames to capture. Includes the first and last frames by default; remaining positions are calculated as video_duration / frame_count (rounded to 3 decimal places, values > 0 are used). This parameter has the highest priority: checkFrameCount > advancedFrequency > detectFrequency. If the video duration cannot be determined, falls back to detectFrequency. |
| dataId | string | No | 128 | Custom data ID. Can be used for searching in the ISHUMEI SaaS dashboard. |
| detectFrequency | int32 | No | | Frame capture interval in seconds (1–60). Default: 5 seconds. |
| deviceId | string | No | 128 | ISHUMEI device fingerprint identifier, generated by the ISHUMEI SDK for user behavior analysis. |
| gender | string | No | | User gender. Suggested values: male, female, ambiguity. |
| ip | string | No | 64 | Client public IP address for IP-based user behavior analysis. |
| lang | string | No | | Language type for text detection in captured frames and audio segments. Default: zh. See Supported Languages. |
| level | int32 | No | | User level. Different interception strategies can be configured per level. See User Levels. |
| receiveTokenId | string | No | 64 | Message receiver's tokenId. Alphanumeric string with underscores and hyphens, up to 64 characters. |
| returnAllAudio | int32 | No | | Controls which audio segments are returned. 0 (default): return only non-pass risk level segments. 1: return all risk level segments. |
| returnAllImg | int32 | No | | Controls which video frames are returned. 0 (default): return only non-pass risk level frames. 1: return all risk level frames. |
| returnAllVideo | integer | No | | Controls which video clips are returned. Only effective when detection types include DANCE. 0 (default): return only non-pass risk level clips. 1: return all risk level clips. |
| videoTitle | string | No | 128 | Video name. Displayed in the dashboard. |
| advancedFrequency | object | No | | Advanced frame capture interval configuration. When set, the default capture strategy is overridden. See advancedFrequency Object. |
| extra | object | No | | Extra parameters. |
| extra.passThrough | object | No | 1024 | Client pass-through field. ISHUMEI does not process this field; it is returned as-is with the result. |
| extra.acceptLang | string | No | | Language for returned labels. zh (default): Chinese. en: English. |
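
Three parameters influence frame capture, and the checkFrameCount description gives their priority. The sketch below encodes that priority as read from the table; capture_strategy is a hypothetical helper, and the actual server-side behavior may differ in edge cases the documentation does not cover.

```python
def capture_strategy(data, duration_known=True):
    """Name the parameter that decides frame capture for a data object.

    Priority from the table: checkFrameCount > advancedFrequency > detectFrequency.
    """
    if data.get("checkFrameCount") is not None:
        # Documented fallback: without a known duration, detectFrequency is used.
        return "checkFrameCount" if duration_known else "detectFrequency"
    if data.get("advancedFrequency") is not None:
        return "advancedFrequency"
    # detectFrequency defaults to 5 seconds when it is not set.
    return "detectFrequency"


example = {
    "advancedFrequency": {"durationPoints": [300, 600], "frequencies": [1, 5, 10]},
    "detectFrequency": 3,
}
print(capture_strategy(example))  # advancedFrequency
```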

Supported Languages

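| Value | Language |
| --- | --- |
| zh | Chinese (default) |
| en | English |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact ISHUMEI to enable interception standards) |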

User Levels

| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |

advancedFrequency Object

Configure dynamic frame capture rates based on video duration.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| durationPoints | int_array | No | Video duration interval breakpoints (in seconds). Maximum of 5 values. |
| frequencies | int_array | No | Frame capture frequencies corresponding to each duration interval (1–60 seconds). Maximum of 6 values. The frequencies array must have exactly one more element than durationPoints. Invalid or empty values return error code 1902. |

Example configuration:

{
  "durationPoints": [300, 600],
  "frequencies": [1, 5, 10]
}

This means:

  • Video duration ≤ 300s → capture 1 frame per second
  • 300s < video duration ≤ 600s → capture 1 frame every 5 seconds
  • Video duration > 600s → capture 1 frame every 10 seconds
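
The lookup described above amounts to a binary search over the breakpoints. The sketch below validates a configuration against the documented limits and returns the capture interval for a given duration. The function names are hypothetical, and the check that durationPoints is ascending is an assumption the documentation implies but does not state.

```python
from bisect import bisect_left


def validate_advanced_frequency(config):
    """Raise ValueError when the config breaks a documented constraint."""
    points = config.get("durationPoints") or []
    frequencies = config.get("frequencies") or []
    if len(points) > 5:
        raise ValueError("durationPoints allows at most 5 values")
    if len(frequencies) > 6:
        raise ValueError("frequencies allows at most 6 values")
    if len(frequencies) != len(points) + 1:
        raise ValueError("frequencies needs exactly one more element than durationPoints")
    if any(not 1 <= f <= 60 for f in frequencies):
        raise ValueError("each frequency must be between 1 and 60 seconds")
    if points != sorted(points):  # assumption: breakpoints are ascending
        raise ValueError("durationPoints must be in ascending order")


def capture_interval(config, duration_seconds):
    """Return the seconds between captured frames for a video of this length."""
    validate_advanced_frequency(config)
    # bisect_left keeps a duration equal to a breakpoint in the lower range (<=).
    index = bisect_left(config["durationPoints"], duration_seconds)
    return config["frequencies"][index]


config = {"durationPoints": [300, 600], "frequencies": [1, 5, 10]}
print(capture_interval(config, 300))  # 1
print(capture_interval(config, 301))  # 5
print(capture_interval(config, 601))  # 10
```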

Response

Response Parameters

Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique ISHUMEI request identifier. |
| code | int32 | Yes | Response code. See Response Codes. |
| message | string | Yes | Response message corresponding to the code. See Response Codes. |
| btId | string | Yes | Client-side unique request identifier. |

Response Codes

| Code | Message | Description |
| --- | --- | --- |
| 1100 | Success | The request completed successfully. |
| 1901 | QPS limit exceeded | The request rate limit has been exceeded. |
| 1902 | Invalid parameters | One or more request parameters are invalid. |
| 1903 | Service failure | An internal service error occurred. |
| 1905 | Invalid content format | The content to be moderated does not meet format requirements. |
| 9101 | Unauthorized operation | The provided accessKey does not have permission for this operation. |
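
A client typically branches on code right after submitting. The sketch below returns the requestId on success and raises otherwise. Treating 1901 and 1903 as retryable is an assumption made for illustration and is not stated in the table; SubmitError and check_submit_response are hypothetical names.

```python
import json

# Codes that might succeed on a later attempt (assumption, not documented).
RETRYABLE_CODES = {1901, 1903}


class SubmitError(Exception):
    """Raised when the submit response carries a code other than 1100."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = code in RETRYABLE_CODES


def check_submit_response(body):
    """Parse the JSON response text and return requestId when code is 1100."""
    response = json.loads(body)
    code = response.get("code")
    if code == 1100:
        return response["requestId"]
    raise SubmitError(code, response.get("message", ""))
```

Keeping the requestId lets you match the later callback or query result to this submission.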

Callback Parameters

Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique ISHUMEI request identifier. |
| code | int32 | Yes | Response code. See Response Codes. |
| message | string | Yes | Response message corresponding to the code. See Response Codes. |
| btId | string | Yes | Client-side unique request identifier. |
| riskLevel | string | Yes | Overall risk level. PASS: normal (allow), REVIEW: suspicious (manual review), REJECT: violation (block). |
| auxInfo | object | Yes | Auxiliary information. See auxInfo Object. |
| frameDetail | array | No | Frame image risk details. Returned when risky frames exist or returnAllImg=1. See frameDetail Array. |
| audioDetail | array | No | Audio segment risk details. Returned when risky segments exist or returnAllAudio=1. See audioDetail Array. |
| tokenProfileLabels | array | No | Account attribute labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
| tokenRiskLabels | array | No | Account risk labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |

Callback auxInfo Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| billingAudioDuration | float | Yes | Audio duration (in seconds) in the current video for billing purposes. If the audio track duration differs from the video duration, billing is based on the actual audio track duration (may be 0 if no audio track exists). |
| billingImgNum | int32 | Yes | Number of captured frame images in the current video for billing purposes. |
| frameCount | int32 | Yes | Number of returned video frames. When returnAllImg=0, this is the risk frame count; when returnAllImg=1, this is the total count. |
| time | float | Yes | Video duration in seconds. |
| passThrough | object | No | Client pass-through field returned as-is. |

frameDetail Array

Each element in the array represents a captured frame with the following fields:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| imgUrl | string | Yes | URL of the captured frame image. |
| requestId | string | Yes | Unique ISHUMEI request identifier for this frame. |
| riskLevel | string | Yes | Risk level. PASS: normal, REVIEW: suspicious, REJECT: violation. |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1 label: Level 2 label: Level 3 label". Returns "Hit custom list" when a user-defined list is matched. |
| allLabels | array | Yes | List of all risk labels. See Frame allLabels. |
| auxInfo | object | Yes | Frame auxiliary information. See Frame auxInfo. |
| riskDetail | object | Yes | Risk detail information. See Frame riskDetail. |
| imgText | string | No | OCR text content of the frame. Returned only when ADVERT or IMGTEXTRISK is passed. |
| time | float | No | Timestamp of this frame relative to the video start, in seconds. |
| businessLabels | array | No | Business label list. See Frame businessLabels. |

Frame allLabels

Each element in the allLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | No | Risk level: PASS, REVIEW, or REJECT. |
| riskLabel1 | string | No | Level 1 risk label. |
| riskLabel2 | string | No | Level 2 risk label. |
| riskLabel3 | string | No | Level 3 risk label. |
| riskDescription | string | No | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". For reference only; do not use this value for programmatic logic. |
| probability | float | No | Confidence score between 0 and 1. Higher values indicate greater confidence. |
| riskDetail | object | No | Risk detail information. See Frame riskDetail. |

Frame auxInfo

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| similarity | float | Yes | Similarity between the current frame and the previous frame. The first frame is compared against a pure black background image. Value range: 0–1 (closer to 1 = more similar). |
| qrContent | string | No | QR code URL detected in the image. |

Frame riskDetail

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskSource | int32 | Yes | Risk source: 1000 (no risk), 1001 (text risk), 1002 (visual image risk). |
| face_num | int32 | No | Number of faces detected. |
| person_num | int32 | No | Number of persons detected. |
| faces | array | No | Names and positions of politically sensitive individuals in the image. Up to 10 entries (highest probability selected if more than 10). See Face Object. |
| objects | array | No | Detected objects/logos with names and positions. See Object Info. |
| ocrText | object | No | OCR text content. Present when imgType includes IMGTEXTRISK or ADVERT. Contains text (string): recognized text in the image. |
| matchedLists | array | No | Matched custom list information. Returned only when a custom list is hit. See Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising law content is detected. See Risk Segments. |
| persons | array | No | Person names and positions. When "person - multiple persons" label is hit, the array contains multiple elements (up to 10, highest probability selected). See Person Object. |

Face Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person at the same position has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| name | string | No | Person name. |
| face_ratio | float | No | Face-to-image ratio (0–1). Higher values indicate a larger face proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Face position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. Example: [207, 522, 340, 567] where 207=top-left X, 522=top-left Y, 340=bottom-right X, 567=bottom-right Y. |
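
Drawing or cropping code usually wants a corner plus width and height rather than two corners. A minimal conversion, using the example coordinates above (Box and to_box are hypothetical names):

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def to_box(location):
    """Convert [x1, y1, x2, y2] (top-left, bottom-right) into a Box."""
    x1, y1, x2, y2 = location
    return Box(left=x1, top=y1, width=x2 - x1, height=y2 - y1)


print(to_box([207, 522, 340, 567]))
# Box(left=207, top=522, width=133, height=45)
```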

Object Info

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Object/logo identifier. The same object at the same position has the same ID across different labels. |
| name | string | No | Object name. |
| probability | float | No | Confidence score (0–1). |
| qrContent | string | No | QR code URL detected in the image. |
| location | array | No | Object position coordinates [x1, y1, x2, y2] representing the top-left and bottom-right corners. |

Matched Lists

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Name of the matched list. |
| words | array | No | Sensitive word information from the matched list. |
| words[].word | string | No | The matched sensitive word. |
| words[].position | array | No | Position of the sensitive word. |

Risk Segments

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| segment | string | No | High-risk content segment. |
| position | array | No | Position of the high-risk content segment (0-indexed). |

Person Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | No | Identifier. The same person has the same ID across different labels. If the same person appears N times, N IDs are assigned. |
| person_ratio | float | No | Person-to-image ratio (0–1). Higher values indicate a larger person proportion. |
| probability | float | No | Confidence score (0–1). |
| location | array | No | Person position coordinates. |

Frame businessLabels

Each element in the businessLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| probability | float | Yes | Confidence score (0–1). |
| confidenceLevel | int32 | No | Confidence level (0–2). Higher values indicate greater confidence. |
| businessDetail | object | No | Business label details. May contain face_num, person_num, faces, objects, and persons with the same structure as described in Frame riskDetail. |

audioDetail Array

Each element in the array represents an audio segment with the following fields:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioUrl | string | Yes | URL of the audio segment. |
| requestId | string | Yes | Unique ISHUMEI request identifier for this segment. |
| riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT. |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Format: "Level 1: Level 2: Level 3". Returns "Hit custom list" when matched. |
| allLabels | array | Yes | List of all risk labels. See Audio allLabels. |
| audioText | string | No | Recognized text content of this audio segment. |
| audioStarttime | float | No | Audio segment start time relative to the audio beginning, in seconds. |
| audioEndtime | float | No | Audio segment end time relative to the audio beginning, in seconds. |
| businessLabels | array | No | Business label list. See Audio businessLabels. |

Audio allLabels

Each element in the allLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | No | Risk level: PASS, REVIEW, or REJECT. |
| riskLabel1 | string | No | Level 1 risk label. |
| riskLabel2 | string | No | Level 2 risk label. |
| riskLabel3 | string | No | Level 3 risk label. |
| riskDescription | string | No | Risk description. For reference only; do not use for programmatic logic. |
| probability | float | No | Confidence score (0–1). |
| riskDetail | object | No | Risk detail information. See Audio riskDetail. |

Audio riskDetail

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskSource | int32 | Yes | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio voice risk). |
| audioText | string | No | Recognized text content of this segment. |
| matchedLists | array | No | Matched custom list information. See Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising law content is detected. See Risk Segments. |

Audio businessLabels

Each element in the businessLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| probability | float | Yes | Confidence score (0–1). |
| confidenceLevel | int32 | No | Confidence level (0–2). Higher values indicate greater confidence. |
| businessDetail | object | No | Business label details. |
| businessDetail.riskSource | int32 | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio voice risk). |
| businessDetail.audioText | string | No | Recognized text content. |
| businessDetail.matchedLists | array | No | Matched custom list information. See Matched Lists. |
| businessDetail.riskSegments | array | No | High-risk content segments. See Risk Segments. |

Token Labels

Both tokenProfileLabels and tokenRiskLabels share the same structure:

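| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. |
| timestamp | int32 | No | Label timestamp. 13-digit Unix timestamp in milliseconds. A 13-digit value exceeds the 32-bit integer range, so parse it as a 64-bit integer. |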
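
Converting the millisecond timestamp to a timezone-aware datetime avoids off-by-1000 mistakes. The sketch below uses integer arithmetic so no precision is lost; the sample value is arbitrary and label_time is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def label_time(timestamp_ms):
    """Convert a 13-digit millisecond Unix timestamp to a UTC datetime."""
    seconds, milliseconds = divmod(timestamp_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=milliseconds)


print(label_time(1666684506188).isoformat())
# 2022-10-25T07:55:06.188000+00:00
```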

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "audioBusinessType": "SING_LANGUAGE_GENDER",
  "audioType": "POLITY_EROTIC_ADVERT_MOAN",
  "callback": "http://www.example.com/callbackaddr",
  "data": {
    "advancedFrequency": {
      "durationPoints": [300, 600],
      "frequencies": [1, 5, 10]
    },
    "btId": "1639824316368",
    "detectFrequency": 3,
    "ip": "123.171.34.3",
    "returnAllAudio": 1,
    "returnAllImg": 1,
    "tokenId": "test",
    "url": "http://oss.example.com/static/photo/117608703147396.mp4"
  },
  "eventId": "video",
  "imgBusinessType": "BODY_FOOD_3CPRODUCTSLOGO",
  "imgType": "POLITY_EROTIC_ADVERT"
}
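
A request like the one above could be assembled and sent with the Python standard library as follows. This is a sketch under stated assumptions: build_request and submit are hypothetical helpers, the Content-Type header is assumed from the JSON format requirement, and the endpoint is the Beijing cluster URL from the Request URL table.

```python
import json
import urllib.request

# Beijing cluster URL from the Request URL table; use your own cluster.
ENDPOINT = "http://api-video-bj.fengkongcloud.com/video/v4"


def build_request(access_key, app_id, event_id, bt_id, token_id, video_url,
                  img_type=None, img_business_type=None,
                  audio_type=None, audio_business_type=None,
                  callback=None, **data_options):
    """Assemble the JSON body; optional data fields are passed as keywords."""
    if not (img_type or img_business_type):
        raise ValueError("imgType or imgBusinessType is required")
    if not (audio_type or audio_business_type):
        raise ValueError("audioType or audioBusinessType is required")
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "data": {"btId": bt_id, "tokenId": token_id, "url": video_url, **data_options},
    }
    optional = {
        "imgType": img_type,
        "imgBusinessType": img_business_type,
        "audioType": audio_type,
        "audioBusinessType": audio_business_type,
        "callback": callback,
    }
    # Leave out optional top-level fields that were not provided.
    body.update({key: value for key, value in optional.items() if value})
    return body


def submit(body, endpoint=ENDPOINT, timeout=7):
    """POST the body as UTF-8 JSON; 7 s matches the recommended timeout."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `submit(build_request("YOUR_ACCESS_KEY", "default", "video", "1639824316368", "test", video_url, img_type="POLITY_EROTIC_ADVERT", audio_type="POLITY_EROTIC_ADVERT_MOAN", detectFrequency=3))` would send a body equivalent to a trimmed version of the example above.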

Response Example

{
  "btId": "1639824316368",
  "code": 1100,
  "message": "Success",
  "requestId": "66fb85e3149bb9e13d6c72161cc6c6cf"
}

Callback Example

Full callback response example

{
  "audioDetail": [
    {
      "allLabels": [
        {
          "probability": 0.998463273048401,
          "riskDescription": "Abuse: Personal attack: Severe personal attack",
          "riskDetail": {
            "audioText": "Recognized audio text content...",
            "riskSource": 1001
          },
          "riskLabel1": "abuse",
          "riskLabel2": "renshengongji",
          "riskLabel3": "zhongdurenshengongji",
          "riskLevel": "REJECT"
        }
      ],
      "audioEndtime": 20,
      "audioStarttime": 10,
      "audioText": "Recognized audio text content...",
      "audioUrl": "http://example.com/audio_segment_a0001.wav",
      "businessLabels": [],
      "requestId": "edaa113581ec1c18df7b44c86d36ae3b_a0001",
      "riskDescription": "Abuse: Personal attack: Severe personal attack",
      "riskDetail": {
        "audioText": "Recognized audio text content...",
        "riskSource": 1001
      },
      "riskLabel1": "abuse",
      "riskLabel2": "renshengongji",
      "riskLabel3": "zhongdurenshengongji",
      "riskLevel": "REJECT"
    }
  ],
  "auxInfo": {
    "billingAudioDuration": 85,
    "billingImgNum": 2,
    "frameCount": 2,
    "time": 85
  },
  "btId": "1666684506188",
  "code": 1100,
  "frameDetail": [
    {
      "allLabels": [
        {
          "probability": 0.665125370025635,
          "riskDescription": "Politics: Political symbols: Party emblem",
          "riskDetail": {
            "ocrText": {
              "text": "2022/10/25 09:05"
            },
            "riskSource": 1002
          },
          "riskLabel1": "politics",
          "riskLabel2": "zhengzhixiangzheng",
          "riskLabel3": "danghui",
          "riskLevel": "REJECT"
        }
      ],
      "auxInfo": {
        "similarity": 0.4765625
      },
      "businessLabels": [
        {
          "businessDescription": "Face: Face pose: Frontal face",
          "businessDetail": {},
          "businessLabel1": "face",
          "businessLabel2": "renlianzitai",
          "businessLabel3": "zhenglian",
          "confidenceLevel": 1,
          "probability": 0.450656906102068
        },
        {
          "businessDescription": "Face: Face type: Real person",
          "businessDetail": {
            "face_num": 1,
            "faces": [
              {
                "face_ratio": 0.00227673095650971,
                "id": "f7bf8842f80a5a2192781064bd69e776",
                "location": [352, 237, 381, 278],
                "name": "Example Person",
                "probability": 0.499512671029603
              }
            ]
          },
          "businessLabel1": "face",
          "businessLabel2": "renlianleixing",
          "businessLabel3": "zhenren",
          "confidenceLevel": 2,
          "probability": 0.979977369308472
        }
      ],
      "imgText": "2022/10/25 09:05",
      "imgUrl": "http://example.com/frame_v81.jpg",
      "requestId": "edaa113581ec1c18df7b44c86d36ae3b_v81",
      "riskDescription": "Politics: Political symbols: Party emblem",
      "riskDetail": {
        "ocrText": {
          "text": "2022/10/25 09:05"
        },
        "riskSource": 1002
      },
      "riskLabel1": "politics",
      "riskLabel2": "zhengzhixiangzheng",
      "riskLabel3": "danghui",
      "riskLevel": "REJECT",
      "time": 81
    }
  ],
  "message": "Success",
  "requestId": "66fb85e3149bb9e13d6c72161cc6c6cf",
  "riskLevel": "REJECT"
}
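
A callback handler usually reduces a payload like this to a decision plus the risky frames and segments. The sketch below maps riskLevel to the actions named in Callback Parameters and branches on riskLabel1 rather than riskDescription, which is for reference only. The function name and the shape of the returned summary are illustrative assumptions.

```python
# Actions follow the riskLevel description in Callback Parameters.
ACTIONS = {"PASS": "allow", "REVIEW": "manual_review", "REJECT": "block"}


def summarize_callback(callback):
    """Reduce a parsed callback payload to a decision and its risky items."""
    if callback.get("code") != 1100:
        # Other fields are only guaranteed when code is 1100.
        return {"ok": False, "code": callback.get("code"),
                "message": callback.get("message")}
    frames = [
        {"time": frame.get("time"), "riskLevel": frame["riskLevel"],
         "riskLabel1": frame["riskLabel1"]}
        for frame in callback.get("frameDetail", [])
        if frame["riskLevel"] != "PASS"
    ]
    segments = [
        {"start": seg.get("audioStarttime"), "end": seg.get("audioEndtime"),
         "riskLevel": seg["riskLevel"], "riskLabel1": seg["riskLabel1"]}
        for seg in callback.get("audioDetail", [])
        if seg["riskLevel"] != "PASS"
    ]
    return {"ok": True, "btId": callback["btId"],
            "action": ACTIONS[callback["riskLevel"]],
            "frames": frames, "segments": segments}
```

Applied to the example above, the summary would carry the action "block", one frame at 81 seconds labeled politics, and one audio segment from 10 to 20 seconds labeled abuse.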