Sync API

Detect regulatory risks in audio content, including political content, pornography, advertising, and violence and terrorism. Depending on your business scenario, the API can also identify minors, voice timbre, and other business content.

API Description

A synchronous detection API that returns recognition results directly in the response. Calling the API over HTTP is recommended.

Audio Requirements

| Item | Specification |
| --- | --- |
| Audio types | URL, BASE64 |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Duration limit | ≤ 60 seconds |
| Size limit | ≤ 18 MB |
Note: Audio URLs should point to a CDN origin server. The origin server must not be a single point of failure; otherwise, audio download failures may prevent moderation.

Timeout

  • Synchronous request: recommended timeout of 10 seconds
  • Asynchronous request: recommended timeout of 5 seconds
  • Response time depends on audio download time. Ensure the storage service hosting the audio is stable and reliable. Actual duration varies based on the request type and audio size.

Request

Request URL

| Cluster | Request URL | Supported Products |
| --- | --- | --- |
| Shanghai | http://api-audio-sh.fengkongcloud.com/audiomessage/v4 | Chinese |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audiomessage/v4 | Chinese |

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| accessKey | string | Yes | Company key, provided by ISHUMEI. See the onboarding email for details. |
| appId | string | Yes | Application identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| eventId | string | Yes | Event identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| type | string | No | Risk detection types. Either type or businessType is required. See Risk Detection Types. |
| businessType | string | No | Business label detection types. Either type or businessType is required. See Business Detection Types. |
| contentType | string | Yes | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
| content | string | Yes | Audio content: either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
| btId | string | Yes | Unique audio file identifier for matching callback results. Max 128 characters (truncated if exceeded). Must not be duplicated. |
| data | object | Yes | Request data content. Max 1 MB. See data Object. |
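
The request parameters above can be assembled and sent roughly as follows. This is a minimal sketch, not an official client: the function names build_payload and send are hypothetical, and the use of an HTTP POST with a JSON body and a Content-Type of application/json is an assumption, since this page does not state the HTTP method or body encoding. It also assumes the 15 MB base64 limit applies to the encoded string. The 10-second timeout follows the recommendation in the Timeout section.

```python
import json
import urllib.request

# Shanghai cluster URL from the Request URL table.
API_URL = "http://api-audio-sh.fengkongcloud.com/audiomessage/v4"
# Assumption: the 15 MB base64 limit is measured on the encoded string.
MAX_BASE64_CHARS = 15 * 1024 * 1024


def build_payload(access_key, bt_id, content, content_type="URL",
                  risk_type="POLITY_EROTIC_MOAN_ADVERT", business_type=None,
                  app_id="default", event_id="default", data=None):
    """Assemble the request body from the documented parameters."""
    if content_type not in ("URL", "RAW"):
        raise ValueError("contentType must be URL or RAW")
    if not risk_type and not business_type:
        raise ValueError("either type or businessType is required")
    if len(bt_id) > 128:
        # The service would truncate; fail early to avoid accidental duplicates.
        raise ValueError("btId is limited to 128 characters")
    if content_type == "RAW" and len(content) > MAX_BASE64_CHARS:
        raise ValueError("base64 content exceeds the 15 MB limit")
    payload = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "contentType": content_type,
        "content": content,
        "btId": bt_id,
        "data": data or {},
    }
    if risk_type:
        payload["type"] = risk_type
    if business_type:
        payload["businessType"] = business_type
    return payload


def send(payload, timeout=10.0):
    """POST the payload as JSON (assumed transport) and decode the JSON reply."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as send(build_payload("YOUR_ACCESS_KEY", "audio-0001", "https://example.com/audio/sample.mp3")) would submit a URL-based request; the appId and eventId defaults mirror the request example on this page and should be replaced with the values from the onboarding email.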

Risk Detection Types

Combine multiple types with underscores (e.g., POLITY_EROTIC_MOAN). Recommended: POLITY_EROTIC_MOAN_ADVERT.

| Value | Description |
| --- | --- |
| AUDIOPOLITICAL | Top leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |
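
Building the underscore-separated type value can be sketched as below. The helper name join_risk_types is hypothetical; the set of accepted values is copied from the table above, and rejecting unknown values on the client side is a design choice of this sketch rather than documented API behavior.

```python
# Values copied from the Risk Detection Types table.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_risk_types(*types):
    """Validate risk types and join them with underscores for the type field."""
    unknown = [t for t in types if t not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk detection types: {unknown}")
    # dict.fromkeys removes duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(types))


recommended = join_risk_types("POLITY", "EROTIC", "MOAN", "ADVERT")
```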

Business Detection Types

Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.

| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
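
The GENDER dependency described above is easy to forget when composing businessType. A small sketch, using a hypothetical helper named join_business_types, that appends GENDER automatically whenever TIMBRE, SING, or LANGUAGE is requested:

```python
# Types that, per the rule above, need GENDER in the same request.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_business_types(*types):
    """Join business types with underscores, adding GENDER when it is needed."""
    ordered = list(dict.fromkeys(types))  # drop duplicates, keep order
    if NEEDS_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.append("GENDER")
    return "_".join(ordered)
```

With this helper, join_business_types("TIMBRE") yields TIMBRE_GENDER, while a request for MINOR alone is left unchanged.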

data Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tokenId | string | No | User account identifier for behavior analysis. Passing the user UID is recommended. |
| receiveTokenId | string | No | Message receiver's tokenId for private chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. |
| deviceId | string | No | ISHUMEI device fingerprint identifier. Unique device ID generated by the ISHUMEI SDK. |
| ip | string | No | IPv4 or IPv6 address of the user who sent the audio. |
| dataId | string | No | Custom data identifier. |
| level | int | No | User level for configuring different interception strategies. See User Levels. |
| gender | string | No | User gender. male or female. |
| formatInfo | string | No | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
| rate | int | No | Audio sample rate. Required when formatInfo is pcm. Range: 8000–32000. |
| track | int | No | Audio channels. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int | No | Controls which audio segments are returned. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level results in audioDetail; it does not affect the overall result. |
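
The conditional requirements on formatInfo, rate, and track apply when audio is sent as base64 (contentType RAW). The following sketch checks them before a request is built; build_raw_data is a hypothetical helper name, and the checks only mirror the constraints stated in the table above.

```python
def build_raw_data(format_info, rate=None, track=None, **extra):
    """Build the data object for a RAW (base64) request.

    extra carries any other optional fields, such as tokenId or returnAllText.
    """
    if format_info not in ("pcm", "wav", "mp3"):
        raise ValueError("formatInfo must be pcm, wav, or mp3")
    data = dict(extra)
    data["formatInfo"] = format_info
    if format_info == "pcm":
        # rate and track are only required for headerless PCM audio.
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("rate is required for pcm and must be 8000-32000")
        if track not in (1, 2):
            raise ValueError("track is required for pcm: 1 (mono) or 2 (stereo)")
        data["rate"] = rate
        data["track"] = track
    return data
```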

User Levels

| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |

Response

Response Parameters

Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique request identifier. |
| code | int | Yes | Response code. See Response Codes. |
| message | string | Yes | Response message corresponding to the code. |
| detail | json | No | Detailed response data. Always returned when code is 1100. See detail Object. |

Response Codes

| Code | Description |
| --- | --- |
| 1100 | Success |
| 1101 | Processing |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9100 | Insufficient balance |
| 9101 | Unauthorized operation |
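
One way to act on these codes is to unwrap detail on success and raise a typed error otherwise. The sketch below uses hypothetical names (AudioApiError, unwrap_detail). Treating 1901 and 1903 as retryable is an assumption of this example, not a documented retry policy.

```python
# Descriptions copied from the Response Codes table.
RESPONSE_CODES = {
    1100: "Success", 1101: "Processing", 1901: "QPS limit exceeded",
    1902: "Invalid parameters", 1903: "Service failure",
    1904: "Download failure", 1905: "Decoding failure",
    9100: "Insufficient balance", 9101: "Unauthorized operation",
}
# Assumption: rate limiting and service failures may succeed on a later retry.
RETRYABLE_CODES = {1901, 1903}


class AudioApiError(Exception):
    """Raised for any response whose code is not 1100."""

    def __init__(self, code, message=None, request_id=None):
        description = RESPONSE_CODES.get(code, "Unknown code")
        super().__init__(f"{code} {description}: {message} (requestId={request_id})")
        self.code = code
        self.retryable = code in RETRYABLE_CODES


def unwrap_detail(response):
    """Return the detail object on success, otherwise raise AudioApiError."""
    code = response.get("code")
    if code == 1100:
        return response["detail"]
    raise AudioApiError(code, response.get("message"), response.get("requestId"))
```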

detail Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | Yes | Overall disposition recommendation. PASS, REVIEW, or REJECT. During initial integration, do not use results directly for blocking; first adjust interception thresholds until they match expectations. |
| audioText | string | Yes | Full audio-to-text transcription result. |
| audioTime | int | Yes | Total audio duration in seconds. |
| audioDetail | json_array | Yes | Audio segment information. See audioDetail Array. |
| audioTags | json_object | No | Audio tags including gender, timbre, and singing detection. Legacy compatibility field; use businessLabels instead. See audioTags Object. |

audioDetail Array

Each element represents an audio segment:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique identifier for this audio segment. |
| audioStarttime | float | Yes | Segment start time relative to the audio beginning, in seconds. |
| audioEndtime | float | Yes | Segment end time relative to the audio beginning, in seconds. |
| audioUrl | string | Yes | Audio segment URL (MP3 format). |
| riskLevel | string | Yes | Segment risk level. PASS, REVIEW, or REJECT. |
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
| riskDetail | json_object | No | Risk detail information. See Segment riskDetail. |
| allLabels | json_array | No | All risk labels. See Segment allLabels. |
| businessLabels | json_array | No | All business labels. See Segment businessLabels. |
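
As an illustration of reading these segment fields, the sketch below lists the time range and labels of every segment flagged REVIEW or REJECT. The function name risky_segments is hypothetical. It keys on riskLevel and riskLabel1 only, since riskDescription is documented as unsuitable for programmatic logic.

```python
def risky_segments(detail):
    """Return (start, end, riskLevel, riskLabel1) for flagged segments."""
    flagged = []
    for segment in detail.get("audioDetail", []):
        if segment.get("riskLevel") in ("REVIEW", "REJECT"):
            flagged.append((
                segment["audioStarttime"],
                segment["audioEndtime"],
                segment["riskLevel"],
                segment["riskLabel1"],
            ))
    return flagged
```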

Segment allLabels

Each element in the allLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
| riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT. |
| probability | float | No | Confidence score (0–1). Higher values indicate higher risk probability. |
| riskDetail | json_object | No | Risk detail information. See Segment riskDetail. |

Segment riskDetail

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioText | string | No | Audio-to-text transcription result for this segment. |
| riskSource | int | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio risk). |
| matchedLists | json_array | No | Matched custom list information. Returned only when a custom list is hit. |
| matchedLists[].name | string | Yes | Name of the custom list. |
| matchedLists[].words | json_array | Yes | Sensitive word information from the matched list. |
| matchedLists[].words[].word | string | Yes | The matched sensitive word. |
| matchedLists[].words[].position | int_array | Yes | Position of the sensitive word. |
| riskSegments | json_array | No | High-risk content segments. |
| riskSegments[].segment | string | No | High-risk content segment text. |
| riskSegments[].position | int_array | No | Position of the high-risk segment. |
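
Because matchedLists is nested two levels deep and is only present when a custom list is hit, a defensive traversal helps. A minimal sketch, with matched_words as a hypothetical helper name:

```python
def matched_words(risk_detail):
    """Flatten matchedLists into (list name, matched word) pairs."""
    pairs = []
    # matchedLists is absent unless a custom list was hit.
    for matched in risk_detail.get("matchedLists") or []:
        for entry in matched.get("words") or []:
            pairs.append((matched["name"], entry["word"]))
    return pairs
```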

Segment businessLabels

Each element in the businessLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| confidenceLevel | int | No | Confidence level (0–2). Higher values indicate greater confidence. |
| probability | float | No | Confidence score (0–1). |
| businessDetail | json_object | No | Detailed information. Reserved field. |
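
A consumer might keep only business labels at or above a chosen confidenceLevel. The sketch below uses a hypothetical helper, confident_labels. Treating a label with no confidenceLevel as level 0 is an assumption made here because the field is optional.

```python
def confident_labels(segment, min_level=2):
    """Return businessLabel1 values whose confidenceLevel is at least min_level."""
    return [
        label["businessLabel1"]
        for label in segment.get("businessLabels") or []
        # Assumption: a missing confidenceLevel counts as the lowest level.
        if label.get("confidenceLevel", 0) >= min_level
    ]
```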

audioTags Object

Warning: This is a legacy compatibility field. For new integrations, use businessLabels in audioDetail instead.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (e.g., "Male", "Female"). |
| gender.probability | int | No | Gender probability (0–100). Higher values indicate greater likelihood. |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label and confidence. |

Language Labels

| Label | Description |
| --- | --- |
| 0 | Mandarin Chinese |
| 1 | English |
| 2 | Cantonese |
| 3 | Tibetan |
| 4 | Uyghur |
| 5 | Mongolian |
| 6 | Korean |
| -1 | Other languages |
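
The numeric language labels can be mapped back to names when reading the legacy audioTags.language array. The sketch below picks the entry with the highest confidence; top_language is a hypothetical helper, and it assumes labels arrive as integers, as they do in the response example on this page.

```python
# Mapping copied from the Language Labels table.
LANGUAGES = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def top_language(audio_tags):
    """Return the name of the most confident language, or None if absent."""
    candidates = audio_tags.get("language") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda entry: entry["confidence"])
    return LANGUAGES.get(best["label"], "Unknown")
```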

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC",
  "businessType": "TIMBRE_GENDER",
  "btId": "test1",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}

Response Example

{
  "code": 1100,
  "message": "Success",
  "requestId": "817c8509359500c898a762ffe93a582b",
  "btId": "1667392054643",
  "detail": {
    "audioDetail": [
      {
        "requestId": "817c8509359500c898a762ffe93a582b_a0000",
        "audioStarttime": 0,
        "audioEndtime": 10,
        "audioUrl": "http://example.com/audio_segment_a0000.mp3",
        "businessLabels": [
          {
            "businessDescription": "Singing: Singing: Singing",
            "businessDetail": {},
            "businessLabel1": "sing",
            "businessLabel2": "changge",
            "businessLabel3": "changge",
            "confidenceLevel": 2,
            "probability": 0.858334402569294
          }
        ],
        "allLabels": [],
        "riskLevel": "REJECT",
        "riskLabel1": "abuse",
        "riskLabel2": "buwenmingyongyu",
        "riskLabel3": "qingdubuwenmingyongyu",
        "riskDescription": "Abuse: Uncivilized language: Mild uncivilized language",
        "riskDetail": {
          "audioText": "Recognized audio text content..."
        }
      }
    ],
    "audioTags": {
      "gender": {
        "label": "Female",
        "probability": 95
      },
      "language": [
        {
          "confidence": 0,
          "label": 2
        },
        {
          "confidence": 99,
          "label": 0
        },
        {
          "confidence": 0,
          "label": 1
        }
      ],
      "song": 0,
      "timbre": [
        {
          "label": "Female",
          "probability": 95
        },
        {
          "label": "Queen",
          "probability": 12
        },
        {
          "label": "Mature Woman",
          "probability": 37
        },
        {
          "label": "Young Woman",
          "probability": 56
        },
        {
          "label": "Middle-aged Woman",
          "probability": 67
        },
        {
          "label": "Loli",
          "probability": 24
        }
      ]
    },
    "audioText": "Recognized audio text content...",
    "audioTime": 10,
    "code": 1100,
    "requestParams": {
      "channel": "TEST",
      "lang": "zh",
      "returnAllText": 1,
      "tokenId": "test01"
    },
    "riskLevel": "REJECT"
  }
}