Async Audio Detection

Asynchronous audio detection API for identifying regulatory risks and business content in audio files with callback-based results.

Detect regulatory risks in audio content, including political content, pornography, advertising, and violence & terrorism. Depending on your business scenario, the API can also identify minors, voice timbre, and other attributes.

API Description

This is an asynchronous detection API; recognition results are delivered via callback. We recommend calling the API over HTTP.

Audio Requirements

Item | Specification
Audio types | URL, BASE64
Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc.
Size limit | ≤ 18 MB
Note: Audio URLs should be served from a CDN origin server. The origin must not be a single point of failure; otherwise, audio download failures may prevent moderation.
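A client can reject a local file that already exceeds the size limit before uploading or hosting it. The following minimal Python sketch assumes that "18 MB" means 18 × 1024 × 1024 bytes. The table does not say this. `check_audio_size` and `MAX_AUDIO_BYTES` are hypothetical names and are not part of any ISHUMEI SDK.

```python
import os

# Assumption: "18 MB" is read as 18 MiB (18 * 1024 * 1024 bytes); confirm with ISHUMEI.
MAX_AUDIO_BYTES = 18 * 1024 * 1024


def check_audio_size(path: str) -> int:
    """Return the file size in bytes, raising ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_AUDIO_BYTES:
        raise ValueError(f"{path} is {size} bytes; the limit is {MAX_AUDIO_BYTES} bytes")
    return size
```

The sketch checks only size. Format support depends on the codec, not the file extension, so format errors are left for the service to report (code 1905).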

Timeout

  • Synchronous request: recommended timeout of 10 seconds
  • Async batch request: recommended timeout of 5 seconds
  • Response time depends on audio download time. Ensure the storage service hosting the audio is stable and reliable. Actual duration varies based on the request type and audio size.
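The recommended timeouts map directly onto a client-side socket timeout. This is a minimal sketch using only Python's standard library. It assumes the API accepts a JSON body over HTTP POST; the method and Content-Type header are assumptions, because this section does not state them. It also treats this async API as the "async batch" case for its default timeout, which is a judgment call. `build_request` and `send` are illustrative helpers.

```python
import json
import urllib.request

# Recommended client timeouts from the list above, in seconds.
SYNC_TIMEOUT_S = 10.0
ASYNC_BATCH_TIMEOUT_S = 5.0


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request (POST and JSON are assumptions, not documented here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(req: urllib.request.Request, timeout: float = ASYNC_BATCH_TIMEOUT_S) -> dict:
    """Send the request and decode the JSON reply; raises on timeout or HTTP error."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `send(build_request(url, body))`. Because response time depends on how fast the service can download the audio, a timeout here may not mean the task failed. Retrying with the same btId is safer than generating a new one.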

Request

Request URL

Cluster | Request URL | Supported Products
Shanghai | http://api-audio-sh.fengkongcloud.com/audio/v4 | Chinese, International
Silicon Valley | http://api-audio-gg.fengkongcloud.com/audio/v4 | Chinese, International
Singapore | http://api-audio-xjp.fengkongcloud.com/audio/v4 | Chinese, International
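Choosing an endpoint is a fixed lookup over the table above. The cluster keys in this sketch (`shanghai`, `silicon_valley`, `singapore`) are hypothetical labels chosen for illustration. They are not values the API expects.

```python
# Request URLs per cluster, copied from the table above.
CLUSTER_URLS = {
    "shanghai": "http://api-audio-sh.fengkongcloud.com/audio/v4",
    "silicon_valley": "http://api-audio-gg.fengkongcloud.com/audio/v4",
    "singapore": "http://api-audio-xjp.fengkongcloud.com/audio/v4",
}


def request_url(cluster: str) -> str:
    """Look up the endpoint for a (hypothetical) cluster key, case-insensitively."""
    try:
        return CLUSTER_URLS[cluster.lower()]
    except KeyError:
        raise ValueError(
            f"unknown cluster {cluster!r}; expected one of {sorted(CLUSTER_URLS)}"
        ) from None
```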

Request Parameters

Parameter | Type | Required | Description
accessKey | string | Yes | Company key provided by ISHUMEI. See the onboarding email for details.
appId | string | Yes | Application identifier. Contact ISHUMEI to activate and use the value ISHUMEI provides. The default value is in the onboarding email.
eventId | string | Yes | Event identifier. Contact ISHUMEI to activate and use the value ISHUMEI provides. The default value is in the onboarding email.
type | string | No | Risk detection types. At least one of type or businessType is required. See Risk Detection Types.
businessType | string | No | Business label detection types. At least one of type or businessType is required. See Business Detection Types.
translationTargetLang | string | No | Translation target language. Translates the input text into the target language. Contact ISHUMEI sales to enable. zh: Chinese. en: English.
contentType | string | Yes | Format of the audio content. URL: audio URL. RAW: base64-encoded audio data.
content | string | Yes | Audio content: either a URL or base64-encoded data. Base64 data limit: 15 MB. Base64 content supports only PCM, WAV, and MP3; PCM must use 16-bit little-endian encoding. PCM and WAV are recommended.
btId | string | Yes | Unique audio file identifier used to match callback results. Max 128 characters (longer values are truncated). Must be unique.
callback | string | No | Callback HTTP URL. When non-empty, the service sends moderation results to this URL.
acceptLang | string | No | Language of the returned labels. zh (default): Chinese. en: English.
data | object | Yes | Request data content. Max 1 MB. See data Object.
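Some constraints in the table can be enforced before sending:

- at least one of type or businessType must be set;
- contentType must be URL or RAW;
- btId must be at most 128 characters.

This is a minimal sketch, with `build_body` as a hypothetical helper. It rejects a long btId instead of letting the service truncate it, because truncation could make two btIds collide in callback matching.

```python
from typing import Optional


def build_body(
    access_key: str,
    app_id: str,
    event_id: str,
    bt_id: str,
    content: str,
    content_type: str = "URL",
    risk_types: str = "",
    business_types: str = "",
    callback: str = "",
    data: Optional[dict] = None,
) -> dict:
    """Assemble the top-level request body and enforce the documented constraints."""
    if not (risk_types or business_types):
        raise ValueError("at least one of type or businessType is required")
    if content_type not in ("URL", "RAW"):
        raise ValueError("contentType must be URL or RAW")
    if not bt_id or len(bt_id) > 128:
        # The service truncates longer btIds, which could break callback matching.
        raise ValueError("btId must be 1-128 characters")
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "btId": bt_id,
        "contentType": content_type,
        "content": content,
        "data": data if data is not None else {},
    }
    # Optional fields are included only when set.
    if risk_types:
        body["type"] = risk_types
    if business_types:
        body["businessType"] = business_types
    if callback:
        body["callback"] = callback
    return body
```

Calling `build_body` with the values from the Request Example produces an equivalent body.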

Risk Detection Types

Combine multiple types with underscores (e.g., POLITY_EROTIC_MOAN). Recommended: POLITY_EROTIC_MOAN_ADVERT.

Value | Description
AUDIOPOLITICAL | Top leader voiceprint detection
POLITY | Political content detection
EROTIC | Pornographic content detection
ADVERT | Advertising detection
ADLAW | Advertising law violation detection
BAN | Prohibited content detection
VIOLENT | Violence & terrorism detection
ANTHEN | National anthem detection
MOAN | Moaning detection
DIRTY | Abusive language detection
BANEDAUDIO | Prohibited songs detection
COPYRIGHTSONGS | Copyrighted songs detection
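Type strings are plain lists joined with underscores, so a small helper can validate and combine them. This sketch checks names against the table above and drops duplicates while keeping the caller's order. `join_types` is a hypothetical helper name.

```python
# Risk detection type names from the table above.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "ADLAW", "BAN",
    "VIOLENT", "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_types(types, allowed=RISK_TYPES) -> str:
    """Validate type names and join them with underscores, dropping duplicates in order."""
    if not types:
        raise ValueError("at least one type is required")
    unknown = [t for t in types if t not in allowed]
    if unknown:
        raise ValueError(f"unknown detection types: {unknown}")
    return "_".join(dict.fromkeys(types))
```

For example, `join_types(["POLITY", "EROTIC", "MOAN", "ADVERT"])` yields the recommended `POLITY_EROTIC_MOAN_ADVERT`.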

Business Detection Types

Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.

Value | Description
SING | Singing detection
LANGUAGE | Language detection
GENDER | Gender detection
TIMBRE | Voice timbre detection
VOICE | Voice attributes
MINOR | Minor detection
AUDIOSCENE | Audio scene detection
AGE | Age detection
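The GENDER dependency can be applied automatically on the client.

- In this sketch, `business_type` (a hypothetical helper) prepends GENDER whenever TIMBRE, SING, or LANGUAGE is requested.
- Whether GENDER must appear first in the string is not specified, so the ordering is an assumption. It does match the Request Example's `GENDER_TIMBRE_SING_LANGUAGE`.

```python
BUSINESS_TYPES = {"SING", "LANGUAGE", "GENDER", "TIMBRE", "VOICE", "MINOR", "AUDIOSCENE", "AGE"}
# Types that, per the note above, require GENDER to be requested as well.
REQUIRES_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def business_type(types) -> str:
    """Join business types with underscores, adding GENDER when a dependent type needs it."""
    ordered = list(dict.fromkeys(types))  # de-duplicate, keep caller's order
    unknown = [t for t in ordered if t not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business types: {unknown}")
    if REQUIRES_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.insert(0, "GENDER")
    return "_".join(ordered)
```

Adding GENDER silently is a convenience. A stricter client could raise an error instead, so that callers notice the dependency.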

data Object

Parameter | Type | Required | Description
retryUrl | string | No | Fallback audio URL. Used when the primary content URL fails to download.
tokenId | string | No | User account identifier for behavior analysis. We recommend passing the user UID.
formatInfo | string | No | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3.
rate | int | No | Audio sample rate. Required when formatInfo is pcm. Range: 8000–32000.
track | int | No | Number of audio channels. Required when formatInfo is pcm. 1: mono. 2: stereo.
returnAllText | int | No | Controls which audio segments are returned. 0 (default): only risk segments (REVIEW and REJECT). 1: all segments, including PASS. This affects only the segment-level results in audioDetail, not the overall result.
audioDetectStep | int | No | Interval moderation step size, an integer from 1 to 36. A value of 1 skips one 10-second audio segment, 2 skips two, and so on. When not set, all audio content is moderated. When enabled, we recommend also enabling returnAllText and using the ASR result of each segment.
receiveTokenId | string | No | Message receiver's tokenId in private chat scenarios. Letters, digits, underscores, and hyphens; up to 64 characters.
lang | string | No | Audio language. Default: zh. See Supported Languages.
deviceId | string | No | ISHUMEI device fingerprint: the unique device ID generated by the ISHUMEI SDK.
room | string | No | Room number. Recommended.
dataId | string | No | Custom data identifier.
ip | string | No | IPv4 or IPv6 address of the user who sent the audio.
level | int | No | User level, used to apply different interception strategies. See User Levels.
gender | string | No | User gender: male or female.
extra | json_object | No | Auxiliary parameters.
extra.passThrough | json_object | No | Pass-through field. All content under this field is returned in the callback.
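When contentType is RAW, the audio bytes are base64-encoded into content. formatInfo goes into data, plus rate and track for PCM. Notes on this sketch:

- It assumes the 15 MB base64 limit is measured on the encoded string, in MiB. The source does not define this precisely.
- It does not verify that PCM input is 16-bit little-endian.
- `raw_content_fields` is a hypothetical helper. It returns the top-level fields and the data fields separately.

```python
import base64

RAW_FORMATS = {"pcm", "wav", "mp3"}
# Assumption: the 15 MB base64 limit applies to the encoded string, measured in MiB.
MAX_BASE64_CHARS = 15 * 1024 * 1024


def raw_content_fields(audio: bytes, fmt: str, rate=None, track=None):
    """Return (top_level_fields, data_fields) for a contentType=RAW request.

    PCM input is assumed to already be 16-bit little-endian; this is not checked.
    """
    fmt = fmt.lower()
    if fmt not in RAW_FORMATS:
        raise ValueError(f"RAW content supports only {sorted(RAW_FORMATS)}, got {fmt!r}")
    encoded = base64.b64encode(audio).decode("ascii")
    if len(encoded) > MAX_BASE64_CHARS:
        raise ValueError("base64 content exceeds the 15 MB limit")
    data = {"formatInfo": fmt}
    if fmt == "pcm":
        # rate and track are required when formatInfo is pcm.
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("pcm requires rate between 8000 and 32000")
        if track not in (1, 2):
            raise ValueError("pcm requires track 1 (mono) or 2 (stereo)")
        data["rate"] = rate
        data["track"] = track
    return {"contentType": "RAW", "content": encoded}, data
```

The returned data fields are meant to be merged with the other data parameters, such as tokenId and returnAllText.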

Supported Languages

Value | Language
zh | Chinese (default)
en | English
ar | Arabic
hi | Hindi
es | Spanish
fr | French
ru | Russian
pt | Portuguese
id | Indonesian
de | German
ja | Japanese
tr | Turkish
vi | Vietnamese
it | Italian
th | Thai
tl | Filipino
ko | Korean
ms | Malay
auto | Automatic language detection (contact ISHUMEI to enable)

User Levels

Value | Description
0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
1 | Lower-level user (e.g., low-activity or low-level users)
2 | Mid-level user (e.g., moderately active or mid-level users)
3 | Higher-level user (e.g., highly active or high-level users)
4 | Highest-level user (e.g., paying users, VIP users)

Response

Response Parameters

Parameter | Type | Required | Description
requestId | string | Yes | Unique request identifier.
code | int | Yes | Response code. See Response Codes.
message | string | Yes | Response message corresponding to the code.
btId | string | Yes | Unique audio identifier. Returned when code is 1100.

Response Codes

Code | Description
1100 | Success
1901 | QPS limit exceeded
1902 | Invalid parameters
1903 | Service failure
1904 | Download failure
1905 | Decoding failure
9101 | Unauthorized operation
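A caller usually needs to separate accepted tasks from failures worth retrying. In the sketch below, 1901 (QPS limit) and 1903 (service failure) are treated as retryable. That is an assumption, not documented behavior. `accepted_bt_id` and `RetryableError` are hypothetical names.

```python
class RetryableError(Exception):
    """Raised for codes this sketch assumes are transient (hypothetical classification)."""


# Assumption: QPS-limit and service-failure errors are worth retrying with backoff;
# parameter, download, decoding, and authorization errors are not.
RETRYABLE_CODES = {1901, 1903}


def accepted_bt_id(resp: dict) -> str:
    """Return btId when the service accepted the task (code 1100); raise otherwise."""
    code = resp.get("code")
    if code == 1100:
        return str(resp["btId"])
    detail = f"code {code}: {resp.get('message', '')} (requestId={resp.get('requestId')})"
    if code in RETRYABLE_CODES:
        raise RetryableError(detail)
    raise RuntimeError(detail)
```

Applied to the Response Example, `accepted_bt_id` returns `"1604311839040"`.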

Callback Parameters

Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.

Parameter | Type | Required | Description
requestId | string | Yes | Unique request identifier.
btId | string | Yes | Unique audio identifier.
code | int | Yes | Response code. See Callback Response Codes.
message | string | Yes | Response message corresponding to the code.
riskLevel | string | Yes | Overall disposition recommendation: PASS, REVIEW, or REJECT. During initial integration, we recommend not blocking content directly on this result; tune the interception thresholds first.
audioText | string | Yes | Full audio-to-text transcription.
audioTime | int | Yes | Total audio duration in seconds.
audioDetail | json_array | Yes | Audio segment information. See audioDetail Array.
audioTags | json_object | No | Audio tags, including gender, timbre, and singing detection. Legacy compatibility field; use businessLabels instead. See audioTags Object.
requestParams | json_object | Yes | Pass-through field. Returns all fields submitted under data.
auxInfo | json_object | No | Auxiliary information. See auxInfo Object.
tokenProfileLabels | json_array | No | Account attribute labels. Returned only when the feature is enabled. See Token Labels.
tokenRiskLabels | json_array | No | Account risk labels. Returned only when the feature is enabled. See Token Labels.

Callback Response Codes

Code | Description
1100 | Success
1101 | Processing
1901 | QPS limit exceeded
1902 | Invalid parameters
1903 | Service failure
1904 | Download failure
1905 | Decoding failure
9100 | Insufficient balance
9101 | Unauthorized operation
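A callback receiver should read only code, message, and requestId unconditionally, because the remaining fields are guaranteed only when code is 1100. This sketch parses a raw callback body into a small summary and leaves out the HTTP server itself. Interpreting 1101 (Processing) as "a final result has not arrived yet" is an assumption. `parse_callback` is a hypothetical name.

```python
import json


def parse_callback(raw_body: bytes) -> dict:
    """Summarize a callback body; fields beyond code/message/requestId use .get()."""
    msg = json.loads(raw_body)
    code = msg.get("code")
    summary = {"requestId": msg.get("requestId"), "btId": msg.get("btId"), "code": code}
    if code == 1100:
        summary["status"] = "done"
        summary["riskLevel"] = msg.get("riskLevel")
    elif code == 1101:
        # Assumption: the task is still running; wait for a final callback.
        summary["status"] = "processing"
    else:
        summary["status"] = "failed"
        summary["message"] = msg.get("message")
    return summary
```

Matching on btId, which the caller chose when submitting, lets the receiver link each callback back to its original request.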

audioDetail Array

Each element represents an audio segment:

Parameter | Type | Required | Description
requestId | string | Yes | Unique identifier of this audio segment.
audioStarttime | float | Yes | Segment start time, in seconds from the beginning of the audio.
audioEndtime | float | Yes | Segment end time, in seconds from the beginning of the audio.
audioUrl | string | Yes | Audio segment URL (MP3 format).
riskLevel | string | Yes | Segment risk level: PASS, REVIEW, or REJECT.
riskLabel1 | string | Yes | Level 1 risk label.
riskLabel2 | string | Yes | Level 2 risk label.
riskLabel3 | string | Yes | Level 3 risk label.
riskDescription | string | Yes | Risk description. For reference only; do not use in program logic.
riskDetail | json_object | No | Risk detail information. See Segment riskDetail.
allLabels | json_array | No | All risk labels. See Segment allLabels.
businessLabels | json_array | No | All business labels. See Segment businessLabels.
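To locate problem audio, iterate over audioDetail and keep the segments at REVIEW or REJECT. With returnAllText set to 0, only those segments are returned anyway; the filter matters when returnAllText is 1. `risky_segments` is a hypothetical helper, and the "politics" label in the test data is made up.

```python
def risky_segments(callback: dict, levels=("REVIEW", "REJECT")) -> list:
    """Return (start_s, end_s, riskLevel, riskLabel1) for segments at the given levels."""
    hits = []
    for seg in callback.get("audioDetail", []):
        if seg.get("riskLevel") in levels:
            hits.append(
                (seg["audioStarttime"], seg["audioEndtime"], seg["riskLevel"], seg.get("riskLabel1", ""))
            )
    return hits
```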

Segment allLabels

Each element in the allLabels array:

Parameter | Type | Required | Description
riskLabel1 | string | Yes | Level 1 risk label.
riskLabel2 | string | Yes | Level 2 risk label.
riskLabel3 | string | Yes | Level 3 risk label.
riskDescription | string | Yes | Risk description. For reference only; do not use in program logic.
riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT.
probability | float | No | Confidence score (0–1). Higher values indicate a higher risk probability.
riskDetail | json_object | No | Risk detail information. See Segment riskDetail.
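When a segment carries several labels, a client may want one representative label. This sketch ranks labels by riskLevel first (REJECT above REVIEW above PASS), then by probability. That ranking is a client-side choice, not something the API prescribes. `most_severe_label` is a hypothetical name. The sketch reads only riskLevel and probability and never reads riskDescription.

```python
# Client-side severity ordering (an assumption, not an API-defined ranking).
SEVERITY = {"PASS": 0, "REVIEW": 1, "REJECT": 2}


def most_severe_label(all_labels: list):
    """Return the label with the highest riskLevel, breaking ties by probability."""
    if not all_labels:
        return None
    return max(
        all_labels,
        key=lambda lab: (SEVERITY.get(lab.get("riskLevel"), -1), lab.get("probability", 0.0)),
    )
```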

Segment riskDetail

Parameter | Type | Required | Description
audioText | string | No | Audio-to-text transcription of this segment.
riskSource | int | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio risk).
matchedLists | json_array | No | Matched custom list information. Returned only when a custom list is hit.
matchedLists[].name | string | Yes | Name of the custom list.
matchedLists[].words | json_array | Yes | Sensitive word information from the matched list.
matchedLists[].words[].word | string | Yes | The matched sensitive word.
matchedLists[].words[].position | int_array | Yes | Position of the sensitive word.
riskSegments | json_array | No | High-risk content segments.
riskSegments[].segment | string | No | Text of the high-risk content segment.
riskSegments[].position | int_array | No | Position of the high-risk segment.
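The words matched from custom lists can be collected per segment. The format of position is not specified beyond int_array, so this sketch ignores it. `matched_words` is a hypothetical helper, and the list name in the test data is made up.

```python
def matched_words(risk_detail: dict) -> list:
    """Return (list_name, word) pairs for every custom-list hit in a riskDetail object."""
    hits = []
    # matchedLists is present only when a custom list was hit.
    for matched in risk_detail.get("matchedLists", []):
        for entry in matched.get("words", []):
            hits.append((matched["name"], entry["word"]))
    return hits
```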

Segment businessLabels

Each element in the businessLabels array:

Parameter | Type | Required | Description
businessLabel1 | string | Yes | Level 1 business label.
businessLabel2 | string | Yes | Level 2 business label.
businessLabel3 | string | Yes | Level 3 business label.
businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3".
confidenceLevel | int | No | Confidence level (0–2). Higher values indicate greater confidence.
probability | float | No | Confidence score (0–1).
businessDetail | json_object | No | Detailed information. Reserved field.
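Business labels can be filtered by confidenceLevel before the client acts on them. In this sketch, the threshold of 2 is an illustrative default, not a documented recommendation. `confident_business_labels` is a hypothetical name.

```python
def confident_business_labels(audio_detail: list, min_level: int = 2) -> list:
    """Return the sorted, distinct businessLabel1 values at or above min_level."""
    found = set()
    for seg in audio_detail:
        for lab in seg.get("businessLabels", []):
            if lab.get("confidenceLevel", 0) >= min_level:
                found.add(lab["businessLabel1"])
    return sorted(found)
```

Applied to the audioDetail array in the Callback Example, it returns `["sing"]`.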

audioTags Object

Warning: This is a legacy compatibility field. For new integrations, use businessLabels in audioDetail instead.

Parameter | Type | Required | Description
gender | object | No | Gender detection result.
gender.label | string | Yes | Gender label name (e.g., "Male", "Female").
gender.probability | int | No | Gender probability (0–100). Higher values indicate greater likelihood.
timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice.
language | array | No | Language detection results. Each element contains label and either probability or confidence. See Language Labels.

Language Labels

Label | Description
0 | Mandarin Chinese
1 | English
2 | Cantonese
3 | Tibetan
4 | Uyghur
5 | Mongolian
6 | Korean
-1 | Other languages
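The numeric language labels in audioTags.language map to the table above. This sketch picks the entry with the highest score. It reads confidence and falls back to probability, because the field description allows either. `dominant_language` is a hypothetical helper.

```python
# Language label codes from the table above.
LANGUAGE_LABELS = {
    0: "Mandarin Chinese",
    1: "English",
    2: "Cantonese",
    3: "Tibetan",
    4: "Uyghur",
    5: "Mongolian",
    6: "Korean",
    -1: "Other languages",
}


def dominant_language(audio_tags: dict):
    """Return the name of the highest-scoring language entry, or None if absent."""
    entries = audio_tags.get("language") or []
    if not entries:
        return None

    def score(entry: dict):
        # An entry may carry "confidence" or "probability"; prefer confidence.
        return entry.get("confidence", entry.get("probability", 0))

    best = max(entries, key=score)
    return LANGUAGE_LABELS.get(best["label"], f"unknown label {best['label']}")
```

For the audioTags in the Callback Example, it returns `"Mandarin Chinese"` (label 0, confidence 99).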

auxInfo Object

Parameter | Type | Required | Description
errorCode | int | Yes | Status code. 2003: audio download failure. 2007: no valid data.

Token Labels

Both tokenProfileLabels and tokenRiskLabels share the same structure:

Parameter | Type | Required | Description
label1 | string | No | Level 1 label.
label2 | string | No | Level 2 label.
label3 | string | No | Level 3 label.
description | string | No | Label description. For reference only; do not use in program logic.
timestamp | int | No | Label timestamp: a 13-digit Unix timestamp in milliseconds.
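The 13-digit millisecond timestamp converts to a timezone-aware datetime after dividing by 1000. This sketch also joins label1 to label3 into one path. The colon separator and the name `describe_token_labels` are hypothetical choices.

```python
from datetime import datetime, timezone


def describe_token_labels(labels: list) -> list:
    """Turn token label objects into (label_path, utc_datetime_or_None) tuples."""
    result = []
    for lab in labels:
        parts = [lab.get(key, "") for key in ("label1", "label2", "label3")]
        path = ":".join(p for p in parts if p)  # skip empty levels
        ts = lab.get("timestamp")
        # Milliseconds -> seconds, interpreted as UTC.
        when = datetime.fromtimestamp(ts / 1000, tz=timezone.utc) if ts is not None else None
        result.append((path, when))
    return result
```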

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC_ADVERT_MOAN",
  "businessType": "GENDER_TIMBRE_SING_LANGUAGE",
  "btId": "test1",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "callback": "http://www.example.com/callback",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}

Response Example

{
  "code": 1100,
  "message": "Success",
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040"
}

Callback Example

{
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040",
  "code": 1100,
  "message": "Success",
  "riskLevel": "PASS",
  "audioDetail": [
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0000",
      "audioStarttime": 0,
      "audioEndtime": 10,
      "audioUrl": "http://example.com/audio_segment_a0000.mp3",
      "businessLabels": [
        {
          "businessDescription": "Singing: Singing: Singing",
          "businessDetail": {},
          "businessLabel1": "sing",
          "businessLabel2": "changge",
          "businessLabel3": "changge",
          "confidenceLevel": 2,
          "probability": 0.858334402569294
        }
      ],
      "allLabels": [],
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0001",
      "audioStarttime": 10,
      "audioEndtime": 20,
      "audioUrl": "http://example.com/audio_segment_a0001.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0002",
      "audioStarttime": 20,
      "audioEndtime": 30,
      "audioUrl": "http://example.com/audio_segment_a0002.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    }
  ],
  "audioTags": {
    "gender": {
      "label": "Female",
      "probability": 95
    },
    "language": [
      {
        "confidence": 0,
        "label": 2
      },
      {
        "confidence": 99,
        "label": 0
      },
      {
        "confidence": 0,
        "label": 1
      }
    ],
    "song": 0,
    "timbre": [
      {
        "label": "Female",
        "probability": 95
      },
      {
        "label": "Queen",
        "probability": 12
      },
      {
        "label": "Mature Woman",
        "probability": 37
      },
      {
        "label": "Young Woman",
        "probability": 56
      },
      {
        "label": "Middle-aged Woman",
        "probability": 67
      },
      {
        "label": "Loli",
        "probability": 24
      }
    ]
  }
}