Async Audio Detection
Asynchronous audio detection API for identifying regulatory risks and business content in audio files with callback-based results.
Detect regulatory risks in audio content, including political content, pornography, advertising, and violence & terrorism. The API can also be combined with your business scenarios to identify minors, voice timbre, and other attributes.
API Description
This asynchronous detection API returns recognition results via callback. Using the HTTP protocol for API calls is recommended.
Audio Requirements
| Item | Specification |
|---|---|
| Audio types | URL, BASE64 |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Size limit | ≤ 18 MB |
Audio URLs should point to a CDN origin server. The origin server must not be a single point of failure; otherwise, audio download failures may prevent moderation.
Timeout
- Synchronous request: recommended timeout of 10 seconds
- Async batch request: recommended timeout of 5 seconds
- Response time depends on audio download time. Ensure the storage service hosting the audio is stable and reliable. Actual duration varies based on the request type and audio size.
Request
Request URL
| Cluster | Request URL | Supported Products |
|---|---|---|
| Shanghai | http://api-audio-sh.fengkongcloud.com/audio/v4 | Chinese, International |
| Silicon Valley | http://api-audio-gg.fengkongcloud.com/audio/v4 | Chinese, International |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audio/v4 | Chinese, International |
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| accessKey | string | Yes | Company key, provided by ISHUMEI. See the onboarding email for details. |
| appId | string | Yes | Application identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| eventId | string | Yes | Event identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| type | string | No | Risk detection types. Either type or businessType is required. See Risk Detection Types. |
| businessType | string | No | Business label detection types. Either type or businessType is required. See Business Detection Types. |
| translationTargetLang | string | No | Translation target language. Translates the input text into the target language. Contact ISHUMEI sales to enable. zh: Chinese. en: English. |
| contentType | string | Yes | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
| content | string | Yes | Audio content: either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
| btId | string | Yes | Unique audio file identifier for matching callback results. Max 128 characters (truncated if exceeded). Must not be duplicated. |
| callback | string | No | Callback HTTP URL. When non-empty, the service sends moderation results to this URL. |
| acceptLang | string | No | Language for returned labels. zh (default): Chinese. en: English. |
| data | object | Yes | Request data content. Max 1 MB. See data Object. |
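As a rough illustration of how these parameters fit together, the Python sketch below assembles a request body for URL-based audio and posts it. The helper names (build_request, submit_audio), the choice of the Shanghai cluster, the JSON POST transport, and the 10-second default timeout are assumptions made for this example, not confirmed details of the service.

```python
import json
import urllib.request

# Shanghai cluster URL from the Request URL table; choose the cluster nearest to you.
API_URL = "http://api-audio-sh.fengkongcloud.com/audio/v4"


def build_request(access_key, bt_id, audio_url, callback_url=None,
                  risk_types="POLITY_EROTIC_MOAN_ADVERT",
                  app_id="default", event_id="default", data=None):
    """Assemble a request body for URL audio (hypothetical helper)."""
    if len(bt_id) > 128:
        # The table says longer values are truncated; rejecting them early is safer.
        raise ValueError("btId is limited to 128 characters")
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_types,
        "contentType": "URL",
        "content": audio_url,
        "btId": bt_id,
        "data": data or {},
    }
    if callback_url:
        body["callback"] = callback_url
    return body


def submit_audio(body, timeout=10.0):
    """POST the body as JSON (assumed transport) and return the parsed response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping body construction separate from the network call makes the payload easy to inspect or log before anything is sent.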
Risk Detection Types
Combine multiple types with underscores (e.g., POLITY_EROTIC_MOAN). Recommended: POLITY_EROTIC_MOAN_ADVERT.
| Value | Description |
|---|---|
| AUDIOPOLITICAL | Top leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| ADLAW | Advertising law violation detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |
Business Detection Types
Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.
| Value | Description |
|---|---|
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
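The underscore-joining rule and the GENDER dependency can be captured in a small helper. The sketch below is one possible approach; the function name and the choice to add GENDER automatically (rather than raise an error) are assumptions for illustration.

```python
BUSINESS_TYPES = {
    "SING", "LANGUAGE", "GENDER", "TIMBRE",
    "VOICE", "MINOR", "AUDIOSCENE", "AGE",
}
# Types that, per the rule above, require GENDER in the same request.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def combine_business_types(types):
    """Join business detection types with underscores (hypothetical helper)."""
    unknown = [t for t in types if t not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business detection types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    selected = list(dict.fromkeys(types))
    if NEEDS_GENDER.intersection(selected) and "GENDER" not in selected:
        selected.insert(0, "GENDER")
    return "_".join(selected)
```

For example, combine_business_types(["TIMBRE", "SING"]) yields a value that already includes GENDER, so the request does not silently lose the timbre and singing results.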
data Object
| Parameter | Type | Required | Description |
|---|---|---|---|
| retryUrl | string | No | Fallback audio URL. Used when the primary content URL fails to download. |
| tokenId | string | No | User account identifier for behavior analysis. Recommended to pass the user UID. |
| formatInfo | string | No | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
| rate | int | No | Audio sample rate. Required when formatInfo is pcm. Range: 8000–32000. |
| track | int | No | Audio channels. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int | No | Controls which audio segments are returned. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level results in audioDetail; it does not affect the overall result. |
| audioDetectStep | int | No | Interval moderation step size. Integer from 1–36. A value of 1 skips one 10-second audio segment, 2 skips two, and so on. When not set, all audio content is moderated. When enabled, it is recommended to also enable returnAllText and use the ASR result from each segment. |
| receiveTokenId | string | No | Message receiver's tokenId for private chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. |
| lang | string | No | Audio language type. Default: zh. See Supported Languages. |
| deviceId | string | No | ISHUMEI device fingerprint identifier. Unique device ID generated by the ISHUMEI SDK. |
| room | string | No | Room number. Recommended to provide. |
| dataId | string | No | Custom data identifier. |
| ip | string | No | IPv4 or IPv6 address of the user who sent the audio. |
| level | int | No | User level for configuring different interception strategies. See User Levels. |
| gender | string | No | User gender. male or female. |
| extra | json_object | No | Auxiliary parameters. |
| extra.passThrough | json_object | No | Pass-through field. All content under this field is returned via callback. |
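When audio is sent as base64 (contentType RAW), formatInfo becomes required, and pcm additionally needs rate and track. The sketch below shows one way to validate those rules before sending. The helper name is hypothetical, and applying the 15 MB limit to the encoded string (rather than the raw bytes) is an assumption.

```python
import base64

# 15 MB base64 limit from the request parameters; assumed to apply to the encoded string.
MAX_BASE64_CHARS = 15 * 1024 * 1024


def build_raw_fields(audio_bytes, format_info, rate=None, track=None):
    """Return the contentType, content, and data fields for base64 audio."""
    if format_info not in ("pcm", "wav", "mp3"):
        raise ValueError("formatInfo must be pcm, wav, or mp3")
    data = {"formatInfo": format_info}
    if format_info == "pcm":
        # Raw PCM has no header, so the service needs rate and channel count.
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("pcm requires a rate between 8000 and 32000")
        if track not in (1, 2):
            raise ValueError("pcm requires track 1 (mono) or 2 (stereo)")
        data["rate"] = rate
        data["track"] = track
    content = base64.b64encode(audio_bytes).decode("ascii")
    if len(content) > MAX_BASE64_CHARS:
        raise ValueError("base64 content exceeds the 15 MB limit")
    return {"contentType": "RAW", "content": content, "data": data}
```

The returned dictionary can be merged into a request body alongside accessKey, appId, eventId, and btId.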
Supported Languages
| Value | Language |
|---|---|
| zh | Chinese (default) |
| en | English |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact ISHUMEI to enable) |
User Levels
| Value | Description |
|---|---|
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
Response
Response Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| requestId | string | Yes | Unique request identifier. |
| code | int | Yes | Response code. See Response Codes. |
| message | string | Yes | Response message corresponding to the code. |
| btId | string | Yes | Unique audio identifier. Returned when code is 1100. |
Response Codes
| Code | Description |
|---|---|
| 1100 | Success |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9101 | Unauthorized operation |
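A caller typically needs to decide, from the response code alone, whether the submission was accepted, is worth retrying, or should be treated as a permanent failure. The sketch below shows one way to do that. Treating 1901 and 1903 as transient is an assumption made for this example; the table itself does not say which codes are safe to retry.

```python
# Assumption: QPS limiting (1901) and service failure (1903) are transient.
RETRYABLE_CODES = {1901, 1903}


def classify_response(response):
    """Map a submission response to 'accepted', 'retry', or 'failed'."""
    code = response.get("code")
    if code == 1100:
        return "accepted"
    if code in RETRYABLE_CODES:
        return "retry"
    # 1902, 1904, 1905, 9101 and anything unrecognized need a fix on the caller side.
    return "failed"
```

A "retry" result would normally be paired with a delay before resubmitting, so that a QPS limit is not hit again immediately.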
Callback Parameters
Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
| Parameter | Type | Required | Description |
|---|---|---|---|
| requestId | string | Yes | Unique request identifier. |
| btId | string | Yes | Unique audio identifier. |
| code | int | Yes | Response code. See Callback Response Codes. |
| message | string | Yes | Response message corresponding to the code. |
| riskLevel | string | Yes | Overall disposition recommendation. PASS, REVIEW, or REJECT. During initial integration, it is recommended not to use results directly for blocking; adjust interception thresholds first. |
| audioText | string | Yes | Full audio-to-text transcription result. |
| audioTime | int | Yes | Total audio duration in seconds. |
| audioDetail | json_array | Yes | Audio segment information. See audioDetail Array. |
| audioTags | json_object | No | Audio tags including gender, timbre, and singing detection. Legacy compatibility field; use businessLabels instead. See audioTags Object. |
| requestParams | json_object | Yes | Pass-through field. Returns all fields under data. |
| auxInfo | json_object | No | Auxiliary information. See auxInfo Object. |
| tokenProfileLabels | json_array | No | Account attribute labels. Returned only when the feature is enabled. See Token Labels. |
| tokenRiskLabels | json_array | No | Account risk labels. Returned only when the feature is enabled. See Token Labels. |
Callback Response Codes
| Code | Description |
|---|---|
| 1100 | Success |
| 1101 | Processing |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9100 | Insufficient balance |
| 9101 | Unauthorized operation |
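Because only code, message, and requestId are guaranteed on non-1100 callbacks, a handler should check the code before reading anything else. The following sketch resolves a callback against jobs recorded at submission time. The pending dictionary keyed by requestId is hypothetical bookkeeping on the caller's side, and it assumes the callback carries the same requestId as the submission response, as the examples in this document suggest.

```python
def handle_callback(payload, pending):
    """Resolve a callback payload against locally tracked jobs.

    `pending` maps requestId -> job info saved when the audio was submitted.
    Returns a (status, job) tuple; status is 'processing', 'error',
    or the overall riskLevel (PASS, REVIEW, REJECT).
    """
    code = payload["code"]
    if code == 1101:
        # Still processing: keep the job so a later callback can resolve it.
        return ("processing", None)
    job = pending.pop(payload["requestId"], None)
    if code != 1100:
        # riskLevel and the other result fields are not guaranteed here.
        return ("error", job)
    return (payload["riskLevel"], job)
```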
audioDetail Array
Each element represents an audio segment:
| Parameter | Type | Required | Description |
|---|---|---|---|
| requestId | string | Yes | Unique identifier for this audio segment. |
| audioStarttime | float | Yes | Segment start time relative to the audio beginning, in seconds. |
| audioEndtime | float | Yes | Segment end time relative to the audio beginning, in seconds. |
| audioUrl | string | Yes | Audio segment URL (MP3 format). |
| riskLevel | string | Yes | Segment risk level. PASS, REVIEW, or REJECT. |
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
| riskDetail | json_object | No | Risk detail information. See Segment riskDetail. |
| allLabels | json_array | No | All risk labels. See Segment allLabels. |
| businessLabels | json_array | No | All business labels. See Segment businessLabels. |
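To act on segment-level results, a consumer usually filters audioDetail down to the segments that need attention. The sketch below collects REVIEW and REJECT segments with their time ranges; the function name and the shape of the returned dictionaries are illustrative choices, not part of the API.

```python
def flagged_segments(callback):
    """Return time ranges and labels for REVIEW/REJECT segments."""
    flagged = []
    for segment in callback.get("audioDetail", []):
        if segment.get("riskLevel") in ("REVIEW", "REJECT"):
            flagged.append({
                "start": segment["audioStarttime"],
                "end": segment["audioEndtime"],
                "riskLevel": segment["riskLevel"],
                # Branch on the label fields, not on riskDescription.
                "label": segment.get("riskLabel1", ""),
            })
    return flagged
```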
Segment allLabels
Each element in the allLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
| riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT. |
| probability | float | No | Confidence score (0–1). Higher values indicate higher risk probability. |
| riskDetail | json_object | No | Risk detail information. See Segment riskDetail. |
Segment riskDetail
| Parameter | Type | Required | Description |
|---|---|---|---|
| audioText | string | No | Audio-to-text transcription result for this segment. |
| riskSource | int | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio risk). |
| matchedLists | json_array | No | Matched custom list information. Returned only when a custom list is hit. |
| matchedLists[].name | string | Yes | Name of the custom list. |
| matchedLists[].words | json_array | Yes | Sensitive word information from the matched list. |
| matchedLists[].words[].word | string | Yes | The matched sensitive word. |
| matchedLists[].words[].position | int_array | Yes | Position of the sensitive word. |
| riskSegments | json_array | No | High-risk content segments. |
| riskSegments[].segment | string | No | High-risk content segment text. |
| riskSegments[].position | int_array | No | Position of the high-risk segment. |
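The nested matchedLists structure can be flattened into simple tuples for logging or review tooling. This is a minimal sketch; the function name is hypothetical, and position is passed through unchanged because the table does not specify how its integers should be interpreted.

```python
def matched_words(risk_detail):
    """Flatten matchedLists into (list name, word, position) tuples."""
    hits = []
    # matchedLists is only present when a custom list was hit.
    for matched in risk_detail.get("matchedLists", []):
        for entry in matched.get("words", []):
            hits.append((matched["name"], entry["word"], entry.get("position")))
    return hits
```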
Segment businessLabels
Each element in the businessLabels array:
| Parameter | Type | Required | Description |
|---|---|---|---|
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| confidenceLevel | int | No | Confidence level (0–2). Higher values indicate greater confidence. |
| probability | float | No | Confidence score (0–1). |
| businessDetail | json_object | No | Detailed information. Reserved field. |
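Since confidenceLevel is optional and ranges from 0 to 2, a consumer may want to keep only the labels it trusts. The sketch below filters one segment's businessLabels by a minimum level. The default threshold of 2 and the decision to treat a missing confidenceLevel as 0 are assumptions for illustration.

```python
def confident_labels(segment, min_level=2):
    """Return level-1 business labels whose confidenceLevel meets the threshold."""
    return [
        label["businessLabel1"]
        for label in segment.get("businessLabels", [])
        # A missing confidenceLevel is treated as 0 (assumption).
        if label.get("confidenceLevel", 0) >= min_level
    ]
```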
audioTags Object
This is a legacy compatibility field. Use businessLabels in audioDetail instead for new integrations.
| Parameter | Type | Required | Description |
|---|---|---|---|
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (e.g., "Male", "Female"). |
| gender.probability | int | No | Gender probability (0–100). Higher values indicate greater likelihood. |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label and probability/confidence. |
Language Labels
| Label | Description |
|---|---|
| 0 | Mandarin Chinese |
| 1 | English |
| 2 | Cantonese |
| 3 | Tibetan |
| 4 | Uyghur |
| 5 | Mongolian |
| 6 | Korean |
| -1 | Other languages |
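The numeric language labels can be mapped to readable names and ranked by score. The sketch below picks the highest-scoring entry from audioTags.language. It reads the score from confidence and falls back to probability, since the field description mentions both; the function name and the "Unknown" fallback are illustrative.

```python
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def top_language(audio_tags):
    """Return the name of the highest-scoring language, or None if absent."""
    entries = audio_tags.get("language") or []
    if not entries:
        return None

    def score(entry):
        # The score key may be 'confidence' or 'probability'.
        return entry.get("confidence", entry.get("probability", 0))

    best = max(entries, key=score)
    return LANGUAGE_LABELS.get(best["label"], "Unknown")
```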
auxInfo Object
| Parameter | Type | Required | Description |
|---|---|---|---|
| errorCode | int | Yes | Status code. 2003: audio download failure. 2007: no valid data. |
Token Labels
Both tokenProfileLabels and tokenRiskLabels share the same structure:
| Parameter | Type | Required | Description |
|---|---|---|---|
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. For reference only; do not use for programmatic logic. |
| timestamp | int | No | Label timestamp. 13-digit Unix timestamp in milliseconds. |
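The 13-digit timestamp is in milliseconds, so it has to be split before most date libraries will accept it. A minimal sketch, with a hypothetical helper name and UTC chosen as the output timezone for illustration:

```python
from datetime import datetime, timezone


def label_time(label):
    """Convert a token label's millisecond timestamp to a UTC datetime."""
    timestamp_ms = label.get("timestamp")
    if timestamp_ms is None:
        return None
    # Split with integer arithmetic to avoid floating-point rounding.
    seconds, millis = divmod(timestamp_ms, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)
```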
Examples
Request Example
{
"accessKey": "YOUR_ACCESS_KEY",
"appId": "default",
"eventId": "default",
"type": "POLITY_EROTIC_ADVERT_MOAN",
"businessType": "GENDER_TIMBRE_SING_LANGUAGE",
"btId": "test1",
"contentType": "URL",
"content": "https://example.com/audio/sample.mp3",
"callback": "http://www.example.com/callback",
"data": {
"returnAllText": 1,
"tokenId": "token-short"
}
}
Response Example
{
"code": 1100,
"message": "Success",
"requestId": "6a9cb980346dfea41111656a514e9109",
"btId": "1604311839040"
}
Callback Example
{
"requestId": "6a9cb980346dfea41111656a514e9109",
"btId": "1604311839040",
"code": 1100,
"message": "Success",
"riskLevel": "PASS",
"audioDetail": [
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0000",
"audioStarttime": 0,
"audioEndtime": 10,
"audioUrl": "http://example.com/audio_segment_a0000.mp3",
"businessLabels": [
{
"businessDescription": "Singing: Singing: Singing",
"businessDetail": {},
"businessLabel1": "sing",
"businessLabel2": "changge",
"businessLabel3": "changge",
"confidenceLevel": 2,
"probability": 0.858334402569294
}
],
"allLabels": [],
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
},
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0001",
"audioStarttime": 10,
"audioEndtime": 20,
"audioUrl": "http://example.com/audio_segment_a0001.mp3",
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
},
{
"requestId": "6a9cb980346dfea41111656a514e9109_a0002",
"audioStarttime": 20,
"audioEndtime": 30,
"audioUrl": "http://example.com/audio_segment_a0002.mp3",
"riskLevel": "PASS",
"riskLabel1": "normal",
"riskLabel2": "",
"riskLabel3": "",
"riskDescription": "Normal",
"riskDetail": {
"audioText": ""
}
}
],
"audioTags": {
"gender": {
"label": "Female",
"probability": 95
},
"language": [
{
"confidence": 0,
"label": 2
},
{
"confidence": 99,
"label": 0
},
{
"confidence": 0,
"label": 1
}
],
"song": 0,
"timbre": [
{
"label": "Female",
"probability": 95
},
{
"label": "Queen",
"probability": 12
},
{
"label": "Mature Woman",
"probability": 37
},
{
"label": "Young Woman",
"probability": 56
},
{
"label": "Middle-aged Woman",
"probability": 67
},
{
"label": "Loli",
"probability": 24
}
]
}
}