Detect regulatory risks in audio content, including political content, pornography, advertising, and violence and terrorism. Business-specific detection, such as identifying minors and voice timbre, is also available depending on your scenario.
API Description
This is a synchronous detection API: the recognition result is returned directly in the response. HTTP is the recommended protocol for API calls.
Audio URLs should point to a CDN origin server. The origin server must not be a single point of failure; if the audio cannot be downloaded, it cannot be moderated.
Timeout
Synchronous request: recommended timeout of 10 seconds
Asynchronous request: recommended timeout of 5 seconds
Response time depends on audio download time. Ensure the storage service hosting the audio is stable and reliable. Actual duration varies based on the request type and audio size.
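The recommended timeouts can be applied on the client side roughly as follows. This is a minimal sketch using only the Python standard library; the JSON POST body and the function names `build_request` and `send` are assumptions made for illustration, and the endpoint URL must come from your own onboarding information.

```python
import json
import urllib.request

# Recommended client-side timeouts from this section, in seconds.
RECOMMENDED_TIMEOUT_S = {"sync": 10, "async": 5}


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build an HTTP POST request; nothing is sent until urlopen is called.

    Assumption: the API accepts a JSON body. This section does not state
    the body encoding.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url: str, payload: dict, mode: str = "sync") -> dict:
    """Send the request with the timeout recommended for the given mode."""
    timeout = RECOMMENDED_TIMEOUT_S[mode]
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because response time includes the audio download, a timeout here can mean slow audio storage rather than a slow moderation service, so it is worth logging the audio URL alongside any timeout error.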
Company key, provided by ISHUMEI. See the onboarding email for details.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| appId | string | Yes | Application identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| eventId | string | Yes | Event identifier. Contact ISHUMEI to activate. Use the value provided by ISHUMEI. Default value is in the onboarding email. |
| type | string | No | Risk detection types. Either type or businessType is required. See Risk Detection Types. |
| businessType | string | No | Business label detection types. Either type or businessType is required. See Business Detection Types. |
| contentType | string | Yes | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
| content | string | Yes | Audio content: either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
| btId | string | Yes | Unique audio file identifier for matching callback results. Max 128 characters (truncated if exceeded). Must not be duplicated. |
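The constraints on these parameters can be checked before a request is sent. The sketch below is illustrative only: `encode_raw_audio` and `build_body` are hypothetical helper names, the 15 MB limit is assumed to mean 15 MiB of encoded text, and the company key is left out because its field name is not shown in this section.

```python
import base64

# Assumption: "15 MB" is read as 15 MiB and applied to the encoded string.
MAX_BASE64_CHARS = 15 * 1024 * 1024
MAX_BTID_CHARS = 128


def encode_raw_audio(audio: bytes) -> str:
    """Base64-encode audio bytes for contentType RAW, enforcing the size limit."""
    encoded = base64.b64encode(audio).decode("ascii")
    if len(encoded) > MAX_BASE64_CHARS:
        raise ValueError("base64 audio exceeds the 15 MB limit")
    return encoded


def build_body(app_id, event_id, bt_id, content, content_type,
               type_=None, business_type=None):
    """Assemble the documented top-level fields (company key not included)."""
    if content_type not in ("URL", "RAW"):
        raise ValueError("contentType must be URL or RAW")
    if not type_ and not business_type:
        raise ValueError("either type or businessType is required")
    if len(bt_id) > MAX_BTID_CHARS:
        # The service truncates long IDs; rejecting early avoids collisions.
        raise ValueError("btId is longer than 128 characters")
    body = {
        "appId": app_id,
        "eventId": event_id,
        "contentType": content_type,
        "content": content,
        "btId": bt_id,
    }
    if type_:
        body["type"] = type_
    if business_type:
        body["businessType"] = business_type
    return body
```

Rejecting an over-long btId locally, instead of relying on server-side truncation, keeps two distinct long IDs from collapsing into the same truncated value.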
Risk Detection Types
Combine multiple types with underscores (e.g., POLITY_EROTIC_MOAN). Recommended: POLITY_EROTIC_MOAN_ADVERT.
| Value | Description |
| --- | --- |
| AUDIOPOLITICAL | Top leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |
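Building the underscore-joined type value from a list avoids typos in hand-written strings. A minimal sketch, where `join_risk_types` is a hypothetical helper and the set of values is copied from the list above:

```python
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_risk_types(*types: str) -> str:
    """Join risk detection types with underscores, rejecting unknown values."""
    if not types:
        raise ValueError("at least one risk type is required")
    unknown = [t for t in types if t not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk types: {unknown}")
    # dict.fromkeys removes duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(types))


# The recommended combination from this section:
recommended = join_risk_types("POLITY", "EROTIC", "MOAN", "ADVERT")
```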
Business Detection Types
Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.
| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
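The GENDER dependency is easy to forget when assembling businessType by hand. The sketch below appends GENDER automatically when TIMBRE, SING, or LANGUAGE is requested; that behavior is a design choice of this example, not something the API does for you, and `join_business_types` is a hypothetical helper name.

```python
BUSINESS_TYPES = {
    "SING", "LANGUAGE", "GENDER", "TIMBRE",
    "VOICE", "MINOR", "AUDIOSCENE", "AGE",
}
# Types that only work when GENDER is also requested.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_business_types(*types: str) -> str:
    """Join business detection types, adding GENDER when it is required."""
    if not types:
        raise ValueError("at least one business type is required")
    unknown = [t for t in types if t not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business types: {unknown}")
    ordered = list(dict.fromkeys(types))  # de-duplicate, keep order
    if NEEDS_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.append("GENDER")
    return "_".join(ordered)
```

For example, `join_business_types("TIMBRE")` yields `TIMBRE_GENDER`, while `join_business_types("MINOR")` is returned unchanged.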
data Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tokenId | string | No | User account identifier for behavior analysis. Passing the user UID is recommended. |
| receiveTokenId | string | No | Message receiver's tokenId for private chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. |
| deviceId | string | No | ISHUMEI device fingerprint identifier. Unique device ID generated by the ISHUMEI SDK. |
| ip | string | No | IPv4 or IPv6 address of the user who sent the audio. |
| dataId | string | No | Custom data identifier. |
| level | int | No | User level for configuring different interception strategies. See User Levels. |
| gender | string | No | User gender: male or female. |
| formatInfo | string | No | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
| rate | int | No | Audio sample rate in Hz. Required when formatInfo is pcm. Range: 8000–32000. |
| track | int | No | Audio channels. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int | No | Controls which audio segments are returned. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level results in audioDetail; it does not affect the overall result. |
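The conditional requirements in the data object (formatInfo for RAW content, rate and track for PCM) can be enforced locally before sending. This is a sketch under those documented rules; `build_data` is a hypothetical helper, and any other optional fields such as tokenId or ip are passed through unchecked.

```python
import re

AUDIO_FORMATS = ("pcm", "wav", "mp3")
# Alphanumeric plus underscore and hyphen, 1 to 64 characters.
_TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_data(content_type, format_info=None, rate=None, track=None,
               receive_token_id=None, return_all_text=0, **other):
    """Build the data object, checking the documented conditional rules."""
    data = dict(other)  # e.g. tokenId, deviceId, ip, dataId, level, gender
    if content_type == "RAW" and format_info is None:
        raise ValueError("formatInfo is required when contentType is RAW")
    if format_info is not None:
        if format_info not in AUDIO_FORMATS:
            raise ValueError("formatInfo must be pcm, wav, or mp3")
        data["formatInfo"] = format_info
    if format_info == "pcm":
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("pcm requires rate between 8000 and 32000")
        if track not in (1, 2):
            raise ValueError("pcm requires track 1 (mono) or 2 (stereo)")
    if rate is not None:
        data["rate"] = rate
    if track is not None:
        data["track"] = track
    if receive_token_id is not None:
        if not _TOKEN_RE.fullmatch(receive_token_id):
            raise ValueError("receiveTokenId has invalid characters or length")
        data["receiveTokenId"] = receive_token_id
    if return_all_text not in (0, 1):
        raise ValueError("returnAllText must be 0 or 1")
    data["returnAllText"] = return_all_text
    return data
```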
User Levels
| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |
Response
Response Parameters
Note: Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.
Detailed response data. Required when code is 1100. See detail Object.
Response Codes
| Code | Description |
| --- | --- |
| 1100 | Success |
| 1101 | Processing |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9100 | Insufficient balance |
| 9101 | Unauthorized operation |
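A client typically branches on the response code before reading any other field. The sketch below maps each documented code to a coarse outcome; which codes are worth retrying (1901, 1903, 1904 here) is an assumption of this example rather than guidance from the API, and `classify` and `describe` are hypothetical helper names.

```python
RESPONSE_CODES = {
    1100: "Success",
    1101: "Processing",
    1901: "QPS limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Download failure",
    1905: "Decoding failure",
    9100: "Insufficient balance",
    9101: "Unauthorized operation",
}

# Assumption: rate limits, service failures, and download failures may be
# transient, so a later retry could succeed. Adjust to your own policy.
RETRYABLE_CODES = {1901, 1903, 1904}


def describe(code: int) -> str:
    """Return the documented description, or a marker for unknown codes."""
    return RESPONSE_CODES.get(code, "Unknown code")


def classify(code: int) -> str:
    """Map a response code to one of: ok, pending, retry, error."""
    if code == 1100:
        return "ok"
    if code == 1101:
        return "pending"
    if code in RETRYABLE_CODES:
        return "retry"
    return "error"
```

Only the `ok` branch should read fields beyond code, message, and requestId, since other fields are only guaranteed when code is 1100.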
detail Object
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | Yes | Overall disposition recommendation: PASS, REVIEW, or REJECT. During initial integration, do not block on this result directly; first tune the interception thresholds until the results match expectations. |
This is a legacy compatibility field. Use businessLabels in audioDetail instead for new integrations.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (e.g., "Male", "Female"). |
| gender.probability | int | No | Gender probability (0–100). Higher values indicate greater likelihood. |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label and confidence. |
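The fields above can be reduced to a single best label per category, and riskLevel can be handled in a log-only mode during initial integration. This is a sketch under assumptions: the function names and the action strings (`log`, `allow`, `queue_for_review`, `block`) are hypothetical, and the input is expected to be a dict shaped like the object described above.

```python
RISK_LEVELS = ("PASS", "REVIEW", "REJECT")


def action_for(risk_level: str, enforce: bool = False) -> str:
    """Translate riskLevel into an action; log-only unless enforce is set."""
    if risk_level not in RISK_LEVELS:
        raise ValueError(f"unexpected riskLevel: {risk_level!r}")
    if not enforce:
        # Initial integration: observe results without blocking anything.
        return "log"
    return {"PASS": "allow", "REVIEW": "queue_for_review", "REJECT": "block"}[risk_level]


def top_label(items, score_key: str):
    """Return the label of the highest-scoring element, or None if empty."""
    if not items:
        return None
    best = max(items, key=lambda item: item[score_key])
    return best["label"]


def summarize(labels: dict) -> dict:
    """Pick the most likely gender, timbre, and language labels."""
    gender = labels.get("gender") or {}
    return {
        "gender": gender.get("label"),
        # timbre elements carry "probability"; language elements carry "confidence".
        "timbre": top_label(labels.get("timbre") or [], "probability"),
        "language": top_label(labels.get("language") or [], "confidence"),
    }
```

Every field in this object is optional, so the helpers return None for a missing category instead of raising.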