Async API

API Description

Asynchronous audio moderation API. You submit one audio clip per request and receive an immediate acknowledgement that echoes your btId; the full moderation result is delivered later by callback to the URL you provide. The API detects regulatory risks in audio content, including political content, pornography, advertising, and violence & terrorism, and can additionally identify minors, voice timbre, language, and other attributes depending on the business scenarios enabled on your account.

Requirements

Item	Specification
Protocol	HTTP or HTTPS
Method	POST
Encoding	UTF-8
Format	All request and response parameters use JSON

Audio Requirements

Item	Specification
Audio types	URL, BASE64
Supported formats	WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc.
Size limit	≤ 18 MB

ℹ️
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures — and an audio clip that can't be fetched can't be moderated.

Timeout Suggestion

Asynchronous request (this endpoint): recommended timeout of 5 seconds for the acknowledgement call
Callback delivery: results arrive separately; keep your callback handler fast (< 2 seconds) so DeepCleer doesn't retry unnecessarily

ℹ️
The initial acknowledgement returns almost immediately once the request is validated. End-to-end moderation time is dominated by how long DeepCleer takes to fetch your audio and run the requested detection types, so keep your hosting fast and reliable. Actual duration varies with the request type and the audio size.

Callback Mechanism

Results are delivered to the callback URL you supply in the request. When DeepCleer calls your endpoint:

The request body is a JSON payload matching Callback Parameters.
Your endpoint must respond with HTTP 200 OK within a few seconds. Any non-2xx response or timeout is treated as a delivery failure.
On failure, DeepCleer retries with exponential backoff. After repeated failures the result is dropped; we recommend monitoring your callback handler and reprocessing via btId if needed.
Your endpoint should be idempotent on btId — the same result may be delivered more than once if an earlier delivery succeeded but the response was lost in transit.
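
The delivery rules above can be sketched as a small handler that deduplicates on btId. This is a minimal illustration, not DeepCleer's reference implementation: the function name `handle_callback`, the in-memory `processed` store, and the choice to answer malformed bodies with 400 are all assumptions made for the example. In production the store would normally be a database or cache shared across workers.

```python
import json


def handle_callback(raw_body: bytes, processed: dict) -> int:
    """Return the HTTP status code to send back for one callback delivery.

    `processed` maps btId -> payload and stands in for durable storage
    (hypothetical; use a real database in production).
    """
    try:
        payload = json.loads(raw_body)
        bt_id = payload["btId"]
    except (ValueError, KeyError, TypeError):
        # Assumption: a body we cannot parse is rejected with 400.
        return 400

    if bt_id in processed:
        # Duplicate delivery: acknowledge again without reprocessing.
        return 200

    # Store first, do heavy work elsewhere, so the response stays fast.
    processed[bt_id] = payload
    return 200
```

Keeping the handler to a parse, a lookup, and a write helps it stay within the response window; any slow follow-up work can be queued after the result is stored.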

Request

Request URL

Cluster	Request URL
Silicon Valley	`http://api-audio-gg.fengkongcloud.com/audio/v4`
Singapore	`http://api-audio-xjp.fengkongcloud.com/audio/v4`
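
As a rough sketch of how a client might call one of these URLs, the snippet below builds a JSON POST request with the standard library and applies the 5-second acknowledgement timeout suggested earlier. The helper names `build_request` and `submit` are hypothetical, and the `Content-Type` header is an assumption: the documentation only states that the body is UTF-8 JSON.

```python
import json
import urllib.request

SINGAPORE_URL = "http://api-audio-xjp.fengkongcloud.com/audio/v4"
ACK_TIMEOUT_SECONDS = 5  # recommended timeout for the acknowledgement call


def build_request(payload: dict, url: str = SINGAPORE_URL) -> urllib.request.Request:
    """Serialize the payload as UTF-8 JSON and wrap it in a POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the header value is not specified by the documentation.
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit(payload: dict, url: str = SINGAPORE_URL) -> dict:
    """Send the request and return the parsed acknowledgement."""
    request = build_request(payload, url)
    with urllib.request.urlopen(request, timeout=ACK_TIMEOUT_SECONDS) as response:
        return json.loads(response.read().decode("utf-8"))
```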

Request Parameters

Parameter	Type	Required	Max Length	Description
`accessKey`	string	Yes	20	API authentication key. The default `accessKey` is sent in your onboarding email.
`appId`	string	Yes	64	Application identifier, such as `web` for your web application or `app` for your mobile app. The default `appId` is sent in your onboarding email. Contact DeepCleer if you need a new `appId`.
`eventId`	string	Yes	64	Event identifier used to distinguish moderation scenarios in your application, such as `voiceMessage` for chat voice messages or `liveAudio` for livestream audio. The default `eventId` is sent in your onboarding email. Contact DeepCleer if you need a new `eventId`.
`type`	string	Conditional	64	Risk detection types to run. Either `type` or `businessType` must be provided; you can also provide both. Multiple values can be combined with underscores (for example, `POLITY_EROTIC_MOAN_ADVERT`). See Risk Detection Types for the full catalog.
`businessType`	string	Conditional	128	Business detection types to run — your organization's custom moderation categories, configured with DeepCleer separately from the built-in `type` catalog. Either `type` or `businessType` must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog.
`contentType`	string	Yes	—	Format of the audio content. `URL`: audio URL address. `RAW`: base64-encoded audio data.
`content`	string	Yes	—	Audio content — either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended.
`btId`	string	Yes	128	Client-side unique audio identifier used to correlate callback results. Echoed back in both the acknowledgement and the callback payload. Must be unique per request (truncated if the 128-character limit is exceeded).
`callback`	string	Yes	—	HTTP callback URL. DeepCleer sends the full moderation result to this URL once processing completes. See Callback Mechanism for delivery semantics.
`translationTargetLang`	string	No	—	Translation target language. When provided, the transcribed audio text is translated into this language and returned in the callback. Contact DeepCleer to enable this feature. `zh`: Chinese. `en`: English.
`acceptLang`	string	Yes	—	Language for returned labels. Set `en` by default.
`data`	object	Yes	1 MB	Request data content. See `data` Object.
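
The table above can be turned into a small pre-flight check. The sketch below fills `contentType` and `content` from either a URL or raw bytes and lists missing required fields before a request is sent. The helper names are hypothetical, and two details are assumptions: that the 15 MB base64 limit is measured on the encoded string, and that 1 MB means 1024 × 1024 bytes.

```python
import base64

# Assumption: the limit applies to the encoded string, with MB = 1024 * 1024 bytes.
MAX_BASE64_BYTES = 15 * 1024 * 1024

REQUIRED_FIELDS = (
    "accessKey", "appId", "eventId", "contentType", "content",
    "btId", "callback", "acceptLang", "data",
)


def audio_fields(source) -> dict:
    """Return contentType/content for a URL (str) or raw audio (bytes)."""
    if isinstance(source, str):
        return {"contentType": "URL", "content": source}
    encoded = base64.b64encode(source).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64 audio exceeds the 15 MB limit")
    return {"contentType": "RAW", "content": encoded}


def missing_required(payload: dict) -> list:
    """List required top-level fields that are absent from the payload."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    # Either `type` or `businessType` must be present; both are allowed.
    if "type" not in payload and "businessType" not in payload:
        missing.append("type or businessType")
    return missing
```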

Risk Detection Types

Combine multiple types with underscores (for example, POLITY_EROTIC_MOAN). Recommended starter combination: POLITY_EROTIC_MOAN_ADVERT.

Value	Description
`AUDIOPOLITICAL`	Top-leader voiceprint detection
`POLITY`	Political content detection
`EROTIC`	Pornographic content detection
`ADVERT`	Advertising detection
`ADLAW`	Advertising law violation detection
`BAN`	Prohibited content detection
`VIOLENT`	Violence & terrorism detection
`ANTHEN`	National anthem detection
`MOAN`	Moaning detection
`DIRTY`	Abusive language detection
`BANEDAUDIO`	Prohibited songs detection
`COPYRIGHTSONGS`	Copyrighted songs detection

Business Detection Types

Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.

Value	Description
`SING`	Singing detection
`LANGUAGE`	Language detection
`GENDER`	Gender detection
`TIMBRE`	Voice timbre detection
`VOICE`	Voice attributes
`MINOR`	Minor detection
`AUDIOSCENE`	Audio scene detection
`AGE`	Age detection
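
To illustrate the underscore convention and the GENDER dependency, the following sketch joins a list of values into a `businessType` string and adds GENDER when timbre, singing, or language detection is requested without it. The function name `business_type` is hypothetical, and adding GENDER automatically (rather than raising an error) is a design choice made for the example.

```python
# Values that, per the note above, only work when GENDER is also requested.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def business_type(values) -> str:
    """Join business detection types with underscores, adding GENDER if needed."""
    chosen = list(values)
    if NEEDS_GENDER.intersection(chosen) and "GENDER" not in chosen:
        chosen.insert(0, "GENDER")
    return "_".join(chosen)
```

The same `"_".join(...)` call applies to the `type` parameter, for example `"_".join(["POLITY", "EROTIC", "MOAN", "ADVERT"])`.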

`data` Object

Parameter	Type	Required	Max Length	Description
`retryUrl`	string	No	—	Fallback audio URL. Used when the primary `content` URL fails to download. Only applies when `contentType` is `URL`.
`tokenId`	string	Yes	64	User account identifier. We recommend passing the user UID (it can be encrypted). Used for behavioral risk detection (spam, advertising, etc.). Must be an alphanumeric string that may include underscores and hyphens, up to 64 characters.
`receiveTokenId`	string	Conditional	64	Message receiver's `tokenId` for private-chat scenarios. Alphanumeric with underscores and hyphens, up to 64 characters. Required when `eventId` is `message`.
`deviceId`	string	No	128	DeepCleer device fingerprint identifier generated by the DeepCleer SDK. Used for device-level behavior analysis.
`ip`	string	No	64	Public IPv4 or IPv6 address of the user who submitted the audio. Used for IP-based behavior analysis.
`dataId`	string	No	—	Custom data identifier. Passed through by your application for your own record-keeping.
`level`	int32	No	—	User level for configuring different interception strategies. See User Levels.
`gender`	int32	No	—	User gender. `0`: male. `1`: female. `2`: unknown.
`nickname`	string	No	150	User nickname. Max 150 characters (truncated if exceeded).
`room`	string	No	64	Live room or game room ID. For live audio room scenarios, we strongly recommend providing it so that per-room strategies can be applied and context recognition is enabled.
`lang`	string	No	—	Audio language used for ASR (audio-to-text transcription). Default: `zh`. For international audio that cannot be categorized, use `auto` for automatic language detection. Distinct from `acceptLang` (which controls the language of the returned labels). See Supported Languages.
`formatInfo`	string	Conditional	—	Audio data format. Required when `contentType` is `RAW`. Values: `pcm`, `wav`, `mp3`.
`rate`	int32	Conditional	—	Audio sample rate, in Hz. Required when `formatInfo` is `pcm`. Range: 8000–32000.
`track`	int32	Conditional	—	Audio channel count. Required when `formatInfo` is `pcm`. `1`: mono. `2`: stereo.
`returnAllText`	int32	No	—	Controls which audio segments are returned in `audioDetail`. `0` (default): return only risk segments (`REVIEW` and `REJECT`). `1`: return all segments (including `PASS`). This only controls segment-level output; it does not affect the overall result.
`audioDetectStep`	int32	No	—	Interval moderation step size (1–36). A value of `1` skips one 10-second audio segment between moderated segments, `2` skips two, and so on. When omitted, all audio content is moderated. When enabled, it is recommended to also set `returnAllText` to `1` and consume the ASR result from each moderated segment.
`extra`	object	No	—	Auxiliary parameters.
`extra.passThrough`	object	No	1024	Client pass-through field. DeepCleer does not process this field; all content under it is echoed back in the callback payload.
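
The conditional rules in this table (for example, `formatInfo` for `RAW` content, and `rate` and `track` for PCM) are easy to miss, so a client-side check can catch them before submission. The sketch below is one possible validator; the function name `data_errors` and the error wording are hypothetical, and it covers only the rules stated in the table.

```python
def data_errors(content_type: str, data: dict) -> list:
    """Return human-readable problems found in the `data` object."""
    errors = []
    if "tokenId" not in data:
        errors.append("tokenId is required")

    if content_type == "RAW":
        fmt = data.get("formatInfo")
        if fmt not in ("pcm", "wav", "mp3"):
            errors.append("formatInfo must be pcm, wav, or mp3 when contentType is RAW")
        if fmt == "pcm":
            rate = data.get("rate")
            if not isinstance(rate, int) or not 8000 <= rate <= 32000:
                errors.append("rate must be an integer between 8000 and 32000 for pcm")
            if data.get("track") not in (1, 2):
                errors.append("track must be 1 (mono) or 2 (stereo) for pcm")

    step = data.get("audioDetectStep")
    if step is not None and not (isinstance(step, int) and 1 <= step <= 36):
        errors.append("audioDetectStep must be an integer between 1 and 36")
    return errors
```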

Supported Languages

Value	Language
`en`	English
`zh`	Chinese (default)
`ar`	Arabic
`hi`	Hindi
`es`	Spanish
`fr`	French
`ru`	Russian
`pt`	Portuguese
`id`	Indonesian
`de`	German
`ja`	Japanese
`tr`	Turkish
`vi`	Vietnamese
`it`	Italian
`th`	Thai
`tl`	Filipino
`ko`	Korean
`ms`	Malay
`auto`	Automatic language detection (contact DeepCleer to enable)

User Levels

Value	Description
`0`	Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
`1`	Lower-level user (e.g., low activity or low-level users)
`2`	Mid-level user (e.g., moderately active or mid-level users)
`3`	Higher-level user (e.g., highly active or high-level users)
`4`	Highest-level user (e.g., paying users, VIP users)

Synchronous Response

The synchronous response is an acknowledgement only. It confirms whether the request was accepted for asynchronous processing. The actual moderation result is delivered later to your callback URL — see Callback Parameters.

Response Parameters

Parameter	Type	Required	Description
`requestId`	string	Yes	Unique DeepCleer request identifier. We strongly recommend saving it for troubleshooting.
`code`	int32	Yes	Response code. See Response Codes.
`message`	string	Yes	Response message corresponding to the `code`.
`btId`	string	Yes	Client-side audio identifier echoed back. Returned when `code` is `1100`.

Response Codes

Code	Message
`1100`	Success
`1901`	QPS limit exceeded
`1902`	Invalid parameters
`1903`	Service failure
`1904`	Download failure
`1905`	Decoding failure
`9101`	Unauthorized operation
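
A client usually needs to decide what to do with each acknowledgement code. The sketch below shows one way to classify them. Treating `1901` and `1903` as retryable is an assumption made for illustration; the documentation does not state which codes are safe to retry, so confirm the policy with DeepCleer before relying on it.

```python
ACCEPTED = 1100

# Assumption: rate limiting and service failures are transient and worth retrying.
RETRYABLE_CODES = {1901, 1903}


def classify_ack(ack: dict) -> str:
    """Map an acknowledgement to 'accepted', 'retry', or 'failed'."""
    code = ack.get("code")
    if code == ACCEPTED:
        return "accepted"
    if code in RETRYABLE_CODES:
        return "retry"
    return "failed"
```

Whatever the outcome, logging `requestId` alongside `btId` makes later troubleshooting easier.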

Callback Parameters

The callback payload delivers the full moderation result for one audio clip.

ℹ️
Parameters other than code, message, and requestId are only guaranteed to be returned when code is 1100.

Parameter	Type	Required	Description
`requestId`	string	Yes	Unique DeepCleer request identifier for this audio clip.
`btId`	string	Yes	Client-side audio identifier echoed back from the request.
`code`	int32	Yes	Response code. See Callback Response Codes.
`message`	string	Yes	Response message corresponding to the `code`.
`riskLevel`	string	Yes	Overall disposition recommendation. `PASS`: normal (allow). `REVIEW`: suspicious (route to manual review). `REJECT`: violation (block). During initial integration, we recommend tuning your interception thresholds before using this value for hard blocks.
`audioText`	string	Yes	Full audio-to-text transcription result.
`audioTime`	int32	Yes	Total audio duration in seconds.
`audioDetail`	array	Yes	Per-segment moderation results. See `audioDetail` Array.
`audioTags`	object	No	Legacy audio tags — gender, timbre, and singing detection. New integrations should use `businessLabels` inside `audioDetail` instead. See `audioTags` Object.
`requestParams`	object	Yes	Echo of all fields submitted under `data` in the original request. Useful for correlating callbacks with the original payload.
`auxInfo`	object	No	Auxiliary information. See `auxInfo` Object.
`tokenProfileLabels`	array	No	Account attribute labels. Returned only when `tokenId` is provided and the labeling service is enabled. See Token Labels.
`tokenRiskLabels`	array	No	Account risk labels. Returned only when `tokenId` is provided and the labeling service is enabled. See Token Labels.

Callback Response Codes

Code	Message
`1100`	Success
`1101`	Processing
`1901`	QPS limit exceeded
`1902`	Invalid parameters
`1903`	Service failure
`1904`	Download failure
`1905`	Decoding failure
`9100`	Insufficient balance
`9101`	Unauthorized operation

`audioDetail` Array

Each element represents one audio segment:

Parameter	Type	Required	Description
`requestId`	string	Yes	Unique identifier for this audio segment.
`audioStarttime`	float	Yes	Segment start time relative to the audio beginning, in seconds.
`audioEndtime`	float	Yes	Segment end time relative to the audio beginning, in seconds.
`audioUrl`	string	Yes	Audio segment URL (MP3 format).
`riskLevel`	string	Yes	Segment risk level. `PASS`: normal. `REVIEW`: suspicious. `REJECT`: violation.
`riskLabel1`	string	Yes	Level 1 risk label. Returns `normal` when `riskLevel` is `PASS`.
`riskLabel2`	string	Yes	Level 2 risk label. Empty when `riskLevel` is `PASS`.
`riskLabel3`	string	Yes	Level 3 risk label. Empty when `riskLevel` is `PASS`.
`riskDescription`	string	Yes	Risk description. Returns "Normal" when `riskLevel` is `PASS`. Format: "Level 1: Level 2: Level 3". For reference only — do not use for programmatic logic.
`riskDetail`	object	No	Risk detail for this segment. See Segment `riskDetail`.
`allLabels`	array	No	All risk labels matched in this segment. See Segment `allLabels`.
`businessLabels`	array	No	All business labels matched in this segment. See Segment `businessLabels`.

Note on field casing: audioStarttime and audioEndtime are spelled with a lowercase t on the wire. Match this casing exactly when parsing the callback payload.
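
As an illustration of consuming the audioDetail array, the sketch below collects the time range and top-level label of every segment that is not `PASS`. The function name `flagged_segments` and the tuple layout are hypothetical; the field names follow the table above, including the lowercase `t` in `audioStarttime` and `audioEndtime`.

```python
def flagged_segments(callback: dict) -> list:
    """Return (start, end, riskLevel, riskLabel1) for REVIEW and REJECT segments."""
    flagged = []
    for segment in callback.get("audioDetail", []):
        if segment.get("riskLevel") in ("REVIEW", "REJECT"):
            flagged.append((
                segment["audioStarttime"],
                segment["audioEndtime"],
                segment["riskLevel"],
                segment["riskLabel1"],
            ))
    return flagged
```

Branching on `riskLevel` and the `riskLabel` fields, rather than on `riskDescription`, follows the table's guidance that descriptions are for reference only.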

Segment `allLabels`

Each element in the allLabels array:

Parameter	Type	Required	Description
`riskLabel1`	string	Yes	Level 1 risk label.
`riskLabel2`	string	Yes	Level 2 risk label.
`riskLabel3`	string	Yes	Level 3 risk label.
`riskDescription`	string	Yes	Risk description. For reference only — do not use for programmatic logic.
`riskLevel`	string	Yes	Risk level: `PASS`, `REVIEW`, or `REJECT`.
`probability`	float	No	Confidence score (0–1). Higher values indicate greater confidence.
`riskDetail`	object	No	Risk detail. Same structure as Segment `riskDetail`.

Segment `riskDetail`

Parameter	Type	Required	Description
`audioText`	string	No	Audio-to-text transcription result for this segment.
`riskSource`	int32	No	Risk source: `1000` (no risk), `1001` (text risk), `1003` (audio risk).
`matchedLists`	array	No	Matched custom list information. Returned only when a custom list is hit. See Matched Lists.
`riskSegments`	array	No	High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising-law content is detected. See Risk Segments.

Matched Lists

Parameter	Type	Required	Description
`name`	string	Yes	Name of the matched list.
`words`	array	Yes	Sensitive word details.
`words[].word`	string	Yes	The matched sensitive word.
`words[].position`	array	Yes	Position of the sensitive word.

Risk Segments

Parameter	Type	Required	Description
`segment`	string	No	High-risk content segment text.
`position`	array	No	Position of the segment (0-indexed).

Segment `businessLabels`

Each element in the businessLabels array:

Parameter	Type	Required	Description
`businessLabel1`	string	Yes	Level 1 business label.
`businessLabel2`	string	Yes	Level 2 business label.
`businessLabel3`	string	Yes	Level 3 business label.
`businessDescription`	string	Yes	Business label description. Format: "Level 1: Level 2: Level 3".
`confidenceLevel`	int32	No	Confidence level (0–2). Higher values indicate greater confidence.
`probability`	float	No	Confidence score (0–1).
`businessDetail`	object	No	Detailed information. Reserved field.

`audioTags` Object

⚠️
Legacy compatibility field. New integrations should consume businessLabels inside audioDetail instead.

Parameter	Type	Required	Description
`gender`	object	No	Gender detection result.
`gender.label`	string	Yes	Gender label name (e.g., `Male`, `Female`).
`gender.probability`	int32	No	Gender probability on a legacy 0–100 scale (higher values indicate greater likelihood). Note that other probability fields in this API use the modern 0–1 scale.
`timbre`	array	No	Voice timbre detection results. Each element contains `label` and `probability`. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice.
`language`	array	No	Language detection results. Each element contains `label` (see Language Labels) and `probability` (modern responses) or `confidence` (legacy responses).

Language Labels

Label	Description
`0`	Mandarin Chinese
`1`	English
`2`	Cantonese
`3`	Tibetan
`4`	Uyghur
`5`	Mongolian
`6`	Korean
`-1`	Other languages
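
Because the legacy audioTags object mixes scales and key names, a small adapter can make it easier to consume. The sketch below converts `gender.probability` from the 0–100 scale to 0–1 and picks the highest-scoring language entry, reading `probability` when present and `confidence` otherwise. The helper names are hypothetical, and the language score is left unscaled because the documentation does not state its range.

```python
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def gender_score(audio_tags: dict):
    """Return gender probability on a 0-1 scale, or None if absent."""
    gender = audio_tags.get("gender") or {}
    if "probability" not in gender:
        return None
    return gender["probability"] / 100


def language_score(entry: dict):
    """Return the raw score under whichever key the response uses."""
    if "probability" in entry:
        return entry["probability"]
    return entry.get("confidence")


def top_language(audio_tags: dict):
    """Return the description of the highest-scoring language, or None."""
    scored = [e for e in audio_tags.get("language", []) if language_score(e) is not None]
    if not scored:
        return None
    best = max(scored, key=language_score)
    return LANGUAGE_LABELS.get(int(best["label"]), "Other languages")
```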

`auxInfo` Object

Parameter	Type	Required	Description
`errorCode`	int32	No	Processing-stage error code. `2003`: audio download failure. `2007`: no valid audio data to moderate.
`passThrough`	object	No	Client pass-through field. Same value as `data.extra.passThrough` in the request.

Token Labels

Both tokenProfileLabels and tokenRiskLabels share the same structure:

Parameter	Type	Required	Description
`label1`	string	No	Level 1 label.
`label2`	string	No	Level 2 label.
`label3`	string	No	Level 3 label.
`description`	string	No	Label description. For reference only — do not use for programmatic logic.
`timestamp`	int64	No	Label timestamp. 13-digit Unix timestamp in milliseconds (UTC).
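
Since `timestamp` is a 13-digit millisecond value, it has to be divided by 1000 before most date libraries will accept it. The sketch below converts it to a timezone-aware UTC datetime using integer arithmetic to avoid floating-point rounding; the function name `label_time` is hypothetical.

```python
from datetime import datetime, timezone


def label_time(label: dict):
    """Convert a token label's millisecond timestamp to a UTC datetime."""
    millis = label.get("timestamp")
    if millis is None:
        return None
    seconds, remainder = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=remainder * 1000)
```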

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC_ADVERT_MOAN",
  "businessType": "GENDER_TIMBRE_SING_LANGUAGE",
  "btId": "1604311839040",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "callback": "http://www.example.com/callback",
  "acceptLang": "en",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}

Response Example

{
  "code": 1100,
  "message": "Success",
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040"
}

Callback Example

{
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040",
  "code": 1100,
  "message": "Success",
  "riskLevel": "PASS",
  "audioDetail": [
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0000",
      "audioStarttime": 0,
      "audioEndtime": 10,
      "audioUrl": "http://example.com/audio_segment_a0000.mp3",
      "businessLabels": [
        {
          "businessDescription": "Singing: Singing: Singing",
          "businessDetail": {},
          "businessLabel1": "sing",
          "businessLabel2": "changge",
          "businessLabel3": "changge",
          "confidenceLevel": 2,
          "probability": 0.858334402569294
        }
      ],
      "allLabels": [],
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0001",
      "audioStarttime": 10,
      "audioEndtime": 20,
      "audioUrl": "http://example.com/audio_segment_a0001.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0002",
      "audioStarttime": 20,
      "audioEndtime": 30,
      "audioUrl": "http://example.com/audio_segment_a0002.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    }
  ],
  "audioTags": {
    "gender": {
      "label": "Female",
      "probability": 95
    },
    "language": [
      {
        "confidence": 0,
        "label": 2
      },
      {
        "confidence": 99,
        "label": 0
      },
      {
        "confidence": 0,
        "label": 1
      }
    ],
    "song": 0,
    "timbre": [
      {
        "label": "Female",
        "probability": 95
      },
      {
        "label": "Queen",
        "probability": 12
      },
      {
        "label": "Mature Woman",
        "probability": 37
      },
      {
        "label": "Young Woman",
        "probability": 56
      },
      {
        "label": "Middle-aged Woman",
        "probability": 67
      },
      {
        "label": "Loli",
        "probability": 24
      }
    ]
  }
}
