Sync API

API Description

Synchronous audio moderation API. Submits one audio clip per request and returns the moderation result directly in the response. Detects regulatory risks in audio content — including political content, pornography, advertising, and violence & terrorism — and can additionally identify minors, voice timbre, language, and other content based on the business scenarios enabled on your account.

Requirements

Item	Specification
Protocol	HTTP or HTTPS
Method	POST
Encoding	UTF-8
Format	All request and response parameters use JSON

Audio Requirements

Item	Specification
Audio input	URL or base64-encoded data (`contentType` `URL` or `RAW`)
Supported formats	WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc.
Duration limit	≤ 60 seconds
Size limit	≤ 18 MB
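A client-side pre-flight check can reject clips that would fail these limits before any network call is made. The sketch below is a minimal illustration that assumes a WAV clip (it uses Python's standard `wave` module, so other formats would need a different decoder), assumes "MB" means 1024 × 1024 bytes, and assumes the 15 MB base64 limit from the `content` parameter applies to the encoded string. The function name `check_wav` is hypothetical.

```python
import base64
import io
import wave

# Limits from the Audio Requirements table and the `content` parameter description.
MAX_DURATION_S = 60
MAX_SIZE_BYTES = 18 * 1024 * 1024     # assumption: "MB" means MiB
MAX_BASE64_BYTES = 15 * 1024 * 1024   # assumption: the limit applies to the encoded string


def check_wav(data: bytes) -> list[str]:
    """Return a list of problems found in a WAV clip; an empty list means it looks OK."""
    problems = []
    if len(data) > MAX_SIZE_BYTES:
        problems.append("file larger than 18 MB")
    # Duration = number of frames / frames per second.
    with wave.open(io.BytesIO(data), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_DURATION_S:
        problems.append(f"duration {duration:.1f}s exceeds 60s")
    # Only relevant when the clip is sent inline with contentType RAW.
    if len(base64.b64encode(data)) > MAX_BASE64_BYTES:
        problems.append("base64 payload larger than 15 MB")
    return problems
```

Checking locally saves a round trip, but the server remains the authority on what it accepts.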

ℹ️
Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures — and an audio clip that can't be fetched can't be moderated.

Timeout Suggestion

Synchronous request: recommended timeout of 10 seconds
Asynchronous request: recommended timeout of 5 seconds

ℹ️
End-to-end response time is dominated by how long DeepCleer takes to fetch your audio — keep your hosting fast and reliable. Once the clip is in hand, processing time varies with the request type and the audio size.
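To apply the 10-second recommendation for synchronous calls, set the timeout on the HTTP client itself. The sketch below uses only Python's standard `urllib.request`; the helper names `build_request` and `send` are hypothetical, and the `Content-Type` header value is an assumption based on the requirement that requests are UTF-8 JSON sent via POST.

```python
import json
import urllib.request

# Singapore cluster URL from the Request URL table.
SYNC_URL = "http://api-audio-xjp.fengkongcloud.com/audiomessage/v4"
SYNC_TIMEOUT_S = 10  # recommended timeout for synchronous requests


def build_request(payload: dict, url: str = SYNC_URL) -> urllib.request.Request:
    """Encode the payload as UTF-8 JSON and wrap it in a POST request."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},  # assumed header value
        method="POST",
    )


def send(payload: dict) -> dict:
    """POST the payload and decode the JSON response; raises on timeout or HTTP error."""
    req = build_request(payload)
    with urllib.request.urlopen(req, timeout=SYNC_TIMEOUT_S) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because fetch time dominates, a timeout here usually points at slow audio hosting rather than at the moderation service.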

Request

Request URL

Cluster	Request URL
Singapore	`http://api-audio-xjp.fengkongcloud.com/audiomessage/v4`

Request Parameters

Parameter	Type	Required	Max Length	Description
`accessKey`	string	Yes	20	API authentication key. The default `accessKey` is sent in your onboarding email.
`appId`	string	Yes	64	Application identifier, such as `web` for your web application or `app` for your mobile app. The default `appId` is sent in your onboarding email. Contact DeepCleer if you need a new `appId`.
`eventId`	string	Yes	64	Event identifier used to distinguish moderation scenarios in your application, such as `voiceMessage` for chat voice messages or `liveAudio` for livestream audio. The default `eventId` is sent in your onboarding email. Contact DeepCleer if you need a new `eventId`.
`type`	string	Conditional	64	Risk detection types to run. Either `type` or `businessType` must be provided; you can also provide both. Multiple values can be combined with underscores (for example, `POLITY_EROTIC_MOAN_ADVERT`). See Risk Detection Types for the full catalog.
`businessType`	string	Conditional	128	Business detection types to run — your organization's custom moderation categories, configured with DeepCleer separately from the built-in `type` catalog. Either `type` or `businessType` must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog.
`contentType`	string	Yes	—	Format of the audio payload supplied in `content`. `URL`: a publicly fetchable audio URL. `RAW`: base64-encoded audio data.
`content`	string	Yes	—	Audio content — either a URL or base64-encoded data. Base64 payload limit: 15 MB; only PCM, WAV, and MP3 formats are accepted for base64. PCM must use 16-bit little-endian encoding. PCM and WAV are recommended.
`btId`	string	Yes	128	Client-side unique identifier for this audio clip. Echoed back in the response so you can correlate inputs and outputs. Truncated if it exceeds 128 characters; must not be reused across concurrent requests.
`acceptLang`	string	Yes	—	Language of the labels and descriptions in the response. Use `en` for English.
`data`	object	Yes	1 MB	Request payload. See `data` Object.
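The table above implies a few client-side rules: at least one of `type` and `businessType`, a `contentType` of `URL` or `RAW`, a `btId` of at most 128 characters, and a `data` object of at most 1 MB. A minimal builder that enforces them might look like the following. The function name `build_payload` is hypothetical, and rejecting long `btId` values (rather than letting the server truncate them) is a design choice of this sketch, not documented behavior.

```python
import json
from typing import Optional

MAX_BT_ID = 128
MAX_DATA_BYTES = 1024 * 1024  # assumption: "1 MB" means 1 MiB of serialized JSON


def build_payload(access_key: str, app_id: str, event_id: str, bt_id: str,
                  content: str, content_type: str = "URL",
                  risk_type: Optional[str] = None,
                  business_type: Optional[str] = None,
                  accept_lang: str = "en",
                  data: Optional[dict] = None) -> dict:
    """Assemble the top-level request body described in the Request Parameters table."""
    if not risk_type and not business_type:
        raise ValueError("either type or businessType (or both) is required")
    if content_type not in ("URL", "RAW"):
        raise ValueError("contentType must be 'URL' or 'RAW'")
    if len(bt_id) > MAX_BT_ID:
        # The server truncates longer IDs, which could merge distinct clips; fail fast instead.
        raise ValueError("btId longer than 128 characters")
    data = data or {}
    if len(json.dumps(data).encode("utf-8")) > MAX_DATA_BYTES:
        raise ValueError("data object larger than 1 MB")
    payload = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "btId": bt_id,
        "contentType": content_type,
        "content": content,
        "acceptLang": accept_lang,
        "data": data,
    }
    if risk_type:
        payload["type"] = risk_type
    if business_type:
        payload["businessType"] = business_type
    return payload
```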

Risk Detection Types

Combine multiple values with underscores (for example, `POLITY_EROTIC_MOAN`). The recommended starting set for general moderation is `POLITY_EROTIC_MOAN_ADVERT`.

Value	Description
`AUDIOPOLITICAL`	Top-leader voiceprint detection
`POLITY`	Political content detection
`EROTIC`	Pornographic content detection
`ADVERT`	Advertising detection
`BAN`	Prohibited content detection
`VIOLENT`	Violence & terrorism detection
`ANTHEN`	National anthem detection
`MOAN`	Moaning detection
`DIRTY`	Abusive language detection
`BANEDAUDIO`	Prohibited songs detection
`COPYRIGHTSONGS`	Copyrighted songs detection
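Building the `type` string from a fixed set catches typos such as a misspelled code before the request is sent. This is a minimal sketch; `join_risk_types` is a hypothetical helper, and the catalog is copied verbatim from the table above (including the `ANTHEN` spelling).

```python
# Risk detection codes exactly as listed in the Risk Detection Types table.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_risk_types(*types: str) -> str:
    """Validate risk codes and join them with underscores for the `type` parameter."""
    if not types:
        raise ValueError("at least one risk type is required")
    unknown = [t for t in types if t not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(types))
```

For example, `join_risk_types("POLITY", "EROTIC", "MOAN", "ADVERT")` produces the recommended starting set `POLITY_EROTIC_MOAN_ADVERT`.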

Business Detection Types

Combine multiple values with underscores. To detect `TIMBRE`, `SING`, or `LANGUAGE`, you must also include `GENDER` in the same request.

Value	Description
`SING`	Singing detection
`LANGUAGE`	Language detection
`GENDER`	Gender detection
`TIMBRE`	Voice timbre detection
`VOICE`	Voice attributes
`MINOR`	Minor detection
`AUDIOSCENE`	Audio scene detection
`AGE`	Age detection
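The `GENDER` prerequisite is easy to forget, so a helper can add it automatically. This sketch assumes the order of the underscore-joined values does not matter to the API; `join_business_types` is a hypothetical name.

```python
BUSINESS_TYPES = {"SING", "LANGUAGE", "GENDER", "TIMBRE", "VOICE", "MINOR", "AUDIOSCENE", "AGE"}
# Per the note above, these detections only run when GENDER is also requested.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_business_types(*types: str) -> str:
    """Validate business codes, add GENDER when required, and join with underscores."""
    unknown = [t for t in types if t not in BUSINESS_TYPES]
    if unknown:
        raise ValueError(f"unknown business types: {unknown}")
    selected = list(dict.fromkeys(types))  # de-duplicate, keep order
    if NEEDS_GENDER.intersection(selected) and "GENDER" not in selected:
        selected.insert(0, "GENDER")  # add the prerequisite automatically
    return "_".join(selected)
```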

`data` Object

Parameter	Type	Required	Max Length	Description
`tokenId`	string	No	64	Stable identifier for the end user, typically your internal user UID (an encrypted UID is fine). Used for behavioral-risk signals such as spam and repeat-offender detection. Alphanumeric with underscores and hyphens, up to 64 characters.
`receiveTokenId`	string	Conditional	64	`tokenId` of the message recipient in a one-to-one chat. Alphanumeric with underscores and hyphens, up to 64 characters. Required when `eventId` is `message`.
`deviceId`	string	No	128	Device-fingerprint identifier issued by DeepCleer. Generated by the DeepCleer SDK on the end user's device.
`ip`	string	No	64	Public IP address of the user who submitted the audio. Accepts IPv4 or IPv6.
`dataId`	string	No	128	Client-side identifier attached to the moderation call. DeepCleer echoes it back with the result, letting you correlate your source record (a message ID, voice-note ID, review ID, etc.) with the moderation verdict — typically used to look up historical decisions in your own database or in the DeepCleer console.
`level`	int32	No	—	User level. See User Levels.
`gender`	int32	No	—	User's gender. `0`: male. `1`: female. `2`: unknown.
`formatInfo`	string	Conditional	—	Audio data format. Required when `contentType` is `RAW`. Accepted values: `pcm`, `wav`, `mp3`.
`rate`	int32	Conditional	—	Audio sample rate, in Hz. Required when `formatInfo` is `pcm`. Range: `8000`–`32000`.
`track`	int32	Conditional	—	Audio channel count. Required when `formatInfo` is `pcm`. `1`: mono. `2`: stereo.
`returnAllText`	int32	No	—	Controls which audio segments are included in `audioDetail`. `0` (default): return only risk-bearing segments (`REVIEW` and `REJECT`). `1`: return all segments, including `PASS`. This only affects segment-level results in `audioDetail`; it does not change the overall `riskLevel`.
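Several `data` fields are conditionally required. The sketch below collects the rules stated in the table (ID character set and length, `receiveTokenId` for `eventId` `message`, `formatInfo` for `RAW`, and `rate`/`track` for PCM) into one validator. `validate_data` is a hypothetical helper, and it checks only what the table states.

```python
import re

# Alphanumeric with underscores and hyphens, up to 64 characters.
_TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_data(data: dict, content_type: str, event_id: str) -> list[str]:
    """Return human-readable errors for a `data` object; empty list means valid."""
    errors = []
    for key in ("tokenId", "receiveTokenId"):
        if key in data and not _TOKEN_RE.fullmatch(str(data[key])):
            errors.append(f"{key} must be 1-64 alphanumeric, '_' or '-' characters")
    if event_id == "message" and "receiveTokenId" not in data:
        errors.append("receiveTokenId is required when eventId is 'message'")
    fmt = data.get("formatInfo")
    if content_type == "RAW" and fmt not in ("pcm", "wav", "mp3"):
        errors.append("formatInfo must be pcm, wav or mp3 when contentType is RAW")
    if fmt == "pcm":
        rate = data.get("rate")
        if not isinstance(rate, int) or not 8000 <= rate <= 32000:
            errors.append("rate must be an integer in 8000-32000 for pcm")
        if data.get("track") not in (1, 2):
            errors.append("track must be 1 (mono) or 2 (stereo) for pcm")
    if data.get("gender", 2) not in (0, 1, 2):
        errors.append("gender must be 0, 1 or 2")
    return errors
```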

User Levels

Value	Description
`0`	Lowest-level user (e.g., newly registered, completely inactive, or level-0 users)
`1`	Lower-level user (e.g., low activity or low-level users)
`2`	Mid-level user (e.g., moderately active or mid-level users)
`3`	Higher-level user (e.g., highly active or high-level users)
`4`	Highest-level user (e.g., paying users, VIP users)

Response

Response Parameters

ℹ️
Fields other than `code`, `message`, and `requestId` are guaranteed to be returned only when `code` is `1100`.

Parameter	Type	Required	Description
`requestId`	string	Yes	Unique DeepCleer request identifier. Strongly recommended to save for troubleshooting and optimization.
`code`	int32	Yes	Response code. See Response Codes.
`message`	string	Yes	Human-readable message corresponding to `code`.
`detail`	object	No	Detailed moderation result. Returned when `code` is `1100`. See `detail` Object.

Response Codes

Code	Description
`1100`	Success
`1101`	Processing
`1901`	QPS limit exceeded
`1902`	Invalid parameters
`1903`	Service failure
`1904`	Download failure
`1905`	Decoding failure
`9100`	Insufficient balance
`9101`	Unauthorized operation
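Because `detail` is only guaranteed on `1100`, client code should check `code` before touching anything else. The sketch below also retries codes that plausibly indicate transient conditions. Which codes are worth retrying is an assumption of this sketch (QPS limit, service failure, and download failure), not documented guidance; parameter, decoding, balance, and authorization errors are surfaced immediately. The names `moderate_with_retry` and `ModerationError` are hypothetical, and `send` is any callable that posts the payload and returns the decoded JSON.

```python
import time
from typing import Callable

# Assumption: these may succeed on a later attempt; all other non-1100 codes will not.
RETRYABLE_CODES = {1901, 1903, 1904}


class ModerationError(Exception):
    """Non-success response; keeps the requestId for troubleshooting."""

    def __init__(self, response: dict):
        self.code = response.get("code")
        self.request_id = response.get("requestId")
        super().__init__(f"code={self.code} message={response.get('message')!r} "
                         f"requestId={self.request_id}")


def moderate_with_retry(send: Callable[[dict], dict], payload: dict,
                        attempts: int = 3, base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Return `detail` for a 1100 response, retrying transient codes with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        response = send(payload)
        if response.get("code") == 1100:
            # Only on 1100 are fields beyond code/message/requestId guaranteed.
            return response["detail"]
        if response.get("code") not in RETRYABLE_CODES or attempt == attempts - 1:
            raise ModerationError(response)
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable")
```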

`detail` Object

Parameter	Type	Required	Description
`riskLevel`	string	Yes	Overall disposition recommendation. `PASS`: normal (allow). `REVIEW`: suspicious (manual review recommended). `REJECT`: violation (reject). During initial integration, do not wire results directly into automated blocking — tune your interception thresholds against historical traffic first.
`audioText`	string	Yes	Full audio-to-text transcription of the clip.
`audioTime`	int32	Yes	Total audio duration, in seconds.
`audioDetail`	array	Yes	Per-segment moderation results. See `audioDetail` Array.
`audioTags`	object	No	Audio attribute tags including gender, timbre, and language. Legacy compatibility field — for new integrations, read these signals from `businessLabels` inside `audioDetail` instead. See `audioTags` Object.
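The advice on `riskLevel` suggests starting in a "shadow" mode where `REJECT` results are observed rather than enforced. One way to express that is a single switch in the decision function, as sketched below; the `Action` enum, the `decide` function, and the `enforce` flag are hypothetical names chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    QUEUE_FOR_REVIEW = "review"
    BLOCK = "block"


def decide(detail: dict, enforce: bool = False) -> Action:
    """Map detail.riskLevel to an action.

    With enforce=False (shadow mode), REJECT is routed to human review instead of
    being blocked, so thresholds can be tuned against real traffic first.
    """
    level = detail.get("riskLevel")
    if level == "PASS":
        return Action.ALLOW
    if level == "REJECT" and enforce:
        return Action.BLOCK
    # REVIEW, REJECT in shadow mode, and any unexpected value go to human review.
    return Action.QUEUE_FOR_REVIEW
```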

`audioDetail` Array

Each element is the moderation result for one audio segment:

Parameter	Type	Required	Description
`requestId`	string	Yes	Unique DeepCleer identifier for this audio segment.
`audioStarttime`	float	Yes	Segment start time relative to the beginning of the clip, in seconds.
`audioEndtime`	float	Yes	Segment end time relative to the beginning of the clip, in seconds.
`audioUrl`	string	Yes	URL of the segment audio (MP3 format).
`riskLevel`	string	Yes	Segment disposition. `PASS`: normal (allow). `REVIEW`: suspicious (manual review recommended). `REJECT`: violation (reject).
`riskLabel1`	string	Yes	Level-1 risk label. Returns `normal` when `riskLevel` is `PASS`.
`riskLabel2`	string	Yes	Level-2 risk label. Empty when `riskLevel` is `PASS`.
`riskLabel3`	string	Yes	Level-3 risk label. Empty when `riskLevel` is `PASS`.
`riskDescription`	string	Yes	Risk description. Returns `Normal` when `riskLevel` is `PASS`. Format: `"Level 1: Level 2: Level 3"`. Human-readable summary intended for display only — do not parse for programmatic logic; branch on `riskLabel1` / `riskLabel2` / `riskLabel3` instead.
`riskDetail`	object	No	Detail backing this segment's risk decision. See Segment `riskDetail`.
`allLabels`	array	No	All risk labels detected on this segment. See Segment `allLabels`.
`businessLabels`	array	No	All business labels detected on this segment. See Segment `businessLabels`.
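When `returnAllText` is `1`, `audioDetail` also contains `PASS` segments, so consumers typically filter for the risky ones. This sketch keeps `REVIEW` and `REJECT` segments in time order and branches on the `riskLabel1`/`riskLabel2`/`riskLabel3` fields rather than parsing `riskDescription`. The `FlaggedSegment` type and `flagged_segments` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlaggedSegment:
    start: float                 # audioStarttime, seconds
    end: float                   # audioEndtime, seconds
    level: str                   # REVIEW or REJECT
    labels: tuple[str, str, str]  # (riskLabel1, riskLabel2, riskLabel3)
    audio_url: str               # segment audio (MP3)


def flagged_segments(detail: dict) -> list[FlaggedSegment]:
    """Return REVIEW/REJECT segments from audioDetail, ordered by start time."""
    out = []
    for seg in detail.get("audioDetail", []):
        if seg.get("riskLevel") in ("REVIEW", "REJECT"):
            out.append(FlaggedSegment(
                start=seg["audioStarttime"],
                end=seg["audioEndtime"],
                level=seg["riskLevel"],
                labels=(seg["riskLabel1"], seg["riskLabel2"], seg["riskLabel3"]),
                audio_url=seg["audioUrl"],
            ))
    return sorted(out, key=lambda s: s.start)
```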

Segment `allLabels`

Each element in the `allLabels` array:

Parameter	Type	Required	Description
`riskLabel1`	string	Yes	Level-1 risk label.
`riskLabel2`	string	Yes	Level-2 risk label.
`riskLabel3`	string	Yes	Level-3 risk label.
`riskDescription`	string	Yes	Risk description. Display only — do not parse for programmatic logic.
`riskLevel`	string	Yes	Disposition for this label: `PASS`, `REVIEW`, or `REJECT`.
`probability`	float	No	Confidence score in the range `0`–`1`. Higher values indicate greater confidence.
`riskDetail`	object	No	Detail backing this label. See Segment `riskDetail`.

Segment `riskDetail`

Parameter	Type	Required	Description
`audioText`	string	No	Audio-to-text transcription for this segment.
`riskSource`	int32	No	Where the risk was identified. `1000`: no risk. `1001`: text risk (transcribed text). `1003`: audio risk (acoustic content).
`matchedLists`	array	No	Custom-list match information. Returned only when a custom keyword list is hit. See Matched Lists.
`riskSegments`	array	No	High-risk content segments inside the transcription. See Risk Segments.

Matched Lists

Parameter	Type	Required	Description
`name`	string	Yes	Name of the matched custom list.
`words`	array	Yes	Sensitive-word matches within the list.
`words[].word`	string	Yes	The matched sensitive word.
`words[].position`	array	Yes	Position of the matched word.

Risk Segments

Parameter	Type	Required	Description
`segment`	string	No	The high-risk content segment.
`position`	array	No	Position of the segment in the source text (0-indexed).
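A segment's `riskDetail` can be condensed into a small summary for reviewers: where the risk came from, which custom-list words matched, and which transcript segments were flagged. This is a minimal sketch; `describe_risk_detail` is a hypothetical helper, and it deliberately ignores the `position` arrays because their exact layout is not specified above.

```python
# riskSource codes from the Segment riskDetail table.
RISK_SOURCES = {1000: "none", 1001: "text", 1003: "audio"}


def describe_risk_detail(risk_detail: dict) -> dict:
    """Summarize riskSource, matched custom-list words, and high-risk text segments."""
    source = RISK_SOURCES.get(risk_detail.get("riskSource"), "unknown")
    # matchedLists is only present when a custom keyword list is hit.
    words = [w["word"]
             for lst in risk_detail.get("matchedLists", [])
             for w in lst.get("words", [])]
    segments = [s["segment"]
                for s in risk_detail.get("riskSegments", [])
                if s.get("segment")]
    return {"source": source, "matchedWords": words, "riskSegments": segments}
```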

Segment `businessLabels`

Each element in the `businessLabels` array:

Parameter	Type	Required	Description
`businessLabel1`	string	Yes	Level-1 business label.
`businessLabel2`	string	Yes	Level-2 business label.
`businessLabel3`	string	Yes	Level-3 business label.
`businessDescription`	string	Yes	Business label description. Format: `"Level 1: Level 2: Level 3"`.
`confidenceLevel`	int32	No	Coarse confidence bucket in the range `0`–`2`. Higher values indicate greater confidence.
`probability`	float	No	Confidence score in the range `0`–`1`.
`businessDetail`	object	No	Detail backing the business label. Reserved for future use.

`audioTags` Object

⚠️
Legacy compatibility field. For new integrations, read the equivalent attributes from `businessLabels` inside `audioDetail` instead.

Parameter	Type	Required	Description
`gender`	object	No	Gender detection result.
`gender.label`	string	Yes	Gender label name (for example, `Male`, `Female`).
`gender.probability`	int32	No	Gender probability in the range `0`–`100`. Higher values indicate greater confidence. (Note: this legacy field uses a `0`–`100` scale, while modern `probability` fields use the `0`–`1` scale.)
`timbre`	array	No	Voice timbre detection results. Each element contains `label` and `probability`. Possible `label` values: `Uncle`, `Young Man`, `Boy`, `Elderly Man`, `Queen`, `Mature Woman`, `Young Woman`, `Loli`, `Middle-aged Woman`, `Male`, `Female`, `No Voice`.
`language`	array	No	Language detection results. Each element contains `label` (see Language Labels) and `confidence`.
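If an existing integration still reads `audioTags`, normalizing its `0`–`100` scores to the `0`–`1` scale used by modern `probability` fields keeps downstream thresholds consistent. The table states the `0`–`100` scale only for `gender.probability`; this sketch assumes `timbre` probabilities use the same scale, as the response example suggests. Both function names are hypothetical.

```python
from typing import Optional


def legacy_gender(audio_tags: dict) -> Optional[tuple[str, Optional[float]]]:
    """Return (label, probability on a 0-1 scale) from the legacy gender object."""
    gender = audio_tags.get("gender")
    if not gender:
        return None
    prob = gender.get("probability")
    return gender["label"], (prob / 100 if prob is not None else None)


def top_timbres(audio_tags: dict, n: int = 3) -> list[tuple[str, float]]:
    """Return the n highest-scoring timbre labels with probabilities rescaled to 0-1."""
    # Assumption: timbre probabilities use the same 0-100 scale as gender.probability.
    ranked = sorted(audio_tags.get("timbre", []), key=lambda t: t["probability"], reverse=True)
    return [(t["label"], t["probability"] / 100) for t in ranked[:n]]
```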

Language Labels

Label	Description
`0`	Mandarin Chinese
`1`	English
`2`	Cantonese
`3`	Tibetan
`4`	Uyghur
`5`	Mongolian
`6`	Korean
`-1`	Other languages
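The legacy `language` array returns a score for several candidate codes, so a common step is to pick the highest-confidence entry and translate its code with the table above. This sketch is illustrative; `dominant_language` is a hypothetical helper, and unknown codes fall back to "Other languages".

```python
from typing import Optional

# Codes from the Language Labels table.
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def dominant_language(audio_tags: dict) -> Optional[tuple[str, int]]:
    """Return (language name, confidence) for the highest-confidence entry, or None."""
    candidates = audio_tags.get("language", [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    name = LANGUAGE_LABELS.get(int(best["label"]), "Other languages")
    return name, best["confidence"]
```

Applied to the `language` array in the response example below, this selects code `0` (Mandarin Chinese) with confidence `99`.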

Examples

Request Example

```json
{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC",
  "businessType": "GENDER_TIMBRE",
  "btId": "test1",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "acceptLang": "en",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}
```

Response Example

```json
{
  "code": 1100,
  "message": "Success",
  "requestId": "817c8509359500c898a762ffe93a582b",
  "btId": "1667392054643",
  "detail": {
    "audioDetail": [
      {
        "requestId": "817c8509359500c898a762ffe93a582b_a0000",
        "audioStarttime": 0,
        "audioEndtime": 10,
        "audioUrl": "http://example.com/audio_segment_a0000.mp3",
        "businessLabels": [
          {
            "businessDescription": "Singing: Singing: Singing",
            "businessDetail": {},
            "businessLabel1": "sing",
            "businessLabel2": "changge",
            "businessLabel3": "changge",
            "confidenceLevel": 2,
            "probability": 0.858334402569294
          }
        ],
        "allLabels": [],
        "riskLevel": "REJECT",
        "riskLabel1": "abuse",
        "riskLabel2": "buwenmingyongyu",
        "riskLabel3": "qingdubuwenmingyongyu",
        "riskDescription": "Abuse: Uncivilized language: Mild uncivilized language",
        "riskDetail": {
          "audioText": "Recognized audio text content..."
        }
      }
    ],
    "audioTags": {
      "gender": {
        "label": "Female",
        "probability": 95
      },
      "language": [
        {
          "confidence": 0,
          "label": 2
        },
        {
          "confidence": 99,
          "label": 0
        },
        {
          "confidence": 0,
          "label": 1
        }
      ],
      "song": 0,
      "timbre": [
        {
          "label": "Female",
          "probability": 95
        },
        {
          "label": "Queen",
          "probability": 12
        },
        {
          "label": "Mature Woman",
          "probability": 37
        },
        {
          "label": "Young Woman",
          "probability": 56
        },
        {
          "label": "Middle-aged Woman",
          "probability": 67
        },
        {
          "label": "Loli",
          "probability": 24
        }
      ]
    },
    "audioText": "Recognized audio text content...",
    "audioTime": 10,
    "code": 1100,
    "requestParams": {
      "channel": "TEST",
      "lang": "zh",
      "returnAllText": 1,
      "tokenId": "test01"
    },
    "riskLevel": "REJECT"
  }
}
```