Async API

Submit audio for asynchronous moderation and receive the results through a callback URL.

API Description

Asynchronous audio moderation API. Each request submits one audio clip and receives an immediate acknowledgement that echoes your btId; the full moderation result is delivered later via callback to the URL you provide. The API detects regulatory risks in audio content (including political content, pornography, advertising, and violence & terrorism) and can additionally identify minors, voice timbre, language, and other attributes, depending on the business scenarios enabled on your account.

Requirements

| Item | Specification |
| --- | --- |
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |

Audio Requirements

| Item | Specification |
| --- | --- |
| Audio types | URL, BASE64 |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Size limit | ≤ 18 MB |

Note: Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures, and an audio clip that can't be fetched can't be moderated.

Timeout Suggestion

  • Asynchronous request (this endpoint): recommended timeout of 5 seconds for the acknowledgement call
  • Callback delivery: results arrive separately; keep your callback handler fast (< 2 seconds) so DeepCleer doesn't retry unnecessarily
Note: The initial acknowledgement returns almost immediately once the request is validated. End-to-end moderation time is dominated by how long DeepCleer takes to fetch your audio and run the requested detection types, so keep your hosting fast and reliable. Actual duration varies with the request type and the audio size.

Callback Mechanism

Results are delivered to the callback URL you supply in the request. When DeepCleer calls your endpoint:

  • The request body is a JSON payload matching Callback Parameters.
  • Your endpoint must respond with HTTP 200 OK within a few seconds. Any non-2xx response or timeout is treated as a delivery failure.
  • On failure, DeepCleer retries with exponential backoff. After repeated failures the result is dropped; we recommend monitoring your callback handler and reprocessing via btId if needed.
  • Your endpoint should be idempotent on btId — the same result may be delivered more than once if an earlier delivery succeeded but the response was lost in transit.

Request

Request URL

| Cluster | Request URL |
| --- | --- |
| Silicon Valley | http://api-audio-gg.fengkongcloud.com/audio/v4 |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audio/v4 |
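
As a rough sketch of the acknowledgement call, the snippet below posts a UTF-8 JSON body to the Singapore URL using only Python's standard library. The helper names (build_http_request, submit) are illustrative and not part of any DeepCleer SDK; the 5-second default follows the recommendation in Timeout Suggestion.

```python
import json
import urllib.request

# Singapore cluster URL from the table above; swap in the cluster you use.
API_URL = "http://api-audio-xjp.fengkongcloud.com/audio/v4"


def build_http_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap a request payload in a UTF-8 JSON POST (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def submit(payload: dict, timeout: float = 5.0) -> dict:
    """Send the request and return the parsed acknowledgement JSON."""
    request = build_http_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```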

Request Parameters

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | API authentication key. The default accessKey is sent in your onboarding email. |
| appId | string | Yes | 64 | Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. |
| eventId | string | Yes | 64 | Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId. |
| type | string | Conditional | 64 | Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog. |
| businessType | string | Conditional | 128 | Business detection types to run: your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog. |
| contentType | string | Yes |  | Format of the audio content. URL: audio URL address. RAW: base64-encoded audio data. |
| content | string | Yes |  | Audio content: either a URL address or base64-encoded data. Base64 data limit: 15 MB. Only PCM, WAV, and MP3 formats are supported for base64. PCM format must use 16-bit little-endian encoding. PCM and WAV formats are recommended. |
| btId | string | Yes | 128 | Client-side unique audio identifier used to correlate callback results. Echoed back in both the acknowledgement and the callback payload. Must be unique per request (truncated if the 128-character limit is exceeded). |
| callback | string | Yes |  | HTTP callback URL. DeepCleer sends the full moderation result to this URL once processing completes. See Callback Mechanism for delivery semantics. |
| translationTargetLang | string | No |  | Translation target language. When provided, the transcribed audio text is translated into this language and returned in the callback. Contact DeepCleer to enable this feature. zh: Chinese. en: English. |
| acceptLang | string | Yes |  | Language of the returned labels. Use en by default. |
| data | object | Yes | 1 MB | Request data content. See data Object. |

Risk Detection Types

Combine multiple types with underscores (for example, POLITY_EROTIC_MOAN). Recommended starter combination: POLITY_EROTIC_MOAN_ADVERT.

| Value | Description |
| --- | --- |
| AUDIOPOLITICAL | Top-leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| ADLAW | Advertising law violation detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |
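
One way to build the type string is to validate names against the catalog above before joining them. This is a sketch only; join_risk_types is a hypothetical helper, and rejecting unknown names on the client side is a design choice rather than something the API requires.

```python
# Values copied from the Risk Detection Types table.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "ADLAW", "BAN",
    "VIOLENT", "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}


def join_risk_types(*names: str) -> str:
    """Validate names against the catalog and join them with underscores."""
    unknown = [name for name in names if name not in RISK_TYPES]
    if unknown:
        raise ValueError(f"unknown risk detection types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(names))
```

For example, join_risk_types("POLITY", "EROTIC", "MOAN", "ADVERT") produces the recommended starter combination.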

Business Detection Types

Combine multiple types with underscores. To detect timbre, singing, or language, GENDER must also be included.

| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
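
The GENDER dependency is easy to forget, so a client might add it automatically. The sketch below assumes the order of values inside businessType does not matter (the source does not say); join_business_types is a hypothetical helper.

```python
# Types that, per the rule above, only work when GENDER is also requested.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_business_types(*names: str) -> str:
    """Join business types with underscores, adding GENDER when required."""
    ordered = list(dict.fromkeys(names))  # de-duplicate, keep order
    if NEEDS_GENDER.intersection(ordered) and "GENDER" not in ordered:
        ordered.insert(0, "GENDER")
    return "_".join(ordered)
```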

data Object

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| retryUrl | string | No |  | Fallback audio URL. Used when the primary content URL fails to download. Only applies when contentType is URL. |
| tokenId | string | Yes | 64 | User account identifier. Recommended to pass the user UID (can be encrypted). Used for behavioral risk detection (spam, advertising, etc.). May contain only letters, digits, underscores, and hyphens, up to 64 characters. |
| receiveTokenId | string | Conditional | 64 | Message receiver's tokenId for private-chat scenarios. May contain only letters, digits, underscores, and hyphens, up to 64 characters. Required when eventId is message. |
| deviceId | string | No | 128 | DeepCleer device fingerprint identifier generated by the DeepCleer SDK. Used for device-level behavior analysis. |
| ip | string | No | 64 | Public IPv4 or IPv6 address of the user who submitted the audio. Used for IP-based behavior analysis. |
| dataId | string | No |  | Custom data identifier. Passed through by your application for your own record-keeping. |
| level | int32 | No |  | User level for configuring different interception strategies. See User Levels. |
| gender | int32 | No |  | User gender. 0: male. 1: female. 2: unknown. |
| nickname | string | No | 150 | User nickname. Max 150 characters (truncated if exceeded). |
| room | string | No | 64 | Live room / game room ID. Strongly recommended for live audio room scenarios so that per-room strategies can be applied and context recognition is enabled. |
| lang | string | No |  | Audio language used for ASR (audio-to-text transcription). Default: zh. For international audio that cannot be categorized, use auto for automatic language detection. Distinct from acceptLang (which controls the language of the returned labels). See Supported Languages. |
| formatInfo | string | Conditional |  | Audio data format. Required when contentType is RAW. Values: pcm, wav, mp3. |
| rate | int32 | Conditional |  | Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000–32000. |
| track | int32 | Conditional |  | Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int32 | No |  | Controls which audio segments are returned in audioDetail. 0 (default): return only risk segments (REVIEW and REJECT). 1: return all segments (including PASS). This only controls segment-level output; it does not affect the overall result. |
| audioDetectStep | int32 | No |  | Interval moderation step size (1–36). A value of 1 skips one 10-second audio segment between moderated segments, 2 skips two, and so on. When omitted, all audio content is moderated. When enabled, it is recommended to also set returnAllText to 1 and consume the ASR result from each moderated segment. |
| extra | object | No |  | Auxiliary parameters. |
| extra.passThrough | object | No | 1024 | Client pass-through field. DeepCleer does not process this field; all content under it is echoed back in the callback payload. |
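
For RAW submissions, the conditional fields (formatInfo, rate, track) have to line up with the base64 content. The following sketch shows one possible way to assemble them for PCM audio. The helper name build_pcm_fields is hypothetical, and the code assumes the 15 MB limit applies to the encoded string, which the source does not state explicitly.

```python
import base64

# Assumption: the 15 MB base64 limit is measured on the encoded string.
MAX_BASE64_BYTES = 15 * 1024 * 1024


def build_pcm_fields(pcm: bytes, rate: int, track: int) -> dict:
    """Return the request fields needed for a RAW PCM submission."""
    if not 8000 <= rate <= 32000:
        raise ValueError("rate must be between 8000 and 32000 Hz")
    if track not in (1, 2):
        raise ValueError("track must be 1 (mono) or 2 (stereo)")
    encoded = base64.b64encode(pcm).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64 content exceeds 15 MB")
    return {
        "contentType": "RAW",
        "content": encoded,
        "data": {"formatInfo": "pcm", "rate": rate, "track": track},
    }
```

The returned dictionary still needs the remaining required fields (accessKey, appId, eventId, btId, callback, data.tokenId, and so on) merged in before it is sent.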

Supported Languages

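| Value | Language |
| --- | --- |
| en | English |
| zh | Chinese (default) |
| ar | Arabic |
| hi | Hindi |
| es | Spanish |
| fr | French |
| ru | Russian |
| pt | Portuguese |
| id | Indonesian |
| de | German |
| ja | Japanese |
| tr | Turkish |
| vi | Vietnamese |
| it | Italian |
| th | Thai |
| tl | Filipino |
| ko | Korean |
| ms | Malay |
| auto | Automatic language detection (contact DeepCleer to enable) |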

User Levels

| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |

Synchronous Response

The synchronous response is an acknowledgement only. It confirms whether the request was accepted for asynchronous processing. The actual moderation result is delivered later to your callback URL — see Callback Parameters.

Response Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique DeepCleer request identifier. Strongly recommended to save for troubleshooting. |
| code | int32 | Yes | Response code. See Response Codes. |
| message | string | Yes | Response message corresponding to the code. |
| btId | string | Yes | Client-side audio identifier echoed back. Returned when code is 1100. |

Response Codes

| Code | Message |
| --- | --- |
| 1100 | Success |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9101 | Unauthorized operation |
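
A client will usually want to separate an accepted request from a rejected one as soon as the acknowledgement arrives. The sketch below is one way to do that; AckError and parse_ack are hypothetical names, and the fallback messages are copied from the table above.

```python
import json

# Messages copied from the Response Codes table.
ACK_MESSAGES = {
    1100: "Success",
    1901: "QPS limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Download failure",
    1905: "Decoding failure",
    9101: "Unauthorized operation",
}


class AckError(Exception):
    """Raised when the acknowledgement code is anything other than 1100."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message} (requestId={request_id})")
        self.code = code
        self.request_id = request_id


def parse_ack(body: str) -> str:
    """Return the echoed btId if the request was accepted, else raise."""
    ack = json.loads(body)
    code = ack.get("code")
    if code != 1100:
        message = ack.get("message") or ACK_MESSAGES.get(code, "Unknown")
        raise AckError(code, message, ack.get("requestId"))
    return ack["btId"]
```

Keeping requestId on the exception makes it easy to log, which matches the recommendation to save it for troubleshooting.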

Callback Parameters

The callback payload delivers the full moderation result for one audio clip.

Note: Parameters other than code, message, and requestId are guaranteed to be returned only when code is 1100.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique DeepCleer request identifier for this audio clip. |
| btId | string | Yes | Client-side audio identifier echoed back from the request. |
| code | int32 | Yes | Response code. See Callback Response Codes. |
| message | string | Yes | Response message corresponding to the code. |
| riskLevel | string | Yes | Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (route to manual review). REJECT: violation (block). During initial integration, we recommend tuning your interception thresholds before using this value for hard blocks. |
| audioText | string | Yes | Full audio-to-text transcription result. |
| audioTime | int32 | Yes | Total audio duration in seconds. |
| audioDetail | array | Yes | Per-segment moderation results. See audioDetail Array. |
| audioTags | object | No | Legacy audio tags: gender, timbre, and singing detection. New integrations should use businessLabels inside audioDetail instead. See audioTags Object. |
| requestParams | object | Yes | Echo of all fields submitted under data in the original request. Useful for correlating callbacks with the original payload. |
| auxInfo | object | No | Auxiliary information. See auxInfo Object. |
| tokenProfileLabels | array | No | Account attribute labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |
| tokenRiskLabels | array | No | Account risk labels. Returned only when tokenId is provided and the labeling service is enabled. See Token Labels. |

Callback Response Codes

| Code | Message |
| --- | --- |
| 1100 | Success |
| 1101 | Processing |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9100 | Insufficient balance |
| 9101 | Unauthorized operation |
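
Because result fields are only guaranteed when code is 1100, it helps to check the code before reading riskLevel. The sketch below maps a callback to a coarse action. The action names and the choice to send unrecognized riskLevel values to manual review are assumptions made for illustration.

```python
# Hypothetical action names; map them to whatever your system uses.
ACTIONS = {"PASS": "allow", "REVIEW": "manual_review", "REJECT": "block"}


def disposition(callback: dict) -> str:
    """Translate a callback payload into a coarse moderation action."""
    if callback.get("code") != 1100:
        # Result fields are not guaranteed for non-success codes.
        return "error"
    # Assumption: anything unexpected is safest in manual review.
    return ACTIONS.get(callback.get("riskLevel"), "manual_review")
```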

audioDetail Array

Each element represents one audio segment:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique identifier for this audio segment. |
| audioStarttime | float | Yes | Segment start time relative to the audio beginning, in seconds. |
| audioEndtime | float | Yes | Segment end time relative to the audio beginning, in seconds. |
| audioUrl | string | Yes | Audio segment URL (MP3 format). |
| riskLevel | string | Yes | Segment risk level. PASS: normal. REVIEW: suspicious. REJECT: violation. |
| riskLabel1 | string | Yes | Level 1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level 2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level 3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns "Normal" when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". For reference only; do not use for programmatic logic. |
| riskDetail | object | No | Risk detail for this segment. See Segment riskDetail. |
| allLabels | array | No | All risk labels matched in this segment. See Segment allLabels. |
| businessLabels | array | No | All business labels matched in this segment. See Segment businessLabels. |

Note on field casing: audioStarttime and audioEndtime are documented exactly as they appear on the wire, with a lowercase t in "time". Use this exact spelling when parsing the callback.
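
To act on individual segments, a consumer can walk audioDetail and keep only the entries that need attention. This is a small sketch with a hypothetical helper name (risky_segments); it reads the time fields with their on-the-wire spelling.

```python
def risky_segments(callback: dict) -> list:
    """Return (start, end, riskLevel, riskLabel1) for REVIEW/REJECT segments."""
    flagged = []
    for segment in callback.get("audioDetail", []):
        if segment.get("riskLevel") in ("REVIEW", "REJECT"):
            flagged.append((
                segment["audioStarttime"],  # note the lowercase "t"
                segment["audioEndtime"],
                segment["riskLevel"],
                segment.get("riskLabel1", ""),
            ))
    return flagged
```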

Segment allLabels

Each element in the allLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel1 | string | Yes | Level 1 risk label. |
| riskLabel2 | string | Yes | Level 2 risk label. |
| riskLabel3 | string | Yes | Level 3 risk label. |
| riskDescription | string | Yes | Risk description. For reference only; do not use for programmatic logic. |
| riskLevel | string | Yes | Risk level: PASS, REVIEW, or REJECT. |
| probability | float | No | Confidence score (0–1). Higher values indicate greater confidence. |
| riskDetail | object | No | Risk detail. Same structure as Segment riskDetail. |

Segment riskDetail

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioText | string | No | Audio-to-text transcription result for this segment. |
| riskSource | int32 | No | Risk source: 1000 (no risk), 1001 (text risk), 1003 (audio risk). |
| matchedLists | array | No | Matched custom list information. Returned only when a custom list is hit. See Matched Lists. |
| riskSegments | array | No | High-risk content segments. Present when political, terrorism, prohibited, competitive, or advertising-law content is detected. See Risk Segments. |

Matched Lists

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name of the matched list. |
| words | array | Yes | Sensitive word details. |
| words[].word | string | Yes | The matched sensitive word. |
| words[].position | array | Yes | Position of the sensitive word. |

Risk Segments

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| segment | string | No | High-risk content segment text. |
| position | array | No | Position of the segment (0-indexed). |
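
The nested matchedLists structure can be flattened into a simple lookup of list name to matched words. The helper below (matched_words, a hypothetical name) is a sketch based only on the fields documented above.

```python
def matched_words(risk_detail: dict) -> dict:
    """Map each matched custom list name to the sensitive words it hit."""
    result = {}
    for matched in risk_detail.get("matchedLists", []):
        words = [entry["word"] for entry in matched.get("words", [])]
        # setdefault merges entries if the same list name appears twice.
        result.setdefault(matched["name"], []).extend(words)
    return result
```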

Segment businessLabels

Each element in the businessLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level 1 business label. |
| businessLabel2 | string | Yes | Level 2 business label. |
| businessLabel3 | string | Yes | Level 3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| confidenceLevel | int32 | No | Confidence level (0–2). Higher values indicate greater confidence. |
| probability | float | No | Confidence score (0–1). |
| businessDetail | object | No | Detailed information. Reserved field. |
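
Since probability is optional, code that filters business labels should tolerate its absence. The sketch below keeps only labels at or above a threshold; the helper name and the 0.8 default are illustrative assumptions, not recommended values.

```python
def confident_business_labels(segment: dict, min_probability: float = 0.8) -> list:
    """Return businessLabel1 values whose probability meets the threshold."""
    labels = []
    for item in segment.get("businessLabels", []):
        probability = item.get("probability")
        # probability is optional; entries without it are skipped here.
        if probability is not None and probability >= min_probability:
            labels.append(item["businessLabel1"])
    return labels
```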

audioTags Object

Warning: Legacy compatibility field. New integrations should consume businessLabels inside audioDetail instead.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (e.g., Male, Female). |
| gender.probability | int32 | No | Gender probability on a legacy 0–100 scale (higher values indicate greater likelihood). Note that other probability fields in this API use the modern 0–1 scale. |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label (see Language Labels) and probability (modern responses) or confidence (legacy responses). |

Language Labels

| Label | Description |
| --- | --- |
| 0 | Mandarin Chinese |
| 1 | English |
| 2 | Cantonese |
| 3 | Tibetan |
| 4 | Uyghur |
| 5 | Mongolian |
| 6 | Korean |
| -1 | Other languages |
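
Because language entries may carry either probability or confidence, a reader has to accept both keys. The sketch below picks the highest-scoring entry and maps its numeric label to a name. top_language is a hypothetical helper, and the score is returned unscaled because the two keys may use different ranges.

```python
# Names copied from the Language Labels table.
LANGUAGE_LABELS = {
    0: "Mandarin Chinese", 1: "English", 2: "Cantonese", 3: "Tibetan",
    4: "Uyghur", 5: "Mongolian", 6: "Korean", -1: "Other languages",
}


def top_language(audio_tags: dict):
    """Return (language name, raw score) for the best entry, or None."""
    entries = audio_tags.get("language") or []
    if not entries:
        return None

    def score(entry: dict):
        # Modern responses use "probability"; legacy ones use "confidence".
        return entry.get("probability", entry.get("confidence", 0))

    best = max(entries, key=score)
    return LANGUAGE_LABELS.get(best.get("label"), "Unknown"), score(best)
```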

auxInfo Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| errorCode | int32 | No | Processing-stage error code. 2003: audio download failure. 2007: no valid audio data to moderate. |
| passThrough | object | No | Client pass-through field. Same value as data.extra.passThrough in the request. |

Token Labels

Both tokenProfileLabels and tokenRiskLabels share the same structure:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label1 | string | No | Level 1 label. |
| label2 | string | No | Level 2 label. |
| label3 | string | No | Level 3 label. |
| description | string | No | Label description. For reference only; do not use for programmatic logic. |
| timestamp | int64 | No | Label timestamp. 13-digit Unix timestamp in milliseconds (UTC). |
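
The 13-digit timestamp is in milliseconds, so it must not be passed directly to functions that expect seconds. A minimal conversion sketch, using a hypothetical helper name (label_time) and integer arithmetic to avoid floating-point rounding:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def label_time(label: dict):
    """Convert a token label's millisecond timestamp to an aware UTC datetime."""
    timestamp_ms = label.get("timestamp")  # optional field
    if timestamp_ms is None:
        return None
    return EPOCH + timedelta(milliseconds=timestamp_ms)
```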

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC_ADVERT_MOAN",
  "businessType": "GENDER_TIMBRE_SING_LANGUAGE",
  "btId": "1604311839040",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "callback": "http://www.example.com/callback",
  "acceptLang": "en",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}

Response Example

{
  "code": 1100,
  "message": "Success",
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040"
}

Callback Example

{
  "requestId": "6a9cb980346dfea41111656a514e9109",
  "btId": "1604311839040",
  "code": 1100,
  "message": "Success",
  "riskLevel": "PASS",
  "audioDetail": [
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0000",
      "audioStarttime": 0,
      "audioEndtime": 10,
      "audioUrl": "http://example.com/audio_segment_a0000.mp3",
      "businessLabels": [
        {
          "businessDescription": "Singing: Singing: Singing",
          "businessDetail": {},
          "businessLabel1": "sing",
          "businessLabel2": "changge",
          "businessLabel3": "changge",
          "confidenceLevel": 2,
          "probability": 0.858334402569294
        }
      ],
      "allLabels": [],
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0001",
      "audioStarttime": 10,
      "audioEndtime": 20,
      "audioUrl": "http://example.com/audio_segment_a0001.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    },
    {
      "requestId": "6a9cb980346dfea41111656a514e9109_a0002",
      "audioStarttime": 20,
      "audioEndtime": 30,
      "audioUrl": "http://example.com/audio_segment_a0002.mp3",
      "riskLevel": "PASS",
      "riskLabel1": "normal",
      "riskLabel2": "",
      "riskLabel3": "",
      "riskDescription": "Normal",
      "riskDetail": {
        "audioText": ""
      }
    }
  ],
  "audioTags": {
    "gender": {
      "label": "Female",
      "probability": 95
    },
    "language": [
      {
        "confidence": 0,
        "label": 2
      },
      {
        "confidence": 99,
        "label": 0
      },
      {
        "confidence": 0,
        "label": 1
      }
    ],
    "song": 0,
    "timbre": [
      {
        "label": "Female",
        "probability": 95
      },
      {
        "label": "Queen",
        "probability": 12
      },
      {
        "label": "Mature Woman",
        "probability": 37
      },
      {
        "label": "Young Woman",
        "probability": 56
      },
      {
        "label": "Middle-aged Woman",
        "probability": 67
      },
      {
        "label": "Loli",
        "probability": 24
      }
    ]
  }
}