Sync API

API Description

Synchronous audio moderation API. Submits one audio clip per request and returns the moderation result directly in the response. Detects regulatory risks in audio content — including political content, pornography, advertising, and violence & terrorism — and can additionally identify minors, voice timbre, language, and other content based on the business scenarios enabled on your account.

Requirements

| Item | Specification |
| --- | --- |
| Protocol | HTTP or HTTPS |
| Method | POST |
| Encoding | UTF-8 |
| Format | All request and response parameters use JSON |

Audio Requirements

| Item | Specification |
| --- | --- |
| Audio types | URL, BASE64 |
| Supported formats | WAV, MP3, AAC, AMR, 3GP, M4A, WMA, OGG, APE, FLAC, ALAC, WAVPACK, SILK_V3, etc. |
| Duration limit | ≤ 60 seconds |
| Size limit | ≤ 18 MB |
Note: Host audio on a highly available CDN before submitting the URL. DeepCleer fetches each clip directly from the origin server you provide, so any outage or single point of failure on your side will cause fetch failures, and an audio clip that can't be fetched can't be moderated.

Timeout Suggestion

  • Synchronous request: recommended timeout of 10 seconds
  • Asynchronous request: recommended timeout of 5 seconds
Note: End-to-end response time is dominated by how long DeepCleer takes to fetch your audio, so keep your hosting fast and reliable. Once the clip is in hand, processing time varies with the request type and the audio size.


Request

Request URL

| Cluster | Request URL |
| --- | --- |
| Singapore | http://api-audio-xjp.fengkongcloud.com/audiomessage/v4 |

Request Parameters

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| accessKey | string | Yes | 20 | API authentication key. The default accessKey is sent in your onboarding email. |
| appId | string | Yes | 64 | Application identifier, such as web for your web application or app for your mobile app. The default appId is sent in your onboarding email. Contact DeepCleer if you need a new appId. |
| eventId | string | Yes | 64 | Event identifier used to distinguish moderation scenarios in your application, such as voiceMessage for chat voice messages or liveAudio for livestream audio. The default eventId is sent in your onboarding email. Contact DeepCleer if you need a new eventId. |
| type | string | Conditional | 64 | Risk detection types to run. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores (for example, POLITY_EROTIC_MOAN_ADVERT). See Risk Detection Types for the full catalog. |
| businessType | string | Conditional | 128 | Business detection types to run: your organization's custom moderation categories, configured with DeepCleer separately from the built-in type catalog. Either type or businessType must be provided; you can also provide both. Multiple values can be combined with underscores. See Business Detection Types for the full catalog. |
| contentType | string | Yes | - | Format of the audio payload supplied in content. URL: a publicly fetchable audio URL. RAW: base64-encoded audio data. |
| content | string | Yes | - | Audio content: either a URL or base64-encoded data. Base64 payload limit: 15 MB. Only PCM, WAV, and MP3 formats are accepted for base64. PCM must use 16-bit little-endian encoding. PCM and WAV are recommended. |
| btId | string | Yes | 128 | Client-side unique identifier for this audio clip. Echoed back in the response so you can correlate inputs and outputs. Truncated if it exceeds 128 characters; must not be reused across concurrent requests. |
| acceptLang | string | Yes | - | Language of the labels and descriptions in the response. Set to en by default. |
| data | object | Yes | 1 MB | Request payload. See data Object. |

Risk Detection Types

Combine multiple values with underscores (for example, POLITY_EROTIC_MOAN). The recommended starting set for general moderation is POLITY_EROTIC_MOAN_ADVERT.

| Value | Description |
| --- | --- |
| AUDIOPOLITICAL | Top-leader voiceprint detection |
| POLITY | Political content detection |
| EROTIC | Pornographic content detection |
| ADVERT | Advertising detection |
| BAN | Prohibited content detection |
| VIOLENT | Violence & terrorism detection |
| ANTHEN | National anthem detection |
| MOAN | Moaning detection |
| DIRTY | Abusive language detection |
| BANEDAUDIO | Prohibited songs detection |
| COPYRIGHTSONGS | Copyrighted songs detection |

Business Detection Types

Combine multiple values with underscores. To detect TIMBRE, SING, or LANGUAGE, GENDER must also be included in the same request.

| Value | Description |
| --- | --- |
| SING | Singing detection |
| LANGUAGE | Language detection |
| GENDER | Gender detection |
| TIMBRE | Voice timbre detection |
| VOICE | Voice attributes |
| MINOR | Minor detection |
| AUDIOSCENE | Audio scene detection |
| AGE | Age detection |
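
The underscore-joining rule and the GENDER dependency described above can be sketched as a small helper. This is a minimal illustration, not DeepCleer code: the function names `join_types` and `build_business_type` are hypothetical, and silently adding GENDER (rather than raising an error) is a design choice made here for convenience.

```python
# Values copied from the Risk Detection Types and Business Detection Types tables.
RISK_TYPES = {
    "AUDIOPOLITICAL", "POLITY", "EROTIC", "ADVERT", "BAN", "VIOLENT",
    "ANTHEN", "MOAN", "DIRTY", "BANEDAUDIO", "COPYRIGHTSONGS",
}
BUSINESS_TYPES = {
    "SING", "LANGUAGE", "GENDER", "TIMBRE", "VOICE", "MINOR", "AUDIOSCENE", "AGE",
}
# Business types that the document says require GENDER in the same request.
NEEDS_GENDER = {"TIMBRE", "SING", "LANGUAGE"}


def join_types(values, catalog):
    """Join detection types with underscores after checking them against a catalog."""
    unknown = [v for v in values if v not in catalog]
    if unknown:
        raise ValueError(f"unknown detection types: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return "_".join(dict.fromkeys(values))


def build_business_type(values):
    """Build a businessType string, adding GENDER when a dependent type is present."""
    values = list(values)
    if NEEDS_GENDER.intersection(values) and "GENDER" not in values:
        values.insert(0, "GENDER")  # hypothetical policy: auto-add instead of failing
    return join_types(values, BUSINESS_TYPES)
```

For example, `build_business_type(["TIMBRE"])` yields `GENDER_TIMBRE`, and `join_types(["POLITY", "EROTIC", "MOAN", "ADVERT"], RISK_TYPES)` yields the recommended starting set.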

data Object

| Parameter | Type | Required | Max Length | Description |
| --- | --- | --- | --- | --- |
| tokenId | string | No | 64 | Stable identifier for the end user, typically your internal user UID (an encrypted UID is fine). Used for behavioral-risk signals such as spam and repeat-offender detection. Alphanumeric with underscores and hyphens, up to 64 characters. |
| receiveTokenId | string | Conditional | 64 | tokenId of the message recipient in a one-to-one chat. Alphanumeric with underscores and hyphens, up to 64 characters. Required when eventId is message. |
| deviceId | string | No | 128 | Device-fingerprint identifier issued by DeepCleer. Generated by the DeepCleer SDK on the end user's device. |
| ip | string | No | 64 | Public IP address of the user who submitted the audio. Accepts IPv4 or IPv6. |
| dataId | string | No | 128 | Client-side identifier attached to the moderation call. DeepCleer echoes it back with the result, letting you correlate your source record (a message ID, voice-note ID, review ID, etc.) with the moderation verdict. Typically used to look up historical decisions in your own database or in the DeepCleer console. |
| level | int32 | No | - | User level. See User Levels. |
| gender | int32 | No | - | User's gender. 0: male. 1: female. 2: unknown. |
| formatInfo | string | Conditional | - | Audio data format. Required when contentType is RAW. Accepted values: pcm, wav, mp3. |
| rate | int32 | Conditional | - | Audio sample rate, in Hz. Required when formatInfo is pcm. Range: 8000 to 32000. |
| track | int32 | Conditional | - | Audio channel count. Required when formatInfo is pcm. 1: mono. 2: stereo. |
| returnAllText | int32 | No | - | Controls which audio segments are included in audioDetail. 0 (default): return only risk-bearing segments (REVIEW and REJECT). 1: return all segments, including PASS. This only affects segment-level results in audioDetail; it does not change the overall riskLevel. |
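
The conditional fields for base64 submissions (formatInfo, rate, track) are easy to get wrong, so a client-side check before sending can help. The sketch below is an illustration under stated assumptions: `build_raw_audio_fields` is a hypothetical helper, the 15 MB limit is assumed to apply to the encoded string with 1 MB taken as 1024 × 1024 bytes, and the sample-rate range is read as 8000 to 32000 Hz.

```python
import base64

# Assumption: the 15 MB base64 limit applies to the encoded payload.
MAX_BASE64_BYTES = 15 * 1024 * 1024


def build_raw_audio_fields(audio_bytes, format_info, rate=None, track=None):
    """Return the contentType/content/data fields for a RAW (base64) submission."""
    if format_info not in ("pcm", "wav", "mp3"):
        raise ValueError("formatInfo must be pcm, wav, or mp3 for base64 audio")

    encoded = base64.b64encode(audio_bytes).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64 payload exceeds the 15 MB limit")

    data = {"formatInfo": format_info}
    if format_info == "pcm":
        # rate and track are required only for headerless PCM audio.
        if rate is None or not 8000 <= rate <= 32000:
            raise ValueError("rate must be between 8000 and 32000 Hz for pcm")
        if track not in (1, 2):
            raise ValueError("track must be 1 (mono) or 2 (stereo) for pcm")
        data["rate"] = rate
        data["track"] = track

    return {"contentType": "RAW", "content": encoded, "data": data}
```

The returned dictionary can be merged into the top-level request body; any other data fields (tokenId, returnAllText, and so on) would be added to its `data` entry.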

User Levels

| Value | Description |
| --- | --- |
| 0 | Lowest-level user (e.g., newly registered, completely inactive, or level-0 users) |
| 1 | Lower-level user (e.g., low activity or low-level users) |
| 2 | Mid-level user (e.g., moderately active or mid-level users) |
| 3 | Higher-level user (e.g., highly active or high-level users) |
| 4 | Highest-level user (e.g., paying users, VIP users) |

Response

Response Parameters

Note: Fields other than code, message, and requestId are guaranteed to be returned only when code is 1100.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique DeepCleer request identifier. Saving it is strongly recommended for troubleshooting and optimization. |
| code | int32 | Yes | Response code. See Response Codes. |
| message | string | Yes | Human-readable message corresponding to code. |
| detail | object | No | Detailed moderation result. Returned when code is 1100. See detail Object. |

Response Codes

| Code | Description |
| --- | --- |
| 1100 | Success |
| 1101 | Processing |
| 1901 | QPS limit exceeded |
| 1902 | Invalid parameters |
| 1903 | Service failure |
| 1904 | Download failure |
| 1905 | Decoding failure |
| 9100 | Insufficient balance |
| 9101 | Unauthorized operation |
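
One way to act on these codes is to sort them into a few coarse outcomes before touching the rest of the response. The sketch below is only an illustration: the grouping into "ok", "pending", "retry", and "error" is an assumption of this example (the API reference does not say which codes are transient), and the names `classify_response` and `describe_code` are hypothetical.

```python
# Descriptions copied from the Response Codes table.
RESPONSE_CODES = {
    1100: "Success",
    1101: "Processing",
    1901: "QPS limit exceeded",
    1902: "Invalid parameters",
    1903: "Service failure",
    1904: "Download failure",
    1905: "Decoding failure",
    9100: "Insufficient balance",
    9101: "Unauthorized operation",
}

# Assumption: rate limiting and service failures are treated as transient.
RETRYABLE_CODES = {1901, 1903}


def describe_code(code):
    """Return the documented description for a response code."""
    return RESPONSE_CODES.get(code, "Unknown code")


def classify_response(response):
    """Map a parsed response body to 'ok', 'pending', 'retry', or 'error'."""
    code = response.get("code")
    if code == 1100:
        return "ok"        # detail is guaranteed only in this case
    if code == 1101:
        return "pending"   # assumption: no result is available yet
    if code in RETRYABLE_CODES:
        return "retry"
    return "error"
```

Only read `detail` when the outcome is "ok"; for every other outcome, log `requestId`, `code`, and `message`.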

detail Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLevel | string | Yes | Overall disposition recommendation. PASS: normal (allow). REVIEW: suspicious (manual review recommended). REJECT: violation (reject). During initial integration, do not wire results directly into automated blocking; tune your interception thresholds against historical traffic first. |
| audioText | string | Yes | Full audio-to-text transcription of the clip. |
| audioTime | int32 | Yes | Total audio duration, in seconds. |
| audioDetail | array | Yes | Per-segment moderation results. See audioDetail Array. |
| audioTags | object | No | Audio attribute tags including gender, timbre, and language. Legacy compatibility field. For new integrations, read these signals from businessLabels inside audioDetail instead. See audioTags Object. |

audioDetail Array

Each element is the moderation result for one audio segment:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| requestId | string | Yes | Unique DeepCleer identifier for this audio segment. |
| audioStarttime | float | Yes | Segment start time relative to the beginning of the clip, in seconds. |
| audioEndtime | float | Yes | Segment end time relative to the beginning of the clip, in seconds. |
| audioUrl | string | Yes | URL of the segment audio (MP3 format). |
| riskLevel | string | Yes | Segment disposition. PASS: normal (allow). REVIEW: suspicious (manual review recommended). REJECT: violation (reject). |
| riskLabel1 | string | Yes | Level-1 risk label. Returns normal when riskLevel is PASS. |
| riskLabel2 | string | Yes | Level-2 risk label. Empty when riskLevel is PASS. |
| riskLabel3 | string | Yes | Level-3 risk label. Empty when riskLevel is PASS. |
| riskDescription | string | Yes | Risk description. Returns Normal when riskLevel is PASS. Format: "Level 1: Level 2: Level 3". Human-readable summary intended for display only. Do not parse it for programmatic logic; branch on riskLabel1 / riskLabel2 / riskLabel3 instead. |
| riskDetail | object | No | Detail backing this segment's risk decision. See Segment riskDetail. |
| allLabels | array | No | All risk labels detected on this segment. See Segment allLabels. |
| businessLabels | array | No | All business labels detected on this segment. See Segment businessLabels. |
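
As a rough illustration of how the per-segment fields fit together, the sketch below collects the segments that need attention and keeps their time range and label triple. `flagged_segments` is a hypothetical helper name. It branches on riskLevel and the riskLabel fields rather than on riskDescription, as the table advises.

```python
def flagged_segments(detail):
    """Return REVIEW and REJECT segments from a detail object, in clip order."""
    flagged = []
    for segment in detail.get("audioDetail") or []:
        if segment.get("riskLevel") not in ("REVIEW", "REJECT"):
            continue  # PASS segments only appear when returnAllText is 1
        flagged.append({
            "start": segment.get("audioStarttime"),
            "end": segment.get("audioEndtime"),
            "level": segment.get("riskLevel"),
            "labels": (
                segment.get("riskLabel1"),
                segment.get("riskLabel2"),
                segment.get("riskLabel3"),
            ),
        })
    return flagged
```

The output is suited to feeding a manual review queue, which fits the guidance not to wire results straight into automated blocking during initial integration.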

Segment allLabels

Each element in the allLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| riskLabel1 | string | Yes | Level-1 risk label. |
| riskLabel2 | string | Yes | Level-2 risk label. |
| riskLabel3 | string | Yes | Level-3 risk label. |
| riskDescription | string | Yes | Risk description. Display only; do not parse for programmatic logic. |
| riskLevel | string | Yes | Disposition for this label: PASS, REVIEW, or REJECT. |
| probability | float | No | Confidence score in the range 0 to 1. Higher values indicate greater confidence. |
| riskDetail | object | No | Detail backing this label. See Segment riskDetail. |

Segment riskDetail

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioText | string | No | Audio-to-text transcription for this segment. |
| riskSource | int32 | No | Where the risk was identified. 1000: no risk. 1001: text risk (transcribed text). 1003: audio risk (acoustic content). |
| matchedLists | array | No | Custom-list match information. Returned only when a custom keyword list is hit. See Matched Lists. |
| riskSegments | array | No | High-risk content segments inside the transcription. See Risk Segments. |

Matched Lists

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name of the matched custom list. |
| words | array | Yes | Sensitive-word matches within the list. |
| words[].word | string | Yes | The matched sensitive word. |
| words[].position | array | Yes | Position of the matched word. |

Risk Segments

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| segment | string | No | The high-risk content segment. |
| position | array | No | Position of the segment in the source text (0-indexed). |
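
The nesting of matchedLists (list, then words, then word and position) can be flattened for logging or review tooling. The sketch below is a minimal illustration with a hypothetical helper name, `matched_words`. It passes `position` through untouched because this document does not specify the layout of that array.

```python
def matched_words(risk_detail):
    """Flatten matchedLists into (list name, word, position) tuples."""
    hits = []
    # matchedLists is optional and only present when a custom list is hit.
    for matched_list in risk_detail.get("matchedLists") or []:
        for entry in matched_list.get("words") or []:
            hits.append(
                (matched_list.get("name"), entry.get("word"), entry.get("position"))
            )
    return hits
```

A riskDetail without matchedLists simply produces an empty list, so callers do not need a separate presence check.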

Segment businessLabels

Each element in the businessLabels array:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| businessLabel1 | string | Yes | Level-1 business label. |
| businessLabel2 | string | Yes | Level-2 business label. |
| businessLabel3 | string | Yes | Level-3 business label. |
| businessDescription | string | Yes | Business label description. Format: "Level 1: Level 2: Level 3". |
| confidenceLevel | int32 | No | Coarse confidence bucket in the range 0 to 2. Higher values indicate greater confidence. |
| probability | float | No | Confidence score in the range 0 to 1. |
| businessDetail | object | No | Detail backing the business label. Reserved for future use. |

audioTags Object

Warning: Legacy compatibility field. For new integrations, read the equivalent attributes from businessLabels inside audioDetail instead.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| gender | object | No | Gender detection result. |
| gender.label | string | Yes | Gender label name (for example, Male, Female). |
| gender.probability | int32 | No | Gender probability in the range 0 to 100. Higher values indicate greater confidence. (Note: this legacy field uses a 0 to 100 scale, while modern probability fields use the 0 to 1 scale.) |
| timbre | array | No | Voice timbre detection results. Each element contains label and probability. Possible label values: Uncle, Young Man, Boy, Elderly Man, Queen, Mature Woman, Young Woman, Loli, Middle-aged Woman, Male, Female, No Voice. |
| language | array | No | Language detection results. Each element contains label (see Language Labels) and confidence. |

Language Labels

| Label | Description |
| --- | --- |
| 0 | Mandarin Chinese |
| 1 | English |
| 2 | Cantonese |
| 3 | Tibetan |
| 4 | Uyghur |
| 5 | Mongolian |
| 6 | Korean |
| -1 | Other languages |
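
For integrations that still read the legacy audioTags object, the numeric language labels can be mapped to names and the score rescaled to match the 0 to 1 scale used elsewhere. The sketch below is illustrative only: `top_language` is a hypothetical helper, and it assumes that language confidence uses the same 0 to 100 scale as gender.probability, which this document states explicitly only for gender.

```python
# Mapping copied from the Language Labels table.
LANGUAGE_LABELS = {
    0: "Mandarin Chinese",
    1: "English",
    2: "Cantonese",
    3: "Tibetan",
    4: "Uyghur",
    5: "Mongolian",
    6: "Korean",
    -1: "Other languages",
}


def top_language(audio_tags):
    """Return (language name, confidence on a 0-1 scale) for the best match, or None."""
    candidates = audio_tags.get("language") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item.get("confidence", 0))
    name = LANGUAGE_LABELS.get(best.get("label"), "Unknown label")
    # Assumption: confidence is reported on a 0-100 scale, like gender.probability.
    return name, best.get("confidence", 0) / 100
```

New integrations should prefer businessLabels inside audioDetail, where probability is already on the 0 to 1 scale.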

Examples

Request Example

{
  "accessKey": "YOUR_ACCESS_KEY",
  "appId": "default",
  "eventId": "default",
  "type": "POLITY_EROTIC",
  "businessType": "GENDER_TIMBRE",
  "btId": "test1",
  "contentType": "URL",
  "content": "https://example.com/audio/sample.mp3",
  "acceptLang": "en",
  "data": {
    "returnAllText": 1,
    "tokenId": "token-short"
  }
}
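
A request body like the one above could be submitted with Python's standard library as sketched below. This is a minimal illustration, not an official client: `build_request` and `moderate` are hypothetical names, the Content-Type header is an assumption based on the requirement that parameters use JSON, and the 10-second timeout follows the Timeout Suggestion for synchronous requests.

```python
import json
import urllib.request

# Singapore cluster URL from the Request URL table.
SYNC_URL = "http://api-audio-xjp.fengkongcloud.com/audiomessage/v4"


def build_request(payload, url=SYNC_URL):
    """Build a POST request carrying the payload as UTF-8 encoded JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def moderate(payload, timeout=10.0):
    """Send the request and return the parsed JSON response body."""
    request = build_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access. `moderate` raises `urllib.error.URLError` on connection problems and timeouts, which the caller is expected to handle.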

Response Example

{
  "code": 1100,
  "message": "Success",
  "requestId": "817c8509359500c898a762ffe93a582b",
  "btId": "test1",
  "detail": {
    "audioDetail": [
      {
        "requestId": "817c8509359500c898a762ffe93a582b_a0000",
        "audioStarttime": 0,
        "audioEndtime": 10,
        "audioUrl": "http://example.com/audio_segment_a0000.mp3",
        "businessLabels": [
          {
            "businessDescription": "Singing: Singing: Singing",
            "businessDetail": {},
            "businessLabel1": "sing",
            "businessLabel2": "changge",
            "businessLabel3": "changge",
            "confidenceLevel": 2,
            "probability": 0.858334402569294
          }
        ],
        "allLabels": [],
        "riskLevel": "REJECT",
        "riskLabel1": "abuse",
        "riskLabel2": "buwenmingyongyu",
        "riskLabel3": "qingdubuwenmingyongyu",
        "riskDescription": "Abuse: Uncivilized language: Mild uncivilized language",
        "riskDetail": {
          "audioText": "Recognized audio text content..."
        }
      }
    ],
    "audioTags": {
      "gender": {
        "label": "Female",
        "probability": 95
      },
      "language": [
        {
          "confidence": 0,
          "label": 2
        },
        {
          "confidence": 99,
          "label": 0
        },
        {
          "confidence": 0,
          "label": 1
        }
      ],
      "song": 0,
      "timbre": [
        {
          "label": "Female",
          "probability": 95
        },
        {
          "label": "Queen",
          "probability": 12
        },
        {
          "label": "Mature Woman",
          "probability": 37
        },
        {
          "label": "Young Woman",
          "probability": 56
        },
        {
          "label": "Middle-aged Woman",
          "probability": 67
        },
        {
          "label": "Loli",
          "probability": 24
        }
      ]
    },
    "audioText": "Recognized audio text content...",
    "audioTime": 10,
    "code": 1100,
    "requestParams": {
      "channel": "TEST",
      "lang": "zh",
      "returnAllText": 1,
      "tokenId": "test01"
    },
    "riskLevel": "REJECT"
  }
}