Audio Moderation

The Audio Moderation API identifies risks and business labels in audio files.

The Audio Moderation API detects regulatory risks in audio content — political content, pornography, advertising, violence & terrorism, prohibited content, abusive language, moaning, prohibited or copyrighted songs — and can additionally identify business-scenario attributes such as gender, voice timbre, language, age, and minor presence.

DeepCleer exposes three audio endpoints. They share the same moderation engine and return the same result schema; they differ only in how the result is delivered and what kind of audio they're sized for.

Endpoints at a Glance

| Endpoint | Path | Delivery | Audio length | Typical use case |
|---|---|---|---|---|
| Sync Audio | /audiomessage/v4 | Result in same response | Short clips (≤ 60 s, ≤ 18 MB) | Chat voice messages, short user-generated clips |
| Async Audio Detection | /audio/v4 | Ack now + callback later | Longer audio (≤ 18 MB, no duration cap) | Livestream recordings, long-form audio, throughput-heavy pipelines |
| Audio Query (polling) | /query_audio/v4 | Polled by btId | N/A (looks up an existing async submission) | Environments without a public callback URL; recovery for dropped callbacks |

All three accept the same authentication (accessKey) and the same risk catalog (type + businessType), and they return the same result fields — the differences below are about the delivery model, not the moderation behavior.

When to Use Which

Sync Audio — /audiomessage/v4

Submit one short clip per request and get the full moderation result back in the same response. Best when:

  • The clip is a short voice message or recorded snippet (≤ 60 seconds).
  • Your application needs a synchronous decision — for example, deciding whether to publish a voice message before showing it to the recipient.
  • You don't have infrastructure to receive a callback.

Recommended timeout: 10 seconds. Currently exposed on the Singapore cluster.
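
A synchronous submission could be sketched roughly as follows. This is an illustration under assumptions, not the documented request format: the host name is a placeholder, the flat JSON layout is a guess, and the values passed for type, contentType, and content are left to the caller because their allowed values are not listed here. Only the path, the field names, and the 10-second timeout come from the text above.

```python
import json
import urllib.request

# Hypothetical host; the real Singapore cluster address is not given here.
SYNC_URL = "https://api.example.invalid/audiomessage/v4"
SYNC_TIMEOUT_S = 10  # recommended timeout for the sync endpoint


def build_sync_request(access_key, app_id, event_id, risk_type,
                       content_type, content, business_type=None):
    """Assemble a request body for one short clip (<= 60 s, <= 18 MB).

    Assumption: all fields sit at the top level of a JSON object.
    """
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_type,
        "contentType": content_type,
        "content": content,
    }
    if business_type is not None:
        body["businessType"] = business_type
    return body


def post_json(url, body, timeout_s):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real host in place, `post_json(SYNC_URL, build_sync_request(...), SYNC_TIMEOUT_S)` would return the moderation result in the same call, which is what lets you hold a voice message back until the decision arrives.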

Async Audio Detection — /audio/v4

Submit one clip per request and receive an immediate acknowledgement (requestId + btId + code); the full moderation result is delivered later to the callback URL you supply. Best when:

  • The clip is longer-form — livestream recordings, interview audio, full-length tracks (size cap is 18 MB; no duration cap).
  • You're running a throughput-heavy pipeline and don't want to hold sync connections open during audio download and ASR.
  • You want to use interval moderation (audioDetectStep) to skip every Nth 10-second segment for cost-sensitive scanning.
  • You want translation (translationTargetLang) of the transcript into a target language.
  • You need a fallback URL (retryUrl) in case the primary audio URL fails to download.

Recommended timeout: 5 seconds for the ack call. Exposed on Silicon Valley and Singapore.
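
The async-only options listed above can be sketched as a request builder. The placement of retryUrl and audioDetectStep inside a data object follows the data.* naming used in this document; everything else about the layout, including btId being caller-supplied and generated here with a UUID, is an assumption made for illustration.

```python
import uuid


def build_async_request(access_key, app_id, event_id, risk_type, content_type,
                        content, callback_url, bt_id=None, retry_url=None,
                        detect_step=None, translation_target_lang=None):
    """Assemble a body for /audio/v4 (clip size cap 18 MB, no duration cap)."""
    body = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_type,
        "contentType": content_type,
        "content": content,
        "callback": callback_url,
        # Assumption: the caller picks the btId and keeps it for later lookup.
        "btId": bt_id or uuid.uuid4().hex,
    }
    data = {}
    if retry_url is not None:
        data["retryUrl"] = retry_url            # fallback audio URL
    if detect_step is not None:
        data["audioDetectStep"] = detect_step   # interval moderation
    if data:
        body["data"] = data
    if translation_target_lang is not None:
        body["translationTargetLang"] = translation_target_lang
    return body
```

Storing the returned body's btId alongside the submission time is what makes later polling or reconciliation possible.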

Audio Query — /query_audio/v4

Pairs with the Async endpoint. Send the original btId and DeepCleer returns the same payload that would have been delivered via callback. Best when:

  • Your environment can't host a public callback URL (private networks, on-prem deployments, mobile-only clients), so you poll for results instead.
  • You need a recovery channel: the original callback was dropped, your handler crashed, or you want to re-fetch a result you've already lost.
  • You want to reconcile state after an outage on your callback handler.

Recommended timeout: 5 seconds. Exposed on Silicon Valley and Singapore. Results are retained on DeepCleer's side for up to 3 days after the original async submission.
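
A polling loop for this endpoint might look like the sketch below. It assumes that a response whose code is 1101 (listed as "Processing" in this document) means the result is not ready yet and that any other code is final; the interval, the attempt limit, and the `fetch` callable (which would wrap the actual HTTP call to /query_audio/v4) are hypothetical.

```python
import time

PROCESSING_CODE = "1101"      # "Processing"; compared as text in case it arrives as a string
RETENTION_S = 3 * 24 * 3600   # results are retained for up to 3 days


def poll_result(fetch, bt_id, submitted_at, interval_s=2.0, max_attempts=10,
                now=time.time, sleep=time.sleep):
    """Call fetch(bt_id) until the response is no longer 'Processing'.

    fetch is any callable returning the decoded response as a dict.
    submitted_at is the epoch time of the original async submission.
    """
    for _ in range(max_attempts):
        if now() - submitted_at > RETENTION_S:
            raise LookupError(f"result for {bt_id} is past the retention window")
        response = fetch(bt_id)
        if str(response.get("code")) != PROCESSING_CODE:
            return response
        sleep(interval_s)
    raise TimeoutError(f"{bt_id} still processing after {max_attempts} attempts")
```

Passing `now` and `sleep` as parameters keeps the loop testable without real waiting; in production the defaults apply.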

Request-Side Differences

The Sync and Async submission endpoints share accessKey, appId, eventId, type, businessType, contentType, content, and btId (where applicable). The differences:

FieldSync AudioAsync AudioAudio Query
callback (URL)Yes
acceptLangYes
translationTargetLangYes
data envelopeYesYes
data.retryUrlYes
data.audioDetectStepYes
data.lang (ASR)Yes
data.extra.passThroughYes
btId (lookup key)YesYesYes

The Audio Query endpoint is intentionally minimal — it only takes accessKey and btId, because all moderation parameters were specified in the original async submission.

Response-Side Differences

Field groupSync Audio (response)Async Audio (callback)Audio Query (response)
requestId / code / messageYesYesYes
btId (echoed)Yes (in ack only)*YesYes
riskLevel / audioText / audioTimeYesYesYes
audioDetail (per-segment results)YesYesYes
audioTags (legacy gender/timbre/language)YesYesYes
requestParams (echo of data)YesYes
auxInfo.errorCode (2003 / 2007)YesYes
tokenProfileLabels / tokenRiskLabelsYes(see Note)
Response code 1101 (Processing)YesYes

*Sync Audio doesn't take a btId, so the response carries the moderation result directly under detail — it doesn't need an echoed identifier.

The async callback and the polling response carry the same structure. Treat them as two delivery channels for the same payload.
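
Because the same payload can arrive through either channel, it helps to route both through one idempotent handler. The sketch below assumes the payload is a dict carrying btId and code as described in this document; the `ResultStore` class and its in-memory dictionary are hypothetical stand-ins for whatever storage you use.

```python
class ResultStore:
    """Keep the first final result seen for each btId, whatever the channel."""

    def __init__(self):
        self._results = {}

    def record(self, payload, channel):
        """Return True if the payload was stored, False if it was skipped."""
        bt_id = payload.get("btId")
        if bt_id is None:
            raise ValueError("payload has no btId")
        if str(payload.get("code")) == "1101":
            return False  # still processing, nothing final to store
        if bt_id in self._results:
            return False  # already delivered through the other channel
        self._results[bt_id] = {"channel": channel, "payload": payload}
        return True

    def get(self, bt_id):
        return self._results.get(bt_id)
```

A callback handler would call `store.record(payload, "callback")` and a polling job `store.record(payload, "query")`; whichever arrives second is ignored.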

Choosing Between Async Callback and Polling

If you have a public callback URL available, prefer the callback model — results are pushed to you as soon as they're ready, with no polling overhead. Use the polling endpoint as a complement, not a replacement: it shines when callbacks aren't possible (private networks, mobile clients) or when you need to recover from a delivery failure. Many production integrations enable both — the callback as the primary channel and the polling endpoint as a recovery and reconciliation tool.
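
One way to combine the two channels is a periodic reconciliation pass over submissions whose callback has not arrived. The sketch below only decides which btIds to poll; the 5-minute grace period is an arbitrary assumption, and the 3-day limit reflects the retention window stated for the query endpoint.

```python
RETENTION_S = 3 * 24 * 3600  # query results are retained for up to 3 days


def select_for_polling(pending, now, grace_s=300, retention_s=RETENTION_S):
    """Split pending submissions into those to poll and those already expired.

    pending maps btId -> submission time (epoch seconds) for submissions
    that have not yet received a callback.
    """
    to_poll, expired = [], []
    for bt_id, submitted_at in pending.items():
        age = now - submitted_at
        if age > retention_s:
            expired.append(bt_id)   # too old to query; resubmit if still needed
        elif age >= grace_s:
            to_poll.append(bt_id)   # callback is overdue; fall back to polling
        # otherwise: still within the grace period, keep waiting
    return to_poll, expired
```

Running this on a schedule keeps the callback as the primary channel while ensuring that a dropped callback is noticed before the result ages out.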