Audio Moderation

The Audio Moderation API detects regulatory risks in audio content — political content, pornography, advertising, violence & terrorism, prohibited content, abusive language, moaning, prohibited or copyrighted songs — and can additionally identify business-scenario attributes such as gender, voice timbre, language, age, and minor presence.

DeepCleer exposes three audio endpoints. They share the same moderation engine and return the same result schema; they differ only in how the result is delivered and what kind of audio they're sized for.

Endpoints at a Glance

Endpoint	Path	Delivery	Audio length	Typical use case
Sync Audio	`/audiomessage/v4`	Result in same response	Short clips (≤ 60 s, ≤ 18 MB)	Chat voice messages, short user-generated clips
Async Audio Detection	`/audio/v4`	Ack now + callback later	Longer audio (≤ 18 MB, no duration cap)	Livestream recordings, long-form audio, throughput-heavy pipelines
Audio Query (polling)	`/query_audio/v4`	Polled by `btId`	N/A — looks up an existing async submission	Environments without a public callback URL; recovery for dropped callbacks

All three use the same authentication (accessKey) and apply the same risk catalog (type + businessType, chosen when the audio is submitted), and they return the same core result fields. The differences below are about the delivery model, not the moderation behavior.
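The selection rules in the table can be condensed into a small helper. This is a minimal sketch, not part of the API: the function name `choose_endpoint` and its parameters are hypothetical, and the byte value of "18 MB" is assumed to be 18 × 1024 × 1024. Only the paths and limits come from the table.

```python
from typing import Optional

MAX_BYTES = 18 * 1024 * 1024   # 18 MB size cap (assumes 1 MB = 1024 * 1024 bytes)
SYNC_MAX_SECONDS = 60          # duration cap for the sync endpoint


def choose_endpoint(duration_s: float, size_bytes: int,
                    need_inline_result: bool) -> Optional[str]:
    """Return the submission path for a clip, or None if it exceeds the size cap."""
    if size_bytes > MAX_BYTES:
        return None  # neither submission endpoint accepts the clip as-is
    if need_inline_result and duration_s <= SYNC_MAX_SECONDS:
        return "/audiomessage/v4"
    # Longer clips, or callers that can wait for a callback or poll later.
    return "/audio/v4"
```

A caller that wants an inline result for a clip longer than 60 seconds is routed to the async path here; splitting the clip is an alternative that this sketch does not attempt.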

When to Use Which

Sync Audio — `/audiomessage/v4`

Submit one short clip per request and get the full moderation result back in the same response. Best when:

The clip is a short voice message or recorded snippet (≤ 60 seconds).
Your application needs a synchronous decision — for example, deciding whether to publish a voice message before showing it to the recipient.
You don't have infrastructure to receive a callback.

Recommended timeout: 10 seconds. Currently exposed on the Singapore cluster.
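A sync call might look roughly like the sketch below. Several things here are assumptions rather than documented behavior: the host in `BASE_URL` is a placeholder, the request is assumed to be an HTTP POST with a JSON body, and the shared fields are assumed to sit at the top level of that body. The helper names `build_sync_payload` and `moderate_sync` are hypothetical. The path and the 10-second timeout come from this section.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical host: replace with the Singapore endpoint address for your account.
BASE_URL = "https://api.example.invalid"
SYNC_PATH = "/audiomessage/v4"


def build_sync_payload(access_key: str, app_id: str, event_id: str,
                       risk_type: str, content_type: str, content: str,
                       business_type: str = "") -> Dict[str, Any]:
    """Assemble a request body from the shared field names."""
    payload: Dict[str, Any] = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_type,
        "contentType": content_type,
        "content": content,
        "data": {},  # sync takes a data envelope; its contents are not covered here
    }
    if business_type:
        payload["businessType"] = business_type
    return payload


def moderate_sync(payload: Dict[str, Any], timeout_s: float = 10.0) -> Dict[str, Any]:
    """POST the payload and return the parsed JSON response (assumed JSON over POST)."""
    request = urllib.request.Request(
        BASE_URL + SYNC_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the body easy to inspect or log before anything is sent.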

Async Audio Detection — `/audio/v4`

Submit one clip per request and receive an immediate acknowledgement (requestId + btId + code); the full moderation result is delivered later to the callback URL you supply. Best when:

The clip is longer-form — livestream recordings, interview audio, full-length tracks (size cap is 18 MB; no duration cap).
You're running a throughput-heavy pipeline and don't want to hold sync connections open during audio download and ASR.
You want to use interval moderation (audioDetectStep) to skip some of the 10-second segments at a regular interval for cost-sensitive scanning.
You want translation (translationTargetLang) of the transcript into a target language.
You need a fallback URL (retryUrl) in case the primary audio URL fails to download.

Recommended timeout: 5 seconds for the ack call. Exposed on Silicon Valley and Singapore.
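The async-only options listed above can be layered onto a shared request body as sketched below. The helper names `build_async_payload` and `parse_ack` are hypothetical, and the field placement (callback and translationTargetLang at the top level; retryUrl, audioDetectStep, and extra.passThrough under data) is inferred from the field names on this page rather than confirmed against a full schema.

```python
from typing import Any, Dict, Optional, Tuple


def build_async_payload(base: Dict[str, Any], callback_url: str,
                        retry_url: Optional[str] = None,
                        detect_step: Optional[int] = None,
                        translation_target_lang: Optional[str] = None,
                        pass_through: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of `base` with the async-only fields added."""
    payload = dict(base)
    payload["callback"] = callback_url
    if translation_target_lang:
        payload["translationTargetLang"] = translation_target_lang

    data = dict(payload.get("data") or {})
    if retry_url:
        data["retryUrl"] = retry_url            # fallback if the primary URL fails
    if detect_step is not None:
        data["audioDetectStep"] = detect_step   # interval moderation
    if pass_through is not None:
        extra = dict(data.get("extra") or {})
        extra["passThrough"] = pass_through
        data["extra"] = extra
    payload["data"] = data
    return payload


def parse_ack(ack: Dict[str, Any]) -> Tuple[Optional[str], Optional[str], Optional[int]]:
    """Pull requestId, btId, and code out of the immediate acknowledgement."""
    return ack.get("requestId"), ack.get("btId"), ack.get("code")
```

Storing the btId from the ack alongside your own record of the submission is what later makes polling and reconciliation possible.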

Audio Query — `/query_audio/v4`

Pairs with the Async endpoint. Send the original btId and DeepCleer returns the same payload that would have been delivered via callback. Best when:

Your environment can't host a public callback URL (private networks, on-prem deployments, mobile-only clients), so you poll for results instead.
You need a recovery channel: the original callback was dropped, your handler crashed, or you need to re-fetch a result you have lost.
You want to reconcile state after an outage in your callback handler.

Recommended timeout: 5 seconds. Exposed on Silicon Valley and Singapore. Results are retained on DeepCleer's side for up to 3 days after the original async submission.
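Polling can be sketched as a bounded loop that keeps asking while the response code is `1101` (Processing). The function `poll_result` is hypothetical, as are the interval and attempt defaults. The `query` argument stands for whatever function in your codebase sends accessKey and btId to `/query_audio/v4` and returns the parsed response. Injecting it keeps the loop testable without network access.

```python
import time
from typing import Any, Callable, Dict, Optional

PROCESSING = 1101  # response code meaning the result is not ready yet


def poll_result(query: Callable[[str], Dict[str, Any]], bt_id: str,
                interval_s: float = 2.0, max_attempts: int = 10,
                sleep: Callable[[float], None] = time.sleep) -> Optional[Dict[str, Any]]:
    """Call `query(bt_id)` until the code is no longer 1101, or give up.

    Returns the first non-processing response, or None after max_attempts.
    """
    for attempt in range(max_attempts):
        response = query(bt_id)
        if response.get("code") != PROCESSING:
            return response
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next attempt
    return None
```

A `None` return means only that the result was not ready within the attempt budget. The caller can retry later, provided the submission is still inside the 3-day retention window.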

Request-Side Differences

The Sync and Async endpoints share accessKey, appId, eventId, type, businessType, contentType, and content, plus btId where applicable. Audio Query takes only accessKey and btId. The differences:

Field	Sync Audio	Async Audio	Audio Query
`callback` (URL)	—	Yes	—
`acceptLang`	—	Yes	—
`translationTargetLang`	—	Yes	—
`data` envelope	Yes	Yes	—
`data.retryUrl`	—	Yes	—
`data.audioDetectStep`	—	Yes	—
`data.lang` (ASR)	—	Yes	—
`data.extra.passThrough`	—	Yes	—
`btId` (lookup key)	—	Yes	Yes

The Audio Query endpoint is intentionally minimal — it only takes accessKey and btId, because all moderation parameters were specified in the original async submission.
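If one code path builds requests for more than one endpoint, the table can be encoded as data. The sketch below is illustrative only: `strip_async_only` and `build_query_payload` are hypothetical helpers, and the split between top-level and data-level fields follows the names in the table.

```python
import copy
from typing import Any, Dict

# Fields the table marks as async-only.
ASYNC_ONLY_TOP_LEVEL = {"callback", "acceptLang", "translationTargetLang"}
ASYNC_ONLY_DATA = {"retryUrl", "audioDetectStep", "lang"}


def strip_async_only(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of an async payload without the async-only fields."""
    out = copy.deepcopy(payload)
    for key in ASYNC_ONLY_TOP_LEVEL:
        out.pop(key, None)
    data = out.get("data")
    if isinstance(data, dict):
        for key in ASYNC_ONLY_DATA:
            data.pop(key, None)
        extra = data.get("extra")
        if isinstance(extra, dict):
            extra.pop("passThrough", None)
            if not extra:
                data.pop("extra")  # drop the envelope if nothing else is in it
    return out


def build_query_payload(access_key: str, bt_id: str) -> Dict[str, Any]:
    """Audio Query takes only accessKey and btId."""
    return {"accessKey": access_key, "btId": bt_id}
```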

Response-Side Differences

Field group	Sync Audio (response)	Async Audio (callback)	Audio Query (response)
`requestId` / `code` / `message`	Yes	Yes	Yes
`btId` (echoed)	—*	Yes	Yes
`riskLevel` / `audioText` / `audioTime`	Yes	Yes	Yes
`audioDetail` (per-segment results)	Yes	Yes	Yes
`audioTags` (legacy gender/timbre/language)	Yes	Yes	Yes
`requestParams` (echo of `data`)	—	Yes	Yes
`auxInfo.errorCode` (`2003` / `2007`)	—	Yes	Yes
`tokenProfileLabels` / `tokenRiskLabels`	—	Yes	(see Note)
Response code `1101` (Processing)	—	Yes	Yes

*Sync Audio doesn't take a btId, so the response carries the moderation result directly under detail — it doesn't need an echoed identifier.

The async callback and the polling response carry the same structure. Treat them as two delivery channels for the same payload.
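Because both channels carry the same payload, a single handler can serve both, and it helps to make that handler idempotent so a result that arrives by callback and again by polling is processed once. The sketch below assumes btId, code, and riskLevel sit at the top level of the payload, which this page does not confirm. The class `ResultSink` is hypothetical.

```python
from typing import Any, Dict

PROCESSING = 1101  # result not ready yet; nothing to record


class ResultSink:
    """Record each btId's result once, whichever channel delivers it first."""

    def __init__(self) -> None:
        self._seen: Dict[str, Dict[str, Any]] = {}

    def accept(self, payload: Dict[str, Any], channel: str) -> bool:
        """Return True if the payload was recorded, False if it was skipped."""
        if payload.get("code") == PROCESSING:
            return False
        bt_id = payload.get("btId")
        if bt_id is None or bt_id in self._seen:
            return False  # unidentifiable, or already handled via another channel
        self._seen[bt_id] = {"channel": channel, "riskLevel": payload.get("riskLevel")}
        return True

    def get(self, bt_id: str) -> Dict[str, Any]:
        """Return the recorded summary for bt_id, or an empty dict."""
        return self._seen.get(bt_id, {})
```

An in-memory dict is used only to keep the example short. A real deployment would more likely key a database row or cache entry on btId.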

Choosing Between Async Callback and Polling

If you have a public callback URL available, prefer the callback model — results are pushed to you as soon as they're ready, with no polling overhead. Use the polling endpoint as a complement, not a replacement: it shines when callbacks aren't possible (private networks, mobile clients) or when you need to recover from a delivery failure. Many production integrations enable both — the callback as the primary channel and the polling endpoint as a recovery and reconciliation tool.
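One way to combine the two channels is to track every async submission, clear it when the callback arrives, and poll only for the ones that stay silent past a grace period. The sketch below is an assumption-laden illustration: `PendingTracker` and the 300-second grace default are hypothetical, and timestamps are plain seconds supplied by the caller. The 3-day retention limit is the one stated for the Audio Query endpoint.

```python
from typing import Dict, List

RETENTION_S = 3 * 24 * 3600  # results are retained for up to 3 days


class PendingTracker:
    """Track async submissions that have not yet produced a result."""

    def __init__(self) -> None:
        self._pending: Dict[str, float] = {}

    def submitted(self, bt_id: str, at: float) -> None:
        self._pending[bt_id] = at

    def resolved(self, bt_id: str) -> None:
        """Call this from the callback handler (or after a successful poll)."""
        self._pending.pop(bt_id, None)

    def due_for_polling(self, now: float, grace_s: float = 300.0) -> List[str]:
        """btIds with no result after the grace period, still within retention."""
        return sorted(b for b, t in self._pending.items()
                      if grace_s <= now - t < RETENTION_S)

    def expired(self, now: float) -> List[str]:
        """btIds past the retention window; polling may no longer return them."""
        return sorted(b for b, t in self._pending.items()
                      if now - t >= RETENTION_S)
```

A periodic job can feed `due_for_polling` into the query endpoint and call `resolved` for each result it retrieves. Anything reported by `expired` would need to be resubmitted.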