Audio Moderation
The Audio Moderation API identifies risks and business labels in audio files.
The Audio Moderation API detects regulatory risks in audio content — political content, pornography, advertising, violence & terrorism, prohibited content, abusive language, moaning, prohibited or copyrighted songs — and can additionally identify business-scenario attributes such as gender, voice timbre, language, age, and minor presence.
DeepCleer exposes three audio endpoints. They share the same moderation engine and return the same result schema; they differ only in how the result is delivered and what kind of audio they're sized for.
Endpoints at a Glance
| Endpoint | Path | Delivery | Audio length | Typical use case |
|---|---|---|---|---|
| Sync Audio | /audiomessage/v4 | Result in same response | Short clips (≤ 60 s, ≤ 18 MB) | Chat voice messages, short user-generated clips |
| Async Audio Detection | /audio/v4 | Ack now + callback later | Longer audio (≤ 18 MB, no duration cap) | Livestream recordings, long-form audio, throughput-heavy pipelines |
| Audio Query (polling) | /query_audio/v4 | Polled by btId | N/A — looks up an existing async submission | Environments without a public callback URL; recovery for dropped callbacks |
All three accept the same authentication (accessKey) and the same risk catalog (type + businessType), and they return the same result fields — the differences below are about the delivery model, not the moderation behavior.
When to Use Which
Sync Audio — /audiomessage/v4
Submit one short clip per request and get the full moderation result back in the same response. Best when:
- The clip is a short voice message or recorded snippet (≤ 60 seconds).
- Your application needs a synchronous decision — for example, deciding whether to publish a voice message before showing it to the recipient.
- You don't have infrastructure to receive a callback.
Recommended timeout: 10 seconds. Currently exposed on the Singapore cluster.
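As a rough illustration of the sync flow, the sketch below builds a request body from the shared fields and posts it with the recommended 10-second timeout. The host name, the JSON POST transport, the flat field layout, and the helper names are assumptions made for illustration, not confirmed API details; check the endpoint reference for the exact schema.

```python
import json
import urllib.request

# Hypothetical host: replace with the Singapore cluster URL for your account.
BASE_URL = "https://api.example.invalid"
SYNC_PATH = "/audiomessage/v4"
SYNC_TIMEOUT_S = 10  # recommended timeout for the sync endpoint


def build_sync_request(access_key, app_id, event_id, risk_type,
                       content_type, content, business_type=None):
    """Assemble the shared request fields (the flat layout is an assumption)."""
    payload = {
        "accessKey": access_key,
        "appId": app_id,
        "eventId": event_id,
        "type": risk_type,
        "contentType": content_type,
        "content": content,
    }
    if business_type is not None:
        payload["businessType"] = business_type
    return payload


def moderate_clip(payload):
    """POST the payload as JSON (assumed transport) and return the decoded reply."""
    request = urllib.request.Request(
        BASE_URL + SYNC_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=SYNC_TIMEOUT_S) as response:
        return json.load(response)
```

Keeping the payload builder separate from the network call makes the request shape easy to unit test without touching the service.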
Async Audio Detection — /audio/v4
Submit one clip per request and receive an immediate acknowledgement (requestId + btId + code); the full moderation result is delivered later to the callback URL you supply. Best when:
- The clip is longer-form — livestream recordings, interview audio, full-length tracks (size cap is 18 MB; no duration cap).
- You're running a throughput-heavy pipeline and don't want to hold sync connections open during audio download and ASR.
- You want to use interval moderation (audioDetectStep) to skip every Nth 10-second segment for cost-sensitive scanning.
- You want translation (translationTargetLang) of the transcript into a target language.
- You need a fallback URL (retryUrl) in case the primary audio URL fails to download.
Recommended timeout: 5 seconds for the ack call. Exposed on Silicon Valley and Singapore.
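The async-only options can be layered onto the shared fields as in the minimal sketch below. It assumes that callback and translationTargetLang sit at the top level and that retryUrl and audioDetectStep sit inside the data envelope, following the dotted names used on this page; the helper names and the exact nesting are assumptions, not confirmed schema.

```python
ASYNC_PATH = "/audio/v4"
ACK_TIMEOUT_S = 5  # recommended timeout for the acknowledgement call


def build_async_request(shared_fields, bt_id, callback_url,
                        retry_url=None, detect_step=None, translation_lang=None):
    """Add the async-only options to a copy of the shared request fields."""
    payload = dict(shared_fields)  # copy so the caller's dict is not mutated
    payload["btId"] = bt_id
    payload["callback"] = callback_url
    if translation_lang is not None:
        payload["translationTargetLang"] = translation_lang

    data = dict(payload.get("data", {}))  # assumed home of the data.* options
    if retry_url is not None:
        data["retryUrl"] = retry_url
    if detect_step is not None:
        data["audioDetectStep"] = detect_step
    payload["data"] = data
    return payload


def read_ack(ack):
    """Keep only the three acknowledgement fields: requestId, btId, code."""
    return {key: ack.get(key) for key in ("requestId", "btId", "code")}
```

Storing the btId from the acknowledgement alongside your own record of the submission gives you the key needed later for a polling lookup.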
Audio Query — /query_audio/v4
Pairs with the Async endpoint. Send the original btId and DeepCleer returns the same payload that would have been delivered via callback. Best when:
- Your environment can't host a public callback URL (private networks, on-prem deployments, mobile-only clients), so you poll for results instead.
- You need a recovery channel: the original callback was dropped, your handler crashed, or you want to re-fetch a result you've already lost.
- You want to reconcile state after an outage on your callback handler.
Recommended timeout: 5 seconds. Exposed on Silicon Valley and Singapore. Results are retained on DeepCleer's side for up to 3 days after the original async submission.
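A polling loop for this endpoint might look like the sketch below. It treats response code 1101 (Processing, as listed in the response comparison on this page) as "try again later" and hands any other response back to the caller. The fetch callable, the polling interval, and the attempt limit are hypothetical choices; the 3-day window mirrors the retention period stated above.

```python
import time
from datetime import datetime, timedelta, timezone

PROCESSING_CODE = 1101          # "Processing": result not ready yet
RETENTION = timedelta(days=3)   # results are kept for up to 3 days


def within_retention(submitted_at, now=None):
    """Return True while a result for this submission may still be retrievable."""
    now = now or datetime.now(timezone.utc)
    return now - submitted_at <= RETENTION


def poll_result(fetch, bt_id, interval_s=2.0, max_attempts=10, sleep=time.sleep):
    """Call fetch(bt_id) until the response is no longer 'Processing'.

    fetch is a caller-supplied function (hypothetical) that queries
    /query_audio/v4 with accessKey and btId and returns the decoded response.
    Returns the first non-processing response, or None if attempts run out.
    """
    for attempt in range(max_attempts):
        response = fetch(bt_id)
        if response.get("code") != PROCESSING_CODE:
            return response
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return None
```

Injecting fetch and sleep keeps the loop testable and lets you swap in your own HTTP client or backoff policy.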
Request-Side Differences
The endpoints share accessKey, appId, eventId, type, businessType, contentType, content, and btId where applicable; Audio Query itself takes only accessKey and btId. The differences:
| Field | Sync Audio | Async Audio | Audio Query |
|---|---|---|---|
| callback (URL) | — | Yes | — |
| acceptLang | — | Yes | — |
| translationTargetLang | — | Yes | — |
| data envelope | Yes | Yes | — |
| data.retryUrl | — | Yes | — |
| data.audioDetectStep | — | Yes | — |
| data.lang (ASR) | — | Yes | — |
| data.extra.passThrough | — | Yes | — |
| btId (lookup key) | Yes | Yes | Yes |
The Audio Query endpoint is intentionally minimal — it only takes accessKey and btId, because all moderation parameters were specified in the original async submission.
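One way to catch mismatched parameters before sending a request is to encode the request-side comparison as data and check a payload against it. The sketch below is a client-side convenience only: the endpoint labels ("sync", "async", "query") and function names are hypothetical, and the field list is copied from the comparison above rather than from a formal schema.

```python
# Fields the comparison above marks as accepted by Async Audio only.
ASYNC_ONLY = (
    "callback",
    "acceptLang",
    "translationTargetLang",
    "data.retryUrl",
    "data.audioDetectStep",
    "data.lang",
    "data.extra.passThrough",
)


def has_path(payload, dotted):
    """Return True if a dotted path such as 'data.extra.passThrough' exists."""
    node = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def unsupported_fields(endpoint, payload):
    """List the fields in payload that the given endpoint does not accept."""
    if endpoint == "async":
        return []
    blocked = list(ASYNC_ONLY)
    if endpoint == "query":
        blocked.append("data")  # Audio Query takes no data envelope
    return [path for path in blocked if has_path(payload, path)]
```

For example, passing a payload that carries callback and data.retryUrl with the "sync" label returns both names, which can be logged or rejected before the request leaves your service.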
Response-Side Differences
| Field group | Sync Audio (response) | Async Audio (callback) | Audio Query (response) |
|---|---|---|---|
| requestId / code / message | Yes | Yes | Yes |
| btId (echoed) | Yes (in ack only)* | Yes | Yes |
| riskLevel / audioText / audioTime | Yes | Yes | Yes |
| audioDetail (per-segment results) | Yes | Yes | Yes |
| audioTags (legacy gender/timbre/language) | Yes | Yes | Yes |
| requestParams (echo of data) | — | Yes | Yes |
| auxInfo.errorCode (2003 / 2007) | — | Yes | Yes |
| tokenProfileLabels / tokenRiskLabels | — | Yes | (see Note) |
| Response code 1101 (Processing) | — | Yes | Yes |
*Sync Audio doesn't take a btId, so the response carries the moderation result directly under detail — it doesn't need an echoed identifier.
The async callback and the polling response carry the same structure. Treat them as two delivery channels for the same payload.
Choosing Between Async Callback and Polling
If you have a public callback URL available, prefer the callback model — results are pushed to you as soon as they're ready, with no polling overhead. Use the polling endpoint as a complement, not a replacement: it shines when callbacks aren't possible (private networks, mobile clients) or when you need to recover from a delivery failure. Many production integrations enable both — the callback as the primary channel and the polling endpoint as a recovery and reconciliation tool.
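The combined pattern, with the callback as the primary channel and polling for recovery, can be sketched as a small in-memory tracker. It assumes the result payload exposes btId and code at the top level, and the class name, method names, and grace period are hypothetical; a production version would persist this state rather than hold it in memory.

```python
from datetime import datetime, timedelta, timezone

PROCESSING_CODE = 1101          # polled result not ready yet
RETENTION = timedelta(days=3)   # results are retained for up to 3 days


class ResultTracker:
    """Track async submissions and accept results from either channel."""

    def __init__(self):
        self.pending = {}   # btId -> submission time
        self.results = {}   # btId -> result payload

    def submitted(self, bt_id, at):
        self.pending[bt_id] = at

    def record(self, payload):
        """Store a callback or polling payload once; return True if it was new."""
        if payload.get("code") == PROCESSING_CODE:
            return False  # still processing, nothing to store yet
        bt_id = payload["btId"]
        if bt_id in self.results:
            return False  # duplicate delivery, already handled
        self.results[bt_id] = payload
        self.pending.pop(bt_id, None)
        return True

    def needs_polling(self, now, grace):
        """btIds with no result after the grace period but still within retention."""
        return sorted(
            bt_id for bt_id, at in self.pending.items()
            if grace <= now - at <= RETENTION
        )
```

Because record ignores duplicates, the callback handler and the polling job can both feed the same tracker without double-processing a result.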