iconik AI: auto-tagging, face recognition, and transcription

iconik's AI is a metadata machine: it transcribes speech, recognizes faces and people, and tags objects and scenes, then makes all of it searchable inside the same cloud MAM that manages your media. The part most teams miss is how it bills. Transcription and visual tagging do not run on iconik's own models: iconik orchestrates third-party engines (Rev AI for transcription, Google Video Intelligence and Amazon Rekognition for visual tagging) against proxy files and extracted audio, then meters your usage in credits. This guide walks through what each feature does, what it costs, and why the proxy-and-audio processing model is the detail that decides both your privacy posture and your bill.

For the wider product (the MAM core, the Storage Gateway, and where the platform's seat and storage costs hide), see the full iconik review for 2026. This piece stays on the AI.

What iconik AI actually does

iconik groups its AI into four jobs. Its own AI page is upfront that these are administrator-controlled and not on by default; once an administrator enables them, they run automatically during ingest. Checked Jun 2026, the four are:

Automatic transcription. Speech-to-text runs during ingest and produces a time-coded, searchable transcript. Click a phrase, jump to that frame. iconik's FAQ lists support for roughly 28 to 30 languages through Rev AI, plus one-click translation of any transcript into 11 languages.
Face and people recognition. iconik uses its own facial-recognition model (not a cloud provider's) to find every instance where a specific person appears, then lets you name and train those identities across the library.
Object and scene detection. This is the auto-tagging most people mean by "iconik ai tagging": labels for objects, locations, activities, and scene changes, generated automatically and written back as searchable metadata.
Content summarization. Newer NLP that extracts themes and writes a short summary from the transcript, so a two-hour interview gets a paragraph you can skim.

Think of it as a librarian who watches a low-res copy of every clip and writes index cards: who is in it, what is in it, and what was said. The footage stays where it is; only the cards (the metadata) go into iconik's search index.

Why proxy-and-audio processing is the key detail

This is the architectural fact that matters most, and iconik states it plainly: processing uses proxy files for visual analysis and extracted audio for transcription, so full-resolution originals are never sent to the AI providers, and your content is not used to train their external models (checked Jun 2026 on iconik's AI page).

That has two consequences worth pulling apart. The good one: your 4K and RAW masters stay put. iconik generates a lightweight proxy and a stripped audio track, and only those leave for Google, Amazon, or Rev. For a team under an NDA or handling embargoed footage, sending a 540p proxy to a third party is a very different exposure than shipping the camera negative, though it is still a third party seeing the proxy and hearing the audio. If that distinction matters to your clients, it deserves a real conversation, which is the subject of our piece on AI features and client confidentiality.

The practical one: a proxy and an audio file are small, so the cost of analysis is driven by the duration of your media, not its resolution. A one-hour 8K timeline and a one-hour 1080p timeline cost the same to transcribe and tag, because both produce roughly an hour of proxy and audio. Duration is the meter. Resolution is not.

Who actually runs each AI job

iconik is an orchestrator, and crediting the real engines fairly matters because it explains the pricing. Per iconik's FAQ and help documentation (checked Jun 2026), the routing is:

Which engine powers each iconik AI feature, and the engine's own published rate where public. Checked Jun 2026.
iconik feature	Underlying engine	Engine's own list price
Transcription	Rev AI speech-to-text (Google Cloud transcription is also selectable)	Rev AI lists $0.25 per minute, 96%+ accuracy, 37 languages
Object and scene tagging	Google Video Intelligence and Amazon Rekognition Video	Rekognition label detection $0.10/min, shot detection $0.05/min (US East)
Face and people recognition	iconik's own model	Not separately published; metered in iconik AI credits
Summaries and topics	iconik NLP over the transcript	Metered in iconik AI credits

The list prices in the right column are the providers' direct rates, not what iconik charges you. They are here to show the floor. When you buy transcription through iconik, you are paying iconik, which pays Rev and adds its own margin for orchestration. That is a fair arrangement (you get one bill, one search index, one UI), but it is why nobody should expect iconik AI to be cheaper than wiring up Rev or Rekognition yourself.
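To make the floor concrete, here is a rough sketch of the arithmetic at the providers' list rates quoted in the table. It assumes every minute of media goes through Rev AI transcription plus both Rekognition label and shot detection, which may not match how a given iconik account is configured, and it leaves out Google Video Intelligence because no rate for it is quoted here. The function name is hypothetical. Note that resolution is not an input at all, only duration.

```python
# Provider list rates from the table (USD per minute, checked Jun 2026).
# These are the engines' direct prices, NOT iconik's credit pricing.
LIST_RATES_PER_MIN = {
    "rev_ai_transcription": 0.25,
    "rekognition_label_detection": 0.10,
    "rekognition_shot_detection": 0.05,
}


def provider_floor_usd(duration_minutes: float) -> dict[str, float]:
    """Estimate the floor cost at provider list rates.

    Duration is the only input: an 8K hour and a 1080p hour
    both yield about an hour of proxy and audio.
    """
    costs = {
        name: round(rate * duration_minutes, 2)
        for name, rate in LIST_RATES_PER_MIN.items()
    }
    costs["total"] = round(sum(costs.values()), 2)
    return costs


one_hour = provider_floor_usd(60)
print(one_hour)
```

For a one-hour clip this works out to $15.00 for transcription, $6.00 for label detection, and $3.00 for shot detection, or $24.00 in total. Whatever iconik charges in credits for the same hour sits somewhere above that figure.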

The credit system, and where the real cost hides

iconik prices AI in credits rather than per feature. The reasoning, from its help center (checked Jun 2026): instead of a separate line for transcription, tagging, and face recognition, all AI usage draws down a single pool of AI credits, so usage can scale without renegotiating your core plan. Credit consumption varies by feature and by the total volume processed, and (this is the part to circle) unused AI credits do not roll over. They expire at the end of each billing cycle.

iconik does not publish a public dollar value for one credit or a per-feature credit table; Pro and Enterprise plans include a baseline AI-credit allocation set in your contract, and you buy more if you exceed it. The one concrete anchor iconik's FAQ does give: an Enterprise plan can include around 1,000 hours per month of transcription. The seat and base costs sit alongside this. iconik's published Starter pricing (checked Jun 2026) runs Collaborator at $0 per month, Browse at $9, Standard at $65, and Power at $120 per user per month, with AI, automation, and services drawn from credits on top.

iconik AI billing, what is published vs. what is contract-only. Checked Jun 2026.
Item	How it is priced	The catch
User seats (Starter)	$0 / $9 / $65 / $120 per user per month by role	Power users add up fast on a small team
AI usage	Pooled AI credits, deducted by feature and volume	No public credit-to-dollar rate; set in your contract
Included AI allocation	Baseline credits on Pro and Enterprise	Credits expire monthly, no rollover
Transcription at scale	Enterprise can bundle ~1,000 hours/month	Overage is more credits, billed on top

The honest summary: AI cost in iconik is a function of how many hours of media you push through it, and the credit pool's use-it-or-lose-it design rewards steady, predictable ingest over bursty back-catalog projects. If you suddenly decide to tag five years of archive in one month, that is exactly the workload most likely to blow past your included credits.
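The steady-versus-bursty point can be sketched with a few lines of arithmetic. This is a simplification under loud assumptions: iconik meters in credits, not hours, and publishes no conversion, so the example borrows the FAQ's roughly 1,000 hours per month figure as a stand-in allocation. The function name and the ingest numbers are hypothetical.

```python
def overage_hours(monthly_ingest_hours: list[float],
                  included_hours_per_month: float) -> float:
    """Sum the hours beyond the allocation across all months.

    Unused allowance expires each cycle, so a quiet month
    never offsets a busy one.
    """
    return sum(
        max(0.0, used - included_hours_per_month)
        for used in monthly_ingest_hours
    )


# Hypothetical allocation, borrowing the FAQ's ~1,000 hours/month example.
INCLUDED = 1000.0

steady = [500.0] * 12            # 6,000 hours spread evenly over a year
bursty = [0.0] * 11 + [6000.0]   # the same 6,000 hours in a single month

print(overage_hours(steady, INCLUDED))  # 0.0
print(overage_hours(bursty, INCLUDED))  # 5000.0
```

Both teams process the same 6,000 hours in a year. The steady one never touches overage, while the bursty one pays for 5,000 hours on top, because the eleven idle months of allowance expired unused.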

When iconik AI is worth running, and when it is not

The features are genuinely useful, and the orchestration is the value: one search box that returns hits by spoken word, by face, and by on-screen object across your whole library is a real time-saver for a team that searches constantly. Whether that saving justifies the per-hour meter depends entirely on your search volume, a question we take apart in does AI search actually save editors time.

Where it does not fit: if your library is small, your team rarely searches by content, or your clients forbid any third party touching the material (proxy or not), the credit meter is paying for an index you barely query. And if your concern is keeping the AI index itself on hardware you control rather than on a vendor's cloud, that is a different architecture entirely, the local-versus-cloud tradeoff covered in local vs cloud AI indexing.

For the record, this is the rare place JuiceMount touches the topic and the honest line is short: JuiceMount keeps a local search index of your filenames, paths, and folder structure so you can find media fast on your own NAS, but it does not transcribe speech, recognize faces, or tag objects. If you need iconik-grade content AI, iconik does it and JuiceMount does not. If you mostly need a fast, mounted volume and quick filename search without a per-hour AI bill, that is JuiceMount's lane.

Next step

If you want fast search on a NAS you own without a per-hour AI meter, see how the local index works, then weigh it against a full content-AI MAM like iconik.


Sources, checked June 2026

iconik AI page (iconik.io/artificial-intelligence): the four AI jobs, proxy-for-visual and extracted-audio-for-transcription model, "originals not sent to providers / not used to train external models," and the 5.7M analyses / 5.1M transcriptions / 578K faces figures.
iconik FAQs (iconik.io/faqs): Rev AI transcription, ~28-30 languages, one-click translation into 11 languages, Google Video Intelligence and Amazon Rekognition for tagging, iconik's own facial-recognition model, and the Enterprise ~1,000 hours/month transcription example.
iconik Help Center, AI Credits and AI Accounts articles (help.iconik.backlight.co): the pooled-credit model, per-feature/volume deduction, no rollover, contract-set baseline allocations, and selectable engines (Rev AI, Google Cloud transcription, Google Video Analysis, AWS Rekognition).
iconik pricing (iconik.io/pricing) and G2 iconik pricing 2026: Starter role-based seats ($0 Collaborator, $9 Browse, $65 Standard, $120 Power per user/month), AI/automation/services drawn from credits.
Rev AI transcription page (rev.com): $0.25 per minute, 96%+ accuracy, 37 languages, shown as the engine's own rate, not iconik's.
Amazon Rekognition Video pricing (aws.amazon.com/rekognition/pricing): label detection $0.10/min, shot detection $0.05/min (US East), shown as the engine's own rate.