Chrome is experimentally shipping with Gemini Nano, their smallest Large Language Model (LLM), baked right in, and offering APIs to use it.
In Chrome, these APIs are built to run inference against Gemini Nano with fine-tuning or an expert model. Designed to run locally on most modern devices, Gemini Nano is best for language-related use cases, such as summarization, rephrasing, or categorization.
It’s mostly an API like you’d expect, with methods you call and responses you get. Raymond Camden had a look:
const model = await window.ai.createTextSession();
await model.prompt("Who are you?");
// I am a large language model, trained by Google.
Using AI in this way means 1) it’s fast (no network trip), 2) it works offline, 3) it’s private (maybe), and 4) it’s free to use.
I admit that’s awfully compelling. I suspect this will ship for real and be very heavily used.
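If you want to try it today, remember it’s behind experimental flags and won’t exist in most browsers, so guard everything. Here’s a minimal sketch using the surface shown above; I’m assuming the availability-check method name, canCreateTextSession, from the dev-channel builds, and all of this may change:

// Sketch against Chrome's experimental built-in AI surface.
// Assumption: canCreateTextSession() exists and returns "readily",
// "after-download", or "no", as in dev-channel builds at the time.
async function askLocalModel(prompt) {
  if (!window.ai) {
    throw new Error("No built-in AI in this browser");
  }
  const availability = await window.ai.canCreateTextSession?.();
  if (availability === "no") {
    throw new Error("Gemini Nano can't run on this device");
  }
  const session = await window.ai.createTextSession();
  return session.prompt(prompt);
}

askLocalModel("Rephrase this politely: send the report now.").then(console.log);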
Don’t we need to think about standards here, though? What if Apple ships window.ai.instantiateIntelligence() with an .ask() method? And Firefox ships navigator.llm('dolly').enqueueQuery()? I’d just like to remind everyone that when browsers just ship whatever and compete on proprietary features, everybody loses.
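Those Apple and Firefox APIs are invented, to be clear, but the failure mode isn’t: every web app would end up shipping an adapter layer like this sketch (every non-Chrome branch is hypothetical):

// The only real surface here is Chrome's experimental window.ai;
// the other branches use the made-up vendor APIs from above.
async function promptAnyBrowser(text) {
  if (window.ai?.createTextSession) {
    const session = await window.ai.createTextSession(); // Chrome, experimental
    return session.prompt(text);
  }
  if (window.ai?.instantiateIntelligence) {
    const intelligence = await window.ai.instantiateIntelligence(); // imaginary Apple
    return intelligence.ask(text);
  }
  if (navigator.llm) {
    return navigator.llm("dolly").enqueueQuery(text); // imaginary Firefox
  }
  throw new Error("No built-in model available");
}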
There is at least a proposed standard prompt API from the folks at Google [https://github.com/explainers-by-googlers/prompt-api].
One argument is that, rather than a generalised ‘prompt’ API, there should be specialised APIs like Web Speech [https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API] or the proposed translate API [https://github.com/WICG/translation-api].
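For comparison, a specialised API scopes the capability to a task instead of a free-form prompt. The synthesis side of Web Speech, for example:

// Web Speech API: a task-specific built-in, no general prompt involved.
const utterance = new SpeechSynthesisUtterance("Hello from the browser");
utterance.lang = "en-US";
speechSynthesis.speak(utterance);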
With models on the order of gigabytes in size, downloading one and running inference in WASM is doable (see webllm [https://webllm.mlc.ai]), but it would put web applications at a distinct disadvantage to native apps that have platform-level access to models.
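For reference, WebLLM exposes an OpenAI-style chat interface once the multi-gigabyte weights are fetched. Roughly like this sketch, where the model ID is illustrative and names may differ across releases:

// Sketch based on WebLLM's OpenAI-style API; the model ID is illustrative,
// and the weights download (gigabytes) happens on first engine creation.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // watch the download
});

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Who are you?" }],
});
console.log(reply.choices[0].message.content);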
Ah nice! Thanks for pointing that out, John. This seems much more considered than I was led to believe by that blog post.
By the way, this requires 22 gigabytes of free storage and can be used in offline tabs. You can see the model component in Chrome’s internals, e.g. ‘chrome://components’.