Asif Rahman

Google Chrome On-Device Embedding Model

Exploring the embedding model in Google Chrome

Google Chrome bundles a text embedding model used to cluster browsing history as part of the Topics API and for semantic search. Chrome also ships a number of other on-device models alongside it.

First I had to track down these on-device models. I started at the usual place where apps store their application data on macOS, the ~/Library/Application Support/ folder. Searching for anything named optimization* (see the command below) returns the folder I was looking for: ~/Library/Application Support/Google/Chrome/optimization_guide_model_store. The Chromium source code says this about the optimization guide: “The optimization guide component contains code for processing hints and machine learning models received from the remote Chrome Optimization Guide Service”.
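A minimal version of that search, assuming a default Chrome install on macOS (your home directory will differ):

$ find ~/Library/Application\ Support/ -maxdepth 4 -name 'optimization*' 2>/dev/null
/Users/<you>/Library/Application Support/Google/Chrome/optimization_guide_model_store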

Models are stored as .tflite files, a format that has since been rebranded as LiteRT:

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google’s high-performance runtime for on-device AI. LiteRT is also the format Google uses to ship its Gemma 3n edge-device models.

$ cd ~/Library/Application\ Support/Google/Chrome/optimization_guide_model_store
$ tree -L 4 .
.
├── 13
│   └── E6DC4029A1E4B4C1
│       └── 205CA176C885321E
│           ├── model-info.pb
│           └── model.tflite
├── 15
│   └── E6DC4029A1E4B4C1
│       └── 255B83C178FA9DD9
│           ├── model-info.pb
│           ├── model.tflite
│           ├── override_list.pb.gz
│           └── VERSION.txt
├── 2
│   └── E6DC4029A1E4B4C1
│       └── 2DFDB6405E512759
│           ├── model-info.pb
│           └── model.tflite
├── 20
│   └── E6DC4029A1E4B4C1
│       └── 91EF641BEE15B40C
│           ├── model-info.pb
│           └── model.tflite
├── 24
│   └── E6DC4029A1E4B4C1
│       └── 1B8C0D25285420AB
│           ├── enus_denylist_encoded_241007.txt
│           ├── model-info.pb
│           ├── model.tflite
│           └── vocab_en-us.txt
├── 25
│   └── E6DC4029A1E4B4C1
│       └── C278361C6A5A6107
│           ├── model-info.pb
│           ├── model.tflite
│           └── visual_model_desktop.tflite
├── 26
│   └── E6DC4029A1E4B4C1
│       └── 141FCE0CF6807549
│           ├── model-info.pb
│           └── model.tflite
├── 43
│   └── E6DC4029A1E4B4C1
│       └── E234446CB5BACE99
│           ├── model-info.pb
│           ├── model.tflite
│           └── sentencepiece.model
├── 45
│   └── E6DC4029A1E4B4C1
│       └── 063B3FABDDE10CE8
│           ├── model-info.pb
│           └── model.tflite
└── 9
    └── E6DC4029A1E4B4C1
        └── B5ECF67C32B2BD47
            ├── model-info.pb
            └── model.tflite

31 directories, 26 files
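Each model directory also contains a model-info.pb protobuf describing the model. Chrome doesn’t ship the schema, but if you have protoc installed you can dump the raw field numbers and values for a hint about what each model is (interpreting them means cross-referencing Chromium’s optimization_guide proto definitions):

$ protoc --decode_raw < 15/E6DC4029A1E4B4C1/255B83C178FA9DD9/model-info.pb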

One of the models (255B83C178FA9DD9) is the “Browsing Topics Privacy Sandbox feature”, which maps recent browsing history to a set of interest-based categories used to serve relevant ads. The Topics API is the replacement for the FLoC proposal. The VERSION.txt file refers to a taxonomy, presumably the list of interest-based categories.

Based on the Chromium source code, the visual_model_desktop.tflite file appears to be part of a phishing classifier.

There is also a sentencepiece.model file, which is a SentencePiece text tokenizer. The accompanying model.tflite file is the largest of the bunch at 107 MB.
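Before touching the TFLite model, the tokenizer can be loaded on its own as a sanity check. This is my own quick sketch using the sentencepiece Python package, run from the model’s directory:

import sentencepiece as spm

# Load the tokenizer that ships next to the 107 MB model.
sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")
print(sp.vocab_size())
print(sp.encode("New York", out_type=str))  # subword pieces
print(sp.encode("New York", out_type=int))  # token ids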

$ find . -type f -name "*model.tflite" -print0 | xargs -0 du -h

64K    ./20/E6DC4029A1E4B4C1/91EF641BEE15B40C/model.tflite
16K    ./9/E6DC4029A1E4B4C1/B5ECF67C32B2BD47/model.tflite
132K    ./45/E6DC4029A1E4B4C1/063B3FABDDE10CE8/model.tflite
8.0K    ./26/E6DC4029A1E4B4C1/141FCE0CF6807549/model.tflite
107M    ./43/E6DC4029A1E4B4C1/E234446CB5BACE99/model.tflite <---
4.4M    ./24/E6DC4029A1E4B4C1/1B8C0D25285420AB/model.tflite
2.6M    ./15/E6DC4029A1E4B4C1/255B83C178FA9DD9/model.tflite
384K    ./2/E6DC4029A1E4B4C1/2DFDB6405E512759/model.tflite
1.2M    ./13/E6DC4029A1E4B4C1/205CA176C885321E/model.tflite
180K    ./25/E6DC4029A1E4B4C1/C278361C6A5A6107/model.tflite

Let’s load up the sentencepiece tokenizer and the model and get embeddings.

import numpy as np
import tensorflow as tf
import sentencepiece as spm

tokenizer = spm.SentencePieceProcessor(model_file='sentencepiece.model')
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def get_embedding(
    interpreter: tf.lite.Interpreter,
    tokenizer: spm.SentencePieceProcessor,
    text: str,
) -> np.ndarray:
    """
    Embedding vector for a given text. Max token length is 64.

    Args:
        interpreter: TFLite interpreter.
        tokenizer: SentencePiece tokenizer.
        text: Text to be tokenized and embedded.
    
    Returns:
        np.ndarray: Embedding vector of shape (768,).
    
    Example:
        >>> tokenizer = spm.SentencePieceProcessor(model_file="sentencepiece.model")
        >>> interpreter = tf.lite.Interpreter(model_path="model.tflite")
        >>> interpreter.allocate_tensors()
        >>> get_embedding(interpreter, tokenizer, "New York")  # (768,)
    """
    # Read tensor metadata from the interpreter that was passed in, rather than
    # relying on module-level globals.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    input_shape = input_details[0]["shape"]
    seq_len = input_shape[1] if len(input_shape) > 1 else 64
    # Tokenize, truncate to the model's sequence length, and zero-pad.
    tokens = tokenizer.encode(text, out_type=int)
    tokens = np.pad(
        tokens[:seq_len],
        (0, max(0, seq_len - len(tokens))),
        "constant",
    )[np.newaxis, :]  # (1, 64)
    interpreter.set_tensor(input_details[0]["index"], tokens.astype(np.int32))
    interpreter.invoke()
    embedding = interpreter.get_tensor(output_details[0]["index"])  # (1, 768)
    return embedding[0]  # (768,)

text = "New York"
embedding = get_embedding(interpreter, tokenizer, text)
# [-2.49071661e-02  2.41535041e-03 -1.51733104e-02 -1.12882648e-02...]
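With embeddings in hand, semantic search is just a nearest-neighbor lookup. Here is a rough sketch using cosine similarity; the candidate phrases are mine, purely for illustration:

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = get_embedding(interpreter, tokenizer, "pizza places in Brooklyn")
for candidate in ["New York restaurants", "Python programming", "flights to Tokyo"]:
    score = cosine_similarity(query, get_embedding(interpreter, tokenizer, candidate))
    print(f"{candidate}: {score:.3f}")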

This model has a maximum sequence length of 64 input tokens and outputs a 768-dimensional vector. That is a small input window, roughly 200-220 characters of text.
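Both numbers can be read straight from the interpreter's tensor metadata:

print(input_details[0]["shape"])   # e.g. [  1  64] -> max 64 input tokens
print(output_details[0]["shape"])  # e.g. [  1 768] -> 768-dimensional embedding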

#Python #LLM