Beyond the Vector Database: Engineering High-Performance On-Device Semantic Search with Expo and PGlite
Stop over-engineering with cloud-hosted vector DBs. Learn how I implemented high-performance, local-first semantic search using PGlite's pgvector support directly inside an Expo mobile application.

For the past year, the industry obsession has been clear: RAG (Retrieval-Augmented Generation) architectures hosted on massive cloud vector databases like Pinecone or Weaviate. While these are great for enterprise-scale web apps, they introduce significant friction for mobile developers: latency, costs, and the absolute requirement for a data connection.
I’ve been experimenting with moving the intelligence closer to the user. My goal was to build a fully offline, high-performance semantic search engine inside an Expo (React Native) app. The breakthrough came when I combined PGlite—a WASM-based Postgres build—with its native vector support.
Here’s how I bypassed the cloud and why this architecture is a game-changer for local-first AI.
The Problem with the "Cloud-First" Vector Approach
When we build mobile apps, we often treat them as thin clients. For semantic search, the traditional flow is:
- Generate an embedding on a server (or call OpenAI).
- Query a remote vector database.
- Return IDs and fetch data from a separate API.
On a mobile device, this feels sluggish. If the user is on a spotty 5G connection, the experience breaks. More importantly, many of the use cases I'm interested in—like searching personal journals or private documents—should never leave the device from a privacy standpoint.
Enter PGlite: Postgres in the Browser (and Mobile)
PGlite by the team at ElectricSQL is a revelation. It’s a build of Postgres compiled to WASM, packaged as a lightweight library. It allows you to run a full Postgres instance in-memory or persisted to local storage (like IndexedDB or OPFS).
What makes it the "killer app" for local AI is its support for extensions—specifically pgvector. This means we can run cosine similarity searches using standard SQL syntax directly on the mobile device.
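To make "cosine similarity search" concrete, here is a small sketch of the quantity that pgvector's `<=>` operator ranks by: cosine distance, which is 1 minus cosine similarity. The function name `cosineDistance` is illustrative and not part of any library, and the zero-vector handling is a choice made for this sketch.

```typescript
// Cosine distance: 1 - (a . b) / (|a| * |b|).
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine distance is undefined for a zero vector; this sketch returns NaN.
  if (normA === 0 || normB === 0) return NaN;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Ordering rows by this distance in ascending order puts the closest semantic matches first, which is what `ORDER BY embedding <=> query` does in SQL.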
The Architecture
In my implementation, I used a three-tier local stack:
- Expo (React Native): The host environment.
- Transformers.js: For generating embeddings on the client (using a quantized model like all-MiniLM-L6-v2).
- PGlite + pgvector: For storing and querying those embeddings.
Setting up the Database
First, I had to configure PGlite to handle the vector extension. Here’s the core initialization logic I used:
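The sketch below shows roughly what such an initialization could look like. It illustrates the idea under stated assumptions and is not the exact original code. It assumes the `@electric-sql/pglite` package and its bundled `vector` extension (shown only in the trailing comment). The `documents` table, its columns, and the `SqlExecutor` interface are hypothetical names chosen for this example. The 384 dimensions match the output size of all-MiniLM-L6-v2.

```typescript
// Minimal shape of the database handle this sketch needs. A PGlite instance
// exposes an `exec(sql)` method, so it should satisfy this (assumption).
interface SqlExecutor {
  exec(sql: string): Promise<unknown>;
}

// all-MiniLM-L6-v2 produces 384-dimensional embeddings.
const EMBEDDING_DIM = 384;

// Hypothetical schema: one row per document plus its embedding.
function schemaStatements(dim: number = EMBEDDING_DIM): string[] {
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`invalid embedding dimension: ${dim}`);
  }
  return [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    `CREATE TABLE IF NOT EXISTS documents (
       id BIGSERIAL PRIMARY KEY,
       content TEXT NOT NULL,
       embedding vector(${dim}) NOT NULL
     );`,
  ];
}

// Runs the statements in order; safe to call on every app start because
// both statements use IF NOT EXISTS.
async function initSchema(
  db: SqlExecutor,
  dim: number = EMBEDDING_DIM,
): Promise<void> {
  for (const sql of schemaStatements(dim)) {
    await db.exec(sql);
  }
}

// Assumed wiring with PGlite (not executed in this sketch):
//   import { PGlite } from "@electric-sql/pglite";
//   import { vector } from "@electric-sql/pglite/vector";
//   const db = new PGlite({ extensions: { vector } });
//   await initSchema(db);
```

Keeping the SQL behind a tiny interface also makes the schema logic easy to test without booting a WASM Postgres instance.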
The Breakthrough: Performance at Scale
One of my biggest concerns was indexing performance. Running a Hierarchical Navigable Small World (HNSW) index in a WASM environment sounded like a recipe for a frozen UI.
However, I found that for datasets under 10,000 vectors (which covers most personal mobile use cases), search latency stayed under 15 ms. The HNSW index in pgvector is highly optimized, and even with the WASM overhead, it outperforms any custom JavaScript-based nearest-neighbor implementation I’ve tried.
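For reference, an HNSW index in pgvector is created with ordinary DDL. The sketch below builds that statement, assuming a hypothetical `documents` table with an `embedding` column. The index name and helper function names are illustrative. The default values for `m`, `ef_construction`, and `hnsw.ef_search` are pgvector's documented defaults, not values measured for this app.

```typescript
// Tunables for a pgvector HNSW index.
interface HnswOptions {
  m?: number;              // max connections per graph node (pgvector default 16)
  efConstruction?: number; // candidate list size at build time (default 64)
}

function hnswIndexSql(opts: HnswOptions = {}): string {
  const m = opts.m ?? 16;
  const efConstruction = opts.efConstruction ?? 64;
  if (!Number.isInteger(m) || !Number.isInteger(efConstruction)) {
    throw new Error("m and efConstruction must be integers");
  }
  // pgvector requires ef_construction to be at least 2 * m.
  if (m < 2 || efConstruction < 2 * m) {
    throw new Error("need m >= 2 and efConstruction >= 2 * m");
  }
  // vector_cosine_ops makes the index usable for the `<=>` operator.
  return (
    "CREATE INDEX IF NOT EXISTS documents_embedding_hnsw ON documents " +
    "USING hnsw (embedding vector_cosine_ops) " +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}

// Query-time knob: higher values trade speed for recall (default 40).
function efSearchSql(efSearch: number = 40): string {
  if (!Number.isInteger(efSearch) || efSearch < 1) {
    throw new Error("efSearch must be a positive integer");
  }
  return `SET hnsw.ef_search = ${efSearch};`;
}
```

On a small dataset it is worth benchmarking with and without the index, since a sequential scan gives exact results and may already be fast enough.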
Executing a Semantic Search
Once I generate a query embedding locally, the SQL query is as familiar as it gets:
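A query of this kind might look like the sketch below. It is an illustration rather than the exact original query. The `documents` table, the `SearchHit` shape, and the `QueryExecutor` interface are hypothetical. The interface mirrors the `query(sql, params)` style that PGlite exposes, which is an assumption about how the database handle is wired in. The embedding is passed as pgvector's text literal (`[v1,v2,...]`) and cast with `::vector`.

```typescript
// Minimal shape of a database handle with parameterized queries (assumption).
interface QueryExecutor {
  query<T>(sql: string, params?: unknown[]): Promise<{ rows: T[] }>;
}

interface SearchHit {
  id: number | bigint; // depends on how the driver maps BIGINT
  content: string;
  similarity: number;  // 1 - cosine distance
}

// Serialize an embedding into pgvector's text format, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding: ArrayLike<number>): string {
  const parts: string[] = [];
  for (let i = 0; i < embedding.length; i++) {
    const v = embedding[i];
    if (!Number.isFinite(v)) {
      throw new Error(`non-finite value at index ${i}`);
    }
    parts.push(String(v));
  }
  return `[${parts.join(",")}]`;
}

// `<=>` is cosine distance, so ascending order returns the best matches first.
const SEARCH_SQL = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM documents
  ORDER BY embedding <=> $1::vector
  LIMIT $2;`;

async function semanticSearch(
  db: QueryExecutor,
  queryEmbedding: ArrayLike<number>,
  limit: number = 5,
): Promise<SearchHit[]> {
  const result = await db.query<SearchHit>(SEARCH_SQL, [
    toVectorLiteral(queryEmbedding),
    limit,
  ]);
  return result.rows;
}
```

Ordering by the raw `<=>` expression, rather than by the computed `similarity` alias, keeps the query in the form an HNSW index with `vector_cosine_ops` can serve.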
Engineering Hurdles & Workarounds
It wasn't all smooth sailing. There are a few things you need to be aware of when implementing this:
- WASM Memory Limits: In a mobile environment, memory management is aggressive. I had to ensure the PGlite instance didn't balloon by carefully managing the size of the vectors and avoiding unnecessary large-blob storage in the same table.
- Native Threading: Expo’s JS engine (Hermes) is fast, but WASM execution can still block the main thread if you’re doing heavy lifting. I moved the embedding generation and the PGlite queries into a Web Worker (on web) or utilized expo-standard-web-crypto to keep the crypto/math operations off the main loop where possible.
- Model Loading: Downloading a 90MB model file to a phone is a one-time cost, but you need to cache it aggressively using expo-file-system to avoid re-fetching on every app boot.
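The model-caching point can be sketched as a small "download once, reuse afterwards" helper. The `FileStore` interface and the function names are hypothetical. In an Expo app the three operations could be backed by `getInfoAsync`, `downloadAsync`, and `moveAsync` from `expo-file-system`, which is an assumption about the wiring rather than something shown in the original. The download goes to a temporary `.part` path first so an interrupted transfer is never mistaken for a complete model.

```typescript
// Subset of file operations the cache needs (hypothetical interface).
interface FileStore {
  exists(path: string): Promise<boolean>;
  download(url: string, path: string): Promise<void>;
  move(from: string, to: string): Promise<void>;
}

// Derive a stable local path from the URL's file name.
function cachePathFor(baseDir: string, url: string): string {
  const name = url.split("/").pop()?.split("?")[0] || "model.bin";
  const dir = baseDir.endsWith("/") ? baseDir : baseDir + "/";
  return dir + name;
}

// Returns the local path, downloading only if the file is not cached yet.
async function ensureModelCached(
  store: FileStore,
  baseDir: string,
  url: string,
): Promise<string> {
  const finalPath = cachePathFor(baseDir, url);
  if (await store.exists(finalPath)) {
    return finalPath; // cache hit: no network access
  }
  const tempPath = finalPath + ".part";
  await store.download(url, tempPath);
  // Only a fully downloaded file is moved into its final place.
  await store.move(tempPath, finalPath);
  return finalPath;
}
```

Storing the model in a persistent document directory rather than a cache directory is a reasonable default, since the OS may evict cache contents under storage pressure.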
Why This Matters
We are moving toward an era of "Local-First AI." By shifting vector search to the device, we achieve:
- Zero Network Latency: No round-trip to a server in Virginia; a query costs only local compute.
- Privacy by Default: User data never leaves the device. If the phone is encrypted, the database is encrypted.
- Cost Efficiency: No $50/month bill for a hosted vector DB that's mostly idle.
Building this with Expo and PGlite proved that Postgres isn't just for the server anymore. It’s a formidable mobile database that, with pgvector, turns a standard React Native app into a sophisticated AI tool.
If you’re still piping all your embeddings to the cloud, I highly recommend giving PGlite a spin. The future of AI isn't just in the datacenter—it's in your pocket.