Exploring Urban Embeddings from the TerraMind Model
This project builds an interactive system that discovers visually similar urban areas across the globe using TerraMind foundation model embeddings of satellite imagery. By analysing over 200 cities worldwide, it reveals how AI “sees” urban patterns and lets users click on any neighbourhood to find similar-looking areas in other cities, demonstrating the potential of geospatial AI for urban analysis.
Structure (in short)
1. Embedding Extraction
The pipeline pulls Sentinel-2 L2A optical imagery (12 spectral bands) from STAC catalogues. It covers over 200 cities with populations of 100k or more, capped at five per country. For each city it builds median composites from recent months with clouds masked out, using Dask for the processing. TerraMind then produces 768-dimensional embeddings for the 196 patches of each 224×224 m tile, for a total of more than 48k tiles across the global dataset.
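A minimal sketch of the cloud-masked median composite step, assuming the scenes for one tile are already loaded as a NumPy array together with a boolean cloud mask. The function name `median_composite` and the array layout are illustrative assumptions; the actual pipeline runs on Dask and is not reproduced here.

```python
import numpy as np

def median_composite(scenes: np.ndarray, cloud_mask: np.ndarray) -> np.ndarray:
    """Per-pixel median over time, ignoring cloudy observations.

    scenes:     (time, bands, height, width) reflectance values
    cloud_mask: (time, height, width) boolean, True where a pixel is cloudy
    Returns a (bands, height, width) composite. A pixel that is cloudy in
    every scene comes out as NaN.
    """
    data = scenes.astype(np.float32)
    # Broadcast the mask over the band axis and blank out cloudy pixels.
    masked = np.where(cloud_mask[:, None, :, :], np.nan, data)
    # nanmedian skips the NaN (cloudy) observations at each pixel.
    return np.nanmedian(masked, axis=0)
```

The median is a common choice for composites because a few unmasked cloud or shadow pixels shift it far less than they would shift a mean.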
2. Filtering Spatial Dimensions
Initial analysis showed that a tile's most similar tiles were mostly neighbouring areas within the same city, an effect attributed to TerraMind’s architecture. A spatial correlation analysis script identified the embedding dimensions most correlated with geographic location so that they could be filtered out, refocusing similarity search on visual patterns rather than spatial proximity.
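One way such a filter could look is sketched below. It assumes the correlation measure is the absolute Pearson correlation between each embedding dimension and tile latitude/longitude, and that a fixed number of the most correlated dimensions is dropped. Both choices, and the names `spatial_correlation` and `drop_spatial_dims`, are assumptions for illustration rather than the project's actual script.

```python
import numpy as np

def spatial_correlation(embeddings: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Score each embedding dimension by how strongly it tracks location.

    embeddings: (n_tiles, n_dims) tile vectors
    coords:     (n_tiles, 2) latitude and longitude per tile
    Returns (n_dims,) scores: the larger of |corr with lat| and |corr with lon|.
    """
    e = embeddings - embeddings.mean(axis=0)
    c = coords - coords.mean(axis=0)
    denom = np.outer(np.linalg.norm(e, axis=0), np.linalg.norm(c, axis=0))
    denom[denom == 0] = np.inf  # constant columns get a correlation of 0
    corr = (e.T @ c) / denom    # Pearson correlation, shape (n_dims, 2)
    return np.abs(corr).max(axis=1)

def drop_spatial_dims(embeddings: np.ndarray, coords: np.ndarray, n_drop: int):
    """Remove the n_drop dimensions most correlated with location."""
    scores = spatial_correlation(embeddings, coords)
    drop = np.argsort(scores)[::-1][:n_drop]
    keep = np.setdiff1d(np.arange(embeddings.shape[1]), drop)
    return embeddings[:, keep], keep
```

The returned `keep` indices would need to be stored so that query vectors are filtered with the same dimensions as the indexed ones.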

3. Aggregation to Tile with Different Methods
Several approaches were tested for aggregating the 196 patch embeddings of a tile into a single tile vector: mean, median, min, and max. Because a single 224 m tile can contain diverse land uses, an alternative method clusters the patches into 3-4 groups and aggregates only the dominant cluster. The system also applies UMAP dimensionality reduction for visualisation, which reveals regional clustering and global visual patterns.
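The aggregation options can be sketched as follows. The element-wise reductions follow directly from the description above. For the dominant-cluster variant, this sketch assumes that "dominant" means the cluster containing the most patches and uses a small hand-written k-means so the example stays self-contained. The clustering algorithm the project actually uses is not stated, so treat `kmeans_labels` and `dominant_cluster_mean` as hypothetical helpers.

```python
import numpy as np

AGGREGATORS = {"mean": np.mean, "median": np.median, "min": np.min, "max": np.max}

def aggregate(patches: np.ndarray, method: str = "mean") -> np.ndarray:
    """Reduce (n_patches, n_dims) patch embeddings to one (n_dims,) tile vector."""
    return AGGREGATORS[method](patches, axis=0)

def kmeans_labels(x: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a cluster label for each row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every patch to every center, shape (n_patches, k).
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def dominant_cluster_mean(patches: np.ndarray, k: int = 3) -> np.ndarray:
    """Average only the patches belonging to the largest cluster."""
    labels = kmeans_labels(patches, k)
    dominant = np.bincount(labels, minlength=k).argmax()
    return patches[labels == dominant].mean(axis=0)
```

For a tile that is, say, three quarters built-up and one quarter water, the plain mean blends the two, while the dominant-cluster mean stays close to the built-up patches.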
4. Web Application
The full-stack application has a React frontend that integrates Mapbox satellite imagery with a custom UMAP visualisation for real-time exploration. A FastAPI backend queries a Qdrant vector database that stores the filtered and aggregated embeddings for fast similarity search. The synchronised interface lets users click a location on the map and instantly discover visually similar areas worldwide. The system is deployed live and offers both mean and dominant-cluster aggregation.
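To illustrate what the similarity query computes, here is a brute-force NumPy stand-in for the vector database lookup. It is not the Qdrant or FastAPI code. It assumes cosine similarity as the metric, non-zero tile vectors, and a per-tile city label used to exclude results from the clicked tile's own city. The function name `most_similar` and its arguments are hypothetical.

```python
import numpy as np

def most_similar(query_idx: int, vectors: np.ndarray, city_ids: np.ndarray, top_k: int = 5):
    """Return up to top_k (tile_index, cosine_similarity) pairs from other cities.

    vectors:  (n_tiles, n_dims) aggregated tile embeddings (assumed non-zero)
    city_ids: (n_tiles,) city label for each tile
    """
    # Normalise rows so the dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[query_idx]
    # Exclude the query tile and every other tile from the same city.
    scores[city_ids == city_ids[query_idx]] = -np.inf
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in order if np.isfinite(scores[i])]
```

A vector database replaces this exhaustive scan with an approximate nearest-neighbour index, which is what keeps lookups fast as the number of tiles grows.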