mkluczek
posted an update 16 days ago
First Global and Dense Open Embedding Dataset of Earth! 🌍 🤗

Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.

💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.

This project delivers open and free vectorized expansions of Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.
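As a rough illustration of the kind of lightweight application this enables, here is a minimal sketch of a nearest-neighbour lookup over embedding vectors using cosine similarity. The tile names and 3-dimensional vectors are made-up toy values, and `cosine_similarity` and `top_k` are hypothetical helpers written for this example; they are not part of the Major TOM code or datasets, and real embeddings would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=2):
    """Return the k keys whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Toy vectors standing in for real embeddings (assumption: illustrative only).
toy_embeddings = {
    "tile_a": [1.0, 0.0, 0.0],
    "tile_b": [0.9, 0.1, 0.0],
    "tile_c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.05, 0.0], toy_embeddings, k=2)
print(nearest)
```

At the scale of 170M+ embeddings, a brute-force loop like this would be replaced by a vectorized or approximate nearest-neighbour index, but the comparison it performs is the same.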

🤗 Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP

📖 Check paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (2412.05600)
💻 Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb

very cool!