The World Ocean Database (WOD) is the largest and most detailed collection of publicly available oceanographic data in the world. It is hosted in a public S3 bucket maintained by NOAA (the National Oceanic and Atmospheric Administration), an agency of the U.S. Department of Commerce. The database contains data dating back to 1800, with records from 1900 through 2024 (as of February 25, 2026).
Challenge
The dataset is organized by year and then by instrument type. Instrument types are the tools used to collect ocean data, such as CTD (Conductivity–Temperature–Depth) sensors. All files are stored in NetCDF (.nc) format. Unlike many common file types, NetCDF files require specialized software or programming to open and visualize, which makes the data hard for everyday users to access.
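To make the year and instrument layout concrete, here is a minimal sketch that lists one year's objects anonymously and groups the NetCDF files by instrument code. It assumes the bucket name `noaa-wod-pds`, a `<year>/` key prefix, and file names of the form `wod_<instrument>_<year>.nc`; all three are assumptions to check against the live bucket, and `parse_wod_key`, `group_by_instrument`, and `list_year` are hypothetical helpers, not part of any NOAA tool.

```python
import re
from collections import defaultdict

# Assumed bucket name and file-naming pattern; verify against the live bucket.
WOD_BUCKET = "noaa-wod-pds"
WOD_FILE = re.compile(r"wod_([a-z]+)_(\d{4})\.nc$")


def parse_wod_key(key):
    """Return (instrument, year) for a WOD NetCDF key, or None if it does not match."""
    match = WOD_FILE.search(key)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def group_by_instrument(keys):
    """Group NetCDF object keys by their instrument code, skipping other files."""
    groups = defaultdict(list)
    for key in keys:
        parsed = parse_wod_key(key)
        if parsed is not None:
            groups[parsed[0]].append(key)
    return dict(groups)


def list_year(year, bucket=WOD_BUCKET):
    """List every object key under '<year>/' using unsigned (anonymous) S3 requests."""
    # Imported lazily so the pure helpers above work without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=f"{year}/"):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

A call such as `group_by_instrument(list_year(2024))` would then give one list of files per instrument for that year, provided the assumed naming holds.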
Moreover, each file contains anywhere from a few thousand to hundreds of thousands of profiles, with variables such as temperature, salinity, and oxygen. With more than 200 GB of data spread across heterogeneous instrument types, researchers have struggled to locate the data they need quickly.
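Packing many profiles into one file is part of what makes these files hard to read by hand. Assuming a file follows the CF contiguous ragged-array layout (the values of all profiles concatenated into one long array, plus a per-profile row-size variable that records how many values belong to each profile), recovering a single profile is a matter of computing offsets. The function names below are illustrative, not from any WOD library.

```python
from itertools import accumulate


def profile_bounds(row_sizes):
    """Convert per-profile row counts into (start, stop) offsets into the flat array."""
    stops = list(accumulate(int(n) for n in row_sizes))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def get_profile(values, row_sizes, index):
    """Return the slice of `values` that belongs to profile number `index`."""
    start, stop = profile_bounds(row_sizes)[index]
    return values[start:stop]
```

With xarray, one might call `get_profile(ds["Temperature"].values, ds["Temperature_row_size"].values, 0)`; those variable names are assumptions to confirm with `ncdump -h`. If a row-size variable stores fill values (NaN) for casts that lack that variable, they would need to be replaced with 0 first, since `int(nan)` raises an error. When reading many profiles, computing `profile_bounds` once and reusing it avoids re-summing the row sizes on every call.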
Solution
This is where the AQUAVIEW team steps in.
The AQUAVIEW project is led by faculty members and student researchers at the University of Southern Mississippi and funded by NOAA. The team works in two-week sprints, each with a well-defined goal and deadline, and presents its progress at biweekly demos for stakeholders in government and the private sector.
Three-Phase Pipeline
The AQUAVIEW team built a three-phase pipeline that takes raw NetCDF files from the S3 bucket and delivers the data to users in an accessible form, in real time:
Phase 1: Data Transfer
Transfer NetCDF files from the S3 bucket to Google Cloud Storage using the Storage Transfer Service.
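A hedged sketch of creating and starting such a transfer with the `google-cloud-storage-transfer` Python client, modeled on Google's published samples. The project ID, bucket names, and prefix are placeholders, and `build_transfer_request` and `start_transfer` are hypothetical helpers. Depending on the source bucket's access settings, Storage Transfer Service may also require AWS credentials or a role ARN in the source spec, which this sketch omits.

```python
def build_transfer_request(project_id, source_bucket, sink_bucket, prefix=""):
    """Build a CreateTransferJobRequest-shaped dict for an S3-to-GCS copy."""
    # Storage Transfer Service expects the S3 path to be empty or end with "/".
    if prefix and not prefix.endswith("/"):
        raise ValueError("prefix must be empty or end with '/'")
    return {
        "transfer_job": {
            "project_id": project_id,
            "description": f"Copy s3://{source_bucket}/{prefix} to gs://{sink_bucket}",
            "transfer_spec": {
                # Credentials omitted; add aws_access_key or role_arn if required.
                "aws_s3_data_source": {"bucket_name": source_bucket, "path": prefix},
                "gcs_data_sink": {"bucket_name": sink_bucket},
            },
        }
    }


def start_transfer(project_id, source_bucket, sink_bucket, prefix=""):
    """Create the transfer job, then trigger one run immediately."""
    # Imported lazily so build_transfer_request works without the client library.
    from google.cloud import storage_transfer

    client = storage_transfer.StorageTransferServiceClient()
    request = build_transfer_request(project_id, source_bucket, sink_bucket, prefix)
    request["transfer_job"]["status"] = storage_transfer.TransferJob.Status.ENABLED
    job = client.create_transfer_job(storage_transfer.CreateTransferJobRequest(request))
    # No schedule is set, so the job is run on demand; this returns a long-running operation.
    return client.run_transfer_job({"job_name": job.name, "project_id": project_id})
```

Keeping the request construction separate from the API call makes the job definition easy to inspect or test before anything is submitted.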
Phase 2: Data Ingestion
Ingest NetCDF files into an Icechunk repository for efficient storage and access.
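One way to ingest a single file, sketched under assumptions: it uses xarray and Icechunk's `to_icechunk` helper, keeps the repository in a GCS bucket, commits to the `main` branch, and writes each file into a group named `<instrument>/<year>` derived from a `wod_<instrument>_<year>.nc` filename. The grouping scheme and the `group_for` and `ingest_file` names are hypothetical, not the AQUAVIEW team's actual layout.

```python
import re


def group_for(filename):
    """Derive a hypothetical Icechunk group path like 'ctd/2024' from a WOD filename."""
    match = re.search(r"wod_([a-z]+)_(\d{4})\.nc$", filename)
    if match is None:
        raise ValueError(f"unrecognized WOD filename: {filename}")
    instrument, year = match.groups()
    return f"{instrument}/{year}"


def ingest_file(nc_path, bucket, prefix="wod"):
    """Write one local NetCDF file into an Icechunk repo on GCS and commit it."""
    # Imported lazily; opening NetCDF also needs the netCDF4 or h5netcdf backend.
    import icechunk
    import xarray as xr
    from icechunk.xarray import to_icechunk

    storage = icechunk.gcs_storage(bucket=bucket, prefix=prefix, from_env=True)
    repo = icechunk.Repository.open_or_create(storage)
    session = repo.writable_session("main")
    with xr.open_dataset(nc_path) as ds:
        # Some variables (e.g. character arrays) may need re-encoding for Zarr.
        to_icechunk(ds, session, group=group_for(nc_path), mode="w")
    return session.commit(f"Ingest {nc_path}")
```

Committing once per file keeps each ingestion step transactional and leaves a snapshot per file; batching several files into one session before committing would produce fewer snapshots.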
Phase 3: Metadata Extraction
Extract cruise-level metadata by aggregating millions of individual profiles into 250,000 STAC (SpatioTemporal Asset Catalog) items. The resulting catalog is searchable through Elasticsearch, so users can filter by time and location and get results in seconds.
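The aggregation and the search can be sketched with plain dictionaries. This sketch assumes each profile record carries hypothetical `cruise_id`, `lon`, `lat`, and ISO 8601 UTC `time` fields, and builds one STAC Item per cruise with a bounding-box footprint and a `start_datetime`/`end_datetime` range. The query body assumes the index maps `geometry` as a `geo_shape` and the datetime properties as `date` fields; the `wod:profile_count` property is likewise hypothetical.

```python
from collections import defaultdict


def cruise_items(profiles):
    """Aggregate profile records into one STAC Item dict per cruise."""
    by_cruise = defaultdict(list)
    for profile in profiles:
        by_cruise[profile["cruise_id"]].append(profile)

    items = []
    for cruise_id, casts in sorted(by_cruise.items()):
        west = min(c["lon"] for c in casts)
        east = max(c["lon"] for c in casts)
        south = min(c["lat"] for c in casts)
        north = max(c["lat"] for c in casts)
        # ISO 8601 strings in one consistent UTC format sort chronologically.
        times = sorted(c["time"] for c in casts)
        items.append({
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": f"wod-cruise-{cruise_id}",
            # Simple min/max box; cruises crossing the antimeridian would need splitting.
            "bbox": [west, south, east, north],
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[west, south], [east, south], [east, north],
                                 [west, north], [west, south]]],
            },
            "properties": {
                # STAC allows a null datetime when a start/end range is given.
                "datetime": None,
                "start_datetime": times[0],
                "end_datetime": times[-1],
                "wod:profile_count": len(casts),
            },
            "links": [],
            "assets": {},
        })
    return items


def search_body(bbox, start, end):
    """Elasticsearch query for cruises whose footprint and time range overlap the request."""
    west, south, east, north = bbox
    return {
        "query": {
            "bool": {
                "filter": [
                    {"geo_shape": {"geometry": {
                        # Elasticsearch envelopes are [[minLon, maxLat], [maxLon, minLat]].
                        "shape": {"type": "envelope", "coordinates": [[west, north], [east, south]]},
                        "relation": "intersects",
                    }}},
                    # Time ranges overlap when the cruise ends after the window starts
                    # and starts before the window ends.
                    {"range": {"properties.end_datetime": {"gte": start}}},
                    {"range": {"properties.start_datetime": {"lte": end}}},
                ]
            }
        }
    }
```

Summarizing each cruise as a box and a time range keeps the index small compared with indexing every profile, at the cost of coarser matches. A cruise with a single profile yields a zero-area polygon here; a fuller implementation might emit a Point geometry in that case.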
WOD Chat: Natural Language Interface
In addition, the team has built WOD Chat, a natural language interface for faster data access. It uses LangGraph agents to query the Icechunk dataset and return the requested WOD data in response to user prompts, conversationally and in real time.
What sets the AQUAVIEW team's work on WOD apart is the accuracy and efficiency with which it turns a complex raw dataset into usable results.
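A minimal sketch of how such an agent could be wired with LangGraph's `StateGraph`: one node interprets the question and a second node fetches the data. To keep it self-contained, `parse_request` is a rule-based stand-in for the LLM call a real agent would make, and `query_fn` is a hypothetical callback that would read from the Icechunk repository; neither reflects WOD Chat's actual implementation.

```python
import re
from typing import Optional, TypedDict


class ChatState(TypedDict, total=False):
    question: str
    year: Optional[int]
    variable: Optional[str]
    answer: str


KNOWN_VARIABLES = ("temperature", "salinity", "oxygen")


def parse_request(state: ChatState) -> ChatState:
    """Rule-based stand-in for an LLM node: pull a year and a variable from the question."""
    text = state["question"].lower()
    year = re.search(r"\b(1[89]\d{2}|20\d{2})\b", text)
    variable = next((v for v in KNOWN_VARIABLES if v in text), None)
    return {"year": int(year.group(1)) if year else None, "variable": variable}


def build_graph(query_fn):
    """Wire parse -> query into a LangGraph StateGraph; query_fn does the data lookup."""
    # Imported lazily so parse_request can be used without LangGraph installed.
    from langgraph.graph import END, START, StateGraph

    def query_node(state: ChatState) -> ChatState:
        return {"answer": query_fn(state.get("year"), state.get("variable"))}

    builder = StateGraph(ChatState)
    builder.add_node("parse", parse_request)
    builder.add_node("query", query_node)
    builder.add_edge(START, "parse")
    builder.add_edge("parse", "query")
    builder.add_edge("query", END)
    return builder.compile()
```

Calling `build_graph(my_query).invoke({"question": "Show me salinity profiles from 2019"})` would return the final state, whose `answer` field holds whatever the callback produced. Each node returns only the keys it updates, which LangGraph merges into the shared state.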
Learn More
To learn more about WOD, read the detailed documentation at aquaview.org/documentation.
