dbSEABED

Helpful Documents, Setup, Software

Documentation

Display Assistants

The Output Parameters
The output parameters from dbSEABED are defined in this document:

Specification of Output Formats, 2021 Update (PDF)

Data Sources

The more than 16,000 contributing datasets are described in detail here:

Data Sources Metadata Notes (PDF)
Data Sources Metadata Listing (TXT)

One-Page Briefing Sheets
Seabed composition data are complicated. Here are one-page briefings on some of the issues dealt with and tasks performed in dbSEABED:

- Bring Grids Into a GIS
- The dbSEABED Output Fileset
- Coastline: The Boundary Condition
- Numerical Forms for Folk & Domnc Grids
- Supporting Environmental Data Layers
- Compositional Data Analysis: Compliance
- The Materials Classified as 'Rock'
- Polygon-to-Point Methods
- Land Proximity Data
- What Counts as Carbonate in Mappings?
- The Uncertainty, Incompleteness, and Neutrality Properties of Seafloor Descriptive Terms
- Feature Layers Aid Grid Interpolations

Standardized GIS Legends
A collection of *.lyr, *.qml and *.avl files that can be loaded directly into Geographic Information Systems.

Labs are strongly encouraged to use these standard schemes ("liveries") for quick and accurate visual communication with maps, especially within the dbSEABED community.

- For ArcGIS 9.x and ArcGIS 10.x (*.lyr):
  - All layers, with documentation
  - Point data on properties and components
  - Gridded data
  - Polygon data

- For Quantum GIS (QGIS) (*.qml):
  - All layers, in a zip file

- For ArcView 3.x (*.avl):
  - Gridded data
  - Polygon data
  - Point data on properties and components

Software Release
Maximum a-Posteriori Resampler
This tool for 'interpolation' of noisy, heterogeneous data was contributed by John Goff of UTIG (University of Texas Institute for Geophysics), Austin, USA. A paper describing the tool has been published:
Maximum a posteriori resampler

The Interpolation Challenge

Semantic Resources

dbSEABED formerly employed a modified IDW (Inverse Distance Weighted) interpolation, which worked acceptably for many projects over many years (see Competent Seabed Interpolator, below; ONR Award). Special measures were taken to interpolate sensibly near shorelines and through archipelagos, retain the resolution in well-surveyed areas, avoid spatial biases through isotropic data selection, quantify uncertainties, and validate the grids with jackknife methods.
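The basic IDW estimate and a jackknife check of the kind described above can be sketched minimally as follows. This is a plain 2-D IDW with power 2 and no search radius, shoreline handling or isotropic data selection, so it is not the dbSEABED interpolator; the function names `idw` and `jackknife_rmse` and the sample values are hypothetical. The jackknife here is leave-one-out: each sample is withheld in turn, re-estimated from the others, and the root-mean-square error summarises how well the method reproduces withheld data.

```python
import math

# Hypothetical helper: plain 2-D inverse-distance weighting (power 2 by default).
# Not the dbSEABED "modified IDW"; no shoreline or isotropic-selection logic.
def idw(target, samples, power=2.0):
    """Estimate a value at target=(x, y) from samples [((x, y), value), ...]."""
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # exact hit: honour the observed value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical helper: leave-one-out (jackknife) validation.
# Each sample is withheld in turn and re-estimated from the remaining ones.
def jackknife_rmse(samples, power=2.0):
    squared_errors = 0.0
    for i, (loc, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        estimate = idw(loc, others, power)
        squared_errors += (estimate - value) ** 2
    return math.sqrt(squared_errors / len(samples))


# Toy values at three sites along a line (coordinates in km).
samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((2.0, 0.0), 30.0)]
print(round(jackknife_rmse(samples), 3))  # 9.798
```

The end points are the hardest to reproduce here because, once withheld, they can only be extrapolated from one side; that kind of edge effect is what a jackknife validation exposes.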

With the advent of machine learning (ML), other possibilities arose and several were tried; Random Forest was the preferred method. As a class, however, these methods suffer from several problems: the statistical rules are map-wide (a 'global', not a local, process); map areas are not completely covered by the training data; users see the methods as a 'black box'; and the environmental layers used for training are often not germane, because of differences in time and spatial scales, the importance of extreme events, etc. On a routine basis, ML did not give satisfactory results for mappings at diverse scales and with varying data availability.

Most recently, a 3D-IDW method has been adopted, in which the distance metric is three-dimensional: X and Y (geographic, in km) and Z (water depth, in decametres, i.e. units of 10 m). This method is local, easily understood, and very effective at gridding to a high statistical standard while preserving environmental reasonableness. The sediment facies are elongated along slope.
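The effect of putting depth into the distance metric can be illustrated with a minimal sketch. It assumes inputs of X and Y in km and water depth in metres, converted to decametres as described above, with a plain power-2 weighting; the names `dist3d` and `idw3d`, the power and the sample values are hypothetical, not the dbSEABED code.

```python
import math

# Hypothetical sketch of a 3-D IDW distance: X, Y in km and water depth
# rescaled from metres to decametres (10 m units), following the text.
def dist3d(p, q):
    """p, q = (x_km, y_km, depth_m)."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dz = (p[2] - q[2]) / 10.0  # metres -> decametres
    return math.sqrt(dx * dx + dy * dy + dz * dz)


# Hypothetical IDW estimator using the 3-D distance (power 2 by default).
def idw3d(target, samples, power=2.0):
    """Estimate at target from samples [((x_km, y_km, depth_m), value), ...]."""
    num = 0.0
    den = 0.0
    for loc, value in samples:
        d = dist3d(target, loc)
        if d == 0.0:
            return value  # exact hit: honour the observed value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two samples 5 km away horizontally; only one is at the target's depth.
samples = [((5.0, 0.0, 100.0), 10.0),    # same depth: d = 5
           ((-5.0, 0.0, 200.0), 90.0)]   # 100 m deeper: d = sqrt(25 + 100)
print(round(idw3d((0.0, 0.0, 100.0), samples), 2))  # 23.33
```

A purely horizontal IDW would weight both samples equally here and return 50; with depth in the metric, the same-depth sample dominates, so interpolated patterns tend to stretch along isobaths rather than across them.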

3D-IDW Interpolation (2019)

Random Forest (2018)

Competent Seabed Interpolator (IDW 2015)

GeoMaterials Vocabulary
For the Dark Data project (NSF EAGER Award 1242909), a vocabulary of lithology, soil, fluid and ice terms was compiled to help locate and rate 'dark' geoscience datasets from their web presence in metadata or publications. The vocabulary is also useful for reconciling geomaterials terms across land, coast and ocean (NSF RAPID Award 1047776, 'Seamless over Strandline').
- http://csdms.colorado.edu/wiki/Data:Geomaterials_Vocab
- http://instaar.colorado.edu/~jenkinsc/dbseabed/resources/geomaterials/

Data Model
The output parameters from dbSEABED are described in a Semantic Web syntax, which makes them suitable for querying with SPARQL.

RDF/SKOS Encoded Parameter Definitions

Tally of Data

A program is run occasionally over the entire dbSEABED database to count the data in different categories. The latest run, in May 2019, gave the results below. The counts cover only data that were successfully integrated and passed quality control, that is, the data available for mapping and analysis in the WWD and CMP outputs.

sourceCOUNT - Number of source datasets, usually one per survey campaign or research project. Some projects, such as IODP, NOS or PANGAEA, may yield hundreds of sources: per long coring, per survey, or per data source inside the database, respectively.
locationCOUNT - Number of sites on the seafloor. A site is a distinct combination of location, start time, and sampling/observation method.
coreCOUNT - Number of seafloor penetrations of >= 0.3 m, which is deemed the minimum for a 'coring'. (These are also included in locationCOUNT.)
observationCOUNT - Number of samples/observations, each defined by the point, time and method of any sampling, analysis or observation that yields data values.
datITEMcount - Number of successfully integrated data values across all parameters.
datFIELDScount - Number of data fields exposed to accept values, whether filled or not (the basis for the sparsity measure).

sourceCOUNT	locationCOUNT	coreCOUNT	observationCOUNT	datITEMcount	datFIELDScount
9,226	4,776,797	1,645,881	6,663,255	40,730,163	133,266,000
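Some simple ratios can be derived from the May 2019 counts. The page does not give formulas, so the definitions below are assumptions: sparsity is taken as the fraction of exposed data fields left unfilled (1 - datITEMcount / datFIELDScount), and the share of cored sites as coreCOUNT / locationCOUNT, which is meaningful because cores are also counted in locationCOUNT. The variable names are hypothetical.

```python
# Counts from the May 2019 tally.
tally = {
    "sourceCOUNT": 9_226,
    "locationCOUNT": 4_776_797,
    "coreCOUNT": 1_645_881,
    "observationCOUNT": 6_663_255,
    "datITEMcount": 40_730_163,
    "datFIELDScount": 133_266_000,
}

# Assumed definitions (not stated on the page):
# fraction of exposed fields that hold a value, and its complement (sparsity)
fill_fraction = tally["datITEMcount"] / tally["datFIELDScount"]
sparsity = 1.0 - fill_fraction
# cored sites as a share of all sites (cores are included in locationCOUNT)
core_share = tally["coreCOUNT"] / tally["locationCOUNT"]
# average number of samples/observations per site
obs_per_site = tally["observationCOUNT"] / tally["locationCOUNT"]

print(f"filled fields: {fill_fraction:.1%}, sparsity: {sparsity:.1%}")
print(f"cored sites: {core_share:.1%}, observations per site: {obs_per_site:.2f}")
```

Under these definitions, roughly 31% of the exposed fields carry values, so about 69% are empty.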

In addition to these statistics, the distribution of samples/observations by water depth and by sub-bottom depth was tallied.
The graphs below show the results, which are relevant to how representative dbSEABED data are for mapping the entire ocean.