Helpful Documents, Setup, Software
 

Documentation
The Output Parameters
The output parameters from dbSEABED are defined in this document:
           Specification of Output Formats 2021 Update (PDF)

Data Sources

The >16,000 contributing datasets are described in detail here:

One-Page Briefing Sheets
Seabed composition data is complicated. Here are one-page briefings for some of the issues dealt with and tasks performed in dbSEABED:
Bring_Grids_Into_A_GIS
The_dbSEABED_Output_Fileset
Coastline: The Boundary Condition
Numerical Forms for Folk & Domnc Grids
Supporting Environmental Data Layers
Compositional Data Analysis: Compliance
The Materials Classified as 'Rock'
Polygon-to-Point Methods
Land Proximity Data
The Uncertainty, Incompleteness, and Neutrality Properties of Seafloor Descriptive Terms
Feature Layers Aid Grid Interpolations
Display Assistants
Standardized GIS Legends
A collection of *.avl and *.lyr files that can be picked up and used efficiently in Geographic Information Systems.

It is strongly recommended that labs use these standard schemes - liveries - for quick and accurate visual communication with maps, especially within the dbSEABED community.

       - For ArcGIS9.x and ArcGIS10.x: (*.lyr)
                - All and documentation
                - Point data on Properties and Components
                - Gridded data
                - Polygon data

       - For Quantum GIS (QGIS): (*.qml)
                 - All in ZipFile

       - For ArcView3.x: (*.avl)
                 - For Griddings; for Polygons
                 - Point data on Properties and Components

Software Releases
Maximum a-Posteriori Resampler
This tool for 'interpolation' of noisy, heterogeneous data was contributed by John Goff of UTIG/UTexas, Austin USA. A paper describing the tool has been published.
       - Maximum a posteriori resampler

The Interpolation Challenge
Formerly dbSEABED employed a modified IDW (Inverse Distance Weighted) interpolation, which worked acceptably for many projects over many years (see Competent Seabed Interpolator, below; ONR Award). Special measures were taken to: sensibly interpolate near shorelines and through archipelagos, retain the resolution in well-surveyed areas, avoid spatial biases using isotropic data selection, quantify uncertainties, and validate the grids with jackknife methods.

However, with the advent of Machine Learning other possibilities arose, and several were experimented with; Random Forest was the preferred method. As a class these methods suffer several problems: the statistical rules are map-wide (a 'global', not local, process); map areas were not completely covered by the training data; the methods are seen as a 'black box' by users; and the environmental layers used for training are often not germane, because of mismatches in time and spatial scales, the importance of extreme events, etc. The application of ML was not satisfactory on a routine basis for mappings at diverse scales and with varying data availability.

Lately a 3D-IDW has been settled on, where the distance metric is three-dimensional: X, Y (geographic, in km) and Z (water depth, in decametres, i.e. 10 m units). This method is local, easily understood, and very effective at gridding to a high statistical standard while preserving environmental reasonableness, since sediment facies are elongated along-slope.
       - 3D-IDW Interpolation (2019)
       - Random Forest (2018)
       - Competent Seabed Interpolator (IDW 2015)
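
As an illustration only (not the dbSEABED code), the Python sketch below shows the 3D-IDW idea: horizontal separations in kilometres are combined with water-depth differences expressed in decametres, and the estimate uses only the nearest observations. The neighbour count, the power, and the exact horizontal/vertical balance are assumptions made for the example.

import numpy as np

def idw3d(x_km, y_km, z_m, values, xq_km, yq_km, zq_m,
          power=2.0, n_neighbours=12):
    """Inverse-distance-weighted estimate at a query point, using a 3D
    metric: X, Y in kilometres and water depth scaled to decametres."""
    x_km = np.asarray(x_km, dtype=float)
    y_km = np.asarray(y_km, dtype=float)
    z_m = np.asarray(z_m, dtype=float)
    values = np.asarray(values, dtype=float)

    # 3D separations; depth is divided by 10 so it is expressed in
    # decametres, as in the description above (the exact balance between
    # horizontal and vertical distance is an assumption here).
    dx = x_km - xq_km
    dy = y_km - yq_km
    dz = (z_m - zq_m) / 10.0
    d = np.sqrt(dx**2 + dy**2 + dz**2)

    # Keep the estimate local by using only the nearest neighbours.
    idx = np.argsort(d)[:n_neighbours]
    d, v = d[idx], values[idx]

    if d[0] < 1e-12:          # query coincides with an observation
        return float(v[0])
    w = d**(-power)           # inverse-distance weights
    return float(np.sum(w * v) / np.sum(w))

# Tiny synthetic example: estimate mud % at a site 3 km east, 2 km north,
# in 30 m of water, from four nearby observations.
x = [0.0, 2.0, 5.0, 8.0]
y = [0.0, 1.0, 4.0, 2.0]
z = [20.0, 35.0, 120.0, 40.0]
mud = [10.0, 15.0, 60.0, 20.0]
print(round(idw3d(x, y, z, mud, xq_km=3.0, yq_km=2.0, zq_m=30.0), 1))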
Semantic Resources
GeoMaterials Vocabulary
For the Dark Data project (NSF EAGER Award 1242909) a vocabulary of lithology, soil, fluid and ice terms was compiled, which could help locate and rate 'dark' geoscience datasets from their web presence in metadata or publications. The vocabulary is also useful for reconciling geomaterials terms between land/coast/ocean (NSF RAPID Award 1047776 'Seamless over Strandline').
       - http://csdms.colorado.edu/wiki/Data:Geomaterials_Vocab
       - http://instaar.colorado.edu/~jenkinsc/dbseabed/resources/geomaterials/
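
As a rough illustration of how such a vocabulary might be used to rate a dataset's web presence, the sketch below scores a metadata text by how many terms from a small sample list it contains. The term list, scoring rule and example text are invented for the illustration; they are not the actual Dark Data workflow.

import re

GEOMATERIAL_TERMS = {"sand", "silt", "clay", "gravel", "basalt",
                     "carbonate", "permafrost", "peat", "mud"}  # tiny sample only

def geomaterials_score(text: str) -> float:
    """Fraction of vocabulary terms that occur in the metadata text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = GEOMATERIAL_TERMS & words
    return len(hits) / len(GEOMATERIAL_TERMS)

abstract = "Vibracores recovered sand, mud and carbonate gravel layers."
print(round(geomaterials_score(abstract), 2))   # 4 of 9 terms found -> 0.44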


Data Model
The output parameters from dbSEABED are described with a Semantic Web syntax. This makes them suitable for use with query languages and tools such as SPARQL.
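
For illustration, the sketch below invents a tiny Semantic Web description of one output parameter and queries it with SPARQL, using the third-party Python rdflib package. The namespace and property names are hypothetical and are not the actual dbSEABED data model.

from rdflib import Graph

ttl = """
@prefix dbsb: <http://example.org/dbseabed#> .   # hypothetical namespace
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

dbsb:Gravel a dbsb:OutputParameter ;
    rdfs:label "Gravel fraction" ;
    dbsb:units "percent by weight" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# SPARQL: list every output parameter with its label and units.
q = """
PREFIX dbsb: <http://example.org/dbseabed#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?units WHERE {
    ?p a dbsb:OutputParameter ; rdfs:label ?label ; dbsb:units ?units .
}
"""
for label, units in g.query(q):
    print(label, "--", units)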

Tally of Data

A program is occasionally run over the entire dbSEABED to count the numbers of data in different categories. The latest run, in May 2019, gives the following results. The counts cover only data which is successfully integrated and passes quality controls, that is, the data which is available for mappings and analyses in the WWD and CMP outputs.

sourceCOUNT - Number of source datasets. These are usually one per survey campaign or research project. Some projects, such as IODP, NOS or PANGAEA, may yield hundreds of sources: per long coring, per survey, or per data source inside the database, respectively.
locationCOUNT - Number of sites at the seafloor. A site is marked by a distinct location, start time, and sampling/observation method.
coreCOUNT - Number of penetrations of the seafloor >=0.3m, which is deemed to be the minimum for a 'coring'. (This number is also included in locationCOUNT.)
observationCOUNT - Number of samples / observations, defined as point/time/method of any sampling/analysis/observation that yields data values.
datITEMcount - Number of successfully integrated data values across all parameters.
datFIELDScount - Number of data fields exposed to accept values (whether filled or not; basis for sparsity measure).

sourceCOUNT  locationCOUNT  coreCOUNT  observationCOUNT  datITEMcount  datFIELDScount
9,226        4,776,797      1,645,881  6,663,255         40,730,163    133,266,000
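
As a worked illustration of the sparsity measure mentioned under datFIELDScount, the ratio of datITEMcount to datFIELDScount in the May 2019 tallies gives the overall fill fraction. The arithmetic below is illustrative only, using the figures from the table above.

dat_items = 40_730_163      # datITEMcount, May 2019
dat_fields = 133_266_000    # datFIELDScount, May 2019
fill = dat_items / dat_fields
print(f"fill = {fill:.3f}, sparsity = {1 - fill:.3f}")   # fill ~ 0.306, sparsity ~ 0.694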


In addition to these statistics, the distribution of samples/observations by water depth and subbottom depth was compiled.
The graphs below show the results, which are relevant to the representativeness of dbSEABED data for mapping the entire ocean.

[Graphs: distributions of samples/observations by water depth and by subbottom depth]

Author: Chris Jenkins
Date: 21 Apr 2021
Place: CU INSTAAR Boulder CO USA