Helpful Documents, Setup, Software
Documentation | Display Assistants
The Output Parameters
The output parameters from dbSEABED are defined in this document:
Data Sources
The >16,000 contributing datasets are described in detail here:
One-Page Briefing Sheets
Seabed composition data is complicated. Here are one-page briefings for some of the issues dealt with and tasks performed in dbSEABED:
Standardized GIS Legends
A collection of *.avl and *.lyr files which can readily be picked up and used in Geographic Information Systems. It is strongly recommended that labs use these standard schemes ('liveries') for quick and accurate visual communication with maps, especially within the dbSEABED community.
- For ArcGIS9.x and ArcGIS10.x (*.lyr): All and documentation; Point data on Properties and Components; Gridded data; Polygon data
- For Quantum GIS (QGIS) (*.qml): All in ZipFile
- For ArcView3.x (*.avl): For Griddings; for Polygons; Point data on Properties and Components
Software Release
Maximum a-Posteriori Resampler
This tool for 'interpolation' of noisy, heterogeneous data was contributed by John Goff of UTIG/UTexas, Austin, USA. A paper describing the tool has been published.
Maximum a posteriori resampler
The Interpolation Challenge | Semantic Resources
Formerly dbSEABED employed a modified IDW (Inverse Distance Weighted) interpolation, which worked acceptably for many projects over many years (see Competent Interpolator, below; ONR Award). Special measures were taken to: sensibly interpolate near shorelines and through archipelagos; retain the resolution in well-surveyed areas; avoid spatial biases by using isotropic data selection; quantify uncertainties; and validate the grids with jackknife methods.
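As a rough illustration of the two ideas named above, the sketch below pairs a plain IDW estimator with a leave-one-out (jackknife-style) check. It is not the modified IDW used in dbSEABED: the function names, the power of 2, and the sample values are all assumptions made for the example, and the shoreline handling and isotropic data selection are not reproduced.

```python
import math

def idw(points, x, y, power=2.0):
    """Plain inverse-distance-weighted estimate at (x, y).

    points: list of (px, py, value) tuples. The power of 2 is an assumed default.
    """
    num = den = 0.0
    for px, py, v in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return v  # target coincides with a sample: return that sample's value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def jackknife_rmse(points, power=2.0):
    """Leave each sample out in turn, predict it from the rest, return the RMSE."""
    errors = []
    for i, (px, py, v) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(idw(rest, px, py, power) - v)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical samples: (x_km, y_km, percent mud)
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (1.0, 1.0, 40.0)]
print(idw(samples, 0.5, 0.5))   # estimate at the centre of the four samples
print(jackknife_rmse(samples))  # leave-one-out prediction error
```

The leave-one-out error gives a single number for how well the interpolator predicts data it has not seen, which is one way the grids could be validated.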
However, with the advent of Machine Learning (ML), other possibilities arose and several were experimented with. Random Forest was the preferred method. As a class, these methods suffer several problems: the statistical rules have a map-wide basis (a 'global', not local, process); map areas are not completely covered by the training data; the methods are seen as a 'black box' by users; and the environmental layers used for training are often not germane, because of time and spatial scales, the importance of extreme events, etc. The application of ML was not satisfactory on a routine basis for mappings at diverse scales with varying data availability.
Lately a 3D-IDW has been settled on, in which the distance metric is three-dimensional: X, Y (geographic, in km) and Z (water depth in decametres, i.e. units of 10 m). This method is local, easily understood, and very effective at gridding to a high statistical standard while preserving environmental reasonableness. The sediment facies are elongated along slope.
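A minimal sketch of a three-dimensional IDW distance of this kind follows. It assumes the three axes are combined as a straight Euclidean distance and that weights fall off with the square of that distance; neither detail is stated in the text, and the function name and sample values are hypothetical.

```python
import math

def idw3d(samples, x_km, y_km, depth_m, power=2.0):
    """IDW estimate using X, Y in km and water depth in decametres (10 m units).

    samples: list of (x_km, y_km, depth_m, value) tuples.
    The Euclidean combination of the axes and power=2 are assumptions.
    """
    z = depth_m / 10.0  # convert metres to decametres
    num = den = 0.0
    for sx, sy, sdepth, v in samples:
        dz = sdepth / 10.0 - z
        d = math.sqrt((sx - x_km) ** 2 + (sy - y_km) ** 2 + dz ** 2)
        if d == 0.0:
            return v  # target coincides with a sample in all three dimensions
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Two hypothetical samples, both 5 km from the target horizontally:
# one at the target's depth (100 m), one far downslope (600 m).
samples = [(-5.0, 0.0, 100.0, 80.0), (5.0, 0.0, 600.0, 20.0)]
print(idw3d(samples, 0.0, 0.0, 100.0))
```

With these inputs the estimate sits much closer to the same-depth sample than to the downslope one, because a 500 m depth difference counts as 50 distance units. Under this metric, samples along the same depth contour act as nearer neighbours than samples across the slope.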
GeoMaterials Vocabulary
For the Dark Data project (NSF EAGER Award 1242909), a vocabulary of lithology, soil, fluid, and ice terms was compiled that could help locate and rate 'dark' geoscience datasets from their web presence in metadata or publications. The vocabulary is also useful for reconciling geomaterials terms between land, coast and ocean (NSF RAPID Award 1047776 'Seamless over Strandline').
- http://csdms.colorado.edu/wiki/Data:Geomaterials_Vocab
- http://instaar.colorado.edu/~jenkinsc/dbseabed/resources/geomaterials/
Data Model
The output parameters from dbSEABED are described with a Semantic Web syntax. This makes them suitable for use with Semantic Web query languages such as SPARQL.
Tally of Data
A program is occasionally run over the entire dbSEABED to count the number of data items in different categories. The latest run, in May 2019, gave the following results. The counts cover only data that is successfully integrated and passes quality controls, that is, the data available for mappings and analyses in the WWD and CMP outputs.
sourceCOUNT - Number of source datasets. These are usually one per survey campaign or research project. Some projects such as IODP, NOS or PANGAEA may yield hundreds of sources: one per long coring, per survey, or per data source inside the database, respectively.
locationCOUNT - Number of sites at the seafloor. A site is marked as a distinct location, start-time, and sampling/observation method.
coreCOUNT - Number of penetrations of the seafloor >=0.3m, which is deemed to be the minimum for a 'coring'. (This number is also included in locationCOUNT.)
observationCOUNT - Number of samples / observations, defined as point/time/method of any sampling/analysis/observation that yields data values.
datITEMcount - Number of successfully integrated data values across all parameters.
datFIELDScount - Number of data fields exposed to accept values (whether filled or not; basis for sparsity measure).
sourceCOUNT | locationCOUNT | coreCOUNT | observationCOUNT | datITEMcount | datFIELDScount
9,226 | 4,776,797 | 1,645,881 | 6,663,255 | 40,730,163 | 133,266,000
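The definition of datFIELDScount mentions a sparsity measure without giving a formula. The sketch below assumes the simplest reading: the fill ratio is datITEMcount divided by datFIELDScount, and sparsity is the remainder. That definition, and the variable names, are assumptions; only the counts come from the table.

```python
# Counts copied from the May 2019 tally
tally = {
    "sourceCOUNT": 9_226,
    "locationCOUNT": 4_776_797,
    "coreCOUNT": 1_645_881,
    "observationCOUNT": 6_663_255,
    "datITEMcount": 40_730_163,
    "datFIELDScount": 133_266_000,
}

# Assumed definition: share of exposed data fields that actually hold a value
fill = tally["datITEMcount"] / tally["datFIELDScount"]
sparsity = 1.0 - fill

print(f"filled fields: {fill:.1%}")
print(f"empty fields:  {sparsity:.1%}")
```

On this reading, a little under a third of the exposed fields hold a value.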
In addition to the statistics, the distribution of samples/observations by water depth and subbottom depth is tallied. The graphs below show the results, which are relevant to the representativeness of dbSEABED data for mapping the entire ocean.