Copyright INSTAAR / CU and OSI / SU 1998-2004
dbSEABED
Entering Seabed Sample Descriptions 
(Word-based Data) 

Contents

Introduction
Literature and science: Prose and Data
Tackling a description
Quantifiers, modifiers and objects
Changing words to abbreviations efficiently
Choice of vocabulary
Finding terms in thesaurus
Using standard stems and endings
When to join terms with an underscore

Online dictionary


Size descriptors
Percentage descriptors
Special linguistic structures
Check list of attributes

Instrumental and analysis data
Mandatory Lines in Data Files
'1,CMP', '2,LTH', '3,PET', etc.

 

Return to db9 Manual



Introduction

One of the special functions of dbSEABED is to handle word-based descriptive data - linguistic data - and from them, produce output attributes that are compatible with the outputs from numeric analytical data.

A requirement for successful processing in this way is that the input linguistic data is reasonably well organized, although allowance is made for a variety of descriptive linguistic structures that are usual and familiar to geologists, ecologists, navy divers, etc.

dbSEABED is flexible. It recognizes that different kinds of study use different vocabularies, and a variety of linguistic structures and ways of attaching quantities. To be useful in dbSEABED, sediment descriptions and analysis data do not need to be absolutely perfect or straightjacketed: it is a data mining system and will make the best of what is supplied, while rejecting (and reporting for future attention) incomplete or erroneous structures.

At the same time, the language of the descriptions must be computable, i.e. it should have a basically arithmetic character. The processing is therefore not quite "natural language" processing, and "stream of consciousness" prose, such as that found in most DSDP and ODP core summaries, cannot be handled without extensive editing at the time of data entry. An arithmetic character to descriptions is probably a good goal anyway, in the direction of scientific rigor and observational discipline.

This web page sets out many commonly encountered descriptive structures, and describes how they should be entered to promote successful handling by dbSEABED.




Literature and science: Prose and Data

Visual (deck) descriptions should be organized so that more use can be made of the digital data. By "organized" we mean that the descriptions should not be prose, but itemized lists of observations that clearly divide the sample into its separate geological components.

An example
"Dredge load with rocks and sediment plus some liquid slush, rocks red slush green and sediment coarse. Oops, color chart lost overboard. Slush appears to be organic in part, rocks possibly Triassic red beds and the sediment carbonate-rich. Forams, pteropods, occasional bryozoans and bits of rock. "
This description closely follows one that was entered into dbSEABED recently. It might be better organized in the log as follows:

Dredge load - 3 components
1. rock, possibly Triassic red bed
    also small bits of rock
2. sediment, coarse carbonate sand
    rich in forams, pteropods, occasional bryozoans
3. green liquid slush - appears to be organic in part
This data would be organized along these lines in dbSEABED:
 
Sample metadata = Dredge load - 3 components

Rest of dataset metadata = Colour chart fell overboard

Component 1 = rock, also small bits of rock = rock = possibly Triassic red bed
Component 2 = sediment = coarse carbonate sand, rich in { forams, pteropods, occasional bryozoans }
Component 3 = green liquid slush = appears to be organic in part

Notice that the order of components of the description is maintained, making data entry very easy in secretarial types of software like EXCEL and WORD.




Tackling a description

Realize that descriptions consist of quantifiers, modifiers and objects  (QMO) and treat descriptions like arithmetic expressions:

(m*Ax + n*By + o*Cz + ... = TheSedimentFraction).
Usually (and hopefully) 'TheSedimentFraction' is the whole sediment. An example QMO description is: "abundant green shark_teeth + 30% fine sand + 10% nobbly phosphatic_nodules".

You can use a shorthand to remove ambiguities from descriptions. The "/" and "-" are used in dbSEABED to denote quantifiers (m,n,o) and modifiers (A,B,C) respectively, and point to the object that they refer to.
So, "green- shark_teeth /abundant + 30%/ fine- sand + 10%/ phosphatic_nodules -nobbly" is computationally equivalent to the example above.
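The pointer shorthand can be read mechanically. The following is a minimal Python sketch, not dbSEABED's own parser: the function names `parse_term` and `parse_description` and the dictionary layout are hypothetical, and it assumes that terms are separated by "+" and that each token carries at most one kind of pointer.

```python
def parse_term(term):
    """Sort the tokens of one '+'-separated term by their pointer marks."""
    parsed = {"quantifiers": [], "modifiers": [], "objects": []}
    for token in term.split():
        if token.startswith("/") or token.endswith("/"):
            # "/" marks a quantifier and points at the object it quantifies
            parsed["quantifiers"].append(token.strip("/"))
        elif token.startswith("-") or token.endswith("-"):
            # "-" marks a modifier and points at the object it modifies
            parsed["modifiers"].append(token.strip("-"))
        else:
            parsed["objects"].append(token)
    return parsed


def parse_description(description):
    """Split a description on '+' and parse each term separately."""
    return [parse_term(term) for term in description.split("+")]


terms = parse_description("green- shark_teeth /abundant + 30%/ fine- sand")
```

Here `terms[0]` holds the object "shark_teeth" with modifier "green" and quantifier "abundant", whichever side of the object they were written on.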

Decide whether your syntax is front-significant (ODP/Scientific) or rear-significant (NIMA/Navy) in abundances, and then place the descriptions in the corresponding field. It may also be that there is no front or rear order of significance.




Quantifiers, Modifiers and Objects

Most sediment descriptions are composed of a sequence of constructions like:

                                    [quantity] (modifier) object
that is, combinations of terms expressing abundance, some modification of a property and the object of interest itself. (A fourth, operators, can be considered as quantifiers.) In the example, “slightly muddy coarse bryozoan sand with trace quartz” we distinguish 3 objects and accompanying modifiers:
                                   [slight] (  ) mud + [ ] (coarse) bryozoan_sand + [trace] ( ) quartz.
a. Objects are terms which describe a property that may be texture, carbonate, lithology, strength, colour, etc.
b. Modifiers are terms which only convey meaning in conjunction with an object - for example "coarse" of itself does not define texture.
c. Quantifiers express a weighting and usually work in combination with other terms: "abundant" in "abundant clay", "3" in "3 shells".
This division has withstood the test of time and a huge enlargement of the operational database.

In effect sediment descriptions are linear expressions of the form:

                                    m * a * x + n * b * y + ... = total sediment.
(Terms like 'with', 'or' or 'to' can be regarded as quantifiers or neutral).
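To illustrate the idea of a linear expression that sums to the total sediment, here is a small Python sketch using the "slightly muddy coarse bryozoan sand with trace quartz" example. The numeric weights are invented purely for illustration and are not taken from the dbSEABED thesaurus; the modifier factors (a, b) are left out, and the name `shares` is hypothetical.

```python
# Hypothetical numeric weights for word quantifiers; these values are
# illustrative only and are NOT the ones dbSEABED uses.
ASSUMED_WEIGHTS = {"trace": 0.02, "slight": 0.10}
DEFAULT_WEIGHT = 1.0  # assumed weight for an object with no quantifier


def shares(terms):
    """Normalise (quantifier, object) pairs so that the shares sum to 1."""
    raw = {obj: ASSUMED_WEIGHTS.get(quant, DEFAULT_WEIGHT) for quant, obj in terms}
    total = sum(raw.values())
    return {obj: weight / total for obj, weight in raw.items()}


result = shares([("slight", "mud"), ("", "bryozoan_sand"), ("trace", "quartz")])
```

Under these assumed weights the unquantified object, "bryozoan_sand", takes by far the largest share of the whole sediment.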

Some modifiers are marked with a "-". All quantifiers are marked with a "/". These point in the direction of the term they modify or quantify:
e.g., "fne- snd", "altrd- baslt", "ab/ bryz" and "shrk_teeth /rre" for "fine sand", "altered basalt", "abundant bryozoans" and "shark teeth rare".




Changing words to abbreviations efficiently

Changing words to abbreviations is best done in WORD, using the find/replace functions on a big dataset. It's good to take care to set 'Match Case' and/or 'Whole words' in the Replace setup, and to view each case as you go along (i.e. not to use 'replace all').

Also, be careful to do the abbreviations per item of meaning. For example, "SAND DOLLARS" to "snd_dllrs", "TRACES OF" to "trcs_of/" and "FINE TO MEDIUM" to "fne- to med-". Do these combined meanings before the individual ones like "SAND" to "snd".
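For those who prefer a script to WORD, the same ordering rule can be sketched in Python. This is only an illustration: the `abbreviate` function and its table are hypothetical, the table holds just the examples given above, and a scripted pass does not replace checking each case by eye.

```python
import re

# Replacement table built from the examples in the text.
ABBREVIATIONS = {
    "SAND": "snd",
    "SAND DOLLARS": "snd_dllrs",
    "TRACES OF": "trcs_of/",
    "FINE TO MEDIUM": "fne- to med-",
}


def abbreviate(text, table=ABBREVIATIONS):
    """Replace whole phrases, case-sensitively, longest phrases first."""
    for phrase in sorted(table, key=len, reverse=True):
        # (?<!\w) and (?!\w) emulate the 'Whole words' option; no IGNORECASE
        # flag is set, which emulates 'Match Case'.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        text = re.sub(pattern, lambda match, new=table[phrase]: new, text)
    return text


coded = abbreviate("TRACES OF SAND DOLLARS IN FINE TO MEDIUM SAND")
```

Sorting by length means "SAND DOLLARS" is converted before the bare "SAND", so the combined meaning is not broken up.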




Choice of vocabulary

Mostly, the choice of vocabulary doesn't worry dbSEABED, because its dictionary (actually a thesaurus) is adaptable. The dictionary already includes French words and terms such as "beautiful" (bryozoans), "sardine tin" (dredged up!), "coffee grounds" (slush) and "fettucine" (seaweed). Such terms are treated either as neutral or given geological meaning.

Geological terms, however, should be used in standard ways, for example conforming to a geological dictionary such as that by AGU. If a term is non-standard, say so in a note (metadata) and notify the maintainer so proper arrangements can be made in the dictionary.

dbSEABED does not work through usual English word classes of verb/adjective/adverb/noun. Instead, it divides on what descriptive meaning is carried.




Finding terms in the thesaurus

To do this it is best to do a search in WORD on the file "db8_dct.rtf".
It is not advisable to search for the plural because plurals may be marked as (for instance) "band;s" or "band/s", signifying optional plural.

Alternatively, the web dictionary listing can be accessed at db9_dictionary.htm.
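If the thesaurus entries are available as plain strings, the optional-plural marks can be expanded before searching. The Python sketch below assumes exactly that; the layout of "db8_dct.rtf" is not described here, so the entry list and the names `headword_forms` and `find_term` are hypothetical.

```python
def headword_forms(entry):
    """Expand an optional-plural entry such as "band;s" or "band/s"."""
    for marker in (";", "/"):
        if marker in entry:
            stem, suffix = entry.split(marker, 1)
            # Only treat the mark as an optional plural when text lies on both
            # sides of it; a quantifier pointer such as "ab/" is left alone.
            if stem and suffix:
                return [stem, stem + suffix]
    return [entry]


def find_term(term, entries):
    """Return the entries whose singular or plural form equals the term."""
    return [entry for entry in entries if term in headword_forms(entry)]


matches = find_term("bands", ["band;s", "bnd", "qtz_gn;s"])
```

Searching for either "band" or "bands" then finds the same entry.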




When to join terms with an underscore

Underscores are meant to link terms that complement each other. "limt_cemt" is a good example: limt is compositional and cemt is structural/textural. If they were separate then a lot of meaning would be lost. If you do what you (as a geologist) think is best to maximize the carrying of the geological meaning, you will be doing the right thing. That is the key test.

All underscore-linked terms need to be added to the thesaurus.

The main objection to joining a greater number of terms is that the thesaurus would really blow out in size. With "porphyritic rhyolite" ("prphrytc- rhyl"), "porphyritic" is really a standard (expectable) modifier for "rhyolite" and they could be kept separate.



Using standard stems and endings

When suggesting or making new abbreviations, it is best to draw on already defined abbreviations and just add standard variations like endings. This way, adjectives, adverbs and participles can be easily generated.

Examples of stems include:

FINE: fne-, fnely-, fner-, fneng_upwrds
SORTING: srtng, -srtd, wl_srtd-, prly_srtd-
QUARTZ: qtz, qtzse, qtz_gn;s
STONE: stn, limstn, wckestn, sndstn.
Examples of endings:

Full ending      Abbreviated ending
...ized          ...zd
...ed            ...d
...ing           ...ng
...s & ...es     ...s
...ose           ...se
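As a small illustration of combining an existing stem with a standard ending, the Python sketch below holds the endings listed above in a lookup table. The function name `derive` is hypothetical, and the sketch does not add the "-" or "/" pointers.

```python
# Standard endings from the list above: full English ending -> abbreviated ending.
ENDINGS = {"ized": "zd", "ed": "d", "ing": "ng", "s": "s", "es": "s", "ose": "se"}


def derive(stem, ending):
    """Attach the abbreviated form of a standard ending to an abbreviated stem."""
    return stem + ENDINGS[ending]  # raises KeyError for a non-standard ending


examples = [derive("srt", "ing"), derive("srt", "ed"), derive("qtz", "ose")]
```

These reproduce "srtng", "srtd" and "qtzse" from the stem examples.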
This has not always been adhered to (there are homonyms and synonyms in the thesaurus, English isn't perfect and neither are we).
But it does provide a way to work more systematically with the abbreviations.

Some stem conflicts include:

Terms                                                        Abbreviations
GRAIN and GREEN                                              gn & grn
BAND and BOUND                                               band & bnd
CAST (colour) and CAST (vs mold)                             cast(colr) & cast
SOFT (biological) and SOFT (geotechnic)                      sft & so
HIGH DENSITY (abundance) and HIGH DENSITY (geotechnical)     hi_dnsty & hi_dens

Do you encounter others? Email me!




Size descriptors

There is a set of semi-standardized terms that convey particle size: "0.8mmszd-", "-<4cmhi", "~8mwvl-", "4:9phiszd-", "0.061mmlng"  are examples.

They are standardized as follows.

  • currently, the units of measurement can only be mm, cm, m, um, in, ft and phi
  • the symbols "<", ">" and "~" must lie at the front of the number
  • the codes "szd", "hi", "lng", "sze", "rnd" or "wvl" end the term (sized, high, long, size, round, wavelength)
  • a range of size can be rendered using a colon between 2 numbers with the smaller number at front
  • leading decimal points should be preceded with a "0"
  • as with all modifiers, a hyphen ("-") is used at the front or rear of the terms to point to its associated object.
The purpose of this standardization is to permit automatic parsing at a later stage. When this is implemented, dictionary entries will not be needed.

Non-standard arrangements will be reported to the "*.DGN" diagnostics file.
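The rules above are regular enough to be checked with a pattern. The following Python sketch is one possible reading of them and is not dbSEABED's parser: the name `parse_size` and the returned dictionary are hypothetical, the pointer hyphen is treated as optional because one of the examples has none, and negative phi values are not handled.

```python
import re

SIZE_RE = re.compile(
    r"""
    (?P<front>-)?                   # optional pointer to an object on the left
    (?P<symbol>[<>~])?              # optional symbol in front of the number
    (?P<low>\d+(?:\.\d+)?)          # number; a bare leading point is rejected
    (?::(?P<high>\d+(?:\.\d+)?))?   # optional colon range
    (?P<unit>mm|cm|um|phi|in|ft|m)  # permitted units
    (?P<code>szd|hi|lng|sze|rnd|wvl)
    (?P<rear>-)?                    # optional pointer to an object on the right
    """,
    re.VERBOSE,
)


def parse_size(term):
    """Return the parts of a size descriptor, or None if it is non-standard."""
    match = SIZE_RE.fullmatch(term)
    if match is None:
        return None
    low = float(match["low"])
    high = float(match["high"]) if match["high"] else low
    if high < low:  # the smaller number must come first in a range
        return None
    return {"symbol": match["symbol"] or "", "low": low, "high": high,
            "unit": match["unit"], "code": match["code"]}


parsed = [parse_size(t) for t in ("0.8mmszd-", "-<4cmhi", "~8mwvl-", "4:9phiszd-", "0.061mmlng")]
```

A term that returns None here corresponds to the kind of arrangement that would be reported as non-standard.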




    Percentage descriptors

    There is a set of semi-standardized terms that convey percentage abundance. Examples are "/<0.5%", "30:50%/", "<10%/".

    They are standardized as follows.

  • A "/" must be placed at the front / back depending on the direction of the object being quantified.
  • A % must come after the percentage numeric.
  • The numeric can be integer or decimal.
  • A range of percent can be conveyed using ":", as in "/30:50%" meaning 30 to 50% range.

  • The smaller numeric comes first.
  • A less-than, greater-than or approximation symbol can be put in front of the numeric: "<33%/"
    The purpose of this standardization is to permit automatic parsing at a later stage. When this is implemented, dictionary entries will not be needed.

    Non-standard arrangements will be reported to the "*.DGN" diagnostics file.

    This syntax is only required in fields that expect word-based data.
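The percentage rules can be checked in the same spirit. This Python sketch is an assumption-laden reading, not dbSEABED's implementation: `parse_percent` and its result layout are hypothetical, "~" is assumed to be the approximation symbol, and exactly one "/" pointer is required.

```python
import re

PERCENT_RE = re.compile(
    r"(?P<front>/)?(?P<symbol>[<>~])?"
    r"(?P<low>\d+(?:\.\d+)?)(?::(?P<high>\d+(?:\.\d+)?))?%"
    r"(?P<rear>/)?"
)


def parse_percent(term):
    """Return the parts of a percentage descriptor, or None if non-standard."""
    match = PERCENT_RE.fullmatch(term)
    # Exactly one "/" is expected: in front when the object lies to the left,
    # at the rear when the object lies to the right.
    if match is None or bool(match["front"]) == bool(match["rear"]):
        return None
    low = float(match["low"])
    high = float(match["high"]) if match["high"] else low
    if high < low:  # the smaller numeric must come first
        return None
    return {"symbol": match["symbol"] or "", "low": low, "high": high,
            "object_side": "left" if match["front"] else "right"}


parsed = [parse_percent(t) for t in ("/<0.5%", "30:50%/", "<10%/")]
```

Each result records the numeric range and on which side of the term the quantified object is expected.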




    Special linguistic structures

    dbSEABED recognizes that different types of studies may use a variety of word and linguistic structures. Structures are introduced that (i) convey the meanings unambiguously and arithmetically, and (ii) keep the terms in their original order in descriptions (this reduces re-formatting work during data entry).

    Example 1 - Quantifier and modifier pointers.   quant1/ object1  object2 /quant2      modif1- object1  object2 -modif2
    In dbSEABED the "/" quantifier-pointer and "-" modifier-pointer structures resolve cases where an object can be quantified or modified by terms to its left or right. Some descriptions employ "forams 25%", others "25% forams".
    The insertion of pointers is easy during the EXCEL stage of data entry.

    Example 2 - Grouped List.    quant/ { object1 + object2 + ... }
    This type of "grouped component" or "{}" structure is usually found in petrology (grain or clast count) data. It can be recognized at the time of data entry (and "{" and "}" properly inserted), or dbSEABED will recognize the structure itself (currently only in a "PET" line).

    Thus, the pet (grain / clast) count:

    ">50% pblszd- crl_shls + pblszd- baslt + pblszd- andest"
    "10-50% cly_trrg"
    "10% pbls_volc + shls"
    "3% macrfna_shls"

    and ...
    "pblszd- crl_shls + pblszd- baslt + pblszd- andest,    >50%"
    "cly_trrg,    10-50%"
    "pbls_volc + shls,     10%"
    "macrfna_shls,     3%"

    will code (manually or automatically) in dbSEABED format to:
    ">50%/ { pblszd- crl_shls + pblszd- baslt + pblszd- andest }"
    "10-50%/ {slt + cly_trrg }"
    "<10%/ { pbls_volc + shls }"
    "3%/ { anml_dbr },,,,,,,,PETend ".
    Grouping is currently only available for Quantifiers, but will be extended eventually to Modifiers.
    Individual terms do not need to be grouped in a PET set where other terms are grouped.
    The un-accounted for remainder in the above description will be assessed as "unknown" and if >5% will cause the parsing in dbSEABED to fail.
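A grouped entry can be pulled apart with a short pattern, and the remainder rule is simple arithmetic once each component has a single percentage. The Python sketch below is illustrative only: `parse_group` and `unknown_remainder` are hypothetical names, and how dbSEABED resolves ranges or ">" values into single numbers is not covered.

```python
import re

# quantifier, "/", then a brace-enclosed list of '+'-separated members
GROUP_RE = re.compile(r"\s*(?P<quant>\S+)/\s*\{(?P<body>[^{}]*)\}\s*")


def parse_group(text):
    """Split 'quant/ { a + b }' into the quantifier and its member terms."""
    match = GROUP_RE.fullmatch(text)
    if match is None:
        return None
    members = [part.strip() for part in match["body"].split("+") if part.strip()]
    return match["quant"], members


def unknown_remainder(percentages):
    """Percentage of the sample not accounted for by the listed components."""
    return 100.0 - sum(percentages)


group = parse_group(">50%/ { pblszd- crl_shls + pblszd- baslt + pblszd- andest }")
```

With single-valued percentages of, say, 60, 30 and 7, `unknown_remainder` returns 3.0, which is under the 5% threshold mentioned above.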

    Example 3 - Proportion of Special Fraction.   object1 // modif2- object2, quant1
    In many grain counts (PET sets) percentages of a grain type are given as a proportion of some other fraction, such as of the coarse fraction (gvl + snd). The "//" structure allows this to be expressed in terms of a PRODUCT object (forward of "//") and QUOTIENT (behind "//"). An example is:

    "1,PET, ,0,,hvy_min // csefrct,1.18"
    "1,PET, ,0,,hvy_min // snd,2.22"
    "1,PET, ,0,,hvy_min // non_carb- snd,1.33"
    "1,PET, ,0,,opq // non_carb- snd,0.665"
    "1,PET, ,0,,opq // hvy_min,/>50"
    "1,PET, ,0,,leucx // hvy_min,12-15%/"

    In some cases the absolute grain proportion of the whole sediment can be worked out. This occurs when the proportion of the quotient is known from before, such as where G:S:M are given in a preceding TXR, GRZ, CMP, SFT or LTH line. For this reason, it is important in dbSEABED datasets to have these lines preceding PET lines; the effectiveness of the data mining is increased.
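The arithmetic behind this can be sketched in a few lines of Python. The sketch assumes that the PET value is a percentage of the quotient fraction and that the quotient's share of the whole sediment is known as a percentage; the 60% sand figure is invented for the example, and the function names are hypothetical.

```python
def split_fraction(term):
    """Split 'product // quotient' into its two sides."""
    product, quotient = (part.strip() for part in term.split("//", 1))
    return product, quotient


def absolute_percent(percent_of_quotient, quotient_percent_of_whole):
    """Convert a percentage of a fraction into a percentage of the whole sediment."""
    return percent_of_quotient * quotient_percent_of_whole / 100.0


product, quotient = split_fraction("hvy_min // snd")
# Assumed for illustration: an earlier line gave sand as 60% of the sediment.
heavy_minerals_in_whole = absolute_percent(2.22, 60.0)
```

In this invented case, heavy minerals at 2.22% of the sand come to about 1.33% of the whole sediment.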




    Check list of attributes

    A shortage of data in a dataset doesn't concern dbSEABED - it simply mines what is available and what is complete enough. Obviously though, time at sea is better spent if the survey datasets are more complete and richer in attributes.
    Here is a checklist of what might be sought - as a minimum.
     
    Lithology: general geological description, component by component
    Texture: grainsize and sorting, if possible with estimates of rock:gravel:sand:mud (or r:g:s:silt:clay) ratios
    Colour: either as Munsell Code (GSA Rock Colour code) or as a visual description (preferably by an observer who is not colour blind)
    Grain composition: grain types with estimated percentages
    Special structures: bioturbation, pellets, voids, lamination, imbrication, rootlets, form of top surface, ripples
    Organics: smell, whether organics are present, voids or bubbles
    Consolidation: describe whether liquid/soft/stiff/hard and whether loose/friable, etc
    Biota: especially shelly remains and living organisms
    Outsize objects: clasts, nodules, shells and other shelly biota; very important for seabed roughness (for acoustics and hydrodynamics)

    A system of initial (top-level) description then second pass (more detailed) description works well and dbSEABED treats both as valid observations. Deck descriptions are more reliable in some ways, for outsize objects, colour and structures.




    Associated Instrumental and analysis data

    On board, observers might have a pocket penetrometer, GSA color chart, microscope, etc; these results can be associated with the descriptions in spreadsheet columns. Later laboratory analyses can be kept separately and digitally merged later with the spreadsheet data.




    Mandatory Lines in Data Files

    Before any data on lithology, texture, strength, etc can be carried through dbSEABED, information has to be provided on the origin of the data. For this reason, "1,SRC" and "1,SFS" lines are mandatory. If they are not present, processing will not produce a result.

    A typical stream of lines is:
     
    Divider just for format style:
    0,FMT,**********************************************************
    Mandatory SRC line:
    1,SRC,AGSORigSeis102,AGSO+USYD,DaveFeary,copyright,notto3rdparty,typein,typein,cruisereport,GAB,19-Jun-91 to 28-Jun-91,May2000
    Metadata:
    0,SRC,AGSO Rig Seismic Cruise Great Australian Bight to prepare for ODP
    0,SRC,Dave Feary Noel James & Gavin Birch
    Describe which fields contain data:
    0,LTH,Top|Bot|FieldLith|FieldConsol|FieldTexr|Layering|GeoAge|SedEnvir|Colour|Sorting||GrnChar|FreeFormDescn|Altrn/Wthrng|Munsell|C14Date||TrcFoss
    0,PET,Top|Bot|GrainTyp|%GrainTyp|GrainTypSize|||GrnShape|GrnSortng|GRnCharDescn
    Mandatory SFS line:
    1,SFS,102 DR 03,-35.279,130.752,4180,,4180,3660,,,4.00 kg (~10kg in rept),,24-Jun-91,17:20,W-central Ceduna Terrace
    Metadata noting author's rock type in dredge:
    0,LTH,Lith A
    Actual attribute data describing the seafloor:
    1,LTH,,,mudstn wi/ vthn/ mn_ctng,,,outr_surfc bord,CAMPAN to MAASTR,?/ estuarn,vdk- gry to blk,,,,,,10YR3/1 to 10YR2/1
    1,PET,,,mud // mudstn,almst_whly/
    New type of rock in same dredge
    0,LTH,Lith B
    1,LTH,,,sltstn_cbl wi/ :5mmthk fe_ri_minlyr,wl_cnsldtd,,unfrm- txtr & colr_varns,,?/ estuarn,grysh brn,,,,,,10YR5/3,,,astersma + chondrts + planolts
    1,PET,,,pyrt // sltstn,mnr/
    New site:
    1,SFS,102 DR 03 (PD 2),-35.279,130.752,4180,,4180,3660,,,2.00 kg,,24-Jun-91,17:20,W-central Ceduna Terrace
    Metadata noting author's additional informal dredge name
    0,LTH,Pipe dredge 2
    This site's attributes:
    1,LTH,,,mud wi/ orng fe_strks,,,,CAMPAN to MAASTR,,vdk- grysh brn,,,,,,10YR3/2
    1,PET,,,nan // mudstn,brrn_of/
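Before submitting a file, the presence of the mandatory lines can be checked with a few lines of Python. This is a sketch under simple assumptions: lines are comma-separated with the level number and the three-letter code in the first two fields, and the name `missing_mandatory` is hypothetical.

```python
def missing_mandatory(lines, required=("SRC", "SFS")):
    """Return the mandatory codes that never occur as a '1,XXX' data line."""
    present = set()
    for line in lines:
        fields = line.strip().split(",")
        # Only level-1 lines count; "0,SRC" lines are metadata.
        if len(fields) >= 2 and fields[0].strip() == "1":
            present.add(fields[1].strip())
    return [code for code in required if code not in present]


sample = [
    "0,SRC,AGSO Rig Seismic Cruise Great Australian Bight to prepare for ODP",
    "1,SRC,AGSORigSeis102,AGSO+USYD,DaveFeary",
    "1,LTH,,,mud wi/ orng fe_strks",
]
missing = missing_mandatory(sample)
```

For this shortened sample the result is `["SFS"]`, because no "1,SFS" line is present.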




    When to use "1,CMP", "2,LTH", "3,PET", etc.?

    Use "1,XXX," when the sample is from the seabed surface.
    Use "2,XXX," when the sample comes from the subsurface.
    Use "3,XXX," etc when the sample is from within some segregation such as a mottle or clast. That information can be in the layer/structure field of "3,LTH" or as "0,LTH" metadata.
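The leading number can be read off a line with a small Python helper. The sketch below assumes that level 0 marks metadata or formatting lines, as in the example stream of the previous section, and it treats any other number as unrecognized; the name `sample_position` and the wording of the labels are hypothetical.

```python
LEVELS = {
    0: "metadata",
    1: "seabed surface",
    2: "subsurface",
    3: "segregation (e.g. mottle or clast)",
}


def sample_position(line):
    """Describe where a sample comes from, based on the line's leading number."""
    first_field = line.split(",", 1)[0].strip()
    try:
        return LEVELS.get(int(first_field), "unrecognized")
    except ValueError:  # the first field is not a number
        return "unrecognized"


positions = [sample_position(l) for l in ("1,CMP,,", "2,LTH,,", "3,PET,,", "0,LTH,Lith A")]
```

Each line is labelled by its first field alone; the three-letter code that follows does not affect the result.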






    Chris Jenkins (Email)
    INSTAAR, University of Colorado
    17 June 2004