Explanation
The state of vocabularies for earth landscape materials - geomaterials - is
chaotic. Over 10,000 different names are thought to exist, and the fields
of geology, geomorphology, pedology and agriculture, foundations
engineering, cryology and glaciology, marine geology, benthic habitats,
and coastal survey each have their own vocabularies. If a structured
vocabulary could be made, it would open very large opportunities for data
mining and data integration across the single issue of earth surface
materials.
The vocabulary was compiled from multitudes of glossaries, dictionaries,
thesauri, schemas and data models. The sources defined or described rock
lithologies, sediments, soils, fluids, landscapes and habitats, and ice
formations. Over 3600 terms are represented from the 18 different but
overlapping linguistic corpora.
The motivations were several, to: (i) have a resource which could be
used to identify documents and datasets relevant to the geosciences,
particularly in the detection of 'dark data', (ii) organize
geomaterials terms as a semantic net, identifying the similarity and
hierarchical relationships between their concepts, (iii) investigate
whether a semantic approach to lithologies could improve the way
dbSEABED handles word-based data.
On the last of these, the dictionary for dbSEABED now holds over 15,000
terms (including cliched phrases) and is becoming unwieldy. There is
potential to automate the dictionary and its processing with methods such
as Natural Language Processing (NLP) and WordNet. However, the results
need to be of very high reliability because dbSEABED is used for
real-world decision-making and risk assessments.
Served Items
a. Documentation of latest developments using lexical, nomenclatural,
statistical methods to mine for vocabulary, structure (taxonomy), and
ontology. ["http://instaar.colorado.edu/~jenkinsc/dbseabed/resources/geomaterials/GeomaterialsVocab.pdf"]
b. A zipfile of some of the data products as explained in the documentation.
["http://instaar.colorado.edu/~jenkinsc/dbseabed/resources/geomaterials/GeomaterialsVocab.zip"]
Send queries or comments to "chris.jenkins colorado.edu".
Continuation
This work is continuing on a collaborative basis to extend the vocabulary, deepen the structuring, and extend the methods.
Graphviz compilation of the geomaterials terms from the core 'Keystone'
corpus plus the WMO Sea-Ice glossary and the International Permafrost
Glossary. This run of the software was assisted by the NSIDC, Boulder,
CO, USA.
Above: the entire network, including orphan terms. Below: a close-up of part of the network.
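As a rough illustration of how a term network with orphan terms might be prepared for Graphviz, the following minimal sketch writes a small set of terms and broader-term links as DOT text. It is not the project's software: the sample terms, the `BROADER` pairs, and the function names `to_dot` and `orphan_terms` are all assumptions made for this example, and the real data products may be structured differently.

```python
# Illustrative sample only; these terms and links are assumptions for the
# example and are not taken from the geomaterials data products.
TERMS = [
    "basalt", "granite", "igneous rock", "rock",
    "pancake ice", "sea ice", "shell hash",
]

# (narrower term, broader term) pairs forming the hierarchical links.
BROADER = [
    ("basalt", "igneous rock"),
    ("granite", "igneous rock"),
    ("igneous rock", "rock"),
    ("pancake ice", "sea ice"),
]


def to_dot(terms, broader_pairs):
    """Return Graphviz DOT text: one node per term, one edge per link."""
    lines = ["digraph geomaterials {", "  rankdir=LR;"]
    # Declaring every term as a node keeps orphan terms in the drawing.
    for term in sorted(terms):
        lines.append(f'  "{term}";')
    for narrower, broader in broader_pairs:
        lines.append(f'  "{narrower}" -> "{broader}";')
    lines.append("}")
    return "\n".join(lines)


def orphan_terms(terms, broader_pairs):
    """Return the terms that take part in no link at all."""
    linked = {term for pair in broader_pairs for term in pair}
    return sorted(term for term in terms if term not in linked)


if __name__ == "__main__":
    print(to_dot(TERMS, BROADER))
    print("Orphans:", orphan_terms(TERMS, BROADER))
```

The DOT text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng terms.dot -o terms.png`. Listing every term as a node before adding edges is what lets unlinked terms appear in the full-network view.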