Metadata Arrangements
Contents
Introduction | Commentary MetaData | MetaData and Data | Run-time Reports from the Data Mining
Metadata is important to datasets and is often the key to unravelling issues and ambiguities that arise later. dbSEABED works on the philosophy that Metadata is as important as data and should be actively used in the same way. It also holds that each informs and is necessary to the other, so the two should be very closely associated.
dbSEABED holds MetaData either in formatted, data-mineable Active Metadata form (e.g., sediment sampler type, dates) or as free-form comments, called Commentary Metadata, which add detail to some other line of data. Examples of Active Metadata are the SRC theme line and many of the entries in the SFS theme. The details of Active Metadata are described under those themes and others.
The structures adopted for Metadata in dbSEABED have been devised to convey metadata efficiently and accurately while keeping data entry highly efficient.
In the Data Resource Files, Commentary Metadata is placed immediately alongside the data it refers to. It provides many different types of information, on such aspects as: the sources of data, sampling and analytical procedures, edits and changes made to the data when it is brought into dbSEABED, and information not immediately useful in dbSEABED but included for completeness.
The db9 version identifies this Commentary Metadata by Theme (e.g., GRZ, LTH, SFS) and Currency (e.g., applying to a site, a whole dataset, or an analysis). The theme is given in field 2 of a metadata line and the currency in field 3, whose entry is marked like an HTML or XML tag ("<>"). For example:
0,LTH,<dataset>,FieldPattern=;;;Lithology;;;;;Environment
0,SFS,<site>,Next site: Recovery would have been 3990cm if bottom 1370cm not lost from barrel;
0,SFS,<site>,….snapped off cut elect. Cable
The commentary form of MetaData is not dealt with in the Data Mining, but is output to Relational Database structures generated by "db9_RDB".
Commentaries are labelled as Data Type "0". For example, "0,COL,<rest_of_dataset>,Oops colour chart fell over board !" is a commentary on progress in logging sediment colours, with the comment applying to the remainder of that dataset. (This is an actual comment from one dataset.)
The Metadata structure is:
Data Type | Flag for "MetaData" | Always "0" |
Record Type | What theme does the Metadata apply to? | "XXX" where XXX is any other Data Type such as SRC, SFS, LTH, TXR, PET, GTC, OCE, etc. (Also including "FMT" for neutral formatting). |
Currency | Extent of application of the commentary; when does it expire? | Recognized forms: <file>, <dataset>, <data_subset>, <site>, <sample>, <analysis>, <component> and variations with <next_....> and <rest_of_....>; <> is neutral and used with "0,FMT" formatting. If users wish to have additional forms, make a request. Currency tags should not include blanks; use underscore instead. |
Comment | The MetaData Comment | Any text without commas included; preferably <256 characters long; may be split into successive lines; use a " ", ":", ";" or "/" instead of commas. |
If a MetaData Commentary line precedes the line it refers to, its currency should be of the form <next_...>.
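The four-field structure described above can be illustrated with a small parser. This is only a sketch, not the db9 code: the names `Commentary` and `parse_commentary` are hypothetical, and the handling of malformed lines (returning `None` or raising `ValueError`) is an assumption made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Commentary(NamedTuple):
    """One commentary metadata line (hypothetical container, not part of db9)."""
    theme: str     # field 2, e.g. "SFS" or "FMT"
    currency: str  # field 3 without the angle brackets; "" for the neutral <>
    comment: str   # field 4, the free-form comment


def parse_commentary(line: str) -> Optional[Commentary]:
    """Return a Commentary for a Data Type "0" line, or None for any other line."""
    # Split into at most four fields; the comment itself should hold no commas.
    fields = line.rstrip("\n").split(",", 3)
    if len(fields) < 4 or fields[0].strip() != "0":
        return None
    tag = fields[2].strip()
    # Currency is written like a tag and must not contain blanks.
    match = re.fullmatch(r"<([^<>\s]*)>", tag)
    if match is None:
        raise ValueError(f"currency field is not a <tag> without blanks: {tag!r}")
    return Commentary(fields[1].strip(), match.group(1), fields[3].strip())
```

Applied to the colour-chart example, `parse_commentary` would give the theme `COL`, the currency `rest_of_dataset`, and the comment text as the fourth field.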
The currency of the various tags is defined as follows.
<file> To the whole of the RTF file that the tag is included in.
<dataset> To the whole of the dataset delimited between its SRC line and the succeeding SRC line.
<data_subset> To a portion of a dataset (as defined above), and expiring with the next SRC theme line.
<site> To all of the information associated with the SFS line that this tag is associated with.
<sample> To all of the information for the sample that this tag is associated with, the sample being defined by a combination of Phase/Top/Bot.
<analysis> To one data item, such as the carbonate analysis for a sample, or to an entire GRZ analysis set.
<component> To one fraction, grain type or structure within a sample, for example one line of a GRZ or PET sequence.
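As a rough illustration of how these currencies and their <next_...> and <rest_of_...> variations might be handled in code, the sketch below splits a currency name into a modifier and a base scope. The function name `split_currency` is hypothetical, and it assumes that whatever follows `next_` or `rest_of_` is one of the seven base names listed above, which the text does not state explicitly.

```python
from typing import Tuple

# Base currency names taken from the definitions above.
BASE_CURRENCIES = (
    "file", "dataset", "data_subset", "site", "sample", "analysis", "component",
)


def split_currency(currency: str) -> Tuple[str, str]:
    """Split a currency name (without angle brackets) into (modifier, base).

    modifier is "", "next" or "rest_of"; base is one of BASE_CURRENCIES,
    or "" for the neutral <> tag used with "0,FMT" lines.
    """
    if currency == "":
        return ("", "")
    modifier, base = "", currency
    for prefix in ("next_", "rest_of_"):
        if currency.startswith(prefix):
            modifier, base = prefix.rstrip("_"), currency[len(prefix):]
            break
    if base not in BASE_CURRENCIES:
        # Assumption: unknown forms are rejected rather than passed through.
        raise ValueError(f"unrecognized currency: {currency!r}")
    return (modifier, base)
```

A reader of the Data Resource Files could use the modifier to decide whether a comment attaches to the following line (`next`), to everything up to the end of the scope (`rest_of`), or to the current item.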
What are the criteria for deciding whether information should be held as data or as metadata, and as active or commentary metadata?
The dbSEABED data mining program writes reports of errors, warnings, and bad or suspect values to a file "***_MNE.dgn". This file can be linked relationally to the other output files via site and sample number keys. It is also written so that, when opened in Excel, the issues encountered during processing can be worked through easily.
The fields in ***_MNE.dgn files are:
RUNrept$ | A program-generated report of progress in the processing. |
ERRrept$ | A program-generated report of a serious issue encountered during the run. It may be fatal to the processing of that data item, or merely a warning. |
PRMrept$ | Field to hold any parameters that might be put out with RUNrept$ or ERRrept$ |
SiteNUM | Relational-style Foreign Key |
SampleNUM | Relational-style Foreign Key |
Dataset$ | (Echoed from SRC) The Dataset name |
Site$ | (Echoed from SFS) The Site name |
Phase$ | (Echoed from data themes) Phase descriptor |
Top$ | (Echoed from data themes) Top location |
Bot$ | (Echoed from data themes) Bot location |
Line$ | (Echoed from data themes) The whole data line, with commas transformed to pipes ('|') |
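The following sketch shows one way the reports might be grouped by their site and sample keys outside of Excel. It rests on assumptions that are not stated here: that the .dgn file is comma-delimited and that its first row carries the field names listed above. The function names `restore_line` and `group_by_sample` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def restore_line(line_field: str) -> str:
    """Turn a Line$ value back into a comma-separated data line.

    Note: any pipe that was present in the original data line would
    also become a comma.
    """
    return line_field.replace("|", ",")


def group_by_sample(dgn_text: str) -> Dict[Tuple[str, str], List[dict]]:
    """Group report rows by their (SiteNUM, SampleNUM) keys.

    Assumes comma-delimited text with a header row of field names.
    """
    reader = csv.DictReader(io.StringIO(dgn_text))
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for row in reader:
        groups[(row["SiteNUM"], row["SampleNUM"])].append(row)
    return dict(groups)
```

Grouping on the two key fields mirrors the relational linkage described above, so all reports for one sample can be reviewed together with the echoed data line.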
Chris Jenkins
INSTAAR, University of Colorado
13-Aug-2002