RAMSES - DataFrames

	RAMSES	Systems Ecology
Start Modules: A-Z Function Layer QuickRefs: DM AuxLib AuxLibE SciLib EasyMW MW ISIS RMSLib
DEFINITION MODULE DataFrames;

  (*******************************************************************

    Module  DataFrames     (DF_Version_2.2)

      Copyright (c) 1997-2006 by Andreas Fischlin, Dimitrios Gyalistras
      and ETH Zurich.

    Purpose   Core routines to work with Data Frames
              (see also module DatFraAux for more advanced
              routines).  Data Frames are read from file(s) and
              stored in memory for analysis and subsequent
              retrieval.

    Remarks   Current implementation assumes any number of
              data frames (spread over any number of files via
              so-called file references) can be read into memory
              for subsequent retrieval.  Actual data are always
              specified in a tabular form.  The syntax of an
              input file is as follows:

              DataFrameFile        =  (FileReference | DataFrame)
                                      {FileReference | DataFrame}.
              FileReference        =  "FILE" "=" fileName ";" [FilterSpecif].
              fileName             =  STRING.
              FilterSpecif         =  "USE" "IF" "FILTER" "=" filterVal ";".
              filterVal            =  LONGINT.
              DataFrame            =  "DATAFRAME" dataFrameIdent ";"
                                      [DataFrameParamList]
                                      "DATA" ":" Table
                                      "END" dataFrameIdent ";".
              dataFrameIdent       =  IDENTIFIER.
              DataFrameParamList   =  { [FilterSpecif] | [ParentOrModelSpecif] |
                                        [RemarkSpecif] | [KeyColumnSpecif] }.
              ParentOrModelSpecif  =  ("PARENT"|"MODEL") "=" parentOrModelID ";".
              parentOrModelID      =  IDENTIFIER | "ANY" | "ALL".
              RemarkSpecif         =  "REMARK" "=" STRING ";".
              KeyColumnSpecif      =  "KEYCOLUMN" "=" keyColumnID ";".
              keyColumnID          =  IDENTIFIER.
              Table                =  TableHeader TableLine {";" TableLine}.
              TableHeader          =  IDENTIFIER {IDENTIFIER} ";".
              TableLine            =  TableEle {TableEle}.
              TableEle             =  (INTEGER | LONGINT | REAL | LONGREAL |
                                      IDENTIFIER | STRING | BOOLEAN).

              Syntax of the elementary data types is (regular expression notation):

                  INTEGER      =  [+-]?[0-9]+
                  LONGINT      =  [+-]?[0-9]+ "D"
                  REAL         =  [+-]?[0-9]+ "." [0-9]+ (("E"|"e")[+-]?[0-9]+)?
                  LONGREAL     =  [+-]?[0-9]+ "." [0-9]+ (("D"|"d")[+-]?[0-9]+)?
                  IDENTIFIER   =  [a-zA-Z] [_a-zA-Z0-9]*
                  STRING       =  ('.*')|(".*")
                  BOOLEAN      =  "TRUE" | "FALSE"

              Comments start with "(*" and close with "*)" and may be nested.

              Examples:

                DATAFRAME SiteBatch; MODEL = ForClim;
                DATA: SiteName; "Bern"; "Davos"; "Bever_S"
                END SiteBatch;

              or

                DATAFRAME ModelParameters;
                  REMARK = 'For any logistic growth model';
                  MODEL = ANY;
                  KEYCOLUMN = Ident;
                DATA:
                (*===========================================================*)
                   Ident  Descr                   val   min    max    unit    ;
                (*-----------------------------------------------------------*)
                   r      'Relative growth rate'  0.7   0.0    10.0   'd^-1'  ;
                   K      'Carrying capacity'   700.0   0.0 1.0E+38   'g/m^2' ;
                (*-----------------------------------------------------------*)
                END ModelParameters;

              Filter specifications serve as a reading filter to
              control conditional recognition of file references or
              data frames.  Only those file references or data frames,
              respectively, are scanned for content and loaded into
              memory, whose filter value (see symbol filterVal in EBNF
              above) actually fits within the filter range
              [fromFilter..toFilter] passed as actual arguments to the
              reading routines ReadDataFramesIntoMemory or
              LoadDataFrames (see below).  Otherwise they are entirely
              skipped and the scanning process continues after the
              file reference or data frame, respectively.  Any file
              reference or data frame without a filter specification
              (optional symbol FilterSpecif, see EBNF above) will be
              treated to have a filter specification like "USE IF
              FILTER = 0 (* = defaultFilter *)".  To recognize, i.e.
              load, all file references and data frames regardless of
              their filter specifications, pass the range
              [minFilter..maxFilter] to ReadDataFramesIntoMemory or
              LoadDataFrames, respectively.

              Note, this module does not export an abstract data
              type DataFrame.  See module DatFraAux instead, which
              exports this type and allows to access and operate on
              individual data frames via its routines.


    NOTE      This module uses package DataTables (all data tables
              are used in locked mode only).


    Programming

      o Design
        Andreas Fischlin          04/03/1997
        Dimitrios Gyalistras      23/07/1997

      o Implementation
        Andreas Fischlin          04/03/1997


    ETH Zurich
    Systems Ecology
    CHN E 35.1
    Universitaetstrasse 16
    8092 Zurich
    SWITZERLAND

    URLs:
        <mailto:RAMSES@env.ethz.ch>
        <http://www.sysecol.ethz.ch>
        <http://www.sysecol.ethz.ch/SimSoftware/RAMSES>


    Last revision of definition:  05/06/1998  AF

  *******************************************************************)


  (********************************)
  (*#####   Error Handling   #####*)
  (********************************)

  FROM DMLanguage IMPORT userBase;


  CONST
    dtfOffset = userBase + 200;           (* constants used for resCode *)
    badLoadFromAnchor     = dtfOffset + 0;  (* internally used by LoadDataFrames *)
    loadUserInterrupted   = dtfOffset + 1;  (* internally used by LoadDataFrames *)
    integerOverflow       = dtfOffset + 2;  (* internally used by VI,VIVec *)
    realOverflow          = dtfOffset + 3;  (* internally used by VR,VRVec *)
    datFraHasNoKeyCol     = dtfOffset + 4;  (* internally used by GetValDefsKeyColumn *)
    subKeyIsRealOrBoolCol = dtfOffset + 5;  (* internally used by GetValDefsSubKeyColumn *)
    (* Note: Error #'s 500..550 (i.e.  userBase + 200..userBase +
    250) are reserved for the entire package DataFrames.  The
    underlying DataTables uses range 600..650.  The DataFrames
    package uses only module Errors and its modes for error
    generation and display.  WARNING: In case the mode is
    suppressing and deferring error messages, you won't learn
    about any errors which have possibly occurred.  See also
    module Errors.  *)



  (*************************************************)
  (*#####   Loading Data Frames Into Memory   #####*)
  (*************************************************)


  TYPE
    ReadingFilter = LONGINT;

  VAR
    defaultFilter: ReadingFilter;       (* READ ONLY *)
    minFilter: ReadingFilter;           (* READ ONLY *)
    maxFilter: ReadingFilter;           (* READ ONLY *)



  PROCEDURE ReadDataFramesIntoMemory (startOnFileWithName: ARRAY OF CHAR;
                                      fromFilter,
                                      tillFilter: ReadingFilter;
                                      VAR resCode: INTEGER);
  PROCEDURE LoadDataFrames (VAR startOnFileWithName: ARRAY OF CHAR;
                            fromFilter, tillFilter: ReadingFilter;
                            VAR resCode: INTEGER);
    (*
      Above procedures use the file with name startOnFileWithName
      (may contain a path) as an anchor to scan and interprete
      data frames.  Any value definitions found during the
      reading process are transferred to memory and can
      subsequently be retrieved by any of the value definition
      retrieval routines described below (e.g.  VR or GetRVec).

      Any file reference or data frame whose filter specification is
      out of range [fromFilter..tillFilter] will be ignored and
      skipped by the scanning process.  Note, file references or data
      frames which do not specify its filter explicitely, i.e.  those
      without the phrase "USE IF FILTER = x" are treated as if they
      would have the specification "USE IF FILTER = defaultFilter (* =
      0 *)".  To load without any filtering pass the range
      [minFilter..maxFilter] to above routines.

      Any data frames already in memory with the same identifier as
      a data frame encountered during the reading process will be
      discarded and replaced (overwritten) with the content of the
      newly read data frame.  This allows for a partial or total
      updating of value definitions or loading of additional data
      frames etc.  See also procedure DropAllDataFrames for freeing
      memory and module DatFraAux for more advanced operations on
      data frames.

      LoadDataFrames is very similar to ReadDataFramesIntoMemory; in
      fact, it calls the same core function as ReadDataFramesIntoMemory,
      but offers additional features like offering a file opening dialog
      in case that startOnFileWithName is the empty string, displays the
      progress by showing in a small window which file is corrently
      processed, and features an automatic error display in case that the
      anchor file can't be accessed properly or ReadDataFramesIntoMemory
      returns with resCode<>Errors.allOk.  startOnFileWithName is also an
      in/out parameter which allows to learn about which file has
      actually been opened as anchor.  If the file opening fails for
      whichever reason, the startOnFileWithName returns untouched.

      All currently installed handlers for a loading event (see module
      DatFraAux, routine AddAnyDFChangedHandler) will always be called
      at the end of ReadDataFramesIntoMemory.
    *)




  (**********************************************)
  (*#####   Retrieving Value Definitions   #####*)
  (**********************************************)

  TYPE
    ALPHA = ARRAY [0..63] OF CHAR;    (* also maximum size of identifiers *)
    ValDefType = (undefined, integer, longint, singlereal, doublereal, boolean,
                  identifier, string);
    (*
      Types of value definitions supported by data frames.
      undefined denotes a value definition which has never been
      made, i.e.  the associated identifier has never been
      found in any data frames by previous calls to
      ReadDataFramesIntoMemory (see also procedure GetValDefType).
      Note, in contrast to a general string an identifier
      follows a particular syntax and starts with a letter and
      contains no special characters, i.e.  its EBNF is:

        identifier ::= letter {letter|digit|"_"}.
    *)


  VAR
    undefInteger: INTEGER;              (* READ ONLY; = MIN(INTEGER)+1 *)
    undefLongInt: LONGINT;              (* READ ONLY; = MIN(LONGINT)+1 *)
    undefReal: REAL;                    (* READ ONLY; = DMConversions.UndefREAL() *)
    undefLongReal: LONGREAL;            (* READ ONLY; = DMConversions.UndefLONGREAL() *)
    undefBoolean: BOOLEAN;              (* READ ONLY; = BOOLEAN(ORD(MAX(BOOLEAN))+1) *)
                                        (* WARNING: undefBoolean may be TRUE or FALSE in a
                                        compiler dependent manner. *)
    undefString: ARRAY [0..0] OF CHAR;  (* READ ONLY; undefString[0] = 0C *)
    (*
       Most above constants are the defaults also used by DataTables.  They
       are returned by the value definition retrieval procedures in case the
       value definition could not be found or an error occurred, like a number
       overflow.
     *)

  (*--------------------------------*)
  (*=====   scalar retrieval   =====*)
  (*--------------------------------*)

  PROCEDURE VI    (ident: ARRAY OF CHAR): INTEGER;
  PROCEDURE VLI   (ident: ARRAY OF CHAR): LONGINT;
  PROCEDURE VR    (ident: ARRAY OF CHAR): REAL;
  PROCEDURE VLR   (ident: ARRAY OF CHAR): LONGREAL;
  PROCEDURE VB    (ident: ARRAY OF CHAR): BOOLEAN; (* returns undefBoolean if undefined *)
  PROCEDURE Flag  (ident: ARRAY OF CHAR): BOOLEAN; (* returns FALSE if undefined *)
  PROCEDURE GetVS (ident: ARRAY OF CHAR;  VAR s: ARRAY OF CHAR);
  PROCEDURE GetVId(ident: ARRAY OF CHAR;  VAR id: ARRAY OF CHAR);


  (*--------------------------------*)
  (*=====   vector retrieval   =====*)
  (*--------------------------------*)

  PROCEDURE GetIVec (ident: ARRAY OF CHAR;
                     VAR ivec: ARRAY OF INTEGER; VAR n: INTEGER);
  PROCEDURE GetLIVec(ident: ARRAY OF CHAR;
                     VAR livec: ARRAY OF LONGINT; VAR n: INTEGER);

  PROCEDURE GetRVec (ident: ARRAY OF CHAR;
                     VAR rvec: ARRAY OF REAL; VAR n: INTEGER);
  PROCEDURE GetLRVec(ident: ARRAY OF CHAR;
                     VAR lrvec: ARRAY OF LONGREAL; VAR n: INTEGER);

  PROCEDURE GetBVec(ident: ARRAY OF CHAR;
                    VAR bvec: ARRAY OF BOOLEAN; VAR n: INTEGER);

  PROCEDURE GetSVec(ident: ARRAY OF CHAR;
                    VAR svec: ARRAY OF ALPHA; VAR n: INTEGER);
  PROCEDURE GetIdVec(ident: ARRAY OF CHAR;
                     VAR idvec: ARRAY OF ALPHA; VAR n: INTEGER);

  PROCEDURE GetSs  (ident: ARRAY OF CHAR; sepChar: CHAR;
                    VAR ssvec: ARRAY OF CHAR; VAR n: INTEGER);
  PROCEDURE GetIds (ident: ARRAY OF CHAR; sepChar: CHAR;
                    VAR idvec: ARRAY OF CHAR; VAR n: INTEGER);
    (*
      Above procedures allow to fetch value definitions as
      contained in all data frames currently stored in memory.

      Basic methods of retrieving value definitions
      ---------------------------------------------
      Value definitions are given by an identifier consisting of
      2 (vector retrieval) or 3 parts (scalar retrieval), where
      parts are delimited by ".".  Ex.:

        "MyModel.Bern.Bucketsize"  - scalar retrieval returns single value
        "MyModel.Bucketsize"       - vector retrieval returns vector of n values

      Formation of ident
      ------------------
      The identifier ident denotes a particular value definition,
      the procedures return the corresponding value (in case of
      vector retrieval additional information on the dimension of
      the vector is also returned). In case of scalar retrieval,
      the middle part needs not be an identifier, but can be
      any string and may even contain a period.  Ex.:

        good idents for value definitions:

         "ForClim.Bern.Beech_species"
         "FC_C.St. Gallen.Bucketsize"

        bad idents for value definitions:

         "ForClim.Bern.Beech specs."
         "FC-C.St. Gallen.Bucketsize"

        Note: There are auxiliary routines (ConcatIdent and
        SplitIdent) available from this module for construction
        and analysis of ident.

      The first part of ident is always formed exactly the same
      way: It is either the identifier of the data frame (see also
      routine DatFraAux.GetDataFrameIdent) or it is the parent or
      model identifier, respectively.  Any data frame may use a so
      called parent or model specification (see EBNF-production
      ParentOrModelSpecif), hereby using a common identifier for
      value definitions.  This allows to construct large sets of
      value definitions denoted by a common first part from
      several data frames.  Ex.:

          DATAFRAME SiteEdaphics; MODEL = ForClim; Data:
          (*-------------------------------------*)
             Site           Bucketsize    SiteID;
          (*-------------------------------------*)
             "Bern"         30.0          333333  ;
             "Bever S"      20.0          222222  ;
             "Davos"        25.0          666666  ;
          END SiteEdaphics;

          DATAFRAME TreeParams; MODEL = ForClim; KEYCOLUMN = SpecIdent; DATA:
          (*-------------------------------------------------------------*)
             SpecIdent      Scientific_name        Common_name;
          (*-------------------------------------------------------------*)
             Fsil           "Fagus silvatica L."   "European beech"       ;
             Aalb           "Abies alba Mill."     "European silver fir"  ;
          END TreeParams;

      Above data frames allow for accessing value definitions,
      which all start with an identical first part, i.e.  idents
      like "ForClim.Site", "ForClim.BucketSize", or
      "ForClim.Fsil.Scientific_name" (see also following
      explanations).


      The remainder parts of ident are constructed in two different
      ways, depending on the retrieval method:

        Retrieval method    Examples of ident
        ----------------    -------------------------
        scalar retrieval    "MyModel.ThirdRowEntry.MyColumnHead"
                            "SiteEdaphics.Bern.BucketSize"
        vector retrieval    "MyModel.MyColumnHead"
                            "SiteEdaphics.BucketSize"

      IMPORTANT NOTE: Scalar retrieval is only possible if the data
      frame uses a keyColumn.  Vector retrieval is always possible.

      In case of scalar retrieval, the first part of ident is
      followed by a '.' and the identifier of a particular column.
      Such an ident always denotes an entire vector of values.  The
      tabulated number of rows defines its dimension n.  E.g.  the
      following data frame

          DATAFRAME SiteEdaphics1; DATA:
          (*-------------------------------------*)
             Site           Bucketsize    SiteID;
          (*-------------------------------------*)
             "Bern"         30.0          333333  ;
             "Bever S"      20.0          222222  ;
             "Davos"        25.0          666666  ;
          END SiteEdaphics1;

      allows to retrieve all bucket sizes by code similar to this

        VAR bs: ARRAY [0..maxNoSites] OF LONGREAL;
        ...
        GetRVec("SiteEdaphics1.Bucketsize",bs,n);
        IF n<=maxNoSites THEN
          (* use bs *)
        ELSE
          Errors.Info(...
        END(*IF*);

      Similarily the corresponding site identifiers could be retrieved by

        VAR ss: ARRAY [0..maxNoSites] OF ALPHA;
        ...
        GetSVec("SiteEdaphics1.Site",ss,n);

      In case of scalar retrieval (a keyColumn is in use), the
      remainder parts of the identifier is built as follows: Again
      first part is followed by a '.', but then comes the string
      found in the particular row of the column headed by
      keyColumn, another '.', plus the identifier of the column.
      E.g.  the following data frame lists in the key column Site
      strings; each of these can be used to denote a particular row.


          DATAFRAME SiteEdaphics2; KEYCOLUMN = Site;
          DATA:
          (*-------------------------------------*)
             Site           Bucketsize    SiteID;
          (*-------------------------------------*)
             "Basel"        30.0          111111  ;
             "Bern"         30.0          333333  ;
             "Bever S"      20.0          222222  ;
             "St. Gotthard" 20.0          444444  ;
             "Bever_N"      15.0          555555  ;
             "Davos"        25.0          666666  ;
             "Bern"         25.0          333333  ;
          END SiteEdaphics2;

      Above data frame allows to retrieve individual bucket sizes
      (Bucketsize) or site identification numbers (SiteID) as
      follows

        myBucketSize := VR("SiteEdaphics2.Basel.Bucketsize");
        theSiteID := VI("SiteEdaphics2.Davos.SiteID");

      or

        GetSVec("SiteEdaphics2.Site",ss,n);
        IF n<=maxNoSites THEN
          FOR i := 0 TO n-1 DO
            ConcatIdent("SiteEdaphics2",ss[i],"Bucketsize",id);
            myBucketSize[i] := VR(id);
          END(*FOR*);
        ELSE
          Errors.Info(...
        END(*IF*);

      To allow for model specific retrieval above header of the
      data frame would need the additional MODEL clause similar
      to this:

          DATAFRAME SiteEdaphics2; KEYCOLUMN = Site;
          MODEL = ForClimS; DATA:
          (*-------------------------------------*)
             Site           Bucketsize    SiteID;
          (*-------------------------------------*)
             "Basel"        30.0          111111  ;
             ...
             ...

      allowing for retrievals similar to this

          myBucketSize := VR("ForClimS.Basel.Bucketsize");

      above retrieval would also work if the headers of data
      frame SiteEdaphics2 would contain the reserved word ANY or
      ALL, i.e. "MODEL = ANY" or "MODEL = ALL".




      Uniqueness of Value Definitions
      -------------------------------
      ASSERTION: Since the reading process of data frames unifies
      any value definitions referring to the same value, the
      access of a particular value definition via its identifier
      ident and its expected type is unique at all times.

      Note, in the above example for data frame SiteEdaphics2,
      which contains 2 entries for "Bern", above assertion becomes
      relevant. Module DataFrames always unifies all data definitions
      encountered by treating any conflicts like redifinitions and
      overwriting a particular value definition with the last found
      during reading, i.e.  VR("SiteEdaphics2.Bern.Bucketsize")
      returns in the above example 25.0 and not 30.0.  To learn
      about the fact of multiple definitions use procedure
      DatFraAux.InspectValDef (returned value of nrOfDefs).

      Note also, in case the same identifier denotes value
      definitions of different types, they can still be retrieved
      in a unique way.  E.g.

          DATAFRAME SiteEdaphics3; MODEL = ForClim;
          KEYCOLUMN = Site; DATA:
          (*-------------------------------------*)
             Site           Bucketsize            ;
          (*-------------------------------------*)
             "Bern"            30.0               ;
             "Bever S"         20.0               ;
             "Davos"           25.0               ;
          END SiteEdaphics3;

          DATAFRAME SiteEdaphics4; MODEL = ForClim;
          KEYCOLUMN = Site; DATA:
          (*-------------------------------------*)
             Site           Bucketsize            ;
          (*-------------------------------------*)
             "Bern"            high               ;
             "Bever S"         low                ;
             "Davos"           medium             ;
          END SiteEdaphics4;

      In above example you can either call
      VR("ForClim.Bern.Bucketsize") or you can call
      GetVS("ForClim.Bern.Bucketsize", bucketSizeCode)
      successfully.  The result is different and lets you
      distinguish between the two types of value definitions.
      However, within a particular type, the identifier denotes
      uniquely a single value definition without ambiguity.

      Important note: Be aware that for the sake of simplicity,
      the following types (see ValDefType) are merged and columns
      of one of these types is treated as the same:

         integer with longint,
         singlereal with doublereal,

      This means that a real value definition for a given
      identifier is uniqe within all present real types, i.e.
      regardless wether a particular column is of type singlereal
      or doublereal etc.  Note also this means that you can
      retrieve the very same real value either with VR or with
      VLR.  If the actual value is too large to fit into a single
      real (type REAL), the difference is that you get an overflow
      by using VR.  This means, that VR vs.  VLR, or VI vs.  VLI
      etc.  differ only in their memory requirements for storing
      the returned values, but not which original value definition
      they are referencing in the data base.

      IMPLEMENTATION RESTRICTION: A column which contains only NA
      is of no defined type.  Thus any retrieval of values from
      such a column would always only return undefined values.
      Therefore, for the sake of simplicity and efficiency such
      columns will be completely ignored and attempts to retrieve
      data from them will be treated by this model as if the
      column would be completely absent.

      Finally note that certain priority rules are observed
      during data retrieval to allow for easier assignment of
      data to models in the context of modeling and simulation
      (clauses "MODEL = ANY" and "MODEL = ALL").  See below type
      definition DataFrameSet and routine LastValDefFoundIn.


      Vector dimension (in case of vector retrieval)
      ----------------
      In case of vector retrieval n returns the dimension of the
      actual vector as stored in the data frame where ident was
      found.  Usually n is the number of elements actually assigned
      to the returned vector variable like rvect, ivect etc.  Note
      however, if the callee passes a vector of insufficient
      length, only HIGH(vect) values will be assigned and no error
      message is produced (incomplete retrieval).  Only n will
      return the actual, larger number of elements contained in the
      data frame.  To learn about the actual dimension of the
      vector data use routine GetVectDim in order to proved a
      sufficiently large object for data retrieval.  If the value
      definition couldn't be found, n is 0.


      Type of Value Definitions
      -------------------------
      Note, a data frame defines the type of values hold in a
      particular column by implicit definition.  The first row
      tabulated defines the type of the data expected in the entire
      table (for details see modules DatFraAux and DataTables,
      especially the latter which is used by DataFrames to load
      data frames).  Any subsequent row, which contains an entry in
      a particular column which does not match the expected type,
      will be treated as an erronous condition.  Such errors will
      be normally reported during any calls to LoadDataFrames or
      ReadDataFramesIntoMemory and no such faulty data frame should
      ever be available for retrieval of value definitions.

      If the type of the value definition does not match the
      expected one as given by the routine like VR or GetRVect,
      one of the read only variables undefInteger, undefReal, or
      undefBoolean are returned instead; for the string or
      identifier types, the emtpy string is returned.

      The same is the case for missing values. The latter are
      entered in a data frame by using the symbol NA instead
      of the value.

      IMPLEMENTATION RESTRICTIONS: To allow for an unambiguous
      detection of a missing value definition (symbol NA) in case
      VI, VB, GetVS, or GetVId (GetIVec, GetBVec, GetSVec,
      GetIdVec, GetSS, or GetIds respectively) return undefInteger
      or undefBoolean, inspect n or use also GetValDefType to
      learn whether the value definition is really undefined or
      the data frame happens to contain exactly that value used by
      undefInteger or undefReal.

      A column which contains only NA is of no defined type.  Thus
      any retrieval of values from such a column would always only
      return undefined values.  Therefore, for the sake of
      simplicity and efficiency such columns will be completely
      ignored and attempts to retrieve data from them will be
      treated by this model as if the column would be completely
      absent.


      GetVS vs. GetVId
      ----------------
      GetVS can retrieve only a string, whereas GetVId can
      retrieve only an identifier.


      GetSs/GetIds vs. GetSVec/GetIdVec
      ---------------------------------
      GetSs (or GetIds) returns a vector of strings (or
      identifiers), but in contrast to GetSVec (or GetIdVec) not
      as an open array, but as a single large string, where the
      elements of the vector are delimited by the separator
      sepChar, typically "|".  Use DMStrings.ExtractSubString to
      access invidual elements with the following algorithm

        GetSs(ident,"|",ssvec,n);
        ix := 0; count := 0; ExtractSubString(ix,ssvec,element,"|");
        WHILE (element[0]<>0C) DO
          INC(count); (* process element count *)
          ExtractSubString(ix,ssvec,element,"|");
        END(*WHILE*);
        IF count<>n THEN
          Errors.Info(...
        END(*IF*);

    *)



  (*************************************************************************)
  (*#####   Construction & Analysis of Value Definition Identifiers   #####*)
  (*************************************************************************)

  PROCEDURE ConcatIdent(    dtfParModId, (* data frame (no key column) or
                                        parent or model identifier (with key column) *)
                            rowId,  (* = empty string if vector retrieval *)
                            colId: ARRAY OF CHAR;
                        VAR ident: ARRAY OF CHAR);

  PROCEDURE SplitIdent (    ident: ARRAY OF CHAR;
                        VAR dtfParModId, (* data frame (no key column) or
                                         parent or model identifier (with key column) *)
                            rowId,  (* = empty string if vector retrieval *)
                            colId: ARRAY OF CHAR);
    (*
      Concatenates, respectively splits, ident from/to its parts
      for subsequent uses while calling routines retrieving value
      definitions.
    *)


  PROCEDURE FixIdent(VAR ident: ARRAY OF CHAR);
    (*
      Utility to fix the identifier 'ident', which has recently
      been used to retrieve a value definition according to the
      result by LastValDefFoundIn() (from this module).  Upon
      returning from FixIdent the first part of the ident may
      contain ALL or ANY, respectively, instead of the original
      model name, depending in which set of data frames the
      value definition was actually retrieved.  If the value
      definition could not be retrieved, the first part contains
      "-" instead of the model name.
    *)



  (********************************************)
  (*#####   Type of A Value Definition   #####*)
  (********************************************)

  TYPE
    ValDefTypes = SET OF ValDefType;
    (*
      Since value definitions referred to by the same
      identifier can have different types, a particular
      identifier can be used to retrieve not just a single, but
      a set of value definition types (see also above comments
      on topic "Uniqueness of Value Definitions" and routine
      GetValDefType).
    *)

  PROCEDURE GetValDefType(ident: ARRAY OF CHAR; VAR fdt: ValDefTypes);
    (*
      Allows to learn about the type(s) of the value
      definitions, without actually having to retrieve the
      data.  Returns the empty set if the value definition is
      currently not stored in memory.
    *)



  (***********************************************************)
  (*#####   Origin of Last Retrieved Value Definition   #####*)
  (***********************************************************)

  TYPE
    DataFrameSet = (nowhere, inALL, inSpecific, inANY);

    (*
      Each particular value definitions can be retrieved from 3 basic sets
      of data frames.

        The first set (inALL) is the set of all data frames with the phrase
        "MODEL = ALL" respectively "PARENT = ALL".

        The second set (inSpecific) is the set of all data frames with the
        phrase "MODEL =  " respectively "PARENT =
        " or without that phrase but with the data frame
        identifier .

          E.g.  the value definition given by "MyModel.Bern.Bucketsize"
          defines as the set inSpecific all data frames with the
          phrase "MODEL = MyModel", respectively "PARENT =
          MyModel" plus the single data frame with the name
          "MyModel" but without the phrase "MODEL = ...".

        The third set (inANY) is the set of all data frames with the phrase
        "MODEL = ANY" respectively "PARENT = ANY".

      While retrieving data the following, fundamental priority rules apply:

      1) A value definition present in the set inALL, is retrieved with the
         highest priority.  The first part of the identifier is ignored but
         the remainder parts have to match exactly the value definition.

          E.g.  the value definition given by "MyModel.Bern.Bucketsize" is
          retrieved from the set inALL as soon it contains a data frame
          where the key column contains in a row "Bern" and another column
          is headed by "Bucketsize".  The value definition actually used is
          "ALL.Bern.Bucketsize".  This is the case regardless whether
          the value definition is also present in a present data frame
          using "MODEL = MyModel".

      2) A value definition present in the set inSpecific, is retrieved when
         it can't be retrieved from the set inALL.  Then the entire
         identifier denoting the value definition must match; in particular
         its first part must match exactly  used in
         phrase "MODEL = ", "PARENT =
         ", or in "DATAFRAME ".

          E.g.  the value definition given by "MyModel.Bern.Bucketsize" is
          retrieved from the set inSpecific as soon as this set contains a
          data frame where the key column contains in a row "Bern" and
          another column is headed by "Bucketsize".  The value definition
          actually used is exactly "MyModel.Bern.Bucketsize".  This is the
          case regardless whether the value definition is also contained in
          a data frame present using "MODEL = ANY".

      3) A value definition present in the set inANY, is retrieved when
         it can't be retrieved from the set inALL, nor the set inSpecific.
         Then again as with the set inALL, the first part of the identifier
         is ignored and only the remainder parts have to match the value
         definition.

          E.g.  the value definition given by "MyModel.Bern.Bucketsize" is
          retrieved from the set inANY as soon as this set contains a data
          frame where the key column contains in a row "Bern" and another
          column is headed by "Bucketsize".  The value definition actually
          used is "ANY.Bern.Bucketsize".

      A value definition not present in any of the three sets,
      is not defined and will return values undefined.

      To learn about the origin of a particular, successfully retrieved
      value definition, use procedure LastValDefFoundIn.

      In the context of modeling and simulation above sets serve particular
      purposes: The set inALL is particularily useful to define common
      parameters which are used by several submodels in a structured,
      complex model system.  The set inSpecific of course, does exactly
      the opposite, i.e.  it defines parameters which are specific to
      individual submodels.  The set inANY is useful to define particular
      model parameters without having already a specific model in mind.

      Ex.:

        (*===========================================================*)
        (*                    Model   Parameters                     *)
        (*===========================================================*)
        DATAFRAME ModelParameters;
          REMARK = 'For any logistic growth model';
          MODEL = ANY; KEYCOLUMN = Ident;
        DATA:
        (*===========================================================*)
           Ident  Descr                   val   min    max    unit    ;
        (*-----------------------------------------------------------*)
           r      'Relative growth rate'  0.7   0.0    10.0   'd^-1'  ;
           K      'Carrying capacity'   700.0   0.0 1.0E+38   'g/m^2' ;
        (*-----------------------------------------------------------*)
        END ModelParameters;
        (*===========================================================*)


      Using module ModDatAccess it is easy possible to have the tabulated
      value definitions be assigned to any logistic growth model,
      regardless of its identifier.  I.e.  retrieving the values
      VR("MyModel.r.val"), GetVS("MyModel.r.Descr",descr), or
      VR("MyModel.r.min") etc.  will always be successful thanks to the
      fact the data frame ModelParameters is of model ANY.  However, if the
      data frame would be DATAFRAME ModelParameters OF MODEL Logistic
      retrieving data by VR("MyModel.r.val") would not succeed (For
      explanations on the procedures VR, GetVS etc. see their comments).

    *)

  PROCEDURE LastValDefFoundIn(): DataFrameSet;
    (*
      Immediately after retrieving a value definition use this
      routine to learn about the set of data frames from which
      the value definition has been retrieved.  See also
      explanations on TYPE DataFrameSet (No implementation
      restrictions).
    *)

  PROCEDURE GetLastValDefsDFIdent (VAR lastDataFrameIdent: ARRAY OF CHAR);
    (*
      Immediately after retrieving a value definition use this
      routine to learn about the identifier of the data frame
      from which the value definition has been retrieved.
      IMPLEMENTATION RESTRICTION: Works only if the data frame
      has not been dropped in the meantime.
    *)


  (******************************************)
  (*#####   Special Vector Retrieval   #####*)
  (******************************************)

  PROCEDURE GetVectDim (ident: ARRAY OF CHAR; ofType: ValDefType; VAR n: INTEGER);
    (*
      Allows to learn about the dimension n of the vector
      denoted by identifier ident and of type ofType in the
      context of vector retrieval without actually having to
      retrieve the data.  Returns n = 0 if value definition is
      currently not present.
    *)

  PROCEDURE GetValDefsKeyColumn (ident: ARRAY OF CHAR; ofType: ValDefType;
                                 sepChar: CHAR; VAR keys: ARRAY OF CHAR;
                                 VAR n: INTEGER);
    (*
      Allows to retrieve the key column of the data frame in
      which the value definition denoted by identifier ident
      and of type ofType is contained.  n returns the dimension
      of the key column vector (regardless wether all its
      content fits within keys or not).  Actual keys are
      delimited by character sepChar, typically "|".  See also
      below routine GetValDefsSubKeyColumn.
    *)

  PROCEDURE GetValDefsSubKeyColumn (ident: ARRAY OF CHAR; ofType: ValDefType;
                                    subKeyIdent: ARRAY OF CHAR;
                                    sepChar: CHAR; VAR subKeys: ARRAY OF CHAR;
                                    VAR n: INTEGER);
    (*

      Allows to retrieve the column with identifier subKeyIdent
      of the data frame in which the value definition denoted by
      identifier ident and of type ofType is contained.  The
      column must be of type integer, long integer, string, or
      identifier, or the procedure will return nothing (n=0,
      subKeys empty).  Use for subKeys not an open array, but a
      single large string, where the elements of the vector are
      delimited by the separator sepChar, typically "|".
      Consequently, in case of an integer column, you have to
      convert the strings first to integer values.  The ideas of
      subkeys is simply that in a data frame some columns beside
      the key column may serve to establish a particular order
      among the rows within a data frame.  Typically you can use
      such a column to store indices which help to further
      specify the main key to denote particular rows when
      multiple value definitions are present in data frames.  By
      means of vector retrieval it is then possible to retrieve
      more than just the last value definition by searching
      within the vector denoted by ident in function of the
      independent main key and sub keys of any number.  E.g.:

        DATAFRAME MonitoringPart1; MODEL = ANY; KEYCOLUMN = Ident; DATA:
        (*------------------------------------------------------------*)
        Ident         MonitLev        Filing        Tabulation         ;
        (*------------------------------------------------------------*)
        grass            3            FALSE           FALSE            ;
        grass            4            FALSE           FALSE            ;
        grassDot         3            TRUE            FALSE            ;
        grassDot         4            TRUE            FALSE            ;
        (*------------------------------------------------------------*)
        END MonitoringPart1;


        DATAFRAME MonitoringPart2; MODEL = ANY; KEYCOLUMN = Ident; DATA:
        (*------------------------------------------------------------*)
        Ident         MonitLev    ScaleMin  ScaleMax      Graphing     ;
        (*------------------------------------------------------------*)
        grass            3           0.0     1000.0          Y         ;
        grass            4           0.0      700.0          X         ;
        grassDot         3            NA         NA          noG       ;
        grassDot         4           0.0      500.0          Y         ;
        (*------------------------------------------------------------*)
        END MonitoringPart2;

      The two data frames define for the same keys as given by
      the identifier listed in the key column 'Ident' for the two
      monitoring levels 3 and 4 in the first data frame the
      filing and tabulation attributes, in the second data frame
      the graphing attributes.  Column 'MonitLev' serves here as
      a subkey column, since each variable identifier, e.g.
      grass, has two value definitions tabulated, one for
      monitoring level 3 and the other for level 4.  Of course
      only vector retrieval can retrieve these multiple
      definitions, yet the tabulated data can still be
      interpreted fully, since the order as given by the main key
      and the subkeys (redundantly present in both data frames)
      can still be retrieved and used by calling routines
      GetValDefsKeyColumn and GetValDefsSubKeyColumn like this:

        GetValDefsKeyColumn("MyModel.Tabulation","|",idents,n);
        GetValDefsSubKeyColumn("MyModel.Tabulation",boolean,
                             "MonitLev", monLevsArr, n);

        will return: idents = "grass|grass|grassDot|grassDot"
                     monLevsArr = "3|4|3|4"
                     n = 4
      Note in above example the calls

        GetValDefsKeyColumn("MyModel.Graphing","|",idents,n);
        GetValDefsSubKeyColumn("MyModel.Graphing",identifier,
                             "MonitLev", monLevsArrStr, n);

        retrieve the same information as

        GetSs("MyModel.Ident","|",idents,n);
        GetIVec("MyModel.MonitLev",monLevsArrInt,n);

        which is not necessary the same as what
        GetValDefsKeyColumn("MyModel.Tabulation","...  might
        retrieve, in case the key columns or sub key columns
        should differ in content from on to the other data frame.

      The actual content of the key column and possible subkey
      columns and wether they really correctly define the same
      ordering among two data frames is fully the users
      responsibility.  Same order means for instance that
      individual components from retrieved vectors can be paired,
      e.g.  the vectors 'MyModel.Tabulation' with
      'MyModel.Graphing'.  An analysis of the order defining
      vectors can reveal the validity of the assumption before
      actually using any retrieved data.

    *)



  (*******************************************************)
  (*#####   Dropping Data Frames / Freeing Memory   #####*)
  (*******************************************************)

  PROCEDURE DropAllDataFrames;
    (*
      Removes all data frames from memory and releases the memory
      for other uses.  Note, if you call several times
      LoadDataFrames respectively ReadDataFramesIntoMemory without
      calling DropAllDataFrames inbetween, that data frames tend to
      accumulate in memory, using up more and more memory.  This is
      of course not true, if you reread the same data frame files
      starting with the same anchor (and the same reading filers),
      because data frames with an identical identifier will be
      overwritten with the latest data found in the scanned data
      frame files, hereby reusing the memory space efficiently.
      Note also, this means that you ought to call procedure
      DropAllDataFrames in case you want to make sure that older
      value definitions contained in any previously read data frames
      are forgotten.  In case you wish to forget only about
      individual data frames, see the routines provided by module
      DatFraAux, which support versatile plus efficient rereading
      and management of data frames.  All currently installed
      handlers for a dropping event (see module DatFraAux, routines
      AddAnyDFChangedHandler and AddDFDropHandler) will also be
      called.
    *)



END DataFrames.

Contact RAMSES@env.ethz.ch	Last updated: 25-Jul-2011	[Top of page]