Proposed changes to dREL, part III

Introduction

dREL is a machine-actionable language describing data relationships and designed to be embedded in DDLm dictionaries. The language is defined both explicitly in the dREL publication [1] and implicitly by the dREL code appearing in the DDLm core CIF dictionary. Note that the code in the core CIF dictionary significantly expands the language presented in the paper, for example, by adding category methods.

The present proposals concern the built-in functions that are supported by dREL. No syntax changes or enhancements are proposed.

Proposal 5: enhance meaning of 'Validation' methods

It is proposed that a _method.purpose of Validation will imply that all DDLm attribute categories may appear as variables within the associated dREL method, and that the values of these attributes are those for the data name being validated. Additional predefined variables value and category are bound to the particular value and loop being validated.

Explanation

DDLm requires that each method have an associated _method.purpose. The current DDLm attributes dictionary defines Evaluation, Validation and Definition. The Validation purpose is given as

method compares an evaluation with existing item value

This type of method appears only once in the DDLm core dictionary, to show how the crystal system can be checked against the cell parameters. This could equally well be performed by creating a data name whose Evaluation dREL method returns True if the conditions are met. The present proposal therefore suggests repurposing Validation methods for more general validation by providing them with access to all attributes of the definition of any data name that they are used to check. This allows the methods to confirm, for example, that a value for a data name matches the list of allowed enumerated values.

Note that currently, all categories from the dictionary in which a dREL method appears can appear as pre-defined variables in that dREL method, with values obtained from an associated data block. This proposal enhances that list with the attributes for the definition of the data name being checked.

The category and value pre-bound variables are required in order to represent the generic value(s) being checked.

Example

The following code finds enumerated values that are not allowed. It would appear in the definition for a data name valid.bad_enumerated_values. The enumeration_set variable contains the contents of the enumerated_set category in the definition for a given data name, and value is bound by the execution environment to the particular value being checked. The execution environment is responsible for collating values of this data name for each data name in the data block being checked.:

# Check that a value is listed in an enumeration
found = 'False'
# Loop over enumerated states in the definition for this
# data name
loop e as enumeration_set {
    if (value == e.state) found = 'True';
    }
valid.bad_enumerated_value = found

Proposal 5: Extra validation functions

It is proposed to add the following functions to the list of those allowed in dREL methods:

Reference(name,attribute): The value of attribute in the dictionary definition of name is returned. Both attribute and name are either string literals or string-valued variables. Where the result would be a loop, an appropriate dREL category object would be returned.
Instance(category): Returns an instance of category category in the data block provided to the dREL method.
PacketData(container,object): Returns specific data corresponding to object in container. The functional equivalent of the syntax cat.obj, where cat is the value of container and obj is the value of object. If container is a category, the row must be unambiguous from context, if necessary using the resolution rules of the proposals in Part II.
Lookup(category,keys): The functional equivalent of cat[k1=val1,k2=val2,...] where cat is the value of category and keys is the dictionary {'k1':val1,'k2':val2,...}.
Known(name): evaluates to true if a value for the object referenced by name can be found, false otherwise. If name does not resolve to a category.object reference, or the particular row of a multi-row category is unknown, will return false.

Explanation

A dREL method for checking conformance to requirements arising out of DDLm attributes (for example, that a value is drawn from a list of values of a different data name) cannot have 'hard-coded' <category>.<object> names, as the method would no longer be applicable to all data names. The above functions are therefore required to provide access into categories and data in a generalised way.

Examples

Reference('atom_type.symbol','enumerated_set'): Return the contents of the enumerated_set loop in the definition of atom_type.symbol.
Loop i as Instance( Reference( name.linked_item_id,'_name.category_id'))': Loop over all rows of the category containing the data name contained in variable name.linked_item_id. Note that name.linked_item_id is not contained in quotes and therefore will be assigned the value given in the definition of the data name being validated. The Reference function returns a string naming the category of the linked data name, and the Instance function takes that string and returns a category object that is populated with the values in the data file.

Finding values that are not child values.

# Find values that are not those of the linked data name.
result = 'False'
linked_object = Reference(name.linked_item_id,'_name.object_id')
loop i as Instance(Reference(name.linked_item_id,'_name.category_id')) {
    if (PacketData(i,linked_object) == value) result = 'True'
}
valid.is_child_key = result

Finding and returning repeated values of a key data name as the value of data name valid.not_unique. Note the use of variable category to refer to the loop being checked.

# find key values that are not unique.
not_unique = []
# Accumulate keys
keylist = []
# get the object id for each key data name
Loop k as category_key {
    keylist ++= Reference(k.name,'name.object_id') #Append
    }
Loop c as category {
    new_val = []
    for ko in keylist {
        new_val ++= PacketData(c,ko) #Append
        }
    if (new_val in keylist) {
        not_unique ++= new_val
    }
else {
    keylist ++= new_val
    }
valid.not_unique = not_unique

Proposal 6: Extension of 'in' to substrings

It is proposed that the construction <string1> in <string2> be interpreted as a boolean statement that returns true if <string1> is a substring of <string2>.

Explanation

in in dREL is currently only applied to testing membership in a List or Array. dREL as published proposes using the Substr function to test for membership of a string in another string. This could be more economically performed using the in keyword without compromising the use for Lists or Arrays. This also accords with the use of in in Python.

Proposal 7: Removal of built-in functions

The following functions are proposed for removal from the list of provided functions:

TopLo, TopHi (sorting low->high, high->low): functionality duplicated by combinations of sort() and reverse()
Substr: functionality replaced by Proposal 6.

[1] Spadaccini et. al, (2012) J. Chem. Inf. Model. **52**(8) pp 1917-1925

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed changes to dREL, part III

Introduction

Proposal 5: enhance meaning of 'Validation' methods

Explanation

Example

Proposal 5: Extra validation functions

Explanation

Examples

Proposal 6: Extension of 'in' to substrings

Explanation

Proposal 7: Removal of built-in functions

FilesExpand file tree

drel_changes_3.rst

Latest commit

History

drel_changes_3.rst

File metadata and controls

Proposed changes to dREL, part III

Introduction

Proposal 5: enhance meaning of 'Validation' methods

Explanation

Example

Proposal 5: Extra validation functions

Explanation

Examples

Proposal 6: Extension of 'in' to substrings

Explanation

Proposal 7: Removal of built-in functions