dREL is a machine-actionable language describing data relationships and designed to be embedded in DDLm dictionaries. The language is defined both explicitly in the dREL publication [1] and implicitly by the dREL code appearing in the DDLm core CIF dictionary. Note that the code in the core CIF dictionary significantly expands the language presented in the paper, for example, by adding category methods.
The present proposals concern the built-in functions that are supported by dREL. No syntax changes or enhancements are proposed.
It is proposed that a _method.purpose of Validation will imply
that all DDLm attribute categories may appear as variables within the
associated dREL method, and that the values of these attributes are
those for the data name being validated. Additional predefined
variables value and category are bound to the particular value
and loop being validated.
DDLm requires that each method have an associated _method.purpose.
The current DDLm attributes dictionary defines Evaluation, Validation
and Definition. The Validation purpose is given as
method compares an evaluation with existing item value
This type of method appears only once in the DDLm core dictionary, to
show how the crystal system can be checked against the cell parameters.
This could equally well be performed by creating a data name whose
Evaluation dREL method returns True if the conditions are met.
The present proposal therefore suggests repurposing Validation methods for
more general validation by providing them with access to all attributes
of the definition of any data name that they are used to check. This allows
the methods to confirm, for example, that a value for a data name matches
the list of allowed enumerated values.
Note that currently, all categories from the dictionary in which a dREL method appears can appear as pre-defined variables in that dREL method, with values obtained from an associated data block. This proposal enhances that list with the attributes for the definition of the data name being checked.
The category and value pre-bound variables are required in order
to represent the generic value(s) being checked.
The following code finds enumerated values that are not allowed. It
would appear in the definition for a data name
valid.bad_enumerated_values. The enumeration_set variable
contains the contents of the enumerated_set category in the
definition for a given data name, and value is bound by the
execution environment to the particular value being checked. The
execution environment is responsible for collating values of this
data name for each data name in the data block being checked.:
# Check that a value is listed in an enumeration
found = 'False'
# Loop over enumerated states in the definition for this
# data name
loop e as enumeration_set {
if (value == e.state) found = 'True';
}
valid.bad_enumerated_value = found
It is proposed to add the following functions to the list of those allowed in dREL methods:
- Reference(name,attribute)
- The value of
attributein the dictionary definition ofnameis returned. Bothattributeandnameare either string literals or string-valued variables. Where the result would be a loop, an appropriate dREL category object would be returned. - Instance(category)
- Returns an instance of category
categoryin the data block provided to the dREL method. - PacketData(container,object)
- Returns specific data corresponding to
objectincontainer. The functional equivalent of the syntaxcat.obj, wherecatis the value ofcontainerandobjis the value ofobject. Ifcontaineris a category, the row must be unambiguous from context, if necessary using the resolution rules of the proposals in Part II. - Lookup(category,keys)
- The functional equivalent of
cat[k1=val1,k2=val2,...]wherecatis the value ofcategoryandkeysis the dictionary{'k1':val1,'k2':val2,...}. - Known(name)
- evaluates to true if a value for the object referenced by
namecan be found, false otherwise. Ifnamedoes not resolve to acategory.objectreference, or the particular row of a multi-row category is unknown, will return false.
A dREL method for checking conformance to requirements arising out of
DDLm attributes (for example, that a value is drawn from a list of values
of a different data name) cannot have 'hard-coded' <category>.<object>
names, as the method would no longer be applicable to all data names.
The above functions are therefore required to provide access into categories
and data in a generalised way.
Reference('atom_type.symbol','enumerated_set')- Return the contents
of the
enumerated_setloop in the definition ofatom_type.symbol. Loop i as Instance( Reference( name.linked_item_id,'_name.category_id'))'- Loop over all rows of the category
containing the data name contained in variable
name.linked_item_id. Note thatname.linked_item_idis not contained in quotes and therefore will be assigned the value given in the definition of the data name being validated. TheReferencefunction returns a string naming the category of the linked data name, and theInstancefunction takes that string and returns a category object that is populated with the values in the data file.
Finding values that are not child values.
# Find values that are not those of the linked data name.
result = 'False'
linked_object = Reference(name.linked_item_id,'_name.object_id')
loop i as Instance(Reference(name.linked_item_id,'_name.category_id')) {
if (PacketData(i,linked_object) == value) result = 'True'
}
valid.is_child_key = result
Finding and returning repeated values of a key data name as the
value of data name valid.not_unique. Note the use of variable
category to refer to the loop being checked.
# find key values that are not unique.
not_unique = []
# Accumulate keys
keylist = []
# get the object id for each key data name
Loop k as category_key {
keylist ++= Reference(k.name,'name.object_id') #Append
}
Loop c as category {
new_val = []
for ko in keylist {
new_val ++= PacketData(c,ko) #Append
}
if (new_val in keylist) {
not_unique ++= new_val
}
else {
keylist ++= new_val
}
valid.not_unique = not_unique
It is proposed that the construction <string1> in <string2> be interpreted
as a boolean statement that returns true if <string1> is a substring of
<string2>.
in in dREL is currently only applied to testing membership in a
List or Array. dREL as published proposes using the Substr
function to test for membership of a string in another string. This
could be more economically performed using the in keyword without
compromising the use for Lists or Arrays. This also accords with the
use of in in Python.
The following functions are proposed for removal from the list of provided functions:
- TopLo, TopHi (sorting low->high, high->low)
- functionality duplicated by combinations of sort() and reverse()
- Substr
- functionality replaced by Proposal 6.
[1] Spadaccini et. al, (2012) J. Chem. Inf. Model. **52**(8) pp 1917-1925