Wednesday, November 26, 2008

Return of the 4GL for eClinical - Part 2

In the first part of the series, I described how in the Technology business, Fourth Generation Languages provided a platform for the effective development of database driven application software.   With this instalment, I would like to examine how the principles of a 4GL might be applied in the design of an eClinical Application Development Tool. 

This part of the series of articles will focus on the requirements - in particular for handling rules - and briefly on how these requirements might be met with a syntax.  It should be noted that many of the principles apply directly to EDC systems; however, the target application area is not EDC specifically, but rather a full eClinical platform.

Here are the requirements in bullet form:

  • Triggers - Capable of defining rules that can be applied based on the occurrence of an event
  • Powerful - capable of describing all required rules that might arise in capturing and cleaning eClinical Data
  • Human readable syntax - it must act as a specification to a user as well as machine readable instructions when executed
  • Referencing - must provide a simple unambiguous mechanism to reference data
  • Repeatable - must be re-executable
  • Multi-Action - Queries - yes, but what about other actions?
  • Testable - through a black box paradigm, built in self testing
  • Speed - must execute very quickly. <10ms per rule on a typical environment.
  • Business aware - must provide features that automate certain common business elements typical of eClinical

Triggers

Firing mechanism

Not Winnie the Pooh's buddy, but a logical method of controlling the execution of rules in an EDC system: base the triggering mechanism on the change of an input value.  For example, to check if a subject is between the ages of 18 and 65, simply declare the field containing the 'Age' as an input value of the rule.  When the age changes, the rule executes.

In practice, this works for the majority of requirements, but not for all.  Sometimes, triggers are required based on the change of other study elements or attributes.  For example, it might be required that a rule is triggered when a query is raised for a subject.  This is subject level triggering rather than response value triggering.

To achieve this, the syntax should support the attachment of rules to study objects.  For example, a possible syntax for triggering might be:

on change of [subject|visit|repeat|field].[Object Id].[attribute]:

Each object would have a default attribute - this would be applied if not specified.   A sample reference might be:

on change of field.Age:

or

on change of subject.Status:

The default attribute of a field object might be the value. 

Also, the 'on change' syntax would be assumed based on the reference values.  If you referenced the 'age' in a rule, then that would automatically become the trigger point in addition to any 'on change' attributes defined.
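To make the default attribute idea concrete, here is a minimal sketch in Python of how a trigger line might be parsed. This is not a definitive grammar. The names Trigger, DEFAULT_ATTRIBUTE and parse_trigger are invented for illustration. Only the field default ('value') comes from the text above; reading a two part reference such as subject.Status as an attribute of the current subject is an assumption on my part.

```python
import re
from typing import NamedTuple, Optional


class Trigger(NamedTuple):
    object_type: str
    object_id: Optional[str]  # None means "the current object" (assumption)
    attribute: str


# Only the field default comes from the text; other types have no default here.
DEFAULT_ATTRIBUTE = {"field": "value"}

PATTERN = re.compile(
    r"^on change of (subject|visit|repeat|field)\.(\w+)(?:\.(\w+))?:$"
)


def parse_trigger(line: str) -> Trigger:
    """Parse 'on change of type.id.attribute:' and apply the default attribute."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a trigger: {line!r}")
    object_type, second, third = match.groups()
    if third is not None:
        # Fully qualified: type.id.attribute
        return Trigger(object_type, second, third)
    if object_type in DEFAULT_ATTRIBUTE:
        # Two parts with a known default: type.id, attribute implied
        return Trigger(object_type, second, DEFAULT_ATTRIBUTE[object_type])
    # Assumption: with no default, the second part names an attribute
    # of the current object (e.g. subject.Status).
    return Trigger(object_type, None, second)


print(parse_trigger("on change of field.Age:"))
print(parse_trigger("on change of subject.Status:"))
```

The same parser could also be run over the references found inside a rule body to derive the implicit trigger points.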

Should a rule trigger based on all values being in existence, or just some of the values?   With EDC applications in particular, a positive response is required - a page that is left blank cannot be assumed to represent no data - it could simply mean the data hasn't yet been entered.  Special consideration is required here.  Readers familiar with eCRF implementations will recognize the common header question such as 'Have any Adverse Events occurred?' This is often used to distinguish entries left blank on purpose from values that are simply missing. From a triggering perspective, rather than looking at a value that may not have been entered - such as AE Description - the code would need to first check the AnyAE flag.

Prerequisites

This is a more advanced concept, but important from the perspective of keeping the resulting tool simple.  Looking at the example above, more thought is required when developing rules where conditional questions are involved.   So - with a dynamic eCRF, how can we simplify things?

A feature I will refer to as 'pre-requisites', which the rules engine is aware of, would potentially solve this problem.  It is necessary to relate, at the metadata level, the need for a particular value to the conditions under which it is expected. In our example, if AnyAE = No, then we would not expect an AEDescription. However, if a rule looked at AEDescription in isolation, it may not work. In reality, the rule would need to check AEDescription AND AnyAE.  In fact, this would be the case for any check that cross referenced the AE eCRF.

Now, let's imagine we could place a pre-requisite rule against the AEDescription field: AnyAE='Yes'.  That rule could serve two purposes.  1) It could control the availability of the field on the page, and 2) it could act as an additional criterion for any rule that referenced the field.  With a pre-requisite principle, the rule would simply check AEDescription.  The AND condition would be automatically added (behind the scenes) based on the attached pre-requisite rule.

The end result to the study developer would be that they wouldn't need to worry about whether or not a value should exist based on other criteria - this would be catered for by the pre-requisites.  The resulting syntax for rules would be easier.
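A rough sketch of the pre-requisite idea, written in Python purely for illustration. PREREQUISITES, field_is_expected and run_check are hypothetical names, and a real engine would hold the pre-requisites in study metadata rather than in a dictionary. The point is only that the AND condition is added by the engine, not by the rule author.

```python
# Hypothetical metadata: field name -> pre-requisite over the entered data.
PREREQUISITES = {
    "AEDescription": lambda record: record.get("AnyAE") == "Yes",
}


def field_is_expected(field, record):
    """Purpose 1: decide whether the field should be available on the page."""
    prerequisite = PREREQUISITES.get(field)
    return prerequisite is None or prerequisite(record)


def run_check(fields, check, record):
    """Purpose 2: apply the pre-requisites of every referenced field first.

    Returns True when the check passes or does not apply, False when it fails.
    """
    if not all(field_is_expected(name, record) for name in fields):
        return True  # pre-requisite not met, so the rule is skipped
    return check(record)


def description_present(record):
    # The rule author only looks at AEDescription, never at AnyAE.
    return bool(record.get("AEDescription"))


no_events = {"AnyAE": "No"}
missing_text = {"AnyAE": "Yes"}
print(run_check(["AEDescription"], description_present, no_events))
print(run_check(["AEDescription"], description_present, missing_text))
```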

Powerful

This requirement is at odds with some of the other requirements.  How do you create a syntax that can cater for each and every rule situation, while at the same time being human readable and re-executable?    The answer is that you will never create the perfect syntax that does everything.   What you do need is a syntax that delivers on at least 99% of requirements.  The enhancements to the syntax, beyond that of handling regular expressions, need to be business aware.  Constructs for things that are common to eClinical systems must be available as standard.  By providing support for these, the need to go outside of the 4GL bounds should be reduced to a negligible degree.

Human Readable Syntax

eClinicalOpinion raised the question of syntax in a comment on Part 1 of the series.  I see syntax applying to all language components.  What I mean by that is that all application components should be representable as a syntax that can be manipulated as free format text through a text editor OR from within an Application Development Environment. How the syntax is maintained, though, will depend on the activity being performed.  For example, it might prove easier to create a data collection form (i.e. CRF) through a point and click UI.  The end result though would be a human and machine readable syntax.

So - if the metadata is all described in the form of a syntax, does that mean that the preparation of the syntax cannot be table driven?  Not at all.  A syntax would consist of 3 things - basic language constructs, references to data/metadata and references to application objects. The preparation of the syntax would be through a table driven approach. Lists of metadata (fields, forms etc) would be stored in tables, as would lists of application objects such as what can be done to a subject or a visit event.

Should the syntax be XML?  I don't think so.  XML might be one of the languages for representing the metadata - à la CDISC ODM - but the most effective syntax should be optimized for purpose. XML is not easily human readable.

A number of other factors govern syntax.  Writing a compiler or an interpreter is not that easy to do well. Also, the execution of the resulting interpreted or compiled code needs to be fast. If the code that is produced needs to be processed many times before it reaches executable machine code, then it takes longer to execute. 4GLs are processed through 4 iterations. There are answers. SpiderMonkey, for example, is an open source JavaScript (ECMAScript) engine.  It can be embedded into an application and extended with high level application area constructs.  Other embeddable scripting tools are available.

By combining an open source script engine with an object model that is eClinical aware, it is possible to create a syntax that has all the power and flexibility required, while at the same time keeping the complexity of an underlying (potentially super normalized) database hidden.

Value Referencing

This is a key consideration in the definition of a rules syntax.  The inputs and outputs must be easy to reference (readable), unambiguous and re-usable.  Control of the inputs and outputs is important in order to assure stability of the rules once deployed.   A form of 'black box' approach allows simplified and potentially even automated testing.

Repeatable

Version and Change Management

One of the critical factors in providing an application development solution for EDC is the need to support Version and Change Management - this is a key factor in defining the scripting language scope. Clinical Trials are unusual in that the structure and rules associated with the data may change during the deployment. More important still, the data that has already been entered may need to be re-applied to the revised structure and rules.  This can prove to be considerably challenging.  If a Study Builder is given total control to add fields, CRF Pages and Visits to an existing study, it would be virtually impossible to programmatically manage the mapping of data from an old version of a study definition to the new, and still guarantee data integrity.  What this means is that any environment must support metadata release management. In addition, to maintain regulatory compliance, once data is entered, it must be impossible to delete it even if a protocol change is applied.

Metadata Release Management

Two methods exist for the redeployment of a revised set of metadata against existing data.   You can either process only the changes, or you can re-apply the existing data to the new definition.  Both methods are workable, but the latter option can be slow, and requires an extensive and complete object based audit trail containing the former data and actions that can be re-executed against the new metadata.  The former option - processing only the changes - requires that the metadata that has been released is managed.  Once data is entered into a system against a set of metadata, the metadata must go into a 'managed' state where subsequent changes to the metadata can only be re-deployed when they are compatible with the data that has previously been entered.

So - how does this all impact the syntax?

Well, if the syntax is entirely open as far as the actions it performs, and the inputs that it receives, then it is very difficult to handle changes.  On the other hand, if the syntax is limited to operating in a 'blackbox' fashion - for example, comparing datapoints - and returning a boolean followed by the raising of a Query, then the management of a change, or, specifically, the re-execution of the rule against existing data, is predictable.
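As an illustration of the 'blackbox' style, here is a small Python sketch under my own assumptions. The Rule class, the execute function and the dictionary of open queries are hypothetical. The rule declares its inputs, returns only a boolean, and the engine (not the rule) raises or closes the Query, so running it again against the same data gives the same result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    inputs: Tuple[str, ...]      # the only data the rule may see
    check: Callable[..., bool]   # True means the data is acceptable
    query_text: str


def execute(rule: Rule, record: Dict[str, object], open_queries: Dict[str, str]) -> None:
    """Run one rule; the engine owns the Query, keyed by rule id."""
    values = [record[name] for name in rule.inputs]
    if rule.check(*values):
        open_queries.pop(rule.rule_id, None)          # close any earlier Query
    else:
        open_queries[rule.rule_id] = rule.query_text  # raise, or keep, one Query


age_rule = Rule(
    rule_id="AGE_RANGE",
    inputs=("Age",),
    check=lambda age: 18 <= age <= 65,
    query_text="Age must be between 18 and 65",
)

queries: Dict[str, str] = {}
execute(age_rule, {"Age": 70}, queries)
execute(age_rule, {"Age": 70}, queries)  # re-execution does not duplicate the Query
print(queries)
```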

Let's imagine a large study. It has 1000 rules associated with data.  The study is up and running, and thousands of patients have been created.  During the 4th visit, it is discovered that one of the rules needs to be changed. The many thousands of other data points and executed rules are fine, but the data points associated with this one rule need to be re-evaluated. The change may affect the previous outcome.  With a combination of managed metadata - where the system is aware of the rule that has changed - and the object based audit trail, it is possible to limit the impact of the change to only the area of the study, and the associated data, affected.  This is achieved by only re-executing the actions relative to the changes.

Some Metadata will arrive from an unmanaged source - for example, as an import from an external tool - in this instance, all unmanaged metadata will be assumed to be 'unclean'  and therefore changed.
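The selection of what to re-execute might be sketched as follows, in Python. This is only an outline under stated assumptions: a release is represented as a dictionary of rule id to rule definition text, the audit trail as a list of entries with 'subject' and 'field' keys, and the function names are invented.

```python
def rules_to_rerun(old_release, new_release, managed=True):
    """Return the ids of rules that must be re-executed after a redeployment.

    Unmanaged metadata (for example an import from an external tool) is
    treated as 'unclean', so every rule is assumed to have changed.
    """
    if not managed or old_release is None:
        return sorted(new_release)
    return sorted(
        rule_id
        for rule_id, definition in new_release.items()
        if old_release.get(rule_id) != definition
    )


def affected_subjects(rule_inputs, audit_trail):
    """Subjects with at least one audited value for any input of the rule."""
    return sorted(
        {entry["subject"] for entry in audit_trail if entry["field"] in rule_inputs}
    )


old = {"AGE_RANGE": "18 <= Age <= 65", "WEIGHT": "Weight > 0"}
new = {"AGE_RANGE": "18 <= Age <= 70", "WEIGHT": "Weight > 0"}
trail = [
    {"subject": "001", "field": "Age"},
    {"subject": "002", "field": "Weight"},
    {"subject": "003", "field": "Age"},
]
print(rules_to_rerun(old, new))
print(affected_subjects({"Age"}, trail))
```

Only the one changed rule, and only the subjects that hold data for its inputs, would then be passed back to the rules engine.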

Rule Actions

So, if a rule executes - by whatever mechanism - and the result of the execution demands an action, what should the action or actions be?   

Some EDC solutions are limited to raising Discrepancies or Queries. Even for the systems that support other actions, Queries are by far the most common.  However, EDC systems are often differentiated by their ability to offer more advanced forms of actions.

Conditionally adding CRF Pages is one particular action that makes sense.  Changing the status of elements - such as a Subject status might also be useful.

However, one very specific consideration must be supported.  Any actions carried out here may potentially need to be rolled back, or, re-applied as the result of a Protocol update.  Each action that is offered must be fully compatible with the need for re-execution with no adverse results.
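One way to picture a re-executable action, sketched in Python with hypothetical names (AddPage, and a subject held as a plain dictionary): each action offers an apply and a rollback, and both are safe to run more than once.

```python
class AddPage:
    """Conditionally add a CRF page to a subject; safe to apply repeatedly."""

    def __init__(self, page):
        self.page = page

    def apply(self, subject):
        if self.page not in subject["pages"]:
            subject["pages"].append(self.page)

    def rollback(self, subject):
        # Sketch only: a real system could not roll back a page
        # that already holds entered data.
        if self.page in subject["pages"]:
            subject["pages"].remove(self.page)


subject = {"id": "001", "pages": ["Demographics"]}
action = AddPage("Pregnancy Test")
action.apply(subject)
action.apply(subject)      # re-applied after a protocol update: no duplicate page
print(subject["pages"])
action.rollback(subject)
print(subject["pages"])
```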

Testable

Many eClinical systems fall short when it comes to supporting the full requirements for implementations - in particular, support for testing.  In a strictly controlled, regulated environment, it is important to prove that a configured eClinical system has been fully tested. The underlying product must be fully validated of course, but, over and above that, regulatory bodies are becoming increasingly aware of the need for configuration testing.

Good test support in an eClinical 4GL must be built in from the start.  Adding this later is often impossible to achieve well.

To ensure a language is testable, the metadata objects - rules in many cases - need to be managed. The system must keep track of the elements that have been tested, and those that have not.
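A possible way of keeping track of what has been tested, sketched in Python. The RuleTestLog class is hypothetical, and using a SHA-256 fingerprint of the rule text is my own assumption about how a change might be detected. Any edit to a rule after testing returns it to the untested list.

```python
import hashlib


def fingerprint(definition: str) -> str:
    """Stable fingerprint of a rule definition."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


class RuleTestLog:
    """Records which exact version of each rule has passed its tests."""

    def __init__(self):
        self._tested = {}

    def mark_tested(self, rule_id, definition):
        self._tested[rule_id] = fingerprint(definition)

    def is_tested(self, rule_id, definition):
        return self._tested.get(rule_id) == fingerprint(definition)

    def untested(self, rules):
        return sorted(
            rule_id
            for rule_id, definition in rules.items()
            if not self.is_tested(rule_id, definition)
        )


log = RuleTestLog()
rules = {"AGE_RANGE": "18 <= Age <= 65", "WEIGHT": "Weight > 0"}
log.mark_tested("AGE_RANGE", rules["AGE_RANGE"])
rules["AGE_RANGE"] = "18 <= Age <= 70"   # edited after testing
print(log.untested(rules))
```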

Speed

The underlying platform probably has a greater bearing on the performance of the 4GL than the syntax alone.  Also, with web based systems, network latency is an issue.  A potential language needs to be capable of rapid execution. On a typical eCRF it should take less than 50ms to turn around the submission of a page - excluding network latency. Achieving this requires optimization at each step in the execution process.  Extracting data from a super-normalized database - 1 value from 1 record - is an issue, and a means to address this is critical.  Avoiding slow interpretation of high level 4GL code is also key.   If the code that is manipulated by the user can be pre-compiled into a more CPU friendly form, then that will help.
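The pre-compilation point can be shown with a short Python sketch. Python's built-in compile and eval stand in here for whatever compiler a real 4GL would use, and COMPILED_RULES and evaluate are invented names. The rule text is parsed once and the compiled form is reused on every page submission. Note that eval is not a security sandbox, so this is an illustration of caching only.

```python
COMPILED_RULES = {}


def evaluate(expression, values):
    """Evaluate a rule expression, compiling it only on first use."""
    code = COMPILED_RULES.get(expression)
    if code is None:
        code = compile(expression, "<rule>", "eval")  # parse once
        COMPILED_RULES[expression] = code
    # Field values are supplied as local names; builtins are withheld.
    return eval(code, {"__builtins__": {}}, dict(values))


print(evaluate("18 <= Age <= 65", {"Age": 40}))
print(evaluate("18 <= Age <= 65", {"Age": 70}))  # second call reuses the compiled code
```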

Business Aware

Object Orientation

I must admit to being a person that has struggled to get my head around true - or pure - object orientation.  When people start talking about Polymorphism, modularity etc... it all becomes rather cryptic to me. For study developers this could all be too much.

However, I do think that Object Oriented Programming or OOP does lend itself well to specific business problem modelling.  With eClinical, you have certain rules that can be built into objects.  For example, let's imagine we have created an object called a 'Visit'.   A visit can have an attribute 'Name' with values of 'Screening, Visit 1 etc'.  It can also have another attribute 'Number' with values of  '1.0, 2.0 etc'.   Visits could belong to subjects.  Visits can have CRF Pages associated with them.  By defining these 'business objects' and 'object attributes' within the application tool, we can take away some of the complexity of handling relationships and actions from the study programmer.  Instead of having to create a SQL Select inner join between a Visit table and a CRF Form table, the relationship is pre-formed within the application business layer.
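A very small Python sketch of what such business objects might look like to the study programmer. The class and method names (Visit, Subject, pages_for) are illustrative only. The relationship between a visit and its CRF pages is held by the objects, so no join is written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Visit:
    name: str                 # e.g. 'Screening', 'Visit 1'
    number: float             # e.g. 1.0, 2.0
    pages: List[str] = field(default_factory=list)


@dataclass
class Subject:
    subject_id: str
    visits: List[Visit] = field(default_factory=list)

    def visit(self, name: str) -> Visit:
        for candidate in self.visits:
            if candidate.name == name:
                return candidate
        raise KeyError(name)

    def pages_for(self, visit_name: str) -> List[str]:
        """CRF pages for a visit, without the caller writing a join."""
        return list(self.visit(visit_name).pages)


subject = Subject(
    "001",
    [
        Visit("Screening", 1.0, ["Demographics", "Medical History"]),
        Visit("Visit 1", 2.0, ["Vital Signs"]),
    ],
)
print(subject.pages_for("Screening"))
```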

So - object orientation - yes, but, only where the resulting user (study developer) experience when preparing studies could be described as 'simple'.

Conclusion

This is all very high level still, but the above does contain some concepts that may have value in the definition of a potential eClinical 4GL.  In the next posting of the series, we will most likely look at how technologies such as XForms might be supportive in providing an interactive user interface over a web front end to an eClinical 4GL.

Comments welcome!

2 comments:

Eco said...

Hi Ed,

I've been trying to find the time to come back to these posts. Can I say that this post should have probably been split into at least six. The topics are deep and it's hard to concentrate on one before you're swamped by the next.

Just to look at your proposed trigger syntax for a moment you have:

on change of [subject|visit|repeat|field].[Object Id].[attribute]:

When I look at the CDISC ODM spec, which I think it makes sense to base this on if you want something vendor neutral, I see:

MetaDataVersion
Study Event
Form
Form Repeat
Section
Section Repeat
ItemGroup
ItemGroupRepeat
Item

Then within item there could be a set of attributes apart from value (such as SDV, year part of a date or whatever)

So when you say:

on change of field.Age:

You're really saying,

on change of item.Age.value in any item group, item group repeat, section, section group repeat etc right up to MetaDataVersion?

I would want to see that codified so I could make it clear that this trigger is only valid for a certain appearance of that age field within that hierarchy.

I'd also like to be able to specify a priority or order for a trigger. I might have more than one, they both might modify a value in some way. How do I know which will run first?

on change of field.Age with priority 1....

on change of field.Age with priority 2....

There's a lot more but my brain can only hold so much.

Doug Bain said...

Just getting back to you on this comment. 15 years may be setting a record, but the feedback was valid and deserves a response.

Yes, 6 different topics if not more, but, to get to the 6, you need to start at a top level with enough detail to raise interests.

Regarding the age question - ideally, when a piece of logic / trigger is defined, the qualification of what is stated is as specific or as non-specific as you need.

This creates challenges though with triggers. Triggers tend to be 'headless' in that they are not really in the loop of context - so, the trigger engine doesn't know if the age is in any particular group, form or visit. So - that headless mode doesn't really work as we want. Imagine if we had age in 2 different places for example.

Instead, we need to think of the execution of triggers as being contextual. When a person leaves a field, the reference to the field in the trigger is recognized and the context is compared. If no group, form or field are specified, then the logic engine should assume the context is limited to the triggering operation. This means it qualifies the age with [event].[form].[group].age.[value]. It is almost as if the fully qualified syntax is [current_event].[current_form].[current_group].age.

Now, let's imagine we do not wish to refer to the current context, but specifically to a field in a particular location. Let's imagine we want to refer to a date in a particular visit:

.*.*.VisitDT.

This would say I want to refer to the VISITDT in Visit 1 on any form or group.
If I wanted to refer to a specific form, then I would need an alternative syntax to just blank - maybe a ?.

By the way, this syntax, or something similar to this was used in DBL Recorder / eDM.