Sunday, September 28, 2008

Web Services in eClinical

Web Services is one of those technical terms that many folks have heard of, some people understand, and very few people can actually use.  The definition of a Web Service from a technical perspective - courtesy of Wikipedia - is "a software system designed to support interoperable machine-to-machine interaction over a network".

From an eClinical perspective, Web Services allow disparate eClinical systems to communicate on a (near) real-time basis.

I believe that Web Services will help resolve many of the integration issues that eClinical systems suffer from today.  You can procure two great systems, but if they don't speak properly together a lot of business value is lost.  Combining CDISC with Web Services may well be a solution to many of the problems encountered.

Web Services - the Basics

Technologies similar to web services have been around for many, many years. For example, when you visit an autoteller and put your Visa card in the slot to withdraw cash, a Web Service 'type' of communication goes on between the bank you are dealing with and the Credit Card company actually releasing the funds.  What the systems actually say when such communications occur will of course differ from application to application, but, with Web Services, the way they say it is standardized.

Web Services have evolved into many different things, but the underlying principles remain the same.  Generally, they communicate using XML (Extensible Markup Language) based text over a protocol called SOAP.

Many folk will be familiar with XML already - CDISC ODM is built around XML as a means to give meaning to clinical data that is transferred.  SOAP, though, may be a new term.  SOAP, put simply, provides a means to transfer XML messages - typically over the Internet - from system to system over firewall-friendly channels (or Ports).

When you open up a browser and enter an address such as http://www.google.com, what you are actually doing is asking to communicate with Google on internet port '80'.  The 'http' prefix equates to Port 80.  You might also see 'https'.  The 's' signifies 'secure' and indicates the use of Port 443 (known as SSL).  Many corporate and site networks place restrictions on the ports that are open to the internet.  Ports 80 and 443 are among the few ports almost always open, and therefore usable for communication.  SOAP can use both of these ports, so web services running on SOAP can speak between systems without firewall conflicts. This means that if you want System A to speak to System B via Web Services, all you need to do is ensure that an Internet link is available, and you're off and running.
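To make the 'standardized way of saying it' concrete, here is a minimal sketch of a SOAP-style envelope being built and parsed in Python. The operation name (GetSubjectStatus) and its payload are invented for illustration; only the envelope structure follows the SOAP standard.

```python
# A minimal illustration of a SOAP 1.1 envelope carrying an XML payload.
# The service operation and payload fields are hypothetical, not taken
# from any real eClinical product.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(body_xml: str) -> str:
    """Wrap an XML fragment in a SOAP envelope, ready to POST over
    port 80 (http) or 443 (https)."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        f"<soap:Body>{body_xml}</soap:Body>"
        f"</soap:Envelope>"
    )

message = build_envelope(
    "<GetSubjectStatus><SubjectId>1001</SubjectId></GetSubjectStatus>"
)

# The receiving system parses the same standardized structure back out.
root = ET.fromstring(message)
body = root.find(f"{{{SOAP_NS}}}Body")
print(body[0].tag)  # -> GetSubjectStatus
```

Whatever the two systems are discussing, the envelope around the conversation stays the same - that is what makes the exchange vendor-neutral.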

CDISC & eClinical before Web Services

So, what about Web Services, CDISC and eClinical.  Why should I care?

Well, traditionally, eClinical systems have been relatively 'dumb' when it comes to communicating.  An IVR system would be used to capture the recruitment or randomization of a subject.  The IVR would then send a text file via old-fashioned FTP file transfer to an EDC system, and the EDC system would - at some time in the future - process the text file, creating a new subject or recording the randomization in the EDC system tables. Sounds ok... but... what if things go wrong?

With this model, the EDC and IVR systems don't really speak to each other.  The IVR system sends something - yes - but if the EDC system doesn't like it, then oops! The IVR will keep sending things regardless.   That is one issue.   The second issue is that because the two systems don't actively communicate, they cannot cross-check (or handshake) with each other.  Imagine if the EDC system held information that the IVR did not.  Let's say, for instance, that the investigator recorded in the EDC system that the subject had dropped out. If the investigator later used the IVR to randomize this same patient, the IVR could check against the EDC system that the subject was valid and current. Maybe not a perfect example, but the capability exists.

Web Services provides the mechanism for system A to speak with system B.  CDISC ODM provides the syntax with which to communicate.  When both systems make reference to a 'FORM', both systems know what is meant.

Web Services - eClinical  - so...

In traditional systems design, you had a decision to make when you developed new modules of software as part of a suite of applications.  Do I store database information in the same place, sharing a common database, or do I store it in a separate database and communicate / synchronize between the two systems?  If you stored everything in the same database, you simplified the table structure and didn't need to worry about data replication, but the systems were tied together. If you separated the databases, then of course you had duplicate data between the databases, and you had to replicate.  This replication was complicated and problematic.

Ok, now let's imagine that the systems come from different vendors. Of course each vendor wants to sell their own system independently - a separate database is mandatory.  They hold common information.... no problem, we write interfaces.

Complicated software is designed to examine information that is common between systems, and transfer it by batch transfer.  So, for example, we have a list of Sites in System A - we also have a list of Sites in System B.  We have a list of site personnel in System A, we also have a list of site personnel in System B - no problem, I hear you say. Let's imagine that System A doesn't fully audit trail the information that has changed on the Sites tables.  How would System B know what to take?  We would need to transfer all the sites, and compare the site information with the previous site information... getting tricky... and this is just a simple list of sites.

Now, let's imagine a more complicated situation, common in an eClinical system.  A protocol amendment occurs: a new arm has been added to a study whereby subjects meeting particular criteria are branched into two separate dosing schemes.

Transferring or synchronizing this sort of information between two systems would be possible, but very, very difficult.  System A may not have a good place to put the information from System B.  The question is, though - do both systems really need the same data?  If System B wants to know something, why doesn't it just ask System A at the time it needs the answer, instead of storing all the same data itself?

This is where Web Services can come in.

Let's imagine an IVR system wanted to check with an EDC system whether a subject was current in a study (current meaning not dropped out, early terminated or a screen failure).  A Web Service could be offered by the EDC system to respond with 'True' or 'False' to a call 'IS_SUBJECT_CURRENT'.  Of course hand-shaking would need to occur beforehand for security and so on, but following this, the IVR system would simply need to make the call, provide a unique Subject identifier, and the EDC system web service would respond with either 'True' or 'False'.  With Web Services, this can potentially occur in less than a second.
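The EDC side of that call can be sketched as a toy in-memory service. A real implementation would sit behind SOAP; the statuses, method name and subject identifiers here are all illustrative assumptions.

```python
# A toy stand-in for the EDC end of the 'IS_SUBJECT_CURRENT' call.
class EdcService:
    # Subject statuses that disqualify a subject from being 'current'.
    INACTIVE = {"DROPPED_OUT", "EARLY_TERMINATED", "SCREEN_FAILURE"}

    def __init__(self, subjects):
        # subjects: mapping of unique subject identifier -> status string
        self.subjects = subjects

    def is_subject_current(self, subject_id: str) -> bool:
        """The web service call: answer True/False for a subject id."""
        status = self.subjects.get(subject_id)
        return status is not None and status not in self.INACTIVE

edc = EdcService({"S-001": "RANDOMIZED", "S-002": "DROPPED_OUT"})
print(edc.is_subject_current("S-001"))  # True
print(edc.is_subject_current("S-002"))  # False - dropped out
print(edc.is_subject_current("S-999"))  # False - unknown subject
```

The IVR never needs to see the EDC database; it only needs the question and the yes/no answer.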

Let's take this one step further.  The EDC system would like to record a subject randomization.  The site personnel enter all the key information into the EDC system.  The EDC system then makes a Web Service call to the IVR system, passing all of the necessary details.  The IVR takes these details, checks them, and, if valid, returns the appropriate subject randomization number.  The EDC system presents the Randomization No. for the subject on the eCRF for the site personnel to use.  This all happens in real time, via web service calls, between systems located in completely different locations.
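The reverse direction, EDC calling the IVR for a randomization number, can be sketched the same way. The validation rule and the numbering scheme below are invented purely for illustration.

```python
# Sketch of the IVR end of the randomization call: check the submitted
# details and, if valid, allocate a randomization number.
import itertools

class IvrService:
    def __init__(self):
        self._counter = itertools.count(1)

    def randomize(self, study, site, subject_id, details):
        """Validate the details; if valid, return a randomization number."""
        if not details.get("eligibility_confirmed"):
            return {"ok": False, "error": "Subject not eligible"}
        rand_no = f"{study}-{site}-R{next(self._counter):04d}"
        return {"ok": True, "randomization_no": rand_no}

ivr = IvrService()
reply = ivr.randomize("ST01", "105", "S-001",
                      {"eligibility_confirmed": True})
print(reply["randomization_no"])  # -> ST01-105-R0001
```

The EDC system would simply display whatever number came back on the eCRF, never holding the randomization logic itself.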

Web Services -  Metadata independence

Web Services are significant for a number of reasons.  Yes, they allow systems to communicate on a near real-time basis over the internet - that's quite cool in itself.  What's more significant, though, in terms of eClinical systems, is that Systems A and B don't really need to understand how the other system does what it does.

If System A had to read the database of System B, it would need to understand how System B actually used the data in the database.  The same applies to an interface.  If System A received data from System B, it would need to process that data with an understanding of how System B works before it could use it, or potentially update it.

Web Services - beyond CDISC?

CDISC ODM allows you to transfer data, and to some extent metadata, from one system to another.   To ensure it works for all, the support is, to some extent, the 'lowest common denominator' of metadata.  It is only really able to describe data that is common and understandable to every other system (barring extensions - see eClinicalOpinion on these).

Imagine if we could create a common set of Web Service calls.  The common calls would take certain parameters, and, return a potential set of responses.  The Messaging might be based on CDISC ODM, but the actions would be new and common. 

  • Add_Subject(Study, Site, SubjectId) returns ScreeningNo
  • Add_DataValue(Study, Site, Subject, Visit....) returns Success, QueryResponse
  • Read_DataValue(Study, Site, Subject, Visit....) returns DataValue,QueryResponse, DataStatus
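One way the common call set above could be expressed is as an abstract interface that any vendor's system could implement. The signatures follow the bullet list; the class names, the trivial demo implementation and its screening-number format are my own invention.

```python
from abc import ABC, abstractmethod

class EClinicalWebService(ABC):
    """A hypothetical common call set, following the list above."""

    @abstractmethod
    def add_subject(self, study, site, subject_id):
        """Returns the assigned ScreeningNo."""

    @abstractmethod
    def add_data_value(self, study, site, subject, visit, item, value):
        """Returns (success, query_response)."""

    @abstractmethod
    def read_data_value(self, study, site, subject, visit, item):
        """Returns (data_value, query_response, data_status)."""

class DemoService(EClinicalWebService):
    """A trivial in-memory implementation, for illustration only."""

    def __init__(self):
        self.data = {}

    def add_subject(self, study, site, subject_id):
        return f"SCR-{site}-{subject_id}"

    def add_data_value(self, study, site, subject, visit, item, value):
        self.data[(study, site, subject, visit, item)] = value
        return True, None          # no query raised

    def read_data_value(self, study, site, subject, visit, item):
        value = self.data.get((study, site, subject, visit, item))
        status = "ENTERED" if value is not None else "MISSING"
        return value, None, status

svc = DemoService()
print(svc.add_subject("ST01", "105", "S-001"))   # -> SCR-105-S-001
svc.add_data_value("ST01", "105", "S-001", "V1", "DBP", 82)
print(svc.read_data_value("ST01", "105", "S-001", "V1", "DBP"))
# -> (82, None, 'ENTERED')
```

The point of the interface is exactly the one made above: callers depend only on the call signatures, never on how the owning system stores or processes anything.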

With this sort of mechanism, the degree of processing of data and metadata between systems is limited.  The 'owning' system does all the work. The data and metadata that the systems need, stay with the original system. 

One remaining challenge exists - the common indexing of information.  If a data value is targeted towards a particular Site, Subject, Visit, Page and Line, then they all must be known and specified.  That said, a bit of business logic (protocol knowledge) can be applied.  For example, if a DBP is captured for a subject, and the target study only has one reference to DBP for a subject in the whole CRF, should I really need to specify the Visit, Page and Instance? Sufficient uniqueness rules could apply.
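The 'sufficient uniqueness' idea can be sketched as a small resolver: only demand the extra index information (Visit, Page, Instance) when the short address is ambiguous within the study metadata. The metadata layout below is a deliberately simplified assumption.

```python
# Simplified study metadata: (visit, page, field) per CRF location.
STUDY_METADATA = [
    ("V1", "Vitals",  "DBP"),   # DBP appears only once in the whole CRF
    ("V1", "Vitals",  "SBP"),
    ("V2", "Vitals2", "SBP"),   # SBP appears twice -> ambiguous
]

def resolve_target(field, visit=None):
    """Find the unique CRF location for a field, requiring extra
    index information only when more than one location matches."""
    matches = [m for m in STUDY_METADATA
               if m[2] == field and (visit is None or m[0] == visit)]
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{field}: {len(matches)} matches - more indexing needed")

print(resolve_target("DBP"))        # -> ('V1', 'Vitals', 'DBP')
print(resolve_target("SBP", "V2"))  # visit needed to disambiguate
```

If DBP is unique in the CRF, naming the field is enough; SBP needs the visit as well. That is all a uniqueness rule amounts to.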

If CDISC were to create a standard set of InBound and OutBound Web Service calls, you would see a great simplification in how normally disconnected systems inter-operate.  Not only could we send data from System A to System B, we could appreciate what happens when it gets there - 'Can I login', 'Did that Verbatim Code?' 'Can I have lab data for subject x'... etc etc.

Will Web Services technologies change the eClinical landscape?  No.  But, technology advances such as these all help to make the whole eClinical process somewhat less complicated.

Monday, September 22, 2008

Green EDC?

Besides working in Clinical Trial technologies, I also have an interest in tackling issues around global warming.  Electronic Data Capture in clinical trials has the advantage in that it inherently lends itself towards tackling many of the issues that contribute towards global warming.   I am interested in both the benefits that can be achieved during the implementation cycle, as well as the value during the execution phase.

Green EDC does make a lot of sense from many perspectives.  In almost all instances, working with an electronic medium is considerably faster than working with paper. This leads to a reduction in cycle times, and lower costs. You can basically 'do more with less'.

Using technology today


Investigator Meetings

Probably the most obvious one.

Large, expensive Investigator Meetings were common until only a few years ago.  Changes in rules and attitudes have ensured they have become more functional rather than an investigator incentive. As a result, unnecessary travel has been reduced.  However, are investigator meetings actually required?

From the EDC perspective, training is often provided during the course of the meeting.  This can take up as much as half the time spent.

eLearning solutions, available from many of the EDC vendors, should alleviate the need for Investigator meeting training.  At most, you might be looking at a short demonstration. If eLearning is implemented well, it has many advantages over traditional instructor-led training.  It should be integrated into the product, it should be integrated into the workflow, and it should be adaptable to meet the changing training requirements from study to study.

Integration is important. The early eLearning solutions were often implemented as separate tools.  You would be provided a separate login to access the eLearning - you almost had to learn to operate the eLearning tool!   With fully integrated eLearning inside EDC, the EDC system login directs trainees through the appropriate eLearning topics, based on the role they have been assigned, before they can participate in the study. Participation and test results are all logged, ensuring an electronic record is maintained for process compliance.  Another advantage of eLearning is the ability to train new staff prior to Monitoring visits.  In the past, a new study nurse, for example, had to wait until potentially the Monitor's next site visit.  With eLearning, they simply await a login, perform the eLearning, and they are ready to participate.

Fewer Monitoring Visits

Besides removing the need for monitors to attend sites regularly for staff training, the overall number of Monitoring visits should be reduced with effective use of the EDC systems themselves.  Monitors are increasingly taking on the role of the Data Manager, carrying out data review duties over and above the cleaning functions of the study build itself.  Source Data Verification remains necessary, but the actual Q & A regarding the data itself can be carried out remotely.  Fewer Monitoring visits typically means less air travel.

Less Paper

This seems obvious - we are talking about replacing paper with an electronic medium - however, we also have paper flows in other areas of the process.   The development of the EDC product itself, together with the implementation phase of the system for a study, often involves the preparation of many binders of paper materials designed to support a vendor (or regulatory) audit.  Despite the electronic nature of the solutions that EDC companies provide, very few companies effectively push a paperless development and implementation process.  The use of electronic document management systems is beginning to change this, but it does require the support of Quality Assurance and Regulatory groups, who tend to be more comfortable with a large pile of paper binders and wet ink signatures than with fully electronic systems.

The lack of widely used standards for document signatures is currently a barrier.  The SAFE BioPharma Association has had some success in offering an electronic document signing solution that could be used across the industry, but it still has some challenges related to hardware and technology dependency. We may see progress following SAFE's partnership with CDISC.

Paper itself is not that 'non-green', but the delivery of paper can be.  Organizations involved in clinical trials today are often global.  FedExing a CRF, a specification or a submission in paper format is simply not necessary with the technology available today.

Messaging and Video Conferencing

Is it not odd that a technology such as video conferencing has such limited use in business today, and yet teenagers all around the globe use it regularly to chat with their friends over the internet?

Some EDC systems make use of tools to provide interactive support, but they are often poor.

If I am using an Internet application - such as banking or the like - and I have a problem, I would like to chat - by keyboard or over the phone - immediately.  Messenger services either inside or outside of the EDC application are available today, yet I have still to see a good implementation of interactive support within the tool.  [I am sure EDC vendors that already have such services built in will correct me here!]

By offering interactive support and communication tools within an EDC product, site personnel can achieve equivalent, or potentially better support than can be achieved through infrequent monitoring visits.


Tomorrow's Technology?

So, what can we still do to improve our green credentials when carrying out clinical trials?


Applying standards can improve overall efficiencies.

Since 2004, SDTM - the Study Data Tabulation Model from CDISC - has been part of a move to a standard electronic medium for data submissions.  The electronic submission of course is not new, but the standardization of the format is, making it somewhat easier for regulatory bodies to process data received regardless of the source.  Standard outputs require standard inputs, so with the recent introduction of CDASH - Clinical Data Acquisition Standards Harmonization, a standardization of the input structures in clinical trials - the overall SDTM production process should become less onerous.

Downstream from EDC, we have the eCTD - the electronic Common Technical Document, an interface for pharmaceutical industry-to-agency transfer of regulatory information.  From 1/1/2008, this is a required format (barring a waiver) for the FDA, and it has resulted, and will continue to result, in a great reduction in the amount of paper delivered to regulatory authorities.

Trial Execution Efficiency

Both Adaptive Clinical trials and non-adaptive studies where a Bayesian continual reassessment method (CRM) is taken can help ensure that unnecessary trial execution work is avoided.  In the past, trials tended to be more serial in nature - complete Study A before moving onto Study B.  With the ability to adjust the design and, more commonly, the ability to examine data subsets early in the trial, either an early termination or a change of focus can be made, saving time, money and, of course, our impact on the environment.

Source Data Verification

Going back to my eBanking analogy, if I want to record a payment transaction into my eBanking solution, I do not write it down on paper first, and then transcribe what I write down into the eBanking transaction form. There is no value in that.

In clinical trials, if an investigator is involved in a study where the subject responds to questions during data entry, it would make sense in certain circumstances to enter these directly into the EDC system, and not on paper first - as source data - avoiding potential transcription errors.

But no.... The interpretation of guidelines regarding Source Data typically means that web based systems cannot be used to retain the 'Source Data'.  I personally have an issue with this interpretation, but I will not elaborate here.  It is indicated that the 'Source Data' must remain at the site. Web based systems do not hold data on site, but instead hold the data on a central server.  [Dave Iberson-Hirst from CDISC provided a good description of the challenges of Source Data in his article in Applied Clinical Trials focusing on electronic Patient Diaries - for the sake of brevity, I have summarized the points of argument.]

Workarounds have involved the use of an offline or hybrid system where the data is stored on a local device at the site.  More commonly, though, data is recorded on paper first and then transcribed into the electronic CRF.  Admittedly, a lot of source data will come from medical records and other paper records; however, it seems rather regressive that, due to the wording of a regulation, more data is captured on paper first for web based systems than for the old-fashioned Remote Data Entry tools.  Hopefully improved guidance in the near future will avoid this.


I am a great believer that being 'Green' means being efficient in business. Reducing the inefficiencies in how we go about our work not only offers savings in time and money, it can also have a positive impact on our environment. Here's to an electronic clinical world!

Thursday, September 18, 2008

Top 10 mistakes made when implementing EDC

(last update from admin @ eclinicalopinion)

Ok, I am calling for a challenge here.  I am making an attempt at identifying the top 10 mistakes that I believe are made when companies attempt to implement EDC.  No science to this. Just a bit of fun.

I will make edits if anyone posts comments that I believe outdo my own:


1. Pick the nastiest, most complex study to implement as a first study

Sponsors may be trying to test the EDC system and vendor, as confirmation of claims of functionality, services and support.  It may also be an internal organization's 'sell' when bringing in a new system. In reality, the risk factors are at their highest, and the chances of failure greater, than at any subsequent time in an Enterprise EDC system rollout.  Instead of learning and improving with optimized processes and a well designed workflow model, a 'get it out the door quickly' approach is forced and pain is suffered by all parties!

2. Expecting the return on EDC to be immediate - admin @ eclinicalopinion

Many clients are very experienced with paper and have wrung the very last drop of efficiency out of their process. They start with EDC believing that they are entering a new golden era, only to be disappointed with the gains (or losses!) on their first study.
As with any new process or technology, it takes time to refine. The potential gains are real, but it will take a few trials before a company hits its stride with EDC.

3. Over-emphasis on faster closeout - admin @ eclinicalopinion

Companies new to EDC get excited about the faster closeout of EDC trials, but ignore the issue of longer start-up times with EDC. With paper, you could print the CRFs and send them out before you had finalized (or even built) the database that would finally store the data.

4. Use all the functionality that was demonstrated

A common problem. When a sales person demos the product, it looks cool. Almost every feature looks good, and could add value.... Well, in reality, not always.  Many EDC systems developed today offer features as a 'tick in the box', but when a feature is used and combined with other features, sometimes the value falls short.  For example, most systems offer some form of data flagging... Reviewed, SDV'd, Frozen, Locked, etc. Do not use all flags on all fields.  That will be slower than paper.

5. Resource the same way

If you have the same resourcing for Data Management and Monitoring AND you are also resourcing separately for building and testing EDC studies - then you have done something wrong.

With a good EDC product, the rules that would typically be applied manually are applied automatically. The delta should be picked up by a smaller number of 'eyes'.  Many CROs have played the 'better safe than sorry' card, charging for the same Monitoring and Data Management as paper, as well as EDC license and deployment costs.  This demonstrates an inexperienced CRO.

6. Model the eCRF according to the paper CRF Layout

Trying to make an electronic CRF identical to an original paper CRF will result in tears.  The users will be frustrated with the workflow.  The 'e' nature of the medium will not be utilized and the study will be less effective.

Instead, consider appropriate workflow and dynamic eCRFs.  I will stress 'appropriate'. Overdoing the bells and whistles can cause frustration, but with no bells and whistles, many of the advantages of EDC are lost.

7. eCRF Design by committee

The surest way to blow budgets and timelines is to attempt to develop an eCRF based on a committee of individuals.  The sponsor should delegate a chosen few (ideally one) to work with the EDC CRF Designer. The study should be largely built first; following this, a wider review should be carried out.

8. Wait until the end of the study to look at the Data

It is surprising how often this is still the case. EDC means cleaner data faster, but often sponsors, and their Statistical departments, are geared towards working with final data-sets. Good EDC systems can deliver clean data on a continuous basis.  Whether the data has reached a statistically significant sample size is another question, but the information is often available for companies that are willing to leverage it.

9. Fail to use the built in communication tools provided

Many EDC systems offer the means for the different parties involved in study execution to communicate.  These might be in the form of post-it notes, query messages or internal email. Often these facilities are either not used, or not used effectively.  This means that the true status of a study is a combination of information in the EDC tool, tasks in Outlook, actions in emails, scribbled notes on a Monitor's pad, etc.

10. Do lots of programming for a study

This covers many areas. It could be programming to handle complicated validation rules, or it could be programming to adapt data-sets to meet requirements. If your EDC system requires lots of programming in order to define an EDC study, then I would suspect you have the wrong EDC system. Good EDC systems today are configured based on metadata stored in tables. Old systems relied on heavy customization of the core code for each deployment, or relied on some form of programming in order to complete the study build. If you write code, then you need to test it. The testing is similar to software validation.  This takes time and money.

Most EDC tools can be extended through 'programming'. If you need to do this, try to do the work outside of the critical path of a study.  Develop an interface, test and dry-run it, and then utilize it in a live study. In this way, you will have time to do it right, with proper documentation, support and processes.

and relegated below the top 10...

11. Start by developing Library Standards from Day 1

This may sound like an odd one, but let me explain.  Implementing EDC effectively, even with a highly experienced vendor, takes time.  All parties are learning, and modern EDC systems take a while to adapt. Workflow, system settings and integrations all need to come up to speed and be optimized before standards can really be applied and add value. Start too early, and the library standards are just throwaways once the teams come up to speed. It is best to leverage the knowledge and skills of the CRO or Vendor first.

12. Develop your Data Extracts after First Patient In

Sometimes tempting due to tight timelines, but if you have an EDC tool that requires either programming or mapping of data to reach the target format, then the less consideration you give to the outputs when you design the study, the harder it can be to meet these requirements after FPI.  This leads to higher costs, and the potential for post-deployment changes if you discover something is missing.


Many thanks to admin @ eclinicalopinion for the additional 2 mistakes made, now coming in at numbers 2 & 3!  More comments welcome...

Friday, September 12, 2008

How do Monitors work on site with Online only systems?


I have a question that I would like to ask.

When Monitors go out onsite to carry out Monitoring activities, and, the monitoring activity involves working with the data captured into an EDC system - how do they do it if the EDC system is online only?

  • They could use the site computer - but often this is a computer shared for other purposes. With increasing security at site locations, it is often not appropriate for an external person to log in to a site system.
  • They could use their own laptop on the site network link?  Using the site network infrastructure is often a big no-no. The site will often not provide a wireless network key, again due to security restrictions.
  • They could use their own laptop with 3G or a similar wireless technology?  Well - not in many hospital establishments.  Mobile communication, including 3G, is not permitted.

So, without resorting to an offline solution, how do they work?


One functional area where I have been surprised at the lack of good support is CTMS features built on top of EDC products.

CTMS systems often perform a variety of tasks - Portfolio Management, Site Management, Trial Planning, Monitor Reporting, Study Progress Tracking, the list goes on for a while...

One of the difficulties with using separate EDC and CTMS systems has often been the need to either synchronize or interchange the metadata relevant to both systems.  Both systems need to be aware of the visit structure and the potential workflow associated with more complicated visit structures.  For example, it might be the case that a study has multiple arms, and one or more trigger points that determine the branching.   In order to predict the CRFs to be completed, from a trial planning perspective, the CTMS either needs to recreate the structure - including an appreciation of the trigger points - or it needs to import the structure from the EDC system.  This is all a bit messy.

So, why do EDC vendors not do a better job of leveraging the data and metadata in an EDC database in order to drive at least a % of the needs of a CTMS?

I am open to better ideas here, but to get this started :-

EDC systems try to generalize the definition of studies. Often things like Visit Dates, Visit Numbers, branching points and subject statuses are all purely data based - often simply fields on a CRF.  They have no special meaning that differentiates them from other CRF fields.  Vendors could tackle this by offering a form of user-definable flags.  In the metadata, a spare attribute would be provided that in turn points to an internal metadata codelist.  The codelist would be populated by trigger values such as 'Visit Date' or 'Visit Number'.  When interfacing, the CTMS system just needs to be aware of the special nature of the flags, and use them to pick up the one or more fields associated with each flag. So, even though 10 different fields might be used to hold Visit Dates, provided the 'Visit Date' flag is attached, the CTMS knows how to find them. Being data driven, no special programming is required from study to study.
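The flag idea can be sketched in a few lines: each CRF field's metadata carries an optional flag drawn from an internal codelist, and a CTMS interface collects fields by flag rather than by name. The field names and codelist values below are illustrative assumptions.

```python
# A hypothetical internal codelist of 'trigger' flag values.
FLAG_CODELIST = {"VISIT_DATE", "VISIT_NUMBER", "SUBJECT_STATUS"}

# Simplified study metadata: each field may carry one flag, or none.
study_metadata = [
    {"form": "V1_VITALS", "field": "VISDAT",   "flag": "VISIT_DATE"},
    {"form": "V2_VITALS", "field": "VIS2DAT",  "flag": "VISIT_DATE"},
    {"form": "V1_VITALS", "field": "DBP",      "flag": None},
    {"form": "EOV",       "field": "SUBJSTAT", "flag": "SUBJECT_STATUS"},
]

def fields_with_flag(metadata, flag):
    """What a CTMS interface would do: collect every field carrying a
    given flag, regardless of what the field happens to be called."""
    assert flag in FLAG_CODELIST, "unknown flag"
    return [(m["form"], m["field"]) for m in metadata if m["flag"] == flag]

print(fields_with_flag(study_metadata, "VISIT_DATE"))
# -> [('V1_VITALS', 'VISDAT'), ('V2_VITALS', 'VIS2DAT')]
```

Because the association lives in data, a new study with different field names needs no interface code changes, only new flag assignments.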

A second issue is the lack of CTMS information defined on an eCRF. EDC systems are generally only geared to capture data through the eCRF medium. When a sponsor approaches either a CRO or a Vendor, the contents of a CRF page are often gospel. The thought of bolting on the capture of additional fields is considered inappropriate - eek!  it could be mistaken for clinical data!  For example - and I have seen it done - a standard could be developed that ensures that for every subject visit, an end-of-visit form is filled out.  The end-of-visit form would not capture clinical data. It would be used to capture (or derive) study progress information. Visit Date, End of Visit Subject Status and other such information could be captured and validated in one place. With this kind of information, it is somewhat less difficult to either report on status and progress using native EDC reporting tools, or feed the standard information across to a CTMS.

The third reason, I believe, for EDC systems not effectively supporting CTMS needs is the lack of planning features. Yes, when building and deploying a study it is possible to enter the number of expected subjects at a site level - all EDC systems include this and, in my experience, rarely use it - but that doesn't give you timeline planning: based on a specific recruitment interval, when will I reach my target subject recruitment? When will the recruited subjects complete LPLV? Etc.

So - could EDC systems better leverage the information they have to offer basic CTMS functionality? Absolutely.  Should they do more? Yes.

The recent Clinpage article on vendors offering multiple integrated solutions suggests that the existence of a fully integrated solution will prove to be a key decision influencer, over and above the actual feature functionality of the individual systems. There is something to be said for that, but (and this is a big but) the systems must be sufficiently well integrated, robust and flexible to actually work in real life.  I would venture that many of the purportedly fully integrated product suites are in fact separate products loosely coupled with single sign-on and a similar UI.   I have known (and I will not mention any names) products that were sold under the same branding, used the same UI, and could be accessed through a portal using single sign-on, but that did not share a single application table once utilized - data and metadata went into entirely separate tables. Seamless? No - a chasm!

With strong CTMS features inside an EDC product, does a future exist for standalone CTMS tools? I personally think not.  Although more highly functional, their data entry requirements - versus EDC tools that leverage existing metadata and admin data - offer a less convincing value proposition.   Then again, how many good EDC vendors know how to create a good CTMS?

Friday, September 5, 2008

Tools for validating data in EDC


I have had the fortune to work with a number of different EDC products over the years.  In each case, they implemented features that allowed a study developer to create validation rules associated with the data captured.  Some were good and some were bad. In almost all cases, it was difficult to appreciate a tool's shortcomings until a solid amount of work had been carried out with it.  They say the devil is in the detail - with EDC edit checking tools, this certainly proves to be the case.

I would like to discuss (or rather ramble on about) the history of validation rule tools in EDC - at least from the mid 90's.


1st Generation Tools - SQL Based

The early tools, primarily before EDC, used either SQL or a pre-compiled version of SQL together with a scripting language as the syntax for edit checking.  This approach had the advantage that data access was standardized, to a degree, through SQL.  The disadvantage was that the SQL worked directly against an underlying database: the developer had to understand and operate against that database in order to correctly write edit checks.

PL/SQL was a common language to leverage. Tools such as Clintrial and DLB Recorder (now eResearch Technologies - eXpert Data Management) relied heavily on the logic and data constructs provided.

The downside to (PL/)SQL based edit check syntaxes was that they often assumed the underlying database was a relational database matching the structure of the screens the logic was associated with.  The product therefore had to be a relational database building tool - good on the surface, but not good when it came to meeting the needs and flexibility of EDC.
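As a minimal sketch of the 1st generation approach - with SQLite standing in for the study database and an invented `vitals` table - the edit check is simply a query written directly against the schema, which is exactly why the check author had to know that schema:

```python
import sqlite3

# An in-memory stand-in for the study database; table and columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitals (subject TEXT, visit TEXT, sys_bp INTEGER)")
conn.executemany("INSERT INTO vitals VALUES (?, ?, ?)",
                 [("S001", "Visit 1", 120), ("S002", "Visit 1", 300)])

# The edit check is just SQL: select every row violating a plausibility range.
violations = conn.execute(
    "SELECT subject, visit FROM vitals WHERE sys_bp NOT BETWEEN 60 AND 250"
).fetchall()
print(violations)  # [('S002', 'Visit 1')]
```

Change the screen layout or the table design and the check breaks - the coupling to the physical schema is the weakness described above.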

2nd Generation Tools - Expression Builders

In the early to mid 1990's, a new set of tools arrived that generally attempted to take away much of the complexity of the 1st Generation tools, and that took advantage of the fact that the underlying data structures were not relational.

The first set of 2nd generation tools tackled the issue of logical data checking through expression building tools. Initially, these were restrictive: the only means to build the expressions was through a thick client front-end, with no free-format expression entry possible.  This made the tool development somewhat easier, and the corresponding expression parsing simple.  The downside to the approach, though, was that it was not possible to define all the required rules in the provided expression builders.
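A sketch of what an early expression builder effectively produced - a single (field, operator, value) triple picked from drop-downs, trivially parsed but incapable of expressing anything more complex (field names invented):

```python
# The only operators the hypothetical builder offers in its drop-down.
OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b, ">": lambda a, b: a > b}

def evaluate(check, record):
    """Evaluate a builder-style (field, operator, value) expression on one record."""
    field, op, value = check
    return OPS[op](record[field], value)

record = {"AGE": 16}
check = ("AGE", "<", 18)          # built entirely via drop-downs
print(evaluate(check, record))    # True -> raise a query
```

Anything involving multiple fields, derivations, or cross-visit logic simply cannot be stated in this form - hence the back-end SAS/SQL fallback discussed next.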

3rd Generation Tools - Hybrid Logic Builders

Expression builders alone were seen as too restrictive for developing a complete set of edit checks for a study.  Power users also felt constrained. The fallback position for implementations was that the remaining edit checking had to be performed at the back-end with SAS or SQL queries.

To work around these limitations, a 3rd generation of tools was produced that provided a combination of expression building and direct syntax entry.  The direct syntax entry was provided either by allowing the developer to edit and extend the code derived through the expression builder, or as an alternative to expression built code.

The added flexibility of the direct syntax provided a mechanism allowing studies to tackle close to 100% of all edit checking associated with a protocol.  Back end data checking was limited to rules that could not be determined prior to study rollout.

One limitation of the syntax approach is testing.  With a point and click configuration approach, the scope of testing can be controlled and, to a degree, even automated.  With an open language based syntax, the potential combinations that need to be tested for are far higher.  In fact, the testing required here is equivalent to the sort of testing needed when carrying out full system validation.  I will be discussing the methods of, and challenges in, testing studies in a later post.

Key Factors in good Validation Rule Syntax

Absolute Data References

This topic primarily applies to script based systems rather than expression builders.  Expression builders typically present the metadata that exists in drop down lists (Visits, Forms or Field Names). Referencing data fields in a free format expression can be somewhat more challenging.  Take the following example;

[Visit 1].[Inclusion/Exclusion].[Inclusion 1]

This is an example of an absolute reference to a data field - the 1st Inclusion question on the Inclusion/Exclusion Form in Visit 1. But why the brackets?  Well, the metadata has spaces and a / in the names. To ensure the interpreter doesn't think the space or / separates this operand from an operator, the brackets bind them.   A way around this, of course, is to use names without spaces. CDISC offers this with Object Identifiers, or OIDs. They don't have spaces, so the issue does not arise.  However, OIDs can be less than friendly when it comes to making an expression human readable.  Anyway, OIDs or equivalents are standard practice except where the number of elements in an expression is always fixed.  Even with OIDs though, the length of these logical expressions can be horrific.

Programming languages have simplified the issue of dealing with long qualifications by providing variables (or aliases). You define the alias at the start, and then refer to the simple alias name throughout the expression.  So, for the above, you could say;

INC1 = [Visit 1].[Inclusion/Exclusion].[Inclusion 1]
Then, to compare values, it might be

if INC1 = 'No' then Raise Query "XXXXXX"
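A hedged sketch of how a tool might resolve such aliases internally - the alias table maps the short name onto the fully qualified (visit, form, field) reference before the data lookup (the data store layout is invented for illustration):

```python
# Hypothetical data store keyed by fully qualified (visit, form, field) references.
data = {("Visit 1", "Inclusion/Exclusion", "Inclusion 1"): "No"}

# Alias table: short names stand in for the long qualified references.
aliases = {"INC1": ("Visit 1", "Inclusion/Exclusion", "Inclusion 1")}

def value_of(alias):
    """Dereference an alias to its stored data value."""
    return data[aliases[alias]]

if value_of("INC1") == "No":
    print('Raise Query "XXXXXX"')
```

The expression the study builder reads stays short and legible; the ugliness of the full qualification is confined to the alias definition.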

Wildcard Data References

It is common for the same eCRF pages to be dropped into multiple visits.  In this circumstance, the visit references that may exist in attached edit checks need to change.  The way this is usually achieved is through wildcarding.

If the above Inclusion / Exclusion check appeared in, say, Visit 1 and Visit 2 (hypothetically), then the Visit reference would need to be wildcarded to ensure it does not refer to Visit 1 when the form is dropped into Visit 2. The tool replaces the wildcard with the appropriate reference based on where the form appears. You can in fact have 3 types of reference - Absolute (as above), Relative or Any.

'Any' or 'Current' style references are often represented with a '*', a '$' or some other special symbol. This designates that the element is replaced at run time with the context in which the form instance actually appears.

Relative references are usually derived based on the original source of the edit check. So, if the edit check fired in Visit 2, and the relative reference stated -1 - or 'previous' - then this would resolve to the current visit minus one, i.e. Visit 1.
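The three reference types could be resolved along these lines (a sketch only; real tools do this at publish or run time, and the visit names are invented):

```python
def resolve_visit(reference, current_visit, visit_order):
    """Resolve a visit reference: absolute, 'any/current' (*), or relative (-n)."""
    if reference == "*":                       # current visit, wherever the form was dropped
        return current_visit
    if isinstance(reference, int):             # relative, e.g. -1 = previous visit
        return visit_order[visit_order.index(current_visit) + reference]
    return reference                           # absolute, e.g. "Visit 1"

visits = ["Visit 1", "Visit 2", "Visit 3"]
print(resolve_visit("*", "Visit 2", visits))        # Visit 2
print(resolve_visit(-1, "Visit 2", visits))         # Visit 1
print(resolve_visit("Visit 1", "Visit 3", visits))  # Visit 1
```

Note how the relative form depends entirely on `visit_order` - drop the same form into a study with a different visit structure and the resolution changes, which is the reuse hazard described below.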

Wildcarding causes some difficulties though when it comes to reuse.  Testing is only predictable when the logic is applied within a study.  If you take the form out of one study and drop it into another, it is possible that with a different visit structure, you may obtain different - potentially undesirable - results.


Data References do have an impact on a number of areas. Well designed data referencing ensures that the maximum amount of re-use can be achieved from study to study as well as within a study. Also, the readability of validation rules is important.  If the Protocol Designer or Data Manager cannot understand the rule that is presented from an EDC system, how can it be assured that the rule is correct?


Other Considerations - Actions

The boolean true/false result of an edit check expression is only one side of the story.  The other side is the action that takes place as the result of either a true or a false outcome.  System designs seem to fall evenly into one of two approaches.  In the first, the syntax allows one or more actions, one of which is to create a Discrepancy (or Query). In the second, the Discrepancy is the only, and therefore the assumed, action.   Clinical Data Management systems often went with the plain Logic --> Query approach, as the need for actions beyond Discrepancies was limited.

With most modern EDC systems, the edit check language provides the means to carry out one or more actions as the result of a boolean. Additional actions that might be supported include status changes, assigning values, or even activating or inactivating metadata elements such as forms and fields. Some systems separate query creation from other actions. The reason behind this is normally to help support protocol updates.  If a tool mixes queries in with other activities, it can be very difficult to deal with the situation where a new protocol definition needs to be applied to existing data.  For queries, it is easy - re-run the logic and if a query is created that was not previously created, add it.  Other actions are a bit more tricky - for example, if you had set up your system to send out emails when a Serious Adverse Event occurred, you wouldn't want the emails to be re-sent when you applied a protocol update.
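The idempotency point can be illustrated with a sketch: re-running the logic after a protocol update safely tops up the set of queries, while one-shot side effects (such as emails) must fire only for findings that are genuinely new (names and structures invented):

```python
existing_queries = {("S001", "INC1")}          # queries raised before the protocol update

def rerun_checks(violations, queries):
    """Re-apply the rules: query creation is idempotent, side effects are not."""
    one_shot_actions = []
    for key in sorted(violations):
        if key not in queries:
            queries.add(key)                   # new query - safe to add on a re-run
            one_shot_actions.append(key)       # only new findings trigger e.g. an email
    return one_shot_actions

violations = {("S001", "INC1"), ("S002", "INC1")}
print(rerun_checks(violations, existing_queries))  # [('S002', 'INC1')] - only S002 is new
```

Systems that keep query creation separate from other actions can re-run the query side freely and gate the side-effect side on novelty, exactly as above.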


Other Considerations - Batch Processing

This is an interesting one. Anyone that has sold EDC systems will have been asked this question:

Does your system support Batch execution of validation rules?

With very limited exceptions, the answer was always no.  I could argue that in some cases, due to pressure from sales, batch execution was added to the detriment of the EDC product.  EDC systems are designed along the principle that bad data is corrected immediately, by presenting errors as soon as possible to the data entry person.

The only argument for batch processing that I have seen applied for a positive reason is performance.  An EDC system that suffers poor performance may resort to batch execution to improve page response times. However, this is often unsatisfactory - CDM systems run across data using the efficiencies that single SQL Select statements can bring, while EDC systems often operate on a datapoint by datapoint basis with only limited cache optimisation possible. Batch running EDC edit checks can be tortuously slow. Presenting queries after the user has left the page is also sub-optimal.


The Future?


A gap exists right now in the development of standards (i.e. CDISC) where they pertain to rules applied to data.

Why do we need standards for Rules? I hear you say.  

Well, from an EDC study build perspective, the associated edit checks are often the largest single work effort in preparing a study. In fact, in comparison to preparing forms and folders, the edit checks together with testing can often be 3-4 times more work. So, when attempting to leverage standards such as ODM, the relative savings that can be achieved by automating the study build are limited.

The second reason behind the need for standards around rules is the potential knowledge associated with them.   Imagine you have access to a warehouse of clinical data.  In that warehouse you have 100's of instances of a particular set of data - let's say vital signs.  Can you use all the data? What if some of the data had restrictions that other data did not have?

Rules were originally applied to data in order to determine cleanliness within a study. These rules may also have determined whether the data reached the warehouse at all - inappropriate data may have been filtered out.   By taking away the rules in the warehouse, you take away a proportion of the context behind the data.  If you take away the rules - which form part of the metadata - can you really utilize the data in an unbiased way?    Maybe Statisticians will say this doesn't happen, or that the impact is negligible... I am happy to receive comments.
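A toy illustration of the bias point - two studies capture the same vital sign but applied different (invented) plausibility rules before loading, so the warehouse ends up holding different subsets of identical raw data:

```python
# Invented systolic BP plausibility limits applied by each study's edit checks.
study_rules = {"StudyA": (60, 250), "StudyB": (90, 180)}

# Identical raw data captured in both studies.
raw = {"StudyA": [70, 120, 200], "StudyB": [70, 120, 200]}

# Only values passing each study's own rules reach the warehouse.
warehouse = {
    study: [v for v in values if study_rules[study][0] <= v <= study_rules[study][1]]
    for study, values in raw.items()
}
print(warehouse)  # {'StudyA': [70, 120, 200], 'StudyB': [120]}
```

Without the rules stored alongside the data, a consumer of the warehouse has no way of knowing that StudyB's apparently narrower distribution is an artifact of its edit checks rather than of its subjects.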


As mentioned in a recent posting on eClinical Opinion, there are many input requirements for validation logic.  If you are thinking proprietary, then you will want a syntax that is as close as possible to the sort of language used in protocol definitions, while at the same time being as concise as necessary to assure re-use and avoid ambiguity.  The point and click builders will not go away - they can be useful.  At the same time though, for power users, you need high end editor features.  I believe the strongest vendors will create editors that are clinical trial business object aware. When building syntax, they will know, for instance, that a field may be qualified by a sequence no. or form.  They will understand that, given two dates, an Age can be derived.

Device independence may become significant once again. In the past, provided you ran in a browser, things were fine. However, who wants to enter patient diary data in a PDA browser?  The iPhone is a perfect example. iPhone Apps are not browser apps; they make use of the internet, but they leverage another UI. By taking the validation rules away from the front end, the actual device used to offer up the questions will not matter. The same rules apply regardless of the capture medium.