Sunday, February 14, 2010

Value of Batch validation?

One of the questions often asked of EDC systems is ‘Where is the batch validation?’  The question I would like to ask is: what is the value of batch validation versus online validation?

I should start by saying that I have a personal dislike of technology that works in a particular way because that is the way it has always worked, rather than because a pressing requirement exists for it to work that way today.

Performance – Batch validation generally dates back to the good old days of batch data processing.  With Clinical Data Management systems, where the act of entering data and the triggering of queries were not time critical, batch processing made sense.  The centralized Clinical Data Coordinators would double-enter the data rapidly; at an appropriate point in time the batch processing would be triggered, and the appropriate DCFs lined up for review and distribution.

For EDC, things are different.  It is all about Cleaner Data Faster, so not checking data immediately after entry creates an inherent delay.  No site personnel want to be hit with a Query/DCF hours or even days after data was keyed if it could have been highlighted to them when they originally entered the data, and presumably had the source data at hand.

A couple of CDM-based tools provide both online edit checking and offline batch validation.  The batch validation elements come from the legacy days of paper CDM, as described above.  The online checking is a subsequent add-on, created because of the difficulty of efficiently parameterizing and executing batch validation checks per subject eCRF.

Let's have a look at a couple of other differentiators.

1). Online edit checking tends to run within the same transaction scope as the page, so when a user sees the submitted page they are able to immediately see the results of the edit check execution.  This means the data submission and execution of all edit checks must complete in less than a couple of seconds in order to be sufficiently responsive.  With batch validation, running across data can be more efficient, and the user experience is not impacted by waiting for a page refresh.
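To make the transaction-scope point concrete, here is a minimal sketch of the online model: the page save and the edit checks commit together, so the raised queries are available for the very next screen render. The table layout, field names, and `run_edit_checks` rule are all hypothetical illustrations, not taken from any real EDC product.

```python
import sqlite3

def run_edit_checks(form_data):
    """Run all edit checks for the submitted eCRF page; return raised queries."""
    queries = []
    # Hypothetical range check: systolic blood pressure must be plausible.
    sbp = form_data.get("SYSBP")
    if sbp is not None and not (60 <= sbp <= 250):
        queries.append("SYSBP out of expected range (60-250 mmHg)")
    return queries

def submit_page(conn, subject_id, form_data):
    """Save the page and run edit checks inside one transaction, so the
    site user sees any queries immediately on the refreshed page."""
    with conn:  # single transaction: data save and query creation commit together
        for field, value in form_data.items():
            conn.execute(
                "INSERT INTO data_points (subject_id, field, value) VALUES (?, ?, ?)",
                (subject_id, field, value),
            )
        queries = run_edit_checks(form_data)
        for q in queries:
            conn.execute(
                "INSERT INTO queries (subject_id, text) VALUES (?, ?)",
                (subject_id, q),
            )
    return queries  # rendered straight back to the user

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_points (subject_id TEXT, field TEXT, value)")
conn.execute("CREATE TABLE queries (subject_id TEXT, text TEXT)")
print(submit_page(conn, "1001", {"SYSBP": 310}))
```

This is exactly why the whole check set has to finish within a couple of seconds: the user is waiting on that commit.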

I believe most leading EDC products have the performance aspects of real-time edit check execution cracked. Networks are faster, and computers are maybe 10 times faster than 4 years ago. I don't believe that performance is an issue in a modern EDC system with properly designed edit checks.

2). Scope – Batch validation is able to read all data within a subject regardless of visit.  In addition, some implementations are also capable of checking across subjects.  EDC systems with online validation also generally manage to read all data for a subject, but do not permit reading across subjects.
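The "all data within a subject" scope can be illustrated with a cross-visit check. The rule below compares the current visit against baseline; the field name, visit labels, and 10% threshold are illustrative assumptions, not a real study rule.

```python
def check_weight_change(subject_data, visit):
    """Flag a query if weight at the given visit differs from baseline by
    more than 10%. Requires read access to every visit for the subject,
    whichever mechanism (batch or online) evaluates the rule."""
    baseline = subject_data.get("BASELINE", {}).get("WEIGHT")
    current = subject_data.get(visit, {}).get("WEIGHT")
    if baseline is None or current is None:
        return None  # nothing to compare yet
    if abs(current - baseline) / baseline > 0.10:
        return f"WEIGHT at {visit} differs from baseline by more than 10%"
    return None

subject = {
    "BASELINE": {"WEIGHT": 80.0},
    "VISIT2": {"WEIGHT": 92.0},  # +15% versus baseline
}
print(check_weight_change(subject, "VISIT2"))
```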

3). Capabilities – Most EDC systems' edit checking mechanisms are application intelligent, rather than based on SQL, or on a syntax that interprets down to SQL, as with batch validation.  As a result, the syntaxes tend to be more business aware. If you are having to write code, whether SQL or another syntax, then you have a demand to validate that code in a similar fashion to the vendor's validation of the system itself.  Avoiding coding in favor of a configuration / point-and-click tool makes the testing considerably easier, with automation possible.
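The validation-burden difference can be sketched as follows: a hand-written SQL check is code that itself needs testing, whereas a point-and-click tool effectively emits declarative configuration that a single, already-validated rule engine interprets. The rule schema here is a hypothetical illustration, not any vendor's real format.

```python
# Hand-coded version: this SQL string is code you must validate per study.
sql_version = """
SELECT subject_id FROM vitals
WHERE heart_rate < 40 OR heart_rate > 180
"""

# Configuration version: pure data, interpreted by one tested engine.
config_version = {
    "field": "heart_rate",
    "check": "range",
    "min": 40,
    "max": 180,
    "query_text": "Heart rate out of range 40-180",
}

def evaluate(rule, record):
    """A tiny rule engine: validate this once, then every range rule is
    just configuration that can be tested automatically."""
    if rule["check"] == "range":
        value = record.get(rule["field"])
        if value is not None and not (rule["min"] <= value <= rule["max"]):
            return rule["query_text"]
    return None

print(evaluate(config_version, {"heart_rate": 200}))
```

The point is not that the SQL is hard to write, but that each SQL check multiplies the per-study validation effort, while configuration rules only require the engine to be validated once.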

4). Architectural Simplicity – If you were a software designer, and you saw a requirement to check data entered into a database, would you create one syntax or multiple syntaxes?  Even if you saw a need for offline batch validation, I think you would go with a single syntax.  If you have a means to balance where and when the rules run, whether at the client side, the application side, or the database layer, then that might be ideal.  Using 2 or even more syntaxes is something you would avoid.

5). Integration implications – Data that is imported into an EDC or CDM system should go through exactly the same rules regardless of the medium used to capture it: browser, PDA, lab, ECG, etc. This even applies if you are importing ODM data.  If this is not the case, then downstream data analysis needs to confirm that the validity of the data against the protocol was assured across all the devices.  Achieving this is difficult if you have separate batch and online edit checking.
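The single-rule-set principle amounts to one validation entry point, whatever the capture medium. A minimal sketch, with hypothetical rules and function names:

```python
# One study-wide rule set; each rule returns a query text or None.
RULES = [
    lambda rec: "Missing subject ID" if not rec.get("subject_id") else None,
    lambda rec: "TEMP out of range" if rec.get("TEMP", 37.0) > 42.0 else None,
]

def validate(record):
    """Apply every study rule to one record, whatever its source."""
    return [msg for rule in RULES if (msg := rule(record)) is not None]

def ingest(record, source):
    """Single entry point for browser entry, PDA, lab, ECG, ODM import, etc.
    Every medium funnels through the same validate() call."""
    return {"source": source, "record": record, "queries": validate(record)}

# Identical record, two different capture media, identical checking:
print(ingest({"subject_id": "1001", "TEMP": 43.5}, "browser")["queries"])
print(ingest({"subject_id": "1001", "TEMP": 43.5}, "lab_import")["queries"])
```

With separate batch and online syntaxes you would instead have to prove, rule by rule, that the two implementations agree, which is the downstream confirmation burden described above.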


On re-reading the details above, it sounds a bit like I am bashing systems that do batch validation.  That is probably slightly unfair.  I have worked with both EDC and CDM systems, and written checks for both. In the paper CDM world, the user interface for the batch execution of rules makes sense: you choose the appropriate point in time, and you can determine the scheduling and scope of DCFs.  So, for a pure paper environment, this meets requirements.

However, in an increasingly EDC world, I am not sure this has value.  It could be argued that it gives you the best of both worlds.  However, I think it is an unsatisfactory compromise that increases complexity when migrating to an EDC focus. It simply does not create a good, scalable solution, and users will be left wondering why things are so complex.

Thursday, February 11, 2010


Hurrah!  The FDA have made an announcement on their preference towards SDTM!  Well.  Sort of.  They met up with representatives from CDISC. The CDISC organization wrote down some notes on the discussion and posted them to their blog.

Ok – maybe I am being overly flippant. However, why does this message need to come out by proxy through CDISC?  Why can the FDA CDER/CBER not step off the fence and make a firm statement on what they want, and when they want it?

One point made was that applying CDASH is the key to attaining SDTM datasets.  Well.  Sort of.  It is a good starting point. But it is only a starting point.

The CDASH forms are very closely modeled on the structure of SDTM domains.  Do I always want to capture one domain on one eCRF form? Not always.  Do I sometimes want to capture information on the same eCRF that is logically grouped according to the source documents but belongs to multiple domains? Often I do.  We should not compromise user friendliness, and therefore compliance at the sites, because of a need to capture data according to the structure of the data extracts.

CDASH was developed around the principle that the EDC or CDM system models eCRFs to equal SDTM domains.  If your EDC or CDM system does not do that, then compliance with CDASH is not as valuable.

However – or rather HOWEVER – if you fail to apply naming conventions equivalent to CDASH/SDTM, fail to use the matching Controlled Terminology, and still expect to achieve SDTM, you will be severely disappointed. Achieving SDTM will not just be hard; it will be virtually impossible.
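A toy example of why the naming and terminology matter: when collected variable names and code values are standardized, producing SDTM rows is a mechanical lookup; when they are not, every mapping is guesswork. The variable map and the code values below are a small illustrative sample, not a complete or authoritative CDISC mapping.

```python
# Collected (CDASH-style) variable -> (SDTM domain, SDTM variable); illustrative subset.
VARIABLE_MAP = {"BRTHDTC": ("DM", "BRTHDTC"), "SEX": ("DM", "SEX")}

# Normalizing collected sex values to controlled-terminology submission values.
SEX_CT = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}

def to_sdtm(collected):
    """Map a collected record into (domain, variable, value) SDTM triples.
    Fails fast (KeyError) on any non-standard name or code value, which is
    exactly the disappointment described above, surfacing early."""
    rows = []
    for var, value in collected.items():
        domain, sdtm_var = VARIABLE_MAP[var]
        if sdtm_var == "SEX":
            value = SEX_CT[value.upper()]
        rows.append((domain, sdtm_var, value))
    return rows

print(to_sdtm({"SEX": "Female", "BRTHDTC": "1975-04-02"}))
```

With non-standard names (say, `GENDER` or `DOB`) the lookup has nothing to key on, and the "virtually impossible" mapping effort lands on a human instead.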

With regard to the statement that applying CDASH can create 70-90% savings: that is not the whole story.  Apply CDASH, standardize all of the other elements such as rules and visits, and automate testing and documentation, and yes, then you can achieve 70-90% savings.