Sunday, February 14, 2010

Value of Batch validation?

One of the questions often asked of EDC systems is ‘Where is the batch validation?’  The question I would like to ask is: what is the value of Batch Validation versus Online Validation?

I should start by saying that I have a personal dislike of technology that works in a particular way – because that is the way it has always worked – rather than because a pressing requirement exists to make it work the way it does today.

Performance – Batch Validation generally dates back to the good old days of batch data processing.  With Clinical Data Management systems, where entering data and triggering queries were not time critical, batch processing made sense.   The centralized Clinical Data Coordinators would double-enter the data rapidly, and at an appropriate point in time, the batch processing would be triggered, and the appropriate DCFs lined up for review and distribution.

For EDC – things are different.   It is all about Cleaner Data Faster. So not checking data immediately after entry creates an inherent delay.   No site personnel want to be hit with a Query/DCF hours or even days after data was keyed if it could have been highlighted to them when they originally entered the data – and, presumably, had the source data at hand.

A couple of CDM based tools provide both online edit checking and offline batch validation.   The Batch validation elements come from the legacy days of paper CDM as per above.  The online checking is a subsequent add-on, created due to the difficulty of efficiently parameterizing and executing Batch validation checks per Subject eCRF.

Let's have a look at a couple of other differentiators.

1). Online edit checking tends to run within the same transaction scope as the page – so, when a user sees the submitted page, they are able to immediately see the results of the edit check execution.   This means the data submission and edit check execution for all checks must complete within a couple of seconds in order to be sufficiently responsive.  With Batch Validation, running across data can be more efficient, and the user experience is not impacted by waiting for a page refresh.

I believe most leading EDC products have the performance aspects of real time edit check execution cracked. Networks are faster, computers are maybe 10 times faster than 4 years ago. I don’t believe that performance is an issue in a modern EDC system with properly designed edit checks.

2). Scope – Batch validation is able to read all data  within a subject regardless of visit. In addition, some are also capable of checking across patients.   EDC systems with online validation also generally manage to read all data for a subject, but do not permit reading across subjects. 

3). Capabilities – Most EDC systems' edit checking mechanisms are application intelligent, rather than based on SQL or a syntax that interprets down to SQL as with Batch Validation.   As a result, the syntaxes tend to be more business aware. If you have to write code, whether SQL or some other syntax, then you need to validate that code in a similar fashion to the vendor's validation of the system itself.  Avoiding coding in favor of a configuration / point and click tool makes the testing considerably easier, with automation possible.

4). Architectural Simplicity – If you were a software designer, and you saw a requirement to check data that is entered into a database, would you create one syntax, or multiple syntaxes?  Even if you saw a need for offline batch validation – I think you would go with a single syntax.  If you have a means to balance where and when the rules run – at the client side, application side, or database layer – then that might be ideal.  Using 2 or even more syntaxes would be something you would avoid.

5). Integration implications – Data that is imported into an EDC or CDM system should go through exactly the same rules regardless of the medium used to capture it - Browser, PDA, Lab, ECG etc. This even applies if you are importing ODM data.  If this is not the case, then downstream data analysis needs to confirm that the validity of the data against the protocol was assured across the devices.  Managing to achieve this if you have separate batch and online edit checking is difficult.
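To make points 4 and 5 a little more concrete, here is a minimal Python sketch of what "one syntax, many trigger points" could look like. It is only an illustration under stated assumptions, not how any particular EDC or CDM product works: the `EditCheck` class, the field names such as `visit1.weight_kg`, the thresholds, and the functions `on_page_submit` and `run_batch` are all hypothetical. The point is that the rules are defined once, and the same definitions are run either immediately when a page is submitted or later across every subject (for example after an import, or after a check has been added or changed).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data model: one flat record per subject, holding values
# from all visits, e.g. {"visit1.weight_kg": 70.0, "visit2.weight_kg": 72.0}.
Record = Dict[str, float]


@dataclass(frozen=True)
class EditCheck:
    """A single rule definition, independent of where or when it runs."""
    name: str
    message: str
    passes: Callable[[Record], bool]


def weight_in_range(record: Record) -> bool:
    # Every weight value, at any visit, must be plausible (illustrative limits).
    weights = [v for k, v in record.items() if k.endswith("weight_kg")]
    return all(30.0 <= w <= 250.0 for w in weights)


def weight_change_ok(record: Record) -> bool:
    # Cross-visit check: flag a change of more than 20% between visits 1 and 2.
    w1 = record.get("visit1.weight_kg")
    w2 = record.get("visit2.weight_kg")
    if w1 is None or w2 is None:
        return True  # not enough data entered yet, nothing to flag
    return abs(w2 - w1) <= 0.2 * w1


# The one and only rule set (assumed rules, for illustration).
CHECKS: List[EditCheck] = [
    EditCheck("WEIGHT_RANGE", "Weight outside 30-250 kg", weight_in_range),
    EditCheck("WEIGHT_CHANGE", "Weight changed by more than 20%", weight_change_ok),
]


def run_checks(record: Record) -> List[str]:
    """Return the names of all checks that fail for one subject's data."""
    return [check.name for check in CHECKS if not check.passes(record)]


def on_page_submit(record: Record) -> List[str]:
    """Online trigger: run the rules as part of saving the page."""
    return run_checks(record)


def run_batch(subjects: Dict[str, Record]) -> Dict[str, List[str]]:
    """Batch trigger: run the very same rules across every subject."""
    results: Dict[str, List[str]] = {}
    for subject_id, record in subjects.items():
        failed = run_checks(record)
        if failed:
            results[subject_id] = failed
    return results


if __name__ == "__main__":
    subjects = {
        "S001": {"visit1.weight_kg": 70.0, "visit2.weight_kg": 72.0},
        "S002": {"visit1.weight_kg": 70.0, "visit2.weight_kg": 95.0},
    }
    print(on_page_submit({"visit1.weight_kg": 20.0}))
    print(run_batch(subjects))
```

Because both triggers call the same `run_checks` function, data keyed in a browser and data loaded from a lab file or an ODM import would be judged by identical rules, and only one set of rules needs to be tested.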


On re-reading the details above, it sounds a bit like I am bashing systems that do Batch Validation.  That is probably slightly unfair.  I have worked with both EDC and CDM systems, and written checks for both. In the paper CDM world, the User Interface for the batch execution of rules makes sense. You choose the appropriate point in time, and you can determine the scheduling and scope of DCFs.  So – for a pure Paper environment, this meets requirements.

However, in an increasingly EDC world – I am not sure this has value.  It could be argued that having both gives you the best of both worlds.  However, I think it is an unsatisfactory compromise that increases complexity when migrating to focus on EDC. It simply does not create a good scalable solution.  Users will be left wondering why things are so complex.


eclinical said...

I'm in general agreement with you. There must be a way to execute immediate edit checks in batch for situations where you create a new edit check or want to modify an edit check and re-run it across all the data.

I think we sometimes have a situation where the data is collected and checked in EDC with online validations. The data is then exported as SAS and the SAS programmer creates all kinds of new validations (which by SAS's nature are batch). Some of those might be clever statistical sampling checks, difficult to write in an EDC system but I suspect most are not.

Maybe we can convince Data Management to trust the checks but we'll probably have to beat this habit out of the SAS programmers. They have SAS and they are going to use it!

e-Girl said...

I have a study where the primary site will be the 'data-entry hub' for all nearby smaller satellite sites. The problem is, the EDC system doesn't allow for double data entry, but the sponsor is insisting on this.
On one hand I understand, as this central site data entry person is entering from someone else's source notes, but on the other hand EDC has done away with DDE.
What is your experience with this situation?
Do you have any documentation I can use to convince the sponsor?

EDC Consultant said...

Hi eGirl,

Apologies for not replying earlier.

This may be too late, but here is my perspective based on past experience.

Data entry is often carried out by study nurses from Source documents. This is common. What is slightly less common is that these source documents are transferred to another site, and then keyed. I can see how this might make your sponsor nervous.

However, ultimately, the Investigator should be signing off on the eCRF - your chosen EDC system should support Investigator signature.

What I would suggest is that the system is configured to ensure each page is signed by the investigator in the EDC system. This will help assure that the values are correct.

I would very strongly advise against attempting to do Double Data Entry with an EDC system that does not support it. It cannot do it. It will be highly unsatisfactory for everyone involved.

Contact me privately at the email address at the foot of the blog page and I can provide further advice on how you might resolve issues here.