
Monday, December 22, 2008

eClinical in 2009?

So what will the eClinical landscape be like in 2009?  Here is a projection of what I believe will transpire in the eClinical business area over the course of the next year.  As ever, these are purely my opinions, and do not reflect the opinion of the company I work for...

We have seen significant changes over the past two years.  The number of vendors has shrunk, with Medidata and PhaseForward apparently swallowing up the vast majority of new business.

Oracle have been suggesting the availability of a new, improved product.  If they want to stand a chance against the big 2, then they are going to have to get the EDC portion right this time. Previous attempts were so far from the mark with regard to basic EDC functionality and ease of use that they bombed in the market.

Medidata's announcement of Developer Central is potentially more significant.  ClinPage describes this as an API. However, I think that is a simplification that underplays the significance of the release. Back in September, I wrote a blog on Web Services, and how they might impact eClinical.

A number of EDC/CDM vendors have claimed to offer APIs - FW-IMPACT and Oracle for example - but many were simply an exposure of a set of Stored Procedures. Techies developing eClinical systems have talked about a full web service solution for many years. Medidata appear to have taken their experiences with CDISC and actually delivered a fully operational solution - a fully CDISC-compliant Web Service for importing and exporting eClinical data.

Google were the first company to really generate buzz around the principle of an API over the Internet with the release of the Google Toolkit.  With this, any developer with an Internet connection could send a program request - using web service calls - to Google, and receive search responses that they could use for their own purposes.  If the Web Service eClinical API that Medidata has announced is all it's cracked up to be, then we should see a long queue of eClinical providers - eDiary, CTMS etc. - all lining up to be in a position to offer Medidata Rave real-time connectivity out of the box.
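To make the idea concrete, here is a sketch, in JavaScript, of the kind of CDISC ODM-style payload such a web service might exchange. The element names follow ODM conventions, but the study, visit and form OIDs are invented for illustration - this is not a description of Medidata's actual API.

```javascript
// Sketch: building a minimal CDISC ODM-style ClinicalData payload that a
// web-service import call might accept. OIDs ('ST001', 'VISIT1', 'DM',
// 'DM_IG') are hypothetical placeholders.
function buildOdmPayload(studyOid, subjectKey, items) {
  const itemXml = items
    .map(i => `<ItemData ItemOID="${i.oid}" Value="${i.value}"/>`)
    .join('');
  return (
    `<ODM><ClinicalData StudyOID="${studyOid}" MetaDataVersionOID="1">` +
    `<SubjectData SubjectKey="${subjectKey}">` +
    `<StudyEventData StudyEventOID="VISIT1">` +
    `<FormData FormOID="DM"><ItemGroupData ItemGroupOID="DM_IG">` +
    itemXml +
    `</ItemGroupData></FormData></StudyEventData>` +
    `</SubjectData></ClinicalData></ODM>`
  );
}

const payload = buildOdmPayload('ST001', 'SUBJ-042', [
  { oid: 'AGE', value: '34' },
  { oid: 'SEX', value: 'F' },
]);
```

A vendor offering "real-time connectivity out of the box" would essentially accept and emit documents of this shape over HTTP, instead of exposing stored procedures.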

On a technical note, it will be interesting to see whether the CDASH standards will be utilized as a means to standardize the metadata used to capture the data.  If this does occur, we could see a simple handshake occurring at the start-point between the two inter-connected systems to confirm 100% compliance with CDASH. If this is the case, then no manual metadata synchronization need occur. In reality, I suspect ALL CDASH based implementations will be customized, but it's a good start.
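A minimal sketch of what that start-point handshake could look like, assuming each system can publish its list of CDASH field names (the variable names below are illustrative, not the official CDASH list):

```javascript
// Sketch of the start-point handshake: each system describes its CDASH
// fields, and the connection proceeds only on 100% agreement.
function cdashHandshake(localFields, remoteFields) {
  const local = new Set(localFields);
  const remote = new Set(remoteFields);
  const missing = remoteFields.filter(f => !local.has(f));  // remote has, we lack
  const extra = localFields.filter(f => !remote.has(f));    // we have, remote lacks
  return { compliant: missing.length === 0 && extra.length === 0, missing, extra };
}

const ok = cdashHandshake(['BRTHDTC', 'SEX', 'RACE'], ['BRTHDTC', 'SEX', 'RACE']);
const bad = cdashHandshake(['BRTHDTC', 'SEX'], ['BRTHDTC', 'SEX', 'RACE']);
```

Anything short of 100% agreement would fall back to manual metadata synchronization, which is exactly what customized CDASH implementations would force.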

Back to 2009 forecasting...

The financial crisis will clearly impact the industry.  The smaller companies - biotechs specifically - will struggle to find cash.  If the development programs they are running rely on a constant cash flow, then they could go under.  This might provide rich pickings for the big companies looking to pick up a bargain, but potentially at the cost of new innovations.  For the lower end of the eClinical marketplace, things will be tough.  eClinical vendors that can demonstrate lowering costs - especially over paper - will see increasing business.

eDiary vendors will continue to see business grow provided they can show interoperability and can manage to keep infrastructure costs down.  We may see an increase in the prevalence of eDiary solutions based around off-the-shelf or pre-existing hardware - the iPhone, for example, appears to have the ease of use, connectivity and synchronization capabilities to make it more widely usable than any other OTS product today.

CTMS vendors will increasingly struggle.  The value of such solutions is increasingly marginalized by the enhancements offered by leading EDC system providers.  As in the eDiary space, if they can 'play nicely' with the other systems, they stand a chance.  Otherwise, they will be considered unnecessary in the overall eClinical Life Cycle.

Overall, I think we will see the more marginal areas of the eClinical vendor solution business impacted the most.  When finances are tight, the argument for these types of systems will be difficult.  Life Science companies will focus on cost saving and direct efficiency gain solutions.  This will be no time for long term technology prospecting.


For those following these blogs, I wish you well for the holiday period, and Best Wishes for 2009.

Wednesday, November 26, 2008

Return of the 4GL for eClinical - Part 2

In the first part of the series, I described how in the Technology business, Fourth Generation Languages provided a platform for the effective development of database driven application software.   With this instalment, I would like to examine how the principles of a 4GL might be applied in the design of an eClinical Application Development Tool. 

This part of the series of articles will focus on the requirements - in particular for handling rules, and briefly how these requirements might be met with a syntax.  It should be noted that many of the principles apply directly to EDC systems, however, the target application area is not EDC specifically, but rather a full eClinical platform.

Here are the bullet requirement items:

  • Triggers - Capable of defining rules that can be applied based on the occurrence of an event
  • Powerful - capable of describing all required rules that might arise in capturing and cleaning eClinical Data
  • Human readable syntax - it must act as a specification to a user as well as machine readable instructions when executed
  • Referencing - must provide a simple unambiguous mechanism to reference data
  • Repeatable - must be re-executable
  • Multi-Action - queries, yes, but what about other actions?
  • Testable - through a black box paradigm, built in self testing
  • Speed - must execute very quickly. <10ms per rule on a typical environment.
  • Business aware - must provide features that automate certain common business elements typical of eClinical


Firing mechanism

Not Winnie the Pooh's buddy, but a logical method of controlling the execution of rules in an EDC system is to base the triggering mechanism on the change of an input value.  For example, to check if a subject is between the ages of 18 and 65, simply set an input value of the field containing the 'Age'.  When the age changes, the rule executes.

In practice, this works for the majority of requirements, but not for all.  Sometimes, triggers are required based on the change of other study elements or attributes.  For example, it might be required that when a query is raised for a subject, a rule is triggered.  In this example, this is subject-level triggering rather than response-value triggering.

To achieve this, the syntax should support the attachment of rules to study objects.  For example a possible syntax for triggering might be;

on change of [subject|visit|repeat|field].[Object Id].[attribute]:

Each object would have a default attribute - this would be applied if not specified.   A sample reference might be;

on change of field.Age:


on change of subject.Status:

The default attribute of a field object might be its value.

Also, the 'on change' syntax would be assumed based on the reference values.  If you referenced the 'age' in a rule, then that would automatically become the trigger point in addition to any 'on change' attributes defined.
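A rough JavaScript sketch of the triggering mechanism described above, including default attribute resolution. The per-object defaults (value for fields, status for subjects, etc.) are assumptions for the example:

```javascript
// Sketch of 'on change of [object].[Object Id].[attribute]' triggering,
// with a default attribute applied when none is given.
const DEFAULT_ATTR = { field: 'value', subject: 'status', visit: 'status', repeat: 'index' };
const triggers = {};

function onChangeOf(ref, rule) {
  const parts = ref.split('.');            // e.g. 'field.Age' or 'subject.S1.Status'
  const [objType, objId] = parts;
  const attr = parts[2] || DEFAULT_ATTR[objType];
  const key = `${objType}.${objId}.${attr}`;
  (triggers[key] = triggers[key] || []).push(rule);
}

function fireChange(objType, objId, attr, newValue) {
  return (triggers[`${objType}.${objId}.${attr}`] || []).map(rule => rule(newValue));
}

// 'on change of field.Age:' - default attribute 'value' is assumed.
onChangeOf('field.Age', age => (age >= 18 && age <= 65) ? 'ok' : 'query');

const results = fireChange('field', 'Age', 'value', 70);   // out of range
const results2 = fireChange('field', 'Age', 'value', 30);  // in range
```

Referencing 'Age' inside a rule body would simply add another entry to the same registry behind the scenes.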

Should a rule trigger based on all values being in existence, or just some of the values?  With EDC applications in particular, a positive response is required - a page that is left blank cannot be assumed to represent no data - it could simply mean the data hasn't yet been entered.  Special consideration is required here.  Readers familiar with eCRF implementations will recognize the common header question such as 'Have any Adverse Events occurred?' This is often used to interpret missing subsequent values from entries left blank on purpose. From a triggering perspective, rather than looking at a value that may not have been entered - such as AE Description - the code would need to first check the AnyAE flag.


This is a more advanced concept, but important from the perspective of keeping the resulting tool simple.  Looking at the example above, more thought is required when developing rules where conditional questions are involved.  So - with a dynamic eCRF, how can we simplify things?

A feature I will refer to as 'pre-requisites', visible to the rules engine, would potentially solve this problem.  It is necessary to relate, at the metadata level, the conditions under which a particular value is required. In our example, if AnyAE = No, then we would not expect an AEDescription. However, if a rule looked at AEDescription in isolation, it may not work. In reality, the rule would need to check AEDescription AND AnyAE.  In fact, this would be the case for any check that cross-referenced the AE eCRF.

Now, let's imagine we could place a pre-requisite rule against the AEDescription field; AnyAE='Yes'.  That rule could serve two purposes: 1) it could control the availability of the field on the page, and 2) it could act as an additional criterion for any rule that referenced the field.  With a pre-requisite principle, the rule would simply check AEDescription.  The AND condition would be automatically added (behind the scenes) based on the attached pre-requisite rule.

The end result to the study developer would be that they wouldn't need to worry about whether or not a value should exist based on other criteria - this would be catered for by the pre-requisites.  The resulting syntax for rules would be easier.
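A sketch of how a rules engine might weave pre-requisites in behind the scenes. The API shape (prereqs, compileRule) is hypothetical; only the AnyAE/AEDescription example comes from the text:

```javascript
// Pre-requisites attached to fields; the engine AND-s them into any rule
// that references the field, so the study developer never writes the AND.
const prereqs = {
  AEDescription: data => data.AnyAE === 'Yes',
};

function compileRule(fieldsUsed, check) {
  return data => {
    // Fire only when every referenced field's pre-requisite is satisfied.
    for (const f of fieldsUsed) {
      const pre = prereqs[f];
      if (pre && !pre(data)) return 'not-applicable';
    }
    return check(data) ? 'pass' : 'query';
  };
}

// The study developer writes only the simple check on AEDescription:
const descRequired = compileRule(['AEDescription'],
  data => typeof data.AEDescription === 'string' && data.AEDescription.length > 0);

const a = descRequired({ AnyAE: 'No' });                      // pre-requisite fails
const b = descRequired({ AnyAE: 'Yes', AEDescription: '' });  // fires, raises query
const c = descRequired({ AnyAE: 'Yes', AEDescription: 'Headache' });
```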


Powerful

This requirement is at odds with some of the others.  How do you create a syntax that can cater for each and every rule situation, while at the same time being human readable and re-executable?  The answer is that you will never create the perfect syntax that does everything.  What you do need is a syntax that delivers on at least 99% of requirements.  The enhancements to the syntax, beyond handling regular expressions, need to be business aware.  Constructs for things that are common to eClinical systems must be available as standard.  By providing support for these, the need to go outside of the 4GL bounds should be reduced to a negligible degree.

Human Readable Syntax

eClinicalOpinion raised the question of syntax in a comment on Part 1 of the series.  I see syntax applying to all language components.  What I mean by that is that all application components should be representable as a syntax that can be manipulated as free-format text through a text editor OR from within an Application Development Environment.  The maintenance of the syntax, though, will depend on the activity being performed.  For example, it might prove easier to create a data collection form (i.e. CRF) through a point-and-click UI.  The end result, though, would be a human- and machine-readable syntax.

So - if the metadata is all described in the form of a syntax, does that mean that the preparation of the syntax cannot be table driven?  Not at all - a syntax would consist of 3 things: basic language constructs, references to data/metadata and references to application objects.  The preparation of the syntax would be through a table-driven approach.  Lists of metadata (fields, forms etc.) would be stored in tables, as would lists of application objects such as what can be done to a subject or a visit event.

Should the syntax be XML?  I don't think so.  XML might be one of the languages for representing the metadata - à la CDISC ODM - but the most effective syntax should be optimized for purpose. XML is not easily human readable.

A number of other factors govern syntax.  Writing a compiler or an interpreter is not that easy to do well. Also, the execution of the resulting interpreted or compiled code needs to be fast. If the code that is produced needs to be processed many times before it reaches executable machine code, then it takes longer to execute. 4GLs are processed through 4 iterations. There are answers. For ECMAScript or JavaScript, SpiderMonkey is an open-source JavaScript engine.  This can be embedded into an application and extended with high-level application-area constructs.  Other embeddable scripting tools are available.

By combining an open-source script engine with an object model that is eClinical aware, it is possible to create a syntax that has all the power and flexibility required, while at the same time keeping the complexity of an underlying (potentially super-normalized) database hidden.
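As a sketch of the embedding idea, the snippet below uses plain JavaScript's Function constructor as a stand-in for an embedded engine such as SpiderMonkey. The host exposes a single high-level construct - field() - so the rule author never sees the storage layer. The data store and subject key are invented for the example:

```javascript
// Stand-in for the super-normalized store: rule authors never touch it.
const db = {
  'SUBJ-1': { Age: 70, Sex: 'F' },
};

// The host builds an engine whose scripts see only the eClinical-aware
// object model (here, just the 'field' accessor).
function makeEngine(subjectKey) {
  const field = name => db[subjectKey][name];
  return ruleText => new Function('field', `return (${ruleText});`)(field);
}

const run = makeEngine('SUBJ-1');
const inRange = run("field('Age') >= 18 && field('Age') <= 65");
const sexIsF = run("field('Sex') === 'F'");
```

A real embedding would also sandbox the script and pre-compile it, but the shape - engine plus business-aware object model - is the same.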

Value Referencing

This is a key consideration in the definition of a rules syntax.  The inputs and outputs must be easy to reference (readable), unambiguous and re-usable.  Control of the inputs and outputs is important in order to assure stability of the rules once deployed.  A form of 'black box' approach allows simplified and potentially even automated testing.


Version and Change Management

One of the critical factors in providing an application development solution for EDC is the need to support Version and Change Management - this is a key factor in defining the scripting language scope. Clinical Trials are unusual in that the structure and rules associated with the data may change during the deployment. More importantly still, the data that may have already been entered may need to be re-applied to the revised structure and rules.  This can prove to be considerably challenging.  If a Study Builder is given total control to add fields, CRF Pages and Visits to an existing study, it would be virtually impossible to programmatically manage the mapping of data between an old version of a study definition and the new, and still guarantee data integrity.  What this means is that any environment must support metadata release management. In addition, to protect the integrity of regulatory compliance, once data is entered, it must be impossible to delete it even if a protocol change is applied.

Metadata Release Management

Two methods exist for the redeployment of a revised set of metadata against existing data.  You can either process only the changes, or you can re-apply the existing data to the new definition.  Both methods are workable, but the latter option can be slow, and requires an extensive and complete object-based audit trail containing the former data and actions that can be re-executed against the new metadata.  The former option - to process only the changes - requires that the released metadata is managed.  Once data is entered into a system against a set of metadata, the metadata must go into a 'managed' state where all subsequent changes to the metadata can only be re-deployed when they are compatible with the data that has previously been entered.

So - how does this all impact the syntax?

Well, if the syntax is entirely open as far as the actions it performs and the inputs that it receives, then it is very difficult to handle changes.  On the other hand, if the syntax is limited to operating in a 'black box' fashion - for example, comparing datapoints and returning a boolean followed by the raising of a Query - then the management of a change, or, specifically, the re-execution of the rule against existing data, is predictable.

Let's imagine a large study. It has 1000 rules associated with data.  The study is up and running, and thousands of patients have been created.  During the 4th visit, it is discovered that one of the rules defined needs to be changed. The many thousands of other data points, and rules executed, are fine, but the data points associated with this one rule need to be changed. The change may affect the previous outcome.  With a combination of managed metadata - where the system is aware of the rule that has changed - and the object-based audit trail, it is possible to limit the impact of the change to only the affected area of the study and the associated data.  This is achieved by only re-executing the actions relative to the changes.
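A toy illustration of limiting re-execution to the changed rule, assuming a managed rule id and a much-simplified object-based audit trail (the data structures are invented for the example):

```javascript
// Audit trail of past rule executions: which rule ran, for whom, on what inputs.
const auditTrail = [
  { ruleId: 'R001', subject: 'S1', inputs: { Age: 30 } },
  { ruleId: 'R002', subject: 'S1', inputs: { Weight: 80 } },
  { ruleId: 'R001', subject: 'S2', inputs: { Age: 70 } },
];

// Re-run only the executions belonging to the changed rule; everything
// else (R002 here) is left untouched.
function reexecute(changedRuleId, rules) {
  return auditTrail
    .filter(entry => entry.ruleId === changedRuleId)
    .map(entry => ({ subject: entry.subject, result: rules[changedRuleId](entry.inputs) }));
}

// R001 is revised: the upper age limit drops from 65 to 60.
const redone = reexecute('R001', {
  R001: inp => (inp.Age >= 18 && inp.Age <= 60) ? 'pass' : 'query',
});
```

Only the two R001 executions are replayed against the revised rule; S2's previously clean data point now raises a query.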

Some metadata will arrive from an unmanaged source - for example, as an import from an external tool. In this instance, all unmanaged metadata will be assumed to be 'unclean' and therefore changed.

Rule Actions

So, if a rule executes - by whatever mechanism - and the result of the execution demands an action, what should the action or actions be?   

Some EDC solutions are limited to raising Discrepancies or Queries. Even for the systems that support other actions, Queries, are by far the most common.  However, EDC systems are often differentiated by their ability to offer more advanced forms of actions.

Conditionally adding CRF Pages is one particular action that makes sense.  Changing the status of elements - such as a Subject's status - might also be useful.

However, one very specific consideration must be supported.  Any actions carried out here may potentially need to be rolled back, or, re-applied as the result of a Protocol update.  Each action that is offered must be fully compatible with the need for re-execution with no adverse results.
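One way to make actions safely re-executable is to model each action as an apply/rollback pair, sketched below. The action (conditionally adding a CRF page) comes from the text; the structures are illustrative:

```javascript
// Each rule action carries both its forward effect and its inverse, so a
// protocol update can roll it back or re-apply it with no adverse results.
function addPageAction(subject, pageId) {
  return {
    apply: () => { subject.pages.push(pageId); },
    rollback: () => { subject.pages = subject.pages.filter(p => p !== pageId); },
  };
}

const subject = { pages: ['DM'] };
const action = addPageAction(subject, 'AE');

action.apply();
const afterApply = [...subject.pages];     // page added by the rule
action.rollback();
const afterRollback = [...subject.pages];  // back to the original state
```

A query-raising action would pair 'raise' with 'close as obsolete' rather than deletion, preserving the audit trail.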


Testable

Many eClinical systems fall short when it comes to supporting the full requirements for implementations - in particular, support for testing.  In a strictly controlled, regulated environment, it is as important to prove that a configured eClinical system has been fully tested as it is to validate the underlying product. The product must be fully validated of course, but, over and above that, regulatory bodies are becoming increasingly aware of the need for configuration testing.

Good test support in an eClinical 4GL must be built in from the start.  Retrofitting it later is rarely achieved well.

To ensure a language is testable, the metadata objects - rules in many cases - need to be managed. The system must keep track of the elements that have been tested, and those that have not.


Speed

The underlying platform probably has a greater bearing on the performance of the 4GL than the syntax itself.  Also, with web-based systems, network latency is an issue.  A potential language needs to be capable of rapid execution. On a typical eCRF, it should take less than 50ms to turn around the submission of a page - network latency aside. Achieving this requires optimization at each step in the execution process.  Extracting data from a super-normalized database - one value from one record at a time - is an issue.  A means to address this is critical.  Avoiding slow interpretation of high-level 4GL code is also key.  If the code that is manipulated by the user can be pre-compiled into a more CPU-friendly form, then that will help.
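A sketch of the pre-compilation idea: rule text is compiled once into a native function and cached, so subsequent page submits reuse the compiled form rather than re-interpreting the source. JavaScript's Function constructor stands in for whatever compilation step the real platform would use:

```javascript
// Compile-once cache keyed by rule id; the source text is only parsed the
// first time the rule is seen.
const compiledCache = {};

function getCompiledRule(ruleId, ruleText) {
  if (!compiledCache[ruleId]) {
    compiledCache[ruleId] = new Function('data', `return (${ruleText});`);
  }
  return compiledCache[ruleId];
}

const rule = getCompiledRule('R001', 'data.Age >= 18 && data.Age <= 65');
const sameRule = getCompiledRule('R001', 'ignored - cache hit');  // not recompiled
const result = rule({ Age: 34 });
```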

Business Aware

Object Orientation

I must admit to being a person who has struggled to get my head around true - or pure - object orientation.  When people start talking about polymorphism, modularity etc., it all becomes rather cryptic to me. For study developers this could all be too much.

However, I do think that Object Oriented Programming, or OOP, does lend itself well to specific business problem modelling.  With eClinical, you have certain rules that can be built into objects.  For example, let's imagine we have created an object called a 'Visit'.  A visit can have an attribute 'Name' with values of 'Screening', 'Visit 1' etc.  It can also have another attribute 'Number' with values of '1.0', '2.0' etc.  Visits could belong to subjects.  Visits can have CRF Pages associated with them.  By defining these 'business objects' and 'object attributes' within the application tool, we can take away some of the complexity of handling relationships and actions from the study programmer.  Instead of having to create a SQL SELECT inner join between a Visit table and a CRF Form table, the relationship is pre-formed within the application business layer.
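The Visit example might be sketched as follows. The class and method names are invented, but the pre-formed subject/visit/page relationships mirror the text - no SQL join is ever written by the study programmer:

```javascript
// Business objects with pre-formed relationships: a Subject owns Visits,
// a Visit owns CRF pages.
class Visit {
  constructor(name, number) {
    this.Name = name;      // e.g. 'Screening'
    this.Number = number;  // e.g. 1.0
    this.pages = [];
  }
  addPage(pageId) { this.pages.push(pageId); return this; }
}

class Subject {
  constructor(key) { this.key = key; this.visits = []; }
  addVisit(visit) { this.visits.push(visit); return visit; }
  // The 'join' lives in the business layer, not in SQL.
  pagesForVisit(name) {
    const v = this.visits.find(x => x.Name === name);
    return v ? v.pages : [];
  }
}

const subj = new Subject('SUBJ-1');
subj.addVisit(new Visit('Screening', 1.0)).addPage('DM').addPage('IE');
const pages = subj.pagesForVisit('Screening');
```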

So - object orientation, yes, but only where the resulting user (study developer) experience when preparing studies could be described as 'simple'.


This is all very high level still, but the above does contain some concepts that may have value in the definition of a potential eClinical 4GL.  In the next posting of the series, we will most likely look at how technologies such as XForms might be supportive in providing an interactive user interface over a web front end to an eClinical 4GL.

Comments welcome!

Thursday, November 13, 2008


I have read with great interest an article posted on ClinPage - The Future of ODM, SDTM and CDISC.  These discussions relate primarily to the proposed requirement from the FDA for data submissions to be made in XML format rather than SAS Transport file format.  I don't think we will see many arguments around this point - XML is now the accepted extensible method of describing combined data and metadata.  What is more contentious is the request that data be provided in the HL7 v3 Message format.  FDA Docket No. FDA-2008-N-0428 from August 2008 elaborates on where the FDA are in the process.

In addition to the move to an HL7 Message format rather than SAS XPT, commentary exists on a suggestion that a move to ODM rather than SDTM would be considered.  This point is also put forward by Jozef Aerts of XML4Pharma.

I would like to comment on a comparison of SDTM versus ODM.

Operational Data Model

ODM was the first CDISC standard to successfully go through the authoring process.  It was aimed as a means to represent data in the context of data capture. Data was indexed to Visits and Forms. The syntax was designed to describe data not from an effective storage format, but from a source-to-destination perspective.  You could get data from System A, by Visit and Form, to System B, by Visit and Form.  This is great where the presentation of the data has importance and meaning.

Submission Data Tabulation Model

SDTM, unlike ODM, focuses on groupings of data - not by CRF Form, but by the use of the data.  All demographics information appears on the same record, for example.  The SDTM structure has now also become the basis for data delivery and storage within many organizations.  A number of large PharmaBio companies have based internal cross-company standards on SDTM.

Modelling from Data Captured

The format of data will differ depending on the medium used to capture the data.  Some form factors might have 30 questions on a form; others, such as Patient Diaries, might only have 1 or 2 questions per form. In addition, when designing a CRF for ease of use, it may not make sense to apply the content of each SDTM domain as the basis for deciding what does and does not go onto a single form. Whether the data appeared on one form or across many forms is not important when it comes to the value of the data.  Many EDC vendors have gone down the route of designing the database for data capture according to EAV rules - Entity-Attribute-Value form - where each value captured on any form is dropped into a single table. Once captured, data is then re-modelled into a relational structure that may or may not model the layout of the page. (XForms is a generic technology touted as a potential means of addressing this challenge - I will leave further discussion on this to a later article.)
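A minimal sketch of that EAV round trip: every captured value lands in one generic table (here, an array of rows), and a pivot re-models it into a relational, per-subject shape. The column and item names are illustrative:

```javascript
// EAV capture: one row per value, regardless of which form it came from.
const eavRows = [
  { subject: 'S1', form: 'DM', item: 'AGE', value: '34' },
  { subject: 'S1', form: 'DM', item: 'SEX', value: 'F' },
  { subject: 'S2', form: 'DM', item: 'AGE', value: '61' },
];

// Pivot the generic rows back into one relational record per subject.
function pivot(rows, form) {
  const out = {};
  for (const r of rows.filter(x => x.form === form)) {
    (out[r.subject] = out[r.subject] || {})[r.item] = r.value;
  }
  return out;
}

const dm = pivot(eavRows, 'DM');  // { S1: { AGE, SEX }, S2: { AGE } }
```

Note that the pivoted shape is driven by the grouping chosen (here the DM form), not by how the values were laid out at capture time - which is exactly why SDTM-style groupings can be produced from EAV data regardless of CRF design.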

Based on the above, it would seem logical that SDTM is of greater value when used as the method of delivery of data for submission or analysis than ODM.

However, that is not the only reason why SDTM makes sense over ODM when developing and executing eClinical studies.  The primary reason related to metadata re-use.

ODM is not a suitable format for modelling studies because it does not lend itself to ensuring that similar studies are able to effectively re-use metadata.  Sure - I can take a study, copy the metadata, and I have another study... easy... but what about changes?  What if I remove a few fields, add a few fields, change the visit structure?  That would of course change the format of the data outputs if ODM were the format - an issue, see above - but, more importantly, it would greatly impact any rules that might exist on the forms.  Rules that use some form of wildcarding mechanism may or may not work.  Anyway, this is not a posting on metadata architecture, so I will leave it at that.

Bringing together SDTM and HL7 v3

So back to SDTM and HL7.  Is this the right way to go?   I can understand the logic behind this.  Being able to bring EHR and Clinical Trial data together within a common standard could be very useful.  However, at what cost?  

I am not aware of any eClinical application that automatically creates SDTM-compliant data sets - regardless of transport layer.  The mapping of proprietary metadata to SDTM is quite involved, with varying degrees of software development required from the various system vendors.  Typically, either SAS macro transformations are used, or some form of ETL (Extract, Transform and Load) tool.  This is all complicated enough. Creating a tool that creates SDTM datasets in HL7 v3 is considerably more complicated. Even for large companies it will be a major development undertaking.  The complexity is such that smaller companies will simply fail to deliver the data in a cost-effective way.

Tools providers may step in - they may offer a means to convert a basic SDTM ASCII file, with additional information, into an SDTM HL7 v3 file. XML4Pharma, based on its recent critique of the approach, does not appear willing to jump into supporting this, but if this becomes a mandate, some companies will.

Playing on the other side of the argument - one of the principles of XML is that the data is also human readable.  In reality, once you add all of the 'overhead', especially with a complicated syntax such as HL7 v3, you end up with something that is only readable by technical gurus.  But then, maybe it shouldn't be people that interpret these files; maybe the complexity has got to the point where it only makes sense for a computer application to interpret the files and then present the appropriate information to the user.  Modern eClinical systems offer views on data. Maybe the presentation of the submission data should be managed in the same way - through an application that presents a view based on purpose.

Thursday, November 6, 2008

Cleaning the right data

We discussed recently the lack of significance given to end-point data in EDC systems today.  I would like to put forward a model for raising the significance of end-point information.

During a recent presentation by Paul Clarkson, Director of Clinical Data Management at Genentech, it was described, under the banner of Smart Clinical Trials, how a better focus is being placed on the definition of data that drives Primary, Secondary and Safety Objectives in Genentech studies.  Paul explained that the process he followed during the pilot of this approach was to simply create a spreadsheet of the events versus the procedures, and then drop the metadata that was due to be captured into categories of either Primary, Secondary, Safety or Indeterminate purpose data, through color coding.  Following this, the assignments were reviewed with appropriate personnel to agree the value, or otherwise, of capturing and cleaning the data.

Taking the above as a potentially valuable model - not only for identifying data that does not need to be captured, but also for identifying the relative significance of the data captured against the target end-points - I started thinking about how this might be effectively supported in an eClinical system.

The last end-point discussion posting highlighted a gap in the ability of eClinical systems to correctly prioritize the value behind different types of data.  For example, the cleaning of a verbatim comment entered onto a CRF form, unrelated to achieving any of the study end-points, has as much procedural significance as the coding of an Adverse Event term. It is all just data that must be cleaned with equal significance.

For adaptive clinical trials, and for achieving end-point objectives, data is not all of equal significance.  So, how do we support the definition and use of data of differing comparative values?  Let's look at how Genentech did it. They took the metadata - the questions - and then categorized them against one (or more) end-point objectives. From a study design perspective, without considerable effort, we could potentially place a category on the metadata during eCRF Form preparation.  Of course, the categorization in itself has limited value.  The eClinical system would need to do something with it.

Today, EDC systems often indicate through workflow and task lists who has to do what.  Currently, this is a blanket rule that does not consider the significance of types of data.  With the Smart model above, the view of the workflow and tasks could be adjusted to present activities that meet specific end-point objectives.  So, instead of presenting to a monitor or data manager all outstanding activities, why not provide a list that is ordered, or even filtered, by end-point categorization?  This would allow the cleaning activity to focus work on information that first and foremost achieves the primary, secondary and safety end points in as short a period of time as possible.  That is not to say that other cleaning activity will not occur - it will - just that the priorities will be presented appropriately based on the significance of data to achieving the objective of the study.
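The prioritized task list might be as simple as a sort over an end-point category. The category ranks and task records below are hypothetical:

```javascript
// Rank cleaning tasks by the end-point category of the data they touch.
// Primary first, indeterminate last (an assumed ordering).
const RANK = { primary: 0, secondary: 1, safety: 2, indeterminate: 3 };

function prioritize(tasks) {
  return [...tasks].sort((a, b) => RANK[a.category] - RANK[b.category]);
}

const ordered = prioritize([
  { id: 'T1', category: 'indeterminate' },
  { id: 'T2', category: 'primary' },
  { id: 'T3', category: 'safety' },
]);
```

Filtering rather than sorting - showing only primary end-point tasks during a push toward an interim analysis - would use the same category metadata.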

For Adaptive Clinical trials, a focus on end-point significance could be a differentiator in quickly achieving the statistically significant sample sizes required to drive dynamic randomization or decision making.

Tuesday, November 4, 2008

Running Rules in EDC - further commentary

Before returning with the 2nd part of the series - 4GLs for eClinical - I just wanted to discuss further the execution of rules in EDC.  Here is further commentary on the topic.

All EDC solutions offer a means to set rules that check data that has been entered.  Typically, these rules would compare one or more values against one or more other values and, based on either a true result or a false result, log a query.

The traditional CDM systems used to run what are called Consistency Checks on a batch basis (aka Batch Checks).  This was efficient in that the database would run across all data executing the rules in a sweep - typically once per rule. For a database, this was quite efficient. However, it was designed to run some time after the data was recorded. For centralized data entry, where the staff entering the data are not the staff responding to the queries, this was fine. Double Data Entry is often used to catch the data entry errors.

EDC systems work on a different model.  The personnel entering data into the system are typically at the site. It makes sense that once the data is entered into the EDC tool, that the rules run immediately giving the operator the opportunity to make corrections immediately. 

The question is, when should these rules run?

  1. As soon as the data has been entered and the user leaves the field?
  2. As soon as the data is submitted to the database?
  3. Later, as part of batch checking?

Many opinions are held on this topic.  Let me tackle the first, and easiest one.

Batch Checking

Option 3 - run later as part of batch checking.  I don't believe anyone feels that running all rules on a batch basis for data entered at site makes sense.  I have heard it argued that 'some' rules should run on a batch basis.  The arguments for this have been a) performance reasons and/or b) that all values are not available at the time the data is entered. I would respond, to a), that a system should not have performance problems that would mean 'any' check cannot run during data entry - EDC systems should run even complex checks in < 100ms.  As to b), this is a design issue.  Most rules engines fire when all values are available, or do not resolve to do anything if values are missing.  So - in my opinion at least - batch checking is largely superfluous.

Online Checking

Now - what about between the two online checking options?  At the field level, or on page submit?

Online Checking - Queries

If a user is recording data field by field, it can be distracting to see messages popping up repeatedly. This is partly a UI question. If the focus of the cursor is not adversely affected, this may be fine. Otherwise, it can be rather annoying. Some of you will be familiar with applications that 'steal' focus. You think you are correctly keying information into an application only to find that the focus has been grabbed by a popup! Very frustrating. So - provided the focus is not impacted, producing queries at least should be fine. But what about other activities? How about dynamic CRF pages?

Online Checking - Dynamics

It may make sense to insert a new field, or set of fields, based on a former response. On paper, the content is fixed - it will say something like 'If answered Y then proceed to xxxx'. With an electronic medium, we have an opportunity to adjust the questions asked based on former responses. I believe dynamic forms only cause real problems for 'Heads Down Data Entry' staff [1]. With online EDC, heads down data entry is less common. What is more typical is that the user reads the question and completes a response. If the next question changes, the impact is limited. A common example: in a demographics form, the subject is recorded as 'Female', and a dynamic adds a question such as 'Is subject of child-bearing potential?'
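The demographics example above can be sketched as a tiny dynamic-form rule. The question text and field codes are hypothetical:

```python
# Hedged sketch of a dynamic-form rule: a follow-up question is inserted
# based on a former response. Field codes and question text are hypothetical.

def dynamic_questions(responses):
    """Return extra (code, question) pairs to insert, given responses so far."""
    extra = []
    if responses.get("SEX") == "F":
        extra.append(("CHILDBEARING", "Is subject of child-bearing potential?"))
    return extra

print(dynamic_questions({"SEX": "F"}))  # adds the child-bearing question
print(dynamic_questions({"SEX": "M"}))  # no dynamic questions
```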

From a technical perspective, with web applications it is somewhat easier to handle a full page submit.  On an HTML based form, the actual data entry operates in the same fashion as an old paged style terminal (for those of us old enough to remember them!). The communication between the client (web browser) and the server only occurs when the user hits the save or submit button. 

Web 2.0 / Ajax

Web 2.0 technologies mean lots of things. One feature typical of these new applications, though, is that they are more interactive than traditional basic HTML paged apps. Ajax is a method now commonly used to create an active response to data entry - an early example was Google's Search-as-you-Type (SayT). The technology provides the opportunity to capture data as it is entered and carry out an action immediately as a result - for Google, that was to perform a search and present the results based on the term entered so far. For EDC, this may result in some form of page dynamic such as the adding of a question, or block of questions. From a browser independence perspective, Ajax doesn't tend to cause problems, as code is available for virtually all browsers and the majority of the work is completed on the server side.

So - with web2.0 Ajax technologies, what else can we do with online rules execution?  

Well, we can take all the values entered into a CRF Page, compare the values with other values entered on other pages, and execute any action that is suitable for execution prior to data submit.  From eclinical_revolutn's comment, some vendors such as PhaseForward are already doing this.

We could go as far as submitting the data as the page is completed - as the user leaves field 1 and goes to field 2, Ajax is used to submit the value for field 1. The argument against this approach is that users must make a positive statement to submit data. I don't concur. In my mind, the positive statement is that the user has tabbed or cursor'ed out of the field. The argument for the save-as-you-go approach is that if a connection is lost, at least the data entered up to that point in time is saved. It is a training thing: if users are trained that data is saved when they leave a field, then by leaving the field they are confirming the save. A further argument against the save-as-you-go approach is that users are used to simply closing a browser, with data entered but not submitted being cancelled. Again - training, and the removal of a Save or Submit button. There are some challenges though: if the user completes information to the last field and then closes the browser, should the last field value be saved?...

So - are EDC Vendors currently looking at new ways to interact with users using Web 2.0 technologies? I think so. Will we see user interfaces that match the interactivity offered by a thick client style rich UI? Yes, but not until around 2010.

[1] Heads down data entry - an odd term used to describe typically rapid keyboard data entry where the user does not look at the screen while keying - for example, a data entry clerk might be reading a paper CRF and entering the data into a CDM system.

Thursday, October 30, 2008

EDC Rules - When should they run?

All EDC systems have some form of rules facility. The rules are typically designed to check data. If the check fails, then typically a query is produced.

Web based EDC systems typically run edit checks when a page is submitted.

Prior to the release of web based systems, it was typical to check the data as soon as it was entered - between fields.

With Web 2.0 technologies, it may prove possible to routinely run checks as soon as data is entered - prior to page submit.

So - a question - which approach is best?

Monday, October 27, 2008

Why eClinical fails to deliver significant ROI


I stumbled across an article posted in ClinPage back in May 2008 that reported on a presentation given by Ron Waife. I cannot say I always agree with Ron's assessments; however, I believe he is 100% accurate with his analysis on this occasion. Steve Woody also made an interesting point regarding a potential solution to the problem.

The crux of Ron's position is that Sponsor companies are fundamentally failing in taking advantage of eClinical technologies, primarily due to a failure to embrace new processes, and to break down silo based working models.

Ron makes a sensible suggestion regarding a potential mode that will work - a skunk work approach - that I fully share.

If there are any Pharma execs out there with the power to make change happen - they would do very well to listen to Ron's advice.

An interpretation of the proposal - purposely greatly simplified! - is as follows:

  1. Take an adaptive friendly drug program...
  2. Create a skunk work team comprising a small number of open-minded individuals from each existing department - Protocol Writer, (e)Study Builder, Statistician, Safety Manager, Clinical Lead etc.
  3. Put them in a 'virtual' room, and ask them to work tightly together.
  4. The team must work on an 'Agile' style development approach - [ I will expand on this in a later post ]
  5. The program / studies will be adaptive - the data will be available early and the decisions made rapidly.
  6. The Statistician - playing an active, leading role throughout the program - will model the original program, assess the ongoing (daily) execution against the model and adapt accordingly. 
  7. A leader of this team should be measured based on the effectiveness of the Program - positive or negative - against a plan.

Sometimes, I think we are too focused on shaving a few days off the time to DB Lock. With an agile adaptive approach - could we not be thinking in terms of months and even years of savings?

Steve's suggestion was that a focus on a business model approach might focus the minds of the sponsor companies. His statement regarding the CRO industry:

... which was created and is sustained by the inefficiency of clinical research, is hooked on the heroin (money).

may come across as rather strong, but I believe there is a degree of truth here. CROs are often the most conservative when it comes to change... 'let's do whatever the client that pays the money wants...' even if it is not necessarily good for them...

However, and this is a big 'however'... CRO companies do act on a conservative basis due to a need to provide low-risk solutions. How many sponsor organizations want to hear about a new 'high risk' implementation method that will be applied to the trial they are responsible for? So - I don't think the blame is entirely merited.

Moving off topic now, so I will close this post... I am interested in hearing comments...

Friday, October 24, 2008

Should eClinical systems be 'EndPoint' aware


EDC Wizard made some interesting points in response to the earlier posting 'EDC EndPoints'.

The original posting was probably incorrectly titled. It should maybe have said - "Should eClinical systems be EndPoint aware?"

I tend to stay away from the term EDC when I can. I think this term does not really apply now to some of the leading 'EDC' vendors. I think they are still labelled as EDC as customers expect to purchase an EDC solution. However, today, they are more 'Clinical Trial Support Suites'.  Vendors are adding more and more upstream and downstream functionality.  In doing so, some are clueing up to the fact that the 'bit in the middle' - the data capture and cleaning part - may benefit from early involvement from other parties traditionally left out of the mix.

SDV'ing is an interesting point. EDC Wizard states that

Many sponsors are implementing reduced SDV plans that take a risk-based approach to comparing source data to EDC entries

The activity list for Monitors will increasingly be led by the eClinical system tools. They track what has, and what has not, been SDV'd. With a percentage-based plan, the eClinical tool needs a model for distributing that percentage appropriately. I am not aware of a tool that has successfully implemented this. Another challenge exists regarding the classification of data that is eSource and that which is not.
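One hypothetical way a tool might apply such a percentage is to select fields deterministically, by hashing their identifiers, so the same subset is chosen on every run (reproducible, and therefore auditable). This is a sketch of the idea, not any vendor's actual method:

```python
import hashlib

# Hypothetical risk-based SDV sampler: hash each field identifier into a
# bucket and take the fields whose bucket falls under the target percentage.
# Deterministic, so the selection is stable across runs (auditable).

def selected_for_sdv(field_id, percent):
    """Return True if this field falls inside the SDV sample."""
    digest = hashlib.sha256(field_id.encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # 0..65535, effectively uniform
    return bucket < percent / 100 * 65536

fields = [f"SUBJ001.VISIT1.VITALS.FIELD{i}" for i in range(200)]
sample = [f for f in fields if selected_for_sdv(f, 20)]
print(f"{len(sample)} of {len(fields)} fields selected for SDV")
```

A real implementation would also need to handle the eSource/non-eSource classification the post mentions, which this sketch ignores.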

What has, and what has not been SDV'd should not be shown to the Investigator by the tool. I believe most tools support differing views based on user roles. This functionality should be applied.

EDC Wizard goes on to say;

I am not sure I would recommend that EDC systems be modified to flag data as primary, secondary, SDV, or non-SDV. It's hard enough to move from protocol to EDC database to study start without adding more complications to database builds.

A very valid point - tools are becoming increasingly complex. 'Keep it Simple' is certainly a solid principle to hold where possible. However, with the current model of blanket significance / locking --> data delivery, I think we are missing an opportunity for early decision making. If the move towards define once, use many times continues to be applied with eClinical systems, then complexity may reduce rather than increase - define the endpoint criteria up-front in one place, and have this information taken downstream into EDC and on to data delivery.

Thursday, October 23, 2008

EDC Endpoints

An endpoint is defined as an event or outcome that can be measured objectively to determine whether the intervention being studied is beneficial.

EDC systems often ignore the importance of the definition of an EndPoint. As far as an EDC system is concerned, all data is effectively considered equally significant. [Possibly correspondents from Medidata and/or PhaseForward can correct me on how Rave and/or InForm, respectively, handle this.]

Let's say that in a sample clinical trial you have 100 pages of information captured for a subject, and 10 questions per page. That is a total of 1000 data values that potentially have to be captured. The capture and cleaning process typically involves entry, review, SDV and freeze/lock. The time to perform this for a key data value is the same as the time for an item that has limited significance.

EDC systems typically use a hierarchical tree structure for status handling. Every data value is associated with a status. A Page status is reflective of the status of all the data values on the page; the Visit status is reflective of all the CRF Pages in the visit, etc. However, this does place a common blanket significance on all data that is captured.

It could be argued that all data is of equivalent significance in the execution of a study - the protocol stated a requirement to capture the data for some reason. However, I believe the subset of captured information that actually carries endpoint significance can be defined at the outset. The question is - going back to our example with 1000 data values per subject - is it possible to make an early assessment of data, based on a statistically safe error threshold, rather than wait until all subjects, all visits, all pages and all data values are locked?

For example, let us consider efficacy, and in particular efficacy in a Phase II Dose Escalation study. Information on the dosing of a subject, followed by the resulting measurements of effectiveness, may occur relatively early in the overall duration of a trial. However, a blanket 'clean versus not clean' rule means that none of the data can be examined until either ALL the data achieves a full DB lock, or an Interim DB Lock (all visits up to a defined point) is achieved.

So - a question to the readers - is it possible to make assessments on data even if a portion of the data is either missing, or unverified?

One potential solution might be a sub-classification of data (or rather metadata).

When defining fields, a classification could be assigned that identifies a recorded value as 'end-point' significant. The actual number of potential endpoints could be list based and defined at a system level. One Primary end-point would be supported, with as many secondary end-points as necessary. A value might be classified against 1 or more endpoint classifications.

The key to the value of this would be on the cleaning and data delivery.  Rather than determining a tree status based on all data values captured, the tree status would be an accumulation of the data values that fell within the endpoint classification.
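A minimal sketch of that accumulation follows. The field names, statuses and the endpoint classification are all hypothetical; the point is that the roll-up considers only endpoint-classified values:

```python
# Sketch of a status roll-up restricted to endpoint-classified values.
# Field names, statuses and the classification itself are hypothetical.

def endpoint_clean(data_values, endpoint_fields):
    """True when every endpoint-significant value is captured and clean."""
    return all(
        data_values.get(f, {}).get("status") == "clean"
        for f in endpoint_fields
    )

values = {
    "DOSE":   {"status": "clean"},
    "RESP":   {"status": "clean"},
    "HEIGHT": {"status": "entered"},  # not endpoint-significant
}
efficacy_endpoint = ["DOSE", "RESP"]

# Usable for analysis even though non-endpoint data is still unclean:
print(endpoint_clean(values, efficacy_endpoint))  # True
```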

So - with our example, let's say that of the 1000 data values captured per subject, only 150 might be considered of endpoint significance for efficacy. [Mock Endpoint Significance chart] Once all of those data values are captured and designated as 'clean', the data would be usable for immediate statistical analysis. Of course other secondary end-points may exist that will demand longer term analysis of the subject data - for example follow-ups.

The chart models that, with a typical data capture / cleaning cycle plus ongoing analysis of end-point significant data, statistically significant efficacy is determined at 3 months rather than 5.

The potential value that can be gained by making early decisions is well proven - Adaptive Clinical Trials often rely on the principle. By delivering data of a statistically safe state of cleanliness earlier, we could potentially greatly accelerate the overall development process.

Friday, October 3, 2008

Paul Bleicher departs PhaseForward

I was interested to hear that Paul Bleicher has stepped down from the Chair of PF to focus on a new venture in Healthcare Informatics. I wonder if this is in any way related to the (potentially scurrilous) gossip that they were looking at procuring ClickFind from Datatrak.

Probably not, but the timing is interesting - just when senior management must be looking at the core technology to determine whether it has what it takes to go after the SaaS or PaaS market.

Dr Bleicher's departure means that the originals at PF - Richard Dale, Paul Bleicher, Jeff Klofft and Gilbert Benghiat - have all now moved on. Richard and Jeff in my view were the original technical visionaries, supported by some good initial developers led by Gil. Bleicher gave the company its initial credibility with his CRO and Medical background - in some ways he was the 'expert' user, and clearly has a good head for entrepreneurial business.

Anyway - good luck to Dr Bleicher!

Interesting times...

Thursday, October 2, 2008

Return of the 4GL for eClinical? - Part 1


In the 1980's, the thing of the day as far as Database Application development was concerned was the RAD 4GL - Rapid Application Development with Fourth Generation Languages. They were popular because they tackled the problem of slow software development with complex generic tools, offering high level constructs for developing Database applications. If you wanted to draw pretty pictures - sorry. If you wanted to control real-time machinery - sorry. But if you wanted to write a Database Application - yes, they worked very well.

In the last couple of years, two particular technologies have been popular with developers - Ruby on Rails and, more recently, Django. These are based on 3rd generation languages - Ruby and Python respectively - extended through the development of a standard framework. The frameworks were developed to support database-driven website applications. These are, in a way, the 4GLs for the 21st century.

I was one of these early 4GL Developers for a number of years. In my young exuberant days, I used to boast that I could write a full multi-user Stock Control System from scratch in 3 days. [The truth was that, due to a failed backup, I did actually have to write (again) a full Stock Control System for a client in 3 days!]

One particular 4GL Tool that I was particularly proficient at produced database tables, menus, forms, event driven code, database triggers, reports etc.  I suppose looking back it was a bit like Oracle Forms, but without the nasty complex parts, or the heavy weight toolset.

One of the attributes of the tool was that it provided a programming syntax that was sufficiently business aware to make it relevant for business functions, while at the same time sufficiently flexible to be capable of developing complex database applications. It was the closest syntax I have seen to natural language - the sort of syntax that the developers of SQL and PL/SQL might have produced if they had started again in the mid 80's. The language was sufficient for even the most complex Database applications without having to resort to a 3rd Generation language such as C or Fortran. [Oh dear - I am sounding a bit like an IBM OS/2 user, bitter about Microsoft winning through with Windows!]

Anyway, I am getting off the point.

In thinking about eClinical technologies, and in particular EDC tools, I have wondered why a company has not created a 4th Generation Trial Development Tool that offers similar generic features for database, forms and rules authoring, while embedding standard features such as audit trailing, flag setting, security and web enablement. At this point, I am sure some readers will be saying - oh, but such tools do exist. Well, yes, you do have 'Study Building' tools, but they are very specific. A general language is not provided that can be used across the tool set.

Oracle Corp, eResearch Technology and Domain went down similar routes with Oracle Clinical, eDM (DLB Recorder) and ClinTrial, attempting to leverage existing tools - Oracle Forms for the first two, PowerBuilder for ClinTrial. However, these tools were not really designed for eClinical specifically. You ended up using a high level language to dynamically create high level syntax - for example Dynamic SQL. This became very complicated, proprietary and often slow. The normalization of the Oracle Clinical Database is an example of where the natural attributes of the Oracle RDBMS and the Forms tools just weren't sufficiently flexible to handle fully dynamic data structures.

Why might an eClinical 4GL make sense today?

Two principles of a 4GL were High abstraction and Greater statement power.

Abstraction in that you could create data capture forms and reports that were sufficiently abstracted from the database to ensure the user did not need to understand the underlying data structure in order to effectively use the application.

Greater Statement power allowed a small amount of readable code to do a large amount of work.

Both of the above attributes are relevant to the world of eClinical. 

The challenge when designing a good EDC tool is to provide a framework that is as friendly as possible, while at the same time providing sufficient flexibility to perform all of the functions that might be required. Vendors have achieved this by going down one of two routes: either a data-driven approach where the syntax for rules is built up from menus (i.e. lists of Visits, Forms etc), or a free-form syntax route using something like VBScript. Both approaches fail to a degree.

A purely data tables driven approach is very limited in the constructs that can be built up.  Often, tools have had to fall back to lower level approaches in order to fill the gaps.  Also, because the syntax is effectively built from parameters that are fed into routines within the application tool, the performance can be poor. Optimization is very difficult to achieve.

A free form syntax route also causes problems.  You need to test the validity of the script in a similar fashion to the testing of the actual core product.   The more flexibility - the more room for unexpected actions or results in the deployed system.
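The contrast between the two routes can be sketched as follows. Both rules are hypothetical, and the tiny interpreter stands in for the routines a real tool would feed its parameters into:

```python
# Hypothetical contrast of the two routes. The data-driven rule is a
# parameter record interpreted by the tool; the free-form rule is script.

# Route 1: data-driven - built up from menus, stored as parameters.
data_driven_rule = {
    "left": ("VISIT1", "VITALS", "DIABP"),
    "op": "<",
    "right": ("VISIT1", "VITALS", "SYSBP"),
    "on_false": "log_query",
}

# Route 2: free-form script - flexible, but must itself be validated
# just like the core product.
free_form_rule = "if DIABP >= SYSBP: log_query('DBP should be below SBP')"

OPS = {"<": lambda a, b: a < b}  # the tool only knows the operators it ships

def run_data_driven(rule, get_value):
    """Interpret the parameter record - limited to the constructs above."""
    ok = OPS[rule["op"]](get_value(*rule["left"]), get_value(*rule["right"]))
    return None if ok else rule["on_false"]

lookup = {("VISIT1", "VITALS", "DIABP"): 130,
          ("VISIT1", "VITALS", "SYSBP"): 120}
print(run_data_driven(data_driven_rule, lambda *k: lookup[k]))  # "log_query"
```

The limitation of route 1 is visible in `OPS`: anything not in that table forces a fall-back to a lower-level mechanism, which is exactly the gap-filling problem described above.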

So - what is the answer?

Could a hybrid - in this context, a 4GL hybrid syntax that runs within a 4GL application framework - be the solution?

  • Should the hybrid syntax be based on a pre-existing language such as ECMAScript, Ruby, Python or some other?
  • Should the database interaction be transparently built into the Language (ala MUMPS)
  • Should datatyping be strict or loose?...   [ what is datatyping anyway? ]
  • MVC - what is it, and is it relevant?

I plan on answering these questions in a future posting.

Sunday, September 28, 2008

Web Services in eClinical

Web Services is one of those technical terms that many folk have heard of, some people understand, and very few people can actually use. The definition of a Web Service from a technical perspective - courtesy of Wikipedia - is "a software system designed to support interoperable machine-to-machine interaction over a network."

From an eClinical perspective, Web Services allow disparate eClinical systems to communicate on a (near) real-time basis.

I believe that Web Services will help resolve many of the integration issues that eClinical systems suffer from today. You can procure 2 great systems, but if they don't speak properly together a lot of business value is lost. Combining CDISC with Web Services may well be a solution to many problems encountered.

Web Services - the Basics

Technologies similar to web services have been around for many, many years. For example, when you visit an autoteller and put your Visa card in the slot to withdraw cash, a Web Service 'type' of communication goes on between the bank you communicate with and the Credit Card company actually releasing the funds. What they actually say when such communications occur will of course differ depending on the application, but with Web Services, the way they say it is standardized.

Web Services have evolved into many different things, but the underlying principles remain the same. Generally, they communicate using XML (Extensible Markup Language) based text over a protocol called SOAP.

Many folk will be familiar with XML already - CDISC ODM is built around XML as a means to give meaning to clinical data that is transferred. SOAP, though, may be a new term. SOAP, put simply, provides a means to transfer XML messages - typically over the Internet - from system to system over firewall friendly channels (or ports).

When you open up a browser and enter an address such as http://www.google.com, what you are actually doing is asking to communicate with Google on Internet port 80 - the http prefix implies port 80. You might also see https. The 's' part signifies 'secure' and indicates the use of port 443 (known as SSL). Many corporate and site networks place restrictions on the ports that are open to the Internet. Ports 80 and 443 are among the few ports almost always open, and therefore usable for communication. SOAP can use both these ports. Therefore, web services running on SOAP can speak between systems, avoiding firewall conflicts. This means that if you want System A to speak to System B via Web Services, all you need to do is ensure that an Internet link is available, and you're off and running.

CDISC & eClinical before Web Services

So, what about Web Services, CDISC and eClinical.  Why should I care?

Well, traditionally, eClinical systems have been relatively 'dumb' when it has come to communicating.  An IVR system would be used to capture the recruitment, or randomization of a subject.  The IVR would then send a Text file via old fashioned FTP file transfer to an EDC system, and the EDC system would - at some time in the future - process the text file - creating a new subject, or recording the randomization in the EDC system tables. Sounds ok... but.. what if things go wrong?

With this model, the EDC and IVR systems don't really speak to each other. The IVR system sends something - yes, but if the EDC system doesn't like it, then oops! The IVR will keep sending things regardless. That is one issue. The second issue is that because the two systems don't actively communicate, they cannot cross-check (or handshake) with each other. Imagine if the EDC system held information that the IVR did not. Let's say, for instance, that the investigator recorded in the EDC system that the subject had dropped out. If the investigator later used the IVR to randomize this same patient, the IVR could have checked with the EDC system that the subject was valid and current. Maybe not a perfect example, but the capability exists.

Web Services provides the mechanism for system A to speak with system B.  CDISC ODM provides the syntax with which to communicate.  When both systems make reference to a 'FORM', both systems know what is meant.

Web Services - eClinical  - so...

In traditional systems design, you had a decision to make when you developed new modules of software as part of a suite of applications. Do I store database information in the same place, sharing a common database, or do I store it in a separate database and communicate / synchronize between the two systems? If you stored everything in the same database, you simplified the table structure and didn't need to worry about data replication, but the systems were tied together. If you separated the databases, then of course you had duplicate data between the databases, and you had to replicate. This replication was complicated and problematic.

OK, now let's imagine that the systems come from different vendors. Of course each vendor wants to sell their own system independently - a separate database is mandatory. They hold common information... no problem, we write interfaces.

Complicated software is designed to examine information that is common between systems and transfer it by batch transfer. So, for example, we have a list of Sites in System A - we also have a list of Sites in System B. We have a list of site personnel in System A, we also have a list of site personnel in System B - no problem, I hear you say. Let's imagine that System A doesn't fully audit trail the information that has changed on the Sites tables. How would System B know what to take? We need to transfer all the sites, and compare the site information with the previous site information... getting tricky... and this is just a simple list of sites.

Now, let's imagine a more complicated situation, common in an eClinical system: a Protocol amendment occurs, and a new arm has been added to a study whereby subjects meeting particular criteria are branched into two separate dosing schemes.

Transferring or synchronizing this sort of information between 2 systems would be possible, but very, very difficult. System A may not have a good place to put the information from System B. The question is, though - do both systems really need the same data? If System B wants to know something, why doesn't it just ask System A at the time it needs the answer, instead of storing all the same data itself?

This is where Web Services can come in.

Let's imagine an IVR system wanted to check with an EDC system whether a subject was current in a study (current meaning not dropped out, early terminated or a screen failure). A Web Service could be offered by the EDC system to respond with a 'True' or 'False' to a call 'IS_SUBJECT_CURRENT'. Of course hand-shaking would need to occur beforehand for security and so on, but following this, the IVR system would simply need to make the call, provide a unique Subject identifier, and the EDC system web service would respond with either 'True' or 'False'. With Web Services, this can potentially occur in less than a second.
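The mechanics of that exchange might look something like the sketch below. The service name, namespace usage and message shape are all hypothetical - this only illustrates how a SOAP-style request and response are built and read, with no real network call:

```python
import xml.etree.ElementTree as ET

# Hedged sketch of the IS_SUBJECT_CURRENT exchange. The operation name
# and message shape are hypothetical; only the SOAP mechanics are real.

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_request(subject_id):
    """Construct the XML the IVR would POST to the EDC web service."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        f"<soap:Body><IsSubjectCurrent>"
        f"<SubjectId>{subject_id}</SubjectId>"
        f"</IsSubjectCurrent></soap:Body></soap:Envelope>"
    )

def parse_response(xml_text):
    """Pull the True/False answer out of the EDC system's reply."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//Result") == "True"

request = build_request("SUBJ-0042")
# A reply the EDC system might send back in under a second:
reply = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>'
    f"<IsSubjectCurrentResponse><Result>True</Result>"
    f"</IsSubjectCurrentResponse></soap:Body></soap:Envelope>"
)
print(parse_response(reply))  # True
```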

Let's take this one step further. The EDC system would like to record a subject randomization. The site personnel enter all the key information into the EDC system. The EDC system then makes a Web Service call to the IVR system, passing all of the necessary details. The IVR takes these details, checks them, and if valid, returns the appropriate subject randomization number. The EDC system presents the Randomization No. for the subject on the eCRF for the site personnel to use. This all happens in real time, via web service calls, between systems located in completely different locations.

Web Services -  Metadata independence

Web Services are significant for a number of reasons. Yes, they allow systems to communicate on a near real-time basis over the Internet - that's quite cool in itself. What's more significant, though, in terms of eClinical systems, is that Systems A and B don't really need to understand how the other system does what it does.

If System A had to read the database of System B, it would need to understand how System B actually used the data in the database.  The same applies to an interface.  If System A received data from System B, it needs to process that data with an understanding of how System B works before it could use it, or potentially update it.

Web Services - beyond CDISC?

CDISC ODM allows you to transfer data, and to some extent metadata, from one system to another. To ensure it works for all, the support is, to some extent, the 'lowest common denominator' of metadata. It is only really able to describe data that is common and understandable to every other system (barring extensions - see eClinicalOpinion on these).

Imagine if we could create a common set of Web Service calls.  The common calls would take certain parameters, and, return a potential set of responses.  The Messaging might be based on CDISC ODM, but the actions would be new and common. 

  • Add_Subject(Study, Site, SubjectId) returns ScreeningNo
  • Add_DataValue(Study, Site, Subject, Visit....) returns Success, QueryResponse
  • Read_DataValue(Study, Site, Subject, Visit....) returns DataValue,QueryResponse, DataStatus
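The calls above could be stubbed roughly as follows. This is purely illustrative - the class, signatures and return shapes are hypothetical, with the 'owning' system doing all the work against its own store:

```python
# Hypothetical sketch of the proposed common calls. Names mirror the list
# above; the class, signatures and return shapes are illustrative only.

class OwningSystem:
    def __init__(self):
        self.subjects = {}
        self.data = {}
        self.next_screening = 1

    def add_subject(self, study, site, subject_id):
        """Add_Subject(Study, Site, SubjectId) returns ScreeningNo."""
        key = (study, site, subject_id)
        self.subjects[key] = f"SCR{self.next_screening:04d}"
        self.next_screening += 1
        return self.subjects[key]

    def add_data_value(self, study, site, subject, visit, field, value):
        """Add_DataValue(...) returns Success, QueryResponse."""
        self.data[(study, site, subject, visit, field)] = value
        return {"success": True, "query_response": None}

    def read_data_value(self, study, site, subject, visit, field):
        """Read_DataValue(...) returns DataValue and DataStatus."""
        value = self.data.get((study, site, subject, visit, field))
        return {"value": value, "status": "entered" if value else "missing"}

edc = OwningSystem()
print(edc.add_subject("ST01", "S001", "0001"))            # SCR0001
edc.add_data_value("ST01", "S001", "0001", "V1", "DBP", 80)
print(edc.read_data_value("ST01", "S001", "0001", "V1", "DBP"))
```

In a real deployment each method would sit behind a web service endpoint, with the messaging based on CDISC ODM as suggested above.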

With this sort of mechanism, the degree of processing of data and metadata between systems is limited.  The 'owning' system does all the work. The data and metadata that the systems need, stay with the original system. 

One remaining challenge exists - the common indexing of information - if a data value is targeted towards a particular site, subject, Visit, Page and Line, then they all must be known and specified.  That said, a bit of business logic (protocol knowledge) can be applied.  For example, if a DBP is captured for a subject, and the target study only has one reference to DBP for a subject in the whole CRF, should I really need to specify the Visit, Page and Instance? Sufficient uniqueness rules could apply.

If CDISC were to create a standard set of InBound and OutBound Web Service calls, you would see a great simplification in how normally disconnected systems inter-operate.  Not only could we send data from System A to System B, we could appreciate what happens when it gets there - 'Can I login', 'Did that Verbatim Code?' 'Can I have lab data for subject x'... etc etc.

Will Web Services technologies change the eClinical landscape?  No.  But, technology advances such as these all help to make the whole eClinical process somewhat less complicated.

Monday, September 22, 2008

Green EDC?

Besides working in clinical trial technologies, I also have an interest in tackling issues around global warming.  Electronic Data Capture in clinical trials has the advantage that it inherently lends itself to tackling many of the issues that contribute to global warming.   I am interested in both the benefits that can be achieved during the implementation cycle, and the value during the execution phase.

Green EDC does make a lot of sense from many perspectives.  In almost all instances, working with an electronic medium is considerably faster than working with paper. This leads to a reduction in cycle times, and lower costs. You can basically 'do more with less'.

Using technology today


Less Travel

Probably the most obvious one.

Large, expensive Investigator Meetings were common until only a few years ago.  Changes in rules and attitudes have ensured they have become more functional events rather than investigator incentives. As a result, unnecessary travel has been reduced.  However, are investigator meetings actually required?

From the EDC perspective, training is often provided during the course of the meeting.  This can take up as much as half the time spent.

eLearning solutions, available from many of the EDC vendors, should alleviate the need for Investigator meeting training.  At most, you might be looking at a short demonstration. If eLearning is implemented well, it has many advantages over traditional instructor-led training.  It should be integrated into the product, it should be integrated into the workflow, and it should be adaptable to meet the changing training requirements from study to study.

Integration is important. The early eLearning solutions were often implemented as separate tools.  You would be provided a separate login to access the eLearning - you almost had to learn to operate the eLearning tool!   With fully integrated eLearning inside EDC, the EDC system login directs the trainee through appropriate eLearning topics based on the role they have been assigned, before they can participate in the study. Participation and test results are all logged, ensuring an electronic record is maintained for process compliance.  Another advantage of eLearning is the ability to train new staff prior to Monitoring visits.  In the past, a new study nurse, for example, had to wait - potentially until the Monitor's next site visit.  With eLearning, they simply await a login, perform the eLearning and they are ready to participate.
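The role-based gating described above amounts to a simple check at login. A minimal sketch, with roles and topic names invented for illustration:

```python
# Sketch: integrated eLearning as a login gate. A user may not take part
# in the study until every topic required for their role has been passed.

# Invented role-to-topic mapping for one study.
REQUIRED_TOPICS = {
    "Investigator": ["GCP Refresher", "eCRF Completion"],
    "Study Nurse": ["eCRF Completion"],
}

def may_participate(role, passed_topics):
    """True once every topic required for the role has been passed."""
    return all(t in passed_topics for t in REQUIRED_TOPICS.get(role, []))

print(may_participate("Study Nurse", ["eCRF Completion"]))     # True
print(may_participate("Investigator", ["eCRF Completion"]))    # False
```

Because the check runs inside the EDC login, the pass/fail record doubles as the compliance audit trail the text describes.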

Less Monitoring Visits

Besides the removal of the need for monitors to attend sites regularly for staff training, the overall number of Monitoring visits should be reduced with effective use of the EDC systems themselves.  Monitors are increasingly taking the role of the Data Manager in carrying out the duties of data review over and above the cleaning functions of the study build itself.  Source Data Verification remains necessary, but the actual Q & A regarding the data itself can be carried out remotely.  Fewer Monitoring visits typically means less air travel.

Less Paper

This seems obvious - we are talking about replacing paper with an electronic medium - however, we also have paper flows in other areas of the process.   The development of the EDC product itself, together with the implementation phase of the system for a study, often involves the preparation of many binders of paper materials designed to support a vendor (or regulatory) audit.  Despite the electronic nature of the solutions that EDC companies provide, very few companies effectively push a paper-free development and implementation process.  The use of electronic document management systems is beginning to change this, but it does require the support of Quality Assurance and Regulatory groups, who tend to be more comfortable with a large pile of paper binders and wet ink signatures than fully electronic systems.

Widely used standards for document signatures are currently a barrier.  The SAFE BioPharma Association has had some success in offering up an electronic document signing solution that could be used across the industry, but it still has some challenges related to hardware and technology dependency. We may see progress following SAFE's partnership with CDISC.

Paper itself is not that 'non-green', but the delivery of paper can be.  Organizations involved in clinical trials today are often global.  Fedex'ing a CRF, a specification or a submission in paper format is simply not necessary with technology available today.

Messaging and Video Conferencing

Is it not odd that a technology such as video conferencing has such limited use in business today, and yet teenagers all around the globe use it regularly to chat with their friends over the internet?

Some EDC systems make use of tools to provide interactive support, but they are often poor.

If I am using an Internet application - such as banking or the like, and I have a problem, I would like to chat - by keyboard or over the phone - immediately.  Messenger services either inside, or outside of the EDC application are available today.  I have yet to see a good implementation of interactive support within the tool.  [I am sure EDC vendors that already have such services built in will correct me here!]

By offering interactive support and communication tools within an EDC product, site personnel can achieve equivalent, or potentially better support than can be achieved through infrequent monitoring visits.


Tomorrow's Technology?

So, what can we still do to improve our green credentials when carrying out clinical trials?


Applying standards can improve overall efficiencies.

Since 2004, SDTM - the Study Data Tabulation Model from CDISC - has been part of a move to a standard electronic medium for data submissions.  The electronic submission itself is of course not new, but the standardization of the format is, making it somewhat easier for regulatory bodies to process data received regardless of the source.  Standard outputs require standard inputs, so with the recent introduction of CDASH (Clinical Data Acquisition Standards Harmonization) - a standardization of the input structures in clinical trials - the overall SDTM production process should become less onerous.

Downstream from EDC, we have the eCTD (electronic Common Technical Document), an interface for pharmaceutical industry to agency transfer of regulatory information.  From 1/1/2008, this is a required format (barring a waiver) for the FDA, and it has resulted - and will continue to result - in a great reduction in the amount of paper delivered to regulatory authorities.

Trial Execution Efficiency

Both adaptive clinical trials and non-adaptive studies that take a Bayesian continual reassessment method (CRM) approach can help ensure that unnecessary trial execution work is avoided.  In the past, trials tended to be more serial in nature - complete Study A before moving on to Study B.  With the ability to adjust the design and, more commonly, the ability to examine data subsets early in the trial, either an early termination or a change of focus can be made, saving time, money and of course our impact on the environment.

Source Data Verification

Going back to my eBanking analogy, if I want to record a payment transaction into my eBanking solution, I do not write it down on paper first, and then transcribe what I write down into the eBanking transaction form. There is no value in that.

In clinical trials, if an investigator is involved in a trial where the subject responds to questions during data entry, it would make sense, in certain circumstances, to enter these directly into the EDC system and not on paper first - as source data - avoiding potential transcription errors.

But no... The interpretation of guidelines regarding Source Data typically means that web-based systems cannot be used to retain the 'Source Data'.  I personally have an issue with this interpretation, but I will not elaborate here.  It is indicated that the 'Source Data' must remain at the site. Web-based systems do not hold data onsite, but instead hold the data on a central server.  [Dave Iberson-Hirst from CDISC provided a good description of the challenges of Source Data in his article in Applied Clinical Trials focusing on electronic Patient Diaries - for the sake of brevity, I have summarized the points of argument.]

Workarounds have involved the use of an offline, or hybrid, system where the data is stored on a local device at the site.  More commonly though, data is recorded on paper first, and then transcribed into the electronic CRF.  Admittedly, a lot of source data will come from medical records and other paper records; however, it seems rather regressive that, due to the wording of a regulation, more data is captured on paper first for web-based systems than with the old-fashioned Remote Data Entry tools.  Hopefully improved guidance in the near future will avoid this.


I am a great believer that being 'Green' means being efficient in business. Reducing the inefficiencies in how we go about our work not only offers savings in time and money, it can also have a positive impact on our environment. Here's to an electronic clinical world!

Thursday, September 18, 2008

Top 10 mistakes made when implementing EDC

(last update from admin @ eclinicalopinion)

Ok, I am calling for a challenge here.  I am making an attempt at identifying the top 10 mistakes that I believe are made when companies attempt to implement EDC.  No science to this. Just a bit of fun.

I will make edits if anyone posts comments that I believe outdo my own;


1. Pick the nastiest, most complex study to implement as a first study

Sponsors may be trying to check the EDC system and vendor as a confirmation of claims of functionality, services and support.  It may also be an internal organization's 'sell' when bringing in a new system. In reality, the risk factors are at their highest, and the chances of failure greater, than at any subsequent time in an Enterprise EDC system rollout.  Instead of learning and improving with optimized processes and a well designed workflow model, a 'get it out the door quickly' approach is forced and pain is suffered by all parties!

2. Expecting the return on EDC to be immediate - admin @ eclinicalopinion

Many clients are very experienced with paper and have wrung the very last drop of efficiency out of their process. They start with EDC believing that they are entering a new golden era only to be disappointed with the gains (or losses!) on their first study.
As with any new process or technology, it takes time to refine. The potential gains are real but it will take a few trials before a company hits its stride with EDC.

3. Over emphasis on faster closeout- admin @ eclinicalopinion

Companies new to EDC get excited about the faster closeout of EDC trials but ignore the issue of longer start-up times with EDC. With paper you could print the CRFs and send them out before you had finalized (or even built) the database that would finally store the data.

4. Use all the functionality that was demonstrated

A common problem. When a salesperson demos the product, it looks cool. Almost every feature looks good, and could add value... Well, in reality, not always.  Many EDC systems developed today offer features as a 'tick in the box', but when a feature is used and combined with other features, sometimes the value falls short.  For example, most systems offer some form of data flagging... Reviewed, SDV'd, Frozen, Locked etc etc. Do not use all flags on all fields.  That will be slower than paper.

5. Resource the same way

If you have the same resourcing for Data Management and Monitoring AND you are also resourcing separately for building and testing EDC studies - then you have done something wrong.

With a good EDC product, the rules that would typically be applied manually are applied automatically. The delta should be picked up by a smaller number of 'eyes'.  Many CROs have played the 'better safe than sorry' card, charging the same for Monitoring and Data Management as on paper, on top of EDC license and deployment costs.  This demonstrates an inexperienced CRO.

6. Model the eCRF according to the paper CRF Layout

Trying to make an electronic CRF identical to an original paper CRF will result in tears.  The users will be frustrated with the workflow.  The 'e' nature of the medium will not be utilized and the study will be less effective.

Instead, consider appropriate workflow and dynamic eCRF's.  I will stress 'appropriate'. Overdoing the bells and whistles can cause frustration, but, no bells and whistles and many of the advantages of EDC are lost.

7. eCRF Design by committee

The surest way to blown budgets and timelines is to attempt to develop an eCRF based on a committee of individuals.  The sponsor should delegate a chosen few (ideally one) to work with the EDC CRF Designer. The study should be largely built first; following this, a wider review should be carried out.

8. Wait until the end of the study to look at the Data

It is surprising how often this is still the case. EDC means cleaner data faster, but often sponsors, and their Statistical departments are geared towards working with final data-sets. Good EDC systems can deliver clean data on a continuous basis.  Whether the data is able to achieve a statistically significant sample size is another question, but, information is often available for companies that are willing to leverage it.

9. Fail to use the built in communication tools provided

Many EDC systems offer the means for the different parties involved in the execution to communicate.  These might be in the form of Post-it Notes, Query Messages or internal email. Often these facilities are either not used, or not used effectively.  This means that the true status of a study is a combination of information in the EDC tool, tasks in Outlook, actions in emails, scribbled notes on a Monitor's pad etc etc.

10. Do lots of programming for a study

This covers many areas. It could be programming to handle complicated validation rules, or it could be programming to adapt data-sets to meet requirements. If your EDC system requires lots of programming in order to define an EDC study, then I would suspect you have the wrong EDC system. Good EDC systems today are configured based on metadata stored in tables. Old systems relied on heavy customization of the core code for each deployment, or relied on some form of programming in order to complete the study build. If you write code, then you need to test it. The testing is similar to software validation.  This takes time and money.

Most EDC tools can be extended through 'programming'. If you need to do this, try to do the work outside of the critical path of a study.  Develop an interface, test and dry run it, and then, utilize it in a live study. In this way, you will have time to do it right with proper documentation, support and processes.

and relegated below the top 10...

11. Start by developing Library Standards from Day 1

This may sound like an odd one, but let me explain.  Implementing EDC effectively, even with a highly experienced vendor, takes time.  All parties are learning, and modern EDC systems take a while to adapt. Workflow, system settings and integrations all need to come up to speed and be optimized before standards can really be applied and add value. Start too early, and the library standards are just throwaways once the teams come up to speed. It is best to leverage the knowledge and skills of the CRO or Vendor first.

12. Develop your Data Extracts after First Patient In

Sometimes tempting due to tight timelines, but if you have an EDC tool that requires either programming or mapping of data to reach the target format, then the less consideration you give to the outputs when you design the study, the harder it can be to meet these requirements after FPI. This leads to higher costs, and the potential for post-deployment changes if you discover something is missing.


Many thanks to admin @ eclinicalopinion for the additional 2 mistakes made, now coming in at numbers 2 & 3!  More comments welcome...

Friday, September 12, 2008

How do Monitors work on site with Online only systems?


I have a question that I would like to ask.

When Monitors go out onsite to carry out Monitoring activities, and, the monitoring activity involves working with the data captured into an EDC system - how do they do it if the EDC system is online only?

  • They could use the site computer - but, often this is a computer shared for other purposes. With increasing security at site locations, often it is not appropriate for an external person to login to a site system.
  • They could use their own laptop with the site network link?  Using the site network infrastructure is often a big no-no. The site will often not provide a wireless network key, again due to security restrictions
  • They could use their own laptop, but with a 3G or similar wireless technology?  Well - not in many hospital establishments.  Mobile communication, including 3G is not permitted.

So, without resorting to an offline solution, how do they work?


One of the functional areas where I have been surprised at the lack of good support is CTMS features built out of EDC products.

CTMS systems often perform a variety of tasks - Portfolio Management, Site Management, Trial Planning, Monitor Reporting, Study Progress Tracking, the list goes on for a while...

One of the difficulties with using separate EDC and CTMS systems was often the need to either synchronize or interchange the metadata relevant to both systems.  Both systems need to be aware of the visit structure, and the potential workflow associated with more complicated visit structures.  For example, it might be the case that a study has multiple arms, and one or more trigger points that determine the branching.   In order to predict the CRFs to be completed from a trial planning perspective, the CTMS either needs to recreate the structure - including an appreciation of the trigger points - or it needs to import the structure from the EDC system.  This is all a bit messy.

So, why do EDC vendors not do a better job of leveraging the data and metadata in an EDC database in order to drive at least a % of the needs of a CTMS?

I am open to better ideas here, but to get this started :-

EDC systems try to generalize the definition of studies. Often things like Visit Dates, Visit Numbers, branching points and subject statuses etc. are all purely data based - often simply fields on a CRF.  They have no special meaning that differentiates them from other CRF fields.  Vendors could tackle this by offering a form of user-definable flags.  On the metadata, a spare attribute would be provided that in turn points to an internal metadata codelist. The codelist could be populated by trigger values such as 'Visit Date' or 'Visit Number' etc.  When interfacing, the CTMS system just needs to be aware of the special nature of the flags, and use them to pick up the one or more fields that are associated with them. So, even though 10 different fields are used to hold Visit Dates, provided the 'Visit Date' flag is attached, the CTMS knows how to find them. Being data based, no special programming is required from study to study.
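A minimal sketch of the flag mechanism, with attribute and flag names invented for illustration:

```python
# Sketch: user-definable flags on EDC field metadata let a CTMS find the
# fields it cares about without study-specific programming.

# Invented field metadata: each field may carry one semantic flag.
FIELD_METADATA = [
    {"oid": "VISDAT_V1", "form": "Visit 1 Log",  "flag": "Visit Date"},
    {"oid": "VISDAT_V2", "form": "Visit 2 Log",  "flag": "Visit Date"},
    {"oid": "DBP_V1",    "form": "Vitals",       "flag": None},
    {"oid": "SUBJSTAT",  "form": "End of Visit", "flag": "Subject Status"},
]

def fields_with_flag(metadata, flag):
    """Return the OIDs of all fields carrying a given semantic flag."""
    return [f["oid"] for f in metadata if f["flag"] == flag]

# Even though visit dates live in differently named fields on different
# forms, the CTMS can find them all through the shared flag.
print(fields_with_flag(FIELD_METADATA, "Visit Date"))
```

The CTMS never needs to know the study-specific field names; it only needs the shared vocabulary of flags.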

A second issue is the lack of CTMS information defined on an eCRF. EDC systems are generally only geared to capture data through the eCRF medium. When a sponsor approaches either a CRO or Vendor, often the contents of a CRF page are gospel. The thought of bolting on the capture of additional fields is considered inappropriate - eek! it could be mistaken for clinical data!  For example - and I have seen it done - a standard could be developed that ensures that for every subject visit, an end of visit form is filled out.  The end of visit form would not capture clinical data. It would be used to capture (or derive) study progress information. Visit Date, End of Visit Subject Status and other such information could be captured and validated in one place. With this kind of information, it is somewhat less difficult to either report on status and progress using native EDC reporting tools, or feed the standard information across to a CTMS.

The third reason, I believe, for EDC systems not effectively supporting CTMS needs is the lack of planning features.  Yes, when building and deploying a study it is possible to enter the number of expected subjects at a site level - all EDC systems include this and, in my experience, all EDC systems rarely use it - but that doesn't give you timeline planning: based on a specific recruitment interval, when will I reach my target subject recruitment... when will the recruited subjects complete LPLV... etc.
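The planning feature being asked for is not complicated. A minimal recruitment projection, assuming a constant recruitment rate (all figures below are invented), might look like this:

```python
# Sketch: the simple timeline projection EDC systems rarely offer.
# Given a per-site recruitment rate, project when the target is reached.

from datetime import date, timedelta

def projected_completion(start, sites, rate_per_site_per_week, target):
    """Date on which cumulative recruitment reaches the target."""
    weekly = sites * rate_per_site_per_week
    weeks = -(-target // weekly)   # ceiling division
    return start + timedelta(weeks=weeks)

# 10 sites each recruiting 2 subjects/week, target of 200 subjects.
print(projected_completion(date(2008, 9, 1), sites=10,
                           rate_per_site_per_week=2, target=200))  # -> 2008-11-10
```

Real planning would of course account for staggered site activation and screen failures, but even this much, driven by data the EDC system already holds, would go a long way.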

So - could EDC systems better leverage the information they have to offer basic CTMS functionality - absolutely.  Should they do more - Yes.

The recent Clinpage article on vendors offering multiple integrated solutions suggests that the existence of a fully integrated solution will prove to be a key decision influencer over and above the actual feature functionality of individual systems. There is something to be said for that, but (and this is a big but), the systems must be sufficiently well integrated, robust and flexible to actually work in real life.  I would venture that many of the purportedly fully integrated product suites are in fact separate products loosely coupled with single-sign-on and a similar UI.   I have known of products (and I will not mention any names) that were sold under the same branding, used the same UI, and could be accessed through a portal using single-sign-on, but that did not share a single application table once utilized - data and metadata went into entirely separate tables - seamless on the surface, but underneath, a chasm!

With a strong CTMS feature inside an EDC product, does a future exist for standalone CTMS tools? I personally think not.  Although more highly functional, their data entry requirements, versus EDC tools that leverage existing metadata and admin data, offer a less convincing value proposition.   Then again, how many good EDC vendors know how to create a good CTMS?

Friday, September 5, 2008

Tools for validating data in EDC


I have had the fortune to work with a number of different EDC products over the years.  In each case, they implemented features that allowed a study developer to create validation rules associated with the data captured.  Some were good and some were bad. In almost all cases, it was difficult to appreciate a tool's shortcomings until a solid amount of work was carried out using it.  They say the devil is in the detail - with EDC edit checking tools, this certainly proves to be the case.

I would like to discuss (or rather ramble on about) the history of validation rule tools in EDC - at least from the mid 90's.


1st Generation Tools - SQL Base

The early tools, primarily before EDC, used either SQL, or a pre-compiled version of SQL together with a scripting language, as the syntax for edit checking.  This approach had the advantage that data access was standardized to a degree with SQL.  The disadvantage was that the SQL worked directly against an underlying database. The developer had to understand and operate against the underlying database in order to correctly write edit checks.

PL/SQL was a common language to leverage. Tools such as Clintrial and DLB Recorder (now eResearch Technologies - eXpert Data Management) relied heavily on the logic and data constructs provided.

The downside to (PL/)SQL-based edit check syntaxes was that they often assumed that the underlying database was a relational database that matched the structure of the screens the logic was associated with.  The product therefore had to be a relational database building tool - good on the surface, but not good when it came to meeting the needs and flexibility of EDC.

2nd Generation Tools - Expression Builders

In the early to mid 1990's, a new set of tools arrived that generally attempted to take away much of the complexity of the 1st Generation tools, and that took advantage of the fact that the underlying data structures were not relational.

The first set of 2nd generation tools tackled the issue of logical data checking through the provision of expression building tools. Initially, these were restrictive, with the only means of building expressions being a thick-client front-end; no free-format expression entry was possible.  This made the tool development somewhat easier, and the corresponding expression parsing simple.  The downside to the approach, though, was that it was not possible to define all the required rules in the provided expression builders.

3rd Generation Tools - Hybrid Logic Builders

Expression builders alone were seen as too restrictive for the development of a complete set of edit checks for a study.  Also, power users felt constrained. The fallback position for implementations was that edit checking had to be performed at the back-end with SAS or SQL queries.

To work around these limitations, a 3rd generation of tools was produced that provided a combination of expression building and direct syntax entry.  The direct syntax entry was provided either by allowing the developer to edit and extend the code derived through the expression builder, or as an alternative to expression-built code.

The added flexibility of the direct syntax provided a mechanism allowing studies to tackle close to 100% of all edit checking associated with a protocol.  Back end data checking was limited to rules that could not be determined prior to study rollout.

One limitation of the syntax approach is the issue of testing.  With a point-and-click configuration approach, the scope of testing can be controlled, and to a degree even automated.  With an open, language-based syntax, the number of potential combinations that need to be tested is higher.  In fact, the testing that may be applied here is equivalent to the sort of testing that is required when carrying out full system validation.  I will be discussing the methods of, and challenges in, testing studies in a later post.

Key Factors in good Validation Rule Syntax

Absolute Data References

This topic primarily applies to script-based systems rather than expression builders.  Expression builders typically present the metadata that exists in drop-down lists (Visits, Forms or Field Names).  Referencing data fields in a free-format expression can be somewhat more challenging.  Take the following example;


  [Visit 1].[Inclusion/Exclusion].[Inclusion 1]

This is an example of an absolute reference to a data field: the 1st Inclusion question on the Inclusion/Exclusion Form in Visit 1. But why the brackets?  Well, the metadata has spaces and a / in the names. To ensure the interpreter doesn't think the space or / separates this operand from an operator, the brackets bound them.   A way around this of course is to use names without spaces. CDISC offers this with Object Identifiers, or OIDs. They don't have spaces, so the issue does not arise.  However, OIDs can be less than friendly when it comes to making an expression human-readable. Anyway, OIDs or equivalents are standard practice except where the number of elements in an expression is always fixed.  Even with OIDs, though, the length of these logical expressions can be horrific.
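A small tokenizer sketch (in Python, purely illustrative of the parsing problem - no real product's syntax is implied) shows why the brackets matter: without them, the embedded space and '/' would split the reference apart.

```python
# Sketch: why bracketed names matter. A naive tokenizer splits on spaces
# and operators, so the whole dotted bracketed reference must be treated
# as a single operand.

import re

def tokenize(expr):
    """Split an expression into bracketed references, quoted literals,
    operators and bare words, in that order of preference."""
    pattern = r"""(\[[^\]]*\](?:\.\[[^\]]*\])*   # dotted bracketed reference
                 |'[^']*'                        # quoted literal
                 |[=<>!]+                        # operator
                 |\S+)"""                        # anything else
    return re.findall(pattern, expr, re.VERBOSE)

tokens = tokenize("[Visit 1].[Inclusion/Exclusion].[Inclusion 1] = 'No'")
print(tokens)
# The full dotted reference survives as one token despite the embedded
# space and '/' characters.
```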

Programming languages have simplified the issue of dealing with long qualifications by providing variables (or aliases). You define the alias at the start, and then refer to the simple alias name throughout the expression.  So for the above, you could say;

  INC1 = [Visit 1].[Inclusion/Exclusion].[Inclusion 1]

Then, to compare values, it might be

if INC1 = 'No' then Raise Query "XXXXXX"

Wildcard Data References

It is common for the same eCRF pages to be dropped into multiple visits.  In this circumstance, the visit references that may exist in attached edit checks need to change.  The way this is usually achieved is through wildcarding.

If the above Inclusion/Exclusion check appeared in, say, Visit 1 and Visit 2 (hypothetically), then the visit reference would need to be wildcarded to ensure it does not refer to Visit 1 when the form is dropped into Visit 2. The tool would replace the wildcard with the appropriate reference based on where the form appears. You can in fact have 3 types of reference - Absolute (as above), Relative or Any.

'Any' or 'Current' style references are often represented with a '*', a '$' or some other special symbol. This designates that the element is replaced with the appropriate reference for wherever the check executes.

Relative references are usually derived based on the original source of the edit check. So, if the edit check fired in Visit 2, and the relative reference stated -1, or 'previous', then this would resolve to the current visit minus one.
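A sketch of how the three reference types might resolve when a form lands in a particular visit. The '$' and '-1' notations here are illustrative, not any specific product's syntax:

```python
# Sketch: resolving Absolute, Any/Current ('$') and Relative ('-1'/'+1')
# visit references when a form is dropped into a particular visit.

VISITS = ["Visit 1", "Visit 2", "Visit 3"]

def resolve_visit(reference, current_visit):
    """Turn a visit reference in an edit check into a concrete visit."""
    if reference == "$":                   # Any/Current: wherever the form lands
        return current_visit
    if reference.startswith(("-", "+")):   # Relative: offset from current visit
        idx = VISITS.index(current_visit) + int(reference)
        if not 0 <= idx < len(VISITS):
            raise ValueError("relative reference falls outside the study")
        return VISITS[idx]
    return reference                       # Absolute: use as written

# The same check, reused on the form when it sits in Visit 2:
print(resolve_visit("$", "Visit 2"))       # -> Visit 2
print(resolve_visit("-1", "Visit 2"))      # -> Visit 1
print(resolve_visit("Visit 1", "Visit 2"))
```

The `ValueError` branch is exactly the reuse hazard described below: drop the form into a study with a different visit structure and a relative reference may resolve somewhere unintended, or nowhere at all.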

Wildcarding causes some difficulties, though, when it comes to reuse.  Testing is only predictable when the logic is applied within a study.  If you take the form out of one study and drop it into another, it is possible that, with a different visit structure, you may obtain different - potentially undesirable - results.


Data References do have an impact on a number of areas. Well designed data referencing ensures that the maximum amount of re-use can be achieved from study to study as well as within a study. Also, the readability of validation rules is important.  If the Protocol Designer or Data Manager cannot understand the rule that is presented from an EDC system, how can it be assured that the rule is correct?


Other Considerations - Actions

The boolean true/false result of an edit check expression is only one side of the check.  The other side is the action that takes place as the result of either a true or a false outcome.  System designs seem to fall evenly into one of two approaches.  Either the syntax allows one or more actions, one of which is to create a Discrepancy (or Query), or the Discrepancy is the only, and therefore the assumed, action.   Clinical Data Management systems often went with just the Logic --> Query approach, as the need for actions beyond Discrepancies was limited.

With most modern EDC systems, the edit check language provides the means to carry out one or more actions as the result of a boolean. Additional actions that might be supported are things like status changes, assigning values, or even activating or inactivating metadata elements such as forms and fields. Some systems separate query creation from other actions. The reason behind this is normally to help support protocol updates.  If a tool mixes queries in with other activities, it can be very difficult to deal with the situation where a new protocol definition needs to be applied to existing data.  For queries, it is easy - re-run the logic, and if a query is created that was not previously created, add it.  For other actions, it is a bit more tricky - for example, if you had set up your system to send out emails when a Serious Adverse Event occurred, you wouldn't want the emails to be re-sent when you applied a protocol update.
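The distinction can be sketched as follows: on a protocol-update re-run, queries are re-derived idempotently, while side effects such as emails are suppressed. The rule and all names here are invented for illustration:

```python
# Sketch: why query creation is kept separate from other actions. On a
# protocol update, rules are re-run over existing data: queries can be
# re-derived idempotently, but side effects such as emails must not repeat.

existing_queries = set()
emails_sent = []

def run_rule(subject, value, rerun=False):
    """Evaluate a hypothetical SAE rule for one data point."""
    if value == "SAE":
        query_key = (subject, "SAE confirmation required")
        if query_key not in existing_queries:   # idempotent: add only if new
            existing_queries.add(query_key)
        if not rerun:                           # side effects only on first run
            emails_sent.append(f"SAE alert for {subject}")

run_rule("SUBJ-001", "SAE")              # initial entry: query + email
run_rule("SUBJ-001", "SAE", rerun=True)  # protocol-update re-run: no new email
print(len(existing_queries), len(emails_sent))
```

Because query creation is a pure function of the data, re-running it is safe; the email is not, which is why the two kinds of action need to be distinguishable to the engine.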


Other Considerations - Batch Processing

This is an interesting one. Anyone that has sold EDC systems will have been asked this question.

Does your system support Batch execution of validation rules?

With very limited exceptions, the answer was always no.  I could argue that in some cases, due to pressure from sales, batch execution was added to the detriment of the EDC products. EDC systems are designed along the principle that bad data is corrected immediately by presenting errors as soon as possible to the data entry person.

The only argument for batch processing that I have seen applied for a positive reason is in the area of performance.  An EDC system that suffers poor performance may resort to batch execution to improve page response times. However, this is often unsatisfactory - CDM systems run across data typically using the efficiencies that single SQL Select statements can bring.  EDC systems often operate on a datapoint by datapoint basis with only limited cache optimisation possible. Batch running EDC edit checks can be tortuously slow. Presenting queries after the user has left the page is also sub-optimal.


The Future?


A gap exists right now in the development of standards (i.e. CDISC) where they pertain to rules applied to data.

Why do we need standards for Rules? I hear you say.  

Well, from an EDC study build perspective, the associated edit checks are often the largest single work effort in preparing a study. In fact, in comparison to preparing forms and folders, the edit checks together with testing can often be 3-4 times more work. So, when attempting to leverage standards such as ODM, the relative savings that can be achieved by automating the study build are limited.

The second reason behind the need for standards around rules is the potential knowledge that might be associated with them.   Imagine you have access to a warehouse of clinical data.  In that warehouse you have hundreds of instances of a particular set of data - let's say vital signs.  Can you use all the data? What if some of the data had restrictions that other data did not have?

Rules were originally applied to data in order to determine cleanliness within a study. These rules may also have determined whether the data reached the warehouse - inappropriate data may have been filtered out.   By leaving the rules out of the warehouse, you take away a proportion of the context behind the data.  If you take away the rules - which form part of the metadata - can you really utilize the data in an unbiased way?    Maybe statisticians will say this doesn't happen, or that the impact is negligible... I am happy to receive comments.


As mentioned in a recent posting on eClinical Opinion, there are many input requirements for validation logic.  If you are thinking proprietary, then you will want a syntax that is as close as possible to the sort of language used in protocol definitions, while at the same time being as concise as necessary to ensure re-use and avoid ambiguity.  The point-and-click builders will not go away - they can be useful.  At the same time though, for power users, you need high-end editor features.  I believe the strongest vendors will create editors that are clinical trial business object aware. When building syntax, they will know, for instance, that a field may be qualified by a sequence number or form.  They will understand that, given two dates, an Age can be derived.
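The Age example is a good one, because it is exactly the kind of derivation an editor could generate behind the scenes. Here is a minimal sketch of what such a helper might look like - the function names and the 18-65 range are mine, not from any particular product:

```python
# Sketch of a "business object aware" derivation: given two dates,
# derive a whole-year Age, then use it in a rule written close to
# protocol language ("Age at screening must be 18-65").
from datetime import date

def derive_age(dob, reference):
    """Whole years between date of birth and a reference date."""
    years = reference.year - dob.year
    # Not yet reached the birthday in the reference year? One year less.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years

def age_in_range(dob, screening_date, lo=18, hi=65):
    return lo <= derive_age(dob, screening_date) <= hi

print(derive_age(date(1980, 6, 15), date(2008, 6, 14)))     # -> 27
print(age_in_range(date(1980, 6, 15), date(2008, 12, 22)))  # -> True
```

An editor that knows Age is derivable from two date fields could offer `derive_age` as a completion the moment both fields appear in the rule, rather than forcing the author to hand-code the birthday arithmetic each time.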

Device independence may become significant once again. In the past, provided you ran in a browser, things were fine. However, who wants to enter patient diary data in a PDA browser?  The iPhone is a perfect example. iPhone apps are not browser apps - they make use of the Internet, but they leverage another UI. By taking the validation rules away from the front end, the actual device used to offer up the questions will not matter. The same rules apply regardless of the capture medium.
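The idea of pulling rules away from the front end can be sketched as a single server-side entry point that any client posts raw values to. The wire format and the rule set below are invented for illustration:

```python
# Sketch: device-agnostic validation. A browser form, a PDA, or a
# native phone app all submit the same payload and get the same
# queries back, because the rules live behind one entry point.
import json

RULES = {
    "dbp":   lambda v: [] if 40 <= v <= 120 else ["DBP outside expected range 40-120"],
    "pulse": lambda v: [] if 30 <= v <= 200 else ["Pulse outside expected range 30-200"],
}

def validate_payload(raw_json):
    """Single entry point: same rules whatever UI sent the data."""
    data = json.loads(raw_json)
    queries = {}
    for field, value in data.items():
        rule = RULES.get(field)
        if rule:
            messages = rule(value)
            if messages:
                queries[field] = messages
    return queries

# Identical payload, identical queries, regardless of capture medium.
print(validate_payload('{"dbp": 130, "pulse": 72}'))
```

In practice this is exactly where a CDISC-style web service fits: the device captures, the service validates.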