caBIG Annual Meeting – A developer's perspective

For the past couple of days I've been attending the caBIG Annual Meeting (it's the 5th such meeting and by all accounts the best attended).

About caBIG

caBIG™ stands for the cancer Biomedical Informatics Grid™. caBIG™ is an information network enabling all constituencies in the cancer community – researchers, physicians, and patients – to share data and knowledge. The components of caBIG™ are widely applicable beyond cancer as well.

The mission of caBIG™ is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.

In a nutshell, caBIG is an initiative of the National Cancer Institute, built heavily on open-source components, that aims to motivate and facilitate the sharing of data. It strongly encourages a front-loaded UML workflow (using MDA) and draws on ontologies and common data definitions to help guarantee consistent semantics (and syntax).

From a purely technical perspective, I've never been sold on the idea of MDA. I've had experience with both open-source and commercial modeling tools, and none of them has ever delivered on the promise of true round-tripping (and if you don't have round-tripping… well, you're in for a world of hurt). In the caBIG case, there is a pipeline of transformations that you're more or less required to run through:

  1. Create a UML representation of your object and domain models
  2. Annotate the model with caBIG-specific annotations and stereotypes
  3. Run the annotated model through the Semantic Integration Workbench (another caBIG tool)
  4. Submit the final model (XMI) to caBIG for approval and insertion into the caDSR (Cancer Data Standards Repository)

Make a change in the future and you’re more or less required to run through steps #1-4 again.

Once you have a validated UML model, you can then run it through the caCORE SDK to generate skeleton code for a three-tiered application consisting of (at a high level) a Hibernate data model, an external API, and some middleware code to glue the two together. Round-tripping is essentially non-existent from what I've heard and seen.
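
To make that concrete, here is a minimal, hypothetical sketch of the shape of such a generated stack: a plain domain object that Hibernate would persist, and a thin query-by-example service interface standing in for the external API and middleware. The names (`Gene`, `DomainService`, `search`) are my own placeholders, not the actual caCORE-generated classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the kind of skeleton the SDK generates;
// class and method names are illustrative, not the real generated API.
public class GeneratedStackSketch {

    // Data-model tier: a plain domain object that would be backed by a
    // Hibernate mapping in the generated application.
    static class Gene {
        private Long id;
        private String symbol;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }
    }

    // External-API tier: a thin, query-by-example style service that the
    // generated middleware would implement by delegating to Hibernate.
    interface DomainService {
        List<Gene> search(Gene example);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the middleware, just to show the call pattern.
        List<Gene> store = new ArrayList<>();
        Gene tp53 = new Gene();
        tp53.setId(1L);
        tp53.setSymbol("TP53");
        store.add(tp53);

        DomainService service = example -> {
            List<Gene> matches = new ArrayList<>();
            for (Gene g : store) {
                if (example.getSymbol() == null || example.getSymbol().equals(g.getSymbol())) {
                    matches.add(g);
                }
            }
            return matches;
        };

        // Client code queries by example and never touches Hibernate directly.
        Gene example = new Gene();
        example.setSymbol("TP53");
        System.out.println("Matches: " + service.search(example).size());
    }
}
```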

You’re done! Congratulations on achieving Silver-level compliance.

Wait a second.

It's a little too process-heavy for my liking. I would have liked to see the NIH/caBIG focus first and foremost on data interoperability and less on tools, particularly tools that dictate a particular workflow (like UML -> annotation -> MDA -> code generation).

I'd much rather see an extensible API with pluggable endpoints, a metadata registration service, and a suite of validation test cases. Define suitable goals and keep it simple. Provide suitable incentives and vendors *will* support it.
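
As a rough illustration of what I mean, here is a minimal sketch of that alternative: a small endpoint contract vendors can implement, a registry that maps endpoint fields to common data definitions, and a conformance check that could live in validation tests. Every interface and name here is my own invention, not any real caBIG API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a sketch of "pluggable endpoints + metadata registry +
// validation tests", with names invented for this example.
public class InteropSketch {

    // A vendor participates by implementing one small endpoint contract.
    interface DataEndpoint {
        String datasetId();
        List<Map<String, Object>> query(Map<String, Object> criteria);
    }

    // A lightweight registry maps each endpoint's fields to shared data
    // definitions, giving consumers consistent semantics.
    interface MetadataRegistry {
        void register(String datasetId, Map<String, String> fieldToCommonElement);
        Map<String, String> lookup(String datasetId);
    }

    // Conformance is checked by tests run against any implementation,
    // rather than enforced through a mandated modeling workflow.
    static boolean isRegistered(DataEndpoint endpoint, MetadataRegistry registry) {
        Map<String, String> mapping = registry.lookup(endpoint.datasetId());
        return mapping != null && !mapping.isEmpty();
    }

    public static void main(String[] args) {
        // Trivial in-memory registry and endpoint, just to exercise the contract.
        Map<String, Map<String, String>> store = new HashMap<>();
        MetadataRegistry registry = new MetadataRegistry() {
            public void register(String id, Map<String, String> mapping) { store.put(id, mapping); }
            public Map<String, String> lookup(String id) { return store.get(id); }
        };
        DataEndpoint endpoint = new DataEndpoint() {
            public String datasetId() { return "tumor-samples"; }
            public List<Map<String, Object>> query(Map<String, Object> criteria) {
                return Collections.emptyList();
            }
        };
        registry.register(endpoint.datasetId(),
                Collections.singletonMap("diagnosisCode", "CDE:DiagnosisCode"));
        System.out.println("Endpoint registered: " + isRegistered(endpoint, registry));
    }
}
```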

There’s more than one way to provide interoperability.

As a developer working on existing products for which caBIG support is being considered, the requirement to fundamentally change my development process is a bit unnerving. Speaking generally, there's no guarantee that everyone has UML models for their systems, and even if they did, attempting full MDA transformations on them would be fairly ambitious.

That’s it for now. It’s been an interesting conference and I’ve learned a lot about the various initiatives and their progress. Off to visit customers in Cincinnati tomorrow!