Automating Healthcare
Solving business problems with savvy automation

Growing a Data Warehouse

Business problem
We need data to set strategy, manage efficiency and improve quality. We have numerous clinical and business systems, each with a database. Q: How can we extract data from all the systems into one place where analysts can use it to create reports for the business? A: We need to build a data warehouse.

Getting started
In the beginning was Meditech. It was the primary system for everything important about patients, staff and money. Meditech client/server offers two ways to extract data:

  • a proprietary report writer (NPR) to pull data directly from the native Meditech databases; and,
  • an external "data repository" into which normalized data are extracted through automated feeds from the native Meditech databases into a SQL Server database.

The Meditech data repository became the "staging server" for Meditech data, and the primary source for Meditech data in the new data warehouse.

The first data mart included everything clinical from Meditech: patients, encounters, transactions, labs, diagnoses, procedures, scheduled appointments, and many dictionaries. Data were refreshed daily from the staging server, which was itself refreshed overnight.

Daily isn't enough
For routine reporting, a daily data refresh is generally adequate. But we were building applications, such as a Clinical Operations Dashboard, which required near-real-time data. We switched several Meditech data repository feeds to "continuous" mode, which improved the effective refresh rate to every few hours, but it still wasn't enough.

Bill, our Meditech guru turned SQL Server DBA, created NPR reports to grab up-to-the-minute data about inpatient, surgical day care, and emergency department admissions, discharges and transfers (ADT). An automated process runs these reports continuously, round-the-clock. This process was honed to make each part as efficient as possible, resulting in a total cycle time of 10-15 minutes. Now we knew the full details of any Meditech ADT event within 10-15 minutes!

Beyond Meditech
With the advent of the Epic outpatient system, a major new clinical data source was added. The Epic data repository (an Oracle database) is only updated overnight. This meant that we could feed the data warehouse from the Epic data repository, but we needed another source for near-real-time data about encounters and appointment schedules.

Fortunately, we were able to tap into the interface engine that connects Meditech with Epic and grab encounter and appointment changes as they occur. We now had continuously current data for both inpatient and outpatient encounters.

Other data marts contain data from various systems, including breast cancer screening, radiology, claims, complaint tracking, surgery, NEHEN, telephone PBXs, pharmacy, patient satisfaction surveys, UHC, and various departmental databases. All datasets contain at least one common ID for patients or staff, allowing analysts to connect datasets as needed for analysis and reporting.

New Challenges
Now approaching a half-terabyte in size, the data warehouse is a victim of its success. With ever more data, querying, and analysts, the server is straining under the load. We are currently working to beef up the server and improve performance.

The Meditech data repository has been a less reliable resource since the most recent upgrade to v5.3. We are currently having to rely more and more on NPR reports for daily refreshing of the data warehouse from Meditech. This is not a welcome development, because automated processes that depend on NPR reports are prone to failure, and we were thrilled to move away from depending on them several years ago.

OLAP cubes will be the next major challenge. We haven't been able to find the time yet, but we're planning to work with SQL Server analysis services and/or a business intelligence application to build better performance dashboards and scorecards and do more sophisticated analysis.

Lessons learned

  • Multiple approaches may be required to extract data from a proprietary system database.
  • It may be a daily challenge, but it is possible to have near-real-time data for critical business processes without expensive, proprietary interfaces.
  • Having near-real-time data opens all sorts of possibilities for automated processes that can improve quality and patient safety.

Posted 24 March 2008


Custom Applications
ADT Event Alerts
Clinical Operations

Integrated Clerkship

On-call Schedules
People Profiles
Chronic Disease

Security Badge Requests
Charge Capture
Mental Health Treatment
      Plan Tracking

Earned Time Calculator

Supervisory Tree
E-mail Distribution Lists
User Access Requests
HR Requests
Employee Health &

Interpreter Dispatching
Generic Patient Registry
Conference Room

Tuition Reimbursement
Equipment Rental
Code Cart Tracking
Nursing Audits

Show me the data
Growing a Data

Building a Data Portal
Reporting on Full Auto

Intranet Design
Driving With Databases
Speeding with Static

Transparent Security
      and Permissions

Redesigning the

Who works here?
Organizational buckets
System access: Who
      has what?

System access: Use
      it or lose it

Integrating Security

Integrating Provider

Creating A Supervisory

Data Quality Dashboard


RSS Feed