Growing a Data Warehouse
Business problem
We need data to set strategy, manage efficiency and improve quality.
We have numerous clinical and business systems, each with a database.
Q: How can we extract data from all the systems into one place where
analysts can use it to create reports for the business? A: We need
to build a data warehouse.
Getting started
In the beginning was Meditech. It was the primary system for everything
important about patients, staff and money. Meditech client/server
offers two ways to extract data:
- a proprietary report writer (NPR) to pull data directly from the
  native Meditech databases; and
- an external "data repository" into which normalized data are
  extracted through automated feeds from the native Meditech
  databases into a SQL Server database.
The Meditech data repository became the "staging server" for
Meditech data, and the primary source of Meditech data in the new
data warehouse.
The first data mart included everything clinical from Meditech: patients,
encounters, transactions, labs, diagnoses, procedures, scheduled
appointments, and many dictionaries. Data were refreshed daily from
the staging server, which was itself refreshed overnight.
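The daily refresh described above can be sketched as a simple incremental copy keyed on a watermark. This is a minimal illustration, not the warehouse's actual load process: sqlite3 stands in for the SQL Server staging and warehouse databases, and the table and column names (`stg_encounters`, `dw_encounters`, `last_updated`) are hypothetical.

```python
import sqlite3

def refresh_encounters(staging, warehouse, watermark):
    """Copy rows changed since `watermark` from staging into the warehouse."""
    rows = staging.execute(
        "SELECT encounter_id, patient_id, last_updated "
        "FROM stg_encounters WHERE last_updated > ?", (watermark,)
    ).fetchall()
    # Upsert so a re-run of the same day's refresh is harmless.
    warehouse.executemany(
        "INSERT OR REPLACE INTO dw_encounters "
        "(encounter_id, patient_id, last_updated) VALUES (?, ?, ?)", rows
    )
    warehouse.commit()
    return len(rows)

# In-memory stand-ins for the staging server and the warehouse.
staging = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_encounters "
                "(encounter_id TEXT PRIMARY KEY, patient_id TEXT, last_updated TEXT)")
warehouse.execute("CREATE TABLE dw_encounters "
                  "(encounter_id TEXT PRIMARY KEY, patient_id TEXT, last_updated TEXT)")
staging.executemany("INSERT INTO stg_encounters VALUES (?, ?, ?)",
                    [("E1", "P1", "2008-03-22"), ("E2", "P2", "2008-03-24")])

copied = refresh_encounters(staging, warehouse, "2008-03-23")
print(copied)  # only the row newer than the watermark is copied
```

Tracking a watermark per feed is what lets a nightly job move only the changed rows instead of reloading every table in full.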
Daily isn't enough
For routine reporting, a daily data refresh is generally adequate.
But we were building applications, such as a Clinical
Operations Dashboard, which required near-real-time data. We
switched several Meditech data repository feeds to "continuous"
mode, which improved the effective refresh rate to every few hours,
but it still wasn't enough.
Bill, our Meditech guru turned SQL Server DBA, created NPR reports to
grab up-to-the-minute data about inpatient, surgical day care, and
emergency department admissions, discharges and transfers (ADT).
An automated process runs these reports continuously, round-the-clock.
This process was honed to make each part as efficient as possible,
resulting in a total cycle time of 10-15 minutes. Now we knew the
full details of any Meditech ADT
event within 10-15 minutes!
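The round-the-clock extract cycle above can be sketched as a loop that runs each report job in turn and measures the total cycle time. This is a hypothetical stand-in, not the actual automated process: the report names are invented, and each "report" is just a function returning a row count.

```python
import time

def run_cycle(reports):
    """Run each extract job once; return the results and the cycle time."""
    start = time.monotonic()
    results = {name: job() for name, job in reports.items()}
    return results, time.monotonic() - start

# Stand-ins for the inpatient, surgical day care, and ED ADT extracts.
reports = {
    "inpatient_adt": lambda: 12,
    "sdc_adt": lambda: 3,
    "ed_adt": lambda: 7,
}
results, elapsed = run_cycle(reports)
print(results)
```

In production such a loop would restart as soon as it finishes, so the cycle time (10-15 minutes in the text) sets the worst-case staleness of any ADT event.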
Beyond Meditech
With the advent of the Epic outpatient system, a major new clinical
data source was added. The Epic data repository (an Oracle database)
is only updated overnight. This meant that we could feed the data
warehouse from the Epic data repository, but we needed another source
for near-real-time data about encounters and appointment schedules.
Fortunately, we were able to tap into the interface engine that connects Meditech
with Epic and grab encounter and appointment changes as they occur.
We now had continuously current data for both inpatient and outpatient
encounters.
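Interface engines like the one described typically pass HL7 v2 messages, which are pipe-delimited text. As a rough sketch of grabbing encounter changes in flight, the function below pulls a few fields out of a contrived ADT message; real feeds carry many more segments and fields, and the message here is invented for illustration. (Note: with a naive `split("|")`, the MSH message-type field lands at index 8, because HL7 counts the field separator itself as MSH-1.)

```python
def parse_adt(message):
    """Return (event_type, patient_id, location) from an HL7 v2 ADT message."""
    segments = {line.split("|", 1)[0]: line.split("|")
                for line in message.strip().split("\r")}
    event = segments["MSH"][8].split("^")[1]       # e.g. A01 admit, A03 discharge
    patient_id = segments["PID"][3].split("^")[0]  # medical record number
    location = segments["PV1"][3].split("^")[0]    # assigned unit
    return event, patient_id, location

# Contrived ADT^A01 (admit) message; segments separated by carriage returns.
msg = ("MSH|^~\\&|MT|HOSP|DW|HOSP|200803240830||ADT^A01|123|P|2.3\r"
       "PID|1||MRN0042^^^HOSP||DOE^JANE\r"
       "PV1|1|I|6W^612^1")
print(parse_adt(msg))  # ('A01', 'MRN0042', '6W')
```

Listening to the message stream this way yields encounter and appointment changes the moment they cross the interface, with no extra vendor interface to buy.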
Other data marts contain data from various systems, including breast cancer
screening, radiology, claims, complaint tracking, surgery, NEHEN,
telephone PBXs, pharmacy, patient satisfaction surveys, UHC, and
various departmental databases. All datasets contain at least one
common ID for patients or staff, allowing analysts to connect datasets
as needed for analysis and reporting.
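The common-ID linkage is what makes the marts composable: any two datasets can be joined on the shared patient (or staff) ID. A minimal sketch, using sqlite3 and two hypothetical marts (lab results and satisfaction surveys; the tables and values are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lab_results (patient_id TEXT, test TEXT, value REAL)")
db.execute("CREATE TABLE satisfaction (patient_id TEXT, score INTEGER)")
db.executemany("INSERT INTO lab_results VALUES (?, ?, ?)",
               [("P1", "HbA1c", 7.2), ("P2", "HbA1c", 6.1)])
db.executemany("INSERT INTO satisfaction VALUES (?, ?)", [("P1", 4)])

# Join two marts on the shared patient ID.
rows = db.execute(
    "SELECT l.patient_id, l.value, s.score "
    "FROM lab_results l JOIN satisfaction s ON s.patient_id = l.patient_id"
).fetchall()
print(rows)  # [('P1', 7.2, 4)]
```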
New challenges
Now approaching a half-terabyte in size, the data warehouse is a
victim of its own success. With ever more data, queries, and analysts,
the server is straining under the load. We are currently working
to beef up the server and improve performance.
The Meditech data repository has been a less reliable resource since
the most recent upgrade to v5.3. We increasingly have to rely on NPR
reports for the daily refresh of the data warehouse from Meditech.
This is not a welcome development: automated processes that depend
on NPR reports are prone to failure, and we were thrilled to move
away from them several years ago.
OLAP cubes will be the next major challenge. We haven't yet found
the time, but we're planning to work with SQL Server Analysis
Services and/or a business intelligence application to build better
performance dashboards and scorecards and to do more sophisticated
analysis.
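At heart, what an OLAP cube adds is precomputed aggregates over dimensions. The kind of summary a cube would serve instantly can be sketched as a plain GROUP BY; the table, dimensions (department, month), and measure (length of stay) below are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE encounters (dept TEXT, month TEXT, los_days REAL)")
db.executemany("INSERT INTO encounters VALUES (?, ?, ?)", [
    ("ED", "2008-02", 0.5), ("ED", "2008-03", 0.4),
    ("Med", "2008-03", 4.0), ("Med", "2008-03", 6.0)])

# One cell of the cube per (dept, month): encounter count and average LOS.
cube = db.execute(
    "SELECT dept, month, COUNT(*), AVG(los_days) "
    "FROM encounters GROUP BY dept, month ORDER BY dept, month").fetchall()
for row in cube:
    print(row)
```

A cube engine materializes these rollups across many dimension combinations in advance, which is what makes dashboard queries fast at scale.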
Lessons learned
- Multiple approaches may be required to extract data from a
  proprietary system database.
- It may be a daily challenge, but it is possible to have
  near-real-time data for critical business processes without
  expensive, proprietary interfaces.
- Having near-real-time data opens all sorts of possibilities for
  automated processes that can improve quality and patient safety.
Posted 24 March 2008