M-HDML Architecture
In our increasingly data-centric world, data
mining technologies are being enlisted in a wide variety of
uses: from retail sales, to video gaming, to, most recently,
combating terrorism. The staggering volume of data has raised
the value of intelligent data mining systems and knowledge
discovery techniques that help users extract meaningful information
from enormous data sets. In the industrial arena, more and
more organizations are investing in data mining techniques
(software and hardware) as a means for gaining profitable
business insights from their huge central transactional databases.
The Gartner Group estimates that the use of data mining applications
will increase from less than 5% currently to 80% over the
next decade.*
KBSI's Military Health Data Mining Algorithms
Library (M-HDML) initiative, an SBIR award from the Department
of Defense (DoD), applied data mining
technologies and techniques in a relatively new arena: medical
diagnostics. The initiative was part of the DoD's Knowledge
Management movement aimed at improving the reliability and
utility of the DoD's knowledge assets.
Phase II Development
Like other industries, health care both collects
and utilizes diverse types of data--not only clinical data crucial
to diagnosing and monitoring the health of patients, but also data
necessary to the administration of hospitals, medical resources,
and health care in general.
KBSI's M-HDML initiative took a unique approach
in developing data mining and analysis techniques, templates, and
software tools for the DoD's vast medical health system (MHS). The
goal was to both improve individual patient care and, by translating
clinical data into a standardized form that lends itself to data
mining, aid the DoD in enhancing medical readiness--an important
component of military readiness.
The primary challenge that the KBSI team addressed centered
on questions of data taxonomy with regard to the data mining algorithms.
Dr. Satheesh Ramachandran, who headed the initiative for KBSI, explains:
"Different data mining algorithms support the discovery of
different taxonomical knowledge types. These types, consequently,
require different processes to determine their relevance and application.
Simply put, the type of knowledge that is being sought dictates
the choice of the specific data-mining tasks and the algorithms
that support those tasks."
The types of clinical data that M-HDML works with include questionnaires, diagnostic reports, pathology
reports, lab tests, evaluations and progress reports, and medical studies
and experiments. These different types of clinical data are
collected at various locations, over various, often protracted,
periods of time. More significantly, however, the data can exist
in various systems and formats ranging from rudimentary paper forms
to electronic files. The 1996 Clinger-Cohen Act mandated improvements
to the efficiency of the DoD medical health system, and a subsequent
review of the system noted the need for a thorough business process
reengineering (BPR) of the functions, processes, and information
systems of the MHS infrastructure. Key tasks performed by the information
management directorate of the MHS include the establishment of flexible,
open-systems-compliant configurations for clinical data with the
goal of making accurate medical information available wherever and
whenever needed. Redesigning the information systems for more pliable
data hosting is an important first step. The next logical step is
the introduction of knowledge discovery tools and techniques to
explore these integrated systems and deliver the useful knowledge
they contain.
In keeping with this paradigm, the M-HDML initiative first focused
on developing a generic representation scheme that would encapsulate
the various types of clinical data. As might be expected, existing
clinical data and databases are not based upon well-thought-out
representations for signs, symptoms, treatments, tests, etc., and
little work had been done in terms of developing a general language
or structure for representing clinical data that is conducive to
quick analysis. The choice of a particular data mining algorithm
depends largely on the nature of the data (how data concerning,
for example, a condition's severity or frequency is encoded) and
the type of paradigm in which the algorithm must function. M-HDML's
standardized representation framework allows multiple data analysis
paradigms to work in tandem, exchanging information across paradigms
and allowing them to complement each other. For M-HDML, KBSI
based these representations on the notion of ontologies: structured
language for representing knowledge.
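
As a rough illustration of what a shared representation buys, the sketch below normalizes two differently shaped sources into one record type. It is not KBSI's ontology: the `Observation` type, the category names, the severity scale, and both converter functions are hypothetical, chosen only to show how a questionnaire answer and a lab result could end up in the same form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical categories, loosely following the signs, symptoms,
# treatments, and tests named in the text.
CATEGORIES = {"sign", "symptom", "treatment", "test"}

# Assumed encoding of severity; the article does not specify one.
SEVERITY_SCALE = {"none": 0.0, "mild": 1.0, "moderate": 2.0, "severe": 3.0}


@dataclass(frozen=True)
class Observation:
    """One clinical fact in a source-independent form (illustrative only)."""
    patient_id: str
    category: str
    concept: str
    value: Optional[float] = None
    unit: Optional[str] = None

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def from_questionnaire(patient_id, answers):
    """Convert {symptom name: severity word} answers into Observations."""
    return [
        Observation(patient_id, "symptom", concept.lower(),
                    SEVERITY_SCALE[level.lower()], "severity")
        for concept, level in answers.items()
    ]


def from_lab_report(patient_id, rows):
    """Convert (test name, value string, unit) rows into Observations."""
    return [
        Observation(patient_id, "test", name.lower(), float(value), unit)
        for name, value, unit in rows
    ]


records = (
    from_questionnaire("p1", {"Headache": "Severe"})
    + from_lab_report("p1", [("Glucose", "5.4", "mmol/L")])
)
```

Once both sources yield the same record type, any downstream analysis can consume `records` without knowing where each entry came from.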
A related M-HDML task was to develop a catalogue
of the vast array of data mining algorithms that could be used and
map these algorithms, according to their success, to the knowledge
requirements of the military health system. These algorithm maps
serve as templates for common knowledge discovery
tasks. By codifying common techniques, the templates
also lend themselves to user adaptation or customization for related
data inquiries.
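
One way to picture such an algorithm map is as a small lookup from the kind of knowledge sought to a reusable template. The sketch below is an assumption for illustration, not the M-HDML catalogue: the knowledge types, task names, algorithm names, and default parameters are all hypothetical entries.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """A reusable recipe for one knowledge discovery task (illustrative)."""
    knowledge_type: str
    task: str
    algorithms: list
    defaults: dict = field(default_factory=dict)

    def customize(self, **overrides):
        """Return an adapted copy; the catalogue entry itself is unchanged."""
        return Template(self.knowledge_type, self.task,
                        list(self.algorithms), {**self.defaults, **overrides})


# Hypothetical catalogue entries, keyed by the type of knowledge sought.
CATALOGUE = {
    "grouping": Template("grouping", "clustering",
                         ["k-means"], {"k": 3}),
    "prediction": Template("prediction", "classification",
                           ["decision tree"], {"max_depth": 5}),
    "co-occurrence": Template("co-occurrence", "association rule mining",
                              ["Apriori"], {"min_support": 0.1}),
}


def select_template(knowledge_type):
    """Look up the template for a knowledge type, or fail loudly."""
    try:
        return CATALOGUE[knowledge_type]
    except KeyError:
        raise LookupError(f"no template for knowledge type: {knowledge_type}")


base = select_template("grouping")
adapted = base.customize(k=5)  # user adaptation for a related inquiry
```

Returning a copy from `customize` keeps the shared catalogue stable while letting individual users adapt a template to their own inquiry.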
The algorithm maps gave the M-HDML team a clear
understanding of the nature of these algorithms and a better sense
of the shape of the data representation structure. KBSI created
similar libraries of data mining algorithms through earlier initiatives
funded by the DoD and NASA, and also created and validated such a library for a host of applications
in DoD logistics. The M-HDML library of algorithms and established
framework allowed the M-HDML team to begin detailing typical data-mining
development processes--rules for using domain knowledge structures.
The final step and end goal of M-HDML was to utilize
these rules in developing an overarching framework for the integrated
use of data mining algorithms and strategies--the Military Health
Data Mining Library--and encapsulate the framework in a supporting
software environment. This software tool-kit is Web-enabled,
allowing users to collaborate, regardless of their geographic location,
and exchange data, templates, analyses, and, most importantly, knowledge.
The M-HDML initiative provides
the DoD with much-needed knowledge that is useful from a purely
medical standpoint. However, the M-HDML concept also offers a novel
approach to large-scale data mining and knowledge discovery that
can have an even wider application. The cataloguing of algorithms
and templates--the creation of a data mining library--provides a
sophisticated framework for other knowledge discovery projects that
involve large, disparate stores of data and large, diverse sets
of users. KBSI's novel approach not only benefits the DoD in
its Knowledge Management push, but can also help the growing
number of commercial businesses that are expanding the possibilities
and potential of data mining.
* Brachman, R. J., Anand, T. 1996. "The Process of Knowledge
Discovery in Databases." In Advances in Knowledge Discovery
and Data Mining, pp. 37-57, AAAI/MIT Press.