KBSI's Military Health Data Mining Algorithms Library
(MHDML) project, a Phase II SBIR award from the Department of Defense
(DoD), will apply data mining technologies and techniques
in a relatively new arena: medical diagnostics. The initiative is
part of the DoD's Knowledge Management movement aimed at improving
the reliability and utility of the Department's knowledge assets.
The MHDML Approach
Like other industries, health care both collects
and utilizes diverse types of data--not only clinical data crucial
to diagnosing and monitoring the health of patients, but also data
necessary to the administration of hospitals, medical resources,
and health care in general.
KBSI's MHDML project will take a unique approach
in developing data mining and analysis techniques, templates, and
software tools for the DoD's vast Military Health System (MHS). The
goal is twofold: to improve individual patient care and, by translating
clinical data into a standardized form that lends itself to data
mining, to help the DoD enhance medical readiness--an important
component of military readiness.
The primary challenge facing the KBSI team centers
on questions of data taxonomy with regard to the data mining algorithms.
Dr. Satheesh Ramachandran, who heads the project for KBSI, explains:
"Different data mining algorithms support the discovery of
different taxonomical knowledge types. These types, consequently,
require different processes to determine their relevance and application.
Simply put, the type of knowledge that is being sought dictates
the choice of the specific data-mining tasks and the algorithms
that support those tasks."
The types of clinical data that MHDML will
work with include questionnaires, diagnostic reports, pathology
reports, lab tests, evaluations and progress reports, and medical
studies and experiments. These data are
collected at various locations, over various, often protracted,
periods of time. More significantly, however, the data can exist
in various systems and formats ranging from rudimentary paper forms
to electronic files. The 1996 Clinger-Cohen Act mandated improvements
to the efficiency of the DoD medical health system, and a subsequent
review of the system noted the need for a thorough business process
reengineering (BPR) of the functions, processes, and information
systems of the MHS infrastructure. Key tasks performed by the information
management directorate of the MHS include the establishment of flexible,
open-systems-compliant configurations for clinical data with the
goal of making accurate medical information available wherever and
whenever needed. Redesigning the information systems for more pliable
data hosting is an important first step. The next logical step is
the introduction of knowledge discovery tools and techniques to
explore these integrated systems and deliver the useful knowledge
they contain.
In keeping with this paradigm, MHDML will first focus
on developing a generic representation scheme that will encapsulate
the various types of clinical data. As might be expected, existing
clinical data and databases are not based upon well-thought-out
representations for signs, symptoms, treatments, tests, etc., and
little work has been done to develop a general language
or structure for representing clinical data that is conducive to
quick analysis. The choice of a particular data mining algorithm
depends largely on the nature of the data (how data concerning,
for example, a condition's severity or frequency is encoded) and
the type of paradigm in which the algorithm must function. MHDML's
standardized representation framework will allow multiple data analysis
paradigms to work in tandem, exchanging information across paradigms
and complementing each other. For MHDML, KBSI will
base these representations on the notion of ontologies--structured
languages for representing knowledge.
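As a rough illustration of what an ontology-based representation
might look like, the sketch below expresses clinical facts from
different sources against one shared concept hierarchy. It is not
KBSI's design: the class names (ConceptType, Concept, Observation),
the fields, and the sample values are all hypothetical and chosen
only for this example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ConceptType(Enum):
    """Hypothetical top-level categories, taken from the kinds of data named in the text."""
    SIGN = "sign"
    SYMPTOM = "symptom"
    TREATMENT = "treatment"
    TEST = "test"


@dataclass(frozen=True)
class Concept:
    """A node in a toy ontology: a named concept with an optional broader parent."""
    name: str
    kind: ConceptType
    parent: Optional["Concept"] = None

    def is_a(self, other: "Concept") -> bool:
        """True if this concept is `other` or a descendant of it."""
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass
class Observation:
    """One clinical fact, whatever its source, expressed against the shared ontology."""
    patient_id: str
    concept: Concept
    value: str
    recorded_on: date
    source: str  # e.g. "questionnaire", "diagnostic report"


# Made-up sample records, for illustration only.
respiratory = Concept("respiratory symptom", ConceptType.SYMPTOM)
cough = Concept("cough", ConceptType.SYMPTOM, parent=respiratory)
xray = Concept("chest x-ray", ConceptType.TEST)

from_form = Observation("patient-001", cough, "severe", date(2000, 1, 1), "questionnaire")
from_lab = Observation("patient-001", xray, "clear", date(2000, 1, 2), "diagnostic report")
records = [from_form, from_lab]

# Any analysis paradigm can now query by concept rather than by source format.
respiratory_findings = [o for o in records if o.concept.is_a(respiratory)]
```

Because every record refers to a concept rather than to a
source-specific field, an algorithm can select its input by meaning
(here, anything that is a respiratory symptom) without knowing
whether the fact came from a paper form or an electronic file.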
A related MHDML task will be to develop a catalogue
of the vast array of data mining algorithms that could be used and
map these algorithms, according to their success, to the knowledge
requirements of the military health system. These algorithm maps
will ultimately serve as templates for common knowledge discovery
tasks. By codifying common techniques, the templates will
also lend themselves to user adaptation or customization for related
data inquiries.
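One way to picture such an algorithm map is as a lookup from a
knowledge requirement to the catalogued algorithms that address it,
ordered by how well each has performed. The sketch below is only an
assumption about the shape of such a structure: AlgorithmEntry,
AlgorithmMap, the requirement string, and the scores are hypothetical
placeholders, not MHDML components or measured results.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmEntry:
    """One catalogue entry: an algorithm and the data-mining task it supports."""
    name: str
    task: str  # e.g. "classification", "clustering"
    scores: list = field(default_factory=list)  # user-supplied evaluation scores

    def mean_score(self) -> float:
        """Average of the recorded scores, or 0.0 if none were recorded."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


class AlgorithmMap:
    """Maps a knowledge requirement to candidate algorithms, ranked by score."""

    def __init__(self):
        self._by_requirement = {}

    def register(self, requirement: str, entry: AlgorithmEntry) -> None:
        """Add an algorithm as a candidate for the given requirement."""
        self._by_requirement.setdefault(requirement, []).append(entry)

    def template(self, requirement: str) -> list:
        """Return the candidates for a requirement, best mean score first."""
        candidates = self._by_requirement.get(requirement, [])
        return sorted(candidates, key=lambda e: e.mean_score(), reverse=True)


# Placeholder entries and scores, for illustration only; not measured results.
requirement = "predict diagnosis from symptoms"
catalogue = AlgorithmMap()
catalogue.register(requirement, AlgorithmEntry("decision tree", "classification", [0.5, 0.7]))
catalogue.register(requirement, AlgorithmEntry("naive Bayes", "classification", [0.8]))

ranked = [entry.name for entry in catalogue.template(requirement)]
```

A user adapting the template to a related inquiry would register the
same entries under a new requirement and record fresh scores, leaving
the catalogue itself unchanged.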
The algorithm maps will give the MHDML team a clear
understanding of the nature of these algorithms and a better sense
of the shape of the data representation structure. KBSI has created
similar libraries of data mining algorithms through earlier projects
funded by the DoD and NASA, and is currently
creating and validating such a library for a host of applications
in DoD logistics. The MHDML library of algorithms and established
framework will allow the MHDML team to begin detailing typical data-mining
development processes--rules for using domain knowledge structures.
The final step and end goal of MHDML is to utilize
these rules in developing an overarching framework for the integrated
use of data mining algorithms and strategies--the Military Health
Data Mining Library--and encapsulate the framework in a supporting
software environment. This software toolkit will be Web-enabled,
allowing users to collaborate, regardless of their geographic location,
and exchange data, templates, analyses, and, most importantly, knowledge.
The MHDML project will go a long way toward providing
the DoD with much-needed knowledge that is useful from a purely
medical standpoint. However, the MHDML concept also offers a novel
approach to large-scale data mining and knowledge discovery that
can have an even wider application. The cataloguing of algorithms
and templates--the creation of a data mining library--provides a
sophisticated framework for other knowledge discovery projects that
involve large, disparate stores of data and large, diverse sets
of users. KBSI's novel approach will not only benefit the DoD in
its Knowledge Management push, but will also help the growing
number of commercial businesses that are expanding the possibilities
and potential of data mining.
*
Brachman, R. J., and Anand, T. 1996. "The Process of Knowledge
Discovery in Databases." In Advances in Knowledge Discovery
and Data Mining, pp. 37-57, AAAI/MIT Press.