Applicability of the ISII Inference Architecture to Pharmaceutical Research

The ISII inference architecture is not specific to finding ICD9 codes. On the contrary, it is applicable in a number of domains, including pharmaceutical research.

Specific Features

  • The ISII inference architecture is not tied to a specific domain (nor even to the broader domain of medicine); it is a general-purpose text parsing and rule-based inference engine. The kind of data extracted by the inference engine is not fixed. It can be ICD9 codes, signs and symptoms, procedures, conditions, vital signs, laboratory values, trade names of pharmaceuticals, etc.
  • All components of the system are driven by runtime data. This data is suitable for modification by non-programmers such as domain experts, or it can be automatically created. For example, the ICD9 data was created largely by ICD9 coders and trained nurses. Rules can also be added and tested dynamically.
  • Extracted data can be complex concepts, such as typical medical codes, but it can also be lower-level concepts suitable for multi-axial classification. For example, information about body location can be extracted separately from information about a procedure, and information about vital signs can be extracted separately from information about prescribed medications. Links between separate concepts are maintained, so concepts can be combined in later processing or reporting.
  • Parsing of medical documents can be either free form or template-driven. Information can be extracted from free text such as doctor dictation, while templates allow for more structured data extraction from lab reports, etc. Selection of templates can be automated based on document provenance or content. Templates are specified (in part) in the widely known "regex" regular-expression language, allowing them to be easily created or modified.
  • Selection of rule sets can also be automated based on document provenance or content. Multiple rule sets can be used concurrently, for example to allow simultaneous independent extraction of ICD9 codes and multi-axial medical data.
  • The system is capable of extracting and making decisions using numeric data. For example, numeric data can be extracted regarding vital signs, drug dosages, length of time for various conditions, etc. A host of typical numeric operations is available for use in inference and reporting (e.g. equality, comparison, range-checking).
  • The system can make use of data in addition to that in medical reports. For example, laboratory values can be drawn from databases rather than from reports. Data can also be injected into the system for control purposes.
  • Full provenance is maintained for all decisions. For example, the exact documents, sentences, words and other data involved in arriving at a conclusion are maintained and archived. This data can be used to generate detailed reports, and the links between such data can be used in multivariate analysis.
  • The system is fast enough to extract data in real time in most cases, or to handle large numbers of charts efficiently in batch mode. If, after initial processing, changes or additions are made to a rule set, the data can be reanalyzed efficiently.
  • All input to and output from the system uses industry-standard XML. This XML format allows for efficient "packed" storage in operational applications and for efficient network transport. The XML data is, however, suitable for "unpacking" into detailed storage for data mining purposes.
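
To make several of the features above concrete (regex-based templates, numeric range checks in rules, and provenance for each decision), the following is a minimal sketch in Python. It is not the ISII implementation or its data format: the template layout, the rule tuples, the thresholds, and every name (VITALS_TEMPLATE, RULES, Fact, Conclusion, extract_facts, apply_rules) are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical template: one named regex per field (not the ISII template format).
VITALS_TEMPLATE = {
    "systolic_bp": re.compile(
        r"\bBP[:\s]+(?P<value>\d{2,3})\s*/\s*\d{2,3}\b", re.IGNORECASE
    ),
    "heart_rate": re.compile(
        r"\b(?:HR|pulse)[:\s]+(?P<value>\d{2,3})\b", re.IGNORECASE
    ),
}

# Hypothetical rule set held as plain data: (label, field, low, high).
# Thresholds are illustrative only, not clinical guidance.
RULES = [
    ("elevated_systolic_bp", "systolic_bp", 140.0, float("inf")),
    ("tachycardia", "heart_rate", 100.0, float("inf")),
]


@dataclass(frozen=True)
class Fact:
    """One extracted value plus the provenance needed to trace it back."""
    name: str
    value: float
    doc_id: str
    sentence: str
    span: tuple  # (start, end) character offsets of the value in the sentence


@dataclass(frozen=True)
class Conclusion:
    """A rule outcome together with the facts that triggered it."""
    label: str
    evidence: tuple  # tuple of Fact objects


def extract_facts(doc_id, text, template):
    """Apply every template pattern to every sentence of the document."""
    facts = []
    # Naive sentence split: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name, pattern in template.items():
            for match in pattern.finditer(sentence):
                facts.append(
                    Fact(name, float(match.group("value")), doc_id,
                         sentence, match.span("value"))
                )
    return facts


def apply_rules(facts, rules):
    """Range-check each fact; keep the triggering facts as evidence."""
    conclusions = []
    for label, field, low, high in rules:
        hits = tuple(
            f for f in facts if f.name == field and low <= f.value <= high
        )
        if hits:
            conclusions.append(Conclusion(label, hits))
    return conclusions


if __name__ == "__main__":
    note = "Patient seen today. BP: 152/94 on arrival. Pulse 88 and regular."
    for conclusion in apply_rules(
        extract_facts("chart-001", note, VITALS_TEMPLATE), RULES
    ):
        for fact in conclusion.evidence:
            print(conclusion.label, fact.doc_id, repr(fact.sentence), fact.span)
```

Because the template and the rule set in this sketch are ordinary data rather than code, they could in principle be edited or generated without touching the engine, and changing RULES only requires re-running apply_rules over facts that were already extracted.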