Digital Medicine

Year: 2018  |  Volume: 4  |  Issue: 2  |  Page: 77-83

Online definition of comparable and searchable medical information

Wolfgang Orthuber 
 Department of Orthodontics, UKSH, Kiel University, Kiel, Germany

Correspondence Address:
Wolfgang Orthuber
Department of Orthodontics, UKSH, Kiel University, Arnold Heller Street 3, House 26, Kiel 24105


For decision support, a globally connected digital information system is desirable that uses diagnostic findings and makes anonymized statistical information from similar cases in all countries available independently of language. It can be realized efficiently in the following way: the definitions of all diagnostic and measurement procedures in use are placed online. The data defined in this way are called “Domain Vectors.” Doctors who use the online definitions obtain measurement results as Domain Vectors in comparable and searchable form. Anonymized selective statistics over patient groups with similar data can help to find the best treatment. The precondition for such distributed and simultaneously connected Domain Vectors is their global online definition. Every Domain Vector consists only of a link to its definition (e.g., a URL or an abbreviated equivalent) plus numbers. This article explains the details and concludes that introducing Domain Vectors with their online definitions would be an important step toward internationally connected medicine.

How to cite this article:
Orthuber W. Online definition of comparable and searchable medical information. Digit Med 2018;4:77-83



The central focus of medical informatics is decision support. This requires access to a large, searchable data collection with a precise machine-readable representation of the original medical information. Representation by words of a natural language is imprecise and highly variable, and is therefore fraught with problems. A much better solution is needed, and this motivates the aim of this article.

 Aim of the Article

We describe a new approach, with a data structure, for the representation of medical information. The data structure should fulfill the following requirements:

1. Globally and language-independently defined
2. Precisely comparable and searchable
3. Efficient, with minimal space requirements
4. Practicable, so that storage and search of medical information can be automated to a great extent.

 Solution: Online-Defined Searchable Information

Before going deeper into online-defined medical information, we recall:

Information means selection from a set of possibilities (from a domain)    (1)

Reference [1] shows that Shannon [2] already used this approach. All digital information is transferred as bits that represent number sequences, each of which represents a selection from a set of possibilities. This set is called the “domain.” Sender and receiver must both know the definition of the domain for information to be transferred correctly. Until now (2018), the domain has been defined implicitly by context (and by knowledge that can make use of it). Even this was enough for the “Digital Revolution.”

Since the introduction of the Internet at the latest, much more is possible: a globally uniform definition of the domain via a link, or “UL.” UL stands for “Uniform Locator” and can be a URL [3] or an abbreviated equivalent. This leads to the data structure

UL (of online definition) plus sequence of numbers    (2)

which is called “Domain Vector” or “DV.” Its dimensions (i.e., its numbers) represent a selection (1) from a domain called the “Domain Space” or “DS,” whose definition is uniformly identified and located on the Internet by the UL. A “DS definition” is also a “DV definition,” because defining the DS defines all DVs contained in it. For simplicity, we therefore also refer to the “DS definition” as the “DV definition” in what follows. Because the UL is uniform, the kind of data can be recognized worldwide, and high-resolution, language-independent comparison and metric search become possible. The UL in the DV can be abbreviated, and the number sequence can be represented in binary form with minimal space requirements. Hence, the DV data structure (2) fulfills, among others, requirements 1–3 listed in the aim of the article, as already shown.[1] For the medical application, we describe below how requirement 4 can also be fulfilled with the DV data structure.
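As an illustration only, the data structure (2) could be sketched in Python as follows. The class name `DomainVector`, the byte layout (length-prefixed UL followed by 64-bit floats), and the example UL are assumptions made for this sketch; the article does not prescribe a concrete binary format.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DomainVector:
    """A DV as in (2): a UL pointing to the online definition, plus numbers."""
    ul: str                     # uniform locator of the DV definition
    numbers: Tuple[float, ...]  # the selection from the Domain Space

    def to_bytes(self) -> bytes:
        """Hypothetical layout: UL length, UL (UTF-8), count, float64 values."""
        ul_raw = self.ul.encode("utf-8")
        header = struct.pack("<H", len(ul_raw)) + ul_raw
        body = struct.pack(f"<H{len(self.numbers)}d", len(self.numbers), *self.numbers)
        return header + body

    @classmethod
    def from_bytes(cls, raw: bytes) -> "DomainVector":
        """Inverse of to_bytes for the same hypothetical layout."""
        (ul_len,) = struct.unpack_from("<H", raw, 0)
        ul = raw[2:2 + ul_len].decode("utf-8")
        (count,) = struct.unpack_from("<H", raw, 2 + ul_len)
        numbers = struct.unpack_from(f"<{count}d", raw, 4 + ul_len)
        return cls(ul, tuple(numbers))


# The UL below is a placeholder, not a real DV definition.
dv = DomainVector("https://example.org/dv/demo", (83.9, 1.0))
restored = DomainVector.from_bytes(dv.to_bytes())
```

In this sketch the per-DV overhead besides the UL is four bytes; an abbreviated UL would shrink the record further, in line with requirement 3.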

 The Medical Application

Data collection is common in medicine. It is, however, not trivial to collect the data so that they are comparable. This is introduced and explained in [4],[5],[6],[7],[8],[9],[10],[11],[12].

Words of a language are a selection from a domain (the vocabulary of that language). Likewise, (original) medical information is a selection from a domain (a set of possibilities), like all information (1). It is not necessary to translate original medical information repeatedly, redundantly, and in variable ways into word sequences (text). With the DV data structure, the domain is defined once on the Internet in the DV definition. This means that every number in (2) is defined once.

To use this advantage in medicine, we have to consider the decision-relevant features for a certain connected group of diagnoses X. These can be, for example, age, gender, position of the main finding, severity, laboratory parameters, selected treatment, and treatment result. For every independent feature, we then define an ordered domain (set of possibilities) and a bijective mapping from every element of the domain to a number. The number is an integer for a discrete domain (e.g., gender) or a floating-point number for a seemingly continuous domain (e.g., a certain measurement result). We put these definitions together in machine-readable form online at a location with a unique UL.
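A minimal sketch of such a definition, assuming a simple in-memory representation: the names `Dimension` and `DVDefinition`, the example UL, and the example values are hypothetical. A discrete domain is mapped to integers by position in its ordered list (bijective as long as the entries are unique), and a continuous domain is mapped to a floating-point number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Dimension:
    """One decision-relevant feature of a DV definition."""
    name: str
    # Ordered discrete domain; None means a (seemingly) continuous domain.
    categories: Optional[Sequence[str]] = None

    def encode(self, value: Union[str, float]) -> Number:
        if self.categories is None:
            return float(value)
        return list(self.categories).index(value)  # position in the ordered domain

    def decode(self, number: Number) -> Union[str, float]:
        if self.categories is None:
            return float(number)
        return self.categories[int(number)]


@dataclass(frozen=True)
class DVDefinition:
    """All dimensions of a DV, published once under a unique UL."""
    ul: str
    dimensions: Sequence[Dimension]

    def encode(self, record: Dict[str, Union[str, float]]) -> List[Number]:
        return [dim.encode(record[dim.name]) for dim in self.dimensions]

    def decode(self, numbers: Sequence[Number]) -> Dict[str, Union[str, float]]:
        return {dim.name: dim.decode(n) for dim, n in zip(self.dimensions, numbers)}


definition = DVDefinition(
    ul="https://example.org/dv/fracture-demo",  # placeholder UL, not a real definition
    dimensions=(Dimension("g", ("m", "w")), Dimension("t")),
)
numbers = definition.encode({"g": "w", "t": -3.0})  # illustrative values only
```

Because the mapping is invertible, `definition.decode(numbers)` recovers the original record, so the numbers alone carry the full information once the UL is known.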

Using this UL, we can access everything needed to represent, for a selected group of diagnoses X, the defined decision-relevant information about findings, treatment, and result in DV form (2).

The DV definition can be used to store medical information globally in comparable and searchable form, and also to create DVs as input for similarity search.[9] For decision support, such DVs can be used to find similar medical situations together with the chosen treatment and its result.
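One simple way to picture similarity search over DVs with the same UL is a nearest-neighbor ranking. The sketch below assumes a scaled Euclidean distance with one scale per dimension; the metric, the function names, and the sample values are assumptions for illustration and are not taken from [9].

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float], scales: Sequence[float]) -> float:
    """Euclidean distance after dividing each dimension by its scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def most_similar(
    query: Sequence[float],
    stored: Sequence[Tuple[float, ...]],
    scales: Sequence[float],
    k: int = 3,
) -> List[Tuple[float, ...]]:
    """Return the k stored number sequences closest to the query."""
    return sorted(stored, key=lambda dv: distance(query, dv, scales))[:k]


# Illustrative two-dimensional DVs, e.g. (d, n); the values are made up.
stored = [(0.50, 7.0), (0.90, 12.0), (0.55, 8.0)]
nearest = most_similar((0.52, 7.0), stored, scales=(0.1, 1.0), k=2)
```

The per-dimension scales stand in for whatever weighting a DV definition might specify; a linear scan like this would be replaced by an index in any larger collection.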

Therefore, we recommend systematically collecting definitions of decision-relevant features for different diagnoses. Everyone with expert knowledge about a certain connected group of diagnoses (e.g., a subgroup of ICD-10-CM [15]) is invited to publish decision-relevant features that have not yet been collected, for example, certain findings, preconditions, or restrictions, so that these can be added to the systematic collection without redundancy. This provides important preconditions for realizing the procedure summarized above for all covered groups of diagnoses. Far-reaching automation is possible because DV definitions and DVs are completely machine readable. For example, the practitioner can receive proposals for further measurements for a given diagnosis. Connected digital measuring devices can deliver their results directly as DVs for storage and for search in the growing database. Therefore, requirement 4 listed above is also fulfilled.

To clarify the general procedure for representing decision-relevant medical information as a sequence of numbers (in the DV), we now provide an example.

 Exemplary DV Definition

For illustration, we provide an example which also shows how complex visual findings can be handled. [Figure 1] shows the radiologic finding of an osteoporotic compression fracture of TH 7.{Figure 1}

The question of the best therapy always arises. Is an operation advisable, and if so, which one? We are interested in experience with similar findings. To compare images, we have to perform feature extraction and measurements on them. [Figure 2] shows possible measurements for comparing such findings. Some further relevant data:{Figure 2}

date0 = Date of birth
date1 = Date of measurement (together with date0, this also gives the age)
g = Gender (m or w)
d = 2·d2/(d1 + d3) (from [Figure 2])
n = Number of the vertebra, numbered continuously from head to pelvis
t = t-score at a representative undamaged vertebra (DXA bone density)
w = Weight as body mass index
x, y, z = Coordinates from an anatomy reference [13] (if available)

For comparable description of the findings, we could define, for example,

DV1 := (date0, date1, g, d, n, t, w, x, y, z)    (3)
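To make (3) concrete, the following sketch assembles the ten numbers of DV1 from raw inputs. Several choices here are assumptions not fixed by the article: dates are encoded as proleptic Gregorian day numbers, gender as 0 for m and 1 for w, and unavailable coordinates as NaN. The input values are invented for illustration and do not describe the patient in the figures.

```python
from datetime import date
from typing import Optional, Tuple


def compression_ratio(d1: float, d2: float, d3: float) -> float:
    """d = 2*d2 / (d1 + d3), using the three lengths measured in Figure 2."""
    return 2.0 * d2 / (d1 + d3)


def build_dv1(
    date0: date, date1: date, g: str,
    d1: float, d2: float, d3: float,
    n: int, t: float, w: float,
    xyz: Optional[Tuple[float, float, float]] = None,
) -> Tuple[float, ...]:
    """Return (date0, date1, g, d, n, t, w, x, y, z) as numbers."""
    x, y, z = xyz if xyz is not None else (float("nan"),) * 3
    return (
        date0.toordinal(),    # assumed date encoding: day number
        date1.toordinal(),
        {"m": 0, "w": 1}[g],  # assumed integer coding of the ordered domain (m, w)
        compression_ratio(d1, d2, d3),
        n, t, w, x, y, z,
    )


# Invented example values; n = 14 corresponds to TH 7 when counting from C1.
dv1 = build_dv1(date(1950, 1, 1), date(2017, 1, 1), "m",
                d1=20.0, d2=10.0, d3=20.0, n=14, t=-3.0, w=19.5)
age_years = (dv1[1] - dv1[0]) / 365.25  # age follows from date0 and date1
```

Storing the two dates rather than the age keeps the dimensions independent of when the record is read.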

Every DV can be combined with further DVs, e.g., in case of further diagnoses. The dimensions of the combination are then jointly searchable in multidimensional searches. The coordinates x, y, z offer an optional way to make localized findings searchable in dependence on their localization. Such useful extensions can be introduced step by step, as soon as data and software for them are available.

Very relevant in this case was the DXA measurement t of bone density. The t-score at L1 was -4.6, i.e., advanced osteoporosis. The patient recognized the connection with his underweight and started to correct it. Ergometer training was reduced, and strength training became an important part of the conservative treatment.

DV definitions could also be created for a detailed description of therapy and results. Several aspects of conservative therapy, for example, training, could be represented efficiently in DV form.

The additional operative therapy in this case was a dorsoventral spondylodesis. The operative trauma is relevant for the decision for or against the operation. It can be described by, for example,

date2 = Date of operation (always necessary for documentation)
q1 = Total length of skin incision (rough estimate of the minimum)
q2 = Total cross-sectional area of muscle dissection (rough estimate of the minimum)

For a rough comparison of the trauma of a spondylodesis,

DV2 := (date2, q1, q2)    (4)

could be used. Obviously, q1 and q2 are only rough estimators, but they are much better than nothing. q2 is relevant for evaluating the trauma of a spondylodesis. Muscle dissections cause permanent damage to the patient; it should therefore be obligatory to inform the patient about this beforehand. Further parameters could of course also be of interest, for example, parameters that describe the patient's satisfaction or possible long-term pain and complications, depending on diagnosis and therapy. [Figure 3] shows the postoperative situation after spondylodesis. There were no unplanned complications, but the patient was shocked by the muscle damage [Figure 4]. q2 of DV2 (4) was clearly relevant. Before this spondylodesis, the patient had not been informed about minimally invasive operation methods.[14] Later, he underwent two further spondylodesis operations by other surgeons. These had q2 = 0 and caused far fewer permanent complaints. The patient recognized the importance of q2 = 0. Documenting DVs like DV2 (4) could help, because they can be defined to show just the relevant information in a compact and efficiently searchable way.{Figure 3}{Figure 4}

 Importance of Systematic Collection of Data as DVs

In detail, systematic data collection means:

1. Creation of increasingly specific DV definitions, depending on the rough diagnosis, for precise description of findings, therapy, and therapeutic success. For example, ICD-10-CM [15] can be used as the first entry point. After making a rough diagnosis, the doctor can consult a database with ICD-10 codes and related DV definitions. In this case, for example, S22.000A can be used for all wedge compression fractures of a thoracic vertebra. The database will also contain links to unifying codes for building connected groups of diagnoses with overlapping features (see above), for example, from S22.060A (wedge compression fracture of T7–T8 vertebra) to S22.000A (wedge compression fracture of unspecified thoracic vertebra). Complementary DV definitions like DV1 and DV2 are collected there.
2. Reused definitions can be copied automatically from other parts of the patient record, e.g., date0 or g in DV1 (3). Diagnosis-specific dimensions of DV1 (3) can later also be filled with the help of software, for example, using feature extraction algorithms on [Figure 2]. There is no obligation to fill all dimensions of the DV definitions, but they provide a useful frame for obtaining comparable descriptions.
3. Over time, the most meaningful dimensions of DVs are the ones used most frequently, and this usage can be displayed. Hence, the doctor can receive automatic recommendations to measure and collect findings as DVs in a way that is expressive and comparable. Collecting searchable and comparable DVs can increasingly become a matter of routine in the electronic patient record.
4. If the patient agrees, the patient's own DVs are also copied pseudonymously into a protected local database. The database with DVs is searchable and provides anonymized statistics as output.
5. Generally, the DVs with the most important findings of the current patient are used for the search, and the database returns, as anonymized statistics, the averages from the group of patients with similar findings under the condition of different therapies. In this way, we obtain statistics adapted to the current patient, as in an individual study. For example, we could search with the parameters d and n of DV1 (3) for patients with similar values to find the treatment with the best long-term results. This also allows a better estimate of the chances of success of conservative therapy without operation.
6. The statistics of one's own local database may be small, but because of the globally defined data format of the DVs (2), all other local databases with DVs use the same data definitions worldwide. The same search request can therefore be sent not only to one's own database but also to all other databases with DVs worldwide. Each answers with anonymized statistics. Because DVs are uniformly defined, all such statistics can be combined automatically, with every local statistic weighted by its size. The result is a large global statistic.
7. In this way, we can obtain individual global statistics over patients with similar parameters to find the best treatment decision.
8. The treatment decision and further progress are provided in DV form to the electronic patient record. If the patient agrees, further documentation is also provided pseudonymously as DVs to the local database. Because DVs are globally defined, this enlarges the common global data collection. The global statistics thus grow and become more and more valuable (see 6).
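The size-weighted combination of local statistics described in step 6 can be sketched as follows. The sketch assumes that each database answers with a group size and a mean for one DV dimension; the type `LocalStatistic` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LocalStatistic:
    """Anonymized answer of one database: group size and mean of one dimension."""
    count: int
    mean: float


def combine(stats: Sequence[LocalStatistic]) -> LocalStatistic:
    """Merge local answers into one statistic, weighting each mean by its group size."""
    total = sum(s.count for s in stats)
    if total == 0:
        raise ValueError("no matching patients in any database")
    weighted_mean = sum(s.count * s.mean for s in stats) / total
    return LocalStatistic(total, weighted_mean)


# Made-up answers from two local databases to the same search request.
global_stat = combine([LocalStatistic(10, 0.6), LocalStatistic(30, 0.8)])
```

Weighting by group size makes the combined mean identical to the mean that a single database holding all matching patients would report, which is why only counts and means need to leave each local database.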


Until now, information has usually been represented as a number sequence that is not explicitly identified but defined by context, for example, as text. The large combinatorial freedom of free text complicates its use. To make the data available much more efficiently, machines must be able to recognize them automatically and reliably. Considerable effort is therefore already being made to make medical information machine readable; HL7 FHIR [16] focuses on this. [Table 1] shows an excerpt from the provided examples [17]. For comparison, (5) shows the equivalent information as a DV:{Table 1}

<DV >; 2016-03-28; 83.9; </DV>    (5)

Even with a long URL and in XML form, (5) is much shorter than [Table 1], and the kind of data is in addition uniformly identified. It could be even shorter in efficient binary form (2). An editor for such DVs could use the associated globally unique DV definition to ensure uniform and comparable DVs automatically. [Table 2] shows a demonstration example of a DV definition for (5). Worldwide, only one such definition is necessary for all DVs with the same UL. Later, a binary form is also preferable for the DV definition to optimize efficiency.{Table 2}

In the currently usual standard [Table 1], the definition is mixed with the data and repeated in variable form, whereas the DV (5) contains an unambiguous pointer to the worldwide unique definition [Table 2] together with the sequence of raw data. The DV is not only much shorter; its data are also uniformly identified by the pointer (the UL). Its structure (2) thus automatically avoids unnecessary combinatorial freedom in the machine-readable representation of comparable digital information. Nomenclatures like LOINC [18] do not suffice for a diagnosis-dependent fine description of findings, but they can be converted and included in DV definitions and so contribute to a much larger global collection that all users can create: the UL in the DV (2) allows worldwide work sharing in the definition of precise machine-readable (DV) data. All concerned groups, including (organizations of) medical specialists and patients, can easily contribute their experience and thereby make a growing amount of relevant medical information like DV1 (3) and DV2 (4) machine readable, comparable, and searchable for a growing set of diagnoses. More and more DVs can be defined and stored worldwide at distributed locations on the Internet. Because the data type of every DV is globally and uniformly identified by the UL, comparable information can be collected retrospectively and used systematically, as described above, for decision support and for comparing therapeutic successes, where up to now only elaborate local trials are possible. The data are globally defined and statistically usable for clinical research, which is important for the growth of general medical knowledge.

Privacy: DVs are by design (2) optimized for efficiency and allow well-defined transparency and privacy. If they are published, their information is globally searchable. It is also possible to allow access only after averaging. In that case, a search does not return individual DVs but the average of a group of DVs of at least a minimal size. This variant is of interest for medicine, where the dimensions of the DVs often represent measurements. For decision support, only the averages of these measurements are of interest, and averaging the measurements simultaneously leads to anonymization.
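The access-after-averaging variant can be sketched as a small gate in front of the search result. The function name, the default threshold of five, and the sample data are assumptions for this sketch; the article does not specify a minimal group size.

```python
from typing import List, Optional, Sequence


def anonymized_average(
    group: Sequence[Sequence[float]], min_size: int = 5
) -> Optional[List[float]]:
    """Return per-dimension averages, or None if the group is too small to release."""
    if len(group) < min_size:
        return None  # too few DVs: releasing an average could expose individuals
    dims = len(group[0])
    return [sum(dv[i] for dv in group) / len(group) for i in range(dims)]


# Made-up two-dimensional DVs that matched a search request.
matches = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
released = anonymized_average(matches, min_size=3)
withheld = anonymized_average(matches[:2], min_size=3)
```

Only the averaged vector leaves the database, so individual DVs never appear in a search answer.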

To sum up, all medical information can be coded efficiently in DV form (2) and thereby made globally comparable and searchable. The principle can be used for joining and selectively searching all kinds of objectifiable information in medicine and also in other professional communication (e.g., in industry and science [1]). The covered information can be defined exactly by the users who create the DV definitions.

Therefore, instead of worldwide multiple mixtures of definitions with data as in [Table 1] or in free text (or even undefined data), we prefer globally identified and uniformly defined medical information (worldwide multiple DVs with one UL as in (5), combined with one global DV definition like [Table 2]). Important steps toward realizing this are:

1. Construction of a guided online presence where users can log in and create their own (if desired, multilingual) DV definitions like [Table 2]. [Figure 5] shows an exemplary window for the definition of a DV dimension (number). It was generated by the demo version.[8] The content of DV definitions is explained in more detail in [9]. The DV definition can also contain pointers to supplements, e.g., for every dimension (number) a series of pictures that illustrate the content graphically.
2. Creation of browser plugins and further software for visualizing and editing DVs using the associated DV definitions. Such software can also help with incorporation into existing EHR software.
3. Instruction of users, especially medical specialists and researchers, in using the online presence to define DVs. In this way, more and more DV definitions, like those of DV1 (3) and DV2 (4), can be created for an increasing number of medical topics and rough diagnoses (e.g., from ICD-10-CM [15]). Frequently used low-dimensional or one-dimensional DV definitions (e.g., date) can be placed uniquely in a common directory for easier reuse. Inserting the UL of such a DV definition is sufficient to reuse it as part of a larger DV definition.{Figure 5}

As soon as DV definitions and software for handling them exist, DV data collection can start.

To evaluate the distributed data collection, we also need a distributed search engine. A distributed search engine for DVs can be created using distributed synchronized indices as described in [9]; the demo [8] demonstrates the principle. As soon as the data collection is large, complex searches are possible.


It is advantageous to transport medical information in well-defined machine-readable form. For this purpose, the DV form (5) is more efficient than alternatives that mix data with definitions [Table 1], and the global online definition and uniform identification of DVs by the UL (2) lead to searchability and to better evaluability and interoperability. DVs can be used for joining and selectively searching all kinds of objectifiable information in medicine and in other professional communication (e.g., in industry and science [1]). The introduction of DVs is therefore recommended. Important steps toward this are listed above at the end of the previous section.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1. Orthuber W. Why Informatics and General Science Need a Conjoint Basic Definition of Information. Arxiv Preprint arXiv: 1801.03106; 2018. Available from: [Last accessed on 2018 Jun 08].
2. Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948;27:379-423. Available from: [Last accessed on 2018 Jun 08].
3. Berners-Lee T, Masinter L, McCahill M. Uniform Resource Locators (URL); 1994. Available from: [Last accessed on 2018 Jun 08].
4. Orthuber W, Fiedler G, Kattan M, Sommer T, Fischer-Brandies H. Design of a global medical database which is searchable by human diagnostic patterns. Open Med Inform J 2008;2:21-31.
5. Orthuber W, Sommer T. A searchable patient record database for decision support. In MIE; 2009. p. 584-8. Available from: [Last accessed on 2018 Jun 08].
6. Orthuber W, Dietze S. Towards Standardized Vectorial Resource Descriptors on the Web. Vol. 2. In GI Jahrestagung; 2010. p. 453-8. Available from: [Last accessed on 2018 Jun 08].
7. Orthuber W, Papavramidis E. Standardized vectorial representation of medical data in patient records. Med Care Compunetics 2010;6:153-66. Available from: [Last accessed on 2018 Jun 08].
8. Orthuber W. Demonstration of Numeric Search in User Defined Data. Numeric Search viewed May, 2018. Available from: [Last accessed on 2018 Jun 08].
9. Orthuber W. Uniform Definition of Comparable and Searchable Information on the Web. Arxiv Preprint arXiv: 1406.1065; 2014. Available from: [Last accessed on 2018 Jun 08].
10. Orthuber W. How to Make Quantitative Data on the Web Searchable and Interoperable Part of the Common Vocabulary. In GI-Jahrestagung 2015; Proceedings: 1231-1242. Available from: [Last accessed on 2018 Jun 08].
11. Orthuber W. Collection of Medical Original Data with Search Engine for Decision Support. In MIE; 2016. p. 257-61. Available from: [Last accessed on 2018 Jun 08].
12. Orthuber W, Hasselbring W. Proposal for a New Basic Information Carrier on the Internet: URL Plus Number Sequence 2016; Proceedings of the 15th International Conference WWW/Internet: 279-284. Available from: [Last accessed on 2018 Jun 08].
13. Zhang SX, Heng PA, Liu ZJ. Chinese visible human project. Clin Anat 2006;19:204-15.
14. Bühren V, Beisse R, Potulski M. Minimally invasive ventral spondylodesis in injuries to the thoracic and lumbar spine. Chirurg 1997;68:1076-84.
15. World Health Organization. ICD-10: International Statistical Classification of Diseases and Related Health Problems: Tenth Revision; 2016. Available from: [Last accessed on 2018 Jun 08].
16. Bender D, Sartipi K. HL7 FHIR: An Agile and RESTful Approach to Healthcare Information Exchange. In Computer-Based Medical Systems (CBMS) 2013; IEEE 26th International Symposium; 2013. p. 326-31. Available from: [Last accessed on 2018 Jun 08].
17. HL7. FHIR Resource Observation – Examples. FHIR Release 3 (STU; v3.0.1-11917) Generated on Wed, April 19, 2017. Available from: [Last accessed on 2018 Jun 08].
18. McDonald CJ, Huff SM, Suico JG, Hill G, Leavelle D, Aller R, et al. LOINC, a universal standard for identifying laboratory observations: A 5-year update. Clin Chem 2003;49:624-33.