PERSPECTIVE
Year: 2018 | Volume: 4 | Issue: 4 | Page: 148-156

Global predefinition of digital information

Wolfgang Orthuber
Department of Orthodontics, UKSH, Kiel University, 24105 Kiel, Germany

Correspondence Address:
Preface

Digital information consists of bits that encode number sequences, and the definitions of these sequences vary widely. This can be improved considerably. Earlier articles[1],[2] describe the potential of well-defined vectors (number sequences) for describing resources in medicine and in general. An article by Orthuber[3] explains the uniform definition of information in detail. For demonstration purposes, a search engine prototype[4] is online that allows clear definition and search of number sequences (digital data) in a local database. Later, Orthuber and Hasselbring[5] explicitly recommended globally identified and predefined number sequences for the general transport of digital information. An article by Orthuber[6] recalls that every kind of information (including a digital number sequence) means selection from a set of possibilities. This set is called the “domain.” Using online definitions, we can globally define the domains of digital information, that is, of number sequences. We also call the definition of a domain its “predefinition” (e.g., in the title of this article) because this clarifies the temporal order: the domain must be defined for all participants of a communication before information is transferred. The information is then transported digitally as a number sequence that selects within the domain. Global predefinition of the domain is not restricted to medical information; it can be used generally and efficiently for every kind of information, so its consequences would be far-reaching. The longer preprint[6] of this article describes this together with additional aspects.

Despite earlier publications and the great potential of a new global predefinition of digital data, there have so far been no consequences and no discussion of it. It seems that the potential of global predefinition of information is not yet sufficiently understood.
Therefore, the first aim of this article is a stepwise explanation of the general principle, the global predefinition of digital information. We then derive important consequences as a perspective. For example, global predefinition of information is an efficient means of merging and combining global experience for decision support, for instance in medicine. We use not only references in square brackets [...] to external literature but also internal references in round brackets (...), as is usual in mathematical literature. These refer to relevant passages in this article and facilitate clear bottom-up argumentation. The structure of this article follows such bottom-up argumentation and closes with a perspective.

Explanation of the General Principle

To exchange experience, we need to exchange information, and we need to compare and find “similar” information. We therefore want a global representation of information that allows definition and recognition of similarity. For this, we require answers to the following three basic questions:

(1) What is information?
(2) What is similar information?
(3) How can we predefine this globally?

Question (1) in particular is frequently underestimated. It has been answered in several ways, but we need a technically helpful answer that prepares the answers to the next two questions. These answers are essential, for example, when comparing and searching precise information. To be technically helpful, an approach to “information” should use only clear basic terms. Sets are basic, well-defined concepts with which we can begin. Shannon's well-known mathematical theory of communication[7] already uses this concept. On the first page he wrote:

“The significant aspect is that the actual message is one selected from a set of possible messages.” (4)

From (4), we can derive the answer to question (1) and the definition of information:

Information means selection of a possibility from a set of possibilities (domain). (5)

For brevity, the “set of possibilities” is called the “domain.”
For faultless communication, the sender and the receiver of information must have the same a priori knowledge of the domain. Uniform predefinition of the domain (5) is therefore important for all participants of a communication. The term “predefinition” underlines the temporal order: a complete machine-readable definition of every variable (number, or dimension) of the domain is given a priori to all participants. (Free text, for example, does not meet this condition because it defines itself a posteriori, during input at the sender.)

Predefinition of information means predefinition of the set of possibilities (domain). (6)

The definition of information is then given by (5). Looking at (5), we can also regard information as a mapping from the domain to an element of the domain.

To answer question (2), we need to compare different possibilities of information (namely domain vectors) DV1, DV2, DV3, ... by a well-defined distance function whose results lead to statements such as “DV2 is more similar to DV1 than DV3 is,” “DV2 is less different from DV1 than DV3 is,” or even “DV2 is identical to DV1.” Here we see why the answer to question (1) is essential: we need a “set of possibilities” or “domain” of information a priori, so that we can compare its elements (the domain vectors DV1, DV2, DV3, ...) with the distance function and arrive at statements such as “less different” or “more different.” These statements (“less” or “more”) introduce an order and can be represented by “smaller” or “greater” numbers. Hence, the distance function compares two elements DV1 and DV2 and returns a number that is smaller the more similar DV1 and DV2 are. For well-defined similarity, the elements of the domain (5) of information must be comparable using a distance function.
Such a domain is usually ordered along certain dimensions and is called a “metric space” in the literature.[8] Let DV1, DV2, and DV3 be any elements of a domain D that is a metric space. Then there is a distance function F (given in the predefinition of the domain) that fulfills:

F(DV1, DV2) = 0 if and only if DV1 = DV2
F(DV1, DV2) + F(DV2, DV3) ≥ F(DV1, DV3)
F(DV1, DV2) = F(DV2, DV1)    (7)

Every element of the domain D can be represented by a sequence of numbers. For computability, the count of these numbers must be finite, so there is a maximal count d of these numbers. This number d is called the “dimensionality” of D. We require:

The dimensionality of the domain D is finite and given a priori (predefined). (8)

There are simple and useful functions that fulfill (7), for example the “Manhattan distance” or the “Euclidean distance.” The function F can be adapted to the requirements. For example, it is advisable that the most important criteria (numbers, or dimensions) contribute most to the value of F. The distance function F thus introduces an order: the smaller F(DV1, DV2) is, the more “similar” DV1 and DV2 are. This answers question (2) and also shows that it is important to order the domain D a priori along meaningful dimensions, so that the distance function can convert differences along these dimensions into a meaningful distance.

Using the existing technical infrastructure, question (3) can be solved efficiently by the following requirement:

The predefinition of the domain D is uniformly locatable on the Internet. (9)

This predefines the domain, and with it the similarity of information, globally and so answers question (3). For practicability, reuse, and nesting of predefinitions (9), and to avoid redundancy, we introduce an additional nontechnical requirement:

The predefinition of the domain D can be used freely without legal restrictions. (10)

These preconditions are applied in the following.
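As an illustration of a distance function F that fulfills (7), the following minimal Python sketch implements the Manhattan and Euclidean distances named above. The optional per-dimension weights are one possible way to let the most important dimensions contribute most to F; the function names, the weighting scheme, and the example values are assumptions made for this sketch and are not taken from any existing predefinition.

```python
import math
from typing import List, Optional, Sequence


def _weights(a: Sequence[float], b: Sequence[float],
             weights: Optional[Sequence[float]]) -> List[float]:
    """Validate the inputs and return one strictly positive weight per dimension."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same dimensionality d")
    if weights is None:
        return [1.0] * len(a)
    if len(weights) != len(a) or any(w <= 0 for w in weights):
        raise ValueError("need one strictly positive weight per dimension")
    return list(weights)


def manhattan(a: Sequence[float], b: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Manhattan distance: sum of w_i * |a_i - b_i|."""
    w = _weights(a, b, weights)
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))


def euclidean(a: Sequence[float], b: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Euclidean distance: sqrt of the sum of w_i * (a_i - b_i)^2."""
    w = _weights(a, b, weights)
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))


# Made-up two-dimensional example vectors (illustrative values only).
dv1, dv2, dv3 = (1.0, 2.0), (4.0, 6.0), (2.0, 2.0)

# DV3 is "more similar" to DV1 than DV2 is, under both distances.
closer_manhattan = manhattan(dv1, dv3) < manhattan(dv1, dv2)
closer_euclidean = euclidean(dv1, dv3) < euclidean(dv1, dv2)
```

With strictly positive weights, both functions remain metrics in the sense of (7), because the weighting only rescales the individual dimensions.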
Domain Spaces as Domains with Domain Vectors as Elements

If a domain fulfills (7), (8), (9), and (10), it is called a “Domain Space,” abbreviated “DS.” Every element of the DS is called a “Domain Vector,” abbreviated “DV.” Due to (9), the DS is predefined online and therefore globally defined. The DV contains a pointer to the online predefinition, called the “Uniform Locator” (abbreviated “UL”) [Figure 1], and a sequence of numbers that represents a selection (5) from the domain. An example of such a selection is illustrated as a DV in [Figure 2]. The structure of the DV is minimized to maximize efficiency and to avoid unnecessary combinatorial freedom in the transported data.

{Figure 1}{Figure 2}

Structure of the DV: UL (of online predefinition) plus sequence of numbers (11)

Hence, the DV consists of two main components:

- UL of the online predefinition. It consists of numbers that locate the predefinition of the domain.
- Sequence of numbers. It consists of numbers that select one element in the domain.

The UL is, of course, also a number sequence, but the nomenclature (11) is meant to emphasize its special function. Because of its central importance, the structure of the DV is illustrated in [Figure 3]. The UL is a pointer to the predefinition of the subsequent sequence of numbers (11) in this DV. It can be, for example, a uniform resource locator (URL), a hierarchical number sequence, or a more abbreviated pointer, such as a byte meaning “same UL as before” or “same concatenation of ULs as before,” or a short local pointer into a local table of global Internet pointers. In every case it ultimately refers to an addressable location on the Internet.
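The DV structure (11) can be pictured with a small Python sketch: a UL plus a sequence of numbers, where two DVs are compared only if they share the same UL. This is an illustration under assumptions, not a reference implementation: the class name, the example UL (a placeholder URL on example.org), and the use of the Manhattan distance as a stand-in for the distance function F of the predefinition are all hypothetical, and every dimension is assumed to be filled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DomainVector:
    """DV = UL (pointer to and identifier of the online predefinition) + numbers."""
    ul: str                    # here a URL; could also be an abbreviated pointer
    values: Tuple[float, ...]  # number sequence selecting one element of the domain


def distance(a: DomainVector, b: DomainVector) -> float:
    """Compare two DVs of the same Domain Space.

    Assumption: Manhattan distance stands in for the distance function F
    that the online predefinition would actually specify.
    """
    if a.ul != b.ul:
        raise ValueError("DVs with different ULs belong to different domains")
    if len(a.values) != len(b.values):
        raise ValueError("this sketch assumes all dimensions are filled")
    return sum(abs(x - y) for x, y in zip(a.values, b.values))


# Hypothetical ULs, used for illustration only.
EXAMPLE_UL = "https://example.org/predefinitions/example-domain"
OTHER_UL = "https://example.org/predefinitions/other-domain"

dv_a = DomainVector(EXAMPLE_UL, (3.0, 5.0))
dv_b = DomainVector(EXAMPLE_UL, (4.0, 7.0))
dv_c = DomainVector(OTHER_UL, (4.0, 7.0))

d_ab = distance(dv_a, dv_b)  # same UL, so the DVs are comparable
```

The UL plays both roles described above: it identifies the domain (the equality check in `distance`) and it points to where the predefinition, including the real distance function, would be found.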
Efficiency is essential to this function: the UL provides a globally unambiguous and efficient pointer to the unique online predefinition and, simultaneously, an identifier of the domain of the subsequent data (numbers).

{Figure 3}

How to get high resolution and also enough range of domain vectors

DV predefinitions can include existing predefinitions via nesting.[5] They can be high-dimensional and can describe complex situations. To achieve high resolution and simultaneously enough range in meaning, sustained work on the development of online predefinitions is necessary. This work can be done by all users because the UL in every DV (11) can address every location on the Internet. There are two basic means of getting enough range in meaning in addition to high resolution:

(I) Predefinition of many dimensions in a DV: This quickly leads to large cardinality and so allows combining high resolution with wide range. Sufficient prior knowledge about possibly interesting dimensions is necessary. It is advisable to include existing predefinitions via nesting.[5]
(II) Subsequent combination of DVs: DVs can always be combined, like words in a text. The advantage is more flexibility; the disadvantage is more ambiguity, because different combinations can lead to the same meaning. However, DVs can be predefined in a more complex manner than simple words, so that smaller and less ambiguous combinations of these “large building blocks” convey enough meaning.

In the initial stage, combinations of DVs (and even text) will be the usual way to convey meaning. Later, when more complex and high-dimensional DV predefinitions are available, free text and subsequent DV combinations can be reduced more and more.

Predefinitions of specialized domain vectors

In free text, the meaning of every letter and word results from context, which is language dependent and not provided a priori. The consequence is combinatorial freedom and a lack of comparability.
In the DV data structure (11), however, information is predefined (6). This means that the set of possibilities of every dimension (number) is well defined a priori, which minimizes the ambiguity of information (5). It is therefore advisable to predefine, for more and more topics, specific DVs with dimensions for all relevant aspects, so that “similar” meaning leads to “similar” DVs (11). The DVs are then directly comparable and systematically searchable. Comparable data would obviously be important for medicine and, more generally, for scientific, technical, and other precise information exchange. For all communication that is focused on a certain topic or domain, predefinition of special DVs can save many of the detours otherwise caused by language and/or incommensurable standards. These DVs can be used directly (I) or combined (II).

When predefining DVs for a certain topic or domain, it is desirable to cover as much breadth in meaning as possible with independent dimensions (numbers). This later facilitates search. Often, many details are important in a given subject matter, and it is useful to predefine appropriate dimensions for describing such details. It is advantageous to reuse existing predefinitions via nesting. This can lead to the predefinition of high-dimensional DVs. However, it does not mean that all these dimensions are used in every case. By default, the dimensions are only optional containers for information. The user who selects a certain DV for information transfer can typically decide, according to the requirements, which dimensions are filled with values and which are not. The better the DV fits the requirements, the more information its dimensions can transport in an efficient, reproducible, comparable, and searchable way. Over time, more and more such predefinitions become available and increase the comparable part of the exchanged information.
Predefinitions for Decision Support

Global predefinition of DVs is important for the precise sharing of experience, for example in medicine. Hence, a typical nontrivial application of specialized DVs is decision support. Experts in a certain subject area can share their experience and together develop the best predefinitions of specialized DVs. The DV contains:

(12) Dimensions about the preconditions of the decision. Of interest are all parameters with relevant influence on the decision and its result.
(13) Dimensions about the decision. Of interest are all relevant parameters that describe the decision.
(14) Dimensions about the decision result. Of interest are all parameters that describe relevant consequences of the decision.

Usage: Before a decision, the most interesting parameters (dimensions) for describing the current situation (preconditions) (12) are selected and, together with varying decision parameters (13), provided to the search engine. For every decision variant, the search engine returns a group of cases with “similar” parameters [Figure 2] and, for every group, the means and standard deviations of the result dimensions (14). The better the dimensions (12) and (13) describe the situation at the time of decision-making, the more similar the found cases are, and the larger the found group is, the more reliably the means of the result dimensions (14) represent the expected outcome of a decision variant. The statistics recommend the decision variant with the most favorable numbers in the result dimensions (14). Medicine is an important application of this procedure and is described in the next section.

Example: Domain Vectors in Medicine

The medical application is explained in more detail, with patient examples, in the article by Orthuber.[9] Here, we describe the most important technical and organizational details.
We start with the present situation. There has been remarkable progress in the application of digital techniques in medicine, but relevant interoperability problems still hinder data exchange, not only worldwide but also locally between multidisciplinary teams that want to work together for integrated and patient-centric care. Standards for the interoperable transport of medical information are therefore in development. Currently, HL7 FHIR[10] is recommended. [Table 1] contains an example[11] of this standard. The data (two numbers that represent date and heart rate) are mixed with longer definitions. Because variable definitions are transported together with the data, the basic precondition (9) for globally defined information is not fulfilled. In contrast, the DV (11) is uniformly defined and much shorter because it contains only the UL of the global predefinition plus the data (two numbers in this case). The UL allows all users, including all professional medical organizations and patient organizations, to create global predefinitions of medical information. A predefinition could also contain abbreviated parts of definitions as shown in [Table 1] (so parts of the HL7 FHIR standard can be reused), but it is globally unique and is not transported (in varying form) together with the data.

The medical data are DVs [Figure 3] that can transport information about medical findings, treatment, and treatment results, and so describe preconditions (12), decision (13), and result (14) in globally comparable form. A given DV represents a selection, as shown in simplified form for two dimensions in [Figure 2], which also illustrates “similar” findings. Hence, “similar” medical situations become searchable and comparable because numbers are searchable and comparable. It is relevant for medicine that the predefinitions of the DVs (data) (11) are inherently international: this makes medical experience internationally comparable.
Existing nomenclatures such as LOINC[12] can be converted into predefinitions of DVs. According to (10), their usage should be free.

{Table 1}

Important steps for decision support in medicine are as follows:

(a) The physician provides an initial diagnosis (e.g., from the International Classification of Diseases-10[13]) to the decision support system, which answers by showing the most frequent further diagnostics and measurements (including data about treatment), represented as dimensions of DVs, that colleagues have carried out under this condition.
(b) The physician decides about further diagnostics. Due to the usage of ULs, these have unique names. Because the system is connected, all results are available automatically after the further diagnostics and measurements have been completed.
(c) The physician decides on the most interesting results x_k. For every x_k, the system shows the standard deviation s_k and suggests intervals I_k = [x_k − r_k s_k, x_k + r_k s_k] for search. The physician can modify the r_k and also shift some I_k, for example to test variable treatments. The intervals I_k together then describe a search command, which is sent to the system.
(d) The system searches for the group G of all patients whose x_k lie within the I_k and computes further statistics within this group. Among other things, the system shows the means m_l and standard deviations d_l of chosen parameters x_l. These can also be descriptions of treatment results. Hence, by modifying the I_k in (c), the effect of different treatments on the treatment result can be checked within the individual group G “near” the individual patient.
(e) Using the provided information and, optionally, further results, it is possible to continue at (b) or even at (a) to find the treatment with the best results.

Distribution of Search and Anonymization of Search Results

The search (d) can be distributed. It is not necessary to collect all patient data within a central database.
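Steps (c) and (d) of the procedure above can be sketched in Python as follows. The sketch builds the intervals I_k = [x_k − r_k s_k, x_k + r_k s_k], selects the group G of cases whose values lie within all intervals, and returns only summary statistics (count, mean, standard deviation) for chosen result dimensions. The function names, the dimension names ("age", "dose", "outcome"), and the four toy cases are made up for illustration; a real system would take dimensions and data from DV predefinitions and a database.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Case = Dict[str, float]
Interval = Tuple[float, float]


def build_intervals(x: Dict[str, float], s: Dict[str, float],
                    r: Dict[str, float]) -> Dict[str, Interval]:
    """Step (c): I_k = [x_k - r_k * s_k, x_k + r_k * s_k] for every chosen k."""
    return {k: (x[k] - r[k] * s[k], x[k] + r[k] * s[k]) for k in x}


def search_group(cases: Iterable[Case],
                 intervals: Dict[str, Interval]) -> List[Case]:
    """Step (d): group G of all cases whose x_k lie within every I_k."""
    return [c for c in cases
            if all(k in c and lo <= c[k] <= hi
                   for k, (lo, hi) in intervals.items())]


def group_statistics(group: List[Case],
                     result_dims: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Anonymized statistics over G: count, mean m_l and standard deviation d_l."""
    stats: Dict[str, Dict[str, float]] = {}
    for dim in result_dims:
        vals = [c[dim] for c in group if dim in c]
        if vals:
            sd = stdev(vals) if len(vals) > 1 else 0.0
            stats[dim] = {"n": float(len(vals)), "mean": mean(vals), "sd": sd}
    return stats


# Toy data with hypothetical dimension names (illustrative values only).
cases = [
    {"age": 40.0, "dose": 1.0, "outcome": 5.0},
    {"age": 42.0, "dose": 1.0, "outcome": 7.0},
    {"age": 70.0, "dose": 1.0, "outcome": 2.0},
    {"age": 41.0, "dose": 2.0, "outcome": 9.0},
]
x = {"age": 41.0, "dose": 1.0}   # values of the current patient
s = {"age": 5.0, "dose": 0.5}    # standard deviations shown by the system
r = {"age": 1.0, "dose": 0.5}    # factors chosen by the physician

intervals = build_intervals(x, s, r)
group = search_group(cases, intervals)
stats = group_statistics(group, ["outcome"])
```

Shifting the interval for a decision dimension such as "dose" and rerunning the search corresponds to testing a different treatment variant within the group of similar patients.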
What matters is that the predefinition of the data is globally valid, i.e., that there is one and the same predefinition for all data with the same UL. This is automatically guaranteed by the DV data structure (11). Under this condition, it is not even necessary to use the same search engine. The search parameters from (c) can be distributed worldwide to several databases with their own search engines. Every search engine responds to the sender of the search parameters (c) with the anonymized statistics (d) over the group G found within its own database. All these results are collected at the sender of the search request (c). There, every search result is weighted by the size of the found group to calculate the combined statistics over all found patients. In this way, we can obtain worldwide anonymous statistics over all patients. Hence, we do not need a central collection of patient data; it is sufficient that the data of each patient are provided to one and the same local database (d), which can answer worldwide requests (c) anonymously.

Why Informatics, Medicine, and General Science Need a Conjoint Basic Predefinition of Information

It is well known that the costs and inefficiency caused by interoperability problems and redundant programming are very high. The advantages of the DV data structure (11) for informatics are therefore obvious: the UL is a global identifier and simultaneously a global pointer to the predefinition of information. Efficiency is the central focus here: the UL allows maximal abbreviation, and the number sequence allows predefinition of every bit. Because the UL identifies the data exactly, the appropriate software can be selected automatically, which avoids interoperability problems. The software for DVs can be made available online, with source code if desired. It can be locatable, for example, by a pointer from the DV predefinition or by using the UL as an identifier. Hence, software for DVs is searchable.
This also encourages purposeful improvement of code quality instead of redundant programming. Data with the same UL can be handled in distributed systems (e.g., search engines) with the same algorithms. The previous section shows the medical application. The algorithms can also be made available online as searchable software modules, and the output of appropriate algorithms can be combined if desired. Because online predefinitions, data, and even associated program modules are machine readable, a lot of time and cost can be saved.

There is also large scope for development in general science: the global predefinition of domains (which is simple on the Internet) for language-independent, objectifiable information exchange could become usual. This is important because

“science” by definition deals with “objectifiable,” that is, “globally comparable,” information. (15)

Uniform domains could be predefined depending on the subject area. Important configurations could thus be predefined systematically (e.g., by a suitable professional organization of specialists) as DVs [Figure 3] and so become comparable and searchable. The predefinition of similarity (for comparison and search) can be adapted to the needs and interests. The optimization of the different predefined domains could become an important topic in science.

The large body of publications keeps growing and is more and more difficult to survey. Without conjoint predefinition of information, the results of medical studies, for example, are usually not directly comparable. A time-consuming meta-study[14] can improve the situation if the results are partially comparable, but it cannot directly combine them. If, however, the results have the same predefinition and are published online in machine-readable form (11) together with their statistical weight, even automatic comparison and combination are possible.
A scientific work can be adapted to the existing online definitions as early as the design stage, and as a consequence its results are comparable and searchable. Hence, the preconditions for the utilization of scientific research could be improved.

We cannot list here all the possible benefits of global, conjoint predefinition of information. If information is carried as a well-predefined, meaningful DV, it is comparable by machines, and divergent information is therefore also easily detectable by machines. Hence, in a well-defined environment, it is more difficult for wrong facts to be published unnoticed. Authors can certainly refuse to apply DVs. However, as soon as these information carriers are fully established, their refusal is also a hint to the reader. The reader can decide to consider only literature that contains DVs with certain ULs (11). If there are still no DVs for a relevant topic, they can be predefined online.

Discussion

According to [Figure 3], the DV data structure (11) represents globally predefined information, and similarity between DVs is also globally defined (7). Hence, it provides a clear answer to all three questions (1), (2), and (3). However, there are also disadvantages.

Disadvantages of the domain vector data structure

- The most significant disadvantage is relevant at the beginning: up to now, there are not enough data in DV form (11). Investment in software, predefinitions, and interesting data (DVs) is necessary so that searching DVs becomes attractive and self-initiated users start to provide more and more DVs.
- Compared with the structure of digital information that has been usual until now (a sequence of numbers), the main objectifiable disadvantage of the DV (11) is the additional space required by (the additional bits or numbers of) the UL. However, without a UL, a longer predefinition via context is necessary [Table 1]. Moreover, according to its definition, the UL can be optimized so that it needs only a minimal count of bits compared with a URL. The new term “UL” has been introduced to signal this.
- The DV (11) depends on online predefinitions. Therefore, the parts of the online predefinitions that are used must be downloaded. After this, the DV can also be used offline. There should be an infrastructure that guarantees the stability of online predefinitions, for example by backup and mirroring.

It may also be argued that making the online predefinitions requires work. However, much more work is necessary without them because of the alternative: repeated and varying predefinitions by context. These are mixed with data, which later makes comparison of data more difficult or even impossible. In contrast, DVs are automatically comparable and have several advantages.

Advantages of the domain vector data structure

For the space needed by the UL, we get a global pointer to the (much larger) online predefinition of the following data (numbers). Simultaneously, we get a global identifier of the (domain of the) data, which makes the data comparable by machines.[6] Some advantages of this are as follows:

- The DV data structure (11) enables the combination of maximal competence (of all users, who can predefine digital information via the UL) with maximal efficiency (the number sequence allows predefinition of every bit). Redundant, varying definitions that are mixed with data (and unnecessary syntax overhead) are avoided.
- Every DV predefinition is also a DS predefinition and hence efficiently predefines online all DVs with the same UL [Figure 1]. This enables similarity comparison later.
- Existing predefinitions can be reused freely (10) within new predefinitions (nesting of DV predefinitions),[5] so that complex search over multiple DSs becomes possible. If, for example, there is a one-dimensional DV predefinition with the dimension “length in meter” and another DV predefinition with the dimension “width in meter,” these can be included in new, more complex predefinitions. It is then possible to search within all newly predefined DSs for all DVs with a certain length and width.
- The predefinition of a DV is adaptable to the needs, from simple to complex. Additional dimensions can be appended afterward.
- Efficient predefinition is possible. If the value set of a DV dimension is known a priori, we need, for example, no more than 1 bit for selection in the case of 2 alternatives (yes/no). In the case of [Figure 2], with 64 = 2^6 possibilities, we need 6 bits and so less than 1 byte for selection. If (the range of) the value set of a dimension is not known a priori, the first byte of a number (mantissa and, if necessary, exponent) can also contain bits with length information. Such a number can adapt its length to the requirements (self-extending number, [Figure 4]) to minimize the count of unused bits.
- Generation of DV predefinitions and of DVs can be automated, for example by integration into programming languages.
- The identification of DV data by UL makes these data interoperable, globally searchable, and comparable. This is generally important in informatics and for all objectifiable information exchange. For general science, (15) sums this up.

Further features of DVs are described in studies by Orthuber.[3],[6] There is also the search engine http://numericsearch.com, which introduces the principle and demonstrates user-defined similarity search in a local database [Figure 5], [Figure 6], [Figure 7].

{Figure 4}{Figure 5}{Figure 6}{Figure 7}

Conclusion

Information means selection from a set of possibilities (domain). Predefinition of the domain (5) is necessary for all participants of a communication before information is transferred. The globally predefined DV data structure (11) is comparable worldwide and allows the combination of maximal competence (because all users can predefine) with maximal efficiency of information transport (every bit in the “number sequence” (11) can be predefined).
This has great potential, especially in scientific and other professional communication. Therefore, introducing the new DV data structure together with a framework is advisable for the long-term improvement of our digital infrastructure.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References


