Abstraction of Data Elements of Clinical Symptoms in Chinese Medicine

Authors names: 

Xiao-Xia XIAOa,b, Jun-Feng YANa,b, Dong-Bo LIUa,b, Hao LIANGa,c, Yin-Yin PENGb, Man LIb, Xiao-Qing ZHOUa,c*

Authors working units: 
  1. Collaboration and Creation Center of Digital Chinese Medicine, Hunan University of Chinese Medicine, Chang Sha 410208, Hunan Province, China
  2. Department of Medical Informatics, Hunan University of Chinese Medicine, Chang Sha 410208, Hunan Province, China
  3. Diagnostic Institute, Hunan University of Chinese Medicine, Chang Sha 410208, Hunan Province, China
Corresponding Author: 

Xiao-Qing ZHOU

Corresponding Author Information: 

*Corresponding author. Professor. Research fields: TCM diagnostics.

E-mail address: zxq5381@sohu.com



【Abstract】This report analyzes existing problems in the terminology of clinical symptoms in traditional Chinese medicine (TCM) from the viewpoint of data sharing and explains why a standard directory of TCM clinical data elements is needed. We evaluated the principles and methods of data element extraction in light of the status quo of clinical information systems and the characteristics of TCM symptoms, and consequently proposed a three-layer model for optimal extraction.


【Keywords】TCM clinical symptoms, data elements, standardization, TCM diagnostics

1. Significance of data elements of traditional Chinese medicine (TCM) clinical symptoms

Clinical symptoms form an important basis for TCM treatment based on syndrome differentiation and for the evaluation of therapeutic effects, and represent a vital component of the TCM clinical information system (CIS). However, regional and individual differences in the natural language used to describe clinical symptoms often give rise to homonyms and synonyms. As an example of a homonym, the word 'indigestion' is explained as body fluid and blood accumulation in Soul Box for Acupuncture·The Origin of Various Diseases but is described as 'milk and food stagnation' in modern semantics [1]. Synonyms include 'insomnia', which is otherwise described as sleepless, inability to fall asleep, unable to close the eyes, unable to lie down, waking overnight, thorough awakening, less sleep or reduction of sleep time, and 'anepithymia', also termed inappetence, loss of appetite, indigestion and anorexia [2,3,4].

In view of the rapid development of the information industry, there has been an increasing demand for a TCM clinical information platform and for standardization of clinical terminology. Attempts to standardize TCM clinical terminology at different levels have led to significant achievements in the establishment and classification of terms. However, the concept of TCM clinical symptoms remains unclear and the inclusion criteria are not uniform. A single symptom may in fact summarize a group of symptoms. For example, the description of leucorrhea in the textbook Chinese Medicine Diagnostics, published under the National "Eleventh Five-Year" Plan of general higher education, is "symptoms, such as white and large quantity of secretions or with thin quality like snivel, or something coming out without feeling and smell" [3], which differs from the modern definition of "vaginal discharge of secretions" [5,6].

For computers to process TCM clinical information, the concepts (terminology) of TCM constitute the most basic data. To achieve sharing, exchange and semantic interoperability of data, these concepts must be made clear, unambiguous and formalized. In the 1990s, ISO/IEC proposed a data element methodology to standardize shared data, along with effective data classification, terminology standardization and data modeling. Individual countries have developed corresponding data element catalogs for national defense, health care, science and technology, government administration and other fields. China has similarly developed a catalog of health information data, but only a small proportion of it consists of TCM data elements. Extraction of data elements that conform to the basic theory of Chinese medicine is therefore a critical requirement for optimizing its clinical application.

Extraction of data elements of TCM clinical symptoms should aid in determining, describing, explaining and clarifying the meaning of the terminology, and in standardizing TCM symptoms. The main purpose of symptom data standardization is to establish application-oriented principles and norms for the expression, classification and positioning of information, so that it is simplified, structured and standardized and the concepts of symptoms can be easily understood, compared and shared in computer systems. This process could also promote standardization and reusability of symptom terms along with the integration and sharing of clinical data, and is an important foundational task in digital Chinese medicine.

Ontology, the philosophical study of the primitive or matrix of the world, has been used in the field of computing since the 1960s to achieve formal expression of terms, questions, classes, objects, attributes and relationships in specific fields (i.e., formal expression of knowledge), which, in turn, enables knowledge in the field to be understood and automatically processed by machines. The Chinese medicine information field uses ontology to standardize and identify TCM terms. The China Academy of Chinese Medical Sciences has produced a series of studies on TCM clinical term sets based on ontology [7,8,9]. However, clinical data applications still lack a symptom ontology that links semantics to identified terminology [10,11]. Extraction of TCM clinical symptom data elements should provide a standard glossary for ontology construction and facilitate the assembly of a clinical knowledge base.

2. Concept and Extraction of Data Elements

2.1 Definition of Data Elements

A data element is a unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes; it is considered the smallest unit that cannot be subdivided in a particular semantic environment. A data element consists of a Data Element Concept (DEC) and a Representation. A DEC is a concept that can be expressed by means of a data element; it consists of an object class and a property and is independent of any concrete representation (Figure 1).


An object class represents a set of concepts, abstract terms or real things that can be clearly defined in terms of boundaries and meaning, with characteristics and behavior that follow the same rules. An object class can be a general or an individual concept. An object class with two or more members is a general concept. For instance, "patients" is a general concept while a patient in the First Affiliated Hospital of Hunan University of Chinese Medicine is an individual concept. In a study on patient diagnoses and treatments, symptoms can be treated as a general concept while fever is an individual concept.

A property represents the common features of all members of an object class and is used to distinguish and describe the objects; it corresponds to an attribute in an object model or entity-relationship model. For example, symptom collection (including observation, smelling, asking and palpation in traditional Chinese medicine) and symptom observation (self-feeling and examination by other people) are classified as symptom properties.

A representation is the combination of a value domain and a data type, and may also contain a unit of measure or representation class if necessary. Linking a representation to a data element concept produces a data element. For example, the approaches to symptom collection in Chinese medicine are observation, smelling, asking and palpation, and these four diagnostic methods form the permissible values of the data element "symptom collection ways".
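The relationship between object class, property, representation and data element can be sketched in a few lines of Python. This is only an illustrative sketch: the class and field names (`DataElementConcept`, `Representation`, `DataElement`, `accepts`) are our own assumptions and are not taken from ISO/IEC 11179 or from any existing registry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    """Object class plus property, independent of any concrete representation."""
    object_class: str
    prop: str


@dataclass(frozen=True)
class Representation:
    """Data type plus the permissible values."""
    data_type: str
    allowed_values: Tuple[str, ...]


@dataclass(frozen=True)
class DataElement:
    """A data element concept linked to one representation."""
    concept: DataElementConcept
    representation: Representation

    def accepts(self, value: str) -> bool:
        # A value is valid only if it is one of the permissible values.
        return value in self.representation.allowed_values


# The "symptom collection ways" example from the text.
collection_way = DataElement(
    concept=DataElementConcept(object_class="symptom", prop="collection way"),
    representation=Representation(
        data_type="string",
        allowed_values=("observation", "smelling", "asking", "palpation"),
    ),
)
```

Keeping the concept separate from the representation means the same concept could later be paired with a different value list (for example, numeric codes) without redefining what is being described.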

2.2 Data Element Attributes

To enable data management and sharing, data element attributes should be registered in a dictionary and controlled in a standard manner to ensure uniformity during information exchange. Data element attributes usually include name, definition, allowed value and identification type (Figure 2). In the figure, "1:1" and "1:N" indicate mandatory attributes while "0:1" and "0:N" signify optional attributes.
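The cardinality notation can be read as "minimum occurrences : maximum occurrences". The sketch below checks a registry entry against such cardinalities. Because Figure 2 is not reproduced here, the particular cardinality assigned to each attribute, the `synonym` attribute name and the function names are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical cardinalities (illustrative only, not copied from Figure 2).
ATTRIBUTE_CARDINALITY: Dict[str, str] = {
    "name": "1:1",
    "definition": "1:1",
    "allowed_value": "1:N",
    "synonym": "0:N",
}


def is_mandatory(cardinality: str) -> bool:
    """Mandatory attributes have a minimum occurrence of 1 ("1:1" or "1:N")."""
    return cardinality.split(":")[0] == "1"


def allows_many(cardinality: str) -> bool:
    """Attributes ending in ":N" may occur more than once."""
    return cardinality.split(":")[1] == "N"


def validate(entry: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in one registry entry."""
    problems = []
    for attr, card in ATTRIBUTE_CARDINALITY.items():
        values = entry.get(attr, [])
        if is_mandatory(card) and not values:
            problems.append(f"missing mandatory attribute: {attr}")
        if not allows_many(card) and len(values) > 1:
            problems.append(f"too many values for attribute: {attr}")
    return problems
```

An entry that supplies a name, a definition and at least one allowed value passes; an entry with only a name is reported as missing its definition and allowed values.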

2.3 Metadata, Data Elements and Ontology Terms

Metadata is data that describes other data, and a data element is the smallest unit of data in a particular context. As highlighted by ISO/IEC 11179-1:4 (E), the data element is regarded as the basic container for data. The description of a data element includes a semantic component (including situation and symbolic semantics) and a representational component (the allowed values of the data); both are described by attributes in the conceptual model of the metadata registry system. The attributes described in ISO/IEC 11179-3 characterize the data element, which can be registered in the metadata registry system. The data elements of an organization must incorporate metadata intended to facilitate user understanding and sharing of data. Metadata are expressed in a consistent and standard manner.

Ontology, metadata and data elements all normalize data, starting from the definition of the data and the relationships among the data. Ontology focuses on expressing the relationships among data in a specific domain in order to standardize and share knowledge. Metadata and data elements regulate data from the viewpoint of resource management and sharing. Accordingly, data descriptions contain information on relationships among the data, usually in the form of a hierarchy, but this does not preclude the use of vocabularies such as MeSH. Ontology can also serve as a method of metadata description.

2.4 Methods for Data Element Extraction

Symptom data element extraction serves not only to classify TCM clinical symptoms but also to standardize their definitions. The pathway of data element extraction and registration can be divided into "top-down" and "bottom-up" approaches according to whether or not a corresponding information system exists. The "top-down" approach, in which extraction proceeds through function modeling, business process modeling, information modeling, extraction of data elements and data element submission, is applicable where the information system has not yet been established, whereas the "bottom-up" approach is used where the information system already exists.

2.4.1 Top-down approach

This approach refers to the pathway from data element concept to data element, and is applicable before a new information system is established. It yields more comprehensive, semantics-based data elements with good interoperability, and metadata generated from such data elements are more stable and unified. However, the method requires more forward-looking work and therefore a better understanding of the application status and development trends of the field. In addition, the granularity of extraction must be managed accurately; otherwise the extracted data elements cannot be applied directly. Taking TCM clinical symptoms as a data element concept, the symptom collection way, observation manner, time of symptom occurrence and accompanying symptoms serve as attributes of the symptoms, which can be further subdivided until the application requirements are met. Because Chinese medicine information systems are already widely used in TCM hospitals, using this method alone for data element extraction is not appropriate.

2.4.2 Bottom-up approach

This approach refers to the pathway from data element to data element concept, and is more suitable where an information system already exists in the application field. In this situation, each entry in the system's data dictionary, or each symptom data item in the database, corresponds to a data element. The data elements extracted using this method provide only limited information, such as names and allowed values; other attributes must be determined from an understanding of the underlying data elements and from concepts derived from other sources. For example, in the database, the field name "fever" can correspond to a data element, and the allowable values of the field, "yes or no", provide a representation of the data element. However, the definition of the data element must be obtained elsewhere, by looking up the field name in industry dictionaries, textbooks, monographs and other industry-recognized data sources. Due to differences in application environment, construction period, equipment, technology and data storage methods, data elements extracted from a single system are unable to meet the needs of Chinese medicine practice. Meanwhile, if data elements are extracted from multiple systems, the standardization problem becomes more complex. Therefore, this approach to data element extraction is also limited in practical applicability.
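A minimal sketch of the bottom-up pathway is given below. It assumes a hypothetical data dictionary (a list of field names with allowed values) and a hypothetical glossary standing in for textbooks or industry dictionaries; neither comes from a real system, and the function name `extract_bottom_up` is our own.

```python
from typing import Dict, List, Optional

# Hypothetical rows of a data dictionary from an existing information system.
data_dictionary = [
    {"field": "fever", "allowed_values": ["yes", "no"]},
    {"field": "chest tightness", "allowed_values": ["yes", "no"]},
]

# Hypothetical glossary standing in for textbooks or industry dictionaries.
glossary: Dict[str, str] = {
    "chest tightness": "self-feeling of tightness or stuffiness in the chest",
}


def extract_bottom_up(rows: List[dict], definitions: Dict[str, str]) -> List[dict]:
    """Turn each dictionary row into a data element stub.

    Name and allowed values come from the system itself; the definition
    is looked up in an external source and stays None when none is found.
    """
    elements = []
    for row in rows:
        definition: Optional[str] = definitions.get(row["field"])
        elements.append({
            "name": row["field"],
            "allowed_values": list(row["allowed_values"]),
            "definition": definition,
        })
    return elements


elements = extract_bottom_up(data_dictionary, glossary)
# Elements whose definition still has to be obtained from another source.
undefined = [e["name"] for e in elements if e["definition"] is None]
```

Here `undefined` lists the elements that the system alone cannot fully describe, which is the limitation of the bottom-up approach discussed above.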

In summary, considering that TCM hospitals currently generally possess their own medical information management systems, a combination of the top-down and bottom-up procedures may be the optimal approach to extracting TCM clinical data elements. Data elements can then be extracted rapidly from the data dictionary and existing databases of the current information system. Meanwhile, data element granularity can be analyzed through the current application to facilitate extraction of data elements that cover the full field of TCM clinical information to the greatest possible extent.

3. Construction of a Three-layer Model for Optimal Symptom Data Element Extraction

3.1 Sources of Symptom Information

TCM clinical symptoms form the basis of diagnosis and the main content of clinical data collection. Two major questions must be resolved during extraction of data elements: which specific clinical symptoms Chinese medicine recognizes, and where they come from. TCM clinical symptoms can be classified into three categories according to data source:

(1) Ancient medical cases, traditional Chinese medicine classics and Chinese medicine diagnostic textbooks. Abundant symptoms are recorded in these resources. For instance, TCM Diagnostics (edited by Professor Wen-Feng Zhu) records 1373 symptoms [4]. Moreover, Standardization of Commonly Used TCM Clinical Terms (edited by Jing-Bo LI and Li MA) compiles 2069 common symptoms, with a definition for each symptom.

(2) National and industrial standards, such as the national terminology standard Chinese Medicine Terms, with 456 documented symptoms, and Terminology of TCM Clinical Diagnostics and Treatment • Disease Part (1997), which contains a collection of 49 symptomatic terms [12].

(3) Clinical records and databases of Chinese medicine information systems. Symptoms obtained from these sources are also derived from national standards, textbooks or classics, but are closer to clinical application settings. They are fewer in number than those in the first category, but are not entirely a subset of it. Expression of symptoms in clinical records is affected by three factors: their descriptions in Western medicine, real practice in the clinical setting and the language habits of the individual doctor recording the cases.

3.2 Methods of Extracting Symptom Data Elements and Construction of an Appropriate Model

In terms of symptom sources, no single category is complete enough for symptom data element extraction. Data elements extracted solely from the clinical information system are suitable for clinical use but do not cover the entire application area and lack the descriptive characteristics of symptom definitions. If data elements are extracted from the other categories, the granularity of extraction and real application scenarios need to be considered.

In terms of extraction method, the first two categories of symptom sources are more suitable for the top-down approach while the bottom-up approach is better for the third category. Based on the above analyses, we propose that the overall protocol should balance completeness and practicality, follow a unified extraction principle and build a standard registration system to improve the standardization level of TCM clinical symptom data elements. To this end, we considered building a middle layer to combine the top-down and bottom-up approaches, which is also convenient for data element registration [13]. On this basis, the extraction model of TCM symptom data elements was constructed as a three-layer structure (Fig. 3). The features of the novel model structure are compared with those of the top-down and bottom-up extraction methods in Table 1.

As shown in Figure 3, the conceptual layer of the extraction model draws on the basic theory of traditional Chinese medicine and the TCM classics to obtain the concepts of symptoms, which are then screened and abstracted to extract common data elements. A common (generic) data element is independent of any particular application system. For example, in TCM terminology, symptoms and syndromes are both related to disease position, which can be used as the corresponding attribute for both (in other words, a common data element). Application data elements can be derived from common data elements. For instance, headaches can be derived from the common data elements "headache position" and "headache characteristics". At the same time, data elements can be extracted from the data dictionary, clinical case records and clinical database within the TCM information management system. Data elements extracted from specific information systems often correspond to particular application scenarios. A common data element can be obtained by generalizing an application data element. For instance, "dark red lip" in clinical case records can be generalized into the body part "lip" and the color "dark red". Thus, by employing the model shown in Figure 3, we can extract data elements that are close to practical application in the existing clinical information system while controlling the degree of extraction so as to cover TCM clinical symptoms to the best possible extent.
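The two directions through the middle layer, generalizing an application term into common data elements and deriving an application term from them, can be illustrated with the "dark red lip" example. The vocabularies and function names below are hypothetical, and the exact-match rule is a deliberate simplification rather than a description of how the model is actually implemented.

```python
from typing import Optional, Tuple

# Hypothetical controlled vocabularies for two common data elements.
BODY_PARTS = ("lip", "tongue", "nail", "face")
COLORS = ("dark red", "pale", "blue", "red")


def generalize(term: str) -> Optional[Tuple[str, str]]:
    """Bottom-up: split an application term into (body part, color).

    Returns None when the term is not an exact "<color> <body part>" phrase.
    """
    for color in COLORS:
        for part in BODY_PARTS:
            if term == f"{color} {part}":
                return part, color
    return None


def derive(part: str, color: str) -> str:
    """Top-down: build an application term from two common data elements."""
    return f"{color} {part}"
```

For example, `generalize("dark red lip")` yields the pair `("lip", "dark red")`, and `derive("nail", "blue")` builds the application term `"blue nail"`.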

4. Symptom data element extraction principles and classification

4.1 Principles of TCM Clinical Data Element Extraction

Problems such as unclear definitions, definitions that directly or indirectly contain another term that is already defined, and non-uniform naming of symptoms exist in TCM clinical symptom terminology. For instance, the definition of leucorrhea in modern medicine is broader than that in traditional literature. Chest tightness in Chinese medicine is defined as "self-feeling of tightness or stuffiness in the chest" and has a number of synonyms. Even more commonly, symptom names are not uniform; e.g., "eye itching" is also written as "eye itch" and "eye itchiness". Therefore, during the extraction process, data elements extracted from authoritative guides are regarded as the basic data elements, and synonyms are recorded via the data element attributes (Figure 2).
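Recording synonyms as an attribute of the basic data element makes it straightforward to map variant wordings back to one preferred name. The sketch below shows one possible way to do this; which variant counts as "preferred" and the names `SYNONYMS`, `PREFERRED` and `normalize` are assumptions for illustration.

```python
from typing import Dict, List

# Preferred name -> synonyms, using examples mentioned in the text.
SYNONYMS: Dict[str, List[str]] = {
    "eye itching": ["eye itch", "eye itchiness"],
    "insomnia": ["sleepless", "inability to fall asleep", "less sleep"],
}

# Reverse index built once: every variant points to its preferred name.
PREFERRED: Dict[str, str] = {}
for preferred, variants in SYNONYMS.items():
    PREFERRED[preferred] = preferred
    for variant in variants:
        PREFERRED[variant] = preferred


def normalize(term: str) -> str:
    """Return the preferred name for a term, or the term itself if unknown."""
    return PREFERRED.get(term.strip().lower(), term)
```

Unknown terms are returned unchanged, so they can be flagged for review instead of being silently merged into an existing element.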

Integration of Chinese and Western medicine has significantly extended the four TCM diagnostic methods, for example through the inclusion of B-scan ultrasound and nuclear magnetic resonance. Objectification of TCM diagnostic criteria has provided a novel basis for analytical development. For instance, pulse analysis devices have been developed to objectify pulse diagnosis. In the current setting, symptom data elements that integrate Western medicine diagnostics can, on the one hand, use standard Western medicine data elements and, on the other hand, be extracted via reclassification within the TCM knowledge system.

Symptom information is an important basis for syndrome differentiation or disease diagnosis. Therefore, when extracting TCM symptom data elements, it is usually necessary to determine the attributes of the symptoms according to their correlations with syndrome differentiation or disease diagnosis and to extract these attributes as symptom data elements. This method of extraction is conducive to establishing semantic relationships between syndromes and symptoms or between symptoms. For example, to determine the characteristics associated with stool, it is necessary to understand the close relationships between the excretion of stool and the transportation and transformation functions of the spleen and stomach, as well as liver dispersion and kidney Yang warming. Functional changes in the spleen, stomach and liver can affect the property, color, smell, time, volume, defecation frequency and defecation feeling. Therefore, combinations of these attributes with the object class 'stool', which are related to syndrome differentiation and disease diagnosis, can be extracted as data elements.

4.2 Classification Methods of TCM Clinical Symptoms

The most critical issue of the top-down data extraction approach is symptom classification, which can be performed in several ways.

4.2.1 Classification by relevance to syndrome

Through construction of a semantic network model of symptoms and syndromes, we can develop a rich clinical diagnosis model of TCM and construct a clinical knowledge base to effectively guide diagnosis. However, clinical syndromes are complex and diverse. Moreover, symptoms and syndromes may display one-to-many relationships in both directions, and no fixed pattern is available to determine a specific syndrome. A syndrome can be determined from a few main symptoms, or from a major symptom and multiple secondary symptoms. For instance, to diagnose whether or not Qi stagnation exists in a chronic hepatitis B patient, we need to assess a combination of major symptoms, such as flank pain, emotional depression and irritability, and minor symptoms, such as sighing, belching, hiccups, abdominal distension and fullness of the ribs. Qi stagnation can be diagnosed when three major symptoms occur simultaneously, when two major symptoms are combined with any two minor symptoms, or when five minor symptoms are present [14]. However, this type of classification is complex and has significant redundancy, posing a considerable challenge in terms of information expression.
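The Qi stagnation criterion quoted above can be written as a small counting rule, which also shows why such rules are awkward to express as flat data elements: the diagnosis depends on combinations rather than on any single symptom. The sketch below treats the listed symptoms as the complete major and minor sets and reads each threshold as "at least"; both are assumptions, and the code is an illustration, not a clinical tool.

```python
from typing import Iterable

# Symptom sets as listed in the text (assumed complete for this sketch).
MAJOR = {"flank pain", "emotional depression", "irritability"}
MINOR = {"sighing", "belching", "hiccups", "abdominal distension",
         "fullness of the ribs"}


def meets_qi_stagnation_rule(symptoms: Iterable[str]) -> bool:
    """Three majors, or two majors plus two minors, or five minors."""
    observed = set(symptoms)
    n_major = len(observed & MAJOR)
    n_minor = len(observed & MINOR)
    return (
        n_major >= 3
        or (n_major >= 2 and n_minor >= 2)
        or n_minor >= 5
    )
```

A patient with flank pain, irritability, sighing and belching satisfies the second branch, whereas one major symptom with two minor symptoms satisfies none.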

4.2.2 Classification by symptom characteristics

Classification can be based on the characteristics of the symptom itself, for instance, the position, property and degree of the symptom in question. In the case of pain, pain position, property and degree are taken as common data elements, and the concrete position, property and degree as representations of the three data elements. From the three common data elements, we can extract "chest pain degree grading codes", the name, definition, data type and representation format of which are presented in Table 2. The corresponding value of the data element is enumerable, with values from 1 to 4 (Table 3).
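Deriving a position-specific graded element from the common "pain degree" element can be sketched as follows. Only the code range 1 to 4 comes from the text; the meaning of each grade is given in Table 3 and is not reproduced here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedElement:
    """A data element whose value is an integer grading code."""
    name: str
    min_code: int
    max_code: int

    def is_valid(self, code: int) -> bool:
        # Reject non-integers (and booleans, which are ints in Python).
        if not isinstance(code, int) or isinstance(code, bool):
            return False
        return self.min_code <= code <= self.max_code


def derive_pain_degree(position: str) -> GradedElement:
    """Combine the common element "pain degree" with a concrete position."""
    return GradedElement(
        name=f"{position} pain degree grading code",
        min_code=1,
        max_code=4,
    )


chest = derive_pain_degree("chest")
```

The same helper could produce an analogous element for another position, which is the reuse that common data elements are meant to provide.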

4.2.3 Classification by TCM clinical symptom acquisition

Data can be classified based on observation, smelling, asking and palpation, in line with TCM diagnostic practice of thousands of years. If classification is based on the manner of symptom collection, symptom stratification is simple. By performing sub-classification in accordance with the order of symptom occurrence and subordinate interrelationships, the semantic relationships among symptoms can be enriched.

4.3 Extraction of Common and Basic Data Elements

In TCM terminology, several symptoms contain basic elements, such as body part, cold, heat and color. For example, chest pain, chest tightness and chest heat are associated with the chest; chest heat, palm heat and heart heat with heat; and face blue and nail blue with the color blue (cyan). Basic elements in TCM symptoms have specific clinical characteristics. In terms of data element definition, body part, cold, heat and color represent such characteristics and have their own specific representations; they can therefore be extracted as common data elements, meaning that they consist only of attributes and representations. Combinations of common data elements, such as wildcard and basic data, can be developed for clinical application. Conversely, frequently used symptom data elements can sometimes be generalized into common data elements in the clinical application process.
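One simple way to spot candidate common data elements is to index symptom names by the basic elements they share. The sketch below does this by whitespace tokenization against a small hand-made vocabulary; the vocabulary, the tokenization rule and the function name are all assumptions, and real symptom terms would need more careful parsing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical vocabulary of basic elements, using examples from the text.
BASIC_ELEMENTS = {"chest", "palm", "heart", "heat", "pain", "tightness"}


def index_by_basic_element(symptoms: Iterable[str]) -> Dict[str, List[str]]:
    """Map each basic element to the symptoms whose names contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for symptom in symptoms:
        for token in symptom.split():
            if token in BASIC_ELEMENTS:
                index[token].append(symptom)
    return dict(index)


symptom_index = index_by_basic_element(
    ["chest pain", "chest tightness", "chest heat", "palm heat", "heart heat"]
)
```

Elements that index many symptoms, such as "chest" and "heat" here, are natural candidates for extraction as common data elements.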

5. Conclusion

To achieve sharing of TCM clinical data and semantic interoperability, standardization of Chinese medicine terminology as well as data atomization is necessary. In this report, we put forward the use of TCM clinical data elements to standardize clinical terminology in accordance with the requirements of the information system, in order to solve the problem of non-uniformity due to homonyms, synonyms and inappropriate correspondence between connotations and extensions. At the same time, the principles of TCM clinical symptom extraction were set according to the current situation regarding clinical information and the applicability of extracted symptom data elements. We additionally analyzed the classification problems that arise during extraction of symptom data elements based on TCM theory. Consequently, a three-layer model of data element extraction was proposed on the basis that, while TCM information systems already exist, the terminology they include is currently incomplete.

Extraction of TCM clinical symptom data elements is also based on GB/T 18391.6-2009 Information Technology Metadata Registration System (MDR) and the Health Information Data Element Standardization Rules. The extraction process needs to take into account the completeness of the basic theory and the timeliness of clinical application to improve the scope of TCM practice. The establishment of TCM clinical symptom data elements has significantly promoted standardization of terminology and integration and sharing of clinical data, providing strong support for the incorporation of TCM into evidence-based clinical practice and research.

Competing interests

The authors declare no conflicts of interest.


  • 1. Shu-Ping HOU. The origin and development of disease Stagnation and academic contention. Information on Traditional Chinese Medicine 2008, 25(3):82-82.
  • 2. Nai-Li YAO. TCM symptom differential diagnostics. Beijing: People's Health Publishing House, 2003.
  • 3. Wen-Feng ZHU. TCM diagnostics. Beijing: China Press of Traditional Chinese Medicine,2007.
  • 4. Qi-Ming ZHANG. Traditional Chinese medicine symptomology. TCM Ancient Books Publishing House 2013.
  • 5. Chang-Di HU. Comparative analysis for 3560 cases of vaginal inflammation tested by microscopy and five-linked methods. Chinese Journal of Microecology, 2012, (04): 357-358,360.
  • 6. Yi-Wen HONG. Discussion on the clinical significance of leucorrhea test. Asia-Pacific Traditional Medicine 2011,(11):198-199.
  • 7. Yang YANG, Yuan-Bai LI, Meng CUI. Investigation on establishment of TCM clinical term set. Chinese Journal of Information on Traditional Chinese Medicine, 2006, 13(12):105-105.
  • 8. Yu-Feng GUO, Bao-Yan LIU, Nai-Li YAO, et al. Exploration on the standardized feature elements of TCM clinical term set based on the SNOMED CT core architecture. Chinese Journal of Information on Traditional Chinese Medicine 2008, 15(9):96-97.
  • 9. Yan DONG. Construction of TCM clinical terminology based on the ontology. China Academy of Chinese Medical Sciences 2016 articles compile.
  • 10. Lin LIU, Xue-Zhong ZHOU, Xia-Ji ZHOU, et al. Overview of ontology research of international clinical phenotype and its problems. World Science and Technology - Modernization of Traditional Chinese Medicine, 2015 (8): 1634-1638.
  • 11. Jun-Feng YAN, Chang-Fa WEI, Qing-Ping LIU, et al. Ideas and methods for the research on TCM clinical diagnosis data elements standard. Journal of Medical Informatics, 2013, 34(8):43-45.
  • 12. CN-GB. Clinic terminology of traditional Chinese medical diagnosis and treatment—Disease, 1997.
  • 13. Yong-Tao LIU. Study on Data Element Standardization Model. Science & Technology Information, 2010 (030): 145-146.
  • 14. Yong-An YE, Shu-Ying RU. TCM syndrome elements diagnostic criteria for chronic hepatitis B (ALT ≥ 2 × ULN). Journal of Traditional Chinese Medicine 2014,20: 1799-1800.



CIS: clinical information system