In recent years, transportation system safety analysis has become increasingly challenging and demanding. Unstructured data contain sufficient information from which inherent interactions can be extracted, but processing and fusing a large amount of unstructured data is a challenging task. In this paper, we propose a text-based Bayesian network (TBN) method that establishes a Bayesian network (BN) from text records: the BN's arcs are obtained from barrier relationships identified by a graphical model, and its prior probabilities stem from fault trees. Comparative experiments show that the text-based method in TBN is effective: its precision, recall and F-measure are 8.64%, 10.70% and 9.84% higher, respectively, than those of the most frequent (MF) method. Moreover, compared with a traditional BN, whose prior probabilities are frequently elicited from experts, the prior probabilities of the proposed TBN have a high confidence. A train derailment accident case study shows that, as the train derailment probabilities and the safety potentials of the barriers change, the TBN generates quantitative results and reveals the critical risks of derailment accidents. Additionally, this work demonstrates relevant nonlinear relationships that improve the assessment results. Based on text data, this study therefore shows that barrier safety analysis has the potential to identify high-risk barriers, which can guide managers in strengthening these barriers.
Liu Yang; Keping Li; Guozheng Song; Faisal Khan. Dynamic Railway Derailment Risk Analysis with Text-Data-Based Bayesian Network. Applied Sciences 2021, 11, 994.
AMA Style: Liu Yang, Keping Li, Guozheng Song, Faisal Khan. Dynamic Railway Derailment Risk Analysis with Text-Data-Based Bayesian Network. Applied Sciences. 2021; 11(3):994.
Chicago/Turabian Style: Liu Yang; Keping Li; Guozheng Song; Faisal Khan. 2021. "Dynamic Railway Derailment Risk Analysis with Text-Data-Based Bayesian Network." Applied Sciences 11, no. 3: 994.
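To make the barrier-based Bayesian network idea in the derailment entry above more concrete, here is a minimal sketch of inference by enumeration over two barriers and one derailment node. It is not the authors' TBN: the barrier names (`track_inspection`, `speed_control`), the prior failure probabilities, and the conditional probability table are all hypothetical values chosen for illustration, whereas the paper derives arcs from text records and priors from fault trees.

```python
from itertools import product

# Hypothetical prior failure probabilities for two safety barriers.
# (Assumption: made-up numbers; the paper obtains priors from fault trees.)
PRIOR_FAIL = {"track_inspection": 0.05, "speed_control": 0.02}

# Hypothetical conditional probability table:
# P(derailment | track_inspection failed, speed_control failed)
CPT_DERAIL = {
    (False, False): 0.001,
    (True, False): 0.05,
    (False, True): 0.08,
    (True, True): 0.40,
}


def all_states():
    """Every combination of failed / not failed across the barriers."""
    return product([False, True], repeat=len(PRIOR_FAIL))


def joint(states, derail):
    """Joint probability of one barrier assignment and the derailment outcome."""
    p = 1.0
    for name, failed in zip(PRIOR_FAIL, states):
        p *= PRIOR_FAIL[name] if failed else 1.0 - PRIOR_FAIL[name]
    p_derail = CPT_DERAIL[states]
    return p * (p_derail if derail else 1.0 - p_derail)


def p_derailment():
    """Marginal derailment probability, summing over all barrier states."""
    return sum(joint(s, True) for s in all_states())


def p_failed_given_derailment(name):
    """Posterior probability that a barrier had failed, given a derailment."""
    idx = list(PRIOR_FAIL).index(name)
    numerator = sum(joint(s, True) for s in all_states() if s[idx])
    return numerator / p_derailment()


if __name__ == "__main__":
    print("P(derailment):", p_derailment())
    for barrier in PRIOR_FAIL:
        print(barrier, "posterior failure:", p_failed_given_derailment(barrier))
```

Comparing each barrier's posterior failure probability with its prior is one simple way to rank barriers by how strongly a derailment implicates them, which is the kind of quantitative output the abstract describes.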
The rapidly developing internet and other media have produced a tremendous amount of text data, making it both challenging and valuable to find more effective ways to analyze text by machine. Text representation is the first step for a machine to understand text, and the most commonly used text representation method is the Bag-of-Words (BoW) model. To form the vector representation of a document, the BoW model matches and counts each element in the document separately, neglecting much of the correlation information among words. In this paper, we propose a network-based bag-of-words model, which captures high-level structural and semantic information about the words. Because the structural and semantic information of a network reflects the relationships between nodes, the proposed model can distinguish the relations between words. We apply the proposed model to text classification and compare its performance with different text representation methods on four document datasets. The results show that the proposed method achieves the best performance with high efficiency, and that using the eccentricity property of the network as the feature yields the highest accuracy. We also investigate the influence of different network structures on the proposed method. Experimental results reveal that, for text classification, the dynamic network is more suitable than the static network and the hybrid network.
Dongyang Yan; Keping Li; Shuang Gu; Liu Yang. Network-Based Bag-of-Words Model for Text Classification. IEEE Access 2020, 8, 82641-82652.
AMA Style: Dongyang Yan, Keping Li, Shuang Gu, Liu Yang. Network-Based Bag-of-Words Model for Text Classification. IEEE Access. 2020; 8(99):82641-82652.
Chicago/Turabian Style: Dongyang Yan; Keping Li; Shuang Gu; Liu Yang. 2020. "Network-Based Bag-of-Words Model for Text Classification." IEEE Access 8, no. 99: 82641-82652.
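The contrast drawn in the bag-of-words entry above, between counting words independently and using a network property such as eccentricity as the feature, can be sketched roughly as follows. The abstract does not specify how the word network is built, so the adjacency-based co-occurrence graph, the `window` parameter, and the choice of 0 for words absent from the document are assumptions made only for this illustration.

```python
from collections import Counter, deque


def bag_of_words(tokens, vocabulary):
    """Classic BoW vector: one independent count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def cooccurrence_graph(tokens, window=2):
    """Undirected word graph (assumed construction).

    Two distinct words are linked when they occur fewer than `window`
    positions apart; window=2 links only adjacent words.
    """
    graph = {w: set() for w in tokens}
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def eccentricity(graph, source):
    """Largest shortest-path distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def network_features(tokens, vocabulary):
    """Eccentricity per vocabulary word; 0 for absent words (an assumption)."""
    graph = cooccurrence_graph(tokens)
    return [eccentricity(graph, w) if w in graph else 0 for w in vocabulary]


if __name__ == "__main__":
    tokens = "the train left the station".split()
    vocabulary = ["the", "train", "left", "station", "signal"]
    print(bag_of_words(tokens, vocabulary))
    print(network_features(tokens, vocabulary))
```

In this toy document the count vector only records how often each word occurs, while the eccentricity vector also reflects where each word sits in the co-occurrence structure, which is the kind of relational information the abstract says plain BoW neglects.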
Root cause identification is an important task in providing prompt assistance for diagnosis, security monitoring and guidance for specific routine maintenance measures in the field of railway transportation. However, most of the methods addressing rail faults are based on state detection, which involves structured data. Manual cause identification from railway equipment maintenance and management text records is undoubtedly a time-consuming and laborious task. To quickly obtain the root cause text from unstructured data, this paper proposes an approach for root cause factor identification by using a root cause identification-new word sentence (RCI-NWS) keyword extraction method. The experimental results demonstrate that the extraction of railway fault text data can be performed using the keyword extraction method and the highest values are obtained using RCI-NWS.
Liu Yang; Keping Li; Dan Zhao; Shuang Gu; Dongyang Yan. A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data. Energies 2019, 12, 1908.
AMA Style: Liu Yang, Keping Li, Dan Zhao, Shuang Gu, Dongyang Yan. A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data. Energies. 2019; 12(10):1908.
Chicago/Turabian Style: Liu Yang; Keping Li; Dan Zhao; Shuang Gu; Dongyang Yan. 2019. "A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data." Energies 12, no. 10: 1908.
Text keywords are defined as meaningful and important words in a document, which provide a precise overview of its content and reflect the author's writing intention. Keyword extraction methods have received considerable attention, and network-based methods are among them. However, existing network-based keyword extraction methods consider only the connections between words in a document and ignore the impact of sentences. Because a sentence is made up of many words, and the words in a sentence affect one another, neglecting the influence of sentences results in a loss of information. In this paper, we introduce a word network whose nodes represent the words in a document, and we refer to any keyword extraction method based on a word network as a Word-net method. We then propose a new network model that considers the influence of sentences, together with a new word-sentence method based on this model. Experimental results demonstrate that our method outperforms the Word-net method, the classical term frequency-inverse document frequency (TF-IDF) method, the most frequent method and the TextRank method. The precision, recall and F-measure of our result are 7.95%, 8.27% and 6.54% higher, respectively, than the Word-net result, and the average precision of our result is 17.56% higher than the TF-IDF result. A two-way analysis of variance is employed to validate the empirical analysis; it indicates that both the keyword extraction method and the number of keywords have statistically significant effects on the evaluation metric values.
Liu Yang; Keping Li; Hangfei Huang. A new network model for extracting text keywords. Scientometrics 2018, 116, 339-361.
AMA Style: Liu Yang, Keping Li, Hangfei Huang. A new network model for extracting text keywords. Scientometrics. 2018; 116(1):339-361.
Chicago/Turabian Style: Liu Yang; Keping Li; Hangfei Huang. 2018. "A new network model for extracting text keywords." Scientometrics 116, no. 1: 339-361.
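The keyword-extraction entry above compares methods by precision, recall and F-measure against a TF-IDF baseline. The sketch below shows one common way such a baseline and evaluation could be computed. The TF-IDF variant used here (relative term frequency times the natural log of inverse document frequency), the tie-breaking rule, and the toy corpus are assumptions for illustration, not the configuration used in the paper.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, k):
    """Return the k highest-scoring words of one document under TF-IDF.

    `corpus` is a list of token lists that is assumed to include the
    document itself, so every word has a document frequency of at least 1.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = max(1, sum(1 for doc in corpus if word in doc))
        idf = math.log(n_docs / df)
        scores[word] = (count / len(doc_tokens)) * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def precision_recall_f(extracted, reference):
    """Precision, recall and F-measure of extracted vs. reference keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if hits else 0.0
    )
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy corpus (hypothetical data).
    corpus = [
        ["rail", "fault", "signal", "fault"],
        ["rail", "track", "wear"],
        ["rail", "signal", "delay"],
    ]
    keywords = tfidf_keywords(corpus[0], corpus, k=2)
    print(keywords)
    print(precision_recall_f(keywords, ["fault", "track"]))
```

A word that appears in every document, such as "rail" in the toy corpus, receives an IDF of zero and drops out of the ranking, which is why TF-IDF favours words that are distinctive for a single document.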