Recently, the compacted de Bruijn graph (cDBG) of complete genome sequences has been successfully used in read mapping due to its ability to handle repetitions in genomes. However, current approaches are not flexible enough for frequently rebuilding the graph with different k-mer lengths. Instead of building the graph directly, how can we build the compacted de Bruijn graph for a longer k-mer from the graph of a shorter k-mer? In this article, we present StLiter, a novel algorithm that builds the compacted de Bruijn graph either directly from genome sequences or iteratively from the graph of a shorter k-mer. For 100 simulated human genomes, StLiter constructs the graph for k-mer lengths 15-18 in 2.5-3.2 hours with at most ~70 GB of memory when the reverse complements of the reference genomes are not considered, and in 4.5-5.9 hours when they are. In experiments, we compared StLiter with TwoPaCo, the state-of-the-art method for building the graph, on four datasets. For k-mer lengths 15-18, StLiter builds the graph 5-9 times faster than TwoPaCo. For k-mer lengths larger than 18, given the graph of a shorter (k-x)-mer, e.g., x=1-2, StLiter also builds the graph more efficiently.
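As background for the structure this abstract refers to (not StLiter's algorithm, which is far more time- and memory-efficient), a compacted de Bruijn graph can be sketched naively in Python: nodes are (k-1)-mers, edges are k-mers, and maximal non-branching paths are merged into unitigs. All function names here are hypothetical, and cycles without any branching node are ignored for simplicity.

```python
from collections import defaultdict

def compacted_dbg(seqs, k):
    """Naive compacted de Bruijn graph: nodes are (k-1)-mers, edges are
    k-mers; maximal non-branching paths are merged into unitigs.
    Cycles with no branching node are ignored in this sketch."""
    out_edges = defaultdict(set)
    in_deg = defaultdict(int)
    nodes = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            u, v = s[i:i + k - 1], s[i + 1:i + k]  # edge u -> v is one k-mer
            if v not in out_edges[u]:
                out_edges[u].add(v)
                in_deg[v] += 1
            nodes.update((u, v))

    def branching(n):
        # a unitig must end wherever in-degree or out-degree differs from 1
        return len(out_edges[n]) != 1 or in_deg[n] != 1

    unitigs = []
    for n in nodes:
        if branching(n):
            for v in out_edges[n]:
                path = n
                while not branching(v):      # extend along the unique successor
                    path += v[-1]
                    v = next(iter(out_edges[v]))
                unitigs.append(path + v[-1])
    return sorted(unitigs)
```

A single read with no repeats collapses to one unitig, while a shared prefix creates a junction: `compacted_dbg(["ACGA", "ACGC"], 3)` yields `["ACG", "CGA", "CGC"]`.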
Changong Yu; Keming Mao; Yuhai Zhao; Cheng Chang; Guoren Wang. StLiter: A Novel Algorithm to Iteratively Build the Compacted de Bruijn Graph from Many Complete Genomes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021, PP, 1-1.
AMA Style: Changong Yu, Keming Mao, Yuhai Zhao, Cheng Chang, Guoren Wang. StLiter: A Novel Algorithm to Iteratively Build the Compacted de Bruijn Graph from Many Complete Genomes. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2021;PP(99):1-1.
Chicago/Turabian Style: Changong Yu; Keming Mao; Yuhai Zhao; Cheng Chang; Guoren Wang. 2021. "StLiter: A Novel Algorithm to Iteratively Build the Compacted de Bruijn Graph from Many Complete Genomes." IEEE/ACM Transactions on Computational Biology and Bioinformatics PP, no. 99: 1-1.
Density Peaks (DP) clustering organizes data into clusters by finding peaks in dense regions, which requires computing the density (ρ) and distance (σ) of every point. As a result, although DP-based schemes are very effective at producing high-quality clusters, their complexity is $O(N^{2})$, where N is the number of data points. In this paper, we propose FDDP, a fast distributed density peaks clustering algorithm based on the z-value index. In FDDP, we first employ the z-value index to map multi-dimensional data points into one-dimensional space and then range-partition the data according to the z-values to balance the load across the processing nodes, ensuring a minimal overlapping range to handle computations at the boundary points. We also propose a σ-calculation algorithm, FC, which uses a forward computing strategy to calculate σ in linear time. Additionally, we propose a ρ-computation algorithm, CB, which uses a caching and efficient searching strategy to calculate ρ. Moreover, FDDP reduces the time complexity from $O(N^{2})$ to $O(N\cdot \log(N))$. We provide a theoretical analysis of FDDP and evaluate it empirically. Our experimental results show that FDDP significantly outperforms the state-of-the-art algorithms.
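The z-value (Morton, or z-order) mapping mentioned above can be illustrated with a small sketch. The helper names are hypothetical and this is not FDDP's implementation: it interleaves the bits of non-negative integer coordinates into one code, then sorts and cuts the points into contiguous ranges, one per processing node (the paper's boundary-overlap handling is omitted).

```python
def z_value(point, bits=16):
    """Morton (z-order) code: interleave the bits of non-negative
    integer coordinates so nearby points tend to get nearby codes."""
    dims = len(point)
    code = 0
    for b in range(bits):
        for d, x in enumerate(point):
            code |= ((x >> b) & 1) << (b * dims + d)
    return code

def range_partition(points, n_parts):
    """Sort points by z-value and cut into contiguous, roughly equal
    ranges -- one per processing node (boundary overlap omitted)."""
    order = sorted(points, key=z_value)
    n = len(order)
    return [order[i * n // n_parts:(i + 1) * n // n_parts]
            for i in range(n_parts)]
```

For example, `z_value((3, 3))` interleaves the two 2-bit coordinates into `15`, and `range_partition([(0, 0), (3, 3), (1, 1), (2, 2)], 2)` splits the points along the z-order curve into two balanced ranges.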
Jing Lu; Yuhai Zhao; Kian-Lee Tan; Zhengkui Wang. Distributed Density Peaks Clustering Revisited. IEEE Transactions on Knowledge and Data Engineering 2020, PP, 1-1.
AMA Style: Jing Lu, Yuhai Zhao, Kian-Lee Tan, Zhengkui Wang. Distributed Density Peaks Clustering Revisited. IEEE Transactions on Knowledge and Data Engineering. 2020;PP(99):1-1.
Chicago/Turabian Style: Jing Lu; Yuhai Zhao; Kian-Lee Tan; Zhengkui Wang. 2020. "Distributed Density Peaks Clustering Revisited." IEEE Transactions on Knowledge and Data Engineering PP, no. 99: 1-1.
Recently, Multi-Graph Learning was proposed as an extension of Multi-Instance Learning and has achieved some success. However, to the best of our knowledge, no existing study addresses Multi-Graph Multi-Label Learning, where each object is represented as a bag containing a number of graphs and each bag is marked with multiple class labels. This is an interesting problem that arises in many applications, such as image classification and medical analysis. In this paper, we propose an innovative algorithm to address it. First, it uses more precise structures, multiple graphs instead of instances, to represent an image, so that classification accuracy can be improved. Second, it uses multiple labels as the output to eliminate the semantic ambiguity of the image. Furthermore, it calculates entropy to mine informative subgraphs rather than merely frequent ones, which enables selecting more accurate features for classification. Lastly, since current algorithms cannot directly deal with graph structures, we degenerate Multi-Graph Multi-Label Learning into Multi-Instance Multi-Label Learning so that it can be solved by MIML-ELM (Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine). The performance study shows that our algorithm outperforms its competitors in terms of both effectiveness and efficiency.
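The entropy-based selection of informative subgraphs described above can be illustrated with a toy information-gain computation. The helpers are hypothetical and not the paper's exact criterion: for each candidate subgraph, a binary "bag contains this subgraph" indicator is scored by how much it reduces the entropy of the class labels.

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def info_gain(has_subgraph, labels):
    """Information gain of a binary 'bag contains subgraph g' indicator:
    label entropy minus the entropy remaining once the indicator is known."""
    pos = [l for h, l in zip(has_subgraph, labels) if h]
    neg = [l for h, l in zip(has_subgraph, labels) if not h]
    n = len(labels)
    cond = (len(pos) / n) * entropy(pos) + (len(neg) / n) * entropy(neg)
    return entropy(labels) - cond
```

A subgraph whose presence perfectly separates the labels gets gain 1.0 bit, while one that is independent of the labels gets gain 0, so ranking candidates by this score favors informative over merely frequent subgraphs.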
Zixuan Zhu; Yuhai Zhao. Multi-Graph Multi-Label Learning Based on Entropy. Entropy 2018, 20, 245.
AMA Style: Zixuan Zhu, Yuhai Zhao. Multi-Graph Multi-Label Learning Based on Entropy. Entropy. 2018;20(4):245.
Chicago/Turabian Style: Zixuan Zhu; Yuhai Zhao. 2018. "Multi-Graph Multi-Label Learning Based on Entropy." Entropy 20, no. 4: 245.
Multi-instance multi-label learning is a learning framework in which every object is represented by a bag of instances and is associated with multiple labels simultaneously. Existing degeneration-strategy-based methods often suffer from two common drawbacks: (1) the user-specified parameter for the number of clusters may hurt effectiveness; (2) SVM may incur a high computational cost when used as the classifier builder. In this paper, we propose an algorithm, multi-instance multi-label (MIML) extreme learning machine (ELM), to address these problems. To the best of our knowledge, we are the first to apply ELM to the MIML problem and to compare ELM with SVM on MIML. Extensive experiments have been conducted on real and synthetic datasets. The results show that MIMLELM tends to achieve better generalization performance at a higher learning speed.
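The core idea of an extreme learning machine, random fixed hidden weights plus a closed-form least-squares output layer, which is what gives it its speed advantage over iteratively trained SVMs, can be sketched as follows. This is an illustrative NumPy sketch under assumed names, not the MIMLELM implementation.

```python
import numpy as np

def elm_train(X, Y, hidden=64, seed=0):
    """Single-hidden-layer ELM: input weights and biases are random and
    fixed; output weights come from a closed-form least-squares fit via
    the Moore-Penrose pseudo-inverse (no iterative training)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)           # random hidden-layer features
    beta = np.linalg.pinv(H) @ Y     # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because the only learned parameters are the linear output weights, training reduces to one pseudo-inverse; even a non-linearly-separable pattern such as XOR is fit exactly when the hidden layer is wide enough.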
Ying Yin; Yuhai Zhao; Chengguang Li; Bin Zhang. Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine. Applied Sciences 2016, 6, 160.
AMA Style: Ying Yin, Yuhai Zhao, Chengguang Li, Bin Zhang. Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine. Applied Sciences. 2016;6(6):160.
Chicago/Turabian Style: Ying Yin; Yuhai Zhao; Chengguang Li; Bin Zhang. 2016. "Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine." Applied Sciences 6, no. 6: 160.