How can we accurately classify feature-based data such that the learned model and results are more interpretable? Interpretability is beneficial from various perspectives, such as checking for compliance with existing knowledge and gaining insights from decision processes. To gain in both accuracy and interpretability, we propose a novel tree-structured classifier called Gaussian Soft Decision Trees (GSDT). GSDT is characterized by multi-branched structures, Gaussian mixture-based decisions, and a hinge loss with path regularization. Together, these three features make it learn short trees in which the weight vector of each node is a prototype for the data mapped to that node. We show that GSDT achieves the best average accuracy compared to eight baselines. We also perform an ablation study of the various structures of the covariance matrix in the Gaussian mixture nodes of GSDT and demonstrate the interpretability of GSDT in a case study of classification on a breast cancer dataset.
Jaemin Yoo; Lee Sael. Gaussian Soft Decision Trees for Interpretable Feature-Based Classification. Transactions on Petri Nets and Other Models of Concurrency XV 2021, 143-155.
AMA Style: Jaemin Yoo, Lee Sael. Gaussian Soft Decision Trees for Interpretable Feature-Based Classification. Transactions on Petri Nets and Other Models of Concurrency XV. 2021:143-155.
Chicago/Turabian Style: Jaemin Yoo; Lee Sael. 2021. "Gaussian Soft Decision Trees for Interpretable Feature-Based Classification." Transactions on Petri Nets and Other Models of Concurrency XV: 143-155.
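As a rough illustration of what a Gaussian mixture-based decision at a single multi-branched node could look like, the sketch below routes a feature vector softly to child branches according to Gaussian responsibilities around each child's prototype vector. The function name `soft_branch_probs`, the shared isotropic variance `sigma2`, and the equal mixing weights are assumptions made for this example; the paper's actual parameterization, covariance structures, and loss are not reproduced here.

```python
import math


def soft_branch_probs(x, prototypes, sigma2=1.0):
    """Return soft routing probabilities of x over the child branches.

    Assumes (for illustration only) isotropic Gaussians with a shared
    variance sigma2 and equal mixing weights, one Gaussian per child.
    """
    # Log-density up to an additive constant: -||x - mu||^2 / (2 * sigma2)
    logits = [
        -sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / (2.0 * sigma2)
        for mu in prototypes
    ]
    # Subtract the maximum before exponentiating for numerical stability
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical child prototypes in a 2-D feature space
prototypes = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
probs = soft_branch_probs((0.5, 0.2), prototypes)
```

Because each branch is tied to a prototype in the input space, the branch with the highest probability can be read as "the prototype this example most resembles", which is the kind of interpretation the abstract describes.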
Jinhong Jung; Lee Sael. Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning 2021, 1-2.
AMA Style: Jinhong Jung, Lee Sael. Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning. 2021:1-2.
Chicago/Turabian Style: Jinhong Jung; Lee Sael. 2021. "Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach." Machine Learning: 1-2.
Jinhong Jung; Lee Sael. Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning 2020, 110, 449-449.
AMA Style: Jinhong Jung, Lee Sael. Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning. 2020; 110 (2):449-449.
Chicago/Turabian Style: Jinhong Jung; Lee Sael. 2020. "Correction to: Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach." Machine Learning 110, no. 2: 449-449.
Defects in residential building façades affect the structural integrity of buildings and degrade external appearances. Defects in a building façade are typically managed using manpower during maintenance. This approach is time-consuming, yields subjective results, and can lead to accidents or casualties. To address this, we propose a building façade monitoring system that utilizes an object detection method based on deep learning to efficiently manage defects by minimizing the involvement of manpower. The dataset used for training a deep-learning-based network contains actual residential building façade images. Various building designs in these raw images make it difficult to detect defects because of their various types and complex backgrounds. We employed the faster regions with convolutional neural network (Faster R-CNN) structure for more accurate defect detection in such environments, achieving an average precision (intersection over union (IoU) = 0.5) of 62.7% for all types of trained defects. As it is difficult to detect defects in a training environment, it is necessary to improve the performance of the network. However, the object detection network employed in this study yields an excellent performance in complex real-world images, indicating the possibility of developing a system that would detect defects in more types of building façades.
Kisu Lee; Goopyo Hong; Lee Sael; Sanghyo Lee; Ha Kim. MultiDefectNet: Multi-Class Defect Detection of Building Façade Based on Deep Convolutional Neural Network. Sustainability 2020, 12, 9785.
AMA Style: Kisu Lee, Goopyo Hong, Lee Sael, Sanghyo Lee, Ha Kim. MultiDefectNet: Multi-Class Defect Detection of Building Façade Based on Deep Convolutional Neural Network. Sustainability. 2020; 12 (22):9785.
Chicago/Turabian Style: Kisu Lee; Goopyo Hong; Lee Sael; Sanghyo Lee; Ha Kim. 2020. "MultiDefectNet: Multi-Class Defect Detection of Building Façade Based on Deep Convolutional Neural Network." Sustainability 12, no. 22: 9785.
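The average precision quoted in the façade abstract is measured at an intersection over union (IoU) threshold of 0.5. As a minimal sketch of that criterion, the code below computes the IoU of two axis-aligned boxes and checks it against the threshold. The `(x1, y1, x2, y2)` box layout, the function names, and the example coordinates are assumptions for illustration and do not come from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical predicted and ground-truth defect boxes, in pixels
pred = (10, 10, 50, 50)
truth = (20, 20, 60, 60)
score = iou(pred, truth)
```

In this example the boxes overlap in a 30 by 30 region, so the IoU is 900 / 2300, which falls below 0.5 and would not count as a correct detection.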
How can we compute the pseudoinverse of a sparse feature matrix efficiently and accurately for solving optimization problems? A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized as a fundamental building block for solving linear systems in machine learning. However, an approximate computation, let alone an exact computation, of the pseudoinverse is very time-consuming due to its demanding time complexity, which prevents it from being applied to large data. In this paper, we propose FastPI (Fast PseudoInverse), a novel pseudoinverse method for sparse matrices based on incremental singular value decomposition (SVD). Based on the observation that many real-world feature matrices are sparse and highly skewed, FastPI reorders and divides the feature matrix and incrementally computes a low-rank SVD from the divided components. To show the efficacy of the proposed FastPI, we apply it to real-world multi-label linear regression problems. Through extensive experiments, we demonstrate that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy. The results imply that our method efficiently computes the low-rank pseudoinverse of a large and sparse matrix that other existing methods cannot handle within limited time and space.
Jinhong Jung; Lee Sael. Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning 2020, 109, 2333-2347.
AMA Style: Jinhong Jung, Lee Sael. Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach. Machine Learning. 2020; 109 (12):2333-2347.
Chicago/Turabian Style: Jinhong Jung; Lee Sael. 2020. "Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach." Machine Learning 109, no. 12: 2333-2347.
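For readers unfamiliar with the connection between a low-rank SVD and the pseudoinverse mentioned in the FastPI abstract, the standard textbook relationship is summarized below. The symbols ($A$ for the feature matrix, $Y$ for the label matrix, $Z$ for the regression weights, $r$ for the target rank) are notation chosen here for illustration and are not taken from the paper; the reordering and incremental steps of FastPI itself are not shown.

```latex
% Rank-r truncated SVD of the feature matrix A (m x n)
A \approx U_r \Sigma_r V_r^{\top},
\qquad U_r \in \mathbb{R}^{m \times r},\;
\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
V_r \in \mathbb{R}^{n \times r}

% Low-rank pseudoinverse obtained by inverting the nonzero singular values
A^{+} \approx V_r \Sigma_r^{-1} U_r^{\top}

% Minimum-norm least-squares solution of multi-label linear regression
Z = \arg\min_{Z} \lVert A Z - Y \rVert_F^{2}
\quad\Longrightarrow\quad
Z = A^{+} Y \approx V_r \Sigma_r^{-1} U_r^{\top} Y
```

Once the three SVD factors are available, applying the pseudoinverse only requires matrix products with thin factors, which is why the cost of the SVD dominates the computation.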
How can we quickly obtain high-quality clusters in genome-scale bio-networks? Graph clustering is a powerful tool applied to bio-networks to solve various biological problems such as protein complex detection, disease module detection, and gene function prediction. In particular, MCL (Markov Clustering) has drawn attention due to its superior performance on bio-networks. MCL, however, is skewed towards finding a large number of very small clusters (size 1-3) and fails to detect many larger clusters (size 10+). To resolve this fragmentation problem, MLR-MCL (Multi-level Regularized MCL) has been developed. MLR-MCL still suffers from fragmentation and, in some cases, generates unrealistically large clusters. In this paper, we propose PS-MCL (Parallel Shotgun Coarsened MCL), a parallel graph clustering method that outperforms MLR-MCL in terms of running time and cluster quality. PS-MCL adopts an efficient coarsening scheme, called SC (Shotgun Coarsening), to improve graph coarsening in MLR-MCL. SC allows merging multiple nodes at a time, which leads to improvements in quality, time, and space usage. In addition, PS-MCL parallelizes the main operations used in MLR-MCL, including matrix multiplication. Experiments show that PS-MCL dramatically alleviates the fragmentation problem and outperforms MLR-MCL in quality and running time. We also show that the running time of PS-MCL is effectively reduced with parallelization.
Yongsub Lim; In-Jae Yu; Dongmin Seo; U Kang; Lee Sael. PS-MCL: parallel shotgun coarsened Markov clustering of protein interaction networks. BMC Bioinformatics 2019, 20, 1-12.
AMA Style: Yongsub Lim, In-Jae Yu, Dongmin Seo, U Kang, Lee Sael. PS-MCL: parallel shotgun coarsened Markov clustering of protein interaction networks. BMC Bioinformatics. 2019; 20 (13):1-12.
Chicago/Turabian Style: Yongsub Lim; In-Jae Yu; Dongmin Seo; U Kang; Lee Sael. 2019. "PS-MCL: parallel shotgun coarsened Markov clustering of protein interaction networks." BMC Bioinformatics 20, no. 13: 1-12.
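To make the baseline concrete, here is a minimal sketch of plain MCL, the algorithm that MLR-MCL and PS-MCL build on: it alternates expansion (matrix multiplication) and inflation (element-wise powering followed by column normalization) on a column-stochastic matrix. This is not PS-MCL: there is no coarsening, regularization, or parallelism, and the inflation value, iteration count, threshold `eps`, and the helper `build_graph` are assumptions chosen for this example.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def mcl(adj, inflation=2.0, iterations=30, eps=1e-6):
    """Cluster an undirected graph given as a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops, as is customary for MCL, then normalize
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    m = normalize_columns(with_loops)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Nodes whose columns keep mass on the same rows form one cluster
    groups = {}
    for j in range(n):
        support = tuple(i for i in range(n) if m[i][j] > eps)
        groups.setdefault(support, []).append(j)
    return sorted(groups.values())


def build_graph(n, edges):
    """Hypothetical helper: dense symmetric adjacency from an edge list."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


triangle_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
clusters = mcl(build_graph(6, triangle_edges))
```

The expansion step is the dense matrix multiplication that dominates the running time, which is the operation the abstract says PS-MCL parallelizes.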
Given a large tensor, how can we decompose it into a sparse core tensor and factor matrices such that the results are easier to interpret? How can we do this without reducing the accuracy? Existing approaches either output dense results or give low accuracy. In this paper, we propose VeST, a tensor factorization method for partially observable data that outputs a very sparse core tensor and factor matrices. VeST performs an initial decomposition, determines unimportant entries in the decomposition results, removes the unimportant entries, and carefully updates the remaining entries. To determine unimportant entries, we define and use an entry-wise 'responsibility' for the decomposed results. The entries are updated iteratively in a coordinate descent manner in parallel for scalable computation. Extensive experiments show that VeST is at least 2.2 times more sparse and at least 2.8 times more accurate than its competitors. Moreover, VeST is scalable in terms of input order, dimension, and the number of observable entries. Thanks to VeST, we successfully interpret the results on real-world tensor data based on the sparsity pattern of the resulting factor matrices.
Moonjeong Park; Jun-Gi Jang; Sael Lee. VeST: Very Sparse Tucker Factorization of Large-Scale Tensors. 2019, 1.
AMA Style: Moonjeong Park, Jun-Gi Jang, Sael Lee. VeST: Very Sparse Tucker Factorization of Large-Scale Tensors. 2019:1.
Chicago/Turabian Style: Moonjeong Park; Jun-Gi Jang; Sael Lee. 2019. "VeST: Very Sparse Tucker Factorization of Large-Scale Tensors." 1.
Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we extract latent concepts/relations of such data? Tensor factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most tensor factorization algorithms exhibit limited scalability and speed since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose GTA, a general framework for Tucker factorization on heterogeneous platforms. GTA performs alternating least squares with a row-wise update rule in a fully parallel way, which significantly reduces memory requirements for updating factor matrices. Furthermore, GTA provides two algorithms: GTA-PART for partially observable tensors and GTA-FULL for fully observable tensors, both of which accelerate the update process using GPUs and CPUs. Experimental results show that GTA exhibits a $5.6$ to $44.6\times$ speed-up for large-scale tensors compared to the state-of-the-art. In addition, GTA scales near linearly with the number of GPUs and computing nodes used for experiments.
Sejoon Oh; Namyong Park; Jun-Gi Jang; Lee Sael; U. Kang. High-Performance Tucker Factorization on Heterogeneous Platforms. IEEE Transactions on Parallel and Distributed Systems 2019, 30, 2237-2248.
AMA Style: Sejoon Oh, Namyong Park, Jun-Gi Jang, Lee Sael, U. Kang. High-Performance Tucker Factorization on Heterogeneous Platforms. IEEE Transactions on Parallel and Distributed Systems. 2019; 30 (10):2237-2248.
Chicago/Turabian Style: Sejoon Oh; Namyong Park; Jun-Gi Jang; Lee Sael; U. Kang. 2019. "High-Performance Tucker Factorization on Heterogeneous Platforms." IEEE Transactions on Parallel and Distributed Systems 30, no. 10: 2237-2248.
How do we integratively profile large-scale multi-platform genomic data that are high dimensional and sparse? Furthermore, how can we systematically incorporate prior knowledge, such as the association between genes, into the analysis to find better latent relationships? To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method (SNeCT). SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network constrained optimization function. SNeCT decomposition is applied to a tensor constructed from a large-scale multi-platform multi-cohort cancer dataset, PanCan12, constrained on a network built from the PathwayCommons database. The decomposed factor matrices are applied to stratify cancers, to search for the top-$k$ similar patients given a new patient, and to illustrate how the matrices can be used to identify significant genomic patterns in each patient. In the stratification test, the combined twelve-cohort data is clustered to form thirteen subclasses. The similarity of the top-$k$ patients to the query was high for 23 clinical features, including estrogen and progesterone receptor statuses of BRCA patients, with average precision values ranging from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also illustrate how the factor matrices can be used for identifying significant patterns for each patient. Resources are available at: https://github.com/leesael/GIFT
Dongjin Choi; Sael Lee. SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019, 17, 1785-1796.
AMA Style: Dongjin Choi, Sael Lee. SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2019; 17 (5):1785-1796.
Chicago/Turabian Style: Dongjin Choi; Sael Lee. 2019. "SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling." IEEE/ACM Transactions on Computational Biology and Bioinformatics 17, no. 5: 1785-1796.
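The top-$k$ similar-patient search in the SNeCT abstract can be pictured as a nearest-neighbor lookup over the rows of a patient factor matrix. The sketch below ranks rows by cosine similarity to a query vector. The abstract does not state which similarity measure the authors use, so cosine similarity, the function names, and the toy factor values are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(query, factor_rows, k):
    """Return indices of the k rows most similar to the query vector."""
    scored = [(cosine(query, row), idx) for idx, row in enumerate(factor_rows)]
    # Highest similarity first; ties broken by the smaller row index
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Hypothetical latent factors, one row per known patient
patient_factors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.9, 0.1],
]
new_patient = [1.0, 0.1, 0.0]
nearest = top_k_similar(new_patient, patient_factors, k=2)
```

With these toy values the two rows dominated by the first latent factor are returned, mirroring how patients with similar latent profiles would be retrieved for a new patient.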
How can we find patterns and anomalies in a tensor, i.e., a multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives at each time step? Finding patterns and anomalies in multi-dimensional data has many important applications, including building safety monitoring, health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard tensor decomposition results are not directly interpretable, and the few methods that propose to increase interpretability need to be made faster, more memory efficient, and more accurate for large and quickly generated data in the online environment. We propose two versions of a fast, accurate, and directly interpretable tensor decomposition method, called CTD, that is based on an efficient sampling method. The first is the static version of CTD, i.e., CTD-S, which provably guarantees up to 11× higher accuracy than the state-of-the-art method. CTD-S is also up to 2.3× faster and up to 24× more memory-efficient than the state-of-the-art method because it removes redundancy. The second is the dynamic version of CTD, i.e., CTD-D, which is the first interpretable dynamic tensor decomposition method ever proposed. It is up to 82× faster than the already fast CTD-S because it exploits the factors from the previous time step and reorders operations. With CTD, we demonstrate how the results can be effectively interpreted in online distributed denial of service (DDoS) attack detection and online troll detection.
Jungwoo Lee; Dongjin Choi; Lee Sael. CTD: Fast, accurate, and interpretable method for static and dynamic tensor decompositions. PLOS ONE 2018, 13, e0200579.
AMA Style: Jungwoo Lee, Dongjin Choi, Lee Sael. CTD: Fast, accurate, and interpretable method for static and dynamic tensor decompositions. PLOS ONE. 2018; 13 (7):e0200579.
Chicago/Turabian Style: Jungwoo Lee; Dongjin Choi; Lee Sael. 2018. "CTD: Fast, accurate, and interpretable method for static and dynamic tensor decompositions." PLOS ONE 13, no. 7: e0200579.
Sungchul Kim; Sael Lee; Hwanjo Yu. Indexing methods for efficient protein 3D surface search. Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics - DTMBIO '12 2012, 1.
AMA Style: Sungchul Kim, Sael Lee, Hwanjo Yu. Indexing methods for efficient protein 3D surface search. Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics - DTMBIO '12. 2012:1.
Chicago/Turabian Style: Sungchul Kim; Sael Lee; Hwanjo Yu. 2012. "Indexing methods for efficient protein 3D surface search." Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics - DTMBIO '12: 1.