Project Goal: Development of a framework for automating data quality improvement and ensuring fairness
Current Stage: idea
Project Goal: Development of an automated textual data augmentation algorithm based on a third-order tensor space model
Current Stage: idea
Project Goal: Development of a data support framework for generating deep learning models
Current Stage: idea
Recently, novelty detection with reconstruction along projection pathway (RaPP) has made progress toward leveraging hidden activation values. RaPP compares the input and its autoencoder reconstruction in hidden spaces to detect novel samples. Nevertheless, traditional autoencoders do not fully exploit this method. In this paper, we propose a new model, the Extended Autoencoder Model, which adds an adversarial component to the autoencoder to take full advantage of RaPP. The adversarial component matches the latent variables of the reconstructed input to those of the original input, so that novel samples yield high hidden reconstruction errors. The proposed model can be combined with variants of the autoencoder, such as the variational autoencoder or the adversarial autoencoder. We evaluated the effectiveness of the proposed model on various novelty detection datasets. Our results demonstrate that extended autoencoders outperform conventional autoencoders in detecting novelties with the RaPP method.
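The RaPP score sketched below follows the idea described in the abstract: the input and its reconstruction are both passed through the encoder, and the score sums the squared differences of their hidden activations. The toy fixed-weight layers, weight values, and helper names (`encode_activations`, `rapp_score`) are illustrative assumptions, not the authors' implementation.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(w, v):
    # w: list of rows; returns the matrix-vector product w @ v
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

# Toy 2-layer encoder and a decoder that approximately inverts it.
W1 = [[0.5, 0.1], [0.2, 0.4]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
DEC = [[2.2222, -0.5556], [-1.1111, 2.7778]]  # approximate inverse of W1

def encode_activations(x):
    """Collect the hidden activations along the projection pathway."""
    h1 = relu(linear(W1, x))
    h2 = relu(linear(W2, h1))
    return [h1, h2]

def rapp_score(x):
    """RaPP-style novelty score: sum of squared differences between the
    hidden activations of the input and of its reconstruction."""
    x_hat = linear(DEC, encode_activations(x)[-1])  # toy reconstruction
    score = 0.0
    for h, h_hat in zip(encode_activations(x), encode_activations(x_hat)):
        score += sum((a - b) ** 2 for a, b in zip(h, h_hat))
    return score
```

For an in-distribution point that the toy decoder reconstructs almost exactly, the score is near zero; a poor reconstruction inflates the hidden-space differences and hence the score.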
Seung Yeop Shin; Han-Joon Kim. Extended Autoencoder for Novelty Detection with Reconstruction along Projection Pathway. Applied Sciences 2020, 10, 4497.
AMA Style: Seung Yeop Shin, Han-Joon Kim. Extended Autoencoder for Novelty Detection with Reconstruction along Projection Pathway. Applied Sciences. 2020; 10(13):4497.
Chicago/Turabian Style: Seung Yeop Shin, and Han-Joon Kim. 2020. "Extended Autoencoder for Novelty Detection with Reconstruction along Projection Pathway." Applied Sciences 10, no. 13: 4497.
Researchers frequently use visualizations such as scatter plots to understand how random variables are related, because a single image conveys numerous pieces of information. Dependency measures have been widely used to detect dependencies automatically, but they capture only a few properties of the data, such as the strength and direction of the dependency. Motivated by advances in deep learning for vision, we believe that convolutional neural networks (CNNs) can learn to understand dependencies by analyzing visualizations, as humans do. In this paper, we propose a method that uses CNNs to extract dependency representations from 2D histograms. We carried out three types of experiments and found that CNNs can learn from visual representations. In the first experiment, we used a synthetic dataset to show that CNNs can perfectly classify eight types of dependency. We then showed that CNNs can predict correlations from 2D histograms of real datasets and visualized the learned dependency representation space. Finally, we applied our method and demonstrated that it outperforms the AutoLearn feature generation algorithm in average classification accuracy while generating half as many features.
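The image-like input described above can be sketched as follows: paired samples of two variables are binned into a fixed grid, producing the 2D histogram a CNN would consume. The bin count, value range, and function name are illustrative choices, not taken from the paper.

```python
def histogram2d(xs, ys, bins=8):
    """Bin paired (x, y) samples into a bins x bins grid over [0, 1)^2."""
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        i = min(int(x * bins), bins - 1)  # clamp 1.0 into the last bin
        j = min(int(y * bins), bins - 1)
        grid[i][j] += 1
    return grid

# A perfectly linear dependency concentrates all counts on the diagonal,
# giving the CNN a visually distinctive pattern to classify.
xs = [k / 100 for k in range(100)]
grid = histogram2d(xs, xs)
```

Different dependency types (linear, quadratic, independent, and so on) produce visually distinct grids, which is what lets a CNN treat dependency classification as an image problem.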
Taejun Kim; Han-Joon Kim. Dependence Representation Learning with Convolutional Neural Networks and 2D Histograms. Applied Sciences 2020, 10, 955.
AMA Style: Taejun Kim, Han-Joon Kim. Dependence Representation Learning with Convolutional Neural Networks and 2D Histograms. Applied Sciences. 2020; 10(3):955.
Chicago/Turabian Style: Taejun Kim, and Han-Joon Kim. 2020. "Dependence Representation Learning with Convolutional Neural Networks and 2D Histograms." Applied Sciences 10, no. 3: 955.
Fraud detection is becoming an integral part of business intelligence, as detecting fraud in a company's work processes is of great value. Fraud hinders accurate appraisal in the evaluation of an enterprise and is economically a loss factor for business. Previous fraud detection studies have been limited in performance because they learn fraud patterns only from the data as a whole. This paper proposes a novel method using hierarchical clusters based on deep neural networks to detect fine-grained frauds, as well as frauds across the whole data, in the work processes of job placement. The proposed method, Hierarchical Clusters-based Deep Neural Networks (HC-DNN), uses the anomaly characteristics of hierarchical clusters, pre-trained through an autoencoder, as the initial weights of deep neural networks to detect various frauds. HC-DNN has the advantage of improving performance and providing an explanation of the relationships among fraud types. In a cross-validation evaluation of fraud detection performance, the proposed method outperforms conventional methods. Moreover, from the viewpoint of explainable deep learning, the hierarchical cluster structure constructed through HC-DNN can represent the relationships among fraud types.
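The weight-transfer step described above can be sketched minimally: an autoencoder's encoder weights become the initial weights of a classifier's early layers, and only the output layer starts from scratch. The layer names, shapes, and random "pretrained" weights are illustrative assumptions, not the HC-DNN architecture.

```python
import copy
import random

random.seed(0)

def init_layer(n_in, n_out):
    """A small random weight matrix, standing in for a trained layer."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]

# "Pretrained" autoencoder: encoder (4 -> 2) and decoder (2 -> 4); in the
# paper the encoder would have been trained on hierarchical clusters.
autoencoder = {
    "encoder": init_layer(4, 2),
    "decoder": init_layer(2, 4),
}

# The classifier reuses the encoder weights as its first layer; only the
# output layer (e.g. 3 fraud-type classes) is freshly initialized.
classifier = {
    "hidden": copy.deepcopy(autoencoder["encoder"]),
    "output": init_layer(2, 3),
}
```

Starting the classifier from features the autoencoder already learned on the clusters is what lets the later supervised training focus on separating fraud types rather than learning representations from scratch.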
Jeongrae Kim; Han-Joon Kim; Hyoungrae Kim. Fraud detection for job placement using hierarchical clusters-based deep neural networks. Applied Intelligence 2019, 49, 2842-2861.
AMA Style: Jeongrae Kim, Han-Joon Kim, Hyoungrae Kim. Fraud detection for job placement using hierarchical clusters-based deep neural networks. Applied Intelligence. 2019; 49(8):2842-2861.
Chicago/Turabian Style: Jeongrae Kim, Han-Joon Kim, and Hyoungrae Kim. 2019. "Fraud detection for job placement using hierarchical clusters-based deep neural networks." Applied Intelligence 49, no. 8: 2842-2861.
This paper suggests a novel way of dramatically improving the Naïve Bayes text classifier with our semantic tensor space model for document representation. We aim to achieve perfect text classification through semantic Naïve Bayes learning, which incorporates semantic concept features into term feature statistics. To this end, Naïve Bayes learning is semantically augmented under the tensor space model, where the ‘concept’ space is treated as an independent space on a par with the ‘term’ and ‘document’ spaces and is built from concept-level informative Wikipedia pages associated with a given document corpus. Through extensive experiments on three popular document corpora (Reuters-21578, 20Newsgroups, and OHSUMED), we show that the proposed method not only outperforms recent deep learning-based classification methods but also achieves nearly perfect classification performance.
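The feature-augmentation idea described above can be sketched as a Naïve Bayes classifier whose per-document features combine raw terms with concept labels. The hand-assigned `CONCEPTS` mapping stands in for the Wikipedia-derived concept space; the toy documents and all names here are illustrative assumptions, not the paper's tensor space model.

```python
import math
from collections import Counter, defaultdict

# Stand-in for Wikipedia-derived concepts; in the paper this comes from
# concept-level informative Wikipedia pages, not a hand-written table.
CONCEPTS = {"stock": "finance", "bank": "finance", "goal": "sports"}

def features(tokens):
    # Term features plus concept features form one combined feature space.
    return tokens + ["C:" + t_concept
                     for t_concept in (CONCEPTS[t] for t in tokens if t in CONCEPTS)]

def train(docs):
    counts, totals, priors = defaultdict(Counter), Counter(), Counter()
    vocab = set()
    for tokens, label in docs:
        priors[label] += 1
        for f in features(tokens):
            counts[label][f] += 1
            totals[label] += 1
            vocab.add(f)
    return counts, totals, priors, vocab

def predict(model, tokens):
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    def logp(label):  # log prior + Laplace-smoothed log likelihoods
        s = math.log(priors[label] / n)
        for f in features(tokens):
            s += math.log((counts[label][f] + 1) / (totals[label] + len(vocab)))
        return s
    return max(priors, key=logp)

model = train([(["stock", "market"], "econ"), (["goal", "match"], "sport")])
```

Note that "bank" never appears in the training documents, yet its shared concept feature `C:finance` still pulls it toward the "econ" class; that transfer through the concept space is the point of the augmentation.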
Han-Joon Kim; Jiyun Kim; Jinseog Kim; Pureum Lim. Towards perfect text classification with Wikipedia-based semantic Naïve Bayes learning. Neurocomputing 2018, 315, 128-134.
AMA Style: Han-Joon Kim, Jiyun Kim, Jinseog Kim, Pureum Lim. Towards perfect text classification with Wikipedia-based semantic Naïve Bayes learning. Neurocomputing. 2018; 315:128-134.
Chicago/Turabian Style: Han-Joon Kim, Jiyun Kim, Jinseog Kim, and Pureum Lim. 2018. "Towards perfect text classification with Wikipedia-based semantic Naïve Bayes learning." Neurocomputing 315: 128-134.
This study uses text and data mining to investigate the relationship between the text patterns of annual reports published by US listed companies and their sales performance. Going a step beyond previous research, we show that although annual reports present only past and present financial information, analyzing their text content can identify sentences or patterns that indicate a company's future business performance. First, we examine the relationship between business risk factors and current business performance. For this purpose, we select companies belonging to two US SIC (Standard Industrial Classification) categories in the IT sector, 7370 and 7373, which include Twitter, Facebook, Google, Yahoo, and others. We manually collect sales and business risk information for a total of 54 companies that submitted an annual report (Form 10-K) in these two categories during the last three years. To establish a correlation between text patterns and sales performance, four hypotheses are set and tested. To test the hypotheses, we use statistical analysis of sales, statistical analysis of text sentences, sentiment analysis of sentences, clustering, dendrogram visualization, keyword extraction, and word-cloud visualization. The results show that text length is somewhat correlated with sales performance, and that patterns of frequently appearing words are correlated with sales performance. However, sentiment analysis indicates that the positive or negative tone of a report is not related to sales performance.
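The text-length correlation check described above amounts to computing Pearson's r between report length and sales. The sketch below uses toy figures; the numbers, units, and variable names are illustrative, not the study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

report_lengths = [120, 150, 180, 210, 260]  # e.g. thousands of words
sales = [1.0, 1.4, 1.7, 2.1, 2.8]           # e.g. billions of dollars
r = pearson(report_lengths, sales)
```

A value of r close to 1 on real data would support the hypothesis that longer reports accompany stronger sales, while r near 0 (as the study found for sentiment tone) indicates no linear relationship.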
Bangrae Lee; Jun-Hwan Park; Leena M. Kwon; Young-Ho Moon; Youngho Shin; Gyuseok Kim; Han-Joon Kim. About relationship between business text patterns and financial performance in corporate data. Journal of Open Innovation: Technology, Market, and Complexity 2018, 4, 1-18.
AMA Style: Bangrae Lee, Jun-Hwan Park, Leena M. Kwon, Young-Ho Moon, Youngho Shin, Gyuseok Kim, Han-Joon Kim. About relationship between business text patterns and financial performance in corporate data. Journal of Open Innovation: Technology, Market, and Complexity. 2018; 4(1):1-18.
Chicago/Turabian Style: Bangrae Lee, Jun-Hwan Park, Leena M. Kwon, Young-Ho Moon, Youngho Shin, Gyuseok Kim, and Han-Joon Kim. 2018. "About relationship between business text patterns and financial performance in corporate data." Journal of Open Innovation: Technology, Market, and Complexity 4, no. 1: 1-18.
Global competition has increased the importance of patents as a means to protect and strengthen technology and competitiveness. The purposes of our study were to identify which industries in South Korea are strong or weak in terms of patent applications and to suggest strategies that enable weak industries to become strong. To this end, we gathered statistics on the following seven variables: number of businesses, number of employees, research and development investment, number of full-time equivalent researchers, number of research institutions, domestic market size, and number of patent applications. In particular, to compare the ratio of patent applications with the ratio of domestic market size across industries, the industries were classified into three categories: strong-, weak-, and no-patent. Furthermore, data envelopment analysis (DEA) suggested strategies to strengthen patent applications in each industry. In the DEA, the number of patent applications was the output variable and the other six variables were the input variables. Our study will particularly assist industries in which patent protection is an important aspect of business.
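The DEA setup described above can be sketched for the single-output case: each unit's efficiency is its best achievable output-to-weighted-input ratio, normalized against the best unit, maximized over input-weight choices. Full CCR DEA solves a linear program per unit; the two-input grid search and toy data below are an illustrative simplification, not the study's six-input model.

```python
def dea_efficiency(units, steps=100):
    """units: list of (inputs, output) pairs with two inputs each.
    Returns one efficiency score in (0, 1] per unit, approximating the
    single-output CCR score by a grid search over input weights."""
    scores = [0.0] * len(units)
    for k in range(1, steps):
        v = (k / steps, 1 - k / steps)  # candidate input weights, summing to 1
        ratios = [y / (v[0] * x[0] + v[1] * x[1]) for x, y in units]
        best = max(ratios)              # the frontier under these weights
        for i, r in enumerate(ratios):
            # each unit keeps the weights most favorable to itself
            scores[i] = max(scores[i], r / best)
    return scores

# Toy data: unit A produces the same output as B with half the inputs,
# so A sits on the efficient frontier and B scores 0.5.
units = [((1.0, 1.0), 2.0), ((2.0, 2.0), 2.0)]
scores = dea_efficiency(units)
```

In the study's setting, a unit would be an industry, the output its patent applications, and the six statistics its inputs; an inefficient industry's gap to 1.0 indicates how much more patent output the frontier industries achieve from comparable resources.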
Bangrae Lee; Dongkyu Won; Jun-Hwan Park; Leena M. Kwon; Young-Ho Moon; Han-Joon Kim. Patent-Enhancing Strategies by Industry in Korea Using a Data Envelopment Analysis. Sustainability 2016, 8, 901.
AMA Style: Bangrae Lee, Dongkyu Won, Jun-Hwan Park, Leena M. Kwon, Young-Ho Moon, Han-Joon Kim. Patent-Enhancing Strategies by Industry in Korea Using a Data Envelopment Analysis. Sustainability. 2016; 8(9):901.
Chicago/Turabian Style: Bangrae Lee, Dongkyu Won, Jun-Hwan Park, Leena M. Kwon, Young-Ho Moon, and Han-Joon Kim. 2016. "Patent-Enhancing Strategies by Industry in Korea Using a Data Envelopment Analysis." Sustainability 8, no. 9: 901.