Owing to the complexity of training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted in online settings without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained for real-world applications. The proposed model combines basic RL algorithms in online and offline use based on empirical bias–variance balances. Specifically, we exploit the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), in a DRL setting. The proposed balance of offline MC and online TD use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, balancing online and offline use without distinguishing on- and off-policy methods yields satisfactory results, whereas in complex tasks the combined method clearly improves the convergence speed and performance of a deep Q-network.
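The core idea of blending an offline MC target with an online TD target can be sketched in a few lines. This is a minimal tabular illustration, not the paper's actual deep Q-network implementation: the mixing weight `beta`, the learning rate `alpha`, and the toy states are all assumptions for demonstration.

```python
GAMMA = 0.9  # illustrative discount factor

def mc_return(rewards):
    """Discounted return over a finished episode (the offline MC target)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
    return g

def td_target(q, reward, s_next, actions):
    """One-step bootstrapped target (the online TD / Q-learning target)."""
    return reward + GAMMA * max(q[(s_next, a)] for a in actions)

def mixed_update(q, s, a, rewards_to_go, s_next, actions, alpha=0.1, beta=0.5):
    """Move Q(s, a) toward a beta-weighted mix of the MC and TD targets.

    rewards_to_go[0] is the immediate reward for (s, a); the full list
    covers the rest of the episode, so the MC target is computed offline.
    """
    target = (beta * mc_return(rewards_to_go)
              + (1.0 - beta) * td_target(q, rewards_to_go[0], s_next, actions))
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]
```

With `beta = 1` this reduces to pure offline MC, and with `beta = 0` to pure online TD; intermediate values trade the variance of full returns against the bias of bootstrapping.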
ChaYoung Kim. Deep Reinforcement Learning by Balancing Offline Monte Carlo and Online Temporal Difference Use Based on Environment Experiences. Symmetry 2020, 12(10), 1685.
The current study seeks to identify variables that affect the career decision-making of high school graduates with respect to the choice of university (re-)entrance in South Korea, where education carries great importance as a tool for self-cultivation and social prestige. For pattern recognition, we adopted a support vector machine with recursive feature elimination (SVM-RFE), applied to a large survey dataset of Korean college candidates. The SVM-RFE analysis results show that new enrollers were mostly affected by the mesosystems of interactions with parents, while re-enrollers were affected by the macrosystems of social awareness as well as by individual-system estimates of their own talent and aptitude. By predicting the variables that shape high school graduates' preparation for university re-entrance, the selected survey questions indicate why graduates make their university choice based on interactions with their parents or acquaintances. Along with these empirical results, implications for future research are also presented.
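SVM-RFE itself is a standard pipeline: a linear SVM supplies per-feature weights, and the lowest-weight feature is dropped repeatedly until the requested number remains. A minimal scikit-learn sketch follows; the synthetic data stands in for the survey items (e.g., parent-interaction or social-awareness questions), and the feature counts are illustrative, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n = 200
# Two informative features plus three noise features (stand-ins for
# survey variables; the real study used many more items).
X_inf = rng.normal(size=(n, 2))
y = (X_inf[:, 0] + X_inf[:, 1] > 0).astype(int)
X = np.hstack([X_inf, rng.normal(size=(n, 3))])

# A linear-kernel SVM exposes coef_, which RFE uses to rank features,
# eliminating one feature per step until two survive.
selector = RFE(SVC(kernel="linear"), n_features_to_select=2, step=1)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the surviving features
```

On this toy data the two informative columns survive the elimination, mirroring how the study surfaces the most predictive survey variables.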
Taejung Park; ChaYoung Kim. Predicting the Variables That Determine University (Re-)Entrance as a Career Development Using Support Vector Machines with Recursive Feature Elimination: The Case of South Korea. Sustainability 2020, 12(18), 7365.
The actor-critic method in reinforcement learning is a policy gradient algorithm with very broad applicability. In designing a classic control-based program model, grafting neural-network-based domain knowledge onto it can exploit the advantages of actor-critic. Accordingly, research on program model techniques that combine actor-critic with neural-network AI technology is active. Prior studies mainly use CNNs, and adopt LSTMs when more complex and intelligent behavior is desired. However, fields tied to the real world that demand substantial computing power prefer models with smaller resource footprints. Therefore, this study applies an actor-critic with a non-symmetric deep auto-encoder to the agent so that it can obtain optimal rewards while using only the most basic resources for the environment. A comparison between an existing algorithm with a simple deep neural network and the proposed algorithm shows that our model achieves somewhat better rewards with fewer resources.
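The structural idea, a deeper encoder paired with a deliberately shallower decoder, whose small latent code feeds both actor and critic heads, can be sketched with plain NumPy. All layer sizes and the random, untrained weights below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trainable linear layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))

obs_dim, latent, n_actions = 16, 4, 2
enc1, enc2 = linear(obs_dim, 32), linear(32, latent)  # two-layer encoder
dec = linear(latent, obs_dim)                         # single-layer (non-symmetric) decoder
actor_w = linear(latent, n_actions)
critic_w = linear(latent, 1)

def forward(obs):
    z = np.tanh(np.tanh(obs @ enc1) @ enc2)  # compressed state code
    recon = z @ dec                          # reconstruction (auto-encoder loss term)
    logits = z @ actor_w                     # actor head: action preferences
    value = (z @ critic_w).item()            # critic head: state-value estimate
    return recon, logits, value

recon, logits, value = forward(rng.normal(size=obs_dim))
```

Because the decoder is shallower than the encoder, the auto-encoder is cheaper than a mirrored design, which matches the paper's motivation of using fewer resources.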
ChaYoung Kim. Scheme of a Classic Control-Based Program Model with Non-Symmetric Deep Auto-Encoder of Actor-Critic. The Journal of Korean Institute of Information Technology 2020, 18(7), 15-20.
In terms of deep reinforcement learning (RL), exploration is highly significant in achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in an environment of sparse and binary rewards, such as the real-time online detection of network intrusions by verifying whether the network is "normal or anomalous." Prior studies have shown that a prioritized replay memory driven by the temporal difference error provides superior theoretical results. However, other implementations have shown that in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key challenge of hindsight experience replay, which uses additional buffers corresponding to different goals, inspires our objective. Therefore, we exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning from our method through an experimental comparison of a DQN and a deep deterministic policy gradient for discrete actions as well as continuous control in fully symmetric environments.
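The multiple-buffer mechanism can be sketched with standard-library containers: transitions are scattered at random across several replay buffers, and each training step samples a minibatch from one randomly chosen buffer. The buffer count, capacity, and batch size below are illustrative, not the paper's settings.

```python
import random
from collections import deque

class MultiRandomBuffers:
    """Several replay buffers; storage and sampling both pick a buffer at random."""

    def __init__(self, n_buffers=4, capacity=10_000):
        self.buffers = [deque(maxlen=capacity) for _ in range(n_buffers)]

    def store(self, transition):
        # Route each new transition to a uniformly random buffer.
        random.choice(self.buffers).append(transition)

    def sample(self, batch_size):
        # Pick one sufficiently full buffer at random, then sample
        # a minibatch uniformly from it.
        candidates = [b for b in self.buffers if len(b) >= batch_size]
        return random.sample(random.choice(candidates), batch_size)

buffers = MultiRandomBuffers(n_buffers=2, capacity=100)
for t in range(400):
    buffers.store((t, "state", "action", 0.0))  # (step, s, a, r) placeholder
batch = buffers.sample(8)
```

Each buffer thus holds a different random slice of experience, so successive minibatches mix differently correlated histories, the intended boost to exploration diversity.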
ChaYoung Kim; Jisu Park. Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning. Symmetry 2019, 11(11), 1352.
With the increasing application of reinforcement learning (RL), particularly the deep Q-learning algorithm, research organizations utilize it ever more frequently. The prediction of cyber vulnerabilities and the development of efficient real-time online network intrusion detection (NID) systems are progressing toward becoming RL-powered. An open issue in NID is the model design and prediction for real-time online data composed of series of time-related feature patterns. There have been concerns about the operation of deployed systems because cyber-attack scenarios vary continuously to circumvent NID. These issues relate to the significance of human interaction and the resulting decrease in verification accuracy. Therefore, we employ an RL approach that embeds a deep auto-encoder in the Q-network (DAEQ-N). The proposed DAEQ-N attempts to achieve maximum prediction accuracy in online learning systems that are fed continuous behavior patterns and trained with more significant weights, classifying each input as either "normal" or "anomalous."
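The DAEQ-N intuition, an auto-encoder compressing flow features into a latent code that drives a small Q-head whose two Q-values score the "normal" and "anomalous" actions, can be sketched structurally in NumPy. The weights here are random and untrained, and all dimensions are illustrative assumptions rather than the paper's network.

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, latent = 12, 3  # illustrative flow-feature and latent sizes

enc = rng.normal(scale=0.1, size=(feat_dim, latent))  # auto-encoder front end
dec = rng.normal(scale=0.1, size=(latent, feat_dim))  # reconstruction head
q_head = rng.normal(scale=0.1, size=(latent, 2))      # actions: 0 = normal, 1 = anomalous

def classify(flow):
    """Encode a flow-feature vector, score both actions, reconstruct the input."""
    z = np.tanh(flow @ enc)          # compressed representation of the flow
    q_values = z @ q_head            # Q-values for the two labels
    return int(np.argmax(q_values)), z @ dec

label, recon = classify(rng.normal(size=feat_dim))
```

In training, the reconstruction error would serve as the auto-encoder's loss while the Q-head is updated with the usual Q-learning target; this sketch only shows the shared-latent wiring.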
ChaYoung Kim; Jisu Park. Designing online network intrusion detection using deep auto-encoder Q-learning. Computers & Electrical Engineering 2019, 79, 106460.
ChaYoung Kim. Learning Less Random to Learn Better in Deep Reinforcement Learning with Noisy Parameters. Journal of Advanced Information Technology and Convergence 2019, 9(1), 127-134.
ChaYoung Kim. A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data. The Journal of Korean Institute of Information Technology 2018, 16(12), 25-30.
ChaYoung Kim. Feature Selection of SVM-RFE Combined with a TD Reinforcement Learning. The Journal of Korean Institute of Information Technology 2018, 16(10), 21-26.