Since the Keccak algorithm was selected by the US National Institute of Standards and Technology (NIST) in 2015 as the standard SHA-3 hash algorithm to replace the currently used SHA-2 algorithm, various optimization methods have been studied in parallel and hardware environments. However, in a software environment, SHA-3 is much slower than the existing SHA-2 family; therefore, SHA-3 sees little use in constrained environments built on embedded devices, such as Wireless Sensor Network (WSN) environments. In this article, we propose a software optimization method that can be used generally to break through the speed limit of SHA-3. We combine the
Young Kim; Taek-Young Youn; Seog Seo. Chaining Optimization Methodology: A New SHA-3 Implementation on Low-End Microcontrollers. Sustainability 2021, 13, 4324.
AMA Style: Young Kim, Taek-Young Youn, Seog Seo. Chaining Optimization Methodology: A New SHA-3 Implementation on Low-End Microcontrollers. Sustainability. 2021; 13(8):4324.
Chicago/Turabian Style: Young Kim; Taek-Young Youn; Seog Seo. 2021. "Chaining Optimization Methodology: A New SHA-3 Implementation on Low-End Microcontrollers." Sustainability 13, no. 8: 4324.
With the advancement of 5G mobile telecommunication, various IoT (Internet of Things) devices exchange massive amounts of data over wireless networks. Since wireless communication is vulnerable to data leakage, the transmitted data should be encrypted with block ciphers. In addition, to encrypt massive amounts of data securely, it is essential to apply a secure mode of operation. Among them, CTR (CounTeR) mode is the most widely used in industrial applications. However, IoT devices have limited computing and memory resources compared to typical computers, so it is challenging to run computation-intensive cryptographic algorithms on them at high speed. Optimizing cryptographic operations on these IoT devices is therefore an essential step in building secure IoT-based service systems. Even though several ARX (Add-Rotate-XOR)-based ciphers have been proposed for efficient encryption on IoT devices, it is still necessary to improve encryption performance for smooth and secure IoT services. In this article, we propose the first parallel implementations of CTR mode of ARX-based ciphers: LEA (Lightweight Encryption Algorithm), HIGHT (high security and light weight), and revised CHAM on the ARMv8 platform, a popular microcontroller in various IoT applications. For the parallel implementation, we propose an efficient data parallelism technique and register scheduling, which maximizes the usage of vector registers. Through the proposed techniques, we process the maximum number of encryptions simultaneously by utilizing all vector registers. Namely, in the case of HIGHT and revised CHAM-64/128 (resp. LEA, revised CHAM-128/128, and CHAM-128/256), we can execute 48 (resp. 24) encryptions simultaneously. In addition, we optimize CTR mode by pre-computing and reusing the intermediate values of some initial rounds, exploiting the property that the nonce part of the CTR mode input is fixed during encryption. Through the pre-computation table, CTR mode is optimized up until round 4 in LEA, round 5 in HIGHT, and round 7 in revised CHAM. With the proposed parallel processing technique, our software provides
Jingyo Song; Seog Seo. Efficient Parallel Implementation of CTR Mode of ARX-Based Block Ciphers on ARMv8 Microcontrollers. Applied Sciences 2021, 11, 2548.
AMA Style: Jingyo Song, Seog Seo. Efficient Parallel Implementation of CTR Mode of ARX-Based Block Ciphers on ARMv8 Microcontrollers. Applied Sciences. 2021; 11(6):2548.
Chicago/Turabian Style: Jingyo Song; Seog Seo. 2021. "Efficient Parallel Implementation of CTR Mode of ARX-Based Block Ciphers on ARMv8 Microcontrollers." Applied Sciences 11, no. 6: 2548.
In edge computing services, edge devices collect data from a number of embedded devices, such as sensors and CCTVs (Closed-circuit Television), and communicate with application servers. Since a large portion of communication in edge computing services is conducted wirelessly, the transmitted data needs to be properly encrypted. Furthermore, since the application servers (resp. edge devices) are responsible for encrypting or decrypting a large amount of data from edge devices (resp. terminal devices), the cryptographic operations need to be optimized on both the server side and the edge device side. The confidentiality and integrity of data are essential for secure communication. In this paper, we present two versions of security software, one for the edge device side and one for the server side, for secure communication between them in an edge computing environment. Our software is web-based for universality, so it can be executed in any web browser. Our software makes use of the ESTATE (Energy efficient and Single-state Tweakable block cipher based MAC-Then-Encrypt) algorithm, which is a promising candidate in the NIST LWC (National Institute of Standards and Technology LightWeight Cryptography) competition and provides not only data confidentiality but also data authentication. We implement the ESTATE algorithm using Web Assembly for efficient use on edge devices, and optimize its performance using the properties of the underlying block cipher. Several methods are applied to operate the ESTATE algorithm efficiently. Conditional statements are used to XOR the extended tweak values during the operation of the ESTATE algorithm; to eliminate this unnecessary process, we expand and store the tweak values through pre-computation. The measured results of the ESTATE algorithm implemented with Web Assembly are compared with those of the reference C/C++ ESTATE implementation. The Web Assembly implementation of ESTATE is measured in the Chrome, Firefox, and Microsoft Edge web browsers. For efficiency on the server side, we make use of OpenCL, a parallel computing framework, in order to process a large amount of data simultaneously. When implementing with OpenCL, using conditional statements causes performance degradation, so we eliminated the conditional statements using loop unrolling. In addition, the OpenCL implementation moves the data to be encrypted to local memory because local memory is fast. TweAES-128 and TweAES-128-6, which have the same structure as the AES algorithm, can use the previously studied T-table method. The 16-byte input value is also processed in parallel. Since the T-table method may be vulnerable to cache-timing attacks, it is operated safely by applying the previously studied T-table shuffling method. Our software covers the necessary security services from edge devices to servers in edge computing services and can be easily used on various types of edge computing devices because it is entirely web-based.
Bosun Park; Seog Seo. Efficient Implementation of NIST LWC ESTATE Algorithm Using OpenCL and Web Assembly for Secure Communication in Edge Computing Environment. Sensors 2021, 21, 1987.
AMA Style: Bosun Park, Seog Seo. Efficient Implementation of NIST LWC ESTATE Algorithm Using OpenCL and Web Assembly for Secure Communication in Edge Computing Environment. Sensors. 2021; 21(6):1987.
Chicago/Turabian Style: Bosun Park; Seog Seo. 2021. "Efficient Implementation of NIST LWC ESTATE Algorithm Using OpenCL and Web Assembly for Secure Communication in Edge Computing Environment." Sensors 21, no. 6: 1987.
Password-Based Key Derivation Function 2 (PBKDF2) is a widely used cryptographic algorithm for deriving secure keys from a password on various occasions. For example, it is used for file encryption and in the implementation of authentication systems. However, the derived key has lower entropy than a general cryptographic key, so its use is limited. To compensate for this, the iteration count of PBKDF2 should be increased. As the number of repetitive tasks increases, the entropy of the derived key increases, but it takes more time to generate the derived key. We present various optimization methods for PBKDF2. The main idea of our proposed method is to reduce redundant block operations and to optimize the internal process of the underlying Pseudo Random Function (PRF). In other words, we integrate several redundant operations and make full use of the constant values used in PBKDF2. We use two HMAC algorithms as the PRF of PBKDF2: one using the SHA-2 family and one using the LSH family (the SHA-2 family is the most widely used family of hash functions, and the LSH family is a hash function family recently developed in South Korea). With our techniques, our implementations outperform the Korea Internet & Security Agency (KISA) implementation by 121.26%, 325.91%, and 231.89% when using SHA256, LSH256, and LSH512, respectively, and also outperform the OpenSSL implementation by 39.59% when using SHA512. In addition, we show that the internal process of PBKDF2 can be computed independently. With our multi-thread technique, our PBKDF2 implementations outperform the KISA implementation by 2,152.66%, 1,986.85%, and 1,591.36% when using SHA256, LSH256, and LSH512, respectively, and our PBKDF2-HMAC-SHA512 implementation outperforms the OpenSSL implementation by 523.57%. With our proposed implementation techniques, higher security can be achieved with more iterations. Furthermore, our optimization techniques can be easily extended to optimize the performance of PBKDF2 on GPGPUs and embedded devices.
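The remark that the internal process of PBKDF2 can be computed independently can be illustrated with a minimal Python sketch. It assumes the standard PBKDF2 definition (each output block T_i depends only on the password, the salt, the iteration count, and its own index i); the function names `pbkdf2_block` and `pbkdf2_parallel` are hypothetical and this is not the authors' implementation. In CPython the thread pool mainly shows the structure of the computation rather than a real speedup.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def pbkdf2_block(password, salt, iterations, index, hash_name="sha256"):
    """Compute one PBKDF2 output block T_index (index starts at 1)."""
    u = hmac.new(password, salt + index.to_bytes(4, "big"), hash_name).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hash_name).digest()
        t ^= int.from_bytes(u, "big")  # T_i = U_1 xor U_2 xor ... xor U_c
    return t.to_bytes(len(u), "big")


def pbkdf2_parallel(password, salt, iterations, dklen, hash_name="sha256"):
    """Derive dklen bytes by computing every block in its own task."""
    hlen = hashlib.new(hash_name).digest_size
    n_blocks = -(-dklen // hlen)  # ceiling division
    with ThreadPoolExecutor() as pool:
        # No block needs the result of another block, so order of
        # execution does not matter; map() returns them in index order.
        blocks = list(pool.map(
            lambda i: pbkdf2_block(password, salt, iterations, i, hash_name),
            range(1, n_blocks + 1),
        ))
    return b"".join(blocks)[:dklen]
```

Calling `pbkdf2_parallel(b"password", b"salt", 1000, 80)` should return the same bytes as `hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, 80)`.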
Hojin Choi; Seog Chung Seo. Optimization of PBKDF2 Using HMAC-SHA2 and HMAC-LSH Families in CPU Environment. IEEE Access 2021, 9, 40165-40177.
AMA Style: Hojin Choi, Seog Chung Seo. Optimization of PBKDF2 Using HMAC-SHA2 and HMAC-LSH Families in CPU Environment. IEEE Access. 2021; 9:40165-40177.
Chicago/Turabian Style: Hojin Choi; Seog Chung Seo. 2021. "Optimization of PBKDF2 Using HMAC-SHA2 and HMAC-LSH Families in CPU Environment." IEEE Access 9: 40165-40177.
Since the Rijndael algorithm was selected as the Advanced Encryption Standard (AES) by NIST, optimization research for AES has been actively conducted on various IoT-based processors. For the 8-bit AVR environment, the LIGHT version of Fast AES CTR-mode Encryption (FACE-LIGHT) was proposed at the ICISC’2019 conference. However, in a Wireless Sensor Network environment, where sessions change frequently, FACE-LIGHT does not seem efficient in terms of available memory and the cost of generating a pre-computation table. In this article, we present a new column-wise implementation. Unlike the previous best AES implementations, our proposed implementation on an 8-bit AVR microcontroller combines the SubBytes, ShiftRows, and MixColumns operations and optimizes the operation speed through efficient register scheduling. Our constant-time implementation uses a significantly smaller table than FACE-LIGHT on an 8-bit AVR microcontroller, achieving 2,251, 2,706, and 3,160 clock cycles when encrypting 128-bit data for each of the three security levels. In particular, our 256-bit security level AES implementation is, as far as we know, the fastest AES implementation on 8-bit AVR microcontrollers. Finally, we apply our implementation to CounTeR-mode_Deterministic Random Bit Generator (CTR_DRBG), one of the higher-level algorithms built on a symmetric-key algorithm, to demonstrate the generality of our optimization technique across various operating modes of AES.
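The idea of combining SubBytes, ShiftRows, and MixColumns column by column can be sketched in Python as follows. This is only an illustration of why the merge is possible (each output column depends on exactly four input bytes), under the assumption of the usual column-major AES state layout; it does not reproduce the authors' AVR assembly or register scheduling, and the helper names are hypothetical.

```python
def xtime(a):
    """Multiply by x in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def _rotl8(v, n):
    return ((v << n) | (v >> (8 - n))) & 0xFF


def _sbox_entry(x):
    # Multiplicative inverse (0 maps to 0) followed by the affine map.
    inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def mix_column(a0, a1, a2, a3):
    return (
        gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
        a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
        a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
        gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
    )


def round_stepwise(state):
    """Reference: three separate passes over the 16-byte state."""
    s = [SBOX[b] for b in state]                                        # SubBytes
    s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]  # ShiftRows
    out = []
    for c in range(4):                                                  # MixColumns
        out.extend(mix_column(*s[4 * c:4 * c + 4]))
    return out


def round_columnwise(state):
    """Merged: each output column is built directly from 4 input bytes."""
    out = []
    for c in range(4):
        col = [SBOX[state[4 * ((c + r) % 4) + r]] for r in range(4)]
        out.extend(mix_column(*col))
    return out
```

Both functions take a list of 16 bytes with `state[4 * column + row]` indexing and omit AddRoundKey; the merged version never materializes the intermediate SubBytes or ShiftRows states, which is the property a register-limited target can exploit.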
Youngbeom Kim; Seog Chung Seo. Efficient Implementation of AES and CTR_DRBG on 8-Bit AVR-Based Sensor Nodes. IEEE Access 2021, 9, 30496-30510.
AMA Style: Youngbeom Kim, Seog Chung Seo. Efficient Implementation of AES and CTR_DRBG on 8-Bit AVR-Based Sensor Nodes. IEEE Access. 2021; 9(99):30496-30510.
Chicago/Turabian Style: Youngbeom Kim; Seog Chung Seo. 2021. "Efficient Implementation of AES and CTR_DRBG on 8-Bit AVR-Based Sensor Nodes." IEEE Access 9, no. 99: 30496-30510.
We propose a compact PRESENT implementation on embedded processors. To obtain high performance, the PRESENT operations, including the add-round-key, substitute layer, and permutation layer operations, are efficiently implemented on the target embedded processors. The novel PRESENT implementations support the Electronic Code Book (ECB) and Counter (CTR) modes. The implementation of CTR is improved by using pre-computation for one substitute layer, two diffusion layers, and two add-round-key operations. Finally, compact PRESENT on the target microcontrollers achieved 504.2, 488.2, 488.7, and 491.6 clock cycles per byte for the PRESENT-ECB, 16-bit PRESENT-CTR (RAM-based implementation), 16-bit PRESENT-CTR (ROM-based implementation), and 32-bit PRESENT-CTR (ROM-based implementation) modes of operation, respectively. Compared with the former implementation, the execution time is improved by 62.6%, 63.8%, 63.7%, and 63.5% for the PRESENT-ECB, 16-bit PRESENT-CTR (RAM-based implementation), 16-bit PRESENT-CTR (ROM-based implementation), and 32-bit PRESENT-CTR (ROM-based implementation) modes of operation, respectively.
Hyeokdong Kwon; Youngbeom Kim; Seog Seo; Hwajeong Seo. High-Speed Implementation of PRESENT on AVR Microcontroller. Mathematics 2021, 9, 374.
AMA Style: Hyeokdong Kwon, Youngbeom Kim, Seog Seo, Hwajeong Seo. High-Speed Implementation of PRESENT on AVR Microcontroller. Mathematics. 2021; 9(4):374.
Chicago/Turabian Style: Hyeokdong Kwon; Youngbeom Kim; Seog Seo; Hwajeong Seo. 2021. "High-Speed Implementation of PRESENT on AVR Microcontroller." Mathematics 9, no. 4: 374.
The Keccak algorithm was selected by NIST in 2015 as the standard SHA-3 hash algorithm to replace the currently used SHA-2 algorithm. Despite SHA-3’s improved security compared to SHA-2, its low performance in software limits its wide use. In this paper, we propose an optimized SHA-3 implementation on 8-bit AVR microcontrollers (MCUs), which are dominantly used for sensor devices in WSNs. Until now, there have been only a few studies on the optimization of SHA-3 in spite of its importance for security. Furthermore, it is very challenging to optimize hash functions, especially SHA-3, on 8-bit AVR MCUs. This is because the internal state of SHA-3 is 1,600 bits, which is much larger than the internal state of symmetric algorithms such as AES and ARIA (typically 128 bits). In other words, it is difficult to hold the whole of SHA-3’s internal state in the registers of AVR MCUs, which incurs heavy memory accesses during computation. We therefore analyzed the structure of the SHA-3 algorithm and found that each lane of the internal state can be processed independently in each step of SHA-3. Using this fact, we propose an optimization method that efficiently reduces the number of memory accesses to the internal state. With the proposed method, our implementation of SHA3-256 achieves around 25.0% performance improvement when hashing a 500-byte message compared with the previous best work on 8-bit AVR MCUs. To the best of our knowledge, our software is the fastest SHA-3 implementation on AVR platforms to date. In addition, the proposed optimization method can be easily extended to other embedded MCUs such as 16-bit MSP430, 32-bit RISC-V, and ARM-based MCUs.
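To make the lane structure concrete, the following Python sketch models the 1,600-bit state as 5 x 5 lanes of 64 bits and implements only the theta step of Keccak-f[1600] as defined in FIPS 202. It is an illustrative sketch, not the authors' AVR code: once the five column parities are known, every lane is updated by a single XOR that does not depend on any other lane. The names `rol64` and `theta` are chosen here for illustration.

```python
MASK64 = (1 << 64) - 1
STATE_BITS = 5 * 5 * 64  # 25 lanes of 64 bits = 1,600 bits (200 bytes)


def rol64(v, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def theta(state):
    """Theta step on state[x][y], a 5x5 grid of 64-bit lanes."""
    # Column parities: XOR of the five lanes that share the same x.
    c = [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4]
         for x in range(5)]
    # Per-column correction word built from the two neighbouring columns.
    d = [c[(x - 1) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
    # Each lane is now updated on its own: one XOR with d[x].
    return [[state[x][y] ^ d[x] for y in range(5)] for x in range(5)]
```

For scale, the 200-byte state compares with the 32 general-purpose 8-bit registers of an AVR core, which is why the order in which lanes are loaded and stored matters so much on that platform.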
Youngbeom Kim; Hojin Choi; Seog Chung Seo. Efficient Implementation of SHA-3 Hash Function on 8-Bit AVR-Based Sensor Nodes. Transactions on Petri Nets and Other Models of Concurrency XV 2021, 140-154.
AMA Style: Youngbeom Kim, Hojin Choi, Seog Chung Seo. Efficient Implementation of SHA-3 Hash Function on 8-Bit AVR-Based Sensor Nodes. Transactions on Petri Nets and Other Models of Concurrency XV. 2021:140-154.
Chicago/Turabian Style: Youngbeom Kim; Hojin Choi; Seog Chung Seo. 2021. "Efficient Implementation of SHA-3 Hash Function on 8-Bit AVR-Based Sensor Nodes." Transactions on Petri Nets and Other Models of Concurrency XV: 140-154.
Password-Based Key-Derivation Function 2 (PBKDF2) is commonly employed to derive secure keys from a password in real-life applications such as file encryption and authentication systems. Nevertheless, owing to the limited entropy of the password, the security of the generated keys is lower than that of normally generated keys. To address this issue, the number of iterative operations in PBKDF2 may be increased. However, the higher the number of iterative operations, the more time it takes to generate the key. This paper presents various techniques for optimizing the performance of PBKDF2. The main idea of our proposed methods is to reduce redundant block operations and to optimize the Pseudo Random Function (PRF) itself by combining operations and making full use of fixed values within PBKDF2. As the underlying PRF, we utilize two algorithms: Hash-based Message Authentication Code-Secure Hash Algorithm 256 (HMAC-SHA256) and HMAC-Lightweight Secure Hash 256 (HMAC-LSH256) (SHA256 is the most widely used hash function, and LSH256 is a hash function recently developed in South Korea). With the proposed techniques, the proposed implementation of PBKDF2-HMAC-SHA256 provides a performance enhancement of about 135.27% over the reference implementation provided by the Korea Internet & Security Agency (KISA) and about 80.21% over OpenSSL. For PBKDF2-HMAC-LSH256, the proposed implementation provides a large performance enhancement of about 330.48% over the reference implementation provided by KISA. With the proposed implementation, more iterations become possible for higher security. Furthermore, the proposed techniques can be used to optimize PBKDF2 performance on embedded MCUs.
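One well-known way to exploit fixed values within PBKDF2 is to hash the HMAC inner and outer key pads once and reuse the saved hash states in every iteration, because the password never changes during derivation. The Python sketch below shows this under the standard HMAC and PBKDF2 definitions; it is an assumption that this corresponds to part of what the abstract describes, and `pbkdf2_precomputed` is a hypothetical name rather than the authors' API.

```python
import hashlib


def pbkdf2_precomputed(password, salt, iterations, dklen, hash_name="sha256"):
    """PBKDF2-HMAC with the key-pad blocks hashed only once."""
    probe = hashlib.new(hash_name)
    block_size, hlen = probe.block_size, probe.digest_size

    # Standard HMAC key preparation: hash long keys, then zero-pad.
    key = password
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\x00")

    # These two states depend only on the password, so they are computed
    # once instead of twice per HMAC call.
    inner = hashlib.new(hash_name, bytes(b ^ 0x36 for b in key))
    outer = hashlib.new(hash_name, bytes(b ^ 0x5C for b in key))

    def prf(message):
        i = inner.copy()          # resume from the saved inner state
        i.update(message)
        o = outer.copy()          # resume from the saved outer state
        o.update(i.digest())
        return o.digest()

    out = b""
    index = 1
    while len(out) < dklen:
        u = prf(salt + index.to_bytes(4, "big"))
        t = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            u = prf(u)
            t ^= int.from_bytes(u, "big")
        out += t.to_bytes(hlen, "big")
        index += 1
    return out[:dklen]
```

With SHA-256, a naive HMAC call processes four compression-function blocks per iteration (two pads, the 32-byte message, and the inner digest); reusing the pad states leaves two, which is where the saving comes from.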
Hojin Choi; Seog Chung Seo. Optimization of PBKDF2-HMAC-SHA256 and PBKDF2-HMAC-LSH256 in CPU Environments. Transactions on Petri Nets and Other Models of Concurrency XV 2020, 321-333.
AMA Style: Hojin Choi, Seog Chung Seo. Optimization of PBKDF2-HMAC-SHA256 and PBKDF2-HMAC-LSH256 in CPU Environments. Transactions on Petri Nets and Other Models of Concurrency XV. 2020:321-333.
Chicago/Turabian Style: Hojin Choi; Seog Chung Seo. 2020. "Optimization of PBKDF2-HMAC-SHA256 and PBKDF2-HMAC-LSH256 in CPU Environments." Transactions on Petri Nets and Other Models of Concurrency XV: 321-333.
Recently, FACE-LIGHT was proposed for fast AES encryption on 8-bit AVR MCUs. FACE-LIGHT is an extended version of the Fast AES-CTR mode Encryption (FACE) method, which was first proposed for high-end processors, and it is tailored for performance on 8-bit AVR MCUs. Even though it achieves high performance, it suffers from the overhead caused by table generation. When the number of blocks is less than a certain number, the table generation overhead is greater than the gains from using the generated table during encryption. Moreover, FACE-LIGHT needs to generate new tables whenever the Initial Vector (IV) is changed, so frequent table regeneration results in significant performance degradation. In this paper, we present an efficient implementation of the AES block cipher on 8-bit AVR Microcontrollers (MCUs). Our method combines the ShiftRows, SubBytes, and MixColumns operations into one in a column-wise fashion and makes full use of the registers of AVR MCUs for high performance. With handcrafted assembly code, our implementation achieves 2,251, 2,706, and 3,160 clock cycles for 128-bit, 192-bit, and 256-bit security, respectively. Our implementation outperforms FACE-LIGHT in overall performance, including table generation and block encryptions, up to around 1,850 blocks (resp. 15,000 blocks) for 128-bit (resp. 192-bit) security. For 256-bit security, our implementation always outperforms FACE-LIGHT, even without counting the table generation time. Our implementation operates in constant time and, unlike FACE-LIGHT, can be used not only for CTR mode but also for CBC mode.
Youngbeom Kim; Seog Chung Seo. An Efficient Implementation of AES on 8-Bit AVR-Based Sensor Nodes. Transactions on Petri Nets and Other Models of Concurrency XV 2020, 276-290.
AMA Style: Youngbeom Kim, Seog Chung Seo. An Efficient Implementation of AES on 8-Bit AVR-Based Sensor Nodes. Transactions on Petri Nets and Other Models of Concurrency XV. 2020:276-290.
Chicago/Turabian Style: Youngbeom Kim; Seog Chung Seo. 2020. "An Efficient Implementation of AES on 8-Bit AVR-Based Sensor Nodes." Transactions on Petri Nets and Other Models of Concurrency XV: 276-290.
We implement a cryptographic library using Web Assembly. Web Assembly is expected to show better performance than JavaScript. The proposed library provides a comprehensive set of algorithms, including revised CHAM, Hash Message Authentication Code (HMAC), and ECDH using the NIST P-256 curve, to provide confidentiality, data authentication, and key agreement functions. To optimize the performance of revised CHAM in the proposed library, we apply an existing four-round combining method and additionally propose a precomputation method for CHAM-64/128. The proposed revised CHAM showed performance improvements of approximately 2.06 times (CHAM-64/128), 2.13 times (CHAM-128/128), and 2.63 times (CHAM-128/256) in Web Assembly compared to JavaScript. In addition, CHAM-64/128 with the precomputation method performed approximately 1.2 times better than the existing CHAM-64/128. For ECDH using the P-256 curve, a naive implementation is vulnerable to side-channel attacks (SCA), e.g., simple power analysis (SPA) and timing analysis (TA). Thus, we apply an SPA- and TA-resistant method for scalar multiplication, which is the core operation in ECDH. We present atomic block-based scalar multiplication by revising previous work. Existing atomic blocks show a performance overhead of 55%, 23%, and 37%, whereas the proposed atomic blocks, which use only P=(X,Y,Z), show a performance overhead of 18%, 6%, and 11%. The proposed Web Assembly-based crypto library provides enhanced performance and resistance against SCA; thus, it can be used in various web-based applications.
Bosun Park; Jingyo Song; Seog Seo. Efficient Implementation of a Crypto Library Using Web Assembly. Electronics 2020, 9, 1839.
AMA Style: Bosun Park, Jingyo Song, Seog Seo. Efficient Implementation of a Crypto Library Using Web Assembly. Electronics. 2020; 9(11):1839.
Chicago/Turabian Style: Bosun Park; Jingyo Song; Seog Seo. 2020. "Efficient Implementation of a Crypto Library Using Web Assembly." Electronics 9, no. 11: 1839.
With the development of information and communication technology, various types of Internet of Things (IoT) devices have been widely used for convenient services. Many users request various services from servers through their IoT devices. Thus, the amount of users’ personal information that servers need to protect has dramatically increased. To protect users’ personal information quickly and safely, it is necessary to optimize the speed of the encryption process. Since it is difficult for a server to provide its basic services while encrypting a large amount of data on the CPU, several parallel optimization methods using Graphics Processing Units (GPUs) have been considered. In this paper, we propose several optimization techniques using GPUs for efficient implementation of lightweight block cipher algorithms on the server side. As the target algorithms, we select high security and light weight (HIGHT), Lightweight Encryption Algorithm (LEA), and revised CHAM, which are Add-Rotate-XOR (ARX)-based block ciphers, because they are widely used on IoT devices. We utilize the features of the counter (CTR) operation mode to reduce unnecessary memory copying and operations in the GPU environment. In addition, we optimize memory usage by making full use of the GPU’s on-chip memory, such as registers and shared memory, and implement the core function of each target algorithm with inline PTX assembly code to maximize performance. With our optimization methods and handcrafted PTX code, we achieve encryption throughput of 468, 2593, and 3063 Gbps for HIGHT, LEA, and revised CHAM, respectively, on an NVIDIA RTX 2070 GPU. We also present optimized implementations of the Counter Mode Based Deterministic Random Bit Generator (CTR_DRBG), one of the most widely used deterministic random bit generators, to provide a large amount of random data to the connected IoT devices. We apply several optimization techniques to maximize the performance of CTR_DRBG and achieve performance improvements of 52.2, 24.8, and 34.2 times compared with a CPU-side CTR_DRBG implementation when HIGHT-64/128, LEA-128/128, and CHAM-128/128 are used as the underlying block cipher of CTR_DRBG, respectively.
SangWoo An; Youngbeom Kim; Hyeokdong Kwon; Hwajeong Seo; Seog Chung Seo. Parallel Implementations of ARX-Based Block Ciphers on Graphic Processing Units. Mathematics 2020, 8, 1894.
AMA Style: SangWoo An, Youngbeom Kim, Hyeokdong Kwon, Hwajeong Seo, Seog Chung Seo. Parallel Implementations of ARX-Based Block Ciphers on Graphic Processing Units. Mathematics. 2020; 8(11):1894.
Chicago/Turabian Style: SangWoo An; Youngbeom Kim; Hyeokdong Kwon; Hwajeong Seo; Seog Chung Seo. 2020. "Parallel Implementations of ARX-Based Block Ciphers on Graphic Processing Units." Mathematics 8, no. 11: 1894.
In Internet of Things services, various types of embedded devices are employed. Among them, ARM-based devices have been widely used as clients. Since these devices communicate with each other wirelessly, transmitted data needs to be protected with secure block ciphers. Recently, several Add-Rotate-XOR (ARX)-based block ciphers, such as HIGHT and revised CHAM, have been developed for efficient encryption on embedded devices. In this paper, we present secure and fast implementations of the ARX-based block ciphers HIGHT and revised CHAM on ARMv8 platforms. For performance, we apply task and data parallel processing by fully utilizing the NEON architecture embedded in ARMv8 platforms. Typically, the round key must be duplicated in NEON registers so that the NEON architecture can process multiple data blocks simultaneously. In our implementations, we propose an optimal approach that minimizes the cost of round key duplication, together with efficient key scheduling for task parallelism. For secure implementation, we develop efficient software countermeasures against realistic fault attack models, based on intra-instruction redundancy. In particular, we propose an enhanced random shuffling method, which is the core operation of the proposed countermeasure. With the proposed random shuffling method, we can significantly reduce the overhead of preventing fault attacks. We present two versions of the software: a highly fast ( $HF$ ) version without fault attack countermeasures and a highly secure ( $HS$ ) version protected against fault attacks. Compared with the reference software, $HF$ with HIGHT, revised CHAM-64/128, CHAM-128/128, and CHAM-128/256 provides about 8 times, 38 times, 13 times, and 13 times better performance, respectively. 
Compared with previous best results having fault attack countermeasure, $HS$ with HIGHT, revised CHAM-64/128, CHAM-128/128, and CHAM-128/256 provides about 50%, 30%, 80%, and 70% of enhanced performance, respectively. Both our $HS$ and $HF$ achieve better performance and higher security compared with related works.
Jingyo Song; Seog Chung Seo. Secure and Fast Implementation of ARX-Based Block Ciphers Using ASIMD Instructions in ARMv8 Platforms. IEEE Access 2020, 8, 1-1.
AMA Style: Jingyo Song, Seog Chung Seo. Secure and Fast Implementation of ARX-Based Block Ciphers Using ASIMD Instructions in ARMv8 Platforms. IEEE Access. 2020; 8:1-1.
Chicago/Turabian Style: Jingyo Song; Seog Chung Seo. 2020. "Secure and Fast Implementation of ARX-Based Block Ciphers Using ASIMD Instructions in ARMv8 Platforms." IEEE Access 8: 1-1.
Content-Centric Networking (CCN) is one of the emerging paradigms for the future Internet, which shifts the communication paradigm from host-centric to data-centric. In CCN, contents are delivered by their unique names, and a public-key-based signature is built into data packets to verify the authenticity and integrity of the contents. To date, research has tried to accelerate the validation of the given data packets, but existing techniques were designed to improve the performance of content verification from the requester’s viewpoint. However, we need to efficiently verify the validity of data packets in each forwarding engine, since the transmission of invalid packets influences not only security but also performance, which can lead to a DDoS (Distributed Denial of Service) attack on CCN. For example, an adversary can inject a number of meaningless packets into CCN to consume the forwarding engines’ cache and network bandwidth. In this paper, a novel authentication architecture is introduced, which can support faster forwarding by accelerating the performance of data validation in forwarding engines. Since all forwarding engines verify data packets, our authentication architecture can eliminate invalid packets before they are injected into other CCN nodes. The architecture utilizes public-key based authentication algorithms to support public verifiability and non-repudiation, but a novel technique is proposed in this paper to reduce the overhead from using PKI for verifying public keys used by forwarding engines and end-users in the architecture. The main merit of this work is in improving the performance of data-forwarding in CCN regardless of the underlying public-key validation mechanism, such as PKI, by reducing the number of accesses to the mechanism. 
Unlike existing approaches that forgo some useful features of the Naive CCN for higher performance, the proposed technique is the only architecture that supports all useful features of the Naive CCN.
Taek-Young Youn; Joongheon Kim; David Mohaisen; Seog Seo. Faster Data Forwarding in Content-Centric Network via Overlaid Packet Authentication Architecture. Sustainability 2020, 12, 8746.
AMA Style: Taek-Young Youn, Joongheon Kim, David Mohaisen, Seog Seo. Faster Data Forwarding in Content-Centric Network via Overlaid Packet Authentication Architecture. Sustainability. 2020; 12(20):8746.
Chicago/Turabian Style: Taek-Young Youn; Joongheon Kim; David Mohaisen; Seog Seo. 2020. "Faster Data Forwarding in Content-Centric Network via Overlaid Packet Authentication Architecture." Sustainability 12, no. 20: 8746.
With the development of the Internet of Things (IoT), the amount of data exchanged through networks has significantly increased. To secure sensitive data containing users’ personal information, it is necessary to encrypt the transmitted data. Since resource-constrained wireless devices are typically used for IoT services, it is necessary to optimize the performance of cryptographic algorithms, which are computation-intensive tasks. In this paper, we present efficient implementations of ARX-based Korean block ciphers (HIGHT and LEA) with the CounTeR (CTR) mode of operation, and of CTR_DRBG, one of the most widely used DRBGs (Deterministic Random Bit Generators), on 8-bit AVR Microcontrollers (MCUs). We select 8-bit AVR MCUs as the target platform because they are widely used in various types of IoT devices. We present an efficient implementation of HIGHT and LEA that makes full use of the property of CTR mode whereby the nonce value is fixed and only the counter value changes during encryption. In our implementation, the cost of the additional function calls incurred by generating the look-up table is reduced. With respect to CTR_DRBG, we identified several parts that do not need to be computed online; precomputing those parts offline and using them online improves the performance of CTR_DRBG. Furthermore, we applied several optimization techniques that make full use of the target devices’ characteristics, using AVR assembly code on 8-bit AVR MCUs. Our proposed table generation method reduces the cost of building a precomputation table by around 6.7% and 9.1% for LEA and HIGHT, respectively. The proposed implementations of LEA and HIGHT with CTR mode on 8-bit AVR MCUs improve performance by 6.3% and 3.8%, respectively, compared with the previous best results. Our implementations are the fastest among LEA and HIGHT implementations on 8-bit AVR MCUs. 
In addition, the proposed CTR_DRBG implementations on AVR provide better performance by 37.2% and 8.7% when the underlying block cipher is LEA and HIGHT, respectively.
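The CTR-mode property mentioned in the abstract (fixed nonce, changing counter) can be illustrated with a minimal Python sketch. It assumes a toy four-word ARX round in the style of LEA; the round keys come from a hypothetical stand-in generator (`make_round_keys`), and the cached values only approximate the idea of a precomputation table. It is not the paper's AVR implementation and not real LEA or HIGHT.

```python
MASK = 0xFFFFFFFF  # work on 32-bit words

def rol(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def ror(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK

def toy_round(state, rk):
    """One LEA-style ARX round on four 32-bit words (toy, not real LEA)."""
    x0, x1, x2, x3 = state
    return (
        rol(((x0 ^ rk[0]) + (x1 ^ rk[1])) & MASK, 9),
        ror(((x1 ^ rk[2]) + (x2 ^ rk[3])) & MASK, 5),
        ror(((x2 ^ rk[4]) + (x3 ^ rk[5])) & MASK, 3),
        x0,
    )

def make_round_keys(seed, rounds=8):
    """Hypothetical stand-in key schedule (a simple LCG), for illustration only."""
    keys, s = [], seed & MASK
    for _ in range(rounds):
        rk = []
        for _ in range(6):
            s = (1103515245 * s + 12345) & MASK
            rk.append(s)
        keys.append(rk)
    return keys

def encrypt_full(counter, nonce, round_keys):
    """Reference path: the counter is word 0, the three nonce words follow."""
    state = (counter, *nonce)
    for rk in round_keys:
        state = toy_round(state, rk)
    return state

def precompute_nonce_part(nonce, round_keys):
    """Values of rounds 1 and 2 that depend only on the fixed nonce."""
    n1, n2, n3 = nonce
    rk0, rk1 = round_keys[0], round_keys[1]
    a0 = n1 ^ rk0[1]                                   # addend for word 0, round 1
    y1 = ror(((n1 ^ rk0[2]) + (n2 ^ rk0[3])) & MASK, 5)
    y2 = ror(((n2 ^ rk0[4]) + (n3 ^ rk0[5])) & MASK, 3)
    z1 = ror(((y1 ^ rk1[2]) + (y2 ^ rk1[3])) & MASK, 5)
    return a0, y1, y2, z1

def encrypt_cached(counter, cache, round_keys):
    """Per-block path: only the counter-dependent words are recomputed."""
    a0, y1, y2, z1 = cache
    rk0, rk1 = round_keys[0], round_keys[1]
    y0 = rol(((counter ^ rk0[0]) + a0) & MASK, 9)      # round 1, word 0
    z0 = rol(((y0 ^ rk1[0]) + (y1 ^ rk1[1])) & MASK, 9)
    z2 = ror(((y2 ^ rk1[4]) + (counter ^ rk1[5])) & MASK, 3)
    state = (z0, z1, z2, y0)
    for rk in round_keys[2:]:
        state = toy_round(state, rk)
    return state
```

In this toy setting the cache is built once per nonce and reused for every counter value, which removes part of the first two rounds from each block encryption.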
Youngbeom Kim; Hyeokdong Kwon; SangWoo An; Hwajeong Seo; Seog Chung Seo. Efficient Implementation of ARX-Based Block Ciphers on 8-Bit AVR Microcontrollers. Mathematics 2020, 8, 1837.
AMA Style: Youngbeom Kim, Hyeokdong Kwon, SangWoo An, Hwajeong Seo, Seog Chung Seo. Efficient Implementation of ARX-Based Block Ciphers on 8-Bit AVR Microcontrollers. Mathematics. 2020; 8 (10):1837.
Chicago/Turabian Style: Youngbeom Kim; Hyeokdong Kwon; SangWoo An; Hwajeong Seo; Seog Chung Seo. 2020. "Efficient Implementation of ARX-Based Block Ciphers on 8-Bit AVR Microcontrollers." Mathematics 8, no. 10: 1837.
With the development of the Internet of Things (IoT) and cloud computing technology, various cryptographic systems have been proposed to protect the growing amount of personal information. Recently, Post-Quantum Cryptography (PQC) algorithms have been proposed to counter quantum algorithms that threaten public key cryptography. To use PQC efficiently in a server environment dealing with large amounts of data, optimization studies are required. In this paper, we present optimization methods on the Graphics Processing Unit (GPU) platform for FrodoKEM and NewHope, which are round 2 candidates in the NIST PQC standardization process. For each algorithm, we identify the major operations with a large computational load that can be processed in parallel using the characteristics of the GPU. In the case of FrodoKEM, we introduce parallel optimization techniques for matrix generation and for matrix arithmetic operations such as addition and multiplication. In the case of NewHope, we present a parallel processing technique for polynomial-based operations. In the encryption process of FrodoKEM, speedups of up to 5.2, 5.75, and 6.47 times over the CPU implementation have been confirmed for FrodoKEM-640, FrodoKEM-976, and FrodoKEM-1344, respectively. In the encryption process of NewHope, speedups of up to 3.33 and 4.04 times over the CPU implementation have been shown for NewHope-512 and NewHope-1024, respectively. The results of this study can be used in servers for IoT devices or cloud computing services. In addition, the results of this study can be utilized in image processing technologies such as facial recognition.
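The matrix arithmetic that the abstract describes as parallelizable can be sketched in Python as a product of the form A·S + E mod q, in which every output row is independent of the others. The modulus, the tiny dimensions, and the function names below are assumptions made for illustration; the thread pool only shows how the work divides and does not model the paper's GPU kernels or their performance.

```python
from concurrent.futures import ThreadPoolExecutor

Q = 1 << 15  # toy power-of-two modulus, chosen here for illustration

def row_times_matrix(task):
    """Compute one output row of A*S + E mod Q; no other row is needed."""
    a_row, s, e_row = task
    cols = len(s[0])
    return [
        (sum(a_row[k] * s[k][j] for k in range(len(a_row))) + e_row[j]) % Q
        for j in range(cols)
    ]

def lwe_product_parallel(a, s, e, workers=4):
    """Hand each row to a worker, as a GPU could assign rows to threads."""
    tasks = [(a[i], s, e[i]) for i in range(len(a))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_times_matrix, tasks))

def lwe_product_sequential(a, s, e):
    """Plain triple loop used as a reference for comparison."""
    rows, inner, cols = len(a), len(s), len(s[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = e[i][j]
            for k in range(inner):
                acc += a[i][k] * s[k][j]
            out[i][j] = acc % Q
    return out
```

Because no row reads another row's result, the same decomposition applies at a finer grain, for example one worker per output element.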
SangWoo An; Seog Chung Seo. Efficient Parallel Implementations of LWE-Based Post-Quantum Cryptosystems on Graphics Processing Units. Mathematics 2020, 8, 1781.
AMA Style: SangWoo An, Seog Chung Seo. Efficient Parallel Implementations of LWE-Based Post-Quantum Cryptosystems on Graphics Processing Units. Mathematics. 2020; 8 (10):1781.
Chicago/Turabian Style: SangWoo An; Seog Chung Seo. 2020. "Efficient Parallel Implementations of LWE-Based Post-Quantum Cryptosystems on Graphics Processing Units." Mathematics 8, no. 10: 1781.
As Internet of Things (IoT) technology evolves, abundant data is generated by sensor nodes and exchanged between them. For this reason, efficient encryption is required to keep the data secret. Since low-end IoT devices have limited computation power, it is difficult to run expensive ciphers on them. Lightweight block ciphers reduce computation overhead, which makes them suitable for low-end IoT platforms. In this paper, we implement an optimized CHAM block cipher in the counter mode of operation on 8-bit AVR microcontrollers (i.e., representative sensor nodes). Four new techniques are applied. First, the execution time is drastically reduced by skipping eight rounds through pre-calculation and look-up table access. Second, encryption in the variable-key scenario is optimized with on-the-fly table calculation. Third, parallel encryption computes multiple blocks online in the CHAM-64/128 case. Fourth, state-of-the-art engineering techniques are fully utilized at the instruction level and register level. With these optimization methods, the proposed CHAM implementations for the counter mode of operation outperform the state-of-the-art implementations by 12.8%, 8.9%, and 9.6% for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively.
Hyeokdong Kwon; SangWoo An; Youngbeom Kim; Hyunji Kim; Seung Ju Choi; Kyoungbae Jang; Jaehoon Park; Hyunjun Kim; Seog Chung Seo; Hwajeong Seo. Designing a CHAM Block Cipher on Low-End Microcontrollers for Internet of Things. Electronics 2020, 9, 1548.
AMA Style: Hyeokdong Kwon, SangWoo An, Youngbeom Kim, Hyunji Kim, Seung Ju Choi, Kyoungbae Jang, Jaehoon Park, Hyunjun Kim, Seog Chung Seo, Hwajeong Seo. Designing a CHAM Block Cipher on Low-End Microcontrollers for Internet of Things. Electronics. 2020; 9 (9):1548.
Chicago/Turabian Style: Hyeokdong Kwon; SangWoo An; Youngbeom Kim; Hyunji Kim; Seung Ju Choi; Kyoungbae Jang; Jaehoon Park; Hyunjun Kim; Seog Chung Seo; Hwajeong Seo. 2020. "Designing a CHAM Block Cipher on Low-End Microcontrollers for Internet of Things." Electronics 9, no. 9: 1548.
With the advent of IoT and Cloud computing service technology, the size of user data to be managed and of file data to be transmitted has increased significantly. To protect users’ personal information, it is necessary to encrypt it in a secure and efficient way. Since servers handling many clients or IoT devices have to encrypt a large amount of data in real time without compromising service capabilities, Graphic Processing Units (GPUs) have been considered a suitable crypto accelerator for processing huge amounts of data in this situation. In this paper, we present highly efficient implementations of block ciphers on NVIDIA GPUs (specifically, the Maxwell, Pascal, and Turing architectures) for environments using massively large data in IoT and Cloud computing applications. As block cipher algorithms, we choose AES, a representative standard block cipher algorithm; LEA, which was recently added to the ISO/IEC 29192-2:2019 standard; and CHAM, a recently developed lightweight block cipher algorithm. To maximize the parallelism in the encryption process, we utilize the Counter (CTR) mode of operation and customize it using the GPU’s characteristics. We applied several optimization techniques suited to the characteristics of the GPU architecture, such as kernel parallelism, memory optimization, and CUDA streams. Furthermore, we optimized each target cipher according to its algorithmic characteristics by implementing its core part with handcrafted inline PTX (Parallel Thread eXecution) code, which is virtual assembly code on CUDA platforms. With our optimization techniques, in our implementation on an RTX 2070 GPU, AES and LEA show up to 310 Gbps and 2.47 Tbps of throughput, respectively, which are 10.7% and 67% improvements over the 279.86 Gbps and 1.47 Tbps of the previous best results. In the case of CHAM, this is the first optimized implementation on GPUs, and it achieves 3.03 Tbps of throughput on an RTX 2070 GPU.
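The reason CTR mode suits massively parallel hardware can be shown with a short Python sketch: block number `index` depends only on the key, the nonce, and `index`, so blocks can be processed in any order or all at once. The block function here is a hypothetical stand-in built from SHA-256 (CTR mode only uses the forward direction); it is not AES, LEA, or CHAM, and the 12-byte nonce with a 4-byte counter is an assumed layout.

```python
import hashlib

BLOCK = 16  # bytes per block

def toy_block_function(key, counter_block):
    """Stand-in keyed function for illustration (NOT AES, LEA, or CHAM)."""
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]

def ctr_block(key, nonce, index, data_block):
    """Work of one hypothetical GPU thread: process block number `index`."""
    counter_block = nonce + index.to_bytes(4, "big")
    keystream = toy_block_function(key, counter_block)
    # zip() stops at the shorter input, so a short final block is handled
    return bytes(d ^ k for d, k in zip(data_block, keystream))

def ctr_process(key, nonce, data, order=None):
    """Encrypt or decrypt `data`; `order` sets the block processing order."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out = [None] * len(blocks)
    for i in (order if order is not None else range(len(blocks))):
        out[i] = ctr_block(key, nonce, i, blocks[i])
    return b"".join(out)
```

Calling `ctr_process` twice with the same key and nonce recovers the input, and passing a shuffled `order` gives the same output, which is the independence a GPU kernel can exploit by mapping one thread to each block index.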
SangWoo An; Seog Chung Seo. Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data. Applied Sciences 2020, 10, 3711.
AMA Style: SangWoo An, Seog Chung Seo. Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data. Applied Sciences. 2020; 10 (11):3711.
Chicago/Turabian Style: SangWoo An; Seog Chung Seo. 2020. "Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data." Applied Sciences 10, no. 11: 3711.
Binary field (BF) multiplication is a basic and important operation for widely used crypto algorithms such as the GHASH function of GCM (Galois/Counter Mode) and NIST-compliant binary Elliptic Curve Cryptosystems (ECCs). Recently, Seo et al. proposed a novel SCA-resistant binary field multiplication method in the context of GHASH optimization in AES GCM mode on 8-bit AVR microcontrollers (MCUs). They proposed the concept of a Dummy XOR operation with garbage registers and the concept of instruction level atomicity (ILA) for resistance against Timing Analysis (TA) and Simple Power Analysis (SPA), and used a Karatsuba Block-Comb multiplication approach for efficiency. Even though their method achieved a large performance improvement compared with previous works, it still has room for improvement on the 8-bit AVR platform. In this paper, we propose an improved binary field multiplication method on 8-bit AVR MCUs. Our method adopts a Dummy XOR technique using garbage registers for TA and SPA security; however, we reduce the number of garbage registers from eight to one, using the fact that the number of garbage registers does not affect TA and SPA security. In addition, we apply a multiplier encoding approach to decrease the number of registers required when accessing the multiplier, which enables an extended block size in the Karatsuba Block-Comb multiplication technique. Specifically, the proposed technique extends the block size from four to eight, and the proposed binary field multiplication method can compute a 128-bit BF multiplication in only 3816 clock cycles (cc) (resp. 3490 cc) with (resp. without) the multiplier encoding process, which is a 32.8% (resp. 38.5%) improvement over the 5675 cc of the best previous work. We apply the proposed technique to the GHASH function of the GCM mode with several additional optimization techniques. The proposed GHASH implementation improves performance by over 42% compared with the previous best result. The concept of the proposed BF method can be extended to other MCUs, including 16-bit MSP430 MCUs and 32-bit ARM MCUs.
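As background for the Karatsuba approach named in the abstract, the following Python sketch multiplies two 128-bit binary polynomials (integers used as bit vectors, addition is XOR) with recursive Karatsuba splitting and checks against schoolbook shift-and-XOR. It covers the unreduced polynomial product only. The Block-Comb scheduling, the multiplier encoding, the side-channel countermeasures, and the field reduction of the paper are not reproduced, and the function names are illustrative.

```python
def clmul(a, b):
    """Schoolbook carry-less multiplication in GF(2)[x] by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def karatsuba_clmul(a, b, bits=128):
    """Recursive Karatsuba over GF(2)[x]; `bits` must be a power of two
    and both operands must fit in `bits` bits."""
    if bits <= 8:
        return clmul(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    low = karatsuba_clmul(a0, b0, half)    # a0 * b0
    high = karatsuba_clmul(a1, b1, half)   # a1 * b1
    # (a0 + a1)(b0 + b1) + a0*b0 + a1*b1 = a0*b1 + a1*b0 over GF(2)
    mid = karatsuba_clmul(a0 ^ a1, b0 ^ b1, half) ^ low ^ high
    return (high << bits) ^ (mid << half) ^ low
```

Each level replaces four half-size multiplications with three, at the cost of a few extra XORs. Note that the data-dependent branch in `clmul` is exactly the kind of behavior a side-channel-resistant implementation has to avoid.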
Seog Chung Seo; Donggeun Kwon. Highly Efficient SCA-Resistant Binary Field Multiplication on 8-Bit AVR Microcontrollers. Applied Sciences 2020, 10, 2821.
AMA Style: Seog Chung Seo, Donggeun Kwon. Highly Efficient SCA-Resistant Binary Field Multiplication on 8-Bit AVR Microcontrollers. Applied Sciences. 2020; 10 (8):2821.
Chicago/Turabian Style: Seog Chung Seo; Donggeun Kwon. 2020. "Highly Efficient SCA-Resistant Binary Field Multiplication on 8-Bit AVR Microcontrollers." Applied Sciences 10, no. 8: 2821.
Beginning with the proposal of the McEliece cryptosystem in 1978, code-based cryptography has positioned itself as one of the main categories in post-quantum cryptography (PQC). To date, the algebraic security of certain variants of the McEliece cryptosystem has been challenged many times, although some of the variants have remained secure. However, recent studies on code-based cryptography have focused on side-channel resistance, since previous studies have indicated that the existing algorithms are vulnerable to side-channel analysis. In this paper, we propose the first side-channel attack on the Hybrid McEliece Scheme (HyMES) using only a single power consumption trace. HyMES is a variant of the McEliece system that provides smaller keys, along with faster encryption and decryption. By exploiting joint distributions of nonlinear functions in the decryption process, we were able to recover the private key of HyMES. To the best of our knowledge, this is the first work proposing a side-channel analysis of a public-key system based on a joint distribution of the leakages.
Byeonggyu Park; Suhri Kim; Seokhie Hong; Heeseok Kim; Seog Chung Seo. Single Trace Analysis against HyMES by Exploitation of Joint Distributions of Leakages. Applied Sciences 2020, 10, 1831.
AMA Style: Byeonggyu Park, Suhri Kim, Seokhie Hong, Heeseok Kim, Seog Chung Seo. Single Trace Analysis against HyMES by Exploitation of Joint Distributions of Leakages. Applied Sciences. 2020; 10 (5):1831.
Chicago/Turabian Style: Byeonggyu Park; Suhri Kim; Seokhie Hong; Heeseok Kim; Seog Chung Seo. 2020. "Single Trace Analysis against HyMES by Exploitation of Joint Distributions of Leakages." Applied Sciences 10, no. 5: 1831.
Galois/Counter Mode (GCM) is one of the most widely used authenticated encryption modes. To date, even though some works have investigated security against side channel analysis (SCA) in the GCM computation, especially in the GHASH function, they have failed to provide comprehensive SCA security that considers both SPA/TA and DPA/CPA simultaneously. In this paper, we present a secure GCM implementation for 8-bit AVR microcontroller environments. The proposed implementation provides comprehensive SCA security against not only SPA/TA but also DPA/CPA. To defeat SPA/TA, we introduce the concepts of Dummy XOR with garbage registers and instruction level atomicity (ILA), and use them to build a secure binary field (BF) multiplication method that runs in constant time with a fixed pattern. We also propose an efficient multiplicative masking method that prevents DPA/CPA when computing the GHASH function in the GCM process. Through an actual implementation on an 8-bit AVR ATmega128 microcontroller, we show that the proposed method outperforms existing alternatives while providing comprehensive SCA security. With respect to the performance of secure binary field multiplication, the proposed multiplication method outperforms the related work by around 51.86% when computing a 128-bit binary field multiplication. Regarding the overhead of the multiplicative masking method, the proposed method requires only one additional BF multiplication and a negligible number of field additions regardless of the number of input blocks, while the related work consumes around {log(m + n + 1)+2} additional BF multiplications when there are (m + n + 1) input blocks. Through SCA-related experiments, we demonstrate the SCA security of the proposed methods.
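The Dummy XOR idea with a garbage register can be illustrated, under strong simplifications, by the Python sketch below. A plain shift-and-XOR multiplication performs an XOR only when the multiplier bit is 1, so the sequence of operations reveals the multiplier. The dummy-XOR variant performs an XOR for every bit and directs it either to the real accumulator or to a garbage value. Python offers no timing or power guarantees, so this shows the control-flow pattern only; the function names are illustrative and the paper's AVR assembly is not reproduced.

```python
def clmul_reference(a, b, bits=128):
    """Plain shift-and-XOR product in GF(2)[x]; branches on each multiplier bit."""
    result = 0
    for i in range(bits):
        if (b >> i) & 1:        # data-dependent branch: leaks the multiplier
            result ^= a << i
    return result

def clmul_dummy_xor(a, b, bits=128):
    """Same product, but one XOR is executed for every multiplier bit."""
    regs = [0, 0]               # regs[0]: garbage, regs[1]: real accumulator
    for i in range(bits):
        bit = (b >> i) & 1
        regs[bit] ^= a << i     # always one XOR, regardless of the bit value
    return regs[1]
```

A single garbage slot is enough here because the garbage value is never read; it exists only so that the operation sequence is identical for 0 and 1 bits.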
Seog Chung Seo; Heeseok Kim. SCA-Resistant GCM Implementation on 8-Bit AVR Microcontrollers. IEEE Access 2019, 7, 103961-103978.
AMA Style: Seog Chung Seo, Heeseok Kim. SCA-Resistant GCM Implementation on 8-Bit AVR Microcontrollers. IEEE Access. 2019; 7 (99):103961-103978.
Chicago/Turabian Style: Seog Chung Seo; Heeseok Kim. 2019. "SCA-Resistant GCM Implementation on 8-Bit AVR Microcontrollers." IEEE Access 7, no. 99: 103961-103978.