Massively parallel computing on many-core Networks-on-Chip (NoC) is attracting growing attention as a way to accelerate performance, and deploying multiple applications on an NoC simultaneously is one feasible way to achieve this. In this paper, we propose a multi-phase-based multi-application mapping approach for NoC design. The approach begins with a rectangle analysis, which yields several candidate regions for each application. We then map all tasks of the application into these candidate regions using a genetic algorithm and select the region that exhibits the strongest performance. Once the packeted regions for each application are identified, a B*Tree-based simulated annealing algorithm generates the optimal placement of the multi-application mapping regions. The experimental results show that the proposed approach can achieve a considerable reduction in network power consumption (up to 23.45%) and latency (up to 24.42%) for a given set of applications.
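As a rough illustration of the kind of objective such a mapping search minimizes, the sketch below places tasks on a 2D mesh and scores a placement by traffic volume times hop count. It uses a plain swap-based simulated annealing loop, not the genetic algorithm or the B*Tree-based annealing of the paper; the function names, the cost model, the cooling parameters, and the traffic table are all assumptions made for illustration.

```python
import math
import random

def mesh_hops(a, b, width):
    """Manhattan distance between tiles a and b on a mesh `width` tiles wide."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def comm_cost(mapping, traffic, width):
    """Sum of volume * hops over all communicating task pairs.
    mapping[i] is the tile of task i (assumed cost model, not the paper's)."""
    return sum(vol * mesh_hops(mapping[s], mapping[d], width)
               for (s, d), vol in traffic.items())

def anneal(traffic, n_tasks, width, height, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Swap-based simulated annealing over task placements (illustrative only)."""
    rng = random.Random(seed)
    placement = list(range(width * height))  # first n_tasks entries hold the tasks' tiles
    rng.shuffle(placement)
    cost = comm_cost(placement, traffic, width)
    best, best_cost = placement[:], cost
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_tasks)          # a task ...
        j = rng.randrange(len(placement))   # ... swapped with another task or a free tile
        if i == j:
            continue
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = comm_cost(placement, traffic, width)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = placement[:], cost
        else:
            placement[i], placement[j] = placement[j], placement[i]  # undo the swap
        temp *= alpha
    return best[:n_tasks], best_cost

# Hypothetical traffic table: (source task, destination task) -> volume
traffic = {(0, 1): 10, (1, 2): 5, (2, 3): 8, (0, 3): 1}
mapping, cost = anneal(traffic, n_tasks=4, width=3, height=3)
print(mapping, cost)
```

A power or latency model would replace `comm_cost` in a real flow; the hop-weighted volume here is only a common stand-in.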
Fen Ge; Chenchen Cui; Fang Zhou; Ning Wu. A Multi-Phase Based Multi-Application Mapping Approach for Many-Core Networks-on-Chip. Micromachines 2021, 12, 613.
AMA Style: Fen Ge, Chenchen Cui, Fang Zhou, Ning Wu. A Multi-Phase Based Multi-Application Mapping Approach for Many-Core Networks-on-Chip. Micromachines. 2021; 12(6):613.
Chicago/Turabian Style: Fen Ge; Chenchen Cui; Fang Zhou; Ning Wu. 2021. "A Multi-Phase Based Multi-Application Mapping Approach for Many-Core Networks-on-Chip." Micromachines 12, no. 6: 613.
To defend against side-channel attacks, a compact SM4 circuit was designed based on masking and random-delay techniques, and the linear transformation module was designed with randomly inserted pseudo-operations. Analysis of the glitch data generated by the SM4 S-box under different inputs confirmed its security against glitch attacks. Differential power analysis (DPA) was then performed on the designed circuit: the key could not be recovered even with 100,000 power traces, which verifies the resistance of the SM4 design to DPA. Finally, the circuit was synthesized with Synopsys Design Compiler (DC; Synopsys, Mountain View, CA, USA). In the SMIC 0.18 process, the area of the designed circuit is 82,734 μm², which is 48% smaller than results reported in other papers.
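To make the masking idea concrete for the linear transformation module, the sketch below applies SM4's linear transformation L(B) = B ⊕ (B ⋘ 2) ⊕ (B ⋘ 10) ⊕ (B ⋘ 18) ⊕ (B ⋘ 24) to a Boolean-masked word and to its mask separately. Because L is linear over XOR, the two results recombine to L of the unmasked word. This is a software illustration only; the helper names are hypothetical, and the random delays and pseudo-operations of the circuit described above are not modeled.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def sm4_linear(b):
    """SM4 linear transformation L used in the round function."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def masked_linear(masked_word, mask):
    """Apply L to both shares; the unmasked value is never formed.
    Hypothetical helper, not the paper's circuit."""
    return sm4_linear(masked_word), sm4_linear(mask)

# Example with arbitrary illustrative values
x, m = 0x12345678, 0xA5A5A5A5
out_masked, out_mask = masked_linear(x ^ m, m)
assert out_masked ^ out_mask == sm4_linear(x)  # shares recombine to L(x)
```

The S-box is nonlinear, so it cannot be masked this way and needs a dedicated masked design.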
Fang Zhou; Benjun Zhang; Ning Wu; Xiangli Bu. The Design of Compact SM4 Encryption and Decryption Circuits That are Resistant to Bypass Attack. Electronics 2020, 9, 1102.
AMA Style: Fang Zhou, Benjun Zhang, Ning Wu, Xiangli Bu. The Design of Compact SM4 Encryption and Decryption Circuits That are Resistant to Bypass Attack. Electronics. 2020; 9(7):1102.
Chicago/Turabian Style: Fang Zhou; Benjun Zhang; Ning Wu; Xiangli Bu. 2020. "The Design of Compact SM4 Encryption and Decryption Circuits That are Resistant to Bypass Attack." Electronics 9, no. 7: 1102.
As a typical artificial intelligence algorithm, the convolutional neural network (CNN) is widely used in Internet of Things (IoT) systems. To improve the computing ability of an IoT CPU, this paper designs a reconfigurable CNN-accelerated coprocessor based on the RISC-V instruction set. The interconnection structure of the acceleration chain from prior work is optimized, and the accelerator is connected to the RISC-V CPU core as a coprocessor. The corresponding coprocessor instructions are designed and the instruction compiling environment is established. The coprocessor instructions are called through inline assembly in C, coprocessor acceleration library functions are built, and common algorithms in IoT systems are implemented on the coprocessor. Finally, resource consumption evaluation and performance analysis of the coprocessor are completed on a Xilinx FPGA. The evaluation results show that the reconfigurable CNN-accelerated coprocessor consumes only 8534 LUTs, accounting for 47.6% of the total SoC system. Functions such as convolution and pooling require fewer instruction cycles with the designed coprocessor instructions than with the standard instruction set, and the acceleration ratio of convolution is 6.27 times that of the standard instruction set.
Ning Wu; Tao Jiang; Lei Zhang; Fang Zhou; Fen Ge. A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set. Electronics 2020, 9, 1005.
AMA Style: Ning Wu, Tao Jiang, Lei Zhang, Fang Zhou, Fen Ge. A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set. Electronics. 2020; 9(6):1005.
Chicago/Turabian Style: Ning Wu; Tao Jiang; Lei Zhang; Fang Zhou; Fen Ge. 2020. "A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set." Electronics 9, no. 6: 1005.
Optical network-on-chip is considered a promising technology for solving the problems of low bandwidth and high latency in traditional interconnection networks. However, due to the inevitable leakage of optical devices, the optical signal picks up crosstalk noise during transmission. In this paper, a heuristic fusion mapping algorithm, PSO_SA, is proposed for crosstalk optimization. First, an initial optimal mapping is obtained by particle swarm optimization; the mapping scheme is then refined with a simulated annealing algorithm to escape local optima. The experimental results show that PSO_SA optimizes crosstalk better than the genetic algorithm (GA) on 263 dec, Wavelet, DVOPD and other applications, with a maximum improvement of 28.7%.
Xinhao Shi; Ning Wu; Fen Ge; Fang Zhou; Muhammad Rehan Yahya. Optimizing Crosstalk in Optical NoC through Heuristic Fusion Mapping. Electronics 2020, 9, 1.
AMA Style: Xinhao Shi, Ning Wu, Fen Ge, Fang Zhou, Muhammad Rehan Yahya. Optimizing Crosstalk in Optical NoC through Heuristic Fusion Mapping. Electronics. 2020; 9(6):1.
Chicago/Turabian Style: Xinhao Shi; Ning Wu; Fen Ge; Fang Zhou; Muhammad Rehan Yahya. 2020. "Optimizing Crosstalk in Optical NoC through Heuristic Fusion Mapping." Electronics 9, no. 6: 1.
By using through-silicon vias (TSVs), three-dimensional (3D) integration technology can stack large memory on top of the cores as a last-level on-chip cache (LLC) to reduce off-chip memory access and enhance system performance. However, integrating more on-chip cache increases chip power density, which might lead to temperature-related issues in power consumption, reliability, cooling cost, and performance. An effective thermal management scheme is required to ensure the performance and reliability of the system. In this study, a fuzzy-based thermal management scheme (FBTM) is proposed that considers cores and stacked caches simultaneously. The proposed method combines a dynamic cache reconfiguration scheme with a fuzzy-based control policy in a temperature-aware manner. The dynamic cache reconfiguration scheme determines the cache size for each processor core according to the application, which yields substantial savings in power consumption. The fuzzy-based control policy then changes the frequency level of the processor core based on the dynamic cache reconfiguration, which can further improve system performance. Experiments show that, compared with other thermal management schemes, the proposed FBTM achieves, on average, a 3-degree reduction in temperature and a 41% reduction in leakage energy.
Lili Shen; Ning Wu; Gaizhen Yan. Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches. Electronics 2020, 9, 346.
AMA Style: Lili Shen, Ning Wu, Gaizhen Yan. Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches. Electronics. 2020; 9(2):346.
Chicago/Turabian Style: Lili Shen; Ning Wu; Gaizhen Yan. 2020. "Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches." Electronics 9, no. 2: 346.
Silicon photonics has become a commonly used paradigm for on-chip interconnects to meet the higher bandwidth requirements of computationally intensive applications on manycore processors. The design of the optical switch is a vital aspect of constructing an optical NoC topology, as it influences network performance. We present a HoneyComb optimized reconfigurable optical switch (HCROS), a 6 × 6 non-blocking optical switch in which optical links are reconfigured by utilizing the states of basic 2 × 2 optical switching elements (OSEs) while keeping the input-output (I/O) interconnection intact. The proposed 6-port HCROS architecture is further optimized to reduce the number of OSEs and thereby minimize overall power consumption. We propose a generic algorithm to find the optimal switching combination of OSEs for a particular I/O link so as to minimize insertion loss and power consumption. In comparison to other non-blocking architectures, a maximum reduction of 66% in OSEs was observed for the optimized HCROS, which uses only 12 OSEs. Simulations were performed for all 720 I/O links in different configurations to evaluate the power consumption and insertion loss. We observed up to 92% power savings for the optimized HCROS compared to the un-optimized HCROS, and a 79% reduction in insertion loss was also obtained as a result of the optimization.
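The figure of 720 matches 6! = 720, the number of one-to-one assignments of six inputs to six outputs, assuming that is what "all 720 I/O links" refers to. Under that assumption, the short sketch below enumerates the assignments; the function name is hypothetical and the switch's OSE states are not modeled.

```python
from itertools import permutations

def io_configurations(ports):
    """All one-to-one input-to-output assignments of a ports x ports switch.
    In each tuple p, input i is connected to output p[i]."""
    return list(permutations(range(ports)))

configs = io_configurations(6)
print(len(configs))  # 720, i.e. 6!
```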
Muhammad Rehan Yahya; Ning Wu; Gaizhen Yan; Tanveer Ahmed; Jinbao Zhang; Yuanyuan Zhang. HoneyComb ROS: A 6 × 6 Non-Blocking Optical Switch with Optimized Reconfiguration for ONoCs. Electronics 2019, 8, 844.
AMA Style: Muhammad Rehan Yahya, Ning Wu, Gaizhen Yan, Tanveer Ahmed, Jinbao Zhang, Yuanyuan Zhang. HoneyComb ROS: A 6 × 6 Non-Blocking Optical Switch with Optimized Reconfiguration for ONoCs. Electronics. 2019; 8(8):844.
Chicago/Turabian Style: Muhammad Rehan Yahya; Ning Wu; Gaizhen Yan; Tanveer Ahmed; Jinbao Zhang; Yuanyuan Zhang. 2019. "HoneyComb ROS: A 6 × 6 Non-Blocking Optical Switch with Optimized Reconfiguration for ONoCs." Electronics 8, no. 8: 844.
To protect cryptographic circuits against physical attacks, researchers have proposed a variety of mature and effective countermeasures. However, most of these defensive technologies target a single, specific attack and therefore struggle to thwart combined attacks, such as combined power and fault attacks. In this paper, we propose a dual complementary infection countermeasure for the Advanced Encryption Standard (AES) cryptographic circuit to defend against both power and fault attacks. Starting from the target AES circuit, we first design and construct a dual complementary AES circuit to defend against power attacks, which balances the power consumption when processing different data. In addition, to defend against fault attacks, we design an improved random infection mechanism in the dual complementary AES circuit to diffuse the effect of injected faults. Experimental results show that the proposed countermeasure can thwart both power and fault attacks effectively. Compared with AES circuits that can defend against only a single attack, our circuit greatly increases security at the cost of an extra 83.1% area overhead and a 2.1% impact on the maximum working frequency.
Jinbao Zhang; Ning Wu; Fang Zhou; Fen Ge; Xiaoqiang Zhang. Securing the AES Cryptographic Circuit Against Both Power and Fault Attacks. Journal of Electrical Engineering & Technology 2019, 14, 2171-2180.
AMA Style: Jinbao Zhang, Ning Wu, Fang Zhou, Fen Ge, Xiaoqiang Zhang. Securing the AES Cryptographic Circuit Against Both Power and Fault Attacks. Journal of Electrical Engineering & Technology. 2019; 14(5):2171-2180.
Chicago/Turabian Style: Jinbao Zhang; Ning Wu; Fang Zhou; Fen Ge; Xiaoqiang Zhang. 2019. "Securing the AES Cryptographic Circuit Against Both Power and Fault Attacks." Journal of Electrical Engineering & Technology 14, no. 5: 2171-2180.
In recent 3D Chip-Multiprocessors (CMPs), a hybrid cache architecture of SRAM and Non-Volatile Memory (NVM) is generally used to exploit the high density and low leakage power of NVM and the low write overhead of SRAM. The conventional access policy does not take the hybrid cache into account and cannot make good use of the characteristics of both NVM and SRAM technology. This paper proposes a Cache Fill and Migration policy (CFM) for multi-level hybrid caches. In CFM, data access is optimized in three aspects: cache fill, cache eviction, and dirty data migration. CFM reduces unnecessary cache fills and write operations to NVM, and optimizes victim cache line selection in cache eviction. Experimental results show that CFM can improve performance by 24.1% and reduce power consumption by 18% compared to the conventional writeback access policy.
Fen Ge; Lei Wang; Ning Wu; Fang Zhou. A Cache Fill and Migration Policy for STT-RAM-Based Multi-Level Hybrid Cache in 3D CMPs. Electronics 2019, 8, 639.
AMA Style: Fen Ge, Lei Wang, Ning Wu, Fang Zhou. A Cache Fill and Migration Policy for STT-RAM-Based Multi-Level Hybrid Cache in 3D CMPs. Electronics. 2019; 8(6):639.
Chicago/Turabian Style: Fen Ge; Lei Wang; Ning Wu; Fang Zhou. 2019. "A Cache Fill and Migration Policy for STT-RAM-Based Multi-Level Hybrid Cache in 3D CMPs." Electronics 8, no. 6: 639.
As a classical artificial intelligence algorithm, the convolutional neural network (CNN) plays an important role in image recognition and classification and is gradually being applied in Internet of Things (IoT) systems. A compact CNN accelerator for the IoT endpoint System-on-Chip (SoC) is proposed in this paper to meet the needs of CNN computations. Based on an analysis of the CNN structure, basic functional modules of the CNN, such as a convolution circuit and a pooling circuit, are designed with low data bandwidth and small area, and an accelerator is constructed in the form of four acceleration chains. After the acceleration unit design is completed, a Cortex-M3 is used to construct a verification SoC, and the verification platform is implemented on an FPGA to evaluate the resource consumption and performance of the CNN accelerator. The CNN accelerator achieves a throughput of 6.54 GOPS (giga operations per second) while consuming 4901 LUTs and without using any hardware multipliers. The comparison shows that, with the compact accelerator proposed in this paper, the CNN computational power of the Cortex-M3-based SoC is two times that of a quad-core Cortex-A7 SoC and 67% of that of an eight-core Cortex-A53 SoC.
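For readers unfamiliar with the two operations that the convolution and pooling circuits compute, here is a minimal software reference. It shows the arithmetic only and says nothing about how the acceleration chains or the multiplier-free hardware are built; the function names, the "valid" (no padding) convolution, and the 2 × 2 stride-2 pooling window are assumptions chosen for illustration.

```python
def conv2d_valid(image, kernel):
    """2D convolution as used in CNNs (cross-correlation, no kernel flip),
    stride 1, no padding. image and kernel are lists of rows."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
features = conv2d_valid(image, kernel)  # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
pooled = max_pool2x2(features)          # [[17]]
```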
Fen Ge; Ning Wu; Hao Xiao; Yuanyuan Zhang; Fang Zhou. Compact Convolutional Neural Network Accelerator for IoT Endpoint SoC. Electronics 2019, 8, 497.
AMA Style: Fen Ge, Ning Wu, Hao Xiao, Yuanyuan Zhang, Fang Zhou. Compact Convolutional Neural Network Accelerator for IoT Endpoint SoC. Electronics. 2019; 8(5):497.
Chicago/Turabian Style: Fen Ge; Ning Wu; Hao Xiao; Yuanyuan Zhang; Fang Zhou. 2019. "Compact Convolutional Neural Network Accelerator for IoT Endpoint SoC." Electronics 8, no. 5: 497.
Differential power analysis (DPA) is an effective side-channel attack method that poses a critical threat to cryptographic algorithms, especially lightweight ciphers such as SIMON. In this paper, we propose an area-efficient countermeasure against DPA on SIMON based on power randomization. First, we review and analyze the architecture of the SIMON algorithm. Second, we demonstrate the threat that DPA poses to SIMON by launching an actual DPA attack on a SIMON 32/64 circuit. Third, a low-cost power randomization scheme is proposed by combining fault injection with double-rate technology, and the corresponding circuit design is implemented. To the best of our knowledge, this is the first scheme that applies the combination of fault injection and double-rate technology to DPA resistance. Finally, the t-test is used to evaluate the security of the proposed designs with leakage quantification. Our experimental results show that the proposed design achieves DPA resistance for the SIMON algorithm at an overhead of 47.7% in LUT utilization and 39.6% in register consumption. Compared to threshold implementation and Boolean masking, the proposed scheme has greater advantages in resource consumption.
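A t-test-based leakage evaluation of this kind is often run as a fixed-versus-random comparison using Welch's t-statistic at each sample point, with |t| > 4.5 as the commonly used detection threshold. The sketch below assumes that setup; the abstract does not state the exact procedure, and the function names and the toy traces are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances.
    Raises ZeroDivisionError if both samples have zero variance."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def leaky_points(fixed_traces, random_traces, threshold=4.5):
    """Indices of sample points whose |t| exceeds the threshold.
    Each trace is a list of power samples of equal length."""
    n_samples = len(fixed_traces[0])
    stats = [welch_t([t[i] for t in fixed_traces], [t[i] for t in random_traces])
             for i in range(n_samples)]
    return [i for i, t in enumerate(stats) if abs(t) > threshold]

# Toy traces with two sample points: point 0 behaves identically in both sets,
# point 1 is shifted in the fixed-input set and therefore flagged.
fixed_set = [[i % 2, 10 + i % 2] for i in range(20)]
random_set = [[i % 2, i % 2] for i in range(20)]
print(leaky_points(fixed_set, random_set))  # [1]
```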
Yuanyuan Zhang; Ning Wu; Fang Zhou; Jinbao Zhang; Muhammad Rehan Yahya. A Countermeasure against DPA on SIMON with an Area-Efficient Structure. Electronics 2019, 8, 240.
AMA Style: Yuanyuan Zhang, Ning Wu, Fang Zhou, Jinbao Zhang, Muhammad Rehan Yahya. A Countermeasure against DPA on SIMON with an Area-Efficient Structure. Electronics. 2019; 8(2):240.
Chicago/Turabian Style: Yuanyuan Zhang; Ning Wu; Fang Zhou; Jinbao Zhang; Muhammad Rehan Yahya. 2019. "A Countermeasure against DPA on SIMON with an Area-Efficient Structure." Electronics 8, no. 2: 240.
As a family of lightweight block ciphers, SIMON has attracted considerable research attention since its publication in 2013. Recent works show that SIMON is vulnerable to differential fault analysis (DFA), and existing DFAs on SIMON assume that the faults are induced in the cipher states. In this paper, a novel DFA on SIMON is proposed in which the key schedule is selected as the location of the induced faults. First, we assume that a random one-bit fault is induced in the fourth-from-last round key K_{T-4}. Then, by utilizing the propagation properties of the SIMON key schedule, we determine the exact position of the induced fault and demonstrate that the proposed DFA can retrieve 4 bits of the last round key K_{T-1} on average using a one-bit fault. To date, this is the largest number of bits recovered among DFAs based on the random bit fault model. Furthermore, by reusing the induced fault, we prove that 2 bits of the penultimate round key K_{T-2} can be retrieved. To the best of our knowledge, the proposed attack is the first to extract a key from SIMON based on DFA on the key schedule. Finally, the correctness and validity of the proposed attack are verified through detailed simulation and analysis.
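The propagation property can be illustrated in software. The SIMON key schedule is affine over XOR, so the difference that a flipped round-key bit causes in later round keys does not depend on the key value. The sketch below uses the four-word SIMON 32/64 schedule with the per-round constant bit z set to 0, a simplification that does not affect differences. It assumes the fault is injected into a stored round key that feeds the later schedule steps, which may differ from the paper's exact fault model; the function names and the example keys are hypothetical.

```python
N = 16                 # word size of SIMON 32/64
MASK = (1 << N) - 1

def ror(x, r):
    """Rotate an N-bit word right by r bits."""
    return ((x >> r) | (x << (N - r))) & MASK

def next_key(k_im1, k_im3, k_im4, z_bit=0):
    """One step of the four-word SIMON key schedule.
    z_bit stands in for the round constant bit (assumed 0 here)."""
    t = ror(k_im1, 3) ^ k_im3
    t ^= ror(t, 1)
    return (k_im4 ^ t ^ z_bit ^ 0xFFFC) & MASK  # ~k ^ 3 equals k ^ 0xFFFC on 16 bits

def expand(key_words, rounds, fault_round=None, fault_mask=0):
    """Expand four key words into `rounds` round keys, optionally XORing
    fault_mask into round key `fault_round` (>= 4) before it is reused."""
    k = list(key_words)
    for i in range(4, rounds):
        k.append(next_key(k[i - 1], k[i - 3], k[i - 4]))
        if i == fault_round:
            k[i] ^= fault_mask
    return k

def fault_differences(key_words, rounds, fault_round, fault_mask):
    """XOR difference between clean and faulty round keys."""
    clean = expand(key_words, rounds)
    faulty = expand(key_words, rounds, fault_round, fault_mask)
    return [a ^ b for a, b in zip(clean, faulty)]

T = 32  # number of rounds of SIMON 32/64
d1 = fault_differences([0x0100, 0x0908, 0x1110, 0x1918], T, T - 4, 0x0001)
d2 = fault_differences([0xBEEF, 0x1234, 0x0000, 0xFFFF], T, T - 4, 0x0001)
assert d1 == d2  # the difference pattern is independent of the key
print([hex(d) for d in d1[T - 4:]])
```

Because the pattern in the last round keys is fixed by the fault position alone, an attacker who observes its effect on the ciphertext can work back to the fault location, which is the kind of reasoning the abstract refers to.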
Jinbao Zhang; Ning Wu; Fang Zhou; Muhammad Rehan Yahya; Jianhua Li. A Novel Differential Fault Analysis on the Key Schedule of SIMON Family. Electronics 2019, 8, 93.
AMA Style: Jinbao Zhang, Ning Wu, Fang Zhou, Muhammad Rehan Yahya, Jianhua Li. A Novel Differential Fault Analysis on the Key Schedule of SIMON Family. Electronics. 2019; 8(1):93.
Chicago/Turabian Style: Jinbao Zhang; Ning Wu; Fang Zhou; Muhammad Rehan Yahya; Jianhua Li. 2019. "A Novel Differential Fault Analysis on the Key Schedule of SIMON Family." Electronics 8, no. 1: 93.
NoC architecture is increasingly applied to complex SoC chips, and how to efficiently map a specific application onto the NoC infrastructure is an important open topic. At the same time, testing the IP cores embedded in an NoC poses many challenges. This paper proposes a sectional NoC mapping algorithm optimized for NoC IP core testing. In conjunction with the pre-designed test structure, sectional NoC mapping first applies the Partition Algorithm to arrange IP cores into parallel testing groups so as to minimize testing time. It then applies a genetic algorithm for NoC mapping based on the traffic information between IP cores. Experimental results on the ITC'02 benchmark circuits show that the mapping cost decreases by 24.5% on average compared with random mapping and that the testing time is reduced by 12.67% on average, which illustrates the effectiveness of the sectional NoC mapping scheme.
Zhang Ying; Wu Ning; Ge Fen. Sectional NoC Mapping Scheme Optimized for Testing Time. Transactions on Engineering Technologies 2015, 301-314.
AMA Style: Zhang Ying, Wu Ning, Ge Fen. Sectional NoC Mapping Scheme Optimized for Testing Time. Transactions on Engineering Technologies. 2015:301-314.
Chicago/Turabian Style: Zhang Ying; Wu Ning; Ge Fen. 2015. "Sectional NoC Mapping Scheme Optimized for Testing Time." Transactions on Engineering Technologies: 301-314.
NoC (Network-on-Chip) has been proposed as a new solution to the global communication problem of complex SoCs (Systems-on-Chip). However, testing and verifying an NoC presents many difficulties. We propose a novel co-design of test architecture and data transfer schemes for 2D-Mesh topology NoCs to improve the parallelism of test packet transmission. The testing efficiencies of different structures and transfer modes are evaluated under a coverage-driven, hierarchical NoC testbench based on the VMM verification methodology and the SystemVerilog language. The evaluation results for testing cost, testing time and hardware overhead show that shortening the transmission path and testing in parallel effectively decrease the power consumption and testing time. Furthermore, one of these test structures can be shown to be the optimal scheme.
Ying Zhang; Ning Wu; Fen Ge. The Co-Design of Test Structure and Test Data Transfer Mode for 2D-Mesh NoC. Lecture Notes in Electrical Engineering 2012, 171-184.
AMA Style: Ying Zhang, Ning Wu, Fen Ge. The Co-Design of Test Structure and Test Data Transfer Mode for 2D-Mesh NoC. Lecture Notes in Electrical Engineering. 2012:171-184.
Chicago/Turabian Style: Ying Zhang; Ning Wu; Fen Ge. 2012. "The Co-Design of Test Structure and Test Data Transfer Mode for 2D-Mesh NoC." Lecture Notes in Electrical Engineering: 171-184.
A clustering-based topology generation approach is proposed to construct Network on Chip (NoC) topologies for given applications. The approach consists of four phases and constructs an irregular NoC topology under design constraints, according to the communication requirements of the given application and the characteristics of the router architectures. Specifically, a recursion-based link construction algorithm embedded in the topology generation is proposed to construct links between routers. The evaluation performed on various multimedia benchmark applications confirms the efficiency of the proposed approach. Experimental results show that the approach saves 61.5% of power consumption on average in comparison with a regular Mesh topology. Significant improvement in network resource usage is also achieved. Moreover, the approach performs well for two multimedia applications compared to existing algorithms.
Fen Ge; Ning Wu. Power-Aware Topology Generation Based on Clustering for Application-Specific Network on Chip. Lecture Notes in Electrical Engineering 2012, 135-149.
AMA Style: Fen Ge, Ning Wu. Power-Aware Topology Generation Based on Clustering for Application-Specific Network on Chip. Lecture Notes in Electrical Engineering. 2012:135-149.
Chicago/Turabian Style: Fen Ge; Ning Wu. 2012. "Power-Aware Topology Generation Based on Clustering for Application-Specific Network on Chip." Lecture Notes in Electrical Engineering: 135-149.