In this article, we focus on unsupervised multiview feature selection, which handles high-dimensional data in the field of multiview learning. Although some graph-based methods have achieved satisfactory performance, they ignore the underlying data structure across different views. Besides, their predefined Laplacian graphs are sensitive to noise in the original data space and fail to obtain the optimal neighbor assignment. To address these problems, we propose a novel unsupervised multiview feature selection model based on graph learning, and the contributions are threefold: 1) during the feature selection procedure, the consensus similarity graph shared by different views is learned, so the proposed model can reveal the data relationship from the feature subset; 2) a reasonable rank constraint is added to optimize the similarity matrix and obtain more accurate information; and 3) an autoweighted framework is presented to assign view weights adaptively, and an effective alternating iterative algorithm is proposed to optimize the problem. Experiments on various datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
Qi Wang; Xu Jiang; Mulin Chen; Xuelong Li. Autoweighted Multiview Feature Selection With Graph Optimization. IEEE Transactions on Cybernetics 2021, PP, 1-12.
AMA Style: Qi Wang, Xu Jiang, Mulin Chen, Xuelong Li. Autoweighted Multiview Feature Selection With Graph Optimization. IEEE Transactions on Cybernetics. 2021; PP(99):1-12.
Chicago/Turabian Style: Qi Wang; Xu Jiang; Mulin Chen; Xuelong Li. 2021. "Autoweighted Multiview Feature Selection With Graph Optimization." IEEE Transactions on Cybernetics PP, no. 99: 1-12.
Anomaly detection (AD) on hyperspectral images has been widely researched in recent decades due to its high practicality and wide range of application scenarios. AD methods derived from low-rank matrix decomposition (LRMD) have emerged rapidly and been applied effectively. However, most of them focus on the use of spectral information and neglect the abundant spatial characteristics. In this letter, a spectral-spatial total variation (SSTV) regularized low-rank matrix decomposition method with a Schatten 1/2 quasi-norm ($S_{1/2}$) and denoising is proposed. First, to exploit the hyperspectral imagery (HSI) characteristics from the spectral perspective, we propose the low-rank matrix decomposition method with the $S_{1/2}$ norm and image denoising modules. Second, we incorporate the SSTV regularization by employing a 2-D TV spatially and a 1-D TV along the spectral dimension to maximize the utilization of the spatial characteristics of HSI. Finally, the alternating direction method of multipliers (ADMM) is employed in the optimization to obtain the final detection results. The superiority of the proposed method has been demonstrated by its excellent performance on three real datasets.
Jingyu Wang; Pengfei Huang; Ke Zhang; Qi Wang. Hyperspectral Anomaly Detection via $S_{1/2}$ and Total Variation Low Rank Matrix Decomposition. IEEE Geoscience and Remote Sensing Letters 2021, PP, 1-5.
AMA Style: Jingyu Wang, Pengfei Huang, Ke Zhang, Qi Wang. Hyperspectral Anomaly Detection via $S_{1/2}$ and Total Variation Low Rank Matrix Decomposition. IEEE Geoscience and Remote Sensing Letters. 2021; PP(99):1-5.
Chicago/Turabian Style: Jingyu Wang; Pengfei Huang; Ke Zhang; Qi Wang. 2021. "Hyperspectral Anomaly Detection via $S_{1/2}$ and Total Variation Low Rank Matrix Decomposition." IEEE Geoscience and Remote Sensing Letters PP, no. 99: 1-5.
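For readers unfamiliar with the notation, the Schatten-p quasi-norm above is computed from a matrix's singular values; here is a minimal numpy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def schatten_quasi_norm(X, p=0.5):
    """Schatten-p quasi-norm: (sum_i sigma_i^p)^(1/p) over singular values.
    For p = 1 this reduces to the nuclear norm; p < 1 approximates the
    rank function more tightly, which motivates the S_{1/2} choice."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return np.sum(sigma ** p) ** (1.0 / p)

A = np.diag([4.0, 1.0])                 # singular values are 4 and 1
print(schatten_quasi_norm(A, p=1.0))    # nuclear norm: 4 + 1 = 5.0
print(schatten_quasi_norm(A, p=0.5))    # (sqrt(4) + sqrt(1))^2 = 9.0
```

Note that some formulations use the sum of the p-th powers of the singular values directly as the regularizer, without the outer 1/p power.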
Fuzzy c-means (FCM) algorithms with spatial information have been widely applied in the field of image segmentation. However, most of them suffer from two challenges. One is that the introduction of fixed or adaptive single neighboring information with a narrow receptive field limits contextual constraints, leading to cluttered segmentations. The other is that the incorporation of superpixels with a wide receptive field enlarges spatial coherency, leading to block effects. To address these challenges, we propose a fuzzy Student's t-distribution model based on richer spatial combination (FRSC) for image segmentation. In this paper, we make two significant contributions. The first is that both narrow and wide receptive fields are integrated into the objective function of FRSC, which is convenient for mining image features and distinguishing local differences. The second is that the rich spatial combination under the Student's t-distribution ensures that spatial information is introduced into the updated parameters of FRSC, which helps to find a balance between noise immunity and detail preservation. Experimental results on synthetic and publicly available images further demonstrate that the proposed FRSC successfully addresses the limitations of FCM algorithms with spatial information and provides better segmentation results than state-of-the-art clustering algorithms.
Tao Lei; Xiaohong Jia; Dinghua Xue; Qi Wang; Hongying Meng; Asoke K Nandi. Fuzzy Student's T-Distribution Model Based on Richer Spatial Combination. IEEE Transactions on Fuzzy Systems 2021, PP, 1-1.
AMA Style: Tao Lei, Xiaohong Jia, Dinghua Xue, Qi Wang, Hongying Meng, Asoke K Nandi. Fuzzy Student's T-Distribution Model Based on Richer Spatial Combination. IEEE Transactions on Fuzzy Systems. 2021; PP(99):1-1.
Chicago/Turabian Style: Tao Lei; Xiaohong Jia; Dinghua Xue; Qi Wang; Hongying Meng; Asoke K Nandi. 2021. "Fuzzy Student's T-Distribution Model Based on Richer Spatial Combination." IEEE Transactions on Fuzzy Systems PP, no. 99: 1-1.
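As background for the FCM family this work builds on, the plain fuzzy c-means iteration alternates membership and center updates; the following is a minimal numpy sketch of the standard algorithm (not the proposed FRSC model, which adds spatial terms and a Student's t-distribution):

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: alternate membership and center updates.
    U[k, i] is the degree to which sample i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                         # memberships sum to 1 per sample
    for _ in range(iters):
        W = U ** m                             # fuzzified memberships
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None] - centers[:, None], axis=2) + 1e-10
        U = d ** (-2.0 / (m - 1))              # standard FCM membership update
        U /= U.sum(axis=0)
    return centers, U

# Two tight blobs at (0, 0) and (5, 5); FCM should recover both centers.
X = np.vstack([np.zeros((20, 2)), np.ones((20, 2)) * 5])
centers, U = fcm(X, c=2)
```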
Graph-based clustering aims to partition the data according to a similarity graph and has shown impressive performance on various kinds of tasks. The quality of the similarity graph largely determines the clustering results, but it is difficult to produce a high-quality one, especially when the data contain noise and outliers. To solve this problem, we propose a robust rank constrained sparse learning (RRCSL) method in this article. The $L_{2,1}$-norm is adopted into the objective function of sparse representation to learn the optimal graph with robustness. To preserve the data structure, we construct an initial graph and search for the graph within its neighborhood. By incorporating a rank constraint, the learned graph can be directly used as the cluster indicator, and the final results are obtained without additional postprocessing. In addition, the proposed method can not only be applied to single-view clustering but also be extended to multiview clustering. Extensive experiments on synthetic and real-world datasets have demonstrated the superiority and robustness of the proposed framework.
Qi Wang; Ran Liu; Mulin Chen; Xuelong Li. Robust Rank-Constrained Sparse Learning: A Graph-Based Framework for Single View and Multiview Clustering. IEEE Transactions on Cybernetics 2021, PP, 1-12.
AMA Style: Qi Wang, Ran Liu, Mulin Chen, Xuelong Li. Robust Rank-Constrained Sparse Learning: A Graph-Based Framework for Single View and Multiview Clustering. IEEE Transactions on Cybernetics. 2021; PP(99):1-12.
Chicago/Turabian Style: Qi Wang; Ran Liu; Mulin Chen; Xuelong Li. 2021. "Robust Rank-Constrained Sparse Learning: A Graph-Based Framework for Single View and Multiview Clustering." IEEE Transactions on Cybernetics PP, no. 99: 1-12.
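The $L_{2,1}$-norm used above sums the Euclidean norms of a matrix's rows, so minimizing it drives whole rows to zero and yields row-sparse solutions; a one-function numpy sketch:

```python
import numpy as np

def l21_norm(W):
    """L2,1 norm: sum of the Euclidean norms of the rows of W.
    Penalizing it encourages entire rows to vanish (row sparsity),
    which also makes the objective robust to sample-wise outliers."""
    return np.linalg.norm(W, axis=1).sum()

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # row norm 0 (a "selected-out" row)
              [1.0, 0.0]])   # row norm 1
print(l21_norm(W))           # 5 + 0 + 1 = 6.0
```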
License plate detection and recognition (LPDR) has attracted considerable attention in recent years, and many algorithms have demonstrated competitive performance on several datasets. However, there are still three significant issues to be addressed in this field. Firstly, most methods have poor detection performance in unconstrained scenarios with moving vehicles and highly distracting background objects. Secondly, existing systems generally focus on single image-based algorithms, yet traffic video sequences provide more effective information than individual frames for LPDR tasks. Thirdly, images and videos captured in complex environments may be adversely affected by distortions and low resolution, degrading recognition performance and robustness. To remedy these issues, we propose to automatically perform license plate detection, tracking, and recognition in real-world traffic videos and integrate them into a unified end-to-end framework via deep learning. The contributions of this paper are threefold: 1) A deep flow-guided spatiotemporal license plate detector is proposed to model the video contextual information by introducing optical flow and a novel spatiotemporal attention mechanism; 2) An online license plate tracker is developed to bridge video-based detection and recognition, which utilizes both motion and deep appearance information and, innovatively, can be trained end-to-end with the detector via multi-task learning; 3) An efficient quality-guided license plate recommender and recognizer are proposed to jointly perform stream recognition: the former recommends high-quality frames from video streams while the latter generates recognition results. We evaluate the proposed method on three traffic video-based license plate datasets, and ablation studies have been presented to verify the effectiveness of each component mentioned above.
Moreover, extensive experiments are conducted for comparison with other approaches in different scenarios, and the results have demonstrated that our method achieves state-of-the-art performance on all datasets.
Cong Zhang; Qi Wang; Xuelong Li. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos. Neurocomputing 2021, 449, 189-206.
AMA Style: Cong Zhang, Qi Wang, Xuelong Li. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos. Neurocomputing. 2021; 449:189-206.
Chicago/Turabian Style: Cong Zhang; Qi Wang; Xuelong Li. 2021. "V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos." Neurocomputing 449: 189-206.
Accurate object detection in remote sensing images is an essential part of automatic extraction, analysis, and understanding of image information, which potentially plays a significant role in a number of practical applications. However, the scale diversity in remote sensing images presents a substantial challenge for object detection and is regarded as one of the crucial problems to be solved. To extract multiscale feature representations and sufficiently exploit semantic context information, this letter proposes a semantic context-aware network (SCANet) model for multiscale object detection. We propose two novel modules, called the receptive field-enhancement module (RFEM) and the semantic context fusion module (SCFM), to enhance the performance of SCANet. The RFEM is dedicated to more robust multiscale feature extraction by attending to distinct receptive fields through multiple branches of different convolutions. To utilize the semantic context information contained in the scene to guide the network toward better detection accuracy, the SCFM integrates the semantic context features from the upper level with the lower-level features and delivers them hierarchically. Experiments demonstrate that, compared with state-of-the-art approaches, SCANet yields superior detection results on the DOTA-v1.5 data set.
Ke Zhang; Yulin Wu; Jingyu Wang; Yezi Wang; Qi Wang. Semantic Context-Aware Network for Multiscale Object Detection in Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters 2021, PP, 1-5.
AMA Style: Ke Zhang, Yulin Wu, Jingyu Wang, Yezi Wang, Qi Wang. Semantic Context-Aware Network for Multiscale Object Detection in Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters. 2021; PP(99):1-5.
Chicago/Turabian Style: Ke Zhang; Yulin Wu; Jingyu Wang; Yezi Wang; Qi Wang. 2021. "Semantic Context-Aware Network for Multiscale Object Detection in Remote Sensing Images." IEEE Geoscience and Remote Sensing Letters PP, no. 99: 1-5.
Anomaly detection has been drawing a great deal of attention in the hyperspectral research area by virtue of its practicality. Low-rank representation (LRR) has been widely employed to detect anomalies from hyperspectral imagery (HSI) effectively, but a great number of methods derived from LRR replace the rank function with the nuclear norm, which gives rise to a certain amount of error. In this letter, we propose a Schatten 1/2 quasi-norm ($S_{1/2}$) regularized LRR (SRLRR) method with an improved algorithm for constructing the dictionary for hyperspectral anomaly detection. First, $S_{1/2}$ regularization is proposed to substitute for the nuclear norm in approximating the rank function. Second, an improved dictionary construction algorithm based on K-Means++ clustering is presented to integrate into the model and improve the performance. Finally, an optimization algorithm based on the alternating direction method of multipliers (ADMM), incorporating a half-threshold operator, is introduced to obtain the final results. Our method has been validated on three typical datasets and demonstrates excellent performance.
Jingyu Wang; Pengfei Huang; Ke Zhang; Qi Wang. Hyperspectral Anomaly Detection via $S_{1/2}$ Regularized Low Rank Representation. IEEE Geoscience and Remote Sensing Letters 2021, PP, 1-5.
AMA Style: Jingyu Wang, Pengfei Huang, Ke Zhang, Qi Wang. Hyperspectral Anomaly Detection via $S_{1/2}$ Regularized Low Rank Representation. IEEE Geoscience and Remote Sensing Letters. 2021; PP(99):1-5.
Chicago/Turabian Style: Jingyu Wang; Pengfei Huang; Ke Zhang; Qi Wang. 2021. "Hyperspectral Anomaly Detection via $S_{1/2}$ Regularized Low Rank Representation." IEEE Geoscience and Remote Sensing Letters PP, no. 99: 1-5.
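The K-Means++ seeding that the dictionary construction builds on picks each new center with probability proportional to its squared distance from the nearest already-chosen center; a minimal numpy sketch of standard K-Means++ seeding (not the paper's full dictionary algorithm):

```python
import numpy as np

def kmeanspp_seeds(X, k, rng=None):
    """K-Means++ seeding: the first center is uniform random; each
    subsequent center is sampled with probability proportional to its
    squared distance from the nearest center chosen so far."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

# Two well-separated blobs: the two seeds land in different blobs,
# because points in the first seed's blob have zero distance weight.
X = np.vstack([np.zeros((30, 2)), np.full((30, 2), 10.0)])
seeds = kmeanspp_seeds(X, k=2, rng=0)
```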
The location of roads and lane lines is critically important for autonomous driving and driver assistance. The detection accuracy of these two elements dramatically affects the reliability and practicality of the whole system. In real applications, the traffic scene can be very complicated, which makes it particularly challenging to obtain the precise location of roads and lane lines. Commonly used deep learning-based object detection models perform well on lane line and road detection tasks, but they still frequently suffer from false and missing detections. Besides, existing convolutional neural network (CNN) structures only pay attention to the information flow between layers and cannot fully utilize the spatial information inside the layers. To address these problems, we propose an attention-based spatial segmentation network for traffic scene understanding. We use a convolutional attention module to improve the network's understanding of the spatial location distribution. Spatial CNN (SCNN) enables information flow within a single convolutional layer and improves the spatial relationship modeling ability of the network. The experimental results demonstrate that this method effectively improves the neural network's ability to exploit spatial information, thereby improving traffic scene understanding. Furthermore, a pixel-level road segmentation dataset called the NWPU Road Dataset is built to help improve traffic scene understanding.
Xuelong Li; Zhiyuan Zhao; Qi Wang. ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding. IEEE Transactions on Cybernetics 2021, PP, 1-11.
AMA Style: Xuelong Li, Zhiyuan Zhao, Qi Wang. ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding. IEEE Transactions on Cybernetics. 2021; PP(99):1-11.
Chicago/Turabian Style: Xuelong Li; Zhiyuan Zhao; Qi Wang. 2021. "ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding." IEEE Transactions on Cybernetics PP, no. 99: 1-11.
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domain. Recently, typical methods attempt to extract domain-invariant features via image translation and adversarial learning. When it comes to specific tasks, we find that the domain shifts are reflected in the differences between model parameters. To describe the domain gap directly at the parameter level, we propose a neuron linear transformation (NLT) method, exploiting domain factors and bias weights to learn the domain shift. Specifically, for a specific neuron of a source model, NLT exploits a few labeled target data to learn the domain shift parameters. Finally, the target neuron is generated via a linear transformation. Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that NLT is robust and more effective than supervised training and fine-tuning. Code is available at https://github.com/taohan10200/NLT.
Qi Wang; Tao Han; Junyu Gao; Yuan Yuan. Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 2021, PP, 1-13.
AMA Style: Qi Wang, Tao Han, Junyu Gao, Yuan Yuan. Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems. 2021; PP(99):1-13.
Chicago/Turabian Style: Qi Wang; Tao Han; Junyu Gao; Yuan Yuan. 2021. "Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting." IEEE Transactions on Neural Networks and Learning Systems PP, no. 99: 1-13.
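At its core, the parameter-level idea above is an affine map from source-model weights to target-model weights, with the map's coefficients learned from a few labeled target samples; a minimal numpy sketch (the function name and numeric values are ours for illustration):

```python
import numpy as np

def nlt_transfer(w_src, alpha, beta):
    """Model the domain shift at the parameter level: each target
    neuron's weight vector is an affine transformation of the
    corresponding source neuron's weights."""
    return alpha * w_src + beta

w_src = np.array([0.5, -1.0, 2.0])   # weights of one source-model neuron
alpha, beta = 1.2, 0.1               # shift parameters (learned on target data)
w_tgt = nlt_transfer(w_src, alpha, beta)
```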
Indoor scene images usually contain scattered objects and various scene layouts, which make RGB-D scene classification a challenging task. Existing methods still have limitations when classifying scene images with great spatial variability. Thus, how to extract local patch-level features effectively using only image labels is still an open problem for RGB-D scene recognition. In this paper, we propose an efficient framework for RGB-D scene recognition, which adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract an appropriate number of key local scene-related features. Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially correlated multi-modal RGB-D features. We take advantage of the correlation between the RGB and depth modalities to provide more cues for selecting local features. To ensure that discriminative local features are selected, a variational mutual information maximization loss is proposed. Additionally, the DLFS module can be easily extended to select local features of different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework achieves new state-of-the-art performance on the SUN RGB-D and NYU Depth version 2 datasets.
Zhitong Xiong; Yuan Yuan; Qi Wang. ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition. IEEE Transactions on Image Processing 2021, 30, 2722-2733.
AMA Style: Zhitong Xiong, Yuan Yuan, Qi Wang. ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition. IEEE Transactions on Image Processing. 2021; 30:2722-2733.
Chicago/Turabian Style: Zhitong Xiong; Yuan Yuan; Qi Wang. 2021. "ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition." IEEE Transactions on Image Processing 30: 2722-2733.
Linear discriminant analysis (LDA) is a well-known technique for supervised dimensionality reduction and has been extensively applied in many real-world applications. LDA assumes that the samples are Gaussian distributed and that the local data distribution is consistent with the global distribution. However, real-world data seldom satisfy this assumption. To handle data with complex distributions, some methods emphasize the local geometrical structure and perform discriminant analysis between neighbors, but the neighboring relationship tends to be affected by noise in the input space. In this research, we propose a new supervised dimensionality reduction method, namely, locality adaptive discriminant analysis (LADA). To directly process data with matrix representations, such as images, the 2-D LADA (2DLADA) is also developed. The proposed methods have the following salient properties: 1) they find the principal projection directions without imposing any assumption on the data distribution; 2) they explore the data relationship in the desired subspace, which contains less noise; and 3) they find the local data relationship automatically without requiring parameter tuning. The dimensionality reduction performance shows the superiority of the proposed methods over the state of the art.
Xuelong Li; Qi Wang; Feiping Nie; Mulin Chen. Locality Adaptive Discriminant Analysis Framework. IEEE Transactions on Cybernetics 2021, PP, 1-12.
AMA Style: Xuelong Li, Qi Wang, Feiping Nie, Mulin Chen. Locality Adaptive Discriminant Analysis Framework. IEEE Transactions on Cybernetics. 2021; PP(99):1-12.
Chicago/Turabian Style: Xuelong Li; Qi Wang; Feiping Nie; Mulin Chen. 2021. "Locality Adaptive Discriminant Analysis Framework." IEEE Transactions on Cybernetics PP, no. 99: 1-12.
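For reference, classic two-class Fisher LDA (the baseline that LADA generalizes) projects the data along a direction proportional to the inverse within-class scatter times the difference of class means; a minimal numpy sketch:

```python
import numpy as np

def lda_direction(X, y):
    """Classic two-class Fisher LDA: w is proportional to Sw^{-1}(mu1 - mu0),
    where Sw is the within-class scatter matrix. A small ridge term keeps
    the solve stable when Sw is near-singular."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

# Two Gaussian classes separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(0, 1, (50, 2)) + [4, 0]])
y = np.array([0] * 50 + [1] * 50)
w = lda_direction(X, y)   # points mostly along the first axis
```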
Remote sensing images contain complex backgrounds and multi-scale objects, which pose a challenging task for scene classification. The performance is highly dependent on the capacity of the scene representation as well as the discriminability of the classifier. Although multiple models possess better properties than a single model in these respects, the fusion strategy for these models is a key component in maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, where the grouping-attention-fusion strategy is used to fuse the features of the CNNs in a fine-grained, multi-scale manner. In this way, the resultant feature representation of the scene is enhanced. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function which encourages small intra-class diversities and large inter-class distances. Extensive experiments are conducted on four scene classification datasets, including the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method in comparison with state-of-the-art methods.
Junge Shen; Tong Zhang; Yichen Wang; Ruxin Wang; Qi Wang; Min Qi. A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification. Remote Sensing 2021, 13, 433.
AMA Style: Junge Shen, Tong Zhang, Yichen Wang, Ruxin Wang, Qi Wang, Min Qi. A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification. Remote Sensing. 2021; 13(3):433.
Chicago/Turabian Style: Junge Shen; Tong Zhang; Yichen Wang; Ruxin Wang; Qi Wang; Min Qi. 2021. "A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification." Remote Sensing 13, no. 3: 433.
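The general pattern of attention-weighted feature fusion can be illustrated with a toy numpy sketch; this is a generic illustration, not the paper's grouping-attention-fusion module, and the scoring rule here is ours:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(f1, f2):
    """Fuse two branch feature vectors as a convex combination whose
    weights come from per-branch attention scores (here, simply the
    branch means, for illustration)."""
    stacked = np.stack([f1, f2])        # (2, d)
    scores = stacked.mean(axis=1)       # one scalar score per branch
    w = softmax(scores)                 # attention weights summing to 1
    return (w[:, None] * stacked).sum(axis=0)

fused = attention_fuse(np.zeros(3), np.ones(3))  # lies between the inputs
```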
Hyperspectral image super-resolution (SR) methods based on deep learning have achieved significant progress recently. However, previous methods lack joint analysis between the spectrum and the horizontal or vertical direction. Besides, when both 2D and 3D convolutions are present in the network, existing models cannot effectively combine the two. To address these issues, in this article, we propose a novel hyperspectral image SR method by exploring the relationship between 2D/3D convolution (ERCSR). Our method alternately employs 2D and 3D units to solve the problem of structural redundancy in existing models by sharing spatial information during reconstruction, which enhances the learning ability of the 2D spatial domain. Importantly, compared with a network using only 3D units, i.e., one in which the 2D units are replaced by 3D units, it not only reduces the size of the model but also improves its performance. Furthermore, to exploit the spectrum fully, the split adjacent spatial and spectral convolution (SAEC) is designed to explore information between the spectrum and the horizontal or vertical direction in space in parallel. Experiments on widely used benchmark datasets demonstrate that the proposed approach outperforms state-of-the-art SR algorithms across different scales in terms of quantitative and qualitative analysis.
Qiang Li; Qi Wang; Xuelong Li. Exploring the Relationship Between 2D/3D Convolution for Hyperspectral Image Super-Resolution. IEEE Transactions on Geoscience and Remote Sensing 2021, PP, 1-11.
AMA Style: Qiang Li, Qi Wang, Xuelong Li. Exploring the Relationship Between 2D/3D Convolution for Hyperspectral Image Super-Resolution. IEEE Transactions on Geoscience and Remote Sensing. 2021; PP(99):1-11.
Chicago/Turabian Style: Qiang Li; Qi Wang; Xuelong Li. 2021. "Exploring the Relationship Between 2D/3D Convolution for Hyperspectral Image Super-Resolution." IEEE Transactions on Geoscience and Remote Sensing PP, no. 99: 1-11.
Remote sensing image captioning (RSIC), which aims at generating a well-formed sentence for a remote sensing image, has attracted increasing attention in recent years. The general framework for RSIC is the encoder-decoder architecture, containing the two submodels of encoder and decoder. Although significant performance has been obtained, the encoder-decoder architecture is a black-box model that lacks explainability. To overcome this drawback, in this article, we propose a new explainable word-sentence framework for RSIC. The proposed word-sentence framework consists of two parts: a word extractor and a sentence generator, where the former extracts the valuable words in the given remote sensing image, while the latter organizes these words into a well-formed sentence. The proposed framework decomposes RSIC into a word classification task and a word sorting task, which is more in line with human intuitive understanding. On the basis of the word-sentence framework, ablation experiments are conducted on the three public RSIC data sets of Sydney-captions, UCM-captions, and RSICD to explore specific and effective network structures. To evaluate the proposed word-sentence framework objectively, we further conduct comparative experiments on these three data sets and achieve results comparable with the encoder-decoder-based methods.
Qi Wang; Wei Huang; Xueting Zhang; Xuelong Li. Word-Sentence Framework for Remote Sensing Image Captioning. IEEE Transactions on Geoscience and Remote Sensing 2020, PP, 1-12.
AMA Style: Qi Wang, Wei Huang, Xueting Zhang, Xuelong Li. Word-Sentence Framework for Remote Sensing Image Captioning. IEEE Transactions on Geoscience and Remote Sensing. 2020; PP(99):1-12.
Chicago/Turabian Style: Qi Wang; Wei Huang; Xueting Zhang; Xuelong Li. 2020. "Word-Sentence Framework for Remote Sensing Image Captioning." IEEE Transactions on Geoscience and Remote Sensing PP, no. 99: 1-12.
Unlike object detection in natural images, which has usually achieved great success, detecting and localizing multiclass objects in remote sensing imagery has its own challenges, such as large scale changes, uncertain directions, and high density. The context information of the objects is valuable for addressing these challenges in remote sensing images. In this letter, we propose a context-driven detection network (CDD-Net) to improve the accuracy of multiclass object detection in remote sensing images. To capture local neighboring objects and features, a local context feature network (LCFN) is proposed to learn the local context of the region of interest. Meanwhile, a hybrid attention pyramid network (HAPN) is designed, which steers the focus toward more valuable features. The HAPN inserts a squeeze and excitation block (SEB) and three asymmetric convolution blocks (ACBs) into the feature pyramid network (FPN). The experimental results on the DOTA-v1.5 data set demonstrate that the proposed CDD-Net yields promising results.
Yulin Wu; Ke Zhang; Jingyu Wang; Yezi Wang; Qi Wang; Qiang Li. CDD-Net: A Context-Driven Detection Network for Multiclass Object Detection. IEEE Geoscience and Remote Sensing Letters 2020, PP, 1-5.
AMA Style: Yulin Wu, Ke Zhang, Jingyu Wang, Yezi Wang, Qi Wang, Qiang Li. CDD-Net: A Context-Driven Detection Network for Multiclass Object Detection. IEEE Geoscience and Remote Sensing Letters. 2020; PP(99):1-5.
Chicago/Turabian Style: Yulin Wu; Ke Zhang; Jingyu Wang; Yezi Wang; Qi Wang; Qiang Li. 2020. "CDD-Net: A Context-Driven Detection Network for Multiclass Object Detection." IEEE Geoscience and Remote Sensing Letters PP, no. 99: 1-5.
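The squeeze and excitation block (SEB) mentioned above follows a standard pattern: global-average-pool each channel ("squeeze"), pass the pooled vector through a small bottleneck MLP, and rescale the channels with the resulting sigmoid gates ("excite"); a minimal numpy sketch (weight shapes here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(feat, W1, W2):
    """Squeeze-and-excitation on a (C, H, W) feature map:
    pool each channel to a scalar, run a ReLU bottleneck MLP,
    then rescale every channel by its gate in (0, 1)."""
    z = feat.mean(axis=(1, 2))                # (C,)  squeeze
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))   # (C,)  channel gates
    return feat * s[:, None, None]            # excite: rescale channels

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))             # toy feature map, C=4
W1 = rng.normal(size=(2, 4)) * 0.1            # bottleneck: C -> C//2
W2 = rng.normal(size=(4, 2)) * 0.1            # expand back: C//2 -> C
out = squeeze_excite(feat, W1, W2)
```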
Remote sensing image scene classification has attracted great attention because of its wide applications. Although convolutional neural network (CNN)-based methods for scene classification have achieved excellent results, the large-scale variation of the features and objects in remote sensing images limits further improvement of the classification performance. To address this issue, we present a multiscale representation for scene classification, which is realized by a global-local two-stream architecture. This architecture has two branches, a global stream and a local stream, which individually extract the global features from the whole image and the local features from the most important area. To locate the most important area in the whole image using only image-level labels, a weakly supervised key area detection strategy called structured key area localization (SKAL) is specially designed to connect the above two streams. To verify the effectiveness of the proposed SKAL-based two-stream architecture, we conduct comparative experiments based on three widely used CNN models, namely AlexNet, GoogLeNet, and ResNet18, on four public remote sensing image scene classification data sets, and achieve state-of-the-art results on all four data sets. Our code is provided at https://github.com/hw2hwei/SKAL.
Qi Wang; Wei Huang; Zhitong Xiong; Xuelong Li. Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification. IEEE Transactions on Neural Networks and Learning Systems 2020, 1-15.
AMA Style: Qi Wang, Wei Huang, Zhitong Xiong, Xuelong Li. Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification. IEEE Transactions on Neural Networks and Learning Systems. 2020:1-15.
Chicago/Turabian Style: Qi Wang; Wei Huang; Zhitong Xiong; Xuelong Li. 2020. "Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification." IEEE Transactions on Neural Networks and Learning Systems: 1-15.
Many CNN-based segmentation methods have been applied to lane marking detection recently and have achieved excellent success owing to their strong ability to model semantic information. Although the accuracy of lane line prediction keeps improving, the localization ability for lane markings remains relatively weak, especially when the lane marking point is remote. Traditional lane detection methods usually utilize highly specialized handcrafted features and carefully designed postprocessing to detect the lanes. However, these methods are based on strong assumptions and, thus, scale poorly. In this work, we propose a novel multitask method that: 1) integrates the ability of CNNs to model semantic information with the strong localization ability provided by handcrafted features and 2) predicts the position of the vanishing line. A novel lane fitting method based on vanishing line prediction is also proposed for sharp curves and nonflat roads in this article. By integrating segmentation, specialized handcrafted features, and fitting, the localization accuracy and the convergence speed of the networks are improved. Extensive experimental results on four lane marking detection data sets show that our method achieves state-of-the-art performance.
Qi Wang; Tao Han; Zequn Qin; Junyu Gao; Xuelong Li. Multitask Attention Network for Lane Detection and Fitting. IEEE Transactions on Neural Networks and Learning Systems 2020, PP, 1-13.
AMA Style: Qi Wang, Tao Han, Zequn Qin, Junyu Gao, Xuelong Li. Multitask Attention Network for Lane Detection and Fitting. IEEE Transactions on Neural Networks and Learning Systems. 2020; PP (99):1-13.
Chicago/Turabian Style: Qi Wang; Tao Han; Zequn Qin; Junyu Gao; Xuelong Li. 2020. "Multitask Attention Network for Lane Detection and Fitting." IEEE Transactions on Neural Networks and Learning Systems PP, no. 99: 1-13.
As a significant and fundamental task in the remote sensing field, object detection has received increasing attention. However, geospatial object detection remains a challenge owing to the dramatic variation in object scales, intraclass differences, and interclass similarity among multiscale and multiclass objects. To deal with these problems, an end-to-end feature-reflowing pyramid network (FRPNet) is proposed in this letter. FRPNet has two advantages that contribute to improving object detection accuracy. First, we embed a nonlocal block into the backbone to capture the relevance between different regions of the geospatial image and thereby obtain discriminative features. Furthermore, a feature-reflowing pyramid structure is proposed to generate a high-quality feature representation for each scale by fusing fine-grained features from the adjacent lower level, which improves the detection capability for multiscale and multiclass objects. Experiments on the public remote sensing data set DIOR illustrate that FRPNet significantly improves performance compared with several state-of-the-art detection approaches in terms of mean average precision (mAP).
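The direction of the feature-reflowing fusion can be sketched as follows (an illustration of the fusion pattern only, under the assumption that "reflowing" pools the enriched finer map into the adjacent coarser level; the real network uses learned convolutional fusion, which plain pooling and addition only approximate).

```python
import numpy as np

def downsample2x(x):
    # 2x average pooling: carries a finer map down to the adjacent coarser scale
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x.reshape(h, 2, w, 2).mean(axis=(1, 3))

def reflow_pyramid(levels):
    # levels[0] is the finest feature map; each coarser level absorbs the
    # pooled, already-enriched map from the adjacent finer level
    out = [levels[0]]
    for coarse in levels[1:]:
        out.append(coarse + downsample2x(out[-1]))
    return out

pyramid = [np.ones((32, 32)), np.ones((16, 16)), np.ones((8, 8))]
enriched = reflow_pyramid(pyramid)   # every scale keeps its resolution
```

Because each level fuses the output of the previous fusion rather than the raw finer map, fine-grained detail propagates through the whole pyramid instead of only one step.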
Jingyu Wang; Yezi Wang; Yulin Wu; Ke Zhang; Qi Wang. FRPNet: A Feature-Reflowing Pyramid Network for Object Detection of Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters 2020, PP, 1-5.
AMA Style: Jingyu Wang, Yezi Wang, Yulin Wu, Ke Zhang, Qi Wang. FRPNet: A Feature-Reflowing Pyramid Network for Object Detection of Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters. 2020; PP (99):1-5.
Chicago/Turabian Style: Jingyu Wang; Yezi Wang; Yulin Wu; Ke Zhang; Qi Wang. 2020. "FRPNet: A Feature-Reflowing Pyramid Network for Object Detection of Remote Sensing Images." IEEE Geoscience and Remote Sensing Letters PP, no. 99: 1-5.
With the development of deep neural networks, the performance of crowd counting and pixel-wise density estimation is continually being refreshed. Despite this, there are still two challenging problems in this field: 1) current supervised learning needs a large amount of training data, but collecting and annotating such data is difficult and 2) existing methods cannot generalize well to unseen domains. A recently released synthetic crowd dataset alleviates these two problems. However, the domain gap between real-world data and synthetic images decreases the models' performance. To reduce this gap, in this article, we propose a domain-adaptation-style crowd counting method, which can effectively adapt a model from synthetic data to specific real-world scenes. It consists of multilevel feature-aware adaptation (MFA) and structured density map alignment (SDA). To be specific, MFA encourages the model to extract domain-invariant features from multiple layers. SDA guarantees that the network outputs fine density maps with a reasonable distribution on the real domain. Finally, we evaluate the proposed method on four mainstream surveillance crowd datasets: Shanghai Tech Part B, WorldExpo'10, Mall, and UCSD. Extensive experiments show that our approach outperforms the state-of-the-art methods on the same cross-domain counting problem.
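The multilevel part of MFA can be illustrated with a simple proxy (an assumption-laden sketch: the actual method aligns features adversarially, whereas the toy penalty below only matches first- and second-order feature statistics per backbone level to convey the "align at every level" idea).

```python
import numpy as np

def alignment_penalty(feats_syn, feats_real):
    # proxy for multilevel feature-aware adaptation: penalize the gap between
    # synthetic- and real-domain feature statistics at every backbone level
    total = 0.0
    for fs, fr in zip(feats_syn, feats_real):
        total += abs(fs.mean() - fr.mean()) + abs(fs.std() - fr.std())
    return total

rng = np.random.default_rng(1)
syn = [rng.normal(0.0, 1.0, 256) for _ in range(3)]    # 3 backbone levels
real = [rng.normal(0.5, 1.5, 256) for _ in range(3)]   # shifted real domain
gap = alignment_penalty(syn, real)       # nonzero: domains differ
zero_gap = alignment_penalty(syn, syn)   # identical domains: zero penalty
```

Minimizing such a penalty jointly with the counting loss pushes every feature level, not just the last one, toward domain invariance.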
Junyu Gao; Yuan Yuan; Qi Wang. Feature-Aware Adaptation and Density Alignment for Crowd Counting in Video Surveillance. IEEE Transactions on Cybernetics 2020, PP, 1-12.
AMA Style: Junyu Gao, Yuan Yuan, Qi Wang. Feature-Aware Adaptation and Density Alignment for Crowd Counting in Video Surveillance. IEEE Transactions on Cybernetics. 2020; PP (99):1-12.
Chicago/Turabian Style: Junyu Gao; Yuan Yuan; Qi Wang. 2020. "Feature-Aware Adaptation and Density Alignment for Crowd Counting in Video Surveillance." IEEE Transactions on Cybernetics PP, no. 99: 1-12.
Recently, crowd counting has drawn much attention owing to its significance in congestion control, public safety, and ecological surveys. Although performance has improved dramatically with the development of deep learning, the networks have also become larger and more complex, and a large model entails more training time to reach good performance. To tackle these problems, this article first constructs a lightweight model composed of an image feature encoder and a simple but effective decoder, called the pixel shuffle decoder (PSD). PSD ends with a pixel shuffle operator, which can recover more density information without increasing the number of convolutional layers. Second, a density-aware curriculum learning (DCL) training strategy is designed to fully tap the potential of crowd counting models. DCL gives each predicted pixel a weight that reflects its prediction difficulty and provides guidance toward better generalization. Experimental results show that PSD achieves outstanding performance on most mainstream datasets when trained under the DCL framework. Besides, we also conduct experiments adopting DCL on existing typical crowd counters, and the results show that they all achieve better performance than before, which further validates the effectiveness of our method.
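A minimal sketch of the two ingredients: the pixel shuffle rearrangement that lets the decoder upsample without extra convolutional layers, and an illustrative per-pixel curriculum weight. The weighting schedule here is hypothetical, invented only to show the "weight pixels by difficulty, relax over training" pattern; it is not the paper's DCL formula.

```python
import numpy as np

def pixel_shuffle(x, r):
    # rearrange (C*r*r, H, W) -> (C, H*r, W*r): sub-pixel upsampling that
    # enlarges the density map purely by reshuffling channel values
    c2, h, w = x.shape
    c = c2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

# decoder tail: a coarse (r*r)-channel map becomes one full-resolution density map
coarse = np.arange(16, dtype=float).reshape(4, 2, 2)
density = pixel_shuffle(coarse, r=2)

def difficulty_weights(pred, gt, stage):
    # illustrative density-aware curriculum weight: treat prediction error as
    # per-pixel difficulty; early stages (stage near 0) down-weight hard pixels
    err = np.abs(pred - gt)
    hard = err > np.median(err)
    return np.where(hard, stage, 1.0)   # hypothetical schedule, not the paper's
```

The shuffle matches the PyTorch `nn.PixelShuffle` convention, with output pixel `(h*r + i, w*r + j)` drawn from input channel `c*r*r + i*r + j`.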
Qi Wang; Wei Lin; Junyu Gao; Xuelong Li. Density-Aware Curriculum Learning for Crowd Counting. IEEE Transactions on Cybernetics 2020, PP, 1-13.
AMA Style: Qi Wang, Wei Lin, Junyu Gao, Xuelong Li. Density-Aware Curriculum Learning for Crowd Counting. IEEE Transactions on Cybernetics. 2020; PP (99):1-13.
Chicago/Turabian Style: Qi Wang; Wei Lin; Junyu Gao; Xuelong Li. 2020. "Density-Aware Curriculum Learning for Crowd Counting." IEEE Transactions on Cybernetics PP, no. 99: 1-13.