Ruxandra Tapu
Institut Polytechnique de Paris, Télécom SudParis, Advanced Research and TEchniques for Multidimensional Imaging Systems Department, 9 rue Charles Fourier, 91000 Évry, France


Feed

Journal article
Published: 20 June 2021 in Sensors

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method, based on the Squeeze and Excitation ResNet (SE-ResNet) model and fed with spectrogram inputs. In order to overcome the limitations of the state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy rates on the RAVDESS and CREMA-D publicly available datasets, respectively. When compared with the results provided by human observers, the gains in global accuracy scores exceed 24%. Finally, the objective comparative evaluation with state-of-the-art techniques demonstrates accuracy gains of more than 3%.
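The emotionally constrained triplet loss described above builds on the standard triplet loss, which can be sketched in a few lines of NumPy. This is a generic illustration only: the paper's emotional constraints and learned SE-ResNet embeddings are not reproduced here, and the `margin` value is an arbitrary assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor embedding towards the
    positive (same emotion) and push it away from the negative
    (different emotion) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)    # hinge: zero once the gap holds
```

When the negative is already farther than the positive by more than the margin, the loss is zero and the triplet stops contributing to the gradient.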

ACS Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition. Sensors 2021, 21, 4233.

AMA Style

Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia. Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition. Sensors. 2021;21(12):4233.

Chicago/Turabian Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. 2021. "Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition." Sensors 21, no. 12: 4233.

Short communication
Published: 29 October 2018 in Pattern Recognition Letters

Recent statistics of the World Health Organization (WHO), published in October 2017, estimate that more than 253 million people worldwide suffer from visual impairment (VI), with 36 million blind and 217 million people with low vision. In the last decade, there has been a tremendous amount of work in developing wearable assistive devices dedicated to visually impaired people, aiming to increase user cognition when navigating in known/unknown, indoor/outdoor environments, and designed to improve the quality of life of VI people. This paper presents a survey of wearable/assistive devices and provides a critical presentation of each system, while emphasizing related strengths and limitations. The paper is designed to inform the research community and VI people about the capabilities of existing systems and the progress in assistive technologies, and to provide a glimpse of possible short/medium term axes of research that can improve existing devices. The survey is based on various features and performance parameters, established with the help of the blind community, which allow the systems to be classified using both qualitative and quantitative measures of evaluation. This makes it possible to rank the analyzed systems based on their potential impact on the lives of VI people.

ACS Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. Wearable assistive devices for visually impaired: A state of the art survey. Pattern Recognition Letters 2018, 137, 37-52.

AMA Style

Ruxandra Tapu, Bogdan Mocanu, Titus Zaharia. Wearable assistive devices for visually impaired: A state of the art survey. Pattern Recognition Letters. 2018;137:37-52.

Chicago/Turabian Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. 2018. "Wearable assistive devices for visually impaired: A state of the art survey." Pattern Recognition Letters 137: 37-52.

Conference paper
Published: 23 November 2017 in Computer Vision

In this paper we introduce a novel single object tracking method that extends the traditional GOTURN algorithm with a visual attention model. The proposed approach returns accurate object tracks and is able to handle sudden camera and background movement, long-term occlusions and multiple moving objects that can evolve simultaneously in the same neighborhood. The process of occlusion identification is performed using image quad-tree decomposition and patch matching, based on a convolutional neural network trained offline. The object appearance model is adaptively modified over time based on both visual similarity constraints and trajectory verification tests. The experimental evaluation performed on the VOT 2016 dataset demonstrates the efficiency of our method, which returns high accuracy scores regardless of the scene dynamics or object shape.

ACS Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. Object Tracking Using Deep Convolutional Neural Networks and Visual Appearance Models. Computer Vision 2017, 114-125.

AMA Style

Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia. Object Tracking Using Deep Convolutional Neural Networks and Visual Appearance Models. Computer Vision. 2017:114-125.

Chicago/Turabian Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. 2017. "Object Tracking Using Deep Convolutional Neural Networks and Visual Appearance Models." Computer Vision: 114-125.

Journal article
Published: 28 October 2017 in Sensors

In this paper, we introduce the so-called DEEP-SEE framework that jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize in real time objects encountered during navigation in the outdoor environment. A first feature concerns an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach relies on a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location in time based on visual similarity. The validation of the tracking technique is performed on standard benchmark VOT datasets, and shows that the proposed approach returns state-of-the-art results while minimizing the computational complexity. Then, the DEEP-SEE framework is integrated into a novel assistive device, designed to improve cognition of VI people and to increase their safety when navigating in crowded urban scenes. The validation of our assistive device is performed on a video dataset with 30 elements acquired with the help of VI users. The proposed system shows high accuracy (>90%) and robustness (>90%) scores regardless of the scene dynamics.
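The alternation principle described above can be sketched as a generic tracking skeleton. This is an illustration only, not the authors' implementation: `similarity`, `motion_model` and `sim_threshold` are hypothetical placeholders standing in for the paper's two offline-trained CNNs and their confidence test.

```python
def track(frames, init_box, similarity, motion_model, sim_threshold=0.6):
    """Alternate between two tracking modes: when the appearance match
    is confident, follow the visual-similarity prediction; otherwise
    fall back to a motion-based extrapolation of the trajectory."""
    history = [init_box]
    for frame in frames:
        score, box = similarity(frame, history[-1])  # appearance match
        if score < sim_threshold:
            # low appearance confidence: trust the motion prediction
            box = motion_model(history)
        history.append(box)
    return history[1:]
```

The switch keeps the tracker anchored to appearance when the object is visible and bridges occlusions with the motion model, which matches the alternation the abstract describes at a high level.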

ACS Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance. Sensors 2017, 17, 2473.

AMA Style

Ruxandra Tapu, Bogdan Mocanu, Titus Zaharia. DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance. Sensors. 2017;17(11):2473.

Chicago/Turabian Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. 2017. "DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance." Sensors 17, no. 11: 2473.

Journal article
Published: 28 October 2016 in Sensors

In the most recent report published by the World Health Organization concerning people with visual disabilities, it is highlighted that by the year 2020, worldwide, the number of completely blind people will reach 75 million, while the number of visually impaired (VI) people will rise to 250 million. Within this context, the development of dedicated electronic travel aid (ETA) systems, able to increase the safe displacement of VI people in indoor/outdoor spaces while providing additional cognition of the environment, becomes of utmost importance. This paper introduces a novel wearable assistive device designed to facilitate the autonomous navigation of blind and VI people in highly dynamic urban scenes. The system exploits two independent sources of information: ultrasonic sensors and the video camera embedded in a regular smartphone. The underlying methodology exploits computer vision and machine learning techniques and makes it possible to accurately identify both static and highly dynamic objects present in a scene, regardless of their location, size or shape. In addition, the proposed system is able to acquire information about the environment, semantically interpret it and alert users about possible dangerous situations through acoustic feedback. To determine the performance of the proposed methodology we have performed an extensive objective and subjective experimental evaluation with the help of 21 VI subjects from two blind associations. The users pointed out that our prototype is highly helpful in increasing mobility, while being user-friendly and easy to learn.

ACS Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. When Ultrasonic Sensors and Computer Vision Join Forces for Efficient Obstacle Detection and Recognition. Sensors 2016, 16, 1807.

AMA Style

Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia. When Ultrasonic Sensors and Computer Vision Join Forces for Efficient Obstacle Detection and Recognition. Sensors. 2016;16(11):1807.

Chicago/Turabian Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. 2016. "When Ultrasonic Sensors and Computer Vision Join Forces for Efficient Obstacle Detection and Recognition." Sensors 16, no. 11: 1807.

Conference paper
Published: 21 October 2016 in Transactions on Petri Nets and Other Models of Concurrency XV

In this paper we propose a new method for automatic storyboard segmentation of TV news using image retrieval techniques and content manipulation. Our framework performs shot boundary detection, global key-frame representation, image re-ranking based on neighborhood relations, and analysis of the temporal variance of image locations in order to construct a unimodal cluster for anchor person detection and differentiation. Finally, anchor shots are used to form video scenes. The entire technique is unsupervised, able to learn semantic models and extract natural patterns from the current video data. The experimental evaluation performed on a dataset of 50 videos, totaling more than 30 h, demonstrates the pertinence of the proposed method, with gains of more than 5–7% in recall and precision rates when compared with state-of-the-art techniques.
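Shot boundary detection, the first stage of the pipeline above, is commonly done by thresholding the difference between histograms of consecutive frames. The sketch below is a minimal generic version of that idea, not the paper's method; the bin count and `threshold` are arbitrary assumptions.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5):
    """Detect hard cuts by comparing grey-level histograms of
    consecutive frames; a boundary is declared at frame `i` when the
    L1 distance between normalized histograms exceeds `threshold`.

    `frames` is an iterable of 2-D uint8 arrays (grayscale frames)."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / hist.sum()            # normalize to a distribution
        if prev_hist is not None:
            # L1 distance between consecutive histograms lies in [0, 2]
            if np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries
```

Histogram differencing catches abrupt cuts but not gradual transitions (fades, dissolves), which is why production systems layer additional cues on top of it.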

ACS Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. Automatic Segmentation of TV News into Stories Using Visual and Temporal Information. Transactions on Petri Nets and Other Models of Concurrency XV 2016, 648-660.

AMA Style

Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia. Automatic Segmentation of TV News into Stories Using Visual and Temporal Information. Transactions on Petri Nets and Other Models of Concurrency XV. 2016:648-660.

Chicago/Turabian Style

Bogdan Mocanu; Ruxandra Tapu; Titus Zaharia. 2016. "Automatic Segmentation of TV News into Stories Using Visual and Temporal Information." Transactions on Petri Nets and Other Models of Concurrency XV: 648-660.

Journal article
Published: 21 May 2016 in Multimedia Tools and Applications

In this paper, we introduce a novel computer vision-based perception system, dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this purpose, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, position or location. In order to enhance the navigation/positioning capabilities offered by traditional GPS-based approaches, which are often unreliable in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system has the possibility to learn a set of user-defined objects of interest. Here, multi-object identification and tracking is applied in order to guide the user to localize such objects of interest. The feedback is presented to the user through audio warnings/alerts/indications. Bone conduction headphones are employed in order to allow visually impaired users to hear the system's warnings without obstructing the sounds from the environment. At the hardware level, the system is totally integrated on an Android smartphone, which makes it easy to wear, non-invasive and low-cost.

ACS Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. A computer vision-based perception system for visually impaired. Multimedia Tools and Applications 2016, 76, 11771-11807.

AMA Style

Ruxandra Tapu, Bogdan Mocanu, Titus Zaharia. A computer vision-based perception system for visually impaired. Multimedia Tools and Applications. 2016;76(9):11771-11807.

Chicago/Turabian Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. 2016. "A computer vision-based perception system for visually impaired." Multimedia Tools and Applications 76, no. 9: 11771-11807.

Journal article
Published: 04 September 2015 in Journal of Ambient Intelligence and Smart Environments

In this paper we introduce a novel assistant for autonomous navigation of partially sighted people. The system, called ALICE, offers information about the location and possible directions a visually impaired user must follow in order to reach the desired destinations. The navigation is completed with a novel computer vision method that is able to detect and classify, in real time, both static and dynamic obstacles without any a priori information about the obstacle type, size, position or location. The GPS localization is enriched with a visual landmark recognition technique. Finally, through audio feedback a set of warnings is launched so that the user is alerted of potential hazards. For the feedback, audio bone conduction headphones are employed in order to allow the visually impaired to hear the system's warnings, but also other sounds from the environment. At the hardware level, the system is totally integrated on a smartphone, which makes it easy to wear, non-invasive and low-cost.

ACS Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. ALICE: A smartphone assistant used to increase the mobility of visual impaired people. Journal of Ambient Intelligence and Smart Environments 2015, 7, 659-678.

AMA Style

Ruxandra Tapu, Bogdan Mocanu, Titus Zaharia. ALICE: A smartphone assistant used to increase the mobility of visual impaired people. Journal of Ambient Intelligence and Smart Environments. 2015;7(5):659-678.

Chicago/Turabian Style

Ruxandra Tapu; Bogdan Mocanu; Titus Zaharia. 2015. "ALICE: A smartphone assistant used to increase the mobility of visual impaired people." Journal of Ambient Intelligence and Smart Environments 7, no. 5: 659-678.