Posts by Collection

portfolio

2016

Primary-Ambient Source Separation

My first research project, spanning my M.Sc. at Nile University and an internship at Sony Stuttgart. The goal: automatically separate the direct sound from the diffuse ambience in a stereo recording, to enable surround sound upmixing.

2018

Singing Voice Intelligibility

For my M.Sc. at NUS, I studied what makes song lyrics easy or hard to understand, and built systems to measure it automatically — motivated by a real application: recommending music for language learning.

2022

Contextual Music Recommendation

My PhD project at Télécom Paris and Deezer, studying how listening context — activity, mood, device, time of day — shapes what music people want to hear, and building systems that learn to predict it automatically.

2024

Mood Monitoring with Speech Emotion Recognition

At Emobot, I led research on automatic speech emotion recognition, significantly improving accuracy through synthetic data augmentation via emotion conversion, with direct impact on a real-time healthcare application.

publications

Published in The 13th Sound and Music Computing Conference (SMC), 2016

Primary-Ambient Extraction in Audio Signals Using Adaptive Weighting and Principal Component Analysis

This paper is about separating the primary and ambient sources in a sound mixture for use in surround sound upmixing. We propose a PCA-based approach to perform the separation.

Published in The 18th International Society for Music Information Retrieval Conference (ISMIR), 2017

Intelligibility of Sung Lyrics: A Pilot Study

This paper is about estimating the intelligibility of the singing voice in a given song. We propose a set of acoustic features that are relevant for estimating intelligibility. We also propose an approach for labeling songs with an intelligibility score according to human perception.

Published in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Primary-Ambient Source Separation for Upmixing to Surround Sound Systems

This paper is about separating the primary and ambient sources in a sound mixture for use in surround sound upmixing. We propose a neural-network-based approach to perform the separation.

Published in The 19th International Society for Music Information Retrieval Conference (ISMIR), 2018

Empirically Weighing the Importance of Decision Factors When Selecting Music to Sing

This paper is about estimating the singability of a given song and the factors that make one song more singable than another. We propose a number of acoustic features to automatically estimate the singability of a song.

Published in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Audio-Based Auto-Tagging With Contextual Tags for Music

This paper is about auto-tagging music tracks with context-related tags. The paper also presents a dataset of ∼50k tracks labelled with 15 different contexts.

Published in The 2020 International Conference on Multimedia Retrieval, 2020

Confidence-based Weighted Loss for Multi-label Classification with Missing Labels

This paper presents a weighted loss function that accounts for missing labels in the training set and is easy to use when fine-tuning pre-trained models.

Published in The 21st International Society for Music Information Retrieval Conference (ISMIR), 2020

Should we consider the users in contextual music auto-tagging models?

This paper proposes a user-aware auto-tagging system for the contextual tags of music tracks.

Published in The 23rd International Society for Music Information Retrieval Conference (ISMIR), 2022

Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts

This paper proposes predicting the user's listening context in real time.

Published in The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion

This paper proposes using synthetic data augmentation via emotion conversion to improve speech emotion recognition models.

Karim M. Ibrahim