Sentiment Analysis and Speaker Diarization in Hindi and Marathi Using Finetuned Whisper

Sentiment Analysis in Hindi and Marathi

Authors

  • Gowtham Papala, Indian Institute of Information Technology, Nagpur, India
  • Aniket Ransing, Indian Institute of Information Technology, Nagpur, India
  • Pooja Jain, Indian Institute of Information Technology, Nagpur, India

DOI:

https://doi.org/10.12694/scpe.v24i4.2248

Keywords:

Automatic Speech Recognition, Whisper, Summarization, Sentiment Analysis, Diarization, Affective Computing, Question Answering

Abstract

Automatic Speech Recognition (ASR) is a crucial technology that enables machines to recognize human speech from audio signals. In recent years, ASR models have developed rapidly with the emergence of new techniques and algorithms. One such model is Whisper, developed by OpenAI, which is based on a Transformer encoder-decoder architecture and can handle multiple tasks such as language identification, transcription, and translation. However, Whisper still has limitations: it does not provide speaker diarization, summarization, or emotion detection, and its performance on Indian regional languages such as Hindi, Marathi, and others is limited. This paper aims to extend the Whisper ASR model with components for speaker diarization, text summarization, emotion detection, text generation, and question answering. Additionally, we aim to improve its performance on Indian regional languages by fine-tuning the model on the Common Voice 11 dataset from Hugging Face. The findings have the potential to contribute to the development of more accurate and reliable ASR models, which could improve human-machine communication in various applications.

 

Published

2023-11-17

Section

Special Issue - Sentiment Analysis and Affective computing in Multimedia Data on Social Network