Dhanushkumar R

Data Scientist | Microsoft Learn Student Ambassador - GOLD | AI & Machine Learning Enthusiast

About Me

B.Tech in Artificial Intelligence and Data Science

I am a data science and AI enthusiast with a strong foundation in machine learning, deep learning, and natural language processing. My work focuses on applying current tools and technologies to build solutions across a range of domains.

As a Microsoft Learn Student Ambassador (Gold), I actively contribute to the tech community by organizing workshops, events, and learning challenges to help students enhance their skills in Microsoft technologies.

Key Areas of Expertise:

  • Programming & Tools: Python, TensorFlow, PyTorch, Scikit-Learn, LangChain, Hugging Face, and ONNX
  • Core Domains: NLP, Computer Vision, Time Series Analysis, Audio Processing, and Generative AI
  • Specializations: Reinforcement Learning (PPO), Retrieval-Augmented Generation (RAG), ETL pipelines, Databricks, and LLM fine-tuning
  • Data Visualization: Power BI, Tableau, and Streamlit for building interactive dashboards and insights
Machine Learning
Deep Learning
NLP
Computer Vision
Generative AI
Azure AI
RAG

Experience

Sep 2024 - Present

Data Scientist - Intern

BigTapp Analytics

  • Developed a web application for Singapore users, supporting insurance-related queries in multiple languages with text and TTS responses using FastAPI, HTML/JavaScript, Weaviate, Azure OpenAI, and gTTS
  • Designed and deployed multimodal RAG chatbots on Weaviate Cloud and in on-premises environments
  • Built an end-to-end speech recognition (ASR/STT) and TTS application supporting multiple languages, integrating LLMs for enhanced accuracy
  • Developed agentic AI applications using LangGraph, Agno, and function-calling modules
  • Built multilingual chatbot applications using Thai- and Malay-language SLMs with Ollama
  • Integrated LLM observability and monitoring with Langfuse
  • Led development of a PII-aware RAG pipeline using LangChain and Presidio
  • Implemented Arabic PII masking with arabic-reshaper
  • Designed web scraping and content ingestion pipelines using bs4, crawl4ai, and llama-index
  • Resolved API issues and integrated Reader API for efficient data extraction
  • Built scalable REST APIs using FastAPI
  • Conducted research on LLM evaluation frameworks (RAGAS, Promptfoo, DeepEval)
Nov 2024 - Present

Founder

National Student Data Corps SJCE

  • Managed collaborations with companies for workshops, hackathons, and internships
  • Provided career guidance and insights to students in data science and AI
  • Conducted sessions on AI and data to help students deepen their knowledge
Feb 2025 - May 2025

Microsoft Student Ambassador - Gold

Microsoft Learn Student Ambassadors

  • Completed technical onboarding
  • Created and hosted learning challenges for Power Apps, Azure Fundamentals, and Azure Data Fundamentals
  • Conducted workshops on Power Pages, Power BI, ETL, Git, and GitHub
  • Organized events on Data Science with Azure Jupyter Notebook
  • Initiated a Microsoft Azure community in the region
  • Invited Microsoft employees and MVPs to conduct sessions on cracking interviews at MAANGF companies
  • Trained and supported over 50 students for Azure certifications
Jan 2024 - May 2025

Microsoft Learn Students Community Representative

St. Joseph's Group of Institutions

  • Established and led the Microsoft Learn Student Community
  • Organized events and sessions to assist students with Microsoft technologies and certifications
  • Collaborated with students to provide guidance on courses and certifications
  • Built a community that enhanced students' skills and knowledge in Microsoft technologies
Jul 2024 - Aug 2024

Research Intern

Indian Institute of Technology, Kharagpur

  • Conducted research on EMG signal denoising using Discrete Wavelet Packets, VMD, and TQWT
  • Added Gaussian noise to raw EMG data, denoised it with these algorithms, and fused the denoised EMG components
  • Achieved successful denoising using threshold-value extraction and signal fusion techniques
Jun 2024 - Aug 2024

Machine Learning Intern

Veritometrics

  • Worked on real-time timeline summarization with topic modeling using BERTopic and LLMs
  • Built a multi-agent conversational question-answering chatbot with Llama 2
Apr 2024 - Aug 2024

AI Research Blogger

OptimumAI

  • Explored Generative AI and content writing
  • Focused on LLM fine-tuning, covering LoRA, QLoRA, and PaliGemma
Jan 2024 - May 2024

Machine Learning Intern

RBG.AI

  • Optimized an OCR architecture using backbones such as ResNet, MobileNet, VGGNet, LeNet, and custom CNNs
  • Developed NLP techniques for automated resume parsing
  • Contributed to the Artificial Intelligence Service Stack (AISS) and Machine Learning Service Stack (MLstack)
  • Collaborated with domain experts and embraced continuous learning
Jan 2024 - May 2024

Deep Learning Intern

Neobrim

  • Specialized in deep reinforcement learning, focusing on Proximal Policy Optimization (PPO)
  • Developed and deployed deep reinforcement learning models for automation tasks
  • Designed shallow and dense neural networks for a comprehensive automation framework
Jan 2024 - May 2024

Deep Learning Research Intern

Phosphene AI

  • Conducted R&D on deepfake generation and detection
  • Developed a 3D U-Net model for deepfake video segmentation
  • Assisted in deepfake video classification to enhance detection accuracy
Oct 2023 - Apr 2024

Open Source Contributor / Machine Learning Engineer

Omdena (Algeria Chapter)

  • Developed an open-source water management and forecasting system
  • Enhanced sustainable water resource utilization through accurate forecasts
  • Empowered local stakeholders with tools and knowledge for informed water management
  • Fostered collaboration among local government, NGOs, and research communities

Education

2021 - 2025

Bachelor of Technology (B.Tech)

St. Joseph's College of Engineering

Field of Study: Artificial Intelligence and Data Science

GPA: 9.00/10

Coursework: Python, Fundamentals of Java, Data Structures, Foundations of Data Science, Data Mining, Computational Linguistics, Neuro Fuzzy Computing, Text Analytics, Computer Vision, Big Data Management, Ethics in Data Science, Data Visualization, Data Security, Linear Algebra, Probability and Statistics, Predictive Analytics

Activities: Microsoft Student Ambassador, NSDC-SJCE Founder, Event Organizer, Volunteering, Mentoring, Public Speaking, Intel OneAPI Student Ambassador

2019 - 2021

Higher Secondary

Senthil Public School

Field of Study: Maths Biology

Grade: 93/100

Subjects: Maths, Biology, Chemistry, Physics

Projects

Data Science Agent with Autogen

Multi-Agent Azure AI Autogen

Built "AutoDS: A Multi-Agent Data Science & Analytics Platform" using Autogen and Azure AI Services. The platform splits the end-to-end pipeline, from raw data to a deployable model and insights, into specialized agents.

Phi_4_on-colab_with_ollama

Jupyter Notebook LLM Ollama

Implementation of the Phi-4 model on Google Colab with Ollama, covering deployment and usage of the language model in a notebook environment.

Finetuning_Paligemma

Jupyter Notebook Fine-tuning Multimodal

Fine-tuning implementation for PaliGemma, a multimodal model that combines vision and language capabilities.

Weaviate_With_RAG

Jupyter Notebook RAG Vector Database

Implementation of Retrieval-Augmented Generation using Weaviate vector database, enhancing LLM responses with relevant context.

OneMed

Multimodal Chatbot Azure

Multi-agent medical multimodal chatbot assistant with LLM-powered agentic capabilities, integrating document retrieval and external search tools. Implemented Azure Maps for doctor and hospital location services and Azure OCR for analyzing medical documents.

Epilepsy Prediction with EEG Data

Deep Learning EEG Streamlit

Predicted epilepsy from an EEG dataset in .edf format using a ChronoNet deep learning model. Built ML pipelines and a Streamlit-based UI for data analysis.

Bratzlife - AI-Powered Fitness Assistant

Google Generative AI LangChain Fitness

Developed an AI-driven multimodal fitness assistant using Google Generative AI for personalized workout plans, dietary suggestions, and motivational quotes. Integrated LangChain-based search with DuckDuckGo to recommend protein supplements and diet-friendly restaurants.

PolyLingua

Speech Multilingual NLP

A Multilingual Intelligent Speech Assistant capable of converting speech to text, translating non-English input, converting text to speech, detecting spoken language automatically, and enabling fluid two-way conversation across multiple languages.

AI-Powered Cryptocurrency Fraud Detection

Web3 Blockchain Fraud Detection

Leveraged AI and Web3 technologies to identify and mitigate fraudulent activities in cryptocurrency transactions. Analyzed blockchain data with machine learning models to detect suspicious patterns and ensure secure decentralized financial ecosystems.

Segmentation with Mamba

Mamba Segmentation Computer Vision

Developed a building segmentation pipeline using the Mamba architecture on the Massachusetts Buildings dataset.

Publications

Epilepsy Prediction with Electroencephalogram Using Machine Learning and Deep Learning

2024

Comprehensive analysis of epilepsy prediction techniques using EEG data, exploring both machine learning and deep learning approaches for accurate diagnosis.

Optimizing OCR Model Performance: Comparative Study of Backbone Architecture and HyperParameter Tuning

2024

In-depth analysis of various backbone architectures for OCR models, comparing performance metrics and exploring hyperparameter optimization techniques.

Enhancing Oncology Precision through Machine Learning to Revolutionize Cancer Prediction

2024

Exploration of machine learning applications in cancer prediction and diagnosis, highlighting advancements in precision oncology through AI-driven approaches.

Quantization with Unsloth

September 12, 2024

Techniques used to reduce memory footprint and computational requirements in LLMs, making them more efficient and accessible.

67 Claps 2 Responses

PPO Algorithm

February 21, 2024

Deep dive into Proximal Policy Optimization in reinforcement learning, explaining its principles, implementation, and applications.

220 Claps 1 Response

Skills

Key Expertise

Data Science & AI 95%
Machine Learning & Deep Learning 90%
Natural Language Processing 85%
Computer Vision 80%
Generative AI & LLMs 90%

Programming & Libraries

Python 95%
TensorFlow/PyTorch 85%
LangChain 90%
Scikit-learn 85%
OpenCV 80%

Cloud & Tools

Azure AI & Cloud Services 90%
Hugging Face 85%
Docker 75%
Git & GitHub 85%
Power BI & Tableau 80%

Specialized Techniques

Retrieval-Augmented Generation (RAG) 90%
LLM Fine-tuning 85%
AI Agents & LLM Evaluation 80%
Automatic Speech Recognition 75%
ETL & Data Pipelines 80%

Certifications

Azure AI Engineer Associate

Microsoft

2024

Certified professional in Azure AI services and solutions

Azure Data Fundamentals

Microsoft

2023

Foundational knowledge of core data concepts and Azure data services

TensorFlow Developer

DeepLearning.AI

2023

Proficient in building and training neural networks using TensorFlow

IBM AI Engineer

IBM

2023

Expertise in AI engineering principles and practices

Python For Data Science

IIT Madras

2022

Advanced Python programming for data analysis and visualization

NLP Specialization

DeepLearning.AI

2023

Comprehensive training in natural language processing techniques

Machine Learning with Python

Internshipstudio

2023

Practical implementation of machine learning algorithms using Python

Python Competency Level 2

Persistent

2023

Advanced Python programming skills certification

Honors & Awards

Solve4planet Hackathon - 2nd Place

Recognized for an innovative solution addressing environmental challenges.

UiPath Hackathon - 1st Prize

Won first place for developing an automated solution using UiPath's RPA platform.

Sleth It Out - 2nd Prize

Awarded second place in this competitive technical challenge.

Data Analytics Challenge: Predicting the Winner of the ICC Cricket World Cup 2023

Recognized for developing an accurate predictive model for cricket tournament outcomes.

SIH Finalist

Selected as a finalist in the prestigious Smart India Hackathon.

Contact

Location

Chennai, Tamil Nadu, India

GitHub

Idk507

Medium

@danushidk507