Hemesh | Data Solutions Engineer

Experience

Software Engineering Intern

June 2024 - Dec 2024

Reinsurance Group of America Chesterfield, MO

Migrated legacy JavaScript codebase to React 18 with TypeScript and Material UI, boosting maintainability and performance.
Optimized Node.js backend with caching and efficient API design, improving platform speed by 15%.
Automated UI testing using TypeScript, Playwright, and Cucumber JS; reduced manual regression effort by 80%.
Fixed security vulnerabilities (SSDLC) while modernizing the frontend for compliance.
Designed and deployed an AWS Lambda function that automatically detects and terminates idle EC2 instances running EMR clusters, reducing cloud costs by $1,000 per month. Integrated it with Datadog for real-time monitoring and Slack for automated notifications.
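
The idle-instance check behind a cleanup function like this can be sketched as follows. This is a minimal illustration, not the deployed code: the `InstanceMetrics` shape, the 5% CPU threshold, and the six-sample window are all assumptions, and the CloudWatch query, termination call, and Datadog/Slack hooks are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceMetrics:
    """Hypothetical per-instance snapshot; field names are illustrative."""
    instance_id: str
    cpu_samples: List[float]  # average CPU utilization (%) per sampling period


def find_idle_instances(metrics: List[InstanceMetrics],
                        cpu_threshold: float = 5.0,
                        min_samples: int = 6) -> List[str]:
    """Return IDs whose CPU stayed below the threshold for every recent sample."""
    idle = []
    for m in metrics:
        recent = m.cpu_samples[-min_samples:]
        # Require a full window so newly launched instances are never flagged.
        if len(recent) == min_samples and all(s < cpu_threshold for s in recent):
            idle.append(m.instance_id)
    return idle
```

Requiring a full window of low readings is one way to avoid terminating a cluster that has only just started.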

React TypeScript Node.js Mocha React Testing Library Material UI Jenkins Playwright AWS Lambda

Software Engineer & Machine Learning Engineer

Dec 2022 - Aug 2023

New Pro Data Madurai, India

Built an NLP resume parser with 95% accuracy, extracting key resume data automatically.
Used MPNet embeddings and FAISS for semantic job-resume matching, boosting accuracy to 93%.
Developed a Django-based HR chatbot that automated onboarding and eliminated third-party platforms.

Python Django MPNet HuggingFace AWS

Data Analyst Intern

Aug 2021 - Sep 2021

The Sparks Foundation Chennai, India

Conducted EDA with Matplotlib, Seaborn, and Plotly, optimizing inventory and reducing costs by 20%.
Evaluated historical sales data to identify key trends affecting item availability; proposed strategic adjustments that raised average weekly sales from 120 to 180 units.
Analyzed sales and inventory data to identify pricing opportunities, addressing the top three margin loss contributors on high-demand products.

Python Pandas Matplotlib Seaborn EDA Tableau

Projects

Cookbook: AI Recipe Generator

React Firebase GroqCloud API LLM

Developed a web application that enables users to discover and generate personalized recipes using GroqCloud's LLM (Llama model).

Integrated AI to generate meal ideas tailored to user preferences
Supported filters for cuisine, nutrition goals, and allergies
Stored and retrieved user history with Firebase for seamless UX

Live Demo Repository

Semantic Book Recommender

Python Hugging Face LangChain Qdrant Gradio

Built a semantic recommendation system using embeddings to help users discover relevant books based on content similarity and genre preferences.

Embedded 7,000+ book descriptions into 384-d vectors using all-MiniLM-L6-v2 for similarity search
Integrated Hugging Face and LangChain with Qdrant for fast cosine retrieval and metadata filtering
Deployed a Gradio app with a zero-shot classifier (∼80% accuracy) for fiction vs non-fiction labeling
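
The retrieval step described above, cosine similarity combined with a metadata filter, can be illustrated in plain Python. This is a sketch under stated assumptions: the `recommend` helper and the `title`/`category`/`vector` record fields are hypothetical, and the project itself delegates this work to Qdrant rather than a hand-written loop.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec: Sequence[float], books: List[Dict],
              category: Optional[str] = None, top_k: int = 3) -> List[str]:
    """Filter by metadata first, then rank the survivors by cosine similarity."""
    candidates = [b for b in books
                  if category is None or b["category"] == category]
    ranked = sorted(candidates,
                    key=lambda b: cosine(query_vec, b["vector"]),
                    reverse=True)
    return [b["title"] for b in ranked[:top_k]]
```

The toy vectors used to exercise this would be two-dimensional; the real embeddings are 384-dimensional, but the ranking logic is the same.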

Live Demo Repository

AI Agent for Flappy Bird Simulation

Python Pygame NEAT

Designed and trained an AI agent using NEAT to autonomously master the Flappy Bird game through neural network evolution.

Developed autonomous gameplay using NEAT to evolve neural networks without human input
Implemented collision detection, game mechanics, and dynamic pipe difficulty for realistic simulation
Achieved 100% survival rates by optimizing bird movement based on pipe positions and altitude

Repository

Football Analysis

Python YOLOv8 OpenCV PyTorch KMeans

Built an AI-powered system for real-time football analysis using computer vision techniques to extract insights.

Used YOLOv8 for object detection of players, referees, and the ball, achieving 79% mAP and 80% IoU
Clustered players by jersey color using KMeans with a silhouette score of 0.6; added optical flow to stabilize tracking with camera motion
Computed player speed and distance via perspective transformation for real-world accuracy
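
The speed and distance calculation can be sketched as below, assuming a 3x3 homography that maps image pixels to pitch coordinates in metres is already known. The function names, the one-position-per-frame track format, and the km/h unit are illustrative assumptions rather than details from the project.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def to_pitch(h: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a pixel (x, y) to pitch coordinates (metres) with a 3x3 homography."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return px, py


def distance_and_speed(h: Matrix, track: List[Tuple[float, float]],
                       fps: float) -> Tuple[float, float]:
    """track holds one pixel position per frame; returns (metres, km/h)."""
    pts = [to_pitch(h, x, y) for x, y in track]
    dist = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    seconds = (len(track) - 1) / fps
    speed_kmh = (dist / seconds) * 3.6 if seconds > 0 else 0.0
    return dist, speed_kmh
```

Measuring distances after the transform, rather than in pixel space, is what keeps the numbers comparable across camera angles.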

Repository

Netflix Dashboard

Tableau

Created a Tableau dashboard analyzing Netflix’s global catalog to uncover content strategies and market potential.

Identified high-demand regions like the U.S. and India, and untapped areas such as Africa and Eastern Europe
Revealed that 68% of the content consists of movies, highlighting Netflix’s focus on versatile viewing patterns
Surfaced trends in genres like Documentaries and Dramas, and growth in niche categories such as true crime

Live Dashboard

Automotive Sales ETL Pipeline

Azure Data Factory Databricks PySpark

Designed a robust ETL architecture for automotive sales data using industry-standard patterns and modern data engineering tools.

Implemented scalable pipelines with Medallion Architecture, Delta Lake, and Unity Catalog
Built a Star Schema with Fact and Dimension tables; automated incremental loads via stored procedures in ADF
Used SCD modeling in Databricks to ensure high data consistency and traceability
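
For readers unfamiliar with slowly changing dimensions, the idea can be sketched in plain Python. The project does not state which SCD type it uses, so this assumes Type 2 (keep history by closing the old row and appending a new version); the `apply_scd2` helper and the `valid_from`/`valid_to`/`is_current` column names are hypothetical.

```python
from datetime import date
from typing import Dict, List


def apply_scd2(dimension: List[Dict], incoming: Dict, key: str,
               tracked: List[str], today: date) -> List[Dict]:
    """Close the current row and append a new version if a tracked attribute changed."""
    current = next((r for r in dimension
                    if r[key] == incoming[key] and r["is_current"]), None)
    if current is not None:
        if all(current[c] == incoming[c] for c in tracked):
            return dimension  # nothing changed, keep history as-is
        # Expire the old version instead of overwriting it.
        current["valid_to"] = today
        current["is_current"] = False
    dimension.append({**incoming, "valid_from": today,
                      "valid_to": None, "is_current": True})
    return dimension
```

In Databricks the same pattern is usually expressed as a merge against a Delta table; the list-of-dicts version here only shows the row-level logic.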

YouTube Trends Analysis

AWS S3 AWS Glue AWS Lambda Athena QuickSight

Built an AWS-based data pipeline to analyze regional YouTube video trends, leveraging serverless tools for transformation and visualization.

Ingested and transformed trending video data using AWS S3, Glue, and Lambda into Parquet format
Performed SQL-based analytics with Athena and created interactive dashboards in QuickSight
Used AWS Glue Catalog for automated schema detection and metadata management

Stock Market Real-Time Data Engineering

Apache Kafka AWS Glue Athena

Built a real-time data pipeline to simulate and process stock market data, leveraging Kafka and AWS analytics services for streaming insights.

Implemented producer-consumer pipelines in Kafka to ingest and stream real-time stock data
Simulated data using Python and Boto3, writing outputs to S3 for downstream processing
Used AWS Glue Crawlers to catalog datasets and Athena for on-demand SQL analysis

Lip Reader

Python TensorFlow Streamlit

Built a sentence-level lip-reading system using deep learning to reduce reliance on audio-based speech recognition.

Built a LipNet-inspired model using STCNNs, RNNs, and CTC loss for sequence prediction
Trained on the GRID dataset and developed a Streamlit app for real-time lip-reading
Integrated into an assistive communication system for improved accessibility
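
A model trained with CTC loss emits one class distribution per video frame, which then has to be collapsed into text. The sketch below shows greedy (best-path) decoding as one common way to do that; it is an assumption that the project decodes this way, and the `labels` layout with the blank symbol at index 0 is illustrative.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      labels: Sequence[str], blank: int = 0) -> str:
    """Pick the best class per frame, merge repeats, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for idx in best:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank; a blank between repeats keeps both copies.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)
```

The blank class is what lets the decoder distinguish a held letter from a genuinely doubled one.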

Repository

Video Generator from Blog Posts

React Node.js Express GPTScript Tools

Built a web application that converts blog URLs into summarized videos using GPTScript and FFmpeg pipelines.

Used GPTScript to summarize content and generate visual + audio media
Implemented Node.js backend to handle FFmpeg workflows and API requests
Developed a React frontend to present generated videos interactively
Enhanced content accessibility and engagement through automation

Repository

Movie Ticket Booking Application

MERN Stack Stripe Redux Toolkit JWT

Built a full-featured theatre management platform with real-time seat tracking and secure payment integration.

Developed role-based portals for Users, Admins, and Theatre Owners with secure access
Built with Ant Design, Redux Toolkit, JWT + BCrypt authentication, and deployed on Render
Integrated Stripe for payments and implemented show scheduling, seat selection, and ticket control

Repository

Algae Classification

Python TensorFlow

Built a deep learning system to classify microscopic algae images for environmental monitoring.

Developed a deep learning system using CNN, AlexNet, and ViT to classify algae images
Processed FlowCam DB images with data augmentation, achieving 98% top-5 accuracy
Deployed at the City of Bloomington’s office, integrating into a live preprocessing-to-display pipeline

View Poster Read Paper

Hemesh RM Data Solutions Engineer

Education

Master of Science - Computer Science

Bachelor of Technology - Computer Science

Skills

Languages

Frameworks & Databases

Tools

Cloud

Libraries