Aaron Kambhampati →
Working on projects

Selected Work

Project Portfolio

A mix of vision-AI systems, data products, and LLM-powered tools — built end-to-end from messy raw data to clean, reliable experiences. These are the projects I’m most proud of shipping and maintaining.

What I focus on

  • Real-time computer-vision & gaze analytics
  • LLM workflows for learning & productivity
  • Clean, debuggable, production-ready code

Typical stack

  • Python, PyTorch / ONNX, OpenCV
  • Node.js, Express, EJS frontends
  • PostgreSQL / MySQL, Power BI

PROJECT DOMAINS

Finance Case Study IDP System
2026

IDP Document Intelligence

Built an end-to-end Intelligent Document Processing pipeline: classify → extract → validate → review, with confidence scoring and audit logs for high accuracy.

Scope

Invoices, KYC, bank statements, policy docs.

Pipeline

OCR + layout parsing + extraction + rules + HITL.
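
A minimal sketch of how confidence scoring might feed the human-in-the-loop (HITL) review step. The `Field` type, the `route` function, and the 0.90 threshold are hypothetical names and values chosen for illustration, not the production configuration.

```python
from dataclasses import dataclass

# Hypothetical threshold: fields scoring below it are sent to human review.
REVIEW_THRESHOLD = 0.90


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # extraction confidence in [0, 1]


def route(fields, threshold=REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted and needs-review lists."""
    accepted, review = [], []
    for f in fields:
        (accepted if f.confidence >= threshold else review).append(f)
    return accepted, review


# Made-up example values.
fields = [
    Field("invoice_number", "INV-001", 0.98),
    Field("total_amount", "1,250.00", 0.72),
]
accepted, review = route(fields)
print([f.name for f in accepted], [f.name for f in review])
```

Routing per field rather than per document keeps the review queue small: a reviewer only sees the values the extractor was unsure about.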

Impact

Reduced manual ops, faster TAT, fewer errors.

IDP preview top
IDP preview bottom
FINANCE HEALTH Supply Chain Management
Finance Aadhaar Masking
2024

Aadhaar Masking Handling Confidentiality

Real-time Aadhaar masking using multiple OCR tools such as AWS Textract, Google Document AI, and Pytesseract. Aadhaar numbers follow a fixed pattern, which the system detects while handling layouts and tables in a document. Validated on thousands of masked documents while maintaining utmost confidentiality.

Input

A document that may contain an Aadhaar number.

Pipeline

Document → OCR → Layout Handling → Pattern Recognition → Masking.
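
A minimal sketch of the pattern recognition and masking steps, applied to text that OCR has already produced. It assumes the commonly documented Aadhaar format (12 digits, first digit 2 to 9, last digit a Verhoeff checksum) and masks the first eight digits. The function names and the `XXXX XXXX` mask style are illustrative choices, not the production code.

```python
import re

# Verhoeff multiplication table (dihedral group D5).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Verhoeff permutation table.
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# 12 digits, optionally grouped 4-4-4 with spaces or hyphens (assumed format).
AADHAAR_RE = re.compile(r"\b([2-9]\d{3})[ -]?(\d{4})[ -]?(\d{4})\b")


def verhoeff_valid(number: str) -> bool:
    """Return True if the digit string passes the Verhoeff checksum."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def mask_aadhaar(text: str) -> str:
    """Mask the first eight digits of every checksum-valid candidate."""
    def _mask(match):
        digits = "".join(match.groups())
        if not verhoeff_valid(digits):
            return match.group(0)  # likely some other 12-digit number
        return "XXXX XXXX " + match.group(3)
    return AADHAAR_RE.sub(_mask, text)
```

The checksum test reduces false positives: most other 12-digit strings, such as account or phone numbers, fail it and are left untouched.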

Impact

Since Aadhaar numbers are PII, unauthorized storage is a cognizable crime. This software masks numbers to protect user privacy.

Aadhaar masking preview
Aadhaar masking result
Finance Life Insurance
2024

Smart Underwriter Co-pilot

I worked on the medical underwriting component of an insurance risk assessment system, where I developed an AI framework combining a Vision Transformer (ViT) and Claude 3 Sonnet to analyze medical documents and diagnostic images such as ECGs, TMT reports, and chest X-rays. The solution assists underwriters in evaluating applicant health risks by identifying medical conditions and generating structured assessments, achieving an AUC-ROC score of 0.99 on validation data.

Scale

Thousands of events per day.

Stack

Python, AWS Bedrock, Google's Vision Transformer (ViT), Google Colab, FastAPI, machine learning, computer vision, prompt engineering, LLMs.

Impact

Reduced reporting time by 80%, sharper spend decisions.

Smart underwriter preview
Smart underwriter result
Finance Case Study GDP
2025

GDP Analysis

GDP Forecasting & Economic Trend Analysis – Developed an end-to-end time series forecasting pipeline using historical GDP and macroeconomic indicators as inputs. The workflow included data cleaning, missing value treatment, stationarity testing (ADF), feature engineering, trend and seasonality analysis, followed by model development using both SARIMA (statistical baseline) and LSTM (deep learning) approaches. Models were evaluated using forecasting error metrics and used to generate future GDP projections, enabling analysis of long-term economic growth patterns and supporting data-driven policy and investment decision-making.

Input

Historical GDP data along with macroeconomic indicators such as inflation, unemployment rate, interest rates, population growth, and trade statistics.

Pipeline

Economic Data → Preprocessing & Feature Engineering → SARIMA/LSTM Modeling → Forecast Generation → Economic Insights.
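
A minimal sketch of two small pieces of this workflow: differencing a series (a common step when a stationarity test such as ADF fails) and the forecasting error metrics used to compare models. The ADF test and the SARIMA/LSTM models themselves are not shown, and the function names and sample values are illustrative.

```python
import math


def difference(series, lag=1):
    """Lagged differencing, a common step toward a stationary series."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Made-up GDP index values, for illustration only.
gdp = [100.0, 102.0, 105.0, 109.0]
print(difference(gdp))
```

Scoring the SARIMA baseline and the LSTM with the same metrics on the same held-out window is what makes the comparison between them meaningful.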

Impact

Generated accurate GDP forecasts and economic trend insights to support strategic planning, policy evaluation, and investment decision-making.

GDP forecast preview
GDP trend analysis
Computer Vision Case Study Vision Pipeline
2026

Vision Analytics Patented

Developed an edge AI-based audience analytics platform for digital advertising screens that measures viewer demographics and engagement in real time. The system processes camera feeds on a Raspberry Pi 5 using computer vision models for face detection, tracking, age/gender estimation, and gaze analysis to determine audience attention levels. Aggregated analytics are presented through dashboards, enabling advertisers and media owners to understand audience composition, measure campaign effectiveness, and optimize advertising placements while maintaining privacy by avoiding personal identification.

Hardware

Hailo-8, Raspberry Pi, standard webcams.

Pipeline

Camera Feed → YuNet Face Detection → Face Tracking → Age/Gender Estimation → FaceMesh & Gaze Analysis → Audience Analytics Aggregation → Real-Time Dashboard & Insights.
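
A minimal sketch of the aggregation stage, assuming the upstream models have already produced one `(track_id, is_looking)` pair per tracked face per frame. The input format, the `summarize_attention` name, and the fixed frame rate are assumptions made for the example.

```python
from collections import defaultdict


def summarize_attention(observations, fps=10.0):
    """Aggregate per-frame gaze flags into per-track dwell and attention time.

    observations: iterable of (track_id, is_looking) pairs, one per frame
    in which that track was visible.
    """
    frames = defaultdict(int)
    looking = defaultdict(int)
    for track_id, is_looking in observations:
        frames[track_id] += 1
        if is_looking:
            looking[track_id] += 1
    return {
        tid: {
            "dwell_s": frames[tid] / fps,        # time in front of the screen
            "attention_s": looking[tid] / fps,   # time spent looking at it
            "attention_ratio": looking[tid] / frames[tid],
        }
        for tid in frames
    }


# Made-up observations: track 1 looks in 3 of 4 frames, track 2 never looks.
obs = [(1, True), (1, True), (1, False), (1, True), (2, False), (2, False)]
summary = summarize_attention(obs, fps=2.0)
print(summary)
```

Only anonymous track IDs and counters are kept, which matches the privacy goal of measuring audiences without identifying individuals.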

Impact

Enabled privacy-conscious audience measurement for digital advertising by providing real-time demographic and attention analytics, helping advertisers improve campaign performance and optimize media spending. Power draw: <5 W.

Vision analytics preview
Vision analytics dashboard
Health AI AI-Powered Medical System
2024

HEALTH AI Systems

Developed a medical risk assessment framework for insurance underwriting using Vision Transformers and Claude 3 Sonnet to analyze diagnostic reports and medical images. Rather than building a conventional pipeline, I experimented with and created a first-of-its-kind approach I call "Directional LLMs": the LLM is supported by an ML model whose output, anything from a simple class label to a threshold, gives the LLM a direction to reason in so that it can verify, validate, or justify the decision.

Input

ECGs, TMT reports, chest X-rays, medical records, and patient history.

Pipeline

Medical Documents & Images → ViT-based Classification → Claude 3 Sonnet Reasoning → Condition Identification → Risk Assessment Report.
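
A minimal sketch of how a classifier's output could be passed to the LLM as a "direction". The prompt wording and the `build_directional_prompt` name are hypothetical, not the prompts used in the project, and no model call is made here.

```python
def build_directional_prompt(report_text, label, confidence):
    """Combine a classifier's output with the document into one LLM prompt."""
    return (
        "You are assisting a medical underwriter.\n"
        f"A vision model classified the attached study as '{label}' "
        f"with confidence {confidence:.2f}.\n"
        "Use this as a direction to investigate, not as ground truth: "
        "verify it against the report, then justify or reject it.\n\n"
        f"Report:\n{report_text}"
    )


# Made-up inputs for illustration.
prompt = build_directional_prompt(
    "ECG shows increased QRS voltage.",
    "left ventricular hypertrophy",
    0.912,
)
print(prompt)
```

Asking the LLM to verify or reject the label, rather than simply accept it, keeps the classifier's output as a hint instead of a conclusion.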

Focus

Assisted underwriters in evaluating medical risk and making more consistent insurance approval and premium decisions.

Health AI systems preview
Health AI systems result
CARE HEALTH Supply Chain Management
Health Case Study Eye Disease Detection
2025

Eye Disease CNN

Deep learning classifier for diabetic retinopathy and glaucoma detection from fundus images — trained on 90k+ scans, achieving 94% AUC with explainable Grad-CAM heatmaps for clinical review.

Data

APTOS, EyePACS, custom augmented sets.

Model

EfficientNet-B4 + Grad-CAM explainability layer.

Impact

94% AUC, 3× faster screening vs. manual review.
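
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive scan gets a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that definition follows; the labels and scores are made up, and real evaluation would normally use a library implementation.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```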

Eye disease fundus scan
Eye disease detection result
Sampling Health Pools
2025

Synthetic Healthcare Data Generation

Created synthetic healthcare datasets using GANs and YData Synthetic to generate realistic patient profiles while preserving privacy. Formed patient clusters that replicated real-world disease patterns, demographics, and clinical trends for downstream AI/ML applications.

Input

Patient demographics, medical history, lab results, vitals, diagnoses, and treatment records.

Pipeline

Data Preprocessing → Patient Clustering → GAN/YData Synthetic Generation → Statistical Validation → Synthetic Dataset Creation.
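
A minimal sketch of one possible check in the statistical validation step: the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distributions of a real and a synthetic column. This is an illustrative choice of test, not necessarily the one used in the project, and the sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    real, synthetic = sorted(real), sorted(synthetic)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect_right(sample, x) / len(sample)

    points = sorted(set(real) | set(synthetic))
    return max(abs(ecdf(real, x) - ecdf(synthetic, x)) for x in points)


# Made-up ages: real patients vs. generated patients.
real_ages = [34, 45, 52, 61, 70]
synthetic_ages = [33, 47, 50, 63, 68]
print(ks_statistic(real_ages, synthetic_ages))
```

A value near 0 suggests the synthetic column follows the real distribution closely; a value near 1 means the two barely overlap.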

Impact

Enabled privacy-safe healthcare analytics, expanded limited datasets, and improved AI model development by providing realistic synthetic patient populations that closely matched real-world trends.

Synthetic data generation preview
Synthetic patient data output
Cardiac Case Study Heart Disease
2025

Heart Disease Prediction

Developed a machine learning framework to predict the likelihood of heart disease using patient clinical and lifestyle data, enabling early risk identification and preventive healthcare interventions.

Input

Age, gender, blood pressure, cholesterol levels, ECG results, heart rate, chest pain type, blood sugar levels, smoking history, and other clinical indicators.

Pipeline

Data Preprocessing → Missing Value Treatment & Feature Engineering → Exploratory Data Analysis → Feature Selection → Model Training (Random Forest, XGBoost, Logistic Regression) → Hyperparameter Tuning → Risk Prediction & Evaluation.
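
A minimal sketch of the missing value treatment step, using median imputation on a single column. The row format and the `impute_median` name are assumptions for the example, and the clinical values are made up.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in rows
    ]


# Made-up patient records with one missing cholesterol value.
patients = [
    {"age": 54, "chol": 200},
    {"age": 61, "chol": None},
    {"age": 47, "chol": 240},
    {"age": 58, "chol": 220},
]
cleaned = impute_median(patients, "chol")
print(cleaned[1])
```

In a real training setup the median would be computed on the training split only and then reused for validation and test data, so that no information leaks across splits.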

Impact

Improved early detection of high-risk patients, supported clinical decision-making, reduced manual risk assessment effort, and enabled proactive treatment planning through data-driven insights.

Heart disease prediction preview
Heart disease prediction result
Diabetes Case Study Medical Classification
2025

Diabetes Prediction

Built a machine learning model to predict the risk of diabetes using patient health and lifestyle indicators, helping identify high-risk individuals at an early stage.

Data

Public healthcare datasets from sources such as the National Institute of Diabetes and Digestive and Kidney Diseases (Pima Indians Diabetes Dataset), Centers for Disease Control and Prevention health datasets, and patient health records containing glucose levels, BMI, age, blood pressure, insulin levels, family history, and pregnancy-related factors.

Model

Logistic Regression, Random Forest, XGBoost, Support Vector Machines (SVM), and Neural Networks for classification and risk scoring.

Impact

Enabled early identification of at-risk individuals, supported preventive healthcare initiatives, improved clinical decision-making, and facilitated targeted lifestyle and treatment interventions to reduce the progression of diabetes-related complications.

Diabetes prediction preview
Diabetes prediction result
COVID-19 Strain Analysis Viral Genome Comparison
2023

RNA Genome Comparison

Developed a bioinformatics and machine learning framework to analyze COVID-19 RNA sequences and compare emerging viral strains against known viruses such as SARS-CoV, MERS-CoV, and Ebola. Leveraged sequence analysis, structural biology tools, and statistical pattern recognition to identify genetic similarities, mutations, and evolutionary relationships, enabling better understanding of novel viral behavior and potential risks.

Input

PDB gene/protein structure files, viral RNA sequences, mutation datasets, and genomic data from sources such as National Center for Biotechnology Information, Global Initiative on Sharing All Influenza Data, and Protein Data Bank.

Pipeline

Sequence Processing (BioPython) → Similarity & Mutation Analysis → Structure Prediction (AlphaFold) → Pattern Comparison & Visualization.
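
A minimal sketch of the similarity and mutation analysis step for two sequences that have already been aligned to the same length. The alignment itself, BioPython parsing, and AlphaFold structure prediction are not shown; the function name and the toy sequences are illustrative.

```python
def compare_aligned(reference, query):
    """Return (identity fraction, list of (position, ref_base, query_base)).

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = [
        (i + 1, r, q)
        for i, (r, q) in enumerate(zip(reference, query))
        if r != q
    ]
    identity = 1 - len(mutations) / len(reference)
    return identity, mutations


# Toy RNA fragments, not real viral sequences.
identity, mutations = compare_aligned("AUGGCU", "AUGACU")
print(round(identity, 3), mutations)
```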

Impact

Identified evolutionary relationships between viral strains, highlighted mutation hotspots and structural similarities, and supported faster biological interpretation of new variants for research and public health decision-making.

RNA genome comparison preview
Viral strain analysis result
Logistics Case Study Supply Chain
2026

FMCG Supply Chain Intelligence & Consumer Behavior Analysis

Worked on analyzing FMCG supply chain operations and consumer purchasing behavior to understand product movement, demand patterns, logistics efficiency, and customer preferences. Designed and analyzed surveys to capture behavioral trends, purchasing decisions, inventory movement, and end-to-end logistics flow across the FMCG product lifecycle. Additionally, developed recommendation systems to improve product placement, demand forecasting, and customer engagement.

Scope

Studied the complete FMCG product lifecycle from procurement, manufacturing, warehousing, and distribution to retail sales and customer consumption. Analyzed consumer behavior, logistics bottlenecks, inventory movement, and product demand patterns to support data-driven supply chain decisions.

Stack

Python, Pandas, NumPy, Scikit-Learn, SQL, Power BI/Tableau, Survey Analytics, Recommendation Systems (Collaborative & Content-Based Filtering), Statistical Analysis, Data Visualization, and Supply Chain KPI Monitoring.
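
A minimal sketch of item-based collaborative filtering, one of the recommendation approaches listed above, written in plain Python for readability. The purchase matrix and the function names are made up for illustration; a real system would typically use Pandas or Scikit-Learn on far larger data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length purchase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_items(purchases, item):
    """Rank other items by how similar their buyers are to `item`'s buyers.

    purchases: {item name: [units bought by customer 0, customer 1, ...]}
    """
    target = purchases[item]
    scores = {
        other: cosine(target, vec)
        for other, vec in purchases.items()
        if other != item
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up purchase counts for four customers.
purchases = {
    "tea": [1, 1, 0, 0],
    "biscuits": [1, 1, 0, 1],
    "soap": [0, 0, 1, 1],
}
print(similar_items(purchases, "tea"))
```

Here tea and biscuits share two buyers while tea and soap share none, so biscuits ranks first as a placement or cross-sell candidate.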

Impact

Identified consumer buying patterns and logistics inefficiencies, improved visibility across the FMCG supply chain, enabled data-driven inventory and distribution planning, and developed recommendation models that enhanced product discovery, customer engagement, and demand alignment.

FMCG supply chain dashboard
FMCG consumer behavior analysis
SUPPLY CHAIN MGMT
Supply Chain Case Study Demand Forecasting
2026

Demand Forecasting

End-to-end demand forecasting and inventory replenishment system — ingests POS, warehouse, and supplier lead-time data to predict SKU-level demand 12 weeks ahead, auto-triggering reorder signals when safety stock thresholds are breached.

Data

POS logs, warehouse stock, supplier SLAs, weather.

Models

LSTM + XGBoost ensemble, anomaly detection layer.
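
A minimal sketch of how a reorder signal might be derived from a demand forecast, using the textbook safety stock formula (z × demand standard deviation × √lead time). The function names, the z value of 1.65 (roughly a 95% service level), and the sample numbers are assumptions for illustration, not the system's actual rules.

```python
import math


def safety_stock(daily_demand_std, lead_time_days, z=1.65):
    """Buffer stock covering demand variability over the supplier lead time."""
    return z * daily_demand_std * math.sqrt(lead_time_days)


def needs_reorder(on_hand, forecast_daily_demand, daily_demand_std,
                  lead_time_days, z=1.65):
    """True when stock projected at the end of the lead time falls below
    the safety stock threshold."""
    projected = on_hand - forecast_daily_demand * lead_time_days
    return projected < safety_stock(daily_demand_std, lead_time_days, z)


# Made-up SKU: forecast of 50 units/day, std of 10 units, 4-day lead time.
print(needs_reorder(220, 50, 10, 4))
print(needs_reorder(300, 50, 10, 4))
```

In this sketch the forecast model only supplies `forecast_daily_demand`; the trigger logic stays simple and auditable regardless of which model produced the number.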

Impact

40% fewer stockouts, 25% reduction in overstock cost.

Demand forecasting dashboard
Inventory replenishment output
Research Medical Diagnosis
2024–2026

Directional LLMs

Pioneered and experimentally validated the concept of Directional LLMs, a novel framework demonstrating that Large Language Models achieve substantially higher predictive accuracy when provided with an explicit reasoning direction prior to inference. Introduced the use of Vision Transformer (ViT)-generated directional labels and structured contextual cues to guide model reasoning toward clinically relevant decision pathways. Through extensive experimentation on medical diagnosis tasks, established that directional guidance can transform model performance from approximately 66% to 99% AUC-ROC, proving that targeted reasoning signals significantly enhance prediction quality compared to conventional prompting approaches.

Areas

Implemented and evaluated in medical diagnosis workflows involving ECG reports, pathology reports, clinical notes, radiology findings, and patient health records, where contextual guidance was used to steer model reasoning toward specific diagnostic objectives.

Output

A framework that augments LLM inference with directional signals, producing more accurate disease classification, risk assessment, diagnostic recommendations, and clinically relevant explanations compared to conventional prompting approaches.

Goal

To improve the reliability, interpretability, and predictive accuracy of LLMs in high-stakes healthcare applications by guiding model reasoning through structured contextual cues rather than relying solely on zero-shot or generic prompting strategies.

Directional LLMs research preview
Directional LLMs results
DATA RESEARCH PAPER
Research Superposition/Quantum Annealing
2025–Present

Quantum-Inspired Neural Optimization

Conducting ongoing research into a quantum-inspired optimization framework that models neural network initialization as a superposition of multiple candidate parameter states, enabling the simultaneous exploration of diverse learning trajectories. The hypothesis is that maintaining multiple optimization pathways during training can improve the likelihood of converging toward higher-quality minima compared to conventional single-initialization gradient-based learning.

Method

Designed a conceptual framework where multiple weight initialization states are maintained as a probabilistic state space rather than a single starting point. During training, candidate trajectories are evaluated and dynamically weighted based on optimization performance, drawing inspiration from quantum concepts such as superposition, state collapse, and amplitude amplification. Current work focuses on implementing classical approximations of these principles and benchmarking them against traditional optimization techniques such as SGD, Adam, and population-based approaches.
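
A minimal classical stand-in for the idea described above, shown on a one-dimensional toy loss with two minima. Several starting points are trained in parallel, weighted by a softmax over negative loss, and the highest-weighted candidate is kept. This is plain multi-start gradient descent with end-of-training weighting, offered only to illustrate the intuition; it is not the research framework itself, and the loss function, step count, and learning rate are arbitrary choices.

```python
import math


def loss(x):
    """Toy non-convex loss with a global and a local minimum."""
    return x ** 4 - 3 * x ** 2 + x


def grad(x):
    """Derivative of the toy loss."""
    return 4 * x ** 3 - 6 * x + 1


def multi_start_descent(starts, steps=500, lr=0.01):
    """Run gradient descent from every start, then weight the candidates."""
    candidates = list(starts)
    for _ in range(steps):
        candidates = [x - lr * grad(x) for x in candidates]
    losses = [loss(x) for x in candidates]
    # Softmax over negative loss: a lower loss gives a higher weight.
    lowest = min(losses)
    exps = [math.exp(-(value - lowest)) for value in losses]
    total = sum(exps)
    weights = [e / total for e in exps]
    # "Collapse" to the candidate carrying the most weight.
    best = candidates[weights.index(max(weights))]
    return best, weights


best, weights = multi_start_descent([-2.0, 0.5, 2.0])
print(round(best, 3), [round(w, 3) for w in weights])
```

Two of the three starts settle in the shallower minimum, while the start at -2.0 reaches the deeper one and receives the largest weight, which is the sensitivity to initialization that the research aims to reduce.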

Metrics

Convergence Rate, Training Stability, Loss Landscape Exploration, Final Validation Accuracy, AUC-ROC, Generalization Performance, Computational Overhead, and Distance-to-Minima Analysis across multiple training runs.

Finding

Early experiments indicate that maintaining multiple candidate optimization pathways may improve exploration of the loss landscape and reduce sensitivity to poor initializations. Initial results suggest potential improvements in convergence consistency and solution quality; however, extensive validation across diverse architectures and datasets is still ongoing. Current efforts are focused on formalizing the optimization strategy, quantifying trade-offs between computational cost and performance gains, and evaluating its applicability to deep learning and healthcare AI problems.

Quantum-inspired optimization preview
Neural optimization results