Projects

Systems I have built, improved, migrated, optimized, or explored.

A public-safe view of my work across production ML platforms, healthcare data workflows, model lifecycle tooling, Spark optimization, document AI, applied ML, and AI-assisted engineering.

Vaibhav Sharma profile photo

Current work

Production ML, MLOps, and healthcare data platforms

Production ML

Healthcare Risk And Future-Cost Scoring

Supported production scoring workflows across CKD, COPD, diabetes, mortality, falls, high-risk pregnancy, readmissions, and estimated future-cost use cases.

  • Monthly risk and future-cost scoring for large member populations.
  • Daily readmission scoring workflows with operational validation.
  • Claims and clinical data workflow support for model teams.

MLOps platform

Enterprise ML Platform Capabilities

Built and supported internal platform capabilities for high-scale ML pipelines, model workflow execution, data quality, lineage tracking, and MLOps/LLMOps foundations.

  • Pipeline and model workflow enablement for data science teams.
  • Data quality, profiling, and lineage-oriented engineering support.
  • Onboarding and debugging support for production model workflows.

Model lifecycle

MLflow Model Versioning And Experiment Tracking

Integrated MLflow into platform workflows for model versioning, experiment tracking, reproducible runs, and cleaner model lifecycle operations.

  • Experiment tracking and run metadata structure.
  • Model versioning patterns for repeatable workflows.
  • Practical handoff from experimentation to operational workflows.

Operations

Model Validation And Run Tracking Dashboard

Created reusable validation notebooks and a centralized run-tracking view for run dates, parameters, validation links, execution links, and stakeholder coordination.

  • Standardized model validation and run metadata capture.
  • Improved traceability for model handoffs and reviews.
  • Reduced ad hoc coordination during production runs.
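
As an illustration only, one row of such a run-tracking view could be modeled as below. This is a minimal sketch, not the production schema: the `RunRecord` class, its field names, and the sample values are all hypothetical placeholders chosen to mirror the items listed above (run date, parameters, validation link, execution link).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class RunRecord:
    """One row in a hypothetical run-tracking view."""
    model_name: str
    run_date: date
    parameters: dict = field(default_factory=dict)
    validation_link: str = ""
    execution_link: str = ""

    def missing_fields(self) -> list:
        """Return the names of link fields that are still empty."""
        return [name for name in ("validation_link", "execution_link")
                if not getattr(self, name)]


# Placeholder values for illustration; not taken from any real run.
record = RunRecord(
    model_name="example_model",
    run_date=date(2024, 1, 31),
    parameters={"scoring_month": "2024-01"},
    execution_link="https://example.invalid/runs/1",
)
print(record.missing_fields())
```

A check like `missing_fields()` is one way to flag runs whose validation has not been linked yet, which is the kind of follow-up that otherwise happens through ad hoc messages.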

Migration

Analytics Workflow Migration To Databricks Workflows

Migrated an analytics workflow from older orchestration patterns to Databricks Workflows, replacing hard-coded configuration with dynamic configuration and moving intermediate storage to Delta.

  • One-click workflow execution pattern.
  • Delta-based intermediate storage and handoff.
  • Cleaner configuration management for repeatable runs.
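
The shift from hard-coded to dynamic configuration can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the migrated workflow itself: `resolve_config`, the `DEFAULTS` keys, and the table names are hypothetical, and in a real job the `job_params` dictionary would come from whatever parameter mechanism the orchestrator provides.

```python
import json

# Hypothetical defaults; none of these keys come from the original workflow.
DEFAULTS = {
    "input_table": "analytics.input",
    "output_table": "analytics.output",
    "run_date": None,  # None marks a value that must be supplied at run time
}


def resolve_config(job_params: dict, defaults: dict = DEFAULTS) -> dict:
    """Merge runtime job parameters over defaults and reject bad input."""
    unknown = set(job_params) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    config = {**defaults, **job_params}
    missing = [key for key, value in config.items() if value is None]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return config


# Only the run-specific value is passed in; everything else falls back to defaults.
config = resolve_config({"run_date": "2024-01-31"})
print(json.dumps(config, indent=2))
```

Failing fast on unknown or missing parameters keeps a one-click run from silently using stale values.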

Optimization

Transformer-Based NLP Workflow Optimization

Profiled and optimized Spark-heavy preprocessing and inference workflows for a transformer-based NLP model pipeline.

  • Reduced preprocessing runtime from 5-6 hours to about 1 hour.
  • Reduced inference runtime from about 5 hours to about 3 hours.
  • Used caching, broadcast joins, query refactoring, and cluster tuning.
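
To show why a broadcast join helps, here is the idea in plain Python rather than Spark code. It is a simplified sketch with made-up toy data: the function name, column names, and rows are hypothetical, and it assumes the small table has unique keys. In PySpark the same intent is usually expressed with the `pyspark.sql.functions.broadcast` hint on the small DataFrame.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a small lookup.

    The small side is turned into one in-memory dict (the "broadcast" copy),
    so rows on the large side never have to be regrouped by key.
    """
    lookup = {row[key]: row for row in small_rows}
    joined = []
    for partition in large_partitions:
        for row in partition:
            match = lookup.get(row[key])
            if match is not None:  # drop rows without a match (inner join)
                joined.append({**row, **match})
    return joined


# Toy data: two partitions of a large table and one small dimension table.
notes = [
    [{"note_id": 1, "dept": "A"}, {"note_id": 2, "dept": "B"}],
    [{"note_id": 3, "dept": "A"}, {"note_id": 4, "dept": "C"}],
]
departments = [
    {"dept": "A", "label": "cardiology"},
    {"dept": "B", "label": "nephrology"},
]

result = broadcast_hash_join(notes, departments, key="dept")
print([row["note_id"] for row in result])
```

Each partition is processed independently against the same lookup, which is what lets the large side stay in place instead of being shuffled across the cluster.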

Architecture

Reusable NLP Preprocessing Architecture

Restructured preprocessing so shared work could run once across models instead of being recomputed for each downstream model path.

  • Reduced redundant compute.
  • Improved maintainability of model pipelines.
  • Made preprocessing easier to validate and reason about.
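
The run-once pattern can be sketched as below. This is a toy illustration, not the actual pipeline: the function names and the tokenization step are hypothetical stand-ins, and an in-process cache substitutes for what would more likely be a persisted intermediate table in a Spark setting.

```python
from functools import lru_cache

# Counts how often the shared step actually executes.
CALLS = {"shared": 0}


@lru_cache(maxsize=None)
def shared_preprocess(text: str) -> tuple:
    """Shared step (here: lowercase plus whitespace tokenization)."""
    CALLS["shared"] += 1
    return tuple(text.lower().split())


def model_a_features(text: str) -> int:
    """Hypothetical model-specific step: token count."""
    return len(shared_preprocess(text))


def model_b_features(text: str) -> set:
    """Hypothetical model-specific step: vocabulary."""
    return set(shared_preprocess(text))


doc = "Patient reports mild chest pain"
print(model_a_features(doc), sorted(model_b_features(doc)))
print(CALLS["shared"])  # the shared step ran once for both model paths
```

Because both model paths read the same preprocessed output, a validation check on that output covers every downstream model at once.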

Observability

AI/ML Observability Exploration

Explored observability, tracing, monitoring, and evaluation-oriented workflows for AI and ML systems.

  • Hands-on work with Arize-oriented observability concepts.
  • Trace and evaluation workflow exploration.
  • Practical focus on production debugging and model monitoring.

Earlier work

Document AI, model serving, and product engineering

Document AI

Healthcare Data Extractor

Built ML and deep learning systems to extract structured information from medical text, narratives, forms, and other unstructured healthcare documents.

  • NLP workflows using NER, BiLSTM, BERT, and coreference resolution.
  • Relational modeling and object detection components.
  • Applied ML for healthcare document intelligence.

Serving

Model Serving Platform

Developed a TensorFlow Serving architecture for deploying and versioning machine learning models.

  • Model deployment patterns for internal ML workflows.
  • Version control support for served models.
  • Operational bridge between training and prediction use cases.

Product engineering

Microservices Migration

Contributed to the migration of a healthcare product from a monolithic architecture to microservices, including Java services and Python components for service discovery and distributed logging.

  • Java service development.
  • Python service discovery and logging components.
  • Migration support for a larger product architecture shift.

Internal tooling

ML Workflow UI

Built an internal tool to train, test, evaluate, and run predictions for in-house ML models through a user interface.

  • Training, testing, evaluation, and prediction workflow support.
  • UI layer for model users.
  • Practical tooling for repeatable ML experimentation.

Reporting

Jira Activity Reporting Tool

Created an internal reporting tool to fetch, analyze, and visualize daily Jira activity for management reporting.

  • Jira data extraction and analysis.
  • Visualization for operational reporting.
  • Python-based automation around recurring reporting needs.

Applied projects

Personal, academic, and early-career builds

AI productivity

Codex Todo Planner

Codex-assisted task-planning workflow that uses structured project notes and AGENTS.md-style instruction files to preserve implementation context while managing work.

  • Task pages, planner notes, and work-log structure.
  • Context-preserving workflow for recurring engineering tasks.
  • Designed for practical daily use rather than heavy process.

Knowledge systems

LLM Wiki Context Setup

Structured wiki and context workflows using project instructions, reusable notes, and Codex-assisted knowledge organization.

  • Project instruction files and wiki indexes.
  • Research and learning notes organized for reuse.
  • Lightweight structure for AI-assisted recall and development.

Security research

A5/1 Cipher Cryptanalysis

Student research project on cryptanalysis of the A5/1 stream cipher used in GSM communication, applying rainbow tables and GSM decoding concepts.

  • Cryptanalysis and security research exposure.
  • Rainbow-table-based attack concepts.
  • Telecom security and GSM decoding fundamentals.

IoT internship

Remote Health Monitoring Module

Built an IoT module using sensors and NodeMCU for collecting temperature, pulse-rate, and distance data, with a remote tracking and monitoring app.

  • Sensor integration and data collection.
  • Remote monitoring application workflow.
  • Python analysis using pandas, scikit-learn, matplotlib, regex, and OpenCV.

Generative AI

Text2Face

Facial image generation project using Generative Adversarial Networks.

  • GAN-based image generation.
  • Applied computer vision and deep learning project.
  • Useful early foundation for modern generative AI interests.

Accessibility

Blistick

Assistive application for visually impaired users using object detection, face recognition, Android, Python, and OpenCV.

  • Object detection and face recognition workflow.
  • Android application layer.
  • Computer vision for assistive technology.

IoT and security

Secure Soldier Monitoring

IoT module for real-time soldier monitoring, with an exploration of blockchain-based server-side security.

  • IoT-based real-time monitoring concept.
  • Server-side security exploration with blockchain.
  • Presented as a paper at ICSC 2019.

Public GitHub

Repositories currently visible on my public GitHub profile

Portfolio

Personal Portfolio Website

Static personal website presenting a recruiter-facing profile, projects, resume summary, and professional links.

  • GitHub Pages hosted public site.
  • HTML and CSS implementation.
  • Public-safe career and project presentation.

Portfolio

JavaScript Portfolio Prototype

Earlier public portfolio implementation using JavaScript, useful as a design and iteration reference for the current website.

  • Personal website prototype.
  • JavaScript-based implementation.
  • Can be mined for sections, layout ideas, or assets later.

Machine learning

Anomaly Detection

Jupyter Notebook project for anomaly detection on Intel Lab sensor data.

  • Sensor data anomaly detection.
  • Notebook-based ML exploration.
  • Useful early data science project reference.

Machine learning

Sberbank House Price Prediction

Exploratory data analysis, feature engineering, and modeling on the Sberbank Russian house pricing dataset.

  • EDA and feature engineering.
  • Regression modeling workflow.
  • Notebook-based applied ML project.

Computer vision

Unsupervised Image Clustering

Notebook project exploring image clustering without labeled training data.

  • Unsupervised learning workflow.
  • Image feature and clustering exploration.
  • Computer vision experimentation.

Security

Image Encryption Using AES And Shuffling

Notebook project exploring image encryption using AES and pixel/data shuffling concepts.

  • Image encryption workflow.
  • AES-based security concept.
  • Notebook-based experimentation.

Python

Data Structures And Algorithms In Python

Python repository for data-structure and algorithm practice.

  • Python implementation practice.
  • Interview and fundamentals-oriented repository.
  • Useful baseline for coding fluency.

Web app

Reddit Clone

Reddit-style clone application using Spring Boot and Angular 8.

  • Full-stack application structure.
  • Spring Boot backend.
  • Angular frontend.

Learning

Basic Neural Net Compiler

Python repository for learning and experimenting with neural-network compiler concepts.

  • Python-based ML systems learning.
  • Compiler-oriented neural network concepts.
  • Public fork retained as a reference project.