My Projects
A walkthrough of my projects spanning multiple domains
ICU Notes Digitization
Building a cloud-based pipeline to digitize handwritten ICU notes and the graphs depicting a patient's vitals over the course of a day.
Challenges:
- Presence of gridlines on the page interfered with OCR results (a gridline-removal sketch follows this list)
- Inconsistency in the graph markers used for vitals affected their assignment to the correct vital feature
- Handwritten data overflowing across gridlines led to errors in matching the detected data to the correct time slots
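One of the simpler mitigations for the gridline problem is to suppress long horizontal and vertical strokes with morphological opening before running OCR. Below is a minimal OpenCV sketch of that idea; the kernel sizes, threshold parameters and file names are illustrative assumptions, not the production pipeline.

```python
import cv2

# Load a scanned ICU notes page (filename is illustrative).
page = cv2.imread("notes_page.png", cv2.IMREAD_GRAYSCALE)

# Binarize with the page content as white on black.
binary = cv2.adaptiveThreshold(
    page, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 15, 10
)

# Extract long horizontal and vertical runs -- these are the gridlines.
h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)

# Remove the gridlines, keep the handwriting, then invert back for OCR.
cleaned = cv2.subtract(binary, cv2.add(h_lines, v_lines))
ocr_ready = cv2.bitwise_not(cleaned)
cv2.imwrite("notes_page_no_grid.png", ocr_ready)
```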
First project as a team lead!
In this project, I led a team of 5 ML engineers and 2 business analysts. Apart from the technical front, I was also responsible for communicating with clients, understanding their requirements, and creating the project timeline accordingly. With wonderful contributions from everyone, we ended up converting the POC into a multi-million deal for the company.
AutoML and Vertex AI
Gained extensive experience developing and deploying a deep-learning-based application using GCP's Vertex AI. The entire project pipeline was orchestrated using the newly launched Vertex AI Pipelines with Kubeflow 2.0. I also picked up the data processing and transformations required for training an object detection AutoML model.
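For context, a Vertex AI pipeline in the KFP v2 SDK is a set of decorated Python components compiled into a spec that Vertex AI Pipelines can run. The sketch below is a minimal, simplified example of that structure; the component bodies, bucket path and file names are placeholders rather than the actual project code.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def preprocess(raw_path: str) -> str:
    # Placeholder: convert raw scans into AutoML-ready annotations.
    return raw_path.replace("raw", "processed")

@dsl.component(base_image="python:3.10")
def train(processed_path: str):
    # Placeholder: kick off an AutoML object detection training job.
    print(f"training on {processed_path}")

@dsl.pipeline(name="icu-object-detection")
def pipeline(raw_path: str = "gs://my-bucket/raw"):  # bucket is illustrative
    prep = preprocess(raw_path=raw_path)
    train(processed_path=prep.output)

# Compile to the spec that Vertex AI Pipelines accepts.
compiler.Compiler().compile(pipeline, "icu_pipeline.json")
```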
Federated Learning
Testing the novel method of Federated Learning for training models on medical data located at data centers across the world, without compromising privacy or model performance
Challenges:
- Setting up the server in the China region amid multiple regulations and restrictions
- Handling cross-cloud infrastructure on both AWS and GCP, across 5 regions each
- Working with a beta-version SDK from Nvidia for orchestrating the Federated Learning experiment
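At its core, the experiment follows the federated averaging pattern: each site trains locally, only weight updates leave the premises, and the server combines them. The NumPy sketch below is a conceptual illustration of that aggregation step, not the Clara SDK's API.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted FedAvg: combine per-site model weights without sharing data.

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   number of local training samples at each site
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for layer in client_weights[0]:
        global_weights[layer] = sum(
            w[layer] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

# Toy example: two sites with differently sized local datasets.
site_a = {"dense": np.array([1.0, 2.0])}
site_b = {"dense": np.array([3.0, 4.0])}
print(federated_average([site_a, site_b], client_sizes=[100, 300]))
```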
Collaboration with Nvidia
Collaborated with NVIDIA to provide feedback and suggest improvements based on how their Clara SDK performed in our Federated Learning experiments. Several of our suggestions were incorporated into their next release, which we then tested in other projects.
Protecting the privacy
This project gave me the unique opportunity to explore different privacy protocols such as homomorphic encryption, SVT, the Laplacian method and the percentile method. We studied the effect of each of these methods on model performance and collated the findings in a white paper. To assess the privacy of patient information, we also trained a GAN model to study the possibility of reconstructing training images from model weights.
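As a rough illustration of the simplest of these protocols, the Laplacian method adds calibrated noise to a weight update before it leaves a site. The sketch below shows the idea; the sensitivity and epsilon values are placeholders, not the settings used in the study.

```python
import numpy as np

def laplace_privatize(weight_update, sensitivity=0.01, epsilon=1.0):
    """Add Laplace noise scaled to sensitivity/epsilon (values are illustrative)."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=weight_update.shape)
    return weight_update + noise

update = np.array([0.12, -0.05, 0.33])
print(laplace_privatize(update))
```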
Renal Stone Segmentation
3D Segmentation of a renal stone present in the urinary tract to calculate its exact size and location
Challenges:
- Lack of annotated data for the entire urinary tract
- Building scalable pipelines to preprocess and transform 300 GB of PHI data
- A jumbled dataset with both contrast and non-contrast CTs captured from machines of different manufacturers
Image Processing and Deep Learning
This project proved to be a fascinating blend of image processing and deep learning techniques. A 3D UNet was trained to segment the renal stone and determine its shape. Image processing came in handy for calculating the volume, detecting the stone type, and accurately localizing the stone's position in the urinary tract.
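The size calculation itself is straightforward once the segmentation mask is available: count the foreground voxels and scale by the CT's voxel spacing. A minimal sketch of that post-processing step follows; the spacing values and toy mask are illustrative.

```python
import numpy as np

def stone_volume_and_centroid(mask, spacing_mm=(0.7, 0.7, 1.0)):
    """Volume in mm^3 and centroid (voxel coords) of a binary 3D segmentation."""
    voxel_volume = float(np.prod(spacing_mm))
    coords = np.argwhere(mask > 0)
    volume_mm3 = coords.shape[0] * voxel_volume
    centroid = coords.mean(axis=0) if coords.size else None
    return volume_mm3, centroid

# Toy 3D mask with a small "stone".
mask = np.zeros((64, 64, 32), dtype=np.uint8)
mask[30:34, 30:34, 15:18] = 1
print(stone_volume_and_centroid(mask))
```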
Mastering new skills
The challenges associated with the project gave me the opportunity to explore innovative and novel tools. To alleviate the issue of insufficient annotations, we decided to try out Nvidia Clara's AI-assisted annotation tool, which ended up giving us a decent starting point for training the model. To process such a huge dataset efficiently, I learnt GCP Dataflow and was able to accelerate the process by almost 10x.
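For reference, a Dataflow job is defined as an Apache Beam pipeline in Python. The sketch below shows the rough shape such a preprocessing pipeline takes; the project, bucket paths and the `normalize_scan` step are placeholders, not the actual pipeline code.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def normalize_scan(path):
    # Placeholder transform: load a CT series and resample / rescale it.
    return f"{path} -> normalized"

options = PipelineOptions(
    runner="DataflowRunner",             # switch to "DirectRunner" for local tests
    project="my-gcp-project",            # placeholder project
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # placeholder bucket
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ListScans" >> beam.Create(["gs://my-bucket/ct/scan_001", "gs://my-bucket/ct/scan_002"])
        | "Normalize" >> beam.Map(normalize_scan)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/processed/scans")
    )
```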
Synthetic medical images
Generating realistic-looking 3D medical images from user-specified input parameters such as organ, sex and modality.
Challenges:
- Parametrization of the GAN (see the conditioning sketch after this list)
- Building an unbiased dataset with enough images representing each set of parameters
- Finding the appropriate loss function and metric for training the GAN
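A common way to parametrize a GAN on categorical inputs like organ, sex and modality is to embed each label and concatenate the embeddings with the latent vector before the generator. The PyTorch sketch below illustrates that conditioning pattern with toy dimensions; it is not the architecture used in the project.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: latent noise + label embeddings -> small 3D volume."""
    def __init__(self, z_dim=128, n_organs=10, n_sexes=2, n_modalities=3, embed_dim=16):
        super().__init__()
        self.organ_emb = nn.Embedding(n_organs, embed_dim)
        self.sex_emb = nn.Embedding(n_sexes, embed_dim)
        self.modality_emb = nn.Embedding(n_modalities, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + 3 * embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 32 * 32 * 32),  # tiny volume for illustration
            nn.Tanh(),
        )

    def forward(self, z, organ, sex, modality):
        # Concatenate the label embeddings with the latent vector.
        cond = torch.cat(
            [self.organ_emb(organ), self.sex_emb(sex), self.modality_emb(modality)], dim=1
        )
        out = self.net(torch.cat([z, cond], dim=1))
        return out.view(-1, 1, 32, 32, 32)

gen = ConditionalGenerator()
z = torch.randn(4, 128)
vol = gen(z, torch.tensor([1, 2, 0, 3]), torch.tensor([0, 1, 0, 1]), torch.tensor([2, 0, 1, 1]))
print(vol.shape)  # torch.Size([4, 1, 32, 32, 32])
```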
Democratizing medical models
One of the biggest hindrances in training medical imaging models is the expensive and time-consuming nature of data annotation. With the successful completion of this POC, we took a first step towards developing inclusive AI solutions with the potential to revolutionize affordable healthcare. Further improvements to the model can help us generate images based on additional parameters like age, medical condition and equipment manufacturer.
Explore, try and repeat
Given the uniqueness and difficulty of the problem statement, there was a lot of scope for experimentation around model architectures, loss functions and metrics. Three GAN variants were shortlisted: StyleGAN, DCGAN and CycleGAN. After a thorough analysis of the results, StyleGAN was chosen with some minor modifications to its architecture. A final FID score of 7 was achieved at the end of the POC.
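For reference, FID compares Gaussian fits to Inception features of the real and generated sets: the squared distance between the means plus a trace term over the covariances. The NumPy/SciPy sketch below implements that formula; the random features stand in for real Inception activations.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_real, feat_fake):
    """FID between two sets of feature vectors (rows = samples)."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)

# Toy example with random features standing in for Inception activations.
rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(200, 64)), rng.normal(size=(200, 64))))
```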
SAMD Pipelining
Designing a cloud-based, end-to-end MLOps pipeline for the development of Software as a Medical Device (SaMD).
Challenges:
- Handling and protecting the privacy of PHI data as per FDA guidelines
- Designing a development process that ensures transparency and reproducibility
- Versioning cloud-storage and virtual-machine datasets to the tune of 1 TB without delaying the development process
Fast, Agile and Transparent
The key highlight of the project was ensuring quick software development using agile methodology while keeping the process transparent. The work was divided into different stages, with medical professionals, technical experts and other stakeholders in the loop to ensure robust, accurate and unbiased AI medical software.
Assembling a toolkit
Designing this pipeline required integrating a variety of tools. To ensure transparency, data versioning tools like DVC, Pachyderm and the GCS API were explored. Azure Boards was finalized for fast development and deployment via agile methodology. Kubeflow Pipelines orchestrated the entire workflow and tracked experiments, while Dataflow pipelines handled scalable data processing. Seamless integration of all these tools led to a final, robust pipeline.