My Projects
A walkthrough of my projects spanning multiple domains
ICU Notes Digitization
Building a cloud-based pipeline to digitize handwritten ICU notes and the graphs depicting a patient's vitals over the course of a day.
Challenges:
- Presence of gridlines on the page interfered with OCR results (a gridline-removal sketch follows this list)
- Inconsistency in the graph markers used for vitals affected their assignment to the correct vital feature
- Handwritten data overflowing across gridlines led to errors in matching the detected data to the correct time slots
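One of the simpler mitigations for the gridline problem is to suppress long horizontal and vertical strokes with morphological opening before running OCR. Below is a minimal OpenCV sketch of that idea; the kernel sizes, threshold parameters and file names are illustrative assumptions, not the production pipeline.

```python
import cv2

# Load a scanned ICU notes page (filename is illustrative).
page = cv2.imread("notes_page.png", cv2.IMREAD_GRAYSCALE)

# Binarize with the page content as white on black.
binary = cv2.adaptiveThreshold(
    page, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 15, 10
)

# Extract long horizontal and vertical runs -- these are the gridlines.
h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)

# Remove the gridlines, keep the handwriting, then invert back for OCR.
cleaned = cv2.subtract(binary, cv2.add(h_lines, v_lines))
ocr_ready = cv2.bitwise_not(cleaned)
cv2.imwrite("notes_page_no_grid.png", ocr_ready)
```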
First project as a team lead!
In this project, I led a team of 5 ML engineers and 2 business analysts. Apart from the technical front, I was also responsible for communicating with clients, understanding their requirements, and creating the project timeline accordingly. With wonderful contributions from everyone, we ended up converting the POC into a multi-million deal for the company.
AutoML and Vertex AI
Gained extensive experience developing and deploying a deep-learning-based application using GCP's Vertex AI. The entire project pipeline was orchestrated using the newly launched Vertex AI Pipelines with Kubeflow 2.0. I also picked up the data processing and transformations required for training an object detection AutoML model.
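For context, a Vertex AI pipeline in the KFP v2 SDK is a set of decorated Python components compiled into a spec that Vertex AI Pipelines can run. The sketch below is a minimal, simplified example of that structure; the component bodies, bucket path and file names are placeholders rather than the actual project code.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def preprocess(raw_path: str) -> str:
    # Placeholder: convert raw scans into AutoML-ready annotations.
    return raw_path.replace("raw", "processed")

@dsl.component(base_image="python:3.10")
def train(processed_path: str):
    # Placeholder: kick off an AutoML object detection training job.
    print(f"training on {processed_path}")

@dsl.pipeline(name="icu-object-detection")
def pipeline(raw_path: str = "gs://my-bucket/raw"):  # bucket is illustrative
    prep = preprocess(raw_path=raw_path)
    train(processed_path=prep.output)

# Compile to the spec that Vertex AI Pipelines accepts.
compiler.Compiler().compile(pipeline, "icu_pipeline.json")
```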
Federated Learning
Testing the novel method of Federated Learning for training models on medical data located at data centers across the world, without compromising privacy or model performance
Challenges:
- Setting up the server in the China region amid multiple regulations and restrictions
- Handling cross-cloud infrastructure on both AWS and GCP, across 5 regions each
- Working with a beta-version SDK from Nvidia for orchestrating the Federated Learning experiment
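At its core, the experiment follows the federated averaging pattern: each site trains locally, only weight updates leave the premises, and the server combines them. The NumPy sketch below is a conceptual illustration of that aggregation step, not the Clara SDK's API.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted FedAvg: combine per-site model weights without sharing data.

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   number of local training samples at each site
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for layer in client_weights[0]:
        global_weights[layer] = sum(
            w[layer] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

# Toy example: two sites with differently sized local datasets.
site_a = {"dense": np.array([1.0, 2.0])}
site_b = {"dense": np.array([3.0, 4.0])}
print(federated_average([site_a, site_b], client_sizes=[100, 300]))
```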
Collaboration with Nvidia
Collaborated with NVIDIA to provide feedback and suggest improvements based on how their Clara SDK performed in our Federated Learning experiments. Several of our suggestions were incorporated into their next release, which we then tested in other projects.
Protecting the privacy
This project gave me the unique opportunity to explore different privacy protocols such as homomorphic encryption, SVT, the Laplacian method and the percentile method. We studied the effect of each of these methods on model performance and collated the findings in a white paper. To assess the privacy of patient information, we also trained a GAN model to study the possibility of reconstructing training images from model weights.
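As a rough illustration of the simplest of these protocols, the Laplacian method adds calibrated noise to a weight update before it leaves a site. The sketch below shows the idea; the sensitivity and epsilon values are placeholders, not the settings used in the study.

```python
import numpy as np

def laplace_privatize(weight_update, sensitivity=0.01, epsilon=1.0):
    """Add Laplace noise scaled to sensitivity/epsilon (values are illustrative)."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=weight_update.shape)
    return weight_update + noise

update = np.array([0.12, -0.05, 0.33])
print(laplace_privatize(update))
```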
Renal Stone Segmentation
3D Segmentation of a renal stone present in the urinary tract to calculate its exact size and location
Challenges:
- Lack of annotated data for the entire urinary tract
- Building scalable pipelines to preprocess and transform 300 GB of PHI data
- A jumbled dataset with both contrast and non-contrast CTs captured from machines of different manufacturers
Image Processing and Deep Learning
This project proved to be a fascinating blend of image processing and deep learning techniques. A 3D UNet was trained to segment the renal stone and determine its shape. Image processing came in handy for calculating the volume, detecting the stone type, and accurately localizing the stone's position in the urinary tract.
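The size calculation itself is straightforward once the segmentation mask is available: count the foreground voxels and scale by the CT's voxel spacing. A minimal sketch of that post-processing step follows; the spacing values and toy mask are illustrative.

```python
import numpy as np

def stone_volume_and_centroid(mask, spacing_mm=(0.7, 0.7, 1.0)):
    """Volume in mm^3 and centroid (voxel coords) of a binary 3D segmentation."""
    voxel_volume = float(np.prod(spacing_mm))
    coords = np.argwhere(mask > 0)
    volume_mm3 = coords.shape[0] * voxel_volume
    centroid = coords.mean(axis=0) if coords.size else None
    return volume_mm3, centroid

# Toy 3D mask with a small "stone".
mask = np.zeros((64, 64, 32), dtype=np.uint8)
mask[30:34, 30:34, 15:18] = 1
print(stone_volume_and_centroid(mask))
```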
Mastering new skills
The challenges associated with the project gave me the opportunity to explore innovative and novel tools. To alleviate the issue of insufficient annotations, we decided to try out Nvidia Clara's AI-assisted annotation tool, which ended up giving us a decent starting point for training the model. To process such a huge dataset efficiently, I learnt GCP Dataflow and was able to accelerate the process by almost 10x.
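For reference, a Dataflow job is defined as an Apache Beam pipeline in Python. The sketch below shows the rough shape such a preprocessing pipeline takes; the project, bucket paths and the `normalize_scan` step are placeholders, not the actual pipeline code.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def normalize_scan(path):
    # Placeholder transform: load a CT series and resample / rescale it.
    return f"{path} -> normalized"

options = PipelineOptions(
    runner="DataflowRunner",             # switch to "DirectRunner" for local tests
    project="my-gcp-project",            # placeholder project
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # placeholder bucket
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ListScans" >> beam.Create(["gs://my-bucket/ct/scan_001", "gs://my-bucket/ct/scan_002"])
        | "Normalize" >> beam.Map(normalize_scan)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/processed/scans")
    )
```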
Synthetic medical images
Generating realistic-looking 3D medical images from user-specified input parameters such as organ, sex and modality.
Challenges:
- Parametrization of the GAN (see the conditioning sketch after this list)
- Building an unbiased dataset with enough images representing each set of parameters
- Finding the appropriate loss function and metric for training the GAN
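A common way to parametrize a GAN on categorical inputs like organ, sex and modality is to embed each label and concatenate the embeddings with the latent vector before the generator. The PyTorch sketch below illustrates that conditioning pattern with toy dimensions; it is not the architecture used in the project.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: latent noise + label embeddings -> small 3D volume."""
    def __init__(self, z_dim=128, n_organs=10, n_sexes=2, n_modalities=3, embed_dim=16):
        super().__init__()
        self.organ_emb = nn.Embedding(n_organs, embed_dim)
        self.sex_emb = nn.Embedding(n_sexes, embed_dim)
        self.modality_emb = nn.Embedding(n_modalities, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + 3 * embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 32 * 32 * 32),  # tiny volume for illustration
            nn.Tanh(),
        )

    def forward(self, z, organ, sex, modality):
        # Concatenate the label embeddings with the latent vector.
        cond = torch.cat(
            [self.organ_emb(organ), self.sex_emb(sex), self.modality_emb(modality)], dim=1
        )
        out = self.net(torch.cat([z, cond], dim=1))
        return out.view(-1, 1, 32, 32, 32)

gen = ConditionalGenerator()
z = torch.randn(4, 128)
vol = gen(z, torch.tensor([1, 2, 0, 3]), torch.tensor([0, 1, 0, 1]), torch.tensor([2, 0, 1, 1]))
print(vol.shape)  # torch.Size([4, 1, 32, 32, 32])
```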
Democratizing medical models
One of the biggest hindrances in training medical imaging models is the expensive and time-consuming nature of data annotation. With the successful completion of this POC, we took a first step towards developing inclusive AI solutions with the potential to revolutionize affordable healthcare. Further improvements to the model can help us generate images based on additional parameters like age, medical condition and equipment manufacturer.
Explore, try and repeat
Given the uniqueness and difficulty of the problem statement, there was a lot of scope for experimentation around model architectures, loss functions and metrics. Three GAN variants were shortlisted: StyleGAN, DCGAN and CycleGAN. After a thorough analysis of the results, StyleGAN was chosen with some minor modifications to its architecture. A final FID score of 7 was achieved at the end of the POC.
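For reference, FID compares Gaussian fits to Inception features of the real and generated sets: the squared distance between the means plus a trace term over the covariances. The NumPy/SciPy sketch below implements that formula; the random features stand in for real Inception activations.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_real, feat_fake):
    """FID between two sets of feature vectors (rows = samples)."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)

# Toy example with random features standing in for Inception activations.
rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(200, 64)), rng.normal(size=(200, 64))))
```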
SAMD Pipelining
Designing a cloud-based, end-to-end MLOps pipeline for the development of Software as a Medical Device (SaMD).
Challenges:
- Handling and protecting the privacy of PHI data as per FDA guidelines
- Designing a development process that ensures transparency and reproducibility
- Versioning cloud-storage and virtual-machine datasets to the tune of 1 TB without delaying the development process
Fast, Agile and Transparent
The key highlight of the project was ensuring quick software development using agile methodology while keeping the process transparent. The work was divided into different stages, with medical professionals, technical experts and other stakeholders in the loop to ensure robust, accurate and unbiased AI medical software.
Assembling a toolkit
Designing this pipeline required integrating a variety of tools. To ensure transparency, data versioning tools like DVC, Pachyderm and the GCS API were explored. Azure Boards was finalized for fast development and deployment via agile methodology. Kubeflow Pipelines orchestrated the entire workflow and tracked experiments, while Dataflow pipelines handled scalable data processing. Seamless integration of all these tools led to a final, robust pipeline.