I am an Advisory Research Engineer at IBM Research – India Lab with close to five years of experience in machine learning, deep learning, and AI systems. My work lies at the intersection of AI algorithms, system-level optimizations, and high-performance computing.
At Intel, I developed high-performance GPU kernels in SYCL for the Falcon Shores architecture, optimized performance-critical operators, and pursued ML research combining VAEs with diffusion models, co-authoring papers submitted to CVR 2025 and IEEE CONNECT 2025.
Previously at Qualcomm, I optimized ONNX models for NLP, CV, and LLMs on the AI100 accelerator and contributed to custom node fusion operations for inference acceleration.
I hold a Master’s in Computer Science from IIT Bombay, where my research focused on multimodal meta-learning for sarcasm and emotion analysis.
My expertise spans deep learning for NLP, CV, and LLMs, GPU kernel optimization, AI systems, and bridging research with real-world performance.
My research interests lie at the intersection of machine learning frameworks and hardware, with a strong focus on hardware-aware model optimizations. I actively study and write about GPU and ML accelerator architectures, the intricacies of Tensor and CUDA cores, and the development of advanced GPU kernels that are critical for maximizing model performance.
During my time at Intel, I researched generating machine-style handwriting using a diffusion-based, text-conditioned latent model combined with VAE decoding. Alongside this, I designed and implemented algorithmic strategies for highly efficient compute kernels.
At Qualcomm, my work centered on compiler optimization using Graph Neural Networks (GNNs) to improve model compilation and scalability. I am thoroughly enjoying my ongoing career transition from building foundational models in academia to mastering hardware architectures, deep learning compilation stacks, and kernel development for optimal performance.
Previously, as a Research Assistant, I developed a multimodal approach to analyzing emotions in sarcasm, focusing on linguistic incongruities and the hidden sentiments behind text. My broader work in NLP has encompassed emotion classification, sentiment analysis, and sarcasm detection.
Over the years, I have maintained a deep fascination with generative modeling for conversational AI, dedicating significant effort to optimizing the computational efficiency and memory footprint of these complex models.
Himanshu Upreti, Dheeraj Gattupalli, Vinayak Baddi, Mohit Sharma, Prasanna Biswas, and Anuj Gupta. "Pre-Processing for Deep Neural Network Compilation Using Graph Neural Networks." June 6, 2023. (Pending)
I regularly write posts detailing deep dives into GPU fundamentals (like Tensor and CUDA cores), the inner workings of PyTorch's compilation stack, and other advancements in ML.
Python Instructor
Complete Python tutorial series from basics to advanced topics!