Namaste!
I am a Ph.D. candidate at the High-Performance and Distributed Systems Lab (HPDSL) in the Department of Computer Science at Rochester Institute of Technology (RIT).
Under the advisement of Prof. M. Mustafa Rafique and Prof. Bogdan Nicolae (ANL), I work on optimizing large-scale data movement across heterogeneous tiers of modern HPC datacenters. Currently, I am developing solutions for efficient checkpoint and restore operations of GPU-resident, distributed data structures that need to be captured in a consistent fashion under concurrency.
Before joining RIT, I worked for two years as a full-stack developer, designing and building applications from user interfaces to backends and analytics that scaled across hundreds of nodes to serve 100M+ users across the web. In 2017, I completed my undergraduate degree in Computer Engineering at RAIT, University of Mumbai, India.
Publications
Conferences
Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models". HPDC'24: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (Pisa, Italy, 2024). [Paper][Slides]
Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, and Ali R. Butt. "Application-Attuned Memory Management for Containerized HPC Workflows". IPDPS'24: The 38th IEEE International Parallel & Distributed Processing Symposium (San Francisco, USA, 2024) [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Franck Cappello. "Towards Efficient I/O Pipelines using Accumulated Compression". HiPC’23: The 30th IEEE International Conference on High-Performance Computing, Data, and Analytics (Goa, India, 2023) [Paper] [Slides]
Avinash Maurya, M. Mustafa Rafique, Thierry Tonellot, Hussain J. AlSalem, Franck Cappello, Bogdan Nicolae. "GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, Franck Cappello. "Towards Efficient Cache Allocation for High-Frequency Checkpointing". HiPC’22: The 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) BEST PAPER! [Paper] [Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Thierry Tonellot, Franck Cappello. "Towards Efficient I/O Scheduling for Collaborative Multi-Level Checkpointing". MASCOTS’21: The 29th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (Virtual, Portugal, 2021) [Paper] [Slides]
Avinash Maurya, Bogdan Nicolae, Ishan Guliani, and M. Mustafa Rafique. "CoSim: A Simulator for Co-Scheduling of Batch and On-Demand Jobs in HPC Datacenters". DS-RT’20: The 24th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications (Prague, Czech Republic, 2020) [Paper] [Slides] [Talk]
Workshops
Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers". FlexScience'24: The 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, co-located with HPDC'24 (Pisa, Italy, 2024). [Paper][Slides]
Moiz Arif, Avinash Maurya, and M. Mustafa Rafique. "Accelerating Performance of GPU-based Workloads using CXL". FlexScience'23: The 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, co-located with the 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper][Slides]
Avinash Maurya, Jaiaid Mobin, and M. Mustafa Rafique. "Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds". HiPCW'22: Workshop on Data Fabric for Hybrid Clouds (WDFHC) co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) [Paper] [Slides]
Posters and Talks
Avinash Maurya, Robert Underwood, Bogdan Nicolae, M. Mustafa Rafique, Franck Cappello. "VELOC-LLM: Towards Efficient Asynchronous Checkpointing for Large-Language Models". SuperCheck@SC'23: Fourth International Symposium on Checkpointing for Supercomputing (Colorado, USA, 2023) [Slides]
Jaiaid Mobin, Avinash Maurya, M. Mustafa Rafique. "COLTI: Towards Concurrent and Co-located DNN Training and Inference". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). BEST POSTER! [Poster + Extended abstract]
Will Merges, Avinash Maurya, M. Mustafa Rafique. "Exploiting Lightweight OS Kernels for Emerging Datacenter Workloads". SRS HiPC'22: Student Research Symposium (SRS) co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) [Poster + Extended abstract]
Avinash Maurya, M. Mustafa Rafique, Bogdan Nicolae. "Toward Efficient Checkpointing across Deep Tiers of Memory Hierarchy". Doctoral Showcase, SC'22: The International Conference for High Performance Computing, Networking, Storage, and Analysis (Dallas, USA, 2022) [Poster + Presentation] [Slides]
Contact Details
Avinash Maurya
Ph.D. Candidate
Lab: 70-3400
Computing and Information Sciences,
Golisano College of Computing and Information Sciences
Rochester Institute of Technology, NY, USA
am6429 [AT] cs [DOT] rit [DOT] edu
Read more about our work at the High Performance Distributed Systems Laboratory (HPDSL) website: https://cs.rit.edu/~hpdsl/