
Muhammad Waseem


 |  Blog  | 

I am a first-year MSc student in Computer Vision at MBZUAI, with a strong interest in building AI systems that are robust, fair, and trustworthy. Currently, I am working on enabling agent-based systems to self-learn from past failures, particularly in the context of jailbreaking and prompt-injection attacks, under the supervision of Nils Lukas.

Previously, I worked on understanding and enhancing the capabilities of vision-language models for language-controlled document editing under Sanket Biswas and Josep Llados at the Computer Vision Center. I was also briefly a visiting researcher under Dr. Karthik Nandakumar in the SPriNT-AI Lab, where I worked on federated learning, using client contributions to improve convergence in extreme non-IID scenarios.

You can also find me on other spaces below.


 ~  𝕏 (Twitter)  |  GitHub  |  LinkedIn  ~ 


Aug '25  

Joined MBZUAI for an MSc in Computer Vision with a full-ride scholarship.

Dec '24  

Our paper DocEdit Redefined: In-Context Learning for Multimodal Document Editing was accepted at a WACV 2025 workshop 🥳!

Sep '24  

Ranked 45th out of a thousand teams that participated in the Amazon ML Challenge 2024. Eligible for a PPI for the Applied Scientist Intern role.

Jun '24  

Our paper LineTR: Re-Imagining Text-Line Segmentation was accepted at ICPR 2024 🥳!

Mar '24  

Selected for UGRIP at MBZUAI with an acceptance rate of 4%.

Dec '23  

Selected as a Research Intern at the Computer Vision Center, Spain.

Nov '23  

Awarded Merit Scholarship for academic excellence in the year 2022-2023.

Feb '23  

Selected for a summer research internship at the CVIT Lab, IIIT Hyderabad.

Jan '23  

Received a special mention award at the NIT Trichy & DataNetiix hackathon for our research digest prototype.

Dec '22  

Awarded Merit Scholarship for academic excellence in the year 2021-2022.

Nov '22  

Selected as the Student Coordinator for the university's annual tech fest.

Sep '21  

Joined Shiv Nadar University Chennai for a B.Tech in Artificial Intelligence and Data Science.

Mohamed bin Zayed University of Artificial Intelligence
MSc in Computer Vision
August '25 - May '27

GPA: 3.85

Supervisor: Nils Lukas

Shiv Nadar University, Chennai
Bachelor of Technology in Artificial Intelligence and Data Science
September '21 - May '25

GPA: 9.67, Rank: 2

Student Societies:

  • Special Invitee | Students Grievance Redressal Committee (SGRC)
  • Student Coordinator | Invente - Annual Technical Fest
  • Technical Member | Chess Club


Visiting Researcher | SPriNT-AI Lab
July '24 - August '24

Worked in the SPriNT-AI (Security, Privacy and Trustworthiness in Artificial Intelligence) lab, focusing on the effective use of Shapley values in federated learning for non-IID settings.

UGRIP Intern | ViLA Lab
May '24 - June '24

[My Experience]

Our project, conducted under Dr. Zhiqiang Shen (Jason), focused on "Optimizing Prompts for Foundation Models" to reduce hallucination. We curated a benchmark dataset of 25k questions across ~60 topics such as law, philosophy, and history. Additionally, we developed a web application to collect human preferences and assess the correctness of responses before and after applying 26 guiding principles. This preference data is crucial for future preference-based optimization techniques that aim to improve the accuracy and reliability of AI-generated responses.

Research Intern | Computer Vision Center (CVC)
Feb '24 - Aug '24

Worked on document editing, analyzing the potential of LLMs to generate structured commands for editing documents. This work resulted in a publication at a WACV 2025 workshop.

Research Intern | Center for Visual Information Technology (CVIT)
May '23 - Feb '24

Co-developed a novel method for precise text-line segmentation of complex Indic and Southeast Asian historical palm-leaf manuscripts.


DocEdit Redefined: In-Context Learning for Multimodal Document Editing

Muhammad Waseem, Sanket Biswas, Josep Llados
VisionDocs: Workshop on Computer Vision Systems for Document Analysis and Recognition, WACV 2025
[paper]

We introduce an innovative approach to structured document editing that uses Visual-Language Models (VLMs) to simplify the process by removing the need for specialized segmentation tools. Our method incorporates a cutting-edge in-context learning framework to enhance flexibility and efficiency in tasks like spatial alignment, component merging, and regional grouping. By leveraging open-world VLMs, we ensure that document edits preserve coherence and intent. To benchmark our approach, we introduce a new evaluation suite and protocol that assess both spatial and semantic accuracy, demonstrating significant advancements in structured document editing.

LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts

Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla
ICPR 2024
[paper]

We present LineTR, a novel two-stage approach for precise line segmentation in diverse and challenging handwritten historical manuscripts. LineTR's first stage uses a DETR-style network and a hybrid CNN-transformer to process image patches and generate text scribbles and an energy map. A robust, dataset-agnostic post-processing step produces document-level scribbles. In the second stage, these scribbles and the text energy map are used to generate precise polygons around text lines. We introduce three new datasets of Indic and South-East Asian manuscripts and demonstrate LineTR's superior performance and effectiveness in zero-shot inference across various datasets.


FedEmbed

[code]

Explored nearest-neighbor-based classification in federated learning, inspired by semantic drift compensation in class-incremental learning, to improve model robustness in highly non-IID settings. Achieved promising results in proof-of-concept visualizations.

Multi-modal product details extractor

[code] [report]

Used a vision-language model (VLM) with a custom prompt template and an augmentation pipeline to accurately extract product details from images. Built a robust post-processing pipeline to validate the measurement units of the extracted data. Improved the overall F1 score by 17%.

   More projects can be found on GitHub


So, you've made it to the end :)

Let me share a little more about my hobbies outside of work and studies.

> I enjoy photography (who doesn't these days, especially with the social media buzz!). Exploring different camera settings and experimenting with shots has recently become a passion of mine. I'm especially drawn to skies, people, and life on the streets.
> I'm also interested in financial planning and long-term investment strategies. Although I'm still a beginner, I love discussing anything related to investing. It's probably the easiest icebreaker with me!
> On weekends, I often spend time editing videos on CapCut, which is something I find both creative and relaxing.
> And best of all, I love meeting new people and learning about their cultures, which I'm always curious about.


Last updated: December 31, 2025