Muhammad Waseem


 |  Blog  | 

I am a first-year MSc Computer Vision student at MBZUAI. I am specifically interested in building Computer Vision applications that are robust, fair and trustworthy.

Previously, I worked on understanding and enhancing the capabilities of vision-language models for language-controlled document editing under Sanket Biswas and Josep Llados at the Computer Vision Center. I was also briefly a visiting researcher under Dr. Karthik Nandakumar in the SPriNT-AI Lab, where I worked on improving federated learning for extreme non-IID scenarios using client contributions. Additionally, I worked on text-line segmentation for degraded historical manuscripts at CVIT, IIIT Hyderabad, under Dr. Ravi Kiran Sarvadevabhatla.

Please feel free to check out my resume. You can also find me on other spaces below.


 ~  𝕏 (Twitter)  |  GitHub  |  LinkedIn  ~ 


Aug '25  

Joined MBZUAI for MSc in Computer Vision.

Dec '24  

Our paper DocEdit Redefined: In-Context Learning for Multimodal Document Editing got accepted at a WACV 2025 workshop 🥳!

Sep '24  

Ranked 45th out of a thousand teams that participated in the Amazon ML Challenge 2024. Eligible for a pre-placement interview (PPI) for the role of Applied Scientist Intern.

Jun '24  

Our paper LineTR: Re-Imagining Text-Line Segmentation got accepted at ICPR 2024 🥳!

Mar '24  

Selected for UGRIP at MBZUAI with an acceptance rate of 4%.

Dec '23  

Selected as a research intern at the Computer Vision Center, Spain.

Nov '23  

Awarded Merit Scholarship for academic excellence in the year 2022-2023.

Feb '23  

Selected for a summer research internship at the CVIT lab, IIIT Hyderabad.

Jan '23  

Received a special mention award at the NIT Trichy & DataNetiix hackathon for our research digest prototype.

Dec '22  

Awarded Merit Scholarship for academic excellence in the year 2021-2022.

Nov '22  

Selected as the Student Coordinator for the university's annual tech fest.

Sep '21  

Joined Shiv Nadar University Chennai for B.Tech in Artificial Intelligence and Data Science.

Mohamed bin Zayed University of Artificial Intelligence
MSc in Computer Vision
August '25 - May '27

Shiv Nadar University, Chennai
Bachelor of Technology in Artificial Intelligence and Data Science
September '21 - May '25

GPA: 9.67, Rank: 2

Student Societies:

  • Special Invitee | Students Grievance Redressal Committee (SGRC)
  • Student Coordinator | Invente - Annual Technical Fest
  • Technical Member | Chess Club


Visiting Researcher | SPriNT-AI Lab
July '24 - August '24

Worked in the SPriNT-AI (Security, Privacy and Trustworthiness in Artificial Intelligence) lab, focusing on the effective use of Shapley values in federated learning under non-IID settings.

UGRIP Intern | ViLA Lab
May '24 - June '24

[My Experience]

Our project, conducted under Dr. Zhiqiang Shen (Jason), focused on "Optimizing Prompts for Foundation Models" to reduce hallucination. We curated a benchmark dataset of 25k questions across ~60 topics such as law, philosophy, and history. Additionally, we developed a web application to collect human preferences and assess the correctness of responses before and after applying 26 guiding principles. This preference data is crucial for future preference-based optimization techniques that enhance the accuracy and reliability of AI-generated responses.

Research Intern | Computer Vision Center (CVC)
Feb '24 - Present

Working on document editing. Currently analysing the potential of LLMs to generate structured commands to edit documents.

Research Intern | Center for Visual Information Technology (CVIT)
May '23 - Feb '24

Co-developed a novel method for precise text-line segmentation of complex Indic and Southeast Asian historical palm-leaf manuscripts.


DocEdit Redefined: In-Context Learning for Multimodal Document Editing

Muhammad Waseem, Sanket Biswas, Josep Llados
VisionDocs: Workshop on Computer Vision Systems for Document Analysis and Recognition, WACV 2025
[paper]

We introduce an innovative approach to structured document editing that uses Visual-Language Models (VLMs) to simplify the process by removing the need for specialized segmentation tools. Our method incorporates a cutting-edge in-context learning framework to enhance flexibility and efficiency in tasks like spatial alignment, component merging, and regional grouping. By leveraging open-world VLMs, we ensure that document edits preserve coherence and intent. To benchmark our approach, we introduce a new evaluation suite and protocol that assess both spatial and semantic accuracy, demonstrating significant advancements in structured document editing.

LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts

Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla
ICPR 2024
[paper]

We present LineTR, a novel two-stage approach for precise line segmentation in diverse and challenging handwritten historical manuscripts. LineTR's first stage uses a DETR-style network and a hybrid CNN-transformer to process image patches and generate text scribbles and an energy map. A robust, dataset-agnostic post-processing step produces document-level scribbles. In the second stage, these scribbles and the text energy map are used to generate precise polygons around text lines. We introduce three new datasets of Indic and South-East Asian manuscripts and demonstrate LineTR's superior performance and effectiveness in zero-shot inference across various datasets.


FedEmbed

[code]

Explored nearest-neighbor-based classification in federated learning, inspired by semantic drift compensation in class-incremental learning, to improve model robustness in highly non-IID settings. Achieved promising results in proof-of-concept visualizations.

Multi-modal product details extractor

[code] [report]

Used a Vision-Language Model (VLM) with a custom prompt template and an augmentation pipeline to accurately extract product details from images. Built a robust post-processing pipeline to validate the measurement units of the extracted data. Improved the overall F1 score by 17%.

Uncovering bias and uncertainty in models using Semi-Supervised VAEs

[code] [blog]

This project aims to investigate and quantify the biases present in face detection models. Identified biases include a preference for white faces over black faces, higher accuracy in detecting male faces compared to female faces, better detection of faces without glasses, and variations in accuracy based on different hair colors. The ultimate goal is to highlight these biases and suggest ways to mitigate them, promoting the development of fairer and more inclusive face detection systems.

Urdu SeamFormer

[code] [report]

Addressed dataset-specific challenges for Urdu text-line segmentation and evaluated pre-trained weights for domain adaptation. Integrated the model into the Indian Government’s Bhashini API during my internship at IIIT Hyderabad.

Writer independent offline handwriting verification

[code] [blog]

Developed a model using PyTorch CRAFT and Vision Transformer to determine whether two handwritten Hindi images are by the same writer. Achieved an AUC of 0.72 and 10th place in an NCVPRIPG workshop competition.

Nature Inspired Neural Networks

[code]

Optimized neural network weights using nature-inspired algorithms instead of gradient descent and backpropagation. The algorithms include Ant Colony Optimization, Particle Swarm Optimization, and Genetic Algorithms.

More projects can be found on GitHub.


So, you’ve made it till the end : )

Let me share a little more about my hobbies outside of work and studies.

> I enjoy photography (who doesn’t these days, especially with the social media buzz!). Exploring different camera settings and experimenting with shots has recently become a passion of mine. I’m especially drawn to skies, people, and life on the streets. Here are some of the cool pictures that I have taken.
> I’m also interested in financial planning and long-term investment strategies. Although I’m still a beginner, I love discussing anything related to investing. It’s probably the easiest icebreaker with me!
> On weekends, I often spend time editing videos on CapCut, which is something I find both creative and relaxing.
> And best of all, I love meeting new people and learning about their cultures, which I’m always curious about.


Last updated: August 31, 2025

This template is a modification of Jon Barron's website. It has further been modified by Rishab Khincha. Find the source code to my version here. Feel free to clone it for your own use while attributing the original author, Jon Barron.