
Muhammad Waseem

Academic Email:
Personal Email:

 |  Blog  | 

I am a final-year undergraduate student at Shiv Nadar University Chennai, where I'm majoring in Artificial Intelligence and Data Science. I am particularly interested in Computer Vision and multi-modal models, and I am currently focusing on improving robustness, ensuring fairness, and preserving user privacy.

I worked under the guidance of Sanket Biswas and Josep Llados at the Computer Vision Center, focusing on enhancing the capabilities of vision-language models for language-controlled document editing. Our work has been accepted for presentation at the WACV 2025 Workshop on Computer Vision Systems for Document Analysis and Recognition.

I was a visiting researcher under Dr. Karthik Nandakumar at MBZUAI, Abu Dhabi, where I worked on federated learning for extreme non-IID scenarios.

I spent Summer 2024 as a UGRIP intern at MBZUAI, Abu Dhabi, where I worked under Dr. Zhiqiang Shen on analysing hallucination in LLM responses to principled prompts. We also collected human and model preferences for each response pair for future work on preference-based optimization.

Previously, I did a research internship under Dr. Ravi Kiran Sarvadevabhatla at the CVIT Lab, IIIT Hyderabad, generating precise text-line segmentation for complex Indic and Southeast Asian historical palm leaf manuscripts.

Please feel free to check out my resume. You can also find me on other spaces below.


 ~  𝕏 (Twitter)  |  Github  |  LinkedIn  ~ 


Dec '24  

Our paper DocEdit Redefined: In-Context Learning for Multimodal Document Editing got accepted at WACV 2025 workshop 🥳!

Sep '24  

Ranked 45th out of a thousand teams that participated in the Amazon ML Challenge 2024. Eligible for a PPI for the role of Applied Scientist Intern.

Jun '24  

Our paper LineTR: Re-Imagining Text-Line Segmentation got accepted at ICPR 2024 🥳!

Mar '24  

Selected for UGRIP at MBZUAI with an acceptance rate of 4%.

Dec '23  

Selected as a research intern at the Computer Vision Center, Spain.

Nov '23  

Awarded Merit Scholarship for academic excellence in the year 2022-2023.

Feb '23  

Selected for a summer research internship at the CVIT Lab, IIIT Hyderabad.

Jan '23  

Received a special mention award at the NIT Trichy & DataNetiix hackathon for our research digest prototype.

Dec '22  

Awarded Merit Scholarship for academic excellence in the year 2021-2022.

Nov '22  

Selected as the Student Coordinator for the university's annual tech fest.

Sept '21  

Joined Shiv Nadar University Chennai for a B.Tech in Artificial Intelligence and Data Science.

Shiv Nadar University, Chennai
Bachelor of Technology in Artificial Intelligence and Data Science
September '21 - May '25

Awards: Two-time Merit Scholarship awardee

Student Societies:

  • Special Invitee | Students Grievance Redressal Committee (SGRC)
  • Student Coordinator | Invente - Annual Technical Fest
  • Technical Member | Chess Club


Visiting Researcher | SPriNT-AI Lab
July '24 - August '24

Worked in the SPriNT-AI (Security, Privacy and Trustworthiness in Artificial Intelligence) lab, focusing on the effective utilization of Shapley values in federated learning under non-IID settings.

UGRIP Intern | ViLA Lab
May '24 - June '24

[My Experience]

Our project, conducted under Dr. Zhiqiang Shen (Jason), focused on "Optimizing Prompts for Foundation Models" to reduce hallucination. We curated a benchmark dataset of 25k questions across ~60 topics such as law, philosophy, and history. Additionally, we developed a web application to collect human preferences and assess the correctness of responses before and after applying 26 guiding principles. This preference data is crucial for future preference-based optimization techniques, enhancing the accuracy and reliability of AI-generated responses.

Research Intern | Computer Vision Center (CVC)
Feb '24 - Present

Working on language-controlled document editing; currently analysing the potential of LLMs to generate structured commands for editing documents.

Research Intern | Center for Visual Information Technology (CVIT)
May '23 - Feb '24

Co-developed a novel method to achieve precise text-line segmentation for complex Indic and Southeast Asian historical palm leaf manuscripts.


DocEdit Redefined: In-Context Learning for Multimodal Document Editing

Muhammad Waseem, Sanket Biswas, Josep Llados
VisionDocs: Workshop on Computer Vision Systems for Document Analysis and Recognition, WACV 2025
[paper]

We introduce an innovative approach to structured document editing that uses Visual-Language Models (VLMs) to simplify the process by removing the need for specialized segmentation tools. Our method incorporates a cutting-edge in-context learning framework to enhance flexibility and efficiency in tasks like spatial alignment, component merging, and regional grouping. By leveraging open-world VLMs, we ensure that document edits preserve coherence and intent. To benchmark our approach, we introduce a new evaluation suite and protocol that assess both spatial and semantic accuracy, demonstrating significant advancements in structured document editing.

LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts

Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla
ICPR 2024
[paper]

We present LineTR, a novel two-stage approach for precise line segmentation in diverse and challenging handwritten historical manuscripts. LineTR's first stage uses a DETR-style network and a hybrid CNN-transformer to process image patches and generate text scribbles and an energy map. A robust, dataset-agnostic post-processing step produces document-level scribbles. In the second stage, these scribbles and the text energy map are used to generate precise polygons around text lines. We introduce three new datasets of Indic and South-East Asian manuscripts and demonstrate LineTR's superior performance and effectiveness in zero-shot inference across various datasets.


FedEmbed

[code]

Explored nearest neighbor-based classification in federated learning, drawing inspiration from semantic drift compensation in class-incremental learning, to improve model robustness in highly non-IID settings. Achieved promising results in proof-of-concept visualizations.
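
For illustration, here is a minimal sketch of the nearest-class-mean idea the project builds on: each client averages its local embeddings into per-class prototypes, the server averages prototypes across clients, and a query is assigned to the nearest prototype. The function names and the plain averaging below are my own simplification and omit the drift-compensation component.

import numpy as np

def class_prototypes(embeddings, labels):
    # Mean embedding per class, computed on one client's local data.
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def aggregate(client_protos):
    # Server-side averaging of prototypes over the clients that hold each class.
    merged = {}
    for protos in client_protos:
        for c, p in protos.items():
            merged.setdefault(c, []).append(p)
    return {c: np.mean(ps, axis=0) for c, ps in merged.items()}

def predict(query, prototypes):
    # Assign the class whose prototype is nearest in embedding space.
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))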

Multi-modal product details extractor

[code] [report]

Utilized a Vision-Language Model (VLM) with a custom prompt template and an augmentation pipeline to accurately extract product details from images. Built a robust post-processing pipeline to validate the measurement units of the extracted data, improving the overall F1 score by 17%.
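
As a rough illustration of the unit-validation step (the allowed-unit map, entity names, and accepted format below are assumptions, not the actual pipeline), a prediction is kept only if it parses as a number followed by a unit that is valid for the entity:

import re

ALLOWED_UNITS = {
    "item_weight": {"gram", "kilogram", "ounce", "pound"},
    "voltage": {"volt", "kilovolt", "millivolt"},
}

VALUE_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z ]+?)\s*$")

def validate(entity, prediction):
    # Return "value unit" if the prediction parses and the unit is allowed, else "".
    match = VALUE_UNIT.match(prediction)
    if not match:
        return ""
    value, unit = match.group(1), match.group(2).strip().lower()
    if unit not in ALLOWED_UNITS.get(entity, set()):
        return ""
    return f"{value} {unit}"

print(validate("item_weight", "500 gram"))  # "500 gram"
print(validate("item_weight", "500 grms"))  # "" (rejected)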

Uncovering bias and uncertainty in model using Semi-Supervised VAEs

[code] [blog]

This project aims to investigate and quantify the biases present in face detection models. Identified biases include a preference for white faces over black faces, higher accuracy in detecting male faces compared to female faces, better detection of faces without glasses, and variations in accuracy based on different hair colors. The ultimate goal is to highlight these biases and suggest ways to mitigate them, promoting the development of fairer and more inclusive face detection systems.

Urdu SeamFormer

[code] [report]

Addressed dataset-specific challenges for Urdu text-line segmentation and evaluated pre-trained weights for domain adaptation. Integrated the model into the Indian Government’s Bhashini API during my internship at IIIT Hyderabad.

Writer independent offline handwriting verification

[code] [blog]

Developed a model using PyTorch CRAFT and a Vision Transformer to determine whether two handwritten Hindi images are by the same writer. Achieved an AUC of 0.72 and 10th place in an NCVPRIPG workshop competition.

Nature Inspired Neural Networks

[code]

Optimizing neural network weights using nature-inspired algorithms instead of gradient descent and backpropagation. The algorithms include Ant Colony Optimization, Particle Swarm Optimization, and a Genetic Algorithm.
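
As a small illustrative example (the toy task, network size, and PSO hyperparameters are my own choices, not the repository's), Particle Swarm Optimization can search directly over the flattened weight vector of a tiny MLP, using only loss evaluations and no gradients:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)  # toy XOR-like labels

def forward(w, X):
    # 2-4-1 MLP with a tanh hidden layer; w is a flat vector of 17 parameters.
    W1, b1 = w[:8].reshape(2, 4), w[8:12]
    W2, b2 = w[12:16].reshape(4, 1), w[16]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-((h @ W2).ravel() + b2)))

def loss(w):
    return np.mean((forward(w, X) - y) ** 2)

# Particles are candidate weight vectors; velocities mix personal and global bests.
n_particles, dim, inertia, c1, c2 = 30, 17, 0.7, 1.5, 1.5
pos = rng.normal(size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_loss = pos.copy(), np.array([loss(p) for p in pos])
gbest = pbest[pbest_loss.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    losses = np.array([loss(p) for p in pos])
    improved = losses < pbest_loss
    pbest[improved], pbest_loss[improved] = pos[improved], losses[improved]
    gbest = pbest[pbest_loss.argmin()].copy()

print("best MSE:", pbest_loss.min())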

More projects can be found on GitHub.


Last updated: December 31, 2024

This template is a modification of Jon Barron's website, further modified by Rishab Khincha. Find the source code for my version here. Feel free to clone it for your own use while attributing the original author, Jon Barron.