I am a first-year MSc student in Computer Vision at MBZUAI with a strong
interest in building AI systems that are robust, fair, and trustworthy.
Currently, I am working on enabling agent-based systems to learn from their
past failures, particularly in the context of jailbreaking and
prompt-injection attacks, under the supervision of Nils Lukas.
Previously, I worked on understanding and enhancing the capabilities of
vision-language models for language-controlled document editing under Sanket
Biswas and Josep Llados at the Computer Vision Center.
I was also briefly a visiting researcher under Dr. Karthik Nandakumar in the
Sprint AI Lab, where I worked on using client contributions to improve the
convergence of federated learning in extreme non-IID scenarios.
Our project, conducted under Dr. Zhiqiang Shen (Jason), focused on
"Optimizing Prompts for Foundation Models" to reduce hallucination. We
curated a benchmark dataset of 25k questions across ~60 topics like law,
philosophy, and history. Additionally, we developed a web application to
collect human preferences and assess the correctness of responses before and
after applying 26 guiding principles. This preference data can support
future preference-based optimization techniques aimed at improving the
accuracy and reliability of AI-generated responses.
Research Intern | Computer Vision Center (CVC)
Feb '24 - Aug '24
Worked on document editing, analyzing the potential of LLMs to generate
structured commands to edit documents. This work resulted in a publication
at a WACV 2025 workshop.
Research Intern | Center for Visual Information Technology (CVIT)
May '23 - Feb '24
Co-developed a novel method for precise text line segmentation of complex
Indic and Southeast Asian historical palm-leaf manuscripts.
DocEdit Redefined: In-Context Learning for Multimodal Document
Editing
Muhammad Waseem, Sanket Biswas, Josep Llados
VisionDocs: Workshop on Computer Vision Systems for
Document Analysis and Recognition WACV 2025 [paper]
We introduce an innovative approach to structured document editing that uses
Visual-Language Models (VLMs) to simplify the process by removing the need
for specialized segmentation tools. Our method incorporates a cutting-edge
in-context learning framework to enhance flexibility and efficiency in tasks
like spatial alignment, component merging, and regional grouping. By
leveraging open-world VLMs, we ensure that document edits preserve coherence
and intent. To benchmark our approach, we introduce a new evaluation suite
and protocol that assess both spatial and semantic accuracy, demonstrating
significant advancements in structured document editing.
LineTR: Unified Text Line Segmentation for Challenging Palm
Leaf Manuscripts
Vaibhav Agrawal, Niharika Vadlamudi, Muhammad Waseem, Amal
Joseph, Sreenya Chitluri, Ravi Kiran Sarvadevabhatla
ICPR 2024 [paper]
We present LineTR, a novel two-stage approach for precise line segmentation
in diverse and challenging handwritten historical manuscripts. LineTR's
first stage uses a DETR-style network and a hybrid CNN-transformer to
process image patches and generate text scribbles and an energy map. A
robust, dataset-agnostic post-processing step produces document-level
scribbles. In the second stage, these scribbles and the text energy map are
used to generate precise polygons around text lines. We introduce three new
datasets of Indic and South-East Asian manuscripts and demonstrate LineTR's
superior performance and effectiveness in zero-shot inference across various
datasets.
Explored nearest-neighbor-based classification in federated learning,
inspired by semantic drift compensation in class-incremental learning, to
improve model robustness in highly non-IID settings. Achieved promising
results in proof-of-concept visualizations.
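As a rough illustration of the general idea (not the project's actual code), here is a minimal sketch of a nearest-class-mean classifier whose class prototypes are aggregated across clients. The function names `client_stats`, `aggregate_prototypes`, and `predict`, as well as the toy data, are assumptions made for this example, and the semantic drift compensation step is left out.

```python
import math
from collections import defaultdict


def client_stats(features, labels):
    """Per-class feature sums and counts, computed locally on one client."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return sums, dict(counts)


def aggregate_prototypes(all_stats):
    """Server side: merge client sums and counts into global class means."""
    total_sums, total_counts = {}, defaultdict(int)
    for sums, counts in all_stats:
        for y, s in sums.items():
            if y not in total_sums:
                total_sums[y] = [0.0] * len(s)
            total_sums[y] = [a + b for a, b in zip(total_sums[y], s)]
            total_counts[y] += counts[y]
    return {y: [v / total_counts[y] for v in s] for y, s in total_sums.items()}


def predict(prototypes, x):
    """Assign x to the class whose prototype is closest (Euclidean distance)."""
    return min(prototypes, key=lambda y: math.dist(prototypes[y], x))


# Toy non-IID split (assumed data): each client only sees a single class.
client_a = client_stats([[0.0, 0.1], [0.2, 0.0]], ["cat", "cat"])
client_b = client_stats([[1.0, 0.9], [0.8, 1.0]], ["dog", "dog"])
prototypes = aggregate_prototypes([client_a, client_b])
print(predict(prototypes, [0.1, 0.2]))  # prints "cat"
```

Because only per-class sums and counts leave each client, the server can build prototypes for every class even when no single client observes all of them.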
Used a vision-language model (VLM) with a custom prompt template and an
augmentation pipeline to accurately extract product details from images.
Built a robust post-processing pipeline to validate the measurement units of
the extracted data. Improved the overall F1 score by 17%.
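To give a flavor of what unit validation in such a post-processing step might look like, here is a minimal sketch. It is not the pipeline used in the project: the attribute names, the `ALLOWED_UNITS` whitelist, the `UNIT_ALIASES` map, and the `validate` function are all hypothetical.

```python
import re

# Hypothetical whitelist: units considered valid for each attribute.
ALLOWED_UNITS = {
    "item_weight": {"gram", "kilogram", "ounce", "pound"},
    "height": {"centimetre", "metre", "inch", "foot"},
}

# Hypothetical map from common abbreviations to canonical unit names.
UNIT_ALIASES = {
    "g": "gram", "kg": "kilogram", "oz": "ounce", "lb": "pound",
    "cm": "centimetre", "m": "metre", "in": "inch", "ft": "foot",
}

# Matches a number followed by a single alphabetic unit, e.g. "500 g".
PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def validate(attribute, raw):
    """Return 'value unit' in canonical form, or '' if the output is invalid."""
    match = PATTERN.match(raw)
    if not match:
        return ""
    value, unit = match.group(1), match.group(2).lower()
    unit = UNIT_ALIASES.get(unit, unit)
    if unit not in ALLOWED_UNITS.get(attribute, set()):
        return ""
    return f"{float(value)} {unit}"


print(validate("item_weight", "500 g"))  # prints "500.0 gram"
print(validate("height", "12kg"))        # prints "" (wrong unit for height)
```

Returning an empty string for anything that fails the check keeps malformed model outputs from being counted as confident predictions.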
Let me share a little more about my hobbies outside of work and studies.
> I enjoy photography (who doesn't these days, especially
with the social media buzz!). Exploring different camera settings and
experimenting with shots has recently become a passion of mine. I'm
especially drawn to skies, people, and life on the streets.
> I'm also interested in financial planning and long-term
investment strategies. Although I'm still a beginner, I love discussing
anything related to investing. It's probably the easiest icebreaker with me!
> On weekends, I often spend time editing videos on CapCut,
which is something I find both creative and relaxing.
> And best of all, I love meeting new people and learning about their
cultures, which I'm always curious about.