Tanveer Hannan

Research Scientist Intern | PhD Candidate in AI

Microsoft

LMU Munich

Huawei Research Center

Biography

I’m on the job market, looking for industry Research Scientist roles. Feel free to connect via email at hannan@dbs.ifi.lmu.de or see my résumé.

I am a fourth-year PhD student in the Department of Computer Science at LMU Munich, where I have the privilege of working with Prof. Thomas Seidl and Prof. Gedas Bertasius. My main research focus is computer vision, video understanding, and large vision-language models. I am currently a Research Scientist Intern at Microsoft ASG, working on long document understanding and efficient LLMs for edge devices. Previously, I was at the Huawei Trustworthy Lab, focusing on the reliability and robustness of large vision-language models.

I was previously a Machine Learning Intern at Hensoldt Analytics, where I also wrote my Master’s thesis, and a research assistant at MCML and Siemens. Before joining LMU Munich, I worked as a software developer at Helical Inc.

Interests
  • Vision Language Modeling
  • Computer Vision
  • Video Understanding
  • Reliable AI
  • Natural Language Processing
Education
  • PhD in Computer Science, 2022-Present

    LMU Munich

  • MSc in Data Science, 2019-2021

    LMU Munich

  • BSc in Computer Science and Engineering, 2014-2018

    Bangladesh University of Engineering and Technology

Experience

Microsoft ASG
Research Scientist Intern
July 2025 – Present UK
Long Document Understanding, Efficient LLM for Edge Devices

Huawei
Research Scientist Intern
July 2024 – January 2025 Munich
Reliability of Large Vision Language Models

Hensoldt Analytics
Research Intern
July 2021 – December 2021 Munich
Multiple Object Tracking in Videos

MCML
Research Assistant
October 2020 – June 2021 Munich
Hierarchical Transformer for Object Detection

Siemens, Advanta
Student Intern
October 2020 – April 2021 Munich
Reinforcement Learning for Supply Chain Management

Helical Inc.
Software Engineer
November 2018 – August 2019 Munich
Software Developer

Recent Publications

(2025). AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction. In arXiv.

(2025). My Answer Is NOT Fair: Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals. In arXiv.

(2024). ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. In CVPR.

(2024). RGNet: A Unified Retrieval and Grounding Network for Long Videos. In ECCV.

(2024). Context Matters: Leveraging Spatiotemporal Metadata for Semi-Supervised Learning on Remote Sensing Images. In ECAI.

Contact