Oğuzhan Fatih Kar
I am a Machine Learning Researcher at Apple. My research interests are in building generalist multimodal agents that can perceive, reason, and act in physical and digital worlds.
I received my Ph.D. in Computer Science from EPFL, where I was advised by Amir Zamir. My Ph.D. thesis was on building scalable multimodal foundation models that can process diverse inputs such as images, text, 3D, semantics, and other sensory data to solve a wide variety of real-world tasks. In 2023/2024, I interned at Google working on vision-language models with Federico Tombari. I received my M.S. and B.S. in Electrical Engineering from METU, where I was advised by Figen Oktem.
Email / Google Scholar / GitHub / LinkedIn / Twitter
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
R. Ramachandran, A. Garjani, R. Bachmann, A. Atanov*, O.F. Kar*, A. Zamir*
ICLR, 2026
[Website]
[Code]
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
R. Bachmann*, J. Allardice*, D. Mizrahi*, E. Fini, O.F. Kar, E. Amirloo, A. El-Nouby, A. Zamir, A. Dehghan
ICML, 2025
[Website]
[Code]
[Demo]
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
O.F. Kar*, R. Bachmann*, D. Mizrahi*, A. Garjani, M. Gao, D. Griffiths, J. Hu, A. Dehghan, A. Zamir
NeurIPS, 2024
[Website]
[Code]
[Demo]
BRAVE: Broadening the visual encoding of vision-language models
O.F. Kar, A. Tonioni, P. Poklukar, A. Kulshrestha, A. Zamir, F. Tombari
ECCV, 2024 [Oral, Top 2%]
[Website]
Unraveling the Key Components of OOD Generalization via Diversification
H. Benoit*, L. Jiang*, A. Atanov*, O.F. Kar, M. Rigotti, A. Zamir
ICLR, 2024
[arXiv]
4M: Massively Multimodal Masked Modeling
D. Mizrahi*, R. Bachmann*, O.F. Kar, T. Yeo, M. Gao, A. Dehghan, A. Zamir
NeurIPS, 2023 [Spotlight, Top 4%]
[Website]
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
T. Yeo, O.F. Kar, Z. Sodagar, A. Zamir
ICCV, 2023
[Website]
3D Common Corruptions and Data Augmentation
O.F. Kar, T. Yeo, A. Atanov, A. Zamir
CVPR, 2022 [Oral, Top 4%]
[Website]
[Code]
[Video]
[Live Demo]
[TrustML Talk]
Robustness via Cross-domain Ensembles
O.F. Kar*, T. Yeo*, A. Zamir
ICCV, 2021 [Oral, Top 3%]
[Website]
[Code]
[Video]
[Slides]
Robust Learning Through Cross-task Consistency
A. Zamir*, A. Sax*, T. Yeo, O.F. Kar, N. Cheerla, R. Suri, Z. Cao, J. Malik, L. Guibas
arXiv, 2020. CVPR, 2020 [Best Paper Award Nominee, Oral]
[Live Demo]
[Visuals]
[Website]
[Code]
[ECCV 2020 Demo Video]
High-resolution Multi-spectral Imaging with Diffractive Lenses and Learned Reconstruction
F.S. Oktem, O.F. Kar, C. D. Bezek, F. Kamalabadi
IEEE Transactions on Computational Imaging, 2021
[arXiv]
Compressive Spectral Imaging with Diffractive Lenses
O.F. Kar, F.S. Oktem
Optics Letters, 2019
[arXiv]
Real-time Compressive Video Reconstruction for Spatial Multiplexing Cameras
O.F. Kar, A. Gungor, H.E. Guven
IEEE GLOBALSIP, 2019
[Visuals]
Learning-based Regularization for Spatial Multiplexing Cameras
O.F. Kar, A. Gungor, H.E. Guven
IEEE GLOBALSIP, 2019
A Transform Learning-based Deconvolution Technique with Super-resolution and Microscanning Applications
A. Gungor*, O.F. Kar*
IEEE ICIP, 2019
A Matrix-free Reconstruction Method for Compressive Focal Plane Array Imaging
A. Gungor, O.F. Kar, H.E. Guven
IEEE ICIP, 2018