Oğuzhan Fatih Kar

I am a Machine Learning Researcher at Apple. My research interests are in building generalist multimodal agents that can perceive, reason, and act in physical and digital worlds.

I received my Ph.D. in Computer Science from EPFL, where I was advised by Amir Zamir. My Ph.D. thesis was on building scalable multimodal foundation models that can process diverse inputs such as images, text, 3D, semantics, and other sensory data to solve a wide variety of real-world tasks. In 2023/2024, I interned at Google, working on vision-language models with Federico Tombari. I received my M.S. and B.S. in Electrical Engineering from METU, where I was advised by Figen Oktem.

Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  Twitter

Recent Work
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

R. Ramachandran, A. Garjani, R. Bachmann, A. Atanov*, O.F. Kar*, A. Zamir*
ICLR, 2026
[Website] [Code]

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

R. Bachmann*, J. Allardice*, D. Mizrahi*, E. Fini, O.F. Kar, E. Amirloo, A. El-Nouby, A. Zamir, A. Dehghan
ICML, 2025
[Website] [Code] [Demo]

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

O.F. Kar*, R. Bachmann*, D. Mizrahi*, A. Garjani, M. Gao, D. Griffiths, J. Hu, A. Dehghan, A. Zamir
NeurIPS, 2024
[Website] [Code] [Demo]

BRAVE: Broadening the visual encoding of vision-language models

O.F. Kar, A. Tonioni, P. Poklukar, A. Kulshrestha, A. Zamir, F. Tombari
ECCV, 2024 [Oral, Top 2%]
[Website]

Unraveling the Key Components of OOD Generalization via Diversification

H. Benoit*, L. Jiang*, A. Atanov*, O.F. Kar, M. Rigotti, A. Zamir
ICLR, 2024
[arXiv]

4M: Massively Multimodal Masked Modeling

D. Mizrahi*, R. Bachmann*, O.F. Kar, T. Yeo, M. Gao, A. Dehghan, A. Zamir
NeurIPS, 2023 [Spotlight, Top 4%]
[Website]

Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback

T. Yeo, O.F. Kar, Z. Sodagar, A. Zamir
ICCV, 2023
[Website]

3D Common Corruptions and Data Augmentation

O.F. Kar, T. Yeo, A. Atanov, A. Zamir
CVPR, 2022 [Oral, Top 4%]
[Website] [Code] [Video] [Live Demo] [TrustML Talk]

Robustness via Cross-domain Ensembles

O.F. Kar*, T. Yeo*, A. Zamir
ICCV, 2021 [Oral, Top 3%]
[Website] [Code] [Video] [Slides]

Robust Learning Through Cross-task Consistency

A. Zamir*, A. Sax*, T. Yeo, O.F. Kar, N. Cheerla, R. Suri, Z. Cao, J. Malik, L. Guibas
arXiv, 2020. CVPR, 2020 [Best Paper Award Nominee, Oral]
[Live Demo] [Visuals] [Website] [Code] [ECCV 2020 Demo Video]

M.S. Work (2018-2021)

(Complete list on Google Scholar)

High-resolution Multi-spectral Imaging with Diffractive Lenses and Learned Reconstruction

F.S. Oktem, O.F. Kar, C. D. Bezek, F. Kamalabadi
IEEE Transactions on Computational Imaging, 2021
[arXiv]

Compressive Spectral Imaging with Diffractive Lenses
O.F. Kar, F.S. Oktem
Optics Letters, 2019
[arXiv]

Real-time Compressive Video Reconstruction for Spatial Multiplexing Cameras

O.F. Kar, A. Gungor, H.E. Guven
IEEE GlobalSIP, 2019
[Visuals]

Learning-based Regularization for Spatial Multiplexing Cameras
O.F. Kar, A. Gungor, H.E. Guven
IEEE GlobalSIP, 2019

A Transform Learning-based Deconvolution Technique with Super-resolution and Microscanning Applications
A. Gungor*, O.F. Kar*
IEEE ICIP, 2019

A Matrix-free Reconstruction Method for Compressive Focal Plane Array Imaging
A. Gungor, O.F. Kar, H.E. Guven
IEEE ICIP, 2018



Last Update: March 2026