| Method | VQAv2 | OK-VQA | VizWiz | HatefulMemes |
| --- | --- | --- | --- | --- |
| COSMOE-8B PT | 47.2 | 32.7 | 22.5 | 57.1 |
| COSMOE-8x7b PT | 49.5 | 42.2 | 21.6 | 63.5 |
| COSMOE-8x7b SFT | 53.4 | 38.5 | 30.4 | 64.2 |
| IDEFICS-80B SFT | 37.4 | 36.9 | 26.2 | 58.9 |

@article{wang2024cosmo,
title={COSMO: Contrastive Streamlined Multimodal Model with Interleaved Pre-Training},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Kevin Qinghong and Wang, Jianfeng and Lin, Kevin and Yang, Zhengyuan and Wang, Lijuan and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2401.00849},
year={2024}
}