Qwen-VL
Qwen Visual Language model series from Alibaba. Strong at multilingual visual understanding, document parsing, and chart reading. Available as open weights on HuggingFace. Runs via vLLM.
InternVL2
InternVL2 series from Shanghai AI Lab — consistently top-ranked on open-source multimodal benchmarks. Strong at document understanding, chart analysis, and multi-image reasoning.