Large Language and Vision Assistant: connects a vision encoder to an LLM for instruction following with images. An open-source research model widely used as a multimodal base; runs locally via Ollama.
Choose LLaVA when…
• You want an open-source multimodal model for self-hosted deployment (see the sketch after this list)
• You're doing research on vision-language instruction following
• You need a well-documented baseline for multimodal tasks
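A minimal visual QA sketch against a self-hosted LLaVA model, assuming the Ollama server is running, the model has been pulled with "ollama pull llava", and the official ollama Python client is installed ("pip install ollama"); the image path and question are placeholders.

```python
# Minimal visual QA sketch against a locally running Ollama server.
# Assumes "ollama pull llava" has already been run and the "ollama"
# Python client is installed.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "What is shown in this image? Answer in one sentence.",
            # Local image path (placeholder); the client sends it to the
            # model alongside the text prompt.
            "images": ["./example.jpg"],
        }
    ],
)

print(response["message"]["content"])
```

The same call pattern covers captioning and document-parsing prompts; only the prompt text and image change.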
Builder Slot
How does your AI see and understand images?
Optional for most stacks
Vision-language models for image understanding, captioning, visual QA, and document parsing