Nvidia introduces NVLM large language model, and it’s open source

More than just a chip maker, Nvidia has entered the open-source AI space with its NVLM 1.0 model.

Nvidia recently introduced its flagship multimodal large language model NVLM-D-72B, which, as the name suggests, contains 72 billion parameters. 

As Nvidia researchers detailed in their paper, NVLM-D-72B is designed for complex tasks, processing both visual and textual information with exceptional proficiency. Its release could challenge industry leaders like OpenAI, Meta, and Google, especially since Nvidia’s model is open-source—a rarity among today’s top AI models.

“We introduce NVLM-1.0, a family of frontier multimodal large language models that achieve state-of-the-art results on vision-language tasks, rivaling leading multimodal LLMs,” the researchers wrote.

Unlike many multimodal models, NVLM-D-72B not only excels at vision-language tasks but also reportedly improves on text-only tasks, posting a 4.3-point average gain on text benchmarks over its text-only backbone after multimodal training.

By making its model weights publicly available and releasing its training code, Nvidia is seeking to empower developers and researchers and spur further innovation in AI (with the bottom line likely being the sale of even more chips). The move could push other companies toward more open-source releases, potentially transforming the AI landscape.
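For developers, open weights typically mean the model can be pulled straight from a public hub. The snippet below is a minimal sketch of how that might look with the Hugging Face transformers library, assuming the weights are published in a repo such as nvidia/NVLM-D-72B; the exact repo id, loading options, and multimodal inference interface should be taken from Nvidia's own model card.

```python
# Minimal sketch: loading open NVLM-D-72B weights with Hugging Face transformers.
# The repo id, trust_remote_code flag, and loading options are assumptions based
# on typical open-weight releases, not Nvidia's documented usage.
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "nvidia/NVLM-D-72B"  # assumed repo id; check the official model card

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # half precision to cut the memory footprint
    device_map="auto",           # shard the 72B parameters across available GPUs
    trust_remote_code=True,      # the repo ships custom multimodal model code
)

# Text-only prompt as a smoke test; image inputs go through the model's own
# multimodal interface as described in its documentation.
inputs = tokenizer("Hello, NVLM!", return_tensors="pt").to(model.device)
```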

However, the open-source nature of such powerful tools also raises concerns about the potential misuse of AI, sparking debates about the need for regulation and safety measures.
