
Building your Generative AI apps with Meta’s Llama 2 and Databricks


Today, Meta released their latest state-of-the-art large language model (LLM), Llama 2, as open source for commercial use. This is a significant development for open source AI, and it has been exciting to work with Meta as a launch partner. We were able to try the Llama 2 models in advance and have been impressed with their capabilities and the range of possible applications.

Earlier this year, Meta released LLaMA, which significantly advanced the frontier of open source (OSS) LLMs. Although the v1 models are not licensed for commercial use, they greatly accelerated generative AI and LLM research. Alpaca and Vicuna demonstrated that with high-quality instruction-following and chat data, LLaMA can be fine-tuned to behave like ChatGPT. Based on this research finding, Databricks created and released the databricks-dolly-15k instruction-following dataset for commercial use. LLaMA-Adapter and QLoRA introduced parameter-efficient fine-tuning methods that can fine-tune LLaMA models at low cost on consumer GPUs. Llama.cpp ported LLaMA models to run efficiently on a MacBook with 4-bit integer quantization.

In parallel, there have been multiple open source efforts to produce models of similar or higher quality than LLaMA for commercial use, enabling enterprises to leverage LLMs. MPT-7B, released by MosaicML, became the first OSS LLM for commercial use that is comparable to LLaMA-7B, with additional features such as ALiBi for longer context lengths. Since then, we have seen a growing number of OSS models released with permissive licenses, including Falcon-7B and 40B, OpenLLaMA-3B, 7B, and 13B, and MPT-30B.

The newly released Llama 2 models will not only further accelerate LLM research but also enable enterprises to build their own generative AI applications. Llama 2 includes 7B, 13B, and 70B models, trained on more tokens than LLaMA, as well as fine-tuned variants for instruction-following and chat.

Complete ownership of your generative AI applications

Llama 2 and other state-of-the-art commercial-use OSS models like MPT offer a key opportunity for enterprises to own their models, and with them their generative AI applications end to end. Used appropriately, OSS models can provide several benefits over proprietary SaaS models:

  • No vendor lock-in or forced deprecation schedule
  • Ability to fine-tune with enterprise data, while retaining full access to the trained model
  • Model behavior does not change over time
  • Ability to serve a private model instance inside of trusted infrastructure
  • Tight control over correctness, bias, and performance of generative AI applications

At Databricks, we see many customers embracing open source LLMs for various generative AI use cases. As the quality of OSS models continues to improve rapidly, we increasingly see customers experimenting with these models, comparing their quality, cost, reliability, and security against API-based models.

Developing with Llama 2 on Databricks

Llama 2 models are available now, and you can try them on Databricks easily. We provide example notebooks that show how to use Llama 2 for inference, wrap it in a Gradio app, efficiently fine-tune it with your data, and log models to MLflow.
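As a minimal sketch of what single-turn inference with the chat-tuned variants involves: the `[INST]`/`<<SYS>>` template below follows Meta's published chat format, and the model ID and prompts are illustrative placeholders, not the contents of our notebooks.

```python
# Sketch: formatting a prompt for a Llama-2 chat model. The template
# follows Meta's published chat format; prompts are illustrative.

def build_llama2_prompt(user_message: str, system_prompt: str) -> str:
    """Wrap a single-turn user message in the Llama 2 chat template."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "Summarize the benefits of open source LLMs.",
    "You are a helpful assistant.",
)

# Running the model requires accepting Meta's license on the Hugging Face
# Hub and downloading the weights, so the call is shown commented out:
# from transformers import pipeline
# generator = pipeline("text-generation",
#                      model="meta-llama/Llama-2-7b-chat-hf")
# print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```

The chat variants were fine-tuned on conversations in this template, so matching it closely tends to matter for output quality.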

Serving Llama 2

To make use of your fine-tuned and optimized Llama 2 model, you'll also need the ability to deploy it across your organization or integrate it into your AI-powered applications.

Databricks Model Serving supports serving LLMs on GPUs to provide the best possible latency and throughput for commercial applications. Deploying your fine-tuned Llama 2 model is as simple as creating a Serving Endpoint and including your MLflow model from Unity Catalog or the Model Registry in the endpoint's configuration. Databricks constructs a production-ready environment for your model, and your endpoint scales automatically with your traffic.
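In code, creating such an endpoint comes down to one authenticated REST call against the workspace's serving-endpoints API. The sketch below builds the request payload; the endpoint name, registered model name, and workload settings are illustrative placeholders you would adjust for your workspace.

```python
# Sketch: building the payload for a Databricks Model Serving endpoint
# that serves a fine-tuned Llama 2 model registered in MLflow.
# Names and workload settings are placeholders.
import json

def build_endpoint_config(endpoint_name: str,
                          model_name: str,
                          model_version: str) -> dict:
    """Assemble a serving-endpoint creation payload."""
    return {
        "name": endpoint_name,
        "config": {
            "served_models": [
                {
                    "model_name": model_name,
                    "model_version": model_version,
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    }

payload = build_endpoint_config(
    "llama2-chat", "my_catalog.models.llama2_finetuned", "1"
)

# The actual deployment would be an authenticated POST to the workspace,
# e.g. (commented out; requires a workspace host and token):
# requests.post(f"{host}/api/2.0/serving-endpoints",
#               headers={"Authorization": f"Bearer {token}"},
#               data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```

Once the endpoint reports ready, your applications query it over HTTPS like any other REST service.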

Sign up for preview access to GPU-powered Model Serving!

Databricks also offers optimized LLM serving for enterprises that need the best possible latency and throughput for OSS LLMs. We will be adding Llama 2 support to this product so that enterprises who choose Llama 2 can get best-in-class performance.

There are some restrictions on use; see the Llama 2 license for details.