
At Dwarves, we've been exposed to more state-of-the-art AI news than ever before, much of it related to Large Language Models (LLMs). We've had a taste of what AI has to offer with Stable Diffusion and more commercial apps, and have been eager to learn and hone our skill sets in applying these new AI breakthroughs to our everyday lives and our apps.

Introduction

If 2021 was the year of blockchain, it's probably safe to say that 2023 is the year of generative AI. The pace and progress of AI, and by extension AGI, are becoming very hard to keep up with. Apps built on OpenAI's ChatGPT are saturating the market, but there are already fears that ChatGPT plugins could take over a good majority of their use cases.

There has also been increasing interest in custom LLaMA models, a trend similar to what we saw with Stable Diffusion against DALL-E. The landscape for LLMs has been progressing at a breakneck pace, with AI news often becoming outdated within a single day.

We're at a point where everything is moving fast and no one is yet an expert in the field of AI. We felt that we would get left behind if we didn't at least take a look at the technical side of AI, which eventually motivated our research in LLMs.

Prior Research

For AI, a lot of us at Dwarves use available tools to help us learn, get over writer's block, experiment, and generally make our lives a little bit easier. A handful of us, including myself, have dabbled a bit in Stable Diffusion, mostly to create fun pictures, but also to get an idea of the current landscape of generative art.

For research on LLMs, we've investigated vector databases and how to apply a basic form of indexing to them for use with OpenAI. You can check out our basic example at https://df-doc-search.vercel.app/ and ask it some questions about our company, although don't expect too much 😶.
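The basic form of indexing we experimented with boils down to storing document embeddings and ranking them by cosine similarity against a query embedding. Here's a minimal, framework-free sketch of that idea; the random vectors stand in for embeddings you'd get from an embedding API, and the document strings and dimensions are purely illustrative:

```python
import numpy as np

# Toy corpus; the random vectors below stand in for real embeddings
# (e.g. 1536-dimensional vectors from an embedding API).
docs = ["Dwarves is a software firm", "We research LLMs", "We like coffee"]
rng = np.random.default_rng(42)
index = rng.standard_normal((len(docs), 1536))
index /= np.linalg.norm(index, axis=1, keepdims=True)  # unit-normalize rows

def search(query_vec, k=2):
    """Return the top-k documents ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                      # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]      # best scores first
    return [(docs[i], float(scores[i])) for i in top]

hits = search(rng.standard_normal(1536))
```

A real vector database (pgvector, Pinecone, Chroma, etc.) does the same ranking but with approximate nearest-neighbor indexes so it scales past a brute-force dot product.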

Likewise, we've created a few Jupyter notebooks exploring Langchain and the strategies and utilities we use from it to generate more directed results. You can view some of what we've worked on and noted here:

Problem

Using OpenAI is great, but we will eventually find ourselves needing more private LLMs. Unlike Microsoft with Azure, most companies don't have the opportunity or financial resources to make deals with OpenAI for data security and fine-tuning privacy on its foundation models. Along with our efforts on engineering prompts with Langchain, we want to eventually fine-tune our own LLMs to suit more specialized needs, then pipeline them together for more complex use cases in the future.

While we want to fine-tune more private (and of course personal) LLMs, we want to do it in a way that doesn't reinvent the wheel or break the bank. We don't want to spend thousands of dollars just to recreate something that ChatGPT already does. There has already been huge progress in the open-source community with Dolly 2.0 and StableLM, and we're not going to win the race on base models even if we joined it.

Adapter fine-tuning with PEFT LoRA

One novel approach to enhancing the performance of LLMs involves fine-tuning LLaMA models with PEFT LoRA (Parameter-Efficient Fine-Tuning with Low-Rank Adaptation). PEFT LoRA offers a cost-effective and efficient way to adapt models with very little data, given a strong instruction model. It is very similar to Dreambooth LoRA for Stable Diffusion, but with much less hassle.
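The core intuition behind LoRA can be sketched numerically: the pretrained weights stay frozen, and the fine-tune learns only a low-rank update factored into two small matrices. A minimal sketch, with purely illustrative dimensions (real LoRA applies this per attention projection inside the model):

```python
import numpy as np

# A full fine-tune of a d x k weight matrix would update all d*k entries.
# LoRA instead learns a low-rank update B @ A, where B is (d x r) and
# A is (r x k), with rank r much smaller than d and k.
d, k, r = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                     # trainable, initialized to zero

# Effective weight during fine-tuning; since B starts at zero, the
# adapted model initially behaves exactly like the base model.
W_adapted = W + B @ A

# Parameter count: full update vs. low-rank adapter.
full_params = d * k        # 262,144
lora_params = r * (d + k)  # 8,192 -- a 32x reduction at rank 8
```

In practice this is what the Hugging Face `peft` library wires up for you across a model's layers; because only the small `A` and `B` matrices are trained, the memory and compute cost of fine-tuning drops dramatically.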

How does it work?