Training Your Own LoRAs

The WebUI seeks to make training your own LoRAs as easy as possible. It comes down to just a few simple steps:

Step 1: Make a plan.

Step 2: Gather a dataset.

Step 3: Do the training.

Step 4: Evaluate your results.

Step 5: Re-run if you’re unhappy.

Format Files

If using JSON-formatted datasets, they are presumed to be in the following approximate format:

[
    {
        "somekey": "somevalue",
        "key2": "value2"
    },
    {
        // etc
    }
]

Where the keys (e.g. somekey, key2 above) are standardized and relatively consistent across the dataset, and the values (e.g. somevalue, value2) contain the content actually intended to be trained.

For Alpaca, the keys are instruction, input, and output, where input is sometimes blank.
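
For illustration, a single Alpaca-style entry might look like this (the content here is invented):

[
    {
        "instruction": "Summarize the following text.",
        "input": "LoRA trains small adapter matrices instead of the full model.",
        "output": "LoRA is a lightweight fine-tuning method based on adapters."
    }
]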

A simple format file for Alpaca to be used as a chat bot is:

{
    "instruction,output": "User: %instruction%\nAssistant: %output%",
    "instruction,input,output": "User: %instruction%: %input%\nAssistant: %output%"
}

Note that the keys (e.g. instruction,output) are comma-separated lists of dataset keys, and the values are simple strings that reference those keys between % signs (e.g. %instruction%).

So, for example, if a dataset has "instruction": "answer my question", then the format file’s User: %instruction%\n will be automatically filled in as User: answer my question\n.

If you have a different set of dataset keys, you can make your own format file to match. Format files are designed to be as simple as possible, so they are easy to edit to match your needs.
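
If it helps to see the idea in code, here is a minimal sketch of how a format file could be applied to one dataset entry. The function apply_format is illustrative, not the WebUI's actual internals:

def apply_format(entry, format_spec):
    """Fill the matching %key% template with the entry's non-empty fields."""
    present = {k for k, v in entry.items() if v}  # ignore blank fields like ""
    for key_list, template in format_spec.items():
        if set(key_list.split(",")) == present:
            text = template
            for k in present:
                text = text.replace(f"%{k}%", entry[k])
            return text
    raise KeyError(f"no template for fields {sorted(present)}")

format_spec = {
    "instruction,output": "User: %instruction%\nAssistant: %output%",
    "instruction,input,output": "User: %instruction%: %input%\nAssistant: %output%",
}
entry = {"instruction": "answer my question", "input": "", "output": "Sure."}
print(apply_format(entry, format_spec))
# User: answer my question
# Assistant: Sure.

Because input is blank in this entry, the two-key template is selected rather than the three-key one.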

Raw Text File Settings

When using raw text files as your dataset, the text is automatically split into chunks based on your Cutoff Length, and you get a few basic options to configure how that splitting works.
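
As a rough mental model (not the WebUI's exact code), splitting a tokenized text with a cutoff length and an overlap between chunks could look like this:

def split_into_chunks(tokens, cutoff_len, overlap_len=0):
    """Slide a window of cutoff_len tokens, overlapping adjacent chunks by overlap_len."""
    step = max(cutoff_len - overlap_len, 1)
    return [tokens[i:i + cutoff_len] for i in range(0, len(tokens), step)]

print(split_into_chunks(list(range(10)), cutoff_len=4, overlap_len=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9], [9]]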

Parameters

The basic purpose and function of each parameter are documented on-page in the WebUI, so read through them in the UI to understand your options.

That said, here’s a guide to the most important parameter choices you should consider:

VRAM

Rank

Learning Rate and Epochs

Loss

When you’re running training, the WebUI’s console window will log reports that include, among other things, a numeric value named Loss. It will start as a high number, and gradually get lower and lower as it goes.

“Loss” in the world of AI training theoretically means “how close is the model to perfect”, with 0 meaning “absolutely perfect”. It is calculated by measuring the difference between the text the model actually outputs and the text you’re training it to output.
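
Concretely, the number reported is (roughly) a cross-entropy loss: the average negative log-probability the model assigned to each correct next token. A toy illustration in plain Python, not the actual training code:

import math

def mean_cross_entropy(correct_token_probs):
    """Average negative log-probability assigned to each correct token."""
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)

print(mean_cross_entropy([1.0, 1.0, 1.0]))  # 0.0  -- "absolutely perfect"
print(mean_cross_entropy([0.4, 0.2, 0.5]))  # ~1.07 -- a plausible mid-training value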

In practice, a good LLM should have a very complex, variable range of ideas running in its artificial head, so a loss of 0 would indicate that the model has broken and forgotten how to think about anything other than what you trained it on.

So, in effect, Loss is a balancing game: you want to get it low enough that it understands your data, but high enough that it isn’t forgetting everything else. Generally, if it goes below 1.0, it’s going to start forgetting its prior memories, and you should stop training. In some cases you may prefer to take it as low as 0.5 (if you want it to be very very predictable). Different goals have different needs, so don’t be afraid to experiment and see what works best for you.
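
If you are scripting a training run with the Hugging Face transformers Trainer outside the WebUI, one way to enforce such a loss floor is a small callback along these lines (the class name and threshold are illustrative):

from transformers import TrainerCallback

class LossFloorCallback(TrainerCallback):
    """Stop training once the logged loss dips below a chosen floor."""

    def __init__(self, floor=1.0):
        self.floor = floor

    def on_log(self, args, state, control, logs=None, **kwargs):
        # The Trainer calls on_log with the latest metrics, including "loss".
        if logs and logs.get("loss", float("inf")) < self.floor:
            control.should_training_stop = True
        return control

# Hypothetical usage: trainer.add_callback(LossFloorCallback(floor=1.0))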

Note: if you see Loss start at or suddenly jump to exactly 0, it is likely that something has gone wrong in your training process (e.g. model corruption).

Note: 4-Bit Monkeypatch

The 4-bit LoRA monkeypatch works for training, but it has side effects.

Legacy notes

LoRA training was contributed by mcmonkey4eva in PR #570.

Using the original alpaca-lora code

Kept here for reference. The Training tab has many more features than this method.

conda activate textgen
git clone https://github.com/tloen/alpaca-lora

Edit these two lines in alpaca-lora/finetune.py to use your existing model folder instead of downloading everything from decapoda:

from transformers import LlamaForCausalLM, LlamaTokenizer  # already imported in finetune.py

model = LlamaForCausalLM.from_pretrained(
    "models/llama-7b",   # local model folder instead of the decapoda download
    load_in_8bit=True,   # 8-bit quantization to fit in consumer VRAM
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(
    "models/llama-7b", add_eos_token=True
)

Run the script with:

python finetune.py

It just works. It runs at 22.32s/it, with 1170 iterations in total, so about seven and a half hours to train a LoRA. Tested on an RTX 3090: 18,153 MiB of VRAM used, drawing maximum power (350W, room heater mode).