{ "cells": [ { "cell_type": "markdown", "id": "2957fcef", "metadata": {}, "source": [ "\n", "# RAFT Supervised Fine-Tuning (QLoRA) — Local Training\n", "\n", "This notebook fine-tunes an open-source base model on a RAFT-style dataset (`input` → `output`) using **QLoRA** with **PEFT** and **Transformers**. It is designed to run locally (single or multi-GPU) and to export both **LoRA adapters** and (optionally) a **merged** model for inference.\n", "\n", "> **Assumptions**\n", "> - Your dataset lives at `./outputs/raft_dataset.jsonl` (from the previous notebook). Adjust the path if needed.\n", "> - You have a CUDA-capable GPU and can install `bitsandbytes`. (CPU training is possible but slow.)\n", "> - You have enough VRAM for the chosen base model when loaded in 4-bit NF4.\n" ] }, { "cell_type": "markdown", "id": "202f729e", "metadata": {}, "source": [ "## 0) Install dependencies" ] }, { "cell_type": "code", "execution_count": null, "id": "2da670d5", "metadata": {}, "outputs": [], "source": [ "\n", "# If needed, uncomment the following installs:\n", "# %pip install --quiet transformers==4.44.2 datasets==2.20.0 peft==0.12.0 accelerate==0.34.2 bitsandbytes==0.43.3 evaluate==0.4.2 sentencepiece==0.2.0\n", "# Optional extras:\n", "# %pip install --quiet trl==0.9.6 sacrebleu==2.4.3 rouge-score==0.1.2\n" ] }, { "cell_type": "markdown", "id": "1c047191", "metadata": {}, "source": [ "## 1) Configuration" ] }, { "cell_type": "code", "execution_count": null, "id": "f8c8d385", "metadata": {}, "outputs": [], "source": [ "\n", "from pathlib import Path\n", "\n", "# Paths\n", "DATA_JSONL = Path(\"./outputs/raft_dataset.jsonl\") # change if different\n", "RUN_NAME = \"raft_qlora_run\"\n", "OUTPUT_DIR = Path(f\"./finetuned/{RUN_NAME}\")\n", "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n", "\n", "# Base model — examples: \"meta-llama/Llama-3.1-8B\", \"Qwen/Qwen2-7B-Instruct\", \"mistralai/Mistral-7B-Instruct-v0.3\"\n", "# Prefer an instruction-tuned base for better stability on SFT.\n", "BASE_MODEL = \"mistralai/Mistral-7B-Instruct-v0.3\"\n", "\n", "# Tokenization/prompt formatting\n", "SYSTEM_PREFIX = \"You are a helpful assistant. 
Answer concisely and truthfully based ONLY on the user's request.\"\n", "USE_CHAT_TEMPLATE = True  # if the tokenizer has a chat template, we'll leverage it\n", "\n", "# QLoRA/PEFT params\n", "LORA_R = 16\n", "LORA_ALPHA = 32\n", "LORA_DROPOUT = 0.05\n", "TARGET_MODULES = None  # None = let PEFT auto-detect common modules (works for most models)\n", "\n", "# 4-bit quantization (QLoRA)\n", "LOAD_IN_4BIT = True\n", "BNB_4BIT_COMPUTE_DTYPE = \"bfloat16\"  # \"float16\" or \"bfloat16\"\n", "BNB_4BIT_QUANT_TYPE = \"nf4\"  # \"nf4\" or \"fp4\"\n", "BNB_4BIT_USE_DOUBLE_QUANT = True\n", "\n", "# Training\n", "TRAIN_VAL_SPLIT = 0.98\n", "MAX_SEQ_LEN = 2048\n", "PER_DEVICE_TRAIN_BATCH = 1\n", "PER_DEVICE_EVAL_BATCH = 1\n", "GRADIENT_ACCUM_STEPS = 16\n", "LEARNING_RATE = 2e-4\n", "NUM_TRAIN_EPOCHS = 2\n", "WEIGHT_DECAY = 0.0\n", "WARMUP_RATIO = 0.03\n", "LR_SCHEDULER_TYPE = \"cosine\"\n", "LOGGING_STEPS = 10\n", "EVAL_STEPS = 200\n", "SAVE_STEPS = 200\n", "BF16 = True\n", "FP16 = False\n", "\n", "SEED = 42\n" ] }, { "cell_type": "markdown", "id": "6c1439a8", "metadata": {}, "source": [ "## 2) Load dataset (JSONL)" ] }, { "cell_type": "code", "execution_count": null, "id": "f43262fc", "metadata": {}, "outputs": [], "source": [ "\n", "import json, random\n", "from datasets import Dataset\n", "\n", "def read_jsonl(p: Path):\n", "    rows = []\n", "    with p.open(\"r\", encoding=\"utf-8\") as f:\n", "        for line in f:\n", "            line = line.strip()\n", "            if not line:\n", "                continue\n", "            try:\n", "                obj = json.loads(line)\n", "                if \"input\" in obj and \"output\" in obj:\n", "                    rows.append(obj)\n", "            except Exception:\n", "                pass\n", "    return rows\n", "\n", "rows = read_jsonl(DATA_JSONL)\n", "print(f\"Loaded {len(rows)} rows from {DATA_JSONL}\")\n", "\n", "random.Random(SEED).shuffle(rows)\n", "split = int(len(rows) * TRAIN_VAL_SPLIT)\n", "train_rows = rows[:split]\n", "val_rows = rows[split:] if split < len(rows) else rows[-max(1, len(rows)//50):]\n", "\n", "train_ds = Dataset.from_list(train_rows)\n", "eval_ds = Dataset.from_list(val_rows) if val_rows else None\n", "train_ds, eval_ds\n" ] }, { "cell_type": "markdown", "id": "2dd30f5a", "metadata": {}, "source": [ "## 3) Prompt formatting" ] }, { "cell_type": "code", "execution_count": null, "id": "155aad2a", "metadata": {}, "outputs": [], "source": [ "\n", "from transformers import AutoTokenizer\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)\n", "if tokenizer.pad_token is None:\n", "    tokenizer.pad_token = tokenizer.eos_token\n", "\n", "def format_example(ex):\n", "    user = ex[\"input\"]\n", "    assistant = ex[\"output\"]\n", "\n", "    if USE_CHAT_TEMPLATE and hasattr(tokenizer, \"apply_chat_template\"):\n", "        messages = [\n", "            {\"role\": \"system\", \"content\": SYSTEM_PREFIX},\n", "            {\"role\": \"user\", \"content\": user},\n", "            {\"role\": \"assistant\", \"content\": assistant},\n", "        ]\n", "        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)\n", "    else:\n", "        text = f\"[SYSTEM]\\n{SYSTEM_PREFIX}\\n[/SYSTEM]\\n[USER]\\n{user}\\n[/USER]\\n[ASSISTANT]\\n{assistant}\"\n", "    return {\"text\": text}\n", "\n", "train_ds_fmt = train_ds.map(format_example, remove_columns=train_ds.column_names)\n", "eval_ds_fmt = eval_ds.map(format_example, remove_columns=eval_ds.column_names) if eval_ds else None\n", "\n", "print(train_ds_fmt[0][\"text\"][:400])\n" ] }, { "cell_type": "markdown", "id": "4a9f30a8", "metadata": {}, "source": [ "## 4) Tokenize" ] }, { "cell_type": "code", "execution_count": null, "id": 
"0f7eaa2c", "metadata": {}, "outputs": [], "source": [ "\n", "def tokenize(batch):\n", " return tokenizer(\n", " batch[\"text\"],\n", " truncation=True,\n", " max_length=MAX_SEQ_LEN,\n", " padding=\"max_length\",\n", " return_tensors=None,\n", " )\n", "\n", "train_tok = train_ds_fmt.map(tokenize, batched=True, remove_columns=train_ds_fmt.column_names)\n", "eval_tok = eval_ds_fmt.map(tokenize, batched=True, remove_columns=eval_ds_fmt.column_names) if eval_ds_fmt else None\n", "\n", "train_tok = train_tok.rename_column(\"input_ids\", \"input_ids\")\n", "train_tok = train_tok.add_column(\"labels\", train_tok[\"input_ids\"])\n", "if eval_tok:\n", " eval_tok = eval_tok.add_column(\"labels\", eval_tok[\"input_ids\"])\n", "\n", "train_tok, (eval_tok[0]['input_ids'][:10] if eval_tok else [])\n" ] }, { "cell_type": "markdown", "id": "5f53fc1e", "metadata": {}, "source": [ "## 5) Load base model with 4-bit quantization and prepare QLoRA" ] }, { "cell_type": "code", "execution_count": null, "id": "a21d625f", "metadata": {}, "outputs": [], "source": [ "\n", "import torch\n", "from transformers import AutoModelForCausalLM, BitsAndBytesConfig\n", "from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training\n", "\n", "bnb_config = None\n", "if LOAD_IN_4BIT:\n", " bnb_config = BitsAndBytesConfig(\n", " load_in_4bit=True,\n", " bnb_4bit_use_double_quant=BNB_4BIT_USE_DOUBLE_QUANT,\n", " bnb_4bit_quant_type=BNB_4BIT_QUANT_TYPE,\n", " bnb_4bit_compute_dtype=getattr(torch, BNB_4BIT_COMPUTE_DTYPE)\n", " )\n", "\n", "model = AutoModelForCausalLM.from_pretrained(\n", " BASE_MODEL,\n", " quantization_config=bnb_config,\n", " torch_dtype=torch.bfloat16 if BF16 else (torch.float16 if FP16 else None),\n", " device_map=\"auto\",\n", ")\n", "\n", "model = prepare_model_for_kbit_training(model)\n", "\n", "peft_config = LoraConfig(\n", " r=LORA_R,\n", " lora_alpha=LORA_ALPHA,\n", " lora_dropout=LORA_DROPOUT,\n", " bias=\"none\",\n", " task_type=\"CAUSAL_LM\",\n", " target_modules=TARGET_MODULES,\n", ")\n", "\n", "model = get_peft_model(model, peft_config)\n", "model.print_trainable_parameters()\n" ] }, { "cell_type": "markdown", "id": "b081dbd3", "metadata": {}, "source": [ "## 6) Train" ] }, { "cell_type": "code", "execution_count": null, "id": "3afd65f7", "metadata": {}, "outputs": [], "source": [ "\n", "from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling\n", "import math\n", "\n", "data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)\n", "\n", "args = TrainingArguments(\n", " output_dir=str(OUTPUT_DIR),\n", " run_name=RUN_NAME,\n", " num_train_epochs=NUM_TRAIN_EPOCHS,\n", " per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH,\n", " per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH,\n", " gradient_accumulation_steps=GRADIENT_ACCUM_STEPS,\n", " learning_rate=LEARNING_RATE,\n", " lr_scheduler_type=LR_SCHEDULER_TYPE,\n", " warmup_ratio=WARMUP_RATIO,\n", " weight_decay=WEIGHT_DECAY,\n", " logging_steps=LOGGING_STEPS,\n", " evaluation_strategy=\"steps\",\n", " eval_steps=EVAL_STEPS,\n", " save_steps=SAVE_STEPS,\n", " save_total_limit=2,\n", " bf16=BF16,\n", " fp16=FP16,\n", " gradient_checkpointing=True,\n", " report_to=[\"none\"],\n", " seed=SEED,\n", ")\n", "\n", "trainer = Trainer(\n", " model=model,\n", " tokenizer=tokenizer,\n", " args=args,\n", " train_dataset=train_tok,\n", " eval_dataset=eval_tok,\n", " data_collator=data_collator,\n", ")\n", "\n", "train_result = trainer.train()\n", "metrics = trainer.evaluate() if eval_tok else {}\n", 
"perplexity = math.exp(metrics[\"eval_loss\"]) if metrics and \"eval_loss\" in metrics else None\n", "metrics, perplexity\n" ] }, { "cell_type": "markdown", "id": "e22700a2", "metadata": {}, "source": [ "## 7) Save LoRA adapters" ] }, { "cell_type": "code", "execution_count": null, "id": "efc434ce", "metadata": {}, "outputs": [], "source": [ "\n", "adapter_dir = OUTPUT_DIR / \"lora_adapter\"\n", "adapter_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "model.save_pretrained(str(adapter_dir))\n", "tokenizer.save_pretrained(str(adapter_dir))\n", "\n", "print(f\"Saved LoRA adapter to: {adapter_dir}\")\n" ] }, { "cell_type": "markdown", "id": "afb33cae", "metadata": {}, "source": [ "## 8) (Optional) Merge adapters into base model and save full weights" ] }, { "cell_type": "code", "execution_count": null, "id": "dc6ccdee", "metadata": {}, "outputs": [], "source": [ "\n", "DO_MERGE = False # set True to produce a standalone merged model\n", "\n", "if DO_MERGE:\n", " from peft import PeftModel\n", " base_model = AutoModelForCausalLM.from_pretrained(\n", " BASE_MODEL,\n", " torch_dtype=torch.bfloat16 if BF16 else (torch.float16 if FP16 else None),\n", " device_map=\"auto\",\n", " )\n", " merged = PeftModel.from_pretrained(base_model, str(adapter_dir)).merge_and_unload()\n", " merged_dir = OUTPUT_DIR / \"merged_model\"\n", " merged.save_pretrained(str(merged_dir))\n", " tokenizer.save_pretrained(str(merged_dir))\n", " print(f\"Merged full model saved to: {merged_dir}\")\n", "else:\n", " print(\"Skipping merge (set DO_MERGE=True to enable).\")\n" ] }, { "cell_type": "markdown", "id": "010055a7", "metadata": {}, "source": [ "## 9) Quick inference with the trained adapter" ] }, { "cell_type": "code", "execution_count": null, "id": "40f3a8a5", "metadata": {}, "outputs": [], "source": [ "\n", "from peft import PeftModel\n", "import torch\n", "\n", "test_model = AutoModelForCausalLM.from_pretrained(\n", " BASE_MODEL,\n", " quantization_config=bnb_config,\n", " torch_dtype=torch.bfloat16 if BF16 else (torch.float16 if FP16 else None),\n", " device_map=\"auto\",\n", ")\n", "test_model = PeftModel.from_pretrained(test_model, str(adapter_dir))\n", "test_model.eval()\n", "\n", "def generate_answer(prompt, max_new_tokens=256, temperature=0.2, top_p=0.9):\n", " if USE_CHAT_TEMPLATE and hasattr(tokenizer, \"apply_chat_template\"):\n", " messages = [\n", " {\"role\": \"system\", \"content\": SYSTEM_PREFIX},\n", " {\"role\": \"user\", \"content\": prompt},\n", " ]\n", " model_inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", add_generation_prompt=True).to(test_model.device)\n", " else:\n", " text = f\"[SYSTEM]\\n{SYSTEM_PREFIX}\\n[/SYSTEM]\\n[USER]\\n{prompt}\\n[/USER]\\n[ASSISTANT]\\n\"\n", " model_inputs = tokenizer([text], return_tensors=\"pt\").to(test_model.device)\n", "\n", " with torch.no_grad():\n", " out = test_model.generate(\n", " **model_inputs,\n", " do_sample=True,\n", " max_new_tokens=max_new_tokens,\n", " temperature=temperature,\n", " top_p=top_p,\n", " eos_token_id=tokenizer.eos_token_id,\n", " pad_token_id=tokenizer.pad_token_id,\n", " )\n", " return tokenizer.decode(out[0], skip_special_tokens=True)\n", "\n", "sample_prompt = (train_rows[0][\"input\"] if len(train_rows)>0 else \"What are the visitor crowd levels like?\")\n", "print(generate_answer(sample_prompt)[:800])\n" ] }, { "cell_type": "markdown", "id": "3638b421", "metadata": {}, "source": [ "## 10) Light evaluation on the validation set" ] }, { "cell_type": "code", "execution_count": null, "id": "28129cf7", 
"metadata": {}, "outputs": [], "source": [ "\n", "import evaluate\n", "\n", "if eval_ds:\n", " rouge = evaluate.load(\"rouge\")\n", " preds, refs = [], []\n", " for ex in val_rows[:50]:\n", " preds.append(generate_answer(ex[\"input\"], max_new_tokens=192, temperature=0.0))\n", " refs.append(ex[\"output\"])\n", " results = rouge.compute(predictions=preds, references=refs)\n", " print(results)\n", "else:\n", " print(\"No eval split available; skipped.\")\n" ] }, { "cell_type": "markdown", "id": "1ca0d748", "metadata": {}, "source": [ "\n", "## 11) (Optional) Use with other runtimes\n", "\n", "- **Python Inference (PEFT)**: Load base model + adapter as shown in Section 9.\n", "- **Merged model**: Set `DO_MERGE=True` to create a standalone model directory; you can then convert to other runtimes (e.g., llama.cpp GGUF) using their conversion tools.\n", "- **Ollama**: If your runtime supports adapters or merged weights for the chosen base model, create a `Modelfile` pointing to them. Need a concrete path? Tell me your base and target runtime and I’ll add exact steps.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.x" } }, "nbformat": 4, "nbformat_minor": 5 }