RAFT updates, BERTopic config, cleanup

2026-02-21 01:57:14 +01:00
parent 8cadcb1f69
commit 1a99b53d44
12 changed files with 10750 additions and 9778 deletions

@@ -1,17 +1,21 @@
# Retrieval-Augmented Finetuning (RAFT)
**Workflow**:
## Prerequisites
- General preprocessing (required for BERTopic)
- BERTopic
- Classification must already have been run, so that `data/intermediate/culture_reviews.csv` exists
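Before running the commands below, it can help to confirm that the classification output is actually in place. This is a minimal sketch, not part of the repository. The function name `check_raft_prereqs` is hypothetical. The sketch also assumes the scripts are run from a subdirectory of the project root, as the `../data/...` paths suggest, so the data root defaults to `..`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of the repository).
# Verifies that the classification output required by the RAFT steps
# exists and is non-empty. The data root defaults to ".." because the
# documented commands use paths like ../data/intermediate/...

check_raft_prereqs() {
  local root="${1:-..}"
  local reviews="$root/data/intermediate/culture_reviews.csv"
  # -s: true only if the file exists and its size is greater than zero
  if [[ ! -s "$reviews" ]]; then
    echo "missing or empty: $reviews (run the classification first)" >&2
    return 1
  fi
  echo "ok: $reviews"
}
```

For example, `check_raft_prereqs && python prepare_corpus.py --input_tab ../data/intermediate/culture_reviews.csv --out_dir out` starts corpus preparation only when the file is present.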
## Preparing the retrieval corpus
```bash
python prepare_corpus.py --input_tab ../data/intermediate/selected_topics_documents.csv --out_dir out
python prepare_corpus.py --input_tab ../data/intermediate/culture_reviews.csv --out_dir out
```
## Creating the RAFT dataset
```bash
python make_raft_data.py --out_dir out --n_examples 100
python make_raft_data.py --out_dir out --n_examples 10
```
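The two steps above can be chained so that dataset creation starts only after corpus preparation has succeeded. This is a minimal sketch, not part of the repository. `build_raft_data`, `run_step` and the `DRY_RUN` variable are hypothetical names. The sketch assumes the `culture_reviews.csv` input named in the prerequisites, and it takes the number of examples as a required argument rather than choosing a default.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): runs corpus preparation,
# then RAFT dataset creation. With set -e, a direct call to build_raft_data
# stops at the first failing step.
set -euo pipefail

# Runs a command, or only prints it when DRY_RUN=1 is set.
run_step() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: build_raft_data OUT_DIR N_EXAMPLES
build_raft_data() {
  local out_dir="${1:?usage: build_raft_data OUT_DIR N_EXAMPLES}"
  local n_examples="${2:?usage: build_raft_data OUT_DIR N_EXAMPLES}"
  run_step python prepare_corpus.py \
    --input_tab ../data/intermediate/culture_reviews.csv --out_dir "$out_dir"
  run_step python make_raft_data.py --out_dir "$out_dir" --n_examples "$n_examples"
}
```

For example, `DRY_RUN=1 build_raft_data out 10` prints the two commands without running them. Drop `DRY_RUN=1` to execute them.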
## Training the QLoRA adapters