mirror of
https://github.com/marvinscham/masterthesis-playground.git
synced 2026-03-22 08:22:43 +01:00
RAFT updates, BERTopic config, cleanup
@@ -1,17 +1,21 @@
# Retrieval-Augmented Finetuning (RAFT)

**Procedure**:

## Prerequisites

- General preprocessing (required for BERTopic)
- BERTopic
- Classification must have been run; `data/intermediate/culture_reviews.csv` must exist
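
The file prerequisite above can be checked before starting the RAFT steps. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` and the `repo_root` parameter are hypothetical, and the path is assumed to be relative to the repository root.

```python
from pathlib import Path

# Files that must exist before the RAFT steps are run.
# The path is taken from the prerequisites list; treating it as relative
# to the repository root is an assumption.
REQUIRED_FILES = ["data/intermediate/culture_reviews.csv"]


def missing_prerequisites(repo_root=".", required=REQUIRED_FILES):
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


# Example usage (hypothetical):
#   missing = missing_prerequisites("..")
#   if missing:
#       raise SystemExit(f"Run the classification step first, missing: {missing}")
```

An empty return value means the classification output is in place and the corpus preparation can be started.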

## Preparing the retrieval corpus

```bash
python prepare_corpus.py --input_tab ../data/intermediate/selected_topics_documents.csv --out_dir out
python prepare_corpus.py --input_tab ../data/intermediate/culture_reviews.csv --out_dir out
```

## Creating the RAFT dataset

```bash
python make_raft_data.py --out_dir out --n_examples 100
python make_raft_data.py --out_dir out --n_examples 10
```
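
For orientation, a RAFT-style training example usually pairs a question with a context that mixes the relevant ("oracle") document with distractor documents, and sometimes leaves the oracle out. The sketch below illustrates that general idea only. It is not the logic of `make_raft_data.py`, and the function name, field names, and parameters (`n_distractors`, `p_oracle`) are hypothetical.

```python
import random


def build_raft_example(question, answer, oracle_doc, corpus,
                       n_distractors=3, p_oracle=0.8, rng=None):
    """Assemble one RAFT-style example (illustrative, hypothetical schema)."""
    rng = rng or random.Random()
    # Distractors are sampled from all documents except the oracle.
    candidates = [doc for doc in corpus if doc != oracle_doc]
    distractors = rng.sample(candidates, k=min(n_distractors, len(candidates)))
    # With probability p_oracle the oracle document is part of the context.
    include_oracle = rng.random() < p_oracle
    context = distractors + ([oracle_doc] if include_oracle else [])
    rng.shuffle(context)  # avoid a fixed oracle position
    return {
        "question": question,
        "context": context,
        "answer": answer,
        "has_oracle": include_oracle,
    }
```

Passing a seeded `random.Random` as `rng` makes the sampling reproducible, which helps when comparing small and large values of `--n_examples`.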

## Training the QLoRA adapters