Apr 19, 2024
Paper page — Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Posted by Cecile G. Tamura in category: futurism
Google presents Reuse Your Rewards: reward model transfer for zero-shot cross-lingual alignment.
Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems.