No clear way to load models
Created by: stephenroller
🚀 Feature Request
Loading models is a bit of a pain right now. It's done differently in multiple scripts (including our internal eval scripts), and not every loading path is compatible with every checkpoint format.
This typically requires setting a TON of command line args based on what the model checkpoints need (--model-parallel, --ddp-backend fully_sharded, --distributed-port, etc.). Many of these args can be picked up by just looking at the files.
Afterwards, we should refactor a few scripts to use this One True Method.
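
To make the "picked up by just looking at the files" idea concrete, here is a minimal sketch of what the inference step might look like. It is not an existing API: the function name infer_load_args, the returned keys, and the file naming pattern (<prefix>-model_part-<M>-shard<S>.pt) are all assumptions for illustration, and the real checkpoint layout may differ.

```python
import re

# ASSUMPTION: checkpoint files follow a pattern like
#   <prefix>-model_part-<M>-shard<S>.pt
# This naming scheme is hypothetical and only used for illustration.
_PART_RE = re.compile(r"-model_part-(\d+)")
_SHARD_RE = re.compile(r"-shard(\d+)")


def infer_load_args(filenames):
    """Guess loader settings from checkpoint file names alone.

    Hypothetical helper: returns a plain dict rather than real CLI args.
    """
    parts, shards = set(), set()
    for name in filenames:
        if not name.endswith(".pt"):
            continue  # ignore dictionaries, logs, and other side files
        part_match = _PART_RE.search(name)
        if part_match:
            parts.add(int(part_match.group(1)))
        shard_match = _SHARD_RE.search(name)
        if shard_match:
            shards.add(int(shard_match.group(1)))
    return {
        # Highest model-parallel part index + 1, or 1 if no parts are named.
        "model_parallel_size": max(parts) + 1 if parts else 1,
        # ASSUMPTION: a shard suffix implies a fully sharded checkpoint.
        "fully_sharded": bool(shards),
        "num_shards": max(shards) + 1 if shards else 1,
    }
```

A single loading entry point could call something like infer_load_args(os.listdir(checkpoint_dir)) first and only fall back to explicit command line args when the file names are ambiguous.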