Source code: https://github.com/lopuhin/transformer-lm
Vocabulary size: 50000 tokens.
Model parameters:
```json
{
    "batch_size": 32,
    "epochs": 10,
    "g_accum_gradients": 1,
    "hparams": {
        "gradient_checkpointing": false,
        "n_ctx": 64,
        "n_embed": 768,
        "n_head": 12,
        "n_hidden": 768,
        "n_layer": 8,
        "n_vocab": 50000
    },
    "lr": 0.00025
}
```
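
As a rough sanity check, the sketch below estimates the parameter count implied by these hyperparameters. It assumes a GPT-2-style block (feed-forward inner size of 4 × `n_hidden`, tied input/output embeddings, biases and layer norms ignored for brevity); the exact architecture in transformer-lm may differ, so treat the result as an order-of-magnitude estimate, not an exact figure.

```python
def approx_params(n_vocab, n_ctx, n_embed, n_hidden, n_layer):
    """Rough parameter count for a GPT-2-style decoder (assumption, not
    taken from the transformer-lm source; biases/layer norms ignored)."""
    # Token and learned positional embeddings.
    embeddings = n_vocab * n_embed + n_ctx * n_embed
    # Per-layer attention: q, k, v and output projections.
    attention = 4 * n_hidden ** 2
    # Per-layer feed-forward: two linear layers with inner size 4 * n_hidden.
    feed_forward = 8 * n_hidden ** 2
    return embeddings + n_layer * (attention + feed_forward)

# Values from the config above: ~95M parameters in total,
# ~38.4M of which sit in the token embedding table.
print(approx_params(n_vocab=50000, n_ctx=64, n_embed=768,
                    n_hidden=768, n_layer=8))  # 95072256
```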
© Anastasiia Lopukhina, Konstantin Lopukhin