Nvidia is breaking the trend of Sesame Street-themed models with the new Megatron-LM.

Hopefully it's good at approximating the global optimus prime.

But also 8.3 billion parameters? Seriously? I can't help but feel like this is designed to sell Tesla GPUs. That's gonna take a lot of VRAM.
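
For a rough sense of scale, here's a back-of-the-envelope sketch of what the weights alone would occupy. It assumes 4 bytes per parameter for fp32 and 2 for fp16, and it ignores gradients, optimizer state, and activations, so real training would need considerably more. The helper name `weight_memory_gib` is made up for illustration.

```python
# Rough estimate of memory needed just to hold the model weights.
# Assumption: every parameter is stored at one fixed precision; gradients,
# optimizer state, and activations are NOT counted here.

def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30

PARAMS = 8_300_000_000  # the 8.3 billion figure quoted above

fp32_gib = weight_memory_gib(PARAMS, 4)  # 4 bytes per fp32 value
fp16_gib = weight_memory_gib(PARAMS, 2)  # 2 bytes per fp16 value

print(f"fp32 weights: {fp32_gib:.1f} GiB")
print(f"fp16 weights: {fp16_gib:.1f} GiB")
```

So even before training overhead, the fp32 weights come to roughly 31 GiB, and half that in fp16.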

