Release of LLM-jp-3 172B alpha1 and alpha2

Achievements

The Research and Development Center for Large Language Models (LLMC) at the National Institute of Informatics has been working on developing an open GPT-3-class large-scale language model, “LLM-jp-3 172B,” optimized for Japanese.

Today, we are releasing two early-stage models, “LLM-jp-3 172B alpha1” (trained on 0.7 trillion tokens) and “LLM-jp-3 172B alpha2” (trained on 1.4 trillion tokens), which did not achieve the expected performance due to an issue with the training settings*. We are also releasing instruction-tuned versions of these models at the same time. Although their performance has been confirmed to be significantly lower than that of previously released models, we believe they can still be useful for research purposes and are making them available to the public.

For more details on the models, please visit the following links:

* The issue pertains to the ε parameter of the AdamW optimization algorithm. For more details, please refer to the presentation materials from the Model Construction WG at the 11th LLM-jp Meeting. A technical report on this issue is in preparation.
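
For readers unfamiliar with where ε enters the optimizer, the sketch below shows a single AdamW update step in its standard formulation (Loshchilov & Hutter, 2019). It is purely illustrative: the function name and all hyperparameter values are placeholders rather than the settings used in training, and the comment on ε describes the parameter's general role; the specifics of the issue are covered in the linked materials and the forthcoming technical report.

```python
# Minimal sketch of one AdamW update step (standard formulation).
# Illustrative only -- not the training code used for LLM-jp-3 172B;
# all hyperparameter values are placeholders.
import math

def adamw_step(theta, grad, m, v, t,
               lr=1e-4, beta1=0.9, beta2=0.95, eps=1e-8, weight_decay=0.1):
    """Return updated (theta, m, v) for a single parameter after step t."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # eps stabilizes the denominator; if it is large relative to
    # sqrt(v_hat), it effectively shrinks the update.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v
```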