🔥 Released Models: Version 0.1.0-beta (16/05/23)

Version 0.1.0-beta (16 May 2023)

Demo:

https://colab.research.google.com/drive/1nZ6Vc2U6rOezsMxarGJY7oDw8RG5E4r_?usp=sharing#scrollTo=lsOjziA3Dppt

Change Logs

Version 0.1.0-beta (Facebook LLaMA Model)

Release date: 16 May 2023

OpenThaiGPT version 0.1.0-beta is a 7B-parameter LLaMA model fine-tuned to follow the Thai-translated instructions listed below. It uses the Hugging Face LLaMA implementation.

Statistics

  • Number of parameters: 7B

  • Model dimension: 4,096

  • Maximum sequence length: 2,048 tokens

  • Attention heads: 32

  • Layers: 32

  • Training tokens: 1T
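As a back-of-the-envelope check of the "7B" figure, the sketch below recomputes the parameter count of a LLaMA-style decoder from the dimension and layer count listed above. It assumes two values these notes do not state: a 32,000-token vocabulary and a feed-forward hidden size of 11,008 (the values of the original LLaMA-7B configuration). The head count does not change the total, because 32 heads × 128 dimensions per head = 4,096. The LoRA adapter's own parameters are not included.

```python
# Rough parameter count for a LLaMA-style decoder, for illustration only.
# ASSUMPTIONS (not stated in these release notes): vocab_size=32000 and
# ffn_hidden=11008, taken from the original LLaMA-7B configuration.

def llama_param_count(dim=4096, n_layers=32, vocab_size=32000, ffn_hidden=11008):
    attention = 4 * dim * dim        # q, k, v and output projections (no biases)
    mlp = 3 * dim * ffn_hidden       # gate, up and down projections of the SwiGLU block
    norms = 2 * dim                  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * dim    # input token embeddings
    lm_head = vocab_size * dim       # separate (untied) output projection
    final_norm = dim                 # RMSNorm before the output projection
    return n_layers * per_layer + embeddings + lm_head + final_norm

total = llama_param_count()
head_dim = 4096 // 32                # model dimension split across 32 heads
print(f"{total:,} parameters, head dimension {head_dim}")
# prints "6,738,415,616 parameters, head dimension 128"
```

Roughly 6.74 billion parameters is what the "7B" label rounds up to.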

License

  • Source code: Apache License 2.0.

  • Weights: For research use only (due to the license terms of Facebook's LLaMA weights).

  • Note: A commercial-use license for the OpenThaiGPT 0.1.0 weights will be released soon.

Code and Weight

  • Finetune code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta

  • Inference library: https://github.com/OpenThaiGPT/openthaigpt

  • Weights (LoRA adapter): https://huggingface.co/kobkrit/openthaigpt-0.1.0-beta
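The weights are published as a LoRA adapter, not as a full copy of LLaMA. To illustrate what that means, the sketch below shows the generic LoRA technique, not code from the OpenThaiGPT repositories. For each adapted layer, an adapter stores two small matrices, A (r × k) and B (d × r). The effective weight is W + (alpha / r) · B·A, where W is the frozen base weight. The toy matrices, rank `r` and scaling `alpha` are hypothetical. In practice, such adapters are typically loaded on top of the base LLaMA weights with the Hugging Face `peft` library (`PeftModel.from_pretrained`), not merged by hand.

```python
# Generic LoRA weight merge in pure Python, for illustration only.
# Shapes follow the LoRA convention: W is d x k, B is d x r, A is r x k.
# The matrices, rank and alpha below are hypothetical toy values.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[t] * y[t][j] for t in range(inner)) for j in range(cols)] for row in x]

def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A: the frozen weight plus the scaled low-rank update."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2 x 2 base weight
A = [[1.0, 2.0]]               # rank-1 "down" matrix (1 x 2)
B = [[0.5], [0.0]]             # rank-1 "up" matrix (2 x 1)
print(merge_lora(W, A, B, alpha=2.0, r=1))  # prints [[2.0, 2.0], [0.0, 1.0]]
```

Only A and B (here 4 numbers instead of the full matrix) need to be distributed, which is why an adapter is much smaller than the base model.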

Authors

Kobkrit Viriyayudhakorn ([email protected]), Sumeth Yuenyong ([email protected]) and Thaweewat Rugsujarit ([email protected]).

Training Datasets

  • Alpaca Finance Instruction, translated into Thai by Thaweewat Ruksujarit: 43,000 instruction pairs

  • RD's Tax QA Chatbot training set by ทรงวุฒิ บุรงค์: 600 instruction pairs

  • iApp Technology's Extractive QA Dataset in Thai: 4,000 instruction pairs

  • Databricks' Dolly Instruction, translated into Thai by Thaweewat Ruksujarit: 15,000 instruction pairs

  • Instruction Wild, translated into Thai by Thaweewat Ruksujarit: 52,000 instruction pairs

  • Stanford Alpaca, translated into Thai by Thaweewat Ruksujarit: 51,000 instruction pairs

  • GPT Teacher Instruction, translated into Thai by Thaweewat Ruksujarit: 20,000 instruction pairs

  • Hello Simple AI Summary Dataset, translated into Thai by Thaweewat Ruksujarit: 24,000 instruction pairs

  • Thai SelfInstruct Dataset (automatically generated) by OpenThaiGPT: 5,000 instruction pairs
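The list does not give an overall total. Summing the counts listed above gives the size of the combined instruction set. This assumes the datasets were simply concatenated, without deduplication or filtering; the release notes do not say either way.

```python
# Instruction-pair counts copied from the training dataset list above.
instruction_pairs = {
    "Alpaca Finance (Thai)": 43_000,
    "RD's Tax QA Chatbot": 600,
    "iApp Extractive QA": 4_000,
    "Databricks Dolly (Thai)": 15_000,
    "Instruction Wild (Thai)": 52_000,
    "Stanford Alpaca (Thai)": 51_000,
    "GPT Teacher (Thai)": 20_000,
    "Hello Simple AI Summary (Thai)": 24_000,
    "Thai SelfInstruct": 5_000,
}

total = sum(instruction_pairs.values())
print(f"Total: {total:,} instruction pairs")  # prints "Total: 214,600 instruction pairs"
```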

---

Version 0.1.0-alpha (ByT5-XL Model)

  • Release date: 24 April 2023

  • PoC testing website: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF

  • Model and weights: https://huggingface.co/kobkrit/openthaigpt-0.1.0-alpha

  • PIP installation page: https://pypi.org/project/openthaigpt/

  • Code example: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF

OpenThaiGPT version 0.1.0-alpha

The first Thai model with 3 billion parameters

  • First Thai Byte-Level Text-to-Text Transfer Transformer

  • Supports instruction following:

    • Translation to Thai

    • Explanation

    • Paraphrasing

  • Zero-shot and Few-shot Learning

  • Pretraining Model: ByT5-XL (3.74 billion params)

  • InstructDataset: 50,000 Thai SelfInstruct

  • RLHF: None

  • Developer: Sumeth Yuenyong, Kobkrit Viriyayudhakorn ([email protected])
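ByT5 works directly on UTF-8 bytes instead of a learned subword vocabulary. As a result, Thai text needs no Thai-specific tokenizer, but every Thai character (Unicode block U+0E00 to U+0E7F) becomes three byte tokens. The sketch below illustrates this encoding. It assumes ByT5's convention of reserving IDs 0, 1 and 2 for pad, end-of-sequence and unknown tokens, and shifting each byte value up by 3. It is a standalone illustration, not the `openthaigpt` package's API.

```python
# Sketch of ByT5-style byte-level encoding, for illustration only.
# ASSUMPTION: IDs 0, 1, 2 are reserved for <pad>, </s>, <unk>, and every
# UTF-8 byte value is shifted up by 3 to form its token ID.

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
OFFSET = 3

def encode(text, add_eos=True):
    """Map text to token IDs: one ID per UTF-8 byte, plus an optional </s>."""
    ids = [b + OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids

def decode(ids):
    """Invert encode(), dropping special tokens (IDs below OFFSET)."""
    data = bytes(i - OFFSET for i in ids if OFFSET <= i < OFFSET + 256)
    return data.decode("utf-8", errors="ignore")

thai = "สวัสดี"
ids = encode(thai)
print(len(thai), "characters ->", len(ids) - 1, "byte tokens + </s>")
# prints "6 characters -> 18 byte tokens + </s>"
print(decode(ids) == thai)  # prints "True"
```

The trade-off is sequence length: byte-level Thai input is about three times longer than its character count, in exchange for a tiny, language-agnostic vocabulary.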

PoC Version 0.0.4 (The Fourth PoC Version)

  • Release date: 12 March 2023

  • PoC testing website: https://colab.research.google.com/drive/13yLIifBRDQp82QO4ICs_aEvz0N8tqVPm?usp=sharin

  • Model and weights: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.4

  • PIP installation page: https://pypi.org/project/openthaigpt/

  • Code example: https://github.com/OpenThaiGPT/openthaigpt-example

OpenThaiGPT version 0.0.4

The Fourth PoC Model

  • Answers questions in more detail and, in most cases, answers better than 0.0.3

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 12,920 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn ([email protected])

PoC Version 0.0.3 (The Third PoC Version)

  • Release date: 28 February 2023

  • Model and weights: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.3

  • PIP installation page: https://pypi.org/project/openthaigpt/

  • Code example: https://github.com/OpenThaiGPT/openthaigpt-example

OpenThaiGPT version 0.0.3

The Third PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 7,000 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn ([email protected])

PoC Version 0.0.2 (The Second PoC Version)

  • Release date: 27 February 2023

  • Model and weights: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.2

  • PIP installation page: Coming soon

  • Colab example: Coming soon

OpenThaiGPT version 0.0.2

The Second PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 7,000 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn ([email protected])

PoC Version 0.0.1 (Very First PoC Version)

  • Release date: 20 February 2023

  • Model and weights: openthaigpt-gpt2-pantipwiki-poc

  • PIP installation page: Coming soon

  • Colab example: Coming soon

The Very First PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 298,678 QA pairs obtained from 70,000 Pantip threads (kratoos) + Wikipedia QA by iApp

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn ([email protected])
