🔥Released Models Version <0.1.0-beta> (16/05/23)

Version 0.1.0-beta (16 May 2023)

Change Logs

Version 0.1.0-beta (Facebook LLaMA Model)

Release date: 16 May 2023

OpenThaiGPT Version 0.1.0-beta is a 7B-parameter LLaMA model fine-tuned to follow instructions translated into Thai, using the datasets listed below. It uses the Hugging Face LLaMA implementation.

Statistics

Number of parameters: 7B
Dimension: 4096
Max Length Token: 2048
n heads: 32
n layers: 32
n tokens: 1T
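As a rough sanity check, the "7B" figure can be reproduced from the dimension and layer count above. This is a minimal sketch, not the project's own code: the vocabulary size (32,000) and feed-forward width (11,008) are not stated on this page and are assumed here from the standard LLaMA-7B configuration, and the function name is hypothetical.

```python
# Rough parameter count for a LLaMA-style decoder-only transformer.
DIM = 4096            # "Dimension" from the statistics above
N_LAYERS = 32         # "n layers" from the statistics above
N_HEADS = 32          # "n heads" from the statistics above
VOCAB_SIZE = 32_000   # ASSUMPTION: standard LLaMA tokenizer size, not stated on this page
FFN_DIM = 11_008      # ASSUMPTION: standard LLaMA-7B feed-forward width, not stated on this page


def llama_param_count(dim: int, n_layers: int, vocab_size: int, ffn_dim: int) -> int:
    """Count weights assuming untied embeddings, no biases, and RMSNorm layers."""
    embeddings = vocab_size * dim    # input token embedding table
    lm_head = vocab_size * dim       # output projection to the vocabulary
    attention = 4 * dim * dim        # query, key, value, and output projections
    mlp = 3 * dim * ffn_dim          # gate, up, and down projections
    norms = 2 * dim                  # two RMSNorm weight vectors per layer
    final_norm = dim                 # RMSNorm before the output projection
    return embeddings + lm_head + n_layers * (attention + mlp + norms) + final_norm


HEAD_DIM = DIM // N_HEADS  # width of each attention head
TOTAL_PARAMS = llama_param_count(DIM, N_LAYERS, VOCAB_SIZE, FFN_DIM)

print(f"head dimension: {HEAD_DIM}")
print(f"approximate parameters: {TOTAL_PARAMS / 1e9:.2f}B")
```

Under these assumptions the count comes to about 6.7 billion, which is what the "7B" label rounds up from.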

License

Source Code: Apache Software License 2.0.
Weight: For research use only (because of the license on Facebook's LLaMA weights).

Note: A commercial-use license for the OpenThaiGPT 0.1.0 weights will be released soon.

Code and Weight

Finetune Code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta
Inference Library: https://github.com/OpenThaiGPT/openthaigpt
Weight (LoRA Adapter): https://huggingface.co/kobkrit/openthaigpt-0.1.0-beta

Authors

Kobkrit Viriyayudhakorn, Sumeth Yuenyong, and Thaweewat Rugsujarit.

Trained Datasets

| Dataset Name | Instruction Pairs | Description |
|---|---|---|
| Thaweewat/alpaca-finance-43k-th | 43,000 | Alpaca Finance instructions, translated into Thai by Thaweewat Ruksujarit. |
| kobkrit/rd-taxqa | 600 | RD's Tax QA chatbot training set by ทรงวุฒิ บุรงค์. |
| datasets/iapp_wiki_qa_squad | 4,000 | iApp Technology's extractive QA dataset in the Thai language. |
| Thaweewat/databricks-dolly-15k-th | 15,000 | Databricks' Dolly instructions, translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/instruction-wild-52k-th | 52,000 | Instruction Wild, translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/alpaca-cleaned-52k-th | 51,000 | Stanford Alpaca, translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/gpteacher-20k-th | 20,000 | GPT Teacher's instructions, translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/onet-m6-social | 600 | ONET M6 social studies exam. |
| datasets/Thaweewat/hc3-24k-th | 24,000 | Hello Simple AI summary dataset, translated into Thai by Thaweewat Ruksujarit. |
| OpenThaiGPT Self Instruct (https://docs.google.com/spreadsheets/d/1BSHkpRyD5RH90E85tLWe4UzpgfDHZafE2rKxLincyWI/edit?usp=sharing) | 5,000 | Thai SelfInstruct dataset (automatically generated) by OpenThaiGPT. |
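To get a sense of the overall training mix, the instruction-pair counts listed above can be summed and ranked. This is a small illustrative sketch: the counts are the rounded figures given on this page, and the variable names are hypothetical.

```python
# Instruction-pair counts as listed in the dataset list above (rounded figures).
INSTRUCTION_PAIRS = {
    "Thaweewat/alpaca-finance-43k-th": 43_000,
    "kobkrit/rd-taxqa": 600,
    "datasets/iapp_wiki_qa_squad": 4_000,
    "Thaweewat/databricks-dolly-15k-th": 15_000,
    "Thaweewat/instruction-wild-52k-th": 52_000,
    "Thaweewat/alpaca-cleaned-52k-th": 51_000,
    "Thaweewat/gpteacher-20k-th": 20_000,
    "Thaweewat/onet-m6-social": 600,
    "datasets/Thaweewat/hc3-24k-th": 24_000,
    "OpenThaiGPT Self Instruct": 5_000,
}

TOTAL_PAIRS = sum(INSTRUCTION_PAIRS.values())
print(f"total instruction pairs: {TOTAL_PAIRS:,}")

# Show each dataset's share of the mix, largest first.
for name, count in sorted(INSTRUCTION_PAIRS.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:36s} {count:>7,} {count / TOTAL_PAIRS:6.1%}")
```

With the listed figures, the ten datasets add up to roughly 215,000 instruction pairs, most of them translated from English instruction sets.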

---

Version 0.1.0-alpha (ByT5-XL Model)

Release date: 24 April 2023
PoC Testing Website: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-0.1.0-alpha
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF

OpenThaiGPT version 0.1.0-alpha

The first Thai model with 3 billion parameters

First Thai byte-level text-to-text transfer transformer
Supports instruction following:
- Translation to Thai
- Explanation
- Paraphrase
Zero-shot and few-shot learning
Pretraining Model: ByT5-XL (3.74 billion parameters)
InstructDataset: 50,000 Thai SelfInstruct
RLHF: None
Developers: Sumeth Yuenyong, Kobkrit Viriyayudhakorn

PoC Version 0.0.4 (The Fourth PoC Version)

Release date: 12 March 2023
PoC Testing Website: https://colab.research.google.com/drive/13yLIifBRDQp82QO4ICs_aEvz0N8tqVPm?usp=sharin
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.4
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://github.com/OpenThaiGPT/openthaigpt-example

OpenThaiGPT version 0.0.4

The Fourth PoC Model

Answers questions in more detail and, in most cases, better than version 0.0.3.
Pretraining Model: GPT-2 Thai-base
InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 12,920 Thai InstructGPT
RLHF: None
Developer: Kobkrit Viriyayudhakorn

PoC Version 0.0.3 (The Third PoC Version)

Release date: 28 February 2023
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.3
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://github.com/OpenThaiGPT/openthaigpt-example

OpenThaiGPT version 0.0.3

The Third PoC Model

Pretraining Model: GPT-2 Thai-base
InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 7,000 Thai InstructGPT
RLHF: None
Developer: Kobkrit Viriyayudhakorn

PoC Version 0.0.2 (The Second PoC Version)

Release date: 27 February 2023
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.2
PIP Installation Page: {Coming Soon}
Colab Example: {Coming Soon}

OpenThaiGPT version 0.0.2

The Second PoC Model

Pretraining Model: GPT-2 Thai-base
InstructDataset: 7,000 Thai InstructGPT
RLHF: None

Developer: Kobkrit Viriyayudhakorn

PoC Version 0.0.1 (Very First PoC Version)

Release date: 20 February 2023
Model and Weight: openthaigpt-gpt2-pantipwiki-poc
PIP Installation Page: {Coming Soon}
Colab Example: {Coming Soon}

The Very First PoC Model

Pretraining Model: GPT-2 Thai-base
InstructDataset: 298,678 QA pairs drawn from 70,000 Pantip threads (kratoo) plus Wikipedia QA by iApp
RLHF: None
Developer: Kobkrit Viriyayudhakorn
