🔥Released Models Version <0.1.0-beta> (16/05/23)
Release date: 16 May 2023
OpenThaiGPT version 0.1.0-beta is a 7B-parameter LLaMA model fine-tuned to follow the Thai-translated instructions listed below. It uses the Hugging Face LLaMA implementation.
Number of parameters: 7B
Dimension: 4096
Max sequence length: 2048 tokens
Number of heads: 32
Number of layers: 32
Training tokens: 1T
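As a rough sanity check on the "7B" figure, the parameter count can be estimated from the dimension (4096) and layer count (32) listed above. This is only a sketch: the vocabulary size (32,000) and feed-forward width (11,008) are not stated on this page and are assumed here from the public LLaMA-7B configuration. The number of heads does not change the count, because the heads split the same 4096-wide projections.

```python
# Rough parameter count for a LLaMA-style decoder-only transformer.
# dim and n_layers come from the spec above; vocab_size and ffn_dim are
# ASSUMPTIONS taken from the public LLaMA-7B configuration.

def llama_param_count(dim, n_layers, vocab_size, ffn_dim):
    embed = vocab_size * dim      # input token embedding table
    attn = 4 * dim * dim          # q, k, v and output projections (no biases)
    mlp = 3 * dim * ffn_dim       # gate, up and down projections (no biases)
    norms = 2 * dim               # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    final_norm = dim              # RMSNorm before the output head
    lm_head = vocab_size * dim    # untied output projection
    return embed + n_layers * per_layer + final_norm + lm_head

total_params = llama_param_count(dim=4096, n_layers=32, vocab_size=32000, ffn_dim=11008)
print(f"{total_params:,}")  # 6,738,415,616
```

Under these assumptions the total is about 6.7 billion, which is what "7B" rounds up from.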
Source Code: Apache Software License 2.0.
Weight: For research use only (due to the license on Facebook's LLaMA weights). Note: a commercial-use license for the OpenThaiGPT 0.1.0 weights will be released soon.
Finetune Code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta
Inference Library: https://github.com/OpenThaiGPT/openthaigpt
Weight (LoRA Adapter): https://huggingface.co/kobkrit/openthaigpt-0.1.0-beta
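The weights are published as a LoRA adapter rather than a full checkpoint, so the download is much smaller than the 7B base model. The sketch below estimates how many trainable parameters such an adapter holds. The rank (8) and the choice of two adapted projection matrices per layer are hypothetical values chosen for illustration, not taken from this page; the actual adapter configuration may differ.

```python
# Estimate the size of a LoRA adapter on square dim x dim projections.
# rank and targets_per_layer are HYPOTHETICAL illustration values; they are
# not stated in the release notes.

def lora_trainable_params(dim, n_layers, rank, targets_per_layer):
    # Each adapted matrix gets two low-rank factors:
    # A with shape (rank, dim) and B with shape (dim, rank).
    per_matrix = 2 * dim * rank
    return n_layers * targets_per_layer * per_matrix

adapter_params = lora_trainable_params(dim=4096, n_layers=32, rank=8, targets_per_layer=2)
print(f"{adapter_params:,}")  # 4,194,304
```

With these illustrative settings the adapter holds roughly 4.2 million parameters, well under 0.1% of the base model.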
Developers: Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th), Sumeth Yuenyong (sumeth.yue@mahidol.edu) and Thaweewat Rugsujarit (thaweewr@scg.com).
---
Release date: 24 April 2023
PoC Testing Website: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-0.1.0-alpha
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF
----
OpenThaiGPT version 0.1.0-alpha
First Thai 3-billion-parameter model
First Thai Byte-Level Text-to-Text Transfer Transformer
Supports instruction following
Translation to Thai
Explanation
Paraphrase
Zero-shot and Few-shot Learning
Pretraining Model: ByT5-XL (3.74 billion params)
InstructDataset: 50,000 Thai SelfInstruct
RLHF: None
Developer: Sumeth Yuenyong, Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
Release date: 12 March 2023
PoC Testing Website: https://colab.research.google.com/drive/13yLIifBRDQp82QO4ICs_aEvz0N8tqVPm?usp=sharin
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.4
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://github.com/OpenThaiGPT/openthaigpt-example
----
OpenThaiGPT version 0.0.4
The Fourth PoC Model
Answers questions in more detail and, in most cases, answers better than version 0.0.3
Pretraining Model: GPT-2 Thai-base
InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 12,920 Thai InstructGPT
RLHF: None
Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
Release date: 28 February 2023
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.3
PIP Installation Page: https://pypi.org/project/openthaigpt/
Code Example: https://github.com/OpenThaiGPT/openthaigpt-example
----
OpenThaiGPT version 0.0.3
The Third PoC Model
Pretraining Model: GPT-2 Thai-base
InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 7,000 Thai InstructGPT
RLHF: None
Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
Release date: 27 February 2023
Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.2
PIP Installation Page: {Coming Soon}
Colab Example: {Coming Soon}
----
OpenThaiGPT version 0.0.2
The Second PoC Model
Pretraining Model: GPT-2 Thai-base
InstructDataset: 7,000 Thai InstructGPT
RLHF: None
Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
Release date: 20 February 2023
Model and Weight: openthaigpt-gpt2-pantipwiki-poc
PIP Installation Page: {Coming Soon}
Colab Example: {Coming Soon}
----
The Very First PoC Model
Pretraining Model: GPT-2 Thai-base
InstructDataset: 298,678 QA pairs drawn from 70,000 Pantip threads (kratoo) + Wikipedia QA by iApp
RLHF: None
Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
| Dataset Name | Instruction Pairs | Description |
|---|---|---|
| | 43,000 | Alpaca Finance instructions, translated into Thai by Thaweewat Ruksujarit. |
| | 600 | RD's Tax QA Chatbot training set by ทรงวุฒิ บุรงค์. |
| | 4,000 | iApp Technology's extractive QA dataset in the Thai language. |
| | 15,000 | Databricks' Dolly instructions, translated into Thai by Thaweewat Ruksujarit. |
| | 52,000 | Instruction Wild instructions, translated into Thai by Thaweewat Ruksujarit. |
| | 51,000 | Stanford Alpaca instructions, translated into Thai by Thaweewat Ruksujarit. |
| | 20,000 | GPT Teacher instructions, translated into Thai by Thaweewat Ruksujarit. |
| | 600 | ONET M6 Social exam. |
| | 24,000 | Hello Simple AI summary dataset, translated into Thai by Thaweewat Ruksujarit. |
| OpenThaiGPT Self Instruct (https://docs.google.com/spreadsheets/d/1BSHkpRyD5RH90E85tLWe4UzpgfDHZafE2rKxLincyWI/edit?usp=sharing) | 5,000 | Thai SelfInstruct dataset (automatically generated) by OpenThaiGPT. |
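For a sense of the overall scale of the fine-tuning data, the listed pair counts can be added up. This is only a quick tally: the counts appear to be rounded, so the total is approximate, and the dictionary keys are shorthand labels made up for this sketch rather than official dataset names.

```python
# Tally of the instruction-pair counts listed for version 0.1.0-beta.
# Keys are shorthand labels invented for this sketch, not official names;
# the counts are copied from the list and appear to be rounded.
instruction_pairs = {
    "alpaca_finance_th": 43_000,
    "rd_tax_qa": 600,
    "iapp_extractive_qa": 4_000,
    "dolly_th": 15_000,
    "instruction_wild_th": 52_000,
    "stanford_alpaca_th": 51_000,
    "gpt_teacher_th": 20_000,
    "onet_m6_social": 600,
    "hello_simple_ai_summary_th": 24_000,
    "openthaigpt_self_instruct": 5_000,
}

total_pairs = sum(instruction_pairs.values())
print(f"{total_pairs:,}")  # 215,200
```

By this tally the model was fine-tuned on roughly 215,000 instruction pairs, most of them machine- or human-translated from English instruction datasets.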