Released OpenThaiGPT 7b <1.0.0-beta> (16/08/23)


Last updated 1 year ago


🇹🇭 OpenThaiGPT 1.0.0-beta (16 August 2023)

🇹🇭 OpenThaiGPT Version 1.0.0-beta is a Thai-language, 7B-parameter LLaMA v2 Chat model. It is finetuned to follow Thai-translated instructions, and its vocabulary is extended with more than 24,554 of the most popular Thai words for faster text generation.

Web Demo:

Colab Demo:

Change Logs

🇹🇭 Version 1.0.0-beta (Llama v2 + 24,554 Thai word extension)

Release date: 16 August 2023

🇹🇭 OpenThaiGPT Version 1.0.0-beta is a Thai-language, 7B-parameter LLaMA v2 Chat model finetuned to follow Thai-translated instructions, with a vocabulary extended by 24,554 Thai words for faster text generation.
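The toy sketch below illustrates why a larger Thai vocabulary can speed up generation: the same text splits into fewer tokens, so fewer decoding steps are needed. It is not the model's actual tokenizer (LLaMA uses a SentencePiece subword tokenizer); the greedy longest-match `tokenize` function and the two sample words are assumptions chosen only for illustration.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization (toy); unknown spans fall back to single characters."""
    max_len = max((len(tok) for tok in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


text = "สวัสดีครับ"
base_vocab = set(text)                                # character-level vocabulary only
extended_vocab = base_vocab | {"สวัสดี", "ครับ"}        # hypothetical added Thai words

base_tokens = tokenize(text, base_vocab)
extended_tokens = tokenize(text, extended_vocab)
print(len(base_tokens), len(extended_tokens))         # 10 2
```

In this toy case the extended vocabulary cuts the sequence from 10 tokens to 2. The real saving depends on the actual tokenizer and text.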

License

Source code: Apache Software License 2.0. Weights: research and commercial use.

Code and Weight
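As a minimal sketch of how the released weights might be used, the snippet below loads the Hugging Face repository `openthaigpt/openthaigpt-1.0.0-beta-7b-chat` (the weight link listed on this page) with the `transformers` library. It assumes the repository is in standard Hugging Face format and that `transformers` and PyTorch are installed. The `generate` helper is a hypothetical name, and no particular prompt template is assumed, so check the model card for the recommended format.

```python
MODEL_ID = "openthaigpt/openthaigpt-1.0.0-beta-7b-chat"


def generate(prompt, max_new_tokens=128, model_id=MODEL_ID):
    """Load the tokenizer and model, then return the decoded generation for `prompt`.

    Note: this reloads the model on every call; cache the objects in real use.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads several gigabytes of weights on first use):
# print(generate("สวัสดีครับ"))
```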

Authors

  • Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)

  • Sumeth Yuenyong (sumeth.yue@mahidol.edu)

  • Prachya Boonkwan (prachya.boonkwan@nectec.or.th, kaamanita@gmail.com)

  • Thaweewat Rugsujarit (thaweewr@scg.com)

  • Jillaphat Jaroenkantasima (autsadang41@gmail.com)

  • Norapat Buppodom (new@norapat.com)

  • Koravich Sangkaew (kwankoravich@gmail.com)

  • Peerawat Rojratchadakorn (peerawat.roj@gmail.com)

  • Surapon Nonesung (nonesungsurapon@gmail.com)

  • Chanon Utupon (chanon.utupon@gmail.com)

  • Sadhis Wongprayoon (sadhis.tae@gmail.com)

  • Nucharee Thongthungwong (nuchhub@hotmail.com)

  • Chawakorn Phiantham (mondcha1507@gmail.com)

  • Patteera Triamamornwooth (patt.patteera@gmail.com)

  • Nattarika Juntarapaoraya (natt.juntara@gmail.com)

  • Kriangkrai Saetan (kraitan.ss21@gmail.com)

  • Pitikorn Khlaisamniang (pitikorn32@gmail.com)

  • Teerapol Saengsukhiran (winroom@gmail.com)

  • Phasin Aumwong (phasin03895@gmail.com)

---

🇹🇭 Version 1.0.0-alpha (Facebook LLaMA v2 Model)

Release date: 3 August 2023

🇹🇭 OpenThaiGPT Version 1.0.0-alpha is the first Thai implementation of a 7B-parameter LLaMA v2 Chat model. It is finetuned to follow Thai-translated instructions and uses the Hugging Face LLaMA implementation.

Changes

(1) Uses the Facebook LLaMA v2 7B Chat model as the base model, which is pretrained on over 2 trillion tokens.

(2) Context length is upgraded from 2048 tokens to 4096 tokens.

(3) Allows research and commercial use.
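To make the context-length change concrete, the sketch below computes how many prompt tokens remain once room is reserved for the reply. The helper name `max_prompt_tokens` and the 512-token reply budget are illustrative assumptions, not part of the release.

```python
OLD_CONTEXT = 2048  # context length before this release
NEW_CONTEXT = 4096  # context length of the LLaMA v2 based model


def max_prompt_tokens(context_length, max_new_tokens):
    """Tokens left for the prompt once room is reserved for the generated reply."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return context_length - max_new_tokens


print(max_prompt_tokens(OLD_CONTEXT, 512))  # 1536
print(max_prompt_tokens(NEW_CONTEXT, 512))  # 3584
```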

License

Source code: Apache Software License 2.0. Weights: research and commercial use.

Code and Weight

Authors

  • Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)

  • Sumeth Yuenyong (sumeth.yue@mahidol.edu)

  • Thaweewat Rugsujarit (thaweewr@scg.com)

  • Jillaphat Jaroenkantasima (autsadang41@gmail.com)

  • Norapat Buppodom (new@norapat.com)

  • Koravich Sangkaew (kwankoravich@gmail.com)

  • Peerawat Rojratchadakorn (peerawat.roj@gmail.com)

  • Surapon Nonesung (nonesungsurapon@gmail.com)

  • Chanon Utupon (chanon.utupon@gmail.com)

  • Sadhis Wongprayoon (sadhis.tae@gmail.com)

  • Nucharee Thongthungwong (nuchhub@hotmail.com)

  • Chawakorn Phiantham (mondcha1507@gmail.com)

  • Patteera Triamamornwooth (patt.patteera@gmail.com)

  • Nattarika Juntarapaoraya (natt.juntara@gmail.com)

  • Kriangkrai Saetan (kraitan.ss21@gmail.com)

  • Pitikorn Khlaisamniang (pitikorn32@gmail.com)

  • Teerapol Saengsukhiran (winroom@gmail.com)

  • Phasin Aumwong (phasin03895@gmail.com)

---

Version 0.1.0-beta (Facebook LLaMA Model)

Release date: 16 May 2023

OpenThaiGPT Version 0.1.0-beta is a 7B-parameter LLaMA model finetuned to follow the Thai-translated instructions listed below. It uses the Hugging Face LLaMA implementation.

Statistics

  • Number of parameters: 7B

  • Dimension: 4096

  • Context length: 2048

  • Number of heads: 32

  • Number of layers: 32

  • Number of tokens: 1T
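As a rough check on these statistics, the sketch below recomputes the parameter count from the dimension and layer count. The vocabulary size (32,000) and feed-forward width (11,008) are not stated on this page; they are assumptions taken from the published LLaMA 7B configuration.

```python
DIM = 4096           # model dimension (from the statistics above)
N_LAYERS = 32        # number of transformer layers
N_HEADS = 32         # number of attention heads
VOCAB_SIZE = 32_000  # assumption: original LLaMA vocabulary size
FFN_DIM = 11_008     # assumption: LLaMA 7B feed-forward hidden size

head_dim = DIM // N_HEADS                  # per-head dimension
embedding = VOCAB_SIZE * DIM               # input token embeddings
attention = 4 * DIM * DIM                  # query, key, value and output projections
feed_forward = 3 * DIM * FFN_DIM           # gate, up and down projections
norms = 2 * DIM                            # two RMSNorm weight vectors per layer
per_layer = attention + feed_forward + norms

# Layers + input embeddings + final norm + output projection (lm_head).
total = embedding + N_LAYERS * per_layer + DIM + VOCAB_SIZE * DIM

print(head_dim)        # 128
print(f"{total:,}")    # 6,738,415,616
```

The result, about 6.74 billion, is what is usually rounded to "7B".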

License

Source code: Apache Software License 2.0. Weights: for research use only (due to the license of Facebook's LLaMA weights). Note: a commercial-use license for the OpenThaiGPT 0.1.0 weights will be released soon.

Code and Weight

Authors

Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th), Sumeth Yuenyong (sumeth.yue@mahidol.edu) and Thaweewat Rugsujarit (thaweewr@scg.com).

Trained Datasets

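| Dataset Name | Instruction Pairs | Descriptions |
|---|---|---|
| Thaweewat/alpaca-finance-43k-th | 43,000 | Alpaca Finance instructions translated into Thai by Thaweewat Ruksujarit. |
| kobkrit/rd-taxqa | 600 | RD's Tax QA chatbot training set by ทรงวุฒิ บุรงค์. |
| datasets/iapp_wiki_qa_squad | 4,000 | iApp Technology's extractive QA dataset in Thai. |
| Thaweewat/databricks-dolly-15k-th | 15,000 | Databricks' Dolly instructions translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/instruction-wild-52k-th | 52,000 | Instruction Wild translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/alpaca-cleaned-52k-th | 51,000 | Stanford Alpaca translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/gpteacher-20k-th | 20,000 | GPT Teacher's instructions translated into Thai by Thaweewat Ruksujarit. |
| Thaweewat/onet-m6-social | 600 | ONET M6 Social Exam. |
| datasets/Thaweewat/hc3-24k-th | 24,000 | Hello Simple AI Summary Dataset translated into Thai by Thaweewat Ruksujarit. |
| OpenThaiGPT Self Instruct | 5,000 | Thai SelfInstruct dataset (automatically generated) by OpenThaiGPT. |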
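The instruction-pair counts listed above can be totaled with a few lines of Python. The dictionary keys are shorthand labels chosen here for illustration; the counts are the rounded figures from this page.

```python
# Instruction-pair counts as listed for Version 0.1.0-beta (rounded figures).
instruction_pairs = {
    "Alpaca Finance": 43_000,
    "RD Tax QA": 600,
    "iApp Extractive QA": 4_000,
    "Databricks Dolly": 15_000,
    "Instruction Wild": 52_000,
    "Stanford Alpaca": 51_000,
    "GPT Teacher": 20_000,
    "ONET M6 Social": 600,
    "Hello Simple AI Summary": 24_000,
    "Thai SelfInstruct": 5_000,
}

total_pairs = sum(instruction_pairs.values())
largest = max(instruction_pairs, key=instruction_pairs.get)

print(total_pairs)  # 215200
print(largest)      # Instruction Wild
```

By these figures the finetuning mix contains roughly 215,000 instruction pairs, with the translated Instruction Wild and Stanford Alpaca sets contributing almost half.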

---

Version 0.1.0-alpha (ByT5-XL Model)

OpenThaiGPT version 0.1.0-alpha

The first Thai model with 3 billion parameters

  • First Thai Byte-Level Text-to-Text Transfer Transformer

  • Support Instruction following

    • Translation to Thai

    • Explanation

    • Paraphrase

  • Zero-shot and Few-shot Learning

  • Pretraining Model: ByT5-XL (3.74 billion params)

  • InstructDataset: 50,000 Thai SelfInstruct

  • RLHF: None

  • Developer: Sumeth Yuenyong, Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
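A byte-level model such as ByT5 reads raw UTF-8 bytes instead of subword tokens. The sketch below shows what that means for Thai text, where each character occupies three bytes. The offset of 3 (reserving IDs 0 to 2 for padding, end-of-sequence, and unknown) follows the Hugging Face ByT5 tokenizer as I understand it; treat it as an assumption and verify against the tokenizer you use.

```python
BYT5_ID_OFFSET = 3  # assumption: IDs 0, 1, 2 are reserved for pad, eos, unk
EOS_ID = 1


def byte_level_ids(text):
    """Map text to byte-level token IDs: one ID per UTF-8 byte, plus a final EOS."""
    return [b + BYT5_ID_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


ids = byte_level_ids("ไทย")
print(len("ไทย"), len(ids))  # 3 10
print(ids)                   # [227, 188, 135, 227, 187, 154, 227, 187, 165, 1]
```

Three Thai characters become nine byte tokens plus an end marker. This is why byte-level models need no Thai-specific vocabulary but produce longer sequences.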

PoC Version 0.0.4 (The Fourth PoC Version)

OpenThaiGPT version 0.0.4

The Fourth PoC Model

  • Answers questions in more detail, and in most cases answers better than version 0.0.3

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 12,920 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)

PoC Version 0.0.3 (The Third PoC Version)

OpenThaiGPT version 0.0.3

The Third PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 300,000 Pantip + 5,000 Wiki QA => 7,000 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)

PoC Version 0.0.2 (The Second PoC Version)

OpenThaiGPT version 0.0.2

The Second PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 7,000 Thai InstructGPT

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)

PoC Version 0.0.1 (Very First PoC Version)

The Very First PoC Model

  • Pretraining Model: GPT-2 Thai-base

  • InstructDataset: 298,678 QA pairs drawn from 70,000 Pantip threads (katoos) + Wikipedia QA by iApp

  • RLHF: None

  • Developer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)

Links by version

Version 1.0.0-beta

  • Web Demo (Gradio): https://demo-beta.openthaigpt.aieat.or.th/

  • Colab Demo: Google Colaboratory

  • Finetune Code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta

  • Inference Code: https://github.com/OpenThaiGPT/openthaigpt

  • Weight: https://huggingface.co/openthaigpt/openthaigpt-1.0.0-beta-7b-chat

Version 1.0.0-alpha

  • Colab Demo: https://colab.research.google.com/drive/1kDQidCtY9lDpk49i7P3JjLAcJM04lawu?usp=sharing

  • Finetune Code (same code as OpenThaiGPT 0.1.0-beta): https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta

  • Inference Library: https://github.com/OpenThaiGPT/openthaigpt

  • Weight (LoRA Adapter): https://huggingface.co/openthaigpt/openthaigpt-1.0.0-alpha-7b-chat

  • Weight (Hugging Face Checkpoint): https://huggingface.co/openthaigpt/openthaigpt-1.0.0-alpha-7b-chat-ckpt-hf

Version 0.1.0-beta

  • Finetune Code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta

  • Inference Library: https://github.com/OpenThaiGPT/openthaigpt

  • Weight (LoRA Adapter): https://huggingface.co/kobkrit/openthaigpt-0.1.0-beta

  • Trained Datasets: Thaweewat/alpaca-finance-43k-th, kobkrit/rd-taxqa, datasets/iapp_wiki_qa_squad, Thaweewat/databricks-dolly-15k-th, Thaweewat/instruction-wild-52k-th, Thaweewat/alpaca-cleaned-52k-th, Thaweewat/gpteacher-20k-th, Thaweewat/onet-m6-social, datasets/Thaweewat/hc3-24k-th

  • OpenThaiGPT Self Instruct: https://docs.google.com/spreadsheets/d/1BSHkpRyD5RH90E85tLWe4UzpgfDHZafE2rKxLincyWI/edit?usp=sharing

Version 0.1.0-alpha

  • Release date: 24 April 2023

  • PoC Testing Website: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF

  • Model and Weight: https://huggingface.co/kobkrit/openthaigpt-0.1.0-alpha

  • PIP Installation Page: https://pypi.org/project/openthaigpt/

  • Code Example: https://colab.research.google.com/drive/1Uds0ioOZSZrJ9m2FgW3DHlqVRFNHVRtu#scrollTo=qPJIpwuz4ltF

PoC Version 0.0.4

  • Release date: 12 March 2023

  • PoC Testing Website: https://colab.research.google.com/drive/13yLIifBRDQp82QO4ICs_aEvz0N8tqVPm?usp=sharin

  • Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.4

  • PIP Installation Page: https://pypi.org/project/openthaigpt/

  • Code Example: https://github.com/OpenThaiGPT/openthaigpt-example

PoC Version 0.0.3

  • Release date: 28 February 2023

  • Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.3

  • PIP Installation Page: https://pypi.org/project/openthaigpt/

  • Code Example: https://github.com/OpenThaiGPT/openthaigpt-example

PoC Version 0.0.2

  • Release date: 27 February 2023

  • Model and Weight: https://huggingface.co/kobkrit/openthaigpt-gpt2-instructgpt-poc-0.0.2

  • PIP Installation Page: {Coming Soon}

  • Colab Example: {Coming Soon}

PoC Version 0.0.1

  • Release date: 20 February 2023

  • Model and Weight: openthaigpt-gpt2-pantipwiki-poc

  • PIP Installation Page: {Coming Soon}

  • Colab Example: {Coming Soon}