- What is this repo about?
- Why a game?
- Why local and small LLMs are the future (and present)?
- Steps
- 1. Download chess moves dataset
- 2. Generate instruction dataset
- 3. Fine-tune LFM2-350M to imitate Magnus Carlsen
- 4. Evaluate LFM2-350M-MagnusInstruct on real game play
- 5. Bundle the model with the Leap Model Bundling CLI
- 6. Embed the model into an iOS app with the Leap Edge SDK
- Want to learn more about Real World LLM engineering?
In this repository you will learn to fine-tune a small LLM to play chess and deploy it locally in an iOS app.
We will use the LiquidAI family of open-weight LFM2 small-frontier models (less than 1B parameters) to embed super-specialized, task-specific intelligence on the edge, using the LeapSDK for iOS.
Now, before getting into the technical details (aka the HOWs), let me tell you the WHATs and WHYs.
Mobile games are all about immersing the player in a virtual world, and customizing the experience to the player to increase engagement and retention.
Large Language Models, on the other hand, are great at dynamically generating text, images, or audio based on the player's actions and the game state.
This makes LLMs perfect for mobile games... Well, there is one catch.
If the LLMs need to run on remote servers, the whole thing becomes super-expensive and super-slow.
So it is not worth it. Unless you deploy the LLM on the client device.
Let me tell you a story. It will be short, don't worry.
I used to work as a data scientist and ML engineer at Nordeus, the creator and producer of one of the most successful mobile games of all time: "Top Eleven". The game (like all games out there) is designed to immerse the player in a virtual world. In this case, a world where you (the player) are a soccer manager, and you have to
- train your squad,
- compete against other managers in live matches,
- make player transfers,
- etc.
Every day, several million players around the world were logging into the game, looking for their daily dose of soccer.
Average playtime per day was something around 15 minutes.
So, can you imagine how many requests and tokens flying in and out of your LLM server you would need to keep several million players happy for 15 minutes a day?
I haven't calculated it, but I can tell you it would be a lot. Too much to make LLMs running on remote servers worth it.
Hence, the only cost-effective way to use LLMs in this context is to specialize them for the task we want them to solve (e.g. real-time soccer game narration) and make them small enough to run on the player's phone.
This is what makes local and small LLMs the future (and present) for mobile games.
In this project we build a chess game, but the need for local and private LLM deployments is relevant for many other problems, especially when you need to handle sensitive data or work reliably offline.
Imagine, for example, financial apps, insurance apps, healthcare apps, etc. In these cases the data is sensitive and you don't want to (or even can't) send it to a remote server.
Or self-driving cars and robots, which need to work reliably offline. You don't want your car to start having problems when you are driving in the middle of nowhere.
In any case, the need for local and private LLM deployments is becoming more and more relevant.
Because the future of LLMs is not super-expensive GPT-524 running on a data-center the size of North Dakota.
The future of LLMs is high-quality-high-performance-low-latency-task-specific-low-cost-private-data running on your phone, computer, robot, watch, tablet, raspberry pi or microwave, if that is your thing.
Let me now walk you through the main steps I followed (and you can follow too!) to build this project.
The steps to create this project are:
- Download 7k historical chess games played by the great Magnus Carlsen
- Process the raw data into an instruction dataset for supervised fine-tuning
- Fine-tune LFM2-350M to imitate Magnus (aka `LFM2-350M-MagnusInstruct`)
- Evaluate the model
- Bundle the model with `leap-bundle`
- Create a simple iOS game and embed our fine-tuned LLM using the LeapSDK for iOS
Steps 1 to 5 are implemented inside the `fine-tune` directory. The iOS app is inside `ChessChat`.
Let me go over each of these steps, step by step :-)
```bash
cd fine-tune && make download-magnus-games
```
We fetch 7k historical games played by the great Magnus Carlsen from this URL.
This will download the data to `fine-tune/data/raw/Carlsen.zip` and unzip it to `fine-tune/data/raw/Carlsen.pgn`.
The data is in PGN format, which is a standard format for chess games.
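To give you an idea, a PGN record looks roughly like this (an illustrative example, not an actual game from the dataset):

```
[Event "?"]
[Site "?"]
[Date "2020.01.01"]
[White "Carlsen, Magnus"]
[Black "Opponent"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O 1-0
```

Each game carries a metadata header, followed by the moves in standard algebraic notation.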
Feel free to expand the dataset to include more games by downloading other players' games from this URL.
```bash
cd fine-tune && make instruction-dataset
```
For each PGN file we downloaded, we extract
- the game state,
- the previous 5 moves,
- the set of valid moves for the current game state, and
- the next move picked by Magnus,

and store the processed data in a JSON file. For example, the extracted data for Magnus Carlsen is stored in `fine-tune/data/processed/Carlsen.json`.
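Here is a minimal sketch of what this extraction can look like using the `python-chess` library (the function and field names are illustrative, not the exact ones used in `fine-tune`):

```python
import json

import chess
import chess.pgn


def extract_examples(pgn_path: str, player: str = "Carlsen") -> list[dict]:
    """Turn every move `player` made into one training example."""
    examples = []
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            # Figure out which color our player had in this game
            player_is_white = player in game.headers.get("White", "")
            player_color = chess.WHITE if player_is_white else chess.BLACK
            board = game.board()
            history: list[str] = []
            for move in game.mainline_moves():
                if board.turn == player_color:
                    examples.append({
                        "game_state": board.fen(),
                        "last_5_moves_uci": history[-5:],
                        "valid_moves": [m.uci() for m in board.legal_moves],
                        "next_move": move.uci(),
                    })
                history.append(move.uci())
                board.push(move)
    return examples


if __name__ == "__main__":
    examples = extract_examples("data/raw/Carlsen.pgn")
    with open("data/processed/Carlsen.json", "w") as f:
        json.dump(examples, f, indent=2)
```

Each example pairs the current board position (as a FEN string) with the move Magnus actually played, which is exactly the input/output pair we need for supervised fine-tuning.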
```bash
cd fine-tune && make fine-tune
```
We used supervised fine-tuning to train the model to "imitate" Magnus.
The chat template we used is the one used by the LiquidAI team to do general fine-tuning of LFM2-350M, so we don't erase abilities the model had before our fine-tuning.
```
<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
...user prompt goes here...<|im_end|>
<|im_start|>assistant
...assistant response goes here...<|im_end|>
```
In the user prompt we include
- the game state,
- the previous 5 moves, and
- the set of valid moves for the current game state, to help the LLM output a valid move.
```python
# see `fine-tune/src/fine_tune/prompt_template.py`
CHESS_PROMPT_TEMPLATE = """
You are the great Magnus Carlsen. Your task is to make the best move in the given game state.

Game state:
{{ game_state }}

Last 5 moves:
{{ last_5_moves_uci }}

Valid moves:
{{ valid_moves }}

Your next move should be in UCI format (e.g., 'e2e4', 'f8c8'). Make sure your next move is one of the valid moves.
"""
```
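The double curly braces suggest a Jinja2 template; rendering one training example into a user prompt could look roughly like this (the import path follows the `src` layout mentioned above, and the field names match the extraction sketch):

```python
import json

from jinja2 import Template

# The template lives in fine-tune/src/fine_tune/prompt_template.py
from fine_tune.prompt_template import CHESS_PROMPT_TEMPLATE

# Take one example from the processed dataset
example = json.load(open("data/processed/Carlsen.json"))[0]

user_prompt = Template(CHESS_PROMPT_TEMPLATE).render(
    game_state=example["game_state"],
    last_5_moves_uci=example["last_5_moves_uci"],
    valid_moves=example["valid_moves"],
)
print(user_prompt)
```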
Now, to run the fine-tuning script you NEED at least ONE GPU. So if you don't have one, you need to rent one.
💡 How to get a GPU for LLM training work
In my experience, the fastest and most ergonomic way to use a GPU for LLM training work is the Modal serverless platform.
With Modal you work locally, but your code runs on a remote GPU.
You define your:
- training job hardware,
- Python dependencies, and
- training logic
ALL in Python. So you don't need to worry about pushing Docker images or switching pods on and off.
This is not an ad. This is just my personal preference.
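For reference, here is a minimal sketch of what a Modal training entrypoint can look like (the app name, dependencies, and GPU choice are illustrative, not the exact setup used in this repo):

```python
import modal

# Container image with the training dependencies
image = modal.Image.debian_slim().pip_install("torch", "transformers", "trl", "datasets")

app = modal.App("magnus-fine-tune", image=image)


@app.function(gpu="A100", timeout=4 * 60 * 60)
def train():
    # Training logic goes here: load the dataset, build the trainer, train
    ...


@app.local_entrypoint()
def main():
    # `modal run train.py` executes this locally and `train()` on a remote GPU
    train.remote()
```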
In terms of libraries, we use the standard `SFTTrainer` from the `trl` library, with Unsloth for faster training.
Feel free to play with the configuration parameters in the `fine-tune/src/fine_tune/config.py` file.
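A condensed sketch of that setup (hyperparameters, target modules, and the dataset path are illustrative, not the values from `config.py`):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the base model through Unsloth and attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2-350M",
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # depends on the architecture
)

# Assumes each row has a "text" field with the full chat-formatted example
dataset = load_dataset("json", data_files="data/processed/train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="checkpoints", max_steps=5000),
)
trainer.train()
```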
I ran training for 10k steps and saw no significant improvement in the model's performance when using `LiquidAI/LFM2-700M` compared to `LiquidAI/LFM2-350M`.
As I deeply care about minimalism, I stuck to the smaller model to ease deployment.
At the end of the day, the model checkpoint I decided to use is `LFM2-350M-r16-20250902-232247/checkpoint-5000`, which we can affectionately call `LFM2-350M-MagnusInstruct`.
💡 Wanna see all the experiments?
Click here to see all the experiments I ran and logged to Weights & Biases.
```bash
cd fine-tune && make evaluate
```
It is important to note that the model is not trained to "think" like a chess player.
It is trained to "imitate" Magnus by predicting the next move based on the previous moves and the game state.
And the thing is, chess is a game with a high-dimensional state space, which means that to play chess well you need way more than memorization. You need to reason and plan your moves.
An LLM that predicts the next move well on the test set (like our `LFM2-350M-MagnusInstruct`) is not necessarily a good chess player.
As a matter of fact, playing random moves against this LLM is actually a great way to beat it. Random play exposes the LLM to a high number of out-of-sample situations it has never observed during training, and the model is not really trained to reason like good chess players do, combining long-term strategic play and short-term tactical play.
Because of this, training a reasoning LLM using next moves plus expert human strategy and tactics would work better. See this paper for further reference.
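To make the evaluation concrete, here is a minimal sketch of the kind of loop you can run: the fine-tuned model plays against a random mover, and we track how often it outputs a valid move (`predict_move` is a hypothetical helper that prompts the LLM and parses its answer):

```python
import random

import chess


def play_vs_random(predict_move, max_model_moves: int = 40) -> float:
    """Play one game vs a random mover; return the share of valid model moves."""
    board = chess.Board()
    valid, total = 0, 0
    while not board.is_game_over() and total < max_model_moves:
        if board.turn == chess.WHITE:  # the model plays white
            move_uci = predict_move(board)  # hypothetical: prompt the LLM, parse the reply
            total += 1
            try:
                move = chess.Move.from_uci(move_uci)
            except ValueError:
                move = None
            if move is not None and move in board.legal_moves:
                valid += 1
                board.push(move)
            else:
                # Invalid output: fall back to a random legal move and keep playing
                board.push(random.choice(list(board.legal_moves)))
        else:
            board.push(random.choice(list(board.legal_moves)))
    return valid / max(total, 1)
```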
```bash
cd fine-tune && make bundle-model
```
The model checkpoints we save during training are the LoRA weights we need to apply to the base model to get the fine-tuned model.
So, as a first step, we need to merge the LoRA weights with the base model. This is what the `merge-model` Makefile target does.
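Under the hood, merging LoRA weights with the `peft` library looks roughly like this (the output path is illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-350M")
model = PeftModel.from_pretrained(base, "LFM2-350M-r16-20250902-232247/checkpoint-5000")

# Bake the LoRA deltas into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("LFM2-350M-MagnusInstruct")

# The tokenizer is unchanged by fine-tuning, so we save the base one alongside
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-350M")
tokenizer.save_pretrained("LFM2-350M-MagnusInstruct")
```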
Then, we use the `leap-bundle` CLI to bundle the model into a `.bundle` file.
The result is a `.bundle` file that contains the fine-tuned model and tokenizer, and that can be deployed on mobile devices (both Android and iOS) using the Leap Edge SDK.
But, what is the Leap Edge SDK?
The Leap Edge SDK is a collection of libraries and tools to help you deploy and run LLMs on mobile devices.
With Leap we can embed the model bundle into the iOS app, which means we can run the model on the device without having to send the data to a remote server.
Ok, enough talking. Let's open the app in Xcode. Yes, you need to have Xcode installed to follow these steps.
```bash
cd ChessChat && make open
```
If you want to create this project from scratch, you can follow the steps below.
- Add the `LeapSDK`, `ChessKit` and `ChessboardKit` packages to the project.
- Create a `Resources` folder and add the model bundle `LFM2-350M-MagnusInstruct.bundle`.
- Create a `Models` folder and add the `Player.swift` file.
- Create a `Views` folder and add the `ContentView.swift` and `Board.swift` files.
The `ChessKit` (GitHub here) and `ChessboardKit` (GitHub here) packages are used to handle the chess logic and the chessboard UI.
The `Player` model handles the LLM loading and inference logic.
The final product is a nice-looking chess app you can run offline on your phone.
Subscribe for FREE to The Real World ML newsletter.
No BS. No spam. No fluff.
Just things I do at work that sometimes work, and sometimes don't.