A Gymnasium (formerly OpenAI Gym) environment for the classic game of Nim, designed for multi-agent reinforcement learning experiments.
Nim is one of the oldest mathematical strategy games, dating back centuries. It's a perfect game for learning reinforcement learning because:
- Simple Rules: Easy to understand but strategically deep
- Perfect Information: Both players see the complete game state
- Mathematical Beauty: Has an elegant solution using binary XOR (see the sketch after this list)
- Quick Games: With the default 7-5-3 board, episodes last at most 15 moves, usually far fewer
- Multi-Agent Learning: Agents can learn by playing against themselves
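
To make the XOR claim concrete, here is a minimal sketch of the nim-sum (plain Python, not part of this package). In normal play, the player to move wins exactly when the XOR of the pile sizes is nonzero; this environment uses the misère convention, where the same strategy applies until every pile is down to a single piece:

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """XOR of all pile sizes -- the classic Nim invariant."""
    return reduce(xor, piles, 0)

# Default board: 7 ^ 5 ^ 3 = 1, so the player to move has a winning strategy
print(nim_sum([7, 5, 3]))  # 1
```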
Two players take turns removing pieces from piles:
```
Pile 0: ●●●●●●●  (7 pieces)
Pile 1: ●●●●●    (5 pieces)
Pile 2: ●●●      (3 pieces)

Player 1's turn → Remove 1-3 pieces from any ONE pile
```
The player who takes the last piece loses (the misère convention). For example, with a board of [1, 0, 0], the player to move must take the final piece and loses.
- State Space: A dictionary with the board configuration and the player to move: `{'board': array([7, 5, 3]), 'on_move': 1}`
- Action Space: `[pile_index, pieces_to_take]`, where `pile_index` is 0, 1, or 2 (which pile to take from) and `pieces_to_take` is 1, 2, or 3 (how many to remove)
- Rewards: `-1` for losing the game (taking the last piece), `-2` for making an illegal move, `0` for all other moves
- Players: Alternates between Player 1 and Player 2
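
As an illustrative sketch of the reward convention, the snippet below drives the environment into a one-piece endgame with `set_board()` (documented in the API section below); the pile-list argument format is an assumption based on the observation's `board` field:

```python
import gymnasium as gym
import gym_nim

env = gym.make('nim-v0')

# Forced loss: the player to move must take the last piece
env.reset()
env.unwrapped.set_board([1, 0, 0])  # assumed format: list of pile sizes
state, reward, terminated, truncated, info = env.step([0, 1])
print(reward, terminated)  # expected: -1 True

# Illegal move: taking from an empty pile
env.reset()
env.unwrapped.set_board([1, 0, 0])
state, reward, terminated, truncated, info = env.step([1, 1])
print(reward)  # expected: -2
```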
- Python 3.9-3.12 (tested on all versions)
- Gymnasium 1.0+ (modern RL library)
- NumPy (for array operations)
- pytest (for running tests)
```bash
# Clone the repository
git clone https://github.com/nczempin/gym-nim.git
cd gym-nim

# Install in development mode
pip install -e .

# Install test dependencies
pip install pytest pytest-cov
```
```python
import gymnasium as gym
import gym_nim

# Create and reset the environment
env = gym.make('nim-v0')
state, info = env.reset()
print(state)  # {'board': array([7, 5, 3]), 'on_move': 1}

# Player 1 takes 2 pieces from pile 0
state, reward, terminated, truncated, info = env.step([0, 2])
env.render()
# Player 2's turn
# Piles: [0]:5 [1]:5 [2]:3

# Get all legal moves for the current player
moves = env.unwrapped.move_generator()
print(f"Legal moves: {moves}")

# Play until the game ends
while not terminated and moves:
    # Simple strategy: take the first legal move
    action = moves[0]
    state, reward, terminated, truncated, info = env.step(action)
    moves = env.unwrapped.move_generator()

print(f"Game over! Player {state['on_move']} loses with reward {reward}")
```
The `examples/` directory contains two demonstration agents:
Shows two random agents playing against each other:

```bash
python examples/random_nim.py
```
Trains an agent using tabular Q-learning through self-play:

```bash
python examples/qtable.py
```
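
The script's internals aren't reproduced here, but the core of any tabular Q-learning agent for this environment looks roughly like the sketch below; the names (`q`, `alpha`, `gamma`) and the state key are illustrative assumptions, not the script's actual code:

```python
from collections import defaultdict

q = defaultdict(float)   # Q[(state_key, action)] -> value
alpha, gamma = 0.1, 0.9  # illustrative hyperparameters

def state_key(state):
    # A Nim position is fully described by the board and the player to move
    return (tuple(state['board']), state['on_move'])

def q_update(s, action, reward, s_next, legal_next):
    """One Q-learning backup: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max((q[(s_next, tuple(m))] for m in legal_next), default=0.0)
    target = reward + gamma * best_next
    q[(s, tuple(action))] += alpha * (target - q[(s, tuple(action))])
```

Because the state key records whose move it is, both self-play agents can share a single table.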
```bash
# Run all tests with coverage
python tests.py

# Or use pytest directly
pytest tests/ -v --cov=gym_nim
```
Test in an isolated environment:

```bash
docker build -t gym-nim .
docker run --rm gym-nim
```
The main class `NimEnv` provides these methods:
- `reset(seed=None, options=None)`: Start a new game; returns `(state, info)`
- `step(action)`: Take an action; returns `(state, reward, terminated, truncated, info)`
- `render()`: Display the current game state
- `move_generator()`: Get all legal moves (access via `env.unwrapped.move_generator()`)
- `set_board(board)`: Set a custom board position (access via `env.unwrapped.set_board()`)
- Gymnasium Wrappers: Gymnasium wraps environments in safety wrappers. To access custom methods like `move_generator()` and `set_board()`, use `env.unwrapped`
- API Changes: This environment follows the Gymnasium API, where `reset()` returns a tuple and `step()` returns 5 values, including `terminated` and `truncated`
See the API documentation for detailed method descriptions and examples.
Version 0.2.0 migrated from OpenAI Gym to Gymnasium. If you're upgrading:
- Import changes:

  ```python
  # Old
  import gym

  # New
  import gymnasium as gym
  ```

- Reset returns a tuple:

  ```python
  # Old
  state = env.reset()

  # New
  state, info = env.reset()
  ```

- Step returns 5 values:

  ```python
  # Old
  state, reward, done, info = env.step(action)

  # New
  state, reward, terminated, truncated, info = env.step(action)
  ```

- Custom methods need `unwrapped`:

  ```python
  # Old
  moves = env.move_generator()

  # New
  moves = env.unwrapped.move_generator()
  ```
- Variable pile configurations
- Different game variants (misère vs normal play)
- Advanced opponent strategies for training
- Parallel environment support for faster training