
8. The GymAgent Bridge

By Killian Trouillet


How GAMA and Python Communicate

The gama-gymnasium library works as follows:

  1. GAMA runs a simulation and exposes a WebSocket server.
  2. Python connects via the Gymnasium interface (gym.make(...)).
  3. Each simulation step, Python sends an action → GAMA returns an observation + reward.

The glue on the GAMA side is a special species called GymAgent.


The GymAgent Species

This species is required by gama-gymnasium. It must define exactly these attributes and the update_data action; the set_action helper is how Python delivers actions:

species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;

    list<float> state;         // Current observation sent to Python
    float reward;              // Reward for this step
    bool terminated;           // True when the goal is reached (episode ends)
    bool truncated;            // True when max steps are exceeded (episode ends)
    map<string, unknown> info; // Additional info (can be empty)

    list<float> next_action;   // Action received FROM Python

    map<string, unknown> data; // Container read by the bridge

    // Receives action values from Python (avoids a list-literal bug in the expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}
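On the Python side, this data map is what the bridge turns into the return value of env.step(). A minimal sketch of the correspondence (unpack_step is a hypothetical helper, not the library's actual code):

```python
def unpack_step(data: dict):
    """Convert the GymAgent 'data' map into Gymnasium's step tuple."""
    return (
        data["State"],       # observation
        data["Reward"],      # reward for the last action
        data["Terminated"],  # episode ended by reaching the goal
        data["Truncated"],   # episode ended by the step limit
        data["Info"],        # extra debug info
    )

# Example payload mirroring the keys written by update_data (values made up):
payload = {
    "State": [0.05, 0.05, 0.9, 0.7, 0.7, 0, 0, 0, 0, 0, 0, 0, 0],
    "Reward": -0.01,
    "Terminated": False,
    "Truncated": False,
    "Info": {},
}
obs, reward, terminated, truncated, info = unpack_step(payload)
```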

In short: each simulation cycle, GAMA waits for Python to send an action, executes one step, then returns the resulting observation and reward.

How It Works, Step by Step

Python                                    GAMA
  │                                       │
  │  env.reset()                          │
  │  ───────────────────────────────────► │  Creates GymAgent, initializes state
  │  ◄─── obs, info ───────────────────── │
  │                                       │
  │  env.step(action)                     │
  │  ── action ─────────────────────────► │  Sets GymAgent.next_action = action
  │                                       │  Runs one simulation step (reflex)
  │                                       │  Forager reads next_action, moves
  │                                       │  Forager computes obs, reward, done
  │                                       │  Calls update_data
  │  ◄─── obs, reward, done ───────────── │
  │                                       │
  │  (repeat until terminated)            │
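The exchange above is the standard Gymnasium reset/step pattern. The sketch below mimics it with a stand-in class whose values are all made up; with gama-gymnasium the environment would instead come from gym.make():

```python
import random

class FakeGamaEnv:
    """Stand-in for the environment gym.make() would return with
    gama-gymnasium; it only mimics the reset/step call pattern."""

    def __init__(self, max_steps: int = 300):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        obs = [0.0] * 13      # GAMA creates GymAgent, initializes state
        return obs, {}        # obs, info

    def step(self, action):
        self.steps += 1       # GAMA runs one simulation cycle
        obs = [random.random() for _ in range(13)]
        reward = -0.01        # placeholder step penalty
        terminated = False    # would be True when the food is reached
        truncated = self.steps >= self.max_steps
        return obs, reward, terminated, truncated, {}

env = FakeGamaEnv(max_steps=5)
obs, info = env.reset()
done = False
total_reward = 0.0
while not done:
    # Random continuous policy in [-1, 1] x [-1, 1]
    action = [random.uniform(-1, 1), random.uniform(-1, 1)]
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```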

Attribute Reference

| Attribute | Type | Direction | Description |
|---|---|---|---|
| action_space | map | GAMA → Python | Defines what actions Python can send |
| observation_space | map | GAMA → Python | Defines the observation format |
| next_action | list<float> | Python → GAMA | The action chosen by the Python agent |
| state | list<float> | GAMA → Python | Current observation vector |
| reward | float | GAMA → Python | Reward for the last action |
| terminated | bool | GAMA → Python | Goal reached? |
| truncated | bool | GAMA → Python | Timed out? |
| info | map | GAMA → Python | Extra debug data |
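For Python readers, the same attributes can be mirrored as a typed container (a reference sketch for the types and directions above, not a class the library provides):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class StepData:
    """Python-side mirror of the GymAgent attributes exchanged each step."""
    state: list[float] = field(default_factory=list)        # GAMA -> Python
    reward: float = 0.0                                     # GAMA -> Python
    terminated: bool = False                                # goal reached?
    truncated: bool = False                                 # timed out?
    info: dict[str, Any] = field(default_factory=dict)      # debug data
    next_action: list[float] = field(default_factory=list)  # Python -> GAMA
```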

Defining the Spaces

In the global init, after creating the GymAgent, we define the action space and observation space:

init {
    // ... obstacles and forager creation ...

    create GymAgent;

    // Action: 2 continuous values in [-1, 1]
    GymAgent[0].action_space <- [
        "type"::"Box",
        "low"::[-1.0, -1.0],
        "high"::[1.0, 1.0],
        "dtype"::"float"
    ];

    // Observation: 13 continuous values in [0, 1]
    GymAgent[0].observation_space <- [
        "type"::"Box",
        "low"::list_with(13, 0.0),
        "high"::list_with(13, 1.0),
        "dtype"::"float"
    ];
}
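What these maps mean on the Python side can be sketched in plain Python, together with a bounds check (in_box is a hypothetical helper, not part of gama-gymnasium):

```python
# The same space definitions, written as Python dicts.
action_space = {
    "type": "Box",
    "low": [-1.0, -1.0],
    "high": [1.0, 1.0],
    "dtype": "float",
}
observation_space = {
    "type": "Box",
    "low": [0.0] * 13,   # list_with(13, 0.0) in GAML
    "high": [1.0] * 13,  # list_with(13, 1.0) in GAML
    "dtype": "float",
}

def in_box(values, space):
    """True if every value lies within the Box bounds, element-wise."""
    return (
        len(values) == len(space["low"])
        and all(lo <= v <= hi
                for v, lo, hi in zip(values, space["low"], space["high"]))
    )
```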

Space Types

The gama-gymnasium library supports two main space types:

| Type | GAMA Format | Python Equivalent | Use Case |
|---|---|---|---|
| Discrete | ["type"::"Discrete", "n"::4] | Discrete(4) | Grid actions (up/down/left/right) |
| Box | ["type"::"Box", "low"::[...], "high"::[...]] | Box(low, high) | Continuous values |

In Part 1, we would have used Discrete(4). Here we use Box because our forager moves with a continuous velocity vector.
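Both formats can be exercised with a small sampler (an illustrative helper, not a library function) that draws one valid value from a GAMA-style space map:

```python
import random

def sample(space: dict):
    """Draw one valid value from a GAMA-style space map."""
    if space["type"] == "Discrete":
        return random.randrange(space["n"])  # integer in 0 .. n-1
    if space["type"] == "Box":
        return [random.uniform(lo, hi)
                for lo, hi in zip(space["low"], space["high"])]
    raise ValueError(f"unsupported space type: {space['type']}")

grid = {"type": "Discrete", "n": 4}  # Part 1 style: up/down/left/right
velocity = {"type": "Box", "low": [-1.0, -1.0], "high": [1.0, 1.0]}
```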

Why 13 Observation Values?

We'll define the full observation vector in Step 9, but here's a preview:

| Values 0-1 | Value 2 | Values 3-4 | Values 5-12 |
|---|---|---|---|
| Agent position (x, y) | Distance to food | Direction to food (cos, sin) | 8 obstacle sensors |
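The layout can be sketched as an assembly function. The arguments here are placeholders; how each component is actually computed is defined in Step 9:

```python
import math

def build_observation(pos, dist, direction, sensors):
    """Assemble the 13-value vector in the layout above.
    pos: (x, y) in [0, 1]; dist: normalized distance to food;
    direction: angle to food in radians; sensors: 8 obstacle readings."""
    assert len(sensors) == 8
    return [
        pos[0], pos[1],                            # values 0-1: agent position
        dist,                                      # value 2: distance to food
        math.cos(direction), math.sin(direction),  # values 3-4: direction
        *sensors,                                  # values 5-12: sensors
    ]
```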

The Main Loop

The global reflex connects everything:

reflex main_loop {
    // Guard: skip if Python hasn't sent an action yet (e.g. right after reset)
    if (empty(GymAgent[0].next_action)) { return; }

    ask forager[0] {
        do apply_action();        // Read action from Python, move
        do compute_observation(); // Build the observation vector
        do compute_reward();      // Compute reward + check termination
    }
}

The nil guard is critical. When GAMA starts, next_action is nil because Python hasn't sent anything yet. Without this check, the forager would try to read a non-existent action and fail.

Each simulation cycle:

  1. apply_action: Reads gym_agent.next_action, translates it to movement
  2. compute_observation: Builds the 13-value state vector, writes it to gym_agent.state
  3. compute_reward: Computes the reward, sets terminated/truncated, calls update_data

Complete Model

The model at the end of this step adds GymAgent, the spaces, and the main_loop reflex to the Step 7 skeleton. The forager species actions (apply_action, compute_observation, compute_reward) are stubs — they will be fully implemented in Step 9.

/**
 * Name: SmartForagerGym - Step 8: The GymAgent Bridge
 * Author: Killian Trouillet
 * Description: Adds the GymAgent bridge species, space definitions,
 *              and main reflex loop. Forager actions are stubs for now.
 * Tags: reinforcement-learning, gymnasium, gymAgent, tutorial
 */

model SmartForagerGym

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0};
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;

    init {
        obstacles << square(10) at_location {25.0, 25.0};
        obstacles << square(10) at_location {35.0, 25.0};
        obstacles << square(10) at_location {25.0, 35.0};
        obstacles << square(10) at_location {65.0, 45.0};
        obstacles << square(10) at_location {75.0, 45.0};
        obstacles << square(10) at_location {75.0, 55.0};

        create forager number: 1 {
            location <- {5.0, 5.0};
        }

        // ── GymAgent bridge ──────────────────────────────────────────────
        create GymAgent;

        GymAgent[0].action_space <- [
            "type"::"Box",
            "low"::[-1.0, -1.0],
            "high"::[1.0, 1.0],
            "dtype"::"float"
        ];

        GymAgent[0].observation_space <- [
            "type"::"Box",
            "low"::list_with(13, 0.0),
            "high"::list_with(13, 1.0),
            "dtype"::"float"
        ];
    }

    reflex main_loop {
        // Guard: skip if Python hasn't sent an action yet
        if (empty(GymAgent[0].next_action)) { return; }

        ask forager[0] {
            do apply_action();
            do compute_observation();
            do compute_reward();
        }
    }
}

// ── GymAgent bridge species (required by gama-gymnasium) ─────────────────────
species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;

    list<float> state;
    float reward;
    bool terminated;
    bool truncated;
    map<string, unknown> info;

    list<float> next_action;
    map<string, unknown> data;

    // Receives action values from Python (avoids a list-literal bug in the expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

// ── Forager species (actions implemented in Step 9) ──────────────────────────
species forager {
    GymAgent gym_agent <- GymAgent[0];
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;

    init {
        previous_distance <- location distance_to food_location;
    }

    action apply_action() {
        // To be implemented in Step 9
    }

    action compute_observation() {
        // To be implemented in Step 9
    }

    action compute_reward() {
        // To be implemented in Step 9
    }

    aspect default {
        draw circle(3) color: #blue;
        draw circle(sensor_range) color: rgb(0, 0, 255, 20) border: rgb(0, 0, 255, 60);
    }
}

experiment gym_env {
    parameter "communication_port" var: gama_server_port;

    output {
        display "Continuous World" type: 2d {
            graphics "background" {
                draw rectangle(world_size, world_size)
                    at: {world_size / 2, world_size / 2}
                    color: rgb(240, 240, 240);
            }
            graphics "obstacles" {
                loop obs over: obstacles { draw obs color: rgb(80, 80, 80); }
            }
            graphics "food" {
                draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
            }
            species forager;
        }
    }
}