8. The GymAgent Bridge
By Killian Trouillet
How GAMA and Python Communicate
The gama-gymnasium library works as follows:
- GAMA runs a simulation and exposes a WebSocket server.
- Python connects via the Gymnasium interface (`gym.make(...)`).
- Each simulation step, Python sends an action → GAMA returns an observation + reward.
The glue on the GAMA side is a special species called `GymAgent`.
The GymAgent Species
This species is required by gama-gymnasium. It must define exactly these attributes and the `update_data` action:
```gaml
species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    list<float> state;          // Current observation sent to Python
    float reward;               // Reward for this step
    bool terminated;            // True when goal is reached (episode ends)
    bool truncated;             // True when max steps exceeded (episode ends)
    map<string, unknown> info;  // Additional info (can be empty)
    list<float> next_action;    // Action received FROM Python
    map<string, unknown> data;  // Container read by the bridge

    // Action to receive action values from Python (avoids list literal bug in expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}
```
In short: each simulation cycle, GAMA waits for Python to send an action, executes one step, then returns the resulting observation and reward.
How It Works, Step by Step

```
Python                              GAMA
  │                                   │
  │  env.reset()                      │
  │ ────────────────────────────────► │  Creates GymAgent, initializes state
  │ ◄─── obs, info ────────────────── │
  │                                   │
  │  env.step(action)                 │
  │ ── action ──────────────────────► │  Sets GymAgent.next_action = action
  │                                   │  Runs one simulation step (reflex)
  │                                   │  Forager reads next_action, moves
  │                                   │  Forager computes obs, reward, done
  │                                   │  Calls update_data
  │ ◄─── obs, reward, done ────────── │
  │                                   │
  │  (repeat until terminated)        │
```
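From Python's point of view, this exchange is just the standard Gymnasium reset/step loop. Below is a minimal sketch of that loop against a stand-in environment class; the `MockGamaEnv` class is an illustration written for this tutorial, not gama-gymnasium's actual API, and the reward value is a placeholder.

```python
# Stand-in environment mimicking the reset/step protocol in the diagram above.
# A real run would use the gama-gymnasium environment; this mock only makes
# the handshake concrete.
class MockGamaEnv:
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # GAMA side: creates GymAgent, initializes state
        self.t = 0
        obs = [0.0] * 13   # 13-value observation vector
        info = {}
        return obs, info

    def step(self, action):
        # GAMA side: sets next_action, runs one cycle, calls update_data
        self.t += 1
        obs = [0.0] * 13
        reward = -0.01                          # placeholder per-step reward
        terminated = False                      # goal reached?
        truncated = self.t >= self.max_steps    # max steps exceeded?
        return obs, reward, terminated, truncated, {}

env = MockGamaEnv()
obs, info = env.reset()
done = False
total_reward = 0.0
while not done:
    action = [0.5, -0.5]  # in a real run, this comes from a policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```

Note the two distinct end-of-episode flags: `terminated` (goal reached) and `truncated` (time limit), matching the GymAgent attributes.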
Attribute Reference
| Attribute | Type | Direction | Description |
|---|---|---|---|
| `action_space` | map | GAMA → Python | Defines what actions Python can send |
| `observation_space` | map | GAMA → Python | Defines the observation format |
| `next_action` | list<float> | Python → GAMA | The action chosen by the Python agent |
| `state` | list<float> | GAMA → Python | Current observation vector |
| `reward` | float | GAMA → Python | Reward for the last action |
| `terminated` | bool | GAMA → Python | Goal reached? |
| `truncated` | bool | GAMA → Python | Timed out? |
| `info` | map | GAMA → Python | Extra debug data |
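On the Python side, the `data` map assembled by `update_data` arrives as a plain dictionary. A sketch of the payload shape (the keys match the GAML map literal; the concrete values here are illustrative, and the split into Gymnasium's `step()` tuple is shown only to clarify the correspondence):

```python
# Shape of the payload packed by update_data on the GAMA side.
# Keys mirror the GAML map; values are illustrative.
payload = {
    "State": [0.05, 0.05, 0.9, 0.7, 0.7] + [1.0] * 8,  # 13 observation values
    "Reward": -0.01,
    "Terminated": False,   # goal reached -> episode ends
    "Truncated": False,    # max steps exceeded -> episode ends
    "Info": {},            # optional debug data
}

# The bridge turns this back into Gymnasium's step() return tuple:
obs = payload["State"]
step_result = (obs, payload["Reward"],
               payload["Terminated"], payload["Truncated"], payload["Info"])
```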
Defining the Spaces
In the global `init`, after creating the `GymAgent`, we define the action space and the observation space:
```gaml
init {
    // ... obstacles and forager creation ...
    create GymAgent;

    // Action: 2 continuous values in [-1, 1]
    GymAgent[0].action_space <- [
        "type"::"Box",
        "low"::[-1.0, -1.0],
        "high"::[1.0, 1.0],
        "dtype"::"float"
    ];

    // Observation: 13 continuous values in [0, 1]
    GymAgent[0].observation_space <- [
        "type"::"Box",
        "low"::list_with(13, 0.0),
        "high"::list_with(13, 1.0),
        "dtype"::"float"
    ];
}
```
Space Types
The gama-gymnasium library supports two main space types:
| Type | GAMA Format | Python Equivalent | Use Case |
|---|---|---|---|
| Discrete | `["type"::"Discrete", "n"::4]` | `Discrete(4)` | Grid actions (up/down/left/right) |
| Box | `["type"::"Box", "low"::[...], "high"::[...]]` | `Box(low, high)` | Continuous values |
In Part 1, we would have used `Discrete(4)`. Here we use `Box` because our forager moves with a continuous velocity vector.
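To make the correspondence concrete, here is an illustrative Python helper that interprets the GAMA-side map format. This is a sketch of the mapping between the two columns of the table, not gama-gymnasium's internal code:

```python
# Illustrative parser for the GAMA-side space maps shown above.
# Returns a simple tuple description of the equivalent Gymnasium space.
def describe_space(spec):
    if spec["type"] == "Discrete":
        return ("Discrete", spec["n"])          # -> gymnasium Discrete(n)
    if spec["type"] == "Box":
        low, high = spec["low"], spec["high"]
        assert len(low) == len(high), "low/high must have the same length"
        return ("Box", low, high)               # -> gymnasium Box(low, high)
    raise ValueError(f"unsupported space type: {spec['type']}")

# The two spaces from this tutorial, written as Python dicts:
action_spec = {"type": "Box", "low": [-1.0, -1.0], "high": [1.0, 1.0],
               "dtype": "float"}
grid_spec = {"type": "Discrete", "n": 4}
```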
Why 13 Observation Values?
We'll define the full observation vector in Step 9, but here's a preview:
| Values 0-1 | Value 2 | Values 3-4 | Values 5-12 |
|---|---|---|---|
| Agent position (x, y) | Distance to food | Direction to food (cos, sin) | 8 obstacle sensors |
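As a rough preview, the vector could be assembled like this in Python. The `build_observation` helper and its normalization choices (for instance, rescaling the cos/sin direction into [0, 1] to fit the Box bounds) are assumptions for illustration; Step 9 defines the actual formulas in GAML:

```python
import math

# Hypothetical assembly of the 13-value observation, following the layout
# in the table above. Normalization choices are assumptions.
def build_observation(pos, food, sensors, world_size=100.0):
    dx, dy = food[0] - pos[0], food[1] - pos[1]
    dist = math.hypot(dx, dy)
    max_dist = math.hypot(world_size, world_size)
    direction = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    return [
        pos[0] / world_size, pos[1] / world_size,        # values 0-1: position
        dist / max_dist,                                 # value 2: distance to food
        (direction[0] + 1) / 2, (direction[1] + 1) / 2,  # values 3-4: direction (cos, sin) rescaled to [0, 1]
        *sensors,                                        # values 5-12: 8 obstacle sensors
    ]

# Forager at the start position, food at {95, 95}, no obstacles sensed:
obs = build_observation((5.0, 5.0), (95.0, 95.0), [1.0] * 8)
```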
The Main Loop
The global reflex connects everything:
```gaml
reflex main_loop {
    // Guard: skip if Python hasn't sent an action yet (e.g. right after reset)
    if (empty(GymAgent[0].next_action)) { return; }

    ask forager[0] {
        do apply_action();         // Read action from Python, move
        do compute_observation();  // Build the observation vector
        do compute_reward();       // Compute reward + check termination
    }
}
```
The guard is critical. When GAMA starts, `next_action` is still empty because Python hasn't sent anything yet. Without this check, the forager would try to read a non-existent action and fail.
Each simulation cycle:
- `apply_action`: reads `gym_agent.next_action` and translates it into movement
- `compute_observation`: builds the 13-value state vector and writes it to `gym_agent.state`
- `compute_reward`: computes the reward, sets `terminated`/`truncated`, and calls `update_data`
Complete Model
The model at the end of this step adds `GymAgent`, the spaces, and the `main_loop` reflex to the Step 7 skeleton. The forager species actions (`apply_action`, `compute_observation`, `compute_reward`) are stubs; they will be fully implemented in Step 9.
```gaml
/**
 * Name: SmartForagerGym - Step 8: The GymAgent Bridge
 * Author: Killian Trouillet
 * Description: Adds the GymAgent bridge species, space definitions,
 * and main reflex loop. Forager actions are stubs for now.
 * Tags: reinforcement-learning, gymnasium, gymAgent, tutorial
 */

model SmartForagerGym

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0};
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;

    init {
        obstacles << square(10) at_location {25.0, 25.0};
        obstacles << square(10) at_location {35.0, 25.0};
        obstacles << square(10) at_location {25.0, 35.0};
        obstacles << square(10) at_location {65.0, 45.0};
        obstacles << square(10) at_location {75.0, 45.0};
        obstacles << square(10) at_location {75.0, 55.0};

        create forager number: 1 {
            location <- {5.0, 5.0};
        }

        // ── GymAgent bridge ──────────────────────────────────────────────
        create GymAgent;
        GymAgent[0].action_space <- [
            "type"::"Box",
            "low"::[-1.0, -1.0],
            "high"::[1.0, 1.0],
            "dtype"::"float"
        ];
        GymAgent[0].observation_space <- [
            "type"::"Box",
            "low"::list_with(13, 0.0),
            "high"::list_with(13, 1.0),
            "dtype"::"float"
        ];
    }

    reflex main_loop {
        // Guard: skip if Python hasn't sent an action yet (e.g. right after reset)
        if (empty(GymAgent[0].next_action)) { return; }

        ask forager[0] {
            do apply_action();
            do compute_observation();
            do compute_reward();
        }
    }
}

// ── GymAgent bridge species (required by gama-gymnasium) ─────────────────────
species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    list<float> state;
    float reward;
    bool terminated;
    bool truncated;
    map<string, unknown> info;
    list<float> next_action;
    map<string, unknown> data;

    // Action to receive action values from Python (avoids list literal bug in expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

// ── Forager species (actions implemented in Step 9) ──────────────────────────
species forager {
    GymAgent gym_agent <- GymAgent[0];
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;

    init {
        previous_distance <- location distance_to food_location;
    }

    action apply_action() {
        // To be implemented in Step 9
    }

    action compute_observation() {
        // To be implemented in Step 9
    }

    action compute_reward() {
        // To be implemented in Step 9
    }

    aspect default {
        draw circle(3) color: #blue;
        draw circle(sensor_range) color: rgb(0, 0, 255, 20) border: rgb(0, 0, 255, 60);
    }
}

experiment gym_env {
    parameter "communication_port" var: gama_server_port;

    output {
        display "Continuous World" type: 2d {
            graphics "background" {
                draw rectangle(world_size, world_size)
                    at: {world_size / 2, world_size / 2}
                    color: rgb(240, 240, 240);
            }
            graphics "obstacles" {
                loop obs over: obstacles { draw obs color: rgb(80, 80, 80); }
            }
            graphics "food" {
                draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
            }
            species forager;
        }
    }
}
```