Version: 🚧 Alpha 🚧

The Smart Forager — Step 4: The Q-Table

By Killian Trouillet


Content

This fourth step introduces the Q-Table, the memory of the agent. A Q-Table stores a "quality score" for every possible (state, action) pair. In this step, we define the data structure but the agent still moves randomly — we are setting up the infrastructure.

Formulation

  • Definition of the Q-Table as a map<string, float>
  • Definition of a get_state action to convert the agent's position into a string key
  • Definition of a get_q action for safe Q-value lookup
  • Population of Q-Table entries during random exploration

Model Definition

Q-Table definition

We add the Q-Table as a map inside the forager species. The key is a string combining the state and action (e.g., "3_5::right") and the value is the Q-value (a float).

species forager {
world_cell my_cell;

// The Q-Table: stores quality scores for (state, action) pairs
map<string, float> q_table;
list<string> action_list <- ["up", "right", "down", "left"];
  • map<string, float>: GAML's dictionary type. We use strings as keys and floats as values.
  • action_list: A list of the four possible actions. This will be reused throughout the RL logic.
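For readers more familiar with general-purpose languages, the same structure can be sketched in Python (this is an analogue for illustration, not GAML — a Python dict plays the role of GAML's `map<string, float>`):

```python
# Python analogue of the GAML Q-Table declarations above.
# Keys are "state::action" strings, values are Q-scores (floats).
q_table: dict[str, float] = {}
action_list = ["up", "right", "down", "left"]

# Example: recording one (state, action) pair with a neutral score.
q_table["3_5::right"] = 0.0
print(len(q_table))  # 1
```

The flat string key ("state::action") trades a nested lookup for a single dictionary access, which keeps the GAML code short.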

State representation

We need a way to describe "where the agent is" as a simple string. We concatenate the grid coordinates:

string get_state {
return string(my_cell.grid_x) + "_" + string(my_cell.grid_y);
}

For example, if the forager is at column 3, row 5, the state string is "3_5". This gives us 100 unique states for a 10×10 grid.
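The same encoding in Python, assuming the agent's cell exposes integer coordinates like `grid_x` and `grid_y` in the GAML model (a sketch, not the tutorial's actual code):

```python
# Hypothetical Python version of get_state: coordinates joined by "_".
def get_state(grid_x: int, grid_y: int) -> str:
    return f"{grid_x}_{grid_y}"

print(get_state(3, 5))  # "3_5"
```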

Safe Q-value lookup

When we look up a (state, action) pair that the agent has never visited, the Q-Table has no entry. We must handle this gracefully:

float get_q (string s, string a) {
string key <- s + "::" + a;
if (q_table contains_key key) {
return float(q_table[key]);
}
return 0.0; // Default: unvisited = neutral = 0
}
  • contains_key: A GAML operator that checks whether a key exists in a map.
  • We use "::" as a separator between the state and action parts of the key.
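In Python the same "default to zero" pattern collapses to a single `dict.get` call (again an analogue, assuming the flat-key dict from above):

```python
# Python sketch of the safe lookup: dict.get supplies the 0.0 default,
# so unvisited (state, action) pairs read as neutral without crashing.
def get_q(q_table: dict, s: str, a: str) -> float:
    return q_table.get(s + "::" + a, 0.0)

q = {"3_5::right": 2.5}
print(get_q(q, "3_5", "right"))  # 2.5
print(get_q(q, "0_0", "up"))     # 0.0 (unvisited pair)
```

Defaulting to 0.0 means the agent is initially indifferent between actions; optimistic defaults (e.g. a positive value) are a common variant that encourages exploration.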

Populating the table

In the movement reflex, we insert entries into the Q-Table as the agent explores. The values are initialized to 0.0:

// After moving:
string key <- current_state + "::" + action_taken;
if (not (q_table contains_key key)) {
q_table[key] <- 0.0;
}
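In Python, `dict.setdefault` expresses the same "insert only if absent" logic in one line (a sketch under the same flat-key assumption):

```python
# Python sketch of the population step: setdefault inserts 0.0 only for
# new keys, so a value learned earlier is never overwritten.
q_table = {"3_5::right": 7.0}   # pretend this pair already has a score
current_state, action_taken = "3_5", "right"

q_table.setdefault(current_state + "::" + action_taken, 0.0)  # no-op
q_table.setdefault("0_0::up", 0.0)                            # new entry
```

The guard matters: once learning updates begin (in a later step), unconditionally writing 0.0 would erase everything the agent has learned.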

New monitor

We add a monitor to track the Q-Table growth:

monitor "Q-Table Size" value: length(forager[0].q_table);

Complete Model

/**
* Name: SmartForager - Step 4: The Q-Table
* Author: Killian Trouillet
* Description: This fourth step introduces the Q-Table, the "memory" of the agent.
* We add a map to store Q-values for each (state, action) pair.
* The agent still moves randomly but now records its state at each step.
* Tags: reinforcement-learning, q-table, map, tutorial
*/

model SmartForager

global {
int grid_size <- 10;
int food_x <- 9;
int food_y <- 9;
list<point> obstacle_positions <- [{2,2}, {3,2}, {2,3}, {6,4}, {7,4}, {7,5}];

int max_steps_per_episode <- 200;
int step_count <- 0;
int episode <- 0;
float episode_reward <- 0.0;
float last_episode_reward <- 0.0;
bool food_found <- false;

init {
ask world_cell grid_at {food_x, food_y} {
is_food <- true;
}
loop pos over: obstacle_positions {
ask world_cell grid_at pos {
is_obstacle <- true;
}
}
create forager number: 1 {
my_cell <- world_cell grid_at {0, 0};
location <- my_cell.location;
}
}

reflex manage_episode {
step_count <- step_count + 1;
if (food_found or step_count >= max_steps_per_episode) {
do end_episode;
}
}

action end_episode {
episode <- episode + 1;
last_episode_reward <- episode_reward;
write "Episode " + episode + " | Steps: " + step_count
+ " | Reward: " + episode_reward
+ " | Q-Table size: " + length(forager[0].q_table);
episode_reward <- 0.0;
step_count <- 0;
food_found <- false;
ask world_cell grid_at {food_x, food_y} {
is_food <- true;
}
ask forager[0] {
my_cell <- world_cell grid_at {0, 0};
location <- my_cell.location;
}
}
}

grid world_cell width: 10 height: 10 neighbors: 4 {
bool is_food <- false;
bool is_obstacle <- false;
rgb color <- #white update: is_obstacle ? rgb(60, 60, 60) : #white;
}

species forager {
world_cell my_cell;

// === NEW: Q-Table (the agent's memory) ===
map<string, float> q_table;
list<string> action_list <- ["up", "right", "down", "left"];

// Convert the current grid position into a string key
string get_state {
return string(my_cell.grid_x) + "_" + string(my_cell.grid_y);
}

// Look up a Q-value safely. Returns 0.0 if the key does not exist.
float get_q (string s, string a) {
string key <- s + "::" + a;
if (q_table contains_key key) {
return float(q_table[key]);
}
return 0.0;
}

reflex random_move {
// Log current state (for debugging / learning)
string current_state <- get_state();

int direction <- rnd(3);
int new_x <- my_cell.grid_x;
int new_y <- my_cell.grid_y;

switch direction {
match 0 { new_y <- new_y - 1; } // up
match 1 { new_x <- new_x + 1; } // right
match 2 { new_y <- new_y + 1; } // down
match 3 { new_x <- new_x - 1; } // left
}

float step_reward <- -1.0;

if (new_x >= 0 and new_x < grid_size and new_y >= 0 and new_y < grid_size) {
world_cell target <- world_cell grid_at {new_x, new_y};
if (not target.is_obstacle) {
my_cell <- target;
location <- my_cell.location;
if (my_cell.is_food) {
my_cell.is_food <- false;
step_reward <- 100.0;
food_found <- true;
}
} else {
step_reward <- -5.0;
}
} else {
step_reward <- -5.0;
}

// Store Q-value (just initializing entries for now, no learning yet)
string new_state <- get_state(); // not used yet; the learning update in a later step will need it
string action_taken <- action_list[direction];
string key <- current_state + "::" + action_taken;
if (not (q_table contains_key key)) {
q_table[key] <- 0.0;
}

episode_reward <- episode_reward + step_reward;
}

aspect default {
draw circle(0.8) color: #blue;
}
}

experiment smart_forager type: gui {
parameter "Max steps per episode" var: max_steps_per_episode min: 50 max: 1000 category: "Simulation";

output {
display "Grid World" {
grid world_cell border: #lightgray;
species forager;
graphics "food" {
ask (world_cell where (each.is_food)) {
draw circle(5) color: rgb(50, 180, 50);
}
}
}
monitor "Episode" value: episode;
monitor "Step" value: step_count;
monitor "Current Reward" value: episode_reward;
monitor "Last Episode Reward" value: last_episode_reward;
monitor "Q-Table Size" value: length(forager[0].q_table);
}
}