Python Lambda Early-Binding for a Game AI Planner

Jul 13, 2021

This story is targeted at two audiences, which may or may not overlap:

Readers interested in a detailed explanation of a score-based planner pattern, specifically in the context of a turn-based RPG combat system.
Readers struggling with Python’s late-binding in lambdas who, like me, found all the generic “i: x * i” examples conceptually helpful, but hard to adapt to their specific code.

The Planner

When players in the game venture into a new room in the mine, they are confronted with a group of enemies they must fight and defeat to proceed further. Each participant in the fight takes one action per turn, in initiative order, until either the enemies or the party are defeated. Simple enough. The tricky part is getting the enemies to make good decisions!

The solution I had devised for a previous game (also turn-based, but a tactical RPG) works roughly like so:

When an enemy is instructed to take_a_turn(), they make one plan for each action available to them on each valid target (e.g. attack player, heal self, heal ally) in the fight.
Each plan consists of an action (function) and a score (int).
Enemies have goals with numeric values that influence the score of the plans they make. For example, one enemy may only have the goal of hurting players and so will never use a healing ability. Another enemy may have both goals, but favor hurting players more than healing.
When plans are made, scores are weighted by various things. For example, an enemy may favor hurting players more than healing allies based on its goals, but if an ally’s health is critically low, the relative value of healing that ally may be high enough that the plan is favored temporarily above smacking players.
When any viable plan is discovered, the function (e.g. the attack or heal spell) and its arguments (the enemy using it, the target, the state of the fight) are assigned to the plan’s action variable to be invoked later, if the plan is selected.
Once the enemy has made every viable plan, the plans are sorted by score and the highest-scoring plan is chosen. This is the “best” thing the enemy could do that turn based on their abilities, goals, and the current state of the fight. The chosen plan’s action variable is then executed, causing the enemy to do the thing.
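
The selection step described above can be sketched in a few lines. This is a minimal illustration, not the game's code: the SketchPlan class, the two action functions, and the hard-coded scores are hypothetical stand-ins, and max() is swapped in for the sort-then-take-first approach used in the real planner.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchPlan:
    """Hypothetical stand-in for a plan: a deferred action plus its score."""
    action: Callable[[], str]
    score: int
    debug: str = ''


def attack_player() -> str:
    return 'attacked the player'


def heal_self() -> str:
    return 'healed self'


def choose_plan(plans: List[SketchPlan]) -> Optional[SketchPlan]:
    """Return the highest-scoring plan, or None when no plan is viable."""
    if not plans:
        return None
    return max(plans, key=lambda plan: plan.score)


# Scores are made up for illustration.
plans = [
    SketchPlan(action=attack_player, score=502, debug='damage player'),
    SketchPlan(action=heal_self, score=450, debug='heal self'),
]
best = choose_plan(plans)
# The action is only invoked after the best plan has been picked.
result = best.action() if best else 'took no action'
print(result)
```

Nothing runs while the plans are being built; only the winning plan's action is ever called.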

What makes a system like this cool is that you can adjust enemy behavior in a variety of ways:

Give the enemy more abilities.
Give the enemy more goals.
Change the enemy’s goal values.
Change the score weights for different plan types.

None of these adjustments require code changes. Additionally, since all the enemies make context-based decisions, they can be combined with each other and should have interesting and reasonably intelligent interactions with each other and the players, again, with no added code or configuration. Of course, the tuning of the goal and scoring values to arrive at the desired behavior is a whole discipline on its own.
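
As a rough illustration of tuning through data alone, the sketch below scores the same situation under two different sets of goal values. The dictionaries, the situational bonus numbers, and the preferred_goal helper are all hypothetical and much simpler than the planner in this story. The point is only that swapping the goal values flips the decision while the function stays the same.

```python
def preferred_goal(goal_values: dict, situational_bonus: dict) -> str:
    """Return the goal whose value plus situational bonus is highest."""
    scores = {
        goal: value + situational_bonus.get(goal, 0)
        for goal, value in goal_values.items()
    }
    return max(scores, key=scores.get)


# Invented situation: a small payoff for attacking, a larger one for healing.
bonus = {'damage_player': 2, 'heal_ally': 40}

# Two hypothetical enemies that differ only in their goal values.
aggressive = {'damage_player': 500, 'heal_ally': 450}
supportive = {'damage_player': 450, 'heal_ally': 500}

print(preferred_goal(aggressive, bonus))  # damage_player
print(preferred_goal(supportive, bonus))  # heal_ally
```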

Note: if you are familiar with game programming, and game AI in particular, this is loosely based on Goal-Oriented Action Planning (GOAP). But, knowing me, probably butchered and poorly optimized.

The Code

Enemy, with all their attendant decision-making logic:

from enum import Enum


class GoalType(Enum):
    damage_player = 1
    debuff_player = 2  # not used in this example
    heal_ally = 3
    buff_ally = 4  # not used in this example
    summon = 5  # not used in this example


class Goal:
    def __init__(self, goal_type: GoalType, value: int):
        self.goal_type = goal_type
        self.value = value

    # this method looks like overkill, but several future goals have multiple contributor types
    @staticmethod
    def get_contributor_effects_by_goal_type(goal_type: GoalType):
        if goal_type == GoalType.damage_player:
            contribs = [EffectType.damage_health]
        elif goal_type == GoalType.heal_ally:
            contribs = [EffectType.restore_health]
        else:
            raise Exception(f'GoalType {goal_type} has no configured contributing effects')

        return contribs


class Plan:
    def __init__(self):
        self.action = None
        self.debug = ''
        self.score = 0


class Enemy:
        # I omitted the __init__ signature and all the enemy member variables not related to the problem, for brevity.
        # AI
        self.actions = actions
        self.goals = goals

    def take_a_turn(self, fight):
        plans = self.get_action_plans(fight)

        if len(plans) > 0:
            print(f'{self.name}\'s plans:')
            for plan in plans:
                print(': ' + plan.debug)

            plans.sort(key=lambda x: x.score, reverse=True)
            the_plan = plans[0]
            print(f'The chosen plan is: --{the_plan.debug}-- w/ score {the_plan.score}')
            return the_plan.action()
        else:
            return f'{self.name} took no action.'

    def get_action_plans(self, fight):
        # get_kill_player_action_plans is not shown in this story
        plans = self.get_kill_player_action_plans(fight)

        if len(plans) > 0:
            return plans

        # damage_player goal
        goal = [x for x in self.goals if x.goal_type == GoalType.damage_player]
        if len(goal) > 0:
            plans += self.get_damage_player_plans(goal[0], fight)

        # heal_ally goal
        goal = [x for x in self.goals if x.goal_type == GoalType.heal_ally]
        if len(goal) > 0:
            plans += self.get_heal_ally_plans(goal[0], fight)

        return plans

    def get_damage_player_plans(self, goal, fight):
        plans = []

        for action in self.actions:
            if action.targets_players and action.is_usable(fight.states):
                effects = list(filter(lambda effect: effect.type == EffectType.damage_health, action.effects))

                if len(effects) > 0:
                    for character in fight.characters:
                        dmg = character.estimate_damage_from_enemy_action(self, action)
                        plan = Plan()
                        plan.score = goal.value + int(100.0 * dmg / character.health)
                        plan.action = lambda: action.do(user=self, target=character, fight=fight)
                        plan.debug = f'damage {character.name} w/ {action.name} score {plan.score}'
                        plans.append(plan)

        return plans

    def get_heal_ally_plans(self, goal, fight):
        plans = []

        for action in self.actions:
            if action.targets_allies and action.is_usable(fight.states):
                effects = list(filter(lambda effect: effect.type == EffectType.restore_health, action.effects))

                if len(effects) > 0:
                    for enemy in fight.enemies:
                        plan = Plan()
                        plan.score = goal.value + 100 - int(enemy.current_health / enemy.health * 100)
                        plan.action = lambda: action.do(user=self, target=enemy, fight=fight)
                        plan.debug = f'heal {enemy.name} w/ {action.name} score {plan.score}'
                        plans.append(plan)

        return plans

The Enemy used for testing — ignore all the numbers, which are just various stats

enemies = {
    'slime': Enemy('Slime', 1, 0.3, 1, 0.3, 1, 0.3, 1, 0.3, 10, 0.1, 5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                   [SingleTargetAttack('Headbutt', '', 0, 0.05,
                                       [SpellEffect(EffectType.damage_health, Elements.earth, 1, 4)]),
                    SingleTargetHeal('Regenerate', '', 3,
                                     [SpellEffect(EffectType.restore_health, Elements.water, 2, 5)])],
                   [Goal(GoalType.damage_player, 500), Goal(GoalType.heal_ally, 450)]),
}
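
To see how the Slime's two goal values (500 and 450) trade off, the scoring formulas from get_damage_player_plans and get_heal_ally_plans can be evaluated by hand. The sketch below copies those two formulas into standalone functions. The function names and the sample health and damage numbers are hypothetical, chosen only to show the heal plan overtaking the attack plan once an ally is hurt badly enough.

```python
def damage_score(goal_value: int, dmg: float, target_health: int) -> int:
    # Same formula as get_damage_player_plans: goal value plus the
    # estimated damage as a percentage of the target's health.
    return goal_value + int(100.0 * dmg / target_health)


def heal_score(goal_value: int, current_health: int, max_health: int) -> int:
    # Same formula as get_heal_ally_plans: goal value plus the
    # percentage of health the ally is missing.
    return goal_value + 100 - int(current_health / max_health * 100)


# Goal values come from the Slime definition; all other numbers are invented.
attack = damage_score(500, dmg=2, target_health=100)           # 500 + 2
heal_full = heal_score(450, current_health=8, max_health=8)    # 450 + 0
heal_hurt = heal_score(450, current_health=2, max_health=8)    # 450 + 75

print(attack, heal_full, heal_hurt)  # 502 450 525
```

With these made-up numbers, an ally at full health leaves the heal plan at 450, well below the attack, while an ally at a quarter health pushes it to 525 and the Slime would heal instead.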

How an ability and its do() function are defined

import random


class SingleTargetHeal(Action):
    def __init__(self, name: str, description: str, cooldown: int, effects: [SpellEffect]):
        for effect in effects:
            if effect.type != EffectType.restore_health:
                raise Exception(f'SingleTargetHeal {name} has an unsupported effect type {effect.type}')

        super().__init__()
        self.name = name
        self.description = description
        self.cooldown = cooldown
        self.effects = effects
        self.targets_players = False
        self.targets_allies = True
        self.area = 0
        self.area_modifiable = False

    def do(self, user, target, fight):
        out = f'{user.name} used {self.name} on {target.name}.'
        targets = [target]

        if self.area > 0:
            # include up to self.area neighbors on each side of the target
            index = fight.enemies.index(target)
            i = self.area

            while i > 0:
                if index + i <= len(fight.enemies) - 1:
                    # append the enemy itself, not its index
                    targets.append(fight.enemies[index + i])
                if index - i >= 0:
                    targets.insert(0, fight.enemies[index - i])
                i -= 1

        for target in targets:
            for effect in self.effects:
                heal = target.restore_health(random.randint(effect.min, effect.max), user)
                out += f'\n{target.name} regained {heal} health.'

        return out

The Problem

Due to my inability to figure out the correct syntax to early-bind the action and its parameters for later invocation when the highest scoring plan is selected, the enemies would correctly build the plans, correctly select the best plan, and then execute the wrong (in fact, last planned) action on the correctly planned target. In this case, the Slime would decide the best thing to do is to whack the player and then… heal the player instead.
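
The behavior can be reproduced without any of the game code. The sketch below is a hypothetical reduction: FakeAction stands in for the real action classes, and make_plans mirrors the shape of the planner's loop. Because a lambda looks up action when it is called, not when it is defined, every stored plan ends up invoking whichever action the loop visited last.

```python
class FakeAction:
    """Hypothetical stand-in for an ability such as Headbutt or Regenerate."""

    def __init__(self, name: str):
        self.name = name

    def do(self, target: str) -> str:
        return f'used {self.name} on {target}'


def make_plans(actions, target):
    plans = []
    for action in actions:
        # Late binding: `action` is looked up when the lambda is called,
        # by which time the loop has finished and it names the last item.
        plans.append(lambda: action.do(target))
    return plans


plans = make_plans([FakeAction('Headbutt'), FakeAction('Regenerate')], 'Player')
first_result = plans[0]()
print(first_result)  # used Regenerate on Player
```

The first plan was created while the loop variable pointed at Headbutt, yet it runs Regenerate, which matches what the Slime did.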

The debug statements from the planner

Slime's plans:
: damage Player w/ Headbutt score 502
: heal Slime w/ Regenerate score 450
The chosen plan is: --damage justindz#4247 w/ Headbutt score 502-- w/ score 502

What actually happened

Pico Slime goes next.
Pico Slime used Regenerate on you.
You regained 3 health.

The Solution

Thanks to Stack Overflow user Carcigenicate (https://stackoverflow.com/users/3000206/carcigenicate) for leading me to the solution. While I had correctly diagnosed the problem as a late-binding issue with Python lambdas, and had attempted to resolve it by “freezing” the variables at definition time using default parameters, I had not “frozen” the most important element: the “action” object itself (the ability whose do() method gets called).

The working code follows. Note that action is passed to the lambda as a default parameter; that’s the key. Default parameter values are evaluated once, when the lambda is defined, so each plan keeps the action and target it was created with. Because the fight parameter has the same state throughout the entire planning process, it can be used directly without worrying about binding. The same is true for the user (self)… saving entire characters of code (yippee!). The action and the target must early-bind for the pattern to work.

Instead of doing:

plan.action = lambda: action.do(user=self, target=enemy, fight=fight)

We do:

plan.action = lambda action=action, enemy=enemy: action.do(self, enemy, fight)
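
The same change presumably applies to the damage plans, with character in place of enemy. As a self-contained check, the sketch below uses a hypothetical FakeAction class in place of the game's action classes and binds the loop variables in two ways: with default parameters, as in the line above, and with functools.partial from the standard library, which is an alternative not used in the game's code.

```python
from functools import partial


class FakeAction:
    """Hypothetical stand-in for an ability such as Headbutt or Regenerate."""

    def __init__(self, name: str):
        self.name = name

    def do(self, user: str, target: str) -> str:
        return f'{user} used {self.name} on {target}'


actions = [FakeAction('Headbutt'), FakeAction('Regenerate')]
targets = ['Player', 'Slime']
user = 'Slime'  # same for every plan, so it does not need early binding

with_defaults = []
with_partial = []
for action, target in zip(actions, targets):
    # Default parameter values are evaluated once, when the lambda is defined.
    with_defaults.append(lambda action=action, target=target: action.do(user, target))
    # partial() stores the bound method and its arguments immediately.
    with_partial.append(partial(action.do, user, target))

print(with_defaults[0]())  # Slime used Headbutt on Player
print(with_partial[1]())   # Slime used Regenerate on Slime
```

Either form captures the values at the moment the plan is created, so the first plan still headbutts the player after the loop has moved on.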

The Project

In case you are curious about the game itself — not really the subject of this story — I will give a brief explanation. I intend at some point to write a follow-up story about adapting the game concept to Discord as a medium.

The game is a text-based RPG roughly in the loot grinder genre. It features a series of infinitely deep mines full of enemies that you can fight through, room by room, for ever-higher risks and rewards. Being text-based, and room-based, it has some similarity to MUDs (yes, I am old). However, the character and gear progression is more influenced by games like Diablo and Path of Exile.

The game platform is a Discord bot running in my personal server. The bot is written in Python using discord.py with persistence via MongoDB Atlas. The uniquely interesting part of this project is translating the game patterns into Discord concepts, such as DMs, channels, messages, and reactions. But, as I said, that is another whole story in and of itself.