SIMBA provides software agents as alternatives to human players. All the actions that these agents can perform are constrained by the semantics of the business model, so we assume that the agents only take "in range" decisions. Three kinds of agents have been considered (a small illustrative sketch follows the list):
Random Agent: It chooses its decisions using uniform random numbers. We use this agent to explore the action space.

Hand-Coded Agent: It sets the decision variables by increasing their values according to the Consumer Price Index (CPI). Its behavior is more intelligent than that of the random agent.

Intelligent Agent: It uses the current state of the environment, together with action and reward information, to choose the best decisions to make in every decision period.
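As an illustration, the following Python sketch shows one possible shape for this family of agents. The decide() interface, the per-variable decision ranges, and the CPI adjustment rule are assumptions made for illustration, not part of SIMBA's actual API.

import random

# Hypothetical interface for SIMBA-style virtual agents: each agent
# returns one value per decision variable, kept within its legal range.

class RandomAgent:
    """Chooses decisions uniformly at random; useful for exploring
    the action space."""
    def __init__(self, ranges):
        self.ranges = ranges              # {variable: (low, high)}

    def decide(self, state):
        return {var: random.uniform(lo, hi)
                for var, (lo, hi) in self.ranges.items()}

class HandCodedAgent:
    """Increases each decision variable according to the Consumer
    Price Index (CPI)."""
    def __init__(self, cpi):
        self.cpi = cpi                    # e.g., 0.03 for 3% inflation

    def decide(self, state):
        # In this sketch, the state carries the previous decisions.
        return {var: value * (1 + self.cpi)
                for var, value in state["previous_decisions"].items()}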
A business strategy is a plan that integrates an organization's major goals, policies, decisions and sequences of actions into a cohesive whole (Baye, 2006). Managers select a business strategy to gain a competitive advantage in a particular market. It can apply at all levels of an organization and affect any of the functional areas of management. Different business strategies appear in the business literature, and all of them could be followed to manage the companies in SIMBA: incremental decisions, risky or reactive decisions, low-cost strategies, differentiation or specialization, and so on.

Which strategy is chosen at every moment depends on the organization's strengths and its competitors' weaknesses. These strategies could be implemented in the simulator by hand-coding them, following classical expert system methodologies. However, virtual agents can also be created using Machine Learning (ML) approaches.
A first ML approach to learning virtual agents in SIMBA is to use lazy learning methods like KNN (Aha, 1997). KNN is a type of instance-based learning, or lazy learning, where the target function is only approximated locally. We have proposed a variant of KNN called Adaptive KNN. In this variant, a data set C is obtained during the interaction between the agent and the environment. This data set C is composed of tuples of the form <s, a, r>, where s ∈ S (the space of all possible states), a ∈ A (the space of all possible actions), and r ∈ R is the immediate reward (the variable that we want to maximize, such as the profit). At each step, the simulator returns the current state s in which the agent finds itself. The algorithm selects the K nearest neighbors of the state s in C. Among these K neighbors, it selects the tuple with the best reward, slightly modifies the actions of that tuple, and executes them. If the reward obtained is better than the worst reward among the K neighbors, the worst tuple is replaced with the new experience generated.
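The following Python sketch illustrates one decision step of this Adaptive KNN variant. The Euclidean state distance, the perturbation size, and the environment interface (env.state(), env.execute()) are assumptions for illustration; it also assumes C has already been seeded with at least K experiences.

import math
import random

def distance(s1, s2):
    """Euclidean distance between two numeric state vectors (an
    assumed metric; any state distance would do)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)))

def adaptive_knn_step(C, env, k=5, noise=0.05):
    """One decision step: pick the best of the K nearest stored
    experiences, perturb its action, execute it, and keep the new
    experience if it beats the worst of those neighbors."""
    s = env.state()                                    # current state
    neighbors = sorted(C, key=lambda t: distance(t[0], s))[:k]
    _, best_action, _ = max(neighbors, key=lambda t: t[2])
    # Slightly modify the stored action before executing it.
    new_action = [a + random.uniform(-noise, noise) for a in best_action]
    r = env.execute(new_action)                        # immediate reward
    worst = min(neighbors, key=lambda t: t[2])
    if r > worst[2]:
        C.remove(worst)                                # drop worst neighbor
        C.append((s, tuple(new_action), r))            # keep new experience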
Another ML technique used to create virtual agents in SIMBA is Reinforcement Learning (RL) (Sutton and Barto, 1998). Among the many different RL algorithms, Q-learning has been widely used in the literature. Q-learning is an off-policy method where the learned function, Q(s, a), for s ∈ S, a ∈ A, directly approximates the optimal action-value function, independently of the policy being followed. This value function measures the utility of executing each action (or decision) a from each state (or situation) s; in SIMBA, the function Q can evaluate, for instance, the profit that a company expects to receive if it is in a given state and performs a given action. Q-learning is based on just the one next reward, using the value of the state one step later as a proxy for the remaining rewards. The update of such a function is performed using an experience tuple <s, a, s', r> (where s is the initial state, a is the action executed, s' is the state reached after executing a from s, and r is the reward or profit received) following equation 1, where α is a learning rate and γ is a discount factor that reduces the relevance of future decisions:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]    (1)

Except in very small environments, the Q function cannot be represented exactly as a table, so some form of state-space discretization or function approximation is required.
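A minimal tabular sketch of this update in Python is shown below. The environment interface (reset()/step()), the discrete action list, and the epsilon-greedy exploration are illustrative assumptions, since SIMBA's continuous decision variables would first have to be discretized.

import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning driven by the one-step update of equation 1."""
    Q = defaultdict(float)                     # Q[(state, action)] -> utility
    for _ in range(episodes):
        s = env.reset()                        # hypothetical environment API
        done = False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            # Equation 1: bootstrap from the best value one step later.
            target = r + gamma * max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q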