Analysis of Agent Expertise in Ms. Pac-Man Using Value-of-Information-Based Policies

@article{Sledge2017AnalysisOA,
  title={Analysis of Agent Expertise in Ms. Pac-Man Using Value-of-Information-Based Policies},
  author={Isaac J. Sledge and Jos{\'e} Carlos Pr{\'i}ncipe},
  journal={IEEE Transactions on Games},
  year={2017},
  volume={11},
  pages={142--158}
}
Conventional reinforcement-learning methods for Markov decision processes rely on weakly guided, stochastic searches to drive the learning process. It can therefore be difficult to predict what agent behaviors might emerge. In this paper, we consider an information-theoretic cost function for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. This cost function is the value of information, which provides the optimal tradeoff between… 
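The tradeoff the abstract describes, between expected returns and the granularity of the search, is commonly realized as a Boltzmann-like soft-max policy whose inverse temperature sweeps the agent from near-uniform (cautious) to near-greedy behavior. A minimal sketch under that assumption (the function and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def voi_policy(q_row, prior, beta):
    """Soft-max policy induced by a value-of-information-style tradeoff:
    pi(a) proportional to p(a) * exp(beta * Q(s, a)).
    beta -> 0 recovers the prior p(a); beta -> inf recovers argmax Q."""
    logits = beta * q_row
    logits -= logits.max()              # numerical stability
    weights = prior * np.exp(logits)
    return weights / weights.sum()

# Three actions in a single state, uniform prior over actions.
q = np.array([1.0, 2.0, 0.5])
prior = np.ones(3) / 3
exploratory = voi_policy(q, prior, beta=0.0)    # near-uniform
exploitative = voi_policy(q, prior, beta=50.0)  # near-greedy
```

Sweeping `beta` between these extremes is one way to read the paper's "risk-averse to risk-favoring" spectrum of behaviors.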

Guided Policy Exploration for Markov Decision Processes Using an Uncertainty-Based Value-of-Information Criterion

This paper proposes an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space, based on the value of information, a criterion that provides the optimal tradeoff between expected costs and the granularity of the search process.

An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits

An information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret is proposed, based on the value of information criterion, which measures the trade-off between policy information and obtainable rewards.

Value of Information Analysis via Active Learning and Knowledge Sharing in Error-Controlled Adaptive Kriging

The proposed VoI analysis framework is applied for an optimal decision-making problem involving load testing of a truss bridge and is shown to offer accurate and robust estimates of VoI with a limited number of model evaluations.

Reduction of Markov Chains Using a Value-of-Information-Based Approach

This paper provides a data-driven means of choosing the ‘optimal’ value of a single free parameter that emerges through the optimization process, which sidesteps the need to know a priori the number of state groups in an arbitrary chain.

An Information-theoretic Approach for Automatically Determining the Number of State Groups When Aggregating Markov Chains

  • I. Sledge, J. Príncipe
  • Mathematics
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
It is shown that an augmented value-of-information-based approach to aggregating Markov chains facilitates the determination of the number of state groups.

Annotating Motion Primitives for Simplifying Action Search in Reinforcement Learning

This work proposes a theoretically viewpoint-insensitive and speed-insensitive means of automatically annotating the underlying motions and actions of motion primitives through a differential-geometric, spatio-temporal kinematics descriptor, which analyzes how the poses of entities in two motion sequences change over time.

The Opponent's Movement Mechanism in Simple Games Using Heuristic Method

This work presents the strategy of reaching a certain point on the board by the computer through the use of a modified heuristic algorithm, i.e., a Cuckoo Search Algorithm.

How Decisions Are Made in Brains: Unpack “Black Box” of CNN With Ms. Pac-Man Video Game

The role of the CNN's convolutional layers is elucidated, and the decision-making process at work during gameplay is shown to be high-reward-driven: the CNN makes predictions based on the most important input pattern, which in this case is the high-reward entities in the game.

Partitioning Relational Matrices of Similarities or Dissimilarities Using the Value of Information

  • I. Sledge, J. Príncipe
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of…

Regularized Training of Convolutional Autoencoders using the Rényi-Stratonovich Value of Information

  • I. Sledge, J. Príncipe
  • Computer Science
    2020 International Joint Conference on Neural Networks (IJCNN)
  • 2020
We propose an information-theoretic cost function for the regularized training of convolutional autoencoders that imposes an organization on the bottleneck-layer-projected samples so as to facilitate…

References

SHOWING 1-10 OF 53 REFERENCES

Guided Policy Exploration for Markov Decision Processes Using an Uncertainty-Based Value-of-Information Criterion

This paper proposes an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space, based on the value of information, a criterion that provides the optimal tradeoff between expected costs and the granularity of the search process.

Trading Value and Information in MDPs

The tradeoff between value and information, explored using the info-rl algorithm, provides a principled justification for stochastic (soft) policies and is used to show that these optimal policies are also robust to uncertainties in settings with only partial knowledge of the MDP parameters.
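Soft policies of this kind can be computed by a Blahut-Arimoto-style alternation between the policy and its marginal action prior. A hypothetical sketch (the function name and toy Q-table are assumptions, not the info-rl algorithm verbatim):

```python
import numpy as np

def soft_policy_iteration(q, state_dist, beta, iters=200):
    """Alternate between the soft policy pi(a|s) ~ p(a) exp(beta*Q(s,a))
    and its marginal action prior p(a) until self-consistent."""
    n_states, n_actions = q.shape
    prior = np.ones(n_actions) / n_actions
    for _ in range(iters):
        logits = beta * q
        logits -= logits.max(axis=1, keepdims=True)  # stability
        pi = prior * np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        prior = state_dist @ pi          # marginal action distribution
    return pi

# Toy symmetric problem: each state prefers a different action.
q = np.array([[1.0, 0.0],
              [0.0, 1.0]])
rho = np.array([0.5, 0.5])               # state distribution
pi = soft_policy_iteration(q, rho, beta=1.0)
```

At finite `beta` the result is a stochastic (soft) policy rather than a deterministic argmax, which is the justification the paper gives.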

Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits.

An information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret is proposed based on the value of information criterion, which measures the trade-off between policy information and obtainable rewards.
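As an illustration of the kind of strategy described (not the paper's exact algorithm), here is a soft-max bandit that anneals its inverse temperature so exploration decays as the empirical estimates sharpen; all names and schedules are assumptions:

```python
import numpy as np

def softmax_bandit(means, steps=5000, noise=0.5, seed=0):
    """Soft-max exploration of a stochastic bandit: pull arms with
    probability proportional to exp(beta_t * empirical mean), with
    beta_t = log(t + 1) so exploration decays over time."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k, dtype=int)
    estimates = np.zeros(k)
    for t in range(1, steps + 1):
        logits = np.log(t + 1) * estimates
        logits -= logits.max()           # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        arm = rng.choice(k, p=probs)
        reward = rng.normal(means[arm], noise)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts

counts = softmax_bandit([0.0, 0.2, 1.0])  # pull counts concentrate on arm 2
```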

Balancing exploration and exploitation in reinforcement learning using a value of information criterion

  • I. SledgeJ. Príncipe
  • Computer Science
    2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
An information-theoretic approach for addressing the exploration-exploitation dilemma in reinforcement learning using the value of information, a criterion that provides the optimal trade-off between the expected returns and a policy's degrees of freedom.

An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits

An information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret is proposed, based on the value of information criterion, which measures the trade-off between policy information and obtainable rewards.

Q-learning

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
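The update the theorem concerns is the one-line Watkins rule; a minimal tabular sketch on a hypothetical two-state deterministic MDP, chosen so the fixed point can be checked by hand (for gamma = 0.9, the optimal Q(0, 1) is 1/(1 - 0.81) ≈ 5.26):

```python
import numpy as np

def q_learning(steps=2000, gamma=0.9, alpha=0.5, seed=0):
    """Tabular Q-learning with uniform exploration, so every (s, a)
    pair is repeatedly sampled, as the convergence theorem requires."""
    rng = np.random.default_rng(seed)
    # transitions[s][a] = (next_state, reward); only 0 --1--> 1 pays off.
    transitions = {
        0: {0: (0, 0.0), 1: (1, 1.0)},
        1: {0: (0, 0.0), 1: (1, 0.0)},
    }
    q = np.zeros((2, 2))
    s = 0
    for _ in range(steps):
        a = int(rng.integers(2))
        s2, r = transitions[s][a]
        # Watkins update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
        s = s2
    return q

q = q_learning()
```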

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

This paper introduces Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels and shows that it can outperform state-of-the-art approaches and is robust to infrequent and inconsistent human feedback.

Learning to play Pac-Man: an evolutionary, rule-based approach

This work describes an initial approach to developing an artificial agent that replaces the human to play a simplified version of Pac-Man and adaptively "learns" through the application of population-based incremental learning (PBIL) to adjust the agents' parameters.
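PBIL itself is simple to sketch: keep a probability vector over solution bits, sample a population from it, and nudge the vector toward the best sample each generation. A toy run on the standard OneMax objective (all parameter choices are illustrative, not those of the Pac-Man agent):

```python
import numpy as np

def pbil_onemax(n_bits=20, pop=50, lr=0.1, gens=100, seed=1):
    """Population-based incremental learning (PBIL) on OneMax:
    sample a population from a per-bit probability vector, then
    shift the vector toward the generation's best sample."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)
    for _ in range(gens):
        samples = (rng.random((pop, n_bits)) < p).astype(int)
        best = samples[samples.sum(axis=1).argmax()]
        p = (1 - lr) * p + lr * best     # learning-rate-weighted shift
    return p

p = pbil_onemax()   # probabilities drift toward the all-ones optimum
```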

RAMP: A rule-based agent for Ms. Pac-Man

The initial implementation of RAMP, a rule-based agent for playing Ms. Pac-Man according to the rules stipulated in the 2008 World Congress on Computational Intelligence, and the progress towards adding an evolutionary computation component to enable the agent to learn to play the game are described.

Monte-Carlo tree search in Ms. Pac-Man

A performance comparison between the proposed system and existing programs showed a significant improvement in the proposed system's ability to survive, implying the effectiveness of the proposed method.
...