payment matrix. Lower and upper price of the game

18.03.2019

Consider a paired finite game. Let the player A has T personal strategies, which we denote

Let the player IN available P personal strategies, let's denote them. They say that the game has a dimension T X P.

As a result of the choice by the players of any pair of strategies, the outcome of the game is uniquely determined, i.e. win A;. player A(positive or negative) and losing (-ay) player IN. Let's assume that the values A.. known for any pair of strategies (A:, B;.). Matrix P =(a..), i == 1, 2, ..., mj = 1, 2, ..., P, whose elements are the payoffs corresponding to the strategies A. And bj, called payment matrix, or game matrix. General form such a matrix is ​​presented in Table. 12.1. The rows of this table correspond to the player's strategies A, and the columns are the player's strategies IN.

Table 12.1

Let's make a payoff matrix for the next game.

12.1. Search game.

Player A can hide in one of two shelters (I and II); player IN looking for a player A, and if he finds, he receives a fine of 1 den. units from A, otherwise pays the player A 1 day units It is necessary to build the payoff matrix of the game.

D e s h e n i s. To compile the payoff matrix, it is necessary to analyze the behavior of each of the players. Player A can hide in shelter I – we denote this strategy by A v either in shelter II - strategy A. g Player IN can look for the first player in the shelter I - strategy IN(or in shelter II - strategy IN.,. If the player A is in hideout I and is discovered there by the player IN, those. a couple of strategies are being implemented ν IN{), then the player A pays a fine, i.e. A n = -1. Similarly, we get A. n = -1 (A 2, IN.,). Obviously, the strategies (A, IN.,) and (R2, /1,) give the player A win 1, so A P = a. n = I. Thus, for the game "search" of size 2x2, we get the payoff matrix:

Consider the game T X P with matrix P = a j) , i = 1,2, ..., τη; j= 1, 2, ..., and and determine the best among the strategies A at A v..., A m. Choosing a strategy A jy player A should expect the player IN will answer it with one of the strategies IN., for which the payoff for the player A minimal (player IN seeks to "harm" the player A).

Denote by a; player's lowest payoff A when he chooses strategy L; for all possible player strategies IN(smallest number in i-th line payoff matrix), i.e.

Among all numbers a (r = 1,2,..., T) choose the largest: . Let's call and the lower price of the game, or maximum payoff (maximin). This Guaranteed payoff of player A for any strategy of player B. Hence,

(12.2)

The strategy corresponding to the maximin is called maximin strategy. Player IN interested in reducing the player's payoff A; choosing a strategy IN., it takes into account the maximum possible payoff for A. Denote

Among all numbers β. choose the smallest

and call β top game price, or minimax payoff (minimax). This guaranteed loss of player B. Hence,

(12.4)

The minimax strategy is called minimax strategy.

The principle that dictates to players the choice of the most "cautious" minimax and maximin strategies is called the principle minimax. This principle follows from the reasonable assumption that each player seeks to achieve the opposite goal of the opponent. Let us determine the lower and upper prices of the game and the corresponding strategies in problem 12.1. Consider the payoff matrix

from problem 12.1. When choosing a strategy A, (the first row of the matrix) minimum win is equal to a, =min(-l; 1) = -1 and corresponds to the strategy β1 of the player IN. When choosing a strategy L 2 (second row of the matrix) the minimum payoff is A 2 = min(l; -1) = -1, it is achieved with the strategy IN.,.

Guaranteeing yourself maximum win for any strategy of the player IN, i.e. the lower price of the game a = max(a, a2) = max(-l; -1) = -1, player A can choose any strategy: Aj or A 2, i.e. any of his strategies is maximin.

Choosing strategy B, (column 1), the player IN understands that the player A will respond with a strategy A 2 to maximize your gain (loss IN). Therefore, the player's maximum loss IN when he chooses strategy B, is equal to β, = max(-1; 1) = 1.

Similarly, the maximum loss of player B (gain A) when he chooses strategy B2 (column 2) is equal to β2 = max(l; -1) = 1.

Thus, for any strategy of the player A the guaranteed minimum loss of player B is equal to β = πιίη(β1, β2) = min(l; 1) = 1- the top price of the game.

Any strategy of player B is minimax. By adding Table. 12.1 line β; and column a;, we get the table. 12.2. At the intersection of additional rows and columns, we will record the upper and lower prices of the games.

Table 12.2

In Problem 12.1 above, the upper and lower costs of the game are different: a f β.

If the upper and lower prices of the game are the same, then general meaning the upper and lower prices of the game α = β = υ is called the net price of the game, or the price of the game. The minimax strategies corresponding to the price of the game are optimal strategies and their totality the optimal solution or decision games. In this case the player A receives the maximum guaranteed (independent of the player's behavior) IN) the payoff is υ, and the player IN achieves the minimum guaranteed (regardless of the behavior of player L) loss υ. The solution to the game is said to have stability, those. if one of the players sticks to his optimal strategy, then it cannot be advantageous for the other to deviate from his optimal strategy.

A couple of pure strategies A. and B. gives an optimal solution to the game if and only if the corresponding element r is both the largest in its column and the smallest in its row. Such a situation, if it exists, is called saddle point(similar to the surface of a saddle, which curves up in one direction and down in the other).

Denote A* And IN* are a pair of pure strategies on which the solution of the game in the problem with a saddle point is achieved. Let us introduce the payoff function of the first player on each pair of strategies: P(A:, IN-) = and at. Then the optimality condition at the saddle point satisfies the double inequality: P(Aj, B*)<Р(А*, В*)<Р(А", В ), which is true for everyone i = 1, 2, ..., m;j = 1, 2, ..., P. Indeed, the choice of strategy A* the first player under the optimal strategy IN" the second player maximizes the minimum possible payoff: P(A*, B")> P(A G IN"), and the choice of strategy B" the second player, with the optimal strategy of the first one, minimizes the maximum loss: P(D , IN*)<Р(А", В).

12.2. Determine the lower and upper price of the game given by the payoff matrix

Does the game have a saddle point?

Table 12 3

Solution. All calculations are conveniently carried out in a table in which, in addition to the matrix R, entered column a; and line )

Similar articles