\documentclass{article}
% Packages
\usepackage[margin=1.5cm, includefoot, footskip=30pt]{geometry}
\usepackage{hyperref}
\usepackage{multicol}
\usepackage{booktabs}
\usepackage{xstring}
\usepackage{subcaption}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{standalone}
\usepackage{fancyvrb}
\usepackage{algorithm,algorithmic}
\usepackage{authblk}
\usepackage{tikz}
\usetikzlibrary{calc, shapes, patterns}
% Title
\title{Evolution Reinforces Cooperation with the Emergence of Self-Recognition
Mechanisms: an empirical study of the Moran process for the iterated
Prisoner's Dilemma}
\author[1]{Vincent Knight}
\author[2]{Marc Harper}
\author[1]{Nikoleta E. Glynatsi}
\author[3]{Owen Campbell}
\affil[1]{Cardiff University, School of Mathematics, UK}
\affil[2]{Google Inc., Mountain View, CA, USA}
\affil[3]{Not affiliated}
\date{}
\begin{document}
\maketitle
\begin{abstract}
We present insights and empirical results from an extensive numerical
study of the evolutionary dynamics of the iterated prisoner's dilemma.
Fixation probabilities for Moran processes are obtained
for all pairs of 164 different strategies including classics such as TitForTat, zero
determinant strategies, and many more sophisticated strategies.
Players with long memories and sophisticated behaviours outperform
many strategies that perform well in a two player setting. Moreover, we
introduce several strategies trained with evolutionary algorithms to
excel at the Moran process. These strategies are excellent invaders
and resistors of invasion and in some cases naturally evolve handshaking
mechanisms to resist invasion. The best invaders were those trained
to maximize total payoff, while the best resistors invoked handshake mechanisms.
This suggests that while maximizing individual payoff
can lead to the evolution of cooperation through invasion, payoff maximizing
strategies are less evolutionarily stable than strategies employing handshake
mechanisms because of their relatively weak resistance to invasion.
\end{abstract}
\section{Introduction}\label{sec:introduction}
The Prisoner's Dilemma (PD) \cite{Flood1958} is a fundamental two player game
used to model a variety of strategic interactions. Each player chooses
simultaneously and independently
between cooperation (C) or defection (D). The payoffs of
the game are defined by the matrix $\begin{pmatrix} R & S \\ T & P
\end{pmatrix}$, where $T > R > P > S$ and $2R > T + S$. The PD is a single
round game, but it is commonly studied in a repeated form in which prior
outcomes matter, called the Iterated Prisoner's Dilemma (IPD). As described in
\cite{Axelrod1980a, Knight2016, Press2012} a number of strategies have been
developed to take advantage of the history of play. More recently, a class of
strategies referred to as zero determinant (ZD) strategies \cite{Press2012} has
been shown to manipulate some opponents through extortionate mechanisms.
The Moran Process \cite{Moran1957} is a model of
evolutionary population dynamics that has been used to gain insight into
evolutionary stability in a number of settings (more details are given in
Section~\ref{sec:the_moran_process}).
Several earlier
works have studied iterated games in the context of the prisoner's dilemma
\cite{Nowak, stewart2013extortion}; however, these often make simplifying assumptions
or are limited to classes of strategies
such as memory-one strategies that only use the previous round of play.
This manuscript provides a detailed numerical analysis of agent-based simulations
of \textbf{\input{./tex/num_strategies.tex}}complex and adaptive strategies for the
IPD\@. This is made possible by the Axelrod library \cite{axelrodproject}, an
effort to provide software for reproducible research for the IPD\@. The library
now contains over \input{./tex/num_strategies_axelrod.tex}parameterized
strategies including classics like TitForTat and WinStayLoseShift, as well as
recent variants such as OmegaTFT, zero determinant and other memory one
strategies, strategies based on finite state machines, lookup tables, neural
networks, and other machine learning based strategies, and a collection of novel
strategies. Not all strategies have been considered for this study: excluded
are those that make use of knowledge of the number of turns in a match
and others that have a high
computational run time. This large number of strategies is available thanks to
the open source nature of the project, which has over 50 contributors (both
programmers and researchers) from around the world \cite{Knight2016}. Three of the considered
strategies are finite state machines trained specifically for Moran processes
(described further in Section~\ref{sec:strategies}).
In addition to providing a large collection of strategies, the Axelrod library
can conduct matches, tournaments and population
dynamics with variations including noise and spatial structure.
The strategies and simulation frameworks are
automatically tested with a very high degree of code coverage in accordance
with best research software practices.
Using the Axelrod library and the many strategies it contains, we obtain
fixation probabilities for all pairs of strategies, identifying
those that are effective invaders and those resistant to invasion, for
population sizes $N=2$ to $N=14$. Moreover, we present a number of strategies
that were created via reinforcement algorithms (evolutionary and particle
swarm algorithms) that are among the best invaders and resistors of invasion
known to date, and show that handshaking mechanisms naturally arise from these
processes as an invasion-resistance mechanism.
Recent work has argued that agent-based
simulations can provide insights in evolutionary game theory not available
via direct mathematical analysis \cite{adami2016evolutionary}. The results
and insights contained in this paper would be difficult to derive analytically.
In particular the following questions are addressed:
\begin{enumerate}
\item What strategies are good invaders?
\item What strategies are good at resisting invasion?
\item How does the population size affect these findings?
\end{enumerate}
While the results agree with some of the published literature, it is found that:
\begin{enumerate}
\item Zero determinant strategies are not particularly effective for $N > 2$.
\item Complex strategies can be effective, and in fact can naturally evolve
through evolutionary processes to outperform designed strategies.
\item The strongest resistors employ a handshake mechanism, whether
    explicitly designed or naturally evolved.
\item Strong invaders are generally cooperative strategies that do not defect
first but retaliate to varying degrees of intensity against strategies that defect.
\item Strategies evolved to maximize their total payoff can be strong invaders
and achieve mutual cooperation with many other strategies.
\end{enumerate}
\subsection{The Moran Process}\label{sec:the_moran_process}
Figure~\ref{fig:moran_process} shows a diagrammatic representation of the Moran
process, a stochastic birth death process on a finite population in which the
population size stays constant over time. Individuals are \textbf{selected}
according to a given fitness landscape. Once selected, the individual is
reproduced and similarly another individual is chosen to be removed from the
population. In some settings mutation is also considered but without mutation
(the case considered in this work) this process will arrive at an absorbing
state where the population is entirely made up of players of one strategy. The
probability with which a given strategy takes over a population is called the
\textit{fixation probability}. A more detailed analytic description of this is
given in Section~\ref{sec:validation}. In our simulations offspring do not
inherit any knowledge or history from their parents.
\begin{figure}[!hbtp]
\centering
\input{./tex/moran_process.tex}
\caption{A diagrammatic representation of a Moran process.}
\label{fig:moran_process}
\end{figure}
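To make the dynamics concrete, the following short Python sketch implements one
birth-death step and iteration to fixation. It is a minimal illustration, not
the simulation code used in this study: it assumes a fixed fitness value per
strategy label, whereas in the actual process fitness is recomputed each step
from pairwise match payoffs.
\begin{Verbatim}[fontsize=\small]
import random

def moran_step(population, fitness):
    # Birth: select an individual proportionally to fitness.
    weights = [fitness[s] for s in population]
    parent = random.choices(population, weights=weights, k=1)[0]
    # Death: replace a uniformly chosen individual with a copy of the parent.
    population[random.randrange(len(population))] = parent

def run_to_fixation(population, fitness):
    # Without mutation the process absorbs at a homogeneous population.
    while len(set(population)) > 1:
        moran_step(population, fitness)
    return population[0]
\end{Verbatim}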
The Moran process was initially introduced in \cite{Moran1957}. It has since
been used in a variety of settings including the understanding of the spread of
cooperative and non-cooperative behaviour such as cancer \cite{West2016} and the
emergence of cooperative behaviour in spatial topologies \cite{Nowak2017}.
However, these works mainly consider relatively simple strategies. Some work has
looked at the evolutionary stability of agent-based strategies within the
Prisoner's Dilemma \cite{Li2014}, but in terms of infinite population stability
rather than in the more widely used setting of the Moran process. In \cite{Baek2016}
Moran processes are studied in a theoretical framework for a small subset of
strategies. The subset included memory one strategies: strategies that recall
the events of the previous round only.
Of particular interest are the zero determinant strategies introduced in
\cite{Press2012}. It was argued in \cite{stewart2013extortion} that generous
ZD strategies are robust against invading strategies. However, in \cite{Lee2015},
a strategy using machine learning techniques was capable of resisting invasion
and also able to invade any memory one strategy. Recent work \cite{Hilbe2017}
has investigated the effect of memory length on strategy performance and the
emergence of cooperation but this is not done in a Moran process context and only
considers specific cases of memory 2 strategies. In \cite{Adami2013} it was
recognised that many zero determinant strategies do not fare well against
themselves. This is a disadvantage for the Moran process where the best
strategies cooperate well with other players using the same strategy.
\subsection{Strategies considered}\label{sec:strategies}
To carry out this numerical experiment, \input{./tex/num_strategies}
strategies, listed (with their properties) in Appendix~\ref{app:list_of_players},
are used from the Axelrod library. There are
\input{./tex/num_stochastic.tex}stochastic and
\input{./tex/num_deterministic.tex}deterministic strategies. Their memory depth,
defined by the number of rounds of history used by the strategy each round, is
shown in Table~\ref{tbl:memory_depth_count}. The memory depth is infinite if the
strategy uses the entire history of play (whatever its length). For example, a
strategy that utilizes a handshaking mechanism, where the opponent's actions in
the first few rounds of play determine the strategy's subsequent behavior, would
have infinite memory depth.
A number of these strategies have been trained with reinforcement learning
algorithms prior to this study and not specifically for the Moran process.
\begin{itemize}
\item Evolved ANN: a neural network based strategy;
\item Evolved LookerUp: a lookup table based strategy;
\item PSO Gambler: a stochastic version of the lookup table based strategy;
\item Evolved HMM: a hidden Markov model based strategy.
\end{itemize}
Apart from the PSO Gambler strategy, which was trained using a particle swarm
optimisation algorithm, these strategies were trained with an evolutionary
algorithm that perturbs strategy parameters and optimizes the mean total score
against all other opponents \cite{affenzeller2009genetic}. They were trained to
win IPD tournaments by maximizing their mean total payoffs against a variety
of opponents. Variation is
introduced via mutation and crossover of parameters, and the best performing
strategies are carried to the next generation along with new variants. Similar
methods appear in the literature \cite{Ashlock2006}.
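As a rough illustration of this scheme (the actual training code is archived at
\cite{marc_harper_2017_824264}), the following sketch applies mutation and
truncation selection to real-valued parameter vectors;
\texttt{mean\_total\_score} is a hypothetical objective function scoring a
parameter vector against a collection of opponents, and crossover is omitted
for brevity.
\begin{Verbatim}[fontsize=\small]
import random

def evolve(init_params, mean_total_score, generations=100,
           pop_size=20, keep=5, sigma=0.1):
    # Start from perturbed copies of an initial parameter vector.
    population = [[p + random.gauss(0, sigma) for p in init_params]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best performers and refill with mutated offspring.
        elites = sorted(population, key=mean_total_score, reverse=True)[:keep]
        population = elites[:]
        while len(population) < pop_size:
            parent = random.choice(elites)
            population.append([p + random.gauss(0, sigma) for p in parent])
    return max(population, key=mean_total_score)
\end{Verbatim}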
More information about each player can be obtained in the documentation for
\cite{axelrodproject} and a detailed description of the performance
of these strategies in IPD tournaments is described in~\cite{Harper2017}.
All of the training code is archived at \cite{marc_harper_2017_824264}. This
software is (like the Axelrod library) available on GitHub at
\url{https://github.com/Axelrod-Python/axelrod-dojo} with documentation to
train new strategies easily. Training
typically takes less than 100 generations and can be completed within several
hours on commodity hardware.
There are three further strategies trained specifically for this study: Trained
FSM 1, 2, and 3 (TF1--TF3). These are based on finite state machines of 16, 16,
and 8 states respectively (see Figures~\ref{fig:tf1},~\ref{fig:tf2}
and~\ref{fig:tf3}).
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_one.tex}
\caption{TF1: a 16 state finite state machine with a handshake leading to
mutual cooperation at state 4.}
\label{fig:tf1}
\end{figure}
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_two.tex}
\caption{TF2: a 16 state finite state machine with a handshake leading to
mutual cooperation at state 16.}
\label{fig:tf2}
\end{figure}
\begin{figure}[!hbtp]
\centering
\input{./tex/fsm_three.tex}
\caption{TF3: an 8 state finite state machine.}
\label{fig:tf3}
\end{figure}
As opposed to the previously described strategies, these strategies were trained
with the objective function of \textbf{mean fixation probabilities for Moran
processes} starting at initial population states consisting of \(N/2\)
individuals of the training candidate and \(N/2\) individuals of an opponent
strategy, taken from a selection of 150 opponents from the Axelrod library:
\begin{itemize}
\item TF1 \(N=12\), 0\% noise.
\item TF2 \(N=10\), 0\% noise.
\item TF3 \(N=8\), 1\% noise.
\end{itemize}
Each matchup of players was run to fixation to estimate the absorption
probabilities. The training algorithms
were run for fewer than 50 generations. Training data for this is available at
\cite{data}.
TF3 cooperates and defects with
various cycles depending on the opponent's actions. TF3 will mutually
cooperate with any cooperating strategy and tolerates only a few defections
before defecting for the rest of the match. It is similar to but not exactly the same as
Fool Me Once, a strategy that cooperates until the opponent has defected twice
(not necessarily consecutively), and defects indefinitely thereafter. Though a
product of training with a Moran objective, it differs from TF1 and TF2
in that it lacks a handshake mechanism. Figure~\ref{fig:tf3} shows
all 8 states of the strategy produced by the training process (states 3 and 8
are not reachable).
TF2 always starts with CD and will defect against opponents that start with
DD\@. It plays CDD against itself and then cooperates thereafter; Fortress3 and
Fortress4 also use a similar handshake and cooperate with TF2. Cooperation
can be rescued after a failed handshake by a complex sequence of plays
which sometimes results in mutual cooperation with Firm but Fair, Grofman, and
GTFT, and a few others with low probability. TF2 defects against all other
players in the study, barring
unusual cases arising from particular randomizations. Figure~\ref{fig:tf2} shows
all 16 states of the strategy (states 6 and 7 are not reachable).
TF1 has an initial handshake of CCD and cooperates if the opponent matches.
However if the opponent later defects, TF1 will respond in kind, so the
handshake is not permanent. Only one player (Prober 4 \cite{Prison1998}) manages to
achieve cooperation with TF1 after about 20 rounds of play. TF1 is functionally
very similar to a strategy known as ``Collective Strategy'', which has a
handshake of CD and cooperates with opponents that match the handshake,
defecting for the remainder of the match if the opponent ever defects \cite{Li2009}.
This strategy was specifically designed for evolutionary processes.
For both TF1 and TF2 a handshake
mechanism naturally emerges from the structure of the underlying finite state
machine. This behavior is an outcome of the evolutionary process and is in no
way hard-coded or included via an additional mechanism.
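For concreteness, the handshake logic can be sketched as follows. This is a
simplified, hypothetical player in the style of Collective Strategy rather than
the trained finite state machines themselves: it plays a handshake sequence,
then behaves like Grudger towards opponents that matched the handshake and
defects against everyone else.
\begin{Verbatim}[fontsize=\small]
C, D = "C", "D"

def handshake_move(my_history, opp_history, handshake=(C, D)):
    turn = len(my_history)
    if turn < len(handshake):                      # still signalling
        return handshake[turn]
    if tuple(opp_history[:len(handshake)]) != handshake:
        return D                                   # handshake failed
    if D in opp_history[len(handshake):]:
        return D                                   # punish later defections
    return C                                       # mutual cooperation
\end{Verbatim}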
\begin{table}[!hbtp]
\centering
\input{./tex/memory_depth_count.tex}
\caption{Count of strategies by memory depth.}
\label{tbl:memory_depth_count}
\end{table}
\subsection{Data collection}\label{sec:data_collection}
Each strategy pair is run
with starting population distributions of $(1, N-1)$, $(N/2, N/2)$ and
$(N-1, 1)$, for \(N\) from 2 through 14. The fixation probability is then empirically
computed for each combination of starting distribution and value of \(N\). The
Axelrod library can carry out exact simulations of the Moran process. Since some
of the strategies have a high computational cost or are stochastic, samples are
taken from a large number of match outcomes for the pairs of players for use in
computing fitnesses in the Moran process. This approach was verified to agree
with unsampled calculations to a high degree of accuracy in specific cases.
This is described in Algorithms~\ref{alg:data_collection}
and~\ref{alg:moran_process}.
\begin{algorithm}[!hbtp]
\caption{Data Collection}
\label{alg:data_collection}
\input{./tex/data_collection.tex}
\end{algorithm}
\begin{algorithm}[!hbtp]
\caption{Moran process}
\label{alg:moran_process}
\input{./tex/moran_process_simulation.tex}
\end{algorithm}
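A minimal example of the kind of computation performed, using the Axelrod
library's \texttt{MoranProcess} interface directly (the study itself used cached
match samples for efficiency, as in Algorithm~\ref{alg:data_collection}), might
look as follows; the function and parameter names here are illustrative.
\begin{Verbatim}[fontsize=\small]
import axelrod as axl

def estimate_x1(invader, resident, N=6, repetitions=100, turns=200):
    # Estimate the probability that a single invader fixates in a
    # population of N - 1 residents.
    fixations = 0
    for _ in range(repetitions):
        players = [invader()] + [resident() for _ in range(N - 1)]
        mp = axl.MoranProcess(players, turns=turns)
        mp.play()  # iterate until absorption
        if mp.winning_strategy_name == str(invader()):
            fixations += 1
    return fixations / repetitions

# For example: estimate_x1(axl.TitForTat, axl.Defector, N=6)
\end{Verbatim}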
Section~\ref{sec:validation} will further validate the methodology by comparing
simulated results to analytical results in some cases. The main results of this
manuscript are presented in Section~\ref{sec:empirical_results} which will
present a detailed analysis of all the data generated. Finally,
Section~\ref{sec:conclusion} will conclude and offer future avenues for the work
presented here.
\section{Validation}\label{sec:validation}
As described in \cite{Nowak}, consider the payoff matrix:
\begin{equation}\label{equ:payoff_matrix}
M = \begin{pmatrix}
        a & b\\
        c & d
    \end{pmatrix}
\end{equation}
The expected payoffs of \(i\) players of the first type in a population with \(N
- i\) players of the second type are given by:
\begin{equation}\label{equ:expected_payoff_one}
f_i = \frac{a(i - 1) + b(N - i)}{N - 1}
\end{equation}
\begin{equation}\label{equ:expected_payoff_two}
g_i = \frac{ci + d(N - i - 1)}{N - 1}
\end{equation}
The transitions within the birth death process that underpins the Moran process
are then given by:
\begin{align}
p_{i, i+1}&= \frac{if_i}{if_i+(N-i)g_i}\frac{N-i}{N}\label{equ:p_up}\\
p_{i, i-1}&= \frac{(N-i)g_i}{if_i+(N-i)g_i}\frac{i}{N}\label{equ:p_down}\\
p_{ii} &= 1 - p_{i, i+1} - p_{i, i-1}\label{equ:p_stay}
\end{align}
Using this, it is a known result \cite{Nowak2017} that the fixation probability
of the first strategy in a population of \(i\) individuals of the first type
and \(N-i\) individuals of the second is given by:
\begin{equation}\label{equ:fixation_probability}
x_i = \frac{1 + \sum_{j=1}^{i-1}\prod_{k=1}^{j}\gamma_k}{1 + \sum_{j=1}^{N-1}
\prod_{k=1}^{j}\gamma_k}
\end{equation}
where:
\[
\gamma_j = \frac{p_{j, j-1}}{p_{j, j+1}}
\]
A neutral strategy will have fixation probability $x_i = i/N$.
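These expressions are straightforward to evaluate numerically. The following
sketch implements
(\ref{equ:expected_payoff_one})--(\ref{equ:fixation_probability}), using the
fact that \(\gamma_j\) simplifies to \(g_j/f_j\); it assumes the payoffs are
positive so that no division by zero occurs.
\begin{Verbatim}[fontsize=\small]
def theoretical_fixation(a, b, c, d, N, i=1):
    # Fixation probability x_i for the payoff matrix [[a, b], [c, d]].
    f = lambda k: (a * (k - 1) + b * (N - k)) / (N - 1)  # type-one payoff
    g = lambda k: (c * k + d * (N - k - 1)) / (N - 1)    # type-two payoff
    gamma = lambda k: g(k) / f(k)  # p_{k,k-1} / p_{k,k+1} reduces to g_k/f_k

    def cumulative(m):  # sum over j < m of the product gamma_1 * ... * gamma_j
        total, prod = 0.0, 1.0
        for j in range(1, m):
            prod *= gamma(j)
            total += prod
        return total

    return (1 + cumulative(i)) / (1 + cumulative(N))
\end{Verbatim}
As a sanity check, setting \(a=b=c=d\) recovers the neutral fixation
probability \(i/N\).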
Comparisons of \(x_1, x_{N/2}, x_{N-1}\) are shown in
Figure~\ref{fig:comparison_deterministic}. The points represent the simulated
values and the line shows the theoretical value. Note that these are all
deterministic strategies and show a perfect match between the expected value
of (\ref{equ:fixation_probability}) and the actual Moran process for all
strategy pairs. Figure~\ref{fig:comparison_stochastic} shows the fixation probabilities for
stochastic strategies. These are no longer a good match, which highlights the
weakness of assuming that a given interaction between two IPD strategies can be
summarised by a fixed payoff matrix as in
(\ref{equ:payoff_matrix}). For any given pair of strategies it is possible to
obtain \(p_{i,i-1}, p_{i,i+1}, p_{ii}\) exactly (as opposed to the
approximations offered by (\ref{equ:p_up}), (\ref{equ:p_down}) and
(\ref{equ:p_stay})). Obtaining these requires particular analysis for a given
pair and can be quite a complex endeavour for stochastic strategies with long
memory: this is not necessary for the purposes of this work.
All data generated for this validation exercise can be found
at \cite{data}.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_Cooperator.pdf}
\caption{Alternator and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_Alternator.pdf}
\caption{Defector and Alternator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_TfT.pdf}
\caption{Alternator and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Cooperator_v_TfT.pdf}
\caption{Cooperator and Tit For Tat}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_Cooperator.pdf}
\caption{Defector and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_TfT.pdf}
\caption{Defector and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/WSLS_v_TfT.pdf}
\caption{Win Stay Lose Shift and Tit For Tat}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Alternator_v_WSLS.pdf}
\caption{Alternator and Win Stay Lose Shift}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Defector_v_WSLS.pdf}
\caption{Defector and Win Stay Lose Shift}
\end{subfigure}%
\caption{Comparison of theoretic and actual Moran Process fixation
probabilities for \textbf{deterministic} strategies.}
\label{fig:comparison_deterministic}
\end{figure}
\begin{figure}[!hbtp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_Cooperator.pdf}
\caption{Random and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_Defector.pdf}
\caption{Random and Defector}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Random_v_TfT.pdf}
\caption{Random and Tit For Tat}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_Cooperator.pdf}
\caption{All C or all D and Cooperator}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_Defector.pdf}
\caption{All C or all D and Defector}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_Random.pdf}
\caption{Calculator and Random}
\end{subfigure}%
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_ALLCorALLD.pdf}
\caption{Calculator and All C or all D}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/Calculator_v_Arrogant_QLearner.pdf}
\caption{Calculator and Arrogant Q learner}
\end{subfigure}%
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.8\textwidth]{./img/ALLCorALLD_v_TfT.pdf}
\caption{All C or all D and Tit For Tat}
\end{subfigure}%
\caption{Comparison of theoretic and actual Moran Process
fixation probabilities for \textbf{stochastic} strategies.}
\label{fig:comparison_stochastic}
\end{figure}
\section{Empirical results}\label{sec:empirical_results}
This section outlines the data analysis carried out; all data for this study
are available at~\cite{data}:
\begin{itemize}
\item Section~\ref{sec:two_individuals} considers the specific case of
\(N=2\).
\item Section~\ref{sec:strong_invaders} investigates the effect of
population size on the ability of a strategy to invade another
population. This will highlight how complex strategies with long
memories outperform simpler strategies.
\item Section~\ref{sec:strong_resistors} similarly investigates the
ability to defend against an invasion.
\item Section~\ref{sec:population_size} investigates the relationship
between performance for differing population sizes as well as
taking a close look at zero determinant strategies \cite{Press2012}.
\end{itemize}
\subsection{The special case of \(N=2\)}\label{sec:two_individuals}
When $N=2$ the Moran process effectively measures the distribution of relative
mean payoffs over all possible matches between two players: the strategy
that scores higher than the other more often will fixate more often. For \(N=2\)
the two cases of \(x_1\) and \(x_{N-1}\) coincide, but will be
considered separately for larger \(N\) in sections~\ref{sec:strong_invaders}
and~\ref{sec:strong_resistors}. Figure~\ref{fig:boxplot_2} shows all fixation
probabilities for the strategies considered. The top 16 (10\%) strategies are
shown in
Table~\ref{tbl:summary_top_2}. The top five ranking strategies are:
\begin{enumerate}
\item The top strategy is the Collective Strategy (CS), which has a simple
    handshake mechanism, described in Section~\ref{sec:strategies}.
\item Defector, which always defects. Since it has no interactions with other
    defectors (recall that \(N=2\)), its aggressiveness is rewarded.
\item Aggravater, which plays like Grudger (responding to any
    defection with unconditional defection thereafter) but starts by
    playing three defections.
\item Predator, a finite state machine described in \cite{Ashlock2006}.
\item Handshake, a slightly less aggressive version of the Collective
    Strategy \cite{robson1989}. As long as the opponent plays the initial
    sequence it cooperates. Thus it will do well in a population consisting of
    many members of itself, just as the Collective Strategy does. The
    difference is that CS will defect after the handshake if the opponent
    defects, while Handshake will not.
\end{enumerate}
It is also noted that TF1, TF2 and TF3 all perform well. This is also the only
value of \(N\) for which a zero determinant strategy appears in the top 10\% of
ranked strategies: ZD-extort-4. The performance of zero determinant strategies will be
examined more closely in Section~\ref{sec:population_size}.
\begin{figure}[!hbtp]
\begin{subfigure}{.5\textwidth}
\centering
\includegraphics[height=.7\textheight]{./img/boxplot_2_invade.pdf}
\caption{The fixation probabilities for \(N=2\)}
\label{fig:boxplot_2}
\end{subfigure}%
~
\begin{subfigure}{.5\textwidth}
\scriptsize
\centering
\input{./tex/summary_top_2_invade.tex}
\caption{Top strategies for \(N=2\) (neutral fixation is \(p=0.5\))}
\label{tbl:summary_top_2}
\end{subfigure}
\caption{Performance of strategies for \(N=2\).}
\end{figure}
As will be demonstrated in Section~\ref{sec:population_size}, the results for
\(N=2\) differ from those for larger $N$. Hence these results do not concur with
the literature, which suggests that zero determinant strategies should be
effective for larger population sizes \cite{stewart2013extortion}; however, those
analyses consider stationary behaviour, while this work runs matches for a fixed
number of rounds. The stationarity assumption allows for a deterministic payoff
matrix, leading to conclusions about zero determinant strategies in the space
of memory-one strategies that do not generalize to this context.
\subsection{Strong Invaders}\label{sec:strong_invaders}
In this section the focus is on the ability of a mutant strategy to invade: the
probability of one individual of a given type successfully fixating in a
population of \(N - 1\) other individuals, denoted by \(x_1\).
The ranks of each strategy for all considered values of \(N\) according to mean
\(x_1\) are shown in Figure~\ref{fig:ranks_v_size_invade}.
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_invade.pdf}
\caption{\textbf{Invasion}: Ranks of all strategies according to \(x_1\) for different
population sizes.}
\label{fig:ranks_v_size_invade}
\end{figure}
The fixation
probabilities are shown in
Figures~\ref{fig:boxplot_3_invade},~\ref{fig:boxplot_7_invade}
and~\ref{fig:boxplot_14_invade} for \(N\in\{3, 7, 14\}\) showing the mean
fixation as well as the neutral fixation for each given scenario.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_3_invade.pdf}
\caption{\(N=3\)}
\label{fig:boxplot_3_invade}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_7_invade.pdf}
\caption{\(N=7\)}
\label{fig:boxplot_7_invade}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_14_invade.pdf}
\caption{\(N=14\)}
\label{fig:boxplot_14_invade}
\end{subfigure}
\caption{The fixation probabilities \(x_1\).}
\end{figure}
The top 16 strategies are given in Table~\ref{tbl:top_invade}.
\begin{table}[!hbtp]
\centering
\scriptsize
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_3_invade.tex}
\caption{\(N=3\)}
\end{subfigure}%
~
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_7_invade.tex}
\caption{\(N=7\)}
\end{subfigure}
\begin{subfigure}[t]{.3\textwidth}
\centering
\input{./tex/summary_top_14_invade.tex}
\caption{\(N=14\)}
\end{subfigure}
\caption{Top invaders for \(N\in\{3, 7, 14\}\)}
\label{tbl:top_invade}
\end{table}
It can be seen that apart from CS, none of the strategies
of Table~\ref{tbl:summary_top_2} perform well for \(N\in\{3, 7, 14\}\). The new
top performing strategies are:
\begin{itemize}
\item Grudger (which only performs well for \(N=3\)), a strategy that starts
    by cooperating but defects thereafter if at any point the opponent has
    defected.
\item MEM2, an infinite memory strategy that switches between TFT, TF2T, and
Defector \cite{Li2014}.
\item TF3, the finite state machine trained specifically for Moran processes,
    described in Section~\ref{sec:strategies}.
\item Prober 4, a strategy which starts with a specific 20 move sequence of
cooperations and defections \cite{Prison1998}. This initial sequence serves
    as an approximate handshake.
\item PSO Gambler and Evolved Lookerup 2 2 2, strategies that make use
    of a lookup table mapping the first 2 moves of the opponent as well as
    the last 2 moves of both players to an action. PSO Gambler is a
    stochastic version of the LookerUp which maps those states to
    probabilities of cooperating. The LookerUp was described in
    \cite{Knight2016}.
\item The Evolved ANN strategies are neural networks that map a number of
attributes (first move, number of cooperations, last move, etc.) to
an action. Both of these have been trained using an evolutionary
algorithm.
\item The Evolved FSM 16 is a 16 state finite state machine trained to
perform well in tournaments.
\end{itemize}
Only one of the above strategies is stochastic, although close inspection of the
source code of PSO Gambler shows that it rarely makes stochastic decisions and
is functionally very similar to its deterministic cousin Evolved LookerUp.
The PSO Gambler Mem1 strategy is a memory one strategy that has been trained to
maximise its utility and does perform well.
Apart from TF3, the finite state machines trained specifically for
Moran processes do not appear in the top 5, while strategies trained for
tournaments do. This is due to the nature of invasion: most of the opponents
will initially be different strategies. The next section will consider the
converse situation.
\subsection{Strong resistors}\label{sec:strong_resistors}
In addition to identifying good invaders, strategies resistant to invasion by
other strategies are identified by examining the distribution of $x_{N-1}$ for
each strategy. The ranks of each strategy for all considered values of \(N\)
according to mean \(x_{N-1}\) are shown in
Figure~\ref{fig:ranks_v_size_resist}.
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_resist.pdf}
\caption{\textbf{Resistance}: Ranks of all strategies according to \(x_{N-1}\) for different
population sizes.}
\label{fig:ranks_v_size_resist}
\end{figure}
The fixation probabilities are shown in
Figures~\ref{fig:boxplot_3_resist},~\ref{fig:boxplot_7_resist}
and~\ref{fig:boxplot_14_resist} for \(N\in\{3, 7, 14\}\) showing the mean
fixation as well as the neutral fixation for each given scenario.
\begin{figure}[!hbtp]
\centering
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_3_resist.pdf}
\caption{\(N=3\)}
\label{fig:boxplot_3_resist}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_7_resist.pdf}
\caption{\(N=7\)}
\label{fig:boxplot_7_resist}
\end{subfigure}%
~
\begin{subfigure}{.3\textwidth}
\centering
\includegraphics[height=.9\textheight]{./img/boxplot_14_resist.pdf}
\caption{\(N=14\)}
\label{fig:boxplot_14_resist}
\end{subfigure}
\caption{The fixation probabilities \(x_{N-1}\).}
\end{figure}
Table~\ref{tbl:top_resist} shows the top strategies when ranked
according to \(x_{N-1}\) for \(N\in\{3, 7, 14\}\).
Once again none of the short memory strategies from
Section~\ref{sec:two_individuals} perform well for high \(N\).
\begin{table}[!hbtp]
\scriptsize
\centering
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_3_resist.tex}
\caption{\(N=3\)}
\end{subfigure}%
~
\begin{subfigure}[t]{.5\textwidth}
\centering
\input{./tex/summary_top_7_resist.tex}
\caption{\(N=7\)}
\end{subfigure}
\begin{subfigure}[t]{.3\textwidth}
\centering
\input{./tex/summary_top_14_resist.tex}
\caption{\(N=14\)}
\end{subfigure}
\caption{Top resistors for \(N\in\{3, 7, 14\}\)}
\label{tbl:top_resist}
\end{table}
Interestingly, none of these strategies is stochastic: resisting invasion
requires behaving consistently when interacting with one's own kind, since
acting stochastically increases the chance of friendly fire.
However it is possible to design a strategy with a stochastic or error-correcting
handshake that is an excellent resistor even in noisy environments \cite{Lee2015}.
There are only two new strategies that appear in the top ranks for
\(x_{N-1}\): TF1 and TF2. These two strategies are, together with CS, the
strongest resistors. They all have handshakes, and whilst the handshakes of CS
and Handshake (which ranks highly for the smaller values of \(N\)) were
programmed, the handshakes of TF1 and TF2 evolved through an evolutionary
process without any priming.
As described in Section~\ref{sec:strong_invaders}, the strategies trained with
the payoff maximizing objective are among the best invaders in the library;
however, they are not as resistant to invasion as the strategies trained using a
Moran objective function. The payoff-trained strategies include finite state
machines, yet these do not appear to have handshaking mechanisms. Therefore it
is reasonable to conclude that the objective function is the cause of the
emergence of handshaking mechanisms. More specifically, TF1 and TF2 evolved
handshakes for high invasion resistance, whereas TF3 is a better total payoff
maximizer, which makes it a better invader (along with the strategies
trained to maximize total payoff) since successful fitness proportionate
selection is necessary for invasion. Training with an initial population
mix other than $(N/2, N/2)$ may favor invasion or resistance.
The payoff maximizing strategies typically will not defect before the opponent's
first defection, possibly because the training strategy collection contains some
strategies such as Grudger and Fool Me Once that retaliate harshly by defecting
for the remainder of the match if the opponent has more than a small number of
cumulative defections. Paradoxically it is advantageous to defect (as a signal)
in order to achieve mutual cooperation with opponents using the same strategy
but not with other opponents. Nevertheless an evolutionary process is able to
tunnel through the costs and risks associated with early defections to find
better solutions, so it is not surprising in hindsight that handshaking
strategies emerge from the evolutionary training process.
A handshake requires at least one defection and there is
selective pressure to defect as few times as possible to achieve the
self-recognition mechanism. It is also unwise to defect on the first move as
some strategies additionally retaliate first round defections. So the
handshakes used by TF1, TF2, and CS are in some sense optimal.
It is evident from
Sections~\ref{sec:two_individuals},~\ref{sec:strong_invaders}
and~\ref{sec:strong_resistors} that the performance of strategies not only
depends on the initial population distribution but also differs markedly
depending on whether or not \(N>2\). This will be explored further in the next
section, looking not only at \(x_1\) and \(x_{N-1}\) but also at \(x_{N/2}\).
\subsection{The effect of population size}\label{sec:population_size}
To complement Figures~\ref{fig:ranks_v_size_invade}
and~\ref{fig:ranks_v_size_resist}, Figure~\ref{fig:ranks_v_size_coexist} shows
the rank of each strategy based on \(x_{N/2}\).
Tables~\ref{tbl:ranks_v_size_invade},~\ref{tbl:ranks_v_size_resist}
and~\ref{tbl:ranks_v_size_coexist} show the same information for a selection of
strategies:
\begin{itemize}
\item The strategies that ranked highly for \(N=2\);
\item The strategies that ranked highly for \(N=14\);
\item The zero determinant strategies.
\end{itemize}
The results for \(x_{N/2}\) show similarities to the results for \(x_{N-1}\);
in particular TF1, TF2 and TF3 rank first, third and eighth. This is to be
expected since, as described in Section~\ref{sec:strategies}, these strategies
were trained from an initial population of \((N/2, N/2)\) individuals.
For all starting populations
\(i\in\{1, N/2, N-1\}\) the ranks of strategies are relatively stable across the
different values of \(N>2\); however, for \(N=2\) there is a distinct difference.
This highlights that there is little that can be inferred about the evolutionary
performance of a strategy in a large population from its performance in a small
population. This is confirmed by the performance of the zero determinant strategies: while
some do rank relatively highly for \(N=2\) (ZD-extort-4 has rank 16) this rank
does not translate to larger populations.
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_invade.tex}
\caption{Invasion: Fixation ranks of some strategies according to \(x_1\) for different
population sizes}
\label{tbl:ranks_v_size_invade}
\end{table}
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_resist.tex}
\caption{Resistance: Fixation ranks of some strategies according to \(x_{N-1}\) for different
population sizes}
\label{tbl:ranks_v_size_resist}
\end{table}
\begin{figure}[!hbtp]
\centering
\includegraphics[height=.9\textheight]{./img/average_rank_vs_population_size_coexist.pdf}
\caption{Fixation ranks of all strategies according to \(x_{N/2}\) for different
population sizes.}
\label{fig:ranks_v_size_coexist}
\end{figure}
\begin{table}[!hbtp]
\centering
\scriptsize
\input{./tex/change_of_rank_coexist.tex}
\caption{Ranks of some strategies according to \(x_{N/2}\) for different
population sizes}
\label{tbl:ranks_v_size_coexist}
\end{table}
Figure~\ref{fig:correlation_coefficients} shows the correlation coefficients
of the ranks of strategies across differing population sizes. How well a strategy
performs in any Moran process for \(N>2\) has
little to do with the performance for \(N=2\). This illustrates why the strong
performance of zero determinant strategies predicted in \cite{Press2012} does
not extend to larger populations. This was discussed theoretically in
\cite{Adami2013} and observed empirically in these simulations.
\begin{figure}[!htbp]
\centering
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_invade.pdf}
\caption{Rank based on \(x_1\)}
\end{subfigure}
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_resist.pdf}
\caption{Rank based on \(x_{N - 1}\)}
\end{subfigure}
~
\begin{subfigure}[t]{.3\textwidth}
\centering
\includegraphics[width=.9\textwidth]{./img/correlation_heatmap_coexist.pdf}
\caption{Rank based on \(x_{N/2}\)}
\end{subfigure}
\caption{Heatmap of correlation coefficients of rankings by population size.}
\label{fig:correlation_coefficients}
\end{figure}
\section{Discussion}\label{sec:conclusion}
Training strategies to excel
at the Moran process leads to the evolution of cooperation, but only with like
individuals in the case of TF1 and TF2. This may have significant implications
for human social interactions such as the evolution of ingroup/outgroup mechanisms
and other sometimes costly rituals that reinforce group behavior.
While TF1 and TF2 are competent invaders, the best invaders
in the study do not appear to employ strict handshakes, and are generally
cooperative strategies. TF3, which does not use a handshake, is a better invader
than TF1 and TF2 but not as good a resistor. Nevertheless it was the result
of the same kind of training process and is a better combined invader-resistor
than the invaders that were previously trained to maximize payoff.
The strategies trained to maximize payoff in head-to-head matches are generally
cooperative and are effective invaders.
Combined with the fact that handshaking strategies are stronger resistors,
this suggests that while maximizing individual payoff can lead to the evolution
of cooperation, these strategies are not the most evolutionarily stable
in the long run. A strategy with a handshaking mechanism is still capable of
invading and is more resistant to subsequent invasions. Moreover, the
best resistor among the strategies trained to maximize payoff (Evolved LookerUp
1\_1\_1),
which always defects if the opponent defects in the first round, is effectively
employing a one-shot handshake of C. Similarly, Grudger (also known as Grim),
which emerged from training memory one strategies for the Moran process,
also effectively employs a handshake of always cooperating, as it defects
for the remainder of the match if the opponent ever defects.
The insights that payoff maximizers are better invaders and that handshakers
are better resistors suggest that a strategy
aware of the population distribution could become a handshaker at
a critical threshold and use a strategy better suited to invasion when in the
minority. Information about the population distribution was not available
to our strategies. Previous work has shown that strategies able to retain
memory across matches can infer the population distribution and act in such
a manner, resulting in a strategy effective at both invasion and resistance
\cite{Lee2015}.