# Grid'5000 user report for Olivier Teytaud

## User information

Olivier Teytaud (users, user, bordeaux, ml-users user)*More user information in the user management interface.*

## Experiments

**Parallelization of Monte-Carlo planning (Other) []**

*Description:*Monte-Carlo planning works but has no straightforward parallelization. We work on the parallel approximation of the sequential algorithm. Things become complicated beyond 4 nodes; therefore, we are mainly interested in 8-cores machines.*Results:*speed-up measurement for both SMP and MPI**random patterns (Programming) [in progress]**

*Description:*The goal is to find one pattern which is created randomly and improves a player of go. The final goal is that the player can improve itself by testing different patterns and incorporating these "good" patterns without human intervention. The algorithm is called RBGP (Racing Based Genetic Programming)

RBGP algorithm.

S = S0 = some initial set of mutations.

while S = ∅ do

Select s ∈ S

// the selection rule is not specified here

// (the result is independent of it)

Let n be the number of simulations of mutation s.

Simulate s n more times (i.e. now s has been simulated 2n times).

//this ensures nbT ests(s) = O(log(n(s)))

computeBounds(s)

if lb(s) > 0.501 then

Accept s; exit the program.

else if ub(s) < 0.504 then

S ← S \ {s}

end if

end while

(In our experiments, a mutation is a random pattern)

By race, it develops patterns which seems the most promising. The player with one tested pattern plays against the same player without this tested pattern. Thanks to results of all games, we compute an interval of confidence by using Hoeffding. A pattern will be validated if the inferior boundary of Hoeffding will be bigger than 50.1%. The experiment stops when the best chosen pattern is validated or the superior boundary of Hoeffding is inferior than 50.4%.

When one random pattern has been validated or all random patterns have been rejected, a new iteration begins. (complete RBGP Algorithm)

The complete RBGP algorithm

Let P be the current agent.

Let λ the number of agents.

for a given number of iterations do

Create randomly a population pop of λ agents by mutation of P

Apply RBGP BernsteinRace(pop, δ)

if An agent P ′ ∈ pop has been accepted then

P ′ becomes the current baseline: P ← P ′ .

else

The current baseline is not changed.

end if

end for

(In our experiment, an agent is the program P + a mutation).

This experiment is launched in besteffort.

For the moment, the experiment turns on NoGo (This a variant of Go where the first player who captures one or several stones has lost). The goal is to compare for different value of λ the speed of the progression.*Results:*With NoGo

The baseline is NoGo without random patterns.

λ = 1 - Site Orsay

45 random patterns found (STOPPED)

Against the baseline 69.63% +/-0.3

λ = 2 - Site Lyon

28 random patterns found (STOPPED)

Against the baseline 65.71% +/-0.3

λ = 16 - Site Lyon

25 random patterns found (IN PROGRESS)

Against the baseline 65.11% +/-0.3

λ = 128 - Site Orsay

3 random patterns found (IN PROGRESS)

Against the baseline 54.12% +/-0.3

Old results with MoGo :

one "random" pattern has been found on a run of 20 random patterns in 9x9 go.

Individu 20 : SUCCESS

O!X! 33528 - 32064 [ 50.73% ; 51.11% ; 51.49% ]

Hoeffding: 50139 --- 52092

4 randoms patterns (P1,P2,P3 and P4) have been validated in 9x9

9x9 - 10000 simulations per move

MoGoCVS+P1 against MoGoCVS 66707 - 64498 [ 50.57% ; 50.84% ; 51.11% ]

MoGoCVS+P1+P2 against MoGoCVS+P1 33600 - 32027 [ 50.81% ; 51.19% ; 51.58% ]

MoGoCVS+P1+P2 against MoGoCVS 52018 - 48186 [ 51.6% ; 51.91% ; 52.22% ]

MoGoCVS+P1+P2+P3 against MoGoCVS+P1+P2 33432 - 32253 [ 50.51% ; 50.89% ; 51.27% ]

MoGoCVS+P1+P2+P3 against MoGoCVS 52634 - 47400 [ 52.3% ; 52.61% ; 52.92% ]

MoGoCVS+P1+P2+P3+P4 against MoGoCVS+P1+P2+P3 66396 - 64749 [ 50.35% ; 50.62% ; 50.89% ]

MoGoCVS+P1+P2+P3+P4 against MoGoCVS 53569 - 46622 [ 53.15% ; 53.46% ; 53.77% ]

**Parallel coevolution (Programming) [in progress]**

*Description:*Fictitious play (aka coevolution) is a great tool for robust optimization, games, and approximate high scale linear programming. We test it on various problems (with or without noise in the fitness) and in particular check both theoretically and experimentally its parallelization. We also test the impact of B. Kummer's modifications and others in the parallel setting (the parallelization is far less straightforward).*Results:*We're at the beginning. Very positive results in terms of interaction with experts. Not yet clear results for sparse problems in the parallel case. The figure shows an application; the very good opening in the shown game was built by a parallel coevolution. However, it was built by the version before we started big besteffort campaigns. I'll try to take time for posting nicer things soon.

## Publications

**On the Parallelization of Monte-Carlo planning [2008]***(national)**EntryType:*inproceedings *Author:*{G}elly, {S}ylvain and {H}oock, {J}ean-{B}aptiste and {R}immel, {A}rpad and {T}eytaud, {O}livier and {K}alemkarian, {Y}ann *Abstract:*{W}e provide a parallelization with and without shared-memory for {B}andit-{B}ased {M}onte-{C}arlo {P}lanning algorithms, applied to the game of {G}o. {T}he resulting algorithm won the first non-blitz game against a professionnal human player in 9x9 {G}o. *Keywords:*{P}arallelization;{M}onte-{C}arlo {P}lanning;{B}andits *Language:*{A}nglais *Affiliation:*{TAO} - {INRIA} {F}uturs - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {B}ull - soci{\'e}t{\'e} {B}ull *Booktitle:*{ICINCO} *Address:*{M}adeira {P}ortugal *Audience:*internationale *Url:*http://hal.inria.fr/inria-00287867/en/ **Log(lambda) Modifications for Optimal Parallelism [2010]***(international)**EntryType:*inproceedings *Hal_id:*inria-00495087 *Url:*http://hal.inria.fr/inria-00495087/PDF/autoparacnf.pdf *Author:*{T}eytaud, {F}abien and {T}eytaud, {O}livier *Abstract:*{I}t is usually considered that evolutionary algorithms are highly parallel. {I}n fact, the theoretical speed-ups for parallel optimization are far better than empirical results; this suggests that evolutionary algorithms, for large numbers of processors, are not so efficient. {I}n this paper, we show that in many cases automatic parallelization provably provides better results than the standard parallelization consisting of simply increasing the population size lambda. {A} corollary of these results is that logarithmic bounds on the speed-up (as a function of the number of computing units) are tight within constant factors. {I}mportantly, we propose a simple modification, termed log(lambda)-correction, which strongly improves several important algorithms when lambda is large. *Language:*{E}nglish *Affiliation:*{TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} *Booktitle:*{P}arallel {P}roblem {S}olving {F}rom {N}ature *Address:*{K}rakow {P}oland *Audience:*international *Day:*15 *Month:*09 **On the parallel speed-up of Estimation of Multivariate Normal Algorithm and Evolution Strategies [2009]***(international)**EntryType:*inproceedings *Hal_id:*inria-00369781 *Url:*http://hal.inria.fr/inria-00369781/PDF/lambdaLarge.pdf *Author:*{T}eytaud, {F}abien and {T}eytaud, {O}livier *Abstract:*{M}otivated by parallel optimization, we experiment {EDA}-like adaptation-rules in the case of $\lambda$ large. {T}he rule we use, essentially based on estimation of multivariate normal algorithm, is (i) compliant with all families of distributions for which a density estimation algorithm exists (ii) simple (iii) parameter-free (iv) better than current rules in this framework of $\lambda$ large. {T}he speed-up as a function of $\lambda$ is consistent with theoretical bounds. *Language:*{E}nglish *Affiliation:*{I}nstitut {N}ational de la {R}echerche en {I}nformatique et en {A}utomatique - {INRIA} {FUTURS} - {INRIA} - {UFR} {S}ciences - {U}niversit{\'e} {P}aris-{S}ud {XI} - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {F}uturs - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} *Booktitle:*{E}vo{N}um (evostar workshop) {P}roceedings of {E}vo{S}tar workshop 2009 *Publisher:*springer *Address:*{T}uebingen {G}ermany *Volume:*{E}vo{N}um *Note:*{G}.: {M}athematics of {C}omputing/{G}.1: {NUMERICAL} {ANALYSIS}/{G}.1.3: {N}umerical {L}inear {A}lgebra, {G}.: {M}athematics of {C}omputing/{G}.3: {PROBABILITY} {AND} {STATISTICS} *Audience:*international **The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments [2009]***(international)**EntryType:*article *Hal_id:*inria-00369786 *Url:*http://hal.inria.fr/inria-00369786/PDF/TCIAIG-2008-0010_Accepted_.pdf *Author:*{L}ee, {C}hang-{S}hing and {W}ang, {M}ei-{H}ui and {C}haslot, {G}uillaume and {H}oock, {J}ean-{B}aptiste and {R}immel, {A}rpad and {T}eytaud, {O}livier and {T}sai, {S}hang-{R}ong and {H}su, {S}hun-{C}hin and {H}ong, {T}zung-{P}ei *Abstract:*{I}n order to promote computer {G}o and stimulate further development and research in the field, the event activities, “{C}omputational {I}ntelligence {F}orum” and “{W}orld 99 {C}omputer {G}o {C}hampionship,” were held in {T}aiwan. {T}his study focuses on the invited games played in the tournament, “{T}aiwanese {G}o players versus the computer program {M}o{G}o,” held at {N}ational {U}niversity of {T}ainan ({NUTN}). {S}everal {T}aiwanese {G}o players, including one 9-{D}an professional {G}o player and eight amateur {G}o players, were invited by {NUTN} to play against {M}o{G}o from {A}ugust 26 to {O}ctober 4, 2008. {T}he {M}o{G}o program combines {A}ll {M}oves {A}s {F}irst ({AMAF})/{R}apid {A}ction {V}alue {E}stimation ({RAVE}) values, online “{UCT}-like” values, offline values extracted from databases, and expert rules. {A}dditionally, four properties of {M}o{G}o are analyzed including: (1) the weakness in corners, (2) the scaling over time, (3) the behavior in handicap games, and (4) the main strength of {M}o{G}o in contact fights. {T}he results reveal that {M}o{G}o can reach the level of 3 {D}an with, (1) good skills for fights, (2) weaknesses in corners, in particular for “semeai” situations, and (3) weaknesses in favorable situations such as handicap games. {I}t is hoped that the advances in artificial intelligence and computational power will enable considerable progress in the field of computer {G}o, with the aim of achieving the same levels as computer chess or {C}hinese chess in the future. *Language:*{A}nglais *Affiliation:*{N}ational {U}niversity of {T}ainan - {NUTN} - {NUTN} - maastricht university - univ. {M}aastricht - {TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {F}uturs - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {D}ept. of {I}nformation {M}anagement - {CJCU} - {D}ept. of {C}omputer {S}cience and {I}nformation {E}ngineering - {K}aoshiung university *Publisher:*{IEEE} *Journal:*{IEEE} {T}ransactions on {C}omputational {I}ntelligence and {AI} in games *Audience:*internationale **Why one must use reweighting in Estimation Of Distribution Algorithms [2009]***(international)**EntryType:*inproceedings *Hal_id:*inria-00369780 *Url:*http://hal.inria.fr/inria-00369780/PDF/weighted.pdf *Author:*{T}eytaud, {F}abien and {T}eytaud, {O}livier *Abstract:*{W}e study the update of the distribution in {E}stimation of {D}istribution {A}lgorithms, and show that a simple modification leads to unbiased estimates of the optimum. {T}he simple modification (based on a proper reweighting of estimates) leads to a strongly improved behavior in front of premature convergence. *Language:*{E}nglish *Affiliation:*{I}nstitut {N}ational de la {R}echerche en {I}nformatique et en {A}utomatique - {INRIA} {FUTURS} - {INRIA} - {UFR} {S}ciences - {U}niversit{\'e} {P}aris-{S}ud {XI} - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {F}uturs - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} *Booktitle:*{GECCO} *Address:*{M}ontr{\'e}al {C}anada *Note:*{G}.: {M}athematics of {C}omputing/{G}.1: {NUMERICAL} {ANALYSIS}/{G}.1.6: {O}ptimization, {G}.: {M}athematics of {C}omputing/{G}.3: {PROBABILITY} {AND} {STATISTICS} *Audience:*international **Introduction de connaissances expertes en Bandit-Based Monte-Carlo Planning avec application au Computer-Go [2008]***(international)**EntryType:*inproceedings *Author:*{C}hatriot, {L}ouis and {G}elly, {S}ylvain and {H}oock, {J}ean-{B}aptiste and {P}{\'e}rez, {J}ulien and {R}immel, {A}rpad and {T}eytaud, {O}livier *Abstract:*{N}ous ajoutons diff{\'e}rentes astuces d'expertise {G}o dans un programmation de planification {M}onte-{C}arlo {\`a} partir de bandits, via des simulations virtuelles ajout{\'e}es aux statistiques de bandits. *Language:*{F}ran{\c{c}}ais *Affiliation:*{TAO} - {INRIA} {S}aclay - {I}le de {F}rance - {INRIA} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} - {L}aboratoire de {R}echerche en {I}nformatique - {LRI} - {CNRS} : {UMR}8623 - {U}niversit{\'e} {P}aris {S}ud - {P}aris {XI} *Booktitle:*{JFPDA} *Address:*{M}etz {F}rance *Audience:*nationale *Url:*http://hal.inria.fr/inria-00287883/en/

## Collaborations

## Success stories and benefits from Grid'5000

**Success stories***First ever win against a professional player in 19x19 Go (http://www.cs.unimaas.nl/g.chaslot/muyungwan-mogo/). (2008)
*First ever win as black (disadvantageous side) in 9x9 Go komi 7.5 against a top pro (2009).
*First ever (and still only) win with H7 against a top pro.
*First ever win with H6 against a pro.
*ChessBase award 2009: we got the chessBase award for the biggest contribution to computer games in 2009. (several people in Grid5000, collectively)**Overall benefits**(i) Plenty of tuning.
(ii) High-scale parallelization of our program.
(iii) Coevolution on a grid (not only on a cluster)

*last update: 2010-09-22 15:45:08*