Superior Numbers: Pokemon Type Analysis in R

Lately I've had a fascination with Pokemon. It's a game full of numbers and those numbers call for analysis. As I've been playing White 2 in preparation for the release of X/Y in October, I started to wonder what type pairings of the little monsters would have good resistances. All I really know is that my current Ferrothorn is pretty cool and that Spiritomb supposedly has no weaknesses, but I wanted to know more. Specifically, which type combos have the most resistances and immunities? Let's see how R might help us figure this out.

Luckily, veekun, a site I frequent, shares its pocket monster data with anyone. The data I'm looking for is kept as CSV files in the pokedex/data/csv folder. After getting the data, let's load it into R.

  # load the data  
  types <- read.csv("types.csv")  
  type_efficacy <- read.csv("type_efficacy.csv")

Now let's clean up the data a bit.

 # get rid of some columns we don't care about  
 types <- types[, 1:2]  
 # scale down damage factor  
 type_efficacy$damage_factor <- type_efficacy$damage_factor / 100

Ok, great. If we look at our type_efficacy data frame, we have ids for the damage and target types. I would like to replace those ids with plain text labels that I can understand a little more easily. This would be super easy in SQL, but for some reason it seems to be a real pain in R. Maybe there is a better answer.

 # hacky way to get this to work. needs a better solution.  
 # join from plyr won't work because column names are not the same?  
 te <- merge(type_efficacy, types, by.x = "damage_type_id", by.y = "id", all = TRUE)  
 names(te)[[4]] <- "damage_type"  
 te <- merge(te, types, by.x = "target_type_id", by.y = "id", all = TRUE)  
 names(te)[[5]] <- "target_type"  
 # get rid of the NA rows and id value columns  
 te <- te[complete.cases(te), c("damage_type", "target_type", "damage_factor")]
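
On the "better answer" question: one possibly simpler alternative to the double merge is base R's match(), which works like a lookup. Here is a minimal sketch on a toy three-type table. The toy rows are stand-ins I made up for illustration, not the full veekun files, and the damage factors are assumed to be already scaled.

```r
# Toy stand-ins for the two tables (illustrative subset only)
types <- data.frame(id = c(1, 2, 3),
                    identifier = c("normal", "fighting", "flying"),
                    stringsAsFactors = FALSE)
type_efficacy <- data.frame(damage_type_id = c(2, 2, 3),
                            target_type_id = c(1, 3, 2),
                            damage_factor  = c(2, 0.5, 2))

# match() gives, for each id, its row position in types$id;
# indexing types$identifier with that position returns the label
te <- data.frame(
  damage_type   = types$identifier[match(type_efficacy$damage_type_id, types$id)],
  target_type   = types$identifier[match(type_efficacy$target_type_id, types$id)],
  damage_factor = type_efficacy$damage_factor,
  stringsAsFactors = FALSE
)
te
```

Because nothing is merged, no NA rows are created for types without efficacy data and no columns need renaming afterwards.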

Next we need all pairings of the types. Turns out not all of these type pairings exist as real pokemon, but we can deal with that later.

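 # type labels plus NA, so single-type pokemon get a row too  
 # (assumed definition: t is never created in the code above)  
 t <- c(unique(as.character(te$damage_type)), NA)  
 # create all pairings of types  
 tm <- expand.grid(t, t)  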
 colnames(tm) <- c("t1", "t2")  
 # get rid of rows with same type  
 tm <- tm[which(tm$t1 != tm$t2 | is.na(tm$t2)) , ]  
 tm <- tm[!is.na(tm$t1), ]

Great! But, there is a small problem. We have both Steel / Grass and Grass / Steel. The order of the pairings does not matter in calculating the resistances, so let's get rid of those redundant combinations. This is another place where a better solution is probably available, but this quick function makes it fast and easy to identify these duplicated pairings. I'll source this function, run it against the type pairings data frame and then get rid of all duplicated pairings.

 # quick function to find combinations that are already in the data frame  
 tagDups <- function(df = tm) {  
  df$dup = FALSE  
  for (i in seq(1, nrow(df))) {  
   v1 = df[1:i, ]$t1 == df[i, ]$t2  
   v2 = df[1:i, ]$t2 == df[i, ]$t1  
     
   if(length(which(v1 & v2)) > 0) {  
    df[i, ]$dup = TRUE  
   }  
  }  
  return(df)  
 }  
   
 # and get rid of those rows  
 tm <- tagDups()  
 tm <- tm[!tm$dup, 1:2]
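
For what it's worth, base R's combn() might be the better solution here, since it lists each unordered pair exactly once and so no duplicates need to be tagged at all. A minimal sketch, assuming a short made-up vector of type names (type_names and tm_alt are hypothetical names, not part of the code above):

```r
# Hypothetical short list of types, just for illustration
type_names <- c("grass", "steel", "ghost", "dark")

# combn() returns every unordered pair once, one pair per column
pairs <- t(combn(type_names, 2))
tm_alt <- data.frame(t1 = pairs[, 1], t2 = pairs[, 2],
                     stringsAsFactors = FALSE)

# add the single-type rows, with NA as the second type
tm_alt <- rbind(tm_alt,
                data.frame(t1 = type_names, t2 = NA_character_,
                           stringsAsFactors = FALSE))
nrow(tm_alt)
```

With the 17 types in this data set, the same approach would give choose(17, 2) = 136 dual-type rows plus 17 single-type rows, 153 in total.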

Back to the damage calculations: I would like a data frame where I can easily look up the damage factors for each type. I'll cast the labeled efficacy data frame (te) from earlier into a more convenient form.

 # create a damage / target matrix  
 library(reshape2)  
 # name the value column explicitly so dcast does not have to guess it  
 type_dmg <- dcast(te, damage_type ~ target_type, value.var = "damage_factor")  
   
 head(type_dmg)  
   damage_type bug dark dragon electric fighting fire flying ghost grass ground ice normal poison psychic rock steel water  
 1         bug 1.0  2.0    1.0      1.0      0.5  0.5    0.5   0.5   2.0      1   1      1    0.5     2.0  1.0   0.5   1.0  
 2        dark 1.0  0.5    1.0      1.0      0.5  1.0    1.0   2.0   1.0      1   1      1    1.0     2.0  1.0   0.5   1.0  
 3      dragon 1.0  1.0    2.0      1.0      1.0  1.0    1.0   1.0   1.0      1   1      1    1.0     1.0  1.0   0.5   1.0  
 4    electric 1.0  1.0    0.5      0.5      1.0  1.0    2.0   1.0   0.5      0   1      1    1.0     1.0  1.0   1.0   2.0  
 5    fighting 0.5  2.0    1.0      1.0      1.0  1.0    0.5   0.0   1.0      1   2      2    0.5     0.5  2.0   2.0   1.0  
 6        fire 2.0  1.0    0.5      1.0      1.0  0.5    1.0   1.0   2.0      1   2      1    1.0     1.0  0.5   2.0   0.5

Ok, it's a wide table, but you get the idea: rows are the attacking types and columns are the defending types. Now we can get the damage-taken profile for a type with a single command:

 # now we can get the damage taken profile for a type  
 type_dmg[, "ice"]

At this point we have a data frame listing all type pairings in tm and a lookup table for a type damage profile in type_dmg. This should give us enough to do all the analysis now.

First, we need a quick function to create the pairing summaries for us.

 # returns damage taken vector for type pairing  
 calcRes <- function(row, td = type_dmg) {  
  t1 = as.character(row$t1)  
  t2 = as.character(row$t2)  
  if(is.na(row$t2)) {  
   res <- td[, t1]  
  } else {  
   res <- (td[, t1] * td[, t2])  
  }  
    
  immunities <- length(which(res == 0))  
  resistances <- length(which(res > 0 & res < 1))  
  standards <- length(which(res == 1))  
  weakness <- length(which(res == 2))  
  kryptonite <- length(which(res == 4))  
    
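  # return a list so the counts and factors stay numeric (c() would turn everything into strings)  
  return(c(list(t1, t2, immunities, resistances, standards, weakness, kryptonite), as.list(res)))  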
 }
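
The multiplication in calcRes is easy to check by hand on a small example. Here is a sketch for my Ferrothorn (grass / steel) against four attacking types. toy_dmg is a hypothetical cut-down stand-in for type_dmg, holding only the rows and columns needed here.

```r
# Toy slice of the lookup table: rows are attacking types,
# columns are defending types
toy_dmg <- data.frame(damage_type = c("fighting", "fire", "grass", "water"),
                      grass = c(1, 2, 0.5, 0.5),
                      steel = c(2, 2, 0.5, 1))

# damage taken by a grass / steel pokemon is the product of the two columns
res <- toy_dmg[, "grass"] * toy_dmg[, "steel"]
names(res) <- toy_dmg$damage_type
res
```

Fire comes out at 4 (the "kryptonite" case), fighting at 2, grass at 0.25 and water at 0.5, so of these four attacking types Ferrothorn resists two and is weak to two.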

And an object to store our results.

 type_summary <- data.frame(t1 = character(),   
               t2 = character(),  
               immunities = numeric(),  
               resistances = numeric(),  
               standards = numeric(),  
               weaknesses = numeric(),  
               kryptonite = numeric(),  
               bug    = numeric(),  
               dark    = numeric(),  
               dragon   = numeric(),  
               electric  = numeric(),  
               fighting  = numeric(),  
               fire    = numeric(),  
               flying   = numeric(),  
               ghost   = numeric(),  
               grass   = numeric(),  
               ground   = numeric(),  
               ice    = numeric(),  
               normal   = numeric(),  
               poison   = numeric(),  
               psychic  = numeric(),  
               rock    = numeric(),  
               steel   = numeric(),  
               water   = numeric(),  
               stringsAsFactors=FALSE)

Finally, for each type pairing, let's calculate its resistances. I should probably have done this with sapply or something, but for only a few hundred rows, I guess it's ok to do it the "easiest" way I know.

 for(i in seq(1, nrow(tm))) {  
  type_summary[i, ] <- calcRes(tm[i, ])  
 }
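
For comparison, here is roughly what the apply-style version could look like: build one small data frame per pairing with lapply() and stack them with a single rbind, which also removes the need to pre-declare an empty results object. This is only a sketch on toy inputs. summarise_pair is a hypothetical, cut-down stand-in for calcRes, and toy_dmg / toy_tm are made-up miniatures of type_dmg and tm.

```r
# Toy inputs (illustrative subset): a small damage table and two pairings
toy_dmg <- data.frame(damage_type = c("fighting", "fire", "grass", "water"),
                      grass = c(1, 2, 0.5, 0.5),
                      steel = c(2, 2, 0.5, 1))
toy_tm <- data.frame(t1 = c("grass", "steel"),
                     t2 = c("steel", NA),
                     stringsAsFactors = FALSE)

# hypothetical helper: one-row summary for a single pairing
summarise_pair <- function(t1, t2, td) {
  res <- if (is.na(t2)) td[, t1] else td[, t1] * td[, t2]
  data.frame(t1 = t1, t2 = t2,
             resistances = sum(res > 0 & res < 1),
             weaknesses  = sum(res == 2),
             kryptonite  = sum(res == 4),
             stringsAsFactors = FALSE)
}

# one row per pairing, then stack all rows at once
rows <- lapply(seq_len(nrow(toy_tm)), function(i) {
  summarise_pair(toy_tm$t1[i], toy_tm$t2[i], toy_dmg)
})
toy_summary <- do.call(rbind, rows)
toy_summary
```

Each row is built as a data frame, so the counts stay numeric and the type names stay character without any extra conversion.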

After it's all done, we have a nice data set (available as CSV and Rda).

That was fun. Now I can look at various type pairings as I plan my team for X/Y in October. Surely this will make X/Y more fun for me. Or actually, maybe this was the fun part. No insight to share this time, maybe in the next post. Of course, thanks to veekun and his pokedex for the raw data.
