07/05: permutation magic

Tags: R

R doesn't have great string or set manipulation functions, but you can accomplish a lot with factors and apply(). For instance, I had a column in a data frame in which each value consisted of three two-character tokens concatenated together (e.g. 'abdqbk', 'cbdabb'), representing a set of three sounds played in order. I needed to recode this as two factors, one indicating which set of symbols had been used in a particular trial, and another indicating the order in which they had been played. Solving the problem turned out to be an interesting illustration of how your tools constrain your thinking, and in this case the result wound up being fairly elegant (IMHO). If I had been working in Python, I probably would have written a couple of loops to run through the strings, one to find the unique sets of symbols and the other to assign unique values to each permutation.



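permutations <- function(stims, toklen=2) {
  # NOTE: the signature and right-hand sides are reconstructed from context;
  # toklen (characters per token) is an assumed parameter
  # assume all stim names have the same length but otherwise stay flexible

  stims <- as.character(stims)
  stimlen <- nchar(stims[1])
  split.points <- rep(toklen, stimlen / toklen)
  toks <- t(sapply(stims, strmsplit, split.points, USE.NAMES=FALSE))

  # sort tokens and use factor to compute unique sets

  sets <- factor(apply(toks, 1, function(x) paste(sort(x), collapse="")))

  # within each set, number the distinct orderings

  perms <- lapply(split(stims, sets),
                   function(x) {as.numeric(factor(as.character(x)))})

  data.frame(sets, perms=unsplit(perms,sets))

}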
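To make the two-factor recoding concrete, here is a small self-contained sketch of the same idea. It is not the function above: the example strings are made up, the tokens are assumed to be two characters wide, and base substring() stands in for strmsplit() so the snippet runs on its own.

```r
# Hypothetical example data: three two-character tokens per stimulus
stims <- c("abdqbk", "dqabbk", "cbdabb", "dabbcb", "abdqbk")

# Cut each string into its two-character tokens (one row per stimulus)
starts <- seq(1, nchar(stims[1]), by = 2)
toks <- t(sapply(stims, substring, starts, starts + 1, USE.NAMES = FALSE))

# Sorting the tokens makes every ordering of the same symbols identical,
# so factor() assigns one level per unique set
sets <- factor(apply(toks, 1, function(x) paste(sort(x), collapse = "")))

# Within each set, factor() again numbers the distinct orderings
perms <- unsplit(lapply(split(stims, sets),
                        function(x) as.numeric(factor(x))), sets)

result <- data.frame(stim = stims, sets, perms)
print(result)
```

Rows 1, 2 and 5 land in one set and rows 3 and 4 in the other. Rows 1 and 5 are the same string, so they also share a permutation code.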

strmsplit() is another little bit of apply() magic that splits a string at a set of fixed cut points. The code is from One R Tip A Day.



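strmsplit <- function(s, lens) {
  # split a string into multiple bits based on cut points
  # e.g. strmsplit('st378akbkzk',c(5,2,2,2)) = c('st378','ak','bk','zk')
  # from http://onertipaday.blogspot.com/2007/06/string-manipulation-insert-delim.html
  # NOTE: the signature and right-hand sides are reconstructed from the example above

  # first character of each piece, then a two-column matrix of (start, end)
  start <- cumsum(c(1, lens[-length(lens)]))
  sel <- cbind(start, start + lens - 1)
  apply(sel, 1, function(x) substr(s, x[1], x[2]))

}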
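The same fixed-width split can probably be done without apply(), because base substring() is vectorized over its start and end positions. This is a minimal sketch; split_fixed is a hypothetical name rather than anything from the original tip, and it assumes a single input string.

```r
# Hypothetical helper: same inputs as strmsplit(), a string and piece lengths
split_fixed <- function(s, lens) {
  ends <- cumsum(lens)          # last character of each piece
  starts <- ends - lens + 1     # first character of each piece
  substring(s, starts, ends)    # substring() recycles s across the cut points
}

pieces <- split_fixed("st378akbkzk", c(5, 2, 2, 2))
print(pieces)
```

Computing the end points with cumsum() and deriving the starts from them avoids having to drop the last length, which the start-first approach needs.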
