R doesn't have great string or set manipulation functions, but you can accomplish a lot by using factors and apply(). For instance, I had a column in a data frame consisting of three two-character tokens concatenated together (e.g. 'abdqbk', 'cbdabb'), each representing a set of three sounds played in order. I needed to recode this column as two factors: one indicating which set of symbols had been used in a particular trial, and another indicating the order in which they had been played. Solving the problem turned out to be an interesting illustration of how your tools constrain your thinking, and in this case the result wound up being fairly elegant (IMHO). If I had been working in Python, I probably would have written a couple of loops over the strings: one to find the unique sets of symbols, and another to assign a unique value to each permutation.
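permutations <- function(stims) {
  # assume all stims have the same length name but otherwise stay flexible
  stimlen <- nchar(as.character(stims[1]))
  split.points <- rep(2, stimlen / 2)   # assumes two-character tokens
  toks <- t(sapply(as.character(stims), strmsplit, split.points, USE.NAMES = FALSE))
  # sort tokens and use factor to compute unique sets
  sets <- factor(apply(toks, 1, function(x) paste(sort(x), collapse = "")))
  # number the permutations within each set (alphabetical order of the full name)
  perms <- lapply(split(stims, sets),
                  function(x) {as.numeric(factor(as.character(x)))})
  data.frame(sets, perms = unsplit(perms, sets))
}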
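For comparison, here is a minimal base-R sketch of the same recoding that swaps in `substring()` for the splitting step and `ave()` for the per-set numbering. The function name `recode_stims()` and its `toklen` argument are hypothetical, and like the code above it assumes every name has the same length and is made of equal-length tokens.

```r
# Hypothetical helper: recode names made of equal-length tokens into a
# set factor and a within-set permutation number, using base R only.
recode_stims <- function(stims, toklen = 2) {
  stims <- as.character(stims)
  n <- nchar(stims[1])                 # assume all names share this length
  first <- seq(1, n, by = toklen)      # start position of each token
  last <- first + toklen - 1           # end position of each token
  # sort each name's tokens and paste them back together to label its set
  sets <- vapply(stims, function(s) {
    paste(sort(substring(s, first, last)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  sets <- factor(sets)
  # ave() runs FUN on the indices within each set; match() against the
  # sorted unique names numbers the permutations 1, 2, ... alphabetically
  perms <- ave(seq_along(stims), sets,
               FUN = function(i) match(stims[i], sort(unique(stims[i]))))
  data.frame(sets, perms = as.integer(perms))
}

recode_stims(c("abdqbk", "cbdabb", "bkabdq", "abbkdq"))
```

Here the first, third and fourth names fall into the same set ('abbkdq') and get permutation numbers 2, 3 and 1, the same numbering permutations() produces, since both rank permutations alphabetically within a set.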
strmsplit() is another little bit of apply() magic that splits a string at a set of fixed cut points. The code is from One R Tip A Day.
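strmsplit <- function(s, cuts) {
  # split a string into multiple bits based on cut points
  # e.g. strmsplit('st378akbkzk',c(5,2,2,2)) = c('st378','ak','bk','zk')
  # from http://onertipaday.blogspot.com/2007/06/string-manipulation-insert-delim.html
  start <- c(1, cumsum(cuts)[-length(cuts)] + 1)  # first character of each piece
  sel <- cbind(start, start + cuts - 1)           # one (start, stop) row per piece
  apply(sel, 1, function(x) substr(s, x[1], x[2]))
}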
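Since `substring()` is vectorised over its start and stop positions, the `apply()` over a matrix of cut points can probably be dropped. A minimal sketch; `strmsplit2()` is a hypothetical name, not part of the original tip:

```r
# Hypothetical vectorised variant of strmsplit(): substring() accepts vectors
# of start and stop positions, so no apply() over a matrix is needed.
strmsplit2 <- function(s, cuts) {
  stop_at <- cumsum(cuts)          # last character of each piece
  start_at <- stop_at - cuts + 1   # first character of each piece
  substring(s, start_at, stop_at)
}

strmsplit2("st378akbkzk", c(5, 2, 2, 2))  # returns c("st378", "ak", "bk", "zk")
```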