Function that groups values of a list (in R)

I am trying to construct a function which shouldn't be hard to program, but I am having some difficulty conceptualizing it. Hopefully you'll understand my problem better than I do!
I'd like a function that takes a single list of vectors as its argument, something like:
arg1 = list(c(1,2), c(2,3), c(5,6), c(1,3), c(4,6), c(6,7), c(7,5), c(5,8))
The function should output a matrix with two columns (or a list of two vectors, or something like that) where one column contains letters and the other contains numbers. One can think of the argument as a list of the positions/values that should be placed in the same group. If the list contains the vector c(5,6), then the output should somewhere contain the same letter next to the values 5 and 6 in the number column. If it contains the three vectors c(1,2), c(2,3) and c(1,3), then the output should somewhere contain the same letter next to the values 1, 2 and 3 in the number column.
Therefore, calling the function on the object arg1 should return:
myFun(arg1)
number_column letters_column
            1              A
            2              A
            3              A
            5              B
            6              B
            7              B
            4              C
            6              C
            5              D
            8              D
(The order is not important. The letter E should not appear before the letter D has been used.)
Therefore the function has constructed two groups of 3 (A: [1,2,3] and B: [5,6,7]) and two groups of 2 (C: [4,6] and D: [5,8]). Note that one position or number can belong to several groups.
Please let me know if something is unclear in my question! Thanks!

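As I wrote in the comments, it appears that you want a data frame that lists the maximal cliques of a graph whose edges are defined by the list of vectors. A maximal clique is a set of values that are all pairwise connected and that cannot be extended by adding another value.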
library(igraph)
## create a matrix where each row is an edge
argmatrix <- do.call(rbind, arg1)
## create an undirected igraph object from the matrix of edges
## (called graph.edgelist() in older igraph versions)
gph <- graph_from_edgelist(argmatrix, directed = FALSE)
## returns a list of the maximal cliques of the graph
## (called maximal.cliques() in older igraph versions)
mxc <- max_cliques(gph)
## creates a data frame of the output
dat <- data.frame(number_column = unlist(mxc),
                  group_column = rep.int(seq_along(mxc), times = lengths(mxc)))
## converts group numbers to letters
## ONLY USE if max(dat$group_column) <= 26
dat$group_column <- LETTERS[dat$group_column]
# (the order of the cliques may vary between igraph versions)
#    number_column group_column
# 1              5            A
# 2              8            A
# 3              5            B
# 4              6            B
# 5              7            B
# 6              4            C
# 7              6            C
# 8              3            D
# 9              1            D
# 10             2            D
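If installing igraph is not an option, the same maximal cliques can be found in base R with a short sketch of the Bron-Kerbosch algorithm (basic version, no pivoting). This is a swapped-in technique rather than part of the answer above; `maximal_cliques_bk` is a hypothetical helper name, and the recursion is only practical for small graphs like the one in the question.

```r
# Edges from the question: each vector joins two values
arg1 <- list(c(1,2), c(2,3), c(5,6), c(1,3), c(4,6), c(6,7), c(7,5), c(5,8))

# Hypothetical helper: basic Bron-Kerbosch (no pivoting) on an undirected graph
maximal_cliques_bk <- function(edges) {
  verts <- sort(unique(unlist(edges)))
  # adjacency list: the neighbours of each vertex, keyed by the vertex as a string
  adj <- setNames(lapply(verts, function(v) {
    unique(unlist(lapply(edges, function(e) if (v %in% e) setdiff(e, v))))
  }), verts)
  cliques <- list()
  # R = current clique, P = candidates that can extend it, X = vertices already handled
  bk <- function(R, P, X) {
    if (length(P) == 0 && length(X) == 0) {
      cliques[[length(cliques) + 1]] <<- sort(R)  # R cannot grow, so it is maximal
      return(invisible(NULL))
    }
    for (v in P) {
      nb <- adj[[as.character(v)]]
      bk(c(R, v), intersect(P, nb), intersect(X, nb))
      P <- setdiff(P, v)  # v has been fully explored
      X <- c(X, v)
    }
  }
  bk(numeric(0), verts, numeric(0))
  cliques
}

mxc <- maximal_cliques_bk(arg1)
# Same two-column layout as above (letters only work for up to 26 groups)
dat <- data.frame(number_column  = unlist(mxc),
                  letters_column = rep(LETTERS[seq_along(mxc)], lengths(mxc)))
dat
```

The result should contain the same four groups as the igraph version ([1,2,3], [5,6,7], [4,6] and [5,8]), possibly labelled with different letters.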

Related

data.table ifelse with multiple columns

I have a dataset with tens of columns that looks something like this:
df <- data.frame(id= c(1,1,1,2,2,2,3,3,3), time=c(1,2,3,1,2,3,1,2,3),y1 = rnorm(9), y2= rnorm(9), x = rnorm(9), xb = rnorm(9))
df
# id time y1 y2 x xb
# 1 1 1 -1.1184009 -1.07430118 0.61398523 -0.68343624
# 2 1 2 0.4347047 -0.53454071 -0.30716538 -1.02328242
# 3 1 3 0.2318315 -0.05854228 0.05169733 -0.22130149
# 4 2 1 1.2640080 2.07899296 -0.95918953 -0.35961156
# 5 2 2 -0.4374764 -0.25284854 -0.46251901 0.08630344
# 6 2 3 0.5042690 0.13322671 1.00881113 0.43807458
# 7 3 1 0.3672216 1.92995242 0.48708183 0.58206127
# 8 3 2 -1.5431709 0.53362731 1.17361087 -1.00932195
# 9 3 3 -1.4577268 0.23413541 -0.32399489 -0.91040641
I would like to modify my data frame using the following logic:
df <- setDT(df)[, y1 := ifelse(y1 > x, x, y1)]
df <- setDT(df)[, y2 := ifelse(y2 > xb, xb, y2)]
However, since I have many variables, I would like to do this in a single expression. In other words, I would like to apply this to multiple columns at once, i.e. y1 with x, y2 with xb and so on...
I have tried the following, but it does not seem to work:
mod<-c("y1","y2")
max<-c("x","xb")
df2<-setDT(df)[,(mod):=ifelse(.(mod)>.(max),.(max),.(mod))]
Does anyone know what I am doing wrong, and how I can modify multiple columns with their respective partner columns at once?
Consider using pmin instead of your ifelse. You can try:
library(data.table)
mod <- c("y1", "y2")
max <- c("x", "xb")
setDT(df)
df[, c(mod) := Map(pmin, mget(mod), mget(max))]
Explanation:
pmin takes two (or more) vectors and returns their element-wise (parallel) minimum, the equivalent of your ifelse(y1>x,x,y1);
mget returns a list of objects given their names. For instance mget(c("a","b")) returns a list with the a and b objects (if they exist). Here it is used to retrieve the columns by name inside the data.table;
Map applies a function element-wise over several lists. Map(f,a,b) is equivalent to list(f(a[[1]],b[[1]]),f(a[[2]],b[[2]]),...).
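The same pairing pattern also works on a plain data frame without data.table: Map walks over the two sets of columns in parallel and pmin caps each one. This is a minimal sketch on made-up toy data; the name `lim` is a hypothetical choice that avoids masking `base::max`.

```r
# Toy data (made up): y1 is capped by x, y2 is capped by xb
df <- data.frame(y1 = c(1, 5, 3), y2 = c(4, 0, 9),
                 x  = c(2, 2, 2), xb = c(3, 3, 3))
mod <- c("y1", "y2")  # columns to modify
lim <- c("x", "xb")   # partner column holding each cap
# Map pairs y1 with x and y2 with xb; pmin takes the element-wise minimum
df[mod] <- Map(pmin, df[mod], df[lim])
df
```

Because the column names are plain character vectors, adding another pair only means extending `mod` and `lim`.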

How to transpose a long data frame every n rows

I have a data frame like this:
x=data.frame(type = c('a','b','c','a','b','a','b','c'),
value=c(5,2,3,2,10,6,7,8))
Every item has the attributes a, b and c, but some items may be missing attributes, i.e. only have a and b.
The desired output is
y=data.frame(item=c(1,2,3), a=c(5,2,6), b=c(2,10,7), c=c(3,NA,8))
How can I transform x to y? Thanks
We can use dcast
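library(data.table)
out <- dcast(setDT(x), rowid(type) ~ type, value.var = 'value')
setnames(out, 'type', 'item')
out
#    item a  b  c
# 1:    1 5  2  3
# 2:    2 2 10  8
# 3:    3 6  7 NA
Note that rowid(type) numbers the occurrences of each type independently, so a missing value shifts the later values of that type up: here the 8 ends up in item 2 instead of item 3, unlike the desired y. This approach is only reliable when every item has all of a, b and c.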
Create a grouping vector g assuming each occurrence of a starts a new group, use tapply to create a table tab and coerce that to a data frame. No packages are used.
g <- cumsum(x$type == "a")
tab <- with(x, tapply(value, list(g, type), c))
as.data.frame(tab)
giving:
  a  b  c
1 5  2  3
2 2 10 NA
3 6  7  8
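The tapply result has no item column. Assuming the same "every a starts a new item" rule, the group labels stored in the row names can be turned into that column to reproduce y exactly. A minimal sketch (`y2` is just an illustrative name):

```r
x <- data.frame(type = c('a','b','c','a','b','a','b','c'),
                value = c(5,2,3,2,10,6,7,8))
g <- cumsum(x$type == "a")                       # each "a" starts a new item
tab <- with(x, tapply(value, list(g, type), c))  # items in rows, types in columns
# Turn the row names (the group numbers) into an explicit item column
y2 <- data.frame(item = as.numeric(rownames(tab)), as.data.frame(tab),
                 row.names = NULL)
y2
```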
An alternative definition of the grouping vector, slightly more complex, is needed if some groups are missing a. It assumes that x lists the type values within each group in the order of their levels, so that if a level is lower than the previous level it must be the start of a new group.
# factor() is needed because character columns are not converted to factors by default since R 4.0.0
g <- cumsum(c(-1, diff(as.numeric(factor(x$type)))) < 0)
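To see why the alternative is needed, here is a small made-up example in which the second item has no a. The first rule merges that item into the previous one, while the level-based rule starts a new item whenever the level drops. The data are hypothetical, and the sketch relies on the same assumption that values appear in level order within each item.

```r
# Hypothetical data: item 2 is missing "a", item 3 is missing "c"
x2 <- data.frame(type  = c("a", "b", "c", "b", "c", "a", "b"),
                 value = c(1, 2, 3, 4, 5, 6, 7))
# factor() maps the types to their level numbers: a = 1, b = 2, c = 3
lev <- as.numeric(factor(x2$type))
g_a   <- cumsum(x2$type == "a")        # new item at every "a"
g_lev <- cumsum(c(-1, diff(lev)) < 0)  # new item whenever the level drops
g_a    # 1 1 1 1 1 2 2 : items 1 and 2 are merged
g_lev  # 1 1 1 2 2 3 3
# With the level-based groups every cell holds at most one value
tab2 <- with(x2, tapply(value, list(g_lev, type), c))
tab2
```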
Note that ultimately there must be some restriction on missingness; otherwise, the problem is ambiguous. For example, if one group can be missing b and c and the next group can be missing a, then it cannot be determined whether the b and c that follow form a second group or belong to the first one.

R: Stack error - How to merge multiple columns into one very long column in R

This should be easy but I'm having a lot of difficulty.
I have a relatively large dataset of medications.
What I want is a table of frequencies, but ranging over ALL the columns, so that I can find the medication that appears most often across columns 1:8.
My idea was to combine all of these columns into one long column, one on top of the other. However, I have tried multiple functions (stack, melt, matrix), and they all give me bizarre results. The one that seems right to use is stack, but it keeps returning the error message "Error in stack.data.frame(meds) : no vector columns were selected". I've seen this error on the message boards before; I tried converting with as.vector, but this is not working. The object is definitely of class data.frame.
If there is another way to achieve these table results, that would be great, but either way, it's not working right now. Could somebody help?
Consider do.call or Reduce with the c() function to combine all columns into one vector, and then count the unique meds with an sapply loop:
set.seed(79)
meds <- data.frame(MED1=sample(LETTERS, 8),
MED2=sample(LETTERS, 8),
MED3=sample(LETTERS, 8),
MED4=sample(LETTERS, 8),
MED5=sample(LETTERS, 8),
MED6=sample(LETTERS, 8),
MED7=sample(LETTERS, 8),
MED8=sample(LETTERS, 8), stringsAsFactors = FALSE)
medslist <- do.call(c, meds) # OR Reduce(c, meds)
medslength <- sapply(unique(medslist), function(i) length(medslist[medslist==i]))
medslength <- sort(medslength, decreasing=TRUE)
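head(medslength, 8)
# output from the original answer (sample() changed its default algorithm
# in R 3.6.0, so newer versions may give different counts):
# B U W L I E M R
# 5 5 3 3 3 3 3 3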
Try this to get what you want. No stacking necessary:
df <- data.frame(Col1 = sample(LETTERS, 50, replace = TRUE),
                 Col2 = sample(LETTERS, 50, replace = TRUE))
table(as.matrix(df))
# e.g. (random data without a seed, so the counts will vary):
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 2 3 3 4 3 5 4 3 5 3 4 8 4 5 3 6 5 2 5 4 4 2 4 2 3 4
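Combining the two ideas above, the columns can be flattened into one long vector and counted with table(), then sorted so that the most common medication comes first. This is a minimal sketch on made-up data (the medication names and column layout are hypothetical); converting each column with as.character first gives the same result whether the columns are character or factor.

```r
# Hypothetical medication data with one missing entry
meds <- data.frame(MED1 = c("aspirin", "ibuprofen", "aspirin"),
                   MED2 = c("statin",  "aspirin",   NA),
                   MED3 = c("aspirin", "statin",    "ibuprofen"),
                   stringsAsFactors = FALSE)
# Flatten every column into one long character vector
all_meds <- unlist(lapply(meds, as.character), use.names = FALSE)
# Frequency table over all columns, most common first (table() drops NA by default)
freq <- sort(table(all_meds), decreasing = TRUE)
freq
names(freq)[1]  # the most common medication
```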

formula for getting combination of lists

I have
List 1 = AB
List 2 = CD
List 3 = EF
List 4 = GH
A program will print a final list composed of exactly one letter from each list.
So one of the combinations could be
A
C
E
G
How many combinations are possible? What is the formula for counting them?
The formula is just the product of the lengths of all your lists, so in your case: 2 x 2 x 2 x 2 = 16 combinations.
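In general, for k lists with n_1, n_2, ..., n_k elements the count is n_1 * n_2 * ... * n_k, because each choice from one list can be paired with every choice from the others. A small R sketch (assuming the lists are stored as character vectors; `lists` is a hypothetical name) computes the product and cross-checks it by enumerating every combination with expand.grid:

```r
lists <- list(L1 = c("A", "B"), L2 = c("C", "D"),
              L3 = c("E", "F"), L4 = c("G", "H"))
# Counting rule: multiply the list lengths
n_comb <- prod(lengths(lists))  # 2 * 2 * 2 * 2 = 16
# Enumerate every combination (one letter per list) to confirm the count
combos <- expand.grid(lists, stringsAsFactors = FALSE)
nrow(combos)
combos[1, ]  # first combination: A C E G
```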

'Random' Sorting with a condition in R for Psychology Research

I have a valence category for each word stimulus in my psychology experiment:
1 = Negative, 2 = Neutral, 3 = Positive
I need to order thousands of stimuli pseudo-randomly, subject to one condition:
Val_Category cannot have more than 2 stimuli of the same valence in a row, i.e. no more than 2 negative stimuli in a row.
For example: 2, 2, 2 = not acceptable
2, 2, 1 = ok
I can't fix the sequence in advance, i.e. decide that the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1, because I'm not allowed to have a pattern.
I tried various packages and functions like dplyr, sample, order and sort, and nothing so far solves the problem.
I think there are a thousand ways to do this, none of which are probably very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it appeared to work for what I tried.
To explain what I did, the function works as follows:
1. Take the vector of valences and sample from it.
2. If sequences are found that are longer than the desired length, then (for each such sequence) take the last value of that sequence and place it "somewhere else".
3. Check whether the problem is solved. If so, return the reordered vector. If not, go back to 2.
# some vector of valences
val <- rep(1:3,each=50)
pseudoRandomize <- function(x, n){
# take an initial sample
out <- sample(x)  # sample from the argument x (not from a global 'val')
# check if the sample is "bad" (containing sequences longer than n)
bad.seq <- any(rle(out)$lengths > n)
# length of the whole sample
l0 <- length(out)
while(bad.seq){
# get lengths of all subsequences
l1 <- rle(out)$lengths
# find the bad ones
ind <- l1 > n
# take the last value of each bad sequence, and...
for(i in cumsum(l1)[ind]){
# take it out of the original sample
tmp <- out[-i]
# pick new position at random
pos <- sample(2:(l0-2),1)
# put the value back into the sample at the new position
out <- c(tmp[1:(pos-1)],out[i],tmp[pos:(l0-1)])
}
# check if bad sequences (still) exist
# if TRUE, then 'while' continues; if FALSE, then it doesn't
bad.seq <- any(rle(out)$lengths > n)
}
# return the reordered sequence
out
}
Example:
The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.
# simple unnamed vector
val <- rep(1:3,each=5)
pseudoRandomize(val, 2)
# gives, for example (the output is random):
# [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
# when names assigned to the vector
names(val) <- seq_along(val)
pseudoRandomize(val, 2)
# gives, for example (random output; the first row shows the names):
# 1 13 9 7 3 11 15 8 10 5 12 14 6 4 2
# 1 3 2 2 1 3 3 2 2 1 3 3 2 1 1
This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)).
# reorder a data.frame using a named vector
dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
val <- dat$val
names(val) <- 1:nrow(dat)
new.val <- pseudoRandomize(val, 2)
new.dat <- dat[as.integer(names(new.val)),]
# gives, for example (the output is random):
# val stim
# 5 1 e
# 2 1 b
# 9 2 d
# 6 2 a
# 3 1 c
# 15 3 e
# ...
I believe this loop will set the valence categories appropriately. I've called the valence categories treat.
# Generate example data
s1 <- data.frame(id = 1:10, treat = NA)
# Set the first two rows at random
s1[1, "treat"] <- sample(1:3, 1)
s1[2, "treat"] <- sample(1:3, 1)
# Loop through the remaining rows
for (i in 3:nrow(s1)) {
  s1[i, "treat"] <- sample(1:3, 1)
  # Check whether the treat value equals the previous two values
  if (s1[i, "treat"] == s1[i-1, "treat"] && s1[i-1, "treat"] == s1[i-2, "treat"]) {
    # If so, draw one of the values not equal to that value
    a <- setdiff(1:3, s1[i, "treat"])
    s1[i, "treat"] <- sample(a, 1)
  }
}
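This solution is not particularly elegant. Note also that it draws a new valence for every row instead of reordering an existing set of stimuli, so the number of stimuli per valence is not fixed. There may be a faster way to accomplish this, for example by sorting on several columns.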
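A third option, offered as a sketch rather than as either answerer's method: build the order one position at a time by drawing from the stimuli not yet placed, excluding any valence that would make a run longer than n, and restart from scratch if it gets stuck. Unlike the loop above, this keeps the number of stimuli per valence fixed, and like the first answer it preserves names, so the same data-frame trick applies. `constrained_shuffle` and `max_tries` are hypothetical names.

```r
# Hypothetical helper: greedy constrained shuffle with restarts
constrained_shuffle <- function(x, n = 2, max_tries = 1000) {
  for (attempt in seq_len(max_tries)) {
    remaining <- seq_along(x)  # indices of x not yet placed
    out <- integer(0)          # indices placed so far, in order
    ok <- TRUE
    while (length(remaining) > 0) {
      k <- length(out)
      # a valence is banned if the last n placed values all equal it
      banned <- if (k >= n && length(unique(x[out[(k - n + 1):k]])) == 1)
                  x[out[k]] else NULL
      cand <- remaining[!(x[remaining] %in% banned)]
      if (length(cand) == 0) { ok <- FALSE; break }  # dead end: restart
      pick <- cand[sample.int(length(cand), 1)]
      out <- c(out, pick)
      remaining <- setdiff(remaining, pick)
    }
    if (ok) return(x[out])
  }
  stop("no valid order found; the constraint may be infeasible for this x")
}

val <- rep(1:3, each = 50)
res <- constrained_shuffle(val, 2)
max(rle(res)$lengths)  # never more than 2
```

Restarting on a dead end is simpler than repairing a partial sequence; for three evenly sized valence groups a dead end can only occur near the end, so few restarts should be needed.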
