I am currently working through an intro class and was having some difficulty with this particular problem:
Create a function that takes in a vector of numbers V.Size and a single number N as inputs and outputs a list object of size N where each list member is a vector that contains elements of V.Size such that the largest value in V.Size is in the vector of the first list item, the second largest value in V.Size is in the vector of the second list item, etc. The (N+1) ordered value of V.Size should be in the first vector of the list, the (N+2) ordered value of V.Size should be in the second vector of the list and so on.
This is what I have done so far while trying to put together an example:
V.Size <- c(5,4,2,3,1)
n <- 5
Function <- c(V.Size, n)
Function
[1] 5 4 2 3 1 5
sort(Function, decreasing=TRUE)
[1] 5 5 4 3 2 1
The issue I am having is with (N+1), (N+2) and its ordering.
The first step is to create a vector giving the list position for each element of the sorted V.Size. This is basically the vector (1, 2, ..., N, 1, 2, ..., N, ...), with the same length as V.Size. You can get that with:
V.Size <- c(5,4,2,3,1)
n <- 2
rep(1:n, length.out=length(V.Size))
# [1] 1 2 1 2 1
Now you can use the split function to create a list based on these assignments:
split(sort(V.Size, decreasing=TRUE), rep(1:n, length.out=length(V.Size)))
# $`1`
# [1] 5 3 1
#
# $`2`
# [1] 4 2
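Wrapped into a function, as the exercise asks, the two steps above might look like the sketch below. The name distribute_ranked and its argument names are my own, and it assumes N is a positive whole number.

```r
# Hypothetical wrapper: the function and argument names are illustrative only.
distribute_ranked <- function(v, n) {
  ranked <- sort(v, decreasing = TRUE)
  # Position k of the ranked vector goes to list item ((k - 1) %% n) + 1.
  slot <- rep(seq_len(n), length.out = length(ranked))
  # A factor with all n levels keeps empty list items when n > length(v).
  unname(split(ranked, factor(slot, levels = seq_len(n))))
}

res <- distribute_ranked(c(5, 4, 2, 3, 1), 2)
# res[[1]] is 5 3 1 and res[[2]] is 4 2
```

Splitting on a factor rather than on the raw positions is a design choice so that the result always has exactly N items.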
For example, I have a vector below:
a = c(1,1,1,1,1,2,2,2,3,3,3,3)
Now I want to randomly pick 4 of the elements and change each of them to a different value. For instance,
if the elements I pick are 1, 1, 2, 3, then I need to change them randomly to something like 2, 3, 1, 2.
The resulting vector is the following
a' = c(1,2,3,1,1,2,1,2,3,3,3,2)
No idea how to make this.
Maybe this function helps:
#' @param vec - input vector
#' @param n - number of values to replace
#' @param n1 - number of unique value threshold
#' @return replaced sampled vector
sample_fn <- function(vec, n, n1) {
  flag <- TRUE
  while(flag) {
    # // sample on the positions
    pos <- sample(seq_along(vec), n, replace = FALSE)
    print(pos)
    # // extract the values based on the position index
    as <- vec[pos]
    # // get the unique values
    un1 <- unique(as)
    print(un1)
    if(length(un1) > n1)
      flag <- FALSE
  }
  # // sample the unique and set it as names of unique
  # // use named vector to match and replace
  # // assign the output back to the same positions in the vector
  vec[pos] <- setNames(sample(un1), un1)[as.character(as)]
  vec
}
sample_fn(a, 4, 2)
#[1] 10 1 12 2
#[1] 3 1
#[1] 1 8 4 3
#[1] 1 2
#[1] 7 11 4 12
#[1] 2 3 1
# [1] 1 1 1 2 1 2 3 2 3 3 1 1
I am not sure if the values for random replacement are also from a. If so, the code below might be an option
replace(a,sample(seq_along(a),4),sample(unique(a),4,replace = TRUE))
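If every picked element has to end up with a value different from its current one (as in the example, where 1, 1, 2, 3 become 2, 3, 1, 2), one possible sketch draws each replacement from the other values present in the vector. The name replace_with_different is hypothetical, and the sketch assumes replacements come from the values already in a and that there are at least two distinct values.

```r
# Hypothetical helper: assumes replacements are drawn from the values already in vec.
replace_with_different <- function(vec, n) {
  vals <- unique(vec)
  stopifnot(length(vals) >= 2, n <= length(vec))
  pos <- sample(seq_along(vec), n)           # positions to change
  for (p in pos) {
    others <- vals[vals != vec[p]]           # every value except the current one
    # Index with sample.int() so that a single remaining candidate is not
    # misread by sample() as the range 1:candidate.
    vec[p] <- others[sample.int(length(others), 1)]
  }
  vec
}

a <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
set.seed(1)
a_new <- replace_with_different(a, 4)
```

Exactly four positions of a_new differ from a, because each sampled position is changed once and never to its own value.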
I have a data frame with multiple rows. I want to call a function on any two of its rows. For example, let's say I have this data and this myFunc, which accepts two arguments:
df <- data.frame(q1=c(1,2,5), q2=c(5,5,5), q3=c(5,2,5), q4=c(5,5,5), q5=c(2,3,1))
df
  q1 q2 q3 q4 q5
1  1  5  5  5  2
2  2  5  2  5  3
3  5  5  5  5  1
myFunc<-function(a,b) sum((df[a,]==df[b,] & df[a,]==5)*1)
I want to apply myFunc to rows 1 and 2, i.e. myFunc(1,2), and I expect 2: myFunc computes how many 5s rows 1 and 2 have in common in the same column.
Since I have thousands of rows and I want to match all pairs, I want to do this without writing a for loop, maybe with do.call or the apply family of functions.
I tried this:
a=c(1,2) # match the row 1 and 2
b=c(2,3) # match the row 2 and 3
my_list=list(a,b)
do.call("myFunc", my_list)
But I got 4 instead of 2 and 2. Any ideas?
The question recently changed. My understanding of it is that the input should be a list of pairs of row numbers and the output should be the same length as that list such that each component of the output is the number of columns with both entries equal to 5 in both rows defined by the corresponding pair. Thus for df shown in the question the list L shown below would correspond to c(myFunc(1, 2), myFunc(2, 3)) where myFunc is as defined in the question.
L <- list(1:2, 2:3)
myFunc2 <- function(x) myFunc(x[1], x[2])
sapply(L, myFunc2)
## [1] 2 2
Note that *1 in myFunc is unnecessary since sum will coerce a logical argument to numeric.
An alternative might be to specify the first row numbers as a vector and the second row numbers as another vector. In terms of L that would be a <- sapply(L, "[", 1); b <- sapply(L, "[", 2). Then use mapply.
a <- c(1, 2) # L[[1]][1], L[[2]][1]
b <- c(2, 3) # L[[1]][2], L[[2]][2]
mapply(myFunc, a, b)
## [1] 2 2
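Since the question mentions matching all pairs of rows, one option is to enumerate the pairs with combn and reuse the same idea; another is a single matrix product. Both are sketched below with the df from the question. The names pairs, counts, is5 and common are mine, and myFunc is written here without the *1.

```r
df <- data.frame(q1 = c(1, 2, 5), q2 = c(5, 5, 5), q3 = c(5, 2, 5),
                 q4 = c(5, 5, 5), q5 = c(2, 3, 1))
myFunc <- function(a, b) sum(df[a, ] == df[b, ] & df[a, ] == 5)

# Every unordered pair of row numbers, one pair per column:
# (1,2), (1,3), (2,3).
pairs <- combn(nrow(df), 2)
counts <- apply(pairs, 2, function(p) myFunc(p[1], p[2]))
# counts is 2 3 2

# Alternative: is5 is a 0/1 matrix marking the 5s; entry [i, j] of its
# cross-product counts the columns in which rows i and j are both 5.
is5 <- (as.matrix(df) == 5) * 1
common <- tcrossprod(is5)
```

The cross-product avoids calling myFunc once per pair, but it builds an n-by-n matrix, which may matter with many thousands of rows.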
Try passing the rows instead of the row index
df <- data.frame(q1=c(1,2,5), q2=c(5,5,5), q3=c(5,2,5), q4=c(5,5,5), q5=c(2,3,1))
myFunc<-function(a,b) sum((a==b & a==5)*1)
myFunc(df[1,],df[2,])
This worked for me (returned 2)
I have Valence Category for word stimuli in my psychology experiment.
1 = Negative, 2 = Neutral, 3 = Positive
I need to sort the thousands of stimuli with a pseudo-randomised condition.
Val_Category cannot have more than 2 of the same valence stimuli in a row i.e. no more than 2x negative stimuli in a row.
for example - 2, 2, 2 = not acceptable
2, 2, 1 = ok
I can't sequence the data i.e. decide the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1 because I'm not allowed to have a pattern.
I tried various packages and functions such as dplyr, sample, order and sort, and nothing so far solves the problem.
I think there's a thousand ways to do this, none of which are probably very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it appeared to work for what I tried.
To explain what I did, the function works as follows:
1. Take the vector of valences and sample from it.
2. If sequences are found that are longer than the desired length, then (for each such sequence) take the last value of that sequence and place it "somewhere else".
3. Check if the problem is solved. If so, return the reordered vector. If not, go back to 2.
# some vector of valences
val <- rep(1:3, each = 50)
pseudoRandomize <- function(x, n){
  # take an initial sample of the input vector
  # (sample(x), not the global 'val', so the function works for any input)
  out <- sample(x)
  # check if the sample is "bad" (containing sequences longer than n)
  bad.seq <- any(rle(out)$lengths > n)
  # length of the whole sample
  l0 <- length(out)
  while(bad.seq){
    # get lengths of all subsequences
    l1 <- rle(out)$lengths
    # find the bad ones
    ind <- l1 > n
    # take the last value of each bad sequence, and...
    for(i in cumsum(l1)[ind]){
      # take it out of the original sample
      tmp <- out[-i]
      # pick new position at random
      pos <- sample(2:(l0-2), 1)
      # put the value back into the sample at the new position
      out <- c(tmp[1:(pos-1)], out[i], tmp[pos:(l0-1)])
    }
    # check if bad sequences (still) exist
    # if TRUE, then 'while' continues; if FALSE, then it doesn't
    bad.seq <- any(rle(out)$lengths > n)
  }
  # return the reordered sequence
  out
}
Example:
The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.
# simple unnamed vector
val <- rep(1:3,each=5)
pseudoRandomize(val, 2)
# gives:
# [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
# when names assigned to the vector
names(val) <- 1:length(val)
pseudoRandomize(val, 2)
# gives (first row shows the names):
# 1 13 9 7 3 11 15 8 10 5 12 14 6 4 2
# 1 3 2 2 1 3 3 2 2 1 3 3 2 1 1
This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)).
# reorder a data.frame using a named vector
dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
val <- dat$val
names(val) <- 1:nrow(dat)
new.val <- pseudoRandomize(val, 2)
new.dat <- dat[as.integer(names(new.val)),]
# gives:
# val stim
# 5 1 e
# 2 1 b
# 9 2 d
# 6 2 a
# 3 1 c
# 15 3 e
# ...
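Because the reordering relies on random re-insertion, it can be worth checking a result (for instance the val column of the reordered data frame) before using it in the experiment. A minimal sketch of such a check, using the same rle test as the function; the name has_long_run is my own.

```r
# Hypothetical helper: TRUE if any value repeats more than max_run times in a row.
has_long_run <- function(x, max_run) {
  any(rle(x)$lengths > max_run)
}

has_long_run(c(2, 2, 2), 2)  # TRUE: three 2s in a row
has_long_run(c(2, 2, 1), 2)  # FALSE
```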
I believe this loop will set the valence categories appropriately. I've called the valence category column treat.
#Generate example data
s1 <- data.frame(id = 1:10, treat = NA)
#Setting the first two rows
s1[1,"treat"] <- sample(1:3,1)
s1[2,"treat"] <- sample(1:3,1)
#Looping through the remainder of the rows
for (i in 3:nrow(s1))
{
  s1[i,"treat"] <- sample(1:3,1)
  #Check if the treat value is equal to the previous two values.
  #(&& because if() needs a single TRUE/FALSE)
  if (s1[i,"treat"]==s1[i-1,"treat"] && s1[i-1,"treat"]==s1[i-2,"treat"])
  #If so draw one of the values not equal to that value
  {
    a <- 1:3
    remove <- s1[i,"treat"]
    a <- a[a != remove]
    s1[i,"treat"] <- sample(a,1)
  }
}
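The same idea can be written as a small function that works on a plain vector. This is only a sketch: the name draw_sequence and its arguments are made up here, and, like the loop above, it draws each value independently, so it does not guarantee equal numbers of each category.

```r
# Hypothetical generalisation of the loop above.
draw_sequence <- function(len, levels = 1:3, max_run = 2) {
  out <- integer(len)
  for (i in seq_len(len)) {
    candidates <- levels
    if (i > max_run) {
      last <- out[(i - max_run):(i - 1)]
      # If the previous max_run values are all the same, exclude that value.
      if (all(last == last[1])) candidates <- levels[levels != last[1]]
    }
    # Index with sample.int() so a single candidate is handled correctly.
    out[i] <- candidates[sample.int(length(candidates), 1)]
  }
  out
}

set.seed(42)
treat <- draw_sequence(10)
```

Excluding the repeated value before drawing gives the same result as drawing first and redrawing on a clash, without needing a second call to sample.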
This solution is not particularly elegant. There may be a much faster way to accomplish this by sorting several columns or something.
I am trying to construct a function which shouldn't be hard in terms of programming, but I am having some difficulty conceptualizing it. I hope you'll be able to understand my problem better than I do!
I'd like a function that takes a single list of vectors as argument. Something like
arg1 = list(c(1,2), c(2,3), c(5,6), c(1,3), c(4,6), c(6,7), c(7,5), c(5,8))
The function should output a matrix with two columns (or a list of two vectors or something like that) where one column contains letters and the other numbers. One can think of the argument as a list of the positions/values that should be placed in the same group. If in the list there is the vector c(5,6), then the output should contain somewhere the same letters next to the values 5 and 6 in the number column. If there are the three following vectors c(1,2), c(2,3) and c(1,3), then the output should contain somewhere the same letters next to the value 1, 2 and 3 in the number column.
Therefore if we enter the object arg1 in the function it should return:
myFun(arg1)
number_column letters_column
1 A
2 A
3 A
5 B
6 B
7 B
4 C
6 C
5 D
8 D
(The order is not important. The letter E should not be used before the letter D has been used.)
Therefore the function has constructed 2 groups of 3 (A:[1,2,3] and B:[5,6,7]) and 2 groups of 2 (C:[4,6] and D:[5,8]). Note that one position or number can be in several groups.
Please let me know if something is unclear in my question! Thanks!
As I wrote in the comments, it appears that you want a data frame that lists the maximal cliques of a graph given a list of vectors that define the edges.
library(igraph)
## create a matrix where each row is an edge
argmatrix <- do.call(rbind, arg1)
## create an igraph object from the matrix of edges
## (graph_from_edgelist is the current name of graph.edgelist)
gph <- graph_from_edgelist(argmatrix, directed = FALSE)
## returns a list of the maximal cliques of the graph
## (max_cliques is the current name of maximal.cliques)
mxc <- max_cliques(gph)
## creates a data frame of the output
dat <- data.frame(number_column = unlist(mxc),
                  group_column = rep.int(seq_along(mxc), times = sapply(mxc, length)))
## converts group numbers to letters
## ONLY USE if max(dat$group_column) <= 26
dat$group_column <- LETTERS[dat$group_column]
# number_column group_column
# 1 5 A
# 2 8 A
# 3 5 B
# 4 6 B
# 5 7 B
# 6 4 C
# 7 6 C
# 8 3 D
# 9 1 D
# 10 2 D
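If there can be more than 26 cliques, indexing LETTERS gives NA for the extra groups. One way round that is spreadsheet-style labels (A, ..., Z, AA, AB, ...). A sketch in base R; group_label is a hypothetical helper and not part of igraph.

```r
# Hypothetical helper: turn positive group numbers into A..Z, AA, AB, ...
group_label <- function(i) {
  one <- function(k) {
    out <- character(0)
    while (k > 0) {
      # Bijective base 26: shift by one so that 26 maps to "Z", not "A0".
      out <- c(LETTERS[(k - 1) %% 26 + 1], out)
      k <- (k - 1) %/% 26
    }
    paste(out, collapse = "")
  }
  vapply(i, one, character(1))
}

group_label(c(1, 26, 27, 28))
# [1] "A"  "Z"  "AA" "AB"
```

With the data frame above this would be used as group_label(dat$group_column), applied while the column still holds group numbers.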
I have a vector:
x<-c(1,2,3,3,2,2)
Now I want to order this vector by number of occurrences. I know I can count the number of occurrences with table:
x.order <- table(x)[rev(order(table(x)))]
Gives me:
2 3 1
3 2 1
Now I know that I first have to select the values of x which are 2, then the values of x which are 3, and then the values where x is 1. How can I perform this last step?
The final output has to look like:
2,2,2,3,3,1
Or is there a better way to order the vector by number of occurrences?
x<-c(1,2,3,3,2,2)
x.order <- sort(table(x), decreasing = TRUE)
rep(as.numeric(names(x.order)), times=x.order)
#[1] 2 2 2 3 3 1
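An alternative that reorders x directly, rather than rebuilding it from the names of the table, is to compute each element's frequency with ave and sort on that. This is a sketch; the names freq and by_freq are mine.

```r
x <- c(1, 2, 3, 3, 2, 2)
# freq[i] is how often the value x[i] occurs in x.
freq <- ave(seq_along(x), x, FUN = length)
# Most frequent first; ties between different values are broken by the value.
by_freq <- x[order(-freq, x)]
by_freq
# [1] 2 2 2 3 3 1
```

Because it only indexes x, this keeps the original type of the vector and does not need the as.numeric conversion.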