What code in R will do the following:
Given a list 1, 2, ..., M create a list of N random entries from that list. Furthermore, obtain the complement list.
example:
N = 5
M = 10
list = [1,4,3,9,2]
complement = [5,6,7,8,10]
?sample
samp_range <- 1:M
out <- sample(samp_range, N)
complement <- samp_range[!samp_range %in% out]
Or, as per Joran's comment:
complement <- setdiff(samp_range, out)
Also, as a rule, avoid using things like list as variable names since they are internal R functions.
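Putting the pieces together with the numbers from the question, a minimal end-to-end sketch might look like this. The seed is arbitrary, and sample.int() is swapped in for sample() only so that the call also behaves when M is 1.

```r
set.seed(123)                  # arbitrary seed, only for reproducibility
M <- 10
N <- 5

samp_range <- seq_len(M)       # 1, 2, ..., M
out <- sample.int(M, N)        # N distinct values drawn from 1..M
complement <- setdiff(samp_range, out)

# Sanity checks: the two pieces are disjoint and together cover 1..M
stopifnot(length(intersect(out, complement)) == 0)
stopifnot(identical(sort(c(out, complement)), samp_range))
```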
I have a dataframe with binary values like so:
df<-data.frame(a=rep(c(1,0),9),b=rep(c(0,1,0),6),c=rep(c(0,1),9))
The aim is first to obtain all pairwise combinations of columns:
combos <- function(df, n) {
  unlist(lapply(n, function(x) combn(df, x, simplify = FALSE)), recursive = FALSE)
}
j <- combos(df, 2)
Next, for each two-column data frame in list j, I want the proportion of rows in which both columns match, i.e. are either (0,0) or (1,1). I can get the proportions like so:
lapply(j, function(x) data.frame(new = rowSums(x[,1:2])))->k
lapply(k, function(x) data.frame(prop1 = length(which(x==1))/18,prop2=length(which(x==0|x==2))/18))
However, this seems slow and complicated for larger lists. A couple of questions:
1) Is there a faster/better method than this? My actual list is 20 dataframes, each with dim 250 x 400. I tried dist(df, method = "binary"), but it looks like the binary method does not take (0,0) instances into account.
2) Also, why do I not get 18 when I try to divide using length(x[1]) or lengths(x[1])? In the example I divided by hard-coding the length of the vector new.
Any help is very much appreciated!
#Get the combinations
j = combn(x = df, m = 2, simplify = FALSE)
#Get the Proportions
sapply(j, function(x) length(which(x[1] == x[2]))/NROW(x))
As @thelatemail commented, if you are not concerned with storing the intermediate combinations, you can do it all at once using
combn(x = df, m = 2, FUN=function(x) length(which(x[1] == x[2]))/NROW(x))
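For the larger case (many 250 x 400 data frames), one possible shortcut is to skip the per-pair data frames and get every pairwise agreement proportion from two matrix cross-products. This is only a sketch, assuming every column is strictly 0/1 with no NAs. The names X, agree, idx and prop are made up for illustration. The last lines also show why length(x[1]) does not return 18.

```r
df <- data.frame(a = rep(c(1, 0), 9), b = rep(c(0, 1, 0), 6), c = rep(c(0, 1), 9))

# Assumption: all columns are 0/1 and contain no NA.
X <- as.matrix(df)

# crossprod(X) counts rows where both columns are 1;
# crossprod(1 - X) counts rows where both columns are 0.
agree <- (crossprod(X) + crossprod(1 - X)) / nrow(X)

# Pull out the column pairs in the same order that combn() produces them.
idx <- t(combn(ncol(X), 2))
prop <- agree[idx]

# Cross-check against the combn() approach.
check <- combn(x = df, m = 2, FUN = function(x) mean(x[[1]] == x[[2]]))
stopifnot(isTRUE(all.equal(prop, check)))

# x[1] is a one-column data frame, so its length is the number of columns.
length(df[1])    # 1
length(df[[1]])  # 18
nrow(df)         # 18
```

Whether this is actually faster on the real data would need to be timed; the point is that it replaces many small comparisons with two matrix products per data frame.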
I have a large list with many sublists, each of which is a vector of values. I want to apply CJ, a fast form of expand.grid, to this list, but the function fails when one of the sublists is integer(0). How can I convert Z so that every sublist that is integer(0) becomes something that can be passed to the function below? I know I could check length(Z[[4]]), but I need a method that works for lists which may contain thousands of elements, a few of which may be integer(0), so I want to convert any such elements in Z.
Z <- list (c(1,2,3,4,3,2,1,2),c(1,2,3,4),c(5,6,4),c(integer(0)))
do.call ( CJ , args = Z ) # get all combinations
My question is whether there is any way to convert Z as a whole so that the data can be submitted to the function without yielding an error.
# The desired output equals what you get when the last element is a numeric 0, so that it is represented in the fast expand.grid.
Z <- list (c(1,2,3,4,3,2,1,2),c(1,2,3,4),c(5,6,4),c((0)))
do.call(CJ,Z)
The CJ function comes from data.table, so it is worth adding that tag to the question.
There is an open feature request to make CJ a generic method, so that it could handle different types separately.
Below is a function which addresses your question.
library(data.table)
f = function(x){
  stopifnot(is.list(x))
  ll = sapply(x, length)
  if(any(ll == 0L)) x[ll == 0L] = 0L
  do.call(CJ, args = x)
}
x = list(c(1,2,3,4,3,2,1,2),c(1,2,3,4),c(5,6,4),c(integer(0)))
f(x)
I have a question for R gurus out there. I'll illustrate it on the following example:
I have a vector, say 1,2,3,4,5,6,7,8
I'd like to get a vector of sums of 2 elements: 3,5,7,9,11,13,15
This is just an example, I'm not looking for a trick, I want to do it with just vectorization and indexing. Is there any way to get access to the implicit loop parameter as it goes through it?
Thanks a lot.
You can use rollapply from the zoo package:
> library(zoo)
> x <- 1:8
> rollapply(x, width=2, FUN=sum)
[1] 3 5 7 9 11 13 15
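Since the question asks for plain vectorization and indexing, here is a base R sketch that needs no package: add the vector without its last element to the vector without its first element. The embed() line is one possible generalization to wider windows, not something from the answer above.

```r
x <- 1:8
n <- length(x)

# x[-n] is 1..7 and x[-1] is 2..8, so this adds each element to its successor
pair_sums <- x[-n] + x[-1]
pair_sums
# [1]  3  5  7  9 11 13 15

# For a window of width k, embed() lays the shifted copies out as columns
k <- 2
window_sums <- rowSums(embed(x, k))
```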
You can use sapply or a variation of it, and write a function that sums up appropriate elements given the indexes, and your matrix. For example,
m <- matrix(1:9, nrow=3)
m
Create a data frame with all possible index pairs
m_ind <- expand.grid(1:nrow(m),1:ncol(m), stringsAsFactors = FALSE)
names(m_ind) <- c("i","j")
m_ind
m[as.matrix(m_ind[,1:2])]
Diagonals, or the parallel lines can be described by constant diffs, or constant sums of the indexes
m_ind$dif_ij <- m_ind$i - m_ind$j
m_ind$sum_ij <- m_ind$i + m_ind$j
Then sum up the elements you want
m_ind$sum1 <- sapply(1:nrow(m_ind), function(k, mydf, colname, mymatr)
sum(mymatr[as.matrix(mydf[mydf[, colname]==mydf[k, colname], c("i","j")])]),mydf=m_ind, colname="dif_ij", mymatr=m)
m_ind$sum2 <- sapply(1:nrow(m_ind), function(k, mydf, colname, mymatr)
sum(mymatr[as.matrix(mydf[mydf[, colname]==mydf[k, colname], c("i","j")])]), mydf=m_ind, colname="sum_ij", mymatr=m)
and, finally combine them
m_ind$sum <- m_ind$sum1 + m_ind$sum2
m_ind
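The same per-cell totals can probably be had with less bookkeeping by grouping on row(m) - col(m) and row(m) + col(m) directly. This is a sketch of that idea using ave(); the names i, j, sum1, sum2 and total are illustrative, and the result is laid out as a matrix rather than a data frame.

```r
m <- matrix(1:9, nrow = 3)

# Row and column index of every cell, in column-major order
i <- as.vector(row(m))
j <- as.vector(col(m))

# Cells on the same diagonal share i - j; cells on the same anti-diagonal share i + j
sum1 <- ave(as.vector(m), i - j, FUN = sum)
sum2 <- ave(as.vector(m), i + j, FUN = sum)

# As in the steps above, each cell's own value is counted in both sums
total <- matrix(sum1 + sum2, nrow = nrow(m))
total
```

expand.grid(1:nrow(m), 1:ncol(m)) also varies the row index fastest, so as.vector(total) lines up with the m_ind$sum column.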
I've got a list of lists of bootstrap statistics from a function that I wrote in R. The main list has the 1000 bootstrap iterations. Each element within the list is itself a list of three things, including fitted values for each of the four variables ("fvboot" -- a 501x4 matrix).
I want to make a vector of the values for each position on the grid of x values, from 1:501, and for each variable, from 1:4.
For example, for the ith point on the xgrid of the jth variable, I want to make a vector like the following:
vec = bootfits$fvboot[[1:1000]][i,j]
but when I do this, I get:
recursive indexing failed at level 2
Googling around, I think I understand why R is doing this, but I'm not finding an answer for how I can get the ijth element of each fvboot matrix into a 1000x1 vector.
Help would be much appreciated.
Use the unlist() function in R. From example(unlist):
unlist(options())
unlist(options(), use.names = FALSE)
l.ex <- list(a = list(1:5, LETTERS[1:5]), b = "Z", c = NA)
unlist(l.ex, recursive = FALSE)
unlist(l.ex, recursive = TRUE)
l1 <- list(a = "a", b = 2, c = pi+2i)
unlist(l1) # a character vector
l2 <- list(a = "a", b = as.name("b"), c = pi+2i)
unlist(l2) # remains a list
ll <- list(as.name("sinc"), quote( a + b ), 1:10, letters, expression(1+x))
utils::str(ll)
for(x in ll)
stopifnot(identical(x, unlist(x)))
This would be easier if you gave a minimal example object. In general, you cannot index lists with vectors like [[1:1000]]. I would use the plyr functions. This should do it (although I haven't tested it):
require("plyr")
laply(bootfits$fvboot,function(l) l[i,j])
If you are not familiar with plyr: I always found Hadley Wickham's article 'The split-apply-combine strategy for data analysis' very useful.
You can extract one vector at a time using sapply, e.g. for i=1 and j=1:
i <- 1
j <- 1
vec <- sapply(bootfits, function(x){x$fvboot[i,j]})
sapply applies the function (in this case an inline function we have written) to each element of the list bootfits, and simplifies the result if possible (i.e. converts it from a list to a vector).
To extract a whole set of values as a matrix (e.g. over all the i's) you can wrap this in another sapply, but this time over the i's for a specified j:
j <- 1
mymatrix <- sapply(1:501, function(i){
sapply(bootfits, function(x){x$fvboot[i,j]})
})
Warning: I haven't tested this code but I think it should work.
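To make this testable, here is a sketch on a made-up stand-in for bootfits. The structure (a list of iterations, each a list containing a matrix called fvboot) is assumed from the question, and the sizes are shrunk to keep it quick. It uses vapply(), which is like sapply() but checks that each result is a single number, and also shows an alternative that stacks all the matrices into a 3-d array.

```r
set.seed(42)       # arbitrary seed
n_boot <- 50       # stand-in for the 1000 bootstrap iterations
n_grid <- 20       # stand-in for the 501 grid points

# Hypothetical data: each element is a list holding an n_grid x 4 matrix "fvboot"
bootfits <- replicate(
  n_boot,
  list(fvboot = matrix(rnorm(n_grid * 4), nrow = n_grid, ncol = 4)),
  simplify = FALSE
)

i <- 3
j <- 2

# One value per bootstrap iteration
vec <- vapply(bootfits, function(b) b$fvboot[i, j], numeric(1))

# Alternative: stack the matrices into an n_grid x 4 x n_boot array, then index
arr <- simplify2array(lapply(bootfits, `[[`, "fvboot"))
vec2 <- arr[i, j, ]
```

With the array in hand, arr[, j, ] gives all grid points for variable j in one step, which may be more convenient than nesting the sapply calls.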
This is a follow up to my question from last week and can be found here. I wasn't sure if it was appropriate to post this question in the same place, or to post it as a new question.
Okay, last time I asked about randomly removing values from a matrix after binding a new vector to it. The answers were very useful, but I have found a bug when I use a non-square matrix. I have been running the code in a loop and taking the sum of the matrix each time to check that it is working properly, but I have found that the sum varies, which implies that the code sometimes selects the wrong value in the matrix (I want it to select and replace only ones).
Here is the code:
mat1<-matrix(c(1,0,1,0, 0,1,1,1, 1,0,0,0, 1,0,0,1, 1,1,1,1, 0,0,0,1),byrow=F, nrow=4)
I.vec<-c(0,1,1,1,0,0)
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] == 1L)
sample(ones, 1)
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
set.seed(2)
for (j in 1:1000){ #run this vector through the simulations
I.vec2=sample(I.vec,replace=FALSE) #randomize interactions
temp=foo(mat1,I.vec2) #run foo function
prop=sum(temp)
print.table(prop)
}
In this case, sometimes the sum of the matrix is 13 and sometimes it is 14, when it should always be = sum(mat1) = 13.
I've tried to pick apart the code, and I think everything is working correctly except for the code that builds rows, which, admittedly, I do not fully understand.
The problem is a feature of sample(). I will update the original Q, but the problem arises when only a single 1 is observed in a column of the candidate matrix. The code that generates rows is then trying to sample from a set of length 1. Unfortunately, I forgot that when the first argument to sample() is a numeric vector of length 1 (with value n >= 1), sample() treats it as if you wanted to sample from the set 1, ..., n, where n is the value of the single element in the set you really wanted to sample from.
A simple example illustrates the behaviour when the argument x to sample() is a length 1 vector:
> set.seed(1)
> replicate(10, sample(4, 1))
[1] 2 2 3 4 1 4 4 3 3 1
Instinctively, these should all be 4, but they aren't because of this documented and well known feature.
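One way around this is a small wrapper along the lines of the resample() helper shown in the examples of ?sample, which samples positions rather than values. A sketch follows; the variable ones is only there to mimic the single-row case in foo, and replacing sample(ones, 1) with resample(ones, 1) inside foo is a suggestion, not something tested against the full simulation.

```r
# Sample positions with sample.int(), then index into x,
# so a length-1 x is never mistaken for the range 1..x
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)
draws <- replicate(10, resample(4, 1))
draws   # every draw is 4

# Mimics a column of the matrix that contains a single 1, in row 3
ones <- 3
picked <- resample(ones, 1)   # always 3, never 1 or 2
```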