How can I generate all two way tables from a data frame in R?
some_data <- data.frame(replicate(100, base::sample(1:4, size = 50, replace = TRUE)))
combos <- combn(names(some_data), 2)
The following does not work. I was planning to wrap a for loop around it and store the result of each iteration somewhere:
i=1
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Why does this not work? The individual arguments evaluate as expected:
some_data[combos[, i][1]]
some_data[combos[, i][2]]
Calling it with the variable names directly yields the desired result, but how can I loop through all the combos in this structure?
table(some_data$X1, some_data$X2)
combn has a FUN argument, so we can use it to subset 'some_data' for each pair of column names and collect the table output in an array
out <- combn(names(some_data), 2, FUN = function(i) table(some_data[i]))
Regarding the issue in the OP's post
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Both arguments are one-column data.frames rather than vectors. If we extract each column as a vector instead, it should work
table(some_data[, combos[, i][1]], some_data[, combos[, i][2]])
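# the comma before each column index is the change: some_data[, j] returns a vector, some_data[j] a one-column data.frame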
or more compactly
table(some_data[combos[, i]])
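The loop the question had in mind can be sketched with lapply, storing one table per column pair in a list. This is only an illustration: 'toy', 'pairs' and 'tabs' are made-up names, and the data is a 4-column stand-in for 'some_data'.

```r
set.seed(1)
# small stand-in for some_data: 4 columns instead of 100
toy <- data.frame(replicate(4, sample(1:4, size = 50, replace = TRUE)))
pairs <- combn(names(toy), 2)

# one two-way table per column of `pairs`, stored in a list
tabs <- lapply(seq_len(ncol(pairs)), function(j) table(toy[pairs[, j]]))

# label each table with the pair of column names it was built from
names(tabs) <- apply(pairs, 2, paste, collapse = "_")
tabs[["X1_X2"]]
```

Naming the list elements makes it easy to look up the table for a given pair later.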
Update
By default combn uses simplify = TRUE, which means it tries to collapse the results into an array. An array has fixed dimensions, so every table must have the same shape. If the columns do not all share the same set of values, the tables come out with different dimensions (unless each column is first converted to a factor with a common set of levels), and the call fails with an error. One way around this is simplify = FALSE, which returns a list; a list has no such restriction.
Here is an example where the previous code fails
set.seed(24)
some_data2 <- data.frame(replicate(5, base::sample(1:10, size = 50,
replace = TRUE)))
some_data <- data.frame(some_data, some_data2)
out1 <- combn(names(some_data), 2, FUN = function(i)
table(some_data[i]), simplify = FALSE)
is.list(out1)
#[1] TRUE
length(out1)
#[1] 5460
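The other route mentioned in the update, converting every column to a factor with a common set of levels, might look like the sketch below. The data frame 'mixed' and the helper names are hypothetical; the point is only that all tables then share the same dimensions, so the default simplify = TRUE can build an array.

```r
set.seed(24)
# hypothetical mixed data: two columns drawn from 1:4, two from 1:10
mixed <- data.frame(
  replicate(2, sample(1:4, size = 50, replace = TRUE)),
  replicate(2, sample(1:10, size = 50, replace = TRUE))
)

# give every column the same factor levels so each table has the same shape
all_levels <- sort(unique(unlist(mixed)))
mixed_f <- as.data.frame(lapply(mixed, factor, levels = all_levels))

# simplify = TRUE (the default) now yields a levels x levels x pairs array
out_arr <- combn(names(mixed_f), 2, FUN = function(i) table(mixed_f[i]))
dim(out_arr)
```

The trade-off is that tables for columns with few distinct values are padded with empty rows and columns.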
Related
I have more than three sets, but I wrote the following example with just three.
S1<-c("Frizzy","Jack","Amy")
S2<-c("Alice","Samy","Anna","Jack")
S3<-c("Frizzy","Anna","Fred","Jack")
I would like to obtain the following result
length(intersect(S1,S2))+length(intersect(S1,S3))+length(intersect(S2,S3))
without manually writing out all the possible combinations.
We can use combn to get the pairwise intersection of the sets, take the lengths of the resulting list elements, and sum them
sum(lengths(combn(list(S1, S2, S3), 2,
FUN = function(x) Reduce(intersect, x), simplify = FALSE)))
#[1] 5
If there are many objects whose names follow the same pattern ('S' followed by digits), use mget to collect them all into a list instead of typing them out
lst1 <- mget(ls(pattern = '^S\\d+$'))
sum(lengths(combn(lst1, 2,
FUN = function(x) Reduce(intersect, x), simplify = FALSE)))
#[1] 5
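To see which pair contributes what, the same idea can be run over the set names instead of the sets themselves. This is a sketch: 'sets' and 'pair_counts' are illustrative names, not part of the original answer.

```r
S1 <- c("Frizzy", "Jack", "Amy")
S2 <- c("Alice", "Samy", "Anna", "Jack")
S3 <- c("Frizzy", "Anna", "Fred", "Jack")
sets <- list(S1 = S1, S2 = S2, S3 = S3)

# size of the intersection for each pair of set names
pair_counts <- combn(names(sets), 2, FUN = function(nm) {
  length(intersect(sets[[nm[1]]], sets[[nm[2]]]))
})

# label each count with the pair it belongs to
names(pair_counts) <- combn(names(sets), 2, FUN = paste, collapse = "&")
pair_counts
sum(pair_counts)
# 5
```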
I would like to clean up my code a bit and start using more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to turn into a function. The problem I am having is how to step through the constraint vectors without a loop. Here's what I mean:
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2] and so on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
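You could write a function that returns, for each row, the name of the column holding the minimum value within a given range of columns. I have used max.col on the negated data instead of apply(df, 1, which.min), since max.col is vectorised and usually faster than a row-wise apply. Note that max.col breaks ties at random by default; ties.method = "first" makes it pick the first minimum, as which.min does.
apply_fun <- function(data, x, y) {
cols <- x:y
# negate so that the row-wise maximum is the original minimum
names(data[cols])[max.col(-data[cols], ties.method = "first")]
}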
Apply this function using Map :
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
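Map returns a list, so one more step is needed to get the 'new.df' from the question. The sketch below repeats the setup so that it runs on its own and uses setNames to attach the 'nm' labels; that final step is an assumption about the desired output, not part of the original answer.

```r
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, replace = TRUE)))
frm <- c(438, 563, 953, 963)
to  <- c(548, 803, 1000, 993)
nm  <- c("WL890", "WL1080", "WL1400", "WL1375")

# column name of the row-wise minimum within columns x:y
# (ties.method = "first" mirrors which.min)
apply_fun <- function(data, x, y) {
  cols <- x:y
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}

# frm[i] and to[i] are passed pairwise; df is passed unchanged on every call
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
new.df <- setNames(data.frame(WL.ps), nm)
head(new.df)
```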
I need to apply a list of indices to a list of dataframes with a one-to-one mapping: the first element of the list of indices applies only to the first dataframe, and so on. The indices refer to rows in the dataframes.
A list of complementary dataframes also needs to be created by selecting the rows not mentioned in the indices list.
Here is some sample data:
set.seed(1)
A <- data.frame(matrix(rnorm(40,0,1), nrow = 10))
B <- data.frame(matrix(rnorm(40,2,3), nrow = 10))
C <- data.frame(matrix(rnorm(40,3,4), nrow = 10))
dflis <- list(A,B,C)
# Create a sample row index
ix <- lapply(lapply(dflis,nrow), sample, size = 6)
So far I have come up with this working but ugly-looking code:
dflis.train <- lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
dflis.test <- lapply(seq_along(dflis), function(x) dflis[[x]][-ix[[x]],])
Can someone suggest something better, more elegant?
Use Map/mapply instead of the univariate lapply, so that you can iterate over both objects and apply a function, like:
Map(function(d,r) d[r,], dflis, ix)
Or if you want to be fancy:
Map(`[`, dflis, ix, TRUE)
This matches the result of your original code:
identical(
Map(function(d,r) d[r,], dflis, ix),
lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
)
#[1] TRUE
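The complementary (test) list follows the same pattern with negative indices. A minimal sketch, using two data frames instead of three to keep it short:

```r
set.seed(1)
A <- data.frame(matrix(rnorm(40, 0, 1), nrow = 10))
B <- data.frame(matrix(rnorm(40, 2, 3), nrow = 10))
dflis <- list(A, B)
ix <- lapply(dflis, function(d) sample(nrow(d), size = 6))

# rows listed in ix go to train, all remaining rows go to test
dflis.train <- Map(function(d, r) d[r, ], dflis, ix)
dflis.test  <- Map(function(d, r) d[-r, ], dflis, ix)
sapply(dflis.test, nrow)
```

One caveat with negative indexing: if an index vector is empty, d[-r, ] returns zero rows rather than all rows, so that case would need separate handling.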
Is there a function that can solve this kind of random resampling problem with different group sizes? For example, given a vector data = c('a','a','b','c','d','e'), I want to randomly resample it into 3 groups of sizes 1, 3 and 2 respectively. Like
input: samplefunc(data,size = c(1,3,2))
output: c('a') c('a','d','e') c('b','c')
I only found the "sample" function, but it handles just one sample size:
sample(x, size, replace = FALSE, prob = NULL)
size: a non-negative integer giving the number of items to choose.
Since I have to divide the data into many groups (not just 3), an existing function that does this would make things much easier than writing a for loop.
You can easily write your own function using, say, lapply, which would return a list of your samples:
samplefunc <- function(vec, size, ...) lapply(size, function(x) sample(vec, x, ...))
Usage would be as you imagined:
samplefunc(data, c(1, 3, 2))
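As @thelatemail suggests, if you want to sample without replacement across the groups (so that each element lands in exactly one group), you can define samplefunc as follows. This version assumes the sizes are all distinct and sum to length(vec):
samplefunc <- function(vec, size) {
# label each element with the size of the group it falls into, in random order
temp <- split(vec, sample(rep(size, size)))
# reorder the groups to follow the requested order of sizes
temp[as.character(size)]
}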
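When two groups have the same size, labelling groups by their size no longer works, because equal sizes collapse into one group. A variant that labels groups by position instead could look like this; 'partition_sample' is a hypothetical name, and the function is a sketch rather than a drop-in replacement.

```r
# shuffle once, then cut the shuffled vector into consecutive groups
partition_sample <- function(vec, size) {
  stopifnot(sum(size) <= length(vec))
  shuffled <- vec[sample.int(length(vec), sum(size))]
  # group labels 1, 2, ... repeated size[1], size[2], ... times
  unname(split(shuffled, rep(seq_along(size), size)))
}

set.seed(42)
data <- c("a", "a", "b", "c", "d", "e")
groups <- partition_sample(data, c(1, 3, 2))
lengths(groups)
```

Because the vector is shuffled only once, no element is used twice, and repeated sizes such as c(2, 2, 2) behave as expected.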
I would like to convert a vector into a transition matrix first (which I managed). As a second step, I would like to apply the resulting function to a dataset where different respondents did different tasks.
As a result I would like to get a list that is nested by respondent and task.
Here is an example data frame:
Data <- data.frame(
respondent = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
task = c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1,2,2,2,2,2),
acquisition = sample(1:5, replace = TRUE)
)
and here are my result matrix and the function that takes the acquisition vector and generates a transition matrix:
result <- matrix(data = 0, nrow = 5, ncol = 5)
gettrans <- function(invec){
# 1:length(invec)-1 parses as (1:length(invec)) - 1 and starts at 0, so index explicitly
for (i in seq_len(length(invec) - 1)){
result[invec[i],invec[i+1]] <- result[invec[i], invec[i+1]] + 1
}
return(result)
}
Now, I get a flattened result with
with(Data,aggregate(acquisition,by=list(respondent=respondent,task=task),gettrans))
However what I would like would look something like:
$respondent
[1]$task[1]
result
$respondent
[1]$task[2]
result
...
I played around with dlply but could not get that to work ...
Any suggestions appreciated!
dlply (from the plyr package, so load it with library(plyr) first) naturally gives you a list (rather than a list of lists). The standard way of calling it would be
(ans_as_list <- dlply(
Data,
.(respondent, task),
summarise,
res = gettrans(acquisition)
))
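This should be suitable for most purposes, but if you really must have a list of lists, use llply (or equivalently, lapply) to restructure. The names of ans_as_list have the form respondent.task, so anchoring the pattern on the dot keeps respondent 1 from also matching 10, 11 and so on.
(ans_as_list_of_lists <- llply(levels(factor(Data$respondent)), function(lvl)
{
ans_as_list[grepl(paste("^", lvl, "\\.", sep = ""), names(ans_as_list))]
}))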
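The nested structure can also be sketched in base R with two levels of split and lapply, without plyr. This is an alternative to the dlply approach above, not a restatement of it. Here gettrans is rewritten to build its own matrix, and the 'n_states' argument is an assumption added for the example.

```r
set.seed(3)
Data <- data.frame(
  respondent  = rep(1:2, each = 10),
  task        = rep(rep(1:2, each = 5), times = 2),
  acquisition = sample(1:5, size = 20, replace = TRUE)
)

# count transitions between consecutive states (n_states = 5 as in the question)
gettrans <- function(invec, n_states = 5) {
  result <- matrix(0, nrow = n_states, ncol = n_states)
  for (i in seq_len(length(invec) - 1)) {
    result[invec[i], invec[i + 1]] <- result[invec[i], invec[i + 1]] + 1
  }
  result
}

# outer list: one element per respondent; inner list: one matrix per task
nested <- lapply(split(Data, Data$respondent), function(d) {
  lapply(split(d$acquisition, d$task), gettrans)
})
nested[["1"]][["2"]]
```

Elements are addressed by name, so nested[["1"]][["2"]] is the transition matrix for respondent 1, task 2.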