R transition matrix into List of Lists - r

I would like to convert a vector into a transitions matrix first (which I managed). As a second step I would like apply the resulting function to a dataset where different respondents did different tasks.
As a result I would like to get a List which is nested on Respondent and Task.
Here is an example data frame:
Data <- data.frame(
respondent = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
task = c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1,2,2,2,2,2),
acquisition = sample(1:5, replace = TRUE)
)
and here my result vector and function that takes the acquisition vector and generates a transition matrix:
result <- matrix(data = 0, nrow = 5, ncol = 5)
gettrans <- function(invec){
for (i in 1:length(invec)-1){
result[invec[i],invec[i+1]] <- result[invec[i], invec[i+1]] + 1
}
return(result)
}
Now, I get a flattened result with
with(Data,aggregate(acquisition,by=list(respondent=respondent,task=task),gettrans))
However what I would like would look something like:
$respondent
[1]$task[1]
result
$respondent
[1]$task[2]
result
...
I played around with dlply but could not get that to work ...
Any suggestions appreciated!

dlply naturally gives you a list (rather than a list of lists). The standard way of calling it would be
(ans_as_list <- dlply(
Data,
.(respondent, task),
summarise,
res = gettrans(acquisition)
))
This should be suitable for most purposes, but if you really must have a list of lists, use llply (or equivalently, lapply) to restructure.
(ans_as_list_of_lists <- llply(levels(factor(Data$respondent)), function(lvl)
{
ans_as_list[grepl(paste("^", lvl, sep = ""), names(ans_as_list))]
}))

Related

Using lapply instead of a for loop to add information to an empty dataframe

Like the title says, I wish to use lapply instead of a for loop to parse data from a data frame and put it into an empty data frame. My motivation is that the data frame I'm parsing contains thousands of genes and I've read that the apply functions are faster at iterating through large tables.
### My data table ###
rawCounts <- data.frame(ensembl_gene_id_version = c('ENSG00000000003.15', 'ENSG00000000005.6', 'ENSG00000000419.14'),
HS1 = c(1133, 0, 1392),
HS2 = c(900, 0, 1155),
HS3 = c(1251, 0, 2011),
HS4 = c(785, 0, 1022),
stringsAsFactors = FALSE)
## Function
extract_counts <- function(df, esdbid){
counts <- data.frame()
plyr::ldply(esdbid, function(i) {counts <- df[grep(pattern = i, x = df),] %>% rbind()})
return(counts)
}
## Call the first one
extract_counts(df = rawCounts, esdbid = c('ENSG00000000003.15'))
I want this to return a data frame, so I used the plyr::ldply function from this post - Extracting outputs from lapply to a dataframe
However, it isn't returning anything. Eventually I want to scale up my esdbid vector to include multiple values; such as any combination of gene IDs to quickly retrieve the gene counts.
Strangely, when I run this in the console it appears to work as intended for a vector of length 1, i.e.;
esdbid <- 'ENSG00000000003.15'
plyr::ldply(esdbid, function(i) {counts <- rawCounts[grep(pattern = i, x = rawCounts),] %>% rbind()})
Returns a data frame with the correct values. However, when I increase the length of the vector it returns only the first value for each row. For example if esdbid <- c('ENSG00000000003.15', 'ENSG00000000005.6', 'ENSG00000000419.14') then the console code will return the values for ENSG00000000003.15 three times.
Maybe subset can handle this more effectively?
extract_counts <- function(.data, esdbid) {
subset(.data, grepl(esdbid, .data))
}
esdbid <- "ENSG00000000003.15"
df |> extract_counts(esdbid)
Then you can use lapply if you want a list with all dataframe subsets:
lapply(
unique(df$ensembl_gene_id_version),
function(id) { df |> extract_counts(id) }
)

T test for two lists of matrices

I have two lists of matrices (list A and list B, each matrix is of dimension 14x14, and list A contains 10 matrices and list B 11)
and I would like to do a t test for each coordinate to compare the means of each coordinate of group A and group B.
As a result I would like to have a matrix of dimension 14x14 which contains the p value associated with each t test.
Thank you in advance for your answers.
Here's a method using a for loop and then applying the lm() function.
First we'll generate some fake data as described in the question.
#generating fake matrices described by OP
listA <- vector(mode = "list", length = 10)
listB <- vector(mode = "list", length = 10)
for (i in 1:10) {
listA[[i]] <- matrix(rnorm(196),14,14,byrow = TRUE)
}
for (i in 1:11) {
listB[[i]] <- matrix(rnorm(196),14,14,byrow = TRUE)
}
Then we'll unwrap each matrix as described by dcarlson in a for loop.
Unwrapped.Mats <- NULL
for (ID in 1:10) {
unwrapped <- as.vector(as.matrix(listA[[ID]])) #Unwrapping each matrix into a vector
withID <- c(ID, "GroupA", unwrapped) #labeling it with ID# and which group it belongs to
UnwrappedCorMats <- rbind(Unwrapped.Mats, withID)
}
for (ID in 1:11) {
unwrapped <- as.vector(as.matrix(listB[[ID]]))
withID <- c(PID, "GroupB", unwrapped)
UnwrappedCorMats <- rbind(UnwrappedCorMats, withID)
}
Then write and apply a function to run lm(). lm() is statistically equivalent to an unpaired t-test in this context but I'm using it to be more easily adapted into a mixed effect model if anyone wants to add mixed effects.
UnwrappedDF <- as.data.frame(UnwrappedCorMats)
lmPixel2Pixel <- function(i) { #defining function to run lm
lmoutput <- summary(lm(i ~ V2, data= UnwrappedDF))
lmoutputJustP <- lmoutput$coefficients[2,4] #Comment out this line to return full lm output rather than just p value
}
Vector_pvals <- sapply(UnwrappedDF[3:length(UnwrappedDF)], lmPixel2Pixel)
Finally we will reform the vector into the same shape as the original matrix for more easily interpreting results
LM_mat.again <- as.data.frame(matrix(Vector_pvals, nrow = nrow(listA[[1]]), ncol = ncol(listA[[1]]), byrow = T))
colnames(LM_mat.again) <- colnames(listA[[1]]) #if your matrix has row or column names then restoring them is helpful for interpretation
rownames(LM_mat.again) <- colnames(listB[[1]])
head(LM_mat.again)
I'm sure there are faster methods but this one is pretty straight forward and I was surprised there weren't answers for this type of operation yet

how to use apply (or sapply) with columns of matrix or dataframe as function args

I know this is a bonehead newbie question, but I've been trying to figure it out for quite awhile and need some input. Basically, I'm trying to learn how to use the apply family to omit for loops, specifically how to set up the call so that columns of a matrix serve as arguments to the function. I'll use a simple call to the rbinom function as an example.
Example: this for loop works fine. The data are a set of integers and a set of probabilities
success <- rep(-1, times=10) # initialize result var
num <- sample.int(20, 10) # get 10 random integers
p <- runif(10) # get 10 random probabilities
for (i in 1:10) {
success[i]= rbinom(n=1, size=num[i],prob=p[i]) # number successes in 1 trial
}
But how to do the same thing with the apply family? I first put the data into 2 columns of a matrix, thinking that was the right start. However, the following does NOT work, obviously due to my
poor understanding of how to set up a call to apply.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
success <- apply(myData, rbinom, n=1, size=myData[,1], prob=myData[,2])
Any tips are greatly appreciated! I'm coming to R from Fortran, and trying to port over a lot of code that is loaded with DO loops, so I really need to get my head around this.
lapply, sapply, apply only deal with one vector/list at a time. That is, apply will only call its function for one column at a time. What you need is mapply or Map.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
mapply(rbinom, n = 1, myData[,1], myData[,2])
# [1] 5 4 11 8 3 3 17 8 0 11
Just like lapply returns a list, so does Map; similarly, just like sapply, mapply will return a vector or array if all return values are compatible, otherwise it returns a list as well.
These calls are equivalent:
sapply(1:3, function(z) z + 1)
mapply(function(z) z + 1, 1:3)
but mapply and Map allow arbitrary number of lists/vectors, so for instance
func <- function(X,Y,Z) X^2+2*Y-Z
Map(func, 1:9, 11:19, 21:29)
## effectively the same as
list(
func(1, 11, 21),
func(2, 12, 22),
func(3, 13, 33),
...,
func(9, 19, 29)
)
The equivalent call of that with sapply for your data would be
sapply(seq_len(nrow(myData)), function(ind) {
rbinom(n = 1, size = myData[ind,1], prob = myData[ind,2])
})
though I personally feel that mapply is easier to read.

R Convert loop into function

I would like to clean up my code a bit and start to use more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to make into a function. The problem I am having is in how to step through the constraint vectors without a loop. Here's what I mean;
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2].. so-on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
You could write a function which returns column name of minimum value in each row for a particular range of columns. I have used max.col instead of apply(df, 1, which.min) to get minimum value in a row since max.col would be efficient compared to apply.
apply_fun <- function(data, x, y) {
cols <- x:y
names(data[cols])[max.col(-data[cols])]
}
Apply this function using Map :
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))

All combinations of two-way tables

How can I generate all two way tables from a data frame in R?
some_data <- data.frame(replicate(100, base::sample(1:4, size = 50, replace = TRUE)))
combos <- combn(names(some_data), 2)
The following does not work, was planning to wrap a for loop around it and store results from each iteration somewhere
i=1
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Why does this not work? individual arguments evaluate as expected:
some_data[combos[, i][1]]
some_data[combos[, i][2]]
Calling it with the variable names directly yields the desired result, but how to loop through all combos in this structure?
table(some_data$X1, some_data$X2)
With combn, there is the FUN argument, so we can use that to extract the 'some_data' and then get the table output in an array
out <- combn(names(some_data), 2, FUN = function(i) table(some_data[i]))
Regarding the issue in the OP's post
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Both of them are data.frames, we can extract as a vector and it should work
table(some_data[, combos[, i][1]], some_data[, combos[, i][2]])
^^ ^^
or more compactly
table(some_data[combos[, i]])
Update
combn by default have simplify = TRUE, that is it would convert the output to an array. Suppose, if we have combinations that are not symmetric, then this will result in different dimensions of the table output unless we convert it to factor with levels specified. An array can hold only a fixed dimensions. If some of the elements changes in dimension, it result in error as it is an array. One way is to use simplify = FALSE to return a list and list doesn't have that restriction.
Here is an example where the previous code fails
set.seed(24)
some_data2 <- data.frame(replicate(5, base::sample(1:10, size = 50,
replace = TRUE)))
some_data <- data.frame(some_data, some_data2)
out1 <- combn(names(some_data), 2, FUN = function(i)
table(some_data[i]), simplify = FALSE)
is.list(out1)
#[1] TRUE
length(out1)
#[1] 5460

Resources