Way to reference list names in R S3 object?

I'm just learning how to create my own S3 object in R. The class I'm creating is named DAT. It contains a number of matrices that will be populated as the pre-processing of my data ensues. Here's how I define it:
createDAT <- function(M){ # M is a data.matrix
# it's assumed that samples are rows and genes are columns already
z <- list(M_orig <- M, # assumed log2 scale
M_nat <- matrix(), # M_orig on natural scale
M_filt <- matrix(), # after gene filtering
M_scaled <- matrix(), # as fraction of all counts
M_norm <- matrix(), # after gene normalization
ZEROGENES <- list(),
outcome <- list(),
RefSampleName <- character(),
RefSample <- matrix(),
RefSampleUnscaled <- matrix(),
seed <- numeric())
#names(z[[2]]) <- "Nat"
class(z) <- "DAT"
return(z)
}
I'll instantiate this here with the following code:
x <- rnorm(100)
y <- rnorm(100)
df <- data.frame(x, y)
df_wide <- t(df)
data <- createDAT(df_wide)
I have a list of "zero-expressed genes", called ZERO. All I want to do is to add that list to the data instance of DAT. I can successfully do that with the line:
data[[6]] <- ZERO
However, to make things more intuitive, instead of referencing data[[6]], I'd somehow like to use data$ZERO or something.
Is there some way of doing that? I haven't been able to find anything online.
Thank you!!

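The problem with your code is that you are using arrows <- (the assignment operator) inside the list() call. Inside list(), M_orig <- M assigns M to a variable M_orig in the function's environment and passes the value along without a name, so the resulting list has no names. Use equals signs (M_orig = M) instead and the elements in your DAT object will be named. That way you will be able to access its elements with the $ operator, as you expected.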
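A minimal sketch of that fix, keeping the question's element names: with = every element gets a name, so the sixth element can be addressed as data$ZEROGENES instead of data[[6]]. The ZERO list below is a hypothetical stand-in for the question's gene list.

```r
# Same constructor as in the question, but with `=` so every element is named
createDAT <- function(M) {  # M is a data.matrix: samples in rows, genes in columns
  z <- list(M_orig = M,                 # assumed log2 scale
            M_nat = matrix(),           # M_orig on natural scale
            M_filt = matrix(),          # after gene filtering
            M_scaled = matrix(),        # as fraction of all counts
            M_norm = matrix(),          # after gene normalization
            ZEROGENES = list(),
            outcome = list(),
            RefSampleName = character(),
            RefSample = matrix(),
            RefSampleUnscaled = matrix(),
            seed = numeric())
  class(z) <- "DAT"
  z
}

x <- rnorm(100)
y <- rnorm(100)
data <- createDAT(t(data.frame(x, y)))

ZERO <- list("gene1", "gene2")   # hypothetical zero-expressed genes
data$ZEROGENES <- ZERO           # same slot as data[[6]], addressed by name
names(data)
```

If you would rather type data$ZERO, name the element ZERO in the constructor. Reading data$ZERO would partially match ZEROGENES on a list, but assigning data$ZERO <- ... creates a new element instead, so an exact name is safer.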

Related

Find reciprocal row duplicates

(Please feel free to change the title to something more appropriate)
I would like to extract all reciprocal pairs from an asymmetric square matrix.
Some dummy data to clarify:
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA), ncol=5, nrow=5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]
require(reshape2)
m.m <- melt(m) # get all pairs
m.m <- m.m[complete.cases(m.m),] # remove NAs
How would I now extract all "reciprocal duplicates" from m.m (or directly from m)?
This is what I mean by a reciprocal duplicate:
Var1 Var2 value
b a 0
a b -1
And I would like to store each value combination, i.e. {1,1},{-1,-1},{1,0},{-1,0},{0,0} in a list with its Var combination {a,b},{a,c},{a,d},{a,e},{b,c},{b,d},{b,e},{c,d},{c,e},{d,e} pointing to it, something like
$`a,b`
[1] 0,-1
I haven't managed to solve this. I feel it could be possible with merge() or inner_join(). Also, I apologize for not providing the best example.
Any pointers would be highly appreciated.
Here's an approach based on the object m.m:
# extract the unique combinations
levs <- apply(m.m[-3], 1, function(x) paste(sort(x), collapse = ","))
# create a list of values for these combinations
split(m.m$value, levs)
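If you would rather not depend on reshape2, here is a base-R sketch of the same idea: as.data.frame(as.table(m)) produces the same long format (columns Var1, Var2 and Freq), and the vectorised pmin()/pmax() build the sorted pair key instead of a per-row apply().

```r
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA),
            ncol = 5, nrow = 5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]

# Long format without reshape2: one row per (row name, column name, value)
long <- as.data.frame(as.table(m))
long <- long[complete.cases(long), ]   # drop the NA diagonal

# Order-independent key: both (b, a) and (a, b) map to "a,b"
v1 <- as.character(long$Var1)
v2 <- as.character(long$Var2)
key <- paste(pmin(v1, v2), pmax(v1, v2), sep = ",")

pairs.list <- split(long$Freq, key)
pairs.list[["a,b"]]   # 0 then -1: m["b", "a"] comes before m["a", "b"]
```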
Using the matrix representation, you can get vectors of each triangle of the matrix (which align as you wish) using:
m[upper.tri(m)]
t(m)[upper.tri(m)]
To name them:
nm <- matrix(paste("(",rep(rownames(m),times=nrow(m)), ",",rep(rownames(m),each=nrow(m)),")",sep=""), nrow=nrow(m))
nm[as.vector(upper.tri(m))]
Finally, to convert to a list as you wish: first I put them in a new 10 x 2 matrix, then I used lapply to create the list structure.
pairs<- cbind(m[upper.tri(m)], t(m)[upper.tri(m)] )
rownames(pairs) <- nm[as.vector(upper.tri(m))]
pairs
m.list <- lapply(seq_len(nrow(pairs)),function(i) pairs[i,])
names(m.list) <- rownames(pairs)
m.list
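The same list can be built more compactly: outer() with paste() creates the whole name matrix in one call, and Map(c, ...) pairs the two triangles element by element. This is a sketch on the question's m; it labels pairs as a,b rather than (a,b), and each element holds the upper-triangle value first (m["a", "b"], then m["b", "a"]), the same order as the pairs matrix above.

```r
m <- matrix(c(NA,0,1,0,0,-1,NA,1,-1,0,1,1,NA,-1,-1,-1,1,0,NA,0,-1,1,0,0,NA),
            ncol = 5, nrow = 5)
colnames(m) <- letters[seq(ncol(m))]
rownames(m) <- letters[seq(nrow(m))]

up <- upper.tri(m)

# "row,col" label for every cell; keep only the upper triangle
keys <- outer(rownames(m), colnames(m), paste, sep = ",")[up]

# Pair each upper-triangle value with its mirror image from the lower triangle
m.list <- setNames(Map(c, m[up], t(m)[up]), keys)
m.list[["a,b"]]   # -1 then 0
```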

Generate a function based upon name data in a csv file opened in R

I have a dataframe of variable names and weightings.
Example:
Names <- c("a","b","c")
Weightings <- c(1,2,3)
df <- data.frame(Names, Weightings)
I need to generate a function like this from this data:
myfun <- function(x,data){
data[x,"a"]*1+data[x,"b"]*2+data[x,"c"]*3}
I have another dataframe named data where the column names match a, b, and c and I will apply myfun to this data over all rows.
The issue I have is that the length of the Names and Weightings vectors can vary. For example, with 5 names and weightings I want it to generate the new function "myfun" like this:
Newnames <- c("a","b","c","d","e")
NewWeightings <- c(1,2,3,4,5)
myfun <- function(x, data){
data[x,"a"]*1+data[x,"b"]*2+data[x,"c"]*3+data[x,"d"]*4+data[x,"e"]*5}
Is there an easy way to automate the creation of this function, so that I could give someone the code and a .csv file of column names and weightings, and they could generate their new function?
What about a strategy like this? We use a function to make a function:
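getMyFunction <- function(columns, weights) {
stopifnot(length(columns)==length(weights))
function(x, data) {
# select rows x and the named columns, then take the matrix product with the
# weights: row i becomes sum(data[i, columns] * weights)
as.vector(as.matrix(data[x, columns, drop = FALSE]) %*% weights)
}
}
The matrix product %*% takes care of both the weighting and the addition: we specify the vector of columns all at once, and each column is multiplied by its own weight before each row is summed. (Multiplying the selected data frame by weights and calling rowSums only lines up for a single row; with several rows, R recycles weights down the columns rather than across them.)
Then we build a function like
Names <- c("a","b","c")
Weightings <- c(1,2,3)
myFun <- getMyFunction(Names, Weightings)
and we can use it with
dd<-data.frame(a=c(1,1), b=c(1,2), c=c(1,3))
myFun(1,dd)
# [1] 6
myFun(2,dd)
# [1] 14
myFun(1:2,dd)
# [1] 6 14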
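To cover the .csv part of the question, here is a sketch that reads the names and weights from a file and builds the function from them. The file layout (columns Names and Weightings) and the helper name make_weighted_sum are assumptions for illustration; the temporary file only stands in for the CSV you would hand to someone. It uses a matrix product so each weight applies to its own column however many rows are selected.

```r
# Hypothetical CSV: one row per column name with its weight
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Names = c("a", "b", "c"), Weightings = c(1, 2, 3)),
          csv_path, row.names = FALSE)

spec <- read.csv(csv_path, stringsAsFactors = FALSE)

# Hypothetical helper: returns a closure computing the weighted row sums
make_weighted_sum <- function(columns, weights) {
  stopifnot(length(columns) == length(weights))
  function(x, data) {
    as.vector(as.matrix(data[x, columns, drop = FALSE]) %*% weights)
  }
}

myFun <- make_weighted_sum(spec$Names, spec$Weightings)

dd <- data.frame(a = c(1, 1), b = c(1, 2), c = c(1, 3))
myFun(seq_len(nrow(dd)), dd)   # apply to all rows
```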

How to convert a List to Data Frame, but removing the internal List structure?

It sounds simple, but I have many problems trying to convert a list to a data frame.
I did it with the as.data.frame function and it works, but when I inspect the result with str, it still has the internal list structure, and I would like to select a specific column to work with.
Is there an easy way to convert a list to a data frame with a proper data frame structure?
I have also tried unlisting my list into a matrix, but then I lose the column and row names and have to add them back manually.
For example, this is my list, and I would like to use and plot the mystats$p.value column:
library(gtools)
x <- rnorm(100, sd=1)
y <- rnorm(100, sd =2)
mystats <- t(running(x, y, fun = cor.test, width=5, by=5))
Thanks
If (and only if) it's a list of data.frames, you can use do.call:
al <- split(airquality, airquality$Month)
sapply(al, class)
same.airquality <- do.call(rbind, al)
Here the list elements have the same columns. For a list that instead splits different variables across its elements (each with the same nrow), you can use
do.call(cbind, another.list)
Finally (untested), with the same approach you could try the abind package.
EDIT
After seeing your example I understand your setting a little better: you should sanitize the call to cor.test a bit, because running() mangles its output (you are currently trying to put a list, a complex data structure, into a matrix-like object).
foobar <- function(x,y) {
my.test <- cor.test(x,y)
## see names(my.test) or ?cor.test for the components
## that you can export
c(my.test$statistic, my.test$p.value, my.test$conf.int)
}
## running() now returns a matrix; transpose it and convert it to a data frame
mystats <- as.data.frame(t(running(x, y, fun = foobar, width=5, by=5)))
names(mystats) <- c("statistic", "p.value", "low.ci", "up.ci")
mystats$p.value
If you have multiple objects like this one, e.g.:
mystats$row <- row.names(mystats)
mystats$rep <- 1
row.names(mystats) <- NULL
mystats2 <- mystats
mystats2$rep <- 2
asd <- list(mystats, mystats2)
foo <- do.call("rbind", asd )
foo
foo$p.value
HTH
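For comparison, here is a sketch that skips gtools::running() and builds one small data frame per window with lapply(), then stacks them with do.call(rbind, ...), the same technique as above. It swaps base R in for running(); the non-overlapping windows of width 5 are meant to mirror width = 5, by = 5.

```r
set.seed(1)  # reproducible toy data
x <- rnorm(100, sd = 1)
y <- rnorm(100, sd = 2)

starts <- seq(1, length(x) - 4, by = 5)   # non-overlapping windows of width 5

# One single-row data frame per window
rows <- lapply(starts, function(s) {
  idx <- s:(s + 4)
  ct <- cor.test(x[idx], y[idx])
  data.frame(start = s,
             statistic = unname(ct$statistic),
             p.value = ct$p.value,
             low.ci = ct$conf.int[1],
             up.ci = ct$conf.int[2])
})

# Stack them into one ordinary data frame
mystats <- do.call(rbind, rows)
str(mystats)
```

mystats$p.value is then a plain numeric column that can be passed straight to plot().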

how to retrieve, extract value from data frame, then assign to object to be used for indexing

I am trying to write a function to allow the user to input various parameters ("means", "temp", etc.), which are then compiled into a list so that I can work with them. Values from the list are extracted and assigned to objects.
param <- function(){
parameters <- c("means","variance","temp","gene","add")
x <- list()
for(i in 1:length(parameters)){
z <- readline(paste("Input ", parameters[i], ": ", sep=""))
#store inputs in vector x
x[[z]] <- z
#x[[i]] <- z
#x[i] <- z
} #End of for loop to bring in inputs
#Bind results into a vector
y=c(x[[1]],x[[2]],x[[3]],x[[4]],x[[5]])
#Assign answers to proper variable names so they can be used to index.
means <- y[1]; variance <- y[2]; temp <- y[3]; gene <- y[4]; add <- y[5]
#means <- x[[1]]; variance <- x[[2]]; temp <- x[[3]]; gene <- x[[4]]; add <- x[[5]]
} #End of function to bring in inputs
I want to use the objects to index a larger data frame (strn.x).
yav=strn.x[NT==temp, means]
I've referenced https://stat.ethz.ch/pipermail/r-help/2008-April/158757.html, to learn how to assign user inputs to objects, and I recognize (based on Retrieving specific values from subsetting a data.table) that part of my problem is the class the objects take on. I've attempted to use "[[" to extract the values without the name, but they still come out as class 'list' or 'data.frame' depending on what I include in the function above.
means
V1
1 FT_mn
class(means)
[1] "data.frame"
I'd like them to take on the form:
[1] FT_mn
What do I need to do to get this to work?
Try unlist() to convert a data.frame or list into a vector.
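A minimal sketch of what unlist() does here, using a hypothetical one-cell data frame shaped like the printed means object and a hypothetical stand-in for strn.x. It uses base-R data frame indexing; if strn.x is a data.table, a column name stored in a variable is typically selected with with = FALSE (check the data.table documentation).

```r
# One-cell data frame shaped like the `means` object printed in the question
means_df <- data.frame(V1 = "FT_mn", stringsAsFactors = FALSE)
class(means_df)                       # "data.frame"

# unlist() flattens it to a character vector; unname() drops the "V1" label
means <- unname(unlist(means_df))
means                                 # "FT_mn"

# Hypothetical stand-in for strn.x, with a temperature column NT
strn.x <- data.frame(NT = c(10, 20, 20), FT_mn = c(1.5, 2.5, 3.5))
temp <- 20

# A plain character string can now pick the column by name
yav <- strn.x[strn.x$NT == temp, means]
yav                                   # 2.5 3.5
```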

Calculate statistics (e.g. average) across cells of identical data-frames

I have a list of identically sorted data frames. More specifically, these are the imputed data frames I get after doing multiple imputation with the Amelia II package. Now I want to create a new data frame that is identical in structure but contains the mean of each cell, calculated across the data frames.
The way I achieve this at the moment is the following:
## do the Amelia run ------------------------------------------------------------
a.out <- amelia(merged, m=5, ts="Year", cs ="GEO",polytime=1)
## Calculate the output statistics ----------------------------------------------
left.side <- a.out$imputations[[1]][,1:2]
a.out.ncol <- ncol(a.out$imputations[[1]])
a <- a.out$imputations[[1]][,3:a.out.ncol]
b <- a.out$imputations[[2]][,3:a.out.ncol]
c <- a.out$imputations[[3]][,3:a.out.ncol]
d <- a.out$imputations[[4]][,3:a.out.ncol]
e <- a.out$imputations[[5]][,3:a.out.ncol]
# Calculate the mean of the matrices (abind() comes from the abind package)
mean.right <- apply(abind(a,b,c,d,e,along=3),c(1,2),mean)
# recombine factors with values
mean <- cbind(left.side,mean.right)
I suppose there is a much better way of doing this with apply, plyr or the like, but as an R newbie I am really a bit lost here. Do you have any suggestions on how to go about this?
Here's an alternate approach using Reduce and plyr::llply
dfr1 <- data.frame(a = c(1,2.5,3), b = c(9.0,9,9), c = letters[1:3])
dfr2 <- data.frame(a = c(5,2,5), b = c(6,5,4), c = letters[1:3])
tst = list(dfr1, dfr2)
require(plyr)
tst2 = llply(tst, function(df) df[,sapply(df, is.numeric)]) # strip out non-numeric cols
ans = Reduce("+", tst2)/length(tst2)
EDIT. You can simplify your code considerably and accomplish what you want in 5 lines of R code. Here is an example using the Amelia package.
library(Amelia)
data(africa)
# carry out imputations
a.out = amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc")
# extract numeric columns from each element of a.out$imputations
tst2 = llply(a.out$imputations, function(df) df[,sapply(df, is.numeric)])
# sum them up and divide by length to get mean
mean.right = Reduce("+", tst2)/length(tst2)
# compute fixed columns and cbind with mean.right
left.side = a.out$imputations[[1]][1:2]
mean0 = cbind(left.side,mean.right)
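The same Reduce() idea also works without plyr: lapply() does what llply() does here, and Filter(is.numeric, df) keeps only the numeric columns of a data frame. A minimal sketch on the toy data from above; it assumes the numeric columns appear in the same order in every data frame, as they do when the data frames are identically structured.

```r
dfr1 <- data.frame(a = c(1, 2.5, 3), b = c(9.0, 9, 9), c = letters[1:3])
dfr2 <- data.frame(a = c(5, 2, 5), b = c(6, 5, 4), c = letters[1:3])
tst <- list(dfr1, dfr2)

# Keep only the numeric columns of each data frame (base R, no plyr needed)
tst2 <- lapply(tst, Filter, f = is.numeric)

# Cell-wise sum across the list, then divide by the number of data frames
ans <- Reduce("+", tst2) / length(tst2)
ans
```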
If I understand your question correctly, then this should get you a long way:
#set up some data:
dfr1<-data.frame(a=c(1,2.5,3), b=c(9.0,9,9))
dfr2<-data.frame(a=c(5,2,5), b=c(6,5,4))
tst<-list(dfr1, dfr2)
#since all variables are numeric, use a three-dimensional array
tst2<-array(do.call(c, lapply(tst, unlist)), dim=c(nrow(tst[[1]]), ncol(tst[[1]]), length(tst)))
#To see where you're at:
tst2
#rowMeans on a three-dimensional array with dims=2 takes the mean over the last dimension
result<-data.frame(rowMeans(tst2, dims=2))
rownames(result)<-rownames(tst[[1]])
colnames(result)<-colnames(tst[[1]])
#display the full result
result
HTH.
After many attempts, I've found a reasonably fast way to calculate cells' means across multiple data frames.
# First copy the original data frame to hold the average imputed values. This
# data frame has the same dimensions as the original one
imp.df <- df
# Then create an array with the first two dimensions of the original data frame and
# the third dimension given by the number of imputations
a <- array(NA, dim=c(nrow(imp.df), ncol(imp.df), length(a.out$imputations)))
# Then copy each imputation in each "slice" of the array
for (z in 1:length(a.out$imputations)) {
a[,,z] <- as.matrix(a.out$imputations[[z]])
}
# Finally, for each cell, replace the actual value with the mean across all
# "slices" in the array
for (i in 1:dim(a)[1]) {
for (j in 1:dim(a)[2]) {
imp.df[i, j] <- mean(as.numeric(a[i, j,]))
}}
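The per-cell double loop can be collapsed into a single apply(a, c(1, 2), mean) call over the first two dimensions of the array. A minimal sketch, assuming every column of the imputations is numeric (as.matrix() on a data frame with a character column would give a character array); imputations below is a hypothetical stand-in for a.out$imputations.

```r
# Hypothetical stand-in for a.out$imputations: three all-numeric data frames
imputations <- list(
  data.frame(a = c(1, 2), b = c(10, 20)),
  data.frame(a = c(3, 4), b = c(30, 40)),
  data.frame(a = c(5, 6), b = c(50, 60))
)
df <- imputations[[1]]

# Stack the imputations into an n x p x m array, as in the answer above
a <- array(NA, dim = c(nrow(df), ncol(df), length(imputations)))
for (z in seq_along(imputations)) {
  a[, , z] <- as.matrix(imputations[[z]])
}

# One call replaces the double loop: average each cell over the third dimension
imp.df <- as.data.frame(apply(a, c(1, 2), mean))
names(imp.df) <- names(df)
imp.df
```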
