I'm working on single-cell RNA-seq in Seurat, and I'm trying to write a for() loop over several Seurat objects to draw a heatmap of average gene expression for each one.
for(i in list(seuratobject1, seuratobject2, seuratobject3)){
cluster.averages <- data.frame(AverageExpression(i, features = genelist))
cluster.averages$rowmeans <- rowMeans(cluster.averages)
cluster.averages <- cluster.averages[order(cluster.averages$rowmeans),]
genelist.new <- rownames(cluster.averages)
HMP.ordered <- DoHeatmap(i, features = genelist.new, size = 3, draw.lines = TRUE)
ggsave(HMP.ordered, file=paste0(i, ".HMP.ordered.png"), width=7, height=30)
}
The ggsave() line does not work because i holds the Seurat object itself, not its name. Hence my question: how do I get ggsave() to use the name of the Seurat object stored in i?
I tried substitute(i) and deparse(substitute(i)) without success.
Short answer: you can’t.
Long answer: using substitute or similar to try to get i’s name will give you … i. (This is different for function arguments, where substitute(arg) gives you the call’s argument expression.)
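A minimal sketch of the difference, using a plain vector as a hypothetical stand-in for a Seurat object (`seurat_like_a` and `arg_name()` are illustrative names, not from the question):

```r
# Hypothetical stand-in for a Seurat object
seurat_like_a <- 1:3

# Inside a loop, substitute(i) only sees the loop variable i,
# never the name of the object it was taken from
loop_names <- character(0)
for (i in list(seurat_like_a)) {
  loop_names <- c(loop_names, deparse(substitute(i)))
}
print(loop_names)

# For a function argument, substitute() recovers the caller's expression
arg_name <- function(arg) deparse(substitute(arg))
print(arg_name(seurat_like_a))  # "seurat_like_a"
```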
You need to loop over a vector of names instead. Ideally your Seurat objects would be in a named list to begin with. But if they are separate variables, you can loop over a character vector of their names and fetch each object with get():
names = c('seuratobject1', 'seuratobject2', 'seuratobject3')
for(i in names) {
cluster.averages <- data.frame(AverageExpression(get(i), features = genelist))
# … rest is identical, except that DoHeatmap() also needs get(i) instead of i …
}
That said, I generally advocate strongly against using get and against treating the local environment as a data structure. Lists and vectors are designed for exactly this situation.
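As a sketch of the list-based alternative: `mget()` turns the separate variables into a named list, and looping over `names()` then gives you both the name (for the file name) and the object. The data frames below are hypothetical stand-ins; in the real code you would call `AverageExpression()`, `DoHeatmap()` and `ggsave()` on `obj` instead of only building the file name.

```r
# Hypothetical stand-ins for the three Seurat objects
seuratobject1 <- head(mtcars, 3)
seuratobject2 <- head(iris, 3)
seuratobject3 <- head(airquality, 3)

# mget() looks up each name and returns a list named after the variables
objs <- mget(c("seuratobject1", "seuratobject2", "seuratobject3"))

out_files <- character(0)
for (nm in names(objs)) {
  obj <- objs[[nm]]  # the object itself, e.g. for DoHeatmap(obj, ...)
  out_files <- c(out_files, paste0(nm, ".HMP.ordered.png"))  # the name, for ggsave()
}
print(out_files)
```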
I have 7 large Seurat objects, saved as sn1, sn2, sn3 ... sn7.
I am trying to run ScaleData on all 7 samples. I could write the same two lines 7 times:
all.genes <- rownames(sn1)
snN1<-ScaleData(sn1, features = all.genes)
all.genes <- rownames(sn2)
snN2<-ScaleData(sn2, features = all.genes)
all.genes <- rownames(sn3)
snN3<-ScaleData(sn3, features = all.genes)
# ... and so on up to sn7
This would work perfectly. Since I will be working with all 7 samples for quite a while, I thought I'd save time and write a for loop to do the job in one go, but I am unable to save the variables; I get the error "Error in { : target of assignment expands to non-language object".
This is what I tried:
samples<-c("sn1", "sn2", "sn3", "sn4", "sn5", "sn6", "sn7")
list<-c("snN1", "snN2", "snN3", "snN4", "snN5", "snN6", "snN7")
for (i in samples) {
all.genes <- rownames(get(i))
list[1:7]<-ScaleData(get(i), features = all.genes)
}
How should I write the code so that it creates the variables snN1, snN2, snN3, ... and saves the scaled data from sn1, sn2, sn3, ... into each respective new variable?
I think the error is in this line: list[1:7]<-ScaleData(get(i), features = all.genes). It tries to assign the output of ScaleData to the seven character strings stored in list, which makes no sense. I think you are looking for the function assign(), but it is generally recommended only in very specific contexts.
Also, R has more idiomatic alternatives to for-loops, such as apply() and related functions. I recommend wrapping the steps you want to apply in a custom function and then calling lapply() to apply it to every object in turn (as a for-loop would), storing the results in a list. To use each 'snX' variable as input, put the objects themselves into a list.
# Custom function
custom_scale <- function(x){
all.genes <- rownames(x)
ScaleData(x, features = all.genes)  # the last value is returned
}
# Create a list that refers to every object
samples <- list(sn1, sn2, sn3, sn4, sn5, sn6, sn7) # Note: these are the objects themselves, not character strings
# Use lapply to apply the custom function to each element; the results are collected in a new list
scaled_Data_list <- lapply(samples, custom_scale)
This should work; however, without example data I can't test it.
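A runnable sketch of this approach, with two loudly stated assumptions: small random matrices stand in for the Seurat objects, and row-wise `scale()` stands in for `ScaleData()` (it roughly mimics the per-gene centring and scaling, nothing more). Building the input list with `mget()` keeps the names `sn1`, `sn2`, ... on the result.

```r
set.seed(42)
# Hypothetical stand-ins: 2 genes x 4 cells each
sn1 <- matrix(rnorm(8), nrow = 2, dimnames = list(c("geneA", "geneB"), NULL))
sn2 <- matrix(rnorm(8), nrow = 2, dimnames = list(c("geneA", "geneB"), NULL))
sn3 <- matrix(rnorm(8), nrow = 2, dimnames = list(c("geneA", "geneB"), NULL))

custom_scale <- function(x) {
  # Stand-in for ScaleData(x, features = rownames(x)):
  # centre and scale each row (gene) of the matrix
  t(scale(t(x)))
}

samples <- mget(paste0("sn", 1:3))  # named list: sn1, sn2, sn3
scaled_Data_list <- lapply(samples, custom_scale)

names(scaled_Data_list)   # "sn1" "sn2" "sn3"
scaled_Data_list$sn2      # access one result by name
```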
Here is how to do it using a loop and assign. I removed some redundant code/variables as this can always be a source of error. However, I agree with RobertoT that storing such data in a list and using lapply is a good idea.
samples <- paste0('sn', 1:7)
for (sn in samples) {
sn.data <- get(sn)
assign(sub('n', 'nN', sn),
ScaleData(sn.data, features=rownames(sn.data)))
}
How can I return multiple objects from an R function? In Java, I would make a class, say Person, with private fields that encapsulate height, age, etc.
But in R, I need to pass around groups of data. For example, how can I make an R function return both a list of characters and an integer?
Unlike many other languages, R functions don't return multiple objects in the strict sense. The most general way to handle this is to return a list object. So if you have an integer foo and a vector of strings bar in your function, you could create a list that combines these items:
foo <- 12
bar <- c("a", "b", "e")
newList <- list("integer" = foo, "names" = bar)
Then return this list.
After calling your function, you can then access each of these with newList$integer or newList$names.
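A minimal sketch that wraps the snippet above in a function (`make_result()` is a hypothetical name):

```r
make_result <- function() {
  foo <- 12
  bar <- c("a", "b", "e")
  list(integer = foo, names = bar)  # the last value is returned
}

newList <- make_result()
newList$integer  # 12
newList$names    # "a" "b" "e"
```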
Other object types might work better for various purposes, but the list object is a good way to get started.
Similar to Java, you can create an S4 class in R that encapsulates your information:
setClass(Class="Person",
representation(
height="numeric",
age="numeric"
)
)
Then your function can return an instance of this class:
myFunction = function(age=28, height=176){
return(new("Person",
age=age,
height=height))
}
and you can access your information:
aPerson = myFunction()
aPerson@age
aPerson@height
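A self-contained version of the S4 example, written with the `slots` argument of `setClass()` (an alternative to `representation()`), reading the slots with `@` or `slot()`:

```r
library(methods)  # attached by default in most sessions; explicit for scripts

setClass("Person", slots = c(height = "numeric", age = "numeric"))

myFunction <- function(age = 28, height = 176) {
  new("Person", age = age, height = height)
}

aPerson <- myFunction()
aPerson@age              # 28
slot(aPerson, "height")  # 176
```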
Is something along these lines what you are looking for?
x1 = function(x){
mu = mean(x)
l1 = list(s1=table(x),std=sd(x))
return(list(l1,mu))
}
library(Ecdat)
data(Fair)
x1(Fair$age)
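Because `x1()` returns an unnamed list, the pieces have to be pulled out by position (for example `res[[1]]$std`). A hypothetical variant, `x1_named()`, names every element so it can be read with `$`; the small numeric vector replaces `Fair$age` so the sketch runs without the Ecdat package.

```r
x1_named <- function(x) {
  list(s1 = table(x), std = sd(x), mu = mean(x))
}

res <- x1_named(c(2, 4, 4, 6))
res$mu   # 4
res$std  # sample standard deviation
res$s1   # frequency table of the values
```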
You can also use super-assignment.
Instead of "<-", use "<<-". R then looks for an existing variable of that name in the enclosing environments, moving up one level at a time, and assigns to the first one it finds. If it finds none, it creates the variable in the global environment.
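A minimal sketch of `<<-` (the names `counter` and `increment()` are hypothetical): the function changes a variable that lives outside it instead of returning a value. This is a side effect, so returning a list is usually easier to follow.

```r
counter <- 0

increment <- function() {
  counter <<- counter + 1  # assigns to the 'counter' found in an enclosing environment
  invisible(NULL)
}

increment()
increment()
counter  # 2
```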
You could use for() with assign() to create many objects.
See the example from assign():
for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6' --
nam <- paste("r", i, sep = ".")
assign(nam, 1:i)
}
Look at the new objects:
ls(pattern = "^r..$")
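If you do create `r.1` ... `r.6` this way, `mget()` can gather them back into a named list; building the list directly with `lapply()` gives the same contents without creating six separate variables. A sketch (`r_list` and `r_direct` are illustrative names):

```r
for (i in 1:6) {
  assign(paste("r", i, sep = "."), 1:i)
}

# Collect the six objects back into one named list
r_list <- mget(paste("r", 1:6, sep = "."))

# The same contents built directly as a list, no assign() needed
r_direct <- lapply(1:6, seq_len)
names(r_direct) <- paste("r", 1:6, sep = ".")

identical(r_list, r_direct)  # TRUE
```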
One way to handle this is to attach the secondary information as an attribute of the primary object. I must stress that I think this is appropriate only when the two pieces of information are related, such that one describes the other.
For example, I sometimes store the names of "crucial variables", or of variables that have been significantly modified, as an attribute on the data frame:
attr(my.DF, 'Modified.Variables') <- DVs.For.Analysis$Names.of.Modified.Vars
return(my.DF)
This allows me to store a list of variable names with the data frame itself.
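A small sketch of the idea with toy data and a hypothetical helper `flag_modified()`; `attr()` both sets and reads the attribute, and the attribute travels with the returned data frame:

```r
flag_modified <- function(my.DF, vars) {
  attr(my.DF, "Modified.Variables") <- vars
  my.DF
}

out <- flag_modified(data.frame(a = 1:3, b = 4:6), "b")
attr(out, "Modified.Variables")  # "b"
out$b                            # the data itself is unchanged
```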
I want to use a function to repeatedly create sets with different names.
For example, say I have 5 random vectors:
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
I then use these vectors to select rows from the raw data set (a data frame):
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
testset4<-raw[number4,]
testset5<-raw[number5,]
Writing out each command takes up a lot of space in my code, so I'm trying to shorten it by using a function.
However, I find it hard to use a variable to build part of an object name (a 'text argument') inside a function. Using a variable as an ordinary argument is easy, for example:
mean_function<-function(x){
mean(x)
}
But I want to use a function like this (pseudocode):
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to write a function for this task: simply use lapply to loop over the list returned by mget(), set the names, and finally copy all results into the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
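If the test sets do not have to exist as separate variables, a sketch that keeps everything in lists from the start (the `raw` data frame here is hypothetical toy data):

```r
set.seed(1)
raw <- data.frame(id = 1:10, value = round(runif(10), 2))  # hypothetical stand-in data

# Five random row selections, stored in one list instead of number1..number5
idx <- lapply(1:5, function(k) sample(1:10, 3))
names(idx) <- paste0("number", 1:5)

# One subset per selection, named testset1..testset5
testsets <- lapply(idx, function(rows) raw[rows, ])
names(testsets) <- paste0("testset", 1:5)

testsets$testset2  # access a single test set by name
```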
I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can pass a[[1]]$DIM to a function directly without any problem. The column is there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious (1) what's going on with get(), and (2) whether there's a better solution. I'm limiting myself to the apply family of functions because I want to get out of the for-loop habit, and this name-as-list method seems to be preferred based on questions like "R- how to dynamically name data frames?", but I'd be interested in other, more elegant solutions, too.
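A short sketch of both points, using a hypothetical two-element `a`: `get()` looks up a single variable by name, so it searches for an object literally called `a[[1]]$DIM` rather than evaluating that expression. Applying a function to each data frame in `a` avoids building strings altogether.

```r
a <- list(data.frame(DIM = 1:3), data.frame(DIM = 4:6))  # hypothetical stand-in

# get() treats the whole string as one variable name, so this lookup fails
res <- tryCatch(get("a[[1]]$DIM"), error = function(e) conditionMessage(e))
print(res)

# Extract the DIM column from every data frame
dims <- lapply(a, `[[`, "DIM")

# Or apply any function to each DIM column directly
sums <- sapply(a, function(df) sum(df$DIM))
```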
I'd say that if you want to modify an object in place, you are better off using a for loop, since with lapply you would need the <<- operator (a plain <- inside the function passed to lapply only modifies a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(); otherwise lapply would print its output to the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce a new set of data.frames using lapply instead, assign its result:
aList2 <- lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
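A sketch of the copy-returning version (`newList` is a hypothetical name): iterating over `aList` itself rather than `seq_along(aList)` also preserves the element names, and the original list is left untouched.

```r
set.seed(1)
aList <- list(cars = mtcars, iris = iris)

newList <- lapply(aList, function(df) {
  df[["newcol"]] <- runif(nrow(df))  # modifies a local copy of df
  df                                 # return the modified copy
})

names(newList)                   # "cars" "iris"
"newcol" %in% names(aList$cars)  # FALSE: the original is unchanged
```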
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# a zero-length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
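The difference matters in a loop: over an empty list, `seq_along()` runs the body zero times, while `1:length()` runs it twice (for `i = 1` and `i = 0`). A sketch:

```r
empty <- list()

n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1

n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1

c(n_seq, n_colon)  # 0 2
```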
I am trying to create (vector) objects in R without specifying their names in advance. For example, if I have a list of length 3, I want to create the objects p1 to p3, and if I have a list of length 10, the objects p1 to p10. The length should be arbitrary and not determined in advance.
Thanks for your help!
I guess the proper way of doing that is to consider a list p = list() and then you can use p[[i]] with i as big as you wish without having specified any length.
Then once your list is filled up, you can rename it: names(p) = paste0("p",c(1:length(p)))
Finally, if you want all the p_i variables to be directly accessible by name, call attach(p).
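A sketch of this list-based approach with placeholder contents (`rnorm(i)` and `n` are made up for illustration): the list grows as elements are assigned, and each element is then reachable by name.

```r
set.seed(1)
n <- 4  # arbitrary; in the real use case this is not fixed in advance
p <- list()
for (i in seq_len(n)) {
  p[[i]] <- rnorm(i)  # the list grows as needed
}
names(p) <- paste0("p", seq_along(p))

p$p3       # access by name
p[["p2"]]  # or with [[ ]]
```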
This is kind of a hack but you can do the following
short_list <- list(rnorm(10),rnorm(20),1:3)
long_list <- c(short_list,short_list )
paste0("p",seq_along(short_list))
mapply(assign, paste0("p",seq_along(short_list)), short_list, MoreArgs = list(envir = .GlobalEnv))
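An alternative to the `mapply(assign, ...)` call is to name the list with `setNames()` and copy it into the global environment with `list2env()`; the same caveat about creating many loose variables applies.

```r
short_list <- list(rnorm(10), rnorm(20), 1:3)

invisible(list2env(setNames(short_list, paste0("p", seq_along(short_list))),
                   envir = .GlobalEnv))

p3  # 1 2 3
```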
result:
> p3
[1] 1 2 3
You can do the same with long_list.
I don't see a statistical model for which you would need this. You are better off working with lists like short_list, or with data.frames, directly.
PS: If you just want this for glm, you probably want to learn about formulas in R.
glm(y ~ ., data = your_data) uses every column of your data frame other than y as a regressor. Maybe this helps.
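A small illustration with the built-in `mtcars` data (the column choice is arbitrary): with `y ~ .`, every column other than the response becomes a regressor.

```r
df <- data.frame(y = mtcars$mpg, wt = mtcars$wt, hp = mtcars$hp)
fit <- glm(y ~ ., data = df)
names(coef(fit))  # "(Intercept)" "wt" "hp"
```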
assign (and maybe also attach) are often a sign that you have not yet arrived at an "Rish" version of the code.
Considering that you need this for modeling: if your $p_1, \ldots, p_n$ are of the same type, you can put them into a matrix and store it as a single column of a data.frame (for modeling they need to be of the same length anyway):
df$matrix <- p.matrix
If you directly create the data.frame, you need to make sure the matrix is not expanded to data.frame columns:
df <- data.frame(matrix = I(p.matrix), ...)
Then glm (y ~ matrix, ...) will work.
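A runnable sketch with simulated data (sizes and coefficients are arbitrary): the three columns of `p.matrix` play the role of $p_1, p_2, p_3$, are stored as one matrix column of the data frame, and enter the model as a single term.

```r
set.seed(1)
n <- 20
p.matrix <- matrix(rnorm(n * 3), ncol = 3)  # simulated p_1..p_3 as columns
y <- as.vector(p.matrix %*% c(1, -2, 0.5) + rnorm(n))

df <- data.frame(y = y)
df$matrix <- p.matrix  # one matrix-valued column, not three

fit <- glm(y ~ matrix, data = df)
coef(fit)  # intercept plus one coefficient per matrix column
```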
For examples of this technique, see e.g. the packages pls or hyperSpec, or the pls paper in the Journal of Statistical Software.