I keep running into situations where I want to dynamically create variables using a for loop (or similar / more efficient construct using dplyr perhaps). However, it's unclear to me how to do it right now.
For example, the below shows a construct that I would intuitively expect to generate 10 variables assigned numbers 1:10, but it doesn't work.
for (i in 1:10) {paste("variable",i,sep = "") = i}
The error
Error in paste("variable", i, sep = "") = i :
target of assignment expands to non-language object
Any thoughts on what method I should use to do this? I assume there are multiple approaches (including a more efficient dplyr method). Full disclosure: I'm relatively new to R and really appreciate the help. Thanks!
I've run into this problem myself many times. The solution is the assign function.
for(i in 1:10){
assign(paste("variable", i, sep = ""), i)
}
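Variables created this way have to be looked up by name again later. A minimal sketch of that retrieval with base R's get and mget, repeating the loop above so it runs on its own (the name vals is only illustrative):

```r
# Create variable1 ... variable10 as in the loop above
for (i in 1:10) {
  assign(paste0("variable", i), i)
}

# Fetch a single variable by its constructed name
get(paste0("variable", 4))

# Fetch all of them at once as a named list
vals <- mget(paste0("variable", 1:10))
names(vals)
```

Having to call get or mget every time is the main cost of this approach.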
If you wanted to get everything into one vector, you could use sapply. The following code would give you a vector from 1 to 10 whose elements are named "variable1" through "variable10". This may not be the prettiest or most elegant way to use the apply family for this, but I think it ought to work well enough.
var.names <- function(x){
a <- x
names(a) <- paste0("variable", x)
return(a)
}
variables <- sapply(X = 1:10, FUN = var.names)
This sort of approach seems to be favored because it keeps all of those variables tucked away in one object, rather than scattered all over the global environment. This could make calling them easier in the future, preventing the need to use get to scrounge up variables you'd saved.
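As a possible shortcut, the same named vector can probably be built without a helper function. This is a sketch using base R's setNames; the names variables and var_list are illustrative only:

```r
# Named vector: one object holds all ten values
variables <- setNames(1:10, paste0("variable", 1:10))
variables[["variable3"]]

# Named list: useful when the elements are not all the same type or length
var_list <- setNames(as.list(1:10), paste0("variable", 1:10))
var_list$variable7
```

Elements are then reached with [[ ]] or $ instead of get.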
No need to use a loop: you can build a character expression with paste0, turn it into an unevaluated expression with parse, and finally evaluate it with eval.
eval(parse(text = paste0("variable", 1:10, "=",1:10, collapse = ";") ))
The code you have is really no more useful than a vector of elements:
x<-1
for(i in 2:10){
x<-c(x,i)
}
(Obviously, this example is trivial; you could just use x<-1:10 and be done. I assume there's a reason you need to do non-vectorized calculations on each variable.)
I want to use a function to repeatedly create sets with different names.
For example, suppose I have 5 random vectors.
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
Then I use these vectors to select rows from the raw data set (a data frame):
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
testset4<-raw[number4,]
testset5<-raw[number5,]
Writing out each command takes a lot of space in the manuscript, so I'm trying to shorten these commands by using a function.
However, I found that it is hard to use a variable inside a function to build part of an object's name as text. For example, it is easy to use variables like this.
mean_function<-function(x){
mean(x)
}
But I want to use a function like this:
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to create a function for this task, simply use lapply to loop over the list of elements produced by mget(), then set some names and finally put all results in the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
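If the numbered objects are not needed individually, the whole task could also stay inside named lists. A minimal sketch, assuming a made-up data frame raw (its columns id and value are hypothetical stand-ins for the real data):

```r
set.seed(1)

# Hypothetical stand-in for the asker's raw data frame
raw <- data.frame(id = 1:10, value = rnorm(10))

# Five random index vectors kept in one named list
numbers <- lapply(1:5, function(i) sample(1:10, 3))
names(numbers) <- paste0("number", 1:5)

# One subset of raw per index vector, also kept in a named list
testsets <- lapply(numbers, function(idx) raw[idx, ])
names(testsets) <- paste0("testset", 1:5)

# Access a single subset by name
testsets$testset2
```

This avoids both mget() and list2env(), at the cost of writing testsets$testset2 rather than testset2.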
Is there any way to use a loop to write this code? Each line of code is identical except for the species name.
ensembl_hsapiens <- useMart("ensembl",
dataset = "hsapiens_gene_ensembl")
ensembl_mouse <- useMart("ensembl",
dataset = "mmusculus_gene_ensembl")
ensembl_chicken <- useMart("ensembl",
dataset = "ggallus_gene_ensembl")
Here's an approach. Note that using a loop (or a loop-equivalent construct) to populate the global environment isn't often a good idea. But it's what you asked for.
There's nothing special about useMart, so I'll make up a nonsense function that takes two character arguments:
foo <- function(x, y) {
nchar(paste(x, y))
}
Here are the species names. I'll use them for the object names as well.
species <- c("hsapiens", "mmusculus", "ggallus")
Now, you want to create three named objects in the global environment. You can use the assign function for this, passing pos = 1 (the global environment) because each call made by lapply runs in its own function environment.
lapply(species, function(s) assign(paste0("ensembl_", s),
                                   foo("ensembl", paste0(s, "_gene_ensembl")),
                                   pos = 1))
This gives you what you want. You can replace foo with useMart.
Now, is this a good idea? Perhaps not. I would be more inclined to keep the objects themselves in a list.
objs <- lapply(species, function(s) foo("ensembl", paste0(s, "_gene_ensembl")))
names(objs) <- paste0("ensembl_", species)
You can access them using statements like objs$ensembl_hsapiens or objs[["ensembl_hsapiens"]]
Here is how I created a number of data sets named data_1, data_2, data_3, and so on.
Initially, data has 500 rows and 17 columns:
for ( i in 1:length(unique( data$cluster ))) {
assign(paste("data", i, sep = "_"),subset(data[data$cluster == i,]))
}
Up to this point everything is fine.
Now I am trying to use these data sets one by one inside another loop, like this:
for (i in 1:5) {
data<- paste(data, i, sep = "_")
}
However, this does not give me the data in the required format.
Any help will be really appreciated.
Thank you in advance.
Let me give you a tip here: don't just assign everything in the global environment, but use lists for this. That way you avoid all the things that can go wrong when meddling with the global environment. The code in your question will overwrite the original dataset data, so you'll be in trouble if you want to rerun that code when something goes wrong. You'll have to reconstruct the original data frame.
Second: If you need to split a data frame based on a factor and carry out some code on each part, you should take a look at split, by and tapply, or at the plyr and dplyr packages.
Using Base R
With base R, it depends on what you want to do. In the most general case you can use a combination of split() and lapply or even a for loop:
mylist <- split( data, f = data$cluster)
for(mydata in mylist){
print(head(mydata))  # values are not auto-printed inside a for loop
...
}
Or
mylist <- split( data, f = data$cluster)
result <- lapply(mylist, function(mydata){
doSomething(mydata)
})
Which one you use depends largely on what the result should be. If you need some kind of summary for every subset, lapply will give you a list with the results per subset. If you need the subsets for a simulation, plotting or similar side effects, the for loop is the better fit.
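To make the two variants concrete, here is a small sketch on made-up data. The data frame below, with columns cluster and x, is a hypothetical stand-in for the asker's data, and the per-cluster mean is just an example of a summary:

```r
set.seed(42)

# Hypothetical stand-in for the asker's data: 12 rows in 3 clusters
data <- data.frame(cluster = rep(1:3, times = c(4, 3, 5)), x = rnorm(12))

# One data frame per cluster, named "1", "2", "3"
mylist <- split(data, f = data$cluster)

# lapply: collect one summary value per subset in a list
cluster_means <- lapply(mylist, function(mydata) mean(mydata$x))

# for loop: use each subset for its side effects (here, printing)
for (name in names(mylist)) {
  cat("cluster", name, "has", nrow(mylist[[name]]), "rows\n")
}
```

Here mylist[["2"]] plays the role that data_2 had in the question, without any extra objects in the global environment.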
If you want to add some variables based on other variables, then the plyr or dplyr packages come in handy
Using plyr and dplyr
These packages come especially handy if the result of your code is going to be an array or data frame of some kind. This would be similar to using split and lapply but then in a way Hadley approves of :-)
For example:
library(plyr)
result <- ddply(data, .(cluster),
function(mydata){
doSomething(mydata)
})
Use dlply if the result should be a list.
First question post. Please excuse any formatting issues that may be present.
What I'm trying to do is conditionally replace a factor level in a dataframe column. Reason being due to unicode differences between a right single quotation mark (U+2019) and an apostrophe (U+0027).
All of the columns that need this replacement begin with "INN8", so I'm using
grep("INN8", colnames(demoDf)) -> apostropheFixIndices
for(i in apostropheFixIndices) {
levels(demoDfFinal[i]) <- c(levels(demoDf[i]), "I definitely wouldn't")
(insert code here)
}
to get the indices in order to perform the conditional replacement.
I've taken a look at a myriad of questions that involve naming variables on the fly, as well as how to assign values to dynamic variables. I have also explored the R-FAQ on turning a string into a variable and looked into Ari Friedman's suggestion that named elements in a list are preferred. However, I'm unsure about the execution as well as the significance of the best practice suggestion.
I know I need to do something along the lines of
demoDf$INN8xx[demoDf$INN8xx=="I definitely wouldn’t"] <- "I definitely wouldn't"
but the iterations I've tried so far haven't worked.
Thank you for your time!
If I understand you correctly, then you don't want to rename the columns. Then this might work:
demoDf <- data.frame(A=rep("I definitely wouldn’t",10) , B=rep("I definitely wouldn’t",10))
newDf <- apply(demoDf, 2, function(col) {
gsub(pattern="’", replacement = "'", x = col)
})
It just checks all columns for the wrong symbol. Note that apply returns a character matrix here, not a data frame.
Or if you have a vector containing the column indices you want to check then you could go with
# Let's say you identified columns 2, 5 and 8
cols <- c(2,5,8)
sapply(cols, function(col) {
demoDf[,col] <<- gsub(pattern="’", replacement = "'", x = demoDf[,col])
})
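Since the question is about factor columns, another option might be to edit the factor levels directly, which keeps the columns as factors. A minimal sketch on made-up data: the column names INN8a, INN8b and other are hypothetical, and the curly quote is written as the escape \u2019 so the code does not depend on file encoding:

```r
# Hypothetical example data with the curly apostrophe (U+2019) in factor levels
demoDf <- data.frame(
  INN8a = factor(c("I definitely wouldn\u2019t", "Maybe")),
  INN8b = factor(c("Maybe", "I definitely wouldn\u2019t")),
  other = c("x", "y"),
  stringsAsFactors = FALSE
)

# Columns whose names start with "INN8"
fixCols <- grep("^INN8", colnames(demoDf))

# Rewrite the levels of each selected factor; the column stays a factor
demoDf[fixCols] <- lapply(demoDf[fixCols], function(col) {
  levels(col) <- gsub("\u2019", "'", levels(col), fixed = TRUE)
  col
})

levels(demoDf$INN8a)
```

Assigning through demoDf[fixCols] <- lapply(...) also avoids the <<- operator, because the data frame is modified in the environment where the code runs.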
This is what I've got at the moment:
weights0 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights1 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights2 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights3 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights4 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights5 <- array(dim=c(nrow(ind),nrow(all.msim)))
weights0 <- 1 # sets initial weights to 1
Nice and clear, but not nice and short!
Would experienced R programmers write this in a different way?
EDIT:
Also, is there an established way of creating a number of weights that depends on a pre-existing variable, to make this generalisable? For example, the parameter num.cons would equal 5: the number of constraints (and hence weights) that we need. I imagine this is a common programming problem, so surely there is a solution.
Option 1
If you want to create the different elements in your environment, you can do it with a for loop and assign. Other options are sapply and the envir argument of assign
for (i in 0:5)
assign(paste0("weights", i), array(dim=c(nrow(ind),nrow(all.msim))))
Option 2
However, as @Axolotl9250 points out, depending on your application, more often than not it makes sense to have these all in a single list
weights <- lapply(rep(NA, 6), array, dim=c(nrow(ind),nrow(all.msim)))
Then to assign to weights0 as you have above, you would use
weights[[1]][ ] <- 1
note the empty [ ] which is important to assign to ALL elements of weights[[1]]
Option 3
As per @flodel's suggestion, if all of your arrays are of the same dim, you can create one big array with an extra dim of length equal to the number of objects you have (i.e., 6).
weights <- array(dim=c(nrow(ind),nrow(all.msim), 6))
Note that for any of the options:
If you want to assign to all elements of an array, you have to use empty brackets. For example, in option 3, to assign to the 1st array, you would use:
weights[,,1][] <- 1
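To tie this back to the num.cons part of the question, options 2 and 3 might be generalised roughly as follows. This is only a sketch: n.ind and n.zones are hypothetical stand-ins for nrow(ind) and nrow(all.msim), and weights.arr is an illustrative name:

```r
num.cons <- 5   # number of constraints, as in the question
n.ind    <- 4   # hypothetical stand-in for nrow(ind)
n.zones  <- 3   # hypothetical stand-in for nrow(all.msim)

# Option 2 generalised: a named list holding weights0 ... weights5
weights <- lapply(0:num.cons, function(i) array(dim = c(n.ind, n.zones)))
names(weights) <- paste0("weights", 0:num.cons)

# Empty brackets fill every cell and keep the dimensions
weights$weights0[] <- 1

# Option 3 generalised: one 3-d array with num.cons + 1 slices
weights.arr <- array(dim = c(n.ind, n.zones, num.cons + 1))
weights.arr[, , 1] <- 1
```

Changing num.cons then changes the number of weight arrays without any further edits.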
I've just tried to have a go at achieving this but with no joy, maybe someone else is better than I (most likely!!). However I can't help but feel maybe it's easier to have all the arrays in a single object, a list; that way a single lapply line would do, and instead of referring to weights1 weights2 weights3 weights4 it would be weights[[1]] weights[[2]] weights[[3]] weights[[4]]. Future operations on those arrays would then also be achieved by the apply family of functions. Sorry I can't get it exactly as you describe.
Given what you're doing, just using a for loop is quick and intuitive.
# create a character vector containing all the variable names you want..
variable.names <- paste0( 'weights' , 0:5 )
# look at it.
variable.names
# create the value to provide _each_ of those variable names
variable.value <- array( dim=c( nrow(ind) , nrow(all.msim) ) )
# assign them all
for ( i in variable.names ) assign( i , variable.value )
# look at what's now in memory
ls()
# look at any of them
weights4