In R I have 100 phylo objects called Newick1, Newick2, Newick3, etc. I want to do pairwise comparisons between the trees (e.g. all.equal.phylo(Newick1, Newick2)) but am having difficulty figuring out how to do this efficiently, since each object has a different name.
I think something like the for loop below will work, but how do I designate a different file for each iteration of the loop? For obvious reasons the [i] and [j] I put in the code below don't work, but I don't know what to replace them with.
Thank you very much!
for (i in 1:99) {
for (j in i+1:100) {
all.equal.phylo(Newick[i], Newick[j]) -> output[i,j]
} }
Try mget() to reference multiple objects by name; it returns them as a named list:
> x1 <- x2 <- x3 <-1
> mget(paste0("x",1:3))
$x1
[1] 1
$x2
[1] 1
$x3
[1] 1
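Applied to the question, the same idea might look like the sketch below. The three small lists are hypothetical stand-ins for the phylo objects, and isTRUE(all.equal()) stands in for all.equal.phylo() so that the example runs without the ape package. Note also that i+1:100 in the question parses as i + (1:100), so the inner loop needs parentheses.

```r
# Hypothetical stand-ins for the phylo objects Newick1, Newick2, ...
Newick1 <- list(tip = c("a", "b"))
Newick2 <- list(tip = c("a", "b"))
Newick3 <- list(tip = c("a", "c"))

n <- 3
# Collect the separately named objects into one list;
# environment() is the global environment when run at top level
trees <- mget(paste0("Newick", 1:n), envir = environment())

# Matrix of pairwise results; only the upper triangle is filled
output <- matrix(NA, nrow = n, ncol = n)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {  # parentheses matter: i+1:n means i + (1:n)
    # stand-in comparison; with ape loaded this would be all.equal.phylo()
    output[i, j] <- isTRUE(all.equal(trees[[i]], trees[[j]]))
  }
}
output
```

Once the trees live in a list, trees[[i]] replaces the Newick[i] placeholder from the question.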
You can try a variation on the following:
# make a two column dataframe
# and filter the identical values
df <- expand.grid(1:100,1:100)
names(df) <- c('i','j')
df <- df[df$i != df$j, ]
# example function that takes two parameters
addtwo <- function(i,j){i + j}
# apply that function across rows of the dataframe
results <- mapply(addtwo, df$i, df$j)
# using the same logic,
# your function would look something like this
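# NEWICKS is assumed to be a list holding the trees, e.g. built with mget()
getdistance <- function(i,j, newicks=NEWICKS) {
# [[ ]] extracts the tree itself; [ ] would return a one-element list
all.equal.phylo(newicks[[i]], newicks[[j]])
}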
# and apply it like this
results <- mapply(getdistance, df$i, df$j)
Key concepts:
expand.grid()
mapply()
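expand.grid() followed by the i != j filter still visits every pair twice, once as (i, j) and once as (j, i). A minimal sketch of an alternative, assuming each unordered pair should be compared only once, uses combn(). The trees list below is a hypothetical stand-in, and isTRUE(all.equal()) again replaces all.equal.phylo():

```r
# Hypothetical stand-ins for a list of trees
trees <- list(a = 1:3, b = 1:3, c = c(1, 2, 4))

# Each column of `pairs` is one combination with i < j
pairs <- combn(length(trees), 2)

# Compare every pair once
same <- mapply(
  function(i, j) isTRUE(all.equal(trees[[i]], trees[[j]])),
  pairs[1, ], pairs[2, ]
)

# One row per comparison
result <- data.frame(i = pairs[1, ], j = pairs[2, ], same = same)
result
```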
Related
I am new to R and have written a function that needs to be run multiple times to generate the final dataset.
The number of runs is determined by a vector of unique years, and the function should produce one output per year.
However, I am still not getting the right output.
Desired output: e.g. if it takes 10 samples from each year, then after the 10th run I should have 100 rows of correct output.
create_strsample <- function(n1,n2){
yr <- c(2010,2011,2012,2013)
for(i in 1:length(yr)){
k1<-subset(data,format(as.Date(data$account_opening_date),"%Y")==yr[i])
r1 <-sample(which(!is.na(k1$account_closing_date)),n1,replace=FALSE)
r2<-sample(which(is.na(k1$account_closing_date)),n2,replace=FALSE)
#final.data <-k1[c(r1,r2),]
sample.data <- lapply(yr, function(x) {f.data<-create_strsample(200,800)})
k1 <- do.call(rbind,k1)
return(k1)
}
final <- do.call(rbind,sample.data)
return(final)
}
stratified.sample.data <- create_strsample(200,800)
An MWE would have been nice, but I'll give you a template for this kind of question. Note that this is not optimized for speed (or anything else), only for ease of understanding.
As noted in the comments, that call to create_strsample in the loop looks weird and probably isn't what you really want.
data <- data.frame() # we need an empty, but existing variable for the first loop iteration
for (i in 1:10) {
temp <- runif(1,max=i) # do something...
data <- rbind(data,temp) # ... and add this to 'data'
} # repeat 10 times
rm(temp) # don't need this anymore
That return(k1) in the loop also looks wrong.
Following your suggestion, @herbaman, I later tried this to get the desired output, without the lapply.
create_strsample <- function(n1,n2){
final.data <- NULL
yr <- c(2010,2011,2012,2013)
for(i in 1:length(yr)){
k1<-subset(data,format(as.Date(data$account_opening_date),"%Y")==yr[i])
r1 <- k1[sample(which(!is.na(k1$account_closing_date)),n1,replace=FALSE), ]
r2 <- k1[sample(which(is.na(k1$account_closing_date)),n2,replace=FALSE), ]
sample.data <- rbind(r1,r2)
final.data <- rbind(final.data, sample.data)
}
return(final.data)
}
stratified.sample.data <- create_strsample(200,800)
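For comparison, here is a sketch of the same per-year sampling that collects the pieces in a list and binds them once at the end. The accounts data frame and the sample_year() helper are hypothetical, made up so the example runs on its own; only the column names are taken from the question.

```r
set.seed(42)

# Hypothetical toy data standing in for the asker's `data`
accounts <- data.frame(
  account_opening_date = rep(c("2010-03-01", "2011-06-15"), each = 6),
  account_closing_date = rep(c("2012-01-01", NA), times = 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: n1 closed and n2 still-open accounts for one year
sample_year <- function(d, year, n1, n2) {
  k <- d[format(as.Date(d$account_opening_date), "%Y") == year, ]
  closed <- which(!is.na(k$account_closing_date))
  open   <- which(is.na(k$account_closing_date))
  # sample.int() on the positions avoids the sample(x) surprise
  # when only one candidate row is left
  k[c(closed[sample.int(length(closed), n1)],
      open[sample.int(length(open), n2)]), ]
}

years  <- c(2010, 2011)
pieces <- lapply(years, function(y) sample_year(accounts, y, n1 = 2, n2 = 1))
final  <- do.call(rbind, pieces)  # bind once instead of growing in the loop
nrow(final)
```

Each year contributes n1 + n2 rows, so the row count of final is easy to check against the expected total.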
I'm trying to create an empty numeric object like this
corr <- cor()
to use it later on in a loop.
but it keeps returning this error
Error in is.data.frame(x) : argument "x" is missing, with no default.
Here is my full script:
EVI <- "D:\\Modis_EVI\\Original\\EVI_Stack_single5000.tif"
y.EVI <- brick(EVI)
m.EVI.cropped <- as.matrix(y.EVI)
time <- 1:nlayers(y.EVI)
corr <- cor()
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
for (i in 1:nrow(m.EVI.cropped)){
EVI.m <- m.EVI.cropped[i,]
time <- 1:nlayers(y.EVI)
Corr[i] <- cor(EVI.m, time, method="pearson", use="pairwise.complete.obs")
}
Any advice please?
Since you are asking for advice:
It is very likely that you don't need to do this, since you can probably use (i) a vectorized function or (ii) an lapply loop, which pre-allocates the return object for you. If you insist on using a for loop, set it up properly: pre-allocate the result, e.g. with corr <- numeric(n), where n is the number of iterations. Appending to a vector is extremely slooooooow.
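A minimal sketch of both options, assuming a small random matrix m as a hypothetical stand-in for m.EVI.cropped (rows are pixels, columns are time steps), so that it runs without the raster package:

```r
set.seed(1)

# Hypothetical stand-in for m.EVI.cropped: 5 rows, 8 time steps
m <- matrix(rnorm(40), nrow = 5)
time <- seq_len(ncol(m))

# Option 1: for loop with a pre-allocated result vector
corr <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  corr[i] <- cor(m[i, ], time, method = "pearson",
                 use = "pairwise.complete.obs")
}

# Option 2: let apply() run over the rows and build the result
corr2 <- apply(m, 1, cor, y = time, method = "pearson",
               use = "pairwise.complete.obs")

all.equal(corr, corr2)
```

Note that the script in the question assigns to Corr[i] but creates corr; R is case-sensitive, so the two names have to match.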
We can create empty objects with numeric(0), logical(0), character(0) etc.
For example
num_vec <- numeric(0)
creates an empty numeric vector that can be filled up later on:
num_vec[1] <- 2
num_vec
# [1] 2
num_vec[2] <- 1
num_vec
# [1] 2 1
I want to access variables in .GlobalEnv from inside a function. Basically, I want to concatenate a number of data frames into a matrix.
Here is some dummy code;
Alpha <- data.frame(lon=124.9167,lat=1.53333)
Alpha_2 <- data.frame(lon=3.13333, lat=42.48333)
Alpha_3 <- data.frame(lon=-91.50667, lat=27.78333)
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
mat <- as.matrix(rbind(vars[1], vars[2], vars[3]))
return(mat)
}
When calling myfunc('Alpha') I would like the same thing to be returned as when you run:
as.matrix(rbind(Alpha, Alpha_2, Alpha_3))
lon lat
1 124.91670 1.53333
2 3.13333 42.48333
3 -91.50667 27.78333
Any pointers would be appreciated, thanks!
You can use get to retrieve a variable by name, or mget to retrieve several at once as a list. Here we fetch all matching variables with mget and then use rbind to bind them together.
myfunc <- function(x){
vars <- ls(.GlobalEnv, pattern=x)
df <- do.call(rbind, mget(vars, .GlobalEnv)) # courtesy @Roland
return(df)
}
myfunc("Alpha")
# lon lat
# 1 124.91670 1.53333
# 2 3.13333 42.48333
# 3 -91.50667 27.78333
Note, in practice, you probably want to check that the variables that match the pattern actually are what you think they are, but this gives you the rough tools you want.
Old version (2nd line of func):
df <- do.call(rbind, lapply(vars, get, envir=.GlobalEnv))
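One way to add the check mentioned in the note is sketched below. The function name bind_matching and the stray object Alpha_note are hypothetical, introduced only for illustration; Filter(is.data.frame, ...) drops anything that matches the pattern but is not a data frame.

```r
Alpha      <- data.frame(lon = 124.9167, lat = 1.53333)
Alpha_2    <- data.frame(lon = 3.13333, lat = 42.48333)
Alpha_note <- "not a data frame"  # hypothetical stray object matching the pattern

# Hypothetical helper: bind all data frames whose names match `pattern`
bind_matching <- function(pattern, env = .GlobalEnv) {
  objs <- mget(ls(env, pattern = pattern), envir = env)
  objs <- Filter(is.data.frame, objs)  # keep only the data frames
  as.matrix(do.call(rbind, objs))
}

# "^Alpha" anchors the pattern at the start of the name;
# environment() is .GlobalEnv when this is run at top level
mat <- bind_matching("^Alpha", env = environment())
mat
```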
I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). In the for loop a function is called and the output is stored in a variable at the moment. However, this is overwritten on each successive iteration. How can I store the result of each pass through the function and access it afterwards?
Thanks,
example
for (par1 in 1:n) {
var<-function(par1,par2)
c(var,par1)->var2
print(var2)
So print shows every instance of var2, but var2 itself only holds the value for the last n. Is there any way to get an array of the data or something?
Initialise an object and then assign each value by indexing:
a <- numeric(10) # pre-allocate one slot per iteration
for (i in 1:10) {
a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol=2, nrow=n)
for (i in 1:n) {
var1 <- function_one(i,par1)
var2 <- function_two(i,par2)
mat[i,] <- c(var1,var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
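A runnable variant of the matrix approach, with function_one and function_two defined as hypothetical placeholders (the answer leaves them unspecified) and the two output columns named:

```r
n <- 10

# Hypothetical stand-ins for the two functions called in the loop
function_one <- function(i, p) i + p
function_two <- function(i, p) i * p
par1 <- 1
par2 <- 2

# One row per iteration, one named column per output
mat <- matrix(NA_real_, nrow = n, ncol = 2,
              dimnames = list(NULL, c("var1", "var2")))
for (i in seq_len(n)) {
  mat[i, ] <- c(function_one(i, par1), function_two(i, par2))
}
mat[3, ]
```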
To get a list of results:
n = 3
lapply(1:n, function(par1) {
# your function and whatnot, e.g.
par1*par1
})
Or sapply if you want a vector instead.
A bit more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
var = some_fn(par1, par2)
return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
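If a matrix or vector is more convenient than a list, the results can be combined afterwards. This is a small sketch reusing the names from the example above; the element names var and par1 are added here only to label the columns.

```r
n <- 3
par2 <- 4
some_fn <- function(x, y) x + y

# List of named length-2 vectors, one per value of par1
res <- lapply(1:n, function(par1) c(var = some_fn(par1, par2), par1 = par1))

# Stack the list elements into a matrix with one row per iteration
mat <- do.call(rbind, res)

# vapply() returns a vector directly and checks each result's type and length
squares <- vapply(1:n, function(p) p * p, numeric(1))
```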
If you have a list of files and you want to compare one of them against a set of the others, how do you do it?
my.test <- list[1]
my.reference.set <- list[-1]
This works of course, but I want to have this in a loop, with my.test varying each time, so that each file in the list is my.test for one iteration. I have a list of 250 files, and I want to do this for every subset of 12 files within it.
> num <- (1:2)
> sdasd<- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
> splitlist<- split(sdasd, num)
> splitlist
$`1`
[1] "asds" "nasd" "nadsd"
$`2`
[1] "ksad" "ksasd" "kasdih"
> for (i in splitlist) {my.test <- splitlist[i] # "asds"
+ my.reference.set <- splitlist[-i] # "nasd" and "nadsd"
+ combined <- data.frame (my.test, my.reference.set)
+ combined}
Error in -i : invalid argument to unary operator
>
Then I want the next iteration to be:
my.test <- splitlist[i] #my.test to be "nasd"
my.reference.set <- splitlist[-i] # "asds" and "nadsd"
}
and finally for splitlist[1],
my.test <- splitlist[i] # "nadsd"
my.reference.set <- splitlist[-i] # "asds" and "ksad"
}
Then the same for splitlist[2]
Does this do what you want? The key point here is to loop over the indices of the list rather than over its elements, because x[-n] indexing only works when n is a positive integer (with some obscure exceptions). Also, I wasn't sure if you wanted the results as a data frame or a list -- the latter allows the components to be different lengths.
num <- 1:2
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist<- split(sdasd, num)
L <- vector("list",length(splitlist))
for (i in seq_along(splitlist)) {
my.test <- splitlist[[i]] # the i-th group, e.g. "asds" "nasd" "nadsd"
my.reference.set <- splitlist[-i] # all the other groups
L[[i]] <- list(test=my.test, ref.set=my.reference.set)
}
edit: I'm still a little confused by your example above, but I think this is what you want:
refs <- lapply(splitlist,
function(S) {
lapply(seq_along(S),
function(i) {
list(test=S[i], ref.set=S[-i])
})
})
refs is a nested list; the top level has length 2 (the length of splitlist), each of the next levels has length 3 (the lengths of the elements of splitlist), and each of the bottom levels has length 2 (test and reference set).
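To pull a single test/reference pair out of that structure, index by group and then by position. This sketch rebuilds refs from the example data so that it runs on its own:

```r
sdasd <- c("asds", "ksad", "nasd", "ksasd", "nadsd", "kasdih")
splitlist <- split(sdasd, 1:2)

refs <- lapply(splitlist, function(S) {
  lapply(seq_along(S), function(i) list(test = S[i], ref.set = S[-i]))
})

# Second iteration within group "1"
refs[["1"]][[2]]$test     # "nasd"
refs[["1"]][[2]]$ref.set  # "asds" "nadsd"
```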