I am writing a function that reads in every data file in a directory and, if a file has more complete cases than a threshold, calculates the correlation between two variables in the data ("sulfate" and "nitrate"). I want this to run in a for loop that builds a numeric vector of the correlation values (one value for each file in the directory).
However, when I run the code, it only returns the last value.
I am quite new to R, so I may be making a simple mistake (I have the newest version of R installed). Below is the code:
corr <- function(directory, threshold = 0) {
  filenames3 <- list.files(directory, pattern = ".csv", full.names = TRUE)
  loop_length <- length(filenames3)
  correlation_values <- numeric()
  for(i in loop_length) {
    read_in_data3 <- read.csv(filenames3[i])
    complete_boolean <- complete.cases(read_in_data3)
    nobs2 <- sum(complete_boolean)
    data_rmNA <- read_in_data3[complete_boolean, ]
    if(nobs2 > threshold) {
      correlation_values <- c(correlation_values,
                              cor(data_rmNA[["sulfate"]],
                                  data_rmNA[["nitrate"]]))
    }
  }
  correlation_values
}
corr("C:/Users/Danie/OneDrive/Documents/R/specdata")
I have tried specifying the length of the vector e.g. correlation_values <- numeric(length = loop_length). This returns a vector of the right length, but all the values are 0 excluding the last which runs properly. I have looked at similar questions, but still can't find a solution to my problem.
I assume I'm losing information in the loop somewhere (rewriting over a variable or something).
Thanks in advance for any help.
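I think you need to say for(i in 1:loop_length) instead of for(i in loop_length).
R loops over each element of the vector you give it, but loop_length is a single number (a vector of length 1), so the loop body runs exactly once, with i equal to the number of files. That is why only the value for the last file is returned. seq_along(filenames3) is a slightly safer choice than 1:loop_length, because it yields an empty sequence when the directory contains no matching files.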
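A minimal sketch of the difference, using a made-up length n in place of loop_length (all variable names here are illustrative, not from the question). It also shows why preallocating with numeric(length = loop_length) gave zeros everywhere except the last position:

```r
n <- 3L  # stand-in for loop_length

# for (i in n) visits only the single value n
visited_bad <- integer(0)
for (i in n) visited_bad <- c(visited_bad, i)

# for (i in seq_len(n)) visits 1, 2, ..., n
visited_good <- integer(0)
for (i in seq_len(n)) visited_good <- c(visited_good, i)

# With a preallocated vector, the one-pass loop fills only position n
only_last <- numeric(n)
for (i in n) only_last[i] <- i^2

# The corrected loop fills every position
filled <- numeric(n)
for (i in seq_len(n)) filled[i] <- i^2

# seq_len(0) is empty, whereas 1:0 counts down from 1 to 0
empty_ok <- length(seq_len(0))
```

Here visited_bad holds just the value 3, while visited_good holds 1, 2, 3.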
Suppose I have two vectors, and I would like a function that takes only one value from each vector and returns the output. Then I would like another function to check the value from each run: if the output of the previous run is smaller than the new one, the function should stop and return all the previous values. My original function is very complicated (it estimates models), so I have tried to provide a simple example to explain my idea.
Suppose that I have these two vectors:
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
Then I would like to write a function that takes only one value from each vector and multiplies them, then returns the output. Next, I would like the function to check whether the previous product is smaller than the new one. If it is, it should stop and return all the previous products.
I tried the function below. However, it takes all the values at once and returns a list of the products. I was thinking about using lapply to pass one element at a time, but I do not know how to handle the stopping condition.
myfun <- function(x, y, n){
multi <- list()
for ( i in 1:n){
multi[[i]] <- x[[i]]*y[[i]]
}
return(multi)
}
myfun(x,y,10)
Here is another try:
x <- rnorm(1:20)
y <- rnorm(1:20)
myfun <- function(x, y){
multi <- x*y
return(multi)
}
This is the first function. I would like to run it element by element, so that each call returns only one product. Then another function (a wrapper function) checks the result: if the second output of the multiplication function is larger than the first one, stop; otherwise keep going.
In other words, I would like the multiplication in a separate function and the check on its output in a wrapper function.
You can apply a for loop with a stopping condition, similar to what you have already:
# example input
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
# example function
f = function(xi, yi) xi*yi
# wrapper
stopifnot(length(x) == length(y))
res = vector(length(x), mode="list")
for (i in seq_along(x)){
  res[[i]] = f(x[[i]], y[[i]])
  if (i > 1L && res[[i]] > res[[i-1L]]) break
}
res[seq_len(i)]
Comments:
It is better to predefine the max length res might need (here, length(x)), rather than expanding it in the loop.
For this function (multiplication), there is no good reason to proceed elementwise. R's multiplication function is vectorized and fast.
You don't need to use a list-class output for this function, since it is returning doubles; res = double(length(x)) should also work.
You don't need to use list-style accessors for x, y and res unless lists are involved; res[i] = f(x[i], y[i]) should work, etc.
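Following the comment that multiplication is vectorized, here is a sketch of the same stopping rule without an explicit loop. It only applies when the per-element function is cheap and vectorized, which is an assumption that may not hold for the asker's estimation models. The names prod_xy and stop_at are introduced here for illustration.

```r
set.seed(123)
x <- rnorm(20)
y <- rnorm(20)

# All products at once
prod_xy <- x * y

# diff(prod_xy)[k] > 0 means element k + 1 is larger than element k,
# so the loop above would break at index k + 1
stop_at <- which(diff(prod_xy) > 0)[1] + 1L

# If the products never increase, keep everything
if (is.na(stop_at)) stop_at <- length(prod_xy)

res_vec <- prod_xy[seq_len(stop_at)]
```

Like the loop, this keeps the element that triggered the stop; drop the last entry of res_vec if only the earlier values are wanted.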
I am trying to write the following loop over an empirical data set where each ID replicate has a different number of observations for each sample period.
Any suggestions would be greatly appreciated!
a <- unique(bma$ID)
t <- unique(bma$Sample.period)
# empty list to hold the data
dens.data <- vector(mode='list', length = length(a) * length(t))
tank1 <- double(length(a))
index = 0
for (i in 1:length(a)){
for (j in 1:length(t)){
index = index + 1
tank1[index] = a[index] ### building an ID column
temp.tank <- subset(bma, bma$ID == a[i])
time.tank <- subset(temp.tank, temp.tank$Sample.period == t[j])
temp1 <- unique(temp.tank$Sample.period)
temp.tank <- data.frame(temp.tank, temp1)
dens.1 <- density(time.tank$Biomass_.adults_mgC.mm.3, na.rm = T)
# extract the y-values from the pdf function - these need to be separated by each Replicate and Sample Period
dens.data[[index]] <- dens.1$y
}
}
#### extract the data and place into a dataframe
dens.new<- data.frame(dens.data)
dens.new
colnames(dens.new) <- c("Treatment","Sample Period","pdf/density for biomass")
all<- list(dens.new)
all
### create new spreadsheet with all the data from the loop
dens.new.data<- write.csv(dens.new, "New.density.csv") ## export file to excel spreadsheet
Calling dens.new<- data.frame(dens.data) yields the following error message:
Error in data.frame(c(...) :
arguments imply differing number of rows: 512, 0
The loop seems to work for dens.data[[1]] but returns NULL for dens.data[[i]] with i > 1.
As there isn't a minimal example, it is difficult for me to guess what the original data.frame looks like. However, as for the error message, it is clear that your for-loop fails to assign values to the list dens.data for indices greater than 1.
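My first guess was that the index isn't being updated by index = index + 1, so you could try changing the equal sign = to the standard R assignment operator <- and see whether the whole list is filled. That said, = and <- behave the same for a plain assignment like this one, so this is unlikely to be the whole story.
It is also worth checking whether the loop stopped with an error partway through, for example because density() was given a subset with fewer than two non-missing values (you mention that the number of observations differs between IDs and sample periods). In that case every element of dens.data after the failure would still be NULL. Using <- for assignment is the more common R style, but it should not change the result here.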
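Since the original data are not available, here is a sketch on a made-up data frame (the values and the shortened column name Biomass are hypothetical stand-ins, not the asker's data). It skips ID/period combinations that have too few observations for density() and stores the results in long format, so that pieces of different sizes can still be combined:

```r
# Toy stand-in for bma: ID "B" has one observation in period 2 and none in period 3
set.seed(1)
bma <- data.frame(
  ID            = rep(c("A", "B"), times = c(30, 12)),
  Sample.period = c(rep(1:3, each = 10), rep(1:2, times = c(11, 1))),
  Biomass       = runif(42)
)

ids     <- unique(bma$ID)
periods <- unique(bma$Sample.period)

dens_list <- vector("list", length(ids) * length(periods))
index <- 0L
for (id in ids) {
  for (p in periods) {
    index <- index + 1L
    vals <- bma$Biomass[bma$ID == id & bma$Sample.period == p]
    vals <- vals[!is.na(vals)]
    # density() cannot pick a bandwidth from fewer than two points
    if (length(vals) < 2L) next
    d <- density(vals)
    # One row per evaluation point, labelled with its ID and period
    dens_list[[index]] <- data.frame(ID = id, Sample.period = p,
                                     x = d$x, y = d$y)
  }
}

# Drop the skipped (NULL) slots and stack the rest
dens_long <- do.call(rbind, Filter(Negate(is.null), dens_list))
```

Keeping ID and Sample.period as columns avoids building them separately, as the tank1 vector in the question tries to do.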
We all know that appending a vector to a vector within a for loop in R is a bad thing because it costs time. A solution would be to do it in a vectorized style. Here is a nice example by Joshua Ulrich. It is important to first create a vector with known length and then fill it up, instead of appending each new piece to an existing piece within the loop.
Still, his example 'only' demonstrates how to add one data piece at a time. I am now struggling with how to fill a vector with vectors rather than scalars.
Imagine I have a vector with a length of 100
vector <- numeric(length=100)
and a smaller vector that would fit 10 times into the first vector
vec <- seq(1,10,1)
How would I have to construct a loop that adds the smaller vector to the large vector without using c() or append()?
EDIT: This example is simplified - vec does not always consist of the same sequence but is generated within a for loop and should be added to vector.
You could just use normal vector indexing within the loop to accomplish this:
vector <- numeric(length=100)
for (i in 1:10) {
  vector[(10*i-9):(10*i)] <- 1:10
}
all.equal(vector, rep(1:10, 10))
# [1] TRUE
Of course, if you were just trying to repeat a vector a certain number of times, rep(vec, 10) would be the preferred solution.
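To cover the case from the question's edit, where vec is generated inside the loop, the same indexing idea can be sketched with the chunk length held in a variable. The names out, chunk_len and n_chunks, and the way vec is generated, are placeholders for illustration:

```r
chunk_len <- 10L
n_chunks  <- 10L

# Preallocate the full result once
out <- numeric(chunk_len * n_chunks)

for (i in seq_len(n_chunks)) {
  # Stand-in for whatever the loop actually computes: here i, i + 1, ..., i + 9
  vec <- seq(i, by = 1, length.out = chunk_len)

  # Positions of the i-th chunk: (i - 1) * chunk_len + 1 up to i * chunk_len
  idx <- seq.int(from = (i - 1L) * chunk_len + 1L, length.out = chunk_len)
  out[idx] <- vec
}
```

This relies on every vec having the same length; chunks of varying length need a running start index instead.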
A similar approach, perhaps a little more clear if your new vectors are of variable length:
# Let's over-allocate so that we know the big vector is big enough
big_vec = numeric(1e4)
this.index = 1
for (i in 1:10) {
  # Generate a new vector of random length
  new_vec = runif(sample(1:20, size = 1))
  # Stick it in big_vec by index
  big_vec[this.index:(this.index + length(new_vec) - 1)] = new_vec
  # update the starting index
  this.index = this.index + length(new_vec)
}
# truncate to only include the added values
big_vec = big_vec[1:(this.index - 1)]
As @josilber suggested in the comments, lists would be more R-ish. This is a much cleaner approach, unless the new vector generation depends on the previous vectors, in which case the for loop might be necessary.
vec_list = list()
for (i in 1:10) {
  # Generate a new vector of random length
  vec_list[[i]] = runif(sample(1:20, size = 1))
}
# Or, use lapply
vec_list = lapply(1:10, FUN = function(x) {runif(sample(1:20, size = 1))})
# Then combine with do.call
do.call(c, vec_list)
# or more simply, just unlist
unlist(vec_list)
I have another question for the brilliant minds out there (this site is so addictive).
I am running some simulations on a matrix and I have nested for loops for this purpose. The first creates a vector that increases by one each time a loop cycles. The nested loop is running simulations by randomizing the vector, attaching it to the matrix, and calculating some simple properties on the new matrix. (For an example, I used properties that will not vary in the simulations, but in practice I require the simulations to get a good idea of the impact of the randomized vector.) The nested loop runs 100 simulations, and ultimately I want only the column means of those simulations.
Here's some example code:
property<-function(mat){ #where mat is a matrix
a=sum(mat)
b=sum(colMeans(mat))
c=mean(mat)
d=sum(rowMeans(mat))
e=nrow(mat)*ncol(mat)
answer=list(a,b,c,d,e)
return(answer)
}
x=matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1), byrow=T, nrow=5, ncol=4)
obj=matrix(nrow=100,ncol=5,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
I.vec=append(a,b) #append these two for the I vector
for (j in 1:100){
I.vec2=sample(I.vec,replace=FALSE) #randomize I vector
temp=rbind(x,I.vec2)
prop<-property(temp)
obj[[j]]<-prop
}
write.table(colMeans(obj), 'myfile.csv', quote = FALSE, sep = ',', row.names = FALSE)
}
The problem I am encountering is how to fill in the empty object matrix with the results of the nested loop. obj ends up as a vector of mostly NAs, so it is clear that I am not assigning the results properly. I want each cycle to add a row of prop to obj, but if I try
obj[j,]<-prop
R tells me that there is an incorrect number of subscripts on the matrix.
Thank you so much for your help!
EDITS:
Okay, so here is the improved code re the answers below:
property<-function(mat){ #where mat is a matrix
a=sum(mat)
b=sum(colMeans(mat))
f=mean(mat)
d=sum(rowMeans(mat))
e=nrow(mat)*ncol(mat)
answer=c(a,b,f,d,e)
return(answer)
}
x=matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1), byrow=T, nrow=5, ncol=4)
obj<-data.frame(a=0,b=0,f=0,d=0,e=0) #create an empty dataframe to dump results into
obj2<-data.frame(a=0,b=0,f=0,d=0,e=0)
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
I.vec=append(a,b) #append these two for the I vector
for (j in 1:100){
I.vec2=sample(I.vec,replace=FALSE) #randomize I vector
temp=rbind(x,I.vec2)
obj[j,]<-property(temp)
}
obj2[i,]<-colMeans(obj)
write.table(obj2, 'myfile.csv', quote = FALSE,
sep = ',', row.names = FALSE, col.names=F, append=T)
}
However, this is still glitchy, as the myfile should only have four rows (one for each column of x), but actually has 10 rows, with some repeated. Any ideas?
Your property function is returning a list. If you want to store the numbers in a matrix, you should have it return a numeric vector:
property <- function(mat)
{
....
c(a, b, c, d, e) # probably a good idea to rename your "c" variable
}
Alternatively, instead of defining obj to be a matrix, make it a data.frame (which conceptually makes more sense, as each column represents a different quantity).
obj <- data.frame(a=0, b=0, c=0, ...)
for(i in 1:ncol(x))
....
obj[j, ] <- property(temp)
Finally, note that your call to write.table will overwrite the contents of myfile.csv, so the only output it will contain is the result for the last iteration of i.
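Use rbind to append each result as a new row (this assumes obj starts out empty and that property() returns a numeric vector rather than a list):
obj <- rbind(obj, prop)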
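On the follow-up in the question's edit: each pass of the outer loop writes the whole of obj2 with append=T, and obj2 has i rows at that point, so the file receives 1 + 2 + 3 + 4 = 10 rows. Below is a sketch of one way to combine the suggestions above, collecting the column means in a preallocated matrix and writing the file once after the loop. The names sims, sim_means, n_sim and out_file are introduced here, and the temporary file path is a placeholder for 'myfile.csv'.

```r
property <- function(mat) {
  # Return a plain numeric vector so it can fill one matrix row
  c(sum(mat), sum(colMeans(mat)), mean(mat),
    sum(rowMeans(mat)), nrow(mat) * ncol(mat))
}

x <- matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1),
            byrow = TRUE, nrow = 5, ncol = 4)

n_sim <- 100
# One row of column means per value of i
sim_means <- matrix(NA_real_, nrow = ncol(x), ncol = 5)

for (i in seq_len(ncol(x))) {
  # i ones followed by zeros
  I.vec <- rep(c(1, 0), times = c(i, ncol(x) - i))

  # Fresh results matrix for this i
  sims <- matrix(NA_real_, nrow = n_sim, ncol = 5)
  for (j in seq_len(n_sim)) {
    sims[j, ] <- property(rbind(x, sample(I.vec)))
  }
  sim_means[i, ] <- colMeans(sims)
}

# Write once, after all rows are filled (placeholder path)
out_file <- tempfile(fileext = ".csv")
write.table(sim_means, out_file, quote = FALSE, sep = ",",
            row.names = FALSE, col.names = FALSE)
```

Writing once also means a file left over from an earlier run is replaced rather than appended to.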