I have another question for the brilliant minds out there (this site is so addictive).
I am running some simulations on a matrix and I have nested for loops for this purpose. The first creates a vector that increases by one each time a loop cycles. The nested loop is running simulations by randomizing the vector, attaching it to the matrix, and calculating some simple properties on the new matrix. (For an example, I used properties that will not vary in the simulations, but in practice I require the simulations to get a good idea of the impact of the randomized vector.) The nested loop runs 100 simulations, and ultimately I want only the column means of those simulations.
Here's some example code:
property<-function(mat){ #where mat is a matrix
a=sum(mat)
b=sum(colMeans(mat))
c=mean(mat)
d=sum(rowMeans(mat))
e=nrow(mat)*ncol(mat)
answer=list(a,b,c,d,e)
return(answer)
}
x=matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1), byrow=T, nrow=5, ncol=4)
obj=matrix(nrow=100,ncol=5,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
I.vec=append(a,b) #append these two for the I vector
for (j in 1:100){
I.vec2=sample(I.vec,replace=FALSE) #randomize I vector
temp=rbind(x,I.vec2)
prop<-property(temp)
obj[[j]]<-prop
}
write.table(colMeans(obj), 'myfile.csv', quote = FALSE, sep = ',', row.names = FALSE)
}
The problem I am encountering is how to fill in the empty object matrix with the results of the nested loop. obj ends up as a vector of mostly NAs, so it is clear that I am not assigning the results properly. I want each cycle to add a row of prop to obj, but if I try
obj[j,]<-prop
R tells me that there is an incorrect number of subscripts on the matrix.
Thank you so much for your help!
EDITS:
Okay, so here is the improved code re the answers below:
property<-function(mat){ #where mat is a matrix
a=sum(mat)
b=sum(colMeans(mat))
f=mean(mat)
d=sum(rowMeans(mat))
e=nrow(mat)*ncol(mat)
answer=c(a,b,f,d,e)
return(answer)
}
x=matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1), byrow=T, nrow=5, ncol=4)
obj<-data.frame(a=0,b=0,f=0,d=0,e=0) #create an empty dataframe to dump results into
obj2<-data.frame(a=0,b=0,f=0,d=0,e=0)
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
I.vec=append(a,b) #append these two for the I vector
for (j in 1:100){
I.vec2=sample(I.vec,replace=FALSE) #randomize I vector
temp=rbind(x,I.vec2)
obj[j,]<-property(temp)
}
obj2[i,]<-colMeans(obj)
write.table(obj2, 'myfile.csv', quote = FALSE,
sep = ',', row.names = FALSE, col.names=F, append=T)
}
However, this is still glitchy, as the myfile should only have four rows (one for each column of x), but actually has 10 rows, with some repeated. Any ideas?
Your property function is returning a list. If you want to store the numbers in a matrix, you should have it return a numeric vector:
property <- function(mat)
{
....
c(a, b, c, d, e) # probably a good idea to rename your "c" variable
}
Alternatively, instead of defining obj to be a matrix, make it a data.frame (which conceptually makes more sense, as each column represents a different quantity).
obj <- data.frame(a=0, b=0, c=0, ...)
for(i in 1:ncol(x))
....
obj[j, ] <- property(temp)
Finally, note that your call to write.table will overwrite the contents of myfile.csv, so the only output it will contain is the result for the last iteration of i.
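Putting these points together, here is a minimal sketch of one way the whole thing could look. It is not the asker's full code: property is cut down to three of the original quantities, the names n_sims and out are introduced here for illustration, and the file goes to a temporary directory so nothing gets overwritten. The idea is to fill a fresh matrix row by row in the inner loop, keep one row of column means per value of i, and write the file once after both loops have finished.

```r
# Simplified stand-in for the original property(): returns a numeric vector
property <- function(mat) {
  c(total = sum(mat), mean = mean(mat), cells = nrow(mat) * ncol(mat))
}

x <- matrix(c(1,0,1,0, 0,1,1,0, 0,0,0,1, 1,0,0,0, 1,0,0,1),
            byrow = TRUE, nrow = 5, ncol = 4)
n_sims <- 100  # hypothetical name for the number of simulations

# One row of column means per value of i
out <- matrix(NA_real_, nrow = ncol(x), ncol = 3,
              dimnames = list(NULL, c("total", "mean", "cells")))

for (i in seq_len(ncol(x))) {
  I.vec <- rep(c(1, 0), times = c(i, ncol(x) - i))  # i ones, then zeros
  obj <- matrix(NA_real_, nrow = n_sims, ncol = 3)  # fresh results matrix
  for (j in seq_len(n_sims)) {
    obj[j, ] <- property(rbind(x, sample(I.vec)))   # fill row j
  }
  out[i, ] <- colMeans(obj)
}

# Write once, after the loops; the tempdir() location is an assumption
write.csv(out, file.path(tempdir(), "myfile.csv"), row.names = FALSE)
```

Because the file is written a single time, it ends up with exactly one row per column of x.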
Use rbind:
obj <- rbind(obj, prop)
I am writing a function which takes a directory of data, and reads them in, and (if it reaches the threshold of complete cases), calculates the correlation between two variables in the data ("sulfate" and "nitrate"). I want this to run in a for loop to create a numeric vector of the correlation values (one value for each file in the directory).
However, when I run the code, it only returns the last value.
I am quite new to R, so I may be making simple mistakes (I do have the newest version of R installed). Below is the code:
corr <- function(directory, threshold = 0) {
filenames3 <- list.files(directory, pattern = ".csv", full.names = TRUE)
loop_length <- length(filenames3)
correlation_values <- numeric()
for(i in loop_length) {
read_in_data3 <- read.csv(filenames3[i])
complete_boolean <- complete.cases(read_in_data3)
nobs2 <- sum(complete_boolean)
data_rmNA <- read_in_data3[complete_boolean, ]
if(nobs2 > threshold) {
correlation_values <- c(correlation_values,
cor(data_rmNA[["sulfate"]],
data_rmNA[["nitrate"]]))
}
}
correlation_values
}
corr("C:/Users/Danie/OneDrive/Documents/R/specdata")
I have tried specifying the length of the vector e.g. correlation_values <- numeric(length = loop_length). This returns a vector of the right length, but all the values are 0 excluding the last which runs properly. I have looked at similar questions, but still can't find a solution to my problem.
I assume I'm losing information in the loop somewhere (rewriting over a variable or something).
Thanks in advance for any help.
I think you need to say for(i in 1:loop_length) instead of for(i in loop_length).
R loops over each element of the vector you give it, but loop_length is a single number, so the loop body runs only once, with i equal to loop_length. That is why only the value for the last file is returned.
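To see the difference in isolation, here is a small sketch. The file names are made up and nothing is read from disk; it only records which values of i each loop visits.

```r
files <- c("a.csv", "b.csv", "c.csv")  # hypothetical file names
n <- length(files)

# Looping over the single number n: the body runs once, with i equal to 3
visited_wrong <- integer()
for (i in n) visited_wrong <- c(visited_wrong, i)

# Looping over the indices of files: the body runs for i = 1, 2, 3
visited_right <- integer()
for (i in seq_along(files)) visited_right <- c(visited_right, i)
```

seq_along(files) is used here rather than 1:n because it gives an empty sequence when there are no files, whereas 1:0 would count down from 1 to 0.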
We all know that appending a vector to a vector within a for loop in R is a bad thing because it costs time. A solution would be to do it in a vectorized style. Here is a nice example by Joshua Ulrich. It is important to first create a vector with known length and then fill it up, instead of appending each new piece to an existing piece within the loop.
Still, in his example he demonstrates 'only' how to append one data piece at a time. I am now fighting with the idea to fill a vector with vectors - not scalars.
Imagine I have a vector with a length of 100
vector <- numeric(length=100)
and a smaller vector that would fit 10 times into the first vector
vec <- seq(1,10,1)
How would I have to construct a loop that adds the smaller vector to the large vector without using c() or append ?
EDIT: This example is simplified - vec does not always consist of the same sequence but is generated within a for loop and should be added to vector.
You could just use normal vector indexing within the loop to accomplish this:
vector <- numeric(length=100)
for (i in 1:10) {
vector[(10*i-9):(10*i)] <- 1:10
}
all.equal(vector, rep(1:10, 10))
# [1] TRUE
Of course if you were just trying to repeat a vector a certain number of times rep(vec, 10) would be the preferred solution.
A similar approach, perhaps a little more clear if your new vectors are of variable length:
# Let's over-allocate so that we know the big vector is big enough
big_vec = numeric(1e4)
this.index = 1
for (i in 1:10) {
# Generate a new vector of random length
new_vec = runif(sample(1:20, size = 1))
# Stick it in big_vec by index
big_vec[this.index:(this.index + length(new_vec) - 1)] = new_vec
# update the starting index
this.index = this.index + length(new_vec)
}
# truncate to only include the added values
big_vec = big_vec[1:(this.index - 1)]
As @josilber suggested in comments, lists would be more R-ish. This is a much cleaner approach, unless the new vector generation depends on the previous vectors, in which case the for loop might be necessary.
vec_list = list()
for (i in 1:10) {
# Generate a new vector of random length
vec_list[[i]] = runif(sample(1:20, size = 1))
}
# Or, use lapply
vec_list = lapply(1:10, FUN = function(x) {runif(sample(1:20, size = 1))})
# Then combine with do.call
do.call(c, vec_list)
# or more simply, just unlist
unlist(vec_list)
I have 24 .csv files, each containing hundreds of thousands of data points.
My intention is for this code to:
1. loop through each of the files in the directory
2. take a sample of 1000 random points from a single column
3. check to see if each sample data point is below a particular level. Here's where I'm stuck: if TRUE, result[i] should change to 1; if FALSE, to 0. The result vector doesn't change at all, though. Any thoughts?
rm(list=ls())
years<-c(1990:2013)
#####################################
S=1000
level<-.075
result<-(1:S)
inBounds<-function(data){
for(i in 1:S){
result[i]<-(data[i] < level)
}
return(mean(result))
}
#####################################
#Get sample arithmetic mean readings from 1990-2013
n=1000
temp<-data.frame()
arithMean<-data.frame()
Samp<-data.frame()
CI<-data.frame()
#Get data file names
files <- list.files(path="~/Proj",pattern="*.csv", full.names=T, recursive=FALSE)
for(i in 1:23){
temp<-read.csv(files[i],header=TRUE,sep=",")
arithMean<-temp$Arithmetic.Mean
Samp<-sample(arithMean,n,replace=TRUE,prob=NULL)
CI[1,i]<-inBounds(Samp)
}
The problem is with scope. The result vector declared after level lives in the global environment, while the result modified inside the function is a local copy. They are not the same object.
If you want the result vector from the function, return it. If you want both the vector and the average, return a list:
return(list(result = result, mean = mean(result)))
The result of your entire operation is a single vector of length 23, so you can do this with sapply:
CI <- sapply(1:23, function(i) {
temp <- read.csv(files[i], header=T, sep=",")
return(mean(sample(temp$Arithmetic.Mean, n, replace=T, prob=NULL) < level))
})
The reason result was not changing in your function is that it was declared outside the function but you were editing it inside the function. You could move the result<-(1:S) inside the function to get the expected behavior.
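A small sketch of the scoping behaviour, using a made-up five-element sample instead of the real data. The function name in_bounds and the variable frac are chosen here for illustration only.

```r
level <- 0.075
result <- 1:5  # global object, as in the question

in_bounds <- function(data) {
  # Local object: assigning to it never touches the global `result`
  result <- logical(length(data))
  for (i in seq_along(data)) {
    result[i] <- data[i] < level
  }
  mean(result)  # proportion of values below `level`
}

frac <- in_bounds(c(0.01, 0.05, 0.10, 0.20, 0.07))
# The global `result` is still 1:5 after the call
```

Since the comparison is vectorised, the loop could also be replaced by mean(data < level), which is what the sapply version above does.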
I am trying to create a set of new vectors based on a rule, for a list of vectors. My input consists of 3 normal vectors (index, rfree, ret) and a list of several vectors (roll), all vectors being the same length. I want the new vectors to follow the rule: if index>roll -> ret, else rfree, so that the index is evaluated against the "k" number of roll vectors giving "k" new vectors that only consist of ret and rfree inputs. My current program doesn't work and I can't figure out why. The error message I get is
"Error in `*tmp*`[[j]] : subscript out of bounds"
But I can't really figure out why. Any help is greatly appreciated.
#Input:
roll <- list(runif(85),runif(85))
index <- runif(85)
rfree <- rnorm(85)
ret <- rnorm(85)
#Program:
aret <- function(index, roll, ret, rfree, k=2){
aret <- list()
for (j in seq(k))
for (i in 1:length(ret)){
if (roll[[j]][i]>index[i])(aret[[j]][i] <- ret[i])
else(aret[[j]][i] <- rfree[i])
}
}
This should do it, but I agree with @Carl: a matrix will be easier to manipulate here if all vectors are the same length.
roll <- matrix(runif(170),ncol=2) #lets use a matrix instead
index <- runif(85) #index is your indicator variable when compared to roll
rfree <- rnorm(85) #assign if roll>index
ret <- rnorm(85) #assign if index>roll
#use vector operations when possible, for speed and readability. Look into
#sapply, lapply etc. Apply is useful for column/row operations in matrices
result<-apply(roll,2, function(x){
# we will use an anonymous function here for this,
#but you could define elsewhere if needed
w<-index>x # where is index larger than our roll entries
x[w]<-ret[w] #assign the corresponding ret there
x[!w]<-rfree[!w] #assign the corresponding rfree where appropriate
x
})
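If you would rather keep roll as a list, a sketch along the following lines might also work. It uses ifelse inside lapply and follows the rule as stated in the question text (index > roll gives ret, otherwise rfree); note that the question's own code tests roll > index instead, so swap the comparison if that is what you meant. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
roll  <- list(runif(85), runif(85))
index <- runif(85)
rfree <- rnorm(85)
ret   <- rnorm(85)

# One output vector per element of roll: ret where index > roll, else rfree
aret <- lapply(roll, function(r) ifelse(index > r, ret, rfree))
```

The result is a list with the same length as roll, so aret[[1]] corresponds to roll[[1]] and so on.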
I have code for nested for loops here. The output I would like to receive is a matrix of the means of the columns of the matrix produced by the nested loop. So, the interior loop should run 1000 simulations of a randomized vector, and run a function each time. This works fine on its own, and spits the output into R. But I want to save the output from the nested loop to an object (a matrix of 1000 rows and 11 columns), and then print only the colMeans of that matrix, to be performed by the outer loop.
I think the problem lies in the step where I assign the results of the inner loop to the obj matrix. I have tried every variation on obj[i,],obj[i],obj[[i]], etc. with no success. R tells me that it is an object of only one dimension.
x=ACexp
obj=matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
Inv=append(a,b) #append these two for the Inv vector
for (i in 1:1000){ #run this vector through the simulations
Inv2=sample(Inv,replace=FALSE) #randomize interactions
temp2=rbind(x,Inv2)
obj[i]<-property(temp2) #print results to obj matrix
}
print.table(colMeans(obj)) #get colMeans and print to excel file
}
Any ideas how this can be fixed?
You're repeatedly printing the whole matrix to the screen as it gets modified, but your comment says "print to excel file". I'm guessing you actually want to save your data out to a file. Remove the print.table command altogether and, after your loops are completed, use write.table():
write.table(colMeans(obj), 'myNewMatrixFile.csv', quote = FALSE, sep = ',', row.names = FALSE)
(my preferred options... see ?write.table to select the ones you like)
Since your code isn't reproducible, we can't quite tell what you want. However, I guess that property is returning a single number that you want to place in the right row/column place of the obj matrix, which you would refer to as obj[row,col]. But you'll have trouble with that as is, because both your loops are using the same index i. Maybe something like this will work for you.
obj <- matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
Inv <- rep(c(1,0), times=c(i, ncol(x)-i)) #repeat 1 for 1:# columns in x, then 0's
for (j in 1:nrow(obj)){ #run this vector through the simulations
Inv2 <- sample(Inv,replace=FALSE) #randomize interactions
temp2 <- rbind(x,Inv2)
obj[j,i] <- property(temp2) #save results in obj matrix
}
}
write.csv(colMeans(obj), 'myFile.csv') #get colMeans and print to csv file
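To illustrate why sharing the index i between the two loops causes trouble, here is a tiny sketch with toy loop ranges. The variable outer_seen is introduced only to record what the outer loop sees after the inner loop has run.

```r
outer_seen <- integer()
for (i in 1:3) {
  for (i in 1:2) {
    NULL  # the inner loop reuses i and overwrites the outer value
  }
  # i now holds the inner loop's last value, not the outer loop's
  outer_seen <- c(outer_seen, i)
}
```

The outer loop still runs three times, but each time the code after the inner loop sees i equal to 2, so anything indexed by i there would go to the wrong place.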