What's wrong with my nested for loop in R?

fname = file.choose()
two = read.csv(fname, header=T)
rec = two$Receipt
del = two$Delivery
date = two$Date
net = rec-del
yrec = matrix(rec,nrow=365,ncol=4,byrow=F)
ydel = matrix(del,nrow=365,ncol=4,byrow=F)
ynet = matrix(net,nrow=365,ncol=4,byrow=F)
yrecsum = 0
yrecavg = 0
for(i in 1:4)
{
for(j in 1:365)
{
yrecsum[i] = yrecsum[i]+yrec[j,i]
}
yrecavg[i] = yrecsum[i]/365
}
So what I have are three matrices of the same size, with days (integers from 1 to 365) on the rows and years (integers from 1 to 4) on the columns. Each matrix is filled with the data that I'm working with.
I'm trying to find the average of each column for all three matrices and I would like to put those averages in a vector for each matrix.
I've looked around and found some information about the zoo library and chron library and such but I can't get those to work.

lapply(list(yrec, ydel, ynet), colMeans)
[[1]]
[1] 732.9370 731.9836 705.3808 751.6986
[[2]]
[1] 704.7178 714.2877 735.4822 767.5123
[[3]]
[1] 749.1041 715.4164 711.1425 746.3370
#Data
yrec <- matrix(sample(365*4), ncol=4)
ydel <- matrix(sample(365*4), ncol=4)
ynet <- matrix(sample(365*4), ncol=4)

This should get you started (though I would convert the matrices to data.frames):
#some sample data
m <- matrix(sample(10000, 365*4),365,4)
# get the mean of all the columns of your matrix
colMeans(m)
If you have 3 matrices and want to combine the results, I would do:
# some sample data:
m1 <- matrix(sample(10000, 365*4),365,4)
m2 <- matrix(sample(10000, 365*4),365,4)
m3 <- matrix(sample(10000, 365*4),365,4)
do.call("cbind", lapply(list(m1,m2,m3), colMeans))
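For reference, the loop in the question returns NA for columns 2 to 4 because yrecsum starts as a length-one vector, so yrecsum[2] is already NA before anything is added to it. A minimal sketch of a working loop, assuming made-up data in place of the CSV, pre-allocates both vectors and checks the result against colMeans:

```r
# Made-up stand-in for the questioner's data (365 days x 4 years)
set.seed(1)
yrec <- matrix(sample(1000, 365 * 4, replace = TRUE), nrow = 365, ncol = 4)

# Pre-allocate one slot per column instead of starting from a single 0
yrecsum <- numeric(ncol(yrec))
yrecavg <- numeric(ncol(yrec))

for (i in seq_len(ncol(yrec))) {
  for (j in seq_len(nrow(yrec))) {
    yrecsum[i] <- yrecsum[i] + yrec[j, i]
  }
  yrecavg[i] <- yrecsum[i] / nrow(yrec)
}

# The loop and the vectorised call should agree
all.equal(yrecavg, colMeans(yrec))
```

In practice colMeans is still the shorter and faster choice; the loop is only here to show what was going wrong.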

Related

Dataframe output from a for-loop

I am trying to populate the output of a for loop into a data frame. The loop is repeating across the columns of a dataset called "data". The output is to be put into a new dataset called "data2". I specified an empty data frame with 4 columns (i.e. ncol=4). However, the output generates only the first two columns. I also get a warning message: "In matrix(value, n, p) : data length [2403] is not a sub-multiple or multiple of the number of columns [2]"
Why does the dataframe called "data2" have 2 columns, when I have specified 4 columns? This is my code:
a <- 0
b <- 0
GM <- 0
GSD <- 0
data2 <- data.frame(ncol=4, nrow=33)
for (i in 1:ncol(data))
{
if (i==34) {break}
a[i] <- colnames(data[i])
b <- data$cycle
GM[i] <- geoMean(data[,i], na.rm=TRUE)
GSD[i] <- geoSD(data[,i], na.rm=TRUE)
data2[i,] <- c(a[i], b, GM[i], GSD[i])
}
data2
If you look at the ?data.frame() help page, you'll see that it does not take arguments nrow and ncol--those are arguments for the matrix() function.
This is how you initialize data2, and you can see that it starts with 2 columns: one named ncol and the other named nrow.
data2 <- data.frame(ncol=4, nrow=33)
data2
# ncol nrow
# 1 4 33
Instead you could try data2 <- as.data.frame(matrix(NA, ncol = 4, nrow = 33)), though if you share a small sample of data and your expected result there may be more efficient ways than explicit loops to get this job done.
Generally, if you do loop, you want to do as much outside of the loop as possible. This is just guesswork without sample data, but these changes seem like a start at improving your code.
a <- colnames(data)
b <- data$cycle ## this never changes, no need to redefine every iteration
GM <- numeric(ncol(data)) ## better to initialize vectors to the correct length
GSD <- numeric(ncol(data))
data2 <- as.data.frame(matrix(NA, ncol = 4, nrow = 33))
for (i in 1:ncol(data))
{
if (i==34) {break}
GM[i] <- geoMean(data[,i], na.rm=TRUE)
GSD[i] <- geoSD(data[,i], na.rm=TRUE)
## it's weird to assign a row of data.frame at once...
## maybe you should keep it as a matrix?
data2[i,] <- c(a[i], b, GM[i], GSD[i])
}
data2
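If the goal is one row of summary statistics per column, a loop-free version may be easier to reason about. The sketch below is only an illustration under assumptions: the data frame is made up, and geo_mean and geo_sd are hypothetical base-R stand-ins for the geoMean() and geoSD() functions used in the question, whose package is not shown there.

```r
# Made-up stand-in for the questioner's `data`
set.seed(42)
data <- data.frame(x = rlnorm(20), y = rlnorm(20), z = rlnorm(20))

# Hypothetical base-R helpers: geometric mean and geometric SD via logs
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))
geo_sd   <- function(x) exp(sd(log(x), na.rm = TRUE))

# One row per column of `data`; each result column keeps its own type
data2 <- data.frame(
  variable = colnames(data),
  GM  = unname(vapply(data, geo_mean, numeric(1))),
  GSD = unname(vapply(data, geo_sd, numeric(1)))
)
data2
```

Building the data frame from whole vectors avoids assigning a mixed character/numeric row with c(), which would coerce every value to character.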

How to order multiple dataframes in Global Environment R

I'm trying to run a simulation, but I'm having trouble storing multiple dataframes called "data_i" in a list ordered by i. I start with a df called "data_", which has data from 1901 to 2032 (132 rows). I apply a loop to create one dataframe per row, called data_1, data_2, data_3, ..., data_132 (the row for 2032 is stored in data_132). Finally, I store all these dataframes in a list and use lapply to create a column in each dataframe. Here is a reproducible example:
library(stringr) # provides str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",i, sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", seq_along(dataframes))),
envir = parent.frame())
The code runs, but the problem is that the order of the dataframes is not data_1, data_2, data_3, ..., data_132 but data_1, data_10, data_100, data_101, ... As a result, data_names stores the names in that order, so, for example, 2032 does not end up in data_new132 as I want it to.
Does anybody know how to solve this? Thanks in advance!
Andres, see if this helps. I added a pad of '0' up to the maximum number of characters (e.g. 132 is 3 characters wide):
library(stringr) # provides str_pad() and str_extract()
#Main dataframe
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time,b)
#Loop for creating data_i where i goes from 1 to 132
simulations <- 10000
for (i in 1:132) {
assign(paste("data_",str_pad(i,nchar(max(b)),pad="0"), sep = ""), as.data.frame( sapply(data_[i,], function(n) rep(n,simulations)), stringsAsFactors = FALSE ))
}
#Store all dataframes in list (**I THINK THE PROBLEM IS HERE**)
data_names<-str_extract(ls(), '^data_[[:digit:]]{1,3}$')[!is.na(str_extract(ls(), '^data_[[:digit:]]{1,3}$'))]
dataframes<-lapply(data_names, function(x)get(x))
#Create a new column in each dataframe
new_list <- lapply(dataframes, function(x) cbind(x, production = as.numeric(runif(simulations, min = 50, max = 100))))
#Create data_newi in environnment
list2env(setNames(new_list,paste0("data_new", paste(str_pad(seq_along(dataframes),nchar(max(seq_along(dataframes))),pad="0"),sep=""))),
envir = parent.frame())
1) Use mixedsort in gtools:
library(gtools)
for(i in c(2, 10)) assign(paste0("data", i), i)
ls(pattern = "^data")
## [1] "data10" "data2"
mixedsort(ls(pattern = "^data"))
## [1] "data2" "data10"
2) or ensure that the names are the same length using leading 0's in which case ls() will sort them appropriately:
for(i in c(2, 10)) assign(sprintf("data%03d", i), i)
ls(pattern = "^data")
## [1] "data002" "data010"
3) Normally one does not assign such objects directly into the global environment but puts them into a list. One can refer to elements using L[[1]], etc.
L <- list()
for(i in 1:3) L[[i]] <- i
L
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
3a) or in one line:
L <- lapply(1:3, function(i) i)
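Applied to the question, option 3 could look roughly like the sketch below. It reuses the question's data_ but, as an assumption to keep it quick, uses 100 simulations instead of 10000; the name sims is made up for illustration.

```r
time <- 1901:2032
b <- 1:132
data_ <- data.frame(time, b)
simulations <- 100  # smaller than the question's 10000 to keep the sketch fast

set.seed(1)
sims <- lapply(seq_len(nrow(data_)), function(i) {
  # Repeat row i `simulations` times, then add the random column
  df <- data_[rep(i, simulations), ]
  rownames(df) <- NULL
  df$production <- runif(simulations, min = 50, max = 100)
  df
})

# Element 132 holds the year 2032, with no name sorting involved
unique(sims[[132]]$time)
```

Because the list is built in row order, position i always corresponds to row i, so the alphabetical ordering of ls() never comes into play.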

Writing a for loop with the output as a data frame in R

I am currently working my way through the book 'R for Data Science'.
I am trying to solve this exercise question (21.2.1 Q1.4) but have not been able to determine the correct output before starting the for loop.
Write a for loop to:
Generate 10 random normals for each of μ= −10, 0, 10 and 100.
As with the previous questions in the book, I have been trying to insert the results into a vector, but for this example it appears I need the output to be a data frame?
This is my code so far:
values <- c(-10,0,10,100)
output <- vector("double", 10)
for (i in seq_along(values)) {
output[[i]] <- rnorm(10, mean = values[[i]])
}
I know the output is wrong but am unsure how to create the format I need here. Any help much appreciated. Thanks!
There are many ways of doing this. Here is one. See inline comments.
set.seed(357) # to make things reproducible, set random seed
N <- 10 # number of loops
xy <- vector("list", N) # create an empty list into which values are to be filled
# run the loop N times and on each loop...
for (i in 1:N) {
# generate a data.frame with 4 columns, and add a random number into each one
# random number depends on the mean specified
xy[[i]] <- data.frame(um10 = rnorm(1, mean = -10),
u0 = rnorm(1, mean = 0),
u10 = rnorm(1, mean = 10),
u100 = rnorm(1, mean = 100))
}
# result is a list of data.frames with 1 row and 4 columns
# you can bind them together into one data.frame using do.call
# rbind means they will be merged row-wise
xy <- do.call(rbind, xy)
um10 u0 u10 u100
1 -11.241117 -0.5832050 10.394747 101.50421
2 -9.233200 0.3174604 9.900024 100.22703
3 -10.469015 0.4765213 9.088352 99.65822
4 -9.453259 -0.3272080 10.041090 99.72397
5 -10.593497 0.1764618 10.505760 101.00852
6 -10.935463 0.3845648 9.981747 100.05564
7 -11.447720 0.8477938 9.726617 99.12918
8 -11.373889 -0.3550321 9.806823 99.52711
9 -7.950092 0.5711058 10.162878 101.38218
10 -9.408727 0.5885065 9.471274 100.69328
Another way would be to pre-allocate a matrix, add in values and coerce it to a data.frame.
xy <- matrix(NA, nrow = N, ncol = 4)
for (i in 1:N) {
xy[i, ] <- rnorm(4, mean = c(-10, 0, 10, 100))
}
# notice that i name the column names post festum
colnames(xy) <- c("um10", "u0", "u10", "u100")
xy <- as.data.frame(xy)
As this is a learning question I will not provide the solution directly.
> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace
As you can see, the error says there are more elements to store than there is space for: each iteration generates 10 random numbers (40 in total), and you only have 10 slots, each of which holds a single number. Consider using a data format that lets you store several values for each iteration.
So that:
> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4 and each element all the 10 values you created in the loop
# set the number of rows
rows <- 10
# vector with the values
means <- c(-10,0,10,100)
# generating output matrix
output <- matrix(nrow = rows,
ncol = 4)
# setting seed and looping through the number of rows
set.seed(222)
for (i in 1:rows){
output[i,] <- rnorm(length(means),
mean=means)
}
#printing the output
output
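Yet another option keeps the loop from the question almost unchanged and only swaps the container. This is a sketch, not the book's official solution; the column names are made up for illustration.

```r
values <- c(-10, 0, 10, 100)

# A list has one slot per mean, and each slot can hold a whole vector
output <- vector("list", length(values))

set.seed(1)
for (i in seq_along(values)) {
  output[[i]] <- rnorm(10, mean = values[[i]])
}

# Optional: name the slots and bind them into a 10 x 4 data frame
names(output) <- paste0("mean_", c("m10", "0", "10", "100"))
output_df <- as.data.frame(output)
dim(output_df)
```

Here each column holds the 10 draws for one mean, whereas the matrix versions above fill one row per iteration.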

R: How to write a for loop that reads every two lines in a matrix?

I want to calculate correlation statistics using cor.test(). I have a data matrix where the two members of each pair to be tested are on consecutive lines (I have more than a thousand pairs, so I also need to correct for that later). I was thinking that I could loop through the matrix two lines at a time and perform the test (i.e. first test the correlation between row1 and row2, then row3 and row4, row5 and row6, etc.), but I don't know how to write this kind of loop.
This is how I do the test on a single pair:
d = read.table(file="cor-test-sample-data.txt", header=T, sep="\t", row.names = 1)
d = as.matrix(d)
cor.test(d[1,], d[2,], method = "spearman")
You could try
res <- lapply(split(seq_len(nrow(mat1)),(seq_len(nrow(mat1))-1)%/%2 +1),
function(i){m1 <- mat1[i,]
if(NROW(m1)==2){
cor.test(m1[1,], m1[2,], method="spearman")
}
else NA
})
To get the p-values
resP <- sapply(res, function(x) x$p.value)
indx <- t(`dim<-`(seq_len(nrow(mat1)), c(2, nrow(mat1)/2)))
names(resP) <- paste(indx[,1], indx[,2], sep="_")
resP
# 1_2 3_4 5_6 7_8 9_10 11_12 13_14
#0.89726818 0.45191660 0.14106085 0.82532260 0.54262680 0.25384239 0.89726815
# 15_16 17_18 19_20 21_22 23_24 25_26 27_28
#0.02270217 0.16840791 0.45563229 0.28533447 0.53088721 0.23453161 0.79235990
# 29_30 31_32
#0.01345768 0.01611903
Or using mapply (assuming the number of rows is even)
ind <- seq(1, nrow(mat1), by=2) #similar to the one used by @CathG in the for loop
mapply(function(i,j) cor.test(mat1[i,], mat1[j,],
method='spearman')$p.value , ind, ind+1)
data
set.seed(25)
mat1 <- matrix(sample(0:100, 20*32, replace=TRUE), ncol=20)
Try
d = matrix(rep(1:9, 3), ncol=3, byrow = T)
sapply(2*(1:(nrow(d)/2)), function(pair) unname(cor.test(d[pair-1,], d[pair,], method="spearman")$estimate))
pvalues<-c()
for (i in seq(1,nrow(d),by=2)) {
pvalues<-c(pvalues,cor.test(d[i,],d[i+1,],method="spearman")$p.value)
}
names(pvalues)<-paste(row.names(d)[seq(1,nrow(d),by=2)],row.names(d)[seq(2,nrow(d),by=2)],sep="_")
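Since the question mentions correcting for the number of tests afterwards, here is a sketch that collects the p-values with vapply and then adjusts them. It assumes the random mat1 from the earlier answer and an even number of rows; Benjamini-Hochberg is just one common choice of adjustment, not something the question specifies.

```r
set.seed(25)
mat1 <- matrix(sample(0:100, 20 * 32, replace = TRUE), ncol = 20)

# First row of each pair; assumes an even number of rows
first <- seq(1, nrow(mat1) - 1, by = 2)

# vapply pre-allocates the result and checks each value is a single number
pvalues <- vapply(first, function(i) {
  # exact = FALSE avoids the warning about exact p-values when ties are present
  cor.test(mat1[i, ], mat1[i + 1, ], method = "spearman", exact = FALSE)$p.value
}, numeric(1))
names(pvalues) <- paste(first, first + 1, sep = "_")

# Benjamini-Hochberg adjustment across all pairs
padj <- p.adjust(pvalues, method = "BH")
head(padj)
```

Unlike growing pvalues with c() inside a for loop, this allocates the result once, which matters more as the number of pairs grows.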

How to create a data frame with dimension M x N in R

Is there a way to do it?
I'm stuck with this:
m <- 10 # nof row
n <- 5 # nof column
# We will fill each cell with '0'
all <-c()
for (i in 1:m) {
row_i <- c(rep(0,n))
all <- c(all,row_i)
}
Which only creates 1 row as output.
Why not use a matrix? data.frames are for storing columns of varying types.
So,
m = 10
n = 5
mat = matrix(0, nrow = m, ncol = n)
If you really want a data.frame, coerce to one - the column names will simply be a default:
dat = as.data.frame(mat)
names(dat)
[1] "V1" "V2" "V3" "V4" "V5"
The problem with your approach is that you simply append the values one after the other, ignoring the dimensions you want. You can do it like this, but it's not a good idea to grow data, better to allocate it all upfront as above. Also, this results in a matrix anyway, which I think is what you should use.
WARNING: bad code ahead!
m <- 10 # nof row
n <- 5 # nof column
all <- NULL
for (i in 1:m) {
row_i <- c(rep(0,n))
all <- rbind(all,row_i)
}
This produces a data.frame filled with zeros (M rows, N columns).
M <- 10; N <- 5 # rows and columns, matching m and n in the question
as.data.frame(lapply(structure(.Data=1:N,.Names=1:N),function(x) numeric(M)))
