vectorize replacement over 3d for loop - r

I would like to vectorize (or otherwise optimize) the following 3d for loop:
dat: array with dim = c(n,n,m)
ref: matrix with dim = c(n,m)
for(i in 1:length(dat[,1,1])){
for(k in 1:length(dat[1,1,])){
dat[i,,k][dat[i,,k] > ref[i,k]] <- NA
}
}
The array I am working with is 7e3 x 7e3 x 2e2 so the for loop above is impractically expensive. To boot, I will need to perform two or three very similar operations (on different arrays), so any saved time will be multiplied.
Example dat and ref arrays:
dat <- array(seq(1,75), dim=c(5,5,3))
ref <- cbind(seq(6,10), seq(36,40), seq(61,65))

You can use this instead. It creates a new_ref array which is conformable to dat, so you can compare them directly:
new_ref <- aperm(array(ref, dim(dat)[c(1,3,2)]), c(1,3,2))
dat3 <- dat
dat3[dat3 > new_ref] <- NA
Comparison with your loop:
dat2 <- dat
for(i in 1:length(dat[,1,1])){
for(k in 1:length(dat[1,1,])){
dat2[i,,k][dat2[i,,k] > ref[i,k]] <- NA
}
}
identical(dat2, dat3)
#[1] TRUE
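If a full-size copy of the array (new_ref) is too much memory at your dimensions, a possible middle ground is to keep only the loop over the third dimension and let R's recycling handle the rows. This is just a sketch on the example data above; dat4 and slice are names I made up for illustration. It relies on the fact that a length-n vector compared against an n x n matrix is recycled down the columns, so row i is always compared with ref[i, k]:

```r
# Example data from the question
dat <- array(seq(1, 75), dim = c(5, 5, 3))
ref <- cbind(seq(6, 10), seq(36, 40), seq(61, 65))

# Fully vectorized version from the answer, used here as the reference result
new_ref <- aperm(array(ref, dim(dat)[c(1, 3, 2)]), c(1, 3, 2))
dat3 <- dat
dat3[dat3 > new_ref] <- NA

# Alternative sketch: loop over the third dimension only
dat4 <- dat
for (k in seq_len(dim(dat4)[3])) {
  slice <- dat4[, , k]
  # ref[, k] has length n and is recycled column by column,
  # so element [i, j] is compared with ref[i, k]
  slice[slice > ref[, k]] <- NA
  dat4[, , k] <- slice
}

identical(dat3, dat4)
```

This replaces n * m inner iterations with m matrix operations, and it only ever holds one n x n slice in addition to the array itself.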

Related

How do I save a single column of data produced from a while loop in R to a dataframe?

I have written the following very simple while loop in R.
i=1
while (i <= 5) {
print(10*i)
i = i+1
}
I would like to save the results to a dataframe that will be a single column of data. How can this be done?
If you want to keep the while loop, you can try:
df1 <- c()
i=1
while (i <= 5) {
print(10*i)
df1 <- c(df1, 10*i)
i = i+1
}
as.data.frame(df1)
#  df1
#1  10
#2  20
#3  30
#4  40
#5  50
Or
df1 <- data.frame()
i=1
while (i <= 5) {
df1[i,1] <- 10 * i
i = i+1
}
df1
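Both versions above grow the object inside the loop, which is fine for five values but gets slow for long loops because R copies the object each time it grows. A minimal sketch of the same loop with a preallocated vector (the names vals and value are just illustrative):

```r
n <- 5
vals <- numeric(n)   # preallocate the full result vector up front

i <- 1
while (i <= n) {
  vals[i] <- 10 * i  # fill in place instead of growing with c()
  i <- i + 1
}

# Build the single-column data frame once, after the loop
df1 <- data.frame(value = vals)
df1
```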
If you already have a data frame (let's call it dat), you can create a new, empty column in the data frame, and then assign each value to that column by its row number:
# Make a data frame with column `x`
n <- 5
dat <- data.frame(x = 1:n)
# Fill the column `y` with the "missing value" `NA`
dat$y <- NA
# Run your loop, assigning values back to `y`
i <- 1
while (i <= 5) {
result <- 10*i
print(result)
dat$y[i] <- result
i <- i+1
}
Of course, in R we rarely need to write loops like this. Normally, we use vectorized operations to carry out tasks like this faster and more succinctly:
n <- 5
dat <- data.frame(x = 1:n)
# Same result as your loop
dat$y <- 10 * (1:n)
Also note that, if you really did need a loop instead of a vectorized operation, that particular while loop could also be expressed as a for loop.
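For completeness, here is roughly what the for-loop version could look like, assuming the same dat, x and y as in the example above:

```r
n <- 5
dat <- data.frame(x = 1:n)
dat$y <- NA_real_           # empty numeric column to fill

for (i in seq_len(n)) {     # seq_len(n) replaces the manual counter
  dat$y[i] <- 10 * i
}

dat
```

The for loop takes care of initializing and incrementing i, so there is no counter to forget to update.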
I recommend consulting an introductory book or other guide to data manipulation in R. Data frames are very powerful, and using them well is an essential part of programming in R.

Loop over a list in R

I want to do an operation on each data frame of a list. I want to perform the Kolmogorov–Smirnov (KS) test for one column in each data frame. I am using the code below but it is not working:
PDF_mean <- matrix(nrow = length(siteNumber), ncol = 4)
PDF_mean <- data.frame(PDF_mean)
names(PDF_mean) <- c("station","normal","gamma","gev")
listDF <- mget(ls(pattern="DSF_moments_"))
length(listDF)
i <- 1
for (i in length(listDF)) {
PDF_mean$station[i] <- siteNumber[i]
PDF_mean$normal[i] <- ks.test(list[i]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(list[i]$mean,"pgev")$p.value
PDF_mean$gamma[i] <- ks.test(list[i]$mean,"gamma")$p.value
}
Any help?
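The problem is for (i in length(listDF)): length(listDF) is a single value, so the loop body runs only once, for the last index. Loop over seq_along(listDF) instead (1:length(listDF) also works, but seq_along is safer because it behaves correctly for an empty list). Also index the list with listDF[[i]] rather than list[i], so that you get the data frame itself: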
for(i in seq_along(listDF)) {
PDF_mean$station[i] <- listDF[[i]]$siteNumber
PDF_mean$normal[i] <- ks.test(listDF[[i]]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(listDF[[i]]$mean,"pgev")$p.value
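# "pgamma" is the gamma CDF ("gamma" is the gamma function, not a distribution);
# it also needs a shape argument, which has to be passed to ks.test as an extra argument
PDF_mean$gamma[i] <- ks.test(listDF[[i]]$mean,"pgamma")$p.value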
}
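To see the two problems in isolation, here is a small self-contained sketch. The list, its element names and the simulated mean column are made up for illustration; only the "pnorm" test is shown because it needs no extra parameters or packages:

```r
set.seed(1)
# Hypothetical stand-in for listDF: three data frames with a `mean` column
listDF <- list(
  DSF_moments_1 = data.frame(mean = rnorm(30)),
  DSF_moments_2 = data.frame(mean = rnorm(30)),
  DSF_moments_3 = data.frame(mean = rnorm(30))
)

# length() gives one number, so this loop visits only the last index
visited_wrong <- c()
for (i in length(listDF)) visited_wrong <- c(visited_wrong, i)

# seq_along() gives 1, 2, 3, so every element is visited
visited_right <- c()
for (i in seq_along(listDF)) visited_right <- c(visited_right, i)

# [i] returns a one-element list, [[i]] returns the data frame itself
is.null(listDF[1]$mean)      # TRUE: the sub-list has no element called `mean`
length(listDF[[1]]$mean)     # 30

# The same loop without explicit indexing
p_normal <- vapply(listDF, function(d) ks.test(d$mean, "pnorm")$p.value,
                   numeric(1))
p_normal
```

With vapply there is no index to get wrong, and the result comes back as a named numeric vector that can be put straight into a column.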

R loop to create data frames with 2 counters

What I want is to create 60 data frames with 500 rows in each. I tried the below code and, while I get no errors, I am not getting the data frames. However, when I do a View on the as.data.frame, I get the view, but no data frame in my environment. I've been trying for three days with various versions of this code:
getDS <- function(x){
for(i in 1:3){
for(j in 1:30000){
ID_i <- data.table(x$ID[j: (j+500)])
}
}
as.data.frame(ID_i)
}
getDS(DATASETNAME)
We can use outer (on a small example)
out1 <- c(outer(1:3, 1:3, Vectorize(function(i, j) list(x$ID[j:(j + 5)]))))
lapply(out1, as.data.table)
--
The issue in the OP's function is that ID_i is overwritten on every iteration of the loop, i.e. only the last value is kept. In order to keep all of them, we can initialize a list and store each result in it
getDS <- function(x) {
ID_i <- vector('list', 3)
for(i in 1:3) {
for(j in 1:3) {
ID_i[[i]][[j]] <- data.table(x$ID[j:(j + 5)])
}
}
ID_i
}
do.call(c, getDS(x))
data
x <- data.table(ID = 1:50)
I'm not sure the description matches the code, so I'm a little unsure what the desired result is. That said, it is usually not helpful to split a data.table because the built-in by-processing makes it unnecessary. If for some reason you do want to split into a list of data.tables you might consider something along the lines of
getDS <- function(x, n=5, size = nrow(x)/n, column = "ID", reps = 3) {
x <- x[1:(n*size), ..column]
index <- rep(1:n, each = size)
replicate(reps, split(x, index),
simplify = FALSE)
}
getDS(data.table(ID = 1:20), n = 5)
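If the goal really is 60 consecutive, non-overlapping blocks of 500 rows (60 x 500 = 30000, which matches the loop bound in the question), a base R sketch without data.table could look like the following. That reading of the question is an assumption on my part, and x, chunk_size and chunk_id are illustrative names:

```r
# Hypothetical input with 30000 rows
x <- data.frame(ID = 1:30000)

chunk_size <- 500
# Rows 1-500 get id 1, rows 501-1000 get id 2, and so on
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)

# split() returns a named list of data frames, one per chunk
chunks <- split(x, chunk_id)

length(chunks)          # number of data frames
nrow(chunks[["1"]])     # rows in the first one
```

Keeping the pieces in one list is usually easier to work with than 60 separate objects in the environment, since you can lapply over them.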

lapply() changing global variable in R

Using R, I want to save some of each partition's values while running lapply().
Below is what I have tried so far:
list_C <- list()
list_D <- list()
n <- 1
data_partition <- split(data, with(data, paste(A, B, sep=":")))
final_result <- lapply(data_partition,
function(dat) {
if(... condition ...) {
<Some R codes to run>
list_C[[n]] <- dat$C
list_D[[n]] <- dat$D
n <- n + 1
}
})
However, after running the code, 'n' remains just '1' and there's no change. How can I change the variable of 'n' to get the right saving lists of 'list_C' and 'list_D'?
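One likely explanation is scoping: assignments made with <- inside a function create local variables, so the n, list_C and list_D outside the function are never modified. R does have a superassignment operator (<<-) for that, but it is usually simpler to let lapply return what you need. A sketch under made-up assumptions (the toy data and the condition sum(dat$C) > 3 are placeholders, since the real condition is not shown):

```r
# Toy data with the columns used in the question
data <- data.frame(A = c("a", "a", "b", "b"),
                   B = c("x", "x", "y", "y"),
                   C = 1:4,
                   D = 5:8)
data_partition <- split(data, with(data, paste(A, B, sep = ":")))

# Demonstration of the scoping issue: n is only changed inside the function
n <- 1
invisible(lapply(data_partition, function(dat) {
  n <- n + 1   # modifies a local copy; the outer n is untouched
}))
n              # still 1

# Alternative: keep the partitions that satisfy the (placeholder) condition,
# then extract the columns from those
keep <- Filter(function(dat) sum(dat$C) > 3, data_partition)
list_C <- lapply(keep, function(dat) dat$C)
list_D <- lapply(keep, function(dat) dat$D)
list_C
```

The resulting lists are named by partition, so no counter is needed at all.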

subsetting a list of data frames using a for loop

Why does the last statement, "a <- ...", give me a subset of that data frame within the list, while automating the same process with a for loop over all data frames in the list produces all kinds of warnings and not the answer I am looking for?
time <- c(1:20)
temp <- c(2,3,4,5,6,2,3,4,5,6,2,3,4,5,6,2,3,4,5,6)
data <- data.frame(time,temp)
tmp <- c(1,diff(data[[2]]))
tmp2 <- tmp < 0
tmp3 <- cumsum(tmp2)
data1 <- split(data, tmp3)
#this does not work. I want to automate the successful process below through all data frames in the list "data1"
for(i in 1:length(data1)){
finale[i] <- subset(data1[[i]], data1[[i]][,2] > 3)
}
#this works to give me a part of what I want
a <- subset(data1[[1]], data1[[1]][,2] >3)
Maybe you may want to try with lapply
lapply(data1, function(x) subset(x, x[,2]>3))
Same result using a for loop
finale <- vector("list", length(data1))
for(i in 1:length(data1)){
finale[[i]] <- subset(data1[[i]], data1[[i]][,2] > 3)
}
It works because I preallocate finale with a type and a length. It didn't work for you because you never declared what finale should be.
You're trying to save a data.frame (a 2D object) in a vector (a 1D object). Just define finale as a list and the code will work:
time <- c(1:20)
temp <- c(2,3,4,5,6,2,3,4,5,6,2,3,4,5,6,2,3,4,5,6)
data <- data.frame(time,temp)
tmp <- c(1,diff(data[[2]]))
tmp2 <- tmp < 0
tmp3 <- cumsum(tmp2)
data1 <- split(data, tmp3)
# define finale as a list first, then fill it in the loop
finale <- vector(mode='list')
for(i in 1:length(data1)){
finale[[i]] <- subset(data1[[i]], data1[[i]][,2] > 3) # Use [[i]] instead of [i]
}
To save all in 1 data.frame:
finale <- do.call(rbind, finale)
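Putting the pieces together, a compact sketch of the whole pipeline on the example data (lapply for the subsetting, then do.call(rbind, ...) to combine; combined is just an illustrative name):

```r
time <- 1:20
temp <- rep(c(2, 3, 4, 5, 6), 4)
data <- data.frame(time, temp)

# Start a new group every time temp drops
group <- cumsum(c(1, diff(data$temp)) < 0)
data1 <- split(data, group)

# Subset every data frame in the list, keeping rows with temp > 3
finale <- lapply(data1, function(x) x[x$temp > 3, ])

# Stack the per-group results into a single data frame
combined <- do.call(rbind, finale)
nrow(combined)
```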
