There is a question with a very similar title (cbind runs very slow), but it does not help with my problem. I am retrieving 10+ JSON files with 100 variables each and trying to create one big data.frame/table with 1000 columns. In practice I do not use the same JSON file ten times as in the example below, but different ones. Ideally only the problematic line cx <- cbind(cx, bx) would need to change, as the other steps (unlist, as.data.table) work well for me and I would not know what else to use. I know "cbind is slow", but do I have any alternatives? Ideally in base R.
library(jsonlite)
library(data.table)
starttime <- Sys.time()
for (i in 1:10) { # loop through all 10 json files
  zz <- Sys.time() # measuring the time for each loop
  urlx <- "http://mysafeinfo.com/api/data?list=englishmonarchs&format=json"
  jsnx <- fromJSON(urlx)
  if (i == 1) {
    ax <- unlist(jsnx)
    bx <- as.data.table(ax)
    cx <- bx
  }
  for (j in 1:100) { # loop through all 100 variables in each file
    ax <- unlist(jsnx)
    bx <- as.data.table(ax)
    cx <- cbind(cx, bx) # <---- VERY SLOW ----
  }
  zz <- round(Sys.time() - zz, 1)
  print(sprintf("%1.1f", zz))
  flush.console()
}
endtime <- Sys.time()
endtime-starttime
This gets slower and slower as more files are added; here are my timings.
[1] "0.7"
[1] "1.3"
[1] "1.3"
[1] "1.6"
[1] "2.1"
[1] "2.2"
[1] "2.5"
[1] "3.2"
[1] "3.4"
[1] "3.5"
I want to make a loop that saves its results to my computer every "x" seconds. For example, consider the following loop:
my_list <- list()
for (i in 1:10000000) {
  a_i <- rnorm(1, 100, 100)
  my_list[[i]] <- a_i
  saveRDS(my_list, "my_list.RDS")
}
I want to make it so that the "saveRDS" command is executed every 33 seconds, thus overwriting the previous version of the file.
I know that Sys.time() can be used to record the current time in R and Sys.sleep() can be used to pause execution, but I am not sure how to use these functions together to perform this task.
Can someone please show me how to do this?
Thank you!
@Jay.sf: I made this small adjustment based on your answer and everything seems to be working now! Thank you so much!
my_list <- list()
for (i in 1:100000000000) {
  a_i <- rnorm(1, 100, 100)
  tm <- Sys.time()
  my_list[[i]] <- a_i
  saveRDS(my_list, "my_list.RDS")
  Sys.sleep(5) ## actually 33 secs
}
If you want the loop to do stuff every x seconds, put Sys.sleep at the end, i.e. just before the next iteration starts. The interval may get longer if the "stuff" itself takes a long time to run. Here is my proposal:
my_list <- list()
for (i in 1:5) {
  tm <- Sys.time()
  ## do stuff
  my_list[[i]] <- tm
  saveRDS(my_list, "my_list.RDS")
  Sys.sleep(3) ## actually 33 secs
}
readRDS("my_list.RDS")
# [[1]]
# [1] "2022-09-25 19:37:26 CEST"
#
# [[2]]
# [1] "2022-09-25 19:37:29 CEST"
#
# [[3]]
# [1] "2022-09-25 19:37:32 CEST"
#
# [[4]]
# [1] "2022-09-25 19:37:35 CEST"
#
# [[5]]
# [1] "2022-09-25 19:37:38 CEST"
Suppose I have a list of lists. I would like to divide each vector of Tau by the sum of all the vectors in Tau. That is,
Tau[[1]][[1]] / sum(Tau[[1]][[1]], Tau[[1]][[2]], Tau[[2]][[1]], Tau[[2]][[2]])
I would like to do this for each element of Tau. I tried Reduce but it returns an error.
tau1 <- rnorm(10,0,1)
tau2 <- rnorm(10,0,1)
tau <- list(tau1, tau2)
tau
tau3 <- rnorm(10, 0,1)
tau4 <- rnorm(10,0,1)
tau5 <- list(tau3, tau4)
tau5
Tau <- list(tau, tau5)
Tau
[[1]]
[[1]][[1]]
[1] 0.41435211 -0.28983281 0.96462705 -1.32050463 -0.15736981 0.07512305
[7] -0.73394053 -0.12630874 0.21886818 1.57760128
[[1]][[2]]
[1] -1.31643065 1.24744501 0.09073152 -1.02300779 0.63927688 -2.09642019
[7] 1.25458113 -0.21542568 -0.07314255 1.02092833
[[2]]
[[2]][[1]]
[1] 0.2582012 0.9561437 -0.8351850 0.3028827 -0.7016825 -0.6400293
[7] 0.1925083 -1.0869632 0.3688728 -0.1837725
[[2]][[2]]
[1] -2.560212660 1.953122685 0.087180131 2.252459267 -0.003317207
[6] -1.767479446 -0.298496963 0.015214568 0.300665882 -1.017860244
Reduce("+", Tau)
Error in f(init, x[[i]]) : non-numeric argument to binary operator
Any help, please?
try:
Tau[[1]][[1]] / sum(unlist(Tau))
and since this is a two-level list of lists:
lapply(Tau, FUN = function(x)  ### dive into the first level
  lapply(x, FUN = function(x) x / sum(unlist(Tau))))
You can try a tidyverse solution
library(tidyverse)
Tau %>%
  flatten() %>%                        # removes a level of hierarchy from the list
  map(function(x) x / sum(unlist(.)))  # applies the function to each element of the flattened list
[[1]]
[1] -0.3101120 -0.1273576 0.8624357 0.0390124 0.0715351 0.9489481 0.2550256 -0.6999603
[9] -0.3800367 -0.2465854
[[2]]
[1] 0.67728632 0.19908554 0.22174745 0.06124092 -0.30754775 0.98870176 0.27546143 -1.08813227
[9] 0.38806129 -0.26159621
[[3]]
[1] -0.59082848 -0.12060585 -0.56768982 -0.40329663 -0.34583518 -0.93324998 0.46354885 0.08486158
[9] -0.62973290 0.69373770
[[4]]
[1] 0.23596330 -0.16326350 0.49527439 0.48587260 0.45458206 0.38102570 0.30648348 -0.03425584
[9] -0.16928961 -0.21051518
Your data (I added a seed for reproducibility):
set.seed(123)
tau1 <- rnorm(10,0,1)
tau2 <- rnorm(10,0,1)
tau3 <- rnorm(10, 0,1)
tau4 <- rnorm(10,0,1)
Tau <- list(list(tau1, tau2), list(tau3, tau4))
Tau <- list(tau, tau5) is making a list with two elements, both of which are themselves lists. So the Reduce call tries to add two lists together, which isn't defined.
You need to use append to combine the elements of two lists into a single list:
Tau <- append(tau, tau5)
Reduce("+", Tau)
# [1] -0.7481876 3.2098496 1.9950819 2.8188345 1.4200328 0.2202510
# [7] 0.1448013 0.8132506 -0.7788742 0.5466227
I think you can just use unlist. From its help file:
Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x.
sum(unlist(Tau))
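Building on that, if you want to perform the whole division while keeping the original two-level structure rather than flattening it first, one option (essentially what the first answer above does with nested lapply calls) is rapply, which applies the function to every atomic vector inside the nested list:
rapply(Tau, function(x) x / sum(unlist(Tau)), how = "list")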
I am trying to add up GeoTIFFs but am running into memory issues. R is using all 32 GB according to the following R error:
In writeValues(y, x, start = 1) :
Reached total allocation of 32710Mb: see help(memory.size)
I also checked the properties of R: it is 64-bit, and the target is
"C:\Program Files\R\R-3.3.0\bin\x64\Rgui.exe"
The version is
R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "3"
$minor
[1] "3.0"
$year
[1] "2016"
$month
[1] "05"
$day
[1] "03"
$`svn rev`
[1] "70573"
$language
[1] "R"
$version.string
[1] "R version 3.3.0 (2016-05-03)"
$nickname
[1] "Supposedly Educational"
So it looks like R is already using all of the available memory. I tried to use the bigmemory package: in the code below I tried changing the matrix to a big.matrix, but that failed, and the error occurs when writing the output file. Any suggestions for altering the code so that less memory is used, or for making it work with the ff or bigmemory packages?
############ LOOP THROUGH AGE MAPS TO COMPILE THE NUMBER OF TIMES A CELL BURNS DURING A GIVEN SPAN OF TIME ####################
## Empirical Fires
print("1 of 3: 2010-2015")
burn.mat <- matrix(0, nrow, ncol) # create a matrix of zeros with the dimensions of the landscape (row, col)
# Read in historical fire maps
for (j in 2010:2015) { # year loop
  age.tmp <- as.matrix(raster(paste('fr', j, '.tif', sep=''))) # read in age map
  # when something has burned in the ALFRESCO empirical fire history files, AGE = 1;
  # (age.tmp==1) is a logical comparison, returning a 0/1 map for FALSE/TRUE
  burn.mat <- burn.mat + (age.tmp==1)
  # Write the data to a geotiff
  out <- raster(burn.mat, xmn=-1692148, xmx=1321752, ymn=490809.9, ymx=2245610,
                crs='+proj=aea +lat_1=55 +lat_2=65 +lat_0=50 +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs')
  writeRaster(out, filename=paste(outdir, '/burn.mat.hist.1950-2007.tif', sep=''),
              format='GTiff', options='COMPRESS=LZW', datatype='FLT4S', overwrite=TRUE)
}
The problem will probably go away if you use Raster* objects rather than matrices. Something like
library(raster)
r <- raster('fr2010.tif')
burn.mat <- setValues(r, 0)
for (j in 2010:2015) {
  age.tmp <- raster(paste0('fr', j, '.tif'))
  burn.mat <- burn.mat + (age.tmp==1)
  # if age.tmp only has values of 0 and 1, use this instead:
  # burn.mat <- burn.mat + age.tmp
}
# write the results outside of the loop
writeRaster(burn.mat, filename=file.path(outdir, 'burn.mat.hist.1950-2007.tif'), options='COMPRESS=LZW',datatype='FLT4S',overwrite=TRUE)
A more direct approach without a loop
files <- paste0('fr', 2010:2015, '.tif')
s <- stack(files)
burn <- sum(s)
Or
burn <- sum(s == 1)
Or to write to a file in one step
b <- calc(s, sum, filename=file.path(outdir, 'burn.mat.hist.1950-2007.tif'), options='COMPRESS=LZW', datatype='FLT4S', overwrite=TRUE)
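If memory is still an issue, note that the raster package can process large objects in chunks, writing intermediate values to disk once they no longer fit in RAM, and you can lower the point at which that happens via rasterOptions(). A hedged sketch; the values below are only illustrative:
library(raster)
# illustrative values: a smaller maxmemory / chunksize forces more of the work
# to happen in chunks on disk instead of holding everything in RAM
rasterOptions(maxmemory = 1e8, chunksize = 1e7)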
I'm trying to get a list where each element has a name, by applying a function to each row of a data frame, but can't get the right output.
Assuming this is the function that I want to apply to each row:
format_setup_name <- function(m, v, s) {
  a <- list()
  a[[paste(m, "machines and", v, s, "GB volumes")]] <- paste(num_machines, num_volumes, vol_size, sep="-")
  a
}
If this is the input data frame:
df <- data.frame(m=c(1,2,3), v=c(3,3,3), s=c(15,20,30))
What I want is a list that looks like the following, but I can't get it:
$`1-3-15`
[1] "1 machines and 3 15 GB volumes"
$`2-3-20`
[1] "2 machines and 3 20 GB volumes"
$`3-3-30`
[1] "3 machines and 3 30 GB volumes"
Can someone give me hints how to do it?
Why do I need this? Well, I want to populate selectizeInput in shiny using values coming from the database. Since I'm combining several columns, I need a way to match the selected input with the values.
This is a good use case for setNames which can add the names() attribute to an object, in place. Also, if you use as.list, you can do this in just one line without any looping:
setNames(as.list(paste(df$m, ifelse(df$m == 1, "machine", "machines"), "and", df$v, df$s, "GB volumes")), paste(df$m,df$v,df$s,sep="-"))
# $`1-3-15`
# [1] "1 machine and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Thomas has already found a pretty neat solution to your problem (and in one line, too!). But I'll just show you how you could have succeeded with the approach you first tried:
# We'll use the same data, this time called "dat" (I avoid calling
# objects `df` because `df` is also a function's name)
dat <- data.frame(m = c(1,2,3), v = c(3,3,3), s = c(15,20,30))
format_setup_name <- function(m, v, s) {
  a <- list() # initialize the list, all is well up to here
  # But here we'll need a loop to assign each element to the list in turn
  for (i in seq_along(m)) {
    a[[paste(m[i], v[i], s[i], sep="-")]] <-
      paste(m[i], "machines and", v[i], s[i], "GB volumes")
  }
  return(a)
}
Note that what goes inside the brackets is the name of the element, while what's at the right side of the <- is the content to be assigned, not the inverse as your code was suggesting.
So let's try it:
my.setup <- format_setup_name(dat$m, dat$v, dat$s)
my.setup
# $`1-3-15`
# [1] "1 machines and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Everything seems nice. Just one thing to note: with the $ operator, you'll need to use single or double quotes to access individual items by their names:
my.setup$"1-3-15" # my.setup$1-3-15 won't work
# [1] "1 machines and 3 15 GB volumes"
my.setup[['1-3-15']] # equivalent
# [1] "1 machines and 3 15 GB volumes"
Edit: lapply version
Since loops have really fallen out of favor, here's a version with lapply:
format_setup_name <- function(m, v, s) {
  a <- lapply(seq_along(m), function(i) paste(m[i], "machines and", v[i], s[i], "GB volumes"))
  names(a) <- paste(m, v, s, sep="-")
  return(a)
}
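Called on the same data, this lapply version produces the same named list as the loop version, so look-ups by name still work. A quick check:
my.setup <- format_setup_name(dat$m, dat$v, dat$s)
my.setup[["2-3-20"]]
# [1] "2 machines and 3 20 GB volumes"
For the shiny use case, keep in mind that when a named list is passed as choices to selectizeInput, the names are what the user sees and the values are what the server receives, so you may want to swap names and contents depending on which side should hold the "1-3-15" keys.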
I have a list of probe IDs, as below:
> dput(best)
list(c("204639_at", "203440_at", "242136_x_at", "231954_at",
"208388_at", "205942_s_at", "203510_at", "204639_at"), c("204639_at",
"203510_at", "231954_at"))
Then I have used this file:
> head(sym)
x
204639_at ADA
203440_at CDH2
242876_at AKT3
207078_at MED6
208388_at NR2E3
222161_at NAALAD2
> class(sym)
[1] "data.frame"
Then I want to look up the alternative names in sym, e.g.
("ADA" "CDH2" "AKT3" "MED6" "NR2E3" "NAALAD2")
and replace the existing probe IDs in "best" with them. Does anyone have a hack? Thanks
There is no "hack" needed.
# your data:
best <- list(list(c("204639_at", "203440_at", "242136_x_at", "231954_at",
                    "208388_at", "205942_s_at", "203510_at", "204639_at")),
             list(c("204639_at", "203510_at", "231954_at")))
sym <- read.table(text=" x
204639_at ADA
203440_at CDH2
242876_at AKT3
207078_at MED6
208388_at NR2E3
222161_at NAALAD2", header=TRUE)
# iterate through the list and match against sym
rapply(best, function(x) {
  res <- as.character(sym[x, 1])
  # omit the following line if you prefer NAs for non-matches
  res[is.na(res)] <- x[is.na(res)]
  res
}, how = "list")
#[[1]]
#[[1]][[1]]
#[1] "ADA" "CDH2" "242136_x_at" "231954_at" "NR2E3" "205942_s_at" "203510_at" "ADA"
#
#
#[[2]]
#[[2]][[1]]
#[1] "ADA" "203510_at" "231954_at"