This is for a tool I am building, so my solution can't be ad hoc. I would like to save a ggplot object to a log file that also contains a lot of other information, and then use many of these log files to make a combined plot of all the ggplots. I've tried saving the images as .png files and then combining them in R, but the quality decreases significantly when combining them from image files. Any ideas?
I don't want to save them to individual .RData files because I'd like all the information to be contained within the log file. Is my only option to save the data frame used to construct the ggplot and then reconstruct it later?
I am not sure of the reason for embedding in a log file, but you have two options here for saving R objects.
Option 1. Save several in-memory objects at once.
save(ggplot1object, dataframe2, dataframe3, file = "location.filename.RData")
Then you can
load("location.filename.RData")
and all 3 objects will be loaded into memory.
Option 2. Create a list and save the list.
plotlist <- list(ggplot1object, dataframe2, dataframe3)
save(plotlist, file = "location.filename.RData")
Then you can
load("location.filename.Rdata")
and the single list item with the 3 different pieces will be loaded into the environment. These can be ggplot output items like p1, p2, etc... that represent different plot objects.
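If the eventual goal is the combined plot described in the question, a minimal sketch of the reassembly step might look like the following. It assumes each log's objects were saved as above, that the saved plot object is named ggplot1object in every file, and that the gridExtra package is acceptable; the file names are hypothetical.
library(ggplot2)
library(gridExtra)

logfiles <- c("run1.RData", "run2.RData", "run3.RData")   # hypothetical log files

plots <- lapply(logfiles, function(f) {
  e <- new.env()
  load(f, envir = e)    # load into a private environment to avoid name clashes
  e$ggplot1object       # pull out the saved ggplot object
})

grid.arrange(grobs = plots, ncol = 2)   # draw all plots on one page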
Related
I am running several calculations and ML algorithms in R and store their results in four distinct tables.
For each calculation, I obtain four tables, which I store in a single list.
According to RStudio, each of my lists is labelled "Large List (4 elements, 971.2 kB)" in the Environment pane (upper right), where all my objects, functions, etc. are displayed.
I have five of these lists and save them for later use with the save() function.
I use the function:
save(list1, list2, list3, list4, list5, file="mypath/mylists.RData")
For some reason, which I do not understand, R takes more than 24 hours to save these five lists of only 971.2 kB each.
Maybe I should add that apparently more than 10 GB of my RAM is used by R at the time. However, the lists are as small as I indicated above.
Does anyone have an idea why it takes so long to save the lists to my hard drive and what I could do about it?
Thank you
This is just a guess, because we don't have your data.
Some objects in R contain references to environments. The most common examples are functions and formulas. If you save one of those, R may need to save the whole environment, which can drastically increase the size of what is being saved. If you are short of memory, that could take a very long time due to swapping.
Example:
F <- function() {
  X <- rnorm(1000000)   # large vector kept in the function's local environment
  Y ~ z                 # the returned formula carries a reference to that environment
}
This function returns a small formula which references the environment holding X, so saving it will take a lot of space.
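To see the effect concretely, here is a minimal sketch (the file names are placeholders). It also shows one possible mitigation: dropping the formula's environment with environment<- when you know it is not needed for later evaluation.
fm <- F()                                 # the small formula, carrying its environment
save(fm, file = "with_env.RData")         # large file: X is saved along with the formula
environment(fm) <- baseenv()              # discard the captured environment (only if safe to do so)
save(fm, file = "without_env.RData")      # small file: just the formula
file.size("with_env.RData")               # compare the two sizes
file.size("without_env.RData")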
Thanks for your answers.
I solved my problem by writing a function that extracts the tables from the objects and saves them as .csv files in a folder. I then cleaned the environment and shut down the computer. After restarting, I started R, loaded all the .csv files again, and saved the resulting objects with the familiar save() command.
It is probably not the most elegant way, but it worked and was quite quick.
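For reference, a minimal sketch of that export step, assuming each of the five lists holds four named data frames and that the output folder already exists (all object and path names here are hypothetical):
all_lists <- list(list1 = list1, list2 = list2, list3 = list3,
                  list4 = list4, list5 = list5)

for (ln in names(all_lists)) {
  for (tn in names(all_lists[[ln]])) {
    write.csv(all_lists[[ln]][[tn]],
              file = file.path("csv_out", paste0(ln, "_", tn, ".csv")),
              row.names = FALSE)
  }
}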
I have three CSV files that I want to load into one combined data frame. I want to avoid three separate lines reading the files plus an additional line binding them, and I also want to save room in my Global Environment: keeping three data frames loaded when I will combine them into one just takes up space and makes things confusing. I can click and view each data frame separately, but they aren't merged the way I want. How can I read the files in at once and combine them without loading each file one after another?
Use purrr::map_dfr():
combined <- purrr::map_dfr(filelist, read.csv)  # filelist: character vector of the three file paths
(This assumes all three data frames have the same columns, as I understand they do.)
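For completeness, a slightly fuller sketch; the directory name is a placeholder, and the .id argument is optional if you want to keep track of which file each row came from:
library(purrr)

filelist <- list.files("data", pattern = "\\.csv$", full.names = TRUE)   # the three CSVs
combined <- map_dfr(set_names(filelist), read.csv, .id = "source_file")  # one stacked data frame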
I'm currently trying to compute an averaged matrix of a specific air quality variable (ColumnAmountNO2TropCloudScreened) stored across a number of NetCDF-4 (.nc4) files. The only way I was able to do it was to list all the files, open them using lapply, create a separate NO2 variable for every file, and then apply abind to all of the variables. Even though that worked, it took a lot of time to type different names for the NO2 variables (NO2_1, NO2_2, NO2_3, etc.) and the index of each listed file ([[1]], [[2]], [[3]], etc.).
I am trying to write code that is smarter and easier than just typing in a bunch of numbers. I have all the original .nc4 files listed and am trying to loop over them to open each one and get its ColumnAmountNO2TropCloudScreened matrix, so that I can then average them. However, I am having no luck. Would someone know what is wrong with this code or my thinking about it? Thanks.
This is the code I am trying:
# Load libraries
library(ncdf4)
library(abind)
library(plot.matrix)
# Set working directory
setwd("~/researchdatasets/2020")
# Declare data frame
df=NULL
# List all files in one file
files1= list.files(pattern='\\.nc4$',full.names=FALSE)
# Loop to open files, get NO2 variables
for (i in seq_along(files1)) {
  nc_data <- nc_open(files1[i])
  NO2_var <- ncvar_get(nc_data, 'ColumnAmountNO2TropCloudScreened')
  nc_close(nc_data)
}
# Average variables
list_NO2= apply(abind::abind(NO2_var,along=3),1:2,mean,na.rm=TRUE)
NCO's ncra averages variables across all input files with, e.g.,
ncra in*.nc out.nc
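Staying in R, here is a minimal sketch of the loop the question is aiming for: collect each file's matrix in a list instead of overwriting a single NO2_var, then average element-wise across the third dimension. It assumes every file stores the variable on the same grid.
library(ncdf4)
library(abind)

files1 <- list.files(pattern = '\\.nc4$', full.names = FALSE)

NO2_list <- lapply(files1, function(f) {
  nc <- nc_open(f)
  on.exit(nc_close(nc))                                  # close the file even on error
  ncvar_get(nc, 'ColumnAmountNO2TropCloudScreened')
})

NO2_mean <- apply(abind(NO2_list, along = 3), 1:2, mean, na.rm = TRUE)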
I have an app where I want to pull out values from a lookup table based on user inputs. The reference table is a statistical test, based on a calculation that'd be too slow to do for all the different combinations of user inputs. Hence, a lookup table for all the possibilities.
But... right now the table is about 60 MB (as .Rdata) or 214 MB (as .csv), and it'll get much larger if I expand the possible user inputs. I've already reduced the number of significant figures in the data (to 3) and removed the row/column names.
Obviously, I can preload the lookup table outside the reactive server function, but it'll still take a decent chunk of time to load in that data. Does anyone have any tips on dealing with large amounts of data in Shiny? Thanks!
flaneuse, we are still working with a smaller data set than you, but we have been experimenting with the following:
Use rds for our data
As @jazzurro mentioned .rds above, you seem to know how to do this already, but the syntax for others is below.
The .rds format stores a single R object, so you can rename it on load if need be.
In your prep data code, for example:
mystorefile <- file.path("/my/path","data.rds")
# ... do data stuff
# Save down (assuming mydata holds your data frame or table)
saveRDS(mydata, file = mystorefile)
In your shiny code:
# Load in my data
x <- readRDS(mystorefile)
Remember to copy your .rds data file into your app directory when you deploy. We use a data directory /myapp/data, and the file.path() for the store file is then changed to "./data" in our Shiny code.
global.R
We have placed the readRDS calls that load our data in this global file (instead of in server.R before the shinyServer() call), so they run once and the data is available to all sessions, with the added bonus that it can be seen by ui.R.
See this scoping explanation for R Shiny.
Slice and dice upfront
The standard daily reports use the most recent data, so in my global.R I also make a small latest.dt holding a subset of the full data. The landing page with the latest charts works with this smaller data set, which keeps those charts fast.
The custom data tab, which uses the full full.dt, sits on a separate tab. It is slower, but at that stage the user is more patient and is thinking about which dates and other parameters to choose.
This subset idea may help you.
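As a concrete illustration of both ideas, a minimal global.R sketch; the file path and the report_date column are hypothetical:
# global.R -- runs once per R process and is visible to ui.R and server.R
full.dt <- readRDS(file.path("data", "data.rds"))   # the large lookup table

# Pre-slice the subset the landing page needs, so its charts stay fast
latest.dt <- full.dt[full.dt$report_date >= max(full.dt$report_date) - 6, ]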
Would be interested in what others (with more demanding data sets) have tried!
I have 10+ files that I want to add to ArcMap and then run some spatial analysis on in an automated fashion. The files are in CSV format, located in one folder, and named in order from "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
Make XY event layer
Export the XY table to a point shapefile using the feature class to feature class tool
Project the shapefiles
Snap the points to another line shapefile
Make a Route layer - network analyst
Add locations to stops using the output of step 4
Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the ModelBuilder model to show the steps visually, but I don't have enough reputation points to do so.
I have two problems:
How do I iterate this procedure over the number of files that I have?
How do I make sure that the output has a different name on every iteration so it doesn't overwrite the one from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
For future reference, gis.stackexchange.com is likely to get you a faster response.