How to export a list of dataframes in R?

I have a list that consists of a large number of data frames, and every time R crashes I lose the variable and have to recreate it. The problem is that my list of data frames is pretty large and takes several hours to recreate.
Is there any way to save/export this list so I can just load it into R at my convenience (without having to recreate it)?
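For exactly this situation, base R's saveRDS()/readRDS() can serialize the whole list to a single file and restore it in a fresh session. A minimal sketch (the list contents and file path here are just placeholders):

```r
# saveRDS() writes any single R object (including a list of data
# frames) to disk; readRDS() loads it back after a crash or restart.
df_list <- list(
  a = data.frame(x = 1:3, y = c("p", "q", "r")),
  b = data.frame(x = 4:6, y = c("s", "t", "u"))
)

path <- tempfile(fileext = ".rds")  # use a real path in practice
saveRDS(df_list, path)

# Later, in a fresh session:
restored <- readRDS(path)
```

Unlike save()/load(), readRDS() returns the object directly, so it can be assigned to any name.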

Related

What are my options when dealing with very large tibbles?

I am doing some pre-processing on data from multiple sources (multiple large CSVs, above 500 MB), applying some transformations and ending up with a final tibble dataset which has all the data that I need in a tidy format. At the end of that pre-processing, I save that final tibble as an .RData file that I import later for my subsequent statistical analysis.
The problem is that the tibble dataset is very big (it takes 5 GB of memory in the R workspace) and it is very slow to save and to load. I haven't timed it precisely, but it takes over 15 minutes to save that object, even with compress = FALSE.
Question: Do I have any (ideally easy) options to speed all this up? I already checked, and the data types in the tibble are all as they should be (character is character, numeric is dbl, etc.)
Thanks
read_csv and the other readr functions aren't the fastest, but they make things really easy. Per the comments on your question, data.table::fread is a great option for speeding up the import of data into data frames. It is ~7x faster than read_csv. Those data frames can then easily be converted to tibbles using dplyr::as_tibble. You also may not even need to convert the data frames to tibbles prior to processing, as most tidyverse functions will accept a data frame input and give you a tibble output.
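A minimal sketch of that import path, assuming data.table and dplyr are installed (the example CSV is generated inline so the snippet is self-contained):

```r
library(data.table)
library(dplyr)

# Write a tiny example CSV so the snippet runs on its own
csv <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,0.5", "2,1.5"), csv)

dt  <- fread(csv)      # fast C-based reader; returns a data.table
tbl <- as_tibble(dt)   # convert for tidyverse workflows if needed
```

In practice, fread() is pointed at the real 500 MB files; the as_tibble() step is optional since tidyverse verbs accept data frames directly.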

Set up and running function for multiple observations and variables

I have a question about the setup and application of a function to some multivariate data.
My data file is set up in excel with each variable as individual sheets, and each trajectory as a row of data (100 trajectories in total). The values within each row across 365 columns show the measurements associated with the respective variable across time (daily measurements over 1 year).
I’ve done some analysis of 1 trajectory by setting up my data manually in a separate Excel file, where I’ve got 16 columns containing separate variables, and 365 rows containing the associated data from each daily measurement. I’ve imported this into R as ‘Traj1’ and set up the function as follows:
T1 <- Traj1[, 1:16]
multi.fun <- function(T1) {
  c(summary(T1), sd(T1), skewness(T1), kurtosis(T1), shapiro.test(T1))
}
However, I need to do this with 100 trajectories, and this is extremely inefficient (both in R and Excel time).
I’m not sure how best to set this up in R given my initial Excel file layout, or how this function should be set up so that I can batch execute it and export the output into a new Excel file.
Sorry, I am new to programming in general and haven’t had much experience dealing with large data sets. Any help is really appreciated.
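One possible shape for the batch version, sketched in base R with a synthetic 100 × 365 matrix standing in for one variable's sheet (in practice each sheet could be read with, e.g., readxl::read_excel). The skewness/kurtosis helpers below are simple moment-based stand-ins, since those functions come from add-on packages (e.g. e1071 or moments) rather than base R:

```r
# Stand-ins for package skewness/kurtosis functions
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
kurtosis <- function(x) mean((x - mean(x))^4) / sd(x)^4 - 3

# Per-trajectory summary statistics (one trajectory = one numeric vector)
multi.fun <- function(x) {
  c(summary(x),
    sd        = sd(x),
    skewness  = skewness(x),
    kurtosis  = kurtosis(x),
    shapiro.p = shapiro.test(x)$p.value)
}

set.seed(1)
traj <- matrix(rnorm(100 * 365), nrow = 100)  # stand-in for one sheet:
                                              # 100 trajectories x 365 days
results <- t(apply(traj, 1, multi.fun))       # one row of stats per trajectory

# write.csv(results, "trajectory_stats.csv")  # export for Excel
```

The same apply() call can then be run once per sheet/variable, so nothing needs to be rearranged manually in Excel.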

Extracting data from a Matlab file in R

This is the first time I've dealt with Matlab files in R.
The rationale for saving the information in a .mat file was the length (the dataset contains 226,518 rows); we were worried that Excel (and then a CSV) would not handle that many.
I can upload the original file if necessary
So I have my Matlab file, and when I open it in Matlab everything is fine.
There are various arrays and the one I want is called "allPoints"
I can open it and then see that it contains values around 0.something.
What I want to do is to extract the same data in R.
library(R.matlab)
df <- readMat("170314_Col_HD_R20_339-381um_DNNhalf_PPP1-EN_CellWallThickness.mat")
str(df)
And here I get stuck. How do I pull "allPoints" out of it? $ does not seem to work.
I will have multiple files that need to be put together in a single data frame in R, so the plan is to mutate each extracted df, generating a new column for sample, and then rbind them together.
Could anybody help?
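For reference, readMat() returns a named list with one element per Matlab variable, so the array can be extracted with [[ ]] (or $, once the exact element name is known from names(df) or the str() output). A sketch of the extract-tag-combine pattern, with a plain list simulating readMat()'s return value and a hypothetical sample label:

```r
# Plain list standing in for what readMat() returns: one named
# element per variable stored in the .mat file.
df <- list(allPoints = matrix(c(0.12, 0.34, 0.56, 0.78), ncol = 2))

all_points <- as.data.frame(df[["allPoints"]])
all_points$sample <- "sample_01"  # tag the source file before combining

# With several files, the same pattern scales up:
#   dfs <- lapply(files, function(f) { ... readMat(f) ... })
#   combined <- do.call(rbind, dfs)
```

If $ appears not to work on the real object, checking names(df) is worthwhile: R.matlab sometimes adjusts element names (e.g. replacing underscores with dots).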

Combining longitudinal data in separate CSVs into one with year field?

I am very familiar with Excel but new to R. I have several years' worth of data across multiple spreadsheets:
data1996.csv
data1997.csv
...
data2013.csv
Each csv is about 500,000 rows by 1700 columns.
I want to manipulate this data in D3 and plan to remove columns that are not essential to calculation. My goal is to create a slider with years that will create a corresponding visualization. I want to know what the easiest way is to aggregate these massive datasets. I suppose it could be done manually, but this would prove cumbersome and inefficient.
I am very new to R, but if there is another means to aggregate the data into a single CSV, that would work fine as well.
Any help and suggestions are appreciated.
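One way to do this in base R: read each year's file, add a year column derived from the file name pattern above, and stack the results. A sketch using two tiny generated files (for files of this size, data.table::fread() and rbindlist() would likely be much faster):

```r
# Generate two small example files matching the data<year>.csv pattern
dir <- tempdir()
for (yr in 1996:1997) {
  write.csv(data.frame(v1 = c(1, 2), v2 = c(3, 4)),
            file.path(dir, paste0("data", yr, ".csv")),
            row.names = FALSE)
}

# Read each file, tag it with its year, and stack into one data frame
years <- 1996:1997  # 1996:2013 for the real data
combined <- do.call(rbind, lapply(years, function(yr) {
  df <- read.csv(file.path(dir, paste0("data", yr, ".csv")))
  df$year <- yr   # the year field for the D3 slider
  df
}))

# write.csv(combined, "all_years.csv", row.names = FALSE)
```

Dropping the non-essential columns inside the lapply() (before rbind) keeps the memory footprint down for 500,000-row files.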

Visualizing time series data using the zoo package

I am loading time series data using the read.zoo function. I noticed that when loading a time series using the zoo package, it doesn't display as a data frame, and when clicked on it is displayed as shown in the picture.
One cannot discern what the data looks like from this, while data pulled in using read.csv/read.table is labeled as a data.frame and displayed neatly when clicked on. I know I can simply use the View(data) command, but this is very cumbersome. I am sorry to be picky, but it would be nice to simply click on the data and have it displayed with the appropriate columns and rows.
I also noticed that when I generate variables from the data set, the new variables are never attached to the data set in which they were created, and I therefore must use data <- merge(data, newvariable) to combine them with the initial data.
Are there any techniques that can be employed to fix these two issues?
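For the first issue, one workaround is to convert the zoo series into a plain data frame, which the viewer then displays normally; new variables also attach to it the usual way, addressing the second issue. A sketch, assuming the zoo package is installed:

```r
library(zoo)

# A small example series indexed by date
z <- zoo(c(1.5, 2.5, 3.5), order.by = as.Date("2020-01-01") + 0:2)

# index() recovers the dates, coredata() the values; together they
# rebuild an ordinary data frame the viewer can display
df <- data.frame(date = index(z), value = coredata(z))

# New variables now stay attached without merge():
df$lagged <- c(NA, head(df$value, -1))
```

The cost is losing zoo's time-series methods on df, so the conversion is best done only for inspection or once the series manipulation is finished.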
