Combining longitudinal data in separate CSVs into one with year field? - r

I am very familiar with Excel but new to R. I have several years' worth of data across multiple spreadsheets:
data1996.csv
data1997.csv
...
data2013.csv
Each CSV is about 500,000 rows by 1,700 columns.
I want to manipulate this data in D3, and I plan to remove the columns that are not essential to the calculation. My goal is to build a year slider that drives a corresponding visualization. What is the easiest way to aggregate these massive datasets? It could presumably be done manually, but that would be cumbersome and inefficient.
I am very new to R, but if there is another means to aggregate the data into a single CSV, that would work fine as well.
Any help and suggestions are appreciated.
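A minimal sketch in base R: read each file, tag its rows with the year, and stack the pieces. (For files this size, data.table::fread() with its select= argument is much faster, but the shape of the loop is the same.) The demo writes two tiny stand-in files so the sketch is self-contained; with the real data you would just point at your existing CSVs and use 1996:2013.

```r
# Demo setup: two tiny stand-in files (skip this with the real data)
dir <- tempdir()
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(dir, "data1996.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, value = c(30, 40)),
          file.path(dir, "data1997.csv"), row.names = FALSE)

years <- 1996:1997   # use 1996:2013 for the real files
pieces <- lapply(years, function(y) {
  d <- read.csv(file.path(dir, sprintf("data%d.csv", y)))
  d$year <- y        # tag every row with its source year
  d                  # drop non-essential columns here, e.g. d[, keep_cols]
})
combined <- do.call(rbind, pieces)

write.csv(combined, file.path(dir, "combined.csv"), row.names = FALSE)
```

The resulting long table (one year column instead of one file per year) is exactly the shape a D3 slider filter wants.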

Related

Comparing two lists in R

I have two nearly identical data sets; one, however, has some values the other doesn't, and I'm trying to compare them in R. I want to create a list of the observations that aren't shared between the two data sets, but I'm struggling with how to do this. I'm relatively new to R.
You should try the arsenal package:
install.packages("arsenal")
library(arsenal)
captureVariable <- summary(arsenal::comparedf(list1, list2))
captureVariable[["diffs.byvar.table"]]
There are some other helpful outputs that will be captured by captureVariable if that particular table doesn't suit your needs.
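If installing a package isn't an option, base R can list the unmatched rows directly by pasting each row into a single key and comparing the keys. A minimal sketch, assuming two hypothetical data frames df1 and df2 with the same columns:

```r
df1 <- data.frame(id = 1:4, value = c("a", "b", "c", "d"))
df2 <- data.frame(id = 1:3, value = c("a", "b", "x"))

# Collapse each row to a single string key, then keep rows whose key
# has no exact match in the other data frame
only_in_df1 <- df1[!do.call(paste, df1) %in% do.call(paste, df2), ]
only_in_df2 <- df2[!do.call(paste, df2) %in% do.call(paste, df1), ]
```

Note this is an exact whole-row comparison; if the data sets share a key column, dplyr::anti_join() gives the same answer keyed on that column.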

How to handle a large collection of time series in R?

I have data representing about 50,000 different 2-year monthly time series. What would be the most convenient, tidyverse-ish way to store them in R? I'll be using R to review each series and extract characteristic features of their shapes.
A data frame with 50,000 rows and 24 columns (plus a few more for metadata) seems awkward, because the time axis ends up in the columns. But what else should I use? A list of xts objects? A data frame with 50,000 × 24 rows? A three-dimensional matrix? I'm not seeing anything obviously convenient, and my friend Google hasn't found any great examples for me either. I imagine that means I'm overlooking the obvious solution, so maybe someone can suggest it. Any help?
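The tidyverse convention is the long form you mention: one row per series per month (50,000 × 24 rows), with series id and month as columns. That shape works directly with group_by()/summarise() and ggplot2 facets. A small self-contained sketch, assuming a hypothetical wide matrix with one series per row:

```r
set.seed(1)
n_series <- 3                        # 50,000 in the real data
wide <- matrix(rnorm(n_series * 24), nrow = n_series)

# Pivot to long: one row per (series, month) observation
long <- data.frame(
  series = rep(seq_len(n_series), each = 24),
  month  = rep(1:24, times = n_series),
  value  = as.vector(t(wide))        # t() so each row unrolls in month order
)

# Example per-series feature: month of the peak value
peaks <- tapply(long$value, long$series, which.max)
```

With real tibbles, tidyr::pivot_longer() does the reshaping, and a list-column (one nested data frame of 24 observations per series row) is the other idiomatic option.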

Merging specific columns from multiple excel files in R?

Apologies for what is probably an already-answered question; I couldn't find what I was looking for in the archives.
I'm trying to merge multiple Excel files into one data frame for analysis.
It's experimental data across different versions, and the variables are in inconsistent columns (i.e., in Version 1, ReactionTime is in column AB; in Version 2 it's in AG). I need to merge the values of specified variables from the (~24) data files, each with a different column structure, into one long-format data frame.
I've only ever used an Excel macro to merge files before, and I'm unsure how to specify the variable names for merging. Any help you could provide would be appreciated!
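Because R selects columns by name, the inconsistent column positions stop mattering once each file is read in with its header row. A minimal sketch using in-memory data frames as stand-ins for two file versions (with the real files, you would read each one first, e.g. with readxl::read_excel(), then select by name the same way; Subject and the version layouts here are hypothetical):

```r
# Stand-ins for two Excel files with different column orders
v1 <- data.frame(Subject = 1:2, Filler = "x", ReactionTime = c(350, 420))
v2 <- data.frame(ReactionTime = c(390, 310), Subject = 3:4)

wanted <- c("Subject", "ReactionTime")   # the variables to keep

# Select by name in each file, then stack; position is irrelevant
merged <- do.call(rbind, lapply(list(v1, v2), function(d) d[, wanted]))
```

If a variable is spelled differently across versions, rename it to a common name inside the lapply() before selecting.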

Apply same subset to multiple datasets in R

I am pretty new to R, but I am working with several datasets containing the same data only from different days.
For my analysis I only need some specific columns from each dataset, so I created a new dataset containing only those columns (I do not want to overwrite or delete the old dataset). I am using the following code to do this:
subset01012018 <- dataset01012018[, c(1:4, 10, 11, 14:16)]
Now I want to apply the same to all the datasets. How could I do something like this? Could I do this with a for loop? Or do I need an apply function?
Hope someone can help me!
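Yes, lapply() is the usual tool: put the day datasets in a (named) list and apply the same column selection to every element. A minimal sketch with two small hypothetical datasets (swap in your real column positions):

```r
cols <- c(1, 2, 4)   # in the real data: c(1:4, 10, 11, 14:16)

# Hypothetical stand-ins for the per-day datasets
dataset01012018 <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
dataset02012018 <- data.frame(a = 9:10, b = 11:12, c = 13:14, d = 15:16)

all_days <- list(d0101 = dataset01012018, d0102 = dataset02012018)
subsets  <- lapply(all_days, function(d) d[, cols])   # same subset everywhere
```

The originals stay untouched; each subset is then available as subsets$d0101, subsets$d0102, and so on.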

How to export a list of dataframes in R?

I have a list that consists of a large number of data frames, and every time R crashes I lose the variable and have to recreate it. The problem is that my list of data frames is pretty large and takes several hours to recreate.
Is there any way to save/export this list so I can just load it back into R at my convenience (without having to recreate it)?
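Yes: saveRDS() writes any single R object, including a list of data frames, to disk, and readRDS() restores it after a crash. A minimal sketch with a small hypothetical list:

```r
my_list <- list(a = data.frame(x = 1:3), b = data.frame(y = 4:6))

# Write the whole list to a single .rds file
path <- file.path(tempdir(), "my_list.rds")
saveRDS(my_list, path)

# Later (e.g. in a fresh session): read it back
restored <- readRDS(path)
```

save()/load() also work and can store several variables at once, but saveRDS() is tidier for one object because readRDS() lets you assign it to whatever name you like.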
