How to load multiple files in R?

I have three CSV files that I want to load into one combined dataframe. I'd rather not read each file on its own line and then add another line to bind them, and I also don't want three separate dataframes cluttering my Global Environment when they will all end up combined into one anyway. Right now I can click and view each dataframe separately, but they aren't merged the way I want. How can I read the files in at once and merge them without reading each file one after another?

Use purrr::map_dfr():
filelist <- list.files(pattern = "\\.csv$", full.names = TRUE)
combined <- purrr::map_dfr(filelist, read.csv)
(This assumes all three files have the same columns.)
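If you would rather not depend on purrr, the same file list can be row-bound with base R alone:
# Read each file and row-bind the results into a single data frame
combined <- do.call(rbind, lapply(filelist, read.csv))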

Related

Import only specific cells from Excel to R by hard coding

I have around 100 identically structured .xls files, each containing 10 sheets of very messy data.
I want to combine everything into one R dataframe/tibble.
I don't know the right approach here, but I believe I can hard-code the cell positions in readxl::read_excel.
I would like it if somebody could show a short piece of code for picking a cell to be the column name by its position, and the data belonging to that column, also by its position/range.
Afterwards, I will find a way to loop this over all sheets within all files, or better: if I can specify the needed range for a certain sheet name inside the read_excel call, then I only have to loop over the files.
Thanks and let me know if you need some more information on this.
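As a starting point, read_excel() accepts a range argument that can select a block of cells either as an Excel-style range string or by numeric row/column positions via cellranger; the file name, sheet name, and ranges below are placeholders rather than values from the question:
library(readxl)
# Read a rectangular block whose first row becomes the column name;
# "B2:B20" picks column B, rows 2 through 20 (placeholder range)
block <- read_excel("messy_file.xls", sheet = "Sheet1",
                    range = "B2:B20", col_names = TRUE)
# The same block specified purely by numeric positions
# (upper-left = row 2, col 2; lower-right = row 20, col 2)
block <- read_excel("messy_file.xls", sheet = "Sheet1",
                    range = cellranger::cell_limits(ul = c(2, 2), lr = c(20, 2)),
                    col_names = TRUE)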

How do I use a for loop to open .ncdf files and average a matrix variable that has different values over all the files? (Using R Programming)

I'm currently trying to compute an averaged matrix of the values of a specific air-quality variable (ColumnAmountNO2TropCloudScreened) stored across several .nc4 files. The only way I managed to do it was to list all the files, open them with lapply, create a separate NO2 variable for every file, and then apply abind to all of those variables. It worked, but it took a lot of time to type out the different NO2 names (NO2_1, NO2_2, NO2_3, etc.) and the corresponding list element of the original file list ([[1]], [[2]], [[3]], etc.).
I'm trying to write something smarter and easier than typing in a bunch of numbers. I have all the original .nc4 files listed, and I'm trying to loop over the files, open each one, and pull out its ColumnAmountNO2TropCloudScreened matrix so I can average them. However, I'm having no luck. Would someone know what is wrong with this code or with my thinking? Thanks.
This is the code I'm trying:
# Load libraries
library(ncdf4)
library(abind)
library(plot.matrix)

# Set working directory
setwd("~/researchdatasets/2020")

# Declare data frame
df <- NULL

# List all files in the directory
files1 <- list.files(pattern = '\\.nc4$', full.names = FALSE)

# Loop to open files and get the NO2 variable
for (i in seq_along(files1)) {
  nc_data <- nc_open(files1[i])
  NO2_var <- ncvar_get(nc_data, 'ColumnAmountNO2TropCloudScreened')
  nc_close(nc_data)
}

# Average the variables
list_NO2 <- apply(abind::abind(NO2_var, along = 3), 1:2, mean, na.rm = TRUE)
On the command line, NCO's ncra operator averages variables across all input files, e.g.:
ncra in*.nc out.nc
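Within R itself, the reason the loop above ends up with only one file's worth of data is that NO2_var is overwritten on every iteration; collecting each matrix into a list first fixes that. A minimal sketch reusing the names from the question:
library(ncdf4)
library(abind)
files1 <- list.files(pattern = '\\.nc4$', full.names = FALSE)
# Keep one matrix per file instead of overwriting a single variable
NO2_list <- vector("list", length(files1))
for (i in seq_along(files1)) {
  nc_data <- nc_open(files1[i])
  NO2_list[[i]] <- ncvar_get(nc_data, 'ColumnAmountNO2TropCloudScreened')
  nc_close(nc_data)
}
# Bind the matrices along a third dimension and take the element-wise mean
NO2_mean <- apply(abind::abind(NO2_list, along = 3), 1:2, mean, na.rm = TRUE)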

How to save multiple R objects to a file with other data?

This is for a tool I am building, so my solution can't be ad hoc. I would like to save a ggplot object to a log file that contains a lot of other information, and then use many of these log files to make a combined plot of all the ggplots. I've tried saving the images as .png files and combining them in R, but the quality drops significantly when the plots are combined from image files. Any ideas?
I don't want to save them to individual .Rdata files because I'd like all the information to be contained within the log file. Is my only option to save the dataframe used to construct the ggplot and then reconstruct it later?
I am not sure why you need to embed this in a log file, but you have two options for saving R objects.
Option 1: save several in-memory objects at once.
save(ggplot1object, dataframe2, dataframe3, file = "location.filename.Rdata")
Then you can
load("location.filename.Rdata")
and all three objects will be loaded back into memory.
Option 2: put the objects in a list and save the list.
obj_list <- list(ggplot1object, dataframe2, dataframe3)
save(obj_list, file = "location.filename.Rdata")
Then you can
load("location.filename.Rdata")
and the single list object containing the three pieces will be loaded into the environment. Its elements can be ggplot outputs such as p1, p2, etc. that represent different plot objects.
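A further option, not covered above: saveRDS()/readRDS() serialize a single object, so wrapping the plot and the rest of the log information in one named list keeps everything in a single file; the names below are illustrative:
# Bundle the plot with whatever else belongs in the log entry
log_entry <- list(plot = ggplot1object, data = dataframe2, meta = dataframe3)
saveRDS(log_entry, "location.filename.rds")
# Later: read it back and pull the pieces out by name
log_entry <- readRDS("location.filename.rds")
log_entry$plot  # the ggplot object, ready to print or combine with other plots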

Load multiple tables from one worksheet and split them by detecting a fully empty row

Generally, what I would like to do is load a sheet that contains 4 different tables and split this one big block of data into smaller tables, using str_detect() to find the fully blank row that divides the tables. After that I want to plug that information into startRow, startCol, endRow, endCol.
I have tried using the function like this:
str_detect(my_data, "") but the my_data format is wrong. I'm not sure what step I should take to prevent this and make it work.
I'm using read_xlsx() to read my dataset.
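One way to avoid hard-coding startRow/startCol at all, sketched under the assumption that the tables are stacked vertically and separated by fully blank rows (the file name is a placeholder): read the whole sheet without headers, flag the all-NA rows, and split on them.
library(readxl)
# Read the entire sheet with no headers so the blank separator rows survive
raw <- read_xlsx("my_file.xlsx", sheet = 1, col_names = FALSE)
# A separator row is one where every cell is NA
blank_row <- rowSums(!is.na(raw)) == 0
# Rows get a group id that increases after each blank row; drop the blanks themselves
group_id <- cumsum(blank_row)
tables <- split(raw[!blank_row, ], group_id[!blank_row])
# Each element of `tables` is one table; promote its first row to column names
tables <- lapply(tables, function(tbl) {
  names(tbl) <- as.character(unlist(tbl[1, ]))
  tbl[-1, ]
})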

How do I stack data in R?

I have 20 different .csv files and I need to somehow stack the data in R so that I can get an overall picture of the data.
Presently I am copying and pasting the columns in Excel to make one big data set.
However, I am sure there is a quicker and more efficient way of doing this in R, as copying and pasting would ultimately take a while.
To make things worse, some of the variable names are not the same in each data set,
e.g. VARIABLE1 is written as variable1 in some datasets. How would I rectify this in R, given that R is case sensitive?
Any help would be greatly appreciated. Thanks!
The easiest and fastest way to do this, if you are (or want to become) familiar with the data.table package, is the following (not tested):
require(data.table)
in_pth <- "path_to_csv_files" # directory where CSV files are located, not the files.
files <- list.files(in_pth, full.names=TRUE, recursive=FALSE, pattern="\\.csv$")
out <- rbindlist(lapply(files, fread))
list.files parameters:
full.names = TRUE returns the full path to each file. Suppose in_pth <- "c:\\my_csv_folder" and inside it you have two files, 01.csv and 02.csv. Then full.names = TRUE returns c:\\my_csv_folder\\01.csv and c:\\my_csv_folder\\02.csv (the full paths).
recursive = FALSE does not search inside directories nested within your in_pth folder. Suppose you have two more CSV files in c:\\my_csv_folder\\another_folder; if you want to load those as well, set recursive = TRUE, which scans every subdirectory all the way down.
pattern = "\\.csv$" is a regular expression telling list.files which files to pick up. If your folder contains text files (.txt) in addition to CSV files, this pattern makes sure only the CSV files are loaded; if the folder holds only CSV files, it is not strictly necessary.
data.table functions:
rbindlist avoids column-name conflicts by keeping the names of the first data.table in the list. That is, if you have two data.tables dt1 and dt2 with column names x,y and a,b respectively, then rbindlist(list(dt1, dt2)) gives columns x,y and rbindlist(list(dt2, dt1)) gives columns a,b.
fread detects headers, separators, and column types automatically in most cases and is extremely fast; it is still worth checking the output to be sure everything was read as expected.
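To handle the VARIABLE1 versus variable1 mismatch mentioned in the question, one sketch (the path and the read_one helper are made up for illustration) is to lower-case each table's column names before stacking and let rbindlist fill any columns that are still missing:
library(data.table)
files <- list.files("path_to_csv_files", pattern = "\\.csv$", full.names = TRUE)
# Read one file and normalise its column names to lower case
read_one <- function(f) {
  dt <- fread(f)
  setnames(dt, tolower(names(dt)))
  dt
}
# Stack by (normalised) column name; fill = TRUE pads genuinely missing columns with NA
out <- rbindlist(lapply(files, read_one), use.names = TRUE, fill = TRUE)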
It is also worth looking into the plyr package for this: rbind.fill(...) allows you to combine data.frames by row.
install.packages("plyr")
library(plyr)
help(rbind.fill) gives you the following details:
rbinds a list of data frames filling missing columns with NA.
Usage
rbind.fill(...)
Arguments
...
input data frames to row bind together. The first argument can be a list of data frames, in which case all other arguments are ignored.
Details
This is an enhancement to rbind that adds in columns that are not present in all inputs, accepts a list of data frames, and operates substantially faster.
Column names and types in the output will appear in the order in which they were encountered. No checking is performed to ensure that each column is of consistent type in the inputs.
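A toy illustration of that filling behaviour (the two data frames below are made up):
library(plyr)
df1 <- data.frame(x = 1:2, y = c("a", "b"))
df2 <- data.frame(x = 3:4, z = c(TRUE, FALSE))
# Columns y and z are padded with NA wherever a data frame lacks them
rbind.fill(df1, df2)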
To my knowledge there is no cbind.fill in plyr; however, there is a user-written cbind.fill function that allows you to combine data.frames by column.
There are two versions of it: one depends on rbind.fill from the plyr package and the other is independent of rbind.fill.
Another way, without using external packages, is the cbind() command: it binds per column, so if you have two different tables you can simply pass them as arguments to cbind() and they will be appended side by side.
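For completeness, a toy example of what cbind() does: it binds side by side (the tables must have the same number of rows), which answers a different need than stacking rows.
left  <- data.frame(id = 1:3)
right <- data.frame(value = c(10, 20, 30))
cbind(left, right)  # three rows, two columns: id and value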
