How to handle parsing hundreds of files in R?

I want to parse last year's weather data, which is recorded in CSV files. Each CSV file holds one day of data, so I have 365 CSV files to parse. What is the best way to handle these files? As far as I know, I need to load all of them into R and bind them into one big data frame, but I don't know whether this is the best solution. What if I have more than one year of data files? Do I need to load all of them into memory, or is there another way to handle them?
Each file is about 1M to 1.5M.

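The easiest way to do this is to collect the file names with list.files, read the files into a list of data frames, and then rbind all the frames together:
# setwd('dirwithallmycsvs')           # optionally switch to the directory holding the CSVs
x <- list.files(pattern = '\\.csv$')  # names of all CSV files in the working directory
out <- lapply(x, read.csv)            # read each file into a list of data frames
out2 <- do.call(rbind, out)           # stack the data frames row-wise
Your output should now be one data frame. rbind needs the same columns in every data frame, so make sure the column names match across your files.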
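The column check mentioned above can be automated. The following is a minimal sketch in base R, not a definitive implementation: the helper name read_daily_csvs, the source_file column, and the demo file names and values are all made up for illustration.

```r
# Hypothetical helper: read every CSV in `dir`, verify that all files share
# the same column names, and bind them into one data frame.
read_daily_csvs <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  frames <- lapply(paths, function(p) {
    d <- read.csv(p, stringsAsFactors = FALSE)
    # Remember which file each row came from (rep() also copes with empty files).
    d$source_file <- rep(basename(p), nrow(d))
    d
  })
  # Compare every file's column names with those of the first file.
  same_cols <- vapply(
    frames,
    function(d) identical(names(d), names(frames[[1]])),
    logical(1)
  )
  if (!all(same_cols)) {
    stop("Column mismatch in: ", paste(basename(paths)[!same_cols], collapse = ", "))
  }
  do.call(rbind, frames)
}

# Demo with two made-up daily files in a temporary directory.
demo_dir <- tempfile("weather_")
dir.create(demo_dir)
write.csv(data.frame(hour = 0:1, temp = c(3.1, 2.8)),
          file.path(demo_dir, "2023-01-01.csv"), row.names = FALSE)
write.csv(data.frame(hour = 0:1, temp = c(4.0, 3.5)),
          file.path(demo_dir, "2023-01-02.csv"), row.names = FALSE)

weather <- read_daily_csvs(demo_dir)
```

Keeping the file name in a column means the day can still be recovered after binding. If memory ever becomes a concern with several years of files, the same helper could be called on one year's directory at a time.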

Related

import multiple csv from web into one data frame

I want to read several CSV files from the web and save the data into a single data frame.
If the files were on my computer this would be very easy, as I have seen, but I don't always want to download the files.
The example:
"https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv"
These are the CSV files. Maybe it's possible with a function or a loop, but I don't know how.
Maybe you can help me.
Greetings
Reading files from the web is just as easy as reading them from your file system; you can just pass a URL instead of a file-path to readr::read_csv() (you tagged your question with readr so I assume you want to use that).
Assuming your files are in a vector:
files <- c("https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv")
You can use readr::read_csv to read a specific file, and combine them into one data-frame with purrr::map_dfr:
df <- purrr::map_dfr(files, readr::read_csv)
This iterates over the contents of files, applies readr::read_csv to each of those elements, and combines them into one data frame, rowwise (hence dfr).
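In purrr 1.0.0 and later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). Here is a hedged sketch of that variant. It uses two small temporary files as stand-ins for the URLs (a vector of URLs can be passed in the same way), and it assumes R 4.1 or later for the |> and \(p) syntax. Reading every column as character is an assumption made here to avoid type clashes between files, not something the original answer does.

```r
library(purrr)
library(readr)

# Stand-in inputs: two tiny local CSVs with made-up contents.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write_csv(data.frame(HomeTeam = "A", FTHG = 1), paths[1])
write_csv(data.frame(HomeTeam = "B", FTHG = 2), paths[2])

seasons <- paths |>
  set_names() |>                                   # name each element by its own path
  map(\(p) read_csv(p, col_types = cols(.default = col_character()))) |>
  list_rbind(names_to = "source")                  # bind rows, keep the origin in `source`
```

The source column records which file (or URL) each row came from, which is handy when every season uses the same file name. Columns read as character can be converted afterwards, for example with readr::type_convert().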

How do I directly import a CSV file to a data frame in R?

I need to import large CSV files directly as data frames. As the read.csv() function creates a list, I then have to convert the list to a data frame using as.data.frame(). This is a problem for me because the CSV files are big, and storing both the list and the data frame will take up a lot of memory.
Since I need to merge multiple CSV files, I think using data frames will be easier, as I can join the files on common attributes (columns).
So, this is what I'm doing right now:
sample <- read.csv("sample.csv")
sample_df <- as.data.frame(sample)
I need a way to import the CSV files directly as data frames.
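For what it's worth, read.csv() already returns a data frame, so the as.data.frame() step should not be needed. A data frame is stored as a list of columns, which is probably why it looks like a list. A small self-contained check, using a temporary file with made-up contents:

```r
# Write a tiny CSV to a temporary file so the example runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20"), path)

sample_df <- read.csv(path)  # the file path must be a quoted string or a variable

class(sample_df)          # "data.frame"
is.data.frame(sample_df)  # TRUE
is.list(sample_df)        # TRUE as well: a data frame is a list of columns
```

Data frames read this way can be passed straight to merge() to join on common columns.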

Renaming elements of a nested list with dynamic number of items in R

I just want to preface this question by saying I am a novice in both R (and any coding in general) and stack overflow. I apologize if I am unclear in relaying my question or haven't provided enough details.
I've set up my R script currently to import all the csv files contained in a directory and read them all into a single list of lists with the following:
temp <- list.files(pattern = "\\.csv$")
myfiles <- lapply(temp, read.csv)
So if I had 4 csv files in the working directory, it would create a list with 4 nested lists within it. These nested lists are named as a number in the order they are read (1 to 4 for 4 csv files imported), but I want them to retain the original names of the csv files. Is there a way to change the way I am reading the csv files into R so that they retain the original file names of the csv?
I am able to change the names of the list elements manually with the following, since temp holds all the csv file names:
names(myfiles) <- c(temp[1], temp[2], temp[3], temp[4])
However, this would only work given that I only have 4 csv files within the directory. I was not able to figure out how to write this code so that it works for a dynamic number of csv files.
I have tried to make a for loop along the lines of:
for (i in 1:length(temp)) {
names(myfiles) <- c(temp[i])
}
but I wasn't able to figure out how to make the vector c() encapsulate a dynamic number of list elements. I would appreciate any help!
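One way to do this, sketched in base R: names<- accepts a whole character vector, so temp can be assigned in one step, whatever its length. The loop above fails because each iteration replaces all the names with a single one. The demo directory and file names below are made up so the example can run on its own.

```r
# Create three throwaway CSV files in a temporary directory.
demo_dir <- tempfile("csvs_")
dir.create(demo_dir)
for (nm in c("a.csv", "b.csv", "c.csv")) {
  write.csv(data.frame(x = 1:2), file.path(demo_dir, nm), row.names = FALSE)
}

temp <- list.files(demo_dir, pattern = "\\.csv$")
myfiles <- lapply(file.path(demo_dir, temp), read.csv)

# Assign all names at once; works for any number of files.
names(myfiles) <- temp

# Optional variant: drop the ".csv" extension from the names.
short <- setNames(myfiles, tools::file_path_sans_ext(temp))
```

After this, myfiles[["a.csv"]] or short$a retrieves a file's data by name. Calling sapply(temp, read.csv, simplify = FALSE) instead of lapply() would also name the elements automatically when the files sit in the working directory.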

Convert columns into rows when importing .csv

I'm looking for an efficient way to use read.csv (or an alternative) when reading a .csv file that has hundreds of thousands of columns and virtually a single row.
The file with this structure is extracted from MATLAB, which seems to prefer adding columns rather than rows. When I open the file in Excel it does not load completely, so I cannot simply transpose it there.
The following works in R, although slowly, but I'm wondering if there is a better way?
library(data.table)
dfr <- as.data.frame(t(fread('filename.csv')))
If there is only a single row, we can read it with scan and convert the result to a data.frame (skip = 1 skips the header line, so drop it if the file has no header):
data.frame(Col = scan('filename.csv', skip = 1, what = numeric(), sep = ','))
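If the column names are worth keeping, the header line can be read with a second scan call. This is only a sketch on a tiny made-up file. It assumes one header line followed by one numeric row, and the name and value column names are chosen here for illustration.

```r
# Tiny stand-in for the wide file: one header line, one data row.
path <- tempfile(fileext = ".csv")
writeLines(c("v1,v2,v3,v4", "0.5,1.5,2.5,3.5"), path)

# Read only the first line as text to get the column names.
header <- scan(path, what = character(), sep = ",", nlines = 1, quiet = TRUE)
# Skip the header and read the single data row as numbers.
values <- scan(path, what = numeric(), sep = ",", skip = 1, quiet = TRUE)

# One row per original column.
long <- data.frame(name = header, value = values)
```

Because scan reads straight into a vector, no wide data frame is ever built and no transpose is needed.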

How do I merge the headers from one csv file with another csv file in R?

What I'm trying to ask is: how would I use the headers from one CSV file as the headers for another CSV file? It would be a bit like a merge, except that the first CSV file contains just headers and the second contains just data.
Something as simple as this will work:
dn <- read.csv("d-names.txt")                  # file that holds only the header line
dd <- read.csv("d-data.txt", header = FALSE)   # file that holds only the data
names(dd) <- names(dn)                         # copy the column names across
This assigns the names from one data.frame to the other. Make sure both files have exactly the same number of columns.
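A variant of the same idea, offered as a sketch: reading the header line with scan keeps the names exactly as written (read.csv rewrites names that are not valid R identifiers unless check.names = FALSE is set), and an explicit check catches a column-count mismatch before the names are assigned. The temporary files and their contents are made up for the example.

```r
# Stand-ins for the two files: one with only headers, one with only data.
names_path <- tempfile(fileext = ".txt")
data_path  <- tempfile(fileext = ".txt")
writeLines("id,name,score", names_path)
writeLines(c("1,ann,9.5", "2,bob,7.0"), data_path)

# Read the header line as a plain character vector.
header <- scan(names_path, what = character(), sep = ",", nlines = 1, quiet = TRUE)
dd <- read.csv(data_path, header = FALSE)

# Fail early if the two files disagree on the number of columns.
stopifnot(length(header) == ncol(dd))
names(dd) <- header
```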
