import multiple csv from web into one data frame - r

I want to read several CSV files from the web and combine the data into one data frame.
If the files were on my computer this would be very easy, as I have seen, but I don't always want to download the files.
The example:
"https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv"
These are the CSV files. Maybe it's possible with a function or a loop, but I don't know how.
Maybe you can help me.
Greetings

Reading files from the web is just as easy as reading them from your file system; you can just pass a URL instead of a file-path to readr::read_csv() (you tagged your question with readr so I assume you want to use that).
Assuming your files are in a vector:
files <- c("https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv")
You can use readr::read_csv to read a specific file, and combine them into one data-frame with purrr::map_dfr:
df <- purrr::map_dfr(files, readr::read_csv)
This iterates over the contents of files, applies readr::read_csv to each of those elements, and combines them into one data frame, rowwise (hence dfr).
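If you would rather avoid the extra packages, or if the files turn out to have slightly different columns, something along these lines may work in base R. This is only a sketch: read_and_stack is a hypothetical helper name, and the demo uses two small temporary files with made-up data in place of the URLs (read.csv also accepts a URL), so it runs without network access.

```r
# Hypothetical helper: read several CSV files (local paths or URLs) and stack
# them row-wise, filling columns that a file lacks with NA.
read_and_stack <- function(paths) {
  frames <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  all_cols <- unique(unlist(lapply(frames, names)))
  frames <- lapply(frames, function(d) {
    # Add any column this file is missing, then put columns in a common order
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  do.call(rbind, frames)
}

# Demo with temporary files standing in for the URLs (made-up data)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(HomeTeam = c("A", "B"), FTHG = c(1, 2)), f1, row.names = FALSE)
write.csv(data.frame(HomeTeam = "C", FTHG = 0, Referee = "X"), f2, row.names = FALSE)

df <- read_and_stack(c(f1, f2))
```

With the real data you would call read_and_stack(files) on the vector of URLs instead.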

Related

When I upload my excel file to R, the column titles are in the rows and the data seems all jumbled. How do I fix this?

hi literally day one new coder
On the excel sheet, my data looks organized, but when I upload my file to R, it's not able to read the excel properly and the column headers are in the rows and the data seems randomized.
So far I have tried:
library(readxl)
dataset <-read_excel("pathname")
View(dataset)
Also tried:
dataset <-read_excel("pathname", sheet=1, colNames=TRUE)
Also tried to use the package openxlsx
but nothing is giving me the correct, organized data set.
I tried formatting my Excel to a CSV file, and the CSV file looks exactly like the data that shows up on R (both are messed up).
How should I approach this problem?
I deal with importing .xlsx into R frequently. It can be challenging due to the flexibility of the excel platform. I generally use readxl::read_xlsx() to fetch data from .xlsx files. My suggestions:
First, specify exactly the data you want to import with the range argument.
A cell range to read from, as described in cell-specification. Includes typical Excel
ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14"
Second, if there are merged cells or other formatting challenges in the column headers, I resort to setting col_names = FALSE and supplying clean names after import with names(df) <- c("first_col", "second_col").
Third, if there are merged cells elsewhere in the spreadsheet, I generally resort to "fixing" them in Excel (not ideal, but easier for my use case); others may have suggestions for a programmatic fix.
It may be helpful to provide a screenshot of your spreadsheet.
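The first two suggestions could look roughly like this. It is a sketch that assumes the readxl package is installed; it reads the example workbook that ships with readxl rather than your file, and the sheet, range and column names are chosen purely for illustration.

```r
library(readxl)

# Example workbook shipped with readxl; it stands in for your own file
path <- readxl_example("datasets.xlsx")

# Read only a fixed block of cells, starting below the header row,
# and tell readxl not to treat the first row of the range as names
df <- read_excel(path, sheet = "iris", range = "A2:C4", col_names = FALSE)

# Supply clean names after import (illustrative names)
names(df) <- c("sepal_length", "sepal_width", "petal_length")
```

For your own spreadsheet you would replace path, sheet and range with the location of the actual data block.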

How to keep style format unchanged after writing data using openxlsx in R

I am using openxlsx in order to write the outputs of my data.
I have used the following code to read my data using readxl.
df1=read_excel("C:/my_data.xlsx",skip=2);
Now I want to write the output and keep the original Excel file, using any suitable package. I have used the following code, but it does not keep the original Excel file. Can this be done with R packages?
write.xlsx(df1, 'C:/mydata.xlsx',skip=2)
Given your code, you should have two different data files in your working directory:
"my_data.xlsx" (the one that you loaded), and "mydata.xlsx" (the one that you created through R). R shouldn't overwrite your files if you give them different names.
If there's only one file, are you sure you didn't use the same name for both files? If so, then everything should work fine if you give the files different names (e.g. "my_file1.xlsx" and "my_file2.xlsx")!
Also, in general, it's a good idea to give data files an informative name so that you don't accidentally delete or overwrite files that you need. For example, if the original Excel data is your raw data, consider naming it "data_raw.xlsx", make sure that you only ever read it, and whenever you make changes, save the result under a different name (e.g. "data_processed1.xlsx").
You can also save data in R's native .rds format using the saveRDS() function (and load it again with readRDS()); this is especially helpful if you want to keep special attributes of variables, such as factor levels.
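As a small illustration of the .rds round trip, here is a sketch using base R's saveRDS() and readRDS(). The data frame and the temporary file path are made up for the example.

```r
# Made-up data with a factor whose level order is not alphabetical
raw <- data.frame(
  id = 1:3,
  group = factor(c("a", "b", "a"), levels = c("b", "a"))
)

# Save to a new file (a temporary path here) instead of overwriting anything
rds_path <- tempfile(fileext = ".rds")
saveRDS(raw, rds_path)

# Reading it back restores the factor, including its level order
restored <- readRDS(rds_path)
```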
Hope this helps!

Is there a way to read multiple excel files into R, but only up to a certain creation date? (Note: Date does not exist within the actual excel files.)

I have multiple excel files in multiple directories that I am reading into R. However, I don't want to read in EVERY excel file; I only want to read in the most recent ones (for example, only the ones created in the last month). Is there a way to do this?
Currently I am using this to read in all of the excel files, which is working just fine:
filenames <- Sys.glob(file.path('(name of dir)', "19*", "Electrode*02.xlsx"))
elecsheet <- do.call("cbind", lapply(filenames, read_excel))
Somewhere in this second line of code (I think), I need to tell R to look at the metadata and only read in the excel files that have been created since a certain date.
Thank you!
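One possible approach, sketched here under some assumptions: base R's file.info() returns file timestamps, so the list of file names can be filtered before anything is read. The sketch filters on the modification time (mtime). Whether a true creation time is available depends on the operating system; on Unix-alikes, ctime is the last status change rather than the creation time. recent_files is a hypothetical helper name, and the demo uses empty temporary files.

```r
# Hypothetical helper: keep only the paths modified at or after `since`
recent_files <- function(paths, since) {
  info <- file.info(paths)
  paths[!is.na(info$mtime) & info$mtime >= since]
}

# Demo: two empty files, one of them back-dated by 90 days
demo_dir <- tempfile("demo")
dir.create(demo_dir)
old_file <- file.path(demo_dir, "old.xlsx")
new_file <- file.path(demo_dir, "new.xlsx")
file.create(old_file, new_file)
Sys.setFileTime(old_file, Sys.time() - 90 * 24 * 3600)

# Keep files touched in the last 30 days
cutoff <- Sys.time() - 30 * 24 * 3600
keep <- recent_files(c(old_file, new_file), cutoff)

# With real workbooks, the filtered vector could then be read as before:
# elecsheet <- do.call("cbind", lapply(keep, readxl::read_excel))
```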

How to put data frame in R including count of complete cases in separate files

I'm new to R. I have a directory containing Excel files, and I need to build a data frame that summarises the number of complete cases in each file. How can I do this? I tried the following code but it doesn't work. I'd appreciate your support.
Always begin with the steps required. You will need to do the following:
Read in your data
Clean up your data
Since you do not have any code shown, I will provide you with pseudo code.
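library(readxl)
df <- read_excel(path)           # path: the path to one of your Excel files
complete <- complete.cases(df)   # logical vector, TRUE for rows without any NA
sum(complete)                    # number of complete cases in this file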
You'll want to do that for all of your files. You can use lapply once you are more advanced, and loop over your list.files() list of excel files.
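Looping over the files could be sketched as follows. summarise_complete is a hypothetical helper name. The reader defaults to read.csv so that the demo runs on small temporary CSV files with made-up data; for Excel files you would pass reader = readxl::read_excel and point list.files() at your own directory.

```r
# Hypothetical helper: one row per file with its number of complete cases
summarise_complete <- function(files, reader = read.csv) {
  counts <- vapply(
    files,
    function(f) sum(complete.cases(reader(f))),
    integer(1)
  )
  data.frame(file = basename(files), complete = unname(counts))
}

# Demo directory with two small files (made-up data)
demo_dir <- tempfile("cases")
dir.create(demo_dir)
write.csv(data.frame(x = c(1, NA, 3), y = c(1, 2, 3)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2), y = c(NA, NA)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
result <- summarise_complete(files)
```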

How to handle parsing hundreds files in R?

I want to parse last year's weather data, which is recorded in CSV files. Each CSV file contains one day of data, so I have 365 CSV files to parse. What is the best way to handle these files? As far as I know, I need to load all of them into R and bind them into one big data frame, but I don't know whether this is the best solution. What if I have more than one year of data files? Do I need to load all of them into memory, or is there another way to handle them?
Each file is about 1M to 1.5M.
The easiest way to do this is to get all your files to read using list.files, read them into a list of data frames, then rbind all the frames together:
#setwd('dirwithallmycsvs')
x <- list.files(pattern = '.+\\.csv$')
out = lapply(x, read.csv)
out2 = do.call(rbind, out)
Your output should now be one data frame. You will need to make sure the columns are the same across all your files, since rbind expects matching column names.
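If memory becomes a concern with several years of files, one option is to reduce each file to the values you need before binding, so that only the small summaries are kept. The following is a sketch under assumptions: summarise_file is a hypothetical helper, the temp column is a made-up name for a temperature reading, and the demo writes three tiny files to a temporary directory.

```r
# Hypothetical helper: read one daily file and keep only a one-row summary
summarise_file <- function(path) {
  d <- read.csv(path)
  data.frame(
    file = basename(path),
    n_rows = nrow(d),
    mean_temp = mean(d$temp, na.rm = TRUE)  # `temp` is an assumed column name
  )
}

# Demo: three small daily files with made-up readings
demo_dir <- tempfile("weather")
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(data.frame(temp = c(i, i + 2)),
            file.path(demo_dir, sprintf("day%02d.csv", i)),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
daily <- do.call(rbind, lapply(files, summarise_file))
```

Each file is read, summarised and then discarded, so the combined result has one row per day rather than every raw observation.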
