Merging two CSV files with different column lengths in R

I am trying to merge two files in R in an attempt to compute the correlation. One file is located at http://richardtwatson.com/data/SolarRadiationAthens.csv and the other at http://richardtwatson.com/data/electricityprices.csv
Currently my code looks as follows:
library(dplyr)
data1<-read.csv("C:/Users/nldru/Downloads/SolarRadiationAthens.csv")
data2 <- read.csv("C:/Users/nldru/Downloads/electricityprices.csv")
n <- merge(data1,data2)
I have the files stored locally on my computer just for ease of access. The files are read in properly, but when I merge them, the variable n receives no data, just the column headers from the CSV files. I have also tried inner_join, as well as reading the files directly from the URLs above with read_delim(), but I can't get it to work. Any help or tips are much appreciated.
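One common cause of this symptom: when no by argument is given, merge() joins on every column name the two data frames share, and it returns zero rows if the values in those shared columns never match exactly (for example, timestamps stored as text in two different formats). The sketch below reproduces that with toy data. The column names timestamp, radiation and cost and all values are assumptions for illustration, not the actual contents of the two files.

```r
# Toy stand-ins for the two files; column names and values are hypothetical.
solar <- data.frame(
  timestamp = c("2010-01-01 00:00:00", "2010-01-01 01:00:00", "2010-01-01 02:00:00"),
  radiation = c(0, 12.5, 40.2),
  stringsAsFactors = FALSE
)
prices <- data.frame(
  timestamp = c("01/01/2010 00:00", "01/01/2010 01:00",
                "01/01/2010 02:00", "01/01/2010 03:00"),
  cost = c(31.0, 29.5, 27.8, 30.2),
  stringsAsFactors = FALSE
)

# These are the columns merge() will join on when `by` is not given.
shared_columns <- intersect(names(solar), names(prices))

# The shared column holds the same moments written differently, so nothing matches.
naive <- merge(solar, prices)
nrow(naive)  # 0 rows, only the headers survive

# Parse both keys into the same type before merging.
solar$timestamp  <- as.POSIXct(solar$timestamp,  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
prices$timestamp <- as.POSIXct(prices$timestamp, format = "%d/%m/%Y %H:%M",    tz = "UTC")

merged <- merge(solar, prices, by = "timestamp")
nrow(merged)  # 3 rows: only timestamps present in both inputs are kept

# Correlation on the matched rows.
radiation_cost_cor <- cor(merged$radiation, merged$cost)
```

Checking str() on both data frames and the output of intersect(names(data1), names(data2)) is usually enough to see whether the key columns are comparable. The differing row counts are not a problem in themselves, since merge() keeps only the rows whose keys match.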

Related

Issue with writing Parquet Files via Arrow Package in R

Just wondering if there's a difference in the read/write parquet function from the arrow package in R when running in Windows vs Linux OS?
Example code (put anything in the data frame):
mydata = data.frame(...)
write_parquet(mydata, 'mydata.parquet')
read_parquet('mydata.parquet')
I've noticed that when this code is run on Windows, the Parquet files can be read without problems on either Windows or Linux, and read_parquet() returns a data frame in R. But when the files are written on Linux and I then try to read them in R on Windows, I get a grouped list instead of a data frame (each vector in the list contains the data for the respective column). I initially tried a workaround with do.call(rbind...) to convert the list back into a data frame, but the result does not contain any of the column names.
Please let me know if there is a way to resolve this. Ideally I'd like to write Parquet files and read them back into R as data frames from either OS. For reference, I'm on R 4.0 on both.
Thanks in advance.

import multiple csv from web into one data frame

I want to read several CSV files from the web and combine the data into one data frame.
If the files were on my computer this would be very easy, as I have seen, but I don't always want to download the files.
The example:
"https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv"
These are the CSV files. Maybe it's possible with a function or a loop, but I don't know how.
Maybe you can help me.
Greetings
Reading files from the web is just as easy as reading them from your file system; you can just pass a URL instead of a file-path to readr::read_csv() (you tagged your question with readr so I assume you want to use that).
Assuming your files are in a vector:
files <- c("https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv")
You can use readr::read_csv to read a specific file, and combine them into one data-frame with purrr::map_dfr:
df <- purrr::map_dfr(files, readr::read_csv)
This iterates over the contents of files, applies readr::read_csv to each element, and combines the results into one data frame, row-wise (hence the dfr suffix).
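If you also want to know which file each row came from, the same read-then-bind idea can be sketched in base R. To keep the example runnable offline it uses two small stand-in files in a temporary directory instead of the real URLs. The file names, the team and goals columns, and the source column are made up for illustration. read.csv() accepts a URL in place of a path, and do.call(rbind, ...) assumes every file has the same columns, which may not hold for real data.

```r
# Create two tiny stand-in CSV files (hypothetical data, not the football files).
demo_dir <- tempfile("seasons")
dir.create(demo_dir)
write.csv(data.frame(team = c("A", "B"), goals = c(1, 2)),
          file.path(demo_dir, "1718.csv"), row.names = FALSE)
write.csv(data.frame(team = c("C", "D"), goals = c(0, 3)),
          file.path(demo_dir, "1819.csv"), row.names = FALSE)

# In the real case this would be the character vector of URLs.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record where its rows came from.
read_with_source <- function(path) {
  season <- read.csv(path, stringsAsFactors = FALSE)
  season$source <- basename(path)
  season
}

# Read every file, then stack the resulting data frames row-wise.
combined <- do.call(rbind, lapply(files, read_with_source))
```

When the files do not share exactly the same columns, a binding function that fills missing columns with NA (such as dplyr::bind_rows) is the safer choice.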

Trying to merge columns from multiple csv's, but the merged dataframe is coming up NULL

My problem seems to be two-fold. I am using code that has worked before. I re-ran my scripts and got similar outputs, but saved to a new location. I have changed all of my setwd lines accordingly. But, there may be an error with either setwd or the do.call function.
In R, I want to merge 25 CSVs that are located in a folder, keeping only certain columns.
My path is
/Documents/CODE/merge_file/2sp
So, I do:
setwd("/Documents/CODE")
but then I get an error saying cannot change working directory (usually works fine). So then I manually set working directory in the Session in RStudio.
The next script seems to run fine:
myMergedData2 <-
do.call(rbind,
lapply(list.files(path = "/Documents/CODE/merge_file/2sp"),
read.csv))
myMergedData2 ends up in the global environment, but it says it is NULL (empty), though the console makes it look like everything is ok.
I would then like to save just these columns of information but I can't even get to this point.
myMergedData2<-myMergedData2[c(2:5),c(10:12)]
And then add this
myMergedData2<-myMergedData2 %>% mutate(richness = 2)%>% select(richness,
everything())
And then I would like to save
setwd("/Documents/CODE/merge_file/allsp")
write.csv(myMergedData2, "/Documents/CODE/merge_file/allsp/2sp.csv")
I am trying to merge these data so I can use ggplot2 to show how my response variables (columns 2-5) vary according to my independent variables (columns 10-12). I have 25 different parameter sets with 50 observations in each CSV.
OK, so the issue was that my Dropbox didn't have enough space, and I oddly don't have permission to do what I was trying on my university's H drive. Bizarre, but an easy fix: increasing the space on Dropbox allowed the CSVs to sync completely.
Sometimes the issue is minor!
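For anyone hitting the same NULL result: do.call(rbind, ...) on an empty list returns NULL without any error, so a folder in which list.files() finds nothing (for example because the files have not synced yet) produces exactly this symptom. The sketch below is one way to guard against it. The helper name merge_csv_folder and the demo files are hypothetical. It also uses full.names = TRUE so that read.csv does not depend on the working directory, and it selects columns 2-5 and 10-12 with [, c(2:5, 10:12)], since [c(2:5), c(10:12)] would select rows 2-5 of columns 10-12.

```r
# Hypothetical helper: read and row-bind every CSV in a folder, failing loudly if none exist.
merge_csv_folder <- function(folder) {
  csv_paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  if (length(csv_paths) == 0) {
    stop("No CSV files found in: ", folder)
  }
  do.call(rbind, lapply(csv_paths, read.csv))
}

# An empty folder reproduces the silent NULL from the original code.
empty_dir <- tempfile("empty")
dir.create(empty_dir)
empty_result <- do.call(rbind, lapply(list.files(empty_dir), read.csv))
is.null(empty_result)  # TRUE

# Demo folder with two small 12-column files (made-up data).
demo_dir <- tempfile("2sp")
dir.create(demo_dir)
for (i in 1:2) {
  write.csv(as.data.frame(matrix(1:24 * i, nrow = 2)),
            file.path(demo_dir, paste0("run", i, ".csv")), row.names = FALSE)
}

merged <- merge_csv_folder(demo_dir)

# Keep all rows, but only columns 2-5 and 10-12.
selected <- merged[, c(2:5, 10:12)]
```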

How to put data frame in R including count of complete cases in separate files

I'm new to R. I have a directory containing Excel files, and I need to build a data frame summarising the number of complete cases in each file. How can I do this? I tried the following code, but it doesn't work. I'd appreciate your support.
Always begin with the steps required. You will need to do the following:
Read in your data
Clean up your data
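Since you have not shown any code, here is a sketch; path is a placeholder for one of your Excel files.
library(readxl)
df <- read_excel(path) # reads both .xls and .xlsx files
n_complete <- sum(complete.cases(df)) # number of rows without any missing value
df <- df[complete.cases(df), ] # keep only the complete rows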
You'll want to do that for all of your files. You can use lapply once you are more advanced, and loop over your list.files() list of excel files.
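The looping step could look roughly like this. The function name summarise_complete_cases and its reader argument are hypothetical. The reader defaults to readxl::read_excel for Excel files, but the demo passes read.csv and uses small made-up CSV files so that it runs without any Excel files or extra packages.

```r
# Hypothetical helper: one row per file with its number of complete cases.
summarise_complete_cases <- function(paths, reader = readxl::read_excel) {
  counts <- vapply(
    paths,
    function(p) sum(complete.cases(reader(p))),  # rows with no missing values
    integer(1)
  )
  data.frame(file = basename(paths), complete_cases = unname(counts))
}

# Demo with CSV stand-ins (made-up data containing some missing values).
demo_dir <- tempfile("sheets")
dir.create(demo_dir)
write.csv(data.frame(x = c(1, NA, 3), y = c("a", "b", "c")),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2), y = c("a", NA)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
summary_df <- summarise_complete_cases(csv_paths, reader = read.csv)

# For Excel files, assuming they live in a folder called "my_excel_dir":
# excel_paths <- list.files("my_excel_dir", pattern = "\\.xlsx?$", full.names = TRUE)
# summarise_complete_cases(excel_paths)
```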

How to get R to read in files from multiple subdirectories under one large directory?

I am trying to get started with writing my first R code. I have searched for this answer but I am not quite sure what I've found is what I'm looking for exactly. I know how to get R to read in multiple files in the same subdirectory, but I'm not quite sure how to get it to read in one specific file from multiple subdirectories.
For instance, I have a main directory containing a series of trajectory replicates; each replicate is in its own subdirectory. The breakdown is as follows:
"Main Dir" -> "SubDir1" -> "ReplicateDirs 1-6"
From each "ReplicateDir" I want R to pull the "RMSD.dat" table (file) to read from. All of the RMSD.dat files have identical names, they are just in different directories and contain different data of course.
I could move all the files to one folder but this doesn't seem like the most efficient way to attack this problem.
If anyone could enlighten me, I'd appreciate it.
Thanks
This should work; of course, change "Main Dir" to your directory:
dat.files <- list.files(path = "Main Dir",
recursive = TRUE,
pattern = "^RMSD\\.dat$", # anchored regex, so only files named exactly RMSD.dat match
full.names = TRUE)
If you want to read the files into R, you could use the function below:
readDatFile <- function(f) {
read.csv(f) # You may have to change read.csv (e.g. to read.table) to match your file format
}
And apply it to the list of files. lapply keeps each table as its own data frame, whereas sapply may try to simplify the result into a matrix:
data.files <- lapply(dat.files, readDatFile)
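If you would rather have everything in a single table, one option is to label each file's rows with the name of the directory it came from and then stack them. The sketch below builds a throwaway directory tree to stay self-contained. The directory names, the frame and rmsd columns, and the whitespace-separated file layout are assumptions, so adjust the reader to your actual RMSD.dat format.

```r
# Build a small stand-in directory tree: MainDir/SubDir1/Rep1..Rep2/RMSD.dat
main_dir <- tempfile("MainDir")
for (rep_name in c("Rep1", "Rep2")) {
  rep_dir <- file.path(main_dir, "SubDir1", rep_name)
  dir.create(rep_dir, recursive = TRUE)
  write.table(data.frame(frame = 1:3, rmsd = c(0.1, 0.2, 0.3)),
              file.path(rep_dir, "RMSD.dat"), row.names = FALSE)
}

# Find every RMSD.dat below the main directory.
rmsd_paths <- list.files(main_dir, pattern = "^RMSD\\.dat$",
                         recursive = TRUE, full.names = TRUE)

# Read one file and tag its rows with the replicate directory name.
read_replicate <- function(path) {
  rmsd <- read.table(path, header = TRUE)
  rmsd$replicate <- basename(dirname(path))
  rmsd
}

# One long data frame with a `replicate` column identifying the source.
all_rmsd <- do.call(rbind, lapply(rmsd_paths, read_replicate))
```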
