Downloading Zipped File from URL in R

I'm trying to download all the zipped CSV files from the following page: http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12301&reportTitle=Settlement%20Point%20Prices%20at%20Resource%20Nodes,%20Hubs%20and%20Load%20Zones&showHTMLView=&mimicKey
I've started by trying to download one file as an example before I move on to downloading multiple. This site contains prices from specific locations in Texas - interesting, given recent power outages due to cold weather in Texas.
url <- "http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12301&reportTitle=Settlement%20Point%20Prices%20at%20Resource%20Nodes,%20Hubs%20and%20Load%20Zones&showHTMLView=&mimicKey/cdr.00012301.0000000000000000.20210220.141704636.SPPHLZNP6905_20210220_1415_csv.zip"
temp <- tempfile()
download.file(url, temp, mode = "wb")
data <- read.csv(unzip(temp, "cdr.00012301.0000000000000000.20210220.141704.SPPHLZNP6905_20210220_1415.csv"))
unlink(temp)
I keep receiving the following error message: "error 1 in extracting from zip file."
I'm relatively new to R, so any advice would be helpful.
Edit: If the link above doesn't work, another way to get to it is to go to http://www.ercot.com/mktinfo/rtm, open "Real-Time Price Reports", and select the last option, "Settlement Point Prices at Resource Nodes, Hubs, and Load Zones." It might look a little overwhelming, but my goal for right now is just to download and open the first zipped CSV file there (and ignore all the other files).
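A likely culprit: the URL above splices the zip filename onto the report-list query string, so the server probably returns an HTML page rather than a zip archive, which would explain "error 1 in extracting from zip file." A minimal sketch of one workaround, assuming the listing page exposes plain anchor links to the zip files (the real ERCOT page may require session handling, in which case this needs adapting):
library(rvest)
library(xml2)

page_url <- "http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12301&reportTitle=Settlement%20Point%20Prices%20at%20Resource%20Nodes,%20Hubs%20and%20Load%20Zones&showHTMLView=&mimicKey"
page <- read_html(page_url)

# Collect every link on the page, keep the ones pointing at csv zips;
# the filter pattern is an assumption, so inspect the page to confirm it
links <- html_attr(html_elements(page, "a"), "href")
links <- links[!is.na(links)]
zip_links <- links[grepl("zip", links, fixed = TRUE)]

# Resolve the first match against the site root and download it
zip_url <- url_absolute(zip_links[1], "http://mis.ercot.com")
temp <- tempfile(fileext = ".zip")
download.file(zip_url, temp, mode = "wb")

# Read the archive's actual contents rather than hard-coding the csv name
csv_name <- unzip(temp, list = TRUE)$Name[1]
data <- read.csv(unz(temp, csv_name))
unlink(temp)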

Related

Download CSV file from website that changes name every month

I'm pretty new to R and just trying to get my head round web scraping.
I want to download a CSV file into R (easy enough; I can do that). The issue I am having is that every month the CSV file name changes on the website, so the URL also changes.
So is there a way in R to tell it to download the CSV file without knowing the exact file URL?
Here is the website I am practicing on:
https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/
Thanks :)
I have tried the basic approach below; the issue is that the URL will change when the file is updated.
dat <- read.csv(
  "https://www.police.uk/contentassets/069a2c11fcb444bbbeb519f69875577e/2022/sept/999-data-nov-21---sep-22.csv",
  header = TRUE
)
View(dat)
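Since the filename changes monthly, one option is to scrape the page for whatever csv link it currently carries instead of hard-coding the URL. A rough sketch, assuming the links are ordinary anchors ending in .csv (if the page injects them with JavaScript, this will come back empty):
library(rvest)
library(xml2)

page <- read_html("https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/")

# Pull every link, drop missing ones, keep those ending in .csv
hrefs <- html_attr(html_elements(page, "a"), "href")
hrefs <- hrefs[!is.na(hrefs)]
csv_links <- url_absolute(hrefs[grepl("\\.csv$", hrefs)], "https://www.police.uk")

# Read the first match without needing to know this month's filename
dat <- read.csv(csv_links[1], header = TRUE)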

xarray.open_dataset will not open all netcdf files in one folder

I have a folder containing many netcdf files for individual station data (about 8,000 files total). When I attempt to read in all of these files using xarray.open_mfdataset, I get an error that reads "OSError: no files to open." I am using the code below:
station_data = xr.open_mfdataset('~/Data/station_data/all_stations/*.nc')
In contrast, I do not have any issue opening individual station data (one file) using xarray.open_dataset, as below:
station1 = xr.open_dataset('~/Data/station_data/all_stations/hadisd.3.1.0.2019f_19310101-20200101_010010-99999.nc')
I have been playing with other ways to express the path, with no luck. Any suggestions on how to properly read in all station data at once are appreciated!

Importing into R an Excel file saved as a web page

I would like to open an Excel file saved as a webpage using R, and I keep getting error messages.
The desired steps are:
1) Upload the file into RStudio
2) Change the format into a data frame / tibble
3) Save the file as an xls
The message I get when I open the file in Excel is that the file format (Excel webpage format) and the extension (xls) differ. I have tried the steps in this answer, but to no avail. I would be grateful for any help!
I don't expect anybody will be able to give you a definitive answer without a link to the actual file. The complication is that many services will write files as .xls or .xlsx without them being valid Excel files. This is done because Excel is so common and some non-technical people feel more confident working with Excel files than a CSV file. Now, the files will have been stored in a format that Excel can deal with (hence your warning message), but R's libraries are stricter: they don't see the file type they were expecting, so they fail.
That said, the steps below worked for me when I last encountered this problem. A service was outputting .xls files which were actually just HTML tables saved with an .xls file extension.
1) Download the file to work with it locally. You can script this of course, e.g. with download.file(), but this step helps eliminate other errors involved in working directly with a webpage or connection.
2) Load the full file with readHTMLTable() from the XML package:
library(XML)
dTemp <- readHTMLTable("localcopy.xls", stringsAsFactors = FALSE)  # "localcopy.xls" stands in for the file you downloaded in step 1
This will return a list of dataframes. Your result set will quite likely be the second element or later (see ?readHTMLTable for an example with explanation). You will probably need to experiment here and explore the list structure as it may have nested lists.
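A quick way to do that exploring before committing to an element:
str(dTemp, max.level = 1)  # one line per top-level list element, contents truncated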
3) Extract the relevant list element, e.g.
df <- dTemp[[2]]  # double brackets extract the data frame itself rather than a one-element list
You also mention writing out the final data frame as an xls file which suggests you want the old-style format. I would suggest the package WriteXLS for this purpose.
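A minimal sketch of that write step, assuming WriteXLS is installed (it requires a Perl interpreter) and that "output.xls" is a filename of your choosing:
library(WriteXLS)
WriteXLS(df, ExcelFileName = "output.xls")  # writes the legacy binary .xls format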
I seriously doubt the Excel file is 'saved as a web page'. I'm pretty sure the file just sits on a server and all you have to do is go fetch it. Some kinds of files (in particular Excel and h5) are binary rather than text files, which needs an added setting to warn R that the file is binary and should be handled appropriately.
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download.file(url=myurl, destfile="localcopy.xlsx", mode="wb")
Or, to use the downloader package, try something like this:
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download(myurl, destfile = "localcopy.xlsx", mode = "wb")

Downloading NetCDF files with R

I am a beginner in R.
I am trying to download NetCDF4 files from the NASA subset wizard (https://disc.gsfc.nasa.gov/SSW/#keywords=TRMM_3B42RT_Daily); in my case, I am looking for TRMM 3B42 precipitation data for South Africa. I will need to download thousands of datasets and work with them in raster format in R, so I want to use the URLs provided by the subset wizard, e.g.:
http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_RT/TRMM_3B42RT_Daily.7/2016/10/3B42RT_Daily.20161001.7.nc4.nc4?precipitation[777:867][99:173],precipitation_cnt[777:867][99:173],uncal_precipitation_cnt[777:867][99:173],lat[99:173],lon[777:867]
I have tried
url1 <- "http://.."  # full subset-wizard URL goes here
dest <- "C:\\Users\\User\\Documents\\3_UP\\2016_Masters\\Dissertation\\Satellite Data\\TRMM Precipitation\\TRMM 3B42 Daily RT\\Try.nc4.nc4"
download.file(url = url1, destfile = dest, mode = "wb")
And here I receive the error message "cannot open URL" plus additional warning messages ending with "HTTP status was '401 Unauthorized'".
This led me to suspect that the browser needed some login details. In Chrome, the URL works as is. In Internet Explorer (R's default), I typed in my username and password once, and after that the URL also works as is.
However, I have had no success with R functions; the errors remain. I have tried other R packages and functions, but this is my first time attempting something like this and I am not seeing the light yet; e.g. most RCurl functions report illegal characters in the URL.
Does anyone have more experience in working with these specific files?
best regards
Marion
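On the 401 specifically: NASA's GES DISC servers sit behind Earthdata login, which the browser remembers but a bare download.file() call does not supply. A rough sketch with httr, where "USERNAME" and "PASSWORD" are placeholders for Earthdata credentials (the Earthdata redirect flow may need extra cookie handling beyond this):
library(httr)

# Request the subset-wizard URL with HTTP authentication, streaming to disk
resp <- GET(url1,
            authenticate("USERNAME", "PASSWORD"),
            write_disk(dest, overwrite = TRUE))
stop_for_status(resp)  # fail loudly on anything other than a 2xx status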

Read a zipped .csv file in R

I have been trying hard to solve this, but I cannot get my head around how to read zipped .csv files in R. I could first unzip the files and then read them, but since the amount of unzipped data is around 22GB, I guess it is more practical to handle zipped files.
I basically have many .csv files, which I ZIPPED ONE BY ONE into single .7z files. Every file is named like: file1.csv, file2.csv, etc., which zipped became respectively: file1.csv.7z, file2.csv.7z, etc.
If I use the following command:
data <- read.table(unz("substn-20100101.csv.7z", "substn-20100101.csv"), nrows=10, header=T, quote="\"", sep=",")
I get the message:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") : cannot open zip file 'substn-20100101.7z'
Any help would be much appreciated, thank you in advance.
First of all, if your problem is RAM, using compressed files won't solve it: after read.table, for example, the whole file will be loaded into memory anyway, and you said the unzipped data runs to about 22G. If you are using these files for some kind of modelling, I advise you to look at the ff and bigmemory packages.
Another solution is to use Revolution R, which has an academic licence you can use for free. Revolution R provides big-data capabilities, and with packages like RevoScaleR you can manage these files easily.
Yet another solution is Postgres + MADlib + PivotalR: after ingesting the data into Postgres, use the PivotalR package to access it and build models with the MADlib library, directly from the R console.
BUT, if you are planning something that can be done on chunks of data, a summary for example, you can use the iterators package. I will provide a use case to show how this can be done. Get the Airlines data for 1988 and follow this code:
> install.packages('iterators')
> library(iterators)
> con <- bzfile('1988.csv.bz2', 'r')
OK, now you have a connection to your file. Let's create an iterator:
> it <- ireadLines(con, n=1) ## read just one line from the connection (n=1)
Just to test:
> nextElem(it)
and you will see something like:
1 "1988,1,9,6,1348,1331,1458,1435,PI,942,NA,70,64,NA,23,17,SYR,BWI,273,NA,NA,0,NA,0,NA,NA,NA,NA,NA"
> nextElem(it)
and you will see the next line, and so on. Be aware that you are reading one line at a time, so you are not loading the whole file into RAM.
If you want to read line by line till the end of the file you can use
> tryCatch(expr=nextElem(it), error=function(e) return(FALSE))
for example. When the file ends, it returns a logical FALSE.
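Putting those pieces together, a sketch of a whole-file pass that never holds more than one line in RAM; treating field 15 as ArrDelay assumes the standard layout of the 1988 airline data:
library(iterators)

con <- bzfile('1988.csv.bz2', 'r')
invisible(readLines(con, n = 1))  # skip the header row, if the file has one
it <- ireadLines(con, n = 1)

total <- 0; n <- 0
repeat {
  line <- tryCatch(nextElem(it), error = function(e) FALSE)
  if (identical(line, FALSE)) break  # iterator exhausted: end of file
  fields <- strsplit(line, ',', fixed = TRUE)[[1]]
  delay <- suppressWarnings(as.numeric(fields[15]))  # assumed ArrDelay column
  if (!is.na(delay)) { total <- total + delay; n <- n + 1 }
}
close(con)
total / n  # mean arrival delay across all flights in the file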
If I understand the question correctly, at least on Windows you could use the 7-Zip command line.
For the sake of simplicity, put 7za.exe in your R working directory (along with your 7zip files) and create a .bat file with the following text in it:
"7za e *.7z -y"
...then in R you run the following code:
my_batch <- "your_bat_file_name.bat"
shell.exec(shQuote(my_batch, type = "cmd"))
Then you just read.table()...
It works for me.
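An alternative sketch that skips the .bat file, assuming 7za.exe is on the PATH or in the working directory (filenames borrowed from the question):
# Extract the archive in place, answering yes to any prompt (-y)
system2("7za", args = c("e", "substn-20100101.csv.7z", "-y"))
data <- read.table("substn-20100101.csv", nrows = 10, header = TRUE, quote = "\"", sep = ",")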
According to the readr package documentation, readr::read_csv and friends will automatically decompress files ending in .gz, .bz2, .xz, or .zip. Although .7z is not mentioned, one solution is to switch to one of those compression formats and then use readr (which also offers a number of other benefits). If your data is compressed with zip, your code would be:
library(readr)
data <- read_csv("substn-20100101.csv.zip", n_max=10)
