The UK Charity Commission has a site from which you can download charity data for a specified category, e.g. dementia:
http://beta.charitycommission.gov.uk/charity-search/?q=dementia
When hovering over the Export button, the link URL shown at the bottom of the browser includes the search term and the number of charities.
The actual file downloaded is named
"charitydetails_2017_06_14_23_57_17.csv", so no mention of category or count, just a date-stamp.
I have tried
library(readr)
df <- read_csv("http://beta.charitycommission.gov.uk/charity-search/?q=dementia&exportCSV=317.csv")
but just get a 404 error.
Is there any way in R that I can automate this, so that entering a different term (e.g. blind) in, say, a Shiny app would download the correct dataset into R for processing?
TIA
You used an incorrect URL. The correct one is:
http://beta.charitycommission.gov.uk/charity-search/?q=dementia&exportCSV=1&p=317
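The corrected URL can be parameterised so a different search term (e.g. from a Shiny input) builds the right export link. A minimal sketch, assuming the site still accepts the q, exportCSV, and p query parameters shown in the working URL above; the helper name build_charity_url is my own, not part of any package:

```r
# Build the export URL for a given search term and result count.
# Parameter names (q, exportCSV, p) are taken from the working URL above;
# the helper itself is a hypothetical convenience function.
build_charity_url <- function(term, n) {
  sprintf(
    "http://beta.charitycommission.gov.uk/charity-search/?q=%s&exportCSV=1&p=%d",
    utils::URLencode(term), n
  )
}

# Example (requires internet access):
# library(readr)
# df <- read_csv(build_charity_url("dementia", 317))
```

In a Shiny app, input$term could be fed straight into this helper before calling read_csv().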
Related
I'm trying to download all the zipped CSV files from the following page: http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12301&reportTitle=Settlement%20Point%20Prices%20at%20Resource%20Nodes,%20Hubs%20and%20Load%20Zones&showHTMLView=&mimicKey
I've started by trying to download one file as an example before moving on to downloading multiple. This site contains prices from specific locations in Texas - interesting, given the recent power outages due to cold weather there.
url <- "http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12301&reportTitle=Settlement%20Point%20Prices%20at%20Resource%20Nodes,%20Hubs%20and%20Load%20Zones&showHTMLView=&mimicKey/cdr.00012301.0000000000000000.20210220.141704636.SPPHLZNP6905_20210220_1415_csv.zip"
temp <- tempfile()
download.file(url,temp, mode = "wb")
data <- read.csv(unzip(temp, "cdr.00012301.0000000000000000.20210220.141704.SPPHLZNP6905_20210220_1415.csv"))
unlink(temp)
I keep receiving the following error message: "error 1 in extracting from zip file."
I'm relatively new to R, so any advice would be helpful.
Edit: If the link above doesn't work, you can also get there via http://www.ercot.com/mktinfo/rtm: go to "Real-Time Price Reports" and select the last option, "Settlement Point Prices at Resource Nodes, Hubs, and Load Zones." It might look a little overwhelming, but my goal right now is just to download and open the first zipped CSV file there (and ignore all the other files).
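One likely cause of "error 1 in extracting from zip file" is that the name passed to unzip() does not exactly match a member of the archive (note the URL ends in ...141704636...SPPHLZNP6905_20210220_1415_csv.zip while the unzip() call asks for ...141704....csv). A safer pattern is to list the archive's contents and pick the CSV from that listing rather than hard-coding the name. A sketch; the helper names pick_csv and first_csv are my own:

```r
# Pick the first CSV entry from a vector of archive member names,
# instead of hard-coding a (possibly mistyped) file name.
pick_csv <- function(entries) {
  grep("\\.csv$", entries, value = TRUE, ignore.case = TRUE)[1]
}

# List a zip's contents without extracting anything, then choose the CSV.
first_csv <- function(zipfile) {
  pick_csv(unzip(zipfile, list = TRUE)$Name)
}

# Example (requires internet access; url as in the question):
# temp <- tempfile(fileext = ".zip")
# download.file(url, temp, mode = "wb")
# data <- read.csv(unz(temp, first_csv(temp)))
# unlink(temp)
```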
I am a beginner in R.
I am trying to download NetCDF4 files from the NASA subset wizard (https://disc.gsfc.nasa.gov/SSW/#keywords=TRMM_3B42RT_Daily) - in my case, I am looking for TRMM 3B42 precipitation data for South Africa. I will need to download thousands of datasets and work with them in raster format in R, and therefore want to use the URLs provided by the subset wizard, e.g.:
http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_RT/TRMM_3B42RT_Daily.7/2016/10/3B42RT_Daily.20161001.7.nc4.nc4?precipitation[777:867][99:173],precipitation_cnt[777:867][99:173],uncal_precipitation_cnt[777:867][99:173],lat[99:173],lon[777:867]
I have tried
url1 <- "http://.."
dest <- "C:\\Users\\User\\Documents\\3_UP\\2016_Masters\\Dissertation\\Satellite Data\\TRMM Precipitation\\TRMM 3B42 Daily RT\\Try.nc4.nc4"
download.file(url = url1, destfile = dest, mode = "wb")
Here I receive the error message "cannot open URL" and additional warning messages ending with "HTTP status was '401 Unauthorized'".
This led me to suspect that the browser needed some login details. In Chrome, the URL works as is. In Internet Explorer (R's default) I typed in my username and password once, and after that the URL also works as is.
However, I have had no success with R functions; the errors remain. I have tried other R packages and functions, but this is my first time trying something like this and I am not seeing the light yet - e.g. most RCurl functions report illegal characters found in the URL.
Does anyone have more experience working with these specific files?
best regards
Marion
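Two separate issues seem to be in play here. The 401 means the server wants NASA Earthdata credentials, which the browser remembers but download.file() does not send; and RCurl's "illegal characters" complaint is most likely about the unencoded square brackets in the OPeNDAP subset URL. A sketch using httr, assuming you have an Earthdata username and password; the helper names are mine, and depending on the server's login flow you may instead need a ~/.netrc file for urs.earthdata.nasa.gov:

```r
# Percent-encode the square brackets that OPeNDAP subset URLs contain;
# some R clients reject them as illegal URL characters.
encode_brackets <- function(url) {
  gsub("\\]", "%5D", gsub("\\[", "%5B", url))
}

# Download one subset file with HTTP basic authentication.
# (Hypothetical helper: Earthdata logins can also involve a redirect to
# urs.earthdata.nasa.gov, in which case a ~/.netrc file is the usual fix.)
get_trmm <- function(url, dest, user, pass) {
  httr::GET(encode_brackets(url),
            httr::authenticate(user, pass),
            httr::write_disk(dest, overwrite = TRUE))
}

# Example (requires internet access and an Earthdata account):
# get_trmm(url1, dest, "myuser", "mypass")
```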
I am working with the EDGAR package in R to download Apple's 2005 Annual Report. This is my code to get that far:
library(edgar)
getMasterIndex(2005)
aapl<-getFilings(2005,320193, '10-K')
This was my output when I did that:
> aapl<-getFilings(2005,320193, '10-K')
Total number of filings to be downloaded=1. Do you want to download (yes/no)? yes
> aapl
Link Status
1 https://www.sec.gov/Archives/edgar/data/320193/0001104659-05-058421.txt
Download success
To me this looks like I just retrieved the URL to this particular document; I did not actually download the text file.
My next step, I imagine, would be to download the file based on the URL. I thought download.file() with aapl$Link as the URL argument would work, but I must be missing something.
Thoughts on how to download the full doc based on the URL? Thank you
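getFilings() returns a status data frame, and (at least in the versions I have seen) it also saves the filing text to a local "Edgar filings" folder in the working directory, so the file may already be on disk. If you do want to fetch it yourself from the returned URL, a sketch; the destination naming via filing_dest is my own choice, and note that the SEC may require a declared User-Agent for automated requests:

```r
# Derive a local file name from the filing URL; basename() keeps just
# the final path component, e.g. "0001104659-05-058421.txt".
filing_dest <- function(link) basename(link)

# Example (requires internet access):
# link <- aapl$Link[1]
# download.file(link, destfile = filing_dest(link), mode = "wb")
# doc <- readLines(filing_dest(link))
```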
My question:
Can I change a setting in R/RStudio so that the source editor can also open data sets larger than 5 MB?
If not, what is your advice?
Background:
I recently stopped looking at data in Excel and switched to R entirely. As I did in Excel and still prefer to do in R, I like to look at the entire frame and then decide on filters.
Problem: Working with the World Development Indicators (WDI) data set, which is over 100 MB, opening it in the source editor does not work; View(df) just opens an empty tab in RStudio.
R threw another error when I selected the data set from the Files Tab in column on the right of RStudio which read:
The selected file 'wdi.csv' is too large to open in the source editor (the file is 104.5 MB and the maximum file size is 5MB).
Solutions?
My alter ego would tell me to increase the source editor's file-size threshold so I could investigate the data there - in brief: change 5 MB to 200 MB. My alter ego would also tell me that I would probably run into performance issues (since I am using a MacBook Air).
How I resolved the issue:
I used head() and dplyr's glimpse() to get a better idea, but ended up looking at the WDI matrix in Excel and then filtering it in R. The newly created, smaller data frames could be opened in the source editor without any problems.
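An in-between option, instead of raising the editor's 5 MB limit, is to View() only a slice of the frame: RStudio's data viewer handles a data frame already in memory far better than the source editor handles a 100 MB file on disk. A small sketch; peek is my own name for the helper:

```r
# Show only the first n rows in RStudio's viewer instead of opening
# the whole 100+ MB frame; filters can then be decided from the sample.
peek <- function(df, n = 1000) utils::head(df, n)

# Example (interactive use in RStudio):
# View(peek(wdi))
```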
Thanks in advance!
I am trying to get specific disease-related information from the GWAS catalog. This can be done directly from the website via a spreadsheet download. But I was wondering if I could possibly do it programmatically in R. Any suggestions will be greatly appreciated.
Thanks.
Avoks
Check out the function download.file() and the package RCurl (http://cran.r-project.org/web/packages/RCurl/index.html) - these should do what you are looking for.
You will have to download the .tsv file(s) first and manually edit them.
This is because GWAS Catalog files contain HTML entities, like &#231; in "Behçet's disease" (encoding that special fourth letter). The # in these entities is treated by R as the start of a comment, so the rest of the line is dropped and you get an error message like:
line 2028 did not have 34 elements
So you download the file first, open it in a plain-text editor, automatically replace every # with an empty string, and only then load it into R with:
read.table("gwas_catalog_v1.0-associations_e91_r2018-02-21.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE, quote = "")
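The manual editing step can be avoided: the "#" problem comes from read.table()'s default comment.char = "#", which truncates any line containing an entity like &#231;. Setting comment.char = "" makes read.table() keep those characters, so the raw .tsv loads directly. A sketch; read_gwas is my own wrapper name:

```r
# Read a GWAS Catalog associations .tsv without pre-editing the file:
# comment.char = "" stops "#" (from HTML entities such as &#231;)
# being treated as a comment that truncates the line.
read_gwas <- function(path) {
  read.table(path, sep = "\t", header = TRUE,
             stringsAsFactors = FALSE, quote = "", comment.char = "")
}
```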