Download and import file from web directly into R environment - r

I was trying to import a hotel reviews dataset into R from a website. How can we do this in one line of code, without manually downloading the file and then importing it with a read.csv-type function?
https://data.world/datafiniti/hotel-reviews/workspace/file?filename=Datafiniti_Hotel_Reviews.csv
Clicking on the above link doesn't directly prompt a download. I tried using the url() function within read.csv().
Thanks for your help.
Rahman

You need to use the "Share URL" option in data.world to get the proper link:
url2 <- "https://query.data.world/s/hvrhbuqej6z2wdlmga4vtpsxx32ig4"
download.file(url2, destfile = "./Data.csv", cacheOK = TRUE)
Data <- read.csv("./Data.csv", header = TRUE, stringsAsFactors = FALSE)

Similar to the answer above: if you click the Download button in data.world and then click Share URL, you can copy a one-liner for R that loads the table directly from the signed URL.
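A minimal sketch of what such a one-liner can look like, assuming the signed URL serves plain CSV. The snippet data.world actually generates may differ, and read_remote_csv is a hypothetical helper name used only here:

```r
# Signed URL copied from data.world's "Share URL" dialog (taken from the answer above).
url2 <- "https://query.data.world/s/hvrhbuqej6z2wdlmga4vtpsxx32ig4"

# read.csv() accepts a URL in place of a file path, so no intermediate file is needed.
# read_remote_csv is a hypothetical wrapper, not part of any package.
read_remote_csv <- function(url) {
  read.csv(url, header = TRUE, stringsAsFactors = FALSE)
}

# Usage (needs network access and a signed URL that is still valid):
# Data <- read_remote_csv(url2)
```

Skipping download.file() avoids leaving a copy of the file on disk, at the cost of fetching the data again on every run.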

Related

Download CSV file from website that changes name every month

I'm pretty new to R and just trying to get my head round web scraping.
I want to download a CSV file into R (easy enough, I can do that). The issue I am having is that the CSV file name changes on the website every month, so the URL changes too.
Is there a way in R to download the CSV file without knowing the exact file URL?
Here is the website I am practicing on:
https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/
Thanks :)
I tried the basic approach below; the issue is that the URL will change when the file is updated.
dat <- read.csv(
  "https://www.police.uk/contentassets/069a2c11fcb444bbbeb519f69875577e/2022/sept/999-data-nov-21---sep-22.csv",
  header = TRUE)
View(dat)
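One possible approach, sketched here under the assumption that the CSV link appears in the page's static HTML as an ordinary <a href="....csv"> element (if the link is injected by JavaScript this will find nothing). The helper names find_csv_links and read_linked_csv are hypothetical:

```r
library(rvest)

# Collect absolute URLs of all links on a parsed page whose href ends in ".csv".
# find_csv_links is a hypothetical helper name for this sketch.
find_csv_links <- function(page, base_url) {
  hrefs <- html_attr(html_elements(page, "a"), "href")
  hrefs <- hrefs[!is.na(hrefs) & grepl("\\.csv$", hrefs, ignore.case = TRUE)]
  if (length(hrefs) == 0) return(character(0))
  # Resolve relative links against the address of the page they came from.
  xml2::url_absolute(hrefs, base_url)
}

# Read the first CSV linked from the page; stops with an error if none is found.
read_linked_csv <- function(page_url) {
  links <- find_csv_links(read_html(page_url), page_url)
  if (length(links) == 0) stop("no .csv link found on ", page_url)
  read.csv(links[1], header = TRUE)
}

# Usage (needs network access):
# dat <- read_linked_csv("https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/")
```

Because the link is looked up each time, the script keeps working when the file name changes, as long as the page address itself stays the same.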

Is there a way I can specify and get data from a web site URL on to a xls file for analysis using R?

Right now I have to import a file into RStudio using a URL. I tried to import it the way I do for a CSV file (read.csv...), but it doesn't work. Can someone please tell me how I can do it?

Using R to download a file without a download link

I am trying to download a CSV file programmatically from a web page.
Here is the URL.
I tried to use download.file() in R, but there is no link for the csv file.
Can I use R to click the 'CSV' button on top of the webpage to trigger the download process?
You could also read the table on the page directly instead of trying to download the CSV:
library(rvest)
url <- "https://e-service.cwb.gov.tw/HistoryDataQuery/MonthDataController.do?command=viewMain&station=466920&stname=%25E8%2587%25BA%25E5%258C%2597&datepicker=2017-02"
# read_html() is used here because html_session() is deprecated in rvest 1.0 and later
page <- read_html(url)
# html_table() on a whole page returns a list with one data frame per <table>
data <- html_table(page)
head(data[[2]])

Loading CSV File in Google Colaboratory

Hi, I am using a Jupyter notebook in Google Colaboratory.
I am writing in R.
How can I load a CSV file with R code?
Regards
I am not familiar with R, but the instructions should be relatively similar to Python.
This article is very useful: https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92
If none of the methods in the article suit your needs, here is another method that involves mounting your Google Drive in Colaboratory:
Go to Google Colab and type:
from google.colab import drive
drive.mount('/content/gdrive')
Run the code block (Ctrl + Enter), click on the link, sign in to your Google account, and copy the authorization code and paste it into the output of the code block in Google Colab.
The files in your Google Drive should be under the 'Files' tab when you open the left toolbar (by clicking the small arrow on the left side of the screen).
When you want to load the CSV in your code, enter this line of code (where loadCSV is the variable name, and the part after the = sign is the directory of the file):
loadCSV = "gdrive/My Drive/dataset.csv"
I upload the spreadsheet to Google Drive, publish it to the web, and use the link:
url <- "https://docs.google.com/spreadsheets/AAAAAAAAAAAAA"
library(curl)
download.file(url, destfile = "./Data.csv", cacheOK = TRUE)
Data1 <- read.csv("./Data.csv", header = TRUE, stringsAsFactors = FALSE)
Here "https://docs.google.com/spreadsheets/AAAAAAAAAAAAA" stands for the web link that Google Drive generates for the spreadsheet when you publish just one sheet as CSV.
this should work:
x = read.csv(filepath)

Downloading MCX data using R

I'm trying to download data on margin requirements from the MCX website using R.
However, I am unable to work out the appropriate URL to use to download this data.
The link is here
Files for different dates seem to have different URLs,
for instance:
DailyMargin_20170919223427.csv
DailyMargin_20170919223104.csv
DailyMargin_20170919223039.csv
They seem to be of the form
DailyMargin_2017091922****.csv
(20170919 is the date on which I'm trying to download the data)
My code has the line:
myURL = paste("https://www.mcxindia.com/market-operations/clearing-settlement/daily-margin", "DailyMargin_2017091922","****", ".csv", sep = "")
The **** part seems to be random.
From what I can tell, the remaining **** appears to be a timestamp set when the webpage creates the data using JavaScript. You will probably not be able to download the data directly, as the file does not exist until it is created. That said, you might be able to use a package like rvest to do the scraping for you.
https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
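If scraping does yield a list of file names or link targets, the ones for a given date could be picked out with a pattern instead of guessing the random part. This is only a sketch: it assumes the suffix after the date is always six digits, as in the three examples in the question, and margin_file_pattern and pick_margin_files are hypothetical helper names:

```r
# Build a regular expression for the file names seen on a given date.
# Assumes the part after the date is always six digits (as in the examples above).
margin_file_pattern <- function(date) {
  paste0("^DailyMargin_", format(as.Date(date), "%Y%m%d"), "[0-9]{6}\\.csv$")
}

# Keep only the candidates (e.g. scraped link targets) that match that date.
# basename() strips any leading path or URL so only the file name is tested.
pick_margin_files <- function(file_names, date) {
  file_names[grepl(margin_file_pattern(date), basename(file_names))]
}

# Usage with the file names quoted in the question:
# pick_margin_files(c("DailyMargin_20170919223427.csv", "DailyMargin_20170919223104.csv"), "2017-09-19")
```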
