How to import an Excel file from the browser in R

I want to use the GET() function from the httr package, because this is just an example file and in the original file I need to supply a user name and password, i.e.:
library(httr)
filename <- "filename_in_url.xls"
URL <- "originalurl"
GET(URL, authenticate("usr", "pwd"), write_disk(paste0("C:/Temp/temp/", filename), overwrite = TRUE))
As a test, I tried to import one of the files from https://www.nordpoolgroup.com/historical-market-data/ without saving it to disk, reading it straight into the environment so I can inspect the data. However, this also does not work.
library(XML)
library(RCurl)
excel <- readHTMLTable(htmlTreeParse(getURL("https://www.nordpoolgroup.com/4a4c6b/globalassets/marketdata-excel-files/elspot-prices_2021_hourly_eur.xls"), useInternalNodes = TRUE))[[1]]
Alternatively, if there are other ways to import the data (functions that accept login information as input), it would be great to see them.
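One sketch of an approach, combining the httr call above with readxl (the URL and credentials below are placeholders): download to a temporary file, read it, and let the temporary file be deleted, so nothing permanent lands on disk.

```r
library(httr)
library(readxl)

# Fetch a password-protected Excel file and return it as a data frame.
# url/user/pwd are placeholders -- substitute your own.
read_remote_xls <- function(url, user, pwd) {
  tmp <- tempfile(fileext = ".xls")
  on.exit(unlink(tmp))
  resp <- GET(url, authenticate(user, pwd), write_disk(tmp, overwrite = TRUE))
  stop_for_status(resp)   # fail early on HTTP errors (401, 404, ...)
  read_excel(tmp)         # returns a tibble in the environment
}

# prices <- read_remote_xls("https://example.com/file.xls", "usr", "pwd")
```

Note that some servers return an HTML table under a .xls file name; if read_excel() rejects the downloaded file, that may be why, and an HTML-table reader is needed instead.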

Related

Web scrape Excel files for different dates

I'm a newbie to Beautiful Soup. Can anyone suggest how to scrape the Excel files for the past 14 days? My understanding is that I need to loop over the dates and save each file. Thanks.
https://www.hkexnews.hk/reports/sharerepur/sbn.asp
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.hkexnews.hk/reports/sharerepur/sbn.asp")
soup=BeautifulSoup(res.text,"lxml")
Now we find the table using the find method, use find_all to get all td tags, and append each file URL to the list lst.
main_data = soup.find("table").find_all("td")
lst = []
for data in main_data:
    try:
        url = data.find("a").get('href')[1:]
        main_url = "https://www.hkexnews.hk/reports/sharerepur" + url
        lst.append(main_url)
    except AttributeError:
        pass
Now iterate through lst, request each URL, and save the response to an Excel file.
for i, link in enumerate(lst):
    resp = requests.get(link)
    with open(f'test_{i}.xls', 'wb') as output:
        output.write(resp.content)
    print(i)
(Screenshot: the files being created locally.)
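The loop above downloads whatever the index page currently lists; the question asks specifically for the past 14 days. A minimal sketch of generating those date strings, which could be used to filter or construct the links (the exact date format the site uses is an assumption here):

```python
from datetime import date, timedelta

def last_n_days(n=14, fmt="%Y%m%d"):
    """Return date strings for today and the previous n-1 days, newest first."""
    today = date.today()
    return [(today - timedelta(days=i)).strftime(fmt) for i in range(n)]

dates = last_n_days(14)
print(dates[:3])  # today, yesterday, the day before
```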

Reading a CSV into RStudio from Google Cloud Storage

I want to read a CSV file from Google Cloud Storage with a function similar to read.csv. I used the googleCloudStorageR library but can't find a function for that. I don't want to download the file; I just want to read it into the environment as a data frame.
If you download a .csv file, googleCloudStorageR will by default parse it into a data.frame for you; you can turn off that behaviour by specifying saveToDisk:
# will make a data.frame
gcs_get_object("mtcars.csv")
# save to disk as a CSV
gcs_get_object("mtcars.csv", saveToDisk = "mtcars.csv")
You can specify your own parse function by supplying it via the parseFunction argument:
## default gives a warning about missing column name.
## custom parse function to suppress warning
f <- function(object){
suppressWarnings(httr::content(object, encoding = "UTF-8"))
}
## get mtcars csv with custom parse function.
gcs_get_object("mtcars.csv", parseFunction = f)
I've tried reading a sample CSV file with the as.data.frame() function.
To run this snippet, make sure you install the package (install.packages("data.table")) and load it with library(data.table).
Also be sure to wrap fread() inside as.data.frame() so the file is read from its location straight into a data frame.
Here is the code snippet I ran and managed to display the data frame for my data set:
library(data.table)
MyData <- as.data.frame(fread(file = "$FILE_PATH", header = TRUE, sep = ','))
print(MyData)
Reading data with TensorFlow tooling:
There is one other way to read a CSV from cloud storage, via the TensorFlow-for-R tooling. I assume you are accessing this data from a bucket? First, install the readr and cloudml packages. Then use gs_data_dir("gs://your-bucket-name") and build the file path with file.path(data_dir, "something.csv"). Read the data with read_csv(file.path(data_dir, "something.csv")). Formatted as a data frame, it should look something like this:
library(cloudml)
library(readr)
data_dir <- gs_data_dir("gs://your-bucket-name")
MyData <- as.data.frame(read_csv(file.path(data_dir, "something.csv")))
print(MyData)
Make sure you have properly authenticated access to your storage. More information is in this link.

Import excel from Azure blob using R

I have the basic setup done following the link below:
http://htmlpreview.github.io/?https://github.com/Microsoft/AzureSMR/blob/master/inst/doc/tutorial.html
There is a method azureGetBlob which allows you to retrieve objects from the containers. However, it seems to only allow "raw" and "text" formats, which is not very useful for Excel. I've tested the connection etc.; I can retrieve .txt / .csv files but not .xlsx files.
Does anyone know any workaround for this?
Thanks
Does anyone know any workaround for this?
Azure Blob Storage has no notion of file type; a blob is just a name, and the extension is only meaningful to the OS. To open the Excel file in R, we can use a third-party library such as readxl.
Workaround:
You could use the Get Blob API to download the blob to a local path and then use readxl to read the file. More demo code is available at this link.
# install
install.packages("readxl")
# Loading
library("readxl")
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")
Solved with the following code. Basically: read the blob as raw bytes, write them to a temporary file on disk, then read that file into R (read.xlsx here is from the xlsx package):
library(xlsx)  # provides read.xlsx()

excel_bytes <- azureGetBlob(sc, storageAccount = "accountname", container = "containername", blob = blob_name, type = "raw")
q <- tempfile()
f <- file(q, 'wb')
writeBin(excel_bytes, f)
close(f)
result <- read.xlsx(q, sheetIndex = sheetIndex)
unlink(q)

Import an RDS file from GitHub into R on Windows

I am trying to import an RDS file into RStudio on Windows. I tried following this example, which is for RData, using both methods:
Method 1:
githubURL <- ("https://github.com/derek-corcoran-barrios/LastBat/blob/master/best2.My.Lu2.rds")
BestMyyu <- readRDS(url(githubURL))
Method 2:
githubURL <- ("https://github.com/derek-corcoran-barrios/LastBat/blob/master/best2.My.Lu2.rds")
download.file(githubURL,"best2.My.Lu2.rds")
BestMyyu <- readRDS("best2.My.Lu2.rds")
I've looked through other threads and have not found any other example.
In the 2nd method you just need to add method = "curl" and change the URL to point to the raw file (the Download link on the page):
githubURL <- ("https://raw.githubusercontent.com/derek-corcoran-barrios/LastBat/master/best2.My.Lu2.rds")
download.file(githubURL,"best2.My.Lu2.rds", method="curl")
BestMyyu <- readRDS("best2.My.Lu2.rds")
If you don't have curl installed, you can get it from here

How to import accented public TSV Google Spreadsheet data into R

If I try to import a public spreadsheet like this example into R:
using:
library(httr)
url <- "https://docs.google.com/spreadsheets/d/1qIOv7MlpQAuBBgzV9SeP3gu0jCyKkKZapPrZHD7DUyQ/pub?gid=0&single=true&output=tsv"
GET(url)
I get the wrong accented words, as you can see in this picture:
How can I get the right encode?
I know I can use googlesheets package, but for public data I prefer to work with direct download, so I don't have to handle user login authentication and token refresh.
I don't know why httr::GET does not work, but this does:
data <- utils::read.csv(url, header=TRUE, sep="\t", stringsAsFactors=FALSE)
If you have a *nix operating system you could use:
curl -o data.tsv 'https://docs.google.com/spreadsheets/d/1qIOv7MlpQAuBBgzV9SeP3gu0jCyKkKZapPrZHD7DUyQ/pub?gid=0&single=true&output=tsv'
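GET() by itself only fetches the response; the encoding problem appears when the body is parsed. A sketch of forcing UTF-8 decoding with httr's content() and then parsing the tab-separated text (same public URL as above):

```r
library(httr)

url  <- "https://docs.google.com/spreadsheets/d/1qIOv7MlpQAuBBgzV9SeP3gu0jCyKkKZapPrZHD7DUyQ/pub?gid=0&single=true&output=tsv"
resp <- GET(url)
txt  <- content(resp, as = "text", encoding = "UTF-8")  # force UTF-8 decoding
data <- read.delim(text = txt, stringsAsFactors = FALSE)
```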
