download xlsx from link and import into r - r

I know there are a number of posts on this topic and I usually am able to accomplish what I want just fine but I'm having trouble with this one particular link. It's likely related to the non-orthodox layout of the excel file. Here's my workflow:
library(rest)
url<-"http://irandataportal.syr.edu/wp-content/uploads/3.-economic-participation-and-unemployment-rates-for-populationa-aged-10-and-overa-by-ostan-province-1380-1384-2001-2005.xlsx"
unemp <- url %>%
read.xls()
That produces an error Error in getinfo.shape(fn) : Error opening SHP file
The problem is not related to the scraping of the data. The problem arises in regards to importing the data into a usable format. For example, read.xls("file.path/file.csv") produces the same error.

For example :
library(RCurl)
download.file(url, destfile = "./file.xlsx")
use your favorite reader then,

Adding the option fileEncoding="latin1" solved my problem.
url<-"http://irandataportal.syr.edu/wp-content/uploads/3.-economic-participation-and-unemployment-rates-for-populationa-aged-10-and-overa-by-ostan-province-1380-1384-2001-2005.xlsx"
unemp <- url %>%
read.xls(fileEncoding="latin1")

Related

Down load data from HTTPS in R

I have a problem with downloading data from HTTPS in R, I try using curl, but it doesn't work.
URL <- "https://github.com/Bitakhparsa/Capstone/blob/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv"
options('download.file.method'='curl')
download.file(URL, destfile = "./data.csv", method="auto")
I downloaded the CSV file with that code, but the format was changed when I checked the data. So it didn't download correctly.
Would you please someone help me?
I think you might actually have the URL wrong. I think you want:
https://raw.githubusercontent.com/Bitakhparsa/Capstone/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv
Then you can download the file directly using library(RCurl) rather than creating a variable with the URL
library(RCurl)
download.file("https://raw.githubusercontent.com/Bitakhparsa/Capstone/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv",destfile="./data.csv",method="libcurl")
You can also just load the file directly into R from the site using the following
URL <- "https://github.com/Bitakhparsa/Capstone/blob/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv"
out <- read.csv(textConnection(URL))
You can use the 'raw.githubusercontent.com' link, i.e. in the browser, when you go to "https://github.com/Bitakhparsa/Capstone/blob/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv" you can click on the link "View raw" (it's above "Sorry about that, but we can’t show files that are this big right now.") and this takes you to the actual data. You also have some minor typos.
This worked as expected for me:
url <- "https://raw.githubusercontent.com/Bitakhparsa/Capstone/0850c8f65f74c58e45f6cdb2fc6d966e4c160a78/Plant_1_Generation_Data.csv"
download.file(url, destfile = "./data.csv", method="auto")
df <- read.csv("~/Desktop/data.csv")

read_xlsx function fails to import online dataset with valid url

(NB- I am very much a beginner in R.)
This is the code I tried:
read_xlsx("valid/url")
For some reason I get the error message:
'path' does not exist:'valid/url'
I know the URL works, I have tested it many times. I am mystified, so any help would be much appreciated.
If I understand your issue correctly, I think you are inputting the URL into the read_xlsx command. Far as I am aware, this will not work if your excel file is online, you will need to download it locally first.
I suggest the following adjustment:
url <- "valid/url"
temp <- tempfile()
download.file(url, temp, mode="wb")
df1 <- read_excel(path = temp)
This will download the excel file into a temporary file, which you can then read into a dataframe, since it will be saved locally.

R read_excel from online web page link produces an empty data frame

Hi this is my first time posting,
I am trying to obtain data from an online web page link excel sheet. However,it works for the other links on the page but not a specific one which returns a blank data frame.
library(readxl)
download.file("https://www.parismou.org/sites/default/files/2016-04-DetentionLists_0.XLS","test.xls",mode="wb")
tbls=read_excel("test.xls")
Downloading it as a .xls file works fine but reading it doesnt work.
I have also tried using:
tbls=read.table("https://www.parismou.org/sites/default/files/2016-04-DetentionLists_0.XLS", header=TRUE, skipNul= TRUE)
which returns:
Error in read.table("https://www.parismou.org/sites/default/files/2016-04-DetentionLists_0.XLS", :
no lines available in input
I have also tried the XLConnect packages but those returned the following error:
require(XLConnect)
download.file("https://www.parismou.org/sites/default/files/2016-04-DetentionLists_0.XLS","test.xls",mode="wb")
tblspx=loadWorkbook("test.xls")
Error: OldExcelFormatException (Java): The supplied spreadsheet seems to be Excel 5.0/7.0 (BIFF5) format. POI only supports BIFF8 format (from Excel versions 97/2000/XP/2003)
Any help would be greatly appreciated.
You're dealing with a very old excel format. The gdata package can deal with that (see this SO post):
install.packages("gdata")
require(readxl)
download.file("https://www.parismou.org/sites/default/files/2016-04-DetentionLists_0.XLS","test.xls",mode="wb")
tbls = gdata::read.xls("test.xls", fileEncoding="latin1")

Import OData dataset

How do I properly download and load in R an OData dataset?
I tried the OData package, and even if the documentation is really simple, I am sure, I am missing something trivial.
I am trying to download and parse in R this dataset, but I cannot get how it is structured. Is it a XML format? Hence, what is the reason for a separator argument?
library(OData)
#What is the correct argument for the separator?
downloadResourceCsv("https://data.nasa.gov/OData.svc/gh4g-9sfh", sep = "")
As hrbrmstr suggests, use the RSocrata package
e.g., go to 1, click on ... in the top right,
click on "Access this Dataset via OData", click
on "Copy" to copy the OData endpoint, save it:
url <- "https://data.cdc.gov/api/odata/v4/9bhg-hcku"
library(RSocrata)
dat <- read.socrata(url)
It's XML format.So download first.
Try using httr package.
library(httr)
r <- GET("http://httpbin.org/get")
Visit this site for quick-start.
After download use XML package for xmlParse.
Thank you

How to download an .xlsx file from a dropbox (https:) location

I'm trying to adopt the Reproducible Research paradigm but meet people who like looking at Excel rather than text data files half way, by using Dropbox to host Excel files which I can then access using the .xlsx package.
Rather like downloading and unpacking a zipped file I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon
You have the wrong URL - the one you are using just goes to the landing page. I think the actual download URL is different, I managed to get it sort of working using the below.
I actually don't think you need to use RCurl or the getURL() function, and I think you were leaving out some relatively important /'s in your previous formulation.
Try the following:
link <- paste("https://dl.dropboxusercontent.com/s",
"{THE SHA-1 KEY}",
"{THE FILE NAME}",
sep="/")
download.file(url=link,destfile="your.destination.xlsx")
closeAllConnections()
UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also the function below works some of the time but not others, and appears to get stuck at the GET line. So, a better solution would be very welcome.
I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:
download_file_url <- function (
url,
outfile,
..., sha1 = NULL)
{
require(RCurl)
require(devtools)
require(repmis)
require(httr)
require(digest)
stopifnot(is.character(url), length(url) == 1)
filetag <- file(outfile, "wb")
request <- GET(url)
stop_for_status(request)
writeBin(content(request, type = "raw"), filetag)
close(filetag)
}
This seems to work for producing local versions of binary files - Excel included. Nicer, neater, smarter improvements in this gratefully received.

Resources