getURL and downfile.file won't download content of webpage (R) - r

Currently I am looking into the "getURL" and "download.file" command in R. So far, both worked like a charm.
However, I have problems with one specific link and I don't know why this one doesn't work.
Running
getURL
("http://www.r-bloggers.com/improving-script_002-%e2%80%9cmonitor%e2%80%9d/")
produce the error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string: '\037\b'
The "download.file" command creates also a weirdly encoded file:
download.file
("http://www.r-bloggers.com/improving-script_002-%e2%80%9cmonitor%e2%80%9d/",
"test.html")
Does this work with you?

The problem are the quotes in the URL. You need to encode it first. This will work properly.
getURL(URLencode("http://www.r-bloggers.com/improving-script_002-%E2%80%9Cmonitor%E2%80%9D/"))

Related

Post image to slack using httr package in R

Slack offers a method to upload files through their api. The documentation is found here:
Slack files.upload method
On this page it gives an example of how to post a file:
curl -F file=#dramacat.gif -F "initial_comment=Shakes the cat" -F channels=C024BE91L,D032AC32T -H "Authorization: Bearer xoxa-xxxxxxxxx-xxxx" https://slack.com/api/files.upload
I am trying to translate how to execute this line of code using the httr package in R, with a file in my R working directory. I'm having trouble translating the different parts of the command. Here is what I have so far.
api_token='******'
f_path='c:/mark/consulting/dreamcloud' #this is also my working directory
f_name='alert_picture.png'
res<-httr::POST(url='https://slack.com/api/files.upload', httr::add_headers(`Content-Type` = "multipart/form-data"),
body = list(token=api_token, channels='CCJL7TMC7', title='test', file = httr::upload_file(f_path), filename=f_name))
When I run this I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle) :
read function returned funny value
I tried to find better examples to use but so far no luck. Any suggestions are appreciated!
There's an example in slackr's own gg_slackr method, which creates an image of a GGPlot and uploads it to Slack:
res <- POST(url="https://slack.com/api/files.upload",
add_headers(`Content-Type`="multipart/form-data"),
body=list(file=upload_file(ftmp),
token=api_token, channels=modchan))
Your code seems to be passing a path to a directory rather than a file as the file parameter - consider changing that parameter to file=upload_file(paste(f_path, f_name, sep="/") and see if that fixes your error.

pdftables R package throwing HTTP 400 error

I am trying to use pdftables package to extract data into csv.
install.packages("pdftables")
library(pdftables)
write.csv(head(iris), file = "test.csv", row.names = FALSE)
Open test.csv and print as PDF to "test.pdf"
convert_pdf("test.pdf", "test2.csv")
However, I am getting the following error:
Error in get_content(input_file, format, api_key) : Bad Request
(HTTP 400).
What's the fix here?
Did you get an API token?
To use the package the user first needs to sign up to the PDFTables API to get an API token (they offer a free package that allows up to 50 pages).
See: https://cran.r-project.org/web/packages/pdftables/README.html
To use the PDFTables R package, you need to the run the following command:
convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single", message = TRUE, api_key = "insert_API_key")
Make sure you replace insert_API_key with your API key, and change the file path and/or format.
More info here: https://pdftables.com/blog/convert-pdf-to-excel-r

R Importing excel file directly from web

I need to import excel file directly from NYSE website. The spreadsheet url is https://quotespeed.morningstar.com/exportChartDataToExcel.jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E777D89877788ECB . Tried using gdata package and changing https to http but still doesnt work. Does anybody know solution to such issue?
EDIT: Has to be imported to R directly from website (project requirement)
Without information about why using the gdata package does not work for you I have to assume. Make sure you have Perl installed - you can download it at http://www.activestate.com/activeperl
This works for me:
library('gdata')
## URL broken into multiple lines for readability
url <- paste("https://quotespeed.morningstar.com/exportChartDataToExcel.",
"jsp?tickers=AAPL&symbols=126.1.AAPL&st=1980-12-1&ed=2015-",
"6-8&f=m&dty=1&types=1&ver=1.6.0&qs_wsid=E43474CC03753FE0E",
"777D89877788ECB", sep = "")
url <- gsub("https", "http",url)
data <- read.xls(url, perl = "C:/Perl64/bin/perl.exe")
Without perl = "path_to_perl.exe" I got the error
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
Use the RCurl package to download the file and the readxl package by Hadley to read the excel file

R produces "unsupported URL scheme" error when getting data from https sites

R version 3.0.1 (2013-05-16) for Windows 8 knitr version 1.5 Rstudio 0.97.551
I am using knitr to do the markdown of my R code.
As part of my analysis I downloaded various data sets from the web, knitr is totally fine with getting data from http sites but from https ones where it generates an unsupported URL scheme message.
I know when using the download.file function on a mac the method parameter has to be set to curl to get data from an https however this doesn't help when using knitr.
What do I need to do so that knitr will gather data from Https websites?
Edit:
Here is the code chunk that returns an error in Knitr but when run through R works without error.
```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```
You could use https with download.file() function by passing "curl" to method as :
download.file(url,destination,method="curl")
Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.
You want something like this:
library(RCurl)
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
ssl.verifypeer=0L, followlocation=1L)
That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:
writeLines(data,'temp.csv')
read.csv('temp.csv')
You can also separate out the data directly without writing to file:
read.csv(text=data)
Edit: A much easier option is actually to use the rio package:
library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")
This will read directly from the HTTPS URL and return a data.frame.
Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.
setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")
I am sure you have already found solution to your problem by now.
I was working on an assignment right now and ended up getting the same error. I tried some of the tricks, but that did not work for me. Maybe because I am working on Windows machine.
Anyhow, I changed the link to http: rather than https: and that did the trick.
Following is chunk of my code:
if (!file.exists("./PeerAssesment2")) {dir.create("./PeerAssessment2")}
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, dest = "./PeerAssessment2/Data.zip")
install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
bunzip2 ("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")
noaaData <- read.csv ('./PeerAssessment2/Data')
Hope this helps.
I had the same issue with knitr and download.file() with a https url, on Windows 8.
You could try setInternet2(TRUE) before using the download.file() function. However I'm not sure that this fix works on Unix-like systems.
setInternet2(TRUE) # set the R_WIN_INTERNET2 to TRUE
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work
Source : R documentation (?download.file()) :
Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
I had the same problem with a https with the following code running perfectly in R and getting unsupported URL scheme when knitting to html:
temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))
I tried all the solutions posted here and nothing worked, in my absolute desperation I just eliminated the "s" in the "https" in the url and everything got fine...
Using the R download package takes care of the quirky details typically associated with file downloads. For you example, all you needed to do would have been:
```{r}
library(download)
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```

R input file arabic?

I am writing a program that reads arabic text from a text file in R, whenever i read the file i get the following errors:
Warning messages:
1: In read.table("C:\\Users\\Mustafa\\Desktop\\arabic.txt", sep = "\n", :
invalid input found on input connection 'C:\Users\Mustafa\Desktop\arabic.txt'
2: In read.table("C:\\Users\\Mustafa\\Desktop\\arabic.txt", sep = "\n", :
incomplete final line found by readTableHeader on 'C:\Users\Mustafa\Desktop\arabic.txt'
File<-read.table("C:\\Users\\Mustafa\\Desktop\\arabic.txt",sep=" \n",col.names="ar",fileEncoding="UTF-8")
I have no idea where error is , the environment i am using is windows, on mac os it works file, however i must run it on windows! any help is appreciated.
Thank you!
This error message means that your file does not end with EOL (end-of-line character) e.g. \n or \r\n.
This is sort of a warning that your file might not be completed. It seems that on MAC it is ignored, but in windows it is considered an error.
The solution is easy, just add a new line at the end of your file, save it and try again.
The below code worked for me.
Sys.setlocale("LC_ALL","Arabic")

Resources