R - Read data from HTML table - r

I'm trying to run an example from the book "Practical Data Science Cookbook".
The code is as follows:
year <- 2013
#Acquire offense data
url <- paste("http://sports.yahoo.com/nfl/stats/byteam? group=Offense&cat=Total&conference=NFL&year=season_",
year,"&sort=530&old_category=Total&old_group=Offense")
offense <- readHTMLTable(url, encoding = "UTF-8", colClasses="character")[[7]]
and I get this error:
Error in UseMethod("xmlNamespaceDefinitions") :
no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
Please help

To solve the problem, you need to configure an HTTP proxy.
On a Windows desktop, edit the RStudio shortcut and add the proxy definition after the RStudio executable name:
http_proxy=http://user_id:password@your_proxy:your_port/
Source: Proxy settings for R
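The same setting can also be applied from inside an R session instead of on the shortcut. The following is a minimal sketch, assuming a proxy that accepts credentials embedded in the URL. The host, port, user and password are placeholders, and `build_proxy_url` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: assemble a proxy URL of the form
# http://user:password@host:port/
build_proxy_url <- function(host, port, user = NULL, password = NULL) {
  credentials <- ""
  if (!is.null(user) && !is.null(password)) {
    # Percent-encode reserved characters (e.g. "@" or ":") in the credentials
    credentials <- paste0(
      utils::URLencode(user, reserved = TRUE), ":",
      utils::URLencode(password, reserved = TRUE), "@"
    )
  }
  paste0("http://", credentials, host, ":", port, "/")
}

# Placeholder values; replace with your own proxy details
proxy_url <- build_proxy_url("proxy.example.com", 8080, "user_id", "secret")

# Set the variables for the current R session only
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)
```

Whether a given download method picks these variables up can depend on the method and the platform, so it is worth retrying the failing call right after setting them.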

jsonlite::fromJSON failed to connect to Port 443 in quantmod getFX()

I needed a function which automatically gets the Exchange Rate from a website (oanda.com), therefore I used the quantmod package and made a function based on that.
library(quantmod)
ForeignCurrency<-"MYR" # has to be a character
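getExchangeRate <- function(ForeignCurrency) {
  # Download the last 179 days of EUR/<currency> rates; getFX() stores the
  # result in a variable named after the pair without the slash (e.g. EURMYR)
  Conv <- paste0("EUR/", ForeignCurrency)
  getFX(Conv, from = Sys.Date() - 179, to = Sys.Date())
  Conv2 <- paste0("EUR", ForeignCurrency)
  Table <- as.data.frame(get(Conv2))
  # Invert the mean rate to express one unit of the foreign currency in EUR
  ExchangeRate <- 1 / mean(Table[, 1])
  ExchangeRate
}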
ExchangeRate<-getExchangeRate(ForeignCurrency)
ExchangeRate
On my personal PC, it works perfectly and does what I want. If I run it on my work PC, I get the following error:
Warning: Unable to import “EUR/MYR”.
Failed to connect to www.oanda.com port 443: Timed out
I have already searched a lot, and it seems to be a firewall problem, but none of the suggestions I found worked. After checking the getFX() function, the problem seems to be in the jsonlite::fromJSON function, which getFX() uses.
Has anyone faced a similar problem? I am quite familiar with R, but I have no expertise with firewalls or ports. Do I have to change something in the R settings, or is the problem independent of R, so that something in the proxy settings needs to change?
Can you please help :-) ?
The code below shows a workaround for getFX() in an enterprise context, where you often have to go through a proxy to reach the internet.
library(httr)
# url that you can find inside https://github.com/joshuaulrich/quantmod/blob/master/R/getSymbols.R
url <- "https://www.oanda.com/fx-for-business//historical-rates/api/data/update/?&source=OANDA&adjustment=0&base_currency=EUR&start_date=2022-02-17&end_date=2022-02-17&period=daily&price=mid&view=table&quote_currency_0=VND"
# Original call inside the quantmod library:
#   # Fetch data (jsonlite::fromJSON will handle connection)
#   tbl <- jsonlite::fromJSON(oanda.URL, simplifyVector = FALSE)
# Add use_proxy() with your proxy address and port to get through the proxy
response <- httr::GET(url, use_proxy("XX.XX.XX.XX", XXXX))
status <- status_code(response)
if (status == 200) {
  content <- httr::content(response)
  # Round-trip through jsonlite to read single attributes such as the
  # quote currency, the exchange rate (average) and the base currency
  exportJson <- jsonlite::toJSON(content, auto_unbox = TRUE)
  getJsonObject <- jsonlite::fromJSON(exportJson, flatten = FALSE)
  print(getJsonObject$widget$quoteCurrency)
  print(getJsonObject$widget$average)
  print(getJsonObject$widget$baseCurrency)
}
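The URL in the workaround is hard-coded for one currency pair and one date. As a rough sketch, the query string could be assembled from its parts so the same request works for other pairs and date ranges. `build_oanda_url` is a hypothetical helper; the endpoint and parameter names are copied verbatim from the URL above and may have changed on OANDA's side.

```r
# Hypothetical helper: rebuild the OANDA request URL from the answer
# for an arbitrary currency pair and date range.
build_oanda_url <- function(base, quote, start_date, end_date) {
  params <- c(
    source = "OANDA",
    adjustment = "0",
    base_currency = base,
    start_date = format(as.Date(start_date), "%Y-%m-%d"),
    end_date = format(as.Date(end_date), "%Y-%m-%d"),
    period = "daily",
    price = "mid",
    view = "table",
    quote_currency_0 = quote
  )
  # Join name=value pairs with "&"
  query <- paste(names(params), params, sep = "=", collapse = "&")
  # Endpoint string kept exactly as it appears in the answer
  paste0(
    "https://www.oanda.com/fx-for-business//historical-rates/api/data/update/?&",
    query
  )
}

# Reproduces the hard-coded URL from the answer
url <- build_oanda_url("EUR", "VND", "2022-02-17", "2022-02-17")
```

The resulting `url` can then be passed to `httr::GET()` together with `use_proxy()` as in the answer.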

Download a custom dataset in Azure ML Jupyter/iPython Notebook using R

I need to download a custom dataset in an Azure Jupyter/iPython Notebook.
My ultimate goal is to install an R package. To be able to do this the package (the dataset) needs to be downloaded in code. I followed the steps outlined by Andrie de Vries in the comments section of this post: Jupyter Notebooks with R in Azure ML Studio.
Uploading the package as a ZIP file was without problems, but when I run the code in my notebook I get an error:
Error in curl(x$DownloadLocation, handle = h, open = conn): Failure when receiving data from the peer
Traceback:
download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
lapply(1:nrow(datasets), function(j) get_dataset(datasets[j, ], ...))
FUN(1L[[1L]], ...)
get_dataset(datasets[j, ], ...)
curl(x$DownloadLocation, handle = h, open = conn)
So I simplified my code into:
library("AzureML")
ws <- workspace()
ds <- datasets(ws)
ds$Name
data <- download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
head(data)
Where "plotly_3.6.0.tar.gz.zip" is the name of my dataset of data type "Zip".
Unfortunately this results in the same error.
To rule out data type issues, I also tried to download another dataset of mine, of data type "Dataset". It gave the same error.
Next I switched to the sample datasets of AzureML Studio.
"text.preprocessing.zip" is of datatype Zip
data <- download.datasets(ws, "text.preprocessing.zip")
"Flight Delays Data" is of datatype GenericCSV
data <- download.datasets(ws, "Flight Delays Data")
Both of the sample datasets can be downloaded without problems.
So why can't I download my own saved dataset?
I could not find anything helpful in the documentation of the download.datasets function, neither on rdocumentation.org nor on cran.r-project.org (pages 17-18).
Try this:
library(AzureML)
ws <- workspace(
  id = "your AzureML ID",
  auth = "your AzureML Key"
)
name <- "Name of your saved data"
data <- download.datasets(ws, name)
It seems the error I got was due to a bug in the (then early) Azure ML Studio.
I tried again after the reply of Daniel Prager only to find out my code works as expected without any changes. Adding the id and auth parameters was not needed.

Spotfire TERR text mining error: "name must be a single string"

I am trying to create a script that does text mining (tm) combining property and action controls with TERR.
I have run my script successfully in open-source R but keep getting an error in TERR. I have narrowed down the function causing the error to VCorpus, part of the tm package. Here is the portion of the script causing errors:
myinput <- do.call(paste, c(as.list(col1), sep=" "))
Here col1 is a document property (string) set by the selection in a property control drop-down list.
b <- VCorpus(VectorSource(myinput), readerControl = list(language = 'eng'))
... and the error message I get in TERR is:
TIBCO Enterprise Runtime for R returned an error: 'Error in
getS3method("pGetElem", class(x), TRUE) : 'name' must be a single
string'.
I hit the same problem.
The script runs fine on the open-source R engine, but in TERR I am still trying to resolve this error.
I suspect it has to do with the data format TERR expects.
I got a solution from the TIBCO developer community.
Answer:
You will not see this error if you use TERR 4.1. It was caused by a bug that was fixed in version 4.1.
Reference:
https://docs.tibco.com/pub/enterprise-runtime-for-R/4.1.0/TIB_terr_4.1.0_relnotes.pdf
See the following fix on page 16:
TERR-6049 The getS3method function now works when the class argument is of length greater than 1.

R connecting R to twitter for sentiment analysis

I referred to the link below for doing sentiment analysis:
http://heuristically.wordpress.com/2011/04/08/text-data-mining-twitter-r/
When I ran the code given below:
for (page in c(1:15)){
# search parameter
twitter_q <- URLencode('#prolife OR #prochoice')
twitter_url =
# fetch remote URL and parse
mydata.xml <- xmlParseDoc(twitter_url, asText=F)
# extract the titles
mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue, namespaces =c('s'='http://www.w3.org/2005/Atom'))
# aggregate new tweets with previous tweets
mydata.vectors <- c(mydata.vector, mydata.vectors)
}
Running the code produces this error:
Error in UseMethod("xpathApply") :
no applicable method for 'xpathApply' applied to an object of class "NULL"
I/O warning : failed to load HTTP resource
I installed the required packages ROAuth, stringr, XML and plyr, and I am using R version 3.0.3.
Please help me work out how to fix this. I have been struggling with it, and any guidance in the right direction would be a great help.

Using R package BerkeleyEarth

I'm working for the first time with the R package BerkeleyEarth, and attempting to use its convenience functions to access the BEST data. I think maybe it's just a problem with their servers (a matter I've separately addressed to the package's maintainer) but I wanted to know if it's instead something silly I'm doing.
To reproduce the problem:
library(BerkeleyEarth)
downloadBerkeley()
which gives the following error message:
trying URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
Error in download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
cannot open URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
In addition: Warning message:
In download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
InternetOpenUrl failed: 'A connection with the server could not be established'
Has anyone had a better experience using this package?
The error message points to a different URL than the ones listed at http://berkeleyearth.org/data/ for the zip-formatted files. There is also a set of .nc files that appear to be more recent. I would replace the entries in the BerkeleyUrls data frame with the ones that match your analysis strategy.
This is the current URL that should be in position 1,1:
http://berkeleyearth.lbl.gov/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip
And this is the one that is in the package dataframe:
> BerkeleyUrls[1,1]
[1] "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip"
I suppose you could try:
BerkeleyUrls[, 1] <- sub( "download\\.berkeleyearth\\.org", "berkeleyearth.lbl.gov", BerkeleyUrls[, 1])
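To check the substitution before touching the package data, it can be tried on a stand-in first. This is only a sketch: `urls` is a hypothetical one-row copy, the column name `Url` is taken from the error message, and the real BerkeleyUrls data frame may be laid out differently.

```r
# Stand-in for the package's BerkeleyUrls data frame (one row only);
# the layout of the real object is assumed here.
urls <- data.frame(
  Url = "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip",
  stringsAsFactors = FALSE
)

# Swap the host; as.character() guards against the column being a factor
urls[, 1] <- sub(
  "download\\.berkeleyearth\\.org",
  "berkeleyearth.lbl.gov",
  as.character(urls[, 1])
)

print(urls[1, 1])
```

Note that this changes a copy in your workspace. Whether downloadBerkeley() then uses the modified copy depends on how the package looks up the data frame.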
