R Quandl: couldn't connect to host

I am beginning to use Quandl to import datasets into R with the Quandl R API. It appears to be the easiest approach. However, I have a problem: the snippet of code below does not work for me. It returns an error.
library(Quandl)
my_quandl_dtst <- Quandl("DOE/RBRTE")
Error in function (type, msg, asError = TRUE) : couldn't connect to host
What could be the cause of the problem?
I searched this site and found some solutions, including the one below, but it does not work for me.
set_config(use_proxy(url='your.proxy.url',port,username,password))
On the other hand, read.csv with a URL pasted from the dataset-export facility on the Quandl website works:
my_quandl_dtst <- read.csv('http://www.quandl.com/api/v1/datasets/DOE/RBRTE.csv?', colClasses = c('Date' = 'Date'))
I would really like to use the Quandl library, since it would make my code cleaner. Therefore I would appreciate any help. Thanks in advance.

OK, I found the solution: I had to set RCurlOptions, because the Quandl function uses getURL() to download data from the URL, and I had to set them through the options() function. So:
options(RCurlOptions = list(proxy = "my.proxy", proxyport = my.proxyport.number))
head(quandldata <- Quandl("NSE/OIL"))
        Date  Open   High    Low   Last  Close Total Trade Quantity Turnover (Lacs)
1 2014-03-03 453.5 460.05 450.10 450.30 451.30                90347          410.08
2 2014-02-28 440.0 460.00 440.00 457.60 455.55               565074         2544.66
3 2014-02-26 446.2 450.95 440.00 440.65 440.60               179055          794.24
4 2014-02-25 445.1 451.75 445.10 446.60 447.20                86858          389.38
5 2014-02-24 443.0 449.50 443.00 446.50 446.30                81197          362.33
6 2014-02-21 447.9 448.65 442.95 445.50 446.80                95791          427.32

I guess you need to check whether the domain quandl.com accepts remote connections to the RBRTE.csv file.
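One quick way to check that from the same R session is RCurl's url.exists() (a minimal sketch; Quandl relies on RCurl anyway, as noted above):
library(RCurl)
# returns TRUE if the host answers and the resource is reachable from this session
url.exists("http://www.quandl.com/api/v1/datasets/DOE/RBRTE.csv")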

Related

libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value

While running the "PrepareAnnotationRefseq" function from the customProDB package in R, I ran into a problem due to a compatibility issue with the curl version. I am currently using curl version 4.3.2. The error report I got is:
PrepareAnnotationRefseq(genome='mm39',CDSfasta="geneseq.fasta",pepfasta="proteinseq.fasta", annotation_path, dbsnp = NULL, splice_matrix=FALSE, ClinVar=FALSE)
In curlSetOpt(..., .opts = .opts, curl = h, .encoding = .encoding) : Error setting the option for # 3 (status = 43) (enum = 81) (value = 0x55822c7f3b70): A libcurl function was given a bad argument CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!
This could be a trivial problem for an expert in R; however, with my current skill set I have been unable to resolve it after looking for a solution on several forums and R groups. I would be very grateful if you could shed some light on this issue, perhaps with a patch that can fix the problem.
This is covered in the libcurl manual.
If the verify value is set to 1: from libcurl 7.28.1 to 7.65.3, setting it to 1 made curl_easy_setopt() return an error and leave the flag untouched. Use 2 instead.
When CURLOPT_SSL_VERIFYHOST is 2, that certificate must indicate that the server is the server to which you meant to connect, or the connection fails. Simply put, it means it has to have the same name in the certificate as is in the URL you operate against.
But why do you touch it? The default value for this option is 2 and is suitable for most cases of libcurl usage.
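If the rejected value comes from the session-wide RCurl defaults rather than being hard-coded inside customProDB, a hedged workaround sketch (not a package patch) is to set ssl.verifyhost back to 2, the libcurl default, before the call:
options(RCurlOptions = list(ssl.verifyhost = 2L))  # 2 is the libcurl default
PrepareAnnotationRefseq(genome = "mm39", CDSfasta = "geneseq.fasta",
                        pepfasta = "proteinseq.fasta", annotation_path,
                        dbsnp = NULL, splice_matrix = FALSE, ClinVar = FALSE)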

Reticulate AWS Cognito

This is my Python code (that I've checked and it works):
import boto3
from warrant.aws_srp import AWSSRP

def auth(USERNAME, PASSWORD):
    client = boto3.client('cognito-idp', region_name=region_name)
    aws = AWSSRP(username=USERNAME, password=PASSWORD, pool_id=POOL_ID,
                 client_id=CLIENT_ID, client=client)
    try:
        tokens = aws.authenticate_user()
        return(tokens)
    except Exception as e:
        return(e)
I'm working with R in order to create a visual interface for doing some operations (including this one), and it is a requirement.
I use the reticulate R package to execute Python code. I tested it with some dummy code to check that it works correctly (and it does). When I execute the above function by running:
reticulate::source_python(FILE_PATH)
py$auth(USERNAME,PASSWORD)
I get the following error:
An error occurred (InvalidParameterException) when calling the RespondToAuthChallenge operation: TIMESTAMP format should be EEE MMM d HH:mm:ss z yyyy in english.
I tried searching a lot but found nothing; I suppose there may be some sort of wrapper or formatter for this. Maybe someone has already faced this problem. Thanks a lot for any help.
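One hedged lead, not confirmed in this thread: this particular Cognito TIMESTAMP error is commonly caused by a non-English LC_TIME locale, because the SRP client formats the timestamp with locale-dependent %a/%b codes. A sketch of forcing an English/C time locale before the call:
Sys.setlocale("LC_TIME", "C")                                  # R side
reticulate::py_run_string(
  "import locale; locale.setlocale(locale.LC_TIME, 'C')")      # same idea on the Python side
reticulate::source_python(FILE_PATH)
py$auth(USERNAME, PASSWORD)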

Bokeh: Models must be owned by only a single document

I'm working with Bokeh 0.12.2 in a Jupyter notebook and it frequently throws exceptions about "Models must be owned by only a single document":
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-23-f50ac7abda5e> in <module>()
2 ea.legend.label_text_font_size = '10pt'
3
----> 4 show(column([co2, co, nox, o3]))
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\bokeh\io.py in show(obj, browser, new, notebook_handle)
308 '''
309 if obj not in _state.document.roots:
--> 310 _state.document.add_root(obj)
311 return _show_with_state(obj, _state, browser, new, notebook_handle=notebook_handle)
312
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\bokeh\document.py in add_root(self, model)
443 self._roots.append(model)
444 finally:
--> 445 self._pop_all_models_freeze()
446 self._trigger_on_change(RootAddedEvent(self, model))
447
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\bokeh\document.py in _pop_all_models_freeze(self)
343 self._all_models_freeze_count -= 1
344 if self._all_models_freeze_count == 0:
--> 345 self._recompute_all_models()
346
347 def _invalidate_all_models(self):
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\bokeh\document.py in _recompute_all_models(self)
367 d._detach_document()
368 for a in to_attach:
--> 369 a._attach_document(self)
370 self._all_models = recomputed
371 self._all_models_by_name = recomputed_by_name
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\bokeh\model.py in _attach_document(self, doc)
89 '''This should only be called by the Document implementation to set the document field'''
90 if self._document is not None and self._document is not doc:
---> 91 raise RuntimeError("Models must be owned by only a single document, %r is already in a doc" % (self))
92 doc.theme.apply_to_model(self)
93 self._document = doc
RuntimeError: Models must be owned by only a single document, <bokeh.models.tickers.DaysTicker object at 0x00000000042540B8> is already in a doc
The trigger is always calling show(...) (although never the first time after kernel start-up, only subsequent calls).
Based on the docs, I thought reset_output() would return my notebook to an operable state, but the exception persists. Through trial and error, I've determined it's also necessary to re-define everything being passed to show(). That makes interactive work cumbersome and error-prone.
[Ref]:
reset_output(state=None)
  Clear the default state of all output modes.
  Returns: None
Am I right about reset_output() -- is it supposed to resolve the situation causing this exception?
Else, how do I avoid this kind of exception?
It may be because of conflicting objects that have the same name. You need to create completely new objects every time. It seems it can be fixed by differentiating the source names, like this:
source1 = df
p1 = figure()
p1.circle('A', 'B', source=source1)

source2 = df
p2 = figure()
p2.circle('C', 'D', source=source2)

sourceN = df
pN = figure()
pN.circle('X', 'Y', source=sourceN)
I've been working in a jupyterlab notebook iterating on visualizations of a large amount of data with bokeh, holoviews, and panel, and have been running into this issue periodically.
Here are a couple of additional things that may help. Note that p is used as the bokeh conventional name for the figure. I am posting on this old thread because it was the top result in my Google search for the error message.
Try clearing the document (found in docs):
from bokeh.io import curdoc
curdoc().clear()
I observed that panel was able to display a bokeh object even when bokeh show would not.
import panel as pn
pn.extension()
pn.pane.Bokeh(p)
Digging into how panel is able to display an object even when bokeh is not, I noticed this function, which fixed the problem for me:
import panel as pn
pn.io.model.remove_root(p)
If you don't have panel installed, here is the source code from above:
from bokeh.models import Model

for model in p.select({'type': Model}):
    prev_doc = model.document
    model._document = None
    if prev_doc:
        prev_doc.remove_root(model)
Hopefully this helps someone, or future me.
I ran into this error message when using file_html from bokeh.embed, after I upgraded to bokeh version 1.01. Downgrading to bokeh version 0.12.16 solved it (pip install bokeh==0.12.16). Not sure why.
This solution works without upgrading or downgrading packages:
from bokeh.io import output_notebook, reset_output, show

try:
    reset_output()
    output_notebook()
    show(p)
except:
    output_notebook()
    show(p)
Solution provided here: https://github.com/bokeh/bokeh/issues/8579
Try using curdoc().add_root() after creating each plot. add_root adds that model as a root of the current Document, making sure that each Model is added to a single Document:
from bokeh.io import curdoc, show
from bokeh.layouts import column

curdoc().add_root(column([plot]))
curdoc().title = doc_title  # add a title to the doc
show(figure)
Note: column(the list of plots) can be replaced with any object that inherits from the Model class.
Refer to the link for more details on add_root and bokeh Documents: https://docs.bokeh.org/en/latest/docs/reference/document.html?highlight=add_root#bokeh.document.document.Document.add_root
column_data_source = ColumnDataSource(dataframe)
After each cell in the Jupyter notebook that uses the column data source, we simply have to create it again; it seems a ColumnDataSource cannot be reused many times in the same code. This feels like a bit of a hack, but it worked for me when I faced the same error.

read.csv fails to read a CSV file from google docs

I wish to use read.csv to read a Google Docs spreadsheet.
I try using the following code:
data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
read.csv(data_url)
Which results in the following error:
Error in file(file, "rt") : cannot open the connection
I'm on Windows 7, and the code was tried on R 2.12 and 2.13.
I remember trying this a few months ago and it worked fine.
Any suggestions on what might be causing this or how to solve it?
Thanks.
It might have something to do with the fact that Google is reporting a 302 temporarily moved response.
> download.file(data_url, "~/foo.csv", method = "wget")
--2011-04-29 18:01:01-- http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 74.125.230.132, 74.125.230.128, 74.125.230.130, ...
Connecting to spreadsheets0.google.com|74.125.230.132|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 18:01:01-- https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|74.125.230.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `/home/gavin/foo.csv'
[ <=> ] 41 --.-K/s in 0s
2011-04-29 18:01:02 (1.29 MB/s) - `/home/gavin/foo.csv' saved [41]
> read.csv("~/foo.csv")
  column1 column2
1       a       1
2       b       2
3      ds       3
4       d       4
5       f       5
6      ga       5
I'm not sure R's internal download code is capable of responding to such redirects:
> download.file(data_url, "~/foo.csv")
trying URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Error in download.file(data_url, "~/foo.csv") :
cannot open URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
I ran into the same problem and eventually found a solution in a forum thread. Using my own public CSV file:
library(RCurl)
tt = getForm("https://spreadsheets.google.com/spreadsheet/pub",
             hl = "en_US", key = "0Aonsf4v9iDjGdHRaWWRFbXdQN1ZvbGx0LWVCeVd0T1E",
             output = "csv",
             .opts = list(followlocation = TRUE, verbose = TRUE, ssl.verifypeer = FALSE))
holidays <- read.csv(textConnection(tt))
Check the solution on http://blog.forret.com/2011/07/google-docs-infamous-moved-temporarily-error-fixed/
So what is the solution? Just add "&ndplr=1" to your URL and you will skip the authentication redirect. I'm not sure what the NDPLR parameter name stands for; let's just call it "Never Do Published Link Redirection".
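Applied to the URL from the question, that suggestion looks like this (a sketch; whether it still works depends on Google's current redirect behaviour):
data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
data <- read.csv(paste(data_url, "&ndplr=1", sep = ""))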

Logfile analysis in R?

I know there are other tools around like awstats or splunk, but I wonder whether there is any serious (web)server logfile analysis going on in R. R might not be the first choice for it, but it has nice visualization capabilities and also nice spatial packages. Do you know of any? Or is there an R package / code that handles the most common log file formats that one could build on? Or is it simply a very bad idea?
In connection with a project to build an analytics toolbox for our Network Ops guys, I built one of these about two months ago. My employer has no problem if I open source it, so if anyone is interested I can put it up on my github repo. I assume it's most useful to this group if I build an R package. I won't be able to do that straight away, though, because I need to research the docs on package building with non-R code (it might be as simple as tossing the python bytecode files in /exec along with a suitable python runtime, but I have no idea).
I was actually surprised that I needed to undertake a project of this sort. There are at least several excellent open-source and free log file parsers/viewers (including the excellent Webalyzer and AWStats), but neither parses server error logs (parsing server access logs is the primary use case for both).
If you are not familiar with error logs or with the difference between them and access logs: in sum, Apache servers (likewise nginx and IIS) record two distinct logs and store them to disk by default next to each other in the same directory. On Mac OS X, that directory is in /var, just below root:
$> pwd
/var/log/apache2
$> ls
access_log error_log
For network diagnostics, error logs are often far more useful than the access logs. They also happen to be significantly more difficult to process, because of the unstructured nature of the data in many of the fields and, more significantly, because the data file you are left with after parsing is an irregular time series: you might have multiple entries keyed to a single timestamp, then the next entry is three seconds later, and so forth.
I wanted an app that I could toss raw error logs into (of any size, but usually several hundred MB at a time) and have something useful come out the other end, which in this case had to be some pre-packaged analytics and also a data cube available inside R for command-line analytics. Given this, I coded the raw-log parser in python, while the processor (e.g., gridding the parser output to create a regular time series) and all analytics and data visualization I coded in R.
I have been building analytics tools for a long time, but only in the past four years have I been using R. So my first impression, immediately upon parsing a raw log file and loading the data frame in R, was what a pleasure R is to work with and how well suited it is for tasks of this sort. A few welcome surprises:
Serialization. To persist working data in R is a single command (save). I knew this, but I didn't know how efficient this binary format is. The actual data: for every 50 MB of raw logfiles parsed, the .RData representation was about 500 KB, i.e. 100:1 compression. (Note: I pushed this down further to about 300:1 by using the data.table library and manually setting the compression level argument to the save function; a short sketch appears at the end of this answer.)
IO. My data warehouse relies heavily on a lightweight data-structure server that resides entirely in RAM and writes to disk asynchronously, called redis. The project itself is only about two years old, yet there's already a redis client for R on CRAN (by B.W. Lewis, version 1.6.1 as of this post); a minimal sketch follows this list.
Primary Data Analysis. The purpose of this project was to build a library for our Network Ops guys to use. My goal was a "one command = one data view" type of interface. So, for instance, I used the excellent googleVis package to create professional-looking scrollable/paginated HTML tables with sortable columns, into which I loaded a data frame of aggregated data (>5,000 lines). Just those few interactive elements, e.g. sorting a column, delivered useful descriptive analytics. Another example: I wrote a lot of thin wrappers over some basic data-juggling and table-like functions; each of these functions I would, for instance, bind to a clickable button on a tabbed web page. Again, this was a pleasure to do in R, in part because quite often the function required no wrapper; the single command with the arguments supplied was enough to generate a useful view of the data.
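As for the redis client mentioned in the IO bullet above, a minimal sketch with the rredis package (the key name and object are hypothetical):
library(rredis)
redisConnect()                                  # defaults to localhost:6379
redisSet("errlog:2015-04-01", parsed_errors)    # persist any R object under a key
parsed_errors <- redisGet("errlog:2015-04-01")  # ...and pull it back later
redisClose()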
A couple of examples of the last bullet:
# what are the most common issues that cause an error to be logged?
err_order = function(df){
    t0 = xtabs(~Issue_Descr, df)
    m = cbind(names(t0), t0)
    rownames(m) = NULL
    colnames(m) = c("Cause", "Count")
    x = m[,2]
    x = as.numeric(x)
    ndx = order(x, decreasing=T)
    m = m[ndx,]
    m1 = data.frame(Cause = m[,1], Count = as.numeric(m[,2]),
                    CountAsProp = 100*as.numeric(m[,2])/dim(df)[1])
    subset(m1, CountAsProp >= 1.)
}
# calling this function, passing in a data frame, returns something like:
                            Cause Count CountAsProp
1 'connect to unix://var/ failed'   200        40.0
2  'object buffered to temp file'   185        37.0
3           'connection refused'     94        18.8
The primary data cube, displayed for interactive analysis using googleVis: a contingency table (from an xtabs function call) rendered as a googleVis table.
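And a short sketch of the serialization bullet above (the object and file names are hypothetical):
library(data.table)
errors <- data.table(parsed_errors)            # parser output as a data.table
save(errors, file = "errors.RData",
     compress = "xz", compression_level = 9)   # pushes the on-disk size down further
load("errors.RData")                           # restores 'errors' in the workspace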
It is in fact an excellent idea. R also has very good date/time capabilities, can do cluster analysis or use any variety of machine-learning algorithms, has three different regexp engines for parsing, etc.
And it may not be a novel idea. A few years ago I was in brief email contact with someone using R for proactive (rather than reactive) logfile analysis: read the logs, (in their case) build time-series models, predict hot spots. That is so obviously a good idea. It was one of the Department of Energy labs, but I no longer have a URL. Even outside of temporal patterns there is a lot one could do here.
I have used R to load and parse IIS log files with some success; here is my code.
Load IIS Log files
require(data.table)
setwd("Log File Directory")
# get a list of all the log files
log_files <- Sys.glob("*.log")
# This line
# 1) reads each log file
# 2) concatenates them
IIS <- do.call( "rbind", lapply( log_files, read.csv, sep = " ", header = FALSE, comment.char = "#", na.strings = "-" ) )
# Add field names - copy the "#Fields:" header line from one of the log files
colnames(IIS) <- c("date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query", "s_port", "cs_username", "c_ip", "cs_User_Agent", "sc_status", "sc_substatus", "sc_win32_status", "sc_bytes", "cs_bytes", "time-taken")
#Change it to a data.table
IIS <- data.table( IIS )
#Query at will
IIS[, .N, by = list(sc_status,cs_username, cs_uri_stem,sc_win32_status) ]
I did a logfile analysis recently using R. It was not a really complex thing, mostly descriptive tables, and R's built-in functions were sufficient for the job.
The problem was the data storage, as my logfiles were about 10 GB. Revolution R does offer new methods to handle such big data, but in the end I decided to use a MySQL database as a backend (which in fact reduced the size to 2 GB through normalization).
That could also solve your problem of reading logfiles into R.
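A rough sketch of that backend approach, assuming the parsed logs already live in a MySQL database (all of the names here are hypothetical):
library(DBI)
library(RMySQL)

con <- dbConnect(RMySQL::MySQL(), dbname = "weblogs",
                 host = "localhost", user = "analyst", password = "...")

# pull only the aggregate you need into R instead of the full raw logs
status_counts <- dbGetQuery(con,
    "SELECT sc_status, COUNT(*) AS n FROM access_log GROUP BY sc_status")

dbDisconnect(con)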
#!python
import argparse
import csv
import cStringIO as StringIO

class OurDialect:
    escapechar = ','
    delimiter = ' '
    quoting = csv.QUOTE_NONE

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--source', type=str, dest='line', default=[['''54.67.81.141 - - [01/Apr/2015:13:39:22 +0000] "GET / HTTP/1.1" 502 173 "-" "curl/7.41.0" "-"'''], ['''54.67.81.141 - - [01/Apr/2015:13:39:22 +0000] "GET / HTTP/1.1" 502 173 "-" "curl/7.41.0" "-"''']])
arguments = parser.parse_args()

# Read the log file if a path was given; otherwise fall back to the built-in demo lines.
try:
    with open(arguments.line, 'r') as fin:
        line = [fin.readlines()]
except (TypeError, IOError):
    line = arguments.line

header = ['IP', 'Ident', 'User', 'Timestamp', 'Offset', 'HTTP Verb', 'HTTP Endpoint', 'HTTP Version', 'HTTP Return code', 'Size in bytes', 'User-Agent']

# Strip the brackets and quotes so the remaining tokens split cleanly on spaces.
lines = [[l[:-1].replace('[', '"').replace(']', '"').replace('"', '') for l in l1] for l1 in line]

out = StringIO.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer = csv.writer(out, dialect=OurDialect)
writer.writerows([[l1 for l1 in l] for l in lines])

print(out.getvalue())
Demo output:
IP,Ident,User,Timestamp,Offset,HTTP Verb,HTTP Endpoint,HTTP Version,HTTP Return code,Size in bytes,User-Agent
54.67.81.141, -, -, 01/Apr/2015:13:39:22, +0000, GET, /, HTTP/1.1, 502, 173, -, curl/7.41.0, -
54.67.81.141, -, -, 01/Apr/2015:13:39:22, +0000, GET, /, HTTP/1.1, 502, 173, -, curl/7.41.0, -
This format can easily be read into R using read.csv. And, it doesn't require any 3rd party libraries.
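For example, assuming the parser's output was redirected to a file (the file name is hypothetical):
logs <- read.csv("parsed_access_log.csv", stringsAsFactors = FALSE)
str(logs)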
