Googleway timeout - R

I'm having trouble with the googleway package in R.
I am attempting to get driving distances for 159,000 records.
I am using a paid Google Cloud account and have set all quotas to unlimited.
I've attempted to use server keys and browser keys.
After multiple attempts, the service returns a timeout message:
Error in open.connection(con, "rb") : Timeout was reached
The number of results successfully returned before the timeout varied by attempt:
1) x ~ 5,000   2) x ~ 7,000   3) x ~ 3,000   4) x ~ 12,000
All tried on different days.
As you can see, none of these are anywhere near the 100,000/day quota.
We've checked firewall rules and made sure that the cause of the time out is not at our end.
For some reason the Google API service is cutting off the requests.
We have had no response from Google; we are currently on the bronze support package, so we don't get any real support from them as a matter of course.
The creator of the googleway package is certain that there are no impediments coming from the package.
We're hoping someone out there may know why this is happening and how we could avoid it, so that we can run the distance matrix over our full list of addresses.
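To make the scale concrete, the kind of chunked, throttled run we are aiming for looks roughly like the sketch below. This is illustrative only, not our production script: the API key, the example addresses, the chunk size of 25 and the one-second pause are placeholders, and google_distance() argument handling may differ slightly between googleway versions.

# Illustrative sketch: split the origin addresses into small chunks,
# send one Distance Matrix request per chunk, and pause between chunks.
library(googleway)

key        <- "YOUR_API_KEY"                                   # placeholder
origins    <- c("10 Downing Street, London",
                "221B Baker Street, London")                   # in practice: the full ~159,000-address vector
dest       <- "Trafalgar Square, London"                       # placeholder destination
chunk_size <- 25                                               # stay within the per-request element limit

chunks  <- split(origins, ceiling(seq_along(origins) / chunk_size))
results <- vector("list", length(chunks))

for (i in seq_along(chunks)) {
  results[[i]] <- tryCatch(
    google_distance(origins      = chunks[[i]],
                    destinations = dest,
                    mode         = "driving",
                    key          = key),
    error = function(e) {
      message("chunk ", i, " failed: ", conditionMessage(e))
      NULL
    }
  )
  Sys.sleep(1)   # keep well under the per-second limit
}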
Using R version "Supposedly Educational".
Using the googleway package.
Environment:
CHARSET = cp1252
DISPLAY = :0
FP_NO_HOST_CHECK = NO
GFORTRAN_STDERR_UNIT = -1
GFORTRAN_STDOUT_UNIT = -1
NUMBER_OF_PROCESSORS = 4
OS = Windows_NT
PROCESSOR_ARCHITECTURE = AMD64
PROCESSOR_IDENTIFIER = Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
PROCESSOR_LEVEL = 6
PROCESSOR_REVISION = 3c03
R_ARCH = /x64
R_COMPILED_BY = gcc 4.9.3
RS_LOCAL_PEER = \\.\pipe\37894-rsession
RSTUDIO = 1
RSTUDIO_SESSION_PORT = 37894

I have developed a different implementation connecting Google Maps and R:
install.packages("gmapsdistance")
You can try this one. However, take into account that, in addition to the daily limits, there are limits on the number of queries even if you have the premium account (625 per request, 1,000 per second on the server side, etc.):
https://developers.google.com/maps/documentation/distance-matrix/usage-limits I think this might be the issue.
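For example, a single query looks roughly like this (the API key is a placeholder, and argument names are worth checking against the documentation of the installed version):

# Rough sketch of one gmapsdistance query; YOUR_API_KEY is a placeholder.
library(gmapsdistance)

set.api.key("YOUR_API_KEY")

res <- gmapsdistance(origin      = "Washington+DC",
                     destination = "New+York+City+NY",
                     mode        = "driving")

res$Time      # travel time in seconds
res$Distance  # distance in metres
res$Status    # "OK" when the query succeeded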

Related

bokeh sample data download fail with 'HTTPError: HTTP Error 403: Forbidden'

When trying to download the bokeh sample data following the instructions at 'https://docs.bokeh.org/en/latest/docs/installation.html#sample-data', it fails with HTTP Error 403: Forbidden.
In the conda prompt:
bokeh sampledata    (failed)
In a Jupyter notebook:
import bokeh.sampledata
bokeh.sampledata.download()    (failed)
TLDR; you will either need to upgrade to Bokeh version 1.3 or later, or else you can manually edit the bokeh.util.sampledata module to use the new CDN location http://sampledata.bokeh.org. You can see the exact change to make in PR #9075
The bokeh.sampledata module originally pulled data directly from a public AWS S3 bucket location hardcoded in the module. This was a poor choice that left open the possibility for abuse, and in late 2019 an incident finally happened where someone (intentionally or unintentionally) downloaded the entire dataset tens of thousands of times over a three day period, incurring a significant monetary cost. (Fortunately, AWS saw fit to award us a credit to cover this anomaly.) Starting in version 1.3 sample data is now only accessed from a proper CDN with much better cost structures. All public direct access to the original S3 bucket was removed. This change had the unfortunate effect of immediately breaking bokeh.sampledata for all previous Bokeh versions, however as an open-source project we simply cannot afford the real (and potentially unlimited) financial risk exposure.

Why isn't Carbon writing Whisper data points as per updated storage-schema retention?

My original carbon storage-schema config was set to 10s:1w, 60s:1y and was working fine for months. I've recently updated it to 1s:7d, 10s:30d, 60s:1y. I've resized all my whisper files to reflect the new retention schema using the following bit of bash:
collectd_dir="/opt/graphite/storage/whisper/collectd/"
retention="1s:7d 1m:30d 15m:1y"
# Resize every whisper file to the new retention in parallel;
# $retention intentionally expands to multiple schema arguments.
find "$collectd_dir" -type f -name '*.wsp' | parallel whisper-resize.py --nobackup {} $retention
I've confirmed that they've been updated using whisper-info.py with the correct retention and data points. I've also confirmed that the storage-schema is valid using a storage-schema validation script.
The carbon-cache{1..8}, carbon-relay, carbon-aggregator, and collectd services have been stopped before the whisper resizing, then started once the resizing was complete.
However, when checking a Grafana dashboard, I'm seeing empty graphs on the collectd plugin charts (the per-second data points are defined, but no data comes through), while the graphs that do provide data show points every 10s (the old retention) instead of every 1s.
The /var/log/carbon/console.log is looking good, and the collectd whisper files all have carbon user access, so no permission denied issues when writing.
When running an ngrep on port 2003 on the graphite host, I'm seeing connections to the relay, along with metrics being sent. Those metrics are then getting relayed to a pool of 8 caches to their pickle port.
Has anyone else experienced similar issues, or can possibly help me diagnose the issue further? Have I missed something here?
It took me a little while to figure this out. It had nothing to do with the local_settings.py file, as some of the older answers suggest; it had to do with the Interval setting in collectd.conf.
A lot of the older answers mention that you need to include 'Interval 1' inside each Plugin block. That would have been nice for the per-metric control it gives, but it created config errors in my logs and broke the metrics. Setting 'Interval 1' at the top level of the config resolved my issues.
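In other words, the working configuration sets the interval once, globally, along these lines (the plugins loaded here are just examples):

# collectd.conf (fragment)
# Global polling interval: every plugin reports once per second
# unless it explicitly overrides this.
Interval 1

LoadPlugin cpu
LoadPlugin interface

# No per-plugin Interval overrides are needed; plugins inherit the global value.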

How to display the logged information on an aerospike server as a graph?

I would like to log some stats on some aerospike nodes and analyse them.
I found that aerospike comes with this tool called asgraphite, which seems to be using a forked version of the
The integration guide for asgraphite mentions some commands which are supposed to, e.g., start logging. I can already run the following command on my node and see the expected output, so it looks like I am all set to start logging:
python /opt/aerospike/bin/asgraphite --help
(By the way, we are running the community edition, which, it seems, does not provide historical latency stats in the AMC dashboard.)
However, I don't see any information on how to monitor the data logged this way. I am expecting the kind of web interface that graphite usually provides.

streamR, filterStream ends instantly

I am new to R and apologize in advance if this is a very simple issue. I want to make use of the streamR package; however, I receive the following output when I execute the filterStream function:
Capturing tweets...
Connection to Twitter stream was closed after 0 seconds with up to 1 tweets downloaded.
I am wondering if I am missing a step during authentication. I am able to successfully use the twitteR package and obtain tweets through the searchTwitter function. Is there something more that I need in order to gain access to the streaming api?
library("ROAuth")
library("twitteR")
library("streamR")
library("RCurl")
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
cred <- OAuthFactory$new(consumerKey="xxxxxyyyyyzzzzzz",
consumerSecret="xxxxxxyyyyyzzzzzzz111111222222"',
requestURL='https://api.twitter.com/oauth/request_token',
accessURL='https://api.twitter.com/oauth/access_token',
authURL='https://api.twitter.com/oauth/authorize')
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl") )
save(cred, file="twitter authentication.Rdata")
registerTwitterOAuth(cred)
scoring<- searchTwitter("Landon Donovan", n=100, cainfo="cacert.pem")
filterStream( file.name="tweets_rstats.json",track="Landon Donovan", tweets=10, oauth=cred)
Your credentials look fine. There are a few possible explanations I've encountered while working in R:
1. If you are trying to run this on more than one computer at a time (or someone else is using your API keys), it won't work, because you are only allowed one connection per API key. The same goes for attempting to run it in two different projects.
2. It's possible that the internet connection won't allow access to the API to do the scraping. On certain days or in certain locations, my code won't run even after establishing a connection. I'm not sure whether this is down to certain controls on the connection, but some days it runs on client sites and other days it won't.
3. It seems that if you run a pull and then run another one immediately after, the "closed after 0 seconds" message appears. I don't think you're abusing the rate limit, but there may be a wait period before you can stream again.
4. A final suggestion would be to make sure you have "Read, Write, and Access Direct Messages" allowed in your application preferences. This may not make the difference, but then you aren't limiting any connections.
Other than these, there aren't any "quick fixes" I've encountered; for me it's usually just a matter of waiting, so when I run a pull I make sure to capture a large number of tweets or capture for a few hours at a time.
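That last approach can be done with filterStream's timeout argument, capturing for a fixed window rather than a fixed tweet count. A minimal sketch (the one-hour window and file name are arbitrary, and cred is the OAuth handle created above):

# Capture everything matching the track term for one hour (3600 seconds)
# instead of stopping after a fixed number of tweets.
library(streamR)

filterStream(file.name = "tweets_rstats.json",
             track     = "Landon Donovan",
             timeout   = 3600,
             oauth     = cred)

tweets <- parseTweets("tweets_rstats.json")   # parse the captured JSON into a data frame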

Indefinite Provisioning of EMR Cluster with Segue in R

I am trying to use the R package called Segue by JD Long, which is lauded by a book I read called "Parallel R" as the ultimate in simplicity for using R with AWS.
However, for the 2nd day in a row I've run into a problem where I initiate the creation of a cluster and it just says STARTING indefinitely.
I tried this on OS X and in Linux with clusters of sizes 2, 6, 10, 20, and 25. I let them all run for at least 6 hours. I have no problem starting a cluster in the AWS EMR Management Console, though I have no clue how to connect Segue/R to a cluster that was started in the Management Console instead of via createCluster().
So my question is: is there either some way to troubleshoot the provisioning of the cluster, or a way to bypass the problem by creating the cluster manually and somehow getting Segue to work with it?
Here's an example of what I'm seeing:
library(segue)
Loading required package: rJava
Loading required package: caTools
Segue did not find your AWS credentials. Please run the setCredentials() function.
setCredentials("xxx", "xxx")
emr.handle <- createCluster(numInstances=10)
STARTING - 2013-07-12 10:36:44
STARTING - 2013-07-12 10:37:15
STARTING - 2013-07-12 10:37:46
STARTING - 2013-07-12 10:38:17
.... this goes on for hours and hours and hours...
UPDATE: After 36 hours and many failed attempts, this began working (seemingly at random) when I tried it with 1 node. I then tried it with 10 nodes and it worked great. To my knowledge nothing changed locally or on AWS...
I am answering my own question on behalf of the AWS support rep, who gave me the following belated explanation:
The problem with the EMR creation is with the Availability Zone specified (us-east-1c); this availability zone is now constrained and doesn't allow the creation of new instances, so the job was trying to create the instances in an infinite loop.
You can see information about constrained AZ here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones
"As Availability Zones grow over time, our ability to expand them can become constrained. If this happens, we might restrict you from launching an instance in a constrained Availability Zone unless you already have an instance in that Availability Zone. Eventually, we might also remove the constrained Availability Zone from the list of Availability Zones for new customers. Therefore, your account might have a different number of available Availability Zones in a region than another account."
So you need to specify another AZ, or, what I recommend, don't specify any AZ at all, so that EMR is able to select any available one.
I found this thread: https://groups.google.com/forum/#!topic/segue-r/GBd15jsFXkY
on Google Groups, where the topic of availability zones came up before. The zone that was set as the new default in that thread was the zone causing problems for me. I am attempting to edit the source of Segue.
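That said, if the installed version of Segue exposes the availability zone as an argument to createCluster() (it appears to be called location in the versions I've looked at, but check formals(createCluster) to be sure), the default can be overridden without editing the source. A rough sketch, under that assumption:

# Assumption: createCluster() accepts a `location` argument naming the
# availability zone; verify with formals(createCluster) in your install.
library(segue)
setCredentials("xxx", "xxx")

emr.handle <- createCluster(numInstances = 10,
                            location     = "us-east-1b")   # any non-constrained AZ

# ... run emrlapply() jobs against emr.handle ...

stopCluster(emr.handle)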
Jason, I'm the author of Segue so maybe I can help.
Please look under the details section in the lower part of the AWS console and see if you can determine if the bootstrap sequences completed. This is an odd problem because typically an error at this stage is pervasive across all users. However I can't reproduce this one.
