streamR: filterStream ends instantly

I am new to R and apologize in advance if this is a very simple issue. I want to make use of the package streamR, however I receive the following output when I execute the filterStream function:
Capturing tweets...
Connection to Twitter stream was closed after 0 seconds with up to 1 tweets downloaded.
I am wondering if I am missing a step during authentication. I am able to successfully use the twitteR package and obtain tweets through the searchTwitter function. Is there something more that I need in order to gain access to the streaming API?
library("ROAuth")
library("twitteR")
library("streamR")
library("RCurl")
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
cred <- OAuthFactory$new(consumerKey = "xxxxxyyyyyzzzzzz",
                         consumerSecret = "xxxxxxyyyyyzzzzzzz111111222222",
                         requestURL = "https://api.twitter.com/oauth/request_token",
                         accessURL = "https://api.twitter.com/oauth/access_token",
                         authURL = "https://api.twitter.com/oauth/authorize")
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl") )
save(cred, file="twitter authentication.Rdata")
registerTwitterOAuth(cred)
scoring<- searchTwitter("Landon Donovan", n=100, cainfo="cacert.pem")
filterStream( file.name="tweets_rstats.json",track="Landon Donovan", tweets=10, oauth=cred)

Your credentials look fine. There are a few possible causes I've encountered while working within R:
1. If you are trying to run this on more than one computer at a time (or someone else is using your API keys), it won't work, because you are only allowed one streaming connection per API key. The same goes for attempting to run it in two different projects.
2. It's possible that the internet connection won't allow access to the API to do the scraping. On certain days or in certain locations, my code won't run even after establishing a connection. I'm not sure whether this is due to controls on the connection or something else, but some days it runs on client sites and others it doesn't.
3. If you run a pull and then run another one immediately afterwards, the "0 seconds / 1 tweet" message tends to appear. I don't think you're abusing the rate limit, but there may be a wait period before you can stream again.
4. Finally, make sure you have "Read, Write, and Access Direct Messages" allowed in your Application Preferences. This may not make the difference, but at least then you aren't limiting any connections.
Other than these, there aren't any "quick fixes" I've encountered; for me it's usually just a matter of waiting, so when I run a pull I make sure to capture a large number of tweets or capture for a few hours at a time (see the sketch below).
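To illustrate that last point, here is a minimal sketch of a longer capture, reusing the cred object from the question; the 30-minute timeout is an arbitrary choice on my side, not a streamR recommendation.
library(streamR)

# Capture for a fixed time window instead of a fixed number of tweets;
# timeout is in seconds, so 1800 = 30 minutes.
filterStream(file.name = "tweets_rstats.json",
             track = "Landon Donovan",
             timeout = 1800,
             oauth = cred)

# Parse the captured JSON into a data frame for analysis.
tweets.df <- parseTweets("tweets_rstats.json")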

Related

gmailr credentials randomly (?) need re-authentication

I'm using gmailr in an automatic R script to send out some emails. It's been working fine for about a month and a half, but recently it failed with the following error:
Error: Can't get Google credentials.
Are you running gmailr in a non-interactive session? Consider:
* Call `gm_auth()` directly with all necessary specifics.
Execution halted
My code, which hasn't changed, is
library(gmailr)
options(gargle_oauth_email = TRUE)
gm_auth_configure(path = "data/credentials.json")
gm_auth(email = TRUE, cache = ".secret")
and is run non-interactively (there is only one token in the .secret folder). When I then ran it interactively, it "did the dance" and opened the authorization prompt in the browser, which I confirmed, and now everything is running fine again.
The problem is that I don't understand why the credentials suddenly required re-authentication, or how I can prevent the script from failing like this in the future.
You can try cleaning the cache in the gargle folder and then creating a new token. That worked for me when I had a similar problem (a sketch follows after the link below):
gm_auth function with gargle_oauth_cache stop working
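A minimal sketch of that cache cleanup, assuming the same credentials.json path and .secret cache folder used in the question:
library(gmailr)

# See which OAuth tokens gargle currently has cached.
gargle::gargle_oauth_sitrep()

# Delete the stale project-local cache, then redo the browser flow once,
# interactively, so a fresh token is written to .secret.
unlink(".secret", recursive = TRUE)
gm_auth_configure(path = "data/credentials.json")
gm_auth(email = TRUE, cache = ".secret")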

Authenticating Google Cloud Storage in R Studio

I know a similar question has been asked (link), but the response didn't work for me.
TLDR: I keep running into errors when trying to authenticate Google Cloud Storage in RStudio. I'm not sure what is going wrong and would love advice.
I have downloaded both the GCS_AUTH_FILE (created a service account with service admin privileges and downloaded the key associated with the service account) and the GAR_CLIENT_WEB_JSON (created an OAuth 2.0 Client ID and downloaded the associated JSON file).
I've tried authenticating my Google Cloud Storage in several ways and hit different errors.
Way 1 - automatic setup:
gcs_setup()
Then I select any one of the options and get the following error, no matter which of the three options I select:
Error in if (file.exists(local_file)) { : argument is of length zero
Way 2 - basic, following manual setup instructions from the package:
Sys.setenv("GCS_DEFAULT_BUCKET" = "my-default-bucket",
"GCS_AUTH_FILE" = "/fullpath/to/service-auth.json")
gcs_auth()
In this case, GCS_AUTH_FILE is the file that I mentioned at the beginning of this post, and GCS_DEFAULT_BUCKET is the name of the bucket. When I run the first line it seems to work (nothing goes awry and it runs just fine), but when I run gcs_auth() I get taken to a web browser page that states:
"Authorization Error
Error 400: invalid_request
Missing required parameter: client_id"
Way 3 - following the method from the post that I linked above:
This way involves manually setting the .Renviron file with the GCS_AUTH_FILE and GAR_CLIENT_WEB_JSON locations and then running gar_auth() (a sketch follows at the end of this post). And yet again, I get the exact same error as in Way 2.
Any ideas about what could be going wrong? Thanks for your help. I wasn't sure how to put in totally reproducible code in this case, so if there is a way I should do that, please let me know.
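For reference, this is roughly what Way 3 looks like on my end; the file paths below are placeholders, not my real paths:
# .Renviron (restart R after editing so the variables are picked up):
#   GCS_AUTH_FILE="/fullpath/to/service-auth.json"
#   GAR_CLIENT_WEB_JSON="/fullpath/to/oauth-client-web.json"

library(googleAuthR)
library(googleCloudStorageR)

# This is the call that opens the browser and shows the
# "Error 400: invalid_request / Missing required parameter: client_id" page.
gar_auth()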

How to solve a data source error when loading Google Analytics data in Power BI?

I would like to load data from Google Analytics into Power BI.
After transforming the data in the Query Editor, I apply the changes.
At first, I see the message 'Waiting for www.googleapis.com' and the number of rows increases.
After a while, I get the following error message:
Failed to save modifications to the server. Error returned: 'OLE DB or ODBC error: [DataSource.Error] There was an internal error..'
Rows with errors have been removed in one of the steps and I have a stable Internet connection.
Does anyone have suggestions on how to solve this?
I was also facing this kind of refresh issue. First, go to the Query Editor and verify the data types, changing them if needed. If you still see the error after that, keep app.powerbi.com open while refreshing your Power BI dashboard. I followed these steps and my issue is now resolved.

bokeh sample data download fail with 'HTTPError: HTTP Error 403: Forbidden'

When trying to download the bokeh sample data following the instructions at https://docs.bokeh.org/en/latest/docs/installation.html#sample-data, it fails with HTTP Error 403: Forbidden.
In the conda prompt:
bokeh sampledata (failed)
In a Jupyter notebook:
import bokeh.sampledata
bokeh.sampledata.download() (failed)
TL;DR: you will either need to upgrade to Bokeh version 1.3 or later, or else manually edit the bokeh.util.sampledata module to use the new CDN location http://sampledata.bokeh.org. You can see the exact change to make in PR #9075.
The bokeh.sampledata module originally pulled data directly from a public AWS S3 bucket location hardcoded in the module. This was a poor choice that left open the possibility for abuse, and in late 2019 an incident finally happened where someone (intentionally or unintentionally) downloaded the entire dataset tens of thousands of times over a three day period, incurring a significant monetary cost. (Fortunately, AWS saw fit to award us a credit to cover this anomaly.) Starting in version 1.3 sample data is now only accessed from a proper CDN with much better cost structures. All public direct access to the original S3 bucket was removed. This change had the unfortunate effect of immediately breaking bokeh.sampledata for all previous Bokeh versions, however as an open-source project we simply cannot afford the real (and potentially unlimited) financial risk exposure.

How to resolve celery.backends.rpc.BacklogLimitExceeded error

I am using Celery with Flask. After working for a good long while, my Celery setup is now showing a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.
Possibly because the process consuming the results is not keeping up with the process that is producing them? This can result in a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, BacklogLimitExceeded is raised by Celery.
You could try adding more consumers to process the results, or setting a shorter value for the result_expires setting.
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all since it creates one queue per task, which is required to mimic the database-based backends where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent, multi-consumer results, you should store them in a database.
Using rabbitmq as a broker and redis for results is a great combination, but using an SQL database for results works well too.
