Twitter API limit on live streaming tweets through the "rtweet" R package

I am currently live streaming tweets via the stream_tweets() command from the "rtweet" package, based on a predefined query. My only concern is whether I am subject to some sort of limit from Twitter's API?
Note that I am a beginner with regard to APIs, so this question may be quite foolish.
Thank you

Update: The stream_tweets section of the documentation references the link below, which states that you have a rate limit of 10 requests per 60 seconds.
https://developer.twitter.com/en/docs/twitter-api/enterprise/decahose-api/api-reference/decahose
Original: The rtweet documentation says the rate limit for the standard search API is 18,000 tweets per fifteen minutes. This is on page 5, in the bearer_token function section.
https://cran.r-project.org/web/packages/rtweet/rtweet.pdf
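For reference, here is a minimal sketch of both calls discussed above, assuming rtweet 0.7.x argument names (the query string and timeout are placeholders):

library(rtweet)

# Stream live tweets matching a query for 60 seconds; the streaming
# endpoint delivers results continuously rather than via paged requests.
live <- stream_tweets(q = "rstats", timeout = 60)

# The standard search API is capped at 18,000 tweets per 15 minutes;
# retryonratelimit = TRUE tells rtweet to wait and resume if the cap is hit.
past <- search_tweets(q = "rstats", n = 18000, retryonratelimit = TRUE)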

Related

Pricing for specific Googleway commands in RStudio

I am currently using the Google Places API on a free trial. I am interested in paying for the API but can't find the exact cost of the two commands I use: google_places() and google_place_details(). I have contacted the Google sales team and looked at the places and billing URL, but I have not managed to find out exactly how much it would cost to execute these two commands.
For google_places(), this is an example of a command I would execute:
google_places(search_string = "Cafeteria in Madrid, Spain", key = key)
From the places and billing URL, it seems like this counts as a Text Search, so each time the code is executed it would cost $0.032. Is this the case?
For google_place_details(), here is an example of the command I would execute:
google_place_details(place_id = "ChIJf_XA-F0U04kR1IPYSdTJ4so", key = key)
This command, as well as giving basic place details (which cost $0.017 according to the billing URL), returns information that counts as contact data (an extra $0.003) and atmosphere data (an extra $0.005). It also provides photo data ($0.007 according to the billing URL), which I am not interested in but which is automatically included in the results anyway. Does this mean that the cost of executing this command once is these four prices summed up?
I am interested in knowing exactly how much it would cost to execute the two commands I have listed.
Probably this helps:
First of all, you are billed monthly after you exceed the 200 euros/dollars that Google gives you for free (this is probably what you described as the "free trial"). So after every month you get a bill showing how many requests of each function you sent to Google. There everything is written quite clearly, including the amount and price of each "unit", so you can easily divide it out.
The second option would be your Google API Cockpit.
It tracks your requests quite precisely, on different time bases. So sending each of your commands exactly once in a day will give you an exact total price.
The Cockpit is super handy for different things. If you want, you can even set limits, which is probably helpful in your case too.
Here is the link to the billing monitor as well: Billing Google API Cockpit
There is also a description of how Google charges you; look here.
Best regards
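As an aside, if you want a rough running total before the monthly bill arrives, one option is to wrap the googleway calls with a local counter. Below is a minimal sketch; the per-request prices are the figures quoted in the question above, not authoritative numbers, so adjust them to whatever your billing page shows.

library(googleway)

# Assumed prices per request in USD, taken from the question above.
price_text_search   <- 0.032
price_place_details <- 0.017 + 0.003 + 0.005 + 0.007  # basic + contact + atmosphere + photo

n_text_search   <- 0
n_place_details <- 0

tracked_places <- function(...) {
  n_text_search <<- n_text_search + 1  # count one Text Search request
  google_places(...)
}

tracked_details <- function(...) {
  n_place_details <<- n_place_details + 1  # count one Place Details request
  google_place_details(...)
}

estimated_cost <- function() {
  n_text_search * price_text_search + n_place_details * price_place_details
}

After a session, estimated_cost() gives a ballpark figure; the API Cockpit remains the source of truth.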

How to determine how many free google distance queries are left on my account?

I'm pulling distance/time information for a large number of origin/destination pairs using the Google Maps API in R. I'm currently using the gmapsdistance package but have looked at a few others.
My premium API key includes 100k free queries per day. Are there any packages that can return how many are remaining? For example, the ggmap package has geocodeQueryCheck(). The problem is I don't think this function actually returns the number remaining on your account; it doesn't ask for your API key. My guess is that it just keeps track of how many calls it has made today. The latest GitHub version has a register_google() function that does allow you to set your API key, but when I make API requests with the gmapsdistance package, geocodeQueryCheck() doesn't update.
In summary, I just want to know how many queries are left, even if I need to construct the URL request directly. When I look at the API documentation, I don't even see URL calls for this, which doesn't give me much hope.
As confirmed by @SymbolixAU, there is currently no way to do this.
Sorry, I guess this is late, but have you tried this?
sum(.GoogleDistQueryCount$elements)
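Since the API exposes no remaining-quota endpoint, the pragmatic workaround is the same idea behind .GoogleDistQueryCount: keep your own counter. A minimal sketch around gmapsdistance, where the 100,000 daily quota is the figure from the question rather than a universal limit:

library(gmapsdistance)

daily_quota  <- 100000  # free queries per day on the premium plan, per the question
queries_used <- 0

tracked_distance <- function(...) {
  queries_used <<- queries_used + 1  # count each request as it is made
  gmapsdistance(...)
}

queries_left <- function() daily_quota - queries_used

This only tracks requests made in the current R session, so it resets when the session does, but it avoids guessing at an undocumented endpoint.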

How to extract Google Analytics historical data using APIs. Pros and cons?

I'm doing a quick proof of concept to understand the procedure for extracting historical data from Google Analytics, to be used later for offline data stitching to generate a holistic view of the data and its analysis. I have not found any detailed online documentation explaining the pros and cons.
I would like to know about any limitations on:
The time period for which data can be extracted, or any limit on the maximum number of calendar days?
Whether all dimensions/metrics can be extracted or any specific ones?
Will the data be real-time or sampled?
Can all data be pulled into a single table or separate ones?
Will it be available for both the free and premium versions?
The time period for which data can be extracted, or any limit on the maximum number of calendar days?
The start date cannot be before the launch of Google Analytics on 2005-01-01. Due to processing lag, extracting data that is less than two days old can result in incomplete data. I recommend checking the isDataGolden flag on the response.
Requesting large date ranges can result in sampling, which cannot be prevented. It's best to request the data in small chunks, as in the sketch below.
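One way to request the data in small chunks is to split the date range and issue one request per chunk. A sketch follows; fetch_ga() is a hypothetical placeholder for whatever reporting call you use, not a real API function.

# Split a date range into week-long chunks so each request stays small
# enough to reduce the chance of sampling.
chunked_dates <- function(start, end, by = "7 days") {
  starts <- seq(as.Date(start), as.Date(end), by = by)
  ends   <- pmin(starts + 6, as.Date(end))
  data.frame(start = starts, end = ends)
}

chunks <- chunked_dates("2023-01-01", "2023-03-31")

# fetch_ga() is a placeholder for your actual reporting call.
results <- lapply(seq_len(nrow(chunks)), function(i) {
  fetch_ga(start_date = chunks$start[i], end_date = chunks$end[i])
})
all_data <- do.call(rbind, results)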
Whether all dimensions/metrics can be extracted or any specific ones?
A list of the dimensions and metrics you can extract can be found here. Each request can contain a maximum of 7 dimensions and 10 metrics.
Will the data be real-time or sampled?
The Real Time API and the Reporting API are two different APIs. The Real Time API is not, to my knowledge, sampled, but since it only covers about five minutes of data, I find it hard to imagine anyone but a really big website hitting this problem even if it is.
Will it be available for both the free and premium versions?
Accessing the Google Analytics APIs is free; there is no charge. There are, however, limits on how much data you can extract in a given day.
By default your application can make a maximum of 50,000 requests per day. This can be extended.
Each view you are extracting from can receive a maximum of 10,000 requests per day. This cannot be extended.
See: limits and quotas for more info.
Note: I am a developer on a business intelligence application that extracts Google Analytics data, and I can tell you that it's definitely doable.

twitteR r package: How to get as many tweets as possible per account within API limits

I'm a novice R and twitteR package user, and I wasn't able to find a strong recommendation on how to accomplish the following.
I'd like to mine a small number of Twitter accounts to identify their output for keyword usage (i.e. I don't know what the keywords are yet).
Assumptions:
I have a small number of Twitter accounts (<6) I want to mine, with a maximum of 7,000 tweets if you aggregate the various accounts' statuses.
Those accounts are not generating new tweets at a fast rate (a few a day).
The accounts all have fewer than 3,200 tweets according to the profile data returned by lookupUsers().
When I use the twitteR function userTimeline("accountname", n=3200) I get between 40 and 600 observations back, i.e. nowhere near the 3,200. I know there are API limits, but if this were a rate-limit issue I would expect to get the same number of observations back each time, or a notice that I need to wait 15 minutes.
How do I get all the text I need while still playing nice?
By using a combination of CRAN and GitHub packages, it was possible to get all the tweets for a user.
The packages used were streamR, available on CRAN, and https://github.com/SMAPPNYU/smappR/ to help with getting the tweets and analysing them.
The basic steps are:
Authenticate to Twitter using OAuth and your Twitter keys, tokens, and secrets.
Use the smappR function getTimeline(), which saves the tweets to a JSON file you specify.
Use parseTweets(jsonfile) to read the JSON contents into a data frame.
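Put together, the flow looks roughly like this. The exact argument names of getTimeline() may differ between smappR versions, so treat this as a sketch of the steps rather than copy-paste code; parseTweets() is from streamR.

library(streamR)  # CRAN; provides parseTweets()
library(smappR)   # GitHub: SMAPPNYU/smappR; provides getTimeline()

# Step 1: authenticate with OAuth; smappR reads saved credentials from a
# folder (the argument name below is an assumption).
# Step 2: save up to 3,200 of the user's tweets to a JSON file.
getTimeline(filename = "user_tweets.json",
            screen_name = "accountname",
            n = 3200,
            oauth = "~/credentials/")

# Step 3: read the JSON file into a data frame.
tweets <- parseTweets("user_tweets.json")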
This can be accomplished with the rtweet package, which is still supported. First you need to be approved as a developer and create an app. (As a note, Twitter has now changed their policies, and approval can take a while; it took me almost a week.)
After that, just use get_timeline() to get all of the tweets from a timeline, up to 3,200.
djt <- get_timeline("adamgreatkind", n = 3200)
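Since the original question covers several accounts, note that in recent rtweet versions get_timeline() accepts a vector of screen names, so the whole corpus can be pulled and combined in one call (the account names below are placeholders):

library(rtweet)

accounts <- c("account_one", "account_two", "account_three")  # placeholders

# Up to 3,200 tweets per account, returned as a single data frame.
timelines <- get_timeline(accounts, n = 3200)

# The tweet text, ready for keyword extraction.
texts <- timelines$text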

Is it ok to scrape data from Google results? [closed]

I'd like to fetch results from Google using curl to detect potential duplicate content.
Is there a high risk of being banned by Google?
Google disallows automated access in their TOS, so if you accept their terms you would break them.
That said, I know of no lawsuit from Google against a scraper.
Even Microsoft scraped Google; they powered their search engine Bing with it. They got caught red-handed in 2011 :)
There are a few options for scraping Google results:
1) Use their API
UPDATE 2020: Google has deprecated previous APIs (again) and has new prices and new limits. Now (https://developers.google.com/custom-search/v1/overview) you can query up to 10k results per day at 1,500 USD per month; more than that is not permitted, and the results are not what they display in normal searches.
You can issue around 40 requests per hour. You are limited to what they give you; it's not really useful if you want to track ranking positions or what a real user would see. That's something you are not allowed to gather.
If you want a higher number of API requests you need to pay. 60 requests per hour cost 2,000 USD per year; more queries require a custom deal.
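For completeness, querying the official Custom Search JSON API from R looks like this. You need an API key and a search engine ID (cx) from the console; both values below are placeholders.

library(httr)

resp <- GET("https://www.googleapis.com/customsearch/v1",
            query = list(key = "YOUR_API_KEY",    # placeholder
                         cx  = "YOUR_ENGINE_ID",  # placeholder
                         q   = "duplicate content check"))

parsed <- content(resp, as = "parsed")

# Each item in the response carries title, link, and snippet fields.
links <- vapply(parsed$items, function(x) x$link, character(1))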
2) Scrape the normal result pages
Here comes the tricky part. It is possible to scrape the normal result pages.
Google does not allow it.
If you scrape at a rate higher than 8 (updated from 15) keyword requests per hour you risk detection; higher than 10/h (updated from 20) will get you blocked, in my experience.
By using multiple IPs you can raise the rate, so with 100 IP addresses you can scrape up to 1,000 requests per hour, i.e. 24k a day. (updated)
There is an open-source search engine scraper written in PHP at http://scraping.compunect.com
It reliably scrapes Google, parses the results properly, and manages IP addresses, delays, etc.
So if you can use PHP it's a nice kickstart; otherwise the code will still be useful for learning how it is done.
3) Alternatively use a scraping service (updated)
Recently a customer of mine had a huge search engine scraping requirement, but it was not 'ongoing'; it was more like one huge refresh per month.
In this case I could not find a self-made solution that was 'economic'.
I used the service at http://scraping.services instead.
They also provide open-source code, and so far it's running well (several thousand result pages per hour during the refreshes).
The downside is that such a service means that your solution is "bound" to one professional supplier; the upside is that it was a lot cheaper than the other options I evaluated (and faster in our case).
One option to reduce the dependency on one company is to take two approaches at the same time: use the scraping service as the primary source of data and fall back to a proxy-based solution as described in 2) when required.
Google will eventually block your IP when you exceed a certain number of requests.
Google thrives on scraping the websites of the world, so if it were "so illegal" then even Google wouldn't survive. Of course, other answers mention ways of mitigating IP blocks by Google. One more way to avoid captchas could be scraping at random times (didn't try it). Moreover, I have a feeling that if we provide novelty or some significant processing of the data, then it sounds fine, at least to me; if we are simply copying a website, or hampering its business/brand in some way, then it is bad and should be avoided. On top of it all, if you are a startup, no one will fight you, as there is no benefit; but if your entire premise rests on scraping even once you are funded, then you should think about more sophisticated approaches or alternative APIs. Also, Google keeps releasing (or deprecating) fields for its API, so what you want to scrape now may be on the roadmap of future Google API releases.

Resources