I'm using the R package rtweet to stream live tweets.
Everything works, but I want to automatically store the information in Google BigQuery and display it in Data Studio, with the data updated every X minutes (for example, every 5 minutes).
How can I do that? The problem is that while streaming, the R session is busy, so I can't do anything else.
I would also consider stopping the stream for a moment to store the information and then resuming it...
Here is my code:
library(rtweet)
library(bigrquery)
token <- create_token(
  app = "app name",
  consumer_key = "consumer_key",
  consumer_secret = "consumer_secret",
  access_token = "access_token",
  access_secret = "access_secret")
palabras <- ""
streamtime <- 2 * 60
rt <- stream_tweets(q = palabras, timeout = streamtime)
# This is what I want to do every X minutes to store the information in BigQuery:
insert_upload_job("project id", "dataset name", "table name", rt, write_disposition = "WRITE_APPEND")
Thanks to all,
I don't know much about R, but I had a similar case, and there's nothing you can do while stream_tweets() is running except wait for the timeout.
I'm not sure if this is possible, but stream_tweets() writes a JSON file that is filled while the function runs. Wouldn't it be possible to run another R script that stores each new item in BigQuery as it is added to that JSON file? Something like splitting your code in two and running the parts in parallel?
Hope my answer gives you some ideas.
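Building on the asker's own idea of pausing between streaming windows, here is a minimal, untested sketch: stream in short bursts, upload each parsed batch to BigQuery with the same insert_upload_job() call as above, and repeat. The project/dataset/table names are placeholders, and rtweet's list columns (e.g. hashtags) may need to be flattened or dropped before uploading.
library(rtweet)
library(bigrquery)
palabras <- ""
streamtime <- 5 * 60  # stream for 5 minutes per iteration
repeat {
  # blocks for `streamtime` seconds, then returns the parsed tweets
  rt <- stream_tweets(q = palabras, timeout = streamtime)
  if (nrow(rt) > 0) {
    # append this batch to the BigQuery table before the next streaming window
    insert_upload_job("project id", "dataset name", "table name", rt,
                      write_disposition = "WRITE_APPEND")
  }
  # add a stopping condition here if you don't want it to run indefinitely
}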
I'm trying to use the rtweet package to download some tweets for a certain hashtag. I've followed a guide from a site called OpenCodez, and I've run into problems.
Using the search_tweets() function of the rtweet package, I'm not able to download more than 5 tweets, even though rtweet's limit should be around 18,000 tweets per request.
I don't get any errors, but the "Downloading" progress bar simply stops at 10% (when trying to download n = 2000).
I've tried using retryonratelimit = TRUE without luck. I've reset my script and tried different tutorials to establish a connection, which all work fine, up until I actually use the search_tweets() function.
So this is my code to connect to the API:
api_key <- "xxxx"
api_secret_key <- "xxxx"
access_token <- "xxxx"
access_token_secret <- "xxxx"
## authenticate via web browser
token <- create_token(
app = "xxxx",
consumer_key = api_key,
consumer_secret = api_secret_key,
access_token = access_token,
access_secret = access_token_secret)
And this is my "scraper":
my_tweets = search_tweets("#vmd19", n=2000, lang='en')
The resulting data frame has just 5 rows, which is odd, since there should be at least a couple of hundred tweets under the hashtag. I've tried different queries (other hashtags etc.), without luck. The download stops, looking like this:
Downloading [===>-------------------------------------] 10%
I cannot figure out what I'm doing wrong. Hopefully, someone can help me troubleshoot this!
This issue was addressed here: https://github.com/ropensci/rtweet/issues/364
It looks like it's because of the window from which you can gather tweets (about the last week). If the number of tweets available from that time window is less than the n in your search_tweets function, it will cut out before reaching 100%. So if you ask for 100 tweets with a certain term, and that term was only tweeted 7 times in the last week, it will stop downloading at 7%.
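A quick way to check this with the query from the question (a sketch): look at how many rows actually came back and the time span they cover.
my_tweets <- search_tweets("#vmd19", n = 2000, lang = "en")
nrow(my_tweets)              # likely far fewer than 2000
range(my_tweets$created_at)  # should span at most roughly the last 7 days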
I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works, and the data shows up (the full url: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API works in the browser and is not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but on my office computer it fails. So I gather it is a network issue, but if somebody could point me in any direction as to what the problem might be, I would be grateful (my IT department couldn't help).
library(XML)
api.key = "e122a1411ca0ac941eb192ede51feebe"
series.id = "PET.MCREXUS1.M"
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=", api.key, "&out=xml", sep="")
doc = xmlParse(file=my.url, isURL=TRUE) # yields error
Error msg:
No such file or directoryfailed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
Error: 1: No such file or directory2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
To get XML, you need to change your URL to request XML output:
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=",
api.key, "&out=xml", sep="")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(res)
Otherwise, with the URL as posted (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res,"text"))
or this:
xml2::read_xml(httr::content(res,"text"))
Please note that this answer simply provides a way to get the data; whether it is in the desired form is a matter of preference and up to whoever is processing the data.
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or previously in your R session with eia_set_key). But you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result is a tidyverse-style data frame with one row per series ID and a nested data list column containing the data frame for each time series (it can be unnested with tidyr::unnest if desired).
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
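For illustration, a quick sketch of those options (the key value is a placeholder):
eia_set_key("yourkeyhere")  # or pass key = "yourkeyhere" in each call
x      <- eia_series("PET.MCREXUS1.M")                # tidy data frame with a nested data column
x_list <- eia_series("PET.MCREXUS1.M", tidy = FALSE)  # list as returned by jsonlite::fromJSON
x_json <- eia_series("PET.MCREXUS1.M", tidy = NA)     # raw JSON string, no processing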
There are more comprehensive examples and vignettes at the eia package website I created.
I've successfully received an access token from an OAuth 2.0 request, so I can start obtaining data from the server. However, I keep getting a 403 error on each attempt. APIs are very new to me and I'm only entry level in R, so I can't figure out what's wrong with my request. I'm currently using the crul package, but I've also tried making the request with httr, and I can't get anything through without encountering the 403 error. Eventually I'd like to refresh a Shiny app with data imported from this other application that stores the data, but first I want to pull data to my console locally so I can understand the basic process. I will post some of my current attempts.
(x <- HttpClient$new(
  url = 'https://us.castoredc.com',
  opts = list(exceptions = FALSE),
  headers = list())
)
res.token <- x$post('oauth/token',
  body = list(client_id = "{id}",
              client_secret = "{secret}",
              grant_type = 'client_credentials'))
importantStuff <- jsonlite::fromJSON(res.token$parse("UTF-8"))
token <- paste("Bearer", importantStuff$access_token)
I obtain my token, but the following doesn't seem to work. I'm attempting to get the list of study codes so that I can call on them in further requests to actually get data from a study.
res.studies <- x$get('/api/study',
  headers = list(Authorization = token,
                 client_id = "{id}",
                 client_secret = "{secret}",
                 grant_type = 'client_credentials'),
  body = list(content_type = 'application/json'))
Their support team gave me the above endpoint to access the content, but I get a 403, so I think I'm not using my token correctly?
status: 403
access-control-allow-headers: Authorization
access-control-allow-methods: Get,Post,Options,Patch
I'm the CEO at Castor EDC, and although it's pretty cool to see a Castor EDC question on here, I apologize for the time you lost trying to figure this out. Was our support team not able to provide more assistance?
Regardless, I have actually used our API quite a bit in R, and we also have an amazing R engineer in house if you need more help.
Reflecting on your answer: yes, you always need a study ID to be able to do anything interesting with the API. One thing that could make your life a lot easier is our R API wrapper, which you can find here: https://github.com/castoredc/castoRedc
With that you would:
remotes::install_github("castoredc/castoRedc")
library(castoRedc)
castor_api <- CastorData$new(key = Sys.getenv("CASTOR_KEY"),
secret = Sys.getenv("CASTOR_SECRET"),
base_url = "https://data.castoredc.com")
example_study_id <- studies[["study_id"]][1]
fields <- castor_api$getFields(example_study_id)
etc.
Hope that makes your life a lot easier in the future.
So, after some investigation, it turns out that you first have to make a request to obtain another ID for each Castor study under your username. I will post some example code that finally worked.
req.studyinfo <- httr::GET(url = "https://us.castoredc.com/api/study",
                           httr::add_headers(Authorization = token))
json <- httr::content(req.studyinfo, as = "text")
studies <- jsonlite::fromJSON(json)
This will give you a list of your studies in Castor, from which you can obtain the ID you need for your endpoints. It will be a list that contains a data frame with this information.
You use the same format with whatever endpoint you like from their documentation to retrieve data. Thank you for your observations! I will leave this here in case anyone is employed to develop anything from data stored in Castor EDC. Their documentation was vague to me, so maybe this will help someone in the future.
Example for next step:
req.studydata <- httr::GET("https://us.castoredc.com/api/study/{study id obtained from previous step}/data-point-collection/study",
                           httr::add_headers(Authorization = token))
json.data <- httr::content(req.studydata, as = "text")
data <- jsonlite::fromJSON(json.data)
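For completeness, the same first call could also be made with crul (which the question used). A minimal sketch, assuming only the bearer token is needed in the Authorization header, exactly as in the httr version above:
x_auth <- crul::HttpClient$new(
  url = "https://us.castoredc.com",
  headers = list(Authorization = token)  # token is "Bearer <access_token>" from earlier
)
res.studies <- x_auth$get("api/study")
studies <- jsonlite::fromJSON(res.studies$parse("UTF-8"))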
This worked for me; I removed the Sys.getenv() part:
library(castoRedc)
castor_api <- CastorData$new(key = "CASTOR_KEY",
                             secret = "CASTOR_SECRET",
                             base_url = "https://data.castoredc.com")
example_study_id <- studies[["study_id"]][1]
fields <- castor_api$getFields(example_study_id)
I am currently working on connecting to an Access database. I am able to connect to the Access DB, which is located on my local system and is actually linked to a SharePoint list. I would love to automate handling this SharePoint list with an R and Access combo! What I want to do is actually pretty basic: introduce new data via a .csv, process it for the relevant content, compare it to the current Access DB, and finally upload the new information from R to Access.
I've learned that you need to match the bit versions of your Windows OS, Office installation, and R; I am x64 on all of the above, which allowed me to connect to the Access DB. You also need the Microsoft Access Database Engine 2016 Redistributable, which is essentially the driver for the connection.
So what I have so far is:
library(odbc)
library(DBI)
file_path <- "C:/user/Documents/R Projects/...pathtofile.../filename.accdb"
accdb_con <- dbConnect(drv = odbc(), .connection_string = paste0("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",file_path,";"))
access.db <- dbReadTable(accdb_con, "sNPS Deep Dives")
That now connects!
I then read in a .csv of new information
new.df <- read.csv("C:/user/Documents/R projects/...pathtofile.csv", header=T, stringsAsFactors=FALSE, na.strings=c("","NA"))
an example of the data set might just look something like this:
date <- c("15/10/2018","15/10/2018", "16/10/2018", "12/11/2018", "07/09/2018")
score <- c("6", "10", "7", "10", "9")
group <- c("a","b", "b", "a", "b")
CaseID <- c("301", "302", "303", "304", "305")
new.df <- data.frame(date,score,group,CaseID)
new.df$date <- as.character(new.df$date)
new.df$score <- as.numeric(new.df$score)
new.df$group <- as.character(new.df$group)
new.df$CaseID <- as.numeric(new.df$CaseID)
Notably, there are more columns in the Access DB that people will fill in by hand with further information.
I then process the data to be ready to go into the Access DB (probably not that interesting...).
Then I compare the new data against the Access DB like this:
library(dplyr)
new <- anti_join(new.df, access.db, by= "Case.ID")
Now I've tried:
dbWriteTable(access.db.copy, new, append = TRUE)
dbAppendTable(access.db.copy, new)
I can't seem to get either of these to work; I'm getting this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"ACCESS", "data.frame", "missing"’
I've seen plenty of posts in which people are having trouble connecting to an Access DB but I haven't seen anything about writing new data into that database.
I know this isn't quite a reproducible example, but it seems like a difficult problem to recreate since it's a connection issue between different tools. I would be happy to provide example data sets that might make this easier.
I would appreciate any direction you all can provide.
Thanks!
Edit:
It appears that Bing Sun was right; I was missing an argument, so it seems we need something more like:
dbWriteTable(access.db.copy, "Name of table",new, append = TRUE)
Which produces the error:
Error in result_insert_dataframe(rs#ptr, values) :
nanodbc/nanodbc.cpp:1944: HY104: [Microsoft][ODBC Microsoft Access Driver]Invalid precision value
I wonder if this may be an error from Access about a file type?
Now, if I use dbAppendTable() I don't get an error, but I get 0 as output:
dbAppendTable(access.db.copy, "Name of table", new, append= TRUE)
With output:
[1] 0
But I don't see any of the new values when I check the Access file.
I know it's years later, but hopefully this will help someone else with this issue since you're right CrayCrayTown, there aren't very many posts covering this issue.
I've run into this problem repeatedly when dealing with R and MS Access. The solution I've come up with is pretty "hacky", but it accomplishes what needs to be done... just not very elegantly.
The way I do this is with a combo of RODBC and DBI packages.
First, I open a connection to the DB with RODBC, and use that connection to write my data to the DB as an intermediary table:
chan <- RODBC::odbcDriverConnect(connection = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb")
RODBC::sqlSave(channel = chan,
               dat = df,
               tablename = "tbl_intermediary",
               rownames = FALSE,
               append = FALSE)
RODBC::odbcClose(chan)
rm(chan)
Make sure to close the RODBC connection; I also remove the object for good measure, because why not? I use RODBC for the intermediary table because it supports batch insert statements. I know that the same thing can, in theory, be done with DBI via DBI::dbAppendTable() (but we wouldn't be on this post if that worked how we had hoped). I tried this in a previous SO question here, but it didn't solve my problem. I also don't know how big my intermediary tables could get in the future; hopefully by the time they get too big we'll be on a different DBMS.
Next, I reopen the connection, this time with DBI, and send a statement to the DB that writes the data from the intermediary table to its final destination, then drop the intermediary table.
con <- DBI::dbConnect(odbc::odbc(), .connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb")
DBI::dbSendStatement(
conn = con,
statement = 'UPDATE
tbl_intermediary INNER JOIN final_tbl ON tbl_intermediary.SampleID = final_tbl.sampleNumber
SET
final_tbl.field1 = [tbl_intermediary].[field1],
final_tbl.notes = IIf(Nz([tbl_intermediary].[Notes],"")="",[final_tbl].[notes],[final_tbl].[notes] & "; Newest Notes: " & [tbl_intermediary].[Notes]);'
)
DBI::dbSendStatement(
  conn = con,
  statement = 'DROP TABLE tbl_intermediary;'
)
DBI::dbDisconnect(con)
rm(con)
The main reason why I chose this method is because some of the SQL I use with Access also has some VBA in it. When I send the SQL-VBA hybrid string with RODBC, I get assorted errors in the IIF() and Nz() functions (see example above). From the RODBC CRAN docs the query argument for the sqlQuery() function is strictly assumed to be a valid SQL statement. So, RODBC has no clue how to interpret the IIf() and Nz() MS Access functions. I think this also has to do with how the ODBC driver handles communication as well (please, someone correct me if I'm wrong about this).
As I understand it, DBI::dbSendStatement(), however, lets the database engine you're working with interpret how to use the statement argument you provide. In the situation above, the VBA is executed exactly how I would expect if it were run in Access directly. As per the DBI docs, for interactive use you'll generally want dbGetQuery() or dbExecute().
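Following that note from the DBI docs, the statements above could also be run with dbExecute(), which sends the statement and clears the result in one step; a small sketch reusing the connection and table name from above:
# dbExecute() runs the statement, cleans up the result, and returns the number of rows affected
DBI::dbExecute(con, 'DROP TABLE tbl_intermediary;')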
I would like to retrieve a list of tweets from Twitter for a given hashtag using the RJSONIO package in R. I think I am pretty close to the solution, but I seem to be missing one step.
My code reads as follows (in this example, I use #NBA as a hashtag):
library(httr)
library(RJSONIO)
# 1. Find OAuth settings for twitter:
# https://dev.twitter.com/docs/auth/oauth
oauth_endpoints("twitter")
# Replace key and secret below
myapp <- oauth_app("twitter",
key = "XXXXXXXXXXXXXXX",
secret = "YYYYYYYYYYYYYYYYY"
)
# 3. Get OAuth credentials
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)
# 4. Use API
req=GET("https://api.twitter.com/1.1/search/tweets.json?q=%23NBA&src=typd",
config(token = twitter_token))
req <- content(req, as = "text")
response=fromJSON(req)
How can I get the list of tweets from object 'response'?
Eventually, I would like to get something like:
searchTwitter("#NBA", n=5000, lang="en")
Thanks a lot in advance!
The response object should be a list of length two: statuses and search_metadata. So, for example, to get the text of the first tweet, try:
response$statuses[[1]]$text
However, there are a couple of R packages designed to make just this kind of thing easier: Try streamR for the streaming API, and twitteR for the REST API. The latter has a searchTwitter function exactly as you describe.
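If you want something closer to the flat result a searchTwitter() call returns, here is a small sketch (assuming the parsed structure described above) that pulls the text of every returned tweet:
tweet_texts <- sapply(response$statuses, function(s) s$text)
head(tweet_texts)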