For this query using R’s twitteR::searchTwitter:
search_t <- searchTwitter("#netanyahu", n = 1000, since = '2015-09-13')
df <- do.call("rbind", lapply(search_t, as.data.frame))
View(df[, c('text', 'created', 'favoriteCount', 'retweetCount', 'favorited', 'retweeted', 'isRetweet')])
… I get the following results:
What does the favorited column mean? It obviously can't mean that the tweet has been favorited, since favoriteCount shows it has been favorited 6 times. I also went on Twitter, favorited that particular tweet myself, and re-ran the query. It still shows FALSE.
From the Twitter API overview (Tweet object documentation):
Favorited: Nullable. Perspectival. Indicates whether this Tweet has been favorited by the authenticating user.
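In other words, favorited is relative to the account whose token you authenticated with, while favoriteCount is the objective total across all users. A minimal illustration with made-up values:

```r
# favoriteCount: total likes from all users (objective)
# favorited: whether the *authenticating* account liked it (perspectival)
demo <- data.frame(favoriteCount = c(6, 2, 0),
                   favorited     = c(FALSE, TRUE, FALSE))
# tweets liked by others but not by the authenticated account:
liked_by_others_only <- subset(demo, favoriteCount > 0 & !favorited)
nrow(liked_by_others_only)  # 1
```

So a tweet with favoriteCount of 6 and favorited of FALSE simply means six other people liked it, but the account behind your OAuth token has not.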
I am trying to use search_fullarchive from the rtweet package on the sandbox PREMIUM tier with these exact search operators: park OR parks, lang:en, and point_radius:[51.5047 0.1278 25mi]. I have tried the following:
test2 <- search_fullarchive(q = "park OR parks lang:en point_radius:[51.5074 0.1278 25mi]", n = 100, fromDate = "202003150000", toDate = "202003172359", env = "research", parse = TRUE, token = ActiveTravel_token)
The returned test2 object is a tbl_df filtered only by park OR parks. I've checked here, and as a sandbox PREMIUM user I should be able to filter by both lang: and point_radius:.
Could someone please help me get the filtering to also match the other two operators, lang:en and point_radius:[51.5047 0.1278 25mi]?
Thanks in advance!
Best wishes,
Irena
This should be as simple as wrapping the OR clause in parentheses; the whitespace between the operators acts as a logical AND.
q = "(park OR parks) lang:en point_radius:[51.5074 0.1278 25mi]"
However, I've just tried this search and, at the moment, it returns zero Tweets within that point radius over that date range. I substituted in another point radius (the Boulder, CO example from the Twitter API documentation, point_radius:[-105.27346517 40.01924738 10.0mi]), and it successfully brought back Tweets that matched the search parameters.
As for finding very few tweets: the point_radius operator only returns tweets that were geotagged manually by the user at the time of posting, and then only within a small area of at most 25 miles. Only a small fraction of tweets are geotagged. You will probably have more luck with the place: operator, which also returns tweets from people who have set the "place" you search for in their profile.
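As a sketch (untested against the premium API; the place name is a placeholder and ActiveTravel_token comes from the question), a query using place: instead of point_radius: would look like this. The only runnable part here is building the query string; the API call is commented out because it needs a premium token:

```r
# hypothetical query: place: instead of point_radius:
q <- paste("(park OR parks)", "lang:en", "place:London")
q
# library(rtweet)
# test3 <- search_fullarchive(q = q, n = 100,
#                             fromDate = "202003150000", toDate = "202003172359",
#                             env = "research", token = ActiveTravel_token)
```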
I need to get tweets that contain at least one of the following hashtags: #EUwahl #Euwahlen #Europawahl #Europawahlen. That is, I am looking for tweets containing at least one of those hashtags, though they may also contain more of them. Furthermore, each of these tweets must also mention one of seven specific users (e.g. #AfD).
So far I only know how to search Twitter for one hashtag, or for several hashtags that must all appear together. That is, I am familiar with the AND operator (using a + between the hashtags) but not with the operator for OR.
This is an example of a code I have used so far to do any searches in Twitter:
euelection <- searchTwitter("#EUwahl", n=1000, since = "2019-05-01",until = "2019-05-26")
I can install twitteR, but it requires an authentication key that is not easy for me to get.
The principle is to search using OR with spaces around it. Here is an example with rtweet:
library(rtweet)
# your tags
TAGS = c("#EUwahl","#Europawahl")
# make the search term
SEARCH = paste(TAGS,collapse=" OR ")
# do the search
# you can also use twitteR
test <- search_tweets(SEARCH, n=100)
# your found tweet text
head(test$text)
## check which tweet contains which tag
tab = sapply(TAGS,function(i)as.numeric(grepl(i,test$text,ignore.case=T)))
# all of them contain either #EUwahl or #Europawahl
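The question also requires one of seven specific accounts to be mentioned; the same trick works by building two OR groups and joining them with a space (logical AND). The account list below is a made-up placeholder, not the asker's real list:

```r
hashtags <- c("#EUwahl", "#Euwahlen", "#Europawahl", "#Europawahlen")
accounts <- c("@AfD", "@CDU")  # placeholder: put your seven accounts here
query <- paste0("(", paste(hashtags, collapse = " OR "), ") ",
                "(", paste(accounts, collapse = " OR "), ")")
query
# "(#EUwahl OR #Euwahlen OR #Europawahl OR #Europawahlen) (@AfD OR @CDU)"
# euelection <- search_tweets(query, n = 1000)
```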
donald_tweets <- searchTwitter("Donald + Trump Republicans exclude:retweets",
n=50, lang = "en", since = "2016-03-16", until = "2016-03-17")
donald_tweets
But this gives me an error:
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit
50 tweets were requested but the API can only return 0
Somewhere I have seen that the problem is with since and until: they can only search back a few days, and it is now 2018, not 2016. But what can I do in this regard? Please help! This is a project in R.
The Twitter Search Documentation contains two useful pieces of information.
About how far back the search can go, it says:
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
It also shows that there is no since parameter. It is since_id:
Returns results with an ID greater than (that is, more recent than) the specified ID. There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
So those are the two errors in your code: you cannot search for anything older than a week, and if you want a "since"-style parameter, you have to give it an ID, not a date.
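If you still want date bounds at the R level (twitteR's searchTwitter does expose since= and until= arguments, folding them into the query), both dates must fall inside that roughly one-week window. A sketch that builds valid bounds relative to today; the actual search call is commented out because it needs authentication:

```r
# until is exclusive (tweets created *before* that date), so add one day
since <- format(Sys.Date() - 6, "%Y-%m-%d")
until <- format(Sys.Date() + 1, "%Y-%m-%d")
# searchTwitter("Donald Trump Republicans exclude:retweets",
#               n = 50, lang = "en", since = since, until = until)
c(since, until)
```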
After over a year of struggling to no avail, I'm turning to the SO community for help. I've used various RegEx creator sites, standalone RegEx creator software, as well as manual editing, all in a futile attempt to create a pattern to parse and extract dynamic data from the below e-mail samples (sanitized to protect the innocent):
Action to Take: Buy shares of Facebook (Nasdaq: FB) at market. Use a 20% trailing stop to protect yourself. ...
Action to Take: Buy Google (Nasdaq: GOOG) at $42.34 or lower. If the stock is above $42.34, don't chase it. Wait for it to come down. Place a stop at $35.75. ...
***Action to Take***
Buy International Business Machines (NYSE: IBM) at market. And use a protective stop at $51. ...
What needs to be parsed is both forms of the "Action to Take" section, and the extracted data must include the direction (i.e. buy or sell, though I'm only concerned with buys here), the ticker, the limit price (if applicable), and the stop value, as either a percentage or a number (if applicable). Sometimes there are also multiple "Action to Take" sections in a single e-mail.
Here are examples of what the pattern should not match (or ideally be flexible enough to deal with):
Action to Take: Sell half of your Apple (NYSE: AAPL) April $46 calls for $15.25 or higher. If the spread between the bid and the ask is $0.20 or more, place your order between the bid and the ask - even if the bid is higher than $15.25.
Action to Take: Raise your stop on Apple (NYSE: AAPL) to $75.15.
Action to Take: Sell one-quarter of your Facebook (Nasdaq: FB) position at market. ...
Here's my R code with the latest Perl-style pattern (so I can use lookarounds in R). It sort of works, but not consistently or across multiple saved e-mails:
library(httr)
library("stringr")
filenames <- list.files("R:/TBIRD", pattern="*.eml", full.names=TRUE)
parse <- function(input) {
  text <- readLines(input, warn = FALSE)
  text <- paste(text, collapse = "")
  # keep only the plain-text MIME part of the message
  trim <- regmatches(text, regexpr("Content-Type: text/plain.*Content-Type: text/html", text, perl = TRUE))
  pattern <- "(?is-)(?<=Action to Take).*(?i-s)(Buy|Sell).*(?:\\((?:NYSE|Nasdaq)\\:\\s(\\w+)\\)).*(?:for|at)\\s(\\$\\d*\\.\\d* or|market)\\s"
  df <- str_match(trim, pattern)  # match against the trimmed part, not the full message
  return(df)
}
results <- lapply(filenames, parse)
table <- do.call(rbind, results)
table <- data.frame(table)
table <- table[rowSums(is.na(table)) < 1, ]
table <- subset(table, select=c("X2","X3","X4"))
The parsing has to operate on the text copy because the HTML appears far too complicated, due to a lack of standardization from e-mail to e-mail. Unfortunately, the text copy also commonly has line endings in different places than the regexp expects, which greatly aggravates things.
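For comparison, here is a simplified base-R sketch of my own (not from the thread) that captures the exchange, ticker, and limit price for the buy-side samples above. It anchors on "Buy", so the sell/raise examples simply fail to match:

```r
samples <- c(
  "Action to Take: Buy shares of Facebook (Nasdaq: FB) at market. Use a 20% trailing stop to protect yourself.",
  "Action to Take: Buy Google (Nasdaq: GOOG) at $42.34 or lower. Place a stop at $35.75.",
  "***Action to Take***\nBuy International Business Machines (NYSE: IBM) at market. And use a protective stop at $51.",
  "Action to Take: Sell one-quarter of your Facebook (Nasdaq: FB) position at market."
)

# [:*\s]* absorbs both the ": " and the "***\n" separators;
# anchoring on Buy skips the Sell/Raise variants
pattern <- "Action to Take[:*\\s]*Buy[^(]*\\((NYSE|Nasdaq):\\s*(\\w+)\\)\\s+at\\s+(market|\\$[\\d.]+)"
hits <- regmatches(samples, regexec(pattern, samples, perl = TRUE))
# each matched element: full match, exchange, ticker, price
hits[[2]][3:4]     # "GOOG" "$42.34"
length(hits[[4]])  # 0: the Sell example does not match
```

The stop value would need a second pass (e.g. matching "stop at $..." or "...% trailing stop") rather than cramming everything into one pattern.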
Why can't I get the number of tweets I request when I use the userTimeline() function in the twitteR package? I know the request limit for a user timeline is 3,200 tweets, but I only get about 10% of that.
Here are two examples:
In this example, the 'googledevs' account has only 2,000 tweets, so I asked for 1,000 tweets and still only got 106:
> library(twitteR)
> load('OAuth.RData')
> test <- userTimeline(user = 'googledevs', n=1000)
> length(test)
[1] 106
In this example, 'FiveThirtyEight' has 5,622 tweets, so I asked for 3,200 and only got 317:
> library(twitteR)
> load('OAuth.RData')
> test2 <- userTimeline(user = 'FiveThirtyEight', n=3200)
> length(test2)
[1] 317
Can someone help me fix this?
Thank you
You need to include the includeRts=TRUE argument in your userTimeline call. This should give you the smaller of 3,200 and the number of tweets on the user's timeline.
The Twitter API will only return tweets from the past week or so. See the documentation.
"The Search API is not complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets."
You need to include the includeRts=TRUE argument in your userTimeline call. This will return the minimum of 3200 posts and the total posts on the user's timeline.
I believe the problem is that while it only retrieves a certain number of tweets, it still counts retweets and replies toward the maximum you set. If you set the include-replies and include-retweets options to TRUE, you should retrieve the total number of tweets you asked for. A nice workaround for this limit is to build time windows using the tweet IDs and the sinceID and maxID options.
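That windowing idea can be sketched independently of the API: keep requesting with maxID set to one below the oldest ID seen so far. The fetcher below is a mock standing in for userTimeline/searchTwitter (both accept a maxID argument); note that real tweet IDs are 64-bit, so in practice keep them as character strings rather than doing numeric arithmetic on them as this toy does:

```r
# Page backwards through a timeline by ID (mock-API version).
page_all <- function(fetch_page, per_page = 200) {
  out <- c()
  max_id <- NULL
  repeat {
    batch <- fetch_page(per_page, max_id)
    if (length(batch) == 0) break
    out <- c(out, batch)
    max_id <- min(batch) - 1  # everything strictly older next time
  }
  out
}

# mock API: holds tweet IDs 1..450, serves newest-first, honours maxID
mock_fetch <- function(n, max_id) {
  ids <- 450:1
  if (!is.null(max_id)) ids <- ids[ids <= max_id]
  head(ids, n)
}

length(page_all(mock_fetch))  # 450, i.e. the whole timeline
```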