In sentiment analysis of Twitter data, do repeated retweets influence the result? - r

I am doing sentiment analysis of Twitter data in R, but the data contains many repeated tweets. Does this affect the result?
RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???
RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???
If yes, how should I deal with it? I want to remove those repeated tweets from the Twitter dataset.
text<- gsub("(RT|via)((?:\\b\\W*#\\w+)+ )", "", text)
This code removes only the name of the person; the rest of the tweet remains as it is.
I would be glad if you could help me.

If you have repeated tweets, they will skew the analytics!
With the Twitter API, tweets are returned in JSON format. You need to treat the "id" field (or better, the "id_str" field) of each tweet as the unique identifier and only select a single instance of a given "id" in your analytics.
{"id": 123456789, "id_str": "123456789"}
If you make sure you only have one instance of each tweet keyed on the field above, you will avoid this problem.
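As a rough sketch of that idea in R: the toy data frame below stands in for what twitteR's twListToDF() would give you (the ids and texts are made up for illustration). Keep in mind that a retweet is a separate status with its own id, so keying on id only drops the same tweet collected twice; the second step, which also collapses identical text, is an extra assumption about what you want for sentiment analysis.

```r
# Toy data standing in for twListToDF() output; ids and texts are invented.
tweets_df <- data.frame(
  id = c("101", "101", "102", "103"),
  text = c(
    "RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???",
    "RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???",
    "RT #Ananduvi: Will You Support #BharathBandh on Today against #demonetization ???",
    "Long queues at the ATM again"
  ),
  stringsAsFactors = FALSE
)

# Keep only the first row for each tweet id.
unique_by_id <- tweets_df[!duplicated(tweets_df$id), ]

# Optionally also collapse retweets that have different ids but identical text.
unique_by_text <- unique_by_id[!duplicated(unique_by_id$text), ]

nrow(unique_by_id)
nrow(unique_by_text)
```

Whether to keep one copy of a heavily retweeted message or count every retweet depends on what you want the sentiment score to represent.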

Related

Why does Microsoft Text analytics API return neutral sentiment for a sentence with strong negative connotation?

I am trying to build an app to analyze the sentiment of survey data using the Microsoft Text Analytics API. One of the survey responses has a strong negative connotation:
Personally, there is not a company in the world that needs this product
but the API returns a score of 50. What is the reason for this?
I just tried on their page and it looks like it has changed and is reporting back a sentiment of 73%.
I think I know why, though. In the "key phrases" the word "not" wasn't picked up. Looking at the stop words from nltk, it seems that "not" is a stop word.
from nltk.corpus import stopwords
stop_words = stopwords.words("english")
[word for word in stop_words if word == "not"] # Returns ['not']
Since "not" may have been removed as a stop word, there is nothing left in the sentence to signal a negative sentiment.

R Twitter Mining: Finding Frequency of Mentions/Hashtags

I'm very new to R. I'm using it mostly for marketing purposes, so the twitteR package is very useful.
What I'm trying to do is find the frequency of #mentions and #hashtags within my data after I've got all the data through the searchTwitter command.
I'm not sure what type of object it returns right away, and whether I need to convert it into a data.frame, another type of vector, a corpus, or something else.
How do I break the data down into the total number of mentions/hashtags and the frequency of each #mention or #hashtag?
This will give me a good idea of who the key influencers and key hashtags in a specific market are and how valuable those influencers/hashtags are.
Please help.
Thanks
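One possible way to approach this in base R, as a sketch only: searchTwitter returns a list of status objects, and twListToDF() turns that list into a data frame with a text column. The vector below is made-up sample text standing in for that column, extract_tokens is a hypothetical helper name, and the sketch assumes hashtags start with "#" and mentions with "@".

```r
# Made-up tweet text standing in for the text column of twListToDF() output.
tweet_text <- c(
  "Loving the new phone #tech #mobile @alice",
  "@bob have you seen this? #tech",
  "No tags here"
)

# Hypothetical helper: pull every match of a pattern out of a character vector.
extract_tokens <- function(text, pattern) {
  unlist(regmatches(text, gregexpr(pattern, text)))
}

# Lower-case so that "#Tech" and "#tech" are counted together.
hashtags <- tolower(extract_tokens(tweet_text, "#\\w+"))
mentions <- tolower(extract_tokens(tweet_text, "@\\w+"))

# Frequency tables, most frequent first.
hashtag_freq <- sort(table(hashtags), decreasing = TRUE)
mention_freq <- sort(table(mentions), decreasing = TRUE)

hashtag_freq
mention_freq
```

length(hashtags) and length(mentions) give the total counts, while the tables give the frequency of each individual tag or handle.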

Association Analysis using Tweets/twitteR package

I'm new to R and was wondering if it is possible, using R, to get a list of users who tweet using the word cats, for example, and then go through their timelines to see whether they also tweeted using the word dogs.
I have managed using the twitteR package to get a list of user names and their tweets and put them into a dataframe. I just don't know how to go about doing the rest or if it is even possible.
Any help at all would be greatly appreciated!
John, I am not sure whether I understand correctly what you are trying to achieve, but I am assuming that the data frame also contains a timestamp for each tweet. If that is the case, you can group by user and arrange the tweets in ascending order of timestamp. You could then use grepl() to search for 'dogs' or any other word.
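A minimal sketch of that suggestion, assuming a data frame with screenName, created and text columns (the names twListToDF() uses); the rows are invented for illustration.

```r
# Invented sample data; column names follow twListToDF() output.
tweets_df <- data.frame(
  screenName = c("ann", "ann", "ben", "cat", "cat"),
  created = as.POSIXct(
    c("2016-06-10 09:00:00", "2016-06-12 10:00:00", "2016-06-11 08:00:00",
      "2016-06-10 07:00:00", "2016-06-13 12:00:00"),
    tz = "UTC"
  ),
  text = c("I love cats", "Dogs are great too", "cats everywhere",
           "my dogs are asleep", "cats woke the dogs"),
  stringsAsFactors = FALSE
)

# Arrange by user, then by timestamp in ascending order.
tweets_df <- tweets_df[order(tweets_df$screenName, tweets_df$created), ]

# Flag tweets containing each word (whole word, case-insensitive).
mentions_cats <- grepl("\\bcats\\b", tweets_df$text, ignore.case = TRUE, perl = TRUE)
mentions_dogs <- grepl("\\bdogs\\b", tweets_df$text, ignore.case = TRUE, perl = TRUE)

# Users who used each word, and users who used both.
cat_users <- unique(tweets_df$screenName[mentions_cats])
dog_users <- unique(tweets_df$screenName[mentions_dogs])
both_users <- intersect(cat_users, dog_users)

both_users
```

If the order matters (cats first, dogs later), the sorted created column can be compared per user after this step.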

How to search tweets efficiently with R (popular tweets first)?

With the package twitteR, it is possible to search tweets as follows:
tweets <- searchTwitter("term", n=100,lang="en",resultType="recent",
since="2016-06-10", until="2016-06-26")
With resultType="recent" we can get a large number of tweets, but they are ordered by creation time, so the results begin with a lot of tweets from 2016-06-25 23:59:59.
I wanted to search for popular tweets first, so I used resultType="popular":
tweets <- searchTwitter("term", n=100,lang="en",resultType="popular",
since="2016-06-10", until="2016-06-26")
But then I got this warning:
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
100 tweets were requested but the API can only return 93
I understand that Twitter limits the requests, but since it can return 100 tweets in order of creation time, I hoped that I could get the same number of tweets in order of popularity. Apparently that is not the case.
Or maybe I didn't use the function in the right way.
So I would like to find a way to search tweets efficiently:
How can I get more popular tweets in a day?
How can I specify an hour for the search, for example 10am, so that the tweets are not all from 2016-06-25 23:59:59, which could introduce a bias?
Maybe we have to pay in order to get more tweets and more information? For example, I noticed that my tweets are never geocoded.
Usually I save them in a data.frame and after that play with the number of RTs, etc. I don't think you can do it directly. Hope it helps.
I don't believe Twitter will return the most popular tweets in order. Either the most recent or popular (however Twitter determines it) tweets are returned. Since Twitter only returned 93 tweets, I'd suggest you try broadening your search terms and then looking at the number of favorites, retweets, replies, etc. for each tweet.
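A sketch of ranking by popularity after the fact, assuming the results have been converted with twListToDF() so that retweetCount and favoriteCount columns are available. The rows here are invented, and using retweets first with favorites as the tie-breaker is just one possible definition of "popular".

```r
# Invented sample data; column names follow twListToDF() output.
tweets_df <- data.frame(
  text = c("tweet a", "tweet b", "tweet c", "tweet d"),
  retweetCount = c(5, 120, 120, 0),
  favoriteCount = c(10, 30, 300, 2),
  stringsAsFactors = FALSE
)

# Sort by retweets (descending), breaking ties by favorites (descending).
ranked <- tweets_df[order(-tweets_df$retweetCount, -tweets_df$favoriteCount), ]

# Keep the top two as an example.
top_tweets <- head(ranked, 2)

top_tweets
```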

searchTwitter - search tweet but not handle

I'm trying to dabble in some basic sentiment analysis using the twitteR library and the searchTwitter function. Say I'm searching for tweets specific to "Samsung". I can retrieve the tweets with the command below:
samsung_t = searchTwitter("#samsung", n=1500, lang="en",cainfo="cacert.pem")
This, I know, will return all the tweets containing the hashtag #samsung. However, if I want to search for tweets containing "samsung", I give the same command but without the "#":
samsung_t = searchTwitter("samsung", n=1500, lang="en",cainfo="cacert.pem")
This, however, will return all the tweets containing the term "samsung", including those where it appears only in the handle. For example, it will return the tweet "#I_Love_Samsung: I like R programming", which is completely irrelevant to my criteria. If I wanted to do a sentiment analysis on, say, "Samsung phones", I'm afraid that data like this could skew the results.
Is there a way I can force searchTwitter to only look in the "Tweet" but not the "Handle"?
Thanks a lot in advance.
Looking at the search API documentation and the listing of available search operators, I don't think the Twitter search API offers this specific search capability (which seems kind of strange, frankly). I think your best bet is probably to run your search with the tools available to you and then filter out of the results the tweets that don't match your criteria.
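The post-filtering step could look roughly like this. It is a sketch under a few assumptions: the results have been converted to a data frame with screenName and text columns (as twListToDF() produces), handles mentioned inside the text start with "@", and the sample rows are invented.

```r
# Invented sample data; column names follow twListToDF() output.
tweets_df <- data.frame(
  screenName = c("I_Love_Samsung", "gadget_fan", "I_Love_Samsung"),
  text = c(
    "I like R programming",
    "My new Samsung phone is great",
    "@samsung_news thanks for the update"
  ),
  stringsAsFactors = FALSE
)

# Strip @handles from the tweet body so a match inside a handle does not count.
body_only <- gsub("@\\w+", "", tweets_df$text)

# Keep only tweets whose remaining text still contains the search term.
keep <- grepl("samsung", body_only, ignore.case = TRUE)
filtered <- tweets_df[keep, ]

filtered
```

Because the match is made against the text column only, the author's own screen name never counts as a hit.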
