Rfacebook: get reactions to posts

I want to use Rfacebook to get the reactions (not just likes) to comments on specific posts, but couldn't find a way to do that. Basically, I want the same output for a comment as I get for a post:
> BBC <- getPage(page="bbcnews", token=fb_oauth, n=5, since="2017-10-03", until="2017-10-06", feed=FALSE, reactions=TRUE, verbose=TRUE)
5 posts
> BBC
id likes_count from_id from_name
1 228735667216_10155178331342217 1602 228735667216 BBC News
2 228735667216_10155178840252217 7575 228735667216 BBC News
3 228735667216_10155178915482217 5735 228735667216 BBC News
4 228735667216_10155180617187217 6843 228735667216 BBC News
5 228735667216_1964396086910573 1736 228735667216 BBC News
message
1 "What did those people do to deserve that?" \n\nThis woman left the scene of the Las Vegas shooting just moments before it began.
2 Puerto Rico: President Donald J. Trump compares Hurricane Maria to a "real catastrophe like Katrina" bbc.in/2yG9gyZ
3 Do mass shootings ever change gun laws? http://bbc.in/2fIbjv0
4 "Boris asked me to give you this" - The moment comedian Lee Nelson interrupts Prime Minister Theresa May's speech.. by handing her a P45.
5 In her big conference speech, Theresa May talked about council houses and energy prices - but the announcements were overshadowed by a coughing fit and a protester. (Via BBC Politics)\nhttp://bbc.in/2fMCIw3
created_time type link story comments_count shares_count
1 2017-10-03T18:23:36+0000 video https://www.facebook.com/bbcnews/videos/10155178331342217/ NA 406 230
2 2017-10-03T21:34:21+0000 video https://www.facebook.com/bbcnews/videos/10155178840252217/ NA 14722 12284
3 2017-10-03T21:56:01+0000 video https://www.facebook.com/bbcnews/videos/10155178915482217/ NA 3059 2418
4 2017-10-04T11:17:28+0000 video https://www.facebook.com/bbcnews/videos/10155180617187217/ NA 1737 2973
5 2017-10-04T17:16:33+0000 video https://www.facebook.com/bbcnews/videos/1964396086910573/ NA 636 238
love_count haha_count wow_count sad_count angry_count
1 125 16 18 1063 20
2 318 1155 5023 1072 23698
3 104 69 61 980 504
4 513 4127 76 10 80
5 83 467 24 11 21
Now I want an output like the one above for the first 5 comments of the first post. With the following code I get everything except the reactions (the columns love_count, haha_count, wow_count, sad_count and angry_count):
> BBC_post <- getPost(BBC$id[1], token=fb_oauth, comments=TRUE, n.comments=5, likes=FALSE, reactions=FALSE)
> BBC_post
$post
from_id from_name
1 228735667216 BBC News
message
1 "What did those people do to deserve that?" \n\nThis woman left the scene of the Las Vegas shooting just moments before it began.
created_time type link id
1 2017-10-03T18:23:36+0000 video https://www.facebook.com/bbcnews/videos/10155178331342217/ 228735667216_10155178331342217
likes_count comments_count shares_count
1 1602 406 230
$comments
from_id from_name
1 880124212162441 David Bourton
2 10159595379610445 Valerie Gregory
3 10159810965680122 Nadeem Hussain
4 1657693134252376 Samir Amghar
5 10215327133878123 Shlomo Resnikov
message
1 It's unfathomable to the rest of the world that there are so many people who believe the killer's right to their guns are greater than their victims right to life.
2 That's backwards. The victims didn't do anything. The NRA, the politicians who are bought and paid for by them, including President Trump, and the shooter did. That is where solving the problem begins.
3 BBC ask Israel the same Question... what did the Palestinians civilians do to deserve an Apartheid regime !!!
4 Praying and thinking of the victims will not prevent the next shooting. One failed attempt at a shoe bomb and we all take off our shoes at the airport. 274 Mass shootings since January and no change in your regulation of guns.
5 As a Jew , we constantly ask those kind of questions regarding to the holocaust ,”where was god in the holocaust “? Or “How did he allow this horror”? And the answer that facilitates the most is mysterious ways of god are beyond our perception ,we cannot grasp divine calculation .
created_time likes_count comments_count id
1 2017-10-03T18:25:58+0000 225 71 10155178331342217_10155178338952217
2 2017-10-03T18:29:04+0000 79 45 10155178331342217_10155178346307217
3 2017-10-03T18:28:34+0000 60 38 10155178331342217_10155178345382217
4 2017-10-03T18:32:11+0000 37 3 10155178331342217_10155178354272217
5 2017-10-03T18:44:19+0000 16 20 10155178331342217_10155178380902217
How do I also display the reactions a comment got? Setting reactions=TRUE doesn't help, since that returns the reactions to the post itself, not to the post's comments.
Does anyone know how to do this? Or does Rfacebook simply not support it (yet), since reacting to comments was introduced only recently?
Many thanks in advance and all the best,
Ivo
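Rfacebook does include a getReactions() function that returns reaction counts for a vector of post IDs. Whether it also works when given comment IDs is an assumption here, not something confirmed in this thread, and it also depends on what the Graph API lets your token read. If it does work, the remaining step is attaching those counts to the comments data frame. The helper add_comment_reactions() below is hypothetical (not part of Rfacebook) and only does that join in base R:

```r
# Hypothetical helper (not part of Rfacebook): copy reaction columns onto the
# comments data frame, matching rows by comment id and keeping the comment order
add_comment_reactions <- function(comments, reactions) {
  # Only add columns the comments table does not already have (e.g. love_count, haha_count)
  new_cols <- setdiff(names(reactions), names(comments))
  idx <- match(comments$id, reactions$id)  # NA where a comment has no reaction row
  for (col in new_cols) comments[[col]] <- reactions[[col]][idx]
  comments
}

# Usage sketch (assumes getReactions() accepts comment ids as well as post ids):
# library(Rfacebook)
# comment_reactions <- getReactions(BBC_post$comments$id, token = fb_oauth)
# BBC_comments <- add_comment_reactions(BBC_post$comments, comment_reactions)
```

Matching on id with match() keeps the comments in their original order and leaves NA for any comment without a reaction row, instead of silently dropping it as an inner merge() would.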

Related

Using spacyr for named entity recognition - inconsistent results

I plan to use the spacyr R library to perform named entity recognition across several news articles (spacyr is an R wrapper for the Python spaCy package). My goal is to identify partners for network analysis automatically. However, spacyr is not recognising common entities as expected. Here is sample code to illustrate my issue:
library(quanteda)
library(spacyr)
text <- data.frame(doc_id = c(1:5),
sentence = c("Brightmark LLC, the global waste solutions provider, and Florida Keys National Marine Sanctuary (FKNMS), today announced a new plastic recycling partnership that will reduce landfill waste and amplify concerns about ocean plastics.",
"Brightmark is launching a nationwide site search for U.S. locations suitable for its next set of advanced recycling facilities, which will convert hundreds of thousands of tons of post-consumer plastics into new products, including fuels, wax, and other products.",
"Brightmark will be constructing the facility in partnership with the NSW government, as part of its commitment to drive economic growth and prosperity in regional NSW.",
"Macon-Bibb County, the Macon-Bibb County Industrial Authority, and Brightmark have mutually agreed to end discussions around building a plastic recycling plant in Macon",
"Global petrochemical company SK Global Chemical and waste solutions provider Brightmark have signed a memorandum of understanding to create a partnership that aims to take the lead in the circular economy of plastic by construction of a commercial scale plastics renewal plant in South Korea"))
corpus <- corpus(text, text_field = "sentence")
spacy_initialize(model = "en_core_web_sm")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
I expect the company "Brightmark" to be recognised in all 5 sentences. However, this is what I get:
entity
doc_id sentence_id entity entity_type
1 1 1 Florida_Keys_National_Marine_Sanctuary ORG
2 1 1 FKNMS ORG
3 2 1 U.S. GPE
4 3 1 NSW ORG
5 4 1 Macon_-_Bibb_County ORG
6 4 1 Brightmark ORG
7 4 1 Macon GPE
8 5 1 SK_Global_Chemical ORG
9 5 1 South_Korea GPE
"Brightmark" only appears as an ORG entity in the 4th sentence (doc_id refers to the sentence number), although it should show up in all of them. "NSW government" is not recognised as a whole either; only "NSW" is, tagged as ORG.
I am still figuring out spaCy and spacyr. Perhaps someone can advise me why this is happening and what steps I should take to remedy this issue. Thanks in advance.
I changed the model and achieved better results:
spacy_initialize(model = "en_core_web_trf")
parsed <- spacy_parse(corpus)
entity <- entity_extract(parsed)
entity
doc_id sentence_id entity entity_type
1 1 1 Brightmark_LLC ORG
2 1 1 Florida_Keys GPE
3 1 1 FKNMS ORG
4 2 1 Brightmark ORG
5 2 1 U.S. GPE
6 3 1 Brightmark ORG
7 3 1 NSW GPE
8 3 1 NSW GPE
9 4 1 Macon_-_Bibb_County GPE
10 4 1 the_Macon_-_Bibb_County_Industrial_Authority ORG
11 4 1 Brightmark ORG
12 4 1 Macon GPE
13 5 1 SK_Global_Chemical ORG
14 5 1 Brightmark ORG
15 5 1 South_Korea GPE
The only remaining problem is that "NSW government" and "Florida Keys National Marine Sanctuary" are still not recognised as complete entities. I also get this warning: UserWarning: User provided device_type of 'cuda', but CUDA is not available.
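For a fixed set of partner names you already know, a dictionary pass alongside the model is a simple complement to switching models. In spaCy itself (Python) the rule-based EntityRuler component serves this purpose; it is not assumed here that spacyr exposes it, so the sketch below is a plain base-R stand-in. find_known_orgs() is a hypothetical helper, and names are assumed to start and end with a letter or digit so the word-boundary anchors work:

```r
# Hypothetical helper: dictionary fallback for organisation names the NER model misses.
# Matching is case-insensitive and anchored on word boundaries; each name is quoted
# with \Q...\E so characters such as "." are taken literally.
find_known_orgs <- function(doc_id, text, orgs) {
  rows <- lapply(orgs, function(org) {
    pattern <- paste0("\\b\\Q", org, "\\E\\b")
    hit <- grepl(pattern, text, perl = TRUE, ignore.case = TRUE)
    if (!any(hit)) return(NULL)
    # Join multi-word names with underscores, like the entity_extract() output above
    data.frame(doc_id = doc_id[hit], entity = gsub(" ", "_", org),
               entity_type = "ORG", stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # NULL entries (names never found) are skipped
}

# Usage with the question's data frame:
# known <- find_known_orgs(text$doc_id, text$sentence,
#                          c("Brightmark", "NSW government",
#                            "Florida Keys National Marine Sanctuary"))
```

The dictionary hits can then be compared with, or appended to, the model's entities; the model still handles names you did not anticipate.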

Match and count total words from an external list with text strings (tweets) in r

I am attempting to conduct emotional sentiment analysis of a large corpus of Tweets (91k) using an external list of emotionally charged words (from the NRC Emotion Lexicon). To do this, I want to count the total number of times any word from the words-of-joy list appears in each Tweet. Ideally, this would be a partial match of the word rather than an exact match (e.g. "loved" should count for "love" and "Champions" for "champion", as in the desired output below). I would like the total to show in a new column in the df.
The df and column name for the Tweets are Tweets_with_Emotions$full_text and the list is Words_of_joy$word.
Example 1
> head(Tweets_with_Emotions, n=10)
ID Date full_text
1 58150 2012-09-12 I love an excellent cookie
2 12357 2012-09-28 Oranges are delicious and excellent
3 50788 2012-10-04 Eager to visit Disneyland
4 66038 2012-10-11 I wish my boyfriend would propose already
5 18119 2012-10-11 Love Maggie Smith
6 48349 2012-10-14 The movie was excellent, loved it.
7 23328 2012-10-16 Pineapples are so delicious and excellent
8 66038 2012-10-26 Eager to see the Champions Cup next week
9 32717 2012-10-28 Hating this show
10 11345 2012-11-08 Eager for the food
Example 2
> head(Words_of_joy, n=5)
word
1 eager
2 champion
3 delicious
4 excellent
5 love
Desired output
> head(New_df, n=10)
ID Date full_text joy_count
1 58150 2012-09-12 I love an excellent cookie 2
2 12357 2012-09-28 Oranges are delicious and excellent 2
3 50788 2012-10-04 Eager to visit Disneyland 1
4 66038 2012-10-11 I wish my boyfriend would propose already 0
5 18119 2012-10-11 Love Maggie Smith 1
6 48349 2012-10-14 The movie was excellent, loved it. 2
7 23328 2012-10-16 Pineapples are so delicious and excellent 2
8 66038 2012-10-26 Eager to see the Champions Cup next week 2
9 32717 2012-10-28 Hating this show 0
10 11345 2012-11-08 Eager for the food 1
I've managed to run the emotion list against the Tweets so that it tells me whether any word from the list is contained in each Tweet (no = 0, yes = 1), but I cannot figure out how to count the matches and return the totals in a new column. This is what I have so far:
new_df <- Tweets_with_Emotions[stringr::str_detect(Tweets_with_Emotions$full_text, paste(Words_of_negative$words,collapse = '|')),]
I'm extremely new to R (and stackoverflow!) and have been struggling to figure this out for a few days so any help would be incredibly appreciated!
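The desired output implies a case-insensitive prefix match ("loved" counts for love, "Champions" for champion). A minimal base-R sketch: build one alternation pattern from the word list and count every match per Tweet with gregexpr(), instead of the yes/no check from str_detect(). The helper name count_lexicon_hits() is made up for this example, and the lexicon words are assumed to contain no regex metacharacters:

```r
# Count case-insensitive prefix matches of lexicon words in each text.
# "\\b" anchors each word at the start of a token, so "love" matches "loved"
# but not "glove".
count_lexicon_hits <- function(text, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")")
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  lengths(regmatches(text, hits))  # number of matches per text (0 if none)
}

# Usage with the names from the question:
# Tweets_with_Emotions$joy_count <- count_lexicon_hits(Tweets_with_Emotions$full_text,
#                                                      Words_of_joy$word)
```

With stringr, stringr::str_count() with the same pattern wrapped in stringr::regex(..., ignore_case = TRUE) should give the same counts. The prefix rule is a trade-off: "love" will also count "lovely" and "loves", which may or may not be wanted.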

Variable selection using regsubset in R

I'm working on a Tweets project and have extracted 87 variables. Now I need to perform variable selection, so I used forward stepwise selection, but I'm getting an error:
regfit.fwd = regsubsets(screen_name ~.,merge_tweets,method = "forward",
complete.cases(merge_tweets),nvmax = 15)
Error in leaps.setup(x[, ii[reorder], drop = FALSE], y, wt,
force.in[reorder], : NA/NaN/Inf in foreign function call (arg 4)
> head(merge_tweets)
X user_id status_id created_at screen_name
1 1 1339835893 1.090257e+18 1548772454 HillaryClinton
2 2 1339835893 1.090002e+18 1548711688 HillaryClinton
3 3 1339835893 1.089999e+18 1548710912 HillaryClinton
4 4 1339835893 1.089994e+18 1548709837 HillaryClinton
5 5 1339835893 1.089994e+18 1548709756 HillaryClinton
6 6 1339835893 1.089994e+18 1548709738 HillaryClinton
text
1 On top of human suffering and lasting damage to our national parks, the Trump shutdown cost the economy $11 billion. End shutdowns as a political hostage-taking tactic.
2 Hurricane Maria decimated trees and ecosystems in Puerto Rico. Para La Naturaleza's nurseries have made a CGI commitment to plant 750,000 trees in seven years. The team here has already grown 120,000 seedlings and planted 30,000 trees.
source display_text_width is_quote is_retweet favorite_count retweet_count lang
1 Twitter Web Client 192 FALSE FALSE 14324 4168 en
2 Twitter Web Client 235 FALSE FALSE 10684 2526 en
3 Twitter Web Client 238 FALSE FALSE 11423 2089 en
4 Twitter Web Client 34 FALSE FALSE 1293 113 en
5 Twitter Web Client 222 FALSE FALSE 6641 951 en
6 Twitter Web Client 214 FALSE FALSE 12192 2108 en
status_url name location
Hillary Clinton New York, NY
Hillary Clinton New York, NY
description
1 2016 Democratic Nominee, SecState, Senator, hair icon. Mom, Wife, Grandma x2, lawyer, advocate, fan of walks in the woods & standing up for our democracy.
2 2016 Democratic Nominee, SecState, Senator, hair icon. Mom, Wife, Grandma x2, lawyer, advocate, fan of walks in the woods & standing up for our democracy.
url protected followers_count friends_count listed_count statuses_count
1 FALSE 24017203 784 41782 10667
2 FALSE 24017203 784 41782 10667
3 FALSE 24017203 784 41782 10667
favourites_count account_created_at verified profile_url profile_expanded_url
1 1138 1365530675 TRUE
2 1138 1365530675 TRUE
3 1138 1365530675 TRUE
I have removed some URL columns because the site doesn't allow URLs to be posted. It would be great if anyone could help me solve this problem.
Thanks in advance!!
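Two things in the original call look likely to cause trouble. First, regsubsets() from the leaps package does least-squares subset selection for a linear model, so a character response such as screen_name, and free-text predictors such as text or description, cannot be used as they are (a categorical outcome like screen_name would need a classification model instead). Second, complete.cases(merge_tweets) is passed positionally, so it is most likely matched to the weights argument rather than filtering rows. A minimal sketch, assuming a numeric column such as favorite_count is an acceptable response (a hypothetical choice) and that ID-like columns should be dropped; prepare_for_regsubsets() is a hypothetical helper that only prepares the data frame:

```r
# Hypothetical helper: keep numeric/logical columns and drop rows with NA, NaN or Inf,
# so the data can be passed to leaps::regsubsets(), which fits least-squares models.
prepare_for_regsubsets <- function(df, response, drop = character(0)) {
  # regsubsets() needs a numeric response; a character column like screen_name fails here
  stopifnot(is.numeric(df[[response]]))
  df <- df[, setdiff(names(df), drop), drop = FALSE]
  # Keep numeric and logical columns only; logicals are converted to 0/1
  keep <- vapply(df, function(col) is.numeric(col) || is.logical(col), logical(1))
  out <- df[, keep, drop = FALSE]
  out[] <- lapply(out, as.numeric)
  # Keep only rows where every retained value is finite (no NA, NaN or Inf)
  ok <- rowSums(!is.finite(as.matrix(out))) == 0
  out[ok, , drop = FALSE]
}

# Usage sketch (favorite_count as response is an assumption):
# tweets_num <- prepare_for_regsubsets(merge_tweets, response = "favorite_count",
#                                      drop = c("X", "user_id", "status_id"))
# regfit.fwd <- leaps::regsubsets(favorite_count ~ ., data = tweets_num,
#                                 method = "forward", nvmax = 15)
```

Passing the cleaned data through the named data argument avoids relying on positional matching for anything after the formula.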

R Extract names from text

I'm trying to extract a list of rugby players' names from a string. The string contains all of the information from a table: the headers (team names) and the name of the player in each position for each team. It also contains each player's ranking, which I don't care about.
Important: many player rankings are missing. I found a solution to this, but it doesn't handle missing rankings (for example, in the string below Rabah Slimani is the first player without a recorded ranking).
Note that the numbers 1-15 indicate positions, and there are always two names following each position (home player and away player).
Here's the sample string:
" Team Sheets # FRA France RPI IRE Ireland RPI 1 Jefferson Poirot 72 Cian Healy 82 2 Guilhem Guirado 78 Rory Best 85 3 Rabah Slimani Tadhg Furlong 85 4 Arthur Iturria 82 Iain Henderson 84 5 Sebastien Vahaamahina 84 James Ryan 92 6 Wenceslas Lauret 82 Peter O'Mahony 93 7 Yacouba Camara 70 Josh van der Flier 64 8 Kevin Gourdon CJ Stander 91 9 Maxime Machenaud Conor Murray 87 10 Matthieu Jalibert Johnny Sexton 90 11 Virimi Vakatawa Jacob Stockdale 89 12 Henry Chavancy Bundee Aki 83 13 Rémi Lamerat Robbie Henshaw 78 14 Teddy Thomas Keith Earls 89 15 Geoffrey Palis Rob Kearney 80 Substitutes # FRA France RPI IRE Ireland RPI 16 Adrien Pelissie Sean Cronin 84 17 Dany Priso 70 Jack McGrath 70 18 Cedate Gomes Sa 71 John Ryan 86 19 Paul Gabrillagues 77 Devin Toner 90 20 Marco Tauleigne Dan Leavy 80 21 Antoine Dupont 92 Luke McGrath 22 Anthony Belleau 65 Joey Carbery 86 23 Benjamin Fall Fergus McFadden "
Note - it comes from here: https://www.rugbypass.com/live/six-nations/france-vs-ireland-at-stade-de-france-on-03022018/2018/info/
So basically what I want is just the list of names with the team names as the headers e.g.
France Ireland
Jefferson Poirot Cian Healy
Guilhem Guirado Rory Best
... ...
Any help would be much appreciated!
I tried this in an advanced Notepad-like text editor: find occurrences of two consecutive numbers and replace them with a new line. The regex is
\d+\s+\d+
Once you have done the replacement, you will be left with two names on each line, separated by a number. Then use the regex below to replace that number with a single tab:
\s+\d+\s+
Hope that helps
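A sketch that handles the missing rankings, assuming the layout of the sample string: position numbers run 1, 2, 3, ... without gaps, each header block looks like "# FRA France RPI IRE Ireland RPI" with one-word team names, and no ranking happens to equal the next position number. Within a position, a ranking followed by more words separates the home and away player. When there is no ranking between the two names, the string is genuinely ambiguous, so the sketch falls back to a heuristic (four words are read as two two-word names) and returns NA otherwise; scraping the original HTML table would likely avoid that guesswork. parse_team_sheet() is a hypothetical name:

```r
# Hypothetical helper: turn the flattened team-sheet string into a two-column table
parse_team_sheet <- function(s) {
  # Team names from the first "# FRA France RPI IRE Ireland RPI" header (one-word names assumed)
  teams <- regmatches(s, regexec("# \\S+ (\\S+) RPI \\S+ (\\S+) RPI", s))[[1]][2:3]
  # Remove both header blocks so only position numbers, names and rankings remain
  body <- gsub("(Team Sheets|Substitutes) # \\S+ \\S+ RPI \\S+ \\S+ RPI", " ", s)
  tokens <- strsplit(trimws(body), "\\s+")[[1]]

  # A token equal to the next expected position number starts a new position
  segments <- list()
  expected <- 1
  for (tok in tokens) {
    if (tok == as.character(expected)) {
      segments[[expected]] <- character(0)
      expected <- expected + 1
    } else if (expected > 1) {
      segments[[expected - 1]] <- c(segments[[expected - 1]], tok)
    }
  }

  # Split one position's words into c(home, away)
  split_pair <- function(words) {
    is_num <- grepl("^[0-9]+$", words)
    n <- length(words)
    sep <- which(is_num & seq_len(n) < n)  # a ranking with more words after it
    if (length(sep) > 0) {
      home <- words[seq_len(sep[1] - 1)]
      away <- words[-seq_len(sep[1])]
      away <- away[!grepl("^[0-9]+$", away)]
    } else {
      name_words <- words[!is_num]
      # Heuristic: with no ranking in between, read four words as two two-word names
      if (length(name_words) != 4) return(c(NA_character_, NA_character_))
      home <- name_words[1:2]
      away <- name_words[3:4]
    }
    c(paste(home, collapse = " "), paste(away, collapse = " "))
  }

  pairs <- lapply(segments, split_pair)
  out <- data.frame(home = vapply(pairs, `[`, character(1), 1),
                    away = vapply(pairs, `[`, character(1), 2),
                    stringsAsFactors = FALSE)
  names(out) <- teams
  out
}
```

Calling parse_team_sheet() on the full sample string should give 23 rows with France and Ireland as column names; any row the heuristic cannot split shows up as NA, so it is easy to spot and fix by hand.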

Fuzzy string matching in r

I have 2 datasets with more than 100K rows each. I would like to merge them using fuzzy string matching on one column ('movie title') as well as the release date. A sample from both datasets is below.
dataset-1
itemid userid rating time title release_date
99991 1673 835 3 1998-03-27 mirage 1995
99992 1674 840 4 1998-03-29 mamma roma 1962
99993 1675 851 3 1998-01-08 sunchaser, the 1996
99994 1676 851 2 1997-10-01 war at home, the 1996
99995 1677 854 3 1997-12-22 sweet nothing 1995
99996 1678 863 1 1998-03-07 mat' i syn 1997
99997 1679 863 3 1998-03-07 b. monkey 1998
99998 1680 863 2 1998-03-07 sliding doors 1998
99999 1681 896 3 1998-02-11 you so crazy 1994
100000 1682 916 3 1997-11-29 scream of stone (schrei aus stein) 1991
dataset-2
itemid userid rating time title release_date
1 2844 4477 3 2013-03-09 fantã´mas - 〠l'ombre de la guillotine 1913
2 4936 8871 4 2013-05-05 the bank 1915
3 4936 11628 3 2013-07-06 the bank 1915
4 4972 16885 4 2013-08-19 the birth of a nation 1915
5 5078 11628 2 2013-08-23 the cheat 1915
6 6684 4222 3 2013-08-24 the fireman 1916
7 6689 4222 3 2013-08-24 the floorwalker 1916
8 7264 2092 4 2013-03-17 the rink 1916
9 7264 5943 3 2013-05-12 the rink 1916
10 7880 11628 4 2013-07-19 easy street 1917
I have looked at 'agrep', but it only matches one pattern at a time. The 'stringdist' function is good, but you need to run it in a loop, find the minimum distance and then go on to further processing, which is very time-consuming given the size of the datasets. The strings can contain typos and special characters, which is why fuzzy matching is required. I have looked around and found the 'Levenshtein' and 'Jaro-Winkler' methods; I read that the latter is good for strings with typos.
In this scenario, fuzzy matching alone may not give good results: e.g., a movie title 'toy story' in one dataset could be matched to 'toy story 2' in the other, which is wrong. So I need to consider the release date to make sure each title is matched to the right movie.
Is there a way to achieve this without using a loop? In the worst case, if I have to use a loop, how can I make it run as efficiently and as fast as possible?
I have tried the following code, but it takes an awfully long time to run.
for(i in 1:nrow(test))
for(j in 1:nrow(test1))
{
test$title.match <- ifelse(jarowinkler(test$x[i], test1$x[j]) > 0.85,
test$title, NA)
}
test - contains 1682 unique movie names converted to lower case
test1 - contains 11451 unique movie names converted to lower case
Is there a way to avoid the for loops and make it work faster?
What about this approach to move you forward? You can adjust the degree of match from 0.85 after you see the results. You could then use dplyr to group by the matched title and summarise by subtracting release dates. Any zeros would mean the same release date.
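library(RecordLinkage)  # provides jarowinkler()
# `dataset-1` is not a syntactic R name (it parses as dataset minus 1), so the data frames are called dataset_1 and dataset_2 here
# jarowinkler() compares the two title vectors position by position, so row i of dataset_1 is only compared with row i of dataset_2
dataset_1$title.match <- ifelse(jarowinkler(dataset_1$title, dataset_2$title) > 0.85, dataset_1$title, NA)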
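A sketch of the release-date blocking idea from the question, swapping in base R's adist() (generalised Levenshtein distance, normalised by the longer title) for Jaro-Winkler so that no extra package is needed. Titles are only compared within the same release year, which is what stops 'toy story' from matching 'toy story 2', and within each year the whole distance matrix comes from one vectorised call, so the loop runs over years rather than over title pairs. The sample also shows dataset-1 storing articles at the end ("sunchaser, the") while dataset-2 puts them in front ("the bank"), so titles are normalised first. normalise_title(), fuzzy_match_by_year() and the 0.2 cut-off are assumptions to tune, not established values:

```r
# Hypothetical helper: lower-case and move a trailing article to the front,
# so "sunchaser, the" and "the sunchaser" compare as equal
normalise_title <- function(t) {
  sub("^(.*), (the|a|an)$", "\\2 \\1", tolower(trimws(t)))
}

# Hypothetical helper: for each row of x, find the closest title in y with the
# same release_date, using Levenshtein distance divided by the longer title length
fuzzy_match_by_year <- function(x, y, max_dist = 0.2) {
  x$match_title <- NA_character_
  xt <- normalise_title(x$title)
  yt <- normalise_title(y$title)
  for (yr in intersect(x$release_date, y$release_date)) {
    xi <- which(x$release_date == yr)
    yi <- which(y$release_date == yr)
    if (length(xi) == 0 || length(yi) == 0) next  # e.g. an NA year
    d <- adist(xt[xi], yt[yi])  # length(xi) x length(yi) distance matrix
    d <- d / outer(nchar(xt[xi]), nchar(yt[yi]), function(a, b) pmax(a, b, 1))
    best <- apply(d, 1, which.min)  # closest y title for each x title
    ok <- d[cbind(seq_along(xi), best)] <= max_dist
    x$match_title[xi[ok]] <- y$title[yi[best[ok]]]
  }
  x
}
```

Each per-year matrix only has (titles of that year in x) x (titles of that year in y) cells, which keeps memory modest compared with comparing all 1682 x 11451 pairs. If the stringdist package is available, stringdist::stringdistmatrix(..., method = "jw") could replace adist() inside the same loop to get Jaro-Winkler distances; either way, the threshold should be checked against real results.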
