Extracting posts from Facebook using the Rfacebook package - r

I succeeded in getting the text of the post along with the share and like counts.
However, I am not able to get the likes of the comments associated with the post. If this information is not available, I would like to merge the like count of the post onto each comment.
Example: a post gets 900 likes and 80 comments. I would like to associate the 900-likes value with each of the comments (in a new column called post_like, maybe).
I would like to use this information to perform a sentiment analysis, using the number of likes (including reaction types, i.e. haha, sad...) in a logistic regression, with the frequency of the most frequent words as the x variable.
Here is my script so far:
token <- "**your token, get it at https://developers.facebook.com/tools/explorer/**"
library(Rfacebook)
library(reshape2)  # for melt()
# Function to download the comments of post i
download.post <- function(i, refetch = FALSE, path = ".") {
  post <- getPost(post = fb_page$id[i], comments = TRUE, likes = TRUE, token = token)
  as.data.frame(melt(post))
}
#----------------------- Request posts --- ALL
# Get post for ALL
fb_page <- getPage(page = "**the page number you want**", token = token,
                   since = '2010/01/01', until = '2016/01/01', n = 10000, reactions = TRUE)
fb_page$order <- 1:nrow(fb_page)
# Apply function to download comments
files <- data.frame(melt(lapply(fb_page$order, download.post)))
# Select only comments
files_c <- files[complete.cases(files$message), ]
So basically I get the page with the post IDs and create a function to get the posts for those IDs.
As you can see, I get all the information I need EXCEPT the like and share counts.
I hope I am clear. Thanks a lot for your help.
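The merge idea described in the question can be sketched with toy data; the posts/comments data frames and their column names below are assumptions standing in for what getPost() returns, not the real API output:

```r
# Toy stand-ins for the post and comment data (column names are assumptions)
posts <- data.frame(post_id = c("p1", "p2"),
                    likes_count = c(900, 120),
                    stringsAsFactors = FALSE)
comments <- data.frame(post_id = c("p1", "p1", "p2"),
                       message = c("great", "nice", "meh"),
                       stringsAsFactors = FALSE)
# Attach the parent post's like count to every comment as a new post_like column
comments$post_like <- posts$likes_count[match(comments$post_id, posts$post_id)]
```

Every comment of the 900-like post now carries post_like = 900, which is exactly the column the question asks for.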

It's all there:
library(Rfacebook)
token <- "#############" # https://developers.facebook.com/tools/explorer
fb_page <- getPage(page="europeanparliament", token=token, n = 3)
transform(
  fb_page[, c("message", "likes_count", "comments_count", "shares_count")],
  message = sapply(message, toString, width = 30)
)
# message likes_count comments_count shares_count
# 1 This week members called o.... 92 73 21
# 2 Today we're all Irish, bea.... 673 133 71
# 3 European citizens will mee.... 1280 479 71
packageVersion("Rfacebook")
# [1] ‘0.6.12’

R: Replace all Values that are not equal to a set of values

All,
I've been trying to solve a problem on a large data set for some time and could use some of your wisdom.
I have a data frame (1.3M obs) with a column called customer, along with 30 other columns. Let's say it contains multiple instances of customers Customer1 through Customer3000. I know that I have issues with 30 of those customers. I need to find all the customers that are NOT the customers I have issues with and replace the value in the 'customer' column with the text 'Supported Customer'. That seems like it should be a simple thing... if it weren't for the number of obs, I would have loaded it up in Excel, filtered all the bad customers out and copy/pasted the text 'Supported Customer' over what remained.
I've tried replace and str_replace_all using grepl and paste/paste0, but to no avail. My current code looks like this:
# All the customers that have issues
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127",
         "Customer128", ..... , "Customer140")
# Look for everything that is NOT in the list above and replace with "Enabled"
orderData$customer <- str_replace_all(orderData$customer,
                                      paste0("[^", paste(out, collapse = "|"), "]"),
                                      "Enabled Customers")
That code gets me this error:
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
In a character range [x-y], x is greater than y. (U_REGEX_INVALID_RANGE)
I've tried the inverse of this approach and pulled a list of all obs that don't match the list of out customers. Something like this:
# ("in" is a reserved word in R, so the result needs another name)
enabled <- orderData %>% filter(!customer %in% out) %>% select(customer) %>%
  distinct(customer)
This gets me a much larger list of the customers that ARE enabled (~3,100). The str_replace_all and paste approach has issues at this scale, though: with this many patterns, paste no longer collapses using the "|" operator. Instead I get a string that looks like:
"c(\"Customer1\", \"Customer2345\", \"Customer54\", ......)
When passed into str_replace_all, this does not match any patterns.
Anyway, there's got to be an easier way to do this. Thanks for any/all help.
Here is a data.table approach.
First, some example data, since you didn't provide any.
orderData <- data.frame(customer = sample(paste0("Customer", 1:300), 5000, replace = TRUE),
                        stringsAsFactors = FALSE)
orderData <- cbind(orderData, matrix(runif(5000 * 30, 0, 100), ncol = 30))
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127",
         "Customer128", "Customer140")
library(data.table)
setDT(orderData)
result <- orderData[!(customer %in% out),
                    customer := gsub("Customer", "Supported Customer ", customer)]
result
customer 1 2 3 4 5 6 7 8 9
1: Supported Customer 134 65.35091 8.57117 79.594166 84.88867 97.225276 84.563997 17.15166 41.87160 3.717705
2: Supported Customer 225 72.95757 32.80893 27.318046 72.97045 28.698518 60.709381 92.51114 79.90031 7.311200
3: Supported Customer 222 39.55269 89.51003 1.626846 80.66629 9.983814 87.122153 85.80335 91.36377 14.667535
4: Supported Customer 184 24.44624 20.64762 9.555844 74.39480 49.189537 73.126275 94.05833 36.34749 3.091072
5: Supported Customer 194 42.34858 16.08034 34.182737 75.81006 35.167769 23.780069 36.08756 26.46816 31.994756
---
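If you don't need to keep the original "CustomerNNN" text at all, the same replacement can be done in base R with a single vectorised assignment; a minimal sketch on toy data:

```r
# Toy data: one problem customer and two supported ones
orderData <- data.frame(customer = c("Customer123", "Customer7", "Customer9"),
                        stringsAsFactors = FALSE)
out <- "Customer123"
# Overwrite every customer NOT in `out` in one vectorised assignment;
# no regex needed, so the "|"-collapse problem never comes up
orderData$customer[!(orderData$customer %in% out)] <- "Supported Customer"
```

This sidesteps str_replace_all entirely, which is why the invalid-character-class error cannot occur.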

Can I filter out certain rows/records when retrieving data from Salesforce using the RForcecom function "rforcecom.retrieve"?

Thanks for helping me with my first Stack Overflow question. I am trying to retrieve all the data from several fields in an Object called "Applied Questionnaire"; however, I do not want to retrieve any records that have the name "Training Site".
Currently, this is my code, which works:
quarterly_site_scores = rforcecom.retrieve(session, "AppliedQuestionnaire__c",
                                           c("Site__c", "Site_Name__c", "Total_Score__c")) %>%
  rename(site_id = Site__c, site_name = Site_Name__c)
quarterly_site_scores = quarterly_site_scores[!(quarterly_site_scores$site_name == "TRAINING PARK SITE" |
                                                quarterly_site_scores$status != "Completed"), ]
However, I'm wondering if there's a more elegant, streamlined solution here. Can I filter at the same time I retrieve? Or is there a better way to filter here?
(I've simplified the code here - I'm actually pulling in about ten fields and filtering on about five or six criteria, just in this one example).
Thank you.
Adding what the OP discovered as an answer, using the salesforcer package, which returns the SOQL result set as a tbl_df.
library(salesforcer)
library(tidyverse)
sf_auth(username, password, security_token)
# list all object names in a Salesforce org
ped_objects <- sf_list_objects() %>% .$sobjects %>% map_chr(~pluck(., "name"))
# list all the fields on a particular object
fields <- sf_describe_object_fields('AppliedQuestionnaireBundle2__c')
# write a query to retrieve certain records from that object
site_scores_soql <- "SELECT Site__c,
                            Site_Name__c,
                            Total_Score__c
                     FROM AppliedQuestionnaireBundle2__c
                     WHERE Site_Name__c != 'GENERIC SITE'
                       AND Site_Name__c != 'TRAINING PARK SITE'
                       AND Status__c = 'Completed'"
# run the query
quarterly_site_scores <- sf_query(site_scores_soql)
quarterly_site_scores
#> # A tibble: 3 x 3
#> Site__c Site_Name__c Total_Score__c
#> <chr> <chr> <dbl>
#> 1 A Site Name1 78
#> 2 B Site Name2 52
#> 3 C Site Name3 83
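When pushing the filter into SOQL isn't possible, the post-retrieval clean-up can at least be written as a single dplyr::filter call instead of row indexing; a sketch on made-up data (the column names here are assumptions):

```r
library(dplyr)

# Made-up records standing in for the retrieved questionnaire data
scores <- data.frame(site_name = c("Site A", "TRAINING PARK SITE", "Site B"),
                     status    = c("Completed", "Completed", "Open"),
                     stringsAsFactors = FALSE)

# Several criteria in one filter() call; commas are ANDed together
completed <- scores %>%
  filter(site_name != "TRAINING PARK SITE", status == "Completed")
```

With five or six criteria this stays readable, since each condition is its own argument.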

Calculating walking distance using Google Maps in R

I've been trying to get the distance between a list of home postcodes and a list of school postcodes for approximately 2,000 students. I'm using the gmapsdistance package in R to get this from the Google Maps Distance Matrix API. I've put in a valid API key and have replaced it in the following code for security reasons.
library(gmapsdistance)
set.api.key("valid API key")
results <- gmapsdistance(origin = school$HomePostcode,
                         destination = school$SchoolPostcode,
                         mode = "walking",
                         shape = "long")
However, this gives the following error code.
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to maps.googleapis.com:443
Looking at the Google APIs website, it looks like it hasn't run the query for all the data; it says there were only 219 requests. I know I'm limited in how many requests I can make in one day, but the limit is 2,500 and it's not even letting me get close to that.
I've tried running the code on a single pair of postcodes, like below:
test <- gmapsdistance(origin = "EC4V+5EX",
                      destination = "EC4V+3AL",
                      mode = "walking",
                      shape = "long")
Which gives the following, as I would expect.
$Time
[1] 384
$Distance
[1] 497
$Status
[1] "OK"
My data looks something like this; I've anonymised it and removed all the variables that aren't needed. There are 1,777 pairs of postcodes.
head(school)
HomePostcode SchoolPostcode
1 EC4V+5EX EC4V+3AL
2 EC2V+7AD EC4V+3AL
3 EC2A+1WD EC4V+3AL
4 EC1V+3QG EC4V+3AL
5 EC2N+2PT EC4V+3AL
6 EC1M+5QA EC4V+3AL
I do not have enough reputation to comment, but have you tried setting the parameter combinations to "pairwise"? If it is set to "all", it computes all the combinations between the origins and the destinations, which burns through your request quota much faster.
library(gmapsdistance)
from <- c("EC4V+5EX", "EC2V+7AD", "EC2A+1WD", "EC1V+3QG", "EC2N+2PT", "EC1M+5QA")
to <- c("EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL")
test <- gmapsdistance(origin = from,
                      destination = to,
                      combinations = "pairwise",
                      key = "YOURAPIKEYHERE",
                      mode = "walking")
test$Distance
or de Distance
1 EC4V+5EX EC4V+3AL 497
2 EC2V+7AD EC4V+3AL 995
3 EC2A+1WD EC4V+3AL 2079
4 EC1V+3QG EC4V+3AL 2492
5 EC2N+2PT EC4V+3AL 1431
6 EC1M+5QA EC4V+3AL 1892
With this small set of 6 destinations it works. I have an API key; if you send me a bigger set I can try it.
Another option would be to use the googleway package, which also lets you set an API key. Example:
library(googleway)
test <- google_distance(origins = from,
                        destinations = to,
                        mode = "walking",
                        key = "YOURAPIKEYHERE")

Convert R JSON Twitter data to list

When using searchTwitteR, I converted the result to a data frame and then exported it to JSON. However, all the text is in one line, etc. (sample below). I need to separate it so that each tweet is its own record.
phish <- searchTwitteR('phish', n = 5, lang = 'en')
phishdf <- do.call("rbind", lapply(phish, as.data.frame))
exportJson <- toJSON(phishdf)
write(exportJson, file = "phishdf.json")
json_phishdf <- fromJSON(file = "phishdf.json")
I tried converting to a list and am wondering if converting to a data frame was the mistake.
For a list, I tried:
newlist['text']=phish[[1]]$getText()
But this will just give me the text for the first tweet. Is there a way to iterate over the entire data set, maybe in a for loop?
{"text":["#ilazer #abbijacobson I do feel compelled to say that I phind phish awphul... sorry, Abbi!","#phish This on-sale was an embarrassment. Something needs to change.","FS: Have 2 Tix To Phish In Chula Vista #Phish #facevaluetickets #phish #facevalue GO: https://t.co/dFdrpyaotp","RT #WKUPhiDelt: Come unwind from a busy week of class and kick off the weekend with a Phish Fry! 4:30-7:30 at the Phi Delt house. Cost is $\u2026","RT #phish: Tickets for Phish's July 15 & 16 shows at The Gorge go on sale in fifteen minutes at 1PM ET: https://t.co/tEKLNjI5u7 https://t.c\u2026"],
"favorited":[false,false,false,false,false],
"favoriteCount":[0,0,0,0,0],
"replyToSN":["rAlexandria","phish","NA","NA","NA"],
"created":[1456521159,1456521114,1456521022,1456521016,1456520988],
"truncated":[false,false,false,false,false],
"replyToSID":["703326502629277696","703304948990222337","NA","NA","NA"],
"id":["703326837720662016","703326646074343424","703326261045829632","703326236722991105","703326119328686080"],
"replyToUID":["26152867","14503997","NA","NA","NA"],"statusSource":["Mobile Web (M5)","Twitter for iPhone","CashorTrade - Face Value Tickets","Twitter for iPhone","Twitter for Android"],
"screenName":["rAlexandria","adamgelvan","CashorTrade","Kyle_Smith1087","timogrennell"],
"retweetCount":[0,0,0,2,5],
"isRetweet":[false,false,false,true,true],
"retweeted":[false,false,false,false,false],
"longitude":["NA","NA","NA","NA","NA"],
"latitude":["NA","NA","NA","NA","NA"]}
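The iteration the question asks about can be sketched generically: apply an accessor to every element of the result list rather than indexing phish[[1]]. Plain lists stand in for twitteR status objects here; with the real objects the accessor would be s$getText(), as in the question.

```r
# Plain lists standing in for twitteR status objects
tweets <- list(list(text = "first tweet"),
               list(text = "second tweet"))
# Iterate over the whole result instead of indexing the first element only;
# with real twitteR status objects this would be function(s) s$getText()
texts <- sapply(tweets, function(s) s$text)
```

No explicit for loop is needed; sapply collects one text per tweet into a character vector.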
I followed your code and don't have the issue you're describing. Are you using library(twitteR) and library(jsonlite)?
Here is the code that works for me:
library(twitteR)
library(jsonlite)
phish <- searchTwitteR('phish', n = 5, lang = 'en')
phishdf <- do.call("rbind", lapply(phish, as.data.frame))
exportJson <- toJSON(phishdf)
write(exportJson, file = "./../phishdf.json")
## note the `txt` argument, as opposed to `file` used in the question
json_phishdf <- fromJSON(txt="./../phishdf.json")
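For what it's worth, round-tripping a plain data frame through jsonlite keeps one row per tweet, which is the separation the question asks for; a minimal sketch with made-up columns:

```r
library(jsonlite)

# Made-up stand-in for phishdf: one row per tweet
df <- data.frame(text = c("tweet one", "tweet two"),
                 retweetCount = c(0, 5))
json <- toJSON(df)      # serialises the data frame to a JSON array
back <- fromJSON(json)  # parses it back to a data frame, one row per tweet
```

Since fromJSON simplifies a JSON array of objects back into a data frame, each tweet comes out as its own row rather than one long line.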

Scrape number of articles on a topic per year from NYT and WSJ?

I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
     NYT WSJ
2011   2   3
2012  10   7
I found this tutorial for the NYT, but it is not working for me. When I get to line 30 I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
Thanks!
PS: This is my code that is not working (an NYT API key is needed: http://developer.nytimes.com/apps/register)
# Need to install from source http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q,
             "&offset=", os[1], "&fields=date&api-key=", api, sep = "")
raw.data <- readLines(uri, warn = "F") # get them
res <- fromJSON(raw.data)              # tokenize
dat <- unlist(res$results)             # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
  # concatenate URL for each offset
  uri <- paste("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q,
               "&offset=", os[i], "&fields=date&api-key=", api, sep = "")
  raw.data <- readLines(uri, warn = "F")
  res <- fromJSON(raw.data)
  dat <- append(dat, unlist(res$results)) # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot(freqs, type = "l", xaxt = "n", main = paste("Search term(s):", q), ylab = "# of articles", xlab = "date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f = .2), col = 2)
UPDATE: the repo is now at https://github.com/rOpenGov/rtimes
There is an RNYTimes package created by Duncan Temple-Lang (https://github.com/omegahat/RNYTimes), but it is outdated because the NYTimes API is on v2 now. I've been working on one for the political endpoints only, which isn't relevant for you.
I'm rewriting RNYTimes right now... Install it from GitHub. You need to install devtools first to get install_github:
install.packages("devtools")
library(devtools)
install_github("rOpenGov/RNYTimes")
Then try your search with that, e.g.,
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you the number of articles found:
moocs$response$meta$hits
[1] 121
You could get word counts for each article with:
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
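To get from a vector of article dates to the per-year counts the question originally asked for, table() over the year component is enough; a sketch with hypothetical dates standing in for the values collected in `dat`:

```r
# Hypothetical article dates, standing in for the dates the API returns
dates <- as.Date(c("2011-03-01", "2011-07-09", "2012-01-15"))
# Extract the year from each date and tabulate: articles per year
counts <- table(format(dates, "%Y"))
```

Running the same tabulation on a second search term and binding the two count vectors by year would produce the NYT/WSJ-per-year data frame described at the top of the question.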