How to Mine Data from a Facebook Group using R? - r

I am using RFacebook package to do data mining.
For example, to get the facebook page then we do
fb_page <- getPage(page="facebook", token=fb_oauth)
In my case is it is a private group and the URL is something like this. So how do I get the page info for a group? I tried the following but got the following error.
Error in callAPI(url = url, token = token) : Unknown path
components: /posts
my_page <- getPage(page="group/222568978569", token=my_oauth)

This isn´t a reproducible example because the page you mention doesn´t exist. But I suggest that you check that this group has authorized your app. Note that after the introduction of version 2.0 of the Graph API, only friends/groups who are using the application that you used to generate the token to query the API will be returned.

Related

Salesforce: Download Reports via URL in R

I try to download the reports available in Salesforce via the URL, e.g.
http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv
in R.
I already did some investigation to access the report via HTTR-GET, however, up until today without any meaningful outcomes. Unfortunately, R is downloading HTML-code instead of the desired csv file. I also tried to realize the approach suggested here:
https://salesforce.stackexchange.com/questions/47414/download-a-report-using-python
The package "RForcecom" allows the interaction via an API, but I was not able to figure out how to realize above solution in R.
General GET-Request:
GET("http://YOUR_Instance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv")
I expect the output to be in csv format, but I receive the report data as html source code.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3...
<html>
<head>
<meta HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
...
Did anyone of you guys encounter same issues and can provide guidance? Any kind of help is much appreciated. Thanks in advance!
UPDATED and not-working R-Snippet:
library(RForcecom)
library(httr)
username='username'
password='password'
instanceURL <- "https://login.salesforce.com/"
session <- rforcecom.login(username, password, instanceURL)
sid=as.character(session['sessionID'])
url='http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv'
getData=GET(url,add_headers('Content-Type'='application/json','Authorization'=paste0("Bearer ",sid),'X-PrettyPrint'='1'),set_cookies('sid'=sid))
Are you sure you have a valid report id? It doesn't look right (did you just obfuscate it for purposes of this post?). What is in that HTML you're getting, an error message? SF login screen?
What you're doing is effectively "screen scraping". This is not a real API, it can break at any time, you should find/build something that properly uses Salesforce Analytics API. You've been warned.
But if you're after a quick and dirty solution...
You need to pretend you're an authenticated user, that you have a valid session id. Add a cookie to your GET request.
How to get a valid session id?
You'd have to log in to SF first (for example use SOAP API's login call or I listed some REST api ideas here: https://stackoverflow.com/a/56034159/313628 )
or display some user's session ID in a SF formula, visualforce page and user would copy-paste it to your app.
Once you have it - add a Cookie header to your GET with value sid=<session id goes here>
Here's a raw request & response in SoapUI.
I recently struggled with the same issue, there's a magic parameter you need to add to the query : isdtp=p1
so if you try:
http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv&isdtp=p1
it should return you the file directly.
In your example, I don't think that you can use the rforcecom session with httr functions as you are trying.
Here is a slightly different way to solve the problem.
Rather than trying to retrieve a report that you already created in Salesforce, why not specify the report in SOQL and use rforcecom.query function to execute the SOQL from r. That would return the data in a data frame and would require no further data wrangling in r to make it useable.
I use this technique often and once you get used to the Salesforce API I think that its probably faster and more powerful for most use cases.
Here is a simple function that I use to return select opportunity data for all opportunities in Salesforce.
getSFOpps <- function(session) {
#Construct SOQL Query
soql <- "SELECT Id,
Name,
AccountId,
Amount,
CurrencyIsoCode,
convertCurrency(Amount) usd_amount,
CloseDate,
CreatedDate,
Region__c,
IsClosed,
IsWon,
LastActivityDate,
LeadSource,
OwnerId,
Probability,
StageName,
Type,
IsDeleted
FROM Opportunity"
#Retrieve Opp information
as_tibble(RForcecom::rforcecom.query(session, soql))
}
It requires that you pass in a valid session from Rforcecom.login but you seem to have that part working from your code above.
I hope this helps ...
As of v0.2.0, the {salesforcer} R package implements the Salesforce Reports and Dashboards REST API. You can execute and manage reports without needing to write functions from scratch to pull down report data. Below is an example of how to find a report in your Org and then retrieve its data. You can also just use the report Id which appears in the URL bar when viewing the report in Salesforce (highlighted in red in the screenshot below).
# install.packages('salesforcer')
library(dplyr, warn.conflicts = FALSE)
library(salesforcer)
# Authenticate using username, password, and security token ...
sf_auth(username = "test#gmail.com",
password = "{PASSWORD_HERE}",
security_token = "{SECURITY_TOKEN_HERE}")
# ... or using OAuth 2.0 authentication
sf_auth()
# find a report in your org and run it
all_reports <- sf_query("SELECT Id, Name FROM Report")
this_report_id <- all_reports$Id[1]
results <- sf_run_report(this_report_id)
results

twitteR R API getUser with a username of only numbers

I am working in R with the twitteR package, which is meant to get information from Twitter through their API. After getting authentificated, I can download information about any user with the getUser function. However, I am not able to do so with usernames which are only numbers (for example, 1234). With the line getUser("1234") I get the following error message:
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"),
params = params, : Not Found (HTTP 404).
Is there any way to get user information when the username is made completely of numbers? The function tries to search by ID instead of screenname when it finds only numbers.
Thanks in advance!
First of all, twitteR is deprecated in favour of rtweet, so you might want to look into that.
The specific user ID you've provided is a protected account, so unless your account follows it / has access to it, you will not be able to query it anyway.
Using rtweet and some random attempts to find a valid numerical user ID, I succeeded with this:
library(rtweet)
users <- c("989", "andypiper")
usr_df <- lookup_users(users)
usr_df
rtweet also has some useful coercion functions to force the use of screenname or ID (as_screenname and as_userid respectively)

Rfacebook package: "Unsuported get request" for the getPage method

I am trying to extract all the posts from this year from a Facebook page using the Rfacebook package.
However, for the page that I need I get this error:
"Error in callAPI(url = url, token = token) :
Unsupported get request. Object with ID 'XXXXX' does not exist,
cannot be loaded due to missing permissions, or does not support this operation.
Please read the Graph API documentation at https://developers.facebook.com/docs/graph-api"
This is the command that I used:
datafb <- getPage('XXXXX', token, n = 1000, since = '2017/01/01', until = '2017/04/01',
feed = TRUE)
I am sure the page exists, because I can access it from my Facebook account.
Also, the token is valid because it works when I try for other pages.
I really can't see what's wrong. Does anyone have any idea?

Scraping login protected website with a challenge form?

I'm trying to do some web scraping from steamspy.com, specifically the total playtime hours for a certain game. That info is behind the login wall for the site, so I've been trying to figure out how to get R past it for html mining.
I tried this method for passing login credentials via POST() but it doesn't seem to work. I noticed that the login handler for that example used POST, whereas looking at the source code for steamspy it seems to use a challenge form and I wasn't sure how to proceed with R.
My attempt thus far looks like this:
handle <- handle("http://steamspy.com")
path <- "/login/"
login <- list(
jschl_vc = "bc4e...",
pass = "148..."
)
response <- POST(handle = handle, path = path, body = login)
I found the values for the jschl_vc and pass from inspecting the source code after I logged in. The code above doesn't work and gives me:
Error in curl::curl_fetch_memory(url, handle = handle) : Failure
when receiving data from the peer
probably since I'm tryign to use POST to a challenge form. Is there way that I'm missing to proceed?

Issue in Getting the tweets of whom I follow using twitter API

I am using Twitter API and Flex 4 for creating Desktop App. I need to show tweets in two parts:
1)Tweets of User ABC to be shown in one section.
2)The tweets of people whom the User ABC is following, to be shown in another section.
I achieved point 1 by using :
https://api.twitter.com/1/statuses/user_timeline.xml?&screen_name="+ usrObj.username + "&count=" + usrObj.count;
But getting Bad Authentication Error while trying for point no.2.
I am hitting the following URL using HTTPSERVICE:
https://api.twitter.com/1.1/statuses/home_timeline.json
Also, I used:
https://api.twitter.com/1/statuses/home_timeline.xml?&screen_name="+ usrObj.username + "&count=" + usrObj.count;
where usrObj is an object.
Getting the following error message:
Error #2032: Stream Error. URL: https://api.twitter.com/1.1/statuses/home_timeline.json" errorID=2032
Please let me know whether I am following proper url and queries. Can anyone suggest me as to how to get the tweets exactly?
Got my work done by using the following url for retrieving home timeline:
https://api.twitter.com/1/statuses/following_timeline.xml

Resources