Salesforce: Download Reports via URL in R

I am trying to download the reports available in Salesforce via their URL, e.g.
http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv
in R.
I have already investigated accessing the report via httr's GET, but so far without any meaningful results. Unfortunately, R downloads HTML code instead of the desired CSV file. I also tried to implement the approach suggested here:
https://salesforce.stackexchange.com/questions/47414/download-a-report-using-python
The "RForcecom" package allows interaction via an API, but I was not able to figure out how to implement the above solution in R.
General GET-Request:
GET("http://YOUR_Instance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv")
I expect the output to be in CSV format, but I receive the report data as HTML source code.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3...
<html>
<head>
<meta HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
...
Has anyone encountered the same issue and can provide guidance? Any help is much appreciated. Thanks in advance!
UPDATED (and still not working) R snippet:
library(RForcecom)
library(httr)

username <- 'username'
password <- 'password'
instanceURL <- "https://login.salesforce.com/"
session <- rforcecom.login(username, password, instanceURL)
sid <- as.character(session['sessionID'])

url <- 'http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv'
getData <- GET(url,
               add_headers('Content-Type' = 'application/json',
                           'Authorization' = paste0("Bearer ", sid),
                           'X-PrettyPrint' = '1'),
               set_cookies('sid' = sid))

Are you sure you have a valid report id? It doesn't look right (did you just obfuscate it for purposes of this post?). What is in that HTML you're getting, an error message? SF login screen?
What you're doing is effectively "screen scraping". This is not a real API; it can break at any time, so you should find/build something that properly uses the Salesforce Analytics API. You've been warned.
But if you're after a quick and dirty solution...
You need to pretend you're an authenticated user, i.e. that you have a valid session id. Add a cookie to your GET request.
How to get a valid session id?
You'd have to log in to SF first (for example, use the SOAP API's login call; I listed some REST API ideas here: https://stackoverflow.com/a/56034159/313628),
or display the user's session ID in an SF formula or a Visualforce page and have the user copy-paste it into your app.
Once you have it, add a Cookie header to your GET with the value sid=<session id goes here>.
Here's a raw request & response in SoapUI.
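In R with httr, a minimal sketch of that idea might look like the following (untested; the credentials, instance URL, and report id are placeholders, and the session id is obtained via rforcecom.login as in the question's snippet):
library(RForcecom)
library(httr)

# Log in first to obtain a valid session id (SOAP login under the hood)
session <- rforcecom.login("username", "password", "https://login.salesforce.com/")
sid <- as.character(session['sessionID'])

# Request the report export, authenticating only via the sid cookie
res <- GET("https://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv",
           set_cookies(sid = sid))

# If the response really is CSV (and not a login page), parse it
report <- read.csv(text = content(res, as = "text", encoding = "UTF-8"))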

I recently struggled with the same issue; there's a magic parameter you need to add to the query: isdtp=p1
so if you try:
http://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv&isdtp=p1
it should return the file directly.
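For instance, combined with the session cookie from the previous answer, a rough httr sketch (instance URL and report id are placeholders from the question):
library(httr)

url <- "https://YOURInstance.my.salesforce.com/012389u13541?export=1&enc=UTF-8&xf=csv&isdtp=p1"
res <- GET(url, set_cookies(sid = sid))  # sid obtained as shown above

# Save the raw CSV to disk and read it into a data frame
writeBin(content(res, as = "raw"), "report.csv")
report <- read.csv("report.csv")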

In your example, I don't think you can use the RForcecom session with httr functions the way you are trying to.
Here is a slightly different way to solve the problem.
Rather than trying to retrieve a report that you already created in Salesforce, why not specify the report in SOQL and use the rforcecom.query function to execute the SOQL from R? That would return the data in a data frame and would require no further data wrangling in R to make it usable.
I use this technique often, and once you get used to the Salesforce API I think it's probably faster and more powerful for most use cases.
Here is a simple function that I use to return select opportunity data for all opportunities in Salesforce.
library(tibble)   # for as_tibble()

getSFOpps <- function(session) {
  # Construct SOQL query
  soql <- "SELECT Id,
                  Name,
                  AccountId,
                  Amount,
                  CurrencyIsoCode,
                  convertCurrency(Amount) usd_amount,
                  CloseDate,
                  CreatedDate,
                  Region__c,
                  IsClosed,
                  IsWon,
                  LastActivityDate,
                  LeadSource,
                  OwnerId,
                  Probability,
                  StageName,
                  Type,
                  IsDeleted
           FROM Opportunity"

  # Retrieve opportunity information as a tibble
  as_tibble(RForcecom::rforcecom.query(session, soql))
}
It requires that you pass in a valid session from rforcecom.login, but you seem to have that part working in your code above.
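For example, a minimal usage sketch reusing the placeholder login from your snippet:
session <- rforcecom.login("username", "password", "https://login.salesforce.com/")
opps <- getSFOpps(session)
head(opps)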
I hope this helps ...

As of v0.2.0, the {salesforcer} R package implements the Salesforce Reports and Dashboards REST API. You can execute and manage reports without needing to write functions from scratch to pull down report data. Below is an example of how to find a report in your Org and then retrieve its data. You can also just use the report Id which appears in the URL bar when viewing the report in Salesforce.
# install.packages('salesforcer')
library(dplyr, warn.conflicts = FALSE)
library(salesforcer)

# Authenticate using username, password, and security token ...
sf_auth(username = "test@gmail.com",
        password = "{PASSWORD_HERE}",
        security_token = "{SECURITY_TOKEN_HERE}")

# ... or using OAuth 2.0 authentication
sf_auth()

# find a report in your Org and run it
all_reports <- sf_query("SELECT Id, Name FROM Report")
this_report_id <- all_reports$Id[1]
results <- sf_run_report(this_report_id)
results

Related

Scrape BSCScan Token Holdings Page

I'm trying to get data from this page:
https://bscscan.com/tokenholdings?a=0xFAe2dac0686f0e543704345aEBBe0AEcab4EDA3d
But the website owner doesn't provide API endpoints for this purpose, so I tried to achieve it in different ways:
- using dryscrape, but the library seems to be abandoned;
- using requests, but the data are provided dynamically by JavaScript;
- using requests-html, but even in this case the data doesn't seem to be loaded.
I would like to avoid Selenium because it's slow, but I don't know how to solve this issue. Does anyone have a solution that could work? The data I need is the table containing the tokens of the wallet. Thank you in advance and have a nice day.
You can do it with requests-html. For example, let's grab the symbol of the first row:
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://bscscan.com/tokenholdings'
token = {'a': '0xFAe2dac0686f0e543704345aEBBe0AEcab4EDA3d'}
r = session.get(url, params=token)
r.html.render(sleep=2)
binance_row = r.html.find('tbody tr', first=True)
symbol = binance_row.find('td')[2].text
print(symbol)
Output:
BNB

How to figure out where the raw data in a table is?

https://www.nyse.com/quote/XNYS:A
After I access the above URL, I open Developer Tools in Firefox, change the date in HISTORIC PRICES, then click 'GO'. The table is updated, but I don't see any relevant HTTP requests sent in devtools.
So this means that the data has already been downloaded in the first request, but I cannot figure out how to extract the raw data of the table. Could anybody take a look at how to extract the raw data from the table? (Note that I don't want to use methods like Selenium; I want to stay with raw HTTP requests to get the raw data.)
EDIT: a websocket is mentioned in the comments, but I can't see it in Developer Tools. I'm adding the websocket tag anyway in case somebody who knows more about websockets can chime in.
I am afraid you cannot extract JavaScript-rendered content without Selenium. You can always make use of a headless browser (you don't see any instance on your screen; the only pitfall is that you have to wait until the page fully loads) and it won't bother you anymore.
In other words, all the other scraping libs are based on URLs and forms. Scrapy can post forms but not run JavaScript.
Selenium will save the day; all you lose is a couple of seconds for each attempt (which will be milliseconds if it is run in the frontend). You can get the page source with driver.page_source, and it can be used directly for parsing (as HTML text) with BeautifulSoup or whatever.
You can do it with requests-html. For example, let's grab the first row of the table:
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://www.nyse.com/quote/XNYS:A'
r = session.get(url)
r.html.render(sleep=7)
first_row = r.html.find('.flex_tr', first=True)
print(first_row.text)
Output:
06/18/2021
146.31
146.83
144.94
145.01
3,220,680
As @Nikita said, you will have to wait for the page to load (here 7 seconds, but maybe less), and if you want to do multiple requests you can do them asynchronously!

twitteR R API getUser with a username of only numbers

I am working in R with the twitteR package, which is meant to get information from Twitter through their API. After getting authenticated, I can download information about any user with the getUser function. However, I am not able to do so with usernames that consist only of numbers (for example, 1234). With the line getUser("1234") I get the following error message:
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"),
params = params, : Not Found (HTTP 404).
Is there any way to get user information when the username is made up entirely of numbers? The function tries to search by ID instead of screen name when it finds only numbers.
Thanks in advance!
First of all, twitteR is deprecated in favour of rtweet, so you might want to look into that.
The specific user ID you've provided is a protected account, so unless your account follows it / has access to it, you will not be able to query it anyway.
Using rtweet and some random attempts to find a valid numerical user ID, I succeeded with this:
library(rtweet)
users <- c("989", "andypiper")
usr_df <- lookup_users(users)
usr_df
rtweet also has some useful coercion functions to force the use of screen name or ID (as_screenname and as_userid, respectively).
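A small sketch of how those coercions might be combined with lookup_users (using the same numeric handle as above):
library(rtweet)

# Force "989" to be treated as a user ID rather than a screen name
lookup_users(as_userid("989"))

# Or force it to be treated as a screen name
lookup_users(as_screenname("989"))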

How can I allow new R users to send information to a Google Form?

How can I allow new R users to send information to a Google Form? (RSelenium requires a bit of setup, at least for headless browsing, so it's not the best candidate IMO, but I may be missing something that makes it the best choice.)
I have some new R users I want to get responses from interactively and send to a secure location. I have chosen Google Forms to pass the information to, as it allows one-way sending of the info and doesn't give the user access to the spreadsheet that is created from the form.
Here's a url of this form:
url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/viewform"
To give context here's how I'm using R to interact with the user:
question <- function(message, opts = c("Yes", "No")) {
  message(message)
  ans <- menu(opts)
  if (ans == "2") FALSE else TRUE
}
question("Was this information helpful?")
I then want to send that TRUE/FALSE to the Google Form above. How can I send a response to it from within R in a way that I can embed in code the user will interact with, and that doesn't require difficult setup by the user?
Add on R packages are fine if they accomplish the task.
You can send a POST request. Here's an example using the httr package:
library(httr)

send_response <- function(response) {
  form_url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/formResponse"
  POST(form_url,
       query = list(`entry.1651773982` = response))
}
Then you can call it:
send_response(question("Was this information helpful?"))

R: Search google for a string and return number of hits

Is there a way in R to simply search Google for something and then return the number of results? I have seen a lot of R packages built around various Google services (RGoogleDocs, RGoogleData, RGoogleMaps, googleVis), but I can't find this feature anywhere.
This is what I use, but it's based on an API that's eventually being phased out. It's also rate-limited, I believe to 100 searches/day. In the function below, service is "web"; you'll need to get a key from http://code.google.com/apis/loader/signup.html (any URL will work).
GetGoogleResults <- function(keyword, service, key) {
  library(RCurl)
  library(rjson)
  base_url <- "http://ajax.googleapis.com/ajax/services/search/"
  keyword <- gsub(" ", "+", keyword)
  query <- paste(base_url, service, "?v=1.0&q=", keyword, sep = "")
  if (!is.null(key))
    query <- paste(query, "&key=", key, sep = "")
  query <- paste(query, "&start=", 0, sep = "")
  results <- fromJSON(getURL(query))
  return(results)
}
Then, you can do something like
google <- GetGoogleResults("searchTerm", "web", yourkey)
str(google) will tell you the structure of the result. If you just want the number of results, you can use google$responseData$cursor$estimatedResultCount.
As I said, this is based on a protocol that may go out of style some day. Per Dirk's answer, there is an alternate approach using a custom search engine that you can use instead, but it's also rate limited (if you want a function for this method, you can ping me at noah_at_noahhl.com).
The final, and not rate limited, way is just to use RCurl to get a page from google, but it's pretty messy to parse, and requires spoofing a user agent to get around Google's attempts to prevent people from doing this. (I can also share this code, but it gets broken whenever Google tweaks any of their HTML).
You may want to start at the Google Custom Search API documentation and then see how much JSON you have to learn to hit it :)
There should be enough R infrastructure in place to get something going.
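For instance, a rough httr sketch of hitting the Custom Search JSON API (the endpoint and the searchInformation$totalResults field are my assumptions about the current API; key and cx are the API key and custom search engine ID you would create in the Google developer console):
library(httr)

count_google_hits <- function(term, key, cx) {
  res <- GET("https://www.googleapis.com/customsearch/v1",
             query = list(key = key, cx = cx, q = term))
  parsed <- content(res, as = "parsed", type = "application/json")
  # Estimated number of hits reported by the API (returned as a string)
  as.numeric(parsed$searchInformation$totalResults)
}

# count_google_hits("searchTerm", key = "YOUR_API_KEY", cx = "YOUR_CSE_ID")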
