How to include a JSON query string in a GET request in R?

I'm new to R and coding. I'm trying to get some data by making a GET request in R using the httr and jsonlite packages. I've made some successful API calls and parsed the data into a data frame, but now I need to include a JSON query string in my URL. I've read similar questions and answers on Stack Overflow, but am still stumped as to the best way to go about this in R. Can I simply encode the query string, or do I need to set the query parameters separately (and if so, what format is needed)? Below is the URL I need to work with.
Cheers!
liz_URL <- "https://squidle.org/api/annotation_set?q={"order_by":[{"field":"created_at","direction":"desc"}],"filters":[{"name":"usergroups","op":"any","val":{"name":"id","op":"eq","val":49}}]}&results_per_page=100"
PQdata <- GET(liz_URL,add_headers("X-Auth-Token"="XXXXX"))
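One approach that should work (a sketch, not verified against the squidle.org API): keep the JSON as a plain R string and let httr do the percent-encoding via its query argument, instead of pasting the JSON into the URL by hand:

library(httr)

# single-quote the string so the embedded double quotes survive
q <- '{"order_by":[{"field":"created_at","direction":"desc"}],"filters":[{"name":"usergroups","op":"any","val":{"name":"id","op":"eq","val":49}}]}'

# httr percent-encodes each query value, so the raw JSON can be passed as-is
PQdata <- GET("https://squidle.org/api/annotation_set",
              query = list(q = q, results_per_page = 100),
              add_headers("X-Auth-Token" = "XXXXX"))

Equivalently, you can encode the string yourself with URLencode(q, reserved = TRUE) and paste it onto the URL.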

Related

Passing objects into the URL - URL params / query strings

I have two websites. One website is going to capture form data and put it into a URL...
let url = `https://xxxxxxxxxx.herokuapp.com/band/${band._id}?toggle=true&eventDate={"eventDate": ${mainDate[0]}, "eventCharge": ${mainDate[1]}}&quoteAdjuster=${sliderValue}`
Some of the information that I collect in the form is stored in objects and arrays.
Is there a way to send objects/arrays in this url to my website? Currently that whole object above, mainDate, still comes through as a string.
Thanks!
You could change your objects and arrays into strings on purpose by using JSON.stringify(myObject).
Then the server would just need to use JSON.parse(sentData) to reconstruct the arrays and objects. (Some types of data don't survive this round trip, so be careful: Date objects, for example, become strings and have to be reconstructed manually.)
Also, remember that URLs have a fairly small length limit (commonly somewhere between 2 KB and 8 KB, depending on browser and server). You will want to switch to POST if those parameters aren't important for the URL the user is browsing.

How can I connect to Salesforce using R

I am trying to connect RStudio to the Salesforce database using the 'RForcecom' package. When I type in my username, password, loginURL and apiVersion I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: na90.salesforce.comservices
I found the following link, which explains how to work around this issue using the 'curl' package:
https://akitsche.netlify.com/post/2015-07-23-r-rmarkdown/
But when I try to look up the proxy with the ie_get_proxy_for_url command, it returns NULL instead of the actual proxies.
I am using a Mac.
## Install necessary packages
install.packages("RForcecom")
library(RForcecom)

## Pick out the HTTP proxy
library(curl)
ie_get_proxy_for_url(target_url)

## Connect the existing Salesforce account to R
connection.salesforce <- rforcecom.login(username, password, loginURL, apiVersion)
I tried to use salesforcer when building a pipe into Salesforce (SF). For some reason I could not get it to work (my bad), so I built the pipe from scratch.
This answer describes how to build a data pipe into Salesforce Lightning (SF) via the SF REST API, using the OAuth authorization flow.
There are a couple of things to setup first:
You need an active SF account with username and password
You also need:
the SF token (SF emails this to you upon password change)
customer key, customer secret
grant service (something like: /services/oauth2/token?grant_type=password)
login URL (something like: https://login.salesforce.com )
For the customer key & secret, grant service and login URL: consult your IT department or SF application owner.
I use the httr package to send a POST to the login URL as follows:
library(httr)

# %&%: user-defined infix operator to concatenate two strings
`%&%` <- function(a, b) paste0(a, b)

data <- POST(
  loginurl %&% grantservice,
  body = list(
    client_id     = customerkey,
    client_secret = customersecret,
    username      = username,
    password      = password %&% token   # SF expects password and token concatenated
  ))
If all goes well, SF will respond by returning data from which you can obtain the access token and instance URL. You will need these for your subsequent GETs.
# obtain content data
content <- content(data)
access_token <- content$access_token
instance_url <- content$instance_url
id <- content$id
token_type <- content$token_type
At this point I am authorised by the SF server (for 15 minutes, I believe) and ready to run queries through GETs.
First, I have to define the request headers, which contain the access token. The instance URL becomes the prefix of the query. Note that the query has to be an SOQL query in the format SF requires (consult the SF documentation; it is rather specific). ua is a user agent object (user_agent is also an httr function).
# ua: an httr user agent; the identifier string here is just an example
ua <- user_agent("my-r-client")

request_headers <- c("Accept" = "application/json",
                     "Content-Type" = "application/json",
                     "Authorization" = paste0("Bearer ", access_token))

resultset <- GET(instance_url %&% query,
                 add_headers(.headers = request_headers), ua)
response_parsed <- content(resultset, "text", encoding = "UTF-8")
SF returns data which can be extracted with the content function from httr. This gives me JSON, which I can transform into a DF (typically a DF with list columns if your query was relational).
resultset <- fromJSON(response_parsed, flatten = TRUE)
fromJSON is a function from the jsonlite package. Be prepared to do substantial post-processing on this DF to get the data into the shape you require.
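For a typical SOQL query the rows sit in the records element of the parsed result, alongside SF metadata columns; a small sketch (column names as returned by the SF REST API for flattened results):

records <- resultset$records                                  # rows as a data frame
records <- records[, !grepl("^attributes", names(records))]   # drop SF metadata columns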
SF does not make it easy, so here are two hurdles you need to overcome:
The length of the string sent to SF using GET is limited to around 16,500 characters. This sounds like a lot, but you'd be surprised how easy it is to go over this limit. For instance, my query contains an IN clause with thousands of 18-character SF identifiers. You will need to test the length of your query and break it up into sub-queries. (I create a list of sub-queries of appropriate length and then lapply the function that GETs the records from SF; a sketch follows.)
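A minimal sketch of that chunking (the chunk size of 100, the ids vector, the example SOQL, and the fetch_records helper are all illustrative, not from the original):

# fetch_records: hypothetical helper that runs one SOQL query and returns its records
chunks <- split(ids, ceiling(seq_along(ids) / 100))  # pick a size that keeps each query short

subqueries <- lapply(chunks, function(chunk)
  paste0("SELECT Id, Name FROM Account WHERE Id IN ('",
         paste(chunk, collapse = "','"), "')"))

results <- lapply(subqueries, fetch_records)
records <- do.call(rbind, results)                   # combine the sub-query results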
SF returns a maximum of 2000 records for each GET. If you expect more records, you need to look at two data elements returned by SF that let you get the next set of 2000 records: nextRecordsUrl and done. done is a flag that tells you whether you have them all, and nextRecordsUrl contains the location to send your next GET to. You can write a simple loop that keeps going until done equals TRUE (a sketch follows). Don't forget to combine the data you retrieve in each iteration.
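A sketch of that loop, reusing request_headers, ua and %&% from above (done and nextRecordsUrl are the field names the REST API returns):

pages <- list()
parsed <- fromJSON(content(GET(instance_url %&% query,
                               add_headers(.headers = request_headers), ua),
                           "text", encoding = "UTF-8"), flatten = TRUE)
pages[[1]] <- parsed$records

while (!parsed$done) {   # done is TRUE once the last batch has been returned
  parsed <- fromJSON(content(GET(instance_url %&% parsed$nextRecordsUrl,
                                 add_headers(.headers = request_headers), ua),
                             "text", encoding = "UTF-8"), flatten = TRUE)
  pages[[length(pages) + 1]] <- parsed$records
}
records <- do.call(rbind, pages)   # combine the batches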
You will have to run multiple queries where you had expected one query to do the job. Alas.
The final hurdle to overcome is that the structure of the data from SF really depends on the query (and in extension on the database schema in SF). A query into a relational table will result in a nested list column in the DF that you get. It is really up to you to work out what is the best way to go as only you know how you structure your queries and the database schema.
A final note. I run this script on an R server on a company server inside the corporate firewall. While this is relatively safe, I made sure I did not hard code the credentials into my R code.
It is best to avoid that: people with access to the server could otherwise read your credentials. My naive approach: I created an rds file with encrypted content (using the cyphr package) for all credentials. On the server, I only store the encrypted file. I created a function that reads and decrypts the rds file, and I make sure that this function is only called inside the POST function call. This ensures that the credentials exist unencrypted in RAM only for the duration of the POST call (once per session). Amateur-level security, but better than nothing.
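A sketch of that pattern with cyphr (the file name is illustrative, and in practice the key itself also has to be stored somewhere safe):

library(cyphr)
library(sodium)

# one-off: generate a key and write the encrypted credentials file
key <- cyphr::key_sodium(sodium::keygen())
cyphr::encrypt(saveRDS(list(username = username,
                            password = password,
                            token    = token),
                       "creds.rds"), key)

# in the script: decrypt only at the moment of use
read_creds <- function() cyphr::decrypt(readRDS("creds.rds"), key)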

Writing a function that scrapes dataset that appears only after typing in values and clicking a button

I am trying to write a function that will take a list of dates and retrieve the dataset as found on https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm
I am using PROC IML in SAS to execute R-code (since I am more familiar with R).
My problem is within R, and is due to the website.
First, I am aware that there is an API, but this is a technique I really want to learn because many sites do not have APIs.
Does anyone know how to retrieve the datasets?
Things I've heard:
Use RSelenium to program the clicking. But RSelenium was recently archived on CRAN, so that isn't an option (even installing a previous version from the archive is causing issues).
Watch how the URL changes as I click the "submit" button in Chrome. However, the Network tab doesn't show any XML requests, whereas other websites with different search methods do.
I have been looking for a solution all day, but to no avail! Please help
First, you need to read the terms and conditions and make sure that you are not breaking the rules when scraping.
Next, if there is an API, you should use it so that they can better manage their data usage and operations.
In addition, you should limit the number of requests you make so as not to overload the server; sending too many too fast amounts to a denial-of-service (DoS) attack.
Finally, if the conditions above are satisfied, you can use Chrome's inspector to see what HTTP requests are made when you browse these web pages.
In this particular case, you do not need RSelenium; a simple HTTP POST will do:
library(httr)

resp <- POST("https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm",
             body = list(priceDate.month = 5,
                         priceDate.day   = 15,
                         priceDate.year  = 2018,
                         submit          = "CSV+Format"),
             encode = "form")

read.csv(text = rawToChar(resp$content), header = FALSE)
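To get the function over a list of dates that the question asks for, the POST can be wrapped like this (a sketch; the Sys.sleep keeps the request rate polite, per the caveats above):

get_prices <- function(date) {
  d <- as.POSIXlt(as.Date(date))
  resp <- POST("https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm",
               body = list(priceDate.month = d$mon + 1,      # POSIXlt months are 0-based
                           priceDate.day   = d$mday,
                           priceDate.year  = d$year + 1900,  # POSIXlt years count from 1900
                           submit          = "CSV+Format"),
               encode = "form")
  Sys.sleep(1)  # throttle between requests
  read.csv(text = rawToChar(resp$content), header = FALSE)
}

prices <- lapply(c("2018-05-15", "2018-05-16"), get_prices)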
You can perform the same HTTP processing in a SAS session using Proc HTTP. The CSV data does not contain a header row, so perhaps the XML format is more appropriate. There are a couple of caveats for the treasurydirect site.
Prior to posting a data download request, the connection needs some cookies that are assigned during a GET request. Proc HTTP can do this.
The XML contains an extra container tag <bpd> that the SAS XMLV2 library engine can't handle directly. This extra tag can be removed with some DATA step processing.
Sample code for XML
filename response TEMP;
filename respfilt TEMP;
* Get request sets up fresh session and cookies;
proc http
clear_cache
method = "get"
url ="https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
;
run;
* Post request as performed by XML format button;
* automatically utilizes cookies setup in GET request;
* in= can now directly specify the parameter data to post;
proc http
method = "post"
in = 'priceDate.year=2018&priceDate.month=5&priceDate.day=15&submit=XML+Format'
url ="https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
out = response
;
run;
* remove bpd tag from the response (the downloaded xml);
data _null_;
infile response;
file respfilt;
input;
if _infile_ not in: ('<bpd', '</bpd');
put _infile_;
run;
* copy data collections from xml file to tables in work library;
libname respfilt xmlv2 ;
proc copy in=respfilt out=work;
run;
Reference material
REST at Ease with SAS®: How to Use SAS to Get Your REST
Joseph Henry, SAS Institute Inc., Cary, NC
http://support.sas.com/resources/papers/proceedings16/SAS6363-2016.pdf

Using U-SQL MultiLevelJsonExtractor gives Error: Path returned multiple tokens

I am using the MultiLevelJsonExtractor forked on GitHub by kotvisbj. When I put a path that contains an array (body.header.items[*] or body.header.items) into the JsonPaths parameter string, I get "Error: Path returned multiple tokens". Is there a way to extract the paths in code so I can get an array, like when using the Root? I've tried to explain this the best way I could; I don't have excellent C# skills, and it's been a few years.
I think it would be best to ask the owner of the branch to see if he can advise you. I assume that his code expects a single token only and not an array of tokens.
You can probably achieve what you need by using code similar to this: U-SQL - Extract data from json-array

Using Json.NET to parse result returned by Google Maps API

I am trying to use the Google Maps API web service to make a web request and get the JSON string, and then extract the latitude and longitude I need for the input address.
Everything is fine: I got the JSON string I need.
Now I am using Json.NET to parse the string.
I don't know why, but I simply cannot convert it into a JArray.
Here is the JSON string
Can anyone teach me how to write the C# code to get the lat and lng in geometry > location?
Thanks
Here is my code and the bug screenshot
You have a few options when using Json.NET to parse the JSON.
The best option, IMHO, is to use serialization to deserialize the response into a structured type that you can manipulate like any other class. For this, see serialization in the Json.NET documentation (I can post more details if that isn't clear enough).
If all you want is to grab the address, as you listed in your question, you can also use the LINQ feature to pull that information back. Code similar to the following would do it (the key lies in the SelectToken method):
Dim json As Newtonsoft.Json.Linq.JObject
json = Newtonsoft.Json.Linq.JObject.Parse(jsonString)
json.SelectToken("results.formatted_address").ToString()
You can also use all the normal power of Linq to traverse the JSON as you'd expect. See the LINQ documentation as well.
[I realize this is an old question, but in the off chance it helps someone else...]
The problem here is that json["results"] is a JArray, but you are not querying it like one. You need to use an array index to get the first (and only, in this case) element, then you can access the objects inside it.
string address = json["results"][0]["formatted_address"].Value<string>();
To get the latitude and longitude you can do:
JToken location = json["results"][0]["geometry"]["location"];
double lat = location["lat"].Value<double>();
double lng = location["lng"].Value<double>();
