Converting cURL command to R GET request - r

Does anybody know how to convert more complex cURL commands into httr::GET() requests? The issue I am having is that the API only requires a key in the form of a username <YOUR_API_KEY> but does not require any password.
$ curl https://api.goclimate.com/v1/flight_footprint \
-u YOUR_API_KEY: \
-d 'segments[0][origin]=ARN' \
-d 'segments[0][destination]=BCN' \
-d 'segments[1][origin]=BCN' \
-d 'segments[1][destination]=ARN' \
-d 'cabin_class=economy' \
-d 'currencies[]=SEK' \
-d 'currencies[]=USD' \
-G
Perhaps another package like RCurl might be more appropriate?
Thanks!

Well, when you include the ":" with nothing after it, you are specifying a password as the empty string. So using httr that would be like
GET("https://api.goclimate.com/v1/flight_footprint",
authenticate("YOUR_API_KEY",""),
query=list(
"segments[0][origin]"="ARN",
"segments[0][destination]"="BCN",
"segments[1][origin]"="BCN",
"segments[1][destination]"="ARN",
"cabin_class"="economy",
"currencies[0]"="SEK",
"currencies[1]"="USD"))
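To see what the trailing colon means on the wire, here is a minimal Python sketch (standard library only) of how an HTTP Basic Authorization header is built from a username and an empty password. The function name basic_auth_header is made up for illustration; it is not part of curl or httr.

```python
import base64


def basic_auth_header(username: str, password: str = "") -> str:
    """Return the value of an HTTP Basic Authorization header."""
    # Basic auth joins user and password with a colon, then base64-encodes it.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# An empty password still leaves the colon in place, like `-u YOUR_API_KEY:`.
header = basic_auth_header("YOUR_API_KEY")
print(header)
```

Decoding the printed value gives back "YOUR_API_KEY:", which is what authenticate("YOUR_API_KEY","") should end up sending as well.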
Expanding the indexes of the parameters is kind of messy, so you can write a helper function
query_expand <- function(x) {
expd <- function(name, value) {
do.call("c", unname(Map(function(name, value) {
if(is.list(value) && !is.null(names(value))) {
xx <- expd(paste0("[", names(value), "]"), value)
setNames(xx, paste0(name, names(xx)))
} else if(is.list(value)) {
xx <- expd(paste0("[",seq_along(value)-1,"]"), value)
setNames(xx, paste0(name, names(xx)))
} else if (length(value)>1) {
setNames(as.list(value), paste0(name, "[", seq_along(value)-1,"]"))
} else {
setNames(list(value), name)
}}, name, value)))
}
expd(names(x), x)
}
Then if you have your data neatly in an object
params <- list("segments" = list(
list(origin="ARN", destination="BCN"),
list(origin="BCN", destination=c("ARN"))
),
"cabin_class" = "economy",
"currencies" = c("SEK","USD"))
You could just use
GET("https://api.goclimate.com/v1/flight_footprint",
authenticate("YOUR_API_KEY",""),
query = query_expand(params))
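For readers who want to check which parameter names this kind of expansion should produce, here is a rough Python sketch of the same idea. The name expand_query is hypothetical, and this is not a line-by-line port of the R helper, just the same flattening rule applied to dicts and lists.

```python
def expand_query(params, prefix=""):
    """Flatten nested dicts/lists into bracketed query parameter names."""
    flat = {}
    for key, value in params.items():
        # Top-level keys are used as-is; nested keys are wrapped in brackets.
        name = f"{prefix}[{key}]" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(expand_query(value, name))
        elif isinstance(value, (list, tuple)):
            # Sequences become zero-based indexes: name[0], name[1], ...
            flat.update(expand_query(dict(enumerate(value)), name))
        else:
            flat[name] = value
    return flat


params = {
    "segments": [
        {"origin": "ARN", "destination": "BCN"},
        {"origin": "BCN", "destination": "ARN"},
    ],
    "cabin_class": "economy",
    "currencies": ["SEK", "USD"],
}

for name, value in expand_query(params).items():
    print(name, "=", value)
```

One difference from the R version: a one-element Python list still gets an index ("name[0]"), whereas a length-one R vector is passed through under its plain name.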

Related

Creating a composite string sourced from multiple places in a JSON document

Consider this JSON document
echo '
{
"alpha": {
"id": "id1",
"values": [
"one",
"two"
]
},
"beta": {
"id": "id2",
"values": [
"three"
]
}
}
' >data.json
check syntax
$ yq -p json -P -o j 'true ' data.json
true
I want to generate a series of strings that combines the id field with each of the values. So output I need should look like this
"id1-one"
"id1-two"
"id2-three"
This is what I've tried
$ yq -p json -P -o j '.[] | .id as $ID | .values[] | $ID + "-" + . ' data.json
"id1-one"
"id2-one"
"id1-two"
"id2-two"
"id1-three"
"id2-three"
There seems to be a multiplication factor kicking in with the $ID variable. Is this the correct approach to get attributes from a different scope, or is there a cleaner way to achieve this?
Note -- the real JSON document contains a lot more nesting, so there are multiple nested arrays/objects between the values and the id attributes.
One final point. I tried the same code with jq and it worked fine.
$ jq ' .[] | .id as $ID | .values[] | $ID + "-" + . ' data.json
"id1-one"
"id1-two"
"id2-three"
Do you need that variable elsewhere? Because it just works without:
yq -p json -P -o json '.[] | .id + "-" + .values[]' data.json
"id1-one"
"id1-two"
"id2-three"
Tested with mikefarah/yq version v4.30.5

How can I read data from delta lib using SparkR?

I couldn't find any reference on how to access data from Delta using SparkR, so I tried it myself. First I created a dummy dataset in Python:
from pyspark.sql.types import StructType,StructField, StringType, IntegerType
data2 = [("James","","Smith","36636","M",2000),
("Robert","","Williams","42114","M",5000),
("Maria","Anne","Jones","39192","F",5000),
("Jen","Mary","Brown","","F",-1)
]
schema = StructType([ \
StructField("firstname",StringType(),True), \
StructField("middlename",StringType(),True), \
StructField("lastname",StringType(),True), \
StructField("id", StringType(), True), \
StructField("gender", StringType(), True), \
StructField("salary", IntegerType(), True) \
])
df = spark.createDataFrame(data=data2,schema=schema)
df.write \
.format("delta")\
.mode("overwrite")\
.option("userMetadata", "first-version") \
.save("/temp/customers")
You can modify this code to change the data and run again to simulate the change over time.
I can query in python using this:
df3 = spark.read \
.format("delta") \
.option("timestampAsOf", "2020-11-30 22:03:00") \
.load("/temp/customers")
df3.show(truncate=False)
But I don't know how to pass the option in SparkR. Can you help me?
%r
library(SparkR)
teste_r <- read.df("/temp/customers", source="delta")
head(teste_r)
It works but returns only the current version.
timestampAsOf will work as a parameter in SparkR::read.df.
SparkR::read.df("/temp/customers", source = "delta", timestampAsOf = "2020-11-30 22:03:00")
This can be also done with SparkR::sql.
SparkR::sql('
SELECT *
FROM delta.`/temp/customers`
TIMESTAMP AS OF "2020-11-30 22:03:00"
')
Alternatively, to do it in sparklyr, use the timestamp parameter in spark_read_delta.
library(sparklyr)
sc <- spark_connect(method = "databricks")
spark_read_delta(sc, "/temp/customers", timestamp = "2020-11-30 22:03:00")

Simple Curl -H in R

I want to do
curl -H "Authorization: Basic YOUR_API_KEY" -d '{"classifier_id":155, "value":"TEST"}' "https://www.machinelearningsite.com/language/classify"
I tried
h = getCurlHandle(header = TRUE, userpwd = YOUR_API_KEY, netrc = TRUE)
out <- getURL("https://www.machinelearningsite.com/language/classify?classifier_id=155&value=TEST", curl=h,ssl.verifypeer=FALSE)
but it says method not allowed
It's much easier to translate curl command-line arguments into httr calls:
library(httr)
result <- GET("https://www.machinelearningsite.com/language/classify",
add_headers(Authorization=sprintf("Basic %s", YOUR_API_KEY)),
query=list(classifier_id=155, value="TEST"))
Ideally, YOUR_API_KEY would be an environment variable, so you can change that to:
result <- GET("https://www.machinelearningsite.com/language/classify",
add_headers(Authorization=sprintf("Basic %s", Sys.getenv("YOUR_API_KEY"))),
query=list(classifier_id=155, value="TEST"))
You can then do:
content(result)
To retrieve the actual data.
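One thing worth keeping in mind: curl's -d flag sends its data as a POST body, which may be why a GET-style call gets "method not allowed". Here is a minimal Python sketch that builds, but does not send, the equivalent request using only the standard library. It assumes YOUR_API_KEY is already the ready-to-send Basic credential. The JSON content type is also an assumption about what the API expects, since curl -d by default labels the body as application/x-www-form-urlencoded.

```python
import json
import urllib.request

# Placeholder; assumed to be the ready-to-send Basic credential.
API_KEY = "YOUR_API_KEY"

payload = {"classifier_id": 155, "value": "TEST"}
request = urllib.request.Request(
    "https://www.machinelearningsite.com/language/classify",
    # Supplying a body (like curl -d) makes this a POST request.
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Basic {API_KEY}",
        # Assumption: the API wants JSON rather than form-encoded data.
        "Content-Type": "application/json",
    },
    method="POST",
)

# Inspect the request without sending it.
print(request.get_method(), request.full_url)
```

In httr the corresponding verb would be POST() rather than GET(), with the same add_headers() call.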

Better string interpolation in R

I need to build up long command lines in R and pass them to system(). I find it is very inconvenient to use paste0/paste function, or even sprintf function to build each command line. Is there a simpler way to do like this:
Instead of this hard-to-read-and-too-many-quotes:
cmd <- paste("command", "-a", line$elem1, "-b", line$elem3, "-f", df$Colum5[4])
or:
cmd <- sprintf("command -a %s -b %s -f %s", line$elem1, line$elem3, df$Colum5[4])
Can I have this:
cmd <- buildcommand("command -a %line$elem1 -b %line$elem3 -f %df$Colum5[4]")
For a tidyverse solution see https://github.com/tidyverse/glue. Example
name="Foo Bar"
glue::glue("How do you do, {name}?")
With version 1.1.0 (CRAN release on 2016-08-19), the stringr package has gained a string interpolation function str_interp() which is an alternative to the gsubfn package.
# sample data
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
# do the string interpolation
stringr::str_interp("command -a ${line$elem1} -b ${line$elem3} -f ${df$Colum5[4]}")
#[1] "command -a 10 -b 30 -f 4"
This comes pretty close to what you are asking for. When any function f is prefaced with fn$, i.e. fn$f, character interpolation will be performed, replacing each backtick-quoted `...` with the result of running ... as an R expression.
library(gsubfn)
cmd <- fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
Here is a self contained reproducible example:
library(gsubfn)
# test inputs
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## [1] "command -a 10 -b 30 -f 4"
system
Since any function can be used we could operate directly on the system call like this. We have used echo here to make it executable but any command could be used.
exitcode <- fn$system("echo -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## -a 10 -b 30 -f 4
Variation
This variation would also work. fn$f also performs substitution of $whatever with the value of variable whatever. See ?fn for details.
with(line, fn$identity("command -a $elem1 -b $elem3 -f `df$Colum5[4]`"))
## [1] "command -a 10 -b 30 -f 4"
Another option would be to use whisker.render from https://github.com/edwindj/whisker which is a {{Mustache}} implementation in R. Usage example:
require(dplyr); require(whisker)
bedFile="test.bed"
whisker.render("processing {{bedFile}}") %>% print
Not really a string interpolation solution, but still a very good option for the problem is to use the processx package instead of system() and then you don't need to quote anything.
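The idea of passing arguments as a vector instead of one quoted command string is not specific to R. As an analogy only (this is Python's subprocess module, not the processx API), here is a small sketch; the values standing in for line$elem1, line$elem3 and df$Colum5[4] are made up.

```python
import subprocess
import sys

# Hypothetical values standing in for line$elem1, line$elem3 and df$Colum5[4].
elem1, elem3, col5 = 10, 30, "has spaces"

# Each argument is its own list element, so no shell quoting is involved.
# The child process is a Python one-liner that prints the arguments it received.
args = [sys.executable, "-c", "import sys; print(sys.argv[1:])",
        "-a", str(elem1), "-b", str(elem3), "-f", col5]
result = subprocess.run(args, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

The argument containing a space arrives in the child process as a single item, without any escaping on the caller's side.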
The GetoptLong package also provides variable interpolation through qq() and qqcat():
library(GetoptLong)
str = qq("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
cat(str)
qqcat("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
https://cran.r-project.org/web/packages/GetoptLong/vignettes/variable_interpolation.html

unix ftp script to get latest file from server

I have a unix script to get files via ftp that looks something like this:
#!/bin/sh
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
FILE='1234'
ftp -n $HOST <<END_SCRIPT
quote USER $USER
quote PASS $PASSWD
cd .LogbookPlus
get $FILE
quit
END_SCRIPT
exit 0
Instead of getting a specific file, I want to get the last modified file in a folder, or all files created in the last 24 hours. Is this possible via ftp?
This is really pushing the FTP client further than it should be pushed, but it is possible.
Note that the LS_FILE_OFFSET might be different on your system and this won't work at all if the offset is wrong.
#!/bin/sh
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
DIRECTORY='.LogbookPlus'
FILES_TO_GET=1
LS_FILE_OFFSET=57 # Check directory_listing to see where filename begins
rm -f directory_listing
# get listing from directory sorted by modification date
ftp -n $HOST > directory_listing <<fin
quote USER $USER
quote PASS $PASSWD
cd $DIRECTORY
ls -t
quit
fin
# parse the filenames from the directory listing
files_to_get=`cut -c $LS_FILE_OFFSET- < directory_listing | head -$FILES_TO_GET`
# make a set of get commands from the filename(s)
cmd=""
for f in $files_to_get; do
cmd="${cmd}get $f
"
done
# go back and get the file(s)
ftp -n $HOST <<fin
quote USER $USER
quote PASS $PASSWD
cd $DIRECTORY
$cmd
quit
fin
exit 0
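Because the fixed column offset is fragile, one alternative is to split each listing line on whitespace and take everything after the eighth field. Here is a minimal Python sketch, assuming the server returns the common ls -l-style layout (permissions, links, user, group, size, month, day, year or time, then the name). Not every FTP server does, and the function name and sample lines are made up for illustration.

```python
def filename_from_listing(line: str) -> str:
    """Extract the file name from one `ls -l`-style listing line."""
    # Eight whitespace-separated fields precede the name:
    # perms, links, user, group, size, month, day, year-or-time.
    fields = line.rstrip("\r\n").split(None, 8)
    if len(fields) < 9:
        raise ValueError(f"unexpected listing line: {line!r}")
    return fields[8]


listing = [
    "-rw------- 1 user group 13052 Nov 20 02:07 .bash_history",
    "-rw------- 1 user group   512 Nov 19 23:15 my log file.txt",
]
for entry in listing:
    print(filename_from_listing(entry))
```

Unlike a character offset, this does not depend on how wide the size or owner columns are, and it keeps spaces inside file names.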
You should definitely have given some more information about the systems you are using, e.g. not every ftp server supports ls -t, which @JesseParker uses. I took the opportunity to put some ideas that I have used myself for some time into a script that uses awk to do the dirty deeds. As you can see, knowing what flavor of unix your client uses would be beneficial. I have tested this script under Debian Wheezy GNU/Linux and FreeBSD 9.2.
#!/bin/sh
# usage: <this_script> <num_files> <date...> [ <...of...> <...max....> <...age...> ... ]
#
# Fetches files from preconfigured ftp server to current directory.
# Maximum number of files is <num_files>
# Only files that have a newer modification time than given date are considered.
# This date is given according to the local 'date' command, which is very different
# on BSD and GNU systems, e.g.:
#
# GNU:
# yesterday
# last year
# Jan 01 1970
#
# BSD:
# -v-1d # yesterday (now minus 1 day)
# -v-1y # last year (now minus 1 year)
# -f %b %e %C%y Jan 01 1970 # format: month day century year
#
# Script tries to autodetect date system, YMMV.
#
# BUGS:
# Does not like quotation marks (") in file names, maybe much more.
#
# Should not have credentials inside this file, but maybe have them
# in '.netrc' and not use 'ftp -n'.
#
# Plenty more.
#
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
DIR='.LogbookPlus'
# Date format for numerical comparison. Can be simply +%s if supported.
DATE_FMT=+%C%y%m%d%H%M%S
# The server's locale for date strings.
LC_SRV_DATE=C
# The 'date' command from BSD systems and that from the GNU coreutils
# are completely different. Test for the appropriate system here:
if LC_ALL=C date -j -f "%b %e %C%y" "Jan 01 1970" $DATE_FMT > /dev/null 2>&1 ; then
SYS_TYPE=BSDish
elif LC_ALL=C date -d "Jan 01 1970" $DATE_FMT > /dev/null 2>&1 ; then
SYS_TYPE=GNUish
else
echo "sh: don't know how to date ;-) sorry!"
exit 1;
fi
# Max. number of files to get (newest files first)
MAX_NUM=$(( ${1:-1} + 0 )) # ensure argv[1] is treated as a number
shift
# Max. age of files. Only files newer than this will be considered.
if [ GNUish = "$SYS_TYPE" ] ; then
MAX_AGE=$( date "$DATE_FMT" -d "${*:-yesterday}" )
elif [ BSDish = "$SYS_TYPE" ] ; then
MAX_AGE=$( date -j "${*:--v-1d}" "$DATE_FMT" )
fi
# create temporary file
TMP_FILE=$(mktemp)
trap 'rm -f "$TMP_FILE"' EXIT INT TERM HUP
ftp -i -n $HOST <<END_FTP_SCRIPT | \
awk -v max_age="$MAX_AGE" \
-v max_num="$MAX_NUM" \
-v date_fmt="$DATE_FMT" \
-v date_loc="$LC_SRV_DATE" \
-v sys_type="$SYS_TYPE" \
-v tmp_file="$TMP_FILE" '
BEGIN {
# columns in the 'dir' output from the ftp server:
# drwx------ 1 user group 4096 Apr 8 2009 Mail
# -rw------- 1 user group 13052 Nov 20 02:07 .bash_history
perm=1; links=2; user=3; group=4; size=5; month=6; day=7; yeartime=8; # name=$9..$NF
if ( "BSDish" == sys_type ) {
date_cmd="LC_ALL=" date_loc " date -j -f"
} else if ( "GNUish" == sys_type ) {
date_cmd="LC_ALL=" date_loc " date -d"
} else {
print "awk: don'\''t know how to date ;-) sorry!" > "/dev/stderr"
exit 1;
}
files[""] = ""
file_cnt = 0
out_cmd = "sort -rn | head -n " max_num " > " tmp_file
}
$perm ~ /^[^-]/ { # skip non-regular files
next
}
{
if ( "BSDish" == sys_type ) {
if ( $yeartime ~ /[0-9][0-9][0-9][0-9]/ ) {
ts_fmt = "\"%b %e %C%y\""
} else if ( $yeartime ~ /[0-9][0-9]:[0-9][0-9]/ ) {
ts_fmt = "\"%b %e %H:%M\""
} else {
print "has neither year nor time: " $8
exit 1
}
} else { # tested in BEGIN: must be "GNUish"
ts_fmt = ""
}
cmd = date_cmd " " ts_fmt " \"" $month " " $day " " $yeartime "\" " date_fmt
cmd | getline timestamp
close( cmd )
if ( timestamp > max_age ) {
# clear everything but the file name
$perm=$links=$user=$group=$size=$month=$day=$yeartime=""
files[ file_cnt,"name" ] = $0
files[ file_cnt,"time" ] = timestamp
++file_cnt
}
}
END {
for( i=0; i<file_cnt; ++i ) {
print files[ i,"time" ] "\t" files[ i,"name" ] \
| out_cmd
}
close( out_cmd )
print "quote USER '$USER'\nquote PASS '$PASSWD'\ncd \"'$DIR'\""
i = 0
while( (getline < tmp_file) > 0 ) {
$1 = "" # drop timestamp
gsub( /^ /,"" ) # strip leading space
print "get \"" $0 "\""
}
print "quit"
}
' \
| ftp -v -i -n $HOST
quote USER $USER
quote PASS $PASSWD
cd "$DIR"
dir .
quit
END_FTP_SCRIPT
