How to input many variables into a paste - r

I have a base query that I am trying to add 14 other strings to, which are named query1 to query14.
I am trying to combine them using paste, and the following works for one query. However, I am trying to set this up in a function that will be passed the number of queries, so that it knows how many to loop over and add to the final result.
Here is some code to add one query, which works:
result <- paste(base_query, eval(as.name(paste0("query", 2))))
Here is something I tried in order to loop over the queries, with no success:
range <- 14
result <- paste(base_query, while (loop < range) {
  loop <- loop + 1
  eval(as.name(paste0("query", loop)))
})
I'm not sure how to get the names generated in the while loop added to the paste. Thanks!

In general, it is preferable to store the queries 1-14 in a list, rather than store them separately in the global environment. That is why half of this solution entails getting the objects into a list, which is easily pasted together with the collapse argument of paste.
Also pay attention to gtools::mixedsort, which sorts the names in the order that I assume you want (incrementally, 1 to 14).
# Generate the query objects
base_query = "The letters of the alphabet are:"
for (i in 1:14) {
assign(x = paste0("query",i), value = LETTERS[i])
}
# Find all the query names, and sort them in a meaningful order
query_names = gtools::mixedsort(ls(pattern = "query*"))
query_names
#> [1] "base_query" "query1" "query2" "query3" "query4"
#> [6] "query5" "query6" "query7" "query8" "query9"
#> [11] "query10" "query11" "query12" "query13" "query14"
# Now get the appropriate objects
query_list = lapply(query_names, function(x){
get(x)
})
# Paste and collapse the contents of the list
paste(query_list, collapse = "")
#> [1] "The letters of the alphabet are:ABCDEFGHIJKLMN"
Created on 2022-08-26 by the reprex package (v2.0.0)
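To get closer to the function the question asks for (one that is told how many queries to add), here is a minimal sketch using base R's mget(), which looks up several objects by name at once. The helper name combine_queries and its sep/env arguments are hypothetical; it assumes the query1 ... queryN objects live in the calling environment and that n is at least 1.

```r
# Hypothetical helper: paste base_query together with query1 ... query<n>
# Assumes n >= 1 and that the query objects exist in the caller's environment
combine_queries <- function(base_query, n, sep = " ", env = parent.frame()) {
  query_names <- paste0("query", seq_len(n))  # "query1", "query2", ...
  queries <- mget(query_names, envir = env)   # named list; errors if one is missing
  paste(c(base_query, unlist(queries)), collapse = sep)
}

# Example objects, created the same way as in the answer above
base_query <- "The letters of the alphabet are:"
for (i in 1:14) assign(paste0("query", i), LETTERS[i])

combine_queries(base_query, 3)             # "The letters of the alphabet are: A B C"
combine_queries(base_query, 14, sep = "")  # "The letters of the alphabet are:ABCDEFGHIJKLMN"
```

Because seq_len(n) fixes the order explicitly, no sorting of names is needed here; storing the queries in a list from the start, as the answer recommends, remains the cleaner option.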

Related

In R, what do parentheses followed by parentheses mean?

The syntax for using scales::label_percent() in a mutate function is unusual because it uses double parentheses:
label_percent()(an_equation_goes_here)
I don't think I have seen ()() syntax in R before, and I don't know how to look it up because I don't know what it is called. I tried ?`()()` and ??`()()`, and neither helped. What is the double-parentheses syntax called? Can someone recommend a place to read about it?
Here is an example for context:
library(tidyverse)
members <-
read_csv(
paste0(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
"master/data/2020/2020-09-22/members.csv"
),
show_col_types = FALSE)
members %>%
count(success, died) %>%
group_by(success) %>%
# old syntax:
# mutate(percent = scales::percent(n / sum(n)))
# new syntax:
mutate(percent = scales::label_percent()(n / sum(n)))
#> # A tibble: 4 × 4
#> # Groups: success [2]
#> success died n percent
#> <lgl> <lgl> <int> <chr>
#> 1 FALSE FALSE 46452 98%
#> 2 FALSE TRUE 868 2%
#> 3 TRUE FALSE 28961 99%
#> 4 TRUE TRUE 238 1%
Created on 2023-01-01 with reprex v2.0.2
Most functions return a value, whether something atomic (numeric, integer, character), list-like (including data.frame), or something more complex. For those, the single set of ()s (as you recognize) are for the one call.
Occasionally, however, a function call returns a function. For example, if we look at ?scales::label_percent, we can scroll down to
Value:
All 'label_()' functions return a "labelling" function, i.e. a
function that takes a vector 'x' and returns a character vector of
'length(x)' giving a label for each input value.
Let's look at it step-by-step:
fun <- scales::label_percent()
fun
# function (x)
# {
# number(x, accuracy = accuracy, scale = scale, prefix = prefix,
# suffix = suffix, big.mark = big.mark, decimal.mark = decimal.mark,
# style_positive = style_positive, style_negative = style_negative,
# scale_cut = scale_cut, trim = trim, ...)
# }
# <bytecode: 0x00000168ee5440e8>
# <environment: 0x00000168ee5501b8>
fun(0.35)
# [1] "35%"
The first call to scales::label_percent() returned a function. We can then call that function on whatever vector of values we want, as many times as we want.
If you don't want to store the returned function in a variable like fun, you can use it immediately by following the first set of ()s with another set of parens.
scales::label_percent()(0.35)
# [1] "35%"
A related question is "why would you want a function to return another function?" There are many stylistic reasons, but in the case of scales::label_*, they are designed to be used in places where the option needs to be expressed as a function, not as a static value. For example, they can be used in ggplot code: axis ticks are often placed with simple heuristics that determine the count, locations, and rendering of the tick marks. While one can use ggplot2::scale_x_continuous(breaks = ..., labels = ...) with static vectors to control manually how many ticks there are, where they go, and what they look like, it is often more convenient not to decide a priori how many or where. When faceting is used, this can also vary per faceting variable(s), so it is not something one can easily assign in a static variable. In those cases, it is often better to supply a function that is given some simple parameters (such as the min/max of the axis) and returns something meaningful.
Why can't we just pass it scales::label_percent? (Good question.) Even though you're using the default values in your call here, one might want to change any or all of the controllable things, such as:
suffix= defaults to "%", but perhaps you want a space as in " %"?
decimal.mark= defaults to ".", but maybe your locale prefers commas?
While it is feasible to have multiple functions for all of the combinations of these options, it is generally easier in the long run to provide a "template function" for creating the function, such as
fun <- scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")
fun(0.353)
# [1] "35,30 %"
scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")(0.353)
# [1] "35,30 %"
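The same "template function" idea can be sketched in base R without scales. The factory below, make_labeller, is hypothetical and far simpler than scales::label_percent() (it only rounds, scales, and appends a suffix); it just shows how options passed to the outer call are captured by the function it returns.

```r
# Hypothetical label factory: options are fixed once, when the factory is called
make_labeller <- function(suffix = "%", scale = 100, digits = 0) {
  # Force the arguments so the returned function captures their current values
  force(suffix); force(scale); force(digits)
  function(x) paste0(round(x * scale, digits), suffix)
}

pct <- make_labeller()               # returns a function of x
pct(c(0.35, 0.5))                    # "35%" "50%"
make_labeller(suffix = " %")(0.353)  # called immediately with ()(): "35 %"
```

Each call to make_labeller() produces a new labelling function with its own captured options, which is why the options go in the first set of parentheses and the data in the second.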
An expression followed by an argument list in round parentheses, ( and ), is called a function call in R.
There's no need to have a special name for two function calls in a row. They're still just function calls.
If we run a function and the value it returns is itself a function, then we can call that returned function too.
For example, we first run f using f() and assign the return value to g. That return value is itself a function (namely function() 3), so g is a function and we can run it too.
# f is a function which returns a function
f <- function() function() 3
g <- f() # this runs f which returns `function() 3`
g() # thus g is a function so we can call it
## [1] 3
Now putting that all together we can write it in one line as
f()()
## [1] 3
As seen there is only one meaning for () and the fact that there were two together was simply because we were calling the result of a call.

Selecting an item from a list object and adding it to a data frame in R

This is my first question on Stack Overflow :)
To add information to a biodiversity database, I have written a program that loops over calls to an API based on a Latin name, returns the common names of the species, and adds them to a data frame. The program is working fine, but I have a problem with the following situation:
This is an example of the output of each individual loop:
              taxonname primary language
1   Polynesian Chestnut   FALSE      eng
2     Tahitian Chestnut    TRUE      eng
3 Chataignier de Tahiti   FALSE      fre
The current program adds the first line to the dataframe. However, I only want to add the name that is the primary result. In some cases, like the one demonstrated above, the primary result is in the second line, and sometimes there is no primary result at all. My problem is that this object seems to be a list, and that I do not know how to select the primary result column. I could really use some help with this.
After selecting the list item, I think I need an if/else block that accounts for these three situations:
Under the condition that the name (used in loop) = name (output)
(1) If primary result = TRUE, the row can be added to the data frame
(2) if primary result = FALSE in the first row, the program needs to move on to the next row until it finds a primary result = TRUE, then add this row to the data frame.
(3) If all rows contain primary result = FALSE, I want the program to fill in the text "No main common name"
Here is the code:
library(httr)     # GET(), content()
library(jsonlite) # fromJSON()
# loop the API call
for (i in encoded.api.names) {
name <- i
# if i is empty ("") then add NULL values to dataframe and move on to next entry,
# if else (non empty) move on with loop
if (i == "") {
commondf <- rbind(commondf, "NULL")
# if i is not empty, move on with loop
} else{
call1 <- paste(base,endpoint,name,"?token=",token, sep ="")
print(call1)
# make call to the API
get_species <- GET(call1)
# convert response API into text
get_species_text <- content(get_species, "text")
# convert text into JSON format
get_species_json <- fromJSON(get_species_text, flatten = TRUE)
print(get_species_json)
Up to this point everything works fine. Here is the part I need to adapt:
# turn into dataframe
if(length(get_species_json$result) > 0){
get_species_test <- as.data.frame(get_species_json)[1,c('name','result.taxonname','result.primary','result.language')]
}else{
get_species_test <- "NULL"
}
# use rbind to append each API call to the get_species dataframe
commondf <- rbind(commondf, get_species_test)
}
}
For me, the simplest way to work with lists is to turn them into tables, operate on the table, and go back to lists:
library(tidyverse)
example_data <- read_table('taxonname primary language
PolynesianChestnut FALSE eng
TahitianChestnut TRUE eng
ChataignierdeTahiti FALSE fre')
example_list <- example_data |>
as.list()
example_list |>
as_tibble() |>
filter(primary) |>
as.list()
#> $taxonname
#> [1] "TahitianChestnut"
#>
#> $primary
#> [1] TRUE
#>
#> $language
#> [1] "eng"
#>
#> attr(,"spec")
#> cols(
#> taxonname = col_character(),
#> primary = col_logical(),
#> language = col_character()
#> )
Created on 2022-01-24 by the reprex package (v2.0.1)
Apparently the item is not a list but a dataframe, and the correct column can easily be accessed with get_species_json$result$primary.
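Since result turns out to be a data frame, the three cases from the question can be handled with which(). This is a sketch, not the asker's final code: the stand-in data frame mirrors the printed output above, the helper name pick_primary_name is hypothetical, and the extra check that the looked-up name matches the returned name is left out.

```r
# Assumed stand-in for get_species_json$result (columns from the question's output)
result <- data.frame(
  taxonname = c("Polynesian Chestnut", "Tahitian Chestnut", "Chataignier de Tahiti"),
  primary   = c(FALSE, TRUE, FALSE),
  language  = c("eng", "eng", "fre"),
  stringsAsFactors = FALSE
)

# Hypothetical helper covering the three cases from the question
pick_primary_name <- function(result) {
  primary <- result$primary
  # No results at all (NULL or an empty list from the API)
  if (is.null(primary)) return("No main common name")
  hit <- which(primary)  # rows where primary is TRUE (NA values are skipped)
  # Cases 1 and 2: take the first TRUE row; case 3: no TRUE row at all
  if (length(hit) == 0) "No main common name" else result$taxonname[hit[1]]
}

pick_primary_name(result)             # "Tahitian Chestnut"
pick_primary_name(result[c(1, 3), ])  # "No main common name"
```

Inside the loop, the returned string could then be combined with name and appended to commondf, instead of always taking row 1.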

Assign value to indices of nested lists stored as strings in R

I have a dataframe of nested list indices, which have been stored as strings. To give a simplified example:
df1 <- data.frame(x = c("lst$x$y$a", "lst$x$y$b"), stringsAsFactors = F)
These are then coordinates for the following list:
lst <- list(x=list(y=list(a="foo",b="bar",c="")))
I'd like to replace values or assign new values to these elements using the indices in df1.
One attempt was
do.call(`<-`, list(eval(parse(text = df1[1,1])), "somethingelse"))
but this doesn't seem to work. Instead, it assigns "somethingelse" to a new variable named foo.
I'm not too happy with using eval(parse(text=)) (maintaining code will become a nightmare), but recognise I may have little choice.
Any tips welcome.
Let's consider 3 situations:
Case 1
do.call(`<-`, list("lst$x$y$a", "somethingelse"))
This will create a new variable literally named lst$x$y$a in your workspace, so the following two commands refer to different objects. (The former is the element stored in lst, and the latter is the new variable. You need to refer to it with backticks because lst$x$y$a is not a syntactically valid name.)
> lst$x$y$a # [1] "foo"
> `lst$x$y$a` # [1] "somethingelse"
Case 2
do.call(`<-`, list(parse(text = "lst$x$y$a"), "somethingelse"))
You might expect this one to work, but it still throws an error:
invalid (do_set) left-hand side to assignment
Let's check:
> parse(text = "lst$x$y$a") # expression(lst$x$y$a)
It belongs to the class expression, and the <- operator does not seem to accept this class on the left-hand side.
Case 3
This one will achieve what you want:
do.call(`<-`, list(parse(text = "lst$x$y$a")[[1]], "somethingelse"))
Extracting the first element of the expression object with [[1]] gives a call object, which <- does accept as its left-hand side.
> lst
# $x
# $x$y
# $x$y$a
# [1] "somethingelse"
#
# $x$y$b
# [1] "bar"
#
# $x$y$c
# [1] ""
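Since the question mentions wanting to avoid eval(parse(text=)), here is an alternative sketch that is not from the answer above: split each string on "$" and use the resulting names as a recursive [[ index. The helper index_path is hypothetical and assumes every string starts with the list's name and uses only $ for indexing.

```r
lst <- list(x = list(y = list(a = "foo", b = "bar", c = "")))
df1 <- data.frame(x = c("lst$x$y$a", "lst$x$y$b"), stringsAsFactors = FALSE)

# Hypothetical helper: "lst$x$y$a" -> c("x", "y", "a")
# Assumes each string starts with the list's name and uses only `$`
index_path <- function(s) {
  parts <- strsplit(s, "$", fixed = TRUE)[[1]]
  parts[-1]  # drop the leading object name ("lst")
}

new_values <- c("somethingelse", "anotherthing")
for (i in seq_len(nrow(df1))) {
  # [[ with a character vector indexes recursively: lst[["x"]][["y"]][["a"]]
  lst[[index_path(df1$x[i])]] <- new_values[i]
}

print(lst$x$y)
```

No code is parsed or evaluated here, so a malformed string can only produce an indexing error rather than run arbitrary code.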

Extract package names from R scripts

I am trying to write a function to extract package names from a list of R script files. My regular expressions do not seem to be working, and I am not sure why. For starters, I am not able to match lines that include library. For example:
str <- c(" library(abc)", "library(def)", "some other text")
grep("library\\(", str, value = TRUE)
grep("library\\(+[A-z]\\)", str, value = TRUE)
Why doesn't my second grep return elements 1 and 2 from the str vector? I have tried so many options, but all my results come back empty.
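Your second grep does not return 1, 2 for two reasons:
(1) You used value=TRUE, which makes it return the matching strings instead of their locations.
(2) You misplaced the +. As written, it applies to \\( (one or more opening parentheses), so [A-z] can only match a single character before the closing parenthesis. You want grep("library\\(\\w+\\)", str).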
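A quick check of the corrected pattern on the question's vector, with and without value = TRUE. The final sub() call, which keeps only the text inside the parentheses via a capture group, is an addition for illustration rather than part of the answer.

```r
str <- c(" library(abc)", "library(def)", "some other text")
pattern <- "library\\(\\w+\\)"

grep(pattern, str)                        # locations of matching elements: 1 2
hits <- grep(pattern, str, value = TRUE)  # the matching strings themselves
hits

# Keep only the package name with a capture group (\\1 refers to (\\w+))
pkgs <- sub(".*library\\((\\w+)\\).*", "\\1", hits)
pkgs                                      # "abc" "def"
```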
If you'd like something a bit more robust that will handle some edge cases (library() takes a number of parameters and the package one can be a name/symbol or a string and doesn't necessarily have to be specified first):
library(purrr)
script <- '
library(js) ; library(foo)
#
library("V8")
ls()
library(package=rvest)
TRUE
library(package="hrbrthemes")
1 + 1
library(quietly=TRUE, "ggplot2")
library(quietly=TRUE, package=dplyr, verbose=TRUE)
'
x <- parse(textConnection(script)) # parse w/o eval
keep(x, is.language) %>% # `library()` is a language object
keep(~languageEl(.x, 1) == "library") %>% # other things are too, so only keep `library()` ones
map(as.call) %>% # turn it into a `call` object
map(match.call, definition = library) %>% # so we can match up parameters and get them in the right order
map(languageEl, 2) %>% # element 1 is `library`; element 2 is now the `package` argument
map_chr(as.character) %>% # turn names/symbols into characters
sort() # why not
## [1] "dplyr" "foo" "ggplot2" "hrbrthemes" "js" "rvest" "V8"
This won't catch library() calls inside functions (it could be extended to do that), but if top-level edge cases are infrequent, edge cases inside functions are even less likely (and those would likely use require() as well, which this filter also skips).
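For comparison, here is a base-R sketch of the same parse-then-match.call idea without purrr, extended to top-level require() calls too. The variable names and the sample script are made up for illustration; like the answer's version, it ignores calls nested inside functions, and it would error on a bare library() with no package argument.

```r
# Hypothetical sample script; nothing in it is evaluated
script <- '
library(js); library("V8")
require(package = rvest)
x <- 1
library(quietly = TRUE, "ggplot2")
'
exprs <- parse(text = script, keep.source = FALSE)

# Keep top-level calls whose function is literally library or require
pkg_fns <- c("library", "require")
is_pkg_call <- vapply(exprs, function(e) {
  is.call(e) && is.name(e[[1]]) && as.character(e[[1]]) %in% pkg_fns
}, logical(1))

pkgs <- vapply(exprs[is_pkg_call], function(e) {
  fn <- get(as.character(e[[1]]), envir = baseenv())  # the real library/require
  as.character(match.call(fn, e)$package)             # symbol or string -> character
}, character(1))
pkgs  # "js" "V8" "rvest" "ggplot2"
```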

R - Return an object name from a for loop

Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
Note that acsv, bcsv, and ccsv are all data frames read in from CSV files, i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal was not simply to print the data frame name; I asked about that because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for (i in seq_along(csvs)) {
  cat(names(csvs[i]), "\n")
  print(nrow(csvs[[i]]))
  print(table(csvs[[i]][1]))
}
Then the indexing of nested lists was used, e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the data frames, and
print(table(csvs[[i]][1]))
which provides a table for the first column of each data frame.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed from. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as the argument to names(), as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
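If the data frames already exist as separate objects, one way to get such a named list without typing each name twice is base R's mget(), which returns a list named after the objects it looks up. The small data frames below are made-up stand-ins for the read.csv() results.

```r
# Assumed stand-ins for the data frames read with read.csv()
acsv <- data.frame(x = 1:3)
bcsv <- data.frame(x = 1:5)
ccsv <- data.frame(x = 1:2)

# mget() looks the objects up by name and names the list after them
csvs <- mget(c("acsv", "bcsv", "ccsv"))

for (nm in names(csvs)) {
  cat(nm, "has", nrow(csvs[[nm]]), "rows\n")
}
```

Looping over names(csvs) rather than indices keeps the label and the data together, which was the motivation stated in the question's edit.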
