Manipulating the quotes on strings when coding in R

This is actually a series of questions about referencing values of character type in R. I will add more bullets as I recall other related questions that I think are interesting and relevant to this topic. For simplicity, I use some simple, arbitrary examples to explain my questions. I hope this helps:
When building up a set of datasets in a for loop, I want to output a series of vectors whose names are stored in a vector called name_list <- c("a", "b", "c", "d", "e", "f"). In the loop, we would define them as:
for(i in 1:4){
a <- data[data$Year == 2010,]
b <- unique(data$Name)
c <- summarise(group_by(data,Year,Name), avg = mean(quantity))
...
f <- left_join(data, data1, by = c("Year", "Names"))
}
Is there any function that would let me write function(name_list[1]) through function(name_list[6]) in place of a through f in the for loop? The same question applies to creating columns in tables/data frames from column names stored as strings within a chunk of code. (as.name and noquote work when just referencing a vector/dataset, but not when assigning values to the target variable. Could anyone explain why this happens?)
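A minimal sketch of two common alternatives, assuming the results can be stored under names taken from a character vector; the names results, nm, df and col_name are illustrative only. On the side question: R reads f(x) <- value as a call to a replacement function `f<-`, so as.name("a") <- 5 or noquote("a") <- 5 does not assign to a variable called a. The left-hand side has to be an actual name, which is why assign() or indexing a list/data frame by a string is needed.

```r
name_list <- c("a", "b")

# Option 1: keep each result in a named list (usually the cleanest choice)
results <- list()
for (nm in name_list) {
  results[[nm]] <- toupper(nm)  # placeholder computation
}

# Option 2: assign() creates a variable whose name is given as a string
for (nm in name_list) {
  assign(nm, toupper(nm))
}

# Column names stored as strings: use [[ ]] instead of $
df <- data.frame(x = 1:3)
col_name <- "y"
df[[col_name]] <- df$x * 2
```

Option 1 keeps related results together, so later code can loop over results instead of looking variables up by name.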
When we extract information from SQL or other data sources, several values may be stored in a single variable, separated by commas or other delimiters. How can we test whether a certain value is one of the comma-separated values? See the example below:
1567 %in% c(1567,1456,123)
# [1] TRUE
a <- "c(1567,1456,123)"
noquote(a)
# [1] c(1567,1456,123)
1567 %in% noquote(a)
# [1] FALSE
1567 %in% list(noquote(a))
# [1] FALSE
b <- "1567,1456,123"
noquote(b)
# [1] 1567,1456,123
1567 %in% noquote(strsplit(b, ","))
# [1] FALSE
1567 %in% list(noquote(strsplit(b, ",")))
# [1] FALSE
I partly understand why %in% doesn't work here: R seems to treat 1567,1456,123 as one element. So I used strsplit to separate them, but it still doesn't work. Is there any way to make R evaluate the string as a command?

If all you need to do is convert comma-separated lists like "1567,1456,123" into R vectors like c(1567, 1456, 123), you definitely do not need to wrap them in c(...) and evaluate them as R code. Just use strsplit to split the data:
data_str <- "1567,1456,123"
data_vec <- as.integer(strsplit(data_str, ",")[[1]])  # [[1]]: strsplit returns a list
stopifnot(1567 %in% data_vec)
Note that strsplit returns a list, because it also accepts character vectors of length greater than one:
stopifnot(
identical(
strsplit(c("a,b", "x,y"), ","),
list(c("a", "b"), c("x", "y"))))
which makes it useful for operating on columns of SQL output:
| id | concatenated_field |
|----|--------------------|
| 1 | 5362,395,9000,7 |
| 2 | 319,75624,63 |
(etc.)
d <- data.frame(
id = c(1, 2),
concatenated_field = c("5362,395,9000,7", "319,75624,63"),
stringsAsFactors = FALSE)  # needed before R 4.0.0, where strings became factors by default
d$split_field <- strsplit(d$concatenated_field, ",")
sapply(d, class)
# id concatenated_field split_field
# "numeric" "character" "list"
d$split_field[[1]]
# [1] "5362" "395" "9000" "7"
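A sketch that goes one step further with the same d: convert each split piece to integer and test membership row by row, which answers the original "is this value among the comma-separated values" question per row. The has_9000 column and the value 9000 are illustrative.

```r
d <- data.frame(
  id = c(1, 2),
  concatenated_field = c("5362,395,9000,7", "319,75624,63"),
  stringsAsFactors = FALSE)

# split each row's string, then convert every piece to integer
d$split_field <- lapply(strsplit(d$concatenated_field, ","), as.integer)

# one TRUE/FALSE per row: does that row's list of values contain 9000?
d$has_9000 <- vapply(d$split_field, function(x) 9000L %in% x, logical(1))
d$has_9000
# [1]  TRUE FALSE
```

vapply is used instead of sapply so the result is guaranteed to be a plain logical vector with one value per row.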
Alternatively, if you're reading in one big stream of comma-separated data, you can use scan:
data_vec <- scan(
what = 0, # arcane way to say "expect numeric input"
sep = ",",
text = "1,2,3,4,5,6,7,8,9,10")
stopifnot(all.equal(data_vec, 1:10) == TRUE)
scan is more heavy-duty than strsplit and can handle more complicated inputs as well, such as data with quoted fields:
weird_data <- scan(what="", sep=",", text='marvin,ruby,"joe,joseph",dean')
print(weird_data)
# [1] "marvin" "ruby" "joe,joseph" "dean"
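scan can also read several fields per record when what is a list: each list element gives the type of one field, and the result is a named list with one vector per field. A small sketch with made-up records; the field names name and age are illustrative.

```r
records <- scan(
  text = c("marvin,30", "ruby,25"),   # two records, one per line
  what = list(name = "", age = 0),    # first field character, second numeric
  sep = ",",
  quiet = TRUE)                       # suppress the "Read 2 records" message
records$name
records$age
```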
If you are absolutely sure you need to accept and evaluate R code passed as input (this can be VERY DANGEROUS, since it means executing arbitrary, unverified R code), you can use eval(parse(text = ...)):
r_code_string <- 'list(c("a", "b"), c("x", "y"))'
stopifnot(
identical(
list(c("a", "b"), c("x", "y")),
eval(parse(text = r_code_string))))  # text = is required: parse's first argument is a file
parse converts raw text into an unevaluated "expression", a representation of R code as a special R object; eval then passes the expression to the interpreter for execution.
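A small sketch of the two steps separately, with arbitrary arithmetic as the code string. It also shows str2lang(), available since R 3.6.0, which parses a single expression into a call object rather than an expression vector.

```r
expr <- parse(text = "1 + 2")
class(expr)  # "expression": parsed, but not yet evaluated
eval(expr)   # 3

# str2lang() parses exactly one expression and returns a call
call_obj <- str2lang("sum(1:4)")
eval(call_obj)  # 10
```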
As for noquote, it doesn't do what you think it does. It doesn't modify the string at all; it just adds "noquote" to the object's class so that it prints without quotation marks. You can get the same printing behavior with print(..., quote = FALSE).
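A short sketch, using the question's b, of what noquote actually changes and why the strsplit attempts returned FALSE: strsplit returns a list, and %in% then compares 1567 with the list's single element as a whole rather than with each piece.

```r
b <- "1567,1456,123"

nq <- noquote(b)
class(nq)          # "noquote": only the class changes
unclass(nq) == b   # TRUE: the underlying string is untouched

# strsplit() returns a list; [[1]] takes the character vector out of it
parts <- strsplit(b, ",")[[1]]
1567 %in% as.numeric(parts)   # TRUE

# Without [[1]], %in% compares 1567 against the whole list element
1567 %in% strsplit(b, ",")    # FALSE
```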

Related

Generate all possible combinations of a text string with two specific letters substituted for each other in R

Using R, I have generated several strings of letters ranging from 6 to 25 characters. For each one, I'd like to generate all combinations in which every "I" can be substituted for an "L" and vice versa, while keeping the order of the characters the same.
For example:
Input
"IVGLWEA"
OUTPUT
"IVGLWEA"
"LVGLWEA"
"LVGIWEA"
"IVGIWEA"
many thanks
rob
Edit: Thanks to @Skaqqs for the dynamic solution!
string <- "IVGLWEA"
# find the number of I's and L's in the string
n <- length(unlist(gregexpr("I|L", string)))
# make a grid of all possible combinations with this amount of I's and L's
df <- expand.grid(rep(list(c("I", "L")), n), stringsAsFactors = FALSE)
# replace I's and L's with %s placeholders
string_ <- gsub("I|L", "%s", string)
# replace %s with letters in grid
do.call(sprintf, as.list(c(string_, df)))
Result:
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"
Here's an extremely inefficient (but concise!) approach:
Create all potential combinations of your input characters and use regex to extract the desired pattern.
pattern <- "(I|L)VG(I|L)WEA"
b <- c("I", "V", "G", "L", "W", "E", "A")
strings <- apply(expand.grid(rep(list(b), 7)), 1, paste0, collapse = "")
grep(pattern, strings, value = TRUE)
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"
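A sketch of a generalized helper, swap_il (a hypothetical name), that locates the I/L positions by character rather than by regex, returns the input unchanged when it contains no I or L, and avoids factor columns via stringsAsFactors = FALSE. It still produces 2^n strings for n swappable positions, so long strings with many I's and L's grow quickly.

```r
swap_il <- function(s) {
  chars <- strsplit(s, "")[[1]]
  pos <- which(chars %in% c("I", "L"))     # positions that may be swapped
  if (length(pos) == 0) return(s)          # nothing to swap
  grid <- expand.grid(rep(list(c("I", "L")), length(pos)),
                      stringsAsFactors = FALSE)
  out <- apply(grid, 1, function(choice) {
    chars[pos] <- choice                   # fill the swappable positions
    paste(chars, collapse = "")
  })
  unname(out)
}

swap_il("IVGLWEA")
# [1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"

# several strings at once: one character vector per input
lapply(c("IVGLWEA", "AVW"), swap_il)
```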

Using a list's assigned name from a character string in a vector

I have some lists:
my_list1 <- list("data" = list(c("a", "b", "c")), "meta" = list(c("a", "b")))
my_list2 <- list("data" = list(c("x", "y", "z")), "meta" = list(c("x", "y")))
I'd like to perform some operations on these lists, but I need to refer to them by names stored in a vector, because I'm creating them dynamically from an API call. Such a vector might be:
list_vec <- c("my_list1", "my_list2")
I'm running into problems turning the character strings in the vector into references to the lists. I know this topic has been covered, but the part I'm stuck on is extracting just the data sublist when running functions within assign. Essentially, a situation like this:
library(purrr)
for(i in seq_along(1:length(list_vec))){
assign(list_vec[[i]], map_df(list_vec[[i]][["data"]], unlist))
}
which should give a result like this (for my_list1):
# A tibble: 3 x 1
data
<chr>
1 a
2 b
3 c
For a list referenced directly, I could reduce it to just the data sublist with:
my_list1$meta <- NULL
but the equivalent with the dynamically stored name does not work, because list_vec[[1]] is only the string "my_list1":
list_vec[[1]][["meta"]] <- NULL
I've also tried wrapping things in eval, but can't get that to work.
So specifically I need to evaluate the list's name from a string so I can extract a sublist from it.
We can pass the vector list_vec to mget, which returns a nested list of the objects with those names. We use lapply to extract ([[) the data element of each, and unlist(..., recursive = FALSE) to remove one level of nesting.
unlist(lapply(mget(list_vec), `[[`, "data"), recursive = FALSE)
Result
#$my_list1
#[1] "a" "b" "c"
#$my_list2
#[1] "x" "y" "z"
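If the goal is to actually modify each list whose name is stored in list_vec (as in the my_list1$meta <- NULL attempt), a minimal sketch is to fetch a copy with get(), change it, and write it back with assign(). The names nm, x, all_lists and data_only are illustrative; keeping the objects together in one named list from mget() usually avoids assign() altogether.

```r
my_list1 <- list("data" = list(c("a", "b", "c")), "meta" = list(c("a", "b")))
my_list2 <- list("data" = list(c("x", "y", "z")), "meta" = list(c("x", "y")))
list_vec <- c("my_list1", "my_list2")

# Variant 1: modify each object, looked up by the name stored in list_vec
for (nm in list_vec) {
  x <- get(nm)          # fetch the object whose name is stored in nm
  x[["meta"]] <- NULL   # drop the meta sublist from the copy
  assign(nm, x)         # write the copy back under the same name
}

# Variant 2: work on one named list instead of separate variables
all_lists <- mget(list_vec)
data_only <- lapply(all_lists, function(x) unlist(x[["data"]]))
```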

How to convert several characters to vectors in R?

I am struggling to convert several character strings to vectors and collect them in lists in R.
The converting rule is as follows:
Assign a number to each letter, e.g. A=1, B=2, C=3, ...
When a string has two or more letters, make a vector, e.g. AB = c(1,2), ABC = c(1,2,3).
Collect the vectors into lists.
For example, suppose that there is an object ex with three components. For each component, I want to make a list: list1, list2, and list3.
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
# 3 lists to be returned from ex object
list1 = "list(1,2,3,4)" # from (A,B,C,D)
list2 = "list(c(1,2), c(2,3), c(3,4))" # from (AB,BC,CD)
list3 = "list(c(1,2), c(3))" # from (AB,C)
Please let me know a good R function to solve the example above.
lookUpTable = as.numeric(1:4) #map numbers to their respective strings
names(lookUpTable) = LETTERS[1:4]
step1<- #get rid of parentheses and split by ",".
strsplit(gsub("[()]", "", ex), ",")
result<- #split again to turn strings like "AB" into "A", "B", then convert the letters to numbers using lookUpTable
lapply(step1, function(x){ lapply(strsplit(x, ""), function(u) unname(lookUpTable[u])) })
# assign to the global environment.
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list", x), result[[x]], envir = globalenv()); NULL})
)
# get it as strings:
invisible(
lapply(seq_along(result), function(x) {assign(paste0("list_string", x), capture.output(dput(result[[x]])), envir = globalenv()); NULL})
)
data:
ex = c("(A,B,C,D)", "(AB,BC,CD)","(AB,C)")
tips and tricks:
I make use of regular expressions in gsub (and strsplit). Learn regex!
I made a lookUpTable that maps the individual strings to numbers. Make sure your lookUpTable is set up analogously.
Have a look at the apply family of functions, in this case ?lapply.
Lastly, I assign the results to the global environment. I don't recommend this step, but it's what you requested.
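A sketch of an alternative that keeps the results in a named list instead of the global environment and maps letters with match(u, LETTERS), assuming the rule A=1, B=2, ... extends through Z, so no hand-built lookup table is needed. The helper name to_numbers is hypothetical, and match returns integers rather than the doubles produced by as.numeric(1:4).

```r
ex <- c("(A,B,C,D)", "(AB,BC,CD)", "(AB,C)")

# Convert one string such as "(AB,C)" into list(c(1, 2), 3),
# mapping each letter to its position in the alphabet
to_numbers <- function(s) {
  tokens <- strsplit(gsub("[()]", "", s), ",")[[1]]   # e.g. "AB", "C"
  lapply(strsplit(tokens, ""), function(u) match(u, LETTERS))
}

# Named list: result$list1, result$list2, result$list3
result <- setNames(lapply(ex, to_numbers), paste0("list", seq_along(ex)))
result$list2
```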

Vectorized use of the substring function for a row selection of a dataframe with different length

My data frame has a character column named Code whose values look like b, b1, b110-b139, b110, b1100, b1101, ... (1602 entries).
I am trying to select all entries that match one of the strings in a vector, as well as all entries that start with one of those strings.
So let's say I have the vector
Selection=c("b114","d2")
then I want all codes like b114, b1140, b1141, b1142, ... as well as d2, d200, d2000, d2001, d2002, d2003, etc.
What does work in principle is to create a new data frame like this:
bTable <- TreeMapTable[substr(TreeMapTable$Code,1,4)=="b114"|substr(TreeMapTable$Code,1,2)=="d2",]
which gives me all the data I want, but requires me to type the condition for each entry manually, whereas I just want to give the script a vector of strings.
I tried to do it like this:
SelectionL=nchar(Selection)
Beispieltable <- TreeMapTable[substr(TreeMapTable$Code,1,SelectionL)==Selection,]
but this somehow gives me only half of the required entries, and I confess I don't really know what it is doing. I know I could use a for loop, but from everything I have read so far, loops should be avoided and the problem should be solvable with vectorized operations.
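A small sketch, with made-up codes, of why the substr attempt misses rows: substr recycles its stop argument along the codes, and == recycles Selection the same way, so each row is compared with only one selection string instead of all of them.

```r
codes <- c("d2", "b114", "d200", "b1140", "b115", "d12")
sel <- c("b114", "d2")

# nchar(sel) and sel are both recycled along codes, so codes[1] is only
# compared with "b114", codes[2] only with "d2", codes[3] only with "b114", ...
substr(codes, 1, nchar(sel))         # "d2" "b1" "d200" "b1" "b115" "d1"
substr(codes, 1, nchar(sel)) == sel  # all FALSE, although four codes should match
```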
sample data
df <- data.frame( Code = c("b114", "b115", "b11456", "d2", "d12", "d200", "db114"),
stringsAsFactors = FALSE)
Selection=c("b114","d2")
answer
library( dplyr )
#create a regex pattern to filter on
pattern <- paste0( "^", Selection, collapse = "|" )
#filter out all rows where 'Code' does not start with one of the entries in 'Selection'
df %>% filter( grepl( pattern, Code, perl = TRUE ) )
# Code
# 1 b114
# 2 b11456
# 3 d2
# 4 d200
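A base-R sketch without dplyr or regular expressions, using startsWith() (available since R 3.3.0). Because the prefixes are matched literally, a selection string containing regex metacharacters such as "." cannot change the meaning, which is not guaranteed with the paste0("^", ...) pattern. The name keep is illustrative.

```r
df <- data.frame(
  Code = c("b114", "b115", "b11456", "d2", "d12", "d200", "db114"),
  stringsAsFactors = FALSE)
Selection <- c("b114", "d2")

# one logical vector per prefix, combined with | so a row is kept
# if its Code starts with any of the prefixes
keep <- Reduce(`|`, lapply(Selection, function(p) startsWith(df$Code, p)))
df[keep, , drop = FALSE]
```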

Convert a character string with an index to an object reference

I am parsing the left-hand side of an R formula. In my specific case, this can be a variable or object with an index (something like myvariable[[3]]). I would like to access the third sub-object of this object and store it in another object. The following example starts at the point where I have the string of the indexed object, but I need the reference.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
get(mystring) # does not work
eval(as.name(mystring)) # does not work either
I could of course extract the number using regular expressions and use as.numeric to convert it to a real index. But in some cases there may be named indices, like mychars["second"]. So how can I extract the sub-object?
You can parse and then eval this expression.
mychars <- c("a", "b", "c")
mystring <- "mychars[2]"
eval(parse(text = mystring))
[1] "b"
It works for named indices too
names(mychars) <- c("first", "second", "third")
eval(parse(text = 'mychars["second"]'))
second
"b"

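A sketch of two variants, assuming the question's formula object is still at hand: the left-hand side of a two-sided formula f is f[[2]], which is already an unevaluated call, so it can be evaluated without converting to and from a string. For code that only has the string, str2lang() (available since R 3.6.0) parses a single expression into a call. Both still evaluate arbitrary code, so the same caution as with eval(parse(...)) applies. The objects f and lhs and the right-hand side x are illustrative.

```r
mychars <- c(first = "a", second = "b", third = "c")

# The left-hand side of a formula is already a call; no string round trip needed
f <- mychars["second"] ~ x
lhs <- f[[2]]
eval(lhs)

# Starting from a string, str2lang() yields the same kind of call
mystring <- "mychars[2]"
eval(str2lang(mystring))
```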
Resources