Say I have a string such as "x = 1, y = 'cat', z = NULL". I want to obtain the list created by the code list(x = 1, y = 'cat', z = NULL). Here is my first attempt, which I am aware is horrible:
parse_text <- function(x) parse(text = x)[[1]]
strsplit2 <- function(x, ...) strsplit(x, ...)[[1]]
trim_whitespace <- function (x) gsub("^\\s+|\\s+$", "", x)
# take 1
x <- "nk = 1, ncross = 1, pmethod = 'backward'"
x <- strsplit2(x, ",")
xs <- lapply(x, strsplit2, "=")
keys <- lapply(xs, function(x) trim_whitespace(x[1]))
vals <- lapply(xs, function(x) parse_text(x[2]))
setNames(vals, keys)
This is what I imagined a more canonical approach to look like:
# take 2
x <- "nk = 1, ncross = 1, pmethod = 'backward'"
x <- strsplit2(x, ",")
xs <- lapply(x, parse_text)
do.call(list, xs)
But this loses the names of the list. Any help much appreciated! Cheers
You can first build a string containing the expression you want to execute (i.e. list(<your string>), in this case "list( nk = 1, ncross = 1, pmethod = 'backward' )") by using paste to add list( and ), then parse that string with parse and finally evaluate the result with eval:
x <- "nk = 1, ncross = 1, pmethod = 'backward'" #your string
eval(parse(text=paste('list(',x,')'))) #create and returns the desired list
$nk
[1] 1
$ncross
[1] 1
$pmethod
[1] "backward"
As shown, this returns the correct named list.
I hope this helps.
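The same paste/parse/eval idea can be wrapped in a small helper. This is a sketch; the name string_to_list is hypothetical. Evaluating in a fresh environment whose parent is baseenv() keeps name lookup away from the global environment, but it is not a sandbox (base functions stay reachable), so the input string must still be trusted.

```r
# Hypothetical helper wrapping the paste/parse/eval approach
string_to_list <- function(x) {
  # Build the call list(<x>) as a language object
  expr <- parse(text = paste0("list(", x, ")"))[[1]]
  # Evaluate in an empty child of baseenv(), so global variables are not picked up
  eval(expr, envir = new.env(parent = baseenv()))
}

res <- string_to_list("x = 1, y = 'cat', z = NULL")
str(res)
```

Because list() keeps NULL arguments, z = NULL survives as a list element, which matches the list the question asks for.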
Here is another way, avoiding the dreaded parse & eval route (which IMHO is entirely suitable for this use-case). It relies on your tag=value pairs being consistently formatted and delimited by commas.
x <- "nk = 1, ncross = 1, pmethod = 'backward'"
# Split into tag=value
vals <- strsplit( x , "," )[[1]]
# Split again and transform to matrix of tags and values
mat <- do.call( rbind , strsplit( vals , "=" ) )
# Return as a list
setNames( as.list( mat[,2] ) , mat[,1] )
#$`nk `
#[1] " 1"
#$` ncross `
#[1] " 1"
#$` pmethod `
#[1] " 'backward'"
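As the output shows, the names keep their surrounding spaces and the values keep both spaces and quotes. A sketch of the same split approach that absorbs the whitespace in the split patterns and strips single quotes; like the original, it assumes no value contains a comma or an =, and the values stay character (use as.numeric() where needed).

```r
x <- "nk = 1, ncross = 1, pmethod = 'backward'"
# Split on commas, swallowing any whitespace around them
pairs <- strsplit(x, "\\s*,\\s*")[[1]]
# Split each pair on "=", again swallowing surrounding whitespace
mat <- do.call(rbind, strsplit(trimws(pairs), "\\s*=\\s*"))
# Drop the single quotes around string values; everything stays character
vals <- gsub("^'|'$", "", mat[, 2])
res <- setNames(as.list(vals), mat[, 1])
str(res)
```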
Convert the commas to semicolons, source the string into a new environment e and convert e to a list (here s is the string from the question):
s <- "x = 1, y = 'cat', z = NULL"
source(textConnection(chartr(",", ";", s)), local = e <- new.env())
as.list(e)
giving:
$x
[1] 1
$y
[1] "cat"
$z
NULL
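as.list() on an environment does not promise to return the bindings in the order they were created. A sketch that keeps the evaluate-into-an-environment idea but evaluates the parsed statements with eval() instead of source(), then reads the values back in their original order with mget(). Like the answer, it assumes no value contains a comma, since chartr() replaces every comma.

```r
s <- "x = 1, y = 'cat', z = NULL"
# Turn the comma-separated pairs into semicolon-separated statements
code <- chartr(",", ";", s)
exprs <- parse(text = code)
# Evaluate every statement into a fresh environment
e <- new.env()
eval(exprs, envir = e)
# Recover the assigned names, in their original order, from each `name = value` call
nms <- vapply(exprs, function(ex) as.character(ex[[2]]), character(1))
# mget() returns a named list in the order of nms, keeping NULL values
res <- mget(nms, envir = e)
str(res)
```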
Related
I have the following strings:
x <- "??????????DRHRTRHLAK??????????"
x2 <- "????????????????????TRCYHIDPHH"
x3 <- "FKDHKHIDVK????????????????????TRCYHIDPHH"
x4 <- "FKDHKHIDVK????????????????????"
What I want to do is to replace all the ? characters with another string:
rep <- "ndqeegillkkkkfpssyvv"
Resulting in:
ndqeegillkDRHRTRHLAKkkkfpssyvv # x
ndqeegillkkkkfpssyvvTRCYHIDPHH # x2
FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH # x3
FKDHKHIDVKndqeegillkkkkfpssyvv # x4
Basically, I want the characters of rep to fill the ? positions in order, while keeping the interleaving characters (DRHRTRHLAK in x) in place.
The total length of rep is the same as the total length of ?, 20 characters.
Note that I don't want to split rep manually again as an extra step.
I tried this but failed:
gsub(pattern = "\\?+", replacement = rep, x = x)
[1] "ndqeegillkkkkfpssyvvDRHRTRHLAKndqeegillkkkkfpssyvv"
Example data:
x <- c(
"??????????DRHRTRHLAK??????????",
"????????????????????TRCYHIDPHH",
"FKDHKHIDVK????????????????????TRCYHIDPHH"
)
rep <- "ndqeegillkkkkfpssyvv"
Fix it up with regmatches<- replacements in a vectorised fashion (the \(x) lambda shorthand requires R 4.1.0 or later):
gr <- gregexpr("\\?+", x)
csml <- lapply(gr, \(x) cumsum(attr(x, "match.length")) )
regmatches(x, gr) <- lapply(csml, \(x) substring(rep, c(1,x[-length(x)]+1), x))
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
#[2] "ndqeegillkkkkfpssyvvTRCYHIDPHH"
#[3] "FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH"
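The same in-order fill can also be done at the character level instead of with regex runs: split each string into characters and overwrite the ? positions with the characters of rep. A sketch; the helper name fill_qmarks is hypothetical, and it assumes (as stated in the question) that each string has exactly nchar(rep) question marks.

```r
x <- c(paste0(strrep("?", 10), "DRHRTRHLAK", strrep("?", 10)),
       paste0(strrep("?", 20), "TRCYHIDPHH"),
       paste0("FKDHKHIDVK", strrep("?", 20), "TRCYHIDPHH"),
       paste0("FKDHKHIDVK", strrep("?", 20)))
rep <- "ndqeegillkkkkfpssyvv"

# Hypothetical helper: replace the ?'s, left to right, with the characters of `fill`
fill_qmarks <- function(x, fill) {
  fill_chars <- strsplit(fill, "")[[1]]
  vapply(strsplit(x, ""), function(chars) {
    q <- chars == "?"
    # Assumes each string has exactly nchar(fill) question marks
    stopifnot(sum(q) == length(fill_chars))
    chars[q] <- fill_chars
    paste(chars, collapse = "")
  }, character(1))
}

fill_qmarks(x, rep)
```

Logical indexing assigns the replacement characters in left-to-right order, so the interleaving letters stay where they are without any regex.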
String Split with substr():
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
x<-gsub(pattern = "^\\?+", replacement = substr(rep, 1, 10), x = x)
x<-gsub(pattern = "\\?+$", replacement = substr(rep, 11, 20), x = x)
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
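In the regex, ^ matches the start of the string and $ matches the end. Note that the split points (1 to 10 and 11 to 20) are hard-coded for this particular x.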
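The split point in the substr() answer is hard-coded. A sketch that measures the leading run of ?'s instead, so it also covers strings with ?'s only at the start or only at the end; the helper name fill_ends is hypothetical, it works on a single string, and it does not handle a run of ?'s in the middle.

```r
# Hypothetical helper for one string whose ?'s sit only at the start and/or end
fill_ends <- function(x, fill) {
  # Length of the leading run of ?'s (0 if the string does not start with ?)
  n_lead <- attr(regexpr("^\\?*", x), "match.length")
  x <- sub("^\\?+", substr(fill, 1, n_lead), x)
  sub("\\?+$", substr(fill, n_lead + 1, nchar(fill)), x)
}

rep <- "ndqeegillkkkkfpssyvv"
fill_ends(paste0(strrep("?", 10), "DRHRTRHLAK", strrep("?", 10)), rep)
```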
You can count the number of ?'s and then cut rep based on that:
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
pattern <- "(\\?+)(DRHRTRHLAK)(\\?+)"
n <- nchar(gsub(pattern, "\\1", x))
gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n+1, nchar(rep))), x)
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
Edit: new examples:
A very verbose way is an if/else chain that checks where the ?'s are (for a single string x) and substitutes rep accordingly, using a pattern that matches each layout:
library(magrittr) # for %>%
if (grepl("^\\?.+\\?$", x)) { #?'s on both ends
pattern <- "^(\\?+)([A-Z]+)(\\?+)$"
n <- gsub(pattern, "\\1", x) %>% nchar()
gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n+1, nchar(rep))), x)
} else if (grepl("^\\?", x)) { #?'s only at the start
pattern <- "^(\\?+)([A-Z]+)$"
n <- gsub(pattern, "\\1", x) %>% nchar()
gsub(pattern, paste0(substr(rep, 1, n), "\\2"), x)
} else if (grepl("\\?$", x)) { #?'s only at the end
pattern <- "^([A-Z]+)(\\?+)$"
n <- gsub(pattern, "\\2", x) %>% nchar()
gsub(pattern, paste0("\\1", substr(rep, 1, n)), x)
} else if (grepl("^[A-Z]+\\?+[A-Z]+$", x)) { #?'s only in the middle
pattern <- "^([A-Z]+)(\\?+)([A-Z]+)$"
n <- gsub(pattern, "\\2", x) %>% nchar()
gsub(pattern, paste0("\\1", substr(rep, 1, n), "\\3"), x)
}
This may strike you as odd, but this is exactly what I want to achieve: I want to paste the index of a list element into a string that refers to a subset of that list.
For illustration:
l1 <- list(a = 1, b = 2)
l2 <- list(a = 3, b = 4)
l <- list(l1,l2)
X_l <- vector("list", length = length(l))
for (i in 1:length(l)) {
X_l[[i]] = "l[[ #insert index number as character# ]]$l_1*a"
}
In the end, I want something like this:
X_l_wanted <- list("l[[1]]$l_1*a","l[[2]]$l_1*a")
You can use sprintf/paste0 directly:
sprintf('l[[%d]]$l_1*a', seq_along(l))
#[1] "l[[1]]$l_1*a" "l[[2]]$l_1*a"
If you want the final output as a list:
as.list(sprintf('l[[%d]]$l_1*a', seq_along(l)))
#[[1]]
#[1] "l[[1]]$l_1*a"
#[[2]]
#[1] "l[[2]]$l_1*a"
Using paste0 :
paste0('l[[', seq_along(l), ']]$l_1*a')
Try paste0() inside your loop; that is the way to concatenate strings. Here is the solution with slight changes to your code:
#Data
l1 <- list(a = 1, b = 2)
l2 <- list(a = 3, b = 4)
l <- list(l1,l2)
#List
X_l <- vector("list", length = length(l))
#Loop
for (i in 1:length(l)) {
#Code
X_l[[i]] = paste0('l[[',i,']]$l_1*a')
}
Output:
X_l
[[1]]
[1] "l[[1]]$l_1*a"
[[2]]
[1] "l[[2]]$l_1*a"
Or you could do it with lapply() and glue:
library(glue)
X_l <- lapply(seq_along(l), function(i) glue("l[[{i}]]$l_1*a"))
X_l
# [[1]]
# l[[1]]$l_1*a
# [[2]]
# l[[2]]$l_1*a
I have a vector of strings, similar to this one, but with many more elements:
s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0", "TCA-DV-X_T_6.0_D_A2_07", "T-V-Z_T_2_D_A_0", "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")
And I would like to use a function that extracts the string between the nth and ith instances of the delimiter "_". For example, the string between the 2nd (n = 2) and 3rd (i = 3) instances, to get this:
[1] "90.67.0" "90.0" "6.0" "2" "24.4.0" "274.46.0"
Or if n = 4 and i = 5:
[1] "1541" "151" "A2" "A" "A6" "A266"
Any suggestions? Thank you for your help!
You can do this with sub and anchored patterns (gsub would strip every run of n fields, not just the first one):
n = 2
i = 3
pattern1 = paste0("^([^_]*_){", n, "}")
temp = sub(pattern1, "", s)
pattern2 = paste0("^(([^_]*_){", i-n, "}).*")
temp = sub(pattern2, "\\1", temp)
temp = sub("_$", "", temp)
temp
[1] "90.67.0" "90.0" "6.0" "2" "24.4.0" "274.46.0"
#FUNCTION
foo = function(x, n, i){
do.call(c, lapply(x, function(X)
paste(unlist(strsplit(X, "_"))[(n+1):(i)], collapse = "_")))
}
#USAGE
foo(x = s, n = 3, i = 5)
#[1] "DV_1541" "DV_151" "D_A2" "D_A" "V_A6" "DV_A266"
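The two steps of the sub answer can also be folded into one anchored regular expression: skip n fields, capture the next i - n fields, and drop the rest. A sketch; the name between_delims is hypothetical, the delimiter is fixed to "_", and strings with too few delimiters return NA.

```r
s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0", "TCA-DV-X_T_6.0_D_A2_07",
       "T-V-Z_T_2_D_A_0", "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")

# Hypothetical helper: text between the n-th and i-th "_" (n = 0 means start of string)
between_delims <- function(x, n, i) {
  stopifnot(n >= 0, i > n)
  # Skip n fields ("[^_]*_" each), capture the next i - n fields, drop the rest
  pattern <- paste0("^(?:[^_]*_){", n, "}((?:[^_]*_){", i - n - 1, "}[^_]*).*$")
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # Strings with too few delimiters do not match; return NA for them
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

between_delims(s, 2, 3)
between_delims(s, 3, 5)
```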
A third method uses gregexpr to find the positions of the delimiters and substring to extract the text between them:
# extract positions of "_" from each vector element, returns a list
spots <- gregexpr("_", s, fixed=TRUE)
# extract the text between the third and fifth underscores
substring(s, sapply(spots, "[", 3) + 1, sapply(spots, "[", 5) - 1)
#[1] "DV_1541" "DV_151" "D_A2" "D_A" "V_A6" "DV_A266"
Given a vector:
vec <- c(LETTERS[1:10])
I would like to be able to combine it in a following manner:
resA <- c("AB", "CD", "EF", "GH", "IJ")
resB <- c("ABCDEF","GHIJ")
where elements of the vector vec are merged together according to the desired size of each new element in the resulting vector. This is 2 in the case of resA and 6 in the case of resB.
Desired solution characteristics
The solution should allow for flexibility with respect to the element sizes, i.e. I may want to have vectors with elements of size 2 or 20
There may not be enough elements in the vector to fill the last chunk; in that case the last element should be shortened accordingly (as shown)
This shouldn't make a difference, but the solution should work on words as well
Attempts
Initially, I was thinking of using something along the lines of:
c(
paste0(vec[1:2], collapse = ""),
paste0(vec[3:4], collapse = ""),
paste0(vec[5:6], collapse = "")
# ...
)
but this would have to be adapted to step through the remaining pairs (or bigger groups) of vec and to handle the last group, which would often be smaller.
Here is what I came up with. Using Harlan's idea in this question, you can split the vector into chunks of a given size with split() and ceiling(seq_along(vec)/size). Then apply your paste0() idea to each chunk with lapply(), and finally unlist the resulting list.
unlist(lapply(split(vec, ceiling(seq_along(vec)/2)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4 5
#"AB" "CD" "EF" "GH" "IJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/5)), function(x){paste0(x, collapse = "")}))
# 1 2
#"ABCDE" "FGHIJ"
unlist(lapply(split(vec, ceiling(seq_along(vec)/3)), function(x){paste0(x, collapse = "")}))
# 1 2 3 4
#"ABC" "DEF" "GHI" "J"
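The split()/ceiling() approach can be wrapped in a reusable helper. A minimal sketch; the name chunk_paste is hypothetical, and vapply() replaces unlist(lapply()) so the result is always an unnamed character vector. The last chunk is shorter when size does not divide length(x), and it works on words as well as single letters.

```r
# Hypothetical helper: paste consecutive elements of x together in groups of `size`
chunk_paste <- function(x, size) {
  # Group ids 1, 1, ..., 2, 2, ...; the last group may be shorter
  groups <- ceiling(seq_along(x) / size)
  unname(vapply(split(x, groups), paste0, character(1), collapse = ""))
}

vec <- LETTERS[1:10]
chunk_paste(vec, 2)
chunk_paste(vec, 6)
chunk_paste(c("foo", "bar", "baz"), 2)
```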
vec <- c(LETTERS[1:10])
f1 <- function(x, n){
f <- function(x) paste0(x, collapse = '')
regmatches(f(x), gregexpr(f(rep('.', n)), f(x)))[[1]]
}
f1(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f2 <- function(x, n)
apply(matrix(x, nrow = n), 2, paste0, collapse = '')
f2(vec, 5)
# [1] "ABCDE" "FGHIJ"
or
f3 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
strsplit(gsub(sprintf('(%s)', f(rep('.', n))), '\\1 ', f(x)), '\\s+')[[1]]
}
f3(vec, 4)
# [1] "ABCD" "EFGH" "IJ"
I would say the last is the best of these, since for the others n must divide length(x) evenly; otherwise f1 silently drops the leftover elements and f2 recycles them with a warning.
edit - more
f4 <- function(x, n) {
f <- function(x) paste0(x, collapse = '')
Vectorize(substring, USE.NAMES = FALSE)(f(x), which((seq_along(x) %% n) == 1),
which((seq_along(x) %% n) == 0))
}
f4(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
or
f5 <- function(x, n)
mapply(function(x) paste0(x, collapse = ''),
split(x, c(0, head(cumsum(rep_len(sequence(n), length(x)) %in% n), -1))),
USE.NAMES = FALSE)
f5(vec, 4)
# [1] "ABCD" "EFGH" "IJ"
Here is another way, working with the original vector.
A side note: working with words is not straightforward, since there are at least two ways to understand it: you can either keep each word separately, or collapse them first and get individual characters. The following function can handle both options.
vec <- c(LETTERS[1:10])
vec2 <- c("AB","CDE","F","GHIJ")
cuts <- function(x, n, bychar = FALSE) {
if (bychar) x <- unlist(strsplit(paste0(x, collapse=""), ""))
ii <- seq_along(x)
li <- split(ii, ceiling(ii/n))
return(sapply(li, function(y) paste0(x[y], collapse="")))
}
cuts(vec2, 2, FALSE)
# 1 2
# "ABCDE" "FGHIJ"
cuts(vec2, 2, TRUE)
# 1 2 3 4 5
# "AB" "CD" "EF" "GH" "IJ"
I regularly have the problem that I need to access the actual id variable when using d*ply or l*ply. A simple (yet nonsensical) example would be:
library(plyr)
df1 <- data.frame( p = c("a", "a", "b", "b"), q = 1:4 )
df2 <- data.frame( m = c("a", "b" ), n = 1:2 )
d_ply( df1, "p", function(x){
actualId <- unique( x$p )
print( mean(x$q)^df2[ df2$m == actualId, "n" ] )
})
So with the d*ply functions I can help myself with unique( x$p ). But with l*ply, I have no idea how to access the name of the corresponding list element.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print( <missing code> )
})
# desired output
[1] "a"
[1] "b"
[1] "c"
Any suggestions? Anything I am ignoring?
One way I've gotten around this is to loop over the index (names) and do the subsetting within the function.
l <- list(a = 1, b = 2, c = 3)
l_ply(names(l), function(x){
print(x)
myl <- l[[x]]
print(myl)
})
myl will then be the same value that the function receives in:
l_ply(l, function(myl) {
print(myl)
})
Here's one idea.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(eval(substitute(names(.data)[i], parent.frame())))
})
# [1] "a"
# [1] "b"
# [1] "c"
(Have a look at the final code block of l_ply to see where I got the names .data and i.)
I'm not sure there's a way to do that, because the only argument to your anonymous function is the list element's value, without its name:
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(class(x))
})
[1] "numeric"
[1] "numeric"
[1] "numeric"
But if you get the results of your command back as a list or a data frame, the names are preserved for you to use later:
llply( list(a = 1, b = 2, c = 3), function(x){
x
})
$a
[1] 1
$b
[1] 2
$c
[1] 3
Aside from Josh's solution, you can also pass both the names and the values of your list elements to a function with mapply or m*ply:
d <- list(a = 1, b = 2, c = 3)
myfunc <- function(value, name) {
print(as.character(name))
print(value)
}
mapply(myfunc, d, names(d))
m_ply(data.frame(value=unlist(d), name=names(d)), myfunc)
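In base R, the mapply idea can be packaged as an lapply()-like helper whose function also receives each element's name. A sketch; lapply_with_names is a hypothetical name, and it falls back to the element positions (as strings) when the list has no names.

```r
# Hypothetical helper: like lapply(), but FUN also receives each element's name
lapply_with_names <- function(X, FUN, ...) {
  nms <- names(X)
  # Fall back to positions when X has no names
  if (is.null(nms)) nms <- as.character(seq_along(X))
  out <- mapply(FUN, X, nms, MoreArgs = list(...), SIMPLIFY = FALSE, USE.NAMES = FALSE)
  setNames(out, names(X))
}

d <- list(a = 1, b = 2, c = 3)
res <- lapply_with_names(d, function(value, name) paste(name, value))
str(res)
```

For the l_ply use case in the question, invisible(lapply_with_names(d, function(value, name) print(name))) prints each name in turn.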