I need a function generator that takes another function and any of its arguments and sets new defaults. I thought @hadley's pryr::partial was that magic function. It does exactly what I want, except that you can't then change the new default. In the example below I can change sep in my new paste function, but not the new default of collapse = "_FOO_". How can I make partial behave this way (i.e., default to collapse = "_FOO_" but allow setting collapse = NULL if desired)? If this is not possible with partial, is there a way to rewrite the code for partial to do this: https://github.com/hadley/pryr/blob/master/R/partial.r
library(pryr)
.paste <- pryr::partial(paste, collapse = "_FOO_")
.paste(1:5)
.paste(1:5, LETTERS[1:5], sep="_BAR_")
.paste(1:5, collapse=NULL)
> .paste(1:5)
[1] "1_FOO_2_FOO_3_FOO_4_FOO_5"
> .paste(1:5, LETTERS[1:5], sep="_BAR_")
[1] "1_BAR_A_FOO_2_BAR_B_FOO_3_BAR_C_FOO_4_BAR_D_FOO_5_BAR_E"
> .paste(1:5, collapse=NULL)
Error in paste(collapse = "_FOO_", ...) :
formal argument "collapse" matched by multiple actual arguments
partial is good for fixing certain parameter values, but if you want to change defaults while still allowing them to be overridden, you might consider a different strategy. This would work:
.paste <- paste
formals(.paste)$collapse <- "_FOO_"
This changes the default stored in the function's formal arguments:
args(.paste)
# function (..., sep = " ", collapse = "_FOO_")
# NULL
Then you can do
.paste(1:5)
# [1] "1_FOO_2_FOO_3_FOO_4_FOO_5"
.paste(1:5, LETTERS[1:5], sep="_BAR_")
# [1] "1_BAR_A_FOO_2_BAR_B_FOO_3_BAR_C_FOO_4_BAR_D_FOO_5_BAR_E"
.paste(1:5, collapse=NULL)
# [1] "1" "2" "3" "4" "5"
This takes @MrFlick's great response and wraps it in a reusable function for future searchers:
hijack <- function(FUN, ...){
  .FUN <- FUN
  args <- list(...)
  invisible(lapply(seq_along(args), function(i) {
    formals(.FUN)[[names(args)[i]]] <<- args[[i]]
  }))
  .FUN
}
# Now Try It
.paste <- hijack(paste, collapse = "_FOO_")
.paste(1:5)
.paste(1:5, LETTERS[1:5], sep="_BAR_")
.paste(1:5, collapse=NULL)
Yielding
> .paste(1:5)
[1] "1_FOO_2_FOO_3_FOO_4_FOO_5"
> .paste(1:5, LETTERS[1:5], sep="_BAR_")
[1] "1_BAR_A_FOO_2_BAR_B_FOO_3_BAR_C_FOO_4_BAR_D_FOO_5_BAR_E"
> .paste(1:5, collapse=NULL)
[1] "1" "2" "3" "4" "5"
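One caveat for future searchers: assigning NULL with `[[<-` normally deletes an element instead of storing NULL, so a call like hijack(f, arg = NULL) would be expected to drop arg from the formals rather than give it a NULL default. Below is a minimal sketch of a variant that copes with this. The name hijack2 and the toy function f are hypothetical, and the sketch assumes FUN is a closure (formals() returns NULL for primitives).

```r
# hijack2 is a hypothetical name; it assumes FUN is a closure, not a primitive.
hijack2 <- function(FUN, ...) {
  args <- list(...)
  fm <- as.list(formals(FUN))
  # Assigning a list with `[<-` keeps NULL values as elements,
  # whereas `[[<-` with NULL would delete the element.
  fm[names(args)] <- args
  formals(FUN) <- fm
  FUN
}

.paste <- hijack2(paste, collapse = "_FOO_")
.paste(1:3)
.paste(1:3, collapse = NULL)

# Toy function (illustrative only) whose new default is NULL
f <- function(x, sep = "-") if (is.null(sep)) x else paste0(x, sep)
g <- hijack2(f, sep = NULL)
names(formals(g))
g("a")
```

The original paste is left untouched, because the modified copy is only returned, never assigned back.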
You could just write a simple wrapper
.paste <- function(..., collapse = "_FOO_"){paste(..., collapse = collapse)}
which gives
> .paste <- function(..., collapse = "_FOO_"){paste(..., collapse = collapse)}
> .paste(1:5)
[1] "1_FOO_2_FOO_3_FOO_4_FOO_5"
> .paste(1:5, collapse = NULL)
[1] "1" "2" "3" "4" "5"
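The wrapper idea can be generalized into a small function generator, which is roughly what the question asks partial to do. This is only a sketch under stated assumptions: partial_default is a hypothetical name, it is not pryr's implementation, and it matches argument names exactly (no partial matching).

```r
# partial_default is a hypothetical helper, not part of pryr.
partial_default <- function(FUN, ...) {
  defaults <- list(...)
  function(...) {
    supplied <- list(...)
    # Keep only the defaults that the caller did not name explicitly;
    # an explicit collapse = NULL still counts as supplied.
    keep <- setdiff(names(defaults), names(supplied))
    do.call(FUN, c(supplied, defaults[keep]))
  }
}

.paste <- partial_default(paste, collapse = "_FOO_")
.paste(1:5)
.paste(1:5, LETTERS[1:5], sep = "_BAR_")
.paste(1:5, collapse = NULL)
```

Unlike the formals() approach, the returned function only has ... as its signature, so args(.paste) no longer documents the defaults.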
I have some amino acid modifications, something like:
example <- c('_(Acetyl (Protein N-term))DDDIAAM(Oxidation (M))CK_')
I would like to split such a sequence into a state similar to the following:
example2 <- c('_','(Acetyl (Protein N-term))','D','D','D','I','A','A','M','(Oxidation (M))','C','K','_')
But I don't know how to split such a string while keeping the content inside the brackets, is there any function or code that can help me do this?
Thanks,
LeeLee
Update
Borrowing the idea from @benson23 of inserting a special character, e.g., #, we can try the code below using strsplit + nested (g)sub's
unlist(
lapply(
unlist(
strsplit(
sub(
"(.*)\\)", "\\1)#",
sub(
"\\(", "#(",
gsub("(\\))([^()]+)(\\()", "\\1#\\2#\\3", example)
)
), "#"
)
),
function(s) {
if (startsWith(s, "(")) {
s
} else {
strsplit(s, "")
}
}
)
)
Here is a bulky implementation to find the paired brackets and do the split
# split string by characters
v <- unlist(strsplit(example, ""))
# positions of "(" and ")"
a <- which(v == "(")
b <- which(v == ")")
# split as per the position of ")"
lst1 <- split(v, cumsum(replace(rep(0, length(v)), 1 + by(b, findInterval(b, a), max), 1)))
# split as per the position of "("
lst2 <- unlist(lapply(lst1, function(x) split(x, cumsum(x == "(") > 0)), recursive = FALSE)
# output
res <- unlist(
lapply(
lst2,
function(s) {
if (s[1] == "(") {
paste0(s, collapse = "")
} else {
s
}
}
),
use.names = FALSE
)
Test
Let's try a slightly trickier example, example <- c("_(Acetyl (Protein (N-term)) XXX) DDDIAAM(Oxidation (M))CK_"), and we will see res as
[1] "_" "(Acetyl (Protein (N-term)) XXX)"
[3] " " "D"
[5] "D" "D"
[7] "I" "A"
[9] "A" "M"
[11] "(Oxidation (M))" "C"
[13] "K"
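The paired-bracket idea can also be sketched as a single pass over the characters with a depth counter: everything seen while the depth is above zero is collected into one piece. This is an illustrative alternative, not the answer's code. The function name split_keep_brackets is hypothetical, and the sketch assumes only round brackets matter.

```r
# split_keep_brackets is a hypothetical helper name.
split_keep_brackets <- function(s) {
  chars <- strsplit(s, "")[[1]]
  depth <- 0L
  buf <- ""
  out <- character(0)
  for (ch in chars) {
    if (ch == "(") depth <- depth + 1L
    if (depth > 0L) {
      # inside a bracketed group: accumulate
      buf <- paste0(buf, ch)
      if (ch == ")") {
        depth <- depth - 1L
        if (depth == 0L) {
          # outermost bracket closed: emit the whole group
          out <- c(out, buf)
          buf <- ""
        }
      }
    } else {
      # outside any bracket: emit the single character
      out <- c(out, ch)
    }
  }
  # an unclosed group, if any, is emitted as-is
  if (nzchar(buf)) out <- c(out, buf)
  out
}

split_keep_brackets("_(Acetyl (Protein N-term))DDDIAAM(Oxidation (M))CK_")
```

Growing out with c() is fine for short peptide strings. For very long inputs a preallocated vector would be a better choice.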
First insert a special character (here I choose "#") before and after brackets that should be kept together. Then strsplit on the special character. This will get an intermediate example_tmp vector.
library(magrittr)  # provides %>%
example_tmp <- gsub("(?<=\\w)(?=\\()", "#", example, perl = TRUE) %>%
  gsub("(?<=\\))(?=\\w)", "#", ., perl = TRUE) %>%
  strsplit("#") %>%
  unlist()
example_tmp
[1] "_" "(Acetyl (Protein N-term))"
[3] "DDDIAAM" "(Oxidation (M))"
[5] "CK_"
Then use sapply to loop through the vector, and strsplit on strings that do not contain any brackets.
example2 <- unname(unlist(sapply(example_tmp, \(x) if (!grepl("\\(", x)) strsplit(x, "") else x)))
example2
[1] "_" "(Acetyl (Protein N-term))"
[3] "D" "D"
[5] "D" "I"
[7] "A" "A"
[9] "M" "(Oxidation (M))"
[11] "C" "K"
[13] "_"
Here's a solution with tidyverse:
library(tidyverse)
data.frame(example) %>%
mutate(
# extract the strings with multiple uppercase letters:
XX = paste0(unlist(str_extract_all(example, "[A-Z]{2,}")), collapse = "|"),
# remove these strings from `example`:
example = str_remove_all(example, XX),
# split the multiple uppercase letter strings into single letters:
XX = paste0(unlist(str_split(sub("\\|", "", XX), "(?<!^)(?!$)")), collapse = ","),
# split `example` as appropriate:
example = paste0(unlist(str_split(example, "(?<=\\)\\)|_)")), collapse = ","),
# put everything together:
res = paste0(example, XX, collapse = ",")
) %>%
# remove obsolete columns:
select(-c(example, XX))
res
1 _,(Acetyl (Protein N-term)),(Oxidation (M)),_,D,D,D,I,A,A,M,C,K
I have a set of vectors inside a list wherein I want to append certain values to each vector. When I used append() outside the loop, it worked perfectly fine but inside a loop it doesn't seem to work.
factors <- list(c("K3BG","9"),c("RTCKO","4"))
len <- length(factors)
for (i in 1:length)
{
rejig_score <- factors[[i]][2]
rejig_score <- as.numeric(rejig_score)
if(rejig_score > 5)
{
factors[[i]] <- append(factors[[i]],"Approved")
}
else
{
factors[[i]] <- append(factors[[i]],"Disapproved")
}
}
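I changed 1:length to 1:len in the for loop. length is the function itself, not the number of elements, so it cannot be used as the upper bound of the sequence.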
factors <- list(c("K3BG","9"),c("RTCKO","4"))
len <- length(factors)
for (i in 1:len)
{
rejig_score <- factors[[i]][2]
rejig_score <- as.numeric(rejig_score)
if(rejig_score > 5)
{
factors[[i]] <- append(factors[[i]],"Approved")
}
else
{
factors[[i]] <- append(factors[[i]],"Disapproved")
}
}
factors
[[1]]
[1] "K3BG" "9" "Approved"
[[2]]
[1] "RTCKO" "4" "Disapproved"
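A small related point: 1:len counts down to c(1, 0) when the list is empty, so seq_along() is usually the safer loop index. A minimal sketch of the same loop wrapped in a function, where the name label_factors is hypothetical:

```r
# label_factors is a hypothetical helper name.
label_factors <- function(factors) {
  # seq_along() yields an empty sequence for an empty list,
  # whereas 1:length(factors) would yield c(1, 0).
  for (i in seq_along(factors)) {
    score <- as.numeric(factors[[i]][2])
    label <- if (score > 5) "Approved" else "Disapproved"
    factors[[i]] <- append(factors[[i]], label)
  }
  factors
}

label_factors(list(c("K3BG", "9"), c("RTCKO", "4")))
label_factors(list())
```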
Using lapply
lapply(factors, function(x) c(x, if(as.numeric(x[2]) > 5)
"Approved" else "Disapproved"))
Output:
[[1]]
[1] "K3BG" "9" "Approved"
[[2]]
[1] "RTCKO" "4" "Disapproved"
Another option is to extract the second element of each vector, do the comparison outside the loop, build the vector of labels, and append it with Map
new <- c("Disapproved", "Approved")[1 +
(as.numeric(sapply(factors, `[[`, 2)) > 5)]
Map(c, factors, new)
[[1]]
[1] "K3BG" "9" "Approved"
[[2]]
[1] "RTCKO" "4" "Disapproved"
I have a function that, when given a list of strings, should return a vector of all unique substrings of length N.
get_unique <- function (input_list, size = 3) {
output = c()
for (input in input_list) {
current = stringr::str_replace(input, "[-_\\s]", "")
current = trimws(gsub(paste0("(.{",size,"})"), "\\1 ", current))
parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
output = union(output, parts)
}
return(output)
}
The expectation I have would be:
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
[1] "ABC" "DEF" "GHI"
But what I get is:
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
[[1]]
[1] "ABC"
[[2]]
[1] "DEF"
[[3]]
[1] "GHI"
I'm fairly new to R, so I'm having a tough time understanding where I've gone wrong.
We can use unlist at the end
get_unique <- function (input_list, size = 3) {
output = c()
for (input in input_list) {
current = stringr::str_replace(input, "[-_\\s]", "")
current = trimws(gsub(paste0("(.{",size,"})"), "\\1 ", current))
parts = stringr::str_split(current, "\\s", simplify = TRUE)[1,]
output = union(output, parts)
}
return(unlist(output))
}
get_unique(c("ABC", "ABCDEF", "GHIDEF"))
#[1] "ABC" "DEF" "GHI"
We could also do this in a single line, using a regex lookbehind to split after every three characters
unique(unlist(strsplit(v1, "(?<=...)", perl = TRUE)))
#[1] "ABC" "DEF" "GHI"
data
v1 <- c("ABC", "ABCDEF", "GHIDEF")
A full base R solution, using substr:
get_unique <- function(v) unique(unlist(sapply(v, function(x) sapply(1:(nchar(x)/3), function(y) substr(x, 3*(y-1)+1, 3*y) ))))
get_unique(v1)
[1] "ABC" "DEF" "GHI"
substr(x, 3*(y-1)+1, 3*y) extracts the y-th three-character substring of x.
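The same idea can be written with the vectorised substring(), which takes whole vectors of start and stop positions and so avoids the inner sapply. This is a sketch only: chunk_unique is a hypothetical name, and it assumes every input string is non-empty.

```r
# chunk_unique is a hypothetical helper; it assumes non-empty strings.
chunk_unique <- function(v, size = 3) {
  pieces <- lapply(v, function(x) {
    # start position of each chunk: 1, 1 + size, 1 + 2 * size, ...
    starts <- seq(1, nchar(x), by = size)
    # substring() truncates at the end of the string, so a short
    # final chunk is returned as-is
    substring(x, starts, starts + size - 1)
  })
  unique(unlist(pieces))
}

chunk_unique(c("ABC", "ABCDEF", "GHIDEF"))
chunk_unique("ABCDE", size = 2)
```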
R's strsplit drops the last element if it is empty (example 3), but not when the empty element occurs first (example 2) or in the middle of the vector to split (example 4).
> unlist(strsplit(x = "1,4", split = ",")) #Example 1
[1] "1" "4"
> unlist(strsplit(x = ",4", split = ",")) #Example 2
[1] "" "4"
> unlist(strsplit(x = "1,", split = ",")) #Example 3
[1] "1"
> unlist(strsplit(x = "1,,,4", split = ",")) #Example 4
[1] "1" "" "" "4"
Is there a way to split strings that keeps the last element if it is empty after the split?
> strmagic(x = "1,", split = ",") #strmagic being the wanted function
[1] "1" ""
A solution with other packages is here (it seems). Can it be done in base R?
UPDATE
Will adding a filler element be necessary then, à la:
strmagic <- function(v, sep)lapply(v, function(x)head(unlist(strsplit(paste(x, "-", sep = sep), split = sep)), -1))
Weird. This works but is not the most efficient. ZAQ is just an arbitrary marker string that should not occur in the data.
sp <- function( X ){
X <- paste0( X, "ZAQ" )
X <- unlist(strsplit(x = X, split = ","))
X <- gsub( "ZAQ" ,"" ,X)
X
}
sp("1,4")
sp(",4")
sp("1,")
sp("1,,,4")
strmagic <- function(x) unlist(strsplit(sub(",$",",,",x), split = ","))
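The one-liner above works because strsplit only ignores a single separator at the very end of the string, so doubling a trailing comma leaves one empty element behind. Following the same logic, a separator can be appended unconditionally. A minimal sketch, assuming a literal (non-regex) separator and using the hypothetical name strsplit_keep_last:

```r
# strsplit_keep_last is a hypothetical helper; sep is treated as a
# literal string (fixed = TRUE), not as a regular expression.
strsplit_keep_last <- function(x, sep) {
  # The appended separator is the one strsplit discards, so any
  # empty element before it survives.
  strsplit(paste0(x, sep), split = sep, fixed = TRUE)
}

strsplit_keep_last(c("1,4", ",4", "1,", "1,,,4"), ",")
```

Like strsplit, this returns a list with one character vector per input string.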
I have a nested list, for example:
x <- c(as.list(c("b", 4)), as.list(c("a", 4)))
Is it possible to order it by the second element in the sublists?
I think you want this as an example:
x <- c(list(c("b", 4)), list(c("a", 4)), list(c("b", 3)) )
And to order by the second element in each list you can use this:
> x[ order ( sapply(x, "[[", 2) )]
[[1]]
[1] "b" "3"
[[2]]
[1] "b" "4"
[[3]]
[1] "a" "4"
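Note that c("b", 4) coerces 4 to the string "4", so order() is comparing text here. With multi-digit values that matters, since "10" sorts before "9" as text. A small sketch with made-up data that converts the keys to numbers first:

```r
# Illustrative data, not from the question
x <- list(c("b", "10"), c("a", "9"), c("c", "2"))

# Extract the second element of each vector and convert it to numeric
keys <- as.numeric(vapply(x, `[[`, character(1), 2))

x_sorted <- x[order(keys)]
vapply(x_sorted, `[[`, character(1), 1)
```

vapply is used instead of sapply only to make the expected return type explicit.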
The sapply(... , "[[" , <n>) paradigm is often useful for extracting from the results of strsplit:
> z <- strsplit(c( "test of sentence reading", "another test", "something esle") , split=" ")
> sapply(z, "[[", 2)
[1] "of" "test" "esle"