R: subset of a character vector

I want to get a subset of a character vector. Specifically, I want to obtain vector2, containing the elements of the original vector that lie between two specific elements.
vector <- c("a", "", "b", "c","","d", "e")
vector
How can I grab all elements between "b" and "e" to get vector2?
#Expected result:
vector2
"c","","d"

You can also do something like this:
vector <- c("a", "", "b", "c","","d", "e")
vector[seq(which(vector=="b")+1,which(vector=="e")-1)]
#[1] "c" "" "d"
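The seq(which(...) + 1, which(...) - 1) one-liner assumes "b" occurs once, before "e", with at least one element in between. When the two markers are adjacent, seq(3, 2) counts down instead of returning nothing, so the markers themselves come back in reverse order. Below is a minimal sketch of a guarded version; between_seq is a hypothetical helper name, and it assumes each marker occurs exactly once.

```r
vector <- c("a", "", "b", "c", "", "d", "e")

# The unguarded one-liner with adjacent markers: seq(3, 2) is c(3, 2)
adjacent <- c("a", "b", "e")
adjacent[seq(which(adjacent == "b") + 1, which(adjacent == "e") - 1)]
# c("e", "b")

# Hypothetical helper: elements strictly between `left` and `right`.
# Assumes each marker occurs exactly once; returns an empty vector when
# nothing lies between them (including when `right` comes first).
between_seq <- function(x, left, right) {
  from <- which(x == left) + 1
  to <- which(x == right) - 1
  if (to < from) return(x[0])  # x[0] is an empty vector of x's type
  x[from:to]
}

between_seq(vector, "b", "e")    # c("c", "", "d")
between_seq(adjacent, "b", "e")  # character(0)
```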

Here is one option
f <- function(x, left, right) {
idx <- x %in% c(left, right)
x[as.logical(cumsum(idx) * !idx)]
}
f(vector, "b", "e")
# [1] "c" "" "d"
The first step is to calculate idx as
vector %in% c("b", "e")
# [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE
then calculate the cumulative sum
cumsum(vector %in% c("b", "e"))
# [1] 0 0 1 1 1 1 2
then multiply by !vector %in% c("b", "e"), which sets the positions of "b" and "e" themselves to zero:
cumsum(vector %in% c("b", "e")) * !vector %in% c("b", "e")
# [1] 0 0 0 1 1 1 0
Finally, convert this to a logical vector (non-zero values become TRUE) and use it to subset x.
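Because the cumulative sum stays at 2 after "e", f() as written also keeps any elements that follow "e"; in the example "e" happens to be the last element, so this does not show. A sketch of a variant that keeps only positions where the count is exactly 1 (f2 is a hypothetical name; it assumes "b" comes before "e" and each occurs once):

```r
# Original helper from the answer above
f <- function(x, left, right) {
  idx <- x %in% c(left, right)
  x[as.logical(cumsum(idx) * !idx)]
}

# Variant: cumsum(idx) == 1 marks the stretch from `left` up to just
# before `right`; & !idx then drops the `left` marker itself
f2 <- function(x, left, right) {
  idx <- x %in% c(left, right)
  x[cumsum(idx) == 1 & !idx]
}

v <- c("a", "", "b", "c", "", "d", "e", "z")
f(v, "b", "e")   # c("c", "", "d", "z"): "z" comes after "e" but is kept
f2(v, "b", "e")  # c("c", "", "d")
```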
For the given example, another option is charmatch():
x <- charmatch(c("b", "e"), vector) + c(1, -1)
vector[seq.int(x[1], x[2])]
# [1] "c" "" "d"
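Note that charmatch() does partial matching: a pattern that is only a prefix of a unique element still counts as a hit. That is harmless for single letters, but if the vector can contain longer strings, match() is the exact-match alternative. A sketch (the vector w is made up for illustration):

```r
# Partial vs exact matching on a made-up vector
w <- c("a", "", "bc", "c", "", "d", "e")
charmatch("b", w)  # 3: "b" is a prefix of "bc"
match("b", w)      # NA: there is no exact "b"

# The same subsetting as above, using exact matching
vector <- c("a", "", "b", "c", "", "d", "e")
pos <- match(c("b", "e"), vector) + c(1, -1)
vector[seq.int(pos[1], pos[2])]  # c("c", "", "d")
```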

With negative subscripts (here x is the original vector):
x <- c("a", "", "b", "c", "", "d", "e")
x[-c(1:which(x == 'b'), which(x =='e'):length(x))]
#[1] "c" "" "d"
If "e" occurs before "b", it returns an empty vector:
(y <- rev(x))
#[1] "e" "d" "" "c" "b" "" "a"
y[-c(1:which(y == 'b'), which(y =='e'):length(y))]
#character(0)

You can also try:
vector[cumsum(vector %in% c("b", "e")) == 1][-1]
# [1] "c" "" "d"

Related

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried the following:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can write a function that subsets by value (x <= y) when the element is numeric and by membership (x %in% y) otherwise, then use Map() to apply it to each list element together with its matching criterion. This returns a new list with the filtered values:
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
Output:
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be filtered with brackets. For lists you need two pairs, as in [[ ]][ ]: the first extracts an object from the list, and the second refers to the elements inside that object. For vectors a single pair of brackets with a condition is enough. Here is the code:
#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
#Code
challenge_list[[1]][3]
with(challenge_list, letter[letter %in% c("a", "e", "i", "o", "u")])
with(challenge_list, numbers[numbers <= 3])
Since your data is in a list, you can also refer to the elements by position:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
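To see why the attempts in the question fail: $ looks up a list element by name, and challenge_list has no element named "gamma", so challenge_list$"gamma" returns NULL. Single brackets return a sub-list, while double brackets (or $ with an element name) return the vector itself, which can then be indexed or filtered. A short sketch:

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

is.null(challenge_list$gamma)  # TRUE: no element is named "gamma"
class(challenge_list[1])       # "list": a sub-list containing `words`
class(challenge_list[[1]])     # "character": the `words` vector itself
challenge_list[["words"]][3]   # "gamma"
```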

Generate all combinations (and their sum) of a vector of characters in R

Suppose that I have a vector of length n and I need to generate all possible combinations and their sums. For example:
If n=3, we have:
myVec <- c("a", "b", "c")
Output =
"a"
"b"
"c"
"a+b"
"a+c"
"b+c"
"a+b+c"
Note that a+b is considered the same as b+a, so only one of them needs to be kept.
Another example, with n = 4:
myVec <- c("a", "b", "c", "d")
Output:
"a"
"b"
"c"
"d"
"a+b"
"a+c"
"a+d"
"b+c"
"b+d"
"c+d"
"a+b+c"
"a+b+d"
"a+c+d"
"b+c+d"
"a+b+c+d"
We can use sapply() over the combination sizes 1 to n, calling combn() for each size with paste as the function applied to every combination:
sapply(seq_along(myVec), function(n) combn(myVec, n, paste, collapse = "+"))
#[[1]]
#[1] "a" "b" "c"
#[[2]]
#[1] "a+b" "a+c" "b+c"
#[[3]]
#[1] "a+b+c"
myVec <- c("a", "b", "c", "d")
sapply(seq_along(myVec), function(n) combn(myVec, n, paste, collapse = "+"))
#[[1]]
#[1] "a" "b" "c" "d"
#[[2]]
#[1] "a+b" "a+c" "a+d" "b+c" "b+d" "c+d"
#[[3]]
#[1] "a+b+c" "a+b+d" "a+c+d" "b+c+d"
#[[4]]
#[1] "a+b+c+d"
Wrap the result in unlist() if you need the output as a single vector.
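A sketch of the flattened version for the three-element example. It uses lapply() so the intermediate result is always a list, regardless of sapply()'s simplification rules; the final check holds because each non-empty subset of n elements is generated exactly once, giving 2^n - 1 strings.

```r
myVec <- c("a", "b", "c")

# One character vector per combination size, then flattened
combos <- unlist(lapply(seq_along(myVec),
                        function(n) combn(myVec, n, paste, collapse = "+")))
combos  # c("a", "b", "c", "a+b", "a+c", "b+c", "a+b+c")

length(combos) == 2^length(myVec) - 1  # TRUE
```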

Concat named list of vectors

I'm trying to figure out how to transform a named list whose values are character vectors into a named list in which each value is the concatenation of the elements of the corresponding vector.
In case that is not clear, here is an example.
x <- list(A = c("e", "f", "g"), B = c("a", "b", "c"), C = c("m", "l", "w"))
#$A
#[1] "e" "f" "g"
#$B
#[1] "a" "b" "c"
#$C
#[1] "m" "l" "w"
named_list_concat <- function(data){ ... }
named_list_concat(x)
#$A
#[1] "efg"
#$B
#[1] "abc"
#$C
#[1] "mlw"
One base R possibility:
lapply(x, function(x) paste(x, collapse = ""))
$A
[1] "efg"
$B
[1] "abc"
$C
[1] "mlw"
Or the same thing in a shortened form:
lapply(x, paste, collapse = "")
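If a named character vector is more convenient than a list, vapply() returns one and additionally checks that each result is a single string. A sketch using the same x:

```r
x <- list(A = c("e", "f", "g"), B = c("a", "b", "c"), C = c("m", "l", "w"))

# character(1) declares that every call must return one string;
# collapse = "" is passed on to paste()
res <- vapply(x, paste, character(1), collapse = "")
res          # named character vector: A = "efg", B = "abc", C = "mlw"
as.list(res) # back to a named list, same as the lapply() result
```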

Return all elements of list containing certain strings

I have a list of vectors containing strings and I want R to give me another list with all vectors that contain certain strings. MWE:
list1 <- list("a", c("a", "b"), c("a", "b", "c"))
Now, I want a list that contains all vectors with "a" and "b" in it. Thus, the new list should contain two elements, c("a", "b") and c("a", "b", "c").
As list1[grep("a|b", list1)] gives me a list of all vectors containing either "a" or "b", I expected list1[grep("a&b", list1)] to do what I want, but it did not (it returned a list of length 0).
This should work:
test <- list("a", c("a", "b"), c("a", "b", "c"))
test[sapply(test, function(x) sum(c('a', 'b') %in% x) == 2)]
Try purrr::keep
library(purrr)
keep(list1, ~ all(c("a", "b") %in% .))
We can use Filter
Filter(function(x) all(c('a', 'b') %in% x), test)
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "a" "b" "c"
A solution with grepl:
> list1[grepl("a", list1) & grepl("b", list1)]
[[1]]
[1] "a" "b"
[[2]]
[1] "a" "b" "c"
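The grepl() answer works here because grepl() first converts each list element to a string with as.character(), so c("a", "b") is searched as the text c("a", "b"). That text also contains the letter "c", so the same trick gives false positives when searching for "c" (or for any substring of a longer element). (Similarly, regular expressions have no & operator, so grep("a&b", ...) looks for the literal text "a&b".) A sketch comparing grepl() with an exact %in% test:

```r
list1 <- list("a", c("a", "b"), c("a", "b", "c"))

as.character(list1)[2]  # the deparsed text c("a", "b")
grepl("c", list1)       # FALSE TRUE TRUE: element 2 matches via "c("
vapply(list1, function(v) "c" %in% v, logical(1))  # FALSE FALSE TRUE
```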

generate labels for variables in R

I'm searching for a better/faster way than this one to generate labels for a variable:
df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"
df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
For simplicity, I just have one variable in this example, of course there are more variables in my dataset but I just want to change a specific one.
You don't need sapply:
df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
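One subtlety with these sequential assignments: the first line stores "a" in a numeric column, so R converts df$a to character, and the later tests compare strings such as "10" with 10 (which R also converts to "10"). That happens to work here, but computing every mask from the untouched numeric values avoids relying on the coercion. A sketch:

```r
# Storing a string in a numeric vector turns the whole vector into character
tmp <- c(0, 7, 10)
tmp[2] <- "a"
tmp  # c("0", "a", "10")

df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))
orig <- df$a                   # numeric values used for every test
out <- rep("a", length(orig))  # default label for values not in pick
out[orig == 0] <- "b"
out[orig %in% 1:3] <- "c"
out[orig == 10] <- "d"
df$a <- out
df$a
```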
You could also produce the same result with factors:
df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"
# one way: duplicated labels are merged into one level (allowed since R 3.4.0; older versions of R warned)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))
# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)
# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")
# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
You might also consider mapvalues() or revalue() in plyr, particularly if you're dealing with more labels:
library(plyr)
df$a <- mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[! df$a %in% c("b", "c", "d")] <- "a" # The !pick values
Here is another fairly straightforward solution:
names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
It is even more straightforward if you include an NA in your "pick" object.
pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
If you use this second alternative, note that nomatch is the value match() returns for elements that are not found, and here that value is used as a position in pick. Because NA is in the first position of pick, nomatch = 1 maps every unmatched value to the name "a". If the NA were in the last position, you would use nomatch = 6 instead.
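A small sketch of how nomatch feeds into the lookup, using made-up input values 7, 0 and 2:

```r
pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")

match(c(7, 0, 2), pick)               # NA 2 4: 7 is not found
match(c(7, 0, 2), pick, nomatch = 1)  # 1 2 4: unmatched values point at NA
names(pick)[match(c(7, 0, 2), pick, nomatch = 1)]  # "a" "b" "c"
```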
You can also use nested ifelse() calls:
with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
