How to order vectors with priority layout? - r

Let's consider these vector of strings following:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see there are certain strings in this vector starting the same e.g. "B", "B_big".
What I want to end up with is a vector ordered in such layout that all strings with same starting should be next to each other. But order of letter should stay the same (that "B" should be first one, "C" second one and so on). Let me put an example to clarify it:
In simple words, I want to end up with vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
What I've done to achive this vector: I read from the left and I see "B" so I'm looking on all other vector which starts the same and put it to the right of "B". Then is "C", so I'm looking on all remaining strings and put all starting with "C" e.g. "C_small" to the right and so on.
I'm not sure how to do it. I'm almost sure that gsub function can be used to approach this result, however I'm not sure how to combine it with this searching and replacing. Could you please give me a hand doing so ?

Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
if (letter %in% substr(x, 1, 1)) {
xnew <- c(xnew, x[substr(x, 1, 1) == letter])
}
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"

Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
If you are open for non-base alternatives, data.table::chgroup may be used, "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
x[chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"

I suggest splitting the two parts of the text into separate dimensions. Then, define a clear rank order for the descriptive part of the name using a named character vector. From there you can reorder the input vector on the fly. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
# separate the two parts
prefix <- sub("_.*$", "", x)
suffix <- sub("^.*_", "", x)
# identify inputs with no suffix
suffix <- ifelse(suffix == "", "none", suffix)
# map each suffix to a rank ordering
suffix_order <- c(
"small" = -1,
"none" = 0,
"big" = 1,
"huge" = 2,
"tremendous" = 3
)
# return input vector,
# ordered by the prefix and the mapping of suffix to rank
x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result
[1] "A_big" "A_huge" "A" "B_big" "B_tremendous" "B" "C_small" "C"
[9] "D"

Related

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10
letter = letters
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried using the followings:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can use a function and then do the subset if it is numeric or not and then use Map to pass the list to vector that correspond to the original list element and apply the f1. This would return the new list with the filtered values
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
-output
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most of R objects can be filtered using brackets. In the case of lists you have to use a pair of them like [[]][] because the first one points to the object inside the list and the second one makes reference to the elements inside them. For vectors the task is easy as you only can use a pair of brackets and set conditions to extract elements. Here the code:
#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10
letter = letters
#Code
challenge_list[[1]][1]
letter[letter %in% c("a", "e", "i", "o","u")]
numbers[numbers<=3]
As I have noticed your data is in a list, you can also play with the position of the elements like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][1]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][1]
[1] "alpha"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3

Trouble evaluating combinations from combn using purrr

I am trying to use combn to divide a group of n = 20 different units into 3 groups of unequal size -- 4, 6 and 10. Then I am trying to validate for values that must be together within a group -- if one element from the pair exists in the group then the other should also be in the group. If one is not in the group then neither should be in the group. In this fashion, I'd like to evaluate the groups in order to find all possible valid solutions where the rules are true.
x <- letters[1:20]
same_group <- list(
c("a", "c"),
c("d", "f"),
c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = F)
validate_combinations <- function(x) all(c("a", "c") %in% x) | !any(c("a", "c") %in% x)
valid_combinations <- keep(combinations_list, validate_combinations)
In this way I'd like to combine -> reduce each group until I have a list of all valid combinations. I'm not sure how to combine combinations_list, validate_combinations, and the same_group to check all same_group "rules" against the combinations in the table. The furthest I can get is to check against one combination c("a", "c"), which when run against keep(combinations_list, validate_combinations) is indeed giving me the output I want.
I think once I can do this, I can then use the unpicked values in another combn function for the group of 6 and the group of 10.
We can change the function to accept variable group
validate_combinations <- function(x, group) all(group %in% x) | !any(group %in% x)
then for each group subset the combinations_list which satisfy validate_combinations
lapply(same_group, function(x) combinations_list[
sapply(combinations_list, function(y) validate_combinations(y, x))])
#[[1]]
#[[1]][[1]]
#[1] "a" "b" "c" "d"
#[[1]][[2]]
#[1] "a" "b" "c" "e"
#[[1]][[3]]
#[1] "a" "b" "c" "f"
#[[1]][[4]]
#[1] "a" "b" "c" "g"
#[[1]][[5]]
#[1] "a" "b" "c" "h"
#[[1]][[6]]
#[1] "a" "b" "c" "i"
#[[1]][[7]]
#[1] "a" "b" "c" "j"
#[[1]][[8]]
#[1] "a" "b" "c" "k"
#......

Returning the values of a list based on "two" parameters

Very new to R. So I am wondering if you can use two different parameters to get the position of both elements from a list. See the below example...
x <- c("A", "B", "A", "A", "B", "B", "C", "C", "A", "A", "B")
y <- c(which(x == "A"))
[1] 1 3 4 9 10
x[y]
[1] "A" "A" "A" "A" "A"
x[y+1]
[1] "B" "A" "B" "A" "B"
But I would like to return the positions of both y and y+1 together in the same list. My current solution is to merge the two above lists by row number and create a dataframe from there. I don't really like that and was wondering if there is another way. Thanks!
I dont know what exactly you want, but this could help:
newY = c(which(x == "A"),which(x == "A")+1)
After that you can sort it with
finaldata <- newY[order(newY)]
Or you do both in one step:
finaldata <- c(which(x == "A"),which(x == "A")+1)[order(c(which(x == "A"),which(x == "A")+1))]
Then you could also delete duplicates if you want to. Please tell me if this is what you wanted.

Return all elements of list containing certain strings

I have a list of vectors containing strings and I want R to give me another list with all vectors that contain certain strings. MWE:
list1 <- list("a", c("a", "b"), c("a", "b", "c"))
Now, I want a list that contains all vectors with "a" and "b" in it. Thus, the new list should contain two elements, c("a", "b") and c("a", "b", "c").
As list1[grep("a|b", list1)] gives me a list of all vectors containing either "a" or "b", I expected list1[grep("a&b", list1)] to do what I want, but it did not (it returned a list of length 0).
This should work:
test <- list("a", c("a", "b"), c("a", "b", "c"))
test[sapply(test, function(x) sum(c('a', 'b') %in% x) == 2)]
Try purrr::keep
library(purrr)
keep(list1, ~ all(c("a", "b") %in% .))
We can use Filter
Filter(function(x) all(c('a', 'b') %in% x), test)
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "a" "b" "c"
A solution with grepl:
> list1[grepl("a", list1) & grepl("b", list1)]
[[1]]
[1] "a" "b"
[[2]]
[1] "a" "b" "c"

R beginner standard regarding grouping levels used in R

So one of the problems I am stuck on is that:
I have some variable X which takes values {1,2,3,4}. Thus
X:
1
2
2
4
2
3
What I want to do, is turn the 1's and 2's into A, and the 3's and 4's into B.
Is there any possible suggestions how I would go about doing this. Or hints?
I was initially thinking of using the subset command, but these seems to just extract them from the dataset.
One possible option is to use recodeVar from the doBy package
library(doBy)
x <- c(1, 2, 2, 4, 2, 3)
src = list(c(1, 2), c(3, 4))
tgt = list("A", "B")
recodeVar(x, src, tgt)
which yields
> recodeVar(x, src, tgt)
[1] "A" "A" "A" "B" "A" "B"
>
Or you can use the car package:
library(car)
recode(x, "1:2='A'; 3:4='B'")
X <- c(1,2,2,4,2,3)
Y <- ifelse(X %in% 1:2, "A", "B")
## or
Y <- cut(X,breaks=c(0,2.5,5),labels=c("A","B"))
The latter approach creates a factor rather than a character vector; you can use as.character to turn it back into a character vector if you want.
Another alternative:
LETTERS[ceiling((1:4)/2)]
[1] "A" "A" "B" "B"
LETTERS[ceiling(X/2)]
[1] "A" "A" "A" "B" "A" "B"
if it's dataframe, dplyr package:
dataframe %>%
mutate (newvar = case_when(var %in% c(1, 2) ~ "A",
case_when(var %in% c(3, 4) ~ "B")) -> dataframe

Resources