Get the unique values of two vectors keeping the order of both original - r

I am trying to get a vector of the unique elements of two vectors that respects the order of both of the original vectors.
The vectors are both sampled from a longer "hidden" vector that only contains unique entries (i.e. no repeats are allowed), which ensures both v1 and v2 have a compatible order (i.e. v1 <- c("Z","A",...) and v2 <- c("A","Z",...) cannot occur).
The order is arbitrary, so I cannot use any simple order() or sort().
An example below:
v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
Result desired:
c("Z", "A", "T", "F", "Q", "D")
Further explanation: v1 establishes the relationship
"Z" < "A" < "F" < "D"
and v2 states
"A" < "T" < "F" < "Q" < "D"
so the sequence that satisfies v1 and v2 is
"Z" < "A" < "T" < "F" < "Q" < "D"
I understand this case is fully determined (the two vectors do completely define the order of all elements), but there would be cases when this is not enough. In that case, any permutation that respects the two sets of ordering would be a satisfactory solution.
Any tips will be appreciated.

You can take the unique values of v1 and v2 and re-sort them using match on v1 and v2, repeating this until nothing changes.
x <- unique(c(v1, v2))
repeat {
y <- x
i <- match(v2, x)
x[sort(i)] <- x[i]
i <- match(v1, x)
x[sort(i)] <- x[i]
if(identical(x, y)) break;
}
x
#[1] "Z" "A" "T" "F" "Q" "D"
Alternatively, you can get the overlapping letters of v1 and v2 and then join the subsets of v1 and v2 to these anchor points:
i <- v2[na.omit(match(v1, v2))]
j <- c(0, match(i, v2))
i <- c(0, match(i, v1))
unique(c(unlist(lapply(seq_along(i)[-1], function(k) {
c(v1[head((i[k-1]:i[k]), -1)], v2[head((j[k-1]:j[k])[-1], -1)])
})), v1, v2))
#[1] "Z" "A" "T" "F" "Q" "D"

For this example the following code works. One first has to define auxiliary vectors w1 and w2, depending on which vector starts with the first common element, and another vector w to which the missing elements are appended in order.
It would be clearer with a for loop, which would avoid this cumbersome code, but this version is shorter.
w <- w1 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v2), list(v1)))
w2 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v1), list(v2)))
unique(lapply(setdiff(w2,w1), function(elmt) w <<- append(w, elmt, after = match(w2[match(elmt,w2)-1],w)))[[length(setdiff(w2,w1))]])
[1] "Z" "A" "T" "F" "Q" "D"
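Another way to think about the problem, offered as a sketch rather than as any of the methods above: each vector says that every element must come before its successor, and the desired result is any ordering consistent with all of those constraints (a topological sort). The helper name merge_ordered is made up for this illustration, and the loop assumes neither vector contains duplicates.

```r
# Hypothetical helper: merge two vectors whose shared elements
# appear in the same relative order (assumes no duplicates within a vector).
merge_ordered <- function(v1, v2) {
  nodes <- unique(c(v1, v2))
  # Each consecutive pair in v1 or v2 is a constraint "from precedes to".
  edges <- rbind(cbind(head(v1, -1), tail(v1, -1)),
                 cbind(head(v2, -1), tail(v2, -1)))
  result <- character(0)
  while (length(nodes) > 0) {
    # Elements with no remaining predecessor can be placed next.
    free <- nodes[!nodes %in% edges[, 2]]
    if (length(free) == 0) stop("v1 and v2 have incompatible orders")
    nxt <- free[1]
    result <- c(result, nxt)
    nodes <- setdiff(nodes, nxt)
    # Drop the constraints that the placed element has now satisfied.
    edges <- edges[edges[, 1] != nxt, , drop = FALSE]
  }
  result
}

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
merged <- merge_ordered(v1, v2)
merged
#[1] "Z" "A" "T" "F" "Q" "D"
```

When the two vectors do not fully determine the order, this sketch picks whichever free element appears first in unique(c(v1, v2)), which is one of the permitted permutations.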

Related

How to order vectors with priority layout?

Let's consider the following vector of strings:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see there are certain strings in this vector starting the same e.g. "B", "B_big".
What I want to end up with is a vector ordered so that all strings with the same start are next to each other, while the order of the letters stays the same ("B" should come first, "C" second, and so on).
In simple words, I want to end up with this vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
How I arrived at this vector: reading from the left, I see "B", so I look for all other strings that start the same way and put them to the right of "B". Next is "C", so I look through the remaining strings and put all those starting with "C" (e.g. "C_small") to the right, and so on.
I'm not sure how to do it. I'm almost sure that the gsub function can be used to get this result, but I'm not sure how to combine it with this searching and reordering. Could you please give me a hand?
Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
if (letter %in% substr(x, 1, 1)) {
xnew <- c(xnew, x[substr(x, 1, 1) == letter])
}
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
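To spell out why the factor trick works, here is a small sketch. It assumes (this is not stated in the question) that prefixes may be longer than one character, so it takes everything before the first underscore instead of using substr. The variable names prefix, grp and res are made up for the illustration.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Everything before the first underscore; strings without "_" stay unchanged.
prefix <- sub("_.*$", "", x)

# Levels in order of first appearance, so the integer codes are B=1, C=2, A=3, D=4.
grp <- factor(prefix, levels = unique(prefix))

# order() sorts by those codes and keeps ties in their original order.
res <- x[order(grp)]
res
```

Because ties are kept in input order, "C_small" still comes before "C", as in the desired output.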
If you are open for non-base alternatives, data.table::chgroup may be used, "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
library(data.table)
x[chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
I suggest splitting the two parts of the text into separate dimensions. Then, define a clear rank order for the descriptive part of the name using a named character vector. From there you can reorder the input vector on the fly. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
# separate the two parts
prefix <- sub("_.*$", "", x)
# identify inputs with no suffix (no underscore) and label them "none"
suffix <- ifelse(grepl("_", x, fixed = TRUE), sub("^.*_", "", x), "none")
# map each suffix to a rank ordering
suffix_order <- c(
"small" = -1,
"none" = 0,
"big" = 1,
"huge" = 2,
"tremendous" = 3
)
# return input vector,
# ordered by the prefix (alphabetically) and the mapping of suffix to rank
x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result
[1] "A" "A_big" "A_huge" "B" "B_big" "B_tremendous" "C_small" "C"
[9] "D"

Finding specific elements in lists

I am stuck at one of the challenges proposed in a tutorial I am reading.
# Using the following code:
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
# challenge_list
# Extract the following things:
#
# - The word "gamma"
# - The letters "a", "e", "i", "o", and "u"
# - The numbers less than or equal to 3
I have tried using the followings:
## 1
challenge_list$"gamma"
## 2
challenge_list [[1]["gamma"]]
But nothing works.
> challenge_list$words[challenge_list$words == "gamma"]
[1] "gamma"
> challenge_list$letter[challenge_list$letter %in% c("a","e","i","o","u")]
[1] "a" "e" "i" "o" "u"
> challenge_list$numbers[challenge_list$numbers<=3]
[1] 1 2 3
We can write a function that subsets differently depending on whether the element is numeric, then use Map to pair each list element with its corresponding filter value and apply f1. This returns a new list with the filtered values.
f1 <- function(x, y) if(is.numeric(x)) x[ x <= y] else x [x %in% y]
out <- Map(f1, challenge_list, list('gamma', 3, c("a","e","i","o","u")))
out
Output:
#$words
#[1] "gamma"
#$numbers
#[1] 1 2 3
#$letter
#[1] "a" "e" "i" "o" "u"
Try this. Most R objects can be filtered using brackets. For lists you combine two kinds, as in [[ ]][ ]: the double brackets select the object inside the list and the single brackets select elements within that object. For plain vectors a single pair of brackets with a condition is enough. Here is the code:
#Data
challenge_list <- list(words = c("alpha", "beta", "gamma"),
numbers = 1:10,
letter = letters)
#Code
challenge_list[[1]][3]
challenge_list$letter[challenge_list$letter %in% c("a", "e", "i", "o","u")]
challenge_list$numbers[challenge_list$numbers<=3]
As I have noticed your data is in a list, you can also play with the position of the elements like this:
#Data 2
challenge_list <- list(words = c("alpha", "beta", "gamma"),numbers = 1:10,letter = letters)
#Code 2
challenge_list[[1]][3]
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
challenge_list[[2]][challenge_list[[2]]<=3]
Output:
challenge_list[[1]][3]
[1] "gamma"
challenge_list[[3]][challenge_list[[3]] %in% c("a", "e", "i", "o","u")]
[1] "a" "e" "i" "o" "u"
challenge_list[[2]][challenge_list[[2]]<=3]
[1] 1 2 3
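A short sketch of the difference between single and double brackets on a list may help here, since it is what the asker's attempts ran into. The variable names sub_list, vec and gamma_only are made up for the illustration.

```r
challenge_list <- list(words = c("alpha", "beta", "gamma"),
                       numbers = 1:10,
                       letter = letters)

# Single brackets keep the list wrapper: this is a list of length 1.
sub_list <- challenge_list["words"]

# Double brackets (or $) return the element itself: a character vector.
vec <- challenge_list[["words"]]

# Only the vector can be filtered by value with a logical condition.
gamma_only <- vec[vec == "gamma"]
gamma_only
#[1] "gamma"
```

Selecting by name ("words") rather than by position (1) keeps the code working if the order of the list elements changes.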

Trouble evaluating combinations from combn using purrr

I am trying to use combn to divide a group of n = 20 different units into 3 groups of unequal size -- 4, 6 and 10. Then I am trying to validate for values that must be together within a group -- if one element from the pair exists in the group then the other should also be in the group. If one is not in the group then neither should be in the group. In this fashion, I'd like to evaluate the groups in order to find all possible valid solutions where the rules are true.
library(purrr)
x <- letters[1:20]
same_group <- list(
c("a", "c"),
c("d", "f"),
c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = FALSE)
validate_combinations <- function(x) all(c("a", "c") %in% x) | !any(c("a", "c") %in% x)
valid_combinations <- keep(combinations_list, validate_combinations)
In this way I'd like to combine -> reduce each group until I have a list of all valid combinations. I'm not sure how to combine combinations_list, validate_combinations, and the same_group to check all same_group "rules" against the combinations in the table. The furthest I can get is to check against one combination c("a", "c"), which when run against keep(combinations_list, validate_combinations) is indeed giving me the output I want.
I think once I can do this, I can then use the unpicked values in another combn function for the group of 6 and the group of 10.
We can change the function to accept variable group
validate_combinations <- function(x, group) all(group %in% x) | !any(group %in% x)
then for each group subset the combinations_list which satisfy validate_combinations
lapply(same_group, function(x) combinations_list[
sapply(combinations_list, function(y) validate_combinations(y, x))])
#[[1]]
#[[1]][[1]]
#[1] "a" "b" "c" "d"
#[[1]][[2]]
#[1] "a" "b" "c" "e"
#[[1]][[3]]
#[1] "a" "b" "c" "f"
#[[1]][[4]]
#[1] "a" "b" "c" "g"
#[[1]][[5]]
#[1] "a" "b" "c" "h"
#[[1]][[6]]
#[1] "a" "b" "c" "i"
#[[1]][[7]]
#[1] "a" "b" "c" "j"
#[[1]][[8]]
#[1] "a" "b" "c" "k"
#......
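The result above is one list of combinations per rule. If the goal is the combinations that satisfy every rule at once, a possible sketch in base R follows; the helper name satisfies_all is made up, and Filter is used here in place of purrr::keep only to keep the example dependency-free.

```r
x <- letters[1:20]
same_group <- list(c("a", "c"), c("d", "f"), c("b", "k", "r"))
combinations_list <- combn(x, 4, simplify = FALSE)

# TRUE when the group is either fully inside the combination or fully outside it.
validate_combinations <- function(x, group) all(group %in% x) || !any(group %in% x)

# Hypothetical helper: a combination is valid only if every rule holds.
satisfies_all <- function(combo) {
  all(vapply(same_group,
             function(g) validate_combinations(combo, g),
             logical(1)))
}

valid_combinations <- Filter(satisfies_all, combinations_list)
length(valid_combinations)
```

The same predicate could then be reused on the unpicked values for the groups of 6 and 10.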

name character vectors with same name of list

I have a list that looks like this.
my_list <- list(Y = c("p", "q"), K = c("s", "t", "u"))
I want to name each list element (the character vectors) with the name of the list entry it is in. All elements of the same vector must have the same name.
I was able to write this function that works on a single list element
name_vector <- function(x){
names(x[[1]]) <- rep(names(x[1]), length(x[[1]]))
return(x)
}
> name_vector(my_list[1])
$Y
Y Y
"p" "q"
But can't find a way to vectorize it. If I run it with an apply function it just returns the list unchanged
> lapply(my_list, name_vector)
$Y
[1] "p" "q"
$K
[1] "s" "t" "u"
My desired output for my_list is a named vector
Y Y K K K
"p" "q" "s" "t" "u"
We unlist the list while setting the names by replicating
setNames(unlist(my_list), rep(names(my_list), lengths(my_list)))
Or stack into a two column data.frame, extract the 'values' column and name it with 'ind'
with(stack(my_list), setNames(values, ind))
If your names don't end with digits (unlist appends an index to each name, which the regex below strips off):
vec <- unlist(my_list)
names(vec) <- sub("\\d+$","",names(vec))
vec
# Y Y K K K
# "p" "q" "s" "t" "u"
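As for why the lapply attempt returned the list unchanged: lapply hands the function each vector without its name, so names(x[1]) is NULL inside name_vector. A sketch that passes the name along explicitly with Map is below; the variable names named_list and result are made up for the illustration.

```r
my_list <- list(Y = c("p", "q"), K = c("s", "t", "u"))

# Map walks over the vectors and their names in parallel,
# so each vector can be labelled with its own list name.
named_list <- Map(function(v, nm) setNames(v, rep(nm, length(v))),
                  my_list, names(my_list))

# Dropping the outer names first stops unlist from building "Y.Y"-style names.
result <- unlist(unname(named_list))
result
```

named_list keeps the list structure with named vectors inside, in case the flattened vector is not the only thing needed.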

Count number of elements meeting criteria in columns with NA values

I've got a matrix with "A", "B" and NA values, and I would like to count the number of "A" or "B" or NA values in every column.
sum(mydata[ , i] == "A")
and
sum(mydata[ , i] == "B")
worked fine for columns without NA. For columns that contain NA I can count the number of NAs with sum(is.na(mydata[ , i])). In these columns sum(mydata[ , i] == "A") returns NA as a result instead of a number.
How can I count the number of "A" values in columns which contain NA values?
Thanks for your help!
Example:
> mydata
V1 V2 V3 V4
V2 "A" "A" "A" "A"
V3 "A" "A" "A" "A"
V4 "B" "B" NA NA
V5 "A" "A" "A" "A"
V6 "B" "A" "A" "A"
V7 "B" "A" "A" "A"
V8 "A" "A" "A" "A"
sum(mydata[ , 2] == "A")
# [1] 6
sum(mydata[ , 3] == "A")
# [1] NA
sum(is.na(mydata[ , 3]))
# [1] 1
The function sum (like many other math functions in R) takes an argument na.rm. If you set na.rm=TRUE, R removes all NA values before doing the calculation.
Try:
sum(mydata[,3]=="A", na.rm=TRUE)
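A tiny sketch of what happens under the hood, using a made-up vector col rather than the asker's matrix: comparing NA with anything yields NA, and a single NA in the logical vector makes the whole sum NA unless it is removed.

```r
col <- c("A", "A", NA, "A")

# The comparison itself produces NA for the missing entry.
is_a <- col == "A"
is_a
#[1] TRUE TRUE   NA TRUE

# Without na.rm the NA propagates into the sum.
total_with_na <- sum(is_a)
total_with_na
#[1] NA

# With na.rm = TRUE the NA is dropped before summing.
total_without_na <- sum(is_a, na.rm = TRUE)
total_without_na
#[1] 3
```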
Not sure if this is what you are after. I'm new to R too, so check that this works.
The difference between the total number of rows and the count returned below tells you the number of NA items in each column.
colSums(!is.na(mydata))
To expand on the answer from @Andrie,
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
rep(c("B", "A", "A", "A"), 2), rep("A", 4)), ncol = 4, byrow = TRUE)
myFun <- function(x) {
data.frame(n.A = sum(x == "A", na.rm = TRUE), n.B = sum(x == "B",
na.rm = TRUE), n.NA = sum(is.na(x)))
}
apply(mydata, 2, myFun)
Another possibility is to convert the column to a factor and then use the function summary. Example:
vec<-c("A","B","A",NA)
summary(as.factor(vec))
A quick way to do this is to get summary stats for the variable:
summary(mydata$my_variable) or table(mydata$my_variable, useNA = "ifany")
This will give you the number of missing values.
Hope this helps
You can use table to count all your values at once.
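A possible sketch of the table approach, applied per column to the example matrix from the question. The helper name count_col is made up; fixing the factor levels and using useNA = "always" are choices made here so that every column reports the same three counts, even when a value does not occur.

```r
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Hypothetical helper: counts of "A", "B" and NA for one column.
count_col <- function(col) {
  tab <- table(factor(col, levels = c("A", "B")), useNA = "always")
  setNames(as.vector(tab), c("A", "B", "NA"))
}

# One column of counts per column of mydata (rows: A, B, NA).
counts <- apply(mydata, 2, count_col)
counts
```

For a single column, table(mydata[, 3], useNA = "ifany") is enough; the helper is only there to keep the shape identical across columns.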
