Returning the values of a list based on "two" parameters - r

Very new to R. So I am wondering if you can use two different parameters to get the position of both elements from a list. See the below example...
x <- c("A", "B", "A", "A", "B", "B", "C", "C", "A", "A", "B")
y <- c(which(x == "A"))
[1] 1 3 4 9 10
x[y]
[1] "A" "A" "A" "A" "A"
x[y+1]
[1] "B" "A" "B" "A" "B"
But I would like to return the positions of both y and y+1 together in the same list. My current solution is to merge the two above lists by row number and create a dataframe from there. I don't really like that and was wondering if there is another way. Thanks!

I don't know exactly what you want, but this could help:
newY = c(which(x == "A"),which(x == "A")+1)
After that you can sort it with
finaldata <- newY[order(newY)]
Or you do both in one step:
finaldata <- c(which(x == "A"),which(x == "A")+1)[order(c(which(x == "A"),which(x == "A")+1))]
Then you could also delete duplicates if you want to. Please tell me if this is what you wanted.
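A shorter variant of the same idea, as a sketch only: it assumes that "together" means a single sorted index vector without duplicates, and that a trailing "A" should not produce an out-of-range index. The names idx and pairs are made up for this illustration.

```r
x <- c("A", "B", "A", "A", "B", "B", "C", "C", "A", "A", "B")
y <- which(x == "A")

# Combine each match with its successor, drop duplicates, and sort
idx <- sort(unique(c(y, y + 1L)))

# Guard against y + 1 running past the end when the last element is "A"
idx <- idx[idx <= length(x)]

x[idx]

# Alternatively, keep each match and the element that follows it side by side
pairs <- data.frame(pos = y, current = x[y], following = x[y + 1L],
                    stringsAsFactors = FALSE)
pairs
```

The data frame version avoids the merge step mentioned in the question, because x[y] and x[y + 1] already line up element by element.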

Related

How to order vectors with priority layout?

Consider the following vector of strings:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see, some strings in this vector share the same start, e.g. "B" and "B_big".
I want to reorder the vector so that all strings with the same start sit next to each other, while the groups keep the order in which their letter first appears ("B" first, "C" second, and so on).
In other words, I want to end up with this vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
How I arrived at this vector: reading from the left, I first see "B", so I look for all other strings that start the same way and put them to the right of "B". Next comes "C", so I look through the remaining strings and put all those starting with "C", e.g. "C_small", next to it, and so on.
I'm not sure how to do this. I suspect gsub can be used here, but I don't know how to combine it with this kind of searching and reordering. Could you please give me a hand?
Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
  if (letter %in% substr(x, 1, 1)) {
    xnew <- c(xnew, x[substr(x, 1, 1) == letter])
  }
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
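The answers above take the first character as the prefix. If a prefix could be longer than one character, a sketch along the following lines might work instead. It assumes the prefix is everything before the first underscore; the names prefix, first_seen and result are made up for this illustration.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Everything before the first underscore (the whole string if there is none)
prefix <- sub("_.*$", "", x)

# Rank each element by the position at which its prefix first appears
first_seen <- match(prefix, unique(prefix))

# order() breaks ties by original position, so elements within a group
# keep their relative order
result <- x[order(first_seen)]
result
```

This is the same idea as the factor approach, with match() against unique() standing in for the factor levels.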
If you are open for non-base alternatives, data.table::chgroup may be used, "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
x[data.table::chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
I suggest splitting the two parts of the text into separate dimensions. Then, define a clear rank order for the descriptive part of the name using a named numeric vector. From there you can reorder the input vector on the fly. Note that this orders the prefixes alphabetically rather than by first appearance. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
  # separate the two parts
  prefix <- sub("_.*$", "", x)
  # inputs without an underscore have no suffix; label them "none"
  suffix <- ifelse(grepl("_", x), sub("^.*_", "", x), "none")
  # map each suffix to a rank ordering
  suffix_order <- c(
    "small" = -1,
    "none" = 0,
    "big" = 1,
    "huge" = 2,
    "tremendous" = 3
  )
  # return input vector,
  # ordered by the prefix and the mapping of suffix to rank
  x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result
[1] "A" "A_big" "A_huge" "B" "B_big" "B_tremendous" "C_small" "C"
[9] "D"

Non duplicate remove subsetting [duplicate]

a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")
I want to subset a with respect to b such that the following result is obtained:
("B" "A" "B")
Traditional subsetting removes all the "A"s and "C"s from a.
It removes the duplicates as well, and I don't want them removed. For example, b has two "A"s and one "C", so subsetting a with respect to b should remove only two "A"s and one "C" from a. All the remaining elements of a should stay, even if they are "A" or "C".
I just want to know if there is a way of doing this in R.
An easy option is to use vsetdiff from package vecsets, i.e.,
vecsets::vsetdiff(a,b)
such that
> vecsets::vsetdiff(a,b)
[1] "B" "A" "B"
Using tibble and dplyr, you can do:
library(tibble)
library(dplyr)
enframe(a) %>%
  transmute(name = value) %>%
  group_by(name) %>%
  mutate(ID = 1:n()) %>%
  left_join(enframe(table(b)), by = c("name" = "name")) %>%
  filter(ID > value | is.na(value)) %>%
  pull(name)
[1] "B" "A" "B"
Here is a way to do this (note that the original order of a is not preserved):
#Count occurrences of `a`
a_count <- table(a)
#Count occurrences of `b`
b_count <- table(b)
#Subtract the count present in b from a
a_count[names(b_count)] <- a_count[names(b_count)] - b_count
#Create a new vector of remaining values
rep(names(a_count), a_count)
#[1] "A" "B" "B"
Or:
a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")
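greedy_delete <- function(x, rmv) {
  for (i in rmv) {
    # position of the first remaining occurrence of i (NA if there is none)
    idx <- match(i, x)
    if (!is.na(idx)) x <- x[-idx]
  }
  x
}
greedy_delete(a, b)
# [1] "B" "A" "B"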
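For longer vectors, a loop-free base R variant may be worth trying. The sketch below assumes the first occurrences are the ones to drop (as in the loop above) and that the order of the remaining elements should be kept. The names occurrence, removable and result are made up for this illustration.

```r
a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")

# For each element of a: is it the 1st, 2nd, 3rd, ... occurrence of its value?
occurrence <- ave(seq_along(a), a, FUN = seq_along)

# For each element of a: how many copies of its value does b ask us to remove?
# Using the values of a as factor levels gives a count of 0 for values absent from b.
b_counts <- table(factor(b, levels = unique(a)))
removable <- as.integer(b_counts[a])

# Keep an element only once its value has been removed often enough
result <- a[occurrence > removable]
result
```

Values that occur in b but not in a are simply ignored here, which matches the behaviour of the guarded loop.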

Group together identical elements in a vector using r

A very simple question but couldn't find an answer.
I have a vector of characters (for example - "a" "a" "a" "c" "c" "c" "b" "b" "b").
I would like to group together the elements to "a" "c" "b".
Is there a specific function for that?
Thank you
You can use the sqldf library with GROUP BY:
require(sqldf)
vector<- data.frame(v=c("a", "a", "a", "c", "c", "c", "b", "b", "b"))
sqldf("SELECT v from vector group by v")
Here you go
vector <- c("a", "a", "a", "c", "c", "c", "b", "b", "b")
sorted <- sort(vector)
If you just want the unique elements then there is, well, unique.
> unique(c("a", "a", "a", "c", "c", "c", "b", "b", "b"))
[1] "a" "c" "b"
Update
With the new description of the problem, this would be my solution
shifted <- c(NA, vector[-length(vector)])
vector[is.na(shifted) | vector != shifted]
I shift the vector one position to the right, putting NA at the front because I have no better idea of what to put there, and then pick out the elements whose shifted counterpart is NA (the first element) or differs from the element itself, i.e. every element that is not equal to its predecessor.
If the vector contains NA, some additional checks will be needed. It is not obvious what to put in the first position of the shifted vector, other than something different from the first element, without knowing a bit more about the data. For example, you could extract all the elements from the vector and pick one that isn't the first, but that would fail if the vector only contains identical elements.
Another question now: is there a smarter way to implement the shift operation? I couldn't think of one, but there might be a more canonical solution.
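One possible answer to the shift question, offered as a sketch rather than as the canonical solution: base R's rle() computes runs of equal adjacent values, and its values component is the collapsed vector. The second variant compares each element with its predecessor directly and always keeps the first one, so no placeholder is needed. The names collapsed and keep are made up for this illustration, and the example assumes a vector without NA.

```r
vector <- c("a", "a", "a", "c", "c", "c", "b", "b", "b")

# Variant 1: run-length encoding; one value per run of identical neighbours
collapsed <- rle(vector)$values
collapsed

# Variant 2: keep the first element, then every element that differs
# from the one before it
keep <- c(TRUE, vector[-1] != vector[-length(vector)])
vector[keep]
```

Unlike unique(), both variants keep a value again if it reappears later in a separate run.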

Return all elements of list containing certain strings

I have a list of vectors containing strings and I want R to give me another list with all vectors that contain certain strings. MWE:
list1 <- list("a", c("a", "b"), c("a", "b", "c"))
Now, I want a list that contains all vectors with "a" and "b" in it. Thus, the new list should contain two elements, c("a", "b") and c("a", "b", "c").
As list1[grep("a|b", list1)] gives me a list of all vectors containing either "a" or "b", I expected list1[grep("a&b", list1)] to do what I want, but it did not (it returned a list of length 0).
This should work:
test <- list("a", c("a", "b"), c("a", "b", "c"))
test[sapply(test, function(x) sum(c('a', 'b') %in% x) == 2)]
Try purrr::keep
library(purrr)
keep(list1, ~ all(c("a", "b") %in% .))
We can use Filter
Filter(function(x) all(c('a', 'b') %in% x), test)
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "a" "b" "c"
A solution with grepl:
> list1[grepl("a", list1) & grepl("b", list1)]
[[1]]
[1] "a" "b"
[[2]]
[1] "a" "b" "c"
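The %in% based answers can be wrapped in a small helper, sketched below. The function name contains_all is made up for this illustration. Exact element matching may be the safer choice here: as far as I can tell, grepl() first coerces each list element to a single string, so a pattern can also match part of a longer string or the deparsed c(...) text.

```r
# Hypothetical helper: keep the list elements that contain every value in `needed`
contains_all <- function(lst, needed) {
  keep <- vapply(lst, function(v) all(needed %in% v), logical(1))
  lst[keep]
}

list1 <- list("a", c("a", "b"), c("a", "b", "c"))
res <- contains_all(list1, c("a", "b"))
res
```

vapply() is used instead of sapply() so that the result is always a logical vector, even for an empty list.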

Order a numeric vector by length in R

I've got two vectors that I want to order by the frequency of their observations, i.e., the number of times each observation appears.
For example:
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "e", "e")
Here, b occurs four times, a three times, e two and c one time. I'd like my result in this order.
ans <- c("b", "b", "b", "b", "a", "a", "a", "e", "e", "c")
I've tried this:
x <- x[order(-length(x))] # and some similar lines.
Thanks
Using rle you can get the lengths of the runs of equal values. Order the lengths, then use the values to recreate the vector in the new order (this assumes equal values are adjacent, as in the example):
xx <- c('a', 'a', 'a', 'b', 'b', 'b','b', 'c', 'e', 'e')
rr <- rle(xx)
ord <- order(rr$lengths,decreasing=TRUE)
rep(rr$values[ord],rr$lengths[ord])
## [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
You may also use ave when calculating the lengths
x[order(ave(seq_along(x), x, FUN = length), decreasing = TRUE)]
# [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
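If equal values are not guaranteed to be adjacent, counting with table() may be more robust than rle(). This is a sketch under the assumption that ties between equally frequent values can be resolved in any consistent way; the names counts and result are made up for this illustration.

```r
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "e", "e")

# Count each distinct value, then sort the counts from most to least frequent
counts <- sort(table(x), decreasing = TRUE)

# Rebuild the vector: each value repeated as often as it occurred
result <- rep(names(counts), as.vector(counts))
result
```

Because the vector is rebuilt from the counts, each value always ends up in one contiguous block, even if it was scattered in the input.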

Resources