Create new vector from row index of two matching columns - r

I have a data frame:
a <- c(1,2,3,4,5,6)
b <- c(1,2,1,2,1,4)
c <- c("A", "B", "C", "D", "E", "F")
df <- data.frame(a,b,c)
What I want to do is create another vector d, which contains the value of c in the row where a matches each value of b.
So my new vector would look like this:
d <- c("A", "B", "A", "B", "A", "D")
As an example, the final value of b is 4, which matches with the 4th row of a, so the value of d is the 4th row of c, which is "D".

If a and b are both vectors of integer values, you can use them directly as indices.
d <- c[b[a]]
d
[1] "A" "B" "A" "B" "A" "D"
If a is a regular integer sequence along c, you can simply index c by b.
c[b]
[1] "A" "B" "A" "B" "A" "D"
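The indexing above relies on a being 1, 2, ..., n in row order. When that is not guaranteed, a minimal sketch with base R's match() first looks up the row of a that holds each value of b. The shuffled values of a below are made up for illustration.

```r
# Hypothetical data: `a` is not in row order, so c[b] alone would be wrong.
a <- c(3, 1, 2, 6, 5, 4)
b <- c(1, 2, 1, 2, 1, 4)
c <- c("A", "B", "C", "D", "E", "F")

# match(b, a) gives, for each value of b, the row where a equals it.
d <- c[match(b, a)]
d
# [1] "B" "C" "B" "C" "B" "F"
```

Values of b that do not occur in a come back as NA.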

Another option is to convert to factor and use it as:
factor(a, labels = c)[b]
#[1] A B A B A D
OR
as.character(factor(a, labels = c)[b])
#[1] "A" "B" "A" "B" "A" "D"
data
a <- c(1,2,3,4,5,6)
b <- c(1,2,1,2,1,4)
c <- c("A", "B", "C", "D", "E", "F")

Related

Finding in which vector does the element belong to

suppose I have 3 vectors:
a = c("A", "B", "C")
b = c("D", "E", "F")
c = c("G", "H", "I")
then I have an element:
element = "E"
I want to find which vector my element belongs to. In this case, vector b.
A more general solution would be appreciated, because my real data set has more than a hundred vectors.
element = "E"
names(our_lists)[sapply(our_lists, `%in%`, x = element)]
# [1] "b"
Data
our_lists <- list(
a = c("A", "B", "C"),
b = c("D", "E", "F"),
c = c("G", "H", "I")
)
Using grep.
element <- "E"
l <- mget(c("a", "b", "c"))
names(l)[grep(element, l)]
# [1] "b"
If you keep the data in individual objects, you need to check for the element in each one individually. Get them in a list.
list_data <- mget(c('a', 'b', 'c'))
names(Filter(any, lapply(list_data, `==`, element)))
#[1] "b"
If all your vectors have the same length then a vectorised idea can be,
c('a', 'b', 'c')[ceiling(which(c(a, b, c) == 'E') / length(a))]
#[1] "b"
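The division by length(a) only works when every vector has the same length. A possible sketch for unequal lengths, assuming the vectors are already collected in a named list (the contents below are invented), repeats each name once per element so the labels line up with the flattened values.

```r
# Hypothetical vectors of different lengths.
our_lists <- list(
  a = c("A", "B", "C", "X"),
  b = c("D", "E"),
  c = c("G", "H", "I")
)
element <- "E"

# One label per element, in the same order as unlist(our_lists).
owner <- rep(names(our_lists), lengths(our_lists))
found <- unique(owner[unlist(our_lists) == element])
found
# [1] "b"
```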
You can use dplyr::lst that creates named list from variable names. Then purrr::keep to keep only the vectors that contain your element.
library(tidyverse)
lst(a, b, c) %>%
  keep(~ element %in% .x) %>%
  names()
output:
[1] "b"

Return all elements of list containing certain strings

I have a list of vectors containing strings and I want R to give me another list with all vectors that contain certain strings. MWE:
list1 <- list("a", c("a", "b"), c("a", "b", "c"))
Now, I want a list that contains all vectors with "a" and "b" in it. Thus, the new list should contain two elements, c("a", "b") and c("a", "b", "c").
As list1[grep("a|b", list1)] gives me a list of all vectors containing either "a" or "b", I expected list1[grep("a&b", list1)] to do what I want, but it did not (it returned a list of length 0).
This should work:
test <- list("a", c("a", "b"), c("a", "b", "c"))
test[sapply(test, function(x) sum(c('a', 'b') %in% x) == 2)]
Try purrr::keep
library(purrr)
keep(list1, ~ all(c("a", "b") %in% .))
We can use Filter
Filter(function(x) all(c('a', 'b') %in% x), test)
#[[1]]
#[1] "a" "b"
#[[2]]
#[1] "a" "b" "c"
A solution with grepl:
> list1[grepl("a", list1) & grepl("b", list1)]
[[1]]
[1] "a" "b"
[[2]]
[1] "a" "b" "c"
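One caveat with grep/grepl on a list: the list is coerced with as.character first, so the pattern is matched against the deparsed text of each vector rather than against its elements. A small sketch of how that can mislead, compared with an element-wise test (the names by_text and by_element are just for illustration):

```r
list1 <- list("a", c("a", "b"), c("a", "b", "c"))

# grepl() sees text such as 'c("a", "b")', not the individual elements.
as_text <- as.character(list1)

# Searching for "c" therefore also hits the "c(" of the second vector.
by_text <- grepl("c", list1)
by_text
# [1] FALSE  TRUE  TRUE

# Element-wise membership test for comparison.
by_element <- vapply(list1, function(x) "c" %in% x, logical(1))
by_element
# [1] FALSE FALSE  TRUE
```

For that reason the %in% based answers are the safer choice when the strings can overlap with each other or with the deparsed syntax.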

Using fct_relevel over a list of variables using map_at

I have a bunch of factor variables that have the same levels, and I want them all reordered similarly using fct_relevel from the forcats package. Many of the variable names start with the same characters ("Q11A" to "Q11X", "Q12A" to "Q12X", "Q13A" to "Q13X", etc.). I wanted to use the starts_with function from dplyr to shorten the task. The following code didn't give me an error, but it didn't do anything either. Is there anything I'm doing wrong?
library(dplyr)
library(purrr)
library(forcats)
library(tibble)
#Setting up dataframe
f1 <- factor(c("a", "b", "c", "d"))
f2 <- factor(c("a", "b", "c", "d"))
f3 <- factor(c("a", "b", "c", "d"))
f4 <- factor(c("a", "b", "c", "d"))
f5 <- factor(c("a", "b", "c", "d"))
df <- tibble(f1, f2, f3, f4, f5)
levels(df$f1)
[1] "a" "b" "c" "d"
#Attempting to move level "c" up before "a" and "b".
df <- map_at(df, starts_with("f"), fct_relevel, "c")
levels(df$f1)
[1] "a" "b" "c" "d" #Didn't work
#If I just re-level for one variable:
fct_relevel(df$f1, "c")
[1] a b c d
Levels: c a b d
#That worked.
I think you're looking for mutate_at, with the selection helper wrapped in vars():
df <- mutate_at(df, vars(starts_with("f")), fct_relevel, "c")
df$f1
[1] a b c d
Levels: c a b d
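If you would rather not depend on dplyr and forcats for this, a rough base R equivalent is to pick the columns by name prefix and apply relevel() to each. This is only a sketch: the column other is added here to show that non-matching columns are left alone.

```r
# Toy data similar to the question, built with base R only.
df <- data.frame(
  f1 = factor(c("a", "b", "c", "d")),
  f2 = factor(c("a", "b", "c", "d")),
  other = 1:4
)

# Pick columns by name prefix, then move level "c" to the front in each.
cols <- startsWith(names(df), "f")
df[cols] <- lapply(df[cols], relevel, ref = "c")

levels(df$f1)
# [1] "c" "a" "b" "d"
```

Unlike fct_relevel, relevel() moves only a single level to the front.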

Getting the set of nodes connected till the main parent node in R

I have a data set with 6 rows and 3 columns. The first column holds the children, and the remaining columns hold the immediate parents of the corresponding child.
In this data, "a" and "b" don't have any parents, whereas "c" has only one parent, "a". "d" has parents "b" and "c", and so on.
What I need is: if given the input as the child, it should give me all the ancestors of that child including child.
e.g. "f" is the child I chose then desired output should be :
{"f", "d", "b"}, {"f", "d", "c", "a"}, {"f", "e", "b"}, {"f", "e", "c", "a"}.
Note: Order of the nodes does not matter.
Thank you so much in advance.
Create sample data. Note use of stringsAsFactors here, I'm assuming your data are characters and not factors:
> d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")),stringsAsFactors=FALSE)
First tidy it up - make the data long, not wide, with each row being a child-parent pair:
> pairs = subset(reshape2::melt(d,id.vars="c",value.name="parent"), !is.na(parent))[,c("c","parent")]
> pairs
c parent
3 c a
4 d b
5 e b
6 f d
10 d c
11 e c
12 f e
Now we can make a graph of the parent-child relationships. This is a directed graph, so plots child-parent as an arrow:
> library(igraph)
> g = graph_from_data_frame(pairs)
> plot(g)
Now I'm not sure exactly what you want, but igraph functions can do anything... So for example, here's a search of the graph starting at d from which we can get various bits of information:
> d_search = bfs(g,"d",neimode="out", unreachable=FALSE, order=TRUE, dist=TRUE)
First, which nodes are ancestors of d? It's the ones that can be reached from d via the exhaustive (here, breadth-first) search:
> d_search$order
+ 6/6 vertices, named:
[1] d c b a <NA> <NA>
Note it includes d as well. Trivial enough to drop from this list. That gives you the set of ancestors of d which is what you asked for.
What is the relationship of those nodes to d?
> d_search$dist
c d e f a b
1 0 NaN NaN 2 1
We see that e and f are unreachable, so are not ancestors of d. c and b are direct parents, and a is a grandparent. You can check this from the graph.
You can also get all the paths from any child upwards using functions like shortest_paths and so on.
Here is a recursive function that makes all possible family lines:
d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"),
                     "p1" = c(NA, NA, "a", "b", "b", "d"),
                     "p2" = c(NA, NA, NA, "c", "c", "e")), stringsAsFactors = FALSE)
# Make data more convenient for the task.
library(reshape2)
dp <- melt(d, id = c("c"), value.name = "p")
# Recursive function builds ancestor vectors.
getAncestors <- function(data, x, ancestors = list(x)) {
  parents <- subset(data, c %in% x & !is.na(p), select = c("c", "p"))
  if (nrow(parents) == 0) {
    return(ancestors)
  }
  x.c <- parents$c
  p.c <- parents$p
  ancestors <- lapply(ancestors, function(x) {
    if (is.null(x)) return(NULL)
    # Here we want to repeat ancestor chain for each new parent.
    res <- list()
    matches <- 0
    for (i in seq_len(nrow(parents))) {
      if (tail(x, 1) == parents[i, ]$c) {
        res[[i]] <- c(x, parents[i, ]$p)
        matches <- matches + 1
      }
    }
    if (matches == 0) { # There are no more parents.
      res[[1]] <- x
    }
    return(res)
  })
  # Remove one level of lists.
  ancestors <- unlist(ancestors, recursive = FALSE)
  res <- getAncestors(data, p.c, ancestors)
  return(res)
}
# Demo of results for the lowest level.
res <- getAncestors(dp, "f")
res
#[[1]]
#[1] "f" "d" "b"
#[[2]]
#[1] "f" "d" "c" "a"
#[[3]]
#[1] "f" "e" "b"
#[[4]]
#[1] "f" "e" "c" "a"
You will need to implement this in a similar way through recursion or with a while loop.
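For the while-loop route, here is one possible sketch in base R. It keeps a list of partial paths and extends each one until it reaches a node with no parents. The helper name ancestor_paths and the child/parent column names are my own, and it assumes the relationships contain no cycles.

```r
# Child-parent pairs, same relationships as the sample data.
pairs <- data.frame(
  child  = c("c", "d", "e", "f", "d", "e", "f"),
  parent = c("a", "b", "b", "d", "c", "c", "e"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every path from `start` up to a parentless node.
ancestor_paths <- function(pairs, start) {
  todo <- list(start)  # partial paths still to be extended
  done <- list()       # paths that reached a node with no parents
  while (length(todo) > 0) {
    path <- todo[[1]]
    todo <- todo[-1]
    parents <- pairs$parent[pairs$child == path[length(path)]]
    if (length(parents) == 0) {
      done[[length(done) + 1]] <- path
    } else {
      # Branch: one extended copy of the path per parent.
      todo <- c(lapply(parents, function(p) c(path, p)), todo)
    }
  }
  done
}

paths <- ancestor_paths(pairs, "f")
paths[[1]]
# [1] "f" "d" "b"
```

Here paths is a list of four character vectors, one per family line.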

Order a numeric vector by length in R

I've got two numeric vectors that I want to order by the length of their observations, i.e., the number of times each observation appears.
For example:
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "e", "e")
Here, b occurs four times, a three times, e two and c one time. I'd like my result in this order.
ans <- c("b", "b", "b", "b", "a", "a", "a", "e", "e", "c")
I've tried this:
x <- x[order(-length(x))] # and some similar lines.
Thanks
Using rle you can get the values and their run lengths. Order the lengths, then use the values to rebuild the vector in the new order. Note that rle counts runs, so this assumes equal values are adjacent (sort the vector first if they are not):
xx <- c('a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'e', 'e')
rr <- rle(xx)
ord <- order(rr$lengths, decreasing = TRUE)
rep(rr$values[ord], rr$lengths[ord])
## [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
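You may also use ave to calculate the lengths. Passing seq_along(x) keeps the counts numeric; ave(x, x, FUN = length) on a character vector returns the counts as strings, which sort incorrectly once a count reaches 10.
x[order(ave(seq_along(x), x, FUN = length), decreasing = TRUE)]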
# [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
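Another base R variant, sketched here with made-up input in which equal values are not adjacent, takes the counts from table() and looks them up by value.

```r
# Hypothetical input: same letters as in the question, but shuffled.
x <- c("b", "a", "e", "b", "c", "a", "b", "e", "a", "b")

# table(x)[x] returns, for every element, how often its value occurs.
counts <- as.integer(table(x)[x])

# Order by descending count.
result <- x[order(-counts)]
result
# [1] "b" "b" "b" "b" "a" "a" "a" "e" "e" "c"
```

If two different values share the same count, their elements may come out interleaved, so add x itself as a second sort key in that case.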
