I have a vector x that I would like to sort based on the order of values in vector y. The two vectors are not the same length.
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
The expected result would be:
[1] 4 4 4 2 2 1 3 3 3
What about this one?
x[order(match(x,y))]
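To see why this works, it may help to look at the intermediate values. The following is a small sketch using the x and y from the question; the names pos, idx and result are only illustrative.

```r
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)

# Position of each element of x within y
pos <- match(x, y)
pos
# [1] 2 2 4 1 3 1 1 4 4

# Indices of x sorted by those positions (ties keep their original order)
idx <- order(pos)

result <- x[idx]
result
# [1] 4 4 4 2 2 1 3 3 3
```

Values of x that do not occur in y get NA from match, and order places them last by default.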
You could convert x into an ordered factor:
x.factor <- factor(x, levels = y, ordered = TRUE)
sort(x)         # ordinary numeric sort, for comparison
sort(x.factor)  # sorted by the level order given in y
Obviously, changing your numbers into factors can radically change the way code downstream reacts to x. But since you didn't give us any context about what happens next, I thought I would suggest this as an option.
How about:
rep(y,table(x)[as.character(y)])
(Ian's is probably still better)
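One caveat with the table approach: if y contains a value that never occurs in x, the lookup by name has no count to return and the rep call does not succeed. A possible variant, assuming such values should simply be dropped, is to count over a factor whose levels are y. The extra value 7 below is added purely for illustration.

```r
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3, 7)  # 7 does not occur in x (added for illustration)

# Counting over a factor with levels = y gives a zero count for 7
counts <- as.vector(table(factor(x, levels = y)))
counts
# [1] 3 2 1 3 0

result <- rep(y, times = counts)
result
# [1] 4 4 4 2 2 1 3 3 3
```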
If you need to order by "y" regardless of whether the values are numbers or characters:
x[order(ordered(x, levels = y))]
[1] 4 4 4 2 2 1 3 3 3
By steps:
a <- ordered(x, levels = y) # Create an ordered factor from "x" whose levels follow the order of "y".
[1] 2 2 3 4 1 4 4 3 3
Levels: 4 < 2 < 1 < 3
b <- order(a) # Indices that put "x" into the order given by "y".
[1] 4 6 7 1 2 5 3 8 9
x[b] # Reorder "x" according to order in "y".
[1] 4 4 4 2 2 1 3 3 3
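The same call works unchanged on character data. A minimal sketch with made-up letters in place of the numbers from the question:

```r
# Hypothetical character input, mirroring the numeric example
x <- c("b", "b", "c", "d", "a", "d", "d", "c", "c")
y <- c("d", "b", "a", "c")

result <- x[order(ordered(x, levels = y))]
result
# [1] "d" "d" "d" "b" "b" "a" "c" "c" "c"
```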
[Edit: Clearly Ian has the right approach, but I will leave this in for posterity.]
You can do this without loops by indexing on your y vector. Add an incrementing numeric value to y and merge them:
y <- data.frame(index = seq_along(y), x = y)
x <- data.frame(x=x)
x <- merge(x,y)
x <- x[order(x$index),"x"]
x
[1] 4 4 4 2 2 1 3 3 3
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
z <- c()  # start with an empty result so that c(z, ...) works on the first pass
for (i in y) { z <- c(z, rep(i, sum(x == i))) }
The result in z: 4 4 4 2 2 1 3 3 3
The important steps:
for(i in y) -- Loops over the elements of interest.
z <- c(z, ...) -- Concatenates each subexpression in turn
rep(i, sum(x==i)) -- Repeats i (the current element of interest) sum(x==i) times (the number of times we found i in x).
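The same count-and-repeat idea can also be written without growing z inside a loop. This is only a sketch of an alternative; the name counts is introduced here for illustration.

```r
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)

# For each value in y, count how often it occurs in x
counts <- vapply(y, function(i) sum(x == i), integer(1))

# Repeat each value of y that many times
z <- rep(y, times = counts)
z
# [1] 4 4 4 2 2 1 3 3 3
```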
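You can also use sqldf and do it with a SQL join, as follows. Note that SQL does not guarantee row order without an ORDER BY clause, so this relies on how the join happens to be executed: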
library(sqldf)
x <- data.frame(x = c(2, 2, 3, 4, 1, 4, 4, 3, 3))
y <- data.frame(y = c(4, 2, 1, 3))
result <- sqldf("SELECT x.x FROM y JOIN x on y.y = x.x")
ordered_x <- result[[1]]
I have a vector x = [5, 5, 3, 2, 2]. The rank of an element is its position in the descending list of unique values. I would like to return a vector containing the rank of each element, i.e. [1, 1, 2, 3, 3]. Unfortunately, the function order does not do the job.
x <- c(5, 5, 3, 2, 2)
order(x)
and the result is
[1] 4 5 3 1 2
Could you please elaborate on how to do so?
1) factor Convert to a factor having the indicated levels and then convert to numeric to get the level numbers:
as.numeric(factor(x, levels = unique(x)))
## [1] 1 1 2 3 3
2) match Another possibility is to use match:
match(x, unique(x))
## [1] 1 1 2 3 3
3) findInterval findInterval requires non-descending numbers in the second argument so we negate x.
findInterval(-x, unique(-x))
## [1] 1 1 2 3 3
4) diff/cumsum
cumsum(c(TRUE, diff(x) != 0))
## [1] 1 1 2 3 3
5) rle
r <- rle(x)
r$values <- seq_along(r$values)
inverse.rle(r)
## [1] 1 1 2 3 3
Note
The input in R syntax is:
x <- c(5, 5, 3, 2, 2)
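The approaches above give the requested ranks here because x is already in descending order: unique(x) numbers values by first appearance, and diff/rle number consecutive runs. If x may be unsorted, one possible sketch is to sort the unique values explicitly. The input below is hypothetical.

```r
x <- c(3, 5, 2, 5, 2)  # hypothetical unsorted input

# Unique values in descending order, then look up each element's position
ranks <- match(x, sort(unique(x), decreasing = TRUE))
ranks
# [1] 2 1 3 1 3
```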
I have tree data serialized as follows:
Relationship: P to C is "one-to-many", and C to P is "one-to-one". So column P may have duplicate values, but column C has unique values.
P, C
1, 2
1, 3
3, 4
2, 5
4, 6
# in data.frame
df <- data.frame(P=c(1,1,3,2,4), C=c(2,3,4,5,6))
1. How do I efficiently implement a function func so that:
func(df, val) returns a vector with the full path from the root (1 in this case) to val.
For example:
func(df, 3) returns c(1,3)
func(df, 5) returns c(1,2,5)
func(df, 6) returns c(1,3,4,6)
2. Alternatively, quickly transforming df to a lookup table like this also works for me:
C, Paths
2, c(1,2)
3, c(1,3)
4, c(1,3,4)
5, c(1,2,5)
6, c(1,3,4,6)
Here is a solution using igraph:
library(igraph)
g <- graph_from_data_frame(df)
df <- within(df,
  Path <- sapply(match(as.character(C), names(V(g))),
    function(k) toString(names(unlist(all_simple_paths(g, 1, k))))))
such that
> df
  P C       Path
1 1 2       1, 2
2 1 3       1, 3
3 3 4    1, 3, 4
4 2 5    1, 2, 5
5 4 6 1, 3, 4, 6
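For comparison, the func from the question can also be sketched in base R by walking up the parent column. This is a hypothetical helper, not part of igraph, and it assumes the data really form a tree (each child has a single parent and there are no cycles).

```r
df <- data.frame(P = c(1, 1, 3, 2, 4), C = c(2, 3, 4, 5, 6))

# Hypothetical helper: walk from val up to the root via the parent column
func <- function(df, val) {
  path <- val
  repeat {
    parent <- df$P[match(path[1], df$C)]
    if (is.na(parent)) break  # no parent found, so path[1] is the root
    path <- c(parent, path)
  }
  path
}

func(df, 3)
# [1] 1 3
func(df, 6)
# [1] 1 3 4 6

# Lookup table: one path per child, named by the child value
paths <- setNames(lapply(df$C, func, df = df), df$C)
```

Each call does one match per level of the tree, so very deep or very large trees may be better served by a graph library.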
Given two sorted vectors, how can you get, for each value in one, the index of the closest value in the other?
For example, given:
a = 1:20
b = seq(from=1, to=20, by=5)
how can I efficiently get the vector
c = (1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
which, for each value in a, provides the index of the largest value in b that is less than or equal to it. But the solution needs to work for unpredictable (though sorted) contents of a and b, and needs to be fast when a and b are large.
You can use findInterval, which constructs a sequence of intervals given by breakpoints in b and returns the interval indices in which the elements of a are located (see also ?findInterval for additional arguments, such as behavior at interval boundaries).
a = 1:20
b = seq(from = 1, to = 20, by = 5)
findInterval(a, b)
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
We can also use cut (the b - 1 shift assumes the values in a are integers):
as.integer(cut(a, breaks = unique(c(b-1, Inf)), labels = seq_along(b)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
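If a can contain non-integer values, a possible variant is to use left-closed intervals directly instead of shifting the breaks. The sketch below uses made-up values for a and compares the result with findInterval.

```r
a <- c(1, 2.5, 6, 10.9, 11, 19.5)    # hypothetical non-integer input
b <- seq(from = 1, to = 20, by = 5)  # 1 6 11 16

# Left-closed intervals [b[i], b[i+1]); labels = FALSE returns integer codes
idx_cut <- cut(a, breaks = c(b, Inf), right = FALSE, labels = FALSE)
idx_fi  <- findInterval(a, b)

idx_cut
# [1] 1 1 2 2 3 4
```

The two differ for values below b[1]: cut returns NA there, while findInterval returns 0.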
The following works as expected:
m <- matrix(c(1, 2, 3,
              1, 2, 4,
              2, 1, 4,
              2, 1, 4,
              2, 3, 4,
              2, 3, 6,
              3, 2, 3,
              3, 2, 2), byrow = TRUE, ncol = 3)
df <- data.frame(m)
aggdf <- aggregate(df$X3, list(df$X1, df$X2), FUN=sum)
colnames(aggdf) <- c("A", "B", "value")
and results in:
  A B value
1 2 1     8
2 1 2     7
3 3 2     5
4 2 3    10
But I would like to treat rows 1/2 and 3/4 as equal, not caring whether observation A is 1 and B is 2 or vice versa.
I also do not care about how the aggregation is sorting A/B in the final data.frame, so both of the following results would be fine:
  A B value
1 2 1    15
2 3 2    15

  A B value
1 1 2    15
2 2 3    15
How can that be achieved?
You need to get them in a consistent order. For just 2 columns, pmin and pmax work nicely:
df$A = with(df, pmin(X1, X2))
df$B = with(df, pmax(X1, X2))
aggregate(df$X3, df[c("A", "B")], FUN = sum)
# A B x
# 1 1 2 15
# 2 2 3 15
For more columns, use sort, as akrun recommends:
df[1:2] <- t(apply(df[1:2], 1, sort))
By changing 1:2 to cover all the key columns, this generalizes easily.
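Put together with the data from the question, the sort-based approach might look like the sketch below. The name key_cols is an assumption introduced here to mark which columns form the key.

```r
m <- matrix(c(1, 2, 3,
              1, 2, 4,
              2, 1, 4,
              2, 1, 4,
              2, 3, 4,
              2, 3, 6,
              3, 2, 3,
              3, 2, 2), byrow = TRUE, ncol = 3)
df <- data.frame(m)

key_cols <- 1:2  # assumed positions of the key columns

# Sort the key values within each row so (1, 2) and (2, 1) become identical
df[key_cols] <- t(apply(df[key_cols], 1, sort))

aggdf <- aggregate(df$X3, list(A = df$X1, B = df$X2), FUN = sum)
colnames(aggdf)[3] <- "value"
aggdf
#   A B value
# 1 1 2    15
# 2 2 3    15
```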
I have these two vectors:
x=c('a','c','b','b','c','a','d','d')
y=c(1, 4, 2, 4, 5, 9, 3, 3)
I want to order x based on the values of y such that the groups in x are ordered by their minimum in y. Moreover, within each group (a, b, c, d), I want the elements ordered by ascending values of y.
E.g., the result of this ordering per group is:
x |a a b b d d c c
y |1 9 2 4 3 3 4 5
Hence the output must be:
output = c(1, 7, 3, 4, 8, 2, 5, 6)
I tried to use ave but can't combine both:
> ave(y, x, FUN=function(u) rank(u, ties.method='first'))
[1] 1 1 1 2 2 2 1 2
> ave(y, x, FUN=min)
[1] 1 4 2 2 4 1 3 3
You are trying to order first by the grouped y minimum and then by the y value itself, so you should pass these as the first and second arguments to the order function:
ordering <- order(ave(y, x, FUN=min), y)
x[ordering]
# [1] "a" "a" "b" "b" "d" "d" "c" "c"
y[ordering]
# [1] 1 9 2 4 3 3 4 5
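The output vector in the question is the position of each original element in the sorted result, which is the inverse of the permutation returned by order. A short sketch of how that could be obtained from ordering:

```r
x <- c('a', 'c', 'b', 'b', 'c', 'a', 'd', 'd')
y <- c(1, 4, 2, 4, 5, 9, 3, 3)

ordering <- order(ave(y, x, FUN = min), y)
ordering
# [1] 1 6 3 4 7 8 2 5

# Inverting the permutation gives each element's position in the sorted result
output <- order(ordering)
output
# [1] 1 7 3 4 8 2 5 6
```

If two groups could share the same minimum, adding x as a middle key, as in order(ave(y, x, FUN = min), x, y), would likely be needed to keep each group together.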