R find order of a vector

I have these two vectors:
x=c('a','c','b','b','c','a','d','d')
y=c(1, 4, 2, 4, 5, 9, 3, 3)
I want the order of x based on the values of y, such that the groups in x are ordered by their minimum in y. Moreover, within each group (a, b, c, d), I want the elements ordered by ascending values of y.
E.g., the result of this ordering per group is:
x |a a b b d d c c
y |1 9 2 4 3 3 4 5
Hence the output must be:
output = c(1, 7, 3, 4, 8, 2, 5, 6)
I tried to use ave but can't combine both:
> ave(y, x, FUN=function(u) rank(u, ties.method='first'))
[1] 1 1 1 2 2 2 1 2
> ave(y, x, FUN=min)
[1] 1 4 2 2 4 1 3 3

You are trying to order first by the grouped y minimum and then by the y value itself, so you should pass these as the first and second arguments to the order function:
ordering <- order(ave(y, x, FUN=min), y)
x[ordering]
# [1] "a" "a" "b" "b" "d" "d" "c" "c"
y[ordering]
# [1] 1 9 2 4 3 3 4 5
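The output vector requested in the question, c(1, 7, 3, 4, 8, 2, 5, 6), is the position each element takes after sorting, i.e. the inverse of the permutation returned by order. A minimal sketch, reusing x and y from the question (the name `position` is only illustrative):

```r
x <- c('a', 'c', 'b', 'b', 'c', 'a', 'd', 'd')
y <- c(1, 4, 2, 4, 5, 9, 3, 3)
ordering <- order(ave(y, x, FUN = min), y)
# Inverting the permutation gives, for each element, its place in the sorted result
position <- order(ordering)
position
# [1] 1 7 3 4 8 2 5 6
```

Applying order to a permutation returns its inverse, which is why this works here.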

Related

How to find orders of elements in a vector in which duplicate elements have the same order?

I have a vector x = [5, 5, 3, 2, 2]. The rank of an element is its position in the descending list of unique values. I would like to return a vector containing the rank of each element, i.e. [1, 1, 2, 3, 3]. Unfortunately, the function order does not do the job.
x <- c(5, 5, 3, 2, 2)
order(x)
and the result is
[1] 4 5 3 1 2
Could you please elaborate on how to do so?
1) factor Convert to a factor having the indicated levels and then convert to numeric to get the level numbers:
as.numeric(factor(x, levels = unique(x)))
## [1] 1 1 2 3 3
2) match Another possibility is to use match:
match(x, unique(x))
## [1] 1 1 2 3 3
3) findInterval findInterval requires non-descending numbers in the second argument so we negate x.
findInterval(-x, unique(-x))
## [1] 1 1 2 3 3
4) diff/cumsum
cumsum(c(TRUE, diff(x) != 0))
## [1] 1 1 2 3 3
5) rle
r <- rle(x)
r$values <- seq_along(r$values)
inverse.rle(r)
## [1] 1 1 2 3 3
Note
The input in R syntax is:
x <- c(5, 5, 3, 2, 2)
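The approaches above rely on x already being sorted in descending order. A hedged sketch for input in arbitrary order, assuming the same definition of rank (position in the descending list of unique values); the vector `z` and the name `ranks` are made up for illustration:

```r
z <- c(2, 5, 3, 5, 2)
# Look up each value in the descending list of unique values
ranks <- match(z, sort(unique(z), decreasing = TRUE))
ranks
# [1] 3 1 2 1 3
```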

Aggregate rows in data.frame containing same values over different columns [duplicate]

The following works as expected:
m <- matrix (c(1, 2, 3,
1, 2, 4,
2, 1, 4,
2, 1, 4,
2, 3, 4,
2, 3, 6,
3, 2, 3,
3, 2, 2), byrow=TRUE, ncol=3)
df <- data.frame(m)
aggdf <- aggregate(df$X3, list(df$X1, df$X2), FUN=sum)
colnames(aggdf) <- c("A", "B", "value")
and results in:
A B value
1 2 1 8
2 1 2 7
3 3 2 5
4 2 3 10
But I would like to treat rows 1/2 and 3/4 as equal, not caring whether observation A is 1 and B is 2 or vice versa.
I also do not care about how the aggregation is sorting A/B in the final data.frame, so both of the following results would be fine:
A B value
1 2 1 15
2 3 2 15
A B value
1 1 2 15
2 2 3 15
How can that be achieved?
You need to get them in a consistent order. For just 2 columns, pmin and pmax work nicely:
df$A = with(df, pmin(X1, X2))
df$B = with(df, pmax(X1, X2))
aggregate(df$X3, df[c("A", "B")], FUN = sum)
# A B x
# 1 1 2 15
# 2 2 3 15
For more columns, use sort, as akrun recommends:
df[1:2] <- t(apply(df[1:2], 1, sort))
Changing 1:2 to cover all the key columns generalizes this to any number of keys.
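Put together, the sort-based variant might look like the sketch below. It rebuilds the question's data as a data frame directly, and `key_cols` and `result` are illustrative names, not part of the original answer:

```r
df <- data.frame(X1 = c(1, 1, 2, 2, 2, 2, 3, 3),
                 X2 = c(2, 2, 1, 1, 3, 3, 2, 2),
                 X3 = c(3, 4, 4, 4, 4, 6, 3, 2))
key_cols <- c("X1", "X2")  # hypothetical name for the key columns
# Sort the key values within each row so (1, 2) and (2, 1) become the same key
df[key_cols] <- t(apply(df[key_cols], 1, sort))
result <- aggregate(X3 ~ X1 + X2, data = df, FUN = sum)
result
#   X1 X2 X3
# 1  1  2 15
# 2  2  3 15
```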

Sorting by successive vectors in R [duplicate]

I have a vector x, that I would like to sort based on the order of values in vector y. The two vectors are not of the same length.
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
The expected result would be:
[1] 4 4 4 2 2 1 3 3 3
What about this one:
x[order(match(x,y))]
You could convert x into an ordered factor:
x.factor <- factor(x, levels = y, ordered=TRUE)
sort(x)
sort(x.factor)
Obviously, changing your numbers into factors can radically change the way code downstream reacts to x. But since you didn't give us any context about what happens next, I thought I would suggest this as an option.
How about:
rep(y,table(x)[as.character(y)])
(Ian's is probably still better)
In case you need the order given by y regardless of whether the values are numbers or characters:
x[order(ordered(x, levels = y))]
[1] 4 4 4 2 2 1 3 3 3
Step by step:
a <- ordered(x, levels = y) # Ordered factor from x, with levels in the order given by y.
[1] 2 2 3 4 1 4 4 3 3
Levels: 4 < 2 < 1 < 3
b <- order(a) # Permutation that sorts x according to the order in y.
[1] 4 6 7 1 2 5 3 8 9
x[b] # Reorder x according to the order in y.
[1] 4 4 4 2 2 1 3 3 3
[Edit: Clearly Ian has the right approach, but I will leave this in for posterity.]
You can do this without loops by indexing on your y vector. Add an incrementing numeric value to y and merge them:
y <- data.frame(index=1:length(y), x=y)
x <- data.frame(x=x)
x <- merge(x,y)
x <- x[order(x$index),"x"]
x
[1] 4 4 4 2 2 1 3 3 3
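You can also build the result with an explicit loop:
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
z <- c() # start from an empty result so that c(z, ...) has something to extend
for(i in y) { z <- c(z, rep(i, sum(x==i))) }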
The result in z: 4 4 4 2 2 1 3 3 3
The important steps:
for(i in y) -- Loops over the elements of interest.
z <- c(z, ...) -- Concatenates each subexpression in turn
rep(i, sum(x==i)) -- Repeats i (the current element of interest) sum(x==i) times (the number of times we found i in x).
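The same idea can be written without growing z inside a loop. A minimal sketch that first counts how often each element of y occurs in x and then repeats y accordingly (the name `counts` is illustrative):

```r
x <- c(2, 2, 3, 4, 1, 4, 4, 3, 3)
y <- c(4, 2, 1, 3)
# Number of occurrences in x of each element of y, in the order of y
counts <- vapply(y, function(i) sum(x == i), integer(1))
z <- rep(y, times = counts)
z
# [1] 4 4 4 2 2 1 3 3 3
```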
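You can also use sqldf and do it with an SQL join, as shown next. Note that SQL does not guarantee row order without an ORDER BY clause, so this relies on how the join happens to be evaluated: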
library(sqldf)
x <- data.frame(x = c(2, 2, 3, 4, 1, 4, 4, 3, 3))
y <- data.frame(y = c(4, 2, 1, 3))
result <- sqldf("SELECT x.x FROM y JOIN x on y.y = x.x")
ordered_x <- result[[1]]

Data manipulation in R merge columns

I have a column in a data set A: 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5,
and a data set B: 1, 2, 3, 4, 5.
Is there a way to assign the values of B to the corresponding values of A?
The desired result is:
A  B  C
1  v  v
1  b  v
2  n  b
2  m  b
3  k  n
4     m
4     m
4     m
4     m
5     k
5     k
You could try
C <- B[A]
#> C
# [1] "v" "v" "b" "b" "n" "m" "m" "m" "m" "k" "k"
If you want to store this result in a data frame, you could use
length(B) <- length(A) # adapt the length of column B to that of column A
df <- cbind(A, B, C) # generate a matrix with three columns
df[is.na(df)] <- "" # remove the NA entries in column B (replace them with
# an empty string) in the rows where it is not defined
df <- as.data.frame(df) # convert the matrix into a data frame
#> df
#    A B C
# 1  1 v v
# 2  1 b v
# 3  2 n b
# 4  2 m b
# 5  3 k n
# 6  4   m
# 7  4   m
# 8  4   m
# 9  4   m
# 10 5   k
# 11 5   k
Data:
A <- c(1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5)
B <- c("v", "b", "n", "m", "k")
However, if you already have the columns A and B stored in a data frame and you only need to generate column C, you could obtain this result using df$C <- with(df, B[A]).
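Note that cbind(A, B, C) produces a character matrix, so column A ends up as text after as.data.frame. If A should stay numeric, one option is to build the data frame directly. A minimal sketch, assuming B is padded with empty strings as in the answer above (the name `B_padded` is illustrative):

```r
A <- c(1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5)
B <- c("v", "b", "n", "m", "k")
# Pad B with empty strings up to the length of A
B_padded <- c(B, rep("", length(A) - length(B)))
# B[A] looks up, for each value of A, the element of B at that position
df <- data.frame(A = A, B = B_padded, C = B[A], stringsAsFactors = FALSE)
str(df)
```

Here stringsAsFactors = FALSE keeps B and C as character columns on older R versions too.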

Nested lapply() in a list?

I have a list l, which has the following features:
It has 3 elements
Each element is a numeric vector of length 5
Each vector contains numbers from 1 to 5
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
I want to do two things:
First
I want to know how many times each number occurs in the entire list and I want each result in a vector (or any form that can allow me to perform computations with the results later):
Code 1:
> a <- table(sapply(l, "["))
> x <- as.data.frame(a)
> x
Var1 Freq
1 1 3
2 2 3
3 3 4
4 4 2
5 5 3
Is there any way to do it without using the table() function? I would like to do it "manually". My attempt is right below.
Code 2: (I know this is not very efficient!)
x <- data.frame(
"1" <- sum(sapply(l, "[")) == 1
"2" <- sum(sapply(l, "[")) == 2
"3" <- sum(sapply(l, "[")) == 3
"4" <- sum(sapply(l, "[")) == 4
"5" <- sum(sapply(l, "[")) == 5)
I tried the following, but it did not work. I actually did not understand the result.
> sapply(l, "[") == 1:5
a b c
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE TRUE
[4,] FALSE FALSE FALSE
[5,] FALSE FALSE FALSE
> sum(sapply(l, "[") == 1:5)
[1] 2
Second
Now, I would like to get the number of times each number appears in the list, but now in each element $a, $b and $c. I thought about using the lapply() but I don't know how exactly. Following is what I tried, but it is inefficient just like Code 2:
lapply(l, function(x) sum(x == 1))
lapply(l, function(x) sum(x == 2))
lapply(l, function(x) sum(x == 3))
lapply(l, function(x) sum(x == 4))
lapply(l, function(x) sum(x == 5))
What I get with these 5 lines of code are 5 lists of 3 elements each containing a single numeric value. For example, the second line of code tells me how many times number 2 appears in each element of l.
Code 3:
> lapply(l, function(x) sum(x == 2))
$a
[1] 1
$b
[1] 1
$c
[1] 1
What I would like to obtain is a list with three elements containing all the information I am looking for.
Please, use the references "Code 1", "Code 2" and "Code 3" in your answers. Thank you very much.
Just use table(unlist(l)) for the first part and data.frame(lapply(l, tabulate)) for the second.
> table(unlist(l))
1 2 3 4 5
3 3 4 2 3
> data.frame(lapply(l, tabulate))
a b c
1 2 0 1
2 1 1 1
3 1 2 1
4 0 1 1
5 1 1 1
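One caveat with tabulate: by default it only counts up to the largest value present in its input, so vectors with different maxima would give counts of different lengths. A hedged sketch that fixes the number of bins explicitly, assuming the values are known to lie between 1 and 5 as stated in the question (the name `counts` is illustrative):

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
# One column per list element, one row per number from 1 to 5
counts <- sapply(l, tabulate, nbins = 5)
rownames(counts) <- 1:5
counts
#   a b c
# 1 2 0 1
# 2 1 1 1
# 3 1 2 1
# 4 0 1 1
# 5 1 1 1
```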
For code 1/2, you could use sapply to obtain the counts for whichever values you wanted:
l = list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
data.frame(number = 1:5,
freq = sapply(1:5, function(x) sum(unlist(l) == x)))
# number freq
# 1 1 3
# 2 2 3
# 3 3 4
# 4 4 2
# 5 5 3
For code 3, if you wanted to get the counts for lists a, b, and c, you could just apply your frequency function to each element of the list with the lapply function:
freqs = lapply(l, function(y) sapply(1:5, function(x) sum(unlist(y) == x)))
data.frame(number = 1:5, a=freqs$a, b=freqs$b, c=freqs$c)
# number a b c
# 1 1 2 0 1
# 2 2 1 1 1
# 3 3 1 2 1
# 4 4 0 1 1
# 5 5 1 1 1
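As for why sapply(l, "[") == 1:5 in the question gave only two TRUE values: the comparison recycles 1:5 down each column of the 5 x 3 matrix, so it tests whether the i-th entry of each vector equals i rather than counting occurrences. A minimal sketch of a comparison that does test every value against every candidate number, using outer (the name `hits` is illustrative):

```r
l <- list(a = c(2, 3, 1, 5, 1), b = c(4, 3, 3, 5, 2), c = c(5, 1, 3, 2, 4))
# hits[i, j] is TRUE when the i-th value of the flattened list equals j
hits <- outer(unlist(l), 1:5, "==")
colSums(hits)
# [1] 3 3 4 2 3
```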
Here is another example with nested lapply().
Create the data:
list = NULL
list[[1]] = c(1:5)
list[[2]] = c(1:5)+3
list[[3]] = c(1:5)+4
list[[4]] = c(1:5)-1
list[[5]] = c(1:5)*3
list2 = NULL
list2[[1]] = rep(1,5)
list2[[2]] = rep(2,5)
list2[[3]] = rep(0,5)
The nested call below subtracts each element of list from every element of list2:
lapply(list, function(d){ lapply(list2, function(a,b) {a-b}, b=d)})
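To make the shape of the result concrete, here is a small sketch with shorter, made-up lists (`outer_list`, `inner_list` and `res` are illustrative names, not from the answer above). The outer lapply walks over one list and the inner lapply over the other, so res[[i]][[j]] holds inner_list[[j]] - outer_list[[i]]:

```r
outer_list <- list(c(1, 2, 3), c(10, 20, 30))
inner_list <- list(c(1, 1, 1), c(2, 2, 2))
res <- lapply(outer_list, function(d) {
  # b is fixed to the current outer element; a runs over inner_list
  lapply(inner_list, function(a, b) a - b, b = d)
})
res[[2]][[1]]
# [1]  -9 -19 -29
```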
