Subtracting columns within a list in R

I'd like to subtract specific columns within a list. I'm still learning how to properly use the apply functions. For example, given
> b <- list(data.frame(12:16, 3*2:6), data.frame(10:14, 2*1:5))
> b
[[1]]
X12.16 X3...2.6
1 12 6
2 13 9
3 14 12
4 15 15
5 16 18
[[2]]
X10.14 X2...1.5
1 10 2
2 11 4
3 12 6
4 13 8
5 14 10
I'd like some function x so that I get
> x(b)
[[1]]
X12.16 X3...2.6 <newcol>
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 <newcol>
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4
Thanks in advance.

If your data.frames had nice and consistent names, you could use transform with lapply
b <- list(data.frame(a=12:16, b=3*2:6), data.frame(a=10:14, b=2*1:5))
lapply(b, transform, c=a-b)
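If the data frames come with awkward auto-generated names, as in the question, one option is to rename the columns first and then apply the same transform call. This is only a sketch: the names "a", "b" and "c" are placeholders, and it assumes every data frame in the list has exactly two columns.

```r
b <- list(data.frame(12:16, 3 * 2:6), data.frame(10:14, 2 * 1:5))

# Give every data frame the same two column names (assumes two columns each)
named <- lapply(b, setNames, c("a", "b"))

# Now transform() can refer to the columns by name
res <- lapply(named, transform, c = a - b)
res[[1]]$c
res[[2]]$c
```

The renaming step is what makes the name-based approach usable; if you need to keep the original names, the index-based answers are the simpler route.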

Here is a solution:
lapply(b, function(x) {
x[, 3] <- x[, 1] - x[, 2]
x
})
[[1]]
X12.16 X3...2.6 V3
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 V3
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4
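The index-based solution above leaves the new column with the default name V3. A small variation, sketched here with a hypothetical helper called add_diff (not from the original answer), lets you pick the columns and the name of the result:

```r
b <- list(data.frame(12:16, 3 * 2:6), data.frame(10:14, 2 * 1:5))

# Hypothetical helper: subtract column j from column i and store the
# result under an explicit column name
add_diff <- function(df, i = 1, j = 2, name = "diff") {
  df[[name]] <- df[[i]] - df[[j]]
  df
}

res <- lapply(b, add_diff)
res[[1]]$diff
res[[2]]$diff
```

Extra arguments can be passed through lapply, e.g. lapply(b, add_diff, name = "delta").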

with dplyr:
library(dplyr)
lapply(b, function(x) x %>% mutate(new_col = .[[1]]-.[[2]]))
Result:
[[1]]
X12.16 X3...2.6 new_col
1 12 6 6
2 13 9 4
3 14 12 2
4 15 15 0
5 16 18 -2
[[2]]
X10.14 X2...1.5 new_col
1 10 2 8
2 11 4 7
3 12 6 6
4 13 8 5
5 14 10 4

Related

Convert dataframe from vertical to horizontal

I already checked many questions and I don't seem to find the suitable answer.
I have this df
df = data.frame(x = 1:10,y=11:20)
the output
x y
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
I just wish the output to be:
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
thanks
Try t() like below
> data.frame(t(df), check.names = FALSE)
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
A transpose should do it
setNames(data.frame(t(df)), df[,"x"])
1 2 3 4 5 6 7 8 9 10
x 1 2 3 4 5 6 7 8 9 10
y 11 12 13 14 15 16 17 18 19 20
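One caveat worth keeping in mind with both answers: t() goes through a matrix, and a matrix holds a single type. The example below is an illustration with a made-up data frame called mixed; it shows that transposing works cleanly for all-numeric data but turns everything into character as soon as one column is not numeric.

```r
df <- data.frame(x = 1:10, y = 11:20)

# All-numeric data frame: transposing keeps the values numeric
tdf <- data.frame(t(df), check.names = FALSE)
dim(tdf)

# Mixed-type data frame (hypothetical example): t() coerces to character
mixed <- data.frame(x = 1:3, label = c("a", "b", "c"))
m <- t(mixed)
typeof(m)
```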

How to order a dataframe by a column with hyphen in dplyr

I have a dataframe like below.
I want to always keep the dataframe in this order, but when I reorder it by id using dplyr::arrange(), the rows end up in an order I don't want (the ids are sorted as text, so 2022-10 comes before 2022-2). Is there any solution for this?
library(dplyr)
set.seed(10)
df <- data.frame(id = paste(2022,1:20, sep = "-"), weight = round(rnorm(20, 5, 1)))
df
id weight
1 2022-1 5
2 2022-2 5
3 2022-3 4
4 2022-4 4
5 2022-5 5
6 2022-6 5
7 2022-7 4
8 2022-8 5
9 2022-9 3
10 2022-10 5
11 2022-11 6
12 2022-12 6
13 2022-13 5
14 2022-14 6
15 2022-15 6
16 2022-16 5
17 2022-17 4
18 2022-18 5
19 2022-19 6
20 2022-20 5
wrong_order_df <- df %>% arrange(weight) %>% arrange(id)
wrong_order_df
id weight
1 2022-1 5
2 2022-10 5
3 2022-11 6
4 2022-12 6
5 2022-13 5
6 2022-14 6
7 2022-15 6
8 2022-16 5
9 2022-17 4
10 2022-18 5
11 2022-19 6
12 2022-2 5
13 2022-20 5
14 2022-3 4
15 2022-4 4
16 2022-5 5
17 2022-6 5
18 2022-7 4
19 2022-8 5
20 2022-9 3
The idea that I came up with is to add a new column just for working on this issue. But I believe there is a more elegant way.
correct_order_df <- wrong_order_df %>% mutate(id_order = as.numeric(stringr::str_extract(id, '\\b\\w+$'))) %>% arrange(id_order)
correct_order_df
id weight id_order
1 2022-1 5 1
2 2022-2 5 2
3 2022-3 4 3
4 2022-4 4 4
5 2022-5 5 5
6 2022-6 5 6
7 2022-7 4 7
8 2022-8 5 8
9 2022-9 3 9
10 2022-10 5 10
11 2022-11 6 11
12 2022-12 6 12
13 2022-13 5 13
14 2022-14 6 14
15 2022-15 6 15
16 2022-16 5 16
17 2022-17 4 17
18 2022-18 5 18
19 2022-19 6 19
20 2022-20 5 20
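You can put the expression directly inside arrange(), so no helper column is needed. Not sure how much more elegant we need than one line (str_extract comes from stringr, so either load that package or prefix the call as below). Let me know if this works:
arrange(df, as.numeric(stringr::str_extract(id, "(?<=-)\\d+")))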
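For anyone who would rather avoid the extra packages, here is a base R sketch of the same idea. It assumes every id has the form year-number with a single hyphen; the variable names year, num and sorted are mine, and the sample data is deliberately shuffled.

```r
# Sample data in reverse order, ids of the form "year-number"
df <- data.frame(id = paste(2022, 20:1, sep = "-"), weight = 1:20,
                 stringsAsFactors = FALSE)

# Split the id into its numeric parts (assumes exactly one hyphen per id)
year <- as.numeric(sub("-.*", "", df$id))
num  <- as.numeric(sub(".*-", "", df$id))

# Order by year first, then by the running number
sorted <- df[order(year, num), ]
head(sorted$id, 3)
```

Ordering by year as well keeps the result sensible if ids from several years end up in the same data frame.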

How to find all pairs of two lists, and categorize them without repetitions?

We are preparing a program in which 18 people discuss topics in pairs, switching partners each round until everyone has talked to everyone. That means 153 discussions: 9 pairs talking in parallel in each round, for 17 rounds. I tried to build a matrix showing who should talk to whom in order to avoid chaos, but could not succeed. For simplicity everyone is given a number, so the bottom line is that I need all pairs of the numbers from 1 to 18 (I got those with the combn function), but these pairs then have to be arranged into 17 rounds so that each number appears only once per round. Any ideas?
Let's first look at a simpler problem with 6 persons. Picture a 6 x 6 matrix that lists who (rows) is talking to whom (columns) in which round (entry); the output of full_ls(6) further down is exactly such a matrix.
So for example in round 1 we have the following pairs:
(1-2), (3-5), (4-6)
For round 2 we would have:
(1-3), (2-6), (4-5)
and so on.
Thus, basically we are looking for a symmetric latin square (i.e. in each row and in each column each entry appears only once, cf. Latin Squares on Wikipedia).
The inner part of this latin square (everything except the last row and column) can be easily generated via an addition table:
inner_ls <- function(k) {
res <- outer(0:(k-1), 0:(k-1), function(i, j) (i + j) %% k)
## replace zeros by k
res[res == 0] <- k
## replace diagonal by NA
diag(res) <- NA
res
}
inner_ls(5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] 1 NA 3 4 5
# [3,] 2 3 NA 5 1
# [4,] 3 4 5 NA 2
# [5,] 4 5 1 2 NA
So all that is left is to append the last row (and column) with the round number that is missing from each row:
full_ls <- function(k) {
i_ls <- inner_ls(k - 1)
last_row <- apply(i_ls, 1, function(row) {
rounds <- 1:(k - 1)
rounds[!rounds %in% row]
})
res <- cbind(rbind(i_ls, last_row), c(last_row, NA))
rownames(res) <- colnames(res) <- 1:k
res
}
full_ls(6)
# 1 2 3 4 5 6
# 1 NA 1 2 3 4 5
# 2 1 NA 3 4 5 2
# 3 2 3 NA 5 1 4
# 4 3 4 5 NA 2 1
# 5 4 5 1 2 NA 3
# 6 5 2 4 1 3 NA
With that you get your assignment matrix as follows:
full_ls(18)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# 1 NA 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# 2 1 NA 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 2
# 3 2 3 NA 5 6 7 8 9 10 11 12 13 14 15 16 17 1 4
# 4 3 4 5 NA 7 8 9 10 11 12 13 14 15 16 17 1 2 6
# 5 4 5 6 7 NA 9 10 11 12 13 14 15 16 17 1 2 3 8
# 6 5 6 7 8 9 NA 11 12 13 14 15 16 17 1 2 3 4 10
# 7 6 7 8 9 10 11 NA 13 14 15 16 17 1 2 3 4 5 12
# 8 7 8 9 10 11 12 13 NA 15 16 17 1 2 3 4 5 6 14
# 9 8 9 10 11 12 13 14 15 NA 17 1 2 3 4 5 6 7 16
# 10 9 10 11 12 13 14 15 16 17 NA 2 3 4 5 6 7 8 1
# 11 10 11 12 13 14 15 16 17 1 2 NA 4 5 6 7 8 9 3
# 12 11 12 13 14 15 16 17 1 2 3 4 NA 6 7 8 9 10 5
# 13 12 13 14 15 16 17 1 2 3 4 5 6 NA 8 9 10 11 7
# 14 13 14 15 16 17 1 2 3 4 5 6 7 8 NA 10 11 12 9
# 15 14 15 16 17 1 2 3 4 5 6 7 8 9 10 NA 12 13 11
# 16 15 16 17 1 2 3 4 5 6 7 8 9 10 11 12 NA 14 13
# 17 16 17 1 2 3 4 5 6 7 8 9 10 11 12 13 14 NA 15
# 18 17 2 4 6 8 10 12 14 16 1 3 5 7 9 11 13 15 NA
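To turn the matrix into an actual schedule, you can read off the pairs for each round. The sketch below repeats the two functions from above so it runs on its own; pairs_in_round is a hypothetical helper of mine, not part of the original answer. It also checks that every round of the 18-person schedule has 9 pairs covering all 18 people.

```r
inner_ls <- function(k) {
  res <- outer(0:(k - 1), 0:(k - 1), function(i, j) (i + j) %% k)
  res[res == 0] <- k   # replace zeros by k
  diag(res) <- NA      # nobody talks to themselves
  res
}

full_ls <- function(k) {
  i_ls <- inner_ls(k - 1)
  last_row <- apply(i_ls, 1, function(row) {
    rounds <- 1:(k - 1)
    rounds[!rounds %in% row]
  })
  res <- cbind(rbind(i_ls, last_row), c(last_row, NA))
  rownames(res) <- colnames(res) <- 1:k
  res
}

# Hypothetical helper: all pairs (row < col) scheduled for round r
pairs_in_round <- function(m, r) {
  idx <- which(m == r & upper.tri(m), arr.ind = TRUE)
  idx[order(idx[, "row"]), , drop = FALSE]
}

m6 <- full_ls(6)
p1 <- pairs_in_round(m6, 1)   # pairs (1-2), (3-5), (4-6)

# Sanity check for 18 people: 9 pairs per round, each person exactly once
m18 <- full_ls(18)
ok <- sapply(1:17, function(r) {
  p <- pairs_in_round(m18, r)
  nrow(p) == 9 && length(unique(c(p))) == 18
})
all(ok)
```

Using upper.tri avoids listing each pair twice, since the matrix is symmetric.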

making sort order in merge() numeric

I have two easy matrices (or df's) to merge:
a <- cbind(one=0:15, two=0:15, three=0:15)
b <- cbind(one=0:15, two=0:15, three=0:15)
#a <- data.frame(one=0:15, two=0:15, three=0:15)
#b <- data.frame(one=0:15, two=0:15, three=0:15)
No problem: after sorting on column one, column one is output ascending nicely from 0 to 15:
merge(a,b,by=c("one"), sort=T)
one two.x three.x two.y three.y
1 0 0 0 0 0
2 1 1 1 1 1
3 2 2 2 2 2
4 3 3 3 3 3
5 4 4 4 4 4
6 5 5 5 5 5
7 6 6 6 6 6
8 7 7 7 7 7
9 8 8 8 8 8
10 9 9 9 9 9
11 10 10 10 10 10
12 11 11 11 11 11
13 12 12 12 12 12
14 13 13 13 13 13
15 14 14 14 14 14
16 15 15 15 15 15
But wait: when merging on two columns --- both numeric --- the sort order suddenly seems alphabetic.
merge(a,b,by=c("one", "two"), sort=T)
one two three.x three.y
1 0 0 0 0
2 1 1 1 1
3 10 10 10 10
4 11 11 11 11
5 12 12 12 12
6 13 13 13 13
7 14 14 14 14
8 15 15 15 15
9 2 2 2 2
10 3 3 3 3
11 4 4 4 4
12 5 5 5 5
13 6 6 6 6
14 7 7 7 7
15 8 8 8 8
16 9 9 9 9
Eww, gross. What's going on? And what do I do?
Based on @joran's comments, it looks like if you want the rows to be sorted in any particular order, you should explicitly set it yourself.
If the order you'd like is one in which the rows have increasing values of one or more columns, you can use the function order(), like this:
X <- merge(a, b, by = c("one", "two"))
X[with(X, order(one, two)),]
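Here is the same fix as a self-contained snippet, using the data frame version of the inputs. As for the cause, my understanding (an assumption on my part, not something stated in this thread) is that merge() pastes multi-column keys together into strings before matching, so sort = TRUE ends up sorting those strings rather than the numbers. Either way, ordering explicitly afterwards does not depend on that detail.

```r
a <- data.frame(one = 0:15, two = 0:15, three = 0:15)
b <- data.frame(one = 0:15, two = 0:15, three = 0:15)

# Merge on two key columns; the columns themselves stay numeric
X <- merge(a, b, by = c("one", "two"))

# Impose the row order explicitly instead of relying on merge()
X <- X[order(X$one, X$two), ]
rownames(X) <- NULL   # tidy up the row labels after reordering

is.numeric(X$one)
X$one
```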

How to create a dataframe with different number of values?

When I create a dataframe I do:
dt = data.frame(a=c(1:5),b=c(1:20))
dt
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 1 6
7 2 7
8 3 8
9 4 9
10 5 10
11 1 11
12 2 12
13 3 13
14 4 14
15 5 15
16 1 16
17 2 17
18 3 18
19 4 19
20 5 20
As you can see, the values of the first column (a) are recycled to fill all 20 rows.
How can I create "columns" that each hold a different number of values?
Thanks
H
Use a list. A data.frame is a special kind of list in which all elements are of the same length.
list(a=c(1:5),b=c(1:20))
$a
[1] 1 2 3 4 5
$b
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
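If you do need a data frame in the end, one possible workaround is to pad the shorter vectors with NA up to the longest length. This is a sketch rather than a recommendation; the names cols, padded and dt are placeholders, and whether NA padding makes sense depends on what the columns mean.

```r
cols <- list(a = 1:5, b = 1:20)

# Length of the longest element
n <- max(lengths(cols))

# Extending a vector's length pads it with NA
padded <- lapply(cols, function(v) {
  length(v) <- n
  v
})

dt <- as.data.frame(padded)
nrow(dt)
sum(is.na(dt$a))
```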
