This question already has answers here:
Sorting rows alphabetically
(4 answers)
Closed 7 years ago.
I am trying to sort each row of a data frame using this line,
sapply(df, function(x) sort(x))
However, the columns are getting sorted instead of the rows.
For example, this data frame
5 10 7 1 5
6 3 9 2 4
4 5 1 3 3
is ending up like this:
4 3 1 1 3
5 5 7 2 4
6 10 9 3 5
And I want this:
1 5 5 7 10
2 3 4 6 9
1 3 3 4 5
Any recommendations? Thanks
You could use the plain apply function with MARGIN = 1 to apply over rows and then transpose the result, because apply returns the result for each row as a column.
t(apply(df, 1, sort))
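A minimal runnable sketch of the apply() approach above, rebuilding the example data from the question. The column names V1 to V5 are assumed, since the question does not show them. apply() converts the data frame to a matrix first, and each sorted row comes back as a column, which is why t() is needed.

```r
# Example data from the question; the column names V1..V5 are assumed
df <- data.frame(V1 = c(5, 6, 4), V2 = c(10, 3, 5), V3 = c(7, 9, 1),
                 V4 = c(1, 2, 3), V5 = c(5, 4, 3))

# MARGIN = 1: call sort() once per row; apply() stacks each result as a column
sorted_cols <- apply(df, 1, sort)

# Transpose so that each sorted row is a row again (the result is a matrix)
sorted_rows <- t(sorted_cols)
sorted_rows

# Convert back if a data frame is needed
sorted_df <- as.data.frame(sorted_rows)
```

If the data frame has any non-numeric column, the internal matrix conversion produces a character matrix, and the rows would then be sorted as strings (so "10" would come before "5").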
You can transpose it (which converts it to a matrix), then split it by column and sort each piece:
t(sapply(split(t(df), col(t(df))), sort))
# [,1] [,2] [,3] [,4] [,5]
# 1 1 5 5 7 10
# 2 2 3 4 6 9
# 3 1 3 3 4 5
Because a data.frame is a list of columns, when you sapply like that you are sorting the columns.
Or apply over the rows instead:
t(apply(df, 1, sort))
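To see why the original sapply() call sorts columns, here is a small check on the same example data (column names V1 to V5 again assumed): a data frame is a list with one element per column, so sapply() hands each column to sort() and reproduces the unwanted result from the question.

```r
df <- data.frame(V1 = c(5, 6, 4), V2 = c(10, 3, 5), V3 = c(7, 9, 1),
                 V4 = c(1, 2, 3), V5 = c(5, 4, 3))

is.list(df)   # TRUE: a data frame is stored as a list of columns
length(df)    # 5: one list element per column, not per row

# sort() runs once per column, giving the 3 x 5 matrix shown in the question
by_column <- sapply(df, sort)
by_column
```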
I need to extend the data frame by adding every pair of rows together (combinations without repetition) and writing the sums to a new data frame. The result has far more rows than the original data frame, so I would like to avoid a loop and use something like apply instead. Example data frame:
1 3 6
2 2 4
5 1 2
6 4 1
The result should be:
1 3 6
2 2 4
5 1 2
6 4 1
3 5 10
6 4 8
7 7 7
7 3 6
8 6 5
11 5 3
We can use combn to generate all combinations of row numbers taken 2 at a time, pass it a custom function that adds those two rows, and bind the results to the original data frame.
rbind(df, do.call("rbind",
combn(1:nrow(df), 2, function(x) df[x[1], ] + df[x[2], ], simplify = FALSE)))
# V1 V2 V3
#1 1 3 6
#2 2 2 4
#3 5 1 2
#4 6 4 1
#11 3 5 10
#23 6 4 8
#32 7 7 7
#21 7 3 6
#22 8 6 5
#31 11 5 3
FYI, the key part here is
combn(1:nrow(df), 2) #which gives
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 1 1 2 2 3
#[2,] 2 3 4 3 4 4
and each column of this matrix is used to subset a pair of rows from the original data frame.
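A vectorized variant of the same idea, sketched under the assumption that the data frame has only numeric columns. Instead of building one small data frame per pair inside combn(), it indexes the data frame once with each row of the combn() matrix and adds the two results. This is a swapped-in alternative to the answer above and has not been benchmarked. Either way the number of added rows grows as n(n - 1)/2, so memory can become the limit for large inputs.

```r
# Example data from the question (column names V1..V3 as in the answer's output)
df <- data.frame(V1 = c(1, 2, 5, 6), V2 = c(3, 2, 1, 4), V3 = c(6, 4, 2, 1))

# 2 x choose(n, 2) matrix: each column is one pair of row numbers
idx <- combn(nrow(df), 2)

# Sums for every pair at once: first member of each pair + second member
pair_sums <- df[idx[1, ], ] + df[idx[2, ], ]

# Append the sums below the original rows and renumber the row names
result <- rbind(df, pair_sums)
rownames(result) <- NULL
result
```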
As a short example, when running combn(1:5,2), I get a matrix of 2 rows and 10 columns.
I know I can convert the output matrix to a data frame, but is there an option inside combn to get the output directly as a vertical data frame with 2 columns and 10 rows?
Thanks.
Simply transpose the matrix with t():
data.frame(t(combn(1:5, 2)))
Yields:
X1 X2
1 1 2
2 1 3
3 1 4
4 1 5
5 2 3
6 2 4
7 2 5
8 3 4
9 3 5
10 4 5
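combn() has no argument that returns a data frame directly (its arguments are x, m, FUN, simplify and ...), so t() followed by data.frame() is the usual route. A short sketch that also gives the two columns readable names; from and to are arbitrary, hypothetical names (data.frame() defaults to X1 and X2):

```r
# combn() returns a 2 x 10 matrix; t() makes it 10 x 2 before the conversion
pairs <- data.frame(t(combn(1:5, 2)))

# Hypothetical, more descriptive column names (the defaults are X1 and X2)
names(pairs) <- c("from", "to")
pairs
```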
I have a matrix sort of like:
data <- round(runif(30)*10)  # random values, so your numbers will differ from the output below
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melting the data then using group_by from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
We can use rowMins from the matrixStats package:
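library(matrixStats)
# Group column indices by column name, then take row-wise minima within each group;
# drop = FALSE keeps single-column groups as matrices so rowMins() accepts them
res <- vapply(split(1:ncol(values), colnames(values)),
function(i) rowMins(values[,i,drop=FALSE]), rep(0, nrow(values)))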
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
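If installing matrixStats is not an option, a base R sketch of the same grouping idea uses pmin() to take element-wise minima across the columns in each group. It uses a fixed copy of the example matrix shown in the question (instead of the random runif() data) so the result can be checked. Its speed on the full-size data has not been measured.

```r
# Fixed copy of the example matrix from the question (filled column by column)
values <- matrix(c(5, 6, 1, 6, 6,
                   4, 9, 2, 5, 4,
                   9, 9, 5, 1, 5,
                   6, 1, 3, 8, 9,
                   7, 2, 10, 6, 4,
                   8, 5, 1, 4, 4),
                 ncol = 6,
                 dimnames = list(c("1", "2", "3", "4", "5"),
                                 c("1", "2", "3", "2", "3", "2")))

# Column indices grouped by column name, e.g. "2" -> c(2, 4, 6)
groups <- split(seq_len(ncol(values)), colnames(values))

# For each group, pmin() compares the group's columns element by element,
# giving the row-wise minimum; vapply() checks each result is a numeric vector
res <- vapply(groups,
              function(i) do.call(pmin, lapply(i, function(j) values[, j])),
              numeric(nrow(values)))
rownames(res) <- rownames(values)
res
```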
This question already has answers here:
How to randomize (or permute) a dataframe rowwise and columnwise?
(9 answers)
Closed 7 years ago.
I have a data frame with 9000 rows and 6 columns. I want to put the rows in random order, i.e. shuffle them to produce another data frame with the same data but with the rows in random order. Could anyone tell me how to do this in R?
Thanks
If you want to shuffle whole rows (keeping the values in each row together), you can just sample the row indices.
df <- data.frame(x=1:8, y=1:8, z=1:8)
df[sample(1:nrow(df)),]
which will produce
x y z
2 2 2 2
3 3 3 3
4 4 4 4
6 6 6 6
5 5 5 5
8 8 8 8
7 7 7 7
1 1 1 1
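A reproducible version of the row shuffle; the set.seed() value is arbitrary and only there so the same order comes out on every run. sample(nrow(df)) is a shorter equivalent of sample(1:nrow(df)) when the data frame has at least one row.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(x = 1:8, y = 1:8, z = 1:8)

# sample(nrow(df)) is a random permutation of the row numbers 1..8;
# indexing with it moves whole rows, so the values in each row stay together
shuffled <- df[sample(nrow(df)), ]

# Optional: renumber rows 1..n instead of keeping the original row names
rownames(shuffled) <- NULL
shuffled
```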
If the values should instead be shuffled independently within each column (so the original rows are not kept together), you can do something like
lapply(df, function(x) { sample(x)})
which results in
$x
[1] 3 1 4 6 5 2 8 7
$y
[1] 2 5 6 3 4 8 7 1
$z
[1] 6 1 8 3 2 7 4 5
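lapply() returns a plain list, as the output above shows. A sketch that keeps the data frame structure while still shuffling each column independently assigns the result back with []:

```r
df <- data.frame(x = 1:8, y = 1:8, z = 1:8)

shuffled_cols <- df
# Assigning into shuffled_cols[] keeps the data.frame class and column names;
# each column is permuted on its own, so values from one row get separated
shuffled_cols[] <- lapply(shuffled_cols, sample)
shuffled_cols
```

One caveat from the sample() documentation: a numeric vector of length one (x >= 1) is treated as 1:x, so this pattern assumes the data frame has more than one row.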