Formatting R combn output - r

As a short example, when running combn(1:5,2), I get a matrix of 2 rows and 10 columns.
I know I can convert the output matrix to a data frame, but is there an option inside combn that returns the result directly as a vertical data frame of 2 columns and 10 rows?
Thanks.

Simply transpose the matrix with t():
data.frame(t(combn(1:5, 2)))
Yields:
X1 X2
1 1 2
2 1 3
3 1 4
4 1 5
5 2 3
6 2 4
7 2 5
8 3 4
9 3 5
10 4 5
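A minimal sketch of the same idea with named columns. As far as I know, combn() only offers the FUN and simplify arguments and nothing that changes the orientation, so the transpose is done afterwards. The column names first and second are made up for illustration.

```r
# combn() returns one combination per column: a 2 x 10 matrix here.
m <- combn(1:5, 2)

# Transpose so that each combination becomes a row, then convert.
pairs <- as.data.frame(t(m))

# Hypothetical column names, chosen only for this example.
names(pairs) <- c("first", "second")

print(dim(m))
print(dim(pairs))
print(head(pairs, 3))
```

The transpose has to happen before the conversion: calling data.frame() on the untransposed matrix would give 10 columns and 2 rows.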

Related

R Create column that provides grouping number for each distinct group [duplicate]

I need to add a column to my data that contains a number grouping for each distinct combination of other columns. It will likely be more clear with this example:
# Make data
df <- data.frame(x = c(1,1,2,3,4,5,2,3,4,5),
y = c(2, 2,3,4,5,1,3,4,5,1),
value = c(1,2,3,4,5,6,7,8,9,10))
# Print the data
df
x y value
1 1 2 1
2 1 2 2
3 2 3 3
4 3 4 4
5 4 5 5
6 5 1 6
7 2 3 7
8 3 4 8
9 4 5 9
10 5 1 10
I need to add a "Location" column that numbers each unique (or distinct) combination of x and y. Duplicated x and y combinations should all use the same number. In my example there are 5 unique combinations of x and y, so there are at most 5 Locations. My goal output is this:
x y value Location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
I imagine doing something like this:
df <- df %>%
group_by(x,y) %>%
mutate(Location = ndistinct(x,y)
But this doesn't work. Any help is appreciated!
Thanks!
df %>% mutate(., Location=group_indices(., x,y))
x y value Location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
See here and here.
Not quite as straightforward as I thought to start with.
Update
To answer OP's question: the dot . is a placeholder for "the object on the left hand side of the pipe" (%>%). Normally you don't need it because, by default, magrittr (the package which defines the pipe) assumes that you want to use the object on the left hand side of the pipe as the first argument to the function on the right hand side of the pipe, and makes the substitution for you. This is very helpful because the tidyverse is designed so that the object on the left hand side of the pipe is always the first argument to the function on the right hand side - so you don't have to use the dot.
If you use functions that don't belong to the tidyverse, you sometimes need the dot to override magrittr's default behaviour.
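A small sketch of the two behaviours described above, assuming the magrittr package is installed. The values are arbitrary and only there to show where the left-hand side ends up.

```r
library(magrittr)

# Default: the left-hand side becomes the first argument,
# so this is the same as sqrt(c(1, 4, 9)).
a <- c(1, 4, 9) %>% sqrt()

# Explicit dot: the left-hand side goes where the dot is placed,
# so this is the same as seq(1, 10, by = 2).
b <- 2 %>% seq(1, 10, by = .)

print(a)
print(b)
```

In the second call the dot appears as a direct argument, so magrittr does not also insert the left-hand side as the first argument.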
I wrote my first version of this answer without testing the code because the solution seemed "obvious". But I did test it afterwards (at the same time as OP reported the error) and found that it didn't work. A quick Google brought me to the github issue in the second link above, and hence to the correct answer.
I don't yet understand why, in this particular case, a tidyverse function doesn't work as I expect. (Other than taking the easy way out and saying that my expectation was wrong!)
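For readers on a recent dplyr: to my knowledge, since dplyr 1.0.0 passing grouping columns to group_indices() is deprecated, and grouping first and then calling cur_group_id() is the suggested replacement. A sketch under that assumption, using the data from the question:

```r
library(dplyr)

df <- data.frame(x = c(1, 1, 2, 3, 4, 5, 2, 3, 4, 5),
                 y = c(2, 2, 3, 4, 5, 1, 3, 4, 5, 1),
                 value = 1:10)

# Group by the key columns, then store each group's index in a new column.
out <- df %>%
  group_by(x, y) %>%
  mutate(Location = cur_group_id()) %>%
  ungroup()

print(out$Location)
```

The ungroup() at the end is optional, but it stops later operations from silently running per group.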
In base R we can use:
df$location <- as.numeric(factor(paste(df$x,df$y)))
x y value location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
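A variation on the base R approach, as a sketch: match() against the unique keys numbers the groups in order of first appearance instead of in the sorted order of the pasted labels. The "_" separator is my own choice; with keys that can themselves contain the separator, pasted labels could collide.

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, 5, 2, 3, 4, 5),
                 y = c(2, 2, 3, 4, 5, 1, 3, 4, 5, 1),
                 value = 1:10)

# Build one key per row from the grouping columns.
key <- paste(df$x, df$y, sep = "_")

# Position of each key among the distinct keys, in order of first appearance.
df$Location <- match(key, unique(key))

print(df)
```

For this data both versions give the same numbers, because the groups already appear in sorted order.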

Change the order of numerically named columns in r

If I have a dataframe like the one below which has numerical column names
example = data.frame(`1`=c(1,8,3,9), `2`=c(3,2,3,3), `3`=c(5,2,5,4), `4`=c(1,2,3,4), `5`=c(2,5,7,8), check.names = FALSE)
Which looks like this:
1 2 3 4 5
1 3 5 1 2
8 2 2 2 5
3 3 5 3 7
9 3 4 4 8
And I want to arrange it so that the column names start with three and proceed through five and back to one, like this:
3 4 5 1 2
5 1 2 1 3
2 2 5 8 2
5 3 7 3 3
4 4 8 9 3
I know how to rearrange the position of a single column in a dataset, but I'm not sure how to do this with more than one column in this particular order.
We can index the columns by position, concatenating (c) two sequences (:) of column numbers:
example[c(3:5, 1:2)]
# 3 4 5 1 2
#1 5 1 2 1 3
#2 2 2 5 8 2
#3 5 3 7 3 3
#4 4 4 8 9 3
As the column names are all numeric, we can convert them to numeric and use them as column positions (this works here only because each name equals its column's position):
v1 <- as.numeric(names(example))
example[c(v1[3:5], v1[1:2])]
Or simply do
example[c(names(example)[3:5], names(example)[1:2])]
Or another way is with head and tail
example[c(tail(names(example), 3), head(names(example), 2))]
data
example <- data.frame(`1`=c(1,8,3,9), `2`=c(3,2,3,3),
`3`=c(5,2,5,4), `4`=c(1,2,3,4), `5`=c(2,5,7,8), check.names = FALSE)
By default, data.frame() converts non-syntactic column names such as 1 into syntactic ones (X1), so numeric names have to be created with check.names = FALSE. Once you have them, you can use match to look up the positions of the columns in the order you want.
example[match(c(3:5, 1:2), names(example))]
# 3 4 5 1 2
#1 5 1 2 1 3
#2 2 2 5 8 2
#3 5 3 7 3 3
#4 4 4 8 9 3
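The same match-based lookup can be wrapped in a small function that rotates the columns so a chosen one comes first. This is only a sketch; rotate_cols is a hypothetical helper name, not an existing function.

```r
example <- data.frame(`1` = c(1, 8, 3, 9), `2` = c(3, 2, 3, 3),
                      `3` = c(5, 2, 5, 4), `4` = c(1, 2, 3, 4),
                      `5` = c(2, 5, 7, 8), check.names = FALSE)

# Hypothetical helper: rotate the columns so that column `first` leads,
# followed by the columns after it, then the ones that were before it.
rotate_cols <- function(d, first) {
  start <- match(as.character(first), names(d))
  stopifnot(!is.na(start))  # fail early if the column does not exist
  idx <- c(start:ncol(d), seq_len(start - 1))
  d[idx]
}

rotated <- rotate_cols(example, 3)
print(names(rotated))
```

seq_len(start - 1) is used instead of 1:(start - 1) so that rotating to the first column returns the data frame unchanged.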

Order data frame by column and display WITH indices

I have the following R data frame
> df
a
1 3
3 2
4 1
5 3
6 6
7 7
8 2
10 8
I order it by the a column with the order function df[ order(df), ]:
[1] 1 2 2 3 3 6 7 8
This is the result I want, BUT how can I list the whole data frame with the permuted indices?
The only thing that works is the following, but it seems sloppy and I don't really understand what it does:
> df[ order(df), c(1,1) ] # I want this but without the a.1 column!!!!
a a.1
4 1 1
3 2 2
8 2 2
1 3 3
5 3 3
6 6 6
7 7 7
10 8 8
Thanks
If we need the indices as well, use sort with index.return = TRUE, which returns the sorted values (x) together with their original positions (ix):
data.frame(sort(df$a, index.return=TRUE))
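If the goal is to keep the original row names rather than positions, a sketch of another option is to index with drop = FALSE. Subsetting a one-column data frame by rows simplifies it to a plain vector by default, which is why the row names disappeared in the question. The data below is rebuilt from the printout in the question.

```r
# Rebuild the data frame from the question, including its row names.
df <- data.frame(a = c(3, 2, 1, 3, 6, 7, 2, 8),
                 row.names = c("1", "3", "4", "5", "6", "7", "8", "10"))

# drop = FALSE keeps the result a data frame, so the row names survive.
sorted <- df[order(df$a), , drop = FALSE]

print(sorted)
```

Note that the ix column from sort() holds positions 1 to 8, which differ from the row names here because the row names have gaps.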

How do I preserve continuous (1,2,3,...n) ranking notation when ranking in R?

If I want to rank a set of numbers using the minimum rank for shared cases (aka ties):
dat <- c(13,13,14,15,15,15,15,15,15,16,17,22,45,46,112)
rank(dat, ties.method = 'min')
I get the results:
1 1 3 4 4 4 4 4 4 10 11 12 13 14 15
However, I want the rank to be a continuous series consisting of 1,2,3,...n, where n is the number of unique ranks.
Is there a way to make rank (or a similar function) rank a series of numbers by assigning ties to the lowest rank as above but instead of skipping subsequent rank values by the number of previous ties to instead continue ranking from the previous rank?
For example, I would like the above ranking to result in:
1 1 2 3 3 3 3 3 3 4 5 6 7 8 9
You could do it using dplyr:
library(dplyr)
dense_rank(dat)
[1] 1 1 2 3 3 3 3 3 3 4 5 6 7 8 9
If you don't want to load a whole package, you can do it in base R:
match(dat, sort(unique(dat)))
[1] 1 1 2 3 3 3 3 3 3 4 5 6 7 8 9
Use a factor and then bring it back to numeric format:
as.numeric(factor(rank(dat)))
# [1] 1 1 2 3 3 3 3 3 3 4 5 6 7 8 9
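A short base R sketch that puts the gapped and the dense ranking side by side, using the data from the question. Converting the values themselves to a factor is a small variation on the answer above that skips the intermediate rank() call.

```r
dat <- c(13, 13, 14, 15, 15, 15, 15, 15, 15, 16, 17, 22, 45, 46, 112)

# Minimum rank for ties: leaves gaps after each run of tied values.
gapped <- rank(dat, ties.method = "min")

# Dense rank: position of each value among the sorted distinct values.
dense_match <- match(dat, sort(unique(dat)))

# Same result via factor levels, which are the sorted distinct values.
dense_factor <- as.integer(factor(dat))

print(gapped)
print(dense_match)
print(identical(dense_match, dense_factor))
```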

Reduce columns of a matrix by a function in R

I have a matrix sort of like:
data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melting the data then using group_by from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
We can use rowMins from library(matrixStats)
library(matrixStats)
res <- vapply(split(seq_len(ncol(values)), colnames(values)),
function(i) rowMins(values[, i, drop = FALSE]), numeric(nrow(values)))
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
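For anyone without matrixStats, here is a base R sketch of the same split-by-column-name idea, with apply(..., 1, min) standing in for rowMins. I would expect it to be noticeably slower on a matrix with 50,000 columns, so treat it as a way to check results on small inputs. The matrix is typed in from the printout in the question.

```r
# The example matrix from the question, entered row by row.
values <- matrix(
  c(5, 4, 9, 6,  7, 8,
    6, 9, 9, 1,  2, 5,
    1, 2, 5, 3, 10, 1,
    6, 5, 1, 8,  6, 4,
    6, 4, 5, 9,  4, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("1", "2", "3", "2", "3", "2"))
)

# Column positions grouped by column name.
groups <- split(seq_len(ncol(values)), colnames(values))

# For each group of columns, take the minimum across each row.
res <- vapply(groups,
              function(idx) apply(values[, idx, drop = FALSE], 1, min),
              numeric(nrow(values)))
rownames(res) <- rownames(values)

print(res)
```

drop = FALSE matters for names that occur only once: without it the single column would collapse to a vector and apply() would fail.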
