Rearranging the columns of a data frame [duplicate] - r

This question already has answers here:
Splitting triplicates into duplicates
(3 answers)
Closed 8 years ago.
Given a data frame, I'd like to rearrange it into another data frame with 2 columns, where each row holds one pair of elements taken from the same row of the original. The result therefore has C(ncol, 2) * nrow rows. Here's an example: given the data frame z, I'd like to return x. How can I do this?
> z = data.frame(A = c(1,2,3), B = c(4,5,6), C = c(7,8,9))
> z
A B C
1 1 4 7
2 2 5 8
3 3 6 9
> x
A B
1 1 4
2 1 7
3 4 7
4 2 5
5 2 8
6 5 8
7 3 6
8 3 9
9 6 9

Or, you could try:
matrix(apply(z, 1, combn,2), ncol=2, byrow=TRUE)
# [,1] [,2]
#[1,] 1 4
#[2,] 1 7
#[3,] 4 7
#[4,] 2 5
#[5,] 2 8
#[6,] 5 8
#[7,] 3 6
#[8,] 3 9
#[9,] 6 9
To get a data.frame as output:
setNames(as.data.frame(matrix(apply(z, 1, combn,2), ncol=2, byrow=TRUE)), LETTERS[1:2])

Something like this would work
newz <- setNames(do.call(rbind.data.frame, lapply(split(z, 1:nrow(z)), function(x)
t(combn(x,2)))),
c("A","B"))
newz
# A B
# 1.1 1 4
# 1.2 1 7
# 1.3 4 7
# 2.1 2 5
# 2.2 2 8
# 2.3 5 8
# 3.1 3 6
# 3.2 3 9
# 3.3 6 9
This generates the new rows using all combinations of the columns via combn(). If you dislike the default rownames, you can get rid of them with
rownames(newz)<-NULL
newz
# A B
# 1 1 4
# 2 1 7
# 3 4 7
# 4 2 5
# 5 2 8
# 6 5 8
# 7 3 6
# 8 3 9
# 9 6 9
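Both answers rely on combn() to enumerate the column pairs. As a minimal sketch of the same idea, the pairs can also be built from column indices, which makes the C(ncol, 2) * nrow row count easy to check. The helper name row_pairs is hypothetical and not part of either answer.

```r
z <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))

# Hypothetical helper: every 2-element combination within each row of df
row_pairs <- function(df) {
  m <- as.matrix(df)
  idx <- combn(ncol(m), 2)   # 2 x choose(ncol, 2) matrix of column index pairs
  data.frame(
    # transpose so that all pairs of row 1 come first, then row 2, and so on
    A = as.vector(t(m[, idx[1, ], drop = FALSE])),
    B = as.vector(t(m[, idx[2, ], drop = FALSE]))
  )
}

x <- row_pairs(z)
nrow(x) == choose(ncol(z), 2) * nrow(z)   # TRUE
```

This assumes all columns share one type, since as.matrix() coerces mixed columns to a common type.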

Related

Generating a vector with rep and seq but without the c() function [duplicate]

This question already has answers here:
R repeating sequence add 1 each repeat
(2 answers)
Closed 5 months ago.
Suppose that I am not allowed to use the c() function.
My target is to generate the vector
"1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9"
Here is my attempt:
rep(seq(1, 5, 1), 5)
# [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(0:4,rep(5,5))
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
So basically I sum them up. But I wonder if there is a better way that uses the rep and seq functions ONLY.
Like so:
1:5 + rep(0:4, each = 5)
# [1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
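The one-liner above works because R recycles the shorter vector in arithmetic. A small sketch that spells the recycling out explicitly (the variable names are only for illustration):

```r
base   <- 1:5                   # length 5
offset <- rep(0:4, each = 5)    # length 25: 0 0 0 0 0 1 1 1 1 1 ...

# R silently repeats `base` until it matches the length of `offset`,
# which is the same as writing the repetition out by hand:
explicit <- rep(base, times = 5) + offset
implicit <- base + offset

identical(explicit, implicit)   # TRUE
```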
I like the sequence option as well:
sequence(rep(5, 5), 1:5)
You could do
rep(1:5, each=5) + rep.int(0:4, 5)
# [1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
Just to be precise and use seq as well:
rep(seq.int(1:5), each=5) + rep.int(0:4, 5)
(PS: You can remove the .ints, but it's slower.)
One possible way:
as.vector(sapply(1:5, `+`, 0:4))
# [1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
I would also propose the outer() function:
library(dplyr)
outer(1:5, 0:4, "+") %>%
array()
Or, without the magrittr %>% operator, using the native pipe available in R 4.1 and later:
outer(1:5, 0:4, "+") |>
array()
Explanation.
The first function will create an array from the sequences 1:5 and 0:4 and fill each cell with the sum of the corresponding values:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 4 5 6 7 8
[5,] 5 6 7 8 9
The second will pull the vector from the array and return the required vector:
[1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
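If loading a package just for the pipe is not wanted, a sketch of the same outer() idea in plain base R could use as.vector() to drop the dimensions. This is a variation on the answer, not the answerer's own code.

```r
# outer() builds the 5 x 5 table of sums; as.vector() reads it column by column
v_outer <- as.vector(outer(1:5, 0:4, "+"))

# Same result as the recycling approach from the earlier answer
v_recycle <- 1:5 + rep(0:4, each = 5)

identical(v_outer, v_recycle)   # TRUE
```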

create a column in R with data from two other columns [duplicate]

This question already has answers here:
Replace a value NA with the value from another column in R
(5 answers)
Closed 3 years ago.
I don't have the slightest idea of programming, but I need to solve the following problem in R.
Let's suppose I have this data:
x y
5 8
6 5
2
9 8
4
0
6 6
7 3
3 2
I need to create a third column called "z" containing the data of "y", except for the missing values, where it should have the values of "x". It would be something like this:
x y z
5 8 8
6 5 5
2 2
9 8 8
4 4
0 0
6 6 6
7 3 3
3 2 2
dat <- data.frame(x=c(5,6,2,9,4,0,6,7,3), y = c(8,5,NA,8,NA,NA,6,3,2))
library(tidyverse)
dat %>% mutate(z = ifelse(is.na(y), x, y))
# x y z
# 1 5 8 8
# 2 6 5 5
# 3 2 NA 2
# 4 9 8 8
# 5 4 NA 4
# 6 0 NA 0
# 7 6 6 6
# 8 7 3 3
# 9 3 2 2
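The same fill-in can be done without loading the tidyverse. A minimal base R sketch, assuming the missing entries of y are stored as NA as in the data frame above:

```r
dat <- data.frame(x = c(5, 6, 2, 9, 4, 0, 6, 7, 3),
                  y = c(8, 5, NA, 8, NA, NA, 6, 3, 2))

dat$z <- dat$y                         # start from y
missing_y <- is.na(dat$z)              # TRUE where y has no value
dat$z[missing_y] <- dat$x[missing_y]   # fall back to x in those rows
dat
```

Indexed assignment only touches the rows where y is missing, so the rest of z stays an exact copy of y.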

For loop in matrix or similar structure for solving large matrix [duplicate]

This question already has answers here:
R Sum every k columns in matrix
(5 answers)
Closed 4 years ago.
Can we have a for loop or some other approach for solving the following matrix?
Matrix A (given 6 x 16)
a 1 5 6 9 5 8 5 6 7 9 4 6 2 5 4 6
b 8 6 2 4 7 9 2 3 4 8 6 2 1 6 8 2
c 9 5 1 7 5 3 7 5 3 9 5 1 2 6 9 3
d 2 5 6 3 4 1 8 4 2 6 9 5 1 3 7 1
e 7 4 2 3 6 5 7 4 1 2 3 6 9 8 5 2
f 1 5 3 7 8 9 4 6 3 1 5 2 8 9 5 4
Output (6 x 4)
a 1+5+6+9 5+8+5+6 7+9+4+6 2+5+4+6
b 8+6+2+4 7+9+2+3 4+8+6+2 1+6+8+2
c 9+5+1+7 5+3+7+5 3+9+5+1 2+6+9+3
d 2+5+6+3 4+1+8+4 2+6+9+5 1+3+7+1
e 7+4+2+3 6+5+7+4 1+2+3+6 9+8+5+2
f 1+5+3+7 8+9+4+6 3+1+5+2 8+9+5+4
I have a large matrix of 4519 x 4519, which is why I am looking for a for loop.
matb <- matrix(data = 0, nrow =6 ,ncol = 6)
for (a in 1: nrow (data)) {
for (b in 1:seq (1,5,by=2)) {
c <- b+1
matb [a,1:3] <- rbind (sum(data[a,b:c]))
}
}
I tried the code above, but it did not work, so I am looking for help with a for loop or a function to solve this problem.
We can use recycling of a logical index to select alternating columns, then add them (shown here for pairs of columns):
# example matrix
m <- matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
m[, c(TRUE, FALSE)] + m[, c(FALSE, TRUE)]
# [,1] [,2]
# [1,] 5 17
# [2,] 7 19
# [3,] 9 21
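The answer above sums pairs of columns, while the question asks for blocks of four. One possible sketch for an arbitrary block size k groups the column indices by integer division and lets rowsum() add them up. The names A, k and grp are illustrative, and only rows a and b of the question's matrix are used to keep the example short.

```r
A <- matrix(c(1, 5, 6, 9, 5, 8, 5, 6, 7, 9, 4, 6, 2, 5, 4, 6,
              8, 6, 2, 4, 7, 9, 2, 3, 4, 8, 6, 2, 1, 6, 8, 2),
            nrow = 2, byrow = TRUE, dimnames = list(c("a", "b"), NULL))

k <- 4                                  # number of adjacent columns per block
grp <- (seq_len(ncol(A)) - 1) %/% k     # 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3

# rowsum() sums rows by group, so transpose, sum, and transpose back
res <- t(rowsum(t(A), grp))
res
```

Because the grouping comes from integer division, a column count that is not a multiple of k (such as 4519) simply yields a shorter last block.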

sort matrix elements based on diagonal position in R [duplicate]

This question already has answers here:
Get all diagonal vectors from matrix
(3 answers)
Closed 5 years ago.
Before I attempt writing a custom function, is there an elegant/native method to achieve this?
m<-matrix(1:9,ncol = 3)
m
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
By column:
as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
By row:
as.vector(t(m))
[1] 1 4 7 2 5 8 3 6 9
By diagonal (I would like a function output):
some.function(m)
[1] 1 2 4 3 5 7 6 8 9
And the perpendicular diagonal:
some.other.function(m)
[1] 7 8 4 9 5 1 6 2 3
ind = expand.grid(1:3, 1:3)
ind[,3] = rowSums(ind)
ind = ind[order(ind[,3], ind[,2], ind[,1]),]
m[as.matrix(ind[,1:2])]
#[1] 1 2 4 3 5 7 6 8 9
m[,3:1][as.matrix(ind[,1:2])]
#[1] 7 8 4 9 5 1 6 2 3
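The snippet above hard-codes the 3 x 3 size in expand.grid(1:3, 1:3). As a sketch of a size-independent variant, the cells can be grouped by row index plus column index, which is constant along each anti-diagonal. The function name diag_order is hypothetical.

```r
m <- matrix(1:9, ncol = 3)

# Hypothetical helper: read a matrix anti-diagonal by anti-diagonal.
# row(mat) + col(mat) is constant on each anti-diagonal; split() keeps the
# column-major order of the cells inside every group.
diag_order <- function(mat) {
  unlist(split(mat, row(mat) + col(mat)), use.names = FALSE)
}

diag_order(m)                 # 1 2 4 3 5 7 6 8 9
diag_order(m[, ncol(m):1])    # 7 8 4 9 5 1 6 2 3
```

Reversing the column order before the call gives the perpendicular direction, mirroring the m[,3:1] trick in the answer.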

Reduce columns of a matrix by a function in R

I have a matrix sort of like:
data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melting the data then using group_by from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
We can use rowMins from the matrixStats package:
library(matrixStats)
res <- vapply(split(1:ncol(values), colnames(values)),
function(i) rowMins(values[,i,drop=FALSE]), rep(0, nrow(values)))
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
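For checking the result on a small input without installing matrixStats, here is a dependency-free sketch that keeps the same split-by-column-name idea but swaps rowMins() for apply(..., 1, min). That swap is an assumption made for illustration and will likely be much slower on a 4,500 x 50,000 matrix. The fixed values are copied from the printed example in the question so the output can be compared with the expected table.

```r
values <- matrix(c(5, 4, 9, 6,  7, 8,
                   6, 9, 9, 1,  2, 5,
                   1, 2, 5, 3, 10, 1,
                   6, 5, 1, 8,  6, 4,
                   6, 4, 5, 9,  4, 4),
                 ncol = 6, byrow = TRUE,
                 dimnames = list(as.character(1:5),
                                 c("1", "2", "3", "2", "3", "2")))

# Column positions grouped by column name
groups <- split(seq_len(ncol(values)), colnames(values))

# Row-wise minimum within each group of same-named columns
res <- vapply(groups,
              function(i) apply(values[, i, drop = FALSE], 1, min),
              numeric(nrow(values)))
res
```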
