I have a CSV file that reads like this:
1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1
What I want is this:
1 5 4 6 7 2
2 3 5 3 8 1
3 2 6 7 9 1
i.e. after every third row, I want the next block of rows placed side by side as new columns. Any advice?
Thanks a lot
Here's a way to do this with matrix indexing. It's a bit strange, but I find it interesting, so I'll post it.
You want an index matrix, with indices as follows. This gives the order of your data as a matrix (column-major order):
1, 1
2, 1
3, 1
1, 2
2, 2
3, 2
4, 1
...
8, 2
9, 2
This gives the pattern that you need to select the elements. Here's one approach to building such a matrix. Say that your data is in the object dat, a data frame or matrix:
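m <- matrix(
  c(
    # row indices: 1:3 twice, offset by 0, 3, 6, ... for each block of three rows
    outer(rep(1:3, 2), seq(0, nrow(dat) - 1, by = 3), FUN = '+'),
    # column indices: 1,1,1,2,2,2 repeated once per block
    rep(rep(1:2, each = 3), nrow(dat) / 3)
  ),
  ncol = 2
)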
The outer expression is the first column of the desired index matrix, and the rep expression is the second column. Now just index dat with this index matrix, and build a result matrix with three rows:
matrix(dat[m], nrow=3)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 5 4 6 7 2
## [2,] 2 3 5 3 8 1
## [3,] 3 2 6 7 9 1
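The same reshaping can also be done without building an index matrix. A minimal sketch, assuming the row count is an exact multiple of the block size: the hypothetical helper `blocks_side_by_side()` reads the data into a three-dimensional array (row within block, block, column), swaps the last two dimensions with `aperm()`, and flattens the result back into a matrix with `k` rows. It works for any number of columns, not only two.

```r
# Hypothetical helper (not from the answers above): place every block of k rows
# side by side, keeping the original column order within each block.
blocks_side_by_side <- function(dat, k) {
  x <- as.matrix(dat)
  stopifnot(nrow(x) %% k == 0)
  n_blocks <- nrow(x) %/% k
  # arr[r, b, j] is row r of block b, taken from column j of the input
  arr <- array(x, dim = c(k, n_blocks, ncol(x)))
  # reorder to [r, j, b] so that, after flattening, the columns run
  # block 1 col 1, block 1 col 2, block 2 col 1, ...
  matrix(aperm(arr, c(1, 3, 2)), nrow = k)
}

dat <- data.frame(V1 = 1:9, V2 = c(5, 3, 2, 6, 3, 7, 2, 1, 1))
blocks_side_by_side(dat, 3)  # should match the 3 x 6 result shown above
```

Because the block size is a parameter, the same call handles, say, blocks of four rows or a data frame with three columns.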
a <- read.table(text = "1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1")
(seq_len(nrow(a))-1) %/% 3
# [1] 0 0 0 1 1 1 2 2 2
split(a, (seq_len(nrow(a))-1) %/% 3)
# $`0`
# V1 V2
# 1 1 5
# 2 2 3
# 3 3 2
# $`1`
# V1 V2
# 4 4 6
# 5 5 3
# 6 6 7
# $`2`
# V1 V2
# 7 7 2
# 8 8 1
# 9 9 1
do.call(cbind,split(a, (seq_len(nrow(a))-1) %/% 3))
# 0.V1 0.V2 1.V1 1.V2 2.V1 2.V2
# 1 1 5 4 6 7 2
# 2 2 3 5 3 8 1
# 3 3 2 6 7 9 1
I have a 3D array that looks like this:
# Create two vectors
vector1 <- c(1,2,3,4,5,6)
vector2 <- c(10, 11, 12, 13, 14, 15)
# Convert to 3D array
my_array <- array(c(vector1, vector2), dim = c(2,3,2))
print(my_array)
where the output is
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 10 12 14
[2,] 11 13 15
I would like to turn this into a tidy dataset, with one row per value and four columns:
the value itself
dimension 1
dimension 2
dimension 3
so for example, a few rows would be
Value Dimension1(Row) Dimension2(Column) Dimension3(Width)
1 1 1 1
2 2 1 1
...
15 2 3 2
Is there a good way to do this in base R, or with tidyverse tools like tidyr?
We could use reshape2::melt
library(reshape2)
melt(my_array)
Output:
Var1 Var2 Var3 value
1 1 1 1 1
2 2 1 1 2
3 1 2 1 3
4 2 2 1 4
5 1 3 1 5
6 2 3 1 6
7 1 1 2 10
8 2 1 2 11
9 1 2 2 12
10 2 2 2 13
11 1 3 2 14
12 2 3 2 15
Or use as.data.frame.table in base R
as.data.frame.table(my_array)
We may also use:
cbind(which(is.finite(my_array), arr.ind = TRUE), value = c(my_array))
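If you want the exact column layout from the question without extra packages, a minimal base R sketch is to pair `c(my_array)` with `arrayInd()`, which converts linear positions into (row, column, width) subscripts in the same column-major order that `c()` uses for the values. Unlike the `which(is.finite(...))` variant, this also keeps cells holding `NA` or `Inf`. The column names are simply the ones from the question. Note also that when the array has no dimnames, `as.data.frame.table()` labels each dimension with letters (A, B, C, ...) rather than integer positions, which is another reason to build the indices explicitly.

```r
# Same 2 x 3 x 2 values as the array in the question
my_array <- array(c(1:6, 10:15), dim = c(2, 3, 2))

# arrayInd() maps linear positions 1..length(x) to subscripts, column-major,
# which is the same order c(my_array) uses for the values
idx <- arrayInd(seq_along(my_array), dim(my_array))

tidy <- data.frame(
  Value      = c(my_array),
  Dimension1 = idx[, 1],  # row
  Dimension2 = idx[, 2],  # column
  Dimension3 = idx[, 3]   # width (slice)
)
tidy
```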
Can we use a for loop or something else to solve the following matrix problem?
Matrix A (given 6 x 16)
a 1 5 6 9 5 8 5 6 7 9 4 6 2 5 4 6
b 8 6 2 4 7 9 2 3 4 8 6 2 1 6 8 2
c 9 5 1 7 5 3 7 5 3 9 5 1 2 6 9 3
d 2 5 6 3 4 1 8 4 2 6 9 5 1 3 7 1
e 7 4 2 3 6 5 7 4 1 2 3 6 9 8 5 2
f 1 5 3 7 8 9 4 6 3 1 5 2 8 9 5 4
Output (6 x 4)
a 1+5+6+9 5+8+5+6 7+9+4+6 2+5+4+6
b 8+6+2+4 7+9+2+3 4+8+6+2 1+6+8+2
c 9+5+1+7 5+3+7+5 3+9+5+1 2+6+9+3
d 2+5+6+3 4+1+8+4 2+6+9+5 1+3+7+1
e 7+4+2+3 6+5+7+4 1+2+3+6 9+8+5+2
f 1+5+3+7 8+9+4+6 3+1+5+2 8+9+5+4
I have a large matrix of 4519 x 4519, so I am looking for a for loop.
matb <- matrix(data = 0, nrow =6 ,ncol = 6)
for (a in 1: nrow (data)) {
for (b in 1:seq (1,5,by=2)) {
c <- b+1
matb [a,1:3] <- rbind (sum(data[a,b:c]))
}
}
I tried the code above, but it did not work, so I am looking for help with a for loop or a function that solves this problem.
We can use logical-index recycling to select alternating columns and then add them (this sums adjacent pairs of columns):
# example matrix
m <- matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
m[, c(TRUE, FALSE)] + m[, c(FALSE, TRUE)]
# [,1] [,2]
# [1,] 5 17
# [2,] 7 19
# [3,] 9 21
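The recycling trick above adds pairs of columns, while the question needs blocks of four. A minimal sketch of a general version, using the loop the question asks for, sums each block of `k` consecutive columns with `rowSums()`; the helper name `sum_every_k()` is hypothetical. It assumes `ncol(A)` is a multiple of `k`, which does not hold for a 4519-column matrix with k = 4 (4519 = 4 * 1129 + 3), so you would first need to decide what to do with the leftover columns. A loop-free base R alternative is `t(rowsum(t(A), rep(seq_len(ncol(A) / k), each = k)))`.

```r
# Hypothetical helper: sum each block of k consecutive columns of A
sum_every_k <- function(A, k) {
  stopifnot(ncol(A) %% k == 0)
  n_groups <- ncol(A) %/% k
  out <- matrix(0, nrow = nrow(A), ncol = n_groups,
                dimnames = list(rownames(A), NULL))
  for (g in seq_len(n_groups)) {
    cols <- ((g - 1) * k + 1):(g * k)        # columns belonging to block g
    out[, g] <- rowSums(A[, cols, drop = FALSE])
  }
  out
}

# First two rows of matrix A from the question
A <- rbind(
  a = c(1, 5, 6, 9, 5, 8, 5, 6, 7, 9, 4, 6, 2, 5, 4, 6),
  b = c(8, 6, 2, 4, 7, 9, 2, 3, 4, 8, 6, 2, 1, 6, 8, 2)
)
sum_every_k(A, 4)
```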
I would like to create a vector of sequenced numbers such as:
1,2,3,4,5, 2,3,4,5,1, 3,4,5,1,2
That is, after a sequence is complete (say, rep(seq(1, 5), 3)), the first number of the previous sequence moves to the last position in the next one.
How about using %% (modulo)?
(1:5) %% 5 + 1 # left shift by 1
[1] 2 3 4 5 1
(1:5 + 1) %% 5 + 1 # left shift by 2
[1] 3 4 5 1 2
Also try:
(1:5 - 2) %% 5 + 1 # right shift by 1
[1] 5 1 2 3 4
(1:5 - 3) %% 5 + 1 # right shift by 2
[1] 4 5 1 2 3
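The same modulo idea can produce the whole vector in one step. A minimal sketch (the function name `rotations()` is hypothetical): `outer()` builds a matrix whose column j holds 0:(n-1) shifted by j - 1, `%% n + 1` wraps those values into 1..n, and `c()` reads the columns out one after another.

```r
# Hypothetical helper: n_reps successive left rotations of 1:n, concatenated
rotations <- function(n, n_reps) {
  shifts <- outer(0:(n - 1), 0:(n_reps - 1), "+")  # column j = (0:(n-1)) + (j - 1)
  c(shifts %% n + 1)                               # wrap into 1..n, read column by column
}

rotations(5, 3)
# [1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
```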
I would start off by making a matrix with one more row than the length of the series.
> lseries <- 5
> nreps <- 3
> (values <- matrix(1:lseries, nrow = lseries + 1, ncol = nreps))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 1
[5,] 5 1 2
[6,] 1 2 3
This throws a warning (In matrix(1:lseries, nrow = lseries + 1, ncol = nreps) : data length [5] is not a sub-multiple or multiple of the number of rows [6]), which you can ignore here. Note that the first lseries rows hold the data you want. We can get the final result using:
> as.vector(values[1:lseries, ])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
Here's a method to get a matrix of all the rotations:
matrix(1:5, 5, 6, byrow=TRUE)[, -6]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 1
[3,] 3 4 5 1 2
[4,] 4 5 1 2 3
[5,] 5 1 2 3 4
Or turn it into a list:
split.default(matrix(1:5, 5, 6, byrow=TRUE)[, -6], 1:5)
$`1`
[1] 1 2 3 4 5
$`2`
[1] 2 3 4 5 1
$`3`
[1] 3 4 5 1 2
$`4`
[1] 4 5 1 2 3
$`5`
[1] 5 1 2 3 4
Or into a vector with c():
c(matrix(1:5, 5, 6, byrow=TRUE)[, -6])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4
For the sake of variety, here is a second method to return the vector:
# construct the larger vector
temp <- rep(1:5, 6)
# for each value x, find the (x+1)-th occurrence of x in temp; those positions are dropped
temp[-sapply(1:5, function(x) which(temp == x)[x+1])]
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4
Given m:
m <- structure(c(5, 1, 3, 2, 1, 4, 5, 2, 5, 1, 1, 5, 1, 4, 0, 4, 5,
5, 3, 2, 0, 0, 3, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(7L,
5L))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 2 0 0 0
# [2,] 1 5 4 3 0
# [3,] 3 1 5 0 0
# [4,] 2 1 5 3 0
# [5,] 1 5 3 2 0
# [6,] 4 1 2 3 0
# [7,] 5 4 0 0 0
Consider the element 1: it appears in 5 rows (2, 3, 4, 5, 6), and the respective column indices are (1, 2, 2, 1, 2). I would like to have the following:
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2
As another example, consider the element 2: it appears in 4 rows (1, 4, 5, 6), and the respective column indices are (2, 1, 4, 3). Together with the rows for 1, we have:
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2
2 1 2
2 4 1
2 5 4
2 6 3
What I want is an n-by-3 matrix covering all the values 1 to 5, preferably in base R.
A convenient way to do this is to convert m to a sparse matrix with Matrix() from the Matrix package, because your desired output is very close to the (row, column, value) triplet representation of a sparse matrix:
library(Matrix)
summary(Matrix(m, sparse = TRUE))
# 7 x 5 sparse Matrix of class "dgCMatrix", with 23 entries
# i j x
# 1 1 1 5
# 2 2 1 1
# 3 3 1 3
# 4 4 1 2
# 5 5 1 1
# 6 6 1 4
# 7 7 1 5
# 8 1 2 2
# 9 2 2 5
# 10 3 2 1
# 11 4 2 1
# 12 5 2 5
# 13 6 2 1
# 14 7 2 4
# 15 2 3 4
# 16 3 3 5
# 17 4 3 5
# 18 5 3 3
# 19 6 3 2
# 20 2 4 3
# 21 4 4 3
# 22 5 4 2
# 23 6 4 3
To see it better:
library(dplyr)
summary(Matrix(m, sparse = TRUE)) %>% arrange(x)
# i j x
# 1 2 1 1
# 2 5 1 1
# 3 3 2 1
# 4 4 2 1
# 5 6 2 1
# 6 4 1 2
# 7 1 2 2
# 8 6 3 2
# 9 5 4 2
# 10 3 1 3
# 11 5 3 3
# 12 2 4 3
# 13 4 4 3
# 14 6 4 3
# 15 6 1 4
# 16 7 2 4
# 17 2 3 4
# 18 1 1 5
# 19 7 1 5
# 20 2 2 5
# 21 5 2 5
# 22 3 3 5
# 23 4 3 5
We can use which() with arr.ind = TRUE:
cbind(val= 1, which(m==1, arr.ind=TRUE))
# val row col
#[1,] 1 2 1
#[2,] 1 5 1
#[3,] 1 3 2
#[4,] 1 4 2
#[5,] 1 6 2
For multiple values, as @RHertel mentioned:
for(i in 1:5) print(cbind(i,which(m==i, arr.ind=TRUE)))
Or with lapply (shown here for the values 1 and 2; use 1:5 for all of them):
do.call(rbind, lapply(1:2, function(i) {
m1 <-cbind(val=i,which(m==i, arr.ind=TRUE))
m1[order(m1[,2]),]}))
# val row col
#[1,] 1 2 1
#[2,] 1 3 2
#[3,] 1 4 2
#[4,] 1 5 1
#[5,] 1 6 2
#[6,] 2 1 2
#[7,] 2 4 1
#[8,] 2 5 4
#[9,] 2 6 3
Since the OP asked for base R solutions, the above should help. But if somebody wants a compact solution:
library(reshape2)
melt(m)
and then subset the values of interest.
Just use row() and col():
> out <- data.frame(m = as.vector(m), row = as.vector(row(m)), col = as.vector(col(m)))
> out
m row col
1 5 1 1
2 1 2 1
3 3 3 1
4 2 4 1
5 1 5 1
...
Subset, sort, and print as desired.
> tmp <- out[order(out$m, out$row), ]
> print(subset(tmp, m==1), row.names=FALSE)
m row col
1 2 1
1 3 2
1 4 2
1 5 1
1 6 2
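To get the full n-by-3 matrix for all values 1 to 5 in one pass and without a loop, one base R sketch is to call `which(..., arr.ind = TRUE)` once for every cell in the range, look the values up with matrix indexing, and sort by value and then by row. It assumes, as in the example, that the zeros should be left out.

```r
m <- structure(c(5, 1, 3, 2, 1, 4, 5, 2, 5, 1, 1, 5, 1, 4, 0, 4, 5,
  5, 3, 2, 0, 0, 3, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(7L, 5L))

# row/column positions of every cell holding a value between 1 and 5
idx <- which(m >= 1 & m <= 5, arr.ind = TRUE)
# m[idx] uses the two-column index matrix to fetch the value at each position
res <- cbind(val = m[idx], idx)
# order by value, then by row, to match the layout in the question
res <- res[order(res[, "val"], res[, "row"]), ]
head(res, 9)
```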
Let's say I have a data frame with the following structure:
DF <- data.frame(x = 0:4, y = 5:9)
> DF
x y
1 0 5
2 1 6
3 2 7
4 3 8
5 4 9
What is the most efficient way to turn 'DF' into a data frame with the following structure?
w x y
1 0 5
1 1 6
2 1 6
2 2 7
3 2 7
3 3 8
4 3 8
4 4 9
where w is a length-2 window rolling through the data frame 'DF'. The length of the window should be arbitrary; e.g., a length of 3 yields:
w x y
1 0 5
1 1 6
1 2 7
2 1 6
2 2 7
2 3 8
3 2 7
3 3 8
3 4 9
I am a bit stumped by this problem because the data frame can also contain an arbitrary number of columns, e.g. w, x, y, z, etc.
Edit 2: I've realized edit 1 is a bit unreasonable, as xts doesn't seem to handle multiple observations per data point.
My approach would be to use the embed function. The first step is to create a rolling sequence of indices into a vector. Take a data frame:
df <- data.frame(x = 0:4, y = 5:9)
nr <- nrow(df)
w <- 3 # window size
i <- 1:nr # indices of the rows
iw <- embed(i,w)[, w:1] # matrix of rolling-window indices of length w
> iw
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
wnum <- rep(1:nrow(iw),each=w) # window number
inds <- i[c(t(iw))] # the indices flattened, to use below
dfw <- sapply(df, '[', inds)
dfw <- transform(data.frame(dfw), w = wnum)
> dfw
x y w
1 0 5 1
2 1 6 1
3 2 7 1
4 1 6 2
5 2 7 2
6 3 8 2
7 2 7 3
8 3 8 3
9 4 9 3
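One caveat with `sapply(df, '[', inds)`: it returns a matrix, so a data frame with mixed column types (say, a character column next to numeric ones) gets coerced to a single type. A minimal sketch that keeps the column types is to index the data frame's rows directly. The helper name `roll_windows()` is hypothetical, and it builds the same rolling indices with `outer()` instead of `embed()`.

```r
# Hypothetical helper: stack every window of `size` consecutive rows of df,
# labelling each copy with its window number in a leading column w
roll_windows <- function(df, size) {
  nr <- nrow(df)
  stopifnot(size >= 1, size <= nr)
  starts <- seq_len(nr - size + 1)
  # column s of the outer() result is s, s + 1, ..., s + size - 1
  inds <- as.vector(outer(seq_len(size) - 1, starts, "+"))
  # df[inds, ] keeps each column's type, unlike sapply()
  out <- data.frame(w = rep(starts, each = size), df[inds, , drop = FALSE])
  rownames(out) <- NULL
  out
}

DF <- data.frame(x = 0:4, y = 5:9)
roll_windows(DF, 2)
```

Since it indexes whole rows, the helper works unchanged for any number of columns.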