Suppose I have a data set with 40 observations
y <- rnorm(40,10,10)
Now I would like to transform this vector into a matrix with 4 observations in each row.
On top of that, I would like row i to start with value y[i], with the starting index increasing by one for each row, up until the 40th observation.
So for example:
r1 = y[1] y[2] y[3] y[4]
r2 = y[2] y[3] y[4] y[5]
r3 = y[3] y[4] y[5] y[6]
.
.
r40 = y[39] y[38] y[37] y[36]
Does anyone know how to do this?
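You can use matrix and rely on recycling: with 41 rows, a vector of length 40 wraps around so that each column starts one element later than the previous one (R issues a warning about the data length, which can be ignored here):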
y <- 1:40
matrix(y, 41, 4)[1:37,]
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 2 3 4 5
# [3,] 3 4 5 6
#...
#[35,] 35 36 37 38
#[36,] 36 37 38 39
#[37,] 37 38 39 40
Or build an index matrix with seq in mapply and then fill it with the values of y:
i <- 1:37
M <- t(mapply(seq, i, i+3))
M
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 2 3 4 5
# [3,] 3 4 5 6
#...
#[35,] 35 36 37 38
#[36,] 36 37 38 39
#[37,] 37 38 39 40
M[] <- y[M]
This is one way to produce the first 37 rows. If you want to change the direction for the last 3 rows, then it would be easy to do with the same code:
library(purrr)  # provides map() and re-exports %>%
map(seq_len(37), ~ y[.x:(.x + 3)]) %>%
  unlist() %>%
  matrix(nrow = 37, byrow = TRUE)
The only difference would be to first save the values of the first 37 rows, then produce the last 3 rows, bind them, and turn that vector into a matrix.
Try embed
embed(y, 4)[, 4:1]
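embed returns each window in reverse order, hence the column reversal with 4:1. This gives the 37 complete windows of the desired output.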
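For a window width other than 4, the same result can be built from an explicit index matrix. This is only a minimal sketch: the names w, n_rows, idx and win are made up for illustration, and 1:40 stands in for the random y so the result is easy to check by eye.

```r
y <- 1:40                      # stand-in data, as in the answers above
w <- 4                         # window width (hypothetical name)
n_rows <- length(y) - w + 1    # number of complete windows: 37

# idx[r, c] = r + c - 1, so row r holds the positions r, r + 1, ..., r + w - 1
idx <- outer(seq_len(n_rows), seq_len(w) - 1, `+`)
win <- matrix(y[idx], nrow = n_rows)

# Compare against the embed() approach
all.equal(win, embed(y, w)[, w:1])
```

Because the positions are computed explicitly, changing w is enough to get a different window width.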
Suppose I have a matrix A of dimensions n x m, a starting cell (i,j), and a constant k which satisfies k < n x m.
I need a way to extract the values inside A that are within k steps of the starting cell, where a step is either a column move or a row move.
Then I'm looking to sum the extracted values in 2 groups: one group consists of the sums taken along the columns of the original matrix, and the other of the sums taken along its rows.
It is important for me that this addresses situations where the starting cell is within k steps of the edge of the matrix.
Example set (I'm heavily simplifying here):
> #create matrix where m = 7,n = 7
> Mat <- sample(1:49,49) %>% matrix(7,7)
>
> #declare starting cell where (i = 4, j = 2)
> i = 4
> j = 2
>
> #declare number of steps
> k = 2
>
> Mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 25 35 29 10 16 46 23
[2,] 32 43 7 5 31 1 14
[3,] 36 19 49 45 13 41 47
[4,] 17 18 48 9 3 28 12
[5,] 26 6 30 33 20 2 11
[6,] 40 24 39 21 37 38 8
[7,] 4 15 34 22 27 44 42
> Mat[i,j]
[1] 18
For this example, the output would be two vectors (one for column sums and one for row sums):
> Columnsum <- c(sum(36,17,26) , #sum(Mat[3:5,1])
+ sum(43,19,18,6,24), #sum(Mat[2:6,2])
+ sum(49,48,30), #sum(Mat[3:5,3])
+ sum(9)) #sum(Mat[4:4,4])
>
> Rowsum <- c(sum(43), #sum(Mat[2,2:2])
+ sum(36,19,49), #sum(Mat[3,1:3])
+ sum(17,18,48,9), #sum(Mat[4,1:4])
+ sum(26,6,30), #sum(Mat[5,1:3])
+ sum(24)) #sum(Mat[6,2:2])
>
> Columnsum
[1] 79 110 127 9
> Rowsum
[1] 43 104 92 62 24
You could 'remove' parts of your matrix Mat with entries more than k steps away from (i,j) by overwriting them with NA:
Mat[abs(row(Mat) - i) + abs(col(Mat) - j) > k] <- NA
Then remove the rows and columns that are entirely NA:
Mat <- Mat[rowSums(is.na(Mat)) != ncol(Mat), colSums(is.na(Mat)) != nrow(Mat)]
And finally you can compute the row and column sums:
Columnsum <- colSums(Mat, na.rm = TRUE)
Rowsum <- rowSums(Mat, na.rm = TRUE)
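If you would rather not overwrite Mat, the same idea can be expressed with a logical mask. The following is a sketch under the assumption that only sums are needed (cells outside the mask are multiplied by zero, which would not be appropriate for other summaries); the name keep is made up for illustration, and the matrix is typed in from the question so the result can be checked against the expected output.

```r
# The example matrix from the question, entered row by row
Mat <- matrix(c(25, 35, 29, 10, 16, 46, 23,
                32, 43,  7,  5, 31,  1, 14,
                36, 19, 49, 45, 13, 41, 47,
                17, 18, 48,  9,  3, 28, 12,
                26,  6, 30, 33, 20,  2, 11,
                40, 24, 39, 21, 37, 38,  8,
                 4, 15, 34, 22, 27, 44, 42),
              nrow = 7, byrow = TRUE)
i <- 4; j <- 2; k <- 2

# TRUE for cells at most k row/column steps away from (i, j)
keep <- abs(row(Mat) - i) + abs(col(Mat) - j) <= k

# Sum only the kept cells, then drop columns/rows that contain no kept cell
Columnsum <- colSums(Mat * keep)[colSums(keep) > 0]
Rowsum    <- rowSums(Mat * keep)[rowSums(keep) > 0]

Columnsum  # 79 110 127 9
Rowsum     # 43 104 92 62 24
```

Cells beyond the edge of the matrix simply do not exist in row(Mat) and col(Mat), so a starting cell near the border needs no special handling.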
I have been trying to prepare a training data set for a CNN in keras, but I can't find a way to set up the data properly.
In the keras CNN example, they use the MNIST data set, in which:
library(keras)
img_rows <- 28
img_cols <- 28
mnist <- dataset_mnist()
x_train <- mnist$train$x
x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1))
Which outputs an array with this dim:
class(x_train)
[1] "array"
dim(x_train)
[1] 60000 28 28 1
I have a dataframe like this:
x = data.frame(c(1,10,19,28),c(2,11,20,29),c(3,12,21,30),c(4,13,22,31),c(5,14,23,32),c(6,15,24,33),c(7,16,25,34),c(8,17,26,35),c(9,18,27,36))
Each row represents a 3x3 image like this:
1 2 3
4 5 6
7 8 9
I am trying this:
x = as.integer(unlist(x))
x = array_reshape(x, c(4,3,3, 1))
This returns 3 matrices with scrambled numbers. How can I properly transform my data.frame for a CNN in keras?
If I understand correctly, you are trying to reshape your training data (x) into an array of 4 elements, each a 3x3 matrix. If that is the case, try this:
> aperm(array(t(x), dim = c(3, 3, 4)), perm = c(2,1,3))
, , 1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
, , 2
[,1] [,2] [,3]
[1,] 10 11 12
[2,] 13 14 15
[3,] 16 17 18
, , 3
[,1] [,2] [,3]
[1,] 19 20 21
[2,] 22 23 24
[3,] 25 26 27
, , 4
[,1] [,2] [,3]
[1,] 28 29 30
[2,] 31 32 33
[3,] 34 35 36
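If the goal is the samples x rows x columns x channels layout of the MNIST example in the question (here 4 x 3 x 3 x 1), the same transpose-then-permute idea can be taken one step further in base R. This is a sketch only: it does not call keras, and the names arr and x_cnn are made up for illustration.

```r
# The data frame from the question: one row per 3x3 image
x <- data.frame(c(1,10,19,28), c(2,11,20,29), c(3,12,21,30),
                c(4,13,22,31), c(5,14,23,32), c(6,15,24,33),
                c(7,16,25,34), c(8,17,26,35), c(9,18,27,36))

# After t(), each column holds one image; filling a 3 x 3 x 4 array
# column-major gives arr[col, row, image]
arr <- array(t(as.matrix(x)), dim = c(3, 3, 4))

# Reorder the axes to [image, row, col], then append a channel axis of size 1
x_cnn <- aperm(arr, c(3, 2, 1))
dim(x_cnn) <- c(dim(x_cnn), 1)

dim(x_cnn)        # 4 3 3 1
x_cnn[1, , , 1]   # first image, rows 1 2 3 / 4 5 6 / 7 8 9
```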
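This might help, assuming x is a list of 3x3 matrices (one per image) rather than the data frame above:
# Create an empty array of the size you want
x_array <- array(NA, dim = c(3, 3, length(x)))
# Copy each element of the list into its own slice of the array
for (i in seq_along(x)) {
x_array[,, i] <- x[[i]]
}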
So I have taken a look at an earlier question about summing every 2 values in each row of a matrix: "sum specific columns among rows". I also took a look at another question, "R Sum every k columns in matrix", which is more similar to mine. I could not get the solution from that question to work. Here is the code that I am working with:
y <- matrix(1:27, nrow = 3)
y
m1 <- as.matrix(y)
n <- 3
dim(m1) <- c(nrow(m1)/n, ncol(m1), n)
res <- matrix(rowSums(apply(m1, 1, I)), ncol=n)
identical(res[1,],rowSums(y[1:3,]))
sapply(split.default(y, 0:(length(y)-1) %/% 3), rowSums)
I just get an error message when applying this. The desired output is a matrix with the following values:
[,1] [,2] [,3]
[1,] 12 39 66
[2,] 15 42 69
[3,] 18 45 72
To sum consecutive sets of n elements from each row, you just need to write a function that does the summing and apply it to each row:
n <- 3
t(apply(y, 1, function(x) tapply(x, ceiling(seq_along(x)/n), sum)))
# 1 2 3
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
Transform the matrix to an array and use colSums (as suggested by @nongkrong):
y <- matrix(1:27, nrow = 3)
n <- 3
a <- y
dim(a) <- c(nrow(a), n, ncol(a)/n)  # row x position within group x group
b <- aperm(a, c(2,1,3))
colSums(b)
# [,1] [,2] [,3]
#[1,] 12 39 66
#[2,] 15 42 69
#[3,] 18 45 72
Of course this assumes that ncol(y) is divisible by n.
PS: You can of course avoid creating so many intermediate objects. They are there for didactic purposes.
I would do something similar to the OP -- apply rowSums on subsets of the matrix:
n = 3
ng = ncol(y)/n
sapply( 1:ng, function(jg) rowSums(y[, (jg-1)*n + 1:n ]))
# [,1] [,2] [,3]
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
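Another option, sketched here as an alternative rather than as a better answer, is base R's rowsum(), which adds up the rows of a matrix by group. Transposing first turns the columns of y into rows that can be grouped; the name grp is made up for illustration.

```r
y <- matrix(1:27, nrow = 3)
n <- 3

# Group label for every column of y: 1 1 1 2 2 2 3 3 3
grp <- ceiling(seq_len(ncol(y)) / n)

# rowsum() sums rows by group, so work on t(y) and transpose back
res <- t(rowsum(t(y), group = grp))
res
```

Because the group labels are computed per column, this also runs when ncol(y) is not a multiple of n, in which case the last group is simply shorter.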
I have a numeric matrix mat and a logical matrix ind of the same size. My goal is very basic: apply a function fun columnwise on entries, indicated by ind. Here's an example:
set.seed(42)
mat <- matrix(1:20, 4)
ind <- matrix(sample(c(F, T), 20, replace = T), 4)
fun <- function(x) sum(x)
The "active" subset for fun is:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 . .
[3,] . 7 . . .
[4,] 4 . 12 16 20
My current solution:
sapply(1:ncol(mat), function(i) fun(mat[ind[, i], i]))
[1] 7 18 31 29 37
It does the job, but I have a feeling I'm missing something very simple and elegant. Any ideas?
Edit: here's another function that does not tolerate extra zeroes, so multiplying mat * ind won't work well.
fun2 <- function(x) sd(x)
sapply(1:ncol(mat), function(i) fun2(mat[ind[, i], i]))
[1] 1.527525 1.000000 1.527525 2.121320 2.121320
In this case, using apply over columns, while multiplying with the logical matrix should work:
> apply(mat*ind,2,fun)
[1] 7 18 31 29 37
To better handle FALSE in more general cases (which evaluates to 0 in the multiplication) you could change it to NA and have a function that knows how to handle NAs, e.g.:
ind<- ifelse(ind == FALSE, NA, ind)
fun <- function(x) sum(x, na.rm=TRUE)
apply(mat*ind,2,fun)
This can be generalized to whatever function you want to apply, with an explicit handling of NAs.
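As a sketch of that generalization, the mask can be applied with replace() so that neither mat nor ind is modified, and the NA handling is left to the function's own na.rm argument. Here ind is typed in by hand to match the "active" subset shown in the question (so the example does not depend on the random seed), and masked is a made-up name.

```r
mat <- matrix(1:20, 4)
# Logical mask matching the "active" subset shown in the question
ind <- matrix(c(TRUE, TRUE,  FALSE, TRUE,
                TRUE, TRUE,  TRUE,  FALSE,
                TRUE, TRUE,  FALSE, TRUE,
                TRUE, FALSE, FALSE, TRUE,
                TRUE, FALSE, FALSE, TRUE), 4)

# Blank out the inactive entries instead of multiplying them by zero
masked <- replace(mat, !ind, NA)

sums <- colSums(masked, na.rm = TRUE)          # 7 18 31 29 37
sds  <- apply(masked, 2, sd, na.rm = TRUE)     # same as fun2 on the subset
```

This only helps for functions that accept na.rm (or can be wrapped to drop NAs); for anything else the sapply or tapply approaches are the safer choice.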
A very easy approach with tapply:
tapply(mat[ind], col(mat)[ind], fun)
# 1 2 3 4 5
# 7 18 31 29 37
Another approach with mapply:
mapply(function(m, i) fun(m[i]), split(mat, col(mat)), split(ind, col(mat)))
# 1 2 3 4 5
# 7 18 31 29 37
Try this:
colSums(mat * ind)
#[1] 7 18 31 29 37
I have the following data frame and vector.
> y
v1 v2 v3
1 1 6 43
2 4 7 5
3 0 2 32
> v
[1] 1 2 3
I want to apply the following function to every ROW in that data frame such that v is added to every ROW of y:
x <- function(vector1,vector2) {
x <- vector1 + vector2
}
... in order to get THESE results:
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35
mapply applies the function to COLUMNS:
> z <- mapply(x, y, MoreArgs=list(vector2=v))
> z
v1 v2 v3
[1,] 2 7 44
[2,] 6 9 7
[3,] 3 5 35
I've tried transposing the data frame so that the function will be applied to rows and not columns, but mapply gives me weird results after transposing:
> transposed <- t(y)
> transposed
[,1] [,2] [,3]
v1 1 4 0
v2 6 7 2
v3 43 5 32
> z <- mapply(x, transposed, MoreArgs=list(vector2=v))
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 2 7 44 5 8 6 1 3 33
[2,] 3 8 45 6 9 7 2 4 34
[3,] 4 9 46 7 10 8 3 5 35
...Help?
############################ EDIT #########################
Thanks for all the answers! I'm learning tons of new R functions that I've never seen before, which is fantastic.
I want to clarify my earlier question a bit. What I'm really asking is a much more general question - how to apply a multi-parameter function to each row in R (at the moment, I'm tempted to conclude that I should just use a loop, but I would like to figure out if it IS possible, just for future reference...) (I also purposefully refrained from showing the code I'm working with since it's kind of messy).
I tried using the sweep function as was suggested, but I get the following error:
testsweep <- function(vector, z, n) {
testsweep <- z
}
> n <- names(Na_exp)
> n
[1] "NaCl.10000.2hr.AVG_Signal" "NaCl.10000.4hr.AVG_Signal"
> t <- head(Li_fcs,n=1)
> t
LiCl.1000.1hr.FoldChange LiCl.2000.1hr.FoldChange LiCl.5000.1hr.FoldChange
[1,] -0.05371838 -0.1010928 -0.01939986
LiCl.10000.1hr.FoldChange LiCl.1000.2hr.FoldChange
[1,] 0.1275617 -0.107154
LiCl.2000.2hr.FoldChange LiCl.5000.2hr.FoldChange
[1,] -0.06760782 -0.09770226
LiCl.10000.2hr.FoldChange LiCl.1000.4hr.FoldChange
[1,] -0.1124188 -0.06140386
LiCl.2000.4hr.FoldChange LiCl.5000.4hr.FoldChange
[1,] -0.04323497 -0.04275953
LiCl.10000.4hr.FoldChange LiCl.1000.8hr.FoldChange
[1,] 0.03633496 0.01879461
LiCl.2000.8hr.FoldChange LiCl.5000.8hr.FoldChange
[1,] 0.257977 -0.06357423
LiCl.10000.8hr.FoldChange
[1,] 0.07214176
> z <- colnames(Li_fcs)
> z
[1] "LiCl.1000.1hr.FoldChange" "LiCl.2000.1hr.FoldChange"
[3] "LiCl.5000.1hr.FoldChange" "LiCl.10000.1hr.FoldChange"
[5] "LiCl.1000.2hr.FoldChange" "LiCl.2000.2hr.FoldChange"
[7] "LiCl.5000.2hr.FoldChange" "LiCl.10000.2hr.FoldChange"
[9] "LiCl.1000.4hr.FoldChange" "LiCl.2000.4hr.FoldChange"
[11] "LiCl.5000.4hr.FoldChange" "LiCl.10000.4hr.FoldChange"
[13] "LiCl.1000.8hr.FoldChange" "LiCl.2000.8hr.FoldChange"
[15] "LiCl.5000.8hr.FoldChange" "LiCl.10000.8hr.FoldChange"
But when I try to apply sweep...
> test <- sweep(t, 2, z, n, FUN="testsweep")
Error in if (check.margin) { : argument is not interpretable as logical
In addition: Warning message:
In if (check.margin) { :
the condition has length > 1 and only the first element will be used
When I remove the n parameter from this test example, sweep works fine. This suggests to me that sweep cannot be used unless all the parameters provided to sweep either have the same number of columns as the t vector or are of length 1. Please correct me if I am mistaken...
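On the sweep error in the edit: a likely cause, judging from the error message, is argument matching rather than a length restriction. The signature is sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...), so with FUN given by name a fourth unnamed argument is matched to check.margin instead of being passed on to FUN. A small sketch with made-up data and a hypothetical function add_scaled, passing the extra argument by name:

```r
m <- matrix(1:6, nrow = 2)
stats <- c(10, 20, 30)

# Hypothetical three-argument function: x is the matrix, s the swept
# statistics (already expanded to the shape of x), scale an extra parameter
add_scaled <- function(x, s, scale) x + s * scale

# Extra arguments reach FUN through ... only when they are named
res <- sweep(m, 2, stats, FUN = add_scaled, scale = 2)
res
#      [,1] [,2] [,3]
# [1,]   21   43   65
# [2,]   22   44   66
```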
You are asking to sweep v across the columns of y, so that v[j] is added to column j (and hence v to every row), with the "+" function:
sweep(y, 2, v, FUN="+")
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35
If your actual problem is really no more complicated than this, you can take advantage of R's recycling rules. You need to transpose y first, then add, then transpose the result because R matrices are stored in column-major order.
t(t(y)+v)
v1 v2 v3
1 2 8 46
2 5 9 8
3 1 4 35
I don't think you need mapply here. Just use t() directly or you can use rep() to make the recycling match as you want:
> set.seed(1)
> mat <- matrix(sample(1:100, 9, TRUE), ncol = 3)
> vec <- 1:3
>
> mat
[,1] [,2] [,3]
[1,] 27 91 95
[2,] 38 21 67
[3,] 58 90 63
#Approach 1 using t()
> ans1 <- t(t(mat) + vec)
#Approach 2 using rep()
> ans2 <- mat + rep(vec, each = nrow(mat))
#Are they the same?
> identical(ans1, ans2)
[1] TRUE
#Hurray!
> ans1
[,1] [,2] [,3]
[1,] 28 93 98
[2,] 39 23 70
[3,] 59 92 66
How about using apply?
t(apply(y, 1, function(x) x + v))
[,1] [,2] [,3]
[1,] 2 8 46
[2,] 5 9 8
[3,] 1 4 35
apply collects the result of each call as a column of its output, so the rows of y come back as columns and the result has to be transposed.
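For the more general question of applying a multi-parameter function to each row, apply forwards any additional named arguments to the function. A minimal sketch, where combine and its offset and scale parameters are made up for illustration:

```r
y <- data.frame(v1 = c(1, 4, 0), v2 = c(6, 7, 2), v3 = c(43, 5, 32))
v <- c(1, 2, 3)

# Hypothetical function of one row plus two further parameters
combine <- function(r, offset, scale = 1) r + offset * scale

# Named arguments after FUN are handed to combine() on every call;
# t() turns the per-row results (returned as columns) back into rows
res <- t(apply(as.matrix(y), 1, combine, offset = v, scale = 1))
res
```

With scale = 1 this reproduces the result asked for in the question (2 8 46 / 5 9 8 / 1 4 35).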
I would definitely take a look at mdply from the plyr package. It does exactly what you want:
mdply(data.frame(mean = 1:5, sd = 1:5), rnorm, n = 2)