Sum by groups in matrix in R

I am starting to learn R and don't know if there is an easy way to sum every n consecutive elements along each row of a matrix, wrapping around to the first columns when the window runs past the last column, until every column has been computed. For example, given the matrix
[1 4 7]
[2 5 8]
[3 6 9]
so in this case if n=2 the output should be
[5 11 8]
[7 13 10]
[9 15 12]
Is there an efficient way? Thank you!

data:
m <- matrix(1:9, 3, 3)
setting:
n = 2
code:
t(
apply(m, 1, function(x) { zoo::rollsum(c(x,x), n, align = "left")[seq_along(x)] })
)
result:
# [,1] [,2] [,3]
#[1,] 5 11 8
#[2,] 7 13 10
#[3,] 9 15 12
Your homework :-)
Make your next question a clear one.
Read about every function I have used: e.g., type ?t, ?apply, etc. into the R console.
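If you would rather avoid the zoo dependency, the same wrap-around sum can be sketched in base R. The helper name circ_rollsum is hypothetical (it does not come from the answer above); it adds n column-shifted copies of the matrix, with the column index wrapping around via modular arithmetic.

```r
# Hypothetical helper: circular rolling sum of width n along each row
circ_rollsum <- function(m, n) {
  p <- ncol(m)
  out <- matrix(0, nrow(m), p)
  for (k in 0:(n - 1)) {
    # column indices shifted left by k positions, wrapping past the last column
    idx <- (seq_len(p) - 1 + k) %% p + 1
    out <- out + m[, idx, drop = FALSE]
  }
  out
}

m <- matrix(1:9, 3, 3)
res <- circ_rollsum(m, 2)
res
```

For the example matrix and n = 2 this should reproduce the result shown above, and because the indices wrap, it also works when n is larger than the number of columns.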

Related

How can I select a specific element across all rows from a list within a tibble in R

I have a tibble in which one column is a list containing 2x2 matrices. I want to be able to select a specific element from the matrices across all rows in the tibble. I am able to select a specific element from one tibble row using indexing:
t1 <- tibble(x = 1:2, y = 1, z = x ^ 2 + y)
rM1 <- matrix(c(2,3,1,4), nrow=2, ncol=2, byrow = TRUE)
rM2 <- matrix(c(10,19,9,15), nrow=2, ncol=2, byrow = TRUE)
t1$my.lists <- list(rM1,rM2)
t1[[4]][[2]][[2,2]]
[1] 15
However when I try to access that specific element across multiple rows I get an error:
t1[[4]][1:2][[2,2]]
Error in t1[[4]][1:2][[2, 2]] : incorrect number of subscripts
I have also tried using piping and functions such as slice but still haven't been able to achieve the desired result. In this example I expect a return of:
[1] 4 15
where 4 is the [2, 2] element of rM1 and 15 is the [2, 2] element of rM2. Of course I could write a loop to achieve this, but I assume there is also a more direct way to do it.
We can use sapply to loop over the list column number 4, and extract the elements based on row/column index
sapply(t1[[4]], function(x) x[2, 2])
#[1] 4 15
Or with map
library(dplyr)
library(purrr)
t1 %>%
mutate(new = map_dbl(my.lists, ~ .x[2, 2]))
# A tibble: 2 x 5
# x y z my.lists new
# <int> <dbl> <dbl> <list> <dbl>
#1 1 1 2 <dbl[,2] [2 × 2]> 4
#2 2 1 5 <dbl[,2] [2 × 2]> 15
The OP's code didn't work because the expression below returns a list, and a list cannot be indexed with two subscripts such as [[2, 2]]
t1[[4]][1:2]
#[[1]]
# [,1] [,2]
#[1,] 2 3
#[2,] 1 4
#[[2]]
# [,1] [,2]
#[1,] 10 19
#[2,] 9 15
and the row/column indexing can be done by selecting each list element one by one or using a loop
t1[[4]][1:2][[2]][2,2]
#[1] 15
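A type-checked variant of the sapply call can be sketched with vapply, which fails loudly if any element does not return a single number. The example below uses a plain list named my.lists in place of the tibble column, so it runs without extra packages; that stand-alone setup is an assumption made for illustration.

```r
rM1 <- matrix(c(2, 3, 1, 4), nrow = 2, ncol = 2, byrow = TRUE)
rM2 <- matrix(c(10, 19, 9, 15), nrow = 2, ncol = 2, byrow = TRUE)
my.lists <- list(rM1, rM2)  # stands in for t1$my.lists

# Extract element [2, 2] from every matrix; numeric(1) declares the expected result type
res <- vapply(my.lists, function(x) x[2, 2], numeric(1))

# Same idea, passing the row and column indices straight to `[`
res2 <- vapply(my.lists, `[`, numeric(1), 2, 2)
res
```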

I want to find the depreciation using colSums

I have a table (let's name it development.costs), as below:
1 2 3 4
4 5 6 7
8 9 1 2
Each column represents a year.
I want to create another table (let's name it depreciation.costs) with the same dimensions such that:
in each row, each element of each column is equal to:
0.4*[(Sum of all elements of that row of the development.costs table up until the year of the element) - (Sum of all elements of that row of the depreciation.costs table up until one year before)]
so I want to create a table
a b c d
e f g h
i j k l
such that e.g. c =0.4*[(1+2+3) - (a+b)]
the code I managed to write is
for (y in Years)
{depreciation.costs[y, ] <- 0.4*(colSums(development.costs[1:y], )-colSums(depreciation.costs[1:(y-1), ]))}
where Years <- 1:4
but this is wrong since the system gives me the error
Error in colSums(depreciation.rate[, 1:(y - 1)]) :
'x' must be an array of at least two dimensions
many thanks for any feedback
This seems to match the algorithm you describe, though it's difficult to tell from your description what the values in the first column are supposed to be.
Here's your data in matrix form:
dev_costs <- t(matrix(c(1:4, 4:7, 8:9, 1:2), nrow = 4))
dev_costs
#> [,1] [,2] [,3] [,4]
#> [1,] 1 2 3 4
#> [2,] 4 5 6 7
#> [3,] 8 9 1 2
We can easily make a cumulative sum of rows like this:
cum_dev <- t(apply(dev_costs, 1, cumsum))
Then an iterative loop to complete the algorithm:
answer <- cum_dev
for(i in seq(ncol(cum_dev))[-1])
{
answer[,i] <- 0.4 * (cum_dev[,i] - rowSums(answer[,1:(i-1), drop = FALSE]))
}
Giving us
answer
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.8 1.68 2.608
#> [2,] 4 2.0 3.60 4.960
#> [3,] 8 3.6 2.56 2.336
Created on 2020-03-06 by the reprex package (v0.3.0)
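The first column is where the description is ambiguous. If the formula is taken literally, so that the first year is also multiplied by 0.4 (a = 0.4 * 1), a sketch could look like the following. The function name depreciate and its rate argument are my own assumptions, not part of the question.

```r
dev_costs <- t(matrix(c(1:4, 4:7, 8:9, 1:2), nrow = 4))

# Hypothetical helper: applies the 0.4 rule to every year, including the first
depreciate <- function(dev, rate = 0.4) {
  dep <- matrix(0, nrow(dev), ncol(dev))
  cum_dev <- numeric(nrow(dev))  # development costs summed up to the current year
  cum_dep <- numeric(nrow(dev))  # depreciation summed up to the previous year
  for (i in seq_len(ncol(dev))) {
    cum_dev <- cum_dev + dev[, i]
    dep[, i] <- rate * (cum_dev - cum_dep)
    cum_dep <- cum_dep + dep[, i]
  }
  dep
}

dep_costs <- depreciate(dev_costs)
dep_costs
```

Keeping running totals avoids recomputing rowSums over a growing slice on every iteration, which is also what removes the need for drop = FALSE.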

How does the 'group' argument in rowsum work?

I understand what rowsum() does, but I'm trying to get it to work for myself. I've used the example provided in R which is structured as such:
x <- matrix(runif(100), ncol = 5)
group <- sample(1:8, 20, TRUE)
xsum <- rowsum(x, group)
What is the matrix of values stored in xsum, and how are the values obtained? What I thought was happening was that the values in group would state how many entries from the matrix to use in each row sum. For example, say that group = (2,4,3,1,5). I thought this would mean that the first two entries, going by row, would be summed to give the first entry of xsum. It appears that this is not what is happening.
rowsum adds all rows that have the same group value. Let us take a simpler example.
m <- cbind(1:4, 5:8)
m
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
group <- c(1, 1, 2, 2)
rowsum(m, group)
## [,1] [,2]
## 1 3 11
## 2 7 15
Since the first two rows belong to group 1 and the last two rows to group 2, it sums the first two rows to give the first row of the output, and it sums the last two rows to give the second row of the output.
rbind(`1` = m[1, ] + m[2, ], `2` = m[3, ] + m[4, ])
## [,1] [,2]
## 1 3 11
## 2 7 15
That is, the 3 is formed by adding the 1 from row 1 of m and the 2 from row 2 of m. The 11 is formed by adding the 5 from row 1 of m and the 6 from row 2 of m.
The 7 and the 15 are formed similarly.
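The rows of a group do not have to be adjacent. As a small sketch (the labels "a" and "b" are made up for illustration), interleaved groups are still summed together, and with the default reorder = TRUE the output rows come back sorted by group label:

```r
m <- cbind(1:4, 5:8)
group <- c("b", "a", "b", "a")  # rows 1 and 3 are "b", rows 2 and 4 are "a"

res <- rowsum(m, group)
res

# The same sums computed by hand with logical row selection
manual <- rbind(
  a = colSums(m[group == "a", , drop = FALSE]),
  b = colSums(m[group == "b", , drop = FALSE])
)
```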

Filling dataframe with loops

I have a dataframe:
Start <- data.frame("Number" = 2,"Square" = 4,"Cube" = 8)
A Vector of inputs:
Numbers <- c(3,5)
I want to apply the function SquareCube to each element of Numbers and fill the dataframe with the results:
SquareCube <- function(x) {
  c(x^2, x^3)
}
Desired Output:
Filled <- data.frame("Number" = c(2,3,5),"Square" = c(4,9,25),"Cube" = c(8,27,125))
Note: I already searched for this topic, but in this case the length of the vector Numbers can vary. My intent is to fill the dataframe with the results of the function.
Thanks
If I am reading your question right, you may just be having issues with structure that do.call may be able to help with. I also redefined the function slightly to accommodate the naming:
Start <- data.frame("Number" = 2,"Square" = 4, "Cube" = 8)
Number <- c(3,5)
Define your function:
SquareCube <- function(x){ list(Number=x,Square=x^2,Cube=x^3) }
Then construct the data frame with desired end results:
> rbind(Start, data.frame( do.call(cbind, SquareCube(Number)) ))
Number Square Cube
1 2 4 8
2 3 9 27
3 5 25 125
You can also make a wrapper function and just hand it the Start data and the original Number list that you want to process, which will yield a data frame:
> makeResults <- function(a, b) { rbind(a, data.frame(do.call(cbind,SquareCube(b)))) }
> makeResults(Start, Number)
Number Square Cube
1 2 4 8
2 3 9 27
3 5 25 125
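Because ^ is vectorised, the do.call(cbind, ...) step is arguably optional. A minimal sketch of the same idea builds the new rows directly with data.frame and binds them on, so it works for a Numbers vector of any length:

```r
Start <- data.frame(Number = 2, Square = 4, Cube = 8)
Numbers <- c(3, 5)

# x^2 and x^3 operate on the whole vector at once, so no explicit loop is needed
new_rows <- data.frame(Number = Numbers, Square = Numbers^2, Cube = Numbers^3)
Filled <- rbind(Start, new_rows)
Filled
```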
The outer() function produces a matrix containing exactly the values you want. You can then convert it to a data frame and rename the columns.
(Filled <- outer(
c(2, 3, 5),
1:3,
FUN = "^"
))
#> [,1] [,2] [,3]
#> [1,] 2 4 8
#> [2,] 3 9 27
#> [3,] 5 25 125
For this matrix, you can use any function you know to
change the class
change the column names
Here, for instance, dplyr::rename():
library(tidyverse)
Filled %>%
as_tibble() %>% # make data frame
rename(Number = V1, Square = V2, Cube = V3) # rename column names
#> # A tibble: 3 x 3
#> Number Square Cube
#> <dbl> <dbl> <dbl>
#> 1 2 4 8
#> 2 3 9 27
#> 3 5 25 125
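If loading the tidyverse feels heavy for this, the conversion and renaming can also be sketched in base R. This is only an alternative to the pipeline above, not a correction of it:

```r
powers <- outer(c(2, 3, 5), 1:3, FUN = "^")

# as.data.frame() turns the matrix into a data frame; names<- sets the column names
Filled <- as.data.frame(powers)
names(Filled) <- c("Number", "Square", "Cube")
Filled
```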

Swap (selected/subset) data frame columns in R

What is the simplest way to swap the order of a selected subset of columns in a data frame in R? The answers I have seen (Is it possible to swap columns around in a data frame using R?) use all indices / column names for this. If one has, say, 100 columns and needs either 1) to swap column 99 with column 1, or 2) to move column 99 before column 1 (keeping column 1, now as column 2), the suggested approaches appear cumbersome. Funny that there is no small package around for this (Wickham's "reshape"?). Or can someone suggest some simple code?
If you really want a shortcut for this, you could write a couple of simple functions, such as the following.
To swap the position of two columns:
swapcols <- function(x, col1, col2) {
  # Accept column names as well as numeric indices
  if (is.character(col1)) col1 <- match(col1, colnames(x))
  if (is.character(col2)) col2 <- match(col2, colnames(x))
  if (any(is.na(c(col1, col2)))) stop("One or both columns don't exist.")
  # Build the full index vector, then exchange the two positions
  i <- seq_len(ncol(x))
  i[col1] <- col2
  i[col2] <- col1
  x[, i]
}
To move a column from one position to another:
movecol <- function(x, col, to.pos) {
  # Accept a column name as well as a numeric index
  if (is.character(col)) col <- match(col, colnames(x))
  if (is.na(col)) stop("Column doesn't exist.")
  if (to.pos > ncol(x) || to.pos < 1) stop("Invalid position.")
  # Drop the column from the index vector, then re-insert it at to.pos
  x[, append(seq_len(ncol(x))[-col], col, to.pos - 1)]
}
And here are examples of each:
(m <- matrix(1:12, ncol=4, dimnames=list(NULL, letters[1:4])))
# a b c d
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
swapcols(m, col1=1, col2=3) # using column indices
# c b a d
# [1,] 7 4 1 10
# [2,] 8 5 2 11
# [3,] 9 6 3 12
swapcols(m, 'd', 'a') # or using column names
# d b c a
# [1,] 10 4 7 1
# [2,] 11 5 8 2
# [3,] 12 6 9 3
movecol(m, col='a', to.pos=2)
# b a c d
# [1,] 4 1 7 10
# [2,] 5 2 8 11
# [3,] 6 3 9 12
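For the 100-column case in the question, the same index manipulation can also be written inline without helper functions. The sketch below uses a made-up data frame df with default column names V1 to V100:

```r
# Illustrative data: 2 rows, 100 columns named V1..V100
df <- as.data.frame(matrix(seq_len(200), nrow = 2))

# 1) Swap column 99 with column 1
swapped <- df[, replace(seq_along(df), c(1, 99), c(99, 1))]

# 2) Move column 99 to the front, so the old column 1 becomes column 2
moved <- df[, c(99, setdiff(seq_along(df), 99))]

names(swapped)[c(1, 99)]
names(moved)[1:3]
```

Either way, only the positions being changed are typed out; seq_along(df) supplies the remaining indices.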
