I want to find the depreciation using colSums - r

I have a table (let's name it development.costs), as below:
1 2 3 4
4 5 6 7
8 9 1 2
Each column represents a year.
I want to create another table (let's name it depreciation.costs) with the same dimensions such that:
in each row, the element for a given year is equal to:
0.4 * [(sum of that row of the development.costs table up to and including that year) - (sum of that row of the depreciation.costs table up to the previous year)]
So I want to create a table
a b c d
e f g h
i j k l
such that, for example, c = 0.4 * [(1+2+3) - (a+b)].
The code I managed to write is:
for (y in Years)
{depreciation.costs[y, ] <- 0.4*(colSums(development.costs[1:y], )-colSums(depreciation.costs[1:(y-1), ]))}
where Years <- 1:4,
but this is wrong, since R gives me the error:
Error in colSums(depreciation.rate[, 1:(y - 1)]) :
'x' must be an array of at least two dimensions
Many thanks for any feedback.

This seems to match the algorithm you describe, though it's difficult to tell from your description what the values in the first column are supposed to be.
Here's your data in matrix form:
dev_costs <- t(matrix(c(1:4, 4:7, 8:9, 1:2), nrow = 4))
dev_costs
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    2    3    4
#> [2,]    4    5    6    7
#> [3,]    8    9    1    2
We can easily make a cumulative sum of rows like this:
cum_dev <- t(apply(dev_costs, 1, cumsum))
Then an iterative loop to complete the algorithm:
answer <- cum_dev
for (i in seq(ncol(cum_dev))[-1]) {
  answer[, i] <- 0.4 * (cum_dev[, i] - rowSums(answer[, 1:(i - 1), drop = FALSE]))
}
Giving us
answer
#>      [,1] [,2] [,3]  [,4]
#> [1,]    1  0.8 1.68 2.608
#> [2,]    4  2.0 3.60 4.960
#> [3,]    8  3.6 2.56 2.336
Created on 2020-03-06 by the reprex package (v0.3.0)
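Two points the answer leaves implicit can be sketched in a few lines. First, the original loop fails because selecting a single row with `[` drops the result to a plain vector, which `colSums` rejects; `drop = FALSE` keeps it a matrix. Second, the answer keeps the first year's value equal to the development cost. If the 0.4 factor should instead apply from the first year (an assumption, since the question does not say), a running total of past depreciation gives the variant below. The names `dep` and `dep_so_far` are introduced here for illustration only.

```r
dev_costs <- rbind(c(1, 2, 3, 4), c(4, 5, 6, 7), c(8, 9, 1, 2))

# Why the original loop errors: a single selected row is dropped to a vector
is.null(dim(dev_costs[1:0, ]))       # TRUE: 1:0 is c(1, 0), i.e. row 1 only
dim(dev_costs[1, , drop = FALSE])    # 1 4: still a matrix

# Variant (assumption): apply the 0.4 rate from the first year onward
cum_dev <- t(apply(dev_costs, 1, cumsum))     # cumulative costs per row
dep <- matrix(0, nrow(dev_costs), ncol(dev_costs))
dep_so_far <- numeric(nrow(dev_costs))        # depreciation already booked per row
for (y in seq_len(ncol(dev_costs))) {
  dep[, y] <- 0.4 * (cum_dev[, y] - dep_so_far)
  dep_so_far <- dep_so_far + dep[, y]
}
dep
```

Under that assumption the first row comes out as 0.4, 1.04, 1.824, 2.6944.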

Related

Is there an R function to retrieve values from a matrix of column names?

I have a matrix M consisting of column names from a data frame with one row, such that each column name has just one corresponding value in the data frame. Is there a function to create a new matrix with the corresponding values from the column names in M?
M <- t(data.frame(A=c("label_1","label_2","label_3"),
B=c("label_4","label_5","label_6"),
C=c("label_7","label_8","label_9")))
M
  [,1]      [,2]      [,3]
A "label_1" "label_2" "label_3"
B "label_4" "label_5" "label_6"
C "label_7" "label_8" "label_9"
df <- data.frame(label_2=5, label_1=0, label_4=7,
label_6=15, label_3=12, label_5=11,
label_9=9, label_8=15, label_7=35)
df
  label_2 label_1 label_4 label_6 label_3 label_5 label_9 label_8 label_7
1       5       0       7      15      12      11       9      15      35
## I want to create a new data.frame with the values from these labels
  [,1] [,2] [,3]
A    0    5   12
B    7   11   15
C   35   15    9
One possible way I'm aware of is to convert the data frame df to a key-value pair, with k = column names and v = values. I could then retrieve the values using:
apply(M,2,function(x){df[df$k==x,"v"]})
But this seems overcomplicated for what should be a simple operation.
Additionally, I would prefer not to use any libraries other than dplyr or tidyr, to minimize the dependencies needed in my code.
Update: simpler code using Onyambu's suggestion:
M <- t(data.frame(A=c("label_1","label_2","label_3"),
B=c("label_4","label_5","label_6"),
C=c("label_7","label_8","label_9")))
df <- data.frame(label_2=5, label_1=0, label_4=7,
label_6=15, label_3=12, label_5=11,
label_9=9, label_8=15, label_7=35)
P<-matrix(df[c(M)],nrow(M))
P
     [,1] [,2] [,3]
[1,]    0    5   12
[2,]    7   11   15
[3,]   35   15    9
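Note that `df[c(M)]` is still a data frame, so `matrix()` typically builds a list-matrix from it rather than a numeric one. A minimal sketch, assuming a plain numeric matrix carrying the row names of `M` is wanted; `M` is built directly with `matrix()` here for brevity, and `P_num` is a name introduced for illustration:

```r
M <- matrix(paste0("label_", 1:9), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), NULL))
df <- data.frame(label_2 = 5, label_1 = 0, label_4 = 7,
                 label_6 = 15, label_3 = 12, label_5 = 11,
                 label_9 = 9, label_8 = 15, label_7 = 35)

# unlist() turns the selected one-row data frame into a numeric vector;
# c(M) reads M column by column, which matches how matrix() fills P_num
P_num <- matrix(unlist(df[1, c(M)]), nrow = nrow(M), dimnames = dimnames(M))
P_num
is.numeric(P_num)
```

Because the lookup is by column name, the order of the columns in `df` does not matter.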

How does the 'group' argument in rowsum work?

I understand what rowsum() does, but I'm trying to get it to work for myself. I've used the example provided in R which is structured as such:
x <- matrix(runif(100), ncol = 5)
group <- sample(1:8, 20, TRUE)
xsum <- rowsum(x, group)
What is the matrix of values stored in xsum, and how are the values obtained? I thought the values in group would state how many entries from the matrix to use in each row sum. For example, say that group = (2,4,3,1,5). I thought this would mean that the first two entries, going by row, would be summed to give the first entry of xsum. It appears that this is not what is happening.
rowsum adds all rows that have the same group value. Let us take a simpler example.
m <- cbind(1:4, 5:8)
m
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
group <- c(1, 1, 2, 2)
rowsum(m, group)
##   [,1] [,2]
## 1    3   11
## 2    7   15
Since the first two rows correspond to group 1 and the last two rows to group 2, it sums the first two rows to give the first row of the output, and the last two rows to give the second row of the output.
rbind(`1` = m[1, ] + m[2, ], `2` = m[3, ] + m[4, ])
##   [,1] [,2]
## 1    3   11
## 2    7   15
That is, the 3 is formed by adding the 1 from row 1 of m and the 2 from row 2 of m. The 11 is formed by adding the 5 from row 1 of m and the 6 from row 2 of m.
The 7 and the 15 are formed similarly.
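The group values do not need to be sorted or contiguous. A small sketch with interleaved groups (the vector `group2` is made up for illustration) shows that rows are matched by their group label, and that the output rows are ordered by sorted group value under the default `reorder = TRUE`:

```r
m <- cbind(1:4, 5:8)
group2 <- c(2, 1, 2, 1)    # rows 1 and 3 belong to group 2, rows 2 and 4 to group 1

res <- rowsum(m, group2)
res

# The same sums computed by hand, group 1 first
rbind(`1` = m[2, ] + m[4, ], `2` = m[1, ] + m[3, ])
```

Here group 1 sums to 6 and 14 (rows 2 and 4), and group 2 sums to 4 and 12 (rows 1 and 3).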

Nearest Neighbors from KKNN package in R giving garbage indices values when the entire dataset is used

I am using the "kknn" package in R to find all of the nearest neighbors for every row in the data set. For some reason, the last row in the test dataset is always ignored. Below are the R code and the output obtained.
X1 <- c(0.6439659, 0.1923593, 0.3905551, 0.7728847, 0.7602632)
X2 <- c(0.9147394, 0.6181713, 0.8515923, 0.8459367, 0.9296278)
Class <- c(1, 1, 0, 0, 0)
Data <- data.frame(X1,X2,Class)
Data$Class <- as.factor(Data$Class)
library("kknn")
### Here, the object Data serves as both the training and the testing data set
Neighbors.KNN <- kknn(Data$Class~., Data,Data,k = 5, distance =2, kernel = "gaussian")
## Output
## Column 5 in the output below is filled with garbage values, and the first value in the last row is 4 when it should be 5.
Neighbors.KNN$C
     [,1] [,2] [,3] [,4]    [,5]
[1,]    1    4    3    2 3245945
[2,]    2    3    4    1 3245945
[3,]    3    1    4    2 3245945
[4,]    4    1    3    2 3245945
[5,]    1    4    3    2 3245945
Could someone let me know if I am doing something wrong or if that is a bug in the package?
The current implementation (silently) assumes that k is smaller than n, the number of rows. In general k << n, and then this is no problem. The (k+1)th nearest neighbor is used to scale the distances. I should have mentioned this in the documentation.
Regards,
Klaus
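Following that explanation, a hedged sketch of the workaround: ask for at most n - 1 neighbors, so that a (k+1)th neighbor still exists. This assumes the kknn package is installed. The formula is written as `Class ~ .` here, a change from the question's `Data$Class ~ .`, which can cause `Class` to be picked up as a predictor as well. The names `k_max` and `fit` are chosen for illustration.

```r
X1 <- c(0.6439659, 0.1923593, 0.3905551, 0.7728847, 0.7602632)
X2 <- c(0.9147394, 0.6181713, 0.8515923, 0.8459367, 0.9296278)
Data <- data.frame(X1, X2, Class = factor(c(1, 1, 0, 0, 0)))

k_max <- nrow(Data) - 1    # largest k for which a (k+1)th neighbor exists

# Only run the fit when the package is available
if (requireNamespace("kknn", quietly = TRUE)) {
  library(kknn)
  fit <- kknn(Class ~ ., train = Data, test = Data,
              k = k_max, distance = 2, kernel = "gaussian")
  print(fit$C)             # neighbor indices, one row per test observation
}
```

With k = 4 on these five rows, every index in `fit$C` should refer to an actual row of `Data`.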

Deleting inverses in a matrix in R

I have initially a matrix, p:
# p is a matrix
p
     A B
[1,] 1 1
[2,] 2 3
[3,] 3 2
[4,] 1 1
[5,] 8 2
For a given matrix, I want to iterate through the rows and remove any inversions, so that the new matrix is:
p
     A B
[1,] 1 1
[2,] 2 3
[3,] 8 2
This is what I got:
p<-unique(p) # gets rid of duplicates
output<-lapply(p, function(x){
check<-which(p$A[x,] %in% p$B[x,])#is the value in row x of column A found in
#column B if so return the row number it was found in column B
if (length(check)!=0 ){
if(p$A[check,]== p$B[x]){ # now check if at the found row (check)of p$A is equal to p$B[x]
p<-p[-check,] #if so remove that inverse
}
}
}
)
I get this message Error in which(p$A[x] %in% p$B[x]) :
Why am I getting this Error?
Is there a better way to find inversions?
Try
p <- unique(p)
# sort each row and paste it into a key; the separator keeps multi-digit values apart
p[!duplicated(apply(p, 1, function(x) paste(sort(x), collapse = " "))), ]
#     A B
#[1,] 1 1
#[2,] 2 3
#[3,] 8 2
data
p <- matrix(c(1,2,3,1,8, 1,3,2,1,2),
dimnames=list(NULL, c("A", "B")), ncol=2)
It's not clear whether the order of values is important in your final output, but perhaps you can make use of pmin and pmax.
Here's an approach using those functions within "data.table":
library(data.table)
unique(as.data.table(p)[, list(A = pmin(A, B), B = pmax(A, B))])
#    A B
# 1: 1 1
# 2: 2 3
# 3: 2 8
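If the original orientation of the surviving rows matters (so that `8 2` is kept as is rather than turned into `2 8`), the same `pmin`/`pmax` idea can be combined with `duplicated` in base R. A minimal sketch, with `key` and `result` as illustrative names:

```r
p <- matrix(c(1, 2, 3, 1, 8,  1, 3, 2, 1, 2),
            dimnames = list(NULL, c("A", "B")), ncol = 2)

# Order-independent key: smaller value first, larger value second
key <- cbind(pmin(p[, "A"], p[, "B"]), pmax(p[, "A"], p[, "B"]))

# Keep the first occurrence of each key; rows keep their original orientation
result <- p[!duplicated(key), , drop = FALSE]
result
```

This keeps rows 1, 2 and 5 of the original matrix, which matches the output requested in the question.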
The question is a bit unclear. Based on your example, I am assuming that you want to remove the row containing "3 2" because its first value occurs in the second column (in a different row). In that case
check <- which(p[,1] %in% p[,2])
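returns the rows whose value in column A also appears somewhere in column B. On the example data, however, that is rows 1 to 4, so deleting everything it returns would remove more than the inverse row; the second check (that the matching row holds the same pair in reverse order) is still needed.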

Swap (selected/subset) data frame columns in R

What is the simplest way to swap the order of a selected subset of columns in a data frame in R? The answers I have seen (Is it possible to swap columns around in a data frame using R?) spell out all indices or column names. If one has, say, 100 columns and needs either 1) to swap column 99 with column 1, or 2) to move column 99 before column 1 (so that column 1 becomes column 2), the suggested approaches appear cumbersome. It is odd that there is no small package for this (Wickham's "reshape"?). Can anyone suggest some simple code?
If you really want a shortcut for this, you could write a couple of simple functions, such as the following.
To swap the position of two columns:
swapcols <- function(x, col1, col2) {
  if (is.character(col1)) col1 <- match(col1, colnames(x))
  if (is.character(col2)) col2 <- match(col2, colnames(x))
  if (any(is.na(c(col1, col2)))) stop("One or both columns don't exist.")
  i <- seq_len(ncol(x))
  i[col1] <- col2
  i[col2] <- col1
  x[, i]
}
To move a column from one position to another:
movecol <- function(x, col, to.pos) {
  if (is.character(col)) col <- match(col, colnames(x))
  if (is.na(col)) stop("Column doesn't exist.")
  # `||` is the scalar, short-circuiting "or" meant for if() conditions
  if (to.pos > ncol(x) || to.pos < 1) stop("Invalid position.")
  x[, append(seq_len(ncol(x))[-col], col, to.pos - 1)]
}
And here are examples of each:
(m <- matrix(1:12, ncol=4, dimnames=list(NULL, letters[1:4])))
#      a b c  d
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
swapcols(m, col1=1, col2=3) # using column indices
#      c b a  d
# [1,] 7 4 1 10
# [2,] 8 5 2 11
# [3,] 9 6 3 12
swapcols(m, 'd', 'a') # or using column names
#       d b c a
# [1,] 10 4 7 1
# [2,] 11 5 8 2
# [3,] 12 6 9 3
movecol(m, col='a', to.pos=2)
#      b a c  d
# [1,] 4 1 7 10
# [2,] 5 2 8 11
# [3,] 6 3 9 12
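Since the question is about data frames with many columns, here is a sketch of the two requested operations using plain index vectors. The 100-column data frame `wide` is made up for illustration; `as.data.frame()` on an unnamed matrix names the columns `V1` to `V100`.

```r
wide <- as.data.frame(matrix(seq_len(200), nrow = 2))   # columns V1..V100

# 1) Swap columns 1 and 99
idx <- seq_along(wide)
idx[c(1, 99)] <- idx[c(99, 1)]
swapped <- wide[, idx]
names(swapped)[c(1, 99)]

# 2) Move column 99 in front of column 1 (old column 1 becomes column 2)
moved <- wide[, c(99, setdiff(seq_along(wide), 99))]
names(moved)[1:3]
```

Only the columns being moved need to be named explicitly; the rest of the index vector is generated.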
