Count nonzero elements along an axis in PyTorch

I have a (32x750) tensor
tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0043],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0043],
[ 0.0000, 0.0044, 0.0000, ..., 0.0044, 0.0000, 0.0000],
...,
[ 0.0059, 0.0000, 0.0059, ..., 0.0059, 0.0000, 0.0000],
[ 0.0059, 0.0000, 0.0059, ..., 0.0059, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0056, 0.0000]], device='cuda:0')
And I want to get the number of nonzero elements along each row, something like [12 47 0 5 .... 8 7 50].
This discussion and this one didn't solve my problem; they concern counting the nonzero elements of a 1-D tensor.
Thanks

Problem solved using this post
I used: 750 - (tensor == 0).sum(dim=1)

A straightforward (but slow) option is to iterate over the rows:
list_of_num_nonzero_in_each_row = []
for row in my_tensor:
    # (row != 0) is a boolean mask; summing it counts the nonzero entries
    list_of_num_nonzero_in_each_row.append((row != 0).sum().item())

Torch now has a count_nonzero function built in. This vectorized implementation is going to be faster than iterating over the tensor, and it supports counting over a given dimension. For the per-row counts asked for here, reduce over dim=1:
> torch.count_nonzero(x, dim=1)

Related

Implementing a function to calculate the variance of the period-to-period change of a Markov chain

I am working on a research project, and a while back I asked this question on Mathematics Stack Exchange, looking for a way to calculate the variance of the period-to-period change in income given a transition matrix, where each state corresponds to a log level of income in a given vector. I want to calculate the variance of an individual's change in income over some number of periods n, given that they began in each state. My state space consists of 11 states, so I hope to end up with a vector of 11 variances. I received a satisfactory answer to that question, but I am running into issues when trying to code it in R, and I was hoping for help with that.
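To restate the target in formula form (this is what my code below implements): starting from state i, the distribution over states after n periods is \pi_i^{(n)} = e_i^\top P^n, and the variance of the n-period change in log income is

\mathrm{Var}_i \;=\; \sum_{j} \big(\pi_i^{(n)}\big)_j \,(\alpha_j - \alpha_i)^2 \;-\; \Big( \sum_{j} \big(\pi_i^{(n)}\big)_j \,(\alpha_j - \alpha_i) \Big)^{2},

i.e. the usual E[X^2] - (E[X])^2 decomposition with X = \alpha_j - \alpha_i.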
I have created this piece of code to calculate the variances:
install.packages("expm")
library(expm)
# creating standard basis vectors
# (note: this reads alpha from the calling environment, so alpha must exist first)
e <- function(i) {
  e_i = rep(0, length(alpha))
  e_i[i] = 1
  return(e_i)
}
# compute variances
p2p_variance = function(n, alpha, P) {
  variance = list()
  pi_n = list()
  for (i in 1:length(alpha)) {
    pi_n[[i]] = e(i) %*% (P %^% n)     # state distribution after n periods, starting in state i
    beta = (t(alpha) - t(alpha)[i])^2  # squared deviations (alpha_j - alpha_i)^2
    variance[[i]] = (pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2)
  }
  return(t(variance))
}
And for my values of alpha (vector of log levels of income) and P (transition matrix) I use:
alpha = c(3.4965, 3.5835, 3.6636, 3.7377, 3.8067, 3.8712, 3.9318, 3.9890, 4.0431, 4.0943, 4.1431)
P = rbind(c(0.9004, 0.0734, 0.0203, 0.0043, 0.0010, 0.0003, 0.0001, 0.0001, 0.0000, 0.0000, 0.0000),
c(0.3359, 0.3498, 0.2401, 0.0589, 0.0115, 0.0026, 0.0007, 0.0003, 0.0001, 0.0001, 0.0000),
c(0.1583, 0.1538, 0.3931, 0.2346, 0.0481, 0.0090, 0.0021, 0.0007, 0.0003, 0.0001, 0.0001),
c(0.0746, 0.0609, 0.1600, 0.4368, 0.2178, 0.0397, 0.0073, 0.0019, 0.0006, 0.0002, 0.0001),
c(0.0349, 0.0271, 0.0559, 0.1724, 0.4628, 0.2031, 0.0344, 0.0067, 0.0018, 0.0006, 0.0003),
c(0.0155, 0.0122, 0.0230, 0.0537, 0.1817, 0.4870, 0.1860, 0.0316, 0.0066, 0.0018, 0.0009),
c(0.0066, 0.0054, 0.0100, 0.0204, 0.0529, 0.1956, 0.4925, 0.1772, 0.0307, 0.0064, 0.0023),
c(0.0025, 0.0023, 0.0043, 0.0084, 0.0186, 0.0530, 0.2025, 0.4980, 0.1760, 0.0275, 0.0067),
c(0.0009, 0.0009, 0.0017, 0.0035, 0.0072, 0.0168, 0.0490, 0.2025, 0.5194, 0.1721, 0.0260),
c(0.0003, 0.0003, 0.0007, 0.0013, 0.0029, 0.0061, 0.0142, 0.0430, 0.2023, 0.5485, 0.1804),
c(0.0001, 0.0001, 0.0002, 0.0003, 0.0008, 0.0017, 0.0032, 0.0068, 0.0212, 0.1079, 0.8578))
For instance, a call of p2p_variance(100, alpha, P) (calculating the variance over 100 periods) results in the following vector of variances:
0.04393012 0.04091066 0.03856503 0.03636202 0.03472286 0.03331921 0.03213084 0.03068901 0.03143765 0.03255994 0.03522346
These seem plausible. However, if I run p2p_variance(1000, alpha, P), the result is:
0.06126449 0.03445073 0.009621497 -0.01447615 -0.03652425 -0.05752316 -0.07753646 -0.09726683 -0.1134972 -0.1287498 -0.141676
This is obviously not correct, since variance cannot be negative. I cannot figure out why simply increasing n to 1000 results in negative values here. I have most likely coded my p2p_variance function incorrectly, but I cannot for the life of me find the issue. Or is the process I am using to find these variances flawed somehow? I would really appreciate it if anyone could look over this code and help me diagnose the issue.
Your p2p_variance function is returning a difference, which can come out negative; if you want its absolute value as the variance, just wrap the expression in abs() like this:
p2p_variance = function(n, alpha, P) {
  variance = list()
  pi_n = list()
  for (i in 1:length(alpha)) {
    pi_n[[i]] = e(i) %*% (P %^% n)
    beta = (t(alpha) - t(alpha)[i])^2
    variance[[i]] = abs((pi_n[[i]] %*% t(beta)) - (((pi_n[[i]] %*% alpha) - alpha[i]) %^% 2))
  }
  return(t(variance))
}
p2p_variance(1000, alpha, P)
Output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.06126449 0.03445073 0.009621497 0.01447615 0.03652425 0.05752316 0.07753646 0.09726683 0.1134972 0.1287498 0.141676

R: Scale a vector with negative and positive numbers between -1 and 1

I have a vector as follows
vec <- c(0, -0.072, -0.092, -0.092, -0.222, -0.445, -0.345, -0.031,
0.016, 0.158, 0.349, 0.749, 1.182, 1.289, 1.578, 1.767, 1.621,
1.666, 1.892, 1.866, 1.821, 1.702, 1.69, 1.53, 1.38, 1.494, 1.833,
2.392, 2.502, 2.921, 3.363, 3.698, 3.645, 3.89, 3.987, 4.066,
3.963, 3.749, 3.512, 3.259, 3.153, 2.972, 2.918, 2.93, 2.719,
2.458, 2.275, 2.346, 2.588, 2.774, 2.607, 2.336, 1.799, 1.365,
1.025, 0.379, -0.087, -0.765, -1.19, -1.423, -1.751, -1.965,
-1.907, -1.919, -1.848, -1.772, -1.49, -1.19, -1.104, -1.138,
-1.054, -1.139, -1.269, -1.429, -1.56, -1.543, -1.364, -1.318,
-1.094, -1.061, -0.918, -0.861, -0.913, -0.767, -0.615, -0.532,
-0.615, -0.688, -0.75, -0.724, -0.755, -0.685, -0.752, -0.863,
-0.944, -1.004, -1.02, -1.041, -1.073, -1.392)
The following code scales this vector between -1 and 1 perfectly fine.
scale <- function(input) {
  min_val = min(input, na.rm = T)
  max_val = max(input, na.rm = T)
  average = (min_val + max_val) / 2
  range = (max_val - min_val) / 2
  normalized_x = (input - average) / range
  return(normalized_x)
}
However, I want to scale this vector from -1 to 1 while keeping the midpoint at 0.
Can someone please improve the above function to center this scaling around 0?
Thanks!
Calling this operation "normalization" is rather confusing; the better term is scaling. Normalization usually means you have transformed the values to something resembling a Normal distribution. (There is also a base R function named scale.)
This will scale the values below 0 to the range [-1, 0) and the values above 0 to the range (0, 1], which is what I understand to be the goal:
c( -vec[vec<0]/min(vec), vec[vec>=0]/max(vec) )
They are not in the original order, however. If the original order is desired, you can use ifelse, keeping the full vector in both branches so the elements stay aligned with the test (subsetting inside ifelse would recycle the branch vectors and scramble the mapping):
newvec <- ifelse(vec < 0, -vec/min(vec), vec/max(vec))
#-------------
> sum(newvec < 0)
[1] 51
> sum(newvec > 0)
[1] 48
> sum(newvec == 0)
[1] 1
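If you want this in the same drop-in form as the scale() function above, a minimal sketch could look like the following (the name scale_about_zero is just illustrative, and it assumes the input contains both negative and positive values):
scale_about_zero <- function(input) {
  # negatives are divided by |min|, positives by max, so each side maps
  # into [-1, 0) / (0, 1] and 0 stays exactly at 0
  ifelse(input < 0,
         -input / min(input, na.rm = TRUE),
         input / max(input, na.rm = TRUE))
}
range(scale_about_zero(vec))  # -1 to 1, with the midpoint fixed at 0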

"Streamlining" an R code for population models

I was wondering if I could get some help streamlining code I have written for a class at uni. I have essentially been thrown into the deep end with R in the past week (so I know very little), and I wanted to know if there is an obvious way to streamline this so it isn't as clunky!
I am calculating the settlement rate of a population of barnacles on the rocky shore (as per Hines 1979). I have my script up and running for my three species at four different settlement rates, no problem; I just want to know how I could neaten it up a bit. The script is as follows:
# Roughgarden et al 1985
# Six age classes. Data from Roughgardenetal1985_1Species.xls
# Population projection matrix
############################### C.FISSUS #####################################
# s = 1.0
A <- matrix(c(0.8609, 1.4062, 1.9515, 2.4957, 2.6825, 2.8339,
0.1522, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.2378, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.1000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.1000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.1000, 0.0000
), nrow=6, byrow=TRUE)
par(mfrow=c(2,4))
# Starting population vector
N0 <- matrix(c(0, 0, 0, 0, 0, 0), nrow=6, byrow=TRUE)
# Settlement per unit free space (per cm2 / 100 = per mm2), for each species use: 1.0, 0.1, 0.01, and 0.001
s <-1.0
# Area occupied by age classes (mm2)
Ax <- matrix(c(2.33,9.45,15.15,18.78,20.92,22.14), nrow=6, byrow=TRUE)
# Set up matrix to store population stage (rows) structure over time (cols)
Nt <- matrix(data = 0, ncol = 50, nrow = 6)  # Create a matrix to store results
Nt[, 1] <- N0  # Make the first element (col 1) equal to N0
for (step in 1:49) {  # Step through time, calculating Nt+1 each time
  Nt[, step + 1] <- A %*% Nt[, step]        # Apply mortality
  AreaOfBarnacles <- Ax * Nt[, step + 1]    # Calculate area occupied by surviving barnacles
  Ft <- max(100 - sum(AreaOfBarnacles), 0)  # Calculate free space
  print(sum(AreaOfBarnacles))
  Nt[1, step + 1] <- s * Ft                 # Number of new recruits
}
#Nt
# Transpose Nt for plotting
TNt <- t(Nt)
matplot(TNt, xlab = "Time, t", ylab = "Population Size, Nt", type="l", main = "Chthamalus fissus")
title(main="s = 1.0", line = 0.5)
I essentially need to run this part of the script a total of 12 times: four times for each of the three species, with a different s value each time (1, 0.1, 0.01, and 0.001). I wanted to add a bit that would, in effect, say "run this script under these four settlement rates and produce four graphs each time", so that this section would only be repeated three times (once per species). However, I couldn't get it to work and ended up doing it the long way!
Thank you so much for taking the time to read this lengthy question, like I said, I'm VERY new to R (and coding in general) so I do apologise if anything I am asking is stupid!
P.S. (bonus round?)
How would I add a legend to these graphs without it getting in the way? Is there a way I can make a legend that is its own image so it doesn't overlay my graphs?
You can wrap your operations into a function:
## Defining the function
population.projection <- function(settlement, matrix_A, area_occupied) {
# Starting population vector
N0 <- matrix(c(0, 0, 0, 0, 0, 0), nrow=6, byrow=TRUE)
# Set up matrix to store population stage (rows) structure over time (cols)
Nt<-matrix(data=0, ncol=50, nrow=6) # Create a vector to store results
Nt[,1]<-N0 # Make the first element (col 1) equal to N0
for (step in 1:49) { # Step through time, calculating Nt+1 each time
Nt[,step+1]<- matrix_A %*% Nt[,step] # Apply mortality
AreaOfBarnacles <- area_occupied * Nt[,step+1] # Calculate area occupied by surviving barnacles
Ft <- max(100 - sum(AreaOfBarnacles),0) # Calculate free space
# print(sum(AreaOfBarnacles))
Nt[1,step+1] <- settlement * Ft # Number of new recruits
}
# Transpose Nt for plotting
return(t(Nt))
}
This function takes your s variable and the two matrices A and Ax, renamed settlement, matrix_A, and area_occupied to be more self-explanatory.
You can then input your data:
## matrix_A input
matrix_A<- matrix(c(0.8609, 1.4062, 1.9515, 2.4957, 2.6825, 2.8339,
0.1522, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.2378, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.1000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.1000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.1000, 0.0000
), nrow=6, byrow=TRUE)
## Area occupied by age classes (mm2)
area_occupied <- matrix(c(2.33,9.45,15.15,18.78,20.92,22.14), nrow=6, byrow=TRUE)
## Setting the s values
my_settlement_values <- c(1, 0.1, 0.01, 0.001)
And loop through your settlement values for plotting the results:
## Setting the graphic parameters
par(mfrow = c(2, 2))
## Looping through the s values
for (one_settlement in my_settlement_values) {
  ## Plotting the results
  matplot(population.projection(settlement = one_settlement, matrix_A, area_occupied),
          xlab = "Time, t", ylab = "Population Size, Nt", type = "l",
          main = "Chthamalus fissus")
  ## Adding the title
  title(main = paste("s =", one_settlement), line = 0.5)
}
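To cover all 12 runs, the same idea extends to a second loop over species: put each species' projection matrix in a named list and loop over its names. A sketch, where the matrices for the second and third species are placeholders (substitute your own, and likewise their Ax vectors if those differ):
## Looping over species (placeholder matrices for species 2 and 3)
species_matrices <- list("Chthamalus fissus" = matrix_A,
                         "Species 2" = matrix_A,  # placeholder: this species' matrix
                         "Species 3" = matrix_A)  # placeholder: this species' matrix
for (species_name in names(species_matrices)) {
  par(mfrow = c(2, 2))  # one page of four plots per species
  for (one_settlement in my_settlement_values) {
    matplot(population.projection(one_settlement, species_matrices[[species_name]], area_occupied),
            xlab = "Time, t", ylab = "Population Size, Nt", type = "l", main = species_name)
    title(main = paste("s =", one_settlement), line = 0.5)
  }
}
As for the legend (the bonus round): one option is to reserve space outside the panels, e.g. an outer margin via par(oma = ...), and draw a single legend there with clipping disabled (xpd = NA), so it never overlays the data.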

Determining/extracting the weights of prior observations on a forecasted value using ets() in R

I am trying to determine the influence that prior observations in a time series have on a forecasted value using exponential smoothing, i.e. the weights of the prior observations. For example, if I have a time series with 30 observations, and I forecast to time 34, I would like to determine the weight of observations 1-30 on the prediction for time 34.
I am using the ets() function in R, and letting the function determine the best model for the most part. I am fitting models to many different time series, most of which have a seasonality component.
Here is a code example:
> obs <- c(0.448, 0.63, 0.761, 0.904, 0.994, 1.122,
1.235, 1.29, 1.336, 1.396, 1.447, 1.518,
1.585, 1.602, 1.617, 1.528, 0.432, 0.571,
0.687, 0.81, 0.932, 1.006, 1.047, 1.106,
1.185, 1.248, 1.283, 1.323, 1.384, 1.404,
1.419, 1.375, 0.357, 0.498, 0.633, 0.739,
0.846, 0.936, 1.02, 1.103, 1.172, 1.238, 1.3, 1.358)
> ts_obj <- ts(obs, start=c(2015 , 1) , frequency = 16)
> fit <- ets(ts_obj, lambda = BoxCox.lambda(ts_obj))
> print(fit)
ETS(A,N,A)
Call:
ets(y = ts_obj, lambda = BoxCox.lambda(ts_obj))
Box-Cox transformation: lambda= -0.0012
Smoothing parameters:
alpha = 0.6842
gamma = 0.1166
Initial states:
l = 0.1642
s=0.3307 0.3667 0.3574 0.3359 0.296 0.25
0.2071 0.1673 0.1156 0.0623 -0.0284 -0.1294 -0.2724 -0.4426 -0.6466 -0.9695
sigma: 0.0205
AIC AICc BIC
-137.7304 -106.0637 -103.8308
> my_forecast <- forecast(fit, h = 4)
> print(my_forecast)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2017.750 1.415694 1.379033 1.453332 1.360012 1.473660
2017.812 1.443549 1.398378 1.490180 1.375042 1.515473
2017.875 1.458954 1.406649 1.513206 1.379724 1.542739
2017.938 1.405555 1.349522 1.463919 1.320770 1.495790
Weights for prior observations must be calculated in order to get these forecasts, correct? Is there a way to extract them or compute them myself?
I have tried working out the algebra to do so using the equations given in Dr. Rob J Hyndman's and Dr. George Athanasopoulos's Forecasting: principles and practice free online textbook (which is great by the way). However, I have struggled because of the complexity of the recursion.
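(For the non-seasonal case, at least, the recursion does unroll cleanly: for simple exponential smoothing with parameter \alpha and initial level \ell_0, repeated substitution gives

\hat{y}_{T+1|T} \;=\; \sum_{j=0}^{T-1} \alpha (1-\alpha)^{j} \, y_{T-j} \;+\; (1-\alpha)^{T} \ell_0,

so observation y_{T-j} receives weight \alpha(1-\alpha)^{j}. It is the extra seasonal recursion in the ETS(A,N,A) model, plus the Box-Cox transformation, that I cannot untangle.)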
Thanks for the help in advance!

Portfolio optimisation in R - group ratio constraints

Let's say we have a simple long-only problem with four assets and several constraints. Below is how I would normally optimise portfolio weights with some basic constraints, such as weights sum to 1, no short selling and no leverage.
library(quadprog)  # provides solve.QP
# set the covariance matrix:
cov <- cbind(c(0.1486, 0.0778, -0.0240, -0.0154),
c(0.0778, 0.1170, 0.0066, 0.0029),
c(-0.0240, 0.0066, 0.0444, 0.0193),
c(-0.0154, 0.0029, 0.0193, 0.0148)
)
# expected returns:
dvec <- c(0.0308, 0.0269, 0.0145, 0.0130)
# constraints:
# 1) weights to sum to 1
# 2) minimum weight for each asset = 0
# 3) maximum weight for each asset = 1
Amat <- cbind(c(1, 1, 1, 1), diag(1,4,4), diag(-1,4,4))
bvec <- c(1, 0, 0, 0, 0, -1, -1, -1, -1)
meq = 1
# The solution for portfolio weights is as follows:
round(solve.QP(cov, dvec=dvec, Amat=Amat, bvec=bvec, meq=meq)$solution,4)
Now I would like to add a constraint that the first asset is at most 60% of the first three assets taken together. How could I add this constraint to the above portfolio? It is easy to set the upper bound for an asset as a percentage of the overall portfolio, but I don't know how to set the upper bound for an asset as a percentage of a certain group of assets.
Any thoughts would be much appreciated.
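For what it's worth, a group cap like this is still linear in the weights, so it fits solve.QP's t(Amat) %*% w >= bvec form: w1 <= 0.6*(w1 + w2 + w3) rearranges to -0.4*w1 + 0.6*w2 + 0.6*w3 >= 0. A sketch of how it could be appended to the setup above (one extra column in Amat, one extra 0 in bvec):
# group constraint: w1 <= 0.6 * (w1 + w2 + w3)
# rewritten in ">=" form: -0.4*w1 + 0.6*w2 + 0.6*w3 + 0*w4 >= 0
Amat2 <- cbind(Amat, c(-0.4, 0.6, 0.6, 0))
bvec2 <- c(bvec, 0)
round(solve.QP(cov, dvec = dvec, Amat = Amat2, bvec = bvec2, meq = meq)$solution, 4)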
