R: function cut

Here is my data:
>my.cut <- cut(my.variable, breaks = c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf), right = FALSE)
>levels(my.cut)
"[-Inf,-0.5)" "[-0.5,-0.25)" "[-0.25,-0.1)" "[-0.1,0)" "[0,0.02)" "[0.02,0.05)" "[0.05,0.15)" "[0.15,0.3)" "[0.3,0.5)" "[0.5,1)" "[1, Inf)"
Expected result:
>levels(my.cut)
"[-Inf,-0.5)" "[-0.5,-0.25)" "[-0.25,-0.1)" "[-0.1,0)" "0" "(0,0.02)" "[0.02,0.05)" "[0.05,0.15)" "[0.15,0.3)" "[0.3,0.5)" "[0.5,1)" "[1, Inf)"
In the expected result there is a single level "0" that should contain only the values where my.variable == 0. With the cut() call above there is no such level, because breaks can only define intervals. So how can I do this?
Hope to get your answer soon! Thanks!

You could explicitly put each value into a group. This is more flexible, but also a lot more verbose.
One way of doing this could be to define a bespoke cut function and then apply it to every element of your vector.
my.variable <- rnorm(100)
bespoke_cut <- function(value){
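if (value < -0.1) return('[-Inf, -0.1)')
if (value < 0) return('[-0.1, 0)')
if (value == 0) return('0')
return('(0, Inf)')
}
my.cut <- sapply(my.variable, bespoke_cut)
# give the levels explicitly; otherwise factor() orders them by sorting the labels as text
my.cut <- factor(my.cut, levels = c('[-Inf, -0.1)', '[-0.1, 0)', '0', '(0, Inf)'))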
I've only done a few of the groupings you wanted, but I think it should be apparent how to add extra groups.

I think the best you can hope for with 'cut' is to specify a really small range for 0, i.e.,
cps = c(-Inf, -0.1, 0-.Machine$double.eps, 0+.Machine$double.eps, 0.02, Inf)
bgroup = cut(c(-10, 10, 0,0), breaks = cps)
cat(deparse(levels(bgroup)), "\n") ## use this to edit the levels more easily
levels(bgroup) = c("(-Inf, -0.1]", "(-0.1,0)", "0", "(0,0.02]", "(0.02, Inf]")
table(bgroup)
Obviously, the displayed levels are not identical to the intervals used to cut the data, but if you are okay with that tiny window around 0 (plus or minus .Machine$double.eps), the solution is to form the cuts with it and then change the labels.
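The relabelling step can also be folded into the cut() call through its labels argument, which takes one label per interval. A minimal sketch reusing the same breaks; the input values are made up for illustration, and any nonzero value with |x| up to .Machine$double.eps (about 2.2e-16) still ends up in the "0" group:

```r
eps <- .Machine$double.eps
cps <- c(-Inf, -0.1, 0 - eps, 0 + eps, 0.02, Inf)

# Illustrative input values (not from the question)
x <- c(-10, -0.05, 0, 0, 0.01, 10)

# With the default right = TRUE the intervals are (-Inf,-0.1], (-0.1,-eps],
# (-eps,eps], (eps,0.02], (0.02,Inf]; these labels are displayed instead
bgroup <- cut(x, breaks = cps,
              labels = c("(-Inf,-0.1]", "(-0.1,0)", "0", "(0,0.02]", "(0.02,Inf)"))
table(bgroup)
```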

You could do this: cut, relabel the [0,0.02) group as (0,0.02), assign "0" where my.variable == 0, then rebuild the factor with the new level order.
my.variable <- rnorm(100)
my.variable[sample(1:100,2)] <- 0
my.cut <- cut(my.variable, breaks = c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf), right = FALSE)
lvl <- levels(my.cut)
lvlR <- c(lvl[1:4],"0","(0,0.02)",lvl[6:11])
my.cut <- as.character(my.cut)
my.cut[my.cut == lvl[5]] <- "(0,0.02)" # without this, values in (0,0.02) would become NA in the factor below
my.cut[my.variable == 0] <- "0"
my.cut <- factor(my.cut,levels=lvlR)
rm(lvl,lvlR)
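If you need this for several variables, the same steps can be wrapped in a function. A sketch only: cut_with_zero is a hypothetical name, and it assumes that 0 is one of the breaks and that right = FALSE is used (so the interval starting at 0 is the one that gets split). It uses which() so that NA values in x simply stay NA:

```r
# Hypothetical helper: cut() with a separate "0" level for exact zeros
cut_with_zero <- function(x, breaks) {
  stopifnot(0 %in% breaks)
  binned <- cut(x, breaks = breaks, right = FALSE)
  lvl <- levels(binned)
  k <- match(0, breaks)                 # level k is the interval "[0,next)"
  open_lbl <- sub("^\\[", "(", lvl[k])  # e.g. "[0,0.02)" -> "(0,0.02)"
  out <- as.character(binned)
  out[which(out == lvl[k])] <- open_lbl # which() drops NA comparisons
  out[which(x == 0)] <- "0"             # exact zeros get their own group
  # keep the levels in numeric order, with "0" just before the open interval
  factor(out, levels = c(lvl[seq_len(k - 1)], "0", open_lbl, lvl[-seq_len(k)]))
}

brks <- c(-Inf, -0.5, -0.25, -0.1, 0, 0.02, 0.05, 0.15, 0.3, 0.5, 1, Inf)
res <- cut_with_zero(c(-1, 0, 0.01, 0, 0.03, 2, NA), brks)
levels(res)
```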

Related

combining lists and dataframes in R from raster values

QUESTION EDITED FOR CLARITY AND REPRODUCIBILITY
I am trying to summarize proportions of landcover classes within many buffers contained within a list. Although it appears to be a common problem, I have not found an appropriate solution:
I have a raster stack called hab_stack with discrete values 1-6 for each of 3 layers (each layer == year). I also have locational data with >800,000 locations called dat_sf. I have extracted hab_stack raster values within a 400 m buffer around each location.
I now have a large list with ~800,000 elements (not all habitat classes 1-6 are represented in each element). So I tried to create a dummy data frame called true_names containing all hab_stack values 1-6, with the frequency/proportion set to zero for classes not represented within the buffer, because I need to combine all proportions together. I have tried to do this with lapply but can't seem to get it quite right. Below is the full function and the error:
sum_class <- lapply(values_hab, function(x){
true_names <- data.frame(x = 1:6, Freq = 0)
prop_df <- as.data.frame(prop.table(table(x))) %>%
mutate(x = as.numeric(x))
true_names %>%
anti_join(prop_df, by = "x") %>%
bind_rows(prop_df) %>%
arrange(x)
})
Error in `mutate()`:
! Problem while computing `x = as.numeric(x)`.
x `x` must be size 0 or 1, not 1659.
Run `rlang::last_error()` to see where the error occurred.
When I run the pieces of the function separately, the error comes from table(values_hab): Error in table(values_hab) : all arguments must have the same length.
I think a hypothetical list could look something like this, where there are different numbers of NAs and not all classes are represented in each element; see also a data frame of my desired output below:
list <- list(c(1,1,1,2,2,2,3,3,4,4,4,NA,NA,NA,5,6),
c(1,2,3,4,NA,NA,NA,NA,4,4,4,4,NA,5,1,1),
c(5,5,5,5,5,1,2,2,2,2,NA,NA,NA,NA,NA,3))
desired_output <- data.frame(`1` = c(0.4, 0.5, 0.6, 0.5, 0.5, 0.3),
`2` = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.2),
`3` = c(0.1, 0.1, 0.0, 0.1, 0.0, 0.3),
`4` = c(0.3, 0.2, 0.0, 0.1, 0.1, 0.1),
`5` = c(0.0, 0.1, 0.2, 0.2, 0.1, 0.0),
`6` = c(0.1, 0.0, 0.1, 0.0, 0.2, 0.1))
Any help is much appreciated. Best,
It looks like my function works and this was a very easy fix. dplyr::mutate was treating x as the entire list, when in fact I wanted it to mutate the vector x within each list element. R is still running in the background, but this should have taken care of it.
library(dplyr)
sum_class_function <- function(x){
true_names <- data.frame(x = 1:6, Freq = 0)
prop_df <- as.data.frame(prop.table(table(x)))
prop_df$x <- as.numeric(as.character(prop_df$x)) # convert the factor labels, not the factor codes, to numbers
temp<- true_names %>%
anti_join(prop_df, by = "x") %>%
bind_rows(prop_df) %>%
arrange(x)
return(temp)
}
sum_class <- lapply(values_hab, sum_class_function)
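A base-R alternative (a different technique from the anti_join approach above) is to convert each element to a factor with the fixed levels 1:6: table() then reports a zero count for every absent class, so no dummy data frame or join is needed, and vapply() collects everything into one matrix with a row per buffer and a column per class, similar in shape to desired_output. A sketch; class_props is a hypothetical helper, and an element containing only NA gets NaN proportions (0/0):

```r
# Proportion of each class (ignoring NA) for every element of a list
class_props <- function(values_list, classes = 1:6) {
  props <- vapply(values_list, function(v) {
    counts <- table(factor(v, levels = classes))  # absent classes get a 0 count
    as.numeric(counts / sum(counts))              # all-NA element gives NaN
  }, numeric(length(classes)))
  props <- t(props)          # one row per list element
  colnames(props) <- classes
  props
}

# Example list from the question
example_list <- list(c(1,1,1,2,2,2,3,3,4,4,4,NA,NA,NA,5,6),
                     c(1,2,3,4,NA,NA,NA,NA,4,4,4,4,NA,5,1,1),
                     c(5,5,5,5,5,1,2,2,2,2,NA,NA,NA,NA,NA,3))
p <- class_props(example_list)
round(p, 2)
```

On the real data this would be called as class_props(values_hab).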

How do I need to assign values to each other in triplets using R?

The situation is as follows:
I need to create a dataset of triplets from a discrete distribution of stock prices S <- c(80,100,120,140,160) with probabilities P <- c(0.2, 0.3, 0.2, 0.2, 0.1), a call option C = max(S - 120, 0), i.e. C <- c(0,0,0,20,40), and the liability of an option that pays 30 if S lies in a certain region and zero otherwise, namely L = 30 * I{110 <= S <= 150}, i.e. L <- c(0,0,30,30,0). It is important that the values stay matched by index: if S[1] = 80 is drawn, then C[1] and L[1] must go with it, and the same holds for i = 1,2,3,4,5. How do you create a dataset of N = 10000 simulations where each value for i corresponds to the other two values for the same i?
This is the code I had for now. Note that X_1 = S, X_2 = C and Y = L.
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
sample(X_1 - 120, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
Y <- function(n) {
sample(L, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
##Creating triplets##
df <- data.frame(S_T = X_1(10000), C_T = X_2(10000), L_T =Y(10000))
df
I'm not sure if you want C_T to be dependent on the S_T values. If you do, I think you just want to call X_1, assign the results to an object, then compute C_T from that object (X_2 only subtracts 120; pmax() also applies the max(S - 120, 0) floor from the question).
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
# Call that function
S_T <- X_1(10) # for practice
C_T <- pmax(S_T - 120, 0) # call payoff max(S_T - 120, 0), computed elementwise from S_T
If you want C_T to contain values independent of S_T, you can call one function from inside another:
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
pmax(X_1(n) - 120, 0) # call payoff on a fresh, independent draw of prices
}
S_T <- X_1(10) # Same as above
C_T <- X_2(10) # Gives values not dependent on S_T
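Another way to keep S, C and L matched, which is what the question asks for, is to sample the scenario index i once and then look up all three values with that same index. A sketch of that approach; simulate_triplets is a hypothetical helper, and C and L are computed from the definitions in the question (C = max(S - 120, 0), L = 30 when 110 <= S <= 150):

```r
# Values from the question: prices S, probabilities P, payoffs C and L
S <- c(80, 100, 120, 140, 160)
P <- c(0.2, 0.3, 0.2, 0.2, 0.1)
C <- pmax(S - 120, 0)              # call payoff: 0 0 0 20 40
L <- 30 * (S >= 110 & S <= 150)    # liability: 0 0 30 30 0

# Hypothetical helper: draw the index once, then index all three vectors with it,
# so every row is a matched triplet (S[i], C[i], L[i])
simulate_triplets <- function(n) {
  idx <- sample(seq_along(S), size = n, replace = TRUE, prob = P)
  data.frame(S_T = S[idx], C_T = C[idx], L_T = L[idx])
}

set.seed(1)   # only for reproducibility
df <- simulate_triplets(10000)
head(df)
```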
EDIT to address comment below:
It's hard to read the comment, but it looks like you want to create a function that takes the results of function X_1 and returns a result based on a condition. Use ifelse, which checks the condition for each element of the vector. You can create another function and then pass it the results of function X_1:
Y <- function(X_1_func){
ifelse(X_1_func == 80, 0,
ifelse(X_1_func == 100, 0,
ifelse(X_1_func == 120, 30,
60))) # default value (or the last possible value) when none of the conditions above is TRUE
}
L <- Y(X_1(10)) # ifelse is vectorised, so the whole output of X_1 can be passed at once; assign the result to L or whatever you wish to call it
If this all works for you, you can accept the answer.

How to for loop using different columns of data frame?

Basically I was working on a portfolio return problem. The stock return is like:
AMZN <- c(0.1, 0.3, 0.4, 0.2)
BBY <- c(0.2, 0.4, 0.5, 0.3)
TGT <- c(-0.1, -0.3, -0.2,-0.5)
df1 <- data.frame(AMZN, BBY, TGT)
date <- c("2000-01-01","2000-02-01", "2000-03-01", "2000-04-01")
date <- as.Date(date, "%Y-%m-%d")
df1 <- cbind(date, df1)
xts <- xts(df1[,-1], order.by=df1[,1])
I want to use Return.portfolio(xts, weights) from PerformanceAnalytics to calculate the portfolio return.
The weights look like this:
w1 <- c(0.2, 0.3, 0.1, 0.4)
w2 <- c(0.5, 0.1, 0.1, 0.3)
w3 <- c(0.1, 0.1, 0.4, 0.4)
Weights <- data.frame(w1, w2, w3)
Since there are several groups of weights, I need to get multiple portfolio returns.
The code I tried is
for (i in colnames(Weights)){
Return.portfolio(xts, (Weights[[i]]))
}
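Although R does not report any error, the only thing I get afterwards is i, whose value is "w3".
Inside a for loop the value returned by Return.portfolio() is neither printed nor stored, so each result is discarded. You need to collect the results in an object, for example by initialising a NULL object first and binding each result to it: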
Return<-NULL
for (i in 1:ncol(Weights)){
Return<- cbind(Return, Return.portfolio(xts, (Weights[[i]])))
}
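The same results can be collected without growing an object inside the loop, by using lapply() over the weight columns and binding the results afterwards. A sketch, assuming the xts and PerformanceAnalytics packages. Note that Return.portfolio() expects one weight per asset column, while each weight column in the question has four entries for three assets, so the weights below are made-up three-entry vectors for illustration only:

```r
library(xts)
library(PerformanceAnalytics)

# Returns from the question (3 assets, 4 months)
dates <- as.Date(c("2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"))
rets <- xts(cbind(AMZN = c(0.1, 0.3, 0.4, 0.2),
                  BBY  = c(0.2, 0.4, 0.5, 0.3),
                  TGT  = c(-0.1, -0.3, -0.2, -0.5)), order.by = dates)

# Hypothetical weights: one entry per asset, each column sums to 1
Weights <- data.frame(w1 = c(0.2, 0.3, 0.5),
                      w2 = c(0.5, 0.1, 0.4),
                      w3 = c(0.1, 0.1, 0.8))

# lapply() over a data frame visits each column, giving one series per weight set
port_list <- lapply(Weights, function(w) Return.portfolio(rets, weights = w))
# bind the series into one xts object and name the columns after the weight sets
port_returns <- do.call(cbind, port_list)
colnames(port_returns) <- names(Weights)
port_returns
```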

How do I write a loop which sums up my precipitation values

I am looking for a loop which goes over a vector of precipitation values and adds each value to the sum of all the previous values (a running total),
for example:
precipitation <- c(0, 2, 0, 0.1, 0.5, 0.6, 0, 1)
and I would love to get a vector which adds up the values like this:
precipitationSum <- c(0, 2, 2, 2.1, 2.6, 3.2, 3.2, 4.2)
print(precipitationSum)
Hope the description makes sense!
Any help would be awesome!
You can use the cumsum function to calculate the cumulative sum of a vector:
precipitationSum <- cumsum(precipitation)
This gives you the following result:
[1] 0.0 2.0 2.0 2.1 2.6 3.2 3.2 4.2
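One thing to watch with real precipitation records: cumsum() propagates missing values, so a single NA turns every later partial sum into NA. A sketch of one way to handle that, assuming a missing reading can be treated as zero rainfall (cumsum_na0 is a hypothetical helper name, and whether that assumption is acceptable depends on your data):

```r
# Hypothetical helper: running total that counts missing readings as 0
cumsum_na0 <- function(x) {
  x[is.na(x)] <- 0   # assumption: a missing reading means no rain
  cumsum(x)
}

precip_with_gap <- c(0, 2, NA, 0.1)
cumsum(precip_with_gap)      # 0, 2, NA, NA
cumsum_na0(precip_with_gap)  # 0.0, 2.0, 2.0, 2.1
```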
If you want to do it with an explicit loop, this computes the same partial sums:
precipitation <- c(0, 2, 0, 0.1, 0.5, 0.6, 0, 1)
print("This loop calculates the partial sums of precipitation")
myList <- numeric(length(precipitation)) # preallocate the result vector
for(i in seq_along(precipitation)) {
if(i == 1) {
myList[i] <- precipitation[i] # the first partial sum is the first value
} else {
myList[i] <- myList[i-1] + precipitation[i] # previous partial sum plus the current value
}
}
print(myList)

Non-linear Optimization solnl function error in R: 'Argument of length zero'

I am trying to implement CVaR portfolio optimisation in R. Basically trying to replicate the Matlab approach used in this paper:
https://ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/Thesis_Matthias_Kull_2014.pdf
To do this I need to perform nonlinear optimisation with nonlinear constraints.
I have tried to use the nloptr package, but found the derivative calculation for the gradient of matrices beyond me.
Instead I have opted for the NlcOptim package which formulates the constraints in the same way as the Matlab function used in the paper.
library(NlcOptim)
# ====================================================================
# Just generate arbitrary returns data and bootstrap -----------------
asset_returns <- rbind(c(0.1, 0.05, 0.05, 0.01, 0.06),
c(0.05, 0.05, 0.06, -0.01, 0.09),
c(0.025, 0.05, 0.07, 0.02, -0.1),
c(0.01, 0.05, 0.08, -0.02, -0.01),
c(0.01, 0.05, 0.08, 0.00, 0.2),
c(0.005, 0.05, 0.09, 0.005, -0.15),
c(0.01, 0.05, 0.08, 0.01, -0.01),
c(0.012, 0.05, 0.00, -0.01, -0.01),
c(0.015, 0.05, 0.00, 0.03, 0.05),
c(0.02, 0.05, -0.01, 0.04, 0.03))
# Returns for 5 assets over 10 trading periods
nAssets <- ncol(asset_returns)
nReturns <- nrow(asset_returns)
nPeriods <- 4
nSims <- 10
# BOOTSTRAP ---------------------------------------------------------
sim_period_returns <- matrix(nrow = nSims, ncol = nAssets)
for (k in 1:nSims) {# run nSims simulations
sim_returns <- matrix(nrow = nPeriods, ncol = nAssets)
sample_order <- sample(nReturns, nPeriods)
for (i in 1:nPeriods) {
sim_returns[i,] <- asset_returns[sample_order[i],]
}
sim_prices <- rbind(rep(1, nAssets), 1 + sim_returns)
for (j in 1:nAssets) {
sim_period_returns[k, j] <- prod(sim_prices[, j]) - 1
}
}
# ------------------------------------------------------------------------
# ========================================================================
# The important stuff ====================================================
returns <- sim_period_returns
alpha <- 0.95
CVaR_limit <- 0.025
UB <- 0.75
LB <- 0.05
# Inequality constraints
A <- rbind(c(rep(0, nAssets), 1, 1/((1-alpha)*nSims) * rep(1, nSims)),
cbind(- returns, -1, diag(nSims)))
b <- as.matrix(c(-CVaR_limit, rep(0, nSims)), nrow = nSims, ncol = 1)
# Equality constraints
Aeq <- c(rep(1, nAssets), 0, rep(0, nSims))
beq <- 1
# Upper and lower bounds
UB <- c(rep(UB, nAssets), Inf, rep(Inf, nSims))
LB <- c(rep(LB, nAssets), 0, rep(0, nSims))
# Initial portfolio weights
w0 <- rep(1/nAssets, nAssets)
VaR0 <- quantile(returns %*% w0, alpha, names = F)
w0 <- c(w0, VaR0, rep(0, nSims))
objective_function <- function(x) {
# objective function to minimise
return (-colMeans(returns) %*% x[1:nAssets])
}
# **********************************************
# The solnl function giving the error based on the above inputs
solnl(X = w0,
objfun = objective_function,
A = A,
B = b,
Aeq = Aeq,
Beq = beq,
lb = LB,
ub = UB)
# **********************************************
# ===================================================================
I am receiving the following error:
Error in if (eq > 0 & ineq > 0) { : argument is of length zero
I have read the package source code and tried to figure out what is causing this error, but am still at a loss.
Checking the source code and the input data, I think the error starts at line 319 of NlcOptim, where nLineareq = nrow(Aeq) is called. Because of the way you have defined Aeq (a plain vector, not a matrix), nrow(Aeq) returns NULL, and a few lines later the expression if (eq > 0 & ineq > 0) is evaluated with a zero-length condition, which produces the error. You can find an explanation of this error here: Argument is of length zero in if statement
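The failure can be reproduced without NlcOptim. A minimal sketch (the variable names are illustrative and the snippet does not call solnl itself) showing that nrow() of a plain vector is NULL, that comparing NULL gives a zero-length logical which if() rejects, and that a one-row matrix has the row count the solver needs:

```r
# A plain vector has no dim attribute, so nrow() returns NULL
Aeq_vec <- c(1, 1, 1, 0)
nrow(Aeq_vec)

# NULL > 0 is logical(0); if() on a zero-length condition is an error
eq <- nrow(Aeq_vec)
msg <- tryCatch(if (eq > 0 & TRUE) "ok", error = conditionMessage)
msg

# A one-row matrix reports one row
Aeq_mat <- matrix(Aeq_vec, nrow = 1)
nrow(Aeq_mat)
```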
A quick fix could be to turn Aeq into a one-row matrix by using
Aeq <- t(array(c(rep(1, nAssets), 0, rep(0, nSims))))
However, after that change I get a different error when I run the code:
Error: object 'lambda' not found
I'm not sure whether the R implementation needs different initial conditions or the method is not converging, since in the paper the optimization used an interior-point method rather than the SQP method implemented in NlcOptim.
