PortfolioAnalytics Error row names not Dates

I am getting the following error from the portfolio analytics package.
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
The data set I am using is simulated data
> df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
[1,] 0 1 0 1 0 0 0 1 1 0
[2,] 0 1 0 0 1 1 1 1 1 1
[3,] 1 0 0 0 0 0 0 1 1 0
[4,] 1 0 1 1 1 0 0 1 0 1
[5,] 0 0 1 0 1 0 1 1 1 0
[6,] 0 1 0 1 0 1 1 0 1 1
[7,] 1 0 0 0 0 1 1 1 1 0
[8,] 0 0 1 1 0 0 1 1 0 1
[9,] 1 0 0 0 0 1 1 1 1 0
[10,] 0 1 1 0 0 1 0 1 0 0
I set the portfolio constraints to be
returns = as.matrix(df)
> funds = colnames(df)
> init.portfolio <- portfolio.spec(assets = funds)
> init.portfolio <- add.constraint(portfolio = init.portfolio, type = "full_investment")
> init.portfolio <- add.constraint(portfolio = init.portfolio, type = "long_only")
> minSD.portfolio <- add.objective(portfolio=init.portfolio,
+ type="risk",
+ name="StdDev")
> minSD.opt <- optimize.portfolio(R = df, portfolio = minSD.portfolio,
+ optimize_method = "ROI", trace = TRUE)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
How can I fix this error? df is a simulation of single-period returns, so they are all either 100% or 0%, and all for the same period. I can add a date variable as the row names if I need to, but I do not know how. I tried
> rownames(df) = as.Date(c("Jan", rep(nrow(df))))
Error in charToDate(x) :
character string is not in a standard unambiguous format
Can someone help me with this error? Thanks

You'll need to add date data to df. Assuming that the data is monthly returns beginning at the start of this year, you can either add rownames using
rownames(df) <- as.character(seq(as.Date("2015-01-01"), length.out=nrow(df), by = "month"))
or convert df to an xts time series by
library(xts)
df <- xts(df, order.by = seq(as.Date("2015-01-01"),
length.out=nrow(df), by = "month"))
xts is a commonly used format for financial time series and works well with PortfolioAnalytics so you might consider that.
Once you've done that and run optimize.portfolio, you still won't get a solution. It appears that the covariance matrix of df is not positive definite, so you'll have to adjust the values in df.
Also I don't quite understand your comment that single period returns require that returns be 0 or 1. That's not true in general.
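As a rough check of both points, the sketch below rebuilds the 10 x 10 matrix from the question in base R, attaches month-start dates as row names, and inspects the sample covariance. The 2015 start date is the same assumption made above. With n rows the sample covariance has rank at most n - 1, so here it cannot be positive definite.

```r
# The simulated 0/1 returns from the question, entered row by row
df <- matrix(c(0,1,0,1,0,0,0,1,1,0,
               0,1,0,0,1,1,1,1,1,1,
               1,0,0,0,0,0,0,1,1,0,
               1,0,1,1,1,0,0,1,0,1,
               0,0,1,0,1,0,1,1,1,0,
               0,1,0,1,0,1,1,0,1,1,
               1,0,0,0,0,1,1,1,1,0,
               0,0,1,1,0,0,1,1,0,1,
               1,0,0,0,0,1,1,1,1,0,
               0,1,1,0,0,1,0,1,0,0),
             nrow = 10, byrow = TRUE,
             dimnames = list(NULL, paste0("X", 1:10)))

# Row names in a standard date format (assumed monthly, starting 2015-01-01)
dates <- seq(as.Date("2015-01-01"), length.out = nrow(df), by = "month")
rownames(df) <- as.character(dates)

# The row names now parse back into Date objects
parsed <- as.Date(rownames(df))

# Rank of the sample covariance: below 10, so it is singular
cov_rank <- qr(cov(df))$rank
cov_rank < ncol(df)
```

Using more observations than assets, or returns that are not restricted to 0 and 1, would be one way to obtain a usable covariance matrix.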

Related

How to count how many conditions an observation meets using R?

I have a data set with lots of binary variables, all with values 0/1. I want to create a new column that counts, for each observation, how many of the binary variables equal 1: 1 if one variable is 1, 2 if two are, and so on.
Such as:
x1 x2 x3 x4 x5
1 1 1 0 1
0 0 1 0 0
0 0 0 0 0
I want to have
x1 x2 x3 x4 x5 count
1 1 1 0 1 4
0 0 1 0 0 1
0 0 0 0 0 0

If your dataset contains only the binary variables you are interested in, you can use
df$count <- rowSums(df)
Otherwise, please provide a more detailed description of your data.
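If the data frame also holds other columns, one option is to restrict rowSums to the binary ones. This is a minimal sketch; the id column and the binary_cols vector are hypothetical names used only for illustration.

```r
# Example data from the question plus a hypothetical non-binary id column
df <- data.frame(id = c("a", "b", "c"),
                 x1 = c(1, 0, 0), x2 = c(1, 0, 0), x3 = c(1, 1, 0),
                 x4 = c(0, 0, 0), x5 = c(1, 0, 0))

# Names of the binary columns to count over (assumed known in advance)
binary_cols <- paste0("x", 1:5)

# Count the 1s per row using only those columns
df$count <- rowSums(df[binary_cols])
df
```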

Another option is Reduce with +
df$count <- Reduce(`+`, df)

how to subset a data frame based on list of multiple match case in columns

So I have a list that contains certain characters as shown below
list <- c("MY","GM+" ,"TY","RS","LG")
And I have a variable named "CODE" in the data frame as follows
code <- c("MY GM+","","LGTY", "RS","TY")
df <- data.frame(1:5,code)
df
code
1 MY GM+
2
3 LGTY
4 RS
5 TY
Now I want to create 5 new variables named "MY","GM+","TY","RS","LG"
Which takes binary value, 1 if there's a match case in the CODE variable
df
code MY GM+ TY RS LG
1 MY GM+ 1 1 0 0 0
2 0 0 0 0 0
3 LGTY 0 0 1 0 1
4 RS 0 0 0 1 0
5 TY 0 0 1 0 0
Really appreciate your help. Thank you.

Since you know how many values will be returned (5), and what you want their types to be (integer), you could use vapply() with grepl(). We can turn the resulting logical matrix into integer values by using integer() in vapply()'s FUN.VALUE argument.
cbind(df, vapply(List, grepl, integer(nrow(df)), df$code, fixed = TRUE))
# code MY GM+ TY RS LG
# 1 MY GM+ 1 1 0 0 0
# 2 0 0 0 0 0
# 3 LGTY 0 0 1 0 1
# 4 RS 0 0 0 1 0
# 5 TY 0 0 1 0 0
I think your original data has a couple of typos, so here's what I used:
List <- c("MY", "GM+" , "TY", "RS", "LG")
df <- data.frame(code = c("MY GM+", "", "LGTY", "RS", "TY"))
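One detail worth checking is the fixed = TRUE argument: without it, "GM+" is read as a regular expression in which + is a quantifier. A small sketch of the difference, where the string "GMM" is made up purely for illustration:

```r
code <- c("MY GM+", "", "LGTY", "RS", "TY")

# Literal matching: looks for the three characters G, M, +
literal_hits <- grepl("GM+", code, fixed = TRUE)

# Regex matching: "G" followed by one or more "M", so the "+" is not required
regex_on_gmm   <- grepl("GM+", "GMM")
literal_on_gmm <- grepl("GM+", "GMM", fixed = TRUE)

literal_hits
c(regex = regex_on_gmm, literal = literal_on_gmm)
```

On the question's data both forms happen to agree, but the literal form is the safer choice whenever the search strings may contain regex metacharacters.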

extract rows for which first non-zero element is one

I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.

Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0

Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
no.zeroes <- x[x!=0] # removing 0
to.return <- no.zeroes[1] == 1 # finding if first number is 1
# if a row is all 0, then to.return will be NA
# this fixes that problem
to.return[is.na(to.return)] <- FALSE # if row is all 0
to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0

Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply takes the first ([1]) non-zero (x != 0) element of x and tests whether it equals 1 (== 1). It is called once for each row; in your example x is a vector of four values.
Use which to extract the indices of the candidate rows (and remove NA values, too):
desired.rows <- which(first.element.is.one)
Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
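For reference, the steps above can be run end to end as sketched below. The data frame is rebuilt from the question with data.frame() rather than read.table(), which is only a convenience. The sketch also shows where the NA arises: an all-zero row leaves an empty vector after filtering, and taking [1] of an empty vector gives NA.

```r
# Same values as my.data in the question, entered column by column
my.data <- data.frame(x1 = c(0, 0, 0, 2, 1, 0, 0),
                      x2 = c(0, 0, 2, 1, 1, 0, 1),
                      x3 = c(1, 0, 1, 2, 1, 0, 0),
                      x4 = c(1, 1, 1, 1, 2, 0, 0))

# TRUE/FALSE per row; NA for the all-zero row 6
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)

# which() keeps only the TRUE positions, dropping FALSE and NA
desired.rows <- which(first.element.is.one)

my.data[desired.rows, ]
```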

creating a matrix of indicator variables

I would like to create a matrix of indicator variables. My initial thought was to use model.matrix, which was also suggested here: Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
However, model.matrix does not seem to work if a factor has only one level.
Here is an example data set with three levels to the factor 'region':
dat = read.table(text = "
reg1 reg2 reg3
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
0 0 1
", sep = "", header = TRUE)
# model.matrix works if there are multiple regions:
region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
df.region <- as.data.frame(region)
df.region$region <- as.factor(df.region$region)
my.matrix <- as.data.frame(model.matrix(~ -1 + df.region$region, df.region))
my.matrix
# The following for-loop works even if there is only one level to the factor
# (one region):
# region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
for(i in 1:length(region)) {my.matrix[i,region[i]]=1}
my.matrix
The for-loop is effective and seems simple enough. However, I have been struggling to come up with a solution that does not involve loops. I can use the loop above, but have been trying hard to wean myself off of them. Is there a better way?

I would use matrix indexing. From ?"[":
A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector.
Making use of that nice feature:
my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
my.matrix[cbind(seq_along(region), region)] <- 1
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 1 0 0
# [3,] 1 0 0
# [4,] 1 0 0
# [5,] 1 0 0
# [6,] 1 0 0
# [7,] 0 1 0
# [8,] 0 1 0
# [9,] 0 1 0
# [10,] 0 0 1
# [11,] 0 0 1
# [12,] 0 0 1
# [13,] 0 0 1
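The indexing above relies on region holding the integers 1 to k. If the labels are arbitrary, which goes beyond what the question asks, match() can map them to column positions first. A sketch with made-up region names:

```r
# Hypothetical character labels instead of 1, 2, 3
region <- c("north", "north", "south", "west", "south")

# Column position for each observation, in order of first appearance
lev <- unique(region)
col_idx <- match(region, lev)

# Start from zeros and set one cell per row via matrix indexing
ind <- matrix(0, nrow = length(region), ncol = length(lev),
              dimnames = list(NULL, lev))
ind[cbind(seq_along(region), col_idx)] <- 1
ind
```

This also works when there is only one distinct label, since the matrix then simply has a single column of ones.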

I came up with this solution by modifying an answer to a similar question here:
Reshaping a column from a data frame into several columns using R
region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
site <- seq(1:length(region))
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind
region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
site <- seq(1:length(region))
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind
EDIT:
The line below will extract the data frame of indicator variables from ind:
ind.matrix <- as.data.frame.matrix(ind)

All Levels of a Factor in a Model Matrix in R

I have a data.frame consisting of numeric and factor variables as seen below.
testFrame <- data.frame(First=sample(1:10, 20, replace=T),
Second=sample(1:20, 20, replace=T), Third=sample(1:10, 20, replace=T),
Fourth=rep(c("Alice","Bob","Charlie","David"), 5),
Fifth=rep(c("Edward","Frank","Georgia","Hank","Isaac"),4))
I want to build out a matrix that assigns dummy variables to the factor and leaves the numeric variables alone.
model.matrix(~ First + Second + Third + Fourth + Fifth, data=testFrame)
As expected when running lm this leaves out one level of each factor as the reference level. However, I want to build out a matrix with a dummy/indicator variable for every level of all the factors. I am building this matrix for glmnet so I am not worried about multicollinearity.
Is there a way to have model.matrix create the dummy for every level of the factor?

(Trying to redeem myself...) In response to Jared's comment on @Fabians' answer about automating it, note that all you need to supply is a named list of contrast matrices. contrasts() takes a vector/factor and produces the contrasts matrix from it. For this then we can use lapply() to run contrasts() on each factor in our data set, e.g. for the testFrame example provided:
> lapply(testFrame[,4:5], contrasts, contrasts = FALSE)
$Fourth
Alice Bob Charlie David
Alice 1 0 0 0
Bob 0 1 0 0
Charlie 0 0 1 0
David 0 0 0 1
$Fifth
Edward Frank Georgia Hank Isaac
Edward 1 0 0 0 0
Frank 0 1 0 0 0
Georgia 0 0 1 0 0
Hank 0 0 0 1 0
Isaac 0 0 0 0 1
Which slots nicely into @fabians' answer:
model.matrix(~ ., data=testFrame,
contrasts.arg = lapply(testFrame[,4:5], contrasts, contrasts=FALSE))
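A self-contained sketch of the same idea follows. It assumes R 4.0 or later, where data.frame() no longer converts strings to factors by default, so the character columns are wrapped in factor() explicitly (contrasts() only accepts factors). Selecting the factor columns with is.factor instead of by position is a small variation, not part of the answer above.

```r
set.seed(42)
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Pick out the factor columns by type rather than by column number
is_fac <- vapply(testFrame, is.factor, logical(1))

# Identity contrasts for every factor, so no level is dropped
mm <- model.matrix(~ ., data = testFrame,
                   contrasts.arg = lapply(testFrame[is_fac], contrasts,
                                          contrasts = FALSE))
colnames(mm)
```

The result has an intercept, First, and one indicator column for each of the four levels of Fourth and the five levels of Fifth.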

You need to reset the contrasts for the factor variables:
model.matrix(~ Fourth + Fifth, data=testFrame,
contrasts.arg=list(Fourth=contrasts(testFrame$Fourth, contrasts=F),
Fifth=contrasts(testFrame$Fifth, contrasts=F)))
or, with a little less typing and without the proper names:
model.matrix(~ Fourth + Fifth, data=testFrame,
contrasts.arg=list(Fourth=diag(nlevels(testFrame$Fourth)),
Fifth=diag(nlevels(testFrame$Fifth))))

caret implemented a nice function dummyVars to achieve this with 2 lines:
library(caret)
dmy <- dummyVars(" ~ .", data = testFrame)
testFrame2 <- data.frame(predict(dmy, newdata = testFrame))
Checking the final columns:
colnames(testFrame2)
"First" "Second" "Third" "Fourth.Alice" "Fourth.Bob" "Fourth.Charlie" "Fourth.David" "Fifth.Edward" "Fifth.Frank" "Fifth.Georgia" "Fifth.Hank" "Fifth.Isaac"
The nicest point here is you get the original data frame, plus the dummy variables having excluded the original ones used for the transformation.
More info: http://amunategui.github.io/dummyVar-Walkthrough/

dummyVars from caret could also be used. http://caret.r-forge.r-project.org/preprocess.html

Ok. Just reading the above and putting it all together. Suppose you wanted the matrix e.g. 'X.factors' that multiplies by your coefficient vector to get your linear predictor. There are still a couple extra steps:
X.factors =
model.matrix( ~ ., data=X, contrasts.arg =
lapply(data.frame(X[,sapply(data.frame(X), is.factor)]),
contrasts, contrasts = FALSE))
(Note that you need to turn X[*] back into a data frame in case you have only one factor column.)
Then say you get something like this:
attr(X.factors,"assign")
[1] 0 1 **2** 2 **3** 3 3 **4** 4 4 5 6 7 8 9 10 #emphasis added
We want to get rid of the **'d reference levels of each factor
att = attr(X.factors,"assign")
factor.columns = unique(att[duplicated(att)])
unwanted.columns = match(factor.columns,att)
X.factors = X.factors[,-unwanted.columns]
X.factors = (data.matrix(X.factors))

A tidyverse answer:
library(dplyr)
library(tidyr)
result <- testFrame %>%
mutate(one = 1) %>% spread(Fourth, one, fill = 0, sep = "") %>%
mutate(one = 1) %>% spread(Fifth, one, fill = 0, sep = "")
yields the desired result (same as @Gavin Simpson's answer):
> head(result, 6)
First Second Third FourthAlice FourthBob FourthCharlie FourthDavid FifthEdward FifthFrank FifthGeorgia FifthHank FifthIsaac
1 1 5 4 0 0 1 0 0 1 0 0 0
2 1 14 10 0 0 0 1 0 0 1 0 0
3 2 2 9 0 1 0 0 1 0 0 0 0
4 2 5 4 0 0 0 1 0 1 0 0 0
5 2 13 5 0 0 1 0 1 0 0 0 0
6 2 15 7 1 0 0 0 1 0 0 0 0

Using the R package 'CatEncoders'
library(CatEncoders)
testFrame <- data.frame(First=sample(1:10, 20, replace=T),
Second=sample(1:20, 20, replace=T), Third=sample(1:10, 20, replace=T),
Fourth=rep(c("Alice","Bob","Charlie","David"), 5),
Fifth=rep(c("Edward","Frank","Georgia","Hank","Isaac"),4))
fit <- OneHotEncoder.fit(testFrame)
z <- transform(fit,testFrame,sparse=TRUE) # give the sparse output
z <- transform(fit,testFrame,sparse=FALSE) # give the dense output

I am currently learning the Lasso model with glmnet::cv.glmnet(), model.matrix() and Matrix::sparse.model.matrix() (for high-dimensional matrices, model.matrix() is very slow, as noted by the author of glmnet).
Just sharing a tidy way to get the same result as @fabians' and @Gavin's answers. @asdf123 also introduced another package, CatEncoders.
> require('useful')
> # always use all levels
> build.x(First ~ Second + Fourth + Fifth, data = testFrame, contrasts = FALSE)
>
> # just use all levels for Fourth
> build.x(First ~ Second + Fourth + Fifth, data = testFrame, contrasts = c(Fourth = FALSE, Fifth = TRUE))
Source: R for Everyone: Advanced Analytics and Graphics (page 273)

I wrote a package called ModelMatrixModel to improve the functionality of model.matrix(). By default, the ModelMatrixModel() function in the package returns an object containing a sparse matrix with all levels of dummy variables, which is suitable as input to cv.glmnet() in the glmnet package. Importantly, the returned object also stores the transformation parameters, such as the factor level information, which can then be applied to new data. The function can handle most items in an R formula, like poly() and interactions. It also offers several other options, such as handling invalid factor levels and scaling the output.
#devtools::install_github("xinyongtian/R_ModelMatrixModel")
library(ModelMatrixModel)
testFrame <- data.frame(First=sample(1:10, 20, replace=T),
Second=sample(1:20, 20, replace=T), Third=sample(1:10, 20, replace=T),
Fourth=rep(c("Alice","Bob","Charlie","David"), 5))
newdata=data.frame(First=sample(1:10, 2, replace=T),
Second=sample(1:20, 2, replace=T), Third=sample(1:10, 2, replace=T),
Fourth=c("Bob","Charlie"))
mm=ModelMatrixModel(~First+Second+Fourth, data = testFrame)
class(mm)
## [1] "ModelMatrixModel"
class(mm$x) #default output is sparse matrix
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"
data.frame(as.matrix(head(mm$x,2)))
## First Second FourthAlice FourthBob FourthCharlie FourthDavid
## 1 7 17 1 0 0 0
## 2 9 7 0 1 0 0
#apply the same transformation to new data, note the dummy variables for 'Fourth' includes the levels not appearing in new data
mm_new=predict(mm,newdata)
data.frame(as.matrix(head(mm_new$x,2)))
## First Second FourthAlice FourthBob FourthCharlie FourthDavid
## 1 6 3 0 1 0 0
## 2 2 12 0 0 1 0

You can use tidyverse to achieve this without specifying each column manually.
The trick is to make a "long" dataframe.
Then, munge a few things, and spread it back to wide to create the indicators/dummy variables.
Code:
library(tidyverse)
## add index variable for pivoting
testFrame$id <- 1:nrow(testFrame)
testFrame %>%
## pivot to "long" format
gather(feature, value, -id) %>%
## add indicator value
mutate(indicator=1) %>%
## create feature name that unites a feature and its value
unite(feature, value, col="feature_value", sep="_") %>%
## convert to wide format, filling missing values with zero
spread(feature_value, indicator, fill=0)
The output:
id Fifth_Edward Fifth_Frank Fifth_Georgia Fifth_Hank Fifth_Isaac First_2 First_3 First_4 ...
1 1 1 0 0 0 0 0 0 0
2 2 0 1 0 0 0 0 0 0
3 3 0 0 1 0 0 0 0 0
4 4 0 0 0 1 0 0 0 0
5 5 0 0 0 0 1 0 0 0
6 6 1 0 0 0 0 0 0 0
7 7 0 1 0 0 0 0 1 0
8 8 0 0 1 0 0 1 0 0
9 9 0 0 0 1 0 0 0 0
10 10 0 0 0 0 1 0 0 0
11 11 1 0 0 0 0 0 0 0
12 12 0 1 0 0 0 0 0 0
...

model.matrix(~ First + Second + Third + Fourth + Fifth - 1, data=testFrame)
or
model.matrix(~ First + Second + Third + Fourth + Fifth + 0, data=testFrame)
should be the most straightforward
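One caveat worth checking before relying on this: dropping the intercept restores every level only for the first factor in the formula, while later factors still lose their reference level. A small sketch, using only the two factor columns of testFrame and wrapping them in factor() explicitly:

```r
testFrame <- data.frame(
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# No intercept: Fourth keeps all four levels, Fifth still drops "Edward"
mm <- model.matrix(~ Fourth + Fifth - 1, data = testFrame)
colnames(mm)
```

So this form is enough when there is a single factor, but with several factors the contrasts.arg approach from the earlier answers is needed to keep every level.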
