Monte Carlo simulation in R plots straight lines - r

I am getting started with Monte Carlo simulations and used this basic code to simulate returns for a given portfolio. Somehow a portion of the simulated returns always shows up as straight lines, which are easy to see on the plotted graph. I decreased the number of simulations to make them easier to see and also played around with some other parameters, but they keep showing up. The rest of the output looks promising and "random".
I added a link to the image (my account is new) and the code below. I appreciate any help!
library(quantmod)
library(ggplot2)
maxDate<- "2000-01-01"
tickers<-c("MSFT", "AAPL", "BRK-B")
getSymbols(tickers, from=maxDate)
Port.p<-na.omit(merge(Cl(AAPL),Cl(MSFT),Cl(`BRK-B`)))
Port.r<-ROC(Port.p, type = "discrete")[-1,]
stock_Price<- as.matrix(Port.p[,1:3])
stock_Returns <- as.matrix(Port.r[,1:3])
mc_rep = 50 # Number of Sims
training_days = 200
portfolio_Weights = c(0.5,0.3,0.2)
coVarMat = cov(stock_Returns)
miu = colMeans(stock_Returns)
Miu = matrix(rep(miu, training_days), nrow = 3)
portfolio_Returns_m = matrix(0, training_days, mc_rep)
set.seed(2000)
for (i in 1:mc_rep) {
Z = matrix ( rnorm( dim(stock_Returns)[2] * training_days ), ncol = training_days )
L = t( chol(coVarMat) )
daily_Returns = Miu + L %*% Z
portfolio_Returns_200 = cumprod( portfolio_Weights %*% daily_Returns + 1 )
portfolio_Returns_m[,i] = portfolio_Returns_200;
}
x_axis = rep(1:training_days, mc_rep)
y_axis = as.vector(portfolio_Returns_m-1)
plot_data = data.frame(x_axis, y_axis)
ggplot(data = plot_data, aes(x = x_axis, y = y_axis)) + geom_path(col = 'red', size = 0.1) +
xlab('Days') + ylab('Portfolio Returns') +
ggtitle('Simulated Portfolio Returns in 200 days')+
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))

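The straight lines are the 'return' from the end of each series to the beginning of the next: geom_path connects all points in row order, so the last day of one simulation is joined to the first day of the following one. You can keep the lines separate by adding a grouping variable to your plotting data and using the group aesthetic to tell ggplot about it. Because as.vector() stacks the matrix column by column, each simulation occupies training_days consecutive rows, so the group id has to repeat accordingly:
g <- rep(1:mc_rep, each = training_days)
plot_data = data.frame(x_axis, y_axis, g)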
ggplot(data = plot_data, aes(x = x_axis, y = y_axis, group = g)) + ...
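The grouping idea can be tried without downloading any prices. The following is a minimal sketch, assuming a hypothetical matrix sim_paths (one column per simulated path, filled with made-up random returns) in place of portfolio_Returns_m; the column names day, ret and sim are also just illustrative.

```r
library(ggplot2)

set.seed(2000)
mc_rep <- 50          # number of simulated paths
training_days <- 200  # days per path

# Hypothetical stand-in for portfolio_Returns_m: one column per path
sim_paths <- replicate(
  mc_rep,
  cumprod(1 + rnorm(training_days, mean = 0.0005, sd = 0.01))
)

# as.vector() stacks the matrix column by column, so each path
# occupies training_days consecutive rows of the long data frame
plot_data <- data.frame(
  day = rep(seq_len(training_days), times = mc_rep),
  ret = as.vector(sim_paths - 1),
  sim = rep(seq_len(mc_rep), each = training_days)
)

# group = sim draws one line per path, so nothing joins the end of
# one path to the start of the next
p <- ggplot(plot_data, aes(x = day, y = ret, group = sim)) +
  geom_line(colour = "red", alpha = 0.5) +
  xlab("Days") + ylab("Portfolio Returns") +
  theme_bw()
print(p)
```

Each group then contains the days 1 to training_days exactly once, which is a quick way to check that the grouping variable lines up with the stacked matrix.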

Related

Why is geom_density plotting my data differently from the expected image?

I have the following task statement:
In this task we want to simulate random variables with density f(x) = exp(-|x|)/2.
To do this, write a function r_density(n) that simulates n of such random variables.
Then use this function to simulate N = 1000 of such random variables. Using geom_density() you can now estimate the density from the simulated random variables. We can compare this estimate with the real density. To do this, create a graph that looks like this:
The problem is that I don't understand why my output looks like this:
Why is the estimated density plotted in such a weird way? Can someone explain to me why it looks like that and how to get the estimated density from the expected image?
This is the corresponding code I wrote for the above plot:
library(tidyverse)
N <- 1000
r_density <- function(n){
exp(-abs(n))/2
}
x <- runif(N)
tb <- tibble(
x = x,
density_fkt = r_density(x)
)
ggplot() +
geom_density(
data = tb,
mapping = aes(
x = density_fkt,
y = ..scaled..
)
) +
geom_function(
fun = r_density,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Density function f(x)",
title = "Estimated (black) vs true (red) density"
)
You can use inverse transform sampling or rejection sampling. I chose rejection sampling.
library(tidyverse)
N <- 1000
r_density <- function(n){
exp(-abs(n))/2
}
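x = c()
while (length(x) < N) {
# Proposal: uniform on [-6, 6]. A normal proposal scaled by a constant of 3
# does not bound the target, because exp(-|y|)/2 has heavier tails than dnorm(y).
y = runif(1, -6, 6)
u = runif(1)
# Accept with probability f(y) / max(f), where max(f) = f(0) = 1/2
if (u < r_density(y)/0.5) {
x=append(x, y)
}
}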
tb <- tibble(
x = x,
density_fkt = r_density(x)
)
ggplot() +
geom_density(
data = tb,
mapping = aes(
x = x,
y = ..density..
)
) +
geom_function(
fun = r_density,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Density function f(x)",
title = "Estimated (black) vs true (red) density"
)
Here's the inverse transform sampling method (this involves some difficult integration, so perhaps not what your teacher intended)
r_density <- function(n) {
cdf <- function(x) {
1/4 * exp(-x) * (-1 + 2 * exp(x) + exp(2*x) - (-1 + exp(x))^2 * sign(x))
}
sapply(runif(n), function(i) {
uniroot(function(x) cdf(x) - i, c(-30, 20))$root
})
}
Plotting gives:
ggplot() +
geom_density(aes(r_density(1000))) +
geom_function(
fun = function(x) exp(-abs(x))/2,
xlim = c(-6,6),
color = "red",
size = 1
) +
theme_minimal() +
labs(
x = "x",
y = "Density function f(x)",
title = "Estimated (black) vs true (red) density"
)
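For this particular density the CDF can also be inverted by hand, so the root finding is optional. A minimal sketch, assuming the density exp(-|x|)/2 used above: the CDF is exp(x)/2 for x < 0 and 1 - exp(-x)/2 for x >= 0, which gives the two branches below. The name r_density_inv is only illustrative.

```r
# Closed-form inverse transform sampling for f(x) = exp(-|x|) / 2
r_density_inv <- function(n) {
  u <- runif(n)
  # u < 1/2 falls on the left branch, F(x) = exp(x) / 2
  # u >= 1/2 falls on the right branch, F(x) = 1 - exp(-x) / 2
  ifelse(u < 0.5, log(2 * u), -log(2 * (1 - u)))
}

set.seed(1)
x <- r_density_inv(1e5)
# Theoretical mean is 0 and theoretical variance is 2
c(mean = mean(x), variance = var(x))
```

The result can be passed to geom_density() in the same way as the output of the other samplers.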

gam plots with ggplot

I need to create some gam plots in ggplot. I can do them with the general plot function, but am unsure how to do with ggplot. Here is my code and plots with the regular plot function. I'm using the College data set from the ISLR package.
train.2 <- sample(dim(College)[1],2*dim(College)[1]/3)
train.college <- College[train.2,]
test.college <- College[-train.2,]
gam.college <- gam(Outstate~Private+s(Room.Board)+s(Personal)+s(PhD)+s(perc.alumni)+s(Expend)+s(Grad.Rate), data=train.college)
par(mfrow=c(2,2))
plot(gam.college, se=TRUE,col="blue")
See update below old answer.
Old answer:
There is an implementation of GAM plotting using ggplot2 in the voxel package. Here is how you would go about it:
library(ISLR)
library(mgcv)
library(voxel)
library(tidyverse)
library(gridExtra)
data(College)
set.seed(1)
train.2 <- sample(dim(College)[1],2*dim(College)[1]/3)
train.college <- College[train.2,]
test.college <- College[-train.2,]
gam.college <- gam(Outstate~Private+s(Room.Board)+s(Personal)+s(PhD)+s(perc.alumni)+s(Expend)+s(Grad.Rate), data=train.college)
vars <- c("Room.Board", "Personal", "PhD", "perc.alumni","Expend", "Grad.Rate")
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x) #plot customization goes here
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 2, nrow = 3)}
This runs, but with a series of warnings: In plotGAM(gam.college, smooth.cov = x) :
There are one or more factors in the model fit, please consider plotting by group since plot might be unprecise
To compare to the plot.gam:
par(mfrow=c(2,3))
plot(gam.college, se=TRUE,col="blue")
You might also want to plot the observed values:
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x) +
geom_point(data = train.college, aes_string(y = "Outstate", x = x ), alpha = 0.2) +
geom_rug(data = train.college, aes_string(y = "Outstate", x = x ), alpha = 0.2)
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 3, nrow = 2)}
or per group (especially important if you used the by argument, i.e. an interaction, in gam):
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x, groupCovs = "Private") +
geom_point(data = train.college, aes_string(y = "Outstate", x = x, color= "Private"), alpha = 0.2) +
geom_rug(data = train.college, aes_string(y = "Outstate", x = x, color= "Private" ), alpha = 0.2) +
scale_color_manual("Private", values = c("#868686FF", "#0073C2FF")) +
theme(legend.position="none")
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 3, nrow = 2)}
Update, 08. Jan. 2020.
I currently think the package mgcViz offers superior functionality compared to the voxel::plotGAM function. An example using the above data set and models:
library(mgcViz)
viz <- getViz(gam.college)
print(plot(viz, allTerms = T), pages = 1)
Plot customization is similar to ggplot2 syntax:
trt <- plot(viz, allTerms = T) +
l_points() +
l_fitLine(linetype = 1) +
l_ciLine(linetype = 3) +
l_ciBar() +
l_rug() +
theme_grey()
print(trt, pages = 1)
This vignette shows many more examples.
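If neither helper package is available, the partial-effect plots can also be assembled by hand from predict() with type = "terms". The following is a rough sketch under stated assumptions: it uses data simulated with mgcv::gamSim() instead of the College data, and the variable names x0, x1, x2 come from that simulation, not from the question.

```r
library(mgcv)
library(ggplot2)

set.seed(1)
# Simulated example data with columns y, x0, x1, x2, x3 (stand-in for College)
dat <- gamSim(1, n = 300, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

# Per-term contributions and their standard errors at the observed data
tm <- predict(fit, type = "terms", se.fit = TRUE)

vars <- c("x0", "x1", "x2")
plot_df <- do.call(rbind, lapply(vars, function(v) {
  term <- paste0("s(", v, ")")  # column name used by predict()
  data.frame(
    variable = v,
    x   = dat[[v]],
    fit = tm$fit[, term],
    se  = tm$se.fit[, term]
  )
}))

# One facet per smooth term, with an approximate 2-standard-error band
p <- ggplot(plot_df, aes(x = x, y = fit)) +
  geom_ribbon(aes(ymin = fit - 2 * se, ymax = fit + 2 * se),
              fill = "blue", alpha = 0.2) +
  geom_line(colour = "blue") +
  geom_rug(sides = "b", alpha = 0.2) +
  facet_wrap(~variable, scales = "free_x") +
  labs(x = NULL, y = "Partial effect")
print(p)
```

The result is an ordinary ggplot object, so themes and extra layers can be added as usual.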

saving ggplot in a list gives me the same graph

I am trying to plot 12 different plots on a 3 by 4 grid, but it only plots the last one 12 times. Can anyone help me? I am so fed up with it. Thanks
library(ggplot2)
library(gridExtra)
pmax=0.85
K_min = 0.0017
T = seq(100,1200,by=100) ## ISIs
lambda =1/T
p=list()
for(i in (1:length(lambda))){
p[[i]]<-ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
main <- grid.arrange(grobs=p,ncol=4)
}
This code produces the correct picture but I need to use ggplot since my other figures are in ggplot.
par( mfrow = c( 3, 4 ) )
for (i in (1:length(lambda))){
f <- function (x) ((lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1) )
curve(f,from=0, to=1, col = "violet",lwd=2,sub = paste0("ISI = ",round(1/lambda[i],3), ""),ylab="PDF",xlab="R")
}
Correct plot using curve:
ggplot objects are built lazily: the function passed to stat_function is not evaluated when the plot is created in the loop, but only when the plot is drawn. By then the loop has finished, so every plot looks up lambda[i] with the final value of i (12) and all twelve panels show the same curve. Here are two possible workarounds:
Workaround 1. Convert each ggplot object into a grob within the loop, & save that to the list:
for(i in (1:length(lambda))){
# code for generating each plot is unchanged
g <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
p[[i]] <- ggplotGrob(g)
}
main <- grid.arrange(grobs=p, ncol=4)
Workaround 2. Put all the data in a data frame, & create a single ggplot with a facet for each ISI:
library(dplyr)
pmax = 0.85
K_min = 0.0017
ISI = seq(100, 1200, by = 100) # I changed this; using `T` as a name clashes with T from TRUE/FALSE
lambda = 1/ISI
df <- data.frame(
x = rep(seq(0, 1, length.out = 101), length(ISI)),
ISI = rep(ISI, each = 101),
l = rep(lambda, each = 101)
) %>%
mutate(y = (l * pmax / K_min) * (1-x) ^ ((l / K_min) - 1) *
(1 - (1 - pmax) * x)^-((l / K_min) + 1))
ggplot(df,
aes(x = x, y = y, group = 1)) +
geom_line(colour = "dodgerblue3") +
facet_wrap(~ISI, nrow = 3, scales = "free_y") +
labs(x = "Probability", y = "Frequency") +
theme_bw()
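A third option, sketched here as an assumption rather than as part of the original answer, is to avoid the shared loop variable altogether: pass the rate to the function through the args parameter of stat_function, which is evaluated immediately, and build the plots with lapply. The helper name pdf_fun is hypothetical.

```r
library(ggplot2)

pmax   <- 0.85
K_min  <- 0.0017
ISI    <- seq(100, 1200, by = 100)
lambda <- 1 / ISI

# Density as a function of x, with the rate passed in explicitly
pdf_fun <- function(x, lam) {
  (lam * pmax / K_min) * (1 - x)^((lam / K_min) - 1) *
    (1 - (1 - pmax) * x)^-((lam / K_min) + 1)
}

# args = list(lam = lam) stores the current value inside each plot,
# so no plot depends on a variable that changes later
plots <- lapply(lambda, function(lam) {
  ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
    stat_function(fun = pdf_fun, args = list(lam = lam),
                  colour = "dodgerblue3") +
    scale_x_continuous(name = "Probability") +
    scale_y_continuous(name = "Frequency") +
    theme_bw()
})
```

The resulting list can be handed to grid.arrange(grobs = plots, ncol = 4) exactly as in the question.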

How to plot the intersection of a hyperplane and a plane in R

I have a set of (2-dimensional) data points that I run through a classifier that uses higher-order polynomial transformations. I want to visualize the results as a 2-dimensional scatterplot of the points with the classifier superimposed on top, preferably using ggplot2 as all other visualizations are made with it. Pretty much like this one that was used in the CaltechX online course on machine learning (the background color is optional).
I can display the points with colors and symbols and all, that's easy, but I can't figure out how to draw anything like the classifiers (the intersection of the classifying hyperplane with the plane representing my threshold). The only thing I found was stat_function, and that only takes a function with a single argument.
Edit:
The example that was asked for in the comments:
sample data:
"","x","y","x","x","y","value"
"1",4.17338115745224,0.303530843229964,1.26674990184152,17.4171102853774,0.0921309727918932,-1
"2",4.85514814266935,3.452660451876,16.7631779801937,23.5724634872656,11.9208641959486,1
"3",3.51938610081561,3.41200957307592,12.0081790673332,12.3860785266141,11.6418093267617,1
"4",3.18545089452527,0.933340128976852,2.97310914874565,10.1470974014319,0.87112379635852,-16
"5",2.77556006214581,2.49701633118093,6.93061880335166,7.70373365857888,6.23509055818427,-1
"6",2.45974169578403,4.56341833807528,11.2248303614692,6.05032920997851,20.8247869282818,1
"7",2.73947941488586,3.35344674880616,9.18669833727041,7.50474746458339,11.2456050970786,-1
"8",2.01721803518012,3.55453519499861,7.17027250203368,4.06916860145595,12.6347204524838,-1
"9",3.52376445778646,1.47073399974033,5.1825201951431,12.4169159539591,2.1630584979922,-1
"10",3.77387718763202,0.509284208528697,1.92197605658768,14.2421490273294,0.259370405056702,-1
"11",4.15821685106494,1.03675272315741,4.31104264382058,17.2907673804804,1.0748562089743,-1
"12",2.57985028671101,3.88512040604837,10.0230289934507,6.65562750184287,15.0941605694935,1
"13",3.99800728890114,2.39457673509605,9.5735352407471,15.9840622821066,5.73399774026327,1
"14",2.10979392635636,4.58358959294856,9.67042948411309,4.45123041169019,21.0092935565863,1
"15",2.26988795562647,2.96687697409652,6.73447830932721,5.15239133109813,8.80235897942413,-1
"16",1.11802248633467,0.114183261757717,0.127659454208164,1.24997427994995,0.0130378172656312,-1
"17",0.310411276295781,2.09426849964075,0.650084557879535,0.0963551604515758,4.38596054858751,-1
"18",1.93197490065359,1.72926536411978,3.340897280049,3.73252701675543,2.99035869954433,-1
"19",3.45879891654477,1.13636834081262,3.93046958599847,11.9632899450912,1.29133300600123,-1
"20",0.310697768582031,0.730971727753058,0.227111284709427,0.0965331034018534,0.534319666774291,-1
"21",3.88408110360615,0.915658151498064,3.55649052359657,15.0860860193904,0.838429850404852,-1
"22",0.287852146429941,2.16121324687265,0.622109872005114,0.0828588582043242,4.67084269845782,-1
"23",2.80277011333965,1.22467750683427,3.4324895146344,7.85552030822994,1.4998349957458,-1
"24",0.579150241101161,0.57801398797892,0.334756940497835,0.335415001767533,0.334100170299295,-1
"25",2.37193428212777,1.58276639413089,3.7542178708388,5.62607223873297,2.50514945839009,-1
"26",0.372461311053485,2.51207412336953,0.935650421453748,0.138727428231681,6.31051640130279,-1
"27",3.56567220995203,1.03982002707198,3.70765737388213,12.7140183088242,1.08122568869998,-1
"28",0.634770628530532,2.26303249713965,1.43650656059435,0.402933750845047,5.12131608311011,-1
"29",2.43812176748179,1.91849716124125,4.67752968967431,5.94443775306852,3.68063135769073,-1
"30",1.08741064323112,3.01656032912433,3.28023980783858,1.18246190701233,9.0996362192467,-1
"31",0.98,2.74,2.6852,0.9604,7.5076,1
"32",3.16,1.78,5.6248,9.9856,3.1684,1
"33",4.26,4.28,18.2328,18.1476,18.3184,-1
The code to generate a classifier:
perceptron_train <- function(data, maxIter=10000) {
set.seed(839)
X <- as.matrix(data[1:5])
Y <- data["value"]
d <- dim(X)
X <- cbind(rep(1, d[1]), X)
W <- rep(0, d[2] + 1)
count <- 0
while (count < maxIter){
H <- sign(X %*% W)
indexs <- which(H != Y)
if (length(indexs) == 0){
break
} else {
i <- sample(indexs, 1)
W <- W + 0.1 * (X[i,] * Y[i,])
}
count <- count + 1
point <- as.data.frame(data[i,])
plot_it(data, point, W, paste("plot", sprintf("%05d", count), ".png", sep=""))
}
W
}
The code to generate the plot:
plot_it <- function(data, point, weights, name = "plot.png") {
line <- weights_to_line(weights)
point <- point
png(name)
p = ggplot() + geom_point(data = data, aes(x, y, color = value, size = 2)) + theme(legend.position = "none")
p = p + geom_abline(intercept = line[2], slope = line[1])
print(p)
dev.off()
}
This was solved using material from the question and answers in "Issues plotting a fitted SVM model's decision boundary using ggplot2's stat_contour()". I skipped the call to geom_point for the grid entries and some of the aesthetic definitions such as scale_fill_manual and scale_colour_manual. Removing the dots for the grid entries solved the problem with the vanishing contour line in my case.
library(e1071) # provides svm()
library(ggplot2)
train_and_plot_svm <- function(train, kernel = "sigmoid", type ="C", cost, gamma) {
fit <- svm(as.factor(value) ~ x + y, data = train, kernel = kernel, type = type, cost = cost)
grid <- expand.grid (x = seq(from = -0.1, to = 15, length = 100), y = seq(from = -0.1, to = 15, length = 100))
decisionValues <- as.vector(attributes(predict(fit, grid, decision.values = TRUE))$decision)
p <- predict(fit, grid)
grid$value <- p
grid$z <- decisionValues
p <- ggplot() + stat_contour(data = grid, aes(x = x, y = y, z = z), breaks = c(0))
p <- p + geom_point(data = train, aes(x, y, colour = as.factor(value)), alpha = 0.7)
p <- p + xlim(0,15) + ylim(0,15) + theme(legend.position="none")
}
Note that this function doesn't return the result of the svm training but the ggplot2 object.
This is what I got:
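The same grid-and-contour idea also works for the polynomial perceptron from the question: evaluate the weighted sum of the transformed features on a grid and draw the contour where it equals zero. This is a minimal sketch with assumptions: the weight vector w is made up for illustration (it is not a trained result), the feature order bias, x, y, x*y, x^2, y^2 is inferred from the sample data, and the points are randomly generated.

```r
library(ggplot2)

# Hypothetical weights in the order: bias, x, y, x*y, x^2, y^2
w <- c(-8, 0.5, 0.5, 1, 0.2, 0.2)

# Classifier score for arbitrary x and y
score <- function(x, y) {
  w[1] + w[2] * x + w[3] * y + w[4] * x * y + w[5] * x^2 + w[6] * y^2
}

# Evaluate the score on a regular grid covering the data range
grid <- expand.grid(x = seq(0, 5, length.out = 200),
                    y = seq(0, 5, length.out = 200))
grid$z <- score(grid$x, grid$y)

# Illustrative points, labelled by the sign of the score
set.seed(839)
pts <- data.frame(x = runif(40, 0, 5), y = runif(40, 0, 5))
pts$value <- factor(sign(score(pts$x, pts$y)))

# The decision boundary is the level set z = 0
p <- ggplot() +
  geom_raster(data = grid, aes(x = x, y = y, fill = z > 0), alpha = 0.2) +
  geom_contour(data = grid, aes(x = x, y = y, z = z),
               breaks = 0, colour = "black") +
  geom_point(data = pts, aes(x = x, y = y, colour = value)) +
  theme(legend.position = "none")
print(p)
```

With trained weights from perceptron_train in place of w, the contour shows the curve where the classifier switches sign.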

Add datapoints to existing scatterplot

I have an existing ggplot2 scatterplot which shows the results of a parameter against age from a normal database. I then want to add two additional points to this graph, which I would pass as command-line arguments to my script: script age value1 value2. I would like to show these points in red, with an "r" and an "l" label (geom_text) above each point. I have the following code so far but do not know how to add the finishing touches:
pkgLoad <- function(x)
{
if (!require(x,character.only = TRUE))
{
install.packages(x,dep=TRUE, repos='http://star-www.st-andrews.ac.uk/cran/')
if(!require(x,character.only = TRUE)) stop("Package not found")
}
}
pkgLoad("ggplot2")
#load current normals database
df<-data.frame(read.csv("dat_normals.txt", sep='\t', header=T))
args<-commandArgs(TRUE)
#specify what each argument is
age <- args[1]
rSBR <- args[2]
lSBR <- args[3]
# RUN REGRESSION AND APPEND PREDICTION INTERVALS
lm_fit = lm(SBR ~ Age, data = df)
sbr_with_pred = data.frame(df, predict(lm_fit, interval='prediction'))
p <- ggplot(sbr_with_pred, aes(x=Age, y=SBR)) +
geom_point(shape=19, alpha=1/4) +
geom_smooth(method = 'lm', aes(fill = 'confidence'), alpha = 0.5) +
geom_ribbon(aes(y = fit, ymin = lwr, ymax = upr,
fill = 'prediction'), alpha = 0.2) +
scale_fill_manual('Interval', values = c('green', 'blue')) +
theme_bw() +
theme(legend.position = "none")
ggsave(filename=paste("/home/data/wolf/FV_DAT/dat_results.png",sep=""))
browseURL(paste("/home/data/wolf/FV_DAT/dat_results.png",sep=""))
Essentially, I want to see if the 2 new points fall within the 95% confidence intervals from the normal database (blue ribbon)
Your example is not reproducible. Creating sample data and a reproducible example is really worthwhile; it is not a waste of time. The solution is what was suggested in the comments: add a new layer with the new data. Command-line arguments arrive as character strings, so convert them to numbers first.
newdata <- data.frame(Age = as.numeric(args[1]),
SBR = as.numeric(c(args[2],args[3])))
p + geom_point(data=newdata,colour="red",size=10)
For example:
sbr_with_pred <- data.frame(Age = sample(15:36,50,rep=T),
SBR = rnorm(50))
p <- ggplot(sbr_with_pred, aes(x=Age, y=SBR)) +
geom_point(shape=19, alpha=1/4) +
geom_smooth(method = 'lm', aes(fill = 'confidence'), alpha = 0.5)
args <- c(20,rnorm(1),rnorm(1))
newdata <- data.frame(Age = args[1],
SBR = c(args[2],args[3]))
p + geom_point(data=newdata,colour="red",size=10)
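The remaining parts of the question, the "r" and "l" labels and the check against the prediction interval, could look roughly like this. It is a sketch on made-up data: the simulated df, the example values in args, and the column names label and inside are all assumptions for illustration.

```r
library(ggplot2)

set.seed(42)
# Made-up stand-in for the normals database
df <- data.frame(Age = sample(15:36, 50, replace = TRUE))
df$SBR <- 3 - 0.02 * df$Age + rnorm(50, sd = 0.3)
lm_fit <- lm(SBR ~ Age, data = df)

# Stand-in for commandArgs(TRUE): age, right value, left value (as strings)
args <- c("20", "2.9", "1.5")
newdata <- data.frame(
  Age   = as.numeric(args[1]),
  SBR   = as.numeric(args[2:3]),
  label = c("r", "l")
)

# Prediction interval at the new age, and whether each point lies inside it
pred <- predict(lm_fit, newdata = newdata, interval = "prediction")
newdata$inside <- unname(newdata$SBR >= pred[, "lwr"] &
                         newdata$SBR <= pred[, "upr"])

p <- ggplot(df, aes(x = Age, y = SBR)) +
  geom_point(shape = 19, alpha = 1/4) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(data = newdata, colour = "red", size = 3) +
  geom_text(data = newdata, aes(label = label),
            colour = "red", vjust = -1)  # negative vjust places the label above
print(p)
```

The inside column gives the yes/no answer for each point without having to read it off the plot.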
