Fit a normal curve to a histogram with a log-scale x axis - r

I hope some of you can help me. I am trying to plot a normal fit on a histogram with a log-scale x axis. I use a log scale because an ordinary histogram of my data has a long tail. My code looks like this:
breaks<- c(0,0.01, 0.05, 0.1,0.2,0.5,1,2,5,10,20,50,100,200,300) #bins
major <- c(0.1,1,10,100)
H <- hist(log10(B),plot=FALSE) #using data "B"; compute the bins without drawing
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="B",ylab="Counts",
main="Histogram of B",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
curve(dnorm(x, mean=mean(B), sd=sd(B)), add=TRUE) # 1st try
lines(density(B)) # 2nd try
xfit<-seq(min(B),max(B),length=40) # 3rd try
yfit<-dnorm(xfit,mean=mean(B),sd=sd(B))
yfit<-yfit*diff(H$mids[1:2])*length(B)
lines(xfit, yfit, col="red", lwd=2)
But the 1st, 2nd and 3rd tries did not work. Please let me know how to add a normal fit to my histogram. Thank you very much for your help.
Summy

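I changed your first try so that it does what you want. The histogram is drawn on the log10 scale, so the normal curve needs the mean and standard deviation of log10(B) rather than of B, and freq = FALSE puts the histogram on the density scale so that the curve and the bars are comparable: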
B <- rlnorm(10000)
H <- hist(log10(B), freq = FALSE, col="blue", xaxt="n", xlab="B")
at <- H$mids
axis(1,at=at,labels=round(10^at,2))
curve(dnorm(x, mean=mean(log10(B)), sd=sd(log10(B))), add=TRUE)
Hope it helps,
alex
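If the histogram should show counts rather than densities (as in the question's third try), the normal density can be rescaled by the number of observations times the bin width. A minimal sketch, assuming simulated log-normal data in place of the real B and equal-width bins; the names logB and binw are illustrative only:

```r
set.seed(42)
B <- rlnorm(10000)                 # simulated stand-in for the real data
logB <- log10(B)                   # work on the log10 scale throughout

# Histogram of counts on the log10 scale, with ticks labelled in original units
H <- hist(logB, freq = TRUE, col = "lightblue", xaxt = "n",
          xlab = "B", main = "Histogram of B")
axis(1, at = H$mids, labels = round(10^H$mids, 2))

# Expected count per bin = density * number of observations * bin width
binw <- diff(H$breaks[1:2])
curve(dnorm(x, mean = mean(logB), sd = sd(logB)) * length(logB) * binw,
      add = TRUE, col = "red", lwd = 2)
```

The scaling only holds when all bins have the same width, which is the case for the default breaks chosen by hist().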

Related

Plotting several variables on the same scale in R

I've tried over and over to solve this issue but I can't get it down. I have estimated a Beta-t-EGARCH model and a GARCH-t model in R and now I need to plot the results over the same plot. The final result is horrible, since the variables don't share the same scale on the y axis. I'm new to R, so please don't blame me :).
Here's the code:
library(quantmod)
library(betategarch)
library(fGarch)
library(ggplot2)
getSymbols("GOOG",src="yahoo")
google_ret <- abs(periodReturn(GOOG, period="daily", subset=NULL, type="log"))-mean(abs(periodReturn(GOOG, period="daily", subset=NULL, type="log")))
googcomp <- tegarch(google_ret, asym=FALSE, skew=FALSE)
goog1stdev <- fitted(googcomp)
#now we try to fit a standard GARCH-t model
googgarch <- garchFit(data=google_ret, cond.dist="sstd")
googgarch2 <- garchFit(data=google_ret, cond.dist="sstd", include.mean = FALSE, include.delta = FALSE, include.skew = FALSE, include.shape = FALSE, leverage = FALSE, trace = TRUE)
volatility <- volatility(googgarch2, type = "sigma")
plot(google_ret)
par(new=TRUE)
plot(googgarch2, which=2)
par(new=TRUE)
plot(goog1stdev, col="red")
The final result is a plot completely out of scale on the y axis, with variables that have lower values plotted above higher ones. Thanks a lot to anybody that wants to help me!
The recommended approach is to plot them as different plots stacked on top of each other:
layout(matrix(1:3,3))
plot(google_ret)
plot(googgarch2, which=2)
plot(goog1stdev, col="red")
You can get rid of the whitespace with calls to par("mar") to adjust margin sizes:
opar=par(mar=par("mar") -c(1,0,3,0)) # opar will then let you restore the previous values
..... plotting efforts
par(opar)
I don't know your domain very well, but if you can use shifted y-ordinates then this produces a somewhat cleaned-up version with overlaid plots:
png()
plot(google_ret, ylim=c(0,1), ylab="Google Returns (black); GARCH data + 0.5 (blue); STD x10 + 0.3 (red)")
par(new=TRUE)
plot(googgarch2@data +.5, type="l", col="blue",axes=FALSE, ylab="", main="",ylim=c(0, 1)) ;abline(h=.5, col="blue")
par(new=TRUE);
plot( 10*coredata(goog1stdev) + .3, col="red", type="l", axes=FALSE, main="",ylim=c(0,1), ylab=""); abline(h=.3, col="red")
dev.off()
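When the series are in comparable units, another option is to compute one y range from all of them and add the later series with lines(), which draws into the coordinate system of the first plot instead of starting a new one as par(new=TRUE) followed by plot() does. A minimal sketch with simulated stand-ins; ret, vol_a and vol_b are hypothetical names, not the fitted objects from the question:

```r
set.seed(1)
n <- 250
ret   <- abs(rnorm(n, sd = 0.02))               # stand-in for absolute returns
vol_a <- 0.020 + 0.005 * sin(seq_len(n) / 20)   # stand-in for one fitted volatility
vol_b <- 0.018 + 0.004 * cos(seq_len(n) / 15)   # stand-in for a second fit

# One common y range, computed from all series together
yl <- range(ret, vol_a, vol_b)

plot(ret, type = "l", ylim = yl, xlab = "Time", ylab = "Value")
lines(vol_a, col = "blue")    # drawn on the same axes as ret
lines(vol_b, col = "red")
legend("topright", c("returns", "model A", "model B"),
       col = c("black", "blue", "red"), lty = 1, bty = "n")
```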

How to get rid of multiple outliers in a timeseries in R?

I'm using the "outliers" package to remove some undesirable values, but the rm.outlier() function does not seem to replace all outliers in one call. It apparently does not despike recursively, so I have to call the function many times to replace all of them.
Here is a reproducible example of the issue I'm experiencing:
require(outliers)
# creating a timeseries:
set.seed(12345)
y = rnorm(10000)
# inserting some outliers:
y[4000:4500] = -11
y[4501:5000] = -10
y[5001:5100] = -9
y[5101:5200] = -8
y[5201:5300] = -7
y[5301:5400] = -6
y[5401:5500] = -5
# plotting the timeseries + outliers:
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
# trying to get rid of some outliers by replacing them by the series mean value:
new.y = outliers::rm.outlier(y, fill=TRUE, median=FALSE)
new.y = outliers::rm.outlier(new.y, fill=TRUE, median=FALSE)
# plotting the new timeseries "after removing the outliers":
lines(new.y, col="red")
# inserting a legend:
legend("bottomleft", c("raw", "new series"), col=c("black","red"), lty=c(1,1), horiz=FALSE, bty="n")
Does anyone know how to improve the code above, so that all outliers could be replaced by a mean value?
Best thought I could come up with is just to use a for loop, keeping track of the outliers as you find them.
plot(y, type="l", col="black", lwd=6, xlab="Time", ylab="w'")
maxIter <- 100
outlierQ <- rep(FALSE, length(y))
for (i in 1:maxIter) {
  bad <- outlier(y, logical = TRUE)
  if (!any(bad)) break
  outlierQ[bad] <- TRUE
  y[bad] <- mean(y[!bad])
}
y[outlierQ] <- mean(y[!outlierQ])
lines(y, col="blue")
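A different technique from the loop above, offered only as a sketch: flag every point that lies more than k robust standard deviations (MAD) from the median, which marks all spikes in a single pass. The cutoff k = 3 and the simplified spike levels are assumptions, and this is not what rm.outlier() does internally:

```r
set.seed(12345)
y <- rnorm(10000)
y[4000:4500] <- -11          # simplified version of the inserted spikes
y[4501:5000] <- -10
y[5001:5500] <- -7

# Flag everything further than k robust standard deviations from the median
k <- 3                                     # assumed cutoff, tune for your data
is_out <- abs(y - median(y)) > k * mad(y)

new.y <- y
new.y[is_out] <- mean(y[!is_out])          # replace by the mean of the kept values

plot(y, type = "l", xlab = "Time", ylab = "w'")
lines(new.y, col = "red")
```

Because median() and mad() are barely affected by the spikes, the threshold stays close to what it would be for the clean series, so a single pass is enough here.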

Why isn't this plotting multiple functions in one graph?

I'm having trouble plotting multiple reliability functions from an inverse Gaussian distribution in a single graph. I need the functions drawn as lines, but all I get is points; when I set type="l", the result is a mess of lines drawn everywhere.
Here is the code
library("statmod")
x<-rinvgauss(90,0.000471176,0.0000191925)
y<-rinvgauss(90,0.000732085,0.000002982015)
z<-rinvgauss(180,0.000286672,0.00000116771)
den<-pinvgauss(x,0.000471176,0.0000191925)
dens<-pinvgauss(y,0.000732085,0.000002982015)
densi<-pinvgauss(z,0.000286672,0.00000116771)
rel<-1-den
reli<-1-dens
relia<-1-densi
plot(x,rel, xlim=c(0,0.002), col="red")
points(y,reli, col="blue")
points(z,relia, col="black")
I would really appreciate any help on this!
The problem is that your x, y, z values aren't sorted, so type="l" joins the points in sample order rather than from left to right. Sort them first:
library("statmod")
x <- sort(rinvgauss(90,0.000471176,0.0000191925))
y <- sort(rinvgauss(90,0.000732085,0.000002982015))
z <- sort(rinvgauss(180,0.000286672,0.00000116771))
den <- pinvgauss(x,0.000471176,0.0000191925)
dens <- pinvgauss(y,0.000732085,0.000002982015)
densi <- pinvgauss(z,0.000286672,0.00000116771)
rel <- 1-den
reli <- 1-dens
relia <- 1-densi
plot(x,rel, xlim=c(0,0.002), col="red", type="l")
lines(y,reli, col="blue")
lines(z,relia, col="black")
Your values weren't sorted. This should work:
x<-sort(rinvgauss(90,0.000471176,0.0000191925))
y<-sort(rinvgauss(90,0.000732085,0.000002982015))
z<-sort(rinvgauss(180,0.000286672,0.00000116771))
den<-sort(pinvgauss(x,0.000471176,0.0000191925))
dens<-sort(pinvgauss(y,0.000732085,0.000002982015))
densi<-sort(pinvgauss(z,0.000286672,0.00000116771))
rel<-1-den
reli<-1-dens
relia<-1-densi
plot(x,rel, xlim=c(0,0.002), col="red",type="l")
lines(y,reli, col="blue")
lines(z,relia, col="black")
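If the sample itself should stay in its original order, order() gives the sorting permutation, which can be applied to both coordinates at plotting time. A minimal base-R sketch, using an exponential distribution as a stand-in so that it runs without statmod; the rate value is an arbitrary assumption:

```r
set.seed(7)
rate <- 2                         # assumed rate of the stand-in distribution
x   <- rexp(90, rate)             # unsorted random sample
rel <- 1 - pexp(x, rate)          # reliability R(x) = 1 - F(x)

o <- order(x)                     # permutation that sorts x
plot(x[o], rel[o], type = "l", col = "red",
     xlab = "x", ylab = "Reliability")
```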

Colorfill boxplot in R-cran with lines, dots, or similar

I need to use black and white for my boxplots in R. I would like to fill the boxes with patterns such as lines and dots.
I imagine ggplot2 could do that but I can't find any way to do it.
Thank you in advance for your help!
I thought this was a great question and pondered if it was possible to do this in base R and to obtain the checkered look. So I put together some code that relies on boxplot.stats and polygon (which can draw angled lines). Here's the solution, which is really not ready for primetime, but is a solution that could be tinkered with to make more general.
boxpattern <-
function(y, xcenter, boxwidth, angle=NULL, angle.density=10, ...) {
# draw an individual box
bstats <- boxplot.stats(y)
bxmin <- bstats$stats[1]
bxq2 <- bstats$stats[2]
bxmedian <- bstats$stats[3]
bxq4 <- bstats$stats[4]
bxmax <- bstats$stats[5]
bleft <- xcenter-(boxwidth/2)
bright <- xcenter+(boxwidth/2)
# boxplot
polygon(c(bleft,bright,bright,bleft,bleft),
c(bxq2,bxq2,bxq4,bxq4,bxq2), angle=angle[1], density=angle.density)
polygon(c(bleft,bright,bright,bleft,bleft),
c(bxq2,bxq2,bxq4,bxq4,bxq2), angle=angle[2], density=angle.density)
# lines
segments(bleft,bxmedian,bright,bxmedian,lwd=3) # median
segments(bleft,bxmin,bright,bxmin,lwd=1) # min
segments(xcenter,bxmin,xcenter,bxq2,lwd=1)
segments(bleft,bxmax,bright,bxmax,lwd=1) # max
segments(xcenter,bxq4,xcenter,bxmax,lwd=1)
# outliers
if(length(bstats$out)>0){
for(i in 1:length(bstats$out))
points(xcenter,bstats$out[i])
}
}
drawboxplots <- function(y, x, boxwidth=1, angle=NULL, ...){
# figure out all the boxes and start the plot
groups <- split(y,as.factor(x))
len <- length(groups)
bxylim <- c((min(y)-0.04*abs(min(y))),(max(y)+0.04*max(y)))
xcenters <- seq(1,max(2,(len*(1.4))),length.out=len)
if(is.null(angle)){
angle <- seq(-90,75,length.out=len)
angle <- lapply(angle,function(x) c(x,x))
}
else if(!length(angle)==len)
stop("angle must be a vector or list of two-element vectors")
else if(!is.list(angle))
angle <- lapply(angle,function(x) c(x,x))
# draw plot area
plot(0, xlim=c(.97*(min(xcenters)-1), 1.04*(max(xcenters)+1)),
ylim=bxylim,
xlab="", xaxt="n",
ylab=names(y),
col="white", las=1)
axis(1, at=xcenters, labels=names(groups))
# draw boxplots
plots <- mapply(boxpattern, y=groups, xcenter=xcenters,
boxwidth=boxwidth, angle=angle, ...)
}
Some examples in action:
mydat <- data.frame(y=c(rnorm(200,1,4),rnorm(200,2,2)),
x=sort(rep(1:2,200)))
drawboxplots(mydat$y, mydat$x)
mydat <- data.frame(y=c(rnorm(200,1,4),rnorm(200,2,2),
rnorm(200,3,3),rnorm(400,-2,8)),
x=sort(rep(1:5,200)))
drawboxplots(mydat$y, mydat$x)
drawboxplots(mydat$y, mydat$x, boxwidth=.5, angle.density=30)
drawboxplots(mydat$y, mydat$x, # specify list of two-element angle parameters
angle=list(c(0,0),c(90,90),c(45,45),c(45,-45),c(0,90)))
EDIT: I wanted to add that one could also obtain dots as a fill by drawing a pattern of dots and then covering them with a "donut"-shaped polygon, like so:
x <- rep(1:10,10)
y <- sort(x)
plot(y~x, xlim=c(0,11), ylim=c(0,11), pch=20)
outerbox.x <- c(2.5,0.5,10.5,10.5,0.5,0.5,2.5,7.5,7.5,2.5)
outerbox.y <- c(2.5,0.5,0.5,10.5,10.5,0.5,2.5,2.5,7.5,7.5)
polygon(outerbox.x,outerbox.y, col="white", border="white") # donut
polygon(c(2.5,2.5,7.5,7.5,2.5),c(2.5,2.5,2.5,7.5,7.5)) # inner box
Mixing that with angled lines in a single plotting function would be more challenging, but it starts to get you there.
I think this is hard to do with ggplot2, since it does not support shaded (hatched) polygons (a grid limitation). But you can use the shading-line feature of base graphics, controlled by the density and angle arguments of some plot functions (polygon, barplot, ...).
The problem is that boxplot doesn't use this feature, so I hacked it, or rather I hacked bxp, which boxplot uses internally. The hack consists of adding two arguments (angle and density) to the bxp function and passing them on in the calls to xypolygon (this happens in 2 lines).
my.bxp <- function (all.bxp.argument,angle,density, ...) {
.....#### bxp code
xypolygon(xx, yy, lty = boxlty[i], lwd = boxlwd[i],
border = boxcol[i],angle[i],density[i])
.......## bxp code after
xypolygon(xx, yy, lty = "blank", col = boxfill[i],angle[i],density[i])
......
}
Here is an example. Note that it is entirely the responsibility of the user to ensure
that the legend corresponds to the plot, so I added some code to arrange the legend and the boxplot.
require(stats)
set.seed(753)
(bx.p <- boxplot(split(rt(100, 4), gl(5, 20))))
layout(matrix(c(1,2),nrow=1),
width=c(4,1))
angles=c(60,30,40,50,60)
densities=c(50,30,40,50,30)
par(mar=c(5,4,4,0)) #Get rid of the margin on the right side
my.bxp(bx.p,angle=angles,density=densities)
par(mar=c(5,0,4,2)) #No margin on the left side
plot(c(0,1),type="n", axes=F, xlab="", ylab="")
legend("top", paste("region", 1:5),
angle=angles,density=densities)
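A lighter-weight variant that avoids copying bxp is to draw an ordinary boxplot and overlay hatched rectangles with rect(), which accepts density and angle. This is only a sketch: it assumes more than one group, no varwidth, and that each box then spans boxwex/2 on either side of its position, which is why boxwex is passed explicitly. The angle and density values are arbitrary:

```r
set.seed(753)
groups <- split(rt(100, 4), gl(5, 20))   # same kind of data as above
bw <- 0.8                                # box width, passed explicitly as boxwex

bx <- boxplot(groups, boxwex = bw, col = "white")

# Rows 2 and 4 of bx$stats hold the lower and upper hinge of each box
angles    <- c(0, 45, 90, 135, 60)
densities <- c(10, 15, 20, 15, 10)
idx <- seq_len(ncol(bx$stats))
rect(idx - bw / 2, bx$stats[2, ], idx + bw / 2, bx$stats[4, ],
     density = densities, angle = angles)
```

The same angles and densities vectors can be handed to legend() so that the key matches the boxes.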

R arrowed labelling of data points on a plot

I am looking to label data points with indices -- to identify the index number easily by visual examination.
So for instance,
x<-ts.plot(rnorm(10,0,1)) # would like to visually identify the data point indices easily through arrow labelling
Of course, if there's a better way of achieving this, please suggest
You can use the arrows function:
set.seed(1); ts.plot(x <-rnorm(10,0,1), ylim=c(-1.6,1.6)) # some random data
arrows(x0=1:length(x), y0=0, y1=x, code=2, col=2, length=.1) # adding arrows
text(x=1:10, y=x+.1, 0, labels=round(x,2), cex=0.65) # adding text
abline(h=0) # adding a horizontal line at y=0
Use my.symbols from package TeachingDemos to get arrows pointing to the locations you want:
require(TeachingDemos)
d <- rnorm(10,0,1)
plot(d, type="l", ylim=c(min(d)-1, max(d)+1))
my.symbols(x=1:10, y=d, ms.arrows, angle=pi/2, add=T, symb.plots=TRUE, adj=1.5)
You can use text() for this
n <- 10
d <- rnorm(n)
plot(d, type="l", ylim=c(min(d)-1, max(d)+1))
text(1:n, d+par("cxy")[2]/2,col=2) # Upside
text(1:n, d-par("cxy")[2]/2,col=3) # Downside
Here is a lattice version, to show the lattice analogues of some base functions.
library(lattice)
set.seed(1234)
dat = data.frame(x=1:10, y = rnorm(10,0,1))
xyplot(y~x,data=dat, type =c('l','p'),
panel = function(x,y,...){
panel.fill(col=rgb(1,1,0,0.5))
panel.xyplot(x,y,...)
panel.arrows(x, y0=0,x1=x, y1=y, code=2, col=2, length=.1)
panel.text(x,y,label=round(y,2),adj=1.2,cex=1.5)
panel.abline(a=0)
})
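If arrows are not essential and only the index numbers are needed, the pos argument of text() places each label next to its point without manual y offsets. A minimal sketch with simulated data; the padding of 0.5 on the y limits is an arbitrary choice to keep the labels inside the plot:

```r
set.seed(1)
d <- rnorm(10)
plot(d, type = "b", ylim = range(d) + c(-0.5, 0.5),
     xlab = "Index", ylab = "Value")
# pos = 3 puts each label just above its point; offset is in character widths
text(seq_along(d), d, labels = seq_along(d), pos = 3, offset = 0.5, col = "red")
```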

Resources