how can i show two histogram in same window but different plots in R? - r

I want to show the effect of removing outliers on my histogram, so I have to plot both hists together.
boxplot(Costs, Costs1,
xlab=" Costs and Costs after removig outliers",
col=topo.colors(2))
so i tried this:
hist(Costs,Costs1,main="Histogram of Maintenance_cost ",col="blue",
border="darkblue",xlab="Total_cost",ylab=" ",yaxt = 'n',
#ylim=c(0,3000),
#xlim=c(0,max(My_Costs)),
breaks=60)
the first code give me to box plot, but I tried it for hist it doesn't work
can anyone tell me how to do it in R?

For a base R solution, use par with mfrow.
set.seed(1234)
Costs = rnorm(5000, 100, 20)
OUT = which(Costs %in% boxplot(Costs, plot=FALSE)$out)
Costs1 = Costs[-OUT]
par(mfrow=c(1,2), mar=c(5,1,2,1))
hist(Costs,main="Histogram of Maintenance_cost ",col="blue",
border="darkblue",xlab="Total_cost",ylab=" ",yaxt = 'n',
breaks=60, xlim=c(30,170))
hist(Costs1,main="Maintenance_cost without outliers",col="blue",
border="darkblue",xlab="Total_cost",ylab=" ",yaxt = 'n',
breaks=60, xlim=c(30,170))

For multiple plots, you should use ggplot2 with facet_wrap. Here is an example:
Plot several histograms with ggplot in one window with several variables

Related

Multiple histograms with title and mean as a line?

I'm struggeling with the histogram function in my exploratory analysis. I would like to run a couple of variables in my dataset through a histogram function and for each add the title and a line at the arithmetic mean. This is how far I've got (but the main title is still missing):
histo.abline <-function(x){
hist(x)
abline(v = mean(x, na.rm = TRUE), col = "blue", lwd = 4)}
sapply(dataset[c(7:10)], histo.abline)
I tried to add a main argument in the histogram function but it just doesn't pick the right variable name of my dataset vector. When I put main=x there, it says returns NULL for each variable. Colnames, names and other functions didn't work either. Could you help me?
you can try to do it with ggplot:
library(ggplot)
histo.abline <-function(dataset,colnum){
p<-ggplot(dataset,aes(dataset[,colnum]))+geom_histogram(bins=5,fill=I("blue"),col=I("red"), alpha=I(.2))+
geom_vline(xintercept = mean(dataset[,colnum], na.rm = TRUE))+xlab(as.character(names(dataset)[colnum]))
return(p)
}
since you have not provided data lets work with mtcars and create a list of histograms
dataset=mtcars
listOfHistograms<-lapply(3:7,function(x) histo.abline(dataset,x))
your list has 5 histograms that you can plot for instance the first by:
print(listOfHistograms[[1]])
More histogram options for ggplot here: https://www.r-bloggers.com/how-to-make-a-histogram-with-ggplot2/
hope this helps
EDIT: Multiple Plot in one graph
One way to do it is through cowplot library:
library(cowplot)
plot_grid(plotlist=listOfHistograms[1:4])

how to make a biplot without label in R

I used
biplot(prcomp(data, scale.=T), xlabs=rep("·", nrow(data)))
but it did not work to omit the labels.
Even if I remove the labels my plot is so messy and ugly which can be seen below!
I also need to show the percentage of PCs on axes
I used the following command to plot the image
biplot(prcomp(data, scale.=T), xlabs=rep("·", nrow(data)), ylabs = rep("·", ncol(data)))
Try this one
\devtools::install_github("sinhrks/ggfortify")
library(ggfortify)
ggplot2::autoplot(stats::prcomp(USArrests, scale=TRUE), label = FALSE, loadings.label = TRUE)

Combine two plots created with effects package in R

I have the following Problem. After running an ordered logit model, I want to R's effects package to visualize the results. This works fine and I did so for two independent variables, then I tried to combine the two plots. However, this does not seem to work. I provide a little replicable example here so you can see my problem for yourself:
library(car)
data(Chile)
mod <- polr(vote ~ age + log(income), data=Chile)
eff <- effect("log(income)", mod)
plot1 <- plot(eff, style="stacked",rug=F, key.args=list(space="right"))
eff2 <- effect("age", mod)
plot2 <- plot(eff2, style="stacked",rug=F, key.args=list(space="right"))
I can print these two plots now independently, but when I try to plot them together, the first plot is overwritten. I tried setting par(mfrow=c(2,1)), which didn't work. Next I tried the following:
print(plot1, position=c(0, .5, 1, 1), more=T)
print(plot2, position=c(0,0, 1, .5))
In this latter case, the positions of the two plots are just fine, but still the first plot vanishes once I add the second (or better, it is overwritten). Any suggestions how to prevent this behavior would be appreciated.
Reading down the long list of arguments to ?print.eff we see that there are some arguments for doing just this:
plot(eff, style="stacked",rug=F, key.args=list(space="right"),
row = 1,col = 1,nrow = 1,ncol = 2,more = TRUE)
plot(eff2, style="stacked",rug=F, key.args=list(space="right"),
row = 1,col = 2,nrow = 1,ncol = 2)
The reason par() didn't work is because this package is using lattice graphics, which are based on the grid system, which is incompatible with base graphics. Neither par() nor layout will have any effect on grid graphics.
This seems to work:
plot(eff,col=1,row=2,ncol=1,nrow=2,style="stacked",rug=F,
key.args=list(space="right"),more=T)
plot(eff2,col=1,row=1,ncol=1,nrow=2,style="stacked",rug=F,
key.args=list(space="right"))
edit: Too late...

superpose a histogram and an xyplot

I'd like to superpose a histogram and an xyplot representing the cumulative distribution function using r's lattice package.
I've tried to accomplish this with custom panel functions, but can't seem to get it right--I'm getting hung up on one plot being univariate and one being bivariate I think.
Here's an example with the two plots I want stacked vertically:
set.seed(1)
x <- rnorm(100, 0, 1)
discrete.cdf <- function(x, decreasing=FALSE){
x <- x[order(x,decreasing=FALSE)]
result <- data.frame(rank=1:length(x),x=x)
result$cdf <- result$rank/nrow(result)
return(result)
}
my.df <- discrete.cdf(x)
chart.hist <- histogram(~x, data=my.df, xlab="")
chart.cdf <- xyplot(100*cdf~x, data=my.df, type="s",
ylab="Cumulative Percent of Total")
graphics.off()
trellis.device(width = 6, height = 8)
print(chart.hist, split = c(1,1,1,2), more = TRUE)
print(chart.cdf, split = c(1,2,1,2))
I'd like these superposed in the same frame, rather than stacked.
The following code doesn't work, nor do any of the simple variations of it that I have tried:
xyplot(cdf~x,data=cdf,
panel=function(...){
panel.xyplot(...)
panel.histogram(~x)
})
You were on the right track with your custom panel function. The trick is passing the correct arguments to the panel.- functions. For panel.histogram, this means not passing a formula and supplying an appropriate value to the breaks argument:
EDIT Proper percent values on y-axis and type of plots
xyplot(100*cdf~x,data=my.df,
panel=function(...){
panel.histogram(..., breaks = do.breaks(range(x), nint = 8),
type = "percent")
panel.xyplot(..., type = "s")
})
This answer is just a placeholder until a better answer comes.
The hist() function from the graphics package has an option called add. The following does what you want in the "classical" way:
plot( my.df$x, my.df$cdf * 100, type= "l" )
hist( my.df$x, add= T )

lattice or latticeExtra combine multiple plots different yscaling (log10 and non-transformed)

I have a multiple variable time series were some of the variables have rather large ranges. I wish to make a single-page plot with multiple stacked plots of each variable were some of the variables have a log10 y-axis scaling. I am relatively new to lattice and have not been able to figure out how to effectively mix the log10 scaling with non-transformed axes and get a publication quality plot. If print.trellis is used the plots are not aligned and the padding needs some work, if c.trellis is used the layout is good, but only the y-scaling from only one plot is used. Any suggestions for an efficient solution, where I can replicate the output of c.trellis using the different y-scaling for each (original) object?
Example below:
require(lattice)
require(latticeExtra)
# make data.frame
d.date <- as.POSIXct(c("2009-12-15", "2010-01-15", "2010-02-15", "2010-03-15", "2010-04-15"))
CO2dat <- c(100,200,1000,9000,2000)
pHdat <- c(10,9,7,6,7)
tmp <- data.frame(date=d.date ,CO2dat=CO2dat ,pHdat=pHdat)
# make plots
plot1 <- xyplot(pHdat ~ date, data=tmp
, ylim=c(5,11)
, ylab="pHdat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1))
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
}
)
# make plot with log y scale
plot2 <- xyplot(CO2dat ~ date, data=tmp
, ylim=c(10,10^4)
, ylab="CO2dat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1,log=10))
, yscale.components = yscale.components.log10ticks
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
# plot CO2air uatm
panel.abline(h=log10(390),col="blue",type="l",...)
}
)
# plot individual figures using split
print(plot2, split=c(1,1,1,2), more=TRUE)
print(plot1, split=c(1,2,1,2), more=F)
# combine plots (more convenient)
comb <- c(plot1, plot2, x.same=F, y.same=F, layout = c(1, 2))
# plot combined figure
update(comb, ylab = c("pHdat","log10 CO2dat"))
Using #joran's idea, I can get the axes to be closer but not exact; also, reducing padding gets them closer together but changes the aspect ratio. In the picture below I've reduced the padding perhaps by too much to show the not exactness; if this close were desired, you'd clearly want to remove the x-axis labels on the top as well.
I looked into the code that sets up the layout and the margin on the left side is calculated from the width of the labels, so #joran's idea is probably the only thing that will work based on the printing using split, unless one were to rewrite the plot.trellis command. Perhaps the c method could work but I haven't found a way yet to set the scale components separately depending on the panel. That does seem more promising though.
mtheme <- standard.theme("pdf")
mtheme$layout.heights$bottom.padding <- -10
plot1b <- update(plot1, scales=list(y=list(alternating=1, at=5:10, labels=paste(" ",c(5:10)))))
plot2b <- update(plot2, par.settings=mtheme)
pdf(file="temp.pdf")
print(plot2b, split=c(1,1,1,2), more=TRUE)
print(plot1b, split=c(1,2,1,2), more=F)

Resources