I am trying to make a good boxplot. As you can see in the picture, to get a clear visualization it is necessary to "zoom in" on the bulk of the data.
I did this with the ylim option.
As you can see in the picture below, I added a main title, but the outliers run through the title, and that is the problem.
I think I could solve the problem by deleting the outliers from the original data, but I was wondering whether it is possible to cut the "boxplotline" off at 0.10, so that the boxplot stays inside the figure.
My code so far:
boxplot (genergy$Measurevalue, ylim= c(0,0.1), ylab = "Measured Value",
main="Boxplot Measured Value", col = "red")
UPDATE:
@Twitch_City: I don't think that using a different ylim is the solution. For example:
boxplot (genergy$Measurevalue, ylim= c(0,0.50), ylab = "Measured Value",
main="Boxplot Measured Value", col = "red")
@akash87, sure. The data is:
You could use outline=FALSE to avoid plotting the outliers completely. You could then provide data about the outliers separately (for example, using fivenum or other summary).
Here's an example using random data generated from a chi-squared distribution with df=3; the data are quite positively skewed, as your data seem to be. Save the boxplot stats to obtain info on the outliers.
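N <- 500000
dat <- rchisq(N, 3)
# outline=FALSE hides the outliers in the plot, but they are still returned in $out
dat.box <- boxplot(dat, cex=.5, outline=FALSE, las=1)
# Five-number summary of the outliers only
cat(fivenum(dat.box$out))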
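If the outliers are hidden, it may help to say on the plot how many were left out. The following is a minimal sketch on the same kind of simulated data; the smaller sample size, the seed, and the wording of the annotation are assumptions made for illustration. boxplot.stats() is the base R function that computes the box statistics without drawing anything.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
dat <- rchisq(5000, 3)            # smaller sample than above, for illustration

# Compute the box statistics without drawing
st <- boxplot.stats(dat)

whiskers <- st$stats[c(1, 5)]     # lower and upper whisker ends
n.out    <- length(st$out)        # number of points beyond the whiskers

# Draw the box without outliers and note how many were omitted
boxplot(dat, outline = FALSE, las = 1,
        main = "Boxplot without outliers")
mtext(sprintf("%d outliers not shown (max = %.2f)", n.out, max(st$out)),
      side = 1, line = 2, cex = 0.8)
```

With outline = FALSE the y-axis is scaled to the whiskers, so the title is no longer crossed by outlier points.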
Another alternative is to plot a kernel density curve and add lines corresponding to the desired quantiles. As below:
plot(density(dat), las=1)
abline(v=median(dat), col='black')
abline(v=quantile(dat, .25), lty=3, col='red')
abline(v=quantile(dat, .75), lty=3, col='red')
I would like to add a curved line to fit the dark bars of this supply cost curve (like the red line that appears in the image). The height of the dark bars represents the range of uncertainty in their costs (costrange). I am using fully transparent values (costtrans) to stack the bars above a certain level.
This is my code:
costtrans<-c(10,10,20,28,30,37,50,50,55,66,67,70)
costrange<-c(15,30,50,21,50,20,30,40,45,29,30,20)
cost3<-table(costtrans,costrange)
cost3<-c(10,15,10,30,20,50,28,21,30,50,37,20,50,30,50,40,55,45,66,29,67,30,70,20)
costmat <- matrix(data=cost3,ncol=12,byrow=FALSE)
Dark <- rgb(99/255,99/255,99/250,1)
Transparent<-rgb(99/255,99/255,99/250,0)
production<-c(31.6,40.9,3.7,3.7,1,0.3,1.105,0.5,2.3,0.7,0.926,0.9)
par(xaxs='i',yaxs='i')
par(mar=c(4, 6, 4, 4))
barplot(costmat,production, space=0, main="Supply Curve", col=c(Transparent, Dark), border=NA, xlab="Quantity", xlim=c(0,100),ylim=c(0, 110), ylab="Supply Cost", las=1, bty="l", cex.lab=1.25,axes=FALSE)
axis(1, at=seq(0,100, by=5), las=1, cex.axis=1.25)
axis(2, at=seq(0,110, by=10), las=1, cex.axis=1.25)
Image to describe what I am looking for:
I guess it really depends how you want to calculate the line...
One first option would be:
# Save the barplot coordinates into a variable
bp <- barplot(costmat,production, space=0, main="Supply Curve",
col=c(Transparent, Dark), border=NA, xlab="Quantity",
xlim=c(0,100), ylim=c(0, 110), ylab="Supply Cost", las=1,
bty="l", cex.lab=1.25,axes=FALSE)
axis(1, at=seq(0,100, by=5), las=1, cex.axis=1.25)
axis(2, at=seq(0,110, by=10), las=1, cex.axis=1.25)
# Find the mean y value for each box
mean.cost <- (costmat[1,]+colSums(costmat))/2
# Add a line through the points
lines(bp, mean.cost, col="red", lwd=2)
Which gives
Now, you could draw a smoother line using some sort of regression,
for instance a LOESS regression.
# Perform a LOESS regression
# To allow for extrapolation, you may want to add
# control = loess.control(surface = "direct")
model <- loess(mean.cost~bp, span=1)
# Predict values in the 0:100 range.
# Note that, unless you allow extrapolation (see above)
# by default only values in the range of the original data
# will be predicted.
pr <- predict(model, newdata=data.frame(bp=0:100))
lines(0:100, pr, col="red", lwd=2)
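The extrapolation option mentioned in the comments above could look roughly like this. It is a sketch under one assumption: the bar midpoints are rebuilt by hand as cumsum(production) - production/2, which should match what barplot() returns when space=0, so that the snippet runs without drawing the bars first.

```r
costtrans  <- c(10,10,20,28,30,37,50,50,55,66,67,70)
costrange  <- c(15,30,50,21,50,20,30,40,45,29,30,20)
production <- c(31.6,40.9,3.7,3.7,1,0.3,1.105,0.5,2.3,0.7,0.926,0.9)

# Bar midpoints on the x axis (assumed equal to barplot()'s return value for space=0)
bp <- cumsum(production) - production / 2
# Middle of each dark bar: base of the bar plus half of its range
mean.cost <- costtrans + costrange / 2

# surface = "direct" lets predict() return values outside range(bp)
model <- loess(mean.cost ~ bp, span = 1,
               control = loess.control(surface = "direct"))
pr <- predict(model, newdata = data.frame(bp = 0:100))
```

After drawing the barplot, lines(0:100, pr, col="red", lwd=2) would then add the curve over the whole 0 to 100 range. Extrapolated values near the edges depend heavily on the few bars there, so they are worth checking by eye.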
I would like to see the number of instances in each bin shown on the graph as well.
set.seed(1)
x <- rnorm(100)
hist(x)
Try this
set.seed(1)
x <- rnorm(100)
y <- hist(x, plot=FALSE)
plot(y, ylim=c(0, max(y$counts)+5))
text(y$mids, y$counts+3, y$counts, cex=0.75)
which gives:
Another, much simpler solution is to use labels=TRUE in the hist(...) call itself. It prints the number of occurrences (the count) on top of each bin in the histogram.
However, I would recommend always setting xlim and ylim for histogram plots.
Code:
set.seed(1)
x <- rnorm(100)
hist(x, xlim = c(-3,3), ylim = c(0,30), labels = TRUE)
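Rather than hard-coding the upper limit, ylim could be derived from the counts so that the labels are not clipped for other data sets. This is only a sketch; the 15% headroom is an arbitrary choice.

```r
set.seed(1)
x <- rnorm(100)

# Compute the bins first, without drawing, to find the tallest bar
h   <- hist(x, plot = FALSE)
top <- max(h$counts)

# Leave some headroom above the tallest bar for its count label
hist(x, ylim = c(0, top * 1.15), labels = TRUE)
```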
That happens automatically: the counts are on the y-axis, which is labeled "Frequency", on the left.
I want to make a histogram for multiple variables.
I used the following code:
set.seed(2)
dataOne <- runif(10)
dataTwo <- runif(10)
dataThree <- runif(10)
one <- hist(dataOne, plot=FALSE)
two <- hist(dataTwo, plot=FALSE)
three <- hist(dataThree, plot=FALSE)
plot(one, xlab="Beta Values", ylab="Frequency",
labels=TRUE, col="blue", xlim=c(0,1))
plot(two, col='green', add=TRUE)
plot(three, col='red', add=TRUE)
But the problem is that they cover each other, as shown below.
I just want them to be added to each other (with the bars stacked on top of each other), i.e. not overlapping or covering each other.
How can I do this?
Try replacing your last three lines by:
plot(one, xlab = "Beta Values", ylab = "Frequency", col = "blue")
points(two$mids, two$counts, col = 'green')
points(three$mids, three$counts, col = 'red')
The first time, you need to call plot. A further plain call to plot starts a new plot, which means you lose the first data. Instead, you want to add more data to the existing plot, either as a scatter chart using points or as a line chart using lines.
It's not quite clear what you are looking for here.
One approach is to place the plots in separate plotting spaces:
par("mfcol"=c(3, 1))
hist(dataOne, col="blue")
hist(dataTwo, col="green")
hist(dataThree, col="red")
par("mfcol"=c(1, 1))
Is this what you're after?
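If "added to each other" means stacked bars, another option might be to bin all three variables with the same breaks and pass the counts to barplot. This is a sketch, not necessarily what the asker had in mind; the bin width of 0.1 and the axis label wording are assumptions.

```r
set.seed(2)
dataOne   <- runif(10)
dataTwo   <- runif(10)
dataThree <- runif(10)

# Use the same bin edges for all three variables
brks <- seq(0, 1, by = 0.1)
counts <- rbind(
  one   = hist(dataOne,   breaks = brks, plot = FALSE)$counts,
  two   = hist(dataTwo,   breaks = brks, plot = FALSE)$counts,
  three = hist(dataThree, breaks = brks, plot = FALSE)$counts
)

# beside = FALSE stacks the rows; beside = TRUE would place them side by side
barplot(counts, beside = FALSE, space = 0,
        col = c("blue", "green", "red"),
        names.arg = head(brks, -1),
        xlab = "Beta Values (lower bin edge)", ylab = "Frequency",
        legend.text = rownames(counts))
```

Each column of counts is one bin, so the total height of a stacked bar is the number of values from all three variables that fall into that bin.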
I'm plotting the vertex degree (# of incident edges) of the graph g.
deg <- degree(g, v=V(g), mode = c("in"), loops = TRUE)
histdata <- hist( deg, breaks=1000, plot=FALSE )
plot(histdata$counts, log="xy", type="p", col="blue", bg = "blue", pch=20,
xlim=c(1,max(deg)),
ylim=c(1,max(histdata$counts)),
ylab="Frequency", xlab="Degree")
This code plots this scatterplot,
which is very close to what I need but has a few issues:
1) The x labels are wrong: they represent the histogram breaks, not the degrees.
2) The axis bars are messy. How can I remove the empty ones?
3) How can I plot a regression line? I tried abline with lm(histdata$mids~histdata$count), but nothing gets plotted.
Thanks for any hint!
UPDATE: this plot is probably plain wrong. See http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
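For the three issues listed above, one possible approach is to tabulate the degrees directly with table() instead of hist(), so that the x values are the degrees themselves and empty bins never appear, and to fit the line on log10-transformed values. The sketch below does not use igraph: deg is a simulated, hypothetical stand-in for the result of degree(g, mode = "in"). Note that the attempted lm call has the response and predictor swapped and is not on the log scale, which may be why no line showed up.

```r
set.seed(1)
# Hypothetical stand-in for degree(g, mode = "in"): heavy-tailed integer degrees
deg <- floor(1 / runif(2000)^0.8)

# Count how many vertices have each degree; zero degrees are dropped
# because they cannot be shown on a log axis
tab  <- table(deg[deg > 0])
k    <- as.numeric(names(tab))    # degrees that actually occur
freq <- as.numeric(tab)           # number of vertices with each degree

plot(k, freq, log = "xy", pch = 20, col = "blue",
     xlab = "Degree", ylab = "Frequency")

# Straight line in log-log space: response first, predictor second
fit <- lm(log10(freq) ~ log10(k))
abline(fit, col = "red", lwd = 2) # drawn in the log10-transformed coordinates
```

A least-squares fit on raw frequencies is dominated by the noisy tail, which is in line with the caveat in the update above, so the slope should be read as a rough guide only.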