I have made box plots for the onset values of three different groups using the boxplot function in R, like so:
boxplot(onset ~ group, data = pulse.dat, range = 0, col = "lightblue")
However, I want to see how the data look without the range, so I want to create a box plot without the whiskers. Any other kind of graph would also be fine, as long as it displays the median and the 25th and 75th percentiles for each of the three groups.
Does anyone know how I can do this in R?
Look under the pars argument of boxplot; the individual graphical parameters are listed in ?bxp.
d <- rnorm(100, 100, 10)
boxplot(d, whisklty = 0, staplelty = 0)
whisklty gets rid of the lines or whiskers
staplelty gets rid of the ends or staples
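Applied to the grouped call from the question, this might look like the sketch below. The data frame here is a made-up stand-in for pulse.dat, which is not shown in the question. The quantile() table is only an assumption about what a useful companion would be; note that boxplot draws hinges, which can differ slightly from quantile() for small samples.

```r
# Hypothetical stand-in for pulse.dat (the real data is not shown in the question)
set.seed(1)
pulse.dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  onset = c(rnorm(30, 10, 2), rnorm(30, 12, 3), rnorm(30, 15, 2))
)

# Box only: hide whiskers and staples; outline = FALSE also hides outlier points
boxplot(onset ~ group, data = pulse.dat, col = "lightblue",
        whisklty = 0, staplelty = 0, outline = FALSE)

# The same three numbers as a table: 25th percentile, median, 75th percentile
quartiles <- sapply(split(pulse.dat$onset, pulse.dat$group),
                    quantile, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```

Each column of quartiles corresponds to one group, so the table can be checked against the boxes by eye.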
I have some monthly time-series data and I need to create a boxplot of this data using R. Up to here, no problem of course. Then, in addition to the median, 1st quartile and 3rd quartile, I also need to show specific data points from the time series in the graph, namely the observations at 3 years, 1 year and 3 months in the past. I looked online but I can't seem to find any command for this. Is there a way I can add these observations to the boxplot?
You can use, for example, the text function:
set.seed(123)
x <- rnorm(100, 5, 10)
If you want to display the mean value:
boxplot(x)
text(1, mean(x), "x", col = "red")
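Additionally or alternatively, you can use the points function, giving the box position (1) and the value to mark, for example the most recent observation:
points(1, x[length(x)], col = "blue", pch = 8)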
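For the specific observations asked about, a minimal sketch could look like this. It assumes the series is a plain numeric vector of monthly values with the oldest observation first, and that "in the past" is counted back from the latest observation; the lags vector and its labels are illustrative names, not part of the original answer.

```r
set.seed(123)
# Hypothetical monthly series, oldest observation first (10 years of data)
x <- rnorm(120, 5, 10)
n <- length(x)

# Observations 3 months, 1 year and 3 years before the latest one
lags <- c("3m" = 3, "1y" = 12, "3y" = 36)
marked <- x[n - lags]

boxplot(x)
# A single box sits at x = 1, so each marked value is drawn at (1, value)
points(rep(1, length(marked)), marked, col = "blue", pch = 8)
text(1.1, marked, names(lags), col = "blue", adj = 0)
```

The labels are offset slightly to the right of the box so they do not overlap the markers.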
I have searched very hard for a solution to this question, but I haven't been successful. I have a zoo plot with your standard time series X axis and simply want to highlight regions of the chart when the value of the series is less than a certain threshold. Specifically, I want to highlight when the p-value of the intercept is significant (and the intercept is plotted). This will occur at various intervals across the time series, rather than simply some range x <= y.
I have tried help(xblocks) and example(xblocks), but I haven't been able to get the highlighted region to show for the dates to which I know it should apply.
Does this solve your problem?
library(zoo)  # provides zoo() and xblocks()
rgb <- hcl(c(0, 0, 260), c = c(100, 0, 100), l = c(50, 90, 50), alpha = 0.3)
set.seed(1234)
x.Date <- as.Date("2015-02-01") + c(1,3,6,7,9,10,12,14,18,20) - 1
y <- zoo(rnorm(length(x.Date)), x.Date)
pval<-zoo(runif(length(x.Date),0,.2), x.Date)
plot(y,col=4)
xblocks(pval<=0.05,col = rgb[1])
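If the shaded regions do not appear where expected, one way to check is to list the dates for which the condition is TRUE before plotting. This is only a diagnostic sketch on simulated p-values; sig and sig.dates are illustrative names.

```r
library(zoo)
set.seed(1234)
x.Date <- as.Date("2015-02-01") + c(1, 3, 6, 7, 9, 10, 12, 14, 18, 20) - 1
pval <- zoo(runif(length(x.Date), 0, 0.2), x.Date)

# Logical zoo series: TRUE where the p-value is significant
sig <- pval <= 0.05

# Dates that xblocks(sig, ...) would shade
sig.dates <- time(sig)[coredata(sig)]
print(sig.dates)
```

If sig.dates is empty or does not match the expected dates, the problem lies in the condition rather than in the plotting call.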
I have three variables: support, party, and gender. Support has three levels: 1 for yes, 2 for no, and 3 for neutral. Party and gender are dummy variables. Now let's fake them:
support = sample(1:3, size=100, replace=T)
party = as.numeric(rbinom(100, 100, 0.4) > 42)
gender = as.numeric(rbinom(100, 100, 0.4) > 39)
I want to see the percentage of support conditioned on party and gender. So far, I can do percentage conditioned on only one variable, say party.
counts = table(support,party)
percent = t(t(counts)/colSums(counts))
barplot(percent)
How can I split party by gender and place gender side-by-side while maintaining party support stacked in the current way? (If you don't understand what I am saying, read on ...)
To be clear, I want the first column bar 0 (party = 0) to be split into two adjacent column bars, one for each gender. Then I want the second column bar 1 (party = 1) to be split into two adjacent column bars, one for each gender. Each of these column bars should be stacked the way they are now.
I am not even sure this can be accomplished.
BY THE WAY, is there a way to control the width of the bars? They are way too wide for my taste.
How about something like this? We can call barplot twice to place two sets of bars on the same plot. First, I labelled some of the data so I could keep track of it:
#sample data
set.seed(15)
support = sample(1:3, size=100, replace=T)
party = factor(as.numeric(rbinom(100, 100, 0.4) > 42), levels=0:1, labels=c("D","R"))
gender = factor(as.numeric(rbinom(100, 100, 0.4) > 39), levels=0:1, labels=c("M","F"))
Now we summarize the data separately for each party
tt<-table(support, gender, party)
p1<-tt[,,1]
p1<-p1/sum(p1)
p2<-tt[,,2]
p2<-p2/sum(p2)
And now we combine the barplots
xx<-barplot(p1, width=.3, space=c(.25,.6), xaxt="n",
xlim=c(0,2.4), ylim=c(0, max(colSums(p1), colSums(p2))))
axis(1,xx, levels(gender), line=0, tick=F)
yy<-barplot(p2, width=.3, space=c(5, .6), xaxt="n", add=T)
axis(1,yy, levels(gender), line=0, tick=F)
axis(1, c(mean(xx), mean(yy)), levels(party), line=1, tick=F)
This produces two stacked bars (M and F) for each party (D and R), with the party labels centred below each pair.
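Note that dividing by sum(p1) gives proportions within each party, so the two gender bars of a party together sum to 1. If each bar should instead sum to 1 (support conditioned on both party and gender, as in the question's single-variable example), a variant using prop.table might look like this. It is a sketch that reuses the same simulated data and assumes no gender-by-party cell is empty.

```r
set.seed(15)
support <- sample(1:3, size = 100, replace = TRUE)
party <- factor(as.numeric(rbinom(100, 100, 0.4) > 42), levels = 0:1, labels = c("D", "R"))
gender <- factor(as.numeric(rbinom(100, 100, 0.4) > 39), levels = 0:1, labels = c("M", "F"))

tt <- table(support, gender, party)
# Proportions within each gender-by-party cell: margins 2 and 3 are held fixed
pp <- prop.table(tt, margin = c(2, 3))

xx <- barplot(pp[, , "D"], width = 0.3, space = c(0.25, 0.6), xaxt = "n",
              xlim = c(0, 2.4), ylim = c(0, 1))
axis(1, xx, levels(gender), line = 0, tick = FALSE)
yy <- barplot(pp[, , "R"], width = 0.3, space = c(5, 0.6), xaxt = "n", add = TRUE)
axis(1, yy, levels(gender), line = 0, tick = FALSE)
axis(1, c(mean(xx), mean(yy)), levels(party), line = 1, tick = FALSE)
```

The width argument is also what controls how wide the bars are, which covers the side question about bar width.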
I'd do this using ggplot2. Rather than putting the bars side by side, I'd use sub-plots (or facets in ggplot2 jargon):
df = data.frame(support, party, gender)
library(ggplot2)
ggplot(df, aes(x = factor(party), fill = factor(support))) +
geom_bar() + facet_wrap(~ gender)
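geom_bar() on its own stacks counts. If proportions are wanted, as in the question, one possible variant is position = "fill", which rescales every bar to height 1. The sketch below rebuilds the simulated data so that it runs on its own; the width value and axis labels are arbitrary choices.

```r
library(ggplot2)
set.seed(15)
df <- data.frame(
  support = sample(1:3, size = 100, replace = TRUE),
  party   = as.numeric(rbinom(100, 100, 0.4) > 42),
  gender  = as.numeric(rbinom(100, 100, 0.4) > 39)
)

# position = "fill" rescales every stacked bar to height 1,
# so segments show proportions within each party-by-gender bar
p <- ggplot(df, aes(x = factor(party), fill = factor(support))) +
  geom_bar(position = "fill", width = 0.5) +
  facet_wrap(~ gender) +
  labs(x = "party", y = "proportion", fill = "support")
print(p)
```

The width argument of geom_bar() narrows the bars, which addresses the complaint about them being too wide.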
I need to plot the outliers from a boxplot on to a map. My lecturer gave me the function to extract all outliers from this boxplot:
outliers = match(names(boxplot(pc3, plot = FALSE)$out), names(pc3))
(pc3 being the data)
I am then plotting them using:
points(Data.1$X[outliers], Data.1$Y[outliers], col = "red", cex = 3, lwd = 2)
However I want to extract the positive outliers into one variable and the negative outliers into a different variable in order to plot them in different colours. How do I do this?
Thank you.
boxplot defines outliers as points farther than 1.5 times the interquartile range from the edges of the box (roughly the 25th and 75th percentiles; boxplot actually uses hinges, so quantile() can differ slightly for small samples). You can apply that definition directly:
iq.range <- quantile(pc3, probs=c(0.25, 0.75))
lower.bound <- iq.range[1] - 1.5*diff(iq.range)
upper.bound <- iq.range[2] + 1.5*diff(iq.range)
low.out <- pc3[pc3 < lower.bound]
high.out <- pc3[pc3 > upper.bound]
That computes the bounds from scratch. Alternatively, you can split the outlier vector returned by boxplot at the median: anything above it is a high outlier and anything below it is a low outlier.
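The median-split approach could be sketched as follows. The pc3 vector here is a made-up named vector standing in for the real data, and high.idx and low.idx are illustrative names for the positions that would be used to index Data.1.

```r
set.seed(42)
# Hypothetical stand-in for pc3: a named numeric vector with a few extreme values
pc3 <- c(rnorm(50), -6, -5, 5, 7)
names(pc3) <- paste0("site", seq_along(pc3))

out <- boxplot(pc3, plot = FALSE)$out   # outlier values, names preserved
high.out <- out[out > median(pc3)]
low.out  <- out[out < median(pc3)]

# Positions in pc3, usable as row indices into Data.1 as in the question
high.idx <- match(names(high.out), names(pc3))
low.idx  <- match(names(low.out), names(pc3))
```

With these, the points call from the question can be issued twice, once with Data.1$X[high.idx] and Data.1$Y[high.idx] in one colour and once with low.idx in another.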
I have a couple of box and whisker plots in R. In both, the x-axis corresponds to one categorical variable whilst the grouping colours correspond to the other.
If I draw both plots with an untransformed y-axis, they are both fine. However, if I square-root transform the y-axis (using coord_trans(y = "sqrt")), one of the graphs is still fine, whilst the other drops the median lines in most boxes (except those for which there are only two groups and the boxes are therefore slightly wider; see "Numbers" 1 and 2 on the first plot). Further, for the graph that does not draw properly, if I reduce the number of categories on my x-axis (making the boxes wider again), the median lines reappear.
Is this a bug with coord_trans (if so, how can I get around it) or a problem with my code?
Thank you very much for any suggestion.
library(car)
library(gplots)
library(plyr)
library(ggplot2)
library(gridExtra)
library(gdata)
Category = factor(c(rep(1, times = 3240), rep(2, times = 2160)),
                  labels = c("A", "B"), levels = c(1, 2))
ID = factor(rep(seq(from = 1, to = 45), each = 120))
Months = factor(rep(seq(from = 1, to = 3), each = 40, times = 45),
                labels = c("Jan", "Feb", "Mar"), levels = c(1:3))
Obs = rnorm(5400, mean = 25, sd = 15)
Data = data.frame(Category, ID, Months, Obs)
Data = subset(Data, (Data$Category == "B") | !(Data$ID %in% c(1, 2)) |
                    (Data$Months %in% c("Jan", "Feb")))
for (j in 1:2)
{
  sel = which(Data$Category == unique(levels(Data$Category))[j])
  Observ = Data$Obs[sel]
  Month = Data$Months[sel]
  Number = droplevels(Data$ID[sel])
  Data_used = data.frame(Number, Month, Observ)
  plot1 = ggplot(Data_used, aes(Number, Observ)) +
    geom_boxplot(aes(fill = Month, drop = FALSE), na.rm = TRUE) +
    scale_y_continuous(breaks = c(0, 20, 40, 60, 80, 100), limits = c(0, 115)) +
    coord_trans(y = "sqrt")
  plot(plot1)
}
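@Dennis is correct in his comment that scale_y_sqrt() will correct this. scale_y_sqrt() transforms the data before the boxplot statistics are computed, whereas coord_trans() transforms the drawn result afterwards. Because the median and quartiles are order statistics and the square root is monotonic, it doesn't matter (up to interpolation between neighbouring points) whether the data are transformed before or after calculating them.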
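A small sketch of both points, on made-up data rather than the data set from the question: the first part checks that the median commutes with the square root when the group size is odd (so the median is an actual data point), and the second part draws the boxplot with scale_y_sqrt() in place of coord_trans().

```r
library(ggplot2)
set.seed(1)
# Hypothetical data: non-negative observations in three groups of 41
d <- data.frame(
  Month  = factor(rep(c("Jan", "Feb", "Mar"), each = 41)),
  Observ = abs(rnorm(123, mean = 25, sd = 15))
)

# With an odd group size the median is an actual data point, so
# transforming before or after taking it gives the same value
m1 <- sqrt(tapply(d$Observ, d$Month, median))
m2 <- tapply(sqrt(d$Observ), d$Month, median)
print(all.equal(m1, m2))

# scale_y_sqrt() transforms the data before the boxplot statistics are computed
p <- ggplot(d, aes(Month, Observ, fill = Month)) +
  geom_boxplot() +
  scale_y_sqrt(breaks = c(0, 20, 40, 60, 80, 100))
print(p)
```

With even group sizes or interpolated quartiles the two orders can differ very slightly, which is usually invisible on the plot.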