How to put labels between columns in a bar plot in R?

I'm a beginner with R and looking for help with plotting.
I would like to make a distribution plot in R that looks like a histogram of continuous data bucketed into columns with x-axis labels between each column to denote the range captured in each column.
Instead of continuous data, though, I only have the bucketed counts. I can create a plot with barplot(), but I can't get the labels to fall BETWEEN the columns to denote the range captured in each bar; they are treated as column labels and placed directly beneath each column.
dat <- list(freq = c(5, 15, 20, 10),
            mid = c(-1.5, -0.5, 0.5, 1.5))  # midpoint of each bucketed range
dat$perc <- dat$freq / sum(dat$freq)
barplot(dat$perc, names.arg = dat$mid)
Each column is labeled with the midpoint. I would instead like the labels to be -2,-1,0,1,2 BETWEEN the columns.
Thank you!
edit: dput(dat) outputs:
list(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5), perc = c(0.1, 0.3, 0.4, 0.2))

Is this what you're after?
df <- data.frame(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5), perc = c(0.1, 0.3, 0.4, 0.2))
I'm using the awesome and highly customisable library ggplot2 to plot this, which renders the plot as I think you want it. You can install this with install.packages('ggplot2'):
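# install.packages('ggplot2')
library(ggplot2)
# width = 1 makes adjacent bars touch, so each bin edge (-2, -1, 0, 1, 2)
# falls exactly between two bars; breaks puts the ticks on those edges
p <- ggplot(df) +
  geom_col(aes(mid, perc), width = 1) +
  scale_x_continuous(breaks = -2:2)
p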
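If you would rather stay in base graphics, the same effect can be had with barplot() itself. This is a minimal sketch, assuming every bucket is one unit wide (as in the question's data): with space = 0 the bars touch and occupy x = 0 to 1, 1 to 2, and so on, so axis() can put the ticks on the bar boundaries and label them with the bin edges. The name edges is only for illustration.

```r
# Bucketed data from the question
dat <- data.frame(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5))
dat$perc <- dat$freq / sum(dat$freq)

# Bin edges: one more edge than bins (assumes unit-width buckets)
edges <- c(dat$mid - 0.5, tail(dat$mid, 1) + 0.5)

# space = 0 removes the gaps; with the default width = 1 the bars span
# 0..1, 1..2, ... and barplot() returns the centre of each bar
bp <- barplot(dat$perc, space = 0, ylab = "Proportion")

# Ticks on the bar boundaries (0, 1, ..., n), labelled with the bin edges
axis(side = 1, at = 0:length(dat$perc), labels = edges)
```

The names.arg argument is left out on purpose, so no labels are drawn under the bar centres.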

Related

Adding observations as proportions on a horizontal barplot in R using text() function

I cannot figure out how to get the percentage of responses at the end of the bars. I know I'm missing something within the text() function, just not sure what exactly I'm missing. Thank you!
#Training/Specialty Barplot
trainbarplot <- barplot(table(PSR$training), horiz = TRUE,
main="Respondent Distribution of Training", cex.main = 1.1, font.main = 2,
cex.lab = 0.8, cex.names = 0.4, font.axis = 4, las = 2,
xlab="Response Frequency", xlim=c(0, 40), cex.axis = 0.8,
border="black",
col=rgb (0.1, 0.1, 0.4, 0.5, 0.6),
density=c(50,40,30) , angle=c(9,11,36)
)
text(trainbarplot, table(PSR$training) - 3,
labels=paste(round(proportions(table(PSR$training))*100, 0), "%"))
Generate data
I generated some sample data to replicate your problem. Please note that you should always try to provide an example dataset :)
set.seed(123)
df1 <- data.frame(x = rnorm(10, mean = 10, sd = 2), y = LETTERS[1:10])
Plot the data
Here's a plot that follows the same structure as your code:
bp <- barplot(df1$x, names.arg = df1$y, horiz = TRUE, las = 1)
text(x= df1$x+0.5, y= bp, labels=paste0(round(df1$x),"%"), xpd=TRUE)
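Applied to the original question, the key point is that with horiz = TRUE, barplot() returns the vertical centre of each bar, so that value belongs in the y argument of text(), while the count (the bar length) goes in x. The original call passes them the other way round. A minimal sketch with a small made-up stand-in for PSR$training (the real data isn't shown); prop.table() is used here and gives the same result as proportions() (available from R 4.0.0).

```r
# Hypothetical stand-in for PSR$training
training <- c("Surgery", "Surgery", "Medicine", "Medicine", "Medicine", "Psychiatry")
counts <- table(training)
pct <- round(100 * prop.table(counts))

# Extra room on the right so the labels fit inside the plot region
bp <- barplot(counts, horiz = TRUE, las = 1, xlim = c(0, max(counts) * 1.2))

# Horizontal bars: x = bar length (count), y = bar centre returned by barplot();
# pos = 4 places each label just to the right of its bar end
text(x = as.vector(counts), y = bp, labels = paste0(pct, "%"), pos = 4, xpd = TRUE)
```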
Using ggplot2
You can also plot your data using ggplot2. For instance, you could first create a new column in your dataset with information on the labels...
df1$perc <- paste0(round(df1$x),"%")
Next, you can plot your data using ggplot and adding different relevant layers.
library(ggplot2)
ggplot(df1, aes(x = x, y = y)) +
geom_col() +
geom_text(aes(label = perc)) +
theme_minimal()
Good luck!

Revise the number of ticks in the x-axis?

I only have a series of numbers, and I want to count how often each value occurs. Here is what I have done so far: the x-axis shows each value and the y-axis shows how often it occurs.
My question is: how can I change the x-axis labelling? I only want to see 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9 on the axis, but still keep the same bars in the figure (nothing else changed). Any suggestions?
d1 <- ggplot(TestData, aes(factor(Col1)))
d2 <- d1 + geom_bar() + xlab("") + ylab("")
Create data with a mean of 0.5 and a standard deviation of 0.2:
data<- rnorm(1000,0.5,0.2)
dataf <- data.frame(data)
Histogram over the full data range:
library(ggplot2)
ggplot(dataf, aes(x = data)) +
geom_histogram()
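Restrict the x-axis to 0.4 to 0.9 (note that setting limits on a scale removes observations outside that range, with a warning, so the bars outside it are dropped rather than just hidden):
ggplot(dataf, aes(x = data)) +
geom_histogram() +
scale_x_continuous(limits = c(0.4, 0.9),
breaks = scales::pretty_breaks(n = 5))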
In base graphics, you can just omit the axes when generating the plot, then add them manually using the axis function:
set.seed(1234)
dat <- rnorm(1000, 0.5, 0.1)
hist(dat, axes = FALSE, xlim = c(0, 1))
axis(side = 2)
axis(side = 1, at = seq(0.4, 0.9, 0.1))
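For the asker's own factor-based plot, the axis can stay discrete (one bar per distinct value) while only some levels get a label, via the breaks argument of scale_x_discrete(). A minimal sketch; TestData here is a made-up stand-in since the original data isn't shown, and the tick values have to match the factor levels as text (for example "0.4").

```r
library(ggplot2)

# Hypothetical stand-in for TestData: values on a 0.01 grid
set.seed(42)
TestData <- data.frame(Col1 = round(runif(300, 0.35, 0.95), 2))

# Refer to the column by name inside aes(), not via TestData$Col1
d1 <- ggplot(TestData, aes(x = factor(Col1)))

# Every level still gets its own bar; breaks only chooses which levels
# are labelled on the axis
d2 <- d1 + geom_bar() + xlab("") + ylab("") +
  scale_x_discrete(breaks = c("0.4", "0.5", "0.6", "0.7", "0.8", "0.9"))
d2
```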

Draw vertical quantile lines over histogram

I currently generate the following plot using ggplot in R:
The data is stored in a single data frame with three columns: the PDF (y-axis in the plot above), the bin midpoints (x-axis) and the dataset name. It is built from histograms.
What I want to do is to plot a color-coded vertical line for each dataset representing the 95th quantile, like I manually painted below as an example:
I tried to use + geom_line(stat="vline", xintercept="mean") but of course I'm looking for the quantiles, not for the mean, and AFAIK ggplot does not allow that. Colors are fine.
I also tried + stat_quantile(quantiles = 0.95) but I'm not sure what it does exactly. Documentation is very scarce. Colors, again, are fine.
Please note that density values are very low, down to 1e-8. I don't know if the quantile() function likes that.
I understand that calculating the quantile of a histogram is not quite the same as calculating that of a list of numbers. I don't know whether it would help, but the HistogramTools package contains an ApproxQuantile() function for histogram quantiles.
Minimum working example is included below. As you can see I obtain a data frame from each histogram, then bind the dataframes together and plot that.
library(ggplot2)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)  # bin midpoints and densities only, no plot
df1 <- data.frame(Bin = h$mids, Pdf = h$density, Dataset = "dataset1")
df2 <- data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
df_tot <- rbind(df1, df2)
ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)
Precomputing these values and plotting them separately seems like the simplest option. Doing so with dplyr requires minimal effort:
library(dplyr)
q.95 <- df_tot %>%
group_by(Dataset) %>%
summarise(Bin_q.95 = quantile(Bin, 0.95))
ggplot(data=df_tot[which(df_tot$Pdf>0),],
aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) +
geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
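One caveat with the answer above: quantile(Bin, 0.95) treats every bin midpoint as one observation and ignores Pdf, so with equally spaced bins it returns nearly the same value whatever the shape of the histogram. To take the densities into account, this base R sketch (the helper name weighted_q is hypothetical) accumulates the normalised densities within each dataset and picks the first bin whose cumulative share reaches 0.95. It assumes equal-width bins within a dataset and is only accurate to the bin width.

```r
# Rebuild an equivalent df_tot from the question's minimal example
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df_tot <- rbind(
  data.frame(Bin = h$mids,     Pdf = h$density,     Dataset = "dataset1"),
  data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
)

# Hypothetical helper: first bin whose cumulative normalised density
# reaches p (assumes equal-width bins)
weighted_q <- function(bin, pdf, p = 0.95) {
  o  <- order(bin)
  cw <- cumsum(pdf[o]) / sum(pdf[o])
  bin[o][which(cw >= p)[1]]
}

# One row per dataset, same column names as q.95
q.95w <- do.call(rbind, lapply(split(df_tot, df_tot$Dataset), function(d)
  data.frame(Dataset = d$Dataset[1], Bin_q.95 = weighted_q(d$Bin, d$Pdf))))
q.95w
```

For this data it gives 88.5 for dataset1 and 177 for dataset2. Because q.95w has the same columns as q.95, it can be passed to geom_vline() in the same way.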

R placing and scaling subplots / images

I've prepared a plot and two zoom areas, but am having problems inserting the zooms in the space underneath.
This is the main plot with some white space before the legend to insert the zoom plots:
I first thought of using subplot from the Hmisc package, but couldn't work out how to scale the inserts down to 30%.
Another option might be to just import the png images of all plots and then use the grid package to scale and place them, but I haven't tried this yet.
Any ideas?
Since you already have three plots ("I've prepared a plot and two zoom areas"), you can arrange them with grid viewports.
I put together three quick plots with a zoom relationship between them; the main point is to show how to use viewport() to arrange several plots.
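library(ggplot2)
library(grid)
df <- data.frame(x = 5:10, y = 6:11)
# Main plot and two zoomed-in versions (qplot() is deprecated as of ggplot2 3.4.0)
a <- ggplot(df, aes(x, y, size = x)) + geom_point() + xlim(0, 15) + ylim(0, 15)
b <- ggplot(df, aes(x, y, size = x)) + geom_point() + xlim(5, 10) + ylim(5, 10) +
  theme(legend.position = "none")
zoom_c <- ggplot(df, aes(x, y, size = x)) + geom_point() + xlim(7.5, 9.5) + ylim(7.5, 10.5) +
  theme(legend.position = "none")
# Each inset viewport is 30% of the page in width and height;
# x and y give the position of its centre (0 to 1 across the page)
vpb <- viewport(width = 0.3, height = 0.3, x = 0.3, y = 0.8)
vpc <- viewport(width = 0.3, height = 0.3, x = 0.6, y = 0.3)
# Print the main plot, then overlay the zooms in their viewports
print(a)
print(b, vp = vpb)
print(zoom_c, vp = vpc)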
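Since the question mentions white space underneath the main plot, another option is to give the zooms their own row instead of overlaying them. A sketch using grid.layout(): the main plot spans the top row and each zoom takes one cell of the bottom row. The 70/30 height split is an arbitrary choice.

```r
library(ggplot2)
library(grid)

df <- data.frame(x = 5:10, y = 6:11)
main  <- ggplot(df, aes(x, y, size = x)) + geom_point() + xlim(0, 15) + ylim(0, 15)
zoom1 <- ggplot(df, aes(x, y)) + geom_point() + xlim(5, 10) + ylim(5, 10)
zoom2 <- ggplot(df, aes(x, y)) + geom_point() + xlim(7.5, 9.5) + ylim(7.5, 10.5)

# 2 x 2 layout: top row (70% of the height) for the main plot,
# bottom row (30%) split between the two zooms
lay <- grid.layout(nrow = 2, ncol = 2, heights = unit(c(0.7, 0.3), "npc"))

grid.newpage()
pushViewport(viewport(layout = lay))
print(main,  vp = viewport(layout.pos.row = 1, layout.pos.col = 1:2))
print(zoom1, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(zoom2, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
popViewport()
```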

R: Bar plot on a continuous x-axis (time-scaled)

I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, with two conditions at one of the timepoints, and I want to plot this as a bar plot with error bars and with each bar at the appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
geom_col(position = position_dodge()) +
geom_errorbar(aes(ymin=means-std, ymax=means+std))
Right now my timepoints are a factor, so the bars are evenly spaced even though the measurements are unevenly distributed in time (0, 14, 24, 48, 72), which makes the plot misleading.
This is how I imagine the graph:
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.
Before we get into R, you have to realize that even in a bar plot, placing the bars at proportional distances requires numeric x-values. If you treat the timepoints as a factor, ggplot2 places the bars at equally spaced positions by default. So what would the x-values be here? They could be (0, 14, 14, 24, 48, 72), but then two bars would be drawn at position 14, which you don't seem to want. So you have to decide on the x-values yourself.
Joran provides an elegant solution by modifying the width of the bars at position 14. Modifying the code given by joran to make the bars fall at the right position in the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
# Strip the "a"/"b" suffixes to recover the numeric timepoint
example$tp1 <- gsub("a|b", "", example$tp)
# Condition: only the second 14 measurement belongs to group "b"
example$grp <- c('a', 'a', 'b', 'a', 'a', 'a')
example$tp2 <- as.numeric(example$tp1)
# Numeric x gives proportional spacing; dodging splits the two bars at 14
ggplot(example, aes(x = tp2, y = means, fill = grp)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = means - std, ymax = means + std), position = "dodge")
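A possible refinement, as a sketch: giving both layers the same dodge width keeps narrower error bars centred on their bars at 14, and scale_x_continuous(breaks = ...) puts ticks only at the measured timepoints. The bar width of 9 is an assumption chosen to match ggplot2's default of 0.9 times the smallest gap between timepoints (10, between 14 and 24); the error bar width of 3 is arbitrary.

```r
library(ggplot2)

# Same data, with the numeric timepoint and condition already filled in
example <- data.frame(
  tp2   = c(0, 14, 14, 24, 48, 72),
  grp   = c("a", "a", "b", "a", "a", "a"),
  means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2),
  std   = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3)
)

# One shared dodge so bars and error bars line up at timepoint 14
dodge <- position_dodge(width = 9)

p <- ggplot(example, aes(x = tp2, y = means, fill = grp)) +
  geom_col(position = dodge, width = 9) +
  geom_errorbar(aes(ymin = means - std, ymax = means + std),
                position = dodge, width = 3) +
  # Ticks only at the timepoints that were actually measured
  scale_x_continuous(breaks = unique(example$tp2)) +
  labs(x = "Timepoint", y = "Mean")
p
```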
