add extra labels to x axis barplot [duplicate] - r

This question already has an answer here:
R - how to make barplot plot zeros for missing values over the data range?
(1 answer)
Closed 2 years ago.
Imagine that I have the following bar plot:
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
xlab="Number of Gears")
What I would like to do is add extra categories, for example 2 and 6 gears. These would, of course, show up as zero-height bars in the plot.
Any ideas?

You need to make it a factor and declare the levels:
counts <- table(factor(mtcars$gear,levels=2:6))
barplot(counts, main="Car Distribution",
xlab="Number of Gears")
To add an explanation: factors are meant for categorical variables, and setting the levels as above achieves two things. First, you declare which levels to expect, including ones with no observations. This is useful when, say, you subset the data and then table it. Second, you fix the order of the categories, which is why the bars are plotted from 2 to 6. You can see this by reversing the levels:
counts <- table(factor(mtcars$gear,levels=6:2))
barplot(counts, main="Car Distribution",
xlab="Number of Gears")
The plot is now reversed. You can also see this R chapter on factors.
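This is a small check of both points, using only base R and the built-in mtcars data. Declaring the levels keeps the empty categories in the table, and they also survive subsetting. Every manual-transmission car in mtcars (am == 1) has 4 or 5 gears, so the "3" entry in the second table appears only because that level was declared.

```r
# Declaring levels 2:6 keeps the empty categories (2 and 6 gears) in the table.
counts <- table(factor(mtcars$gear, levels = 2:6))
print(counts)

# Same idea after subsetting: manual cars have no 3-gear models,
# but the declared levels keep a zero count for "3".
manual <- subset(mtcars, am == 1)
manual_counts <- table(factor(manual$gear, levels = 3:5))
print(manual_counts)

barplot(counts, main = "Car Distribution", xlab = "Number of Gears")
```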

Related

Revisiting R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot

Please don't tag this as a duplicate of "R+ggplot+geom_bar+scale_x_continuous+limits: leftmost and rightmost bars not showing on plot". Some people commented that the example there was too long/convoluted/weird, so here is a simpler example that reproduces the problem. If a moderator thinks it is a good idea, I will delete the original (longer) question.
I am trying to create a function that draws a stacked bar plot of some yearly measures. The function takes the data and the minimum and maximum years I want to plot as parameters. The problem is that for some combinations of years the bars come out wrong.
Here is the code. It defines the function, creates a small simulated dataset and draws four plots with different parameters. The resulting plots are described below.
library(ggplot2)
library(plyr)
# Plot either all data or select by name.
doPlot <- function(data,minYear,maxYear) {
title = paste("Bob's Performance ",minYear,"-",maxYear)
# Aggregate quantity by year and category
byYear <- aggregate(Quantity ~ Year+Category, data, sum)
# Get coordinates for numbers in stacked bars
byYear = ddply(byYear, "Year", mutate, label_y = cumsum(Quantity))
g <- ggplot(byYear, aes(x=Year,y=Quantity))
g <- g + geom_bar(stat="identity",aes(fill=Category), colour="black") +
ggtitle(title) +
scale_fill_discrete("Category",labels=c("Sheep","Cactus","Chicken"),drop=FALSE,c=45, l=80)+
scale_x_continuous(name="Year", limits=c(minYear,maxYear), breaks=seq(minYear,maxYear,1)) +
geom_text(aes(label=Quantity,y=label_y), vjust=1.3,size=6)
print(g)
}
consts = paste('"Category","Year","Name","Quantity"\n',
'CACTUS,1997,Bob,45\n',
'CHICKEN,1997,Bob,6\n',
'SHEEP,1998,Bob,2\n',
'SHEEP,1999,Bob,4\n',
'SHEEP,2005,Bob,5\n',sep = "")
data <- read.csv(text=consts,header = TRUE)
data$Category <- factor(data$Category, levels = c("SHEEP", "CACTUS", "CHICKEN"))
# This works OK
doPlot(data,1996,2006)
# This doesn't work: the bars on the left and right sides disappear
doPlot(data,1997,2005)
# This doesn't work either: the left bar disappears; it seems it was not plotted
doPlot(data,1998,2000)
# This is weird: why does the bar width span over 5 years?
doPlot(data,1999,2011)
The first plot is OK, since all the data lie inside the year range.
In the second plot the year range is exactly the same as the range of years in the data. The leftmost and rightmost bars are not plotted, but their numbers are.
In the third plot the year range is very narrow, and again the leftmost and/or rightmost bars are not plotted. There is a hint here that the bar width could not be fitted into the plot: see the width of the 1999 bar.
In the fourth plot the year range is wider, but again the leftmost and/or rightmost bars are not plotted, and the one bar that is plotted covers several years.
I can make the plot sort of work by always using an extended year range, but this is bugging me. I guess I didn't specify something that controls the bar widths, but what?
I noticed similar problems with leftmost and rightmost bars elsewhere, e.g. "In ggplot2 - how to ensure geom_errorbar displays bar limits for all points when controlling x-axis with xlim()". The solutions there are similar, but I believe there ought to be a better way.
I must point out that using
scale_x_continuous(name="Year", breaks=seq(minYear,maxYear,1)) +
coord_cartesian(xlim=c(minYear,maxYear)) +
instead of
scale_x_continuous(name="Year", limits=c(minYear,maxYear),breaks=seq(minYear,maxYear,1)) +
solves the "bar over several years" issue of the fourth plot, but then the leftmost/rightmost bars are only partly drawn.
thanks
Rafael
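Here is a minimal sketch of a workaround. It assumes ggplot2's usual behaviour, which the thread itself does not confirm: scale limits censor anything outside them, including each bar's computed xmin/xmax. A bar centred exactly on minYear therefore loses its left edge and is dropped. When width is left unset, ggplot2 derives it from the spacing of the x values that remain, which would explain the multi-year bar in the fourth plot. Padding the limits by half a year on each side and fixing width explicitly should keep every bar whole. The helper yearPlot and its pad argument are hypothetical, and the data are the thread's totals collapsed to one bar per year.

```r
library(ggplot2)

# One bar per year, using totals from the thread's simulated data
# (1997 = 45 CACTUS + 6 CHICKEN); the per-category stacking is left out.
byYear <- data.frame(Year = c(1997, 1998, 1999, 2005),
                     Quantity = c(51, 2, 4, 5))

# Hypothetical helper (not from the thread): widen the scale limits by `pad`
# years on each side so bars centred on minYear/maxYear keep both edges.
yearPlot <- function(data, minYear, maxYear, pad = 0.5) {
  ggplot(data, aes(x = Year, y = Quantity)) +
    # A fixed width stops ggplot2 from deriving it from the spacing of the
    # years that remain inside the limits.
    geom_bar(stat = "identity", width = 0.9) +
    scale_x_continuous(name = "Year",
                       limits = c(minYear - pad, maxYear + pad),
                       breaks = seq(minYear, maxYear, 1))
}

print(yearPlot(byYear, 1997, 2005))

# Inspect the computed bar edges: with pad = 0 the outer edges fall outside
# the limits and become NA, so those bars are dropped when drawing.
padded   <- layer_data(yearPlot(byYear, 1997, 2005))
unpadded <- layer_data(yearPlot(byYear, 1997, 2005, pad = 0))
```

The half-year padding matches the default bar width of 0.9, so the labelled breaks still run exactly from minYear to maxYear.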

Scatterplot in ggplot stacked like barplot

I want to create a scatterplot in ggplot where there are multiple y values for each x value. I want to add these y values and plot the sum against the x value.
>df
a b
1 2
1 2
2 1
2 4
3 1
3 5
I want a plot of the sum of the b values for each a:
a b
1 4
2 5
3 6
I can do this for a barplot by making a stacked barplot:
ggplot(data=df, aes(x=a, y=b)) + geom_bar(stat="identity")
but if I do this with geom_point ggplot just plots each value of y without stacking.
I could use ddply for this, but that would require several more steps. If there is a more expedient way, I'd appreciate it.
I searched the site for other answers. While there were plenty about "stacked scatterplots" they were all about overlaid plots.
I don't see anything stacked about your bar chart example. If you just want to summarize the values to a single point, you can use stat_summary:
# fun= needs ggplot2 >= 3.3.0; older versions call this argument fun.y
ggplot(data=df, aes(x=a, y=b)) + stat_summary(fun=sum, geom="point")
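Here is a self-contained version that rebuilds df from the question. stat_summary's summary argument is named fun from ggplot2 3.3.0 on (fun.y before that), so on an older version swap the name back. layer_data() is used here only to inspect the summed values that get plotted.

```r
library(ggplot2)

# The question's data: several b values per a.
df <- data.frame(a = c(1, 1, 2, 2, 3, 3),
                 b = c(2, 2, 1, 4, 1, 5))

# stat_summary applies sum() to all y values at each x and draws one point.
g <- ggplot(df, aes(x = a, y = b)) +
  stat_summary(fun = sum, geom = "point")
print(g)

# The computed data behind the points: one row per distinct a.
summed <- layer_data(g)
```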
There are many ways to achieve this effect - of a 'histogram' but without bars, whose height is the sum of all values at the same X.
This type of graph is called a Cleveland dot plot. It is used because the conspicuous bars of a histogram can be a distraction or, at worst, misleading (see the works of Cleveland, Tufte, etc.).
One way to achieve this is to pre-process the data to do the sum, using functions such as table or hist or tapply or xtabs...
Note that base R has the function dotchart for the production of this type of graph.
dotchart(xtabs(rev(df)))
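For the pre-processing route, here is a sketch with three base R ways to get the per-a sums before plotting. The trick in xtabs(rev(df)) works because a data frame passed as the formula is read as "first column ~ remaining columns", that is, b ~ a. The explicit b ~ a version below should be equivalent.

```r
df <- data.frame(a = c(1, 1, 2, 2, 3, 3),
                 b = c(2, 2, 1, 4, 1, 5))

# Three equivalent ways to sum b within each value of a.
sums_tapply <- tapply(df$b, df$a, sum)           # named 1-D array
sums_xtabs  <- xtabs(b ~ a, data = df)           # 1-D contingency table
sums_agg    <- aggregate(b ~ a, data = df, FUN = sum)  # data frame with columns a, b

# Cleveland-style dot chart of the sums, labelled by the value of a.
dotchart(as.vector(sums_xtabs), labels = names(sums_xtabs), xlab = "sum of b")
```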
... but since we are discussing ggplot, which has powerful ways to summarise the data while plotting it, let's stick to MrFlick's theme of doing it directly with ggplot operators (i.e. without preprocessing).
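Using a weighted count summary statistic (in current ggplot2, stat="bin" only accepts a continuous x, so a factor x needs stat="count"):
ggplot(data=df, aes(x=factor(a),weight=b)) + geom_point(stat="count")
You may want to adjust the lower y limit to 0 here, e.g. by adding + expand_limits(y=0).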
By stacking the height of the points:
ggplot(data=df, aes(x=factor(a),y=b)) + geom_point(position="stack")
The additional dots on this plot are probably superfluous and certainly ambiguous, but they do highlight the multiplicity in the source data.
Building a dotplot
This one is popular in newspapers, but usually has dollar bills instead of giant black holes:
ggplot(data=df, aes(x=factor(a),weight=b)) + geom_dotplot(method="histodot")
It's probably not what you are looking for, but it's worth being aware of.
You should also be aware that scales are difficult to get correct in this mode, so it's best used in a hand-tuned mode, with the y scale numbering turned off.

Grouped Bar Plot Species and plots

I'd like to make a grouped bar plot with two groups, one named Exotic Species and the other Native Species, and compare them across the plots they are found in. So three columns are involved in the graph. Y would be "Species Richness", the number of species (either native or exotic), and X would be the "Plot name". How do I write the code for the bar graph described above? If you search Google for "European Parliament Elections R grouped barplot", the orange and purple plot is what I want.
If I understand your question correctly, you want something like this. Here are two solutions:
1. Using barplot (let's say mtcars is the data frame):
# Grouped Bar Plot
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
xlab="Number of Gears", col=c("darkblue","red"),
legend = rownames(counts), beside=TRUE)
2. Using lattice:
library(lattice)
barchart(Species~Reason,data=Reasonstats,groups=Catergory,
scales=list(x=list(rot=90,cex=0.8)))
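The asker's data aren't shown, so here is a sketch of the barplot approach applied to the species question, using an invented data frame. The plot names and counts are hypothetical, with one row per plot and columns for native and exotic richness. barplot() with beside=TRUE draws each matrix column as a group and each row as a bar within the group, with orange and purple picked to echo the plot the asker described.

```r
# Hypothetical species-richness data: one row per plot (values invented).
richness <- data.frame(Plot   = c("Plot A", "Plot B", "Plot C"),
                       Native = c(12, 8, 15),
                       Exotic = c(3, 6, 2))

# barplot() wants a matrix: rows become bars within each group,
# columns become the groups along the x axis.
heights <- t(as.matrix(richness[, c("Native", "Exotic")]))
colnames(heights) <- richness$Plot

# beside = TRUE puts the Native and Exotic bars next to each other;
# the return value holds the x midpoint of every bar.
mids <- barplot(heights, beside = TRUE,
                col = c("darkorange", "purple"),
                legend.text = rownames(heights),
                xlab = "Plot name", ylab = "Species Richness")
```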

R - how to make barplot plot zeros for missing values over the data range?

Let's say I draw 10 observations from 200 points: the integers from 1 to 10, each repeated 20 times.
mysample <- sample(rep(seq(1,10),20),10)
and I want to plot it with barplot:
barplot(table(mysample))
In this example, there are no observations of 7. Is there a quick way of telling barplot to set the x-axis range to all integers between 1 and 10, or do I have to manually edit the table?
Try
barplot(table(factor(mysample, levels=1:10)))
By using a factor with explicit levels, R knows which levels are "missing".
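Because sample() is random, a fixed vector makes the difference easier to see. This one is a hypothetical stand-in for the question's random draw, in which 4 and 7 never occur. The plain table keeps only the values that occur, while the factor version keeps all ten, with zeros for the missing ones.

```r
# Fixed stand-in for the random sample: 4 and 7 never occur.
mysample <- c(1, 2, 2, 3, 5, 6, 8, 9, 9, 10)

plain <- table(mysample)                         # only the 8 values that occur
full  <- table(factor(mysample, levels = 1:10))  # all 10 values, zeros included

barplot(full, xlab = "Value", ylab = "Count")
```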

histogram: printing group labels [duplicate]

This question already has an answer here:
How to change panel labels and x-axis sublabels in a lattice bwplot
(1 answer)
Closed 8 years ago.
I am using the following histogram command to visualize the features of a labeled dataset that has binary labels (0 or 1).
require(lattice)
data <- data.frame(num_child=1:10,label=rep(0:1,each=5))
histogram( ~ data$num_child | data$label ,xlab="Number of children")
I get a pair of histogram plots, as expected, with the x-axis labeled "Number of children" and the y-axis labeled "Percent of Total". However, the strip labels on top of both panels read "data$label" rather than the value of the group label. The histogram command takes xlab and ylab parameters, but does not seem to have a parameter for the group label. How can I get the group labels (i.e. "0" and "1") to be printed?
Looks like the easiest solution is to change your grouping to a factor:
histogram(~ num_child | as.factor(label), data=data, xlab="Number of children")
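If you want more descriptive strip text than the bare "0" and "1", here is a sketch using the question's toy data. Give the factor explicit labels and lattice prints those labels in the strips. The label_f column and the label wording are made up for illustration.

```r
library(lattice)

data <- data.frame(num_child = 1:10, label = rep(0:1, each = 5))

# A factor with readable level names; lattice uses the levels as strip text.
data$label_f <- factor(data$label, levels = 0:1,
                       labels = c("label = 0", "label = 1"))

p <- histogram(~ num_child | label_f, data = data,
               xlab = "Number of children")
print(p)
```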
