I have a simple dataset containing values from 0 to 1. When I plot it, the bars naturally start from a horizontal axis at zero. I would like this reference to be 0.5, with the bars falling below 0.5 reversed and colored differently from those falling above this threshold.
my.df <- data.frame(group=state.name[1:20],col1 = runif(20))
p <- ggplot(my.df, aes(x=group,y=col1)) +
geom_bar(stat="identity")+ylim(0,0.5)
I am thinking of splitting the data into two subsets, one with values greater than 0.5 and the other with values smaller than 0.5, and then somehow combining these two subsets in the same ggplot. Is there a clearer way to do that? Thanks!
To build on @jas_hughes's answer, you can subtract 0.5 from your col1 variable, then rename the labels on the y-axis.
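library(dplyr)
library(ggplot2)
df <- data.frame(group=state.name[1:20],value=runif(20))
# Plot value - 0.5 so the bars start at 0.5, then relabel the continuous y-axis with the original values
df %>% ggplot(aes(reorder(group,value),value-0.5)) + geom_bar(stat='identity') +
scale_y_continuous(name='Value',
breaks=c(-0.5,0,0.5),
labels=c('0','0.5','1'),
limits=c(-0.5,0.5)) +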
xlab('State') +
theme(axis.text.x = element_text(angle=45,hjust=1))
The y-variable you are trying to communicate is distance from 0.5, so you need to change the values in col1 to reflect this.
library(dplyr)
library(ggplot2)
my.df %>%
mutate(col2 = col1-0.5) %>%
ggplot() +
aes(x = group, y = col2, fill = col2 >=0) +
geom_bar(stat = 'identity') +
theme(legend.position = 'none',
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ylab('Col1 above 0.5 (AU)')
Note, you can also use the aes(fill = col1 >= 0.5) option to color code the bars without shifting the axis (which is what I would recommend if col1 contains percentages).
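A minimal sketch of that second option, assuming the my.df from the question. The bars keep their 0-to-1 axis and the fill is mapped to the logical test col1 >= 0.5. The dashed reference line, the helper column name above, and the colours are my own additions for illustration, not part of the answer.

```r
library(ggplot2)

set.seed(1)  # only so the random example data is reproducible
my.df <- data.frame(group = state.name[1:20], col1 = runif(20))

# Logical flag used for the fill: TRUE when the bar is at or above 0.5
my.df$above <- my.df$col1 >= 0.5

p <- ggplot(my.df, aes(x = group, y = col1, fill = above)) +
  geom_col() +
  # Mark the threshold without moving the axis
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "firebrick")) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```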
I am having an issue when making bar charts with mixed sign values in ggplot2. Take the following example:
df <- data.frame(year = letters[1:2],
value = c(1, -1))
ggplot(df, aes(year, value)) +
geom_col() +
geom_text(aes(label = value), vjust = 0.0, size = 5)
Which yields:
I would like to be consistent with the placement of the text - either below or on top of the bars. This is tricky because in both cases in the graph above, the text is directly above the value. However, because the first value is positive and the second value is negative the text appears in a different location relative to the bar. What I would like to see is (adjustments in red):
My question is: Is it possible to conditionally format label placement based on the sign of the value?
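library(dplyr)
library(ggplot2)
# Anchor each label at the top of its bar: 0 for negative values, the value itself otherwise
df <- data.frame(year = letters[1:3],
value = c(1, -1,-5)) %>%
mutate(text_location = ifelse(value < 0,0,value))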
ggplot(df, aes(year, value)) +
geom_col() +
geom_text(aes(y = text_location,label = value), vjust = 0.0, size = 5)
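If you would rather have each label sit just outside the end of its bar (above positive bars, below negative ones), one alternative is to make vjust itself depend on the sign, since vjust can be mapped as an aesthetic in geom_text. This is a sketch of a different technique from the answer above; the column name label_vjust and the offsets -0.3 and 1.3 are arbitrary choices of mine.

```r
library(ggplot2)

df <- data.frame(year = letters[1:3], value = c(1, -1, -5))

# vjust below 0 pushes the text above its anchor point,
# vjust above 1 pushes it below the anchor point
df$label_vjust <- ifelse(df$value >= 0, -0.3, 1.3)

p <- ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(aes(label = value, vjust = label_vjust), size = 5)
p
```

Labels at the extremes may be clipped by the panel edge; widening the y range, for example with expand_limits(), is one way around that.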
I have created a stacked barplot
ggplot(data %>% count(x, y),
aes(x, n, fill = factor(y))) +
geom_bar(stat="identity")+
theme_light()+
theme(plot.title = element_text(hjust=0.5))
There are (possible) outliers at 50, 54 and 60. How can I add their IDs to the graph?
If you post your data, I'll amend this answer using it. But basically you want
df %>%
count(x, y) %>%
ggplot(aes(x = x, y = n, fill = y)) +
geom_col() +
geom_text(aes(label = x), data = . %>% filter(x >= thresh), vjust = 0, nudge_y = 0.1)
where thresh is some threshold you've set--maybe an arbitrary cutoff point that makes sense, or maybe 3 standard deviations from the mean of x, or whatever. You can store it in an outside variable, you can make a boolean column in your dataframe, or you can calculate it inline inside your geom_text--really up to you. vjust = 0, nudge_y = 0.1 puts the labels just above the bars corresponding to your outliers.
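As a rough sketch of the "3 standard deviations" option, here is a self-contained version. The data frame is a hypothetical stand-in, since the question's data was not posted, and the names bars and outliers are mine. The labels use the total count per x so that they sit above the whole stacked bar rather than on each segment.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data (the question's data was not posted)
df <- data.frame(
  x = c(rep(28:32, each = 40), 50, 54, 60),
  y = rep(c("a", "b"), length.out = 203)
)

# One possible cutoff: 3 standard deviations above the mean of x
thresh <- mean(df$x) + 3 * sd(df$x)

bars <- count(df, x, y)                           # one row per stacked segment
outliers <- count(df, x) %>% filter(x >= thresh)  # total bar height per outlier

p <- ggplot(bars, aes(x = x, y = n)) +
  geom_col(aes(fill = y)) +
  geom_text(aes(label = x), data = outliers, vjust = 0, nudge_y = 0.1)
p
```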
Maybe geom_text(data = mydata %>% filter(just.the.outliers))?
See also this: RE: Alignment of numbers on the individual bars with ggplot2
I am trying to do a histogram zoomed in on part of the data. My problem is that I would like to group everything that is outside the range into a last category, "10+". Is it possible to do this using ggplot2?
Sample code:
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
scale_y_continuous(labels = percent) +
coord_cartesian(xlim=c(0, 10)) +
scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now:
And here is how I would like it to look:
It is probably possible to do it by nesting ifelse calls, but since my real problem has more cases, is there a way for ggplot to do it?
You could use forcats and dplyr to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
library(scales) # provides percent for the y-axis labels
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
mutate(x_grp = cut(x, breaks = c(seq(0,15,1)))) %>%
mutate(x_grp = fct_collapse(x_grp, "10+" = levels(x_grp)[11:15])) %>% # merge the bins above 10 into one "10+" level
group_by(x_grp) %>%
dplyr::summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count/10000)) +
geom_bar(stat = "identity", colour = "grey50") +
scale_y_continuous(labels = percent)
However, the resulting graph is very different from your example, but I think it's correct, since we are building a uniform distribution:
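Another way to get an overflow bin while staying with geom_histogram is to clamp the values before plotting, so that everything above 10 falls into a single extra bin. This is only a sketch under a few assumptions: the column name x_capped and the bin-centred axis labels are my own choices, and after_stat() needs ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)  # only for reproducibility of the example
x <- data.frame(x = runif(10000, 0, 15))

# Clamp everything above 10.5 so that all values over 10 land in the (10, 11] bin
x$x_capped <- pmin(x$x, 10.5)

p <- ggplot(x, aes(x_capped)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 breaks = 0:11, colour = "grey50") +
  # Label each bin at its centre; the last bin is the overflow bin
  scale_x_continuous(breaks = 0:10 + 0.5,
                     labels = c(paste0(0:9, "-", 1:10), "10+")) +
  scale_y_continuous(labels = scales::percent)
p
```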
I am trying to make some changes to my plot, but am having difficulty doing so.
(1) I would like warm, avg, and cold to be filled in as the colors red, yellow, and blue, respectively.
(2) I am trying to make the y-axis read "Count" and have it be horizontally written.
(3) In the legend, I would like the title to be Temperatures, rather than variable
Any help making these changes would be much appreciated along with other suggestions to make the plot look nicer.
df <- read.table(textConnection(
'Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949'), header = TRUE)
library(ggplot2)
library(reshape2)
df <- melt(df, id = 'Statistic')
ggplot(
data = df,
aes(
y = value,
x = Statistic,
group = variable,
shape = variable,
fill = variable
)
) +
geom_bar(stat = "identity")
You are on the right lines by trying to reshape the data into long format. My preference is to use gather from the tidyr package for that. You can also create the variable names Temperatures and Count in the gather step.
The next step is to turn the 3 classes of temperature into a factor, ordered from cold, through average, to warm.
Now you can plot. You want position = "dodge" to get the bars side by side, since it makes no sense to stack the values in a single bar. Fill colours you specify using scale_fill_manual.
You rotate the y-axis title by manipulating axis.title.y.
So putting all of that together (plus a black/white theme):
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(Temperatures, Count, -Statistic) %>%
mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm"))) %>%
ggplot(aes(Statistic, Count)) +
geom_col(aes(fill = Temperatures), position = "dodge") +
scale_fill_manual(values = c("blue", "yellow", "red")) +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
Result:
I'd question whether Count is a sensible variable name in this case.
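For readers on tidyr 1.0.0 or later, the same reshaping can be sketched with pivot_longer, which supersedes gather. This is an equivalent rewrite of the code above rather than a different method; the intermediate name long is my own, and the fill colours are named explicitly so they do not depend on level order.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- read.table(text = "Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949", header = TRUE)

# Wide to long: one row per Statistic/temperature pair
long <- df %>%
  pivot_longer(-Statistic, names_to = "Temperatures", values_to = "Count") %>%
  mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm")))

p <- ggplot(long, aes(Statistic, Count)) +
  geom_col(aes(fill = Temperatures), position = "dodge") +
  scale_fill_manual(values = c(Cold = "blue", Avg = "yellow", Warm = "red")) +
  theme_bw() +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
p
```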
You are almost there. To map specific colors to specific factor levels you can use scale_fill_manual and create your own scale:
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
Changing the y-axis title is also easy in ggplot:
ylab("Count") +
And to change the legend title you can use:
labs(fill='TEMPERATURE') +
Giving us:
ggplot(df, aes(y = value, x = Statistic, group= variable, fill = variable)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
labs(fill='TEMPERATURE') +
ylab("Count") +
xlab("") +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
I am trying to plot a nice stacked percent bar chart using ggplot2. I've read some material and almost managed to plot what I want. I also list the material here, as it might be useful to have in one place:
How do I label a stacked bar chart in ggplot2 without creating a summary data frame?
Create stacked barplot where each stack is scaled to sum to 100%
R stacked percentage bar plot with percentage of binary factor and labels (with ggplot)
My problem is that I can't place the labels where I want them: in the middle of the bars.
You can see the problem in the picture above: the labels look awful and also overlap each other.
What I am looking for right now is:
How to place labels in the middle of the bars (areas)
How to plot only some of the labels, for example those greater than 10%?
How to solve the overlapping problem?
For Q1, @MikeWise suggested a possible solution. However, I still can't deal with this problem.
I also enclose a reproducible example showing how I plotted this graph.
library('plyr')
library('ggplot2')
library('scales')
set.seed(1992)
n=68
Category <- sample(c("Black", "Red", "Blue", "Cyna", "Purple"), n, replace = TRUE, prob = NULL)
Brand <- sample("Brand", n, replace = TRUE, prob = NULL)
Brand <- paste0(Brand, sample(1:5, n, replace = TRUE, prob = NULL))
USD <- abs(rnorm(n))*100
df <- data.frame(Category, Brand, USD)
# Calculate the percentages
df = ddply(df, .(Brand), transform, percent = USD/sum(USD) * 100)
# Format the labels and calculate their positions
df = ddply(df, .(Brand), transform, pos = (cumsum(USD) - 0.5 * USD))
# Create nice labels
df$label = paste0(sprintf("%.0f", df$percent), "%")
ggplot(df, aes(x=reorder(Brand,USD,
function(x)+sum(x)), y=percent, fill=Category))+
geom_bar(position = "fill", stat='identity', width = .7)+
geom_text(aes(label=label, ymax=100, ymin=0), vjust=0, hjust=0,color = "white", position=position_fill())+
coord_flip()+
scale_y_continuous(labels = percent_format())+
ylab("")+
xlab("")
Here's how to center the labels and avoid plotting labels for small percentages. An additional issue in your data is that you have multiple bar sections for each colour. Instead, it seems to me all the bar sections of a given colour should be combined. The code below uses dplyr instead of plyr to set up the data for plotting:
library(dplyr)
# Initial data frame
df <- data.frame(Category, Brand, USD)
# Calculate percentages
df.summary = df %>% group_by(Brand, Category) %>%
summarise(USD = sum(USD)) %>% # Within each Brand, sum all values in each Category
mutate(percent = USD/sum(USD))
With ggplot2 version 2.2.0 or later, it is no longer necessary to calculate the coordinates of the text labels to get them centered. Instead, you can use position=position_stack(vjust=0.5). For example:
ggplot(df.summary, aes(x=reorder(Brand, USD, sum), y=percent, fill=Category)) +
geom_bar(stat="identity", width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.07, paste0(sprintf("%.0f", percent*100),"%"),"")),
position=position_stack(vjust=0.5), colour="white") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
With older versions, we need to calculate the position. (Same as above, but with an extra line defining pos):
# Calculate percentages and label positions
df.summary = df %>% group_by(Brand, Category) %>%
summarise(USD = sum(USD)) %>% # Within each Brand, sum all values in each Category
mutate(percent = USD/sum(USD),
pos = cumsum(percent) - 0.5*percent)
Then plot the data using an ifelse statement to determine whether a label is plotted or not. In this case, I've avoided plotting a label for percentages less than 7%.
ggplot(df.summary, aes(x=reorder(Brand,USD,function(x)+sum(x)), y=percent, fill=Category)) +
geom_bar(stat='identity', width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.07, paste0(sprintf("%.0f", percent*100),"%"),""),
y=pos), colour="white") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
I followed the example and found a way to put nice labels on a simple stacked bar chart. I think it might be useful too.
df <- data.frame(Category, Brand, USD)
# Calculate percentages and label positions
df.summary = df %>% group_by(Brand, Category) %>%
summarise(USD = sum(USD)) %>% # Within each Brand, sum all values in each Category
mutate( pos = cumsum(USD)-0.5*USD)
ggplot(df.summary, aes(x=reorder(Brand,USD,function(x)+sum(x)), y=USD, fill=Category)) +
geom_bar(stat='identity', width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(USD>100,round(USD,0),""),
y=pos), colour="white") +
coord_flip()+
labs(y="", x="")
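For the percent-stacked version from the question, the same centring idea can be sketched with position_fill(vjust = 0.5), which rescales both bars and labels to 100% so that no pos column is needed. This is a sketch under assumptions, not a reworking of the answers above: it regenerates similar example data, spells the category "Cyan", keeps the 7% label cutoff used earlier, and uses the .groups argument, which needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

set.seed(1992)
n <- 68
df <- data.frame(
  Category = sample(c("Black", "Red", "Blue", "Cyan", "Purple"), n, replace = TRUE),
  Brand = paste0("Brand", sample(1:5, n, replace = TRUE)),
  USD = abs(rnorm(n)) * 100
)

df.summary <- df %>%
  group_by(Brand, Category) %>%
  summarise(USD = sum(USD), .groups = "drop_last") %>%  # result stays grouped by Brand
  mutate(percent = USD / sum(USD))                      # share within each Brand

p <- ggplot(df.summary, aes(x = Brand, y = USD, fill = Category)) +
  geom_col(position = "fill", width = .7) +
  # Blank out labels under 7% and centre the rest inside their segments
  geom_text(aes(label = ifelse(percent >= 0.07, sprintf("%.0f%%", percent * 100), "")),
            position = position_fill(vjust = 0.5), colour = "white") +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "", y = "")
p
```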