ggplot shows all Y-axis values instead of calibrating - r

I drew a ggplot of a few variables using side-by-side bars, but the y-axis has no proper scale: it displays every individual value (in %) as its own tick.
The plot is meant to show the share of white, Hispanic and black people (%) in the population of each US state. As you can see, the y-axis, which is supposed to represent percent, looks as if all the values had been pushed onto it instead of being scaled from 0 to 100.
The dataset I am using is available at github_fivethirtyeight_police-killings. (I am sorry, but I couldn't find a way to lay out the five columns I am taking from the data frame: state, ethnicity, and the three shares you are seeing on the right, in %.)
The R code is:
library(reshape2)  # provides melt()
library(ggplot2)
x <- read.csv("C:/Users/USER/data/police-killings/police_killings.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
state <- x[, 10]
ethnicity <- x[, 4]
state_and_shares <- x[, c(10, 23:25)]
df2 <- melt(state_and_shares, id.vars = 'state')
head(df2)
ggplot(df2, aes(x = state, y = value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Can someone please tell me how to make the y-axis show a normal scale instead of displaying every value?

ggplot needs to see the y values as numbers rather than strings, e.g. by converting them with as.numeric():
ggplot(df2,aes(x=state,y=as.numeric(value),fill=variable))+geom_bar(stat = 'identity',position = 'dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))
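It can also help to do the conversion once on the data rather than inside aes(), and to check which entries fail to convert. The sketch below uses a small made-up data frame in place of the melted df2; the "-" placeholder is an assumption about why the column might have been read as character, not something confirmed in the question.

```r
# Made-up stand-in for the melted data; `value` was read as character.
df2 <- data.frame(
  state    = c("AL", "AL", "AK", "AK"),
  variable = c("share_white", "share_black", "share_white", "share_black"),
  value    = c("60.9", "30.2", "-", "7.1"),  # "-" is an assumed placeholder
  stringsAsFactors = FALSE
)

# Entries that cannot be parsed as numbers become NA; list them before converting.
converted <- suppressWarnings(as.numeric(df2$value))
df2$value[is.na(converted)]

# Store the numeric version so ggplot picks a continuous y scale.
df2$value <- converted
str(df2)
```

With value stored as numeric, the original ggplot() call can map y = value directly; rows with NA are dropped with a warning when plotting.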

Related

X axis labels cut off in ggplot when rotating

When x-axis labels are rotated in ggplot, the labels are sometimes cut off.
I looked at these posts: How can I manipulate a ggplot in R to allow extra room on lhs for angle=45 long x-axis labels? and ggplot2 plot area margins?. The suggestion in both cases is to use the plot.margin parameter. But I'm wondering if there's a more elegant and dynamic solution to the problem. In my application users will be allowed to change the font size of axis labels, so hardcoding a plot margin does not seem to be a good approach. Are there any other ways to avoid this effect? Is it possible to manipulate the layout somehow?
Code to reproduce:
library(ggplot2)
categories <- c(
  "Entertainment",
  "Research",
  "Development",
  "Support",
  "Classic",
  "Old Time"
)
years <- 2020:2021
types <- c(
  "Easy",
  "Pro",
  "Free",
  "Trial",
  "Subscription"
)
d <- expand.grid(category = categories,
                 type = types,
                 year = years)
counts <- sample(0:100, size = nrow(d))
d$n <- counts
ggplot(
  data = d,
  aes(x = category, y = n, fill = category)
) + geom_bar(stat = "identity") +
  facet_grid(rows = vars(year), cols = vars(type)) +
  theme(
    axis.text.x = element_text(
      angle = 22.5,
      hjust = 1,
      size = 12
    )
  )
I don't see any way to do this automatically natively using ggplot2 tools, so why not write a small function that sets the size of the margin based on the number of characters in the leftmost x category value?
margin_spacer <- function(x) {
  # where x is the column in your dataset
  left_length <- nchar(levels(factor(x)))[1]
  if (left_length > 8) {
    return((left_length - 8) * 4)
  } else {
    return(0)
  }
}
The function can deal with a character column (or factor), and checks the number of characters in the first level (which would appear on the left in the plot). Fiddling around, it seemed anything longer than 8 posed an issue for the plot code, so this adds 4 points to the margin for every character past 8 characters.
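As a quick check of what the function returns, here is a small sketch using the categories from the question. One point to be aware of: "first level" depends on how the factor was built. expand.grid() keeps levels in order of occurrence, whereas factor() on a plain character vector sorts them alphabetically, so the two inputs below give different answers.

```r
margin_spacer <- function(x) {
  left_length <- nchar(levels(factor(x)))[1]
  if (left_length > 8) (left_length - 8) * 4 else 0
}

categories <- c("Entertainment", "Research", "Development",
                "Support", "Classic", "Old Time")
d <- expand.grid(category = categories, year = 2020:2021)

# Factor from expand.grid(): first level is "Entertainment" (13 characters).
margin_spacer(d$category)   # (13 - 8) * 4 = 20

# Plain character vector: levels are sorted, so "Classic" (7 characters) comes first.
margin_spacer(categories)   # 0
```

If the plotted column is a character vector, ggplot2 also orders it alphabetically on the axis, so the function still looks at the label that ends up leftmost.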
Note also that I changed the angle of the x-axis text on your plot: I think 22.5 is a bit too shallow, and with your text size I get a lot of overlap on my graphics device. This means that the values 8 and 4 may not work quite as well for you, but here's how it works for a few different data frames.
Here's the new plot code:
ggplot(data = d, aes(x = category, y = n, fill = category)) +
  geom_bar(stat = "identity") +
  facet_grid(rows = vars(year), cols = vars(type)) +
  theme(
    axis.text.x = element_text(angle = 40, hjust = 1, size = 12),
    plot.margin = margin(l = 0 + margin_spacer(d$category))
  )
I created the following plots by changing the code where categories is defined. Each output uses the code above, with the first entry in categories <- c(...) changed accordingly. You'll note it works pretty well unless the label is crazy long. As the text gets really long, the text size may have to be adjusted as well. If you think your users are going to get crazy with labels, you can use a similar strategy to adjust text size... but that's probably overkill.
"Enter" (5 characters)
"Entertain" (9 characters)
"Entertainer" (11 characters)
"Entertainment" (13 characters)
"Quality Entertainment" (21 characters)

geom_step starting and ending with a horizontal segment

Sometimes I'd like to present data that refer to periods (not to points in time) as a step function. When e.g. data are per-period averages, this seems more appropriate than using a line connecting points (with geom_line). Consider, as a MWE, the following:
library(ggplot2)
df = data.frame(x=1:8,y=rnorm(8,5,2))
ggplot(df,aes(x=x,y=y))+geom_step(size=1)+scale_x_continuous(breaks=seq(0,8,2))
This gives
However, the result is not fully satisfactory, as (1) I'd like the final observation to be represented by a horizontal segment and (2) I'd like the labels on the x-axis to be aligned with the center of each horizontal segment. What I want can be obtained with some hacking:
library(dplyr)
df %>% rbind(tail(df,1) %>% mutate(x=x+1)) %>%
  ggplot(aes(x,y))+geom_step(size=1)+
  scale_x_continuous(breaks=seq(0,12,2))+
  theme(axis.ticks.x=element_blank(),axis.text.x=element_text(hjust=-2))
which produces:
This corresponds to what I am looking for (except that the horizontal alignment of labels requires some fine tuning and is not perfect). However, I am not sure this is the best way to proceed and I wonder if there is a better way.
Does this work for you? It comes down to altering the data as it is passed rather than changing the plotting code per se (as is often the case in ggplot).
Essentially, we add an extra copy of the final y value to the end of the data frame at an incremented x value.
To make the horizontal segments centered on the major axis breaks, we simply subtract 0.5 from the x value.
ggplot(rbind(df, data.frame(x = 9, y = tail(df$y, 1))),
       aes(x = x - 0.5, y = y)) +
  geom_step(size = 1) +
  scale_x_continuous(breaks = seq(0, 8, 2), name = "x",
                     minor_breaks = seq(0, 8, 1) + 0.5) +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_line())
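The code above hard-codes x = 9 for the extra row. If the number of periods varies, the same two steps (repeat the last y one period later, then shift x by half a period) could be wrapped in a small helper. This is only a sketch: pad_steps and its width argument are hypothetical names, and it assumes the rows are already sorted by x with equally spaced periods.

```r
# Hypothetical helper: append a copy of the last row one period later,
# then shift all x values left by half a period so steps are centered.
pad_steps <- function(df, width = 1) {
  last <- df[nrow(df), , drop = FALSE]
  last$x <- last$x + width
  out <- rbind(df, last)
  out$x <- out$x - width / 2
  out
}

# Fixed y values instead of rnorm() so the result is reproducible.
df <- data.frame(x = 1:8, y = c(5, 3, 6, 4, 7, 5, 2, 6))
padded <- pad_steps(df)
padded
```

The result can be passed to ggplot(padded, aes(x, y)) + geom_step() with the same scale and theme settings as in the answer.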

Ticks unevenly spread-out

I am plotting an adjacency matrix representing neuron connections, but the spacing between ticks on both axes is not consistent. Some ticks are so close to each other that their labels overlap, while other ticks are spread far apart, creating large gaps.
About the data:
The data are in a data frame where the first column contains Source and the second column contains Target. Every entry in this data frame is a character string.
Here is what the data look like:
Source Target
RID ALA
ADLL ADLR
AFDL AFDR
AFDL AIBL
AVAL AS01
AVAL AS06
targetOrder is a factor made from the column Source:
targetOrder
RID
ADLL
AFDL
AFDL
AVAL
AVAL
More details on what is going on:
The x-axis and the y-axis are discrete, similar (containing the same values) and in the same order. Both axes contain 80 neurons. The neuron on the y-axis shoots at a neuron on the x-axis creating a dot where they intersect.
Here is the code I am using:
ggplot(data = dataFrame) +
  geom_point(mapping = aes(x = Target, y = Source), color = "#0000FF") +
  labs(title = "Adjacency Matrix") +
  scale_x_discrete(limits = targetOrder) + # Target order used on both axes
  scale_y_discrete(limits = targetOrder) +
  theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0)) +
  theme(axis.ticks.length = unit(.5, "cm"))
I have spent the last few days looking for answers to this question but could not find any. In the process, I learned everything else about ticks, like how to customize their sizes...
If you need any other information, just ask and I will do my best.
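A possible cause, offered as an assumption rather than a confirmed diagnosis: targetOrder as shown contains repeated values (AFDL and AVAL appear twice), and it only covers neurons that occur in Source. Passing repeated values as discrete limits may reserve more than one axis position for the same neuron, which could produce the gaps described. A sketch of building a duplicate-free ordering from both columns, using the sample rows from the question (neuronOrder is a hypothetical name):

```r
# Toy edge list with the rows shown in the question.
dataFrame <- data.frame(
  Source = c("RID", "ADLL", "AFDL", "AFDL", "AVAL", "AVAL"),
  Target = c("ALA", "ADLR", "AFDR", "AIBL", "AS01", "AS06"),
  stringsAsFactors = FALSE
)

# One entry per neuron, in order of first appearance across both columns.
neuronOrder <- unique(c(dataFrame$Source, dataFrame$Target))

# 0 means no value is repeated, so each neuron gets exactly one axis position.
anyDuplicated(neuronOrder)
```

neuronOrder could then be passed as limits to both scale_x_discrete() and scale_y_discrete() in place of targetOrder.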

Secondary / Dual axis - ggplot

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
The plot is wide, hence duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far)
I need to add a new axis that is a transformation to the original axes (eg: percentages, quantiles, .. ). (I am currently facing a problem with that. Reproducible example below)
And finally, adding Grouping/Meta information: I stumble across that when using categorical data with multiple-level, (e.g.: Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics.) Even though color-coding the meta-levels and adding a legend or even facetting solve the issue, things get a little bit simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features of ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work for me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation :
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))
In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
This can be done with the cowplot package:
library(cowplot)
# Assign your original plot to some variable, `gpv` <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis="y", keep="y"))
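For readers on ggplot2 2.2.0 or later, secondary axes are available natively through the sec.axis argument of the continuous scales, and switch_axis_position() may no longer be available in recent cowplot releases. The sketch below covers the percentage case (#2) with sec_axis(); the small deterministic data frame is made up for illustration and stands in for the random data in the question.

```r
library(ggplot2)

maxVal <- 500
# Made-up, deterministic stand-in for the question's random data.
df <- data.frame(sample = factor(1:10), value = seq(50, 500, by = 50))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_col(fill = "#00bbd4") +
  scale_y_continuous(
    limits = c(0, maxVal),
    # The right-hand axis is a one-to-one transformation of the left one.
    sec.axis = sec_axis(~ . / maxVal * 100, name = "Percent of maxVal")
  )
p
```

For the plain duplicate-axis case (#1), sec.axis = dup_axis() mirrors the primary axis on the opposite side. Both only support axes that are transformations of the primary scale, which is in line with the argument quoted in the question.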

Remove left side of graph (from 0 to 13) with scale_x_discrete()

The data can be downloaded here: https://docs.google.com/spreadsheets/d/1McbcquHdsdlEM_yPfBQHeX_CpUcARAm1I3VtASNsY3k/edit?usp=sharing
Here is my code
# load data
raw_data <- read.csv("Sleep vs reaction time (Responses) - Form Responses 1.csv")
library(ggplot2)
#histogram
qplot(x = Age, data = raw_data, xlim = c(13,43), geom = "histogram") + scale_x_continuous()
qplot(x = Age, data = raw_data, xlim = c(13,43), geom = "histogram") + scale_x_discrete()
I would like to draw a histogram by Age.
Age is a discrete value (a whole number), so I use scale_x_discrete() to separate the bars. However, the result looks like this,
with empty space on the left side.
If I use scale_x_continuous(), the space on the left is gone, but so is the separation between the bars.
I would like to get rid of the space on the left side, from 0 to 13, but keep the separation between the bars. Please show me how.
Thank you.
My solution:
Thanks to @Gregor, this is my solution:
raw_data$Age = factor(raw_data$Age) #convert Age column to factor
qplot(x = Age, data = raw_data, geom = "histogram") + scale_x_discrete()
Result:
You should let the class of your data determine whether the scale is discrete or continuous. ggplot doesn't have built-in support for an integer scale as something different from a numeric scale, so if you want a discrete scale you should convert your age data to a factor (if it's not already):
raw_data$Age_factor = factor(raw_data$Age)
Then the defaults will give you what you want if you don't specify xlim.
qplot(x = Age_factor, data = raw_data, geom = "histogram")
This is a bit confusing, but it was actually your xlim = c(13, 43) that was shifting your graph to the right. On a discrete scale, 13 and 43 refer to the 13th and 43rd discrete levels, so by setting those xlim you were forcing your data to the right.
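In current ggplot2 releases qplot() is deprecated and a histogram stat expects a continuous x, so the same idea is probably easier to express with ggplot() and geom_bar(), which counts rows per factor level. A minimal sketch, using a few made-up ages in place of the spreadsheet data:

```r
library(ggplot2)

# Made-up ages standing in for raw_data$Age from the spreadsheet.
raw_data <- data.frame(Age = c(13, 15, 15, 18, 18, 18, 21, 43))

# A factor gives a discrete scale: one separated bar per observed age,
# with no empty stretch before the first level.
raw_data$Age_factor <- factor(raw_data$Age)

p <- ggplot(raw_data, aes(x = Age_factor)) +
  geom_bar()
p
```

Only ages that occur in the data become levels, so unobserved ages do not get an empty slot; keeping Age numeric with geom_bar() would instead leave gaps for them.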
