Ticks unevenly spread-out - r

I am plotting an adjacency matrix representing neurons connections but the spacing between ticks on both axes are not consistent. Some ticks are so close to each other's that their label overlap and other ticks are very spread apart creating large gaps.
About the data:
The data is in a dataframe where the first column contains Source and the second column contains Target. Every entry in this dataframe is a character.
Here is how the data look like:
Source Target
RID ALA
ADLL ADLR
AFDL AFDR
AFDL AIBL
AVAL AS01
AVAL AS06
targetOrder is a factor made of the column Source
targetOrder
RID
ADLL
AFDL
AFDL
AVAL
AVAL
More details on what is going on:
The x-axis and the y-axis are discrete, similar (containing the same values) and in the same order. Both axes contain 80 neurons. The neuron on the y-axis shoots at a neuron on the x-axis creating a dot where they intersect.
Here is the code I am using:
ggplot(data = dataFrame)+
geom_point(mapping = aes (x = Target, y = Source), color = "#0000FF")+
labs(title="Adjacency Matrix")+
scale_x_discrete(limits = targetOrder)+ #Target oder used on both axis
scale_y_discrete(limits = targetOrder)+
theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0))
theme(axis.ticks.length = unit(.5, "cm"))
I have spent the last few days looking for answers to this question but could not find any. In the process, I learned everything else about ticks, like how to customize their sizes...
If you need any other informations just ask and I will do my best.

Related

How to add labels and points to each geom_line in ggplot2?

I have a dataframe called (casos_obitos) that looks something like this:
EPI_WEEK CASES DEATHS
SE 51 1053 19
SE 52 1384 21
SE 53 1892 25
SE 01/21 1806 43
I'm making a plot with ggplot that places both cases and deaths in two different geom_lines. This is my code:
scl = 10
ggplot(data = casos_obitos, aes(x = EPI_WEEK, y = CASES, fill = CASES, group =1))+
scale_y_continuous(limits = c(0, max(casos_obitos$CASES)+10), expand = expansion(mult = c(0, .1)),
sec.axis = sec_axis(~./scl, name = "Nº de Óbitos"))+
geom_line(aes(x = SEM_EPI, y = CASES, color = "CASES"), size = 1)+
geom_line(aes(x = SEM_EPI, y = DEATHS*scl, color = "DEATHS"), size = 1) +
geom_text(aes(label= CASES), hjust= 0.5, vjust = -2, size= 2.0, color= "black") +
labs(x = "Semana Epidemiológica", y = "Nº de Casos") +
scale_colour_manual(" ", values=c("CASES" = "blue", "DEATHS" = "red"))+
theme_minimal(base_size = 10) +
theme(legend.position = "bottom", axis.line = element_line(colour = "black"),
axis.text.x=element_text(angle = 90, vjust = 0.5, hjust=1, color="black"),
axis.text.y=element_text(color="black"))
For now, my plot looks like this:
Where the blue line is the cases column and the red one is the deaths column. I need to put labels on the red line but I can't seem to find answers for that. I also wany to put labels in a "nice looking" way so I can understand the numbers and they don't look messy like they're right now.
Thanks!
You should be able to add the following to get labels on the bottom line:
geom_text(aes(y = DEATHS*scl, label= DEATHS), hjust= 0.5, vjust = -2, size= 2.0, color= "black") +
You might also consider reshaping your data into a long format so that the CASES and DEATHS (after scaling) values are combined into the same column, with another column distinguishing which series is related to each value. ggplot2 generally works more smoothly with data in that form -- you would map the color aesthetic to the column specifying which series, and then you'd only need one geom_line and one geom_text to get both series. In this case, with only two series, and one of them scaled, it might not be worth the trouble to switch.
"Nice looking labels" is subjective and a harder problem than it might sound. There are a few options, including:
use a function like ggrepel::geom_text_repel to automatically shift labels from overlapping each other. It works by starting from an initial point and iteratively nudging until the labels have as much separation as you've specified. Many options for adjusting the initial starting position and how the nudging should work.
manually nudge the labels you need to using code, e.g. by adjusting vjust for certain points. You might, for instance, use vjust to make the labels under the line for the points that are lower than neighboring points, by pre-calculating a moving average and comparing values to that.
manually nudge the points afterward, e.g. by using officer/svg to output to a vector file you can edit in powerpoint, for instance.
avoid persistent labels altogether by shifting to an interactive option like ggplotly and see the labels upon hover instead of all the time.
You might also take a look at functions like scales::comma to control how the labels themselves appear. I'm anticipating that your Deaths labels will have many digits of decimals but you probably just will want the integer part of that...

geom_step starting and ending with a horizontal segment

Sometimes I'd like to present data that refer to periods (not to points in time) as a step function. When e.g. data are per-period averages, this seems more appropriate than using a line connecting points (with geom_line). Consider, as a MWE, the following:
df = data.frame(x=1:8,y=rnorm(8,5,2))
ggplot(df,aes(x=x,y=y))+geom_step(size=1)+scale_x_continuous(breaks=seq(0,8,2))
This gives
However, the result is not fully satisfactory, as (1) I'd like the final observation to be represented by an horizontal segment and (2) I'd like to have labels on the x-axis aligned at the center of the horizontal line. What I want can be obtained with some hacking:
df %>% rbind(tail(df,1) %>% mutate(x=x+1)) %>%
ggplot(aes(x,y))+geom_step(size=1)+
scale_x_continuous(breaks=seq(0,12,2))+
theme(axis.ticks.x=element_blank(),axis.text.x=element_text(hjust=-2))
which produces:
This corresponds to what I am looking for (except that the horizontal alignment of labels requires some fine tuning and is not perfect). However, I am not sure this is the best way to proceed and I wonder if there is a better way.
Does this work for you? It comes down to altering the data as it is passed rather than changing the plotting code per se (as is often the case in ggplot)
Essentially what we do is add an extra copy of the final y value on to the end of the data frame at an incremented x value.
To make the horizontal segments line up to the major axis breaks, we simply subtract 0.5 from the x value.
ggplot(rbind(df, data.frame(x = 9, y = tail(df$y, 1))),
aes(x = x - 0.5, y = y)) +
geom_step(size = 1)+
scale_x_continuous(breaks = seq(0, 8, 2), name = "x",
minor_breaks = seq(0, 8, 1) + 0.5) +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_line())

ggplot shows all Y-axis values instead of calibrating

I plotted a ggplot for a couple of variables using side-by side bars, but it seems the y-axis is shown without any calibration-displays every number(in %)
The topic of the plot is to show the share of white,hispanic and black people (%) out of the whole population in each state in the US. as you can see, the Y axis that is supposed to represent the percent, looks like all the values had been pushed inside it instead of a calibration from 0 to 100
The dataset I am using is presented at github_fivethirtyeight_police-killings (I am sorry but I couldn't find a way to organize the five columns I am taking from the dataframe: state, ethnicity, and the three shares you are seeing on the right(in %)
the R code is presented:
x<-read.csv("C:/Users/USER/data/police-killings/police_killings.csv",header=TRUE, sep = "," ,stringsAsFactors = FALSE)
state<-x[,10]
ethnicity<-x[,4]
state_and_shares<-x[,c(10,23:25)]
df2<-melt(state_and_shares, id.vars = 'state')
head(df2)
ggplot(df2,aes(x=state,y=value,fill=variable))+geom_bar(stat = 'identity',position = 'dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))
can someone please tell me how can I factor the Y values to look more normal and to not display all the values?
You need ggplot to see the y values as numbers and not strings, eg. with as.numeric():
ggplot(df2,aes(x=state,y=as.numeric(value),fill=variable))+geom_bar(stat = 'identity',position = 'dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggrepel: Repelling text in only one direction, and returning values of repelled text

I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.
Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.
As an example, the below code produces an plot for 32 data points, where the x-values show the number of cylinders in a car, and the y-values are determined randomly (have no meaning but to provide a second dimension for text plotting purposes). Without using ggrepel, there is significant overlap in the text:
library(ggrepel)
library(ggplot2)
set.seed(1)
data = data.frame(x=runif(100, 1, 10),y=runif(100, 1, 10),label=paste0("label",seq(1:100)))
origPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text(aes(x, y, label = label)) +
theme_classic(base_size = 16)
I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values, but also the x-values. I am trying to avoid changing the x-values, as they represent an actual physical meaning (the number of cylinders):
repelPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text_repel(aes(x, y, label = label)) +
theme_classic(base_size = 16)
As a note, the reason I cannot allow the x-value of the text to change is because I am only plotting the text (not the points). Whereas, it seems that most examples in ggrepel keep the position of the points (so that their values remain true), and only repel the x and y values of the labels. Then, the points and connected to the labels with segments (you can see that in my second plot example).
I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)
My question is two fold:
1) Is it possible for me to repel the text labels only in the y-direction?
2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?
Thank you for any advice!
ggrepel version 0.6.8 (Install from GitHub using devtools::github_install) now supports a "direction" argument, which enables repelling of labels only in "x" or "y" direction.
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0, direction = "y") + theme_classic(base_size = 16)
Getting the y values is harder -- one approach can be to use the "repel_boxes" function from ggrepel first to get repelled values and then input those into ggplot with geom_text. For discussion and sample code of that approach, see https://github.com/slowkow/ggrepel/issues/24. Note that if using the latest version, the repel_boxes function now also has a "direction" argument, which takes in "both","x", or "y".
I don't think it is possible to repel text labels only in one direction with ggrepel.
I would approach this problem differently, by instead generating the arbitrary y-axis positions manually. For example, for the data set in your example, you could do this using the code below.
I have used the dplyr package to group the data set by the values of x, and then created a new column of data y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.
library(ggplot2)
library(dplyr)
data <- data.frame(x = mtcars$cyl, label = paste0("label", seq(1:32)))
data <- data %>%
group_by(x) %>%
mutate(y = row_number())
ggplot(data, aes(x = x, y = y, label = label)) +
geom_text(size = 2) +
xlim(3.5, 8.5) +
theme_classic(base_size = 8)
ggsave("filename.png", width = 4, height = 2)

Secondary / Dual axis - ggplot

I am opening this question for three reasons : First, to re-open the dual-axis discussion with ggplot. Second, to ask if there is a non-torturing generic approach to do that. And finally to ask for your help with respect to a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids : for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
The plot is wide, hence duplicating the y-axis on the right side would help (or x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far)
I need to add a new axis that is a transformation to the original axes (eg: percentages, quantiles, .. ). (I am currently facing a problem with that. Reproducible example below)
And finally, adding Grouping/Meta information: I stumble across that when using categorical data with multiple-level, (e.g.: Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics.) Even though color-coding the meta-levels and adding a legend or even facetting solve the issue, things get a little bit simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features ggplot 2.0.0, is there a more-robust no-torture way to have dual-axis without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage-axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work with me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = T)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data Frame prepartion :
labels = paste0(sample(letters[1:3], replace = T, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = T))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))
In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
cowplot.
# Assign your original plot to some variable, `gpv` <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis="y", keep="y"))

Resources