Sometimes I'd like to present data that refer to periods (not to points in time) as a step function. When e.g. data are per-period averages, this seems more appropriate than using a line connecting points (with geom_line). Consider, as a MWE, the following:
df = data.frame(x=1:8,y=rnorm(8,5,2))
ggplot(df,aes(x=x,y=y))+geom_step(size=1)+scale_x_continuous(breaks=seq(0,8,2))
This gives
However, the result is not fully satisfactory, as (1) I'd like the final observation to be represented by an horizontal segment and (2) I'd like to have labels on the x-axis aligned at the center of the horizontal line. What I want can be obtained with some hacking:
df %>% rbind(tail(df,1) %>% mutate(x=x+1)) %>%
ggplot(aes(x,y))+geom_step(size=1)+
scale_x_continuous(breaks=seq(0,12,2))+
theme(axis.ticks.x=element_blank(),axis.text.x=element_text(hjust=-2))
which produces:
This corresponds to what I am looking for (except that the horizontal alignment of labels requires some fine tuning and is not perfect). However, I am not sure this is the best way to proceed and I wonder if there is a better way.
Does this work for you? It comes down to altering the data as it is passed rather than changing the plotting code per se (as is often the case in ggplot)
Essentially what we do is add an extra copy of the final y value on to the end of the data frame at an incremented x value.
To make the horizontal segments line up to the major axis breaks, we simply subtract 0.5 from the x value.
ggplot(rbind(df, data.frame(x = 9, y = tail(df$y, 1))),
aes(x = x - 0.5, y = y)) +
geom_step(size = 1)+
scale_x_continuous(breaks = seq(0, 8, 2), name = "x",
minor_breaks = seq(0, 8, 1) + 0.5) +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_line())
Related
Novice R user here wrestling with some arcane details of ggplot
I am trying to produce a plot that charts two data ranges: One plotted as a line, and another plotted on the same plot, but as points. The code is something roughly like this:
ggplot(data1, aes(x = Year, y = Capacity, col = Process)) +
geom_line() +
facet_grid(Country ~ ., scales = "free_y") +
scale_y_continuous(trans = "log10") +
geom_point(data = data2, aes(x = Year, y = Capacity, col = Process))
I've left out some additional cosmetic arguments for the sake of simplicity.
The problem is that the points from the geom_point keep getting cut off by the x axis:
I know the standard fix here would be to adjust the y limits to make room for the points:
scale_y_continuous(limits = c(-100, Y_MAX))
But here there is a separate problem due to the facet grid with free scales, since there is no single value for Y_MAX
I've also tried it using expansions:
scale_y_continuous(expand = c(0.5, 0))
But here, it runs into problems with the log scale, since it multiplies by different values for each facet, producing very wonky results.
I just want to produce enough blank space on the bottom of each facet to make room for the point. Or, alternatively, move each point up a little bit to make room. Is there any easy way to do this in my case?
This might be a good place for scales::pseudo_log_trans, which combines a log transformation with a linear transformation (and a flipped sign log transformation) to retain most of the benefits of a log transformation while also allowing zero and negative values. Adjust the sigma parameter of the function to adjust where the transition from linear to log should happen.
library(ggplot2)
ggplot(data = data.frame(country = rep(c("France","USA"), each = 5),
x = rep(1:5, times = 2),
y = c(10^(2:6), 0, 10^(1:4))),
aes(x,y)) +
geom_point() +
# scale_y_continuous(trans = "log10") +
scale_y_continuous(trans = scales::pseudo_log_trans(),
breaks = c(0, 10^(0:6)),
labels = scales::label_number_si()) +
facet_wrap(~country, ncol = 1, scales = "free_y")
vs. with (trans = "log10"):
I'm trying to visualize paired sets of data points on the following graphs:
ggplot(grid.mag.ROIcontrols.allRuns, aes(Model,Grid_Magnitude)) +
geom_boxplot(aes(fill=Model),outlier.shape = NA,alpha=0.6) +
geom_point(aes(fill=Model),size=2,shape=21,position=position_jitterdodge(0.2)) +
geom_line(aes(group=Side)) +
facet_grid(~Side,scales = "free") +
scale_fill_brewer(palette="GnBu") +
labs(title = "Average Grid Magnitude, pm vs al EC")
Lines are joining the points between alLeft6/pmLeft6 and between alRight6/pmRight6.
However geom_line with the group variable that I need doesn't work - it adds vertical lines and one horizontal line between data points, when I need one horizontal line for each of the 10 pairs.
Without geom_line:
With geom_line:
PS: Sorry, I don't know how to share the raw data...
Without the actual data it is hard give you in depth help, please refer to this site for a guide for a great reproducible example, as mentioned in the comments.
I am assuming you want to compare one datapoint from alLeft6 to one from pmLeft6 (otherwise the horizontal line would make little sense). This indicates you have some column in your data linking these points together (Pairs in the example data).
With made up data this would be as easy as setting the geom_line() grouping variable to this column (Pairs). To align the geom_point() with the geom_line() with jitter an easy solution is to define the offset before the ggplot call (here called pd).
library(tidyverse)
grid.mag.ROIcontrols.allRuns = tibble(Model = c(rep("alLeft6", 10),rep("pmLeft6", 10),rep("alRight6", 10),rep("pmRight6", 10)),
Grid_Magnitude = c(runif(10, -1, 1),runif(10, -0.5, 1.5), runif(10, -1, 1),runif(10, -1.5, 0.5)),
Side = c(rep("Left", 20), rep("Right", 20)),
Pair = c(rep(1:10, 2), rep(11:20, 2))
) %>%
mutate(Pair = as.factor(Pair))
pd <- position_dodge(0.2)
ggplot(grid.mag.ROIcontrols.allRuns, aes(Model,Grid_Magnitude)) +
geom_boxplot(aes(fill=Model),outlier.shape = NA,alpha=0.6) +
geom_line(aes(group=Pair), position = pd) +
geom_point(aes(fill=Model,group=Pair),size=2,shape=21, position = pd) +
facet_grid(~Side,scales = "free") +
scale_fill_brewer(palette="GnBu") +
labs(title = "Average Grid Magnitude, pm vs al EC")
I plotted a ggplot for a couple of variables using side-by side bars, but it seems the y-axis is shown without any calibration-displays every number(in %)
The topic of the plot is to show the share of white,hispanic and black people (%) out of the whole population in each state in the US. as you can see, the Y axis that is supposed to represent the percent, looks like all the values had been pushed inside it instead of a calibration from 0 to 100
The dataset I am using is presented at github_fivethirtyeight_police-killings (I am sorry but I couldn't find a way to organize the five columns I am taking from the dataframe: state, ethnicity, and the three shares you are seeing on the right(in %)
the R code is presented:
x<-read.csv("C:/Users/USER/data/police-killings/police_killings.csv",header=TRUE, sep = "," ,stringsAsFactors = FALSE)
state<-x[,10]
ethnicity<-x[,4]
state_and_shares<-x[,c(10,23:25)]
df2<-melt(state_and_shares, id.vars = 'state')
head(df2)
ggplot(df2,aes(x=state,y=value,fill=variable))+geom_bar(stat = 'identity',position = 'dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))
can someone please tell me how can I factor the Y values to look more normal and to not display all the values?
You need ggplot to see the y values as numbers and not strings, eg. with as.numeric():
ggplot(df2,aes(x=state,y=as.numeric(value),fill=variable))+geom_bar(stat = 'identity',position = 'dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))
I am trying to make a histodot using geom_dotplot. For some reason, ggplot is giving me what appear to me are arbitrary breaks in my data along the x axis. My data has been binned seq(0,22000000,500000), so I would like to see gaps in my data where they actually exist. However, I'm struggling to successfully list those breaks(). I wouldn't expect to see my first break until after $6,000,000 with a break until $10,000,000. Bonus points for teaching me why I can't use labels=scales::dollar on my scale_x_discrete.
Here is my data.
library(ggplot2)
data <- read.csv("data2.csv")
ggplot(data,aes(x=factor(constructioncost),fill=designstage)) +
geom_dotplot(binwidth=.8,method="histodot",dotsize=0.75,stackgroups=T) +
scale_x_discrete(breaks=seq(0,22000000,500000)) +
ylim(0,30)
Thanks for any and all help and please, let me know if you have any questions!
Treating the x axis as continuous instead of a factor will give you what you need. However, you experienced the enormous range of your cost variable (0 to 21 million) was making ggplot2 choke when you try to treat is as continuous.
Because your values (other than 0) are all at least 500000, dividing the cost by 100000 will put things on a scale that ggplot2 can handle but also give you the variable spacing you want.
Note I had to play around with binwidth when I changed the scale of the data.
ggplot(data, aes(x = constructioncost/100000, fill = signsta)) +
geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
scale_x_continuous(breaks=seq(0, 220, 5)) +
ylim(0,30)
Then you can change the labels to reflect the whole dollar amounts if you'd like. The number are so big you'll likely need to either add fewer or change the orientation (or both).
ggplot(data, aes(x = constructioncost/100000, fill = signsta)) +
geom_dotplot(binwidth = 3, method="histodot", stackgroups=T) +
scale_x_continuous(breaks = seq(0, 220, 10),
labels = scales::dollar(seq(0, 22000000, 1000000))) +
ylim(0,30) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.
Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.
As an example, the below code produces an plot for 32 data points, where the x-values show the number of cylinders in a car, and the y-values are determined randomly (have no meaning but to provide a second dimension for text plotting purposes). Without using ggrepel, there is significant overlap in the text:
library(ggrepel)
library(ggplot2)
set.seed(1)
data = data.frame(x=runif(100, 1, 10),y=runif(100, 1, 10),label=paste0("label",seq(1:100)))
origPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text(aes(x, y, label = label)) +
theme_classic(base_size = 16)
I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values, but also the x-values. I am trying to avoid changing the x-values, as they represent an actual physical meaning (the number of cylinders):
repelPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text_repel(aes(x, y, label = label)) +
theme_classic(base_size = 16)
As a note, the reason I cannot allow the x-value of the text to change is because I am only plotting the text (not the points). Whereas, it seems that most examples in ggrepel keep the position of the points (so that their values remain true), and only repel the x and y values of the labels. Then, the points and connected to the labels with segments (you can see that in my second plot example).
I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)
My question is two fold:
1) Is it possible for me to repel the text labels only in the y-direction?
2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?
Thank you for any advice!
ggrepel version 0.6.8 (Install from GitHub using devtools::github_install) now supports a "direction" argument, which enables repelling of labels only in "x" or "y" direction.
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0, direction = "y") + theme_classic(base_size = 16)
Getting the y values is harder -- one approach can be to use the "repel_boxes" function from ggrepel first to get repelled values and then input those into ggplot with geom_text. For discussion and sample code of that approach, see https://github.com/slowkow/ggrepel/issues/24. Note that if using the latest version, the repel_boxes function now also has a "direction" argument, which takes in "both","x", or "y".
I don't think it is possible to repel text labels only in one direction with ggrepel.
I would approach this problem differently, by instead generating the arbitrary y-axis positions manually. For example, for the data set in your example, you could do this using the code below.
I have used the dplyr package to group the data set by the values of x, and then created a new column of data y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.
library(ggplot2)
library(dplyr)
data <- data.frame(x = mtcars$cyl, label = paste0("label", seq(1:32)))
data <- data %>%
group_by(x) %>%
mutate(y = row_number())
ggplot(data, aes(x = x, y = y, label = label)) +
geom_text(size = 2) +
xlim(3.5, 8.5) +
theme_classic(base_size = 8)
ggsave("filename.png", width = 4, height = 2)