I'm trying to map different ranges (lines) onto different regions in the plot (see below) using geom_segment, but some of the ranges overlap, so some of them can't be seen at all.
This is a minimal example of the data frames:
start <- c(1, 5, 8, 14)
end <- c(3, 6, 12, 16)
regions <- c(1, 2, 3, 4)
# data.frame() replaces the deprecated data_frame()
regions <- data.frame(regions, start, end)
from <- c(1, 2, 5.5, 13.5)
to <- c(3, 2.5, 6, 15)
lines <- data.frame(from, to)
I plotted the regions with geom_rect and then plotted the ranges (lines) with geom_segment.
This is the plot:
plot_splice <- ggplot() +
scale_x_continuous(breaks = seq(1,16)) +
scale_y_continuous() +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = 1.48,
yend = 1.48,
colour = "red",
size = 3
) +
ylim(1.0, 2.2) +
xlab("") +
theme_minimal()
The first plot is the one generated with the code whereas the second one is the desired plot.
As you can see, the second line overlaps with the first one, so you can't see the second line at all.
How can I change the code to produce the second plot?
I'm trying to use an ifelse statement, but I'm not sure what its test argument should be.
For each range (line), I want to check whether it overlaps any previous range (line) and, if so, shift its y position by around 0.05 so that it doesn't overlap.
lines <- lines %>%
dplyr::arrange(desc(from))
new_y$lines = ifelse(from[1] < to[0], 1.48, 1.3)
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = new_y,
yend = new_y,
colour = "red",
size = 3
)
Your geom_segment call isn't using any aesthetic mapping, which is how you normally get ggplot elements to change position based on a particular variable (or set of variables).
The stacking of the geom_segment based on the number of overlapping regions is best calculated ahead of the call to ggplot. This allows you to pass the x and y values into an aesthetic mapping:
# First ensure that the data frame is ordered by the start time
lines <- lines[order(lines$from),]
# Now iterate through each row, calculating how many previous rows have
# earlier starts but haven't yet finished when the current row starts.
# Multiply this number by a small negative offset and add the 1.48 baseline value
lines$offset <- 1.48 - 0.03 * sapply(seq(nrow(lines)), function(i) {
with(lines[seq(i),], length(which(from < from[i] & to > from[i])))
})
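As a quick check of the offset calculation, here is a minimal base-R sketch run on the question's sample ranges. The helper column name n_overlapping is only illustrative and is not part of the original answer.

```r
# Sample ranges from the question
lines <- data.frame(from = c(1, 2, 5.5, 13.5), to = c(3, 2.5, 6, 15))
lines <- lines[order(lines$from), ]

# For each row, count the earlier rows that start before it
# and have not yet finished when it starts
n_overlapping <- sapply(seq_len(nrow(lines)), function(i) {
  prev <- seq_len(i)
  sum(lines$from[prev] < lines$from[i] & lines$to[prev] > lines$from[i])
})

# Shift each overlapping line down by 0.03 per overlap
lines$offset <- 1.48 - 0.03 * n_overlapping
print(lines)
```

Only the second range (2 to 2.5) starts inside an earlier one (1 to 3), so it is the only row that moves below the 1.48 baseline.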
Now do the same plot but using aesthetic mapping inside geom_segment:
ggplot() +
scale_x_continuous(breaks = seq(1,16), name = "") +
scale_y_continuous(limits = c(1, 2.2), name = "") +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
mapping = aes(
x = from,
xend = to,
y = offset,
yend = offset),
colour = "red",
size = 3
) +
theme_minimal()
I would like to plot rectangles between specific values listed in a data frame, such as:
Region <- c("A","B","A","B","A","C","B","C","A")
Lon <- c(31.03547, 37.25443, 65.97450, 69.90290, 101.77630,
105.32550, 148.86270, 147.72010, 146.10420)
var1 <- rnorm(n = 9, mean = 15, sd = 100)
regions <- data.frame(Region, Lon, var1)
This is an example where I show the region limits using geom_vline:
ggplot(NULL)+
geom_vline(data = regions, aes(xintercept=Lon,
linetype=Region,
color = Region),
size=0.6)+
geom_point(data = regions, aes(x=Lon, y=var1, color=Region))+
theme_bw()
I want to plot background rectangles that are bounded by those vertical lines.
I looked at this previous question:
How to find the start and the end of sequences automatically in R for rectangles in ggplot
However, it does not completely satisfy my needs, because I would like to plot rectangles for every region.
# Convert to runlength encoding
rle <- rle(regions$Region == "B")
# Determine starts and ends
starts <- {ends <- cumsum(rle$lengths)} - rle$lengths + 1
# Build a data.frame from the rle
dfrect <- data.frame(
xmin = regions$Lon[starts],
# We have to +1 the ends, because the linepieces end at the next datapoint
# Though we should not index out-of-bounds, so we need to cap at the last end
xmax = regions$Lon[pmin(ends + 1, max(ends))],
fill = rle$values
)
ggplot(NULL)+
geom_vline(data = regions, aes(xintercept=Lon,
linetype=Region,
color = Region),
size=0.6)+
geom_rect(data = dfrect,
aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf,
fill = fill),
alpha = 0.4) +
geom_point(data = regions, aes(x=Lon, y=var1, color=Region))+
theme_bw()
How can I define the rectangles for A and C too? Note that I have many regions, not only 3.
sample data
Region <- c("A","B","A","B","A","C","B","C","A")
Lon <- c(31.03547, 37.25443, 65.97450, 69.90290, 101.77630,
105.32550, 148.86270, 147.72010, 146.10420)
var1 <- rnorm(n = 9, mean = 15, sd = 100)
regions <- data.frame(Region, Lon, var1)
code
library(data.table)
library(ggplot2)
# Make regions a data.table
setDT(regions)
# first sort by Lon, to avoid overlap in rectangles
setkey(regions, Lon)
# create boundaries of rectangles
regions[, Lon_end := data.table::shift(Lon, type = "lead", fill = Inf)][]
# plot
ggplot(data = regions) +
geom_vline(aes(xintercept = Lon, linetype = Region, color = Region), size = 0.6) +
geom_rect(mapping = aes(xmin = Lon, xmax = Lon_end, ymin = 0, ymax = 1, fill = Region), alpha = 0.1)
output
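The rectangle boundaries in the answer above come from pairing each sorted Lon with the next one. A minimal base-R sketch of the same idea, without data.table, is given here as an illustration only; it assumes that the last rectangle should extend to Inf, as in the answer.

```r
Region <- c("A", "B", "A", "B", "A", "C", "B", "C", "A")
Lon <- c(31.03547, 37.25443, 65.97450, 69.90290, 101.77630,
         105.32550, 148.86270, 147.72010, 146.10420)
regions <- data.frame(Region, Lon)

# Sort by Lon so that consecutive rows are neighbours on the x axis
regions <- regions[order(regions$Lon), ]

# Each rectangle ends where the next one starts; the last one runs to Inf
regions$Lon_end <- c(regions$Lon[-1], Inf)
print(regions)
```

Because the data are sorted first, the last three rows change order (A, C, B instead of B, C, A), which is what keeps the rectangles from overlapping.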
Objective:
Create the XY scatterplot of variables (xx,yy). Color the corresponding Cartesian quadrants according to a third variable's (return) median.
I've created the color vector using colorRampPalette. The issue is that it is being read as continuous (though the vector is discrete).
Have the scatter points be blue (not labeled "blue")
Include a label on each quadrant according to dt.data[, quadrant] so that it is easy to identify what each area corresponds to. So mark A on the top right, B on the bottom right, etc.
This is the code I've written.
library(data.table)
set.seed(42)
dt <- data.table(
xx = rnorm(40, 0, 2),
yy = rnorm(40, 0, 2),
return = rnorm(40, 1, 3))
## compute the range we're going to want to plot over
## in this case 50% more than the max value
RANGE <- 1.5 * dt[, max(abs(c(xx, yy)))]
## compute the medians per quadrant
dtMedians <- dt[,
.(med = median(return)),
.(sign_x = sign(xx), sign_y = sign(yy))]
## set up some fake labels
dtMedians[, quadrant := letters[1:4]]
## compute a color scale for the medians and assign it
fcol <- colorRampPalette(c("#FC4445", "#3FEEE6", "#5CDB95"))
dtMedians[, col := fcol(4)[rank(med)]]
Mycol <- dtMedians[, .(col)]
dt.rects2<- data.table(
quadrant = letters[1:4],
xmin= c(0,0,-RANGE, -RANGE),
xmax= c(RANGE,RANGE,0,0),
ymin= c(0,-RANGE,-RANGE,0),
ymax= c(RANGE,0,0,RANGE))
dt.data <- merge(dtMedians, dt.rects2, by ="quadrant")
gg<- ggplot() +
geom_rect(data = dt.data,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = med ))
gg+
scale_fill_manual(values = Mycol ) +
labs(x="xx", y="yy", title='US. Growth Quadrant') +
geom_point(data = dt,
aes(x = xx,
y = yy,
color = 'blue'))
While I think the code could be much cleaner, I left it unchanged to the extent possible - there were a few mistakes (e.g., with the variables x and y) that I had to correct to be able to run the code. Now as to your questions:
You can tell R to treat a variable as a factor with fill = as.factor(med). In addition, I had to adjust scale_fill_manual(values = Mycol$col) to select the colors defined in variable col of df Mycol.
To make the scatters blue, I took the color = 'blue' outside of the aes() in the geom_point().
I used annotate() to label the corners of the plot, which relies on manually defining the x and y coordinates. I am sure there are other, potentially better (and automated) solutions out there.
Full code for the plot (taking your data):
ggplot() +
geom_rect(data = dt.data,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = as.factor(med))) +
scale_fill_manual(values = Mycol$col) +
labs(x="xx", y="yy", title='US. Growth Quadrant') +
geom_point(data = dt,
aes(x = xx,
y = yy),
color = 'blue') +
annotate(geom = 'text', label = 'A', x = 5, y = 5, size = 8) +
annotate(geom = 'text', label = 'B', x = 5, y = -5, size = 8) +
annotate(geom = 'text', label = 'C', x = -5, y = -5, size = 8) +
annotate(geom = 'text', label = 'D', x = -5, y = 5, size = 8)
Output:
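One detail worth checking with this approach: scale_fill_manual() assigns unnamed values to the factor levels in level order, and as.factor(med) sorts its levels by the median value. The sketch below uses made-up medians (not values from the question) to show how the colours could be reordered so that the lowest median gets the first colour of the ramp. The names med, pal and values_by_level are illustrative.

```r
# Hypothetical medians for quadrants a to d (illustrative values only)
med <- c(a = 0.8, b = -0.3, c = 1.9, d = 0.1)

fcol <- colorRampPalette(c("#FC4445", "#3FEEE6", "#5CDB95"))
pal <- fcol(4)

# Colour per quadrant, as in the question: ranked by median
col <- pal[rank(med)]

# Levels of the fill factor are the sorted medians
lev <- levels(as.factor(med))

# Reorder the colours to follow the level order and name them by level,
# so they could be passed as scale_fill_manual(values = values_by_level)
values_by_level <- col[order(med)]
names(values_by_level) <- lev
print(values_by_level)
```

With no ties in the medians, reordering by order(med) simply recovers the palette in its original order, so passing the palette itself as the values would give the same result.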
I am trying to make a wind vector plot, and the closest I have come is using ggplot2 and the tutorials here: https://theoceancode.netlify.app/post/wind_vectors/
and here: http://jason-doug-climate.blogspot.com/2014/08/weather-station-at-worldfish-hq-goes.html
First I'm going to specify some example data that has the same structure as I'm working with...some code is redundant for the example here but I'm leaving it in for continuity with what I'm working with.
library(tidyverse)
dat <- tibble(Date = seq(as.POSIXct('2018-08-01 00:00:00'),
as.POSIXct('2018-08-12 00:00:00'), "hour"),
WSMPS = rnorm(265,3,1),
WDir = rnorm(265,180,75),
month = 8,
year = rep(2018))
vec_dat <- dat %>%
rename(ws=WSMPS, wd= WDir) %>%
filter(year==2018, month==8) %>% # redundant for example data
mutate(hour = as.numeric(substr(Date,12,13)),
bin = cut.POSIXt(Date,
breaks = NROW(unique(Date))/4),
u = (1 * ws) * sin((wd * pi / 180.0)), # convert to cartesian coordinate vectors
v = (1 * ws) * cos((wd * pi / 180.0))) %>%
group_by(bin) %>% # bin the data into 4hr increments
summarise(u=mean(u),
v=mean(v)) %>%
mutate(bin = as.POSIXct(bin),
date = as.Date(substr(bin, 1,10)),
time = chron::as.times(substr(bin, 12,19)))
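To make the conversion in the mutate() call concrete, here is a small base-R sketch of the same formulas. The function name wind_to_uv is hypothetical and only wraps the two expressions used above.

```r
# Same formulas as in the pipeline: direction in degrees, speed in m/s
wind_to_uv <- function(ws, wd_deg) {
  rad <- wd_deg * pi / 180
  list(u = ws * sin(rad), v = ws * cos(rad))
}

north <- wind_to_uv(3, 0)   # points straight along +y
east  <- wind_to_uv(3, 90)  # points straight along +x
print(c(north$u, north$v))
print(c(east$u, east$v))
```

With these formulas a direction of 0 degrees maps entirely to v and 90 degrees maps entirely to u, up to floating-point rounding.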
The closest I have come is using the code below
wind_scale <- 1 # this is a scaling factor not used at the moment so set to 1
y_axis <- seq(-5, 5, 5)
ggplot(data = vec_dat, aes(x = bin, y = y_axis)) +
# Here we create the wind vectors as a series of segments with arrow tips
geom_segment(aes(x = date, xend = date + u*wind_scale, y = 0, yend = v*wind_scale),
arrow = arrow(length = unit(0.15, 'cm')), size = 0.5, alpha = 0.7)
This creates a plot that looks good, except that I would like to split the vectors into their respective bins (4-hour increments denoted by vec_dat$bin) instead of having all the vectors for a given day originate from the same point on the x axis. I've tried switching vec_dat$date for vec_dat$bin, but then the math within geom_segment() no longer works: the vectors originate from the bins, but they are all perfectly vertical, as below:
ggplot(data = vec_dat, aes(x = bin, y = y_axis)) +
# Here we create the wind vectors as a series of segments with arrow tips
geom_segment(aes(x = bin, xend = bin + u*wind_scale, y = 0, yend = v*wind_scale),
arrow = arrow(length = unit(0.15, 'cm')), size = 0.5, alpha = 0.7)
UPDATE
This appears to be a math problem. When I calculate the xend argument using bin instead of date the result is that the xend value is not scaled correctly as below:
test <- vec_dat[1:12,]
test$bin+test$u
test$date+test$u
So what is required is to use data as class Date within the xend formula...however this throws an error:
ggplot(data = vec_dat, aes(x = bin, y = y_axis)) +
# Here we create the wind vectors as a series of segments with arrow tips
geom_segment(aes(x = bin, xend = date + u*wind_scale, y = 0, yend = v*wind_scale),
arrow = arrow(length = unit(0.15, 'cm')), size = 0.5, alpha = 0.7)
Error: Invalid input: time_trans works with objects of class POSIXct only
So if anyone can help with this error or with a workaround I'd appreciate it.
I think you are looking for something like this:
wind_scale <- 86400 # (seconds in a day)
y_axis <- seq(-5, 5, 5)
ggplot(data = vec_dat, aes(x = bin, y = y_axis)) +
geom_segment(aes(xend = bin + u * wind_scale, y = 0, yend = v),
arrow = arrow(length = unit(0.15, 'cm')),
size = 0.5, alpha = 0.7) +
coord_fixed(ratio = wind_scale) # Preserves correct angle for wind vector
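The reason the scaling factor of 86400 helps is that arithmetic on POSIXct values is in seconds, while arithmetic on Date values is in days. A short base-R sketch of the difference (the time zone and the value of u are arbitrary choices for illustration):

```r
t0 <- as.POSIXct("2018-08-01 00:00:00", tz = "UTC")
d0 <- as.Date("2018-08-01")
u <- 0.5

# Adding u to a POSIXct moves it by u seconds
shift_posix <- as.numeric(difftime(t0 + u, t0, units = "secs"))

# Adding u to a Date moves it by u days
shift_date <- as.numeric((d0 + u) - d0)

# Scaling by the number of seconds in a day gives the same shift as for Date
shift_scaled <- as.numeric(difftime(t0 + u * 86400, t0, units = "hours"))

print(c(shift_posix, shift_date, shift_scaled))
```

So with bin on the x axis, an unscaled u moves xend by less than a second, which is why the arrows looked vertical.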
For what it's worth, I don't think having this many arrows on a single plot makes for a great visualization because there is a lot of clashing and overlap of arrows that makes it hard to read. Vertical faceting by day might make this easier to interpret.
I want to split the y axis of the attached figure so that the part with scores < 25 occupies most of the figure, while the remaining values take up a small upper part.
I looked into this and I am aware that I should use scale_y_discrete(limits . I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20, "//", 40:100)), but it doesn't work yet.
I used the attached data and this is my code
Code
library(ggpubr) # provides ggscatter()
p <- ggscatter(data, x = "Year", y = "Score",
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", # loess smoother instead of a regression line
conf.int = TRUE,
cor.coef = FALSE, cor.method = "pearson",
xlab = "Year", ylab = "Score")
p <- p + coord_cartesian(xlim = c(1980, 2020))
p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
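To see what the transform does to actual values, here is a base-R sketch of the same two formulas without the scales wrapper. The names squeeze and unsqueeze are illustrative, and the scores are made up.

```r
threshold <- 25
squeeze_factor <- 10

# Forward transform: compress everything above the threshold
squeeze <- function(x) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}

# Inverse transform: stretch the compressed part back out
unsqueeze <- function(x) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores <- c(5, 25, 35, 75)
squeezed <- squeeze(scores)
print(squeezed)
print(unsqueeze(squeezed))
```

Values up to 25 are left alone, while 35 and 75 land at 26 and 30 on the transformed axis, so the range above the threshold takes up a tenth of the space it otherwise would.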
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.
I am trying to create a chart like this one produced in the NYTimes using ggplot:
I think I'm getting close, but I'm not quite sure how to separate out some of my data so I get the right view. My data is political office holders that appear something like this:
name,year_elected,year_left,years_in_office,type,party
Person 1,1969,1969,1,Candidate,Unknown
Person 2,1969,1971,2,Candidate,Unknown
Person 3,1969,1973,4,Candidate,Unknown
Person 4,1969,1973,4,Candidate,Unknown
Person 5,1971,1974,3,Candidate,Unknown
Person 1,1971,1976,5,Candidate,Unknown
Person 2,1971,1980,9,Candidate,Unknown
Person 6,1973,1978,5,Candidate,Unknown
Person 7,1973,1980,7,Candidate,Unknown
Person 8,1975,1980,5,Candidate,Unknown
Person 9,1977,1978,1,Candidate,Unknown
And I've used the below code to get very close to this view, but I think an issue I'm running into is either drawing segments incorrectly (e.g., I don't seem to have a single segment for each candidate), or segments are overlapping/stacking. The key issue I'm running into is my list of office holders is around 60, but my chart is only drawing around 28 lines.
library(googlesheets)
library(tidyverse)
# I'm reading from a Google Spreadsheet
data <- gs_title("Council Members")
data_sj <- gs_read(ss = data, ws = "Sheet1")
# Plot the data frame that was read (data_sj), not the sheet handle (data)
ggplot(data_sj, aes(year_elected, years_in_office)) +
geom_segment(aes(x = year_elected, y = 0,
xend = year_left, yend = years_in_office)) +
theme_minimal()
The above code gives me:
Thanks ahead of time for any pointers!
If your data frame is called d, then:
Transform it to data.table
Add jitter to year_elected
Add equivalent jitter to year_left
Add group (as an example) to color your samples
Use ggrepel to add text if there are many points.
Code:
library(data.table)
library(ggplot2)
library(ggrepel)
setDT(d) # convert d to a data.table in place
d[, year_elected2 := jitter(year_elected)]
d[, year_left2 := year_left + year_elected2 - year_elected + 0.01]
d[, group := TRUE]
d[factor(years_in_office %/% 9) == 1, group := FALSE]
ggplot(d, aes(year_elected2, years_in_office)) +
geom_segment(aes(x = year_elected2, xend = year_left2,
y = 0, yend = years_in_office, linetype = group),
alpha = 0.8, size = 1, color = "grey") +
geom_point(aes(year_left2), color = "black", size = 3.3) +
geom_point(aes(year_left2, color = group), size = 2.3) +
geom_text_repel(aes(year_left2, label = name)) +
scale_colour_brewer(guide = "none", palette = "Dark2") +
scale_linetype_manual(guide = "none", values = c(2, 1)) +
labs(x = "Year elected",
y = "Years in office") +
theme_minimal(base_size = 10)
Result:
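The jitter step matters because office holders with the same year_elected and year_left draw exactly the same segment. A minimal base-R sketch on the first five rows of the sample data shows the effect. The seed is an arbitrary choice so that the run is repeatable.

```r
set.seed(1)
d <- data.frame(
  year_elected = c(1969, 1969, 1969, 1969, 1971),
  year_left    = c(1969, 1971, 1973, 1973, 1974)
)

# Persons 3 and 4 share both endpoints, so only 4 distinct segments exist
n_distinct_before <- nrow(unique(d[, c("year_elected", "year_left")]))

# Jitter the start, then move the end by the same amount
d$year_elected2 <- jitter(d$year_elected)
d$year_left2 <- d$year_left + d$year_elected2 - d$year_elected + 0.01

n_distinct_after <- nrow(unique(d[, c("year_elected2", "year_left2")]))
print(c(n_distinct_before, n_distinct_after))
```

Because both ends are shifted by the same random amount, each segment keeps its horizontal extent (plus the 0.01 offset) and only its position changes.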
For the record, and to address my comment on @PoGibas's answer above, here's my tidyverse version:
data_transform <- data_sj %>%
mutate(year_elected_jitter = jitter(year_elected)) %>%
mutate(year_left_jitter = year_left + year_elected_jitter - year_elected + 0.01)
ggplot(data_transform, aes(year_elected, years_in_office, label = name)) +
geom_segment(aes(x = year_elected_jitter, y = 0, xend = year_left_jitter, yend = years_in_office, color = gender), size = 0.3) +
geom_text_repel(aes(year_left_jitter, label = name)) +
theme_minimal()