I have data like the following:
structure(list(bucket = structure(1:23, .Label = c("(1.23,6.1]",
"(6.1,10.9]", "(10.9,15.6]", "(15.6,20.4]", "(20.4,25.1]", "(25.1,29.9]",
"(29.9,34.6]", "(34.6,39.4]", "(39.4,44.2]", "(44.2,48.9]", "(48.9,53.7]",
"(53.7,58.4]", "(58.4,63.2]", "(63.2,68]", "(68,72.7]", "(72.7,77.5]",
"(77.5,82.2]", "(82.2,87]", "(87,91.7]", "(91.7,96.5]", "(96.5,101]",
"(101,106]", "(106,111]"), class = "factor"), value = c(0.996156321090158, 0.968144290236367, 0.882793110384066, 0.719390676388129, 0.497759597498133,
0.311721580067415, 0.181244079443301, 0.0988516758834657, 0.0527504526341006,
0.0278716018561911, 0.0145107725175315, 0.00785033086321829,
0.00405759957072942, 0.00213190168252939, 0.00109610249274952,
0.000578154695264754, 0.000301095727545301, 0.000155696457494707,
8.2897211122996e-05, 4.09225082176349e-05, 2.33782236798641e-05,
1.21665352966827e-05, 6.87373003802479e-06), bucket_id = 1:23), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -23L))
Which I want to visualise as a circular stacked bar plot:
cutoff_values <- seq(0, 115, by = 5)
library(tidyverse)
ex %>%
mutate(r0 = cutoff_values[-length(cutoff_values)],
r = cutoff_values[-1]) %>%
mutate(x0 = 100,
y0 = 50) %>%
ggplot(aes(x0 = x0, y0 = y0, r0 = r0, r = r)) +
ggforce::geom_arc_bar(aes(start = 0, end = 2 * pi, fill = value),
colour = NA) +
theme_void() +
labs(fill = 'colour')
But I also need to be able to mark out one particular bucket, ideally with a different fill. That is, I want to keep the fill mapped to value on a continuous scale, but fill one particular stratum (say, bucket 15) with another colour, leaving the other strata (buckets) as they are. Is this possible? What are the alternatives for marking out the 15th bucket?
I believe this can be done with the relayer package, which is still highly experimental. You can put a copy of a subset of your data in a separate geom and give it another fill aesthetic. This separate geom can then be piped into rename_geom_aes(), and you have to set a scale_fill_*() for the renamed aesthetic. You will probably get a warning that the geom is ignoring unknown aesthetics, but I don't know if that can be helped.
Below is an example for making bucket 15 red.
library(tidyverse)
library(relayer) # https://github.com/clauswilke/relayer
ex <- df %>%
mutate(r0 = cutoff_values[-length(cutoff_values)],
r = cutoff_values[-1]) %>%
mutate(x0 = 100,
y0 = 50)
ggplot(ex, aes(x0 = x0, y0 = y0, r0 = r0, r = r)) +
ggforce::geom_arc_bar(aes(start = 0, end = 2 * pi, fill = value),
colour = NA) +
ggforce::geom_arc_bar(data = ex[ex$bucket_id == 15,], # Whatever bucket you want
aes(start = 0, end = 2 * pi, fill2 = as.factor(bucket_id))) %>%
rename_geom_aes(new_aes = c("fill" = "fill2")) +
scale_fill_manual(aesthetics = "fill2", values = "red", guide = "legend") +
theme_void() +
labs(fill = 'colour', fill2 = "highlight")
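If a legend entry for the highlighted ring is not required, one alternative that avoids relayer is to draw the chosen bucket in a second layer with a constant fill set outside aes(). The following is only a minimal sketch, assuming ggplot2 and ggforce are installed. The value column is made up for illustration, highlight_id is a hypothetical name, and coord_fixed() is an addition that keeps the rings circular.

```r
library(ggplot2)
library(ggforce)

cutoff_values <- seq(0, 115, by = 5)

# Illustrative stand-in for the question's data: 23 rings with a decaying value
ex <- data.frame(
  bucket_id = 1:23,
  value = exp(-(1:23) / 3),                    # made-up values, not the original ones
  r0 = cutoff_values[-length(cutoff_values)],  # inner radius of each ring
  r  = cutoff_values[-1],                      # outer radius of each ring
  x0 = 100,
  y0 = 50
)

highlight_id <- 15  # hypothetical: the bucket to mark out

# start/end are set in ggplot() so that both layers inherit them
p <- ggplot(ex, aes(x0 = x0, y0 = y0, r0 = r0, r = r, start = 0, end = 2 * pi)) +
  # all rings, filled on the continuous scale
  geom_arc_bar(aes(fill = value), colour = NA) +
  # the highlighted ring drawn on top with a fixed colour (no legend entry)
  geom_arc_bar(data = ex[ex$bucket_id == highlight_id, ],
               fill = "red", colour = NA) +
  coord_fixed() +
  theme_void() +
  labs(fill = "colour")
```

Calling print(p) draws the plot. Because the red fill is a constant rather than a mapped aesthetic, the continuous fill scale and its legend are left untouched.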
This is the dataset:
heartData <- structure(list(id = 1:6, biking = c(30.80124571, 65.12921517,
1.959664531, 44.80019562, 69.42845368, 54.40362555), smoking = c(10.89660802,
2.219563176, 17.58833051, 2.802558875, 15.9745046, 29.33317552
), heart.disease = c(11.76942278, 2.854081478, 17.17780348, 6.816646909,
4.062223522, 9.550045997)), row.names = c(NA, 6L), class = "data.frame")
Here I have used multiple linear regression as model.
model.1 <- lm( heart.disease ~ biking + smoking, data = heartData)
plotting.data is a synthesized data set for which I want to check the confidence interval as well as the prediction interval.
plotting.data <- expand.grid(
biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
smoking = c(mean(heartData$smoking)))
plotting.data$predicted.y <- predict(model.1, newdata = plotting.data, interval = 'confidence')
plotting.data$smoking <- round(plotting.data$smoking, digits = 2)
plotting.data$smoking <- as.factor(plotting.data$smoking)
After running the block of code above, I can see that plotting.data has been created with 5 columns. However, when I run
colnames(plotting.data)
I get only 3 column names. plotting.data$predicted.y is a single column, and I can't access or rename plotting.data$predicted.y[,"fit"], plotting.data$predicted.y[,"upr"] or plotting.data$predicted.y[,"lwr"]
To plot results
heart.plot <- ggplot(data = heartData, aes(x = biking, y = heart.disease)) + geom_point() +
geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"], color = "red"), size = 1.25) +
geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
heart.plot
I get the error:
Error in FUN(X[[i]], ...) : object 'heart.disease' not found
I don't know why I'm getting this error. From my own trial and error, I know that the following part of the code causes it; however, I don't know how to write it in a better way.
geom_ribbon(data = plotting.data, aes(ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
It's because the aesthetics you map inside aes() in ggplot() are inherited by every layer, so each of those variables must exist in whatever data set you pass to the additional geoms. Here geom_ribbon() inherits y = heart.disease, and plotting.data has no such column. If you want to use multiple data sets that don't necessarily share the same variables, give each geom its own aes() wrapper to control this.
ggplot() +
geom_point(data = heartData, aes(x = biking, y = heart.disease)) +
geom_line(data = plotting.data, aes(x = biking, y = predicted.y[,"fit"]), color = "red", size = 1.25) +
geom_ribbon(data = plotting.data, aes(x = biking, ymin = predicted.y[,"lwr"], ymax = predicted.y[,"upr"]), alpha = 0.1)
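On the column-name part of the question: predict(..., interval = "confidence") returns a matrix with columns fit, lwr and upr, and assigning it with $<- stores the whole matrix as a single column, which is why colnames() reports only three names. A minimal base-R sketch of one way to get ordinary columns instead, using the question's data (pred is a hypothetical intermediate name):

```r
heartData <- data.frame(
  biking = c(30.80124571, 65.12921517, 1.959664531,
             44.80019562, 69.42845368, 54.40362555),
  smoking = c(10.89660802, 2.219563176, 17.58833051,
              2.802558875, 15.9745046, 29.33317552),
  heart.disease = c(11.76942278, 2.854081478, 17.17780348,
                    6.816646909, 4.062223522, 9.550045997)
)
model.1 <- lm(heart.disease ~ biking + smoking, data = heartData)

plotting.data <- expand.grid(
  biking = seq(min(heartData$biking), max(heartData$biking), length.out = 5),
  smoking = mean(heartData$smoking)
)

# predict() returns a 5 x 3 matrix with columns fit, lwr, upr
pred <- predict(model.1, newdata = plotting.data, interval = "confidence")

# Bind the matrix columns as ordinary data frame columns
plotting.data <- cbind(plotting.data, as.data.frame(pred))

colnames(plotting.data)
```

With the columns flattened like this, the layers can refer to them directly, for example aes(ymin = lwr, ymax = upr), and they can be renamed like any other column.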
I'm studying the returns to college admission for marginal students, and I'm trying to make a ggplot2 plot of the following data: the average salaries of students who finished or didn't finish their master's in medicine, and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD), so to run the regression I have to, as far as I understand it, convert DistanceGrades to numeric, so I just created a variable z
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades).
I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (I use the package 'fpp3', but I suppose only ggplot2 and maybe dplyr are actually needed)
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D)) +
geom_point(stat = "identity") +
geom_smooth(method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
xlim(-6,5) +
xlab("Distance to acceptance score") +
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What I'm trying to do is, first, split the data with a dummy variable: D = 1 if z >= 0 and D = 0 if z < 0. Then I plot it with a linear regression and a vertical line at z = 0. Lastly, I add the title and subtitle. Now I have two problems:
The x-axis displays -5, -2.5, ... but I would like it to show all the integers; the non-integer values have no relation to the z variable, which is discrete. I have tried to fix this in several different ways and can't remember them all (theme(panel.grid...), scale_x_discrete and many more), but the outcomes were all pretty similar: the x-axis labels are removed completely, so that there are no numbers, and sometimes even the axis title disappears.
I would like the regression channel for the first part of the data to extend to z = 0.
When I try to solve both of these problems I again get similar results: most of the things I try don't produce an error message when I run the code, but they either do nothing to my plot or remove some of the existing elements, which leaves me with nothing but questions. I suppose the problem is caused by some of the elements not working together, but I have no idea which.
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D)) +
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5) +
geom_line(data = fit_data, aes(x = z, y = fit), size = 1) +
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") +
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
xlab("Distance to acceptance score") +
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
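Extending the left-hand fit to the cutoff comes down to fitting a model on the observations with z < 0 only and then calling predict() at values of z up to 0. A base-R sketch of that idea at the single point z = 0, using the question's data. The names left_fit, right_fit and jump are hypothetical, and this is an illustration of the mechanics under the assumption of separate linear fits on each side, not a full RDD estimator.

```r
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738,
                 370.475, 373.257, 372.682, 388.939, 386.994)
z <- -5:4
df <- data.frame(z, SalaryAfter)

# Separate linear fits on each side of the cutoff
left_fit  <- lm(SalaryAfter ~ z, data = subset(df, z < 0))
right_fit <- lm(SalaryAfter ~ z, data = subset(df, z >= 0))

# Evaluate both fits (with confidence limits) exactly at the cutoff
at_cutoff  <- data.frame(z = 0)
left_at_0  <- predict(left_fit,  newdata = at_cutoff, interval = "confidence")
right_at_0 <- predict(right_fit, newdata = at_cutoff, interval = "confidence")

# Difference between the two fitted values at z = 0
jump <- unname(right_at_0[, "fit"] - left_at_0[, "fit"])
```

The left fit is being extrapolated one unit beyond its last observation here, so the confidence limits in left_at_0 are wider than those inside the observed range.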
I'm working with spatial data. I'd like to create an animation of a movement path, with older data fading out. I figured out how to create the fade-out effect with points, but it looks like there's no built-in support for transitions for geom_path (Error: path layers not currently supported by transition_components). But are there any clever workarounds that could be used? My full dataset is large (200K points) and the overlapping paths get out of hand...
toy data:
library(ggplot2)
library(gganimate)
df <- structure(list(Lon = c(-66.6319163369509, -66.5400363369509,
-65.3972830036509, -65.2810430036509, -65.1169763369509, -64.7409730036509,
-64.3898230036509, -64.3458230036509, -64.1435830036509, -64.1902230036509,
-64.5269330036509, -64.5508330036509, -64.9324130036509, -66.4002496703509,
-66.4605896703509, -66.6230763369509, -66.6636963369509, -66.6425830036509,
-66.5310230036509, -66.4582830036509, -66.2992030036509, -65.8810363369509,
-65.3338363369509, -65.2480363369509, -65.3705963369509, -65.8357874342282,
-66.7324643369709, -66.8768896703509, -66.8215363369509, -66.8320584884004
), Lat = c(63.9018749538395, 64.1357216205395, 64.4444682872395,
64.4580016205395, 64.4744549538395, 64.4951416205395, 64.5202416205395,
64.5237216205395, 64.5388016205395, 64.5400516205395, 64.5090116205395,
64.5069516205395, 64.4609016205395, 64.2904882872395, 64.1898016205395,
63.9022816205395, 63.9948082872395, 64.0236682872395, 64.1115882872395,
64.2171216205395, 64.3599949538395, 64.3979682872395, 64.4634216205395,
64.4719816205395, 64.4459016205395, 64.4008282316608, 63.8029216205395,
63.7730882872395, 63.8046816205395, 63.8239941445658), DateTime = structure(c(1451784300,
1451790981, 1451806092, 1451807038, 1451808331, 1451811238, 1451813999,
1451814338, 1451815898, 1451820189, 1451822838, 1451823018, 1451826048,
1451838029, 1451840610, 1451848380, 1451864271, 1451865064, 1451867591,
1451870472, 1451874641, 1451878100, 1451882678, 1451883331, 1451886921,
1451890867, 1451910187, 1451925099, 1451929401, 1451934427), class = c("POSIXct",
"POSIXt"), tzone = "GMT")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -30L))
df$TimeNum <- as.numeric(df$DateTime)
Plots:
##### successful fade - with points
p <- ggplot(df) +
geom_point(aes(x = Lon, y = Lat), size = 2) +
transition_components(TimeNum, exit_length = 20) +
ease_aes(x = 'sine-out', y = 'sine-out') +
shadow_wake(0.05, size = 2, alpha = TRUE, wrap = FALSE, #exclude_layer = c(2, 3),
falloff = 'sine-in', exclude_phase = 'enter')
animate(p, renderer = gifski_renderer(loop = F), duration = 10)
anim_save("try.gif")
##### successful plot with geom_path, but no fading - it gets REALLY busy with the
##### full dataset!
p1 <- ggplot(df) +
geom_path(aes(x = Lon, y = Lat), size = 2) +
transition_reveal(DateTime, keep_last = FALSE) +
labs(title = 'A: {frame_along}') +
exit_fade()
animate(p1, renderer = gifski_renderer(loop = F), duration = 10)
anim_save("try1.gif", width = 1000, height = 1000)
I'm not sure this is exactly what you're looking for, but you could use geom_segment instead of geom_path by adding the adjacent coordinate.
library(dplyr)
df1 <- df %>%
mutate(next_Lon = lead(Lon),
next_Lat = lead(Lat))
ggplot(df1) +
geom_segment(aes(x = Lon, y = Lat,
xend = next_Lon,
yend = next_Lat), size = 2) +
geom_point(aes(x = Lon, y = Lat), size = 2) +
transition_components(TimeNum, exit_length = 20) +
ease_aes(x = 'sine-out', y = 'sine-out') +
shadow_wake(0.05, size = 2, alpha = TRUE, wrap = FALSE, #exclude_layer = c(2, 3),
falloff = 'sine-in', exclude_phase = 'enter')
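One detail of the segment construction: the final point has no successor, so lead() leaves NA in next_Lon and next_Lat for the last row, and ggplot2 will typically warn about dropping that row. A base-R sketch of the same construction that leaves out the incomplete segment explicitly; the coordinates are made up for illustration and track and segments are hypothetical names.

```r
# Made-up track: five fixes (illustrative values only)
track <- data.frame(
  Lon = c(-66.6, -66.5, -65.4, -65.3, -65.1),
  Lat = c(63.9, 64.1, 64.4, 64.5, 64.5),
  TimeNum = c(1, 2, 3, 4, 5)
)

# Segments must connect consecutive fixes, so sort by time first
track <- track[order(track$TimeNum), ]
n <- nrow(track)

# Each segment runs from fix i to fix i + 1; n fixes give n - 1 segments
segments <- data.frame(
  Lon      = track$Lon[-n],
  Lat      = track$Lat[-n],
  next_Lon = track$Lon[-1],
  next_Lat = track$Lat[-1],
  TimeNum  = track$TimeNum[-n]
)
```

A data frame built this way can be passed to geom_segment() on its own, while the point layer keeps using the full track.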
I am trying to create a chart like this one produced in the NYTimes using ggplot:
I think I'm getting close, but I'm not quite sure how to separate out some of my data so I get the right view. My data is a list of political office holders that looks something like this:
name,year_elected,year_left,years_in_office,type,party
Person 1,1969,1969,1,Candidate,Unknown
Person 2,1969,1971,2,Candidate,Unknown
Person 3,1969,1973,4,Candidate,Unknown
Person 4,1969,1973,4,Candidate,Unknown
Person 5,1971,1974,3,Candidate,Unknown
Person 1,1971,1976,5,Candidate,Unknown
Person 2,1971,1980,9,Candidate,Unknown
Person 6,1973,1978,5,Candidate,Unknown
Person 7,1973,1980,7,Candidate,Unknown
Person 8,1975,1980,5,Candidate,Unknown
Person 9,1977,1978,1,Candidate,Unknown
I've used the code below to get very close to this view, but I think I'm either drawing the segments incorrectly (e.g., I don't seem to have a single segment for each candidate), or the segments are overlapping/stacking. The key issue is that my list has around 60 office holders, but my chart only draws around 28 lines.
library(googlesheets)
library(tidyverse)
# I'm reading from a Google Spreadsheet
data <- gs_title("Council Members")
data_sj <- gs_read(ss = data, ws = "Sheet1")
ggplot(data_sj, aes(year_elected, years_in_office)) + # plot the data frame read from the sheet, not the sheet handle
geom_segment(aes(x = year_elected, y = 0,
xend = year_left, yend = years_in_office)) +
theme_minimal()
The above code gives me:
Thanks ahead of time for any pointers!
If your data frame is called d, then:
Transform it to data.table
Add jitter to year_elected
Add equivalent jitter to year_left
Add group (as an example) to color your samples
Use ggrepel to add text if there are many points.
Code:
library(data.table)
library(ggplot2)
library(ggrepel)
setDT(d) # convert d to a data.table by reference (the first step above)
d[, year_elected2 := jitter(year_elected)]
d[, year_left2 := year_left + year_elected2 - year_elected + 0.01]
d[, group := TRUE]
d[factor(years_in_office %/% 9) == 1, group := FALSE]
ggplot(d, aes(year_elected2, years_in_office)) +
geom_segment(aes(x = year_elected2, xend = year_left2,
y = 0, yend = years_in_office, linetype = group),
alpha = 0.8, size = 1, color = "grey") +
geom_point(aes(year_left2), color = "black", size = 3.3) +
geom_point(aes(year_left2, color = group), size = 2.3) +
geom_text_repel(aes(year_left2, label = name)) +
scale_colour_brewer(guide = "none", palette = "Dark2") +
scale_linetype_manual(guide = "none", values = c(2, 1)) +
labs(x = "Year elected",
y = "Years in office") +
theme_minimal(base_size = 10)
Result:
For the record, and to address my comment on @PoGibas's answer above, here's my tidyverse version:
data_transform <- data_sj %>%
mutate(year_elected_jitter = jitter(year_elected)) %>%
mutate(year_left_jitter = year_left + year_elected_jitter - year_elected + 0.01)
ggplot(data_transform, aes(year_elected, years_in_office, label = name)) +
geom_segment(aes(x = year_elected_jitter, y = 0, xend = year_left_jitter, yend = years_in_office, color = gender), size = 0.3) +
geom_text_repel(aes(year_left_jitter, label = name)) +
theme_minimal()
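A likely reason for seeing far fewer lines than office holders is that segments sharing the same year_elected, year_left and years_in_office are drawn exactly on top of each other, which is what the jitter works around. A quick base-R check of how many rows would be hidden this way, using a few made-up rows in the question's column layout (coords, n_distinct and n_hidden are hypothetical names):

```r
# Made-up rows in the question's layout (illustrative only)
d <- data.frame(
  name            = paste("Person", 1:6),
  year_elected    = c(1969, 1969, 1969, 1969, 1971, 1971),
  year_left       = c(1969, 1971, 1973, 1973, 1974, 1974),
  years_in_office = c(1, 2, 4, 4, 3, 3)
)

# A segment is fully determined by these three columns
coords <- d[c("year_elected", "year_left", "years_in_office")]

n_distinct <- nrow(unique(coords))    # segments that are actually distinguishable
n_hidden   <- sum(duplicated(coords)) # rows drawn on top of an earlier, identical segment
```

If n_distinct on the real data comes out close to the number of visible lines, overplotting rather than missing rows is the explanation.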
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_point() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
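library(ggplot2)
library(reshape2) # melt()
library(dplyr)    # filter()
library(grid)     # pushViewport(), viewport()
xseq <- 1:100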
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
The legend keys follow the factor level order of variable, which after melt() is ypoints, ylines, ypoints2, so the vectors below are given in that order.
We set linetype = "solid" for the line series (ylines) and "blank" for the others (no line).
Similarly, the line series gets no shape (NA), and for the others we set whatever shape we need (I just put 7 and 8 there as an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you choose the correct shapes for your needs.
If you are happy with dots then you can use my_shapes = c(16,NA,16) and scale_shape_manual(...) is not needed.
my_shapes = c(7,NA,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("blank", "solid","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines