Zigzag when using geom_line with ggplot in R

I would really appreciate some insight into the zigzagging I get when using the following code in R:
tbi_military %>%
ggplot(aes(x = year, y = diagnosed, color = service)) +
geom_line() +
facet_wrap(vars(severity))
The dataset consists of 5 variables (3 character, 2 numeric). Any insight would be much appreciated.

This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line()
You can see the zigzag pattern appear because there are multiple observations per diet/time point. Because geom_line sorts the data by the x-axis variable, these show up as vertical lines spanning the range of data points at each time point for each diet.
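A quick base-R check (a sketch using only the built-in ChickWeight data; `counts` and `n_obs` are illustrative names) makes the multiple observations per diet/time point visible:

```r
# Count how many rows share each Diet/Time combination in ChickWeight
counts <- aggregate(weight ~ Diet + Time, data = ChickWeight, FUN = length)
names(counts)[names(counts) == "weight"] <- "n_obs"
head(counts)
# Any combination with n_obs > 1 is drawn by geom_line() as a vertical segment
```

Every row with `n_obs` greater than 1 corresponds to one of the vertical strokes in the zigzag plot.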
The data has an additional variable called 'Chick' that identifies individual chicks. Including it in the grouping resolves the zigzag pattern, and each line then shows the weight over time of an individual chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(aes(group = interaction(Chick, Diet)))
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(stat = "summary", fun = mean)
Created on 2021-08-30 by the reprex package (v1.0.0)
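If you prefer to compute the summary yourself (for example to inspect it before plotting), the means can be pre-computed with base R's aggregate() and plotted directly. This is a sketch of the same idea as `stat = "summary"`, not a different method; `chick_means` is an illustrative name:

```r
library(ggplot2)

# One mean weight per Diet/Time combination, computed before plotting
chick_means <- aggregate(weight ~ Diet + Time, data = ChickWeight, FUN = mean)

# Each diet now has a single y value per x value, so there are no vertical segments
p <- ggplot(chick_means, aes(Time, weight, colour = factor(Diet))) +
  geom_line()
print(p)
```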

Related

How do you color the 3 bars with the highest value differently (gold, silver, bronze) from the rest in ggplot?

I am currently working on an R Markdown document for our school, which should make documenting student performance easier for the teachers.
I would like to include a bar chart using ggplot2, which
orders students from best to worst based on their GPA, and
colors the 3 highest bars gold, silver and bronze respectively, and all the other bars blue.
Note that the code needs to work with an arbitrary number of students. What I tried is:
subjects_long %>%
group_by(Name) %>%
summarize(GPA = mean(grade)) %>%
ggplot(aes(x = reorder(Name, GPA), y = GPA, fill = Name)) +
geom_col() +
coord_flip() +
scale_y_continuous(breaks = seq(0, 5, by = 0.1)) +
scale_fill_manual(values = c("#d6af36","#d7d7d7","#a77044",
rep("blue", length(subjects$Name)-3)))
This ensures that the code runs and that there is the right number of fill colors for the columns every time, regardless of which dataset (class data) I run it on. However, the bars that get colored gold/silver/bronze are not the ones with the highest GPA, but the ones whose names come first alphabetically, regardless of how high their GPA is. Apparently, this is because scale_fill_manual assigns the values in the order of the factor levels, not by y-axis values.
Any help would be greatly appreciated!
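One possible approach (a sketch, not tested against the real class data): instead of mapping fill to Name, map it to a category derived from each student's rank, and give scale_fill_manual() a named vector so each color is tied to a category rather than to the order of the factor levels. The small data frame below is hypothetical and only stands in for subjects_long; it assumes the columns Name and grade used in the question's code.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for subjects_long: one row per student and grade
subjects_long <- data.frame(
  Name  = rep(c("Ana", "Ben", "Cleo", "Dev", "Eli"), each = 2),
  grade = c(4.0, 3.8, 2.9, 3.1, 4.8, 4.6, 3.5, 3.7, 2.2, 2.6)
)

gpa <- subjects_long %>%
  group_by(Name) %>%
  summarize(GPA = mean(grade)) %>%
  # Rank students by GPA (1 = best) and label the top three
  mutate(rank = min_rank(desc(GPA)),
         medal = case_when(rank == 1 ~ "gold",
                           rank == 2 ~ "silver",
                           rank == 3 ~ "bronze",
                           TRUE      ~ "other"))

p <- ggplot(gpa, aes(x = reorder(Name, GPA), y = GPA, fill = medal)) +
  geom_col() +
  coord_flip() +
  # Named values tie each color to a category, not to the factor-level order
  scale_fill_manual(values = c(gold = "#d6af36", silver = "#d7d7d7",
                               bronze = "#a77044", other = "blue"),
                    guide = "none")
print(p)
```

This works for any number of students, because every student outside the top three simply falls into "other". Note that min_rank() gives tied students the same rank, so two students tied for first would both be gold and nobody silver; row_number() could be used instead if exactly one bar per medal is preferred.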

R, ggplot, How do I keep related points together when using jitter?

One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is an image with the observations overplotted:
Adding some random jitter helps visually; however, the "spend" amounts are divorced from their corresponding gain events:
I'd like to see the blue circles "bullseyed" in their gain circles (where the "id" are equal), and jittered as a pair. Here are some sample data (three days) and code:
library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jittered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(width=0,height=0.2)) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
My main idea is the old "add jitter manually" approach. I'm also wondering whether a nicer approach might be to plot little pie charts as points, à la the scatterpie package.
In this case you could draw one random jitter value per ID, so that points within the same ID are moved by the same amount. This requires doing some work outside of ggplot2.
First, draw the "jitter" to add for each ID. Since adjacent categories on a discrete axis are 1 unit apart, I choose numbers between -0.3 and 0.3. I use dplyr for this work and set the seed so you will get the same results.
library(dplyr)
set.seed(16)
ef2 = ef %>%
group_by(id) %>%
mutate(jitter = runif(1, min = -.3, max = .3)) %>%
ungroup()
Then the plot. I use a geom_blank() layer so that the categorical site axis is drawn before I add the jitter. I convert site from a factor to numeric and add the jitter; this works because ggplot2 places the categories of a discrete axis in factor-level order, so the integer codes line up with the axis positions.
Now paired IDs move together.
ggplot(ef2, aes(x = date, y = site)) +
geom_blank() +
geom_point(aes(size = amount, color = gainspend,
y = as.numeric(factor(site)) + jitter),
alpha=0.5) +
scale_color_manual(values = ccode) +
scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
Created on 2021-09-23 by the reprex package (v2.0.0)
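The same "one offset per ID" idea can be sketched in base R without dplyr, using match() to look up each row's offset. The `toy` data frame and the column name `site_y` are hypothetical stand-ins for `ef`; the key point is that runif() is called once per unique id, not once per row:

```r
# Hypothetical stand-in for ef: each id has one Gain row and one Spend row
toy <- data.frame(
  id        = rep(c("C1", "T1", "P1"), times = 2),
  site      = rep(c("Castle", "Temple", "Palace"), times = 2),
  gainspend = rep(c("Gain", "Spend"), each = 3)
)

set.seed(16)
ids <- unique(toy$id)
# Draw exactly one offset per id (length(ids), not nrow(toy))
offsets <- runif(length(ids), min = -0.3, max = 0.3)
# match() finds each row's id in ids, so Gain and Spend rows share an offset
toy$jitter <- offsets[match(toy$id, ids)]
# Integer position of each site on the discrete axis, plus the shared jitter
toy$site_y <- as.numeric(factor(toy$site)) + toy$jitter
toy
```

The numeric `site_y` can then be mapped to y in a geom_point() layer placed after geom_blank(), as in the plot above.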
You can add some jitter by id outside the ggplot() call.
jj <- data.frame(id = unique(ef$id), jtr = runif(length(unique(ef$id)), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
But this requires site to be numeric, so when it comes to making the plot, you need to add the axis labels manually with scale_y_continuous(). (Update: the geom_blank() trick from aosmith above is a better solution!)
ggplot(ef,aes(date,sitej)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))
This seems to work. Some gain circles still have no spend partner, but that is expected: their spend amount is NA, and geom_point() drops rows with a missing size value.
Perhaps someone else has a better approach!

Subsetting data for ggplot2

I have data saved in multiple datasets, each consisting of four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, Weighted Average Income. I would like to create a graph where you see only one country's weighted average income by birthyear and split by male/female. I've used the facet_grid() function to get a grid of graphs for all countries as below.
ggplot() +
geom_line(data = dt,
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
However, when I try to isolate the graphs for just one country, the code below doesn't work. How can I subset the data correctly?
ggplot() +
geom_line(data = dt[Country == 'Germany'],
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
In your specific case, the problem is that you are not quoting the non-syntactic names Male/Female and Weighted Average Income. Also, your data and basic aesthetics should probably go in ggplot() rather than geom_line(). Putting them in geom_line() restricts them to that single layer, so you would have to repeat them in every layer you add, for example geom_smooth().
So to fix your problem you could do
library(tidyverse)
library(data.table) # dt[Country == 'Germany'] is data.table syntax
plot <- ggplot(data = dt[Country == 'Germany'],
               aes(x = Birthyear,
                   y = !!sym("Weighted Average Income"), # or `Weighted Average Income`
                   colour = "Weighted Average Income")) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`) # backticks quote a non-syntactic name
plot
ggplot2 also has a lesser-known built-in operator for replacing a plot's data, so if you wanted to compare this with the plot including all of your countries, you could do:
plot %+% dt # `%+%` replaces the plot's default data. See help("+.gg")
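As a self-contained sketch of the same fix, here is a small hypothetical data frame with the question's column names (the values are made up). `check.names = FALSE` keeps the space and slash in the names, so they must be written with backticks in aes() and facet_grid(). subset() is used because the stand-in is a plain data frame; `dt[Country == 'Germany']` only works when dt is a data.table.

```r
library(ggplot2)

# Hypothetical stand-in for dt, keeping the non-syntactic column names
dt <- data.frame(
  Country = rep(c("Germany", "France"), each = 4),
  `Male/Female` = rep(c("Male", "Female"), times = 4),
  Birthyear = rep(c(1980, 1980, 1990, 1990), times = 2),
  `Weighted Average Income` = c(41, 38, 45, 40, 39, 36, 43, 39),
  check.names = FALSE
)

# Keep only one country before plotting
germany <- subset(dt, Country == "Germany")

p <- ggplot(germany, aes(x = Birthyear, y = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)
print(p)
```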

Make multiple geoms animated in ggplot

I am trying to develop an animated plot showing how the rates of three-point attempts and assists have changed for NBA teams over time. While the points in my plot transition correctly, the vertical and horizontal mean lines I added stay at the overall averages instead of shifting year by year.
p<-ggplot(dataBREFPerPossTeams, aes(astPerPossTeam,fg3aPerPossTeam,col=ptsPerPossTeam))+
geom_point()+
scale_color_gradient(low='yellow',high='red')+
theme_classic()+
xlab("Assists Per 100 Possessions")+
ylab("Threes Attempted Per 100 Possessions")+labs(color="Points Per 100 Possessions")+
geom_hline(aes(yintercept = mean(fg3aPerPossTeam)), color='blue',linetype='dashed')+
geom_vline(aes(xintercept = mean(astPerPossTeam)), color='blue',linetype='dashed')
anim<-p+transition_time(as.integer(yearSeason))+labs(title='Year: {frame_time}')
animate(anim, nframes=300)
Ideally, the two dashed lines would shift as the years progress, however, right now they are staying constant. Any ideas on how to fix this?
I am using datasets::airquality since you have not shared your data. When mean() is called inside aes(), it is computed over the whole layer data, which is why the lines stay at the overall averages. Instead, the values for your other geoms (here, the means) need to be a variable in your dataset, so that gganimate can connect those values to each frame (i.e. to transition_time()).
So I grouped by the frame variable (here Month; for you it would be yearSeason) and then used mutate() to add columns with the averages of the variables I wanted. In the geoms I then used those new columns instead of computing the mean inside the geom. See below:
library(datasets) #datasets::airquality
library(ggplot2)
library(gganimate)
library(dplyr)
g <- airquality %>%
group_by(Month) %>%
mutate(mean_wind=mean(Wind),
mean_temp=mean(Temp)) %>%
ggplot()+
geom_point(aes(Wind,Temp, col= Solar.R))+
geom_hline(aes(yintercept = mean_temp), color='blue',linetype='dashed')+
geom_vline(aes(xintercept = mean_wind), color='green',linetype='dashed')+
scale_color_gradient(low='yellow',high='red')+
theme_classic()+
xlab("Wind")+
ylab("Temp")+labs(color="Solar.R")
animated_g <- g + transition_time(as.integer(Month))+labs(title='Month: {frame_time}')
animate(animated_g, nframes=18)
Created on 2019-06-09 by the reprex package (v0.3.0)
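The per-frame means do not strictly require dplyr: base R's ave() returns, for each row, the mean of that row's group (mean is its default FUN). A sketch on the same airquality data, with `aq` and `per_frame` as illustrative names, showing that there is exactly one pair of means per animation frame:

```r
aq <- airquality
# ave() returns, for every row, the mean of its Month group (default FUN = mean)
aq$mean_wind <- ave(aq$Wind, aq$Month)
aq$mean_temp <- ave(aq$Temp, aq$Month)

# One distinct pair of means per frame (Month)
per_frame <- unique(aq[, c("Month", "mean_wind", "mean_temp")])
per_frame
```

These columns can be used in geom_hline()/geom_vline() exactly like `mean_temp` and `mean_wind` in the dplyr version above.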

How to shade under part of a line from a dataset

I have a simple plot of some data from an experiment.
plot(x=sample95$PositionA, y=sample95$AbsA, xlab=expression(position (mm)), ylab=expression(A[260]), type='l')
I would like to shade a particular area under the line, let's say from 35-45mm. From what I've searched so far, I think I need to use the polygon function, but I'm unsure how to assign vertices from a big dataset like this. Every example I've seen so far uses a normal curve.
Any help is appreciated, I am very new to R/RStudio!
Here is a solution using tidyverse tools, including ggplot2. I use the built-in airquality dataset as an example.
This first part just puts the data in a format we can plot, by combining the month and the day into a single date. In your data, you would use PositionA in place of date.
library(tidyverse)
df <- airquality %>%
as_tibble() %>%
magrittr::set_colnames(str_to_lower(colnames(.))) %>%
mutate(date = as.Date(str_c("1973-", month, "-", day)))
This is the plot code. In ggplot2, we start with the function ggplot() and add geom functions to it with + to create the plot in layers.
The first function, geom_line, connects the observations in order of the x variable, so it draws the line that we see. Each geom needs its variables mapped to aesthetics; here we want date on the x axis and temp on the y axis, so we write aes(x = date, y = temp).
The second function, geom_ribbon, is designed to plot bands at particular x values between a ymax and a ymin. This lets us shade the area underneath the line by choosing a constant ymin = 55 (a value lower than the minimum temperature) and setting ymax = temp.
We shade a specific part of the chart by specifying the data argument. Normally geom functions use the dataset inherited from ggplot(), but you can override this for an individual layer. Here we use filter() in geom_ribbon so that only dates in June are shaded.
ggplot(df) +
geom_line(aes(x = date, y = temp)) +
geom_ribbon(
data = filter(df, date >= as.Date("1973-06-01") & date < as.Date("1973-07-01")),
mapping = aes(x = date, ymax = temp, ymin = 55)
)
This gives the chart below:
Created on 2018-02-20 by the reprex package (v0.2.0).
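Since the question asks about polygon() with base graphics, here is a base-R sketch of that approach as well. The sample95 data below is hypothetical (a single made-up peak), standing in for the real measurements, and the code assumes the rows are sorted by PositionA. The polygon's vertices run along the curve from left to right and then back along the bottom edge of the plot region:

```r
# Hypothetical stand-in for sample95: position in mm and one absorbance peak
sample95 <- data.frame(PositionA = seq(0, 80, by = 0.5))
sample95$AbsA <- exp(-((sample95$PositionA - 40) / 8)^2)

plot(sample95$PositionA, sample95$AbsA, type = "l",
     xlab = "Position (mm)", ylab = expression(A[260]))

# Keep only the rows inside the region to shade (35-45 mm)
in_band <- sample95$PositionA >= 35 & sample95$PositionA <= 45
x <- sample95$PositionA[in_band]
y <- sample95$AbsA[in_band]

# Bottom edge of the plotting region, used as the polygon's baseline
baseline <- par("usr")[3]

# Vertices: along the curve left to right, then back along the baseline
polygon(c(x, rev(x)), c(y, rep(baseline, length(x))),
        col = "grey80", border = NA)

# Redraw the line so it sits on top of the shading
lines(sample95$PositionA, sample95$AbsA)
```

Using par("usr")[3] rather than 0 makes the shading reach the bottom of the plot even when the y axis does not start at zero.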
