Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data - r

As a preliminary disclaimer, I am still very new to R (this is the first analysis I've performed independently), and am hoping this is a reproducible example.
I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I have a boxplot I generated in ggplot2 that does this, but I'm running into a few problems:
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
It produces something like this:
Since my data is not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for each depositional environment are different. I would like them to all be the same width, regardless of if the data is present or not (e.g., equivalent to the size of the ones in Bovidae in the KBS Member).
I've tried messing around with width = in the geom_boxplot call, I've tried using theme() to change aspects of the grid, and I've tried the drop = FALSE call, but that didn't change anything. I've also tried faceting my member and depositional environment, but that did not look as appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?
I provided my dataframe below. *note: it's a subset since otherwise, the output was too long.
dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS",
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic",
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae",
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("",
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus",
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56",
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

You could use position_dodge2 with preserve = "single" to keep the boxplot width the same across different groups like this:
library(ggplot2)
library(dplyr)
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
position = position_dodge2(preserve = "single")) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
Created on 2023-02-08 with reprex v2.0.2
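To see what preserve = "single" changes, here is a minimal sketch on made-up data. The toy data frame and the box_widths() helper are my own illustration and are not part of the question. It reads the drawn box widths back from the built plot under both settings:

```r
library(ggplot2)

set.seed(1)
# Toy data (assumption): "KBS" has two contexts, "Okote" only one
toy <- data.frame(
  Member      = rep(c("KBS", "KBS", "Okote"), each = 10),
  Dep_context = rep(c("Deltaic", "Fluvial", "Deltaic"), each = 10),
  d.13.C      = rnorm(30)
)

# Hypothetical helper: width of every box as ggplot2 would draw it
box_widths <- function(position) {
  p <- ggplot(toy, aes(x = Member, y = d.13.C, fill = Dep_context)) +
    geom_boxplot(position = position)
  built <- layer_data(p)
  built$xmax - built$xmin
}

w_total  <- box_widths(position_dodge2(preserve = "total"))   # default behaviour
w_single <- box_widths(position_dodge2(preserve = "single"))  # suggested fix

w_total
w_single
```

With preserve = "total" the lone Okote box fills the whole dodge width, whereas with preserve = "single" every box gets the width of a single slot, so all widths come out equal.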

Related

Specification curve "choices" plot using ggplot2

I have a small dataset of estimates from many regressions of an outcome variable on a main treatment variable and then various sets of control variables (in fact, all possible combinations of those controls variables). The table of estimates is as follows:
df <-
structure(list(control_set = c("cen21_hindu_pct", "cen83_urban_pct",
"cen21_hindu_pct + cen83_urban_pct", "NONE"), xest = c(0.0124513609978549,
0.00427174623249021, 0.006447506098051, 0.0137107176362076),
xest_conf_low = c(0.00750677700140716, -0.00436301983024899,
-0.0013089334064237, 0.00925185534519074), xest_conf_high = c(0.0173959449943027,
0.0129065122952294, 0.0142039456025257, 0.0181695799272245
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
I want to make the two plots for the classic "specification curve analysis." The top plot is simply the set of estimates ordered by the magnitude of the estimate on the main treatment variable (no issue here):
df %>%
arrange(xest) %>%
mutate(specifications = 1:nrow(.)) %>%
ggplot(aes(x = specifications, y = xest, ymin = xest_conf_low, ymax = xest_conf_high)) +
geom_pointrange(alpha = 0.1, size = 0.6, fatten = 1) +
labs(x = "", y = "Estimate\n") +
theme_bw()
My problem is with the aligned plot underneath that describes the control-set choices. Directly underneath each coefficient dot and whisker from the plot just made I want a plot that indicates the set of corresponding control variables that were included in that model (i.e. the list of controls in the control_set column in the df data frame row). So the plot I need in this example would look just like this:
This is a (failed) sketch of what I tried to get there, by modifying the earlier estimation dataset in long form, but I couldn't get multiple ticks to show vertically: (Note, this bit of code won't run)
# forplot %>%
# arrange(xest) %>%
# mutate(specifications = 1:nrow(.)) %>%
# mutate(value = "|") %>%
# ggplot(aes(specifications, term)) +
# geom_text(aes(label = value)) +
# scale_color_manual(values = c("lightblue")) +
# labs(x = "\nSpecification number", y = "") +
# theme_bw()
How can I use ggplot2 to make the plot-figure shown above from the information in the data frame, df?
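If we store your first plot in an object called a, the panel of ticks can be built separately as b and stacked underneath it with patchwork:
library(dplyr)
library(ggplot2)
library(patchwork)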
b <- tibble(specifications = c(1,2,2,3),
control_set = rep(c("cen83_urban_pct", "cen21_hindu_pct"), each = 2)) %>%
ggplot(aes(specifications, control_set)) +
geom_text(aes(label = "|"), size = 5) +
coord_cartesian(xlim = c(1,4)) +
labs(x = NULL, y = NULL) +
theme_bw()+
theme(axis.ticks = element_blank(),
axis.text.x = element_blank())
a/b + plot_layout(heights = c(3,1))
If you want to generate the key automatically, you might use something like this:
library(dplyr)
library(tidyr) # separate_rows() comes from tidyr
df %>%
select(control_set) %>%
mutate(specifications = 1:4) %>%
separate_rows(control_set, sep = "\\+") %>%
mutate(control_set = trimws(control_set)) %>% # b/c my regex not good enough to trim spaces in line above
...
If you want to relabel the numbers in the y-axis with the control_set labels you can add
+ scale_y_continuous(breaks = df$xest, labels = df$control_set)
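One possible way to build the whole key from df is sketched below. This is my own guess at a complete pipeline, not the answerer's elided code. It assumes the specification number should follow the ordering by xest used in the top plot, and that the "NONE" row should get no tick. The df here is an abbreviated copy of the question's data, with rounded estimates and only the columns needed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Abbreviated copy of the question's df (estimates rounded)
df <- tibble(
  control_set = c("cen21_hindu_pct", "cen83_urban_pct",
                  "cen21_hindu_pct + cen83_urban_pct", "NONE"),
  xest = c(0.01245, 0.00427, 0.00645, 0.01371)
)

key <- df %>%
  arrange(xest) %>%                              # same ordering as the estimates plot
  mutate(specifications = row_number()) %>%
  separate_rows(control_set, sep = " \\+ ") %>%  # one row per control variable
  filter(control_set != "NONE") %>%              # the no-controls model gets no tick
  select(specifications, control_set)

b <- ggplot(key, aes(specifications, control_set)) +
  geom_text(aes(label = "|"), size = 5) +
  coord_cartesian(xlim = c(1, nrow(df))) +
  labs(x = NULL, y = NULL) +
  theme_bw()
```

Splitting on " \\+ " (with the surrounding spaces) removes the need for a separate trimws() step.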

Consistent mapping from value to color in ggplot

I think I'm missing something very easy here, but I just can't figure it out at the moment:
I would like to consistently assign colors to certain values from a column across multiple plots.
So I have this tibble (sl):
# A tibble: 15 x 3
class hex x
<chr> <chr> <int>
1 translational slide #c23b22 1
2 rotational slide #AFC6CE 2
3 fast flow-type #b7bf5e 3
4 complex #A6CEE3 4
5 area subject to rockfall/topple #1F78B4 5
6 fall-type #B2DF8A 6
7 n.d. #33A02C 7
8 NA #FB9A99 8
9 area subject to shallow-slides #E31A1C 9
10 slow flow-type #FDBF6F 10
11 topple #FF7F00 11
12 deep-seated movement #CAB2D6 12
13 subsidence #6A3D9A 13
14 areas subject to subsidence #FFFF99 14
15 area of expansion #B15928 15
This should recreate it:
structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to plot each class with a bar in the color of its hex code (for now just for visualization purposes). So I did:
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = sl$hex) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But these are not the colors as they are in the tibble.
So I tried to follow this guide: How to assign colors to categorical variables in ggplot2 that have stable mapping? and created this:
# create the color palette
mycols = sl$hex
names(mycols) = sl$class
and then plotted it with
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = mycols) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But the result is the same. It's this:
For example, the translational slide has the hex code "#c23b22" and should be a darkish pastel red.
Does anyone have an idea what I'm missing here?
Consider this:
sl <- structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
sl$class <- factor( sl$class, levels=unique(sl$class) )
cl <- sl$hex
names(cl) <- paste( sl$class )
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual( values = cl, na.value = cl["NA"] ) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
By changing class to a factor with explicit levels, using a named vector for the values in scale_fill_manual, and setting na.value there properly, you might get something that looks more as expected.
You need to supply the colors in the order of your column. Since there is already a column called 'x', I have used it for the fill as well. I also replaced NA with the character string 'NA'. I have checked a few of them; please let me know if this is not the desired output. Thanks
#Assuming df is your dataframe:
df[is.na(df$class), 'class'] <- 'NA'
ggplot(df) +
geom_col(aes(x = x,
y = 1,
fill = factor(x))) +
scale_fill_manual(values = df$hex, labels=df$class) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
Output:
I think the problem is that scale_fill_manual expects the order of its values and labels arguments to match. This isn't the case with your dataset.
Does
sl %>% ggplot() +
geom_col(aes(x = x,
y = 1,
fill = hex)) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90) +
scale_fill_manual(values=sl$hex, labels=sl$class)
Give you what you want?
Next time, please dput() your test data: it took me as long to create the test dataset as to answer your question. Also, using hex codes for colours makes it difficult to check that the colours are as expected. For a MWE, blue/green/black etc. would have been more helpful.
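The difference between unnamed and named values can be checked without looking at the picture. The sketch below is my own illustration: it uses three rows borrowed from the question's table, and drawn_fill() is a hypothetical helper. It reads back the fill colour ggplot2 assigns to each bar.

```r
library(ggplot2)

# Three rows borrowed from the question's table
pal <- data.frame(
  class = c("topple", "complex", "subsidence"),
  hex   = c("#FF7F00", "#A6CEE3", "#6A3D9A"),
  x     = 1:3
)

# Hypothetical helper: fill colour actually drawn for each bar, ordered by x
drawn_fill <- function(values) {
  p <- ggplot(pal) +
    geom_col(aes(x = x, y = 1, fill = class)) +
    scale_fill_manual(values = values)
  built <- layer_data(p)
  unname(as.character(built$fill[order(built$x)]))
}

unnamed <- drawn_fill(pal$hex)                       # matched to the sorted classes
named   <- drawn_fill(setNames(pal$hex, pal$class))  # matched by name

data.frame(class = pal$class, wanted = pal$hex, unnamed, named)
```

Unnamed values are handed out in the order of the scale's levels (alphabetical for a character column), so they only line up with the table by accident. A named vector is matched by name, whatever the row order.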

ggalluvial: Order flow of lines based on a variable within stratum

I am using a generic diabetes dataset.
Processing the data (continuous to discrete):
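library("ggalluvial")
library("data.table") # dat is a data.table; needed for the dat[, .(...), by = ...] syntax and setnames()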
dat$Glucose_cat<- cut(dat$Glucose,breaks=c(1,100,125,max(dat$Glucose)), labels = c("Low","Normal","High"))
dat$BMI_cat <- cut(dat$BMI, breaks= c(17,25,30,35,40,max(dat$BMI)), labels = c("18-25", "25-30", "30-35", "35-40", "40+"))
dat$Outcome_cat<-cut(dat$Outcome, breaks = c(-Inf,0,Inf), labels = c("Negative", "Positive"))
dat$freq <- 1
dat3d <- dat[, .(freq3d = .N, freq = sum(freq)), by=list(Glucose_cat,
BMI_cat, Outcome_cat)]
dat3d<- dat3d[!(is.na(dat3d$BMI_cat))]
dat3d<- dat3d[!(is.na(dat3d$Glucose_cat))]
setnames(dat3d, old = c('Glucose_cat', 'BMI_cat','Outcome_cat'), new = c('Glucose', 'BMI','Diabetes'))
ggplot(dat3d,aes(axis1= Diabetes, axis2=Glucose, axis3 = BMI, y = freq))+
geom_alluvium(aes(fill=Diabetes), reverse = FALSE)+
scale_fill_manual(labels = c("Negative", "Positive"), values = c("blue", "red"))+
scale_x_discrete(limits = c("Glucose", "BMI"), expand = c(.001, .001))+
geom_stratum(alpha=0.6, reverse = FALSE)+
geom_text(stat="stratum", label.strata= TRUE, reverse = FALSE)+
ylab("Frequency")+xlab("Features")+
theme(legend.title = element_text(size=12))+
theme_minimal()
The following plot is displayed with the above code:
I want the plot to show one single red line, not five lines as in my case, when Glucose is "Positive" and BMI is "High".
I am pretty new to R programming and am exploring different libraries to create this flow diagram. I tried the "alluvial" library, which has a "layer" argument that sorts everything on some value. In my case I sorted on Diabetes == "Negative", and the plot looked like this (plot using the alluvial library, sorted so that all red lines are above the blue lines in each case):
I want to do something similar using ggalluvial. I look forward to any leads. Thanks in advance.
You need to set aes.bind = TRUE in geom_alluvium(), which passes it on to its stat (stat_alluvium() by default), so that the aesthetics are prioritized over the axis lodes when plotting.
ggplot(dat3d,aes(axis1= Diabetes, axis2=Glucose, axis3 = BMI, y = freq3d)) +
geom_alluvium(aes(fill=Diabetes),aes.bind=TRUE, reverse = FALSE) +
scale_fill_manual(labels = c("Negative", "Positive"), values = c("blue", "red")) +
scale_x_discrete(limits = c("Diabetes", "Glucose", "BMI"), expand = c(.001, .001)) +
geom_stratum(alpha=0.6, reverse = FALSE) +
geom_text(stat="stratum", label.strata= TRUE, reverse = FALSE) +
ylab("Frequency")+xlab("Features") +
theme(legend.title = element_text(size=12)) +
theme_minimal()
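The preprocessing in the question relies on cut(), and the NA rows it later drops come from how cut() treats its breaks. A small base R sketch with made-up glucose values (my own, not taken from the dataset):

```r
# Made-up values, including one below the lowest break
glucose <- c(0, 90, 100, 110, 125, 150)

glucose_cat <- cut(glucose,
                   breaks = c(1, 100, 125, max(glucose)),
                   labels = c("Low", "Normal", "High"))

data.frame(glucose, glucose_cat)
```

Intervals are right-closed by default, so 100 falls in "Low" and 125 in "Normal". Any value at or below the first break (here 0) becomes NA. Passing include.lowest = TRUE or starting the breaks at -Inf avoids losing those rows, if that is what you want.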

Ordering of items within a stacked geom_bar

I want, for reasons which seem good to me, to plot a stacked bar chart with the bars in a specific, data-dependent order. For reasons which are obscure to me, it does not seem to work. Specifically, I can readily arrange the rows of my dataframe in the right order and make the column of names identifying the bars an ordered factor, which gets the bars in the order I desire, but the graph does not stack the columns of the dataframe in the order I desire.
An example
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
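library(dplyr)        # arrange() and %>%
library(reshape2)     # melt()
library(RColorBrewer) # brewer.pal()
library(ggplot2)
Title <- 'Plot title'
ResponseLevels <- c("Not at all", "Somewhat", "Don't know", "Confident", "Very confident") # Labels for bars
category <- length(ResponseLevels) # Number of response categories (not defined in the original post; assumed)
items <- nrow(tab) # Number of items, i.e. bars (not defined in the original post; assumed)
pal.1 <- brewer.pal(category, 'BrBG') # Colours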
tab <- tab %>% arrange(.[,2]) # Sort by first columns of responses
tab$Item <- factor(tab$Item, levels = tab$Item[order(tab[,2])], ordered = TRUE) # Reorder factor levels
tab.m <- melt(tab, id = 'Item')
tab.m$col <- rep(pal.1, each = items) # Set colours
g <- ggplot(data = tab.m, aes(x = Item, y = value, fill = col)) +
geom_bar(position = "stack", stat = "identity", aes(group = variable)) +
coord_flip() +
scale_fill_identity("Percent", labels = ResponseLevels,
breaks = pal.1, guide = "legend") +
labs(title = Title, y = "", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(size = 16,hjust = 0)) +
theme(legend.position = "bottom")
g
The stacked pieces of the bars run from right to left, from 'Not at all' to 'Very confident'. The items are in the correct order, from 'Multimedia' to 'Personal', ordered by the proportion of those who said 'Not at all' to each item.
What I want to get is this graph with the responses ordered the other way, the same way as the legend, that is from 'Not at all' on the left, to 'Very confident' on the right. I cannot figure out how this ordering is set, nor how to change it.
I've read through the 'similar questions', but can see no answer to this specific query. Suggestions, using ggplot, not base R graphics, welcome.
Ok, building on the useful, and much appreciated answer from allstaire, I try the following
library(tidyverse)
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
tab <- tab %>% select(1,6,5,4,3,2,1) ## Re-order the columns of tab
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE)
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", type = 'cat', palette = 'BrBG',
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
And this is exactly the graph I want, so my pressing problem is solved.
However, if I say instead
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", type = 'cat', palette = 'BrBG',
guide = guide_legend(reverse = FALSE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
The picture I get is this
Here the body of the chart is correct, but the legend is going in the wrong direction.
This solves my problem, but does not quite answer my question. I start with a dataframe, and to get what I want I have to reverse the order of the data columns, and reverse the guide legend. This evidently works, but it's perverse.
So, how does a stacked bar chart decide in what order to present the stacked items? It's clearly related to their order in the melted dataset, but simply changing the order leaves the legend going in the wrong direction. Looking at the melted dataset, tab.m, from top to bottom, the responses are in the order 'Very confident' to 'Not at all', but the default legend is the reverse order 'Not at all' to 'Very confident'.
If you pass guide_legend instead of just a string, you can set its reverse parameter to TRUE. Simplifying a bit,
library(tidyverse)
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE)
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", palette = 'BrBG',
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
For the edit:
Bar order is determined by factor level order, which in the above is determined by column order due to the use of gather to create the factor, though coord_flip is making it less obvious. It's easy to reverse level order with levels<- or by reassembling the factor, though. To keep the colors with the same levels, pass direction = -1 to scale_fill_brewer to reverse their order, as well.
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE) %>%
mutate(variable = factor(variable, levels = rev(levels(variable)), ordered = TRUE))
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", palette = 'BrBG', direction = -1,
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
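The link between level order and stack position can be checked directly on the built plot. This is a minimal sketch with a two-level toy data frame. The data, the colour names and the segment_start() helper are my own and are not from the question.

```r
library(ggplot2)

toy <- data.frame(
  Item     = "Personal",
  variable = factor(c("Not at all", "Very confident"),
                    levels = c("Not at all", "Very confident")),
  value    = c(1, 2)
)
cols <- c("Not at all" = "brown", "Very confident" = "darkgreen")

# Hypothetical helper: where the "Not at all" segment starts on the value axis
segment_start <- function(data) {
  p <- ggplot(data, aes(x = Item, y = value, fill = variable)) +
    geom_col() +
    scale_fill_manual(values = cols)
  built <- layer_data(p)
  built$ymin[built$fill == "brown"]
}

start_default <- segment_start(toy)

# Same data, factor levels reversed
toy_rev <- transform(toy, variable = factor(variable, levels = rev(levels(variable))))
start_reversed <- segment_start(toy_rev)

c(default = start_default, reversed = start_reversed)
```

With the default level order, the first level is stacked farthest from the axis, which puts it on the right after coord_flip(). Once the levels are reversed, the same segment starts at zero. Using position_stack(reverse = TRUE) inside geom_col() is another way to flip the stack without touching the factor.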

Defining and showing legend in ggplot

Edit: Added data.
As a previous Stata-user and coding-newbie I'm having fun & trouble with the transition to R. I'm trying to make a line plot based on some of the variables in the dataframe seen in the screenshot. I've got the lines going the way I want them to, but adding a simple legend where I get to label each linetype is not at all going well. I can't even get a legend to show up! I'm fairly certain I'm not getting the level of abstraction used by ggplot here, but I'm also completely certain I want to specify this stuff manually to get the format right across multiple graphs for the publication.
By all means tell me if my whole approach is wrong, but if there is a line or two I can add that clears this up instead of adding more levels of abstraction I'd be a very happy camper.
(I have tried many solutions from similar questions on Stack Overflow and elsewhere, but I'm just not getting it.)
codingsuccess <- ggplot(data = ystats, aes(x=iyear)) +
geom_line(aes(y = pnattacks), linetype = "dotted", size = 1) +
geom_line(aes(y = ptra), linetype = "longdash", size = 1) +
geom_line(aes(y = pdom), linetype = "F1", size = 1) +
geom_line(aes(y = punc), linetype = "solid", size = 1) +
labs(title = "Coding Success", x = "Year", y = "Percentage") +
theme_bw()
codingsuccess # View plot
Data
structure(list(pnattacks = c(96.6954022988506, 94.229722373435,
95.4063604240283, 93.9429464634623, 94.5975744211687, 96.4044943820225,
96.3838166845686, 93.6634494334872, 92.4137931034483, 95.6087824351297,
89.628349178911, 93.6086529006883, 93.4337997847148, 95.7178841309824,
93.8461538461539, 96.2779156327543, 95.0248756218905, 96.039603960396,
96.7592592592593, 96.1538461538462, 96.219035202086, 95.4599761051374,
86.3636363636364, 94.058229352347, 95.1696377228292, 94.8897256589564,
93.6298076923077, 91.9762258543834, 87.6906318082789, 89.1412056151941
), ptra = c(91.6216216216216, 94.2408376963351, 94.6564885496183,
90.9090909090909, 87.7952755905512, 94.5887445887446, 97.5450081833061,
96.7051070840198, NaN, 97.6311336717428, 93.2668329177057, 95.9090909090909,
93.3920704845815, 99.4413407821229, 97.0588235294118, 98.7421383647799,
95.4022988505747, 98.4455958549223, 95.852534562212, 95.4022988505747,
94.9074074074074, 94.6341463414634, 93.1578947368421, 94.5205479452055,
94.0639269406393, 88.5826771653543, 91.554054054054, 89.3041237113402,
85.7374392220421, 91.2866449511401), pdom = c(98.116539140671,
97.3818181818182, 98.1110475100172, 97.2609561752988, 97.7892756349953,
98.0813953488372, 97.5047080979284, 95.0148367952522, NaN, 96.2833914053426,
94.4610778443114, 94.6575342465753, 94.4532488114105, 99.5024875621891,
92.375366568915, 96.4285714285714, 98.0952380952381, 96.9072164948454,
97.2027972027972, 98.1060606060606, 97.9779411764706, 98.3660130718954,
96.9072164948454, 95.5916473317865, 98.2203969883641, 97.3514211886305,
95.5904334828102, 94.8237394020527, 88.2456915598763, 88.370142577579
), punc = c(15.8536585365854, 23.1884057971014, 41.5730337078652,
0.641025641025641, 25, 20.7920792079208, 19.8412698412698, 22.5641025641026,
0, 31.9587628865979, 21.0526315789474, 43.4782608695652, 49.5867768595041,
10.5263157894737, 7.69230769230769, 9.09090909090909, 0, 40,
8.69565217391304, 0, 0, 7.31707317073171, 0.75187969924812, 1.96078431372549,
14.2857142857143, 25.1968503937008, 7.01754385964912, 50.6398537477148,
60.5584642233857, 73.558981233244), iyear = 1985:2014), .Names = c("pnattacks",
"ptra", "pdom", "punc", "iyear"), row.names = c(NA, 30L), class = "data.frame")
As suggested in the comments, tidyr::gather will get your data from wide to long format, which makes ggplot much easier.
I'd suggest color rather than linetype to distinguish the groups, but here it is with linetype:
library(tidyr)
library(ggplot2)
ystats %>%
gather(coding_success, percentage, -iyear) %>%
ggplot(aes(iyear, percentage)) +
geom_line(aes(linetype = coding_success), size = 1) +
scale_linetype_manual(values = c("F1", "dotted", "longdash", "solid")) +
labs(x = "Year", y = "percentage", title = "Coding Success")
Result:
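If you would rather keep the wide layout and add a line or two, a legend appears once linetype is mapped inside aes() instead of being set as a constant. Below is a minimal sketch on a five-row toy copy of ystats (values rounded from the question). The legend title "Series" and the use of the column names as labels are my own placeholders.

```r
library(ggplot2)

# Toy copy of the first five rows of two columns (rounded)
toy <- data.frame(
  iyear     = 1985:1989,
  pnattacks = c(96.7, 94.2, 95.4, 93.9, 94.6),
  punc      = c(15.9, 23.2, 41.6, 0.6, 25.0)
)

p <- ggplot(toy, aes(x = iyear)) +
  # The string inside aes() is a label, not a linetype; the scale below translates it
  geom_line(aes(y = pnattacks, linetype = "pnattacks")) +
  geom_line(aes(y = punc, linetype = "punc")) +
  scale_linetype_manual(name = "Series",
                        values = c(pnattacks = "dotted", punc = "solid")) +
  labs(title = "Coding Success", x = "Year", y = "Percentage") +
  theme_bw()

p
```

Because the values vector is named, each label keeps its linetype regardless of the order in which the layers are added, which helps when the same styling has to be reused across several figures.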
