Grouped Frequency Bars in R using ggplot

I'm trying to produce a bar graph with frequencies of multiple groups. I tried using geom_bar(), but I keep running into "Error: stat_count() must not be used with a y aesthetic." I have one row for each participant, with age (2 categories), condition (2 categories), and their performance (0 or 1). From what I read in the manual and pretty much everywhere online, if I use
bar<-ggplot(data, aes(age, performance, fill = condition)) + geom_bar(position = "dodge")
I should get what I want, but instead I get the error and I can't figure out what I'm missing. Isn't geom_bar() supposed to give counts by default? When I use stat="identity" I get full bars.
Please help! Any advice will be greatly appreciated.
EDITED:
Here's my actual data:
structure(list(ageyears = c(4L, 4L, 5L, 5L, 5L, 4L, 5L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 5L, 4L,
5L, 5L, 4L, 4L, 4L, 5L, 4L, 4L, 5L, 4L, 5L, 4L, 4L, 5L, 5L, 4L,
4L, 5L, 4L, 5L, 4L, 5L, 4L, 4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L, 4L, 4L, 5L, 5L, 4L, 5L, 5L, 4L, 4L,
5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L, 5L, 4L, 4L, 5L, 4L, 5L, 5L, 4L,
5L, 4L, 4L, 5L, 5L, 4L, 5L, 5L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 4L,
5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L, 5L, 5L, 4L, 5L, 4L, 5L, 4L, 5L,
4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 4L, 5L, 5L, 5L,
5L, 5L, 4L, 4L, 4L, 5L, 4L), MatrixLabels = structure(c(2L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0",
"1"), class = "factor"), Mat_sort_pass_fail = c(0L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L)), .Names = c("ageyears",
"MatrixLabels", "Mat_sort_pass_fail"), row.names = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 48L,
49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 60L, 61L, 62L,
63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 74L, 75L, 76L,
77L, 78L, 79L, 80L, 82L, 83L, 85L, 86L, 87L, 88L, 89L, 90L, 91L,
92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L,
104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L,
115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L,
126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L, 136L,
137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 145L, 146L, 147L,
148L, 149L, 150L, 151L, 152L, 153L, 154L, 155L, 156L, 157L, 158L,
159L, 160L, 197L, 198L, 200L, 201L, 202L, 203L, 204L, 205L, 206L,
207L), class = "data.frame")

From the documentation of geom_bar:
By default, geom_bar uses stat="count" which makes the height of the
bar proportional to the number of cases in each group (or if the weight
aesthetic is supplied, the sum of the weights). If you want the heights
of the bars to represent values in the data, use stat="identity" and
map a variable to the y aesthetic.
In your case the bar height should be the sum of performance per group. That sum is a value in the (summarized) data, so ggplot should use stat = "identity".
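The difference between the two stats can be sketched on a small example. The data frame `toy` and its values below are made up for illustration and are not the OP's data; the column names simply follow the question's description.

```r
library(ggplot2)

# Hypothetical toy data: one row per participant, as described in the question
toy <- data.frame(
  age = factor(c(4, 4, 4, 5, 5, 5, 5, 4)),
  condition = factor(c("0", "1", "1", "0", "1", "0", "1", "0")),
  performance = c(1, 0, 1, 1, 1, 0, 1, 0)
)

# stat = "count" (the default): map only x and fill, no y.
# ggplot2 counts the rows itself; keeping only the passes makes the
# bar height the number of passes per group.
p_count <- ggplot(subset(toy, performance == 1), aes(x = condition, fill = age)) +
  geom_bar(position = "dodge")

# stat = "identity": summarize first, then map the summary to y.
passes <- aggregate(performance ~ age + condition, data = toy, FUN = sum)
p_identity <- ggplot(passes, aes(x = condition, y = performance, fill = age)) +
  geom_bar(stat = "identity", position = "dodge")
```

Mapping a y aesthetic while leaving the default stat in place is what triggers the "stat_count() must not be used with a y aesthetic" error from the question.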
EDIT After OP pasted the dput:
You first need to summarize your data. I am assuming df1 is your raw data frame. Any summarizing tool will do; below I use base R aggregate and data.table, and you can pick either one:
###1. base R aggregate
df <- aggregate(Mat_sort_pass_fail ~ ageyears + MatrixLabels, data = df1, FUN = sum)
df$perc <- df$Mat_sort_pass_fail/sum(df$Mat_sort_pass_fail)
names(df) <- c("age","condition","performance","percentage")
###2. summarization using data.table
library(data.table)
dt <- as.data.table(df1) # copy of the raw data, so df1 itself is left untouched
dt1 <- dt[,list(Performance = sum(Mat_sort_pass_fail)),by=c("ageyears","MatrixLabels")]
dt1[,perc:=Performance/sum(Performance)] ## each group's share of all passes
df <- data.frame(dt1)
names(df) <- c("age","condition","performance","percentage")
library(ggplot2)
library(RColorBrewer)
ggplot(df, aes(x = condition ,y=performance)) +
geom_bar(aes(fill = factor(age)),stat="identity",position = "dodge") +
ggtitle("Matrix Sort Performance") +
scale_fill_brewer(palette = "Dark2")
###In case you need the percentage run the below code:
ggplot(df, aes(x = condition ,y=percentage)) +
geom_bar(aes(fill = factor(age)),stat="identity",position = "dodge") +
ggtitle("Matrix Sort Performance") +
scale_fill_brewer(palette = "Dark2")
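Note that the percentage column above is each group's share of all passes. If what is wanted is instead the pass rate within each age/condition cell, one possible sketch is to take the mean of the 0/1 column. The `toy` data frame and the `pass_rate` name are assumptions made for this example only.

```r
library(ggplot2)

# Hypothetical toy data with the same three kinds of columns as the question
toy <- data.frame(
  ageyears = c(4, 4, 4, 4, 5, 5, 5, 5),
  MatrixLabels = factor(c("0", "0", "1", "1", "0", "0", "1", "1")),
  Mat_sort_pass_fail = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# The mean of a 0/1 column is the proportion of 1s, i.e. the pass rate per cell
rate <- aggregate(Mat_sort_pass_fail ~ ageyears + MatrixLabels, data = toy, FUN = mean)
names(rate) <- c("age", "condition", "pass_rate")

# Plot the pass rate on a fixed 0-1 axis
p_rate <- ggplot(rate, aes(x = condition, y = pass_rate, fill = factor(age))) +
  geom_col(position = "dodge") +
  scale_y_continuous(limits = c(0, 1))
```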

Normally it calculates frequencies from your data. If your data is already grouped, try the following:
+ geom_bar(stat="identity",position = "dodge")

You can use geom_col() as an alias for geom_bar(stat = "identity").
I also think your aes mapping was wrong.
I mimicked some data based on the graph you posted:
df <- data.frame(age = factor(rep(4:5, each = 2), labels = c('4-Years-Olds', '5-Years-Olds')),
performance = c(48,37,65,65),
condition = factor(c(1,2,1,2), labels = c('No Label', 'Label')))
library(ggplot2)
ggplot(df) +
geom_col(aes(condition, performance, fill = age), position = 'dodge') +
scale_fill_manual(values = c('skyblue', 'darkolivegreen1'))

Related

reshape data frame and concatenating columns in R

I tried to reshape my data frame from wide to long format. At the moment the data frame looks like this:
structure(list(study_site = structure(c(5L, 5L, 5L, 5L, 5L, 5L,
5L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 2L, 3L, 1L, 4L, 1L,
4L, 3L, 3L, 3L, 1L, 3L, 5L, 4L, 4L, 4L, 3L, 3L, 5L, 5L, 4L, 4L,
4L, 1L, 4L, 3L, 5L, 5L, 5L, 1L, 3L, 5L, 3L, 3L, 3L, 5L, 5L, 3L,
4L, 2L), .Label = c("N", "no_nest", "O", "S", "W"), class = "factor"),
coords.N = structure(c(54L, 54L, 40L, 40L, 40L, 40L, 39L,
67L, 67L, 55L, 55L, 64L, 64L, 64L, 78L, 81L, 47L, 80L, 83L,
60L, 46L, 46L, 76L, 88L, 88L, 88L, 84L, 84L, 30L, 58L, 58L,
58L, 25L, 25L, 19L, 19L, 42L, 42L, 42L, 29L, 45L, 90L, 91L,
91L, 91L, 91L, 89L, 89L, 87L, 87L, 87L, 56L, 56L, 61L, 35L,
36L), .Label = c("40.40463", "48.40168", "48.40178", "48.40215",
"48.40235", "48.40309", "48.40390", "48.40393", "48.40396",
"48.40405", "48.40410", "48.40411", "48.40415", "48.40416",
"48.40424", "48.40425", "48.40430", "48.40435", "48.40436 ",
"48.40438", "48.40443", "48.40450", "48.40451", "48.40454",
"48.40455", "48.40459", "48.40460", "48.40461", "48.40466",
"48.40466 ", "48.40467", "48.40469", "48.40471", "48.40477",
"48.40479 ", "48.40481", "48.40482", "48.40483", "48.40488 ",
"48.40491", "48.40493", "48.40504 ", "48.40508", "48.40513",
"48.40515", "48.40519 ", "48.40522 ", "48.40523", "48.40525",
"48.40526", "48.40529", "48.40532", "48.40537", "48.40537 ",
"48.40538 ", "48.40543 ", "48.40549", "48.40549 ", "48.40557",
"48.40557 ", "48.40558", "48.40565", "48.40571", "48.40575",
"48.40580", "48.40584", "48.40586 ", "48.40591", "48.40596",
"48.40598", "48.40599", "48.40611", "48.40612", "48.40617",
"48.40626", "48.40632 ", "48.40633", "48.40635 ", "48.40636",
"48.40637", "48.40638 ", "48.40639", "48.40639 ", "48.40641 ",
"48.40652", "48.40655", "48.40656 ", "48.40657 ", "48.40687 ",
"48.40690 ", "48.40703", "48.40718", "48.40719", "48.40726",
"48.40742", "48.40748", "NO_DATA"), class = "factor"), coords.E = structure(c(67L,
67L, 49L, 49L, 49L, 49L, 27L, 67L, 67L, 70L, 70L, 68L, 68L,
68L, 87L, 94L, 68L, 83L, 90L, 73L, 52L, 52L, 2L, 95L, 95L,
95L, 93L, 93L, 32L, 69L, 69L, 69L, 55L, 55L, 24L, 24L, 29L,
29L, 29L, 30L, 48L, 85L, 1L, 1L, 1L, 1L, 78L, 78L, 79L, 79L,
79L, 64L, 64L, 63L, 66L, 45L), .Label = c(" 015.82024", " 015.82164",
"015.80237", "015.80263", "015.80309", "015.80341", "015.80369",
"015.80388", "015.80394", "015.80399", "015.80406", "015.80435",
"015.80436", "015.80466", "015.80512", "015.80517", "015.80548",
"015.80551", "015.80572", "015.80583", "015.80609", "015.80636",
"015.80659", "015.80703", "015.80723", "015.80779", "015.80795",
"015.80803", "015.80821", "015.80843", "015.80871", "015.80875",
"015.80888", "015.80897", "015.80901", "015.80903", "015.80905",
"015.80906", "015.80908", "015.80909", "015.80921", "015.80923",
"015.80929", "015.80939", "015.80993", "015.81007", "015.81018",
"015.81087", "015.81113", "015.81132", "015.81151", "015.81180",
"015.81241", "015.81273", "015.81305", "015.81406", "015.81422",
"015.81522", "015.81526", "015.81543", "015.81546", "015.81564",
"015.81628", "015.81632", "015.81678", "015.81682", "015.81700",
"015.81703", "015.81735", "015.81739", "015.81770", "015.81783",
"015.81784", "015.81800", "015.81849", "015.81992", "015.82012",
"015.82029", "015.82039", "015.82083", "015.82099", "015.82126",
"015.82180", "015.82230", "015.82232", "015.82255", "015.82265",
"015.82290", "015.82303", "015.82304", "015.82346", "015.82362",
"015.82376", "015.82398", "015.82451", "015.82500", "015.82519",
"015.82555", "015.82579", "015.82634", "NO_DATA"), class = "factor"),
study_ID = c(120L, 120L, 1L, 1L, 1L, 1L, 9L, 39L, 39L, 109L,
109L, 110L, 110L, 110L, 45L, 58L, 121L, 96L, 97L, 40L, 43L,
43L, 47L, 57L, 57L, 57L, 114L, 114L, 67L, 71L, 71L, 71L,
83L, 83L, 4L, 4L, 10L, 10L, 10L, 106L, 108L, 46L, 115L, 115L,
115L, 115L, 116L, 116L, 117L, 117L, 117L, 70L, 70L, 119L,
95L, 3L), species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L), .Label = c("barn swallow", "no_nest"), class = "factor"),
first_visit = c(1L, 2L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 5L, 0L,
1L, 0L, 2L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 2L, 0L, 3L, 0L, 0L, 0L, 4L,
1L, 8L, 0L, 0L, 1L, 2L, 1L, 5L, 0L, 0L, 1L, 0L, 1L, 1L, 0L
), second_visit = c(1L, 2L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 5L,
0L, 1L, 0L, 2L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 2L, 0L, 4L, 0L, 0L,
4L, 1L, 0L, 8L, 0L, 1L, 2L, 1L, 0L, 5L, 0L, 1L, 0L, 0L, 1L,
0L), third_visit = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
0L, 5L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 0L, 2L, 0L, 1L, 0L, 2L, 0L, 0L, 6L,
1L, 4L, 1L, 0L, 0L, 8L, 1L, 2L, 1L, 0L, 0L, 5L, 1L, 1L, 0L,
0L, 0L), used_1st_visit = c(0L, 2L, 1L, 0L, 0L, 0L, 1L, 0L,
1L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 3L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L), used_2nd_visit = c(0L, 2L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 2L, 0L,
4L, 0L, 0L, 0L, 1L, 0L, 5L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L), used_3rd_visit = c(0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 1L, 0L, 4L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 2L, 0L, 1L, 0L, 2L,
0L, 0L, 6L, 1L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 2L,
0L, 1L, 0L, 0L, 0L), nest_condition = structure(c(3L, 5L,
5L, 2L, 5L, 5L, 5L, 5L, 3L, 5L, 5L, 5L, 5L, 2L, 5L, 3L, 2L,
4L, 5L, 5L, 5L, 5L, 3L, 2L, 5L, 5L, 2L, 2L, 5L, 1L, 5L, 5L,
5L, 5L, 5L, 5L, 3L, 5L, 5L, 5L, 2L, 3L, 5L, 5L, 5L, 2L, 5L,
3L, 5L, 5L, 5L, 2L, 5L, 3L, 5L, 4L), .Label = c(" ready ",
"damaged", "in_progress", "no_nest", "ready"), class = "factor"),
nesting_site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L), .Label = c("inside", "no_nest", "outside"), class = "factor"),
distance = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 4L, 3L, 6L, 4L, 4L, 2L, 2L, 4L, 2L,
2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
4L, 6L), .Label = c("1", "2", "3", "4", "no_data", "no_nest"
), class = "factor"), material = structure(c(5L, 5L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 5L, 5L, 1L, 1L, 2L), .Label = c("fine", "fine plaster",
"medium fine plaster", "no_data", "rough", "rough plaster",
"smooth plaster", "under construction", "wood"), class = "factor"),
housetype = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 5L, 4L, 4L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 5L, 5L, 5L,
4L, 4L, 5L, 3L, 3L, 3L, 3L, 5L, 5L, 3L, 3L, 3L, 4L, 4L, 5L,
5L, 4L), .Label = c("auto repair shop", "barn ", "hall",
"residence", "stable"), class = "factor"), usage_house = structure(c(5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L), .Label = c("auto_repair",
"barn", "inhabited", "under construction", "used"), class = "factor"),
age = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)), row.names = c(1L,
2L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 21L, 22L, 23L, 24L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 89L, 90L, 91L, 92L, 93L, 94L,
95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L,
106L, 107L, 108L, 109L, 111L), class = "data.frame")
The used_1st_visit, used_2nd_visit, ... columns give the number of nests the birds had used at the first, second, ... visit.
I would like each row of my data frame to represent a single used or unused nest, as well as no_nest:
ID species `1st_visit` `2nd_visit` `3rd_visit` used_1st_visit used_2nd_visit used_3rd_visit
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 120 barn s~ 1 1 0 0 0 0
2 120 barn s~ 1 1 0 1 1 0
3 120 barn s~ 1 1 0 1 1 0
4 39 barn s~ 1 1 1 1 1 1
5 8 barn s~ 1 1 1 1 0 0
6 8 barn s~ 1 1 1 0 0 0
Unfortunately, I have no idea how to concatenate the columns to get the final data frame.
Does anybody have an idea?
I'm not completely sure what you are asking for, but this is what I understood: In the long data frame...
If all visits (columns used_first_visit, used_sec_visit, etc.) are 0, combine them into one row and mark it 0.
If any visits are not 0, keep as many rows as there are non-zero visits and mark them with 1.
This is my dplyr-solution (it's not very pretty, but it works):
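# load packages: dplyr for the pipeline verbs, tidyr for pivot_longer()
library(dplyr)
library(tidyr)
# create data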
dat <- data.frame("visits" = c("first", "first", "second", "second", "third", "third"), "study_id" = rep(120, 6), "used_first_visit" = c(0, 2, 0, 2, 0, 2), "used_sec_visit" = c(0, 2, 0, 2, 0, 2), "used_thrd_visit" = rep(0, 6), "nest_cond" = c("damaged", "ready", "damaged", "ready", "damaged", "ready"))
# make long data frame and filter values
dat_long <- dat %>%
pivot_longer(c(3:5),names_to = "whatever", values_to = "used") %>% # make long data frame
select(-c(whatever)) %>% # get rid of name column
group_by(visits, nest_cond) %>% # group data
mutate(used = ifelse(all(used == 0) & row_number() == 1, 10, used)) %>% # if the whole group is 0, mark one row for later filtering
filter(used > 0 ) %>% # filter
mutate(used = ifelse(used == 10, 0, 1)) # change to correct numbers
Let me know if this is not what you are looking for!

ggplot2 select categories for bar chart and create labels

I am trying make bar chart with ggplot2 with the dataset below. When I use the code
ggplot(p.data, aes(x = `Period Number`, y = `Total Jumps`)) +
stat_summary(data = subset(p.data, Status = "Starter"), fun ="mean", geom = "bar")
I get this graph:
The most concerning aspect is that for periods 2, 3, 4, and 5 the bars should be taller (period 2 should be around 9.9). Additionally, I would like to remove period 0 and period 1 and add bar labels from the raw data, without creating an additional data frame.
p.data <- structure(list(`Period Number` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
`Total Jumps` = c(112L, 97L, 28L, 132L, 162L, 19L, 92L, 112L,
97L, 141L, 68L, 86L, 76L, 26L, 105L, 125L, 19L, 92L, 112L,
64L, 101L, 68L, 4L, 8L, 0L, 8L, 12L, 0L, 0L, 0L, 13L, 8L,
0L, 8L, 2L, 2L, 5L, 12L, 0L, 0L, 0L, 5L, 11L, 0L, 0L, 6L,
0L, 9L, 8L, 0L, 0L, 0L, 7L, 10L, 0L, 14L, 5L, 0L, 5L, 5L,
0L, 0L, 0L, 8L, 11L, 0L, 108L, 131L, 47L, 136L, 159L, 35L,
114L, 116L, 111L, 190L, 64L, 75L, 95L, 47L, 116L, 123L, 27L,
103L, 108L, 70L, 152L, 64L, 4L, 7L, 0L, 14L, 10L, 0L, 0L,
0L, 15L, 10L, 0L, 4L, 0L, 0L, 3L, 7L, 7L, 8L, 8L, 5L, 10L,
0L, 7L, 14L, 0L, 3L, 10L, 1L, 0L, 0L, 11L, 7L, 0L, 18L, 15L,
0L, 0L, 9L, 0L, 3L, 0L, 10L, 11L, 0L, 118L, 96L, 48L, 143L,
170L, 37L, 118L, 117L, 116L, 165L, 56L, 80L, 68L, 48L, 114L,
130L, 36L, 114L, 107L, 80L, 123L, 56L, 2L, 10L, 0L, 8L, 11L,
0L, 0L, 0L, 5L, 9L, 0L, 4L, 12L, 0L, 6L, 5L, 0L, 4L, 8L,
12L, 8L, 0L, 7L, 4L, 0L, 10L, 10L, 0L, 0L, 0L, 12L, 13L,
0L, 25L, 2L, 0L, 5L, 14L, 1L, 0L, 2L, 7L, 12L, 0L), Status = structure(c(1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L), .Label = c("Bench", "Starter"), class = "factor")), row.names = c(NA,
198L), class = "data.frame")
Thank you for your help!
It's best to pass the data you actually want to plot to the plotting function, rather than trying to coerce it within the plotting function. In this case you were trying to subset a different data frame from the one you passed to ggplot inside stat_summary. The call to ggplot had already set up the aesthetics you wanted mapped; then, in your only geom layer, you were telling ggplot you wanted a completely different set of aesthetics.
You don't need to create another data frame to reshape your data. Here's how you could do it using dplyr:
library(dplyr)
library(ggplot2)
p.data %>%
filter(Status == "Starter") %>%
group_by(`Period Number`) %>%
summarise(`Total Jumps` = mean(`Total Jumps`)) %>%
filter(`Period Number` > 1) %>%
ggplot(aes(x = `Period Number`, y = `Total Jumps`)) +
geom_col(fill = "dodgerblue", colour = "black") +
geom_text(aes(y = `Total Jumps` + 1, label = signif(`Total Jumps`, 2)))
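A likely reason the bars in the question came out too low is the single `=` in `subset(p.data, Status = "Starter")`. The sketch below uses a made-up data frame `toy` (not p.data) to show how that call behaves compared with `==`.

```r
# Hypothetical toy data with a Status factor like the one in p.data
toy <- data.frame(
  Status = factor(c("Bench", "Starter", "Bench", "Starter")),
  jumps = c(0, 10, 2, 8)
)

# A single `=` passes a named argument called Status instead of a condition.
# No row filter is applied, so every row comes back. Some R versions warn
# that the extra argument is disregarded; the warning is suppressed here.
not_filtered <- suppressWarnings(subset(toy, Status = "Starter"))

# `==` is the comparison operator, so only the Starter rows are kept.
filtered <- subset(toy, Status == "Starter")

nrow(not_filtered)        # 4
nrow(filtered)            # 2
mean(not_filtered$jumps)  # 5
mean(filtered$jumps)      # 9
```

With the unfiltered rows included, the Bench values pull the mean down, which would be consistent with the shorter bars described in the question.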

How can I remove the legend from this boxplot in ggplot? [duplicate]

Please find My Data below.
How can I remove the red, encircled legend from my boxplot?
I wish to keep the same colors and design. I have tried numerous different solutions, but unfortunately none of them solved the problem.
This might be kind of basic, but I simply can't figure out how to solve it. I hope you can help. Thanks in advance!
My script is:
df <- data.frame(x = as.factor(c(p$WHO.Grade)),
y = c(p$ki67pro),
f = rep(c("Ki67pro"), c(nrow(p))))
ggplot(df) +
geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
scale_x_discrete(name = "", label=c("WHO-I\nn=108","WHO-II\nn=34","WHO-III\nn=1")) +
scale_y_continuous(name="Ki-67 proliferative index", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#52C1C76D"), name = "",
labels = c("\nTotal cohort\nn=159\n ")) +
scale_colour_manual(values = c("#51BFC4"), name = "",
labels = c("\nTotal cohort\nn=159\n "))
And My Data
p <- structure(list(WHO.Grade = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), ki67pro = c(1L, 12L, 3L, 3L, 5L,
3L, 20L, 25L, 7L, 4L, 5L, 12L, 3L, 15L, 4L, 5L, 7L, 8L, 3L, 12L,
10L, 4L, 10L, 7L, 3L, 2L, 3L, 7L, 4L, 7L, 10L, 4L, 5L, 5L, 3L,
5L, 2L, 5L, 3L, 3L, 3L, 4L, 4L, 3L, 2L, 5L, 1L, 5L, 2L, 3L, 1L,
2L, 3L, 3L, 5L, 4L, 20L, 5L, 0L, 4L, 3L, 0L, 3L, 4L, 1L, 2L,
20L, 2L, 3L, 5L, 4L, 8L, 1L, 4L, 5L, 4L, 3L, 6L, 12L, 3L, 4L,
4L, 2L, 5L, 3L, 3L, 3L, 2L, 5L, 4L, 2L, 3L, 4L, 3L, 3L, 2L, 2L,
4L, 7L, 4L, 3L, 4L, 2L, 3L, 6L, 2L, 3L, 10L, 5L, 10L, 3L, 10L,
3L, 4L, 5L, 2L, 4L, 3L, 4L, 4L, 4L, 5L, 3L, 12L, 5L, 4L, 3L,
2L, 4L, 3L, 4L, 2L, 1L, 6L, 1L, 4L, 12L, 3L, 4L, 3L, 2L, 6L,
5L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, 4L, 1L, 3L, 3L, 4L, 0L,
3L)), class = "data.frame", row.names = c(1L, 2L, 3L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L,
38L, 39L, 40L, 41L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L,
53L, 54L, 55L, 57L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L,
81L, 82L, 83L, 84L, 85L, 87L, 89L, 90L, 91L, 92L, 93L, 94L, 96L,
97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, 106L, 107L,
109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L,
120L, 121L, 123L, 124L, 125L, 126L, 127L, 128L, 130L, 131L, 132L,
133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L,
144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L,
155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L,
166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L))
You can use theme() as follows:
... + theme(legend.position = "none")
This should remove the legend from the plot.
reference: https://www.datanovia.com/en/blog/ggplot-legend-title-position-and-labels/
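For comparison, here is a sketch of three ways ggplot2 offers to hide a legend. The toy `df` below is invented and only mimics the shape of the question's data. `guides(fill = "none")` assumes a reasonably recent ggplot2; older releases used `FALSE` instead of `"none"`.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's df
df <- data.frame(
  x = factor(rep(1:3, each = 4)),
  y = c(1, 3, 5, 7, 2, 4, 6, 8, 3, 5, 7, 9),
  f = "Ki67pro"
)

base <- ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f))

# Option 1: drop every legend through the theme
p1 <- base + theme(legend.position = "none")

# Option 2: drop only the guides for specific aesthetics
p2 <- base + guides(fill = "none", colour = "none")

# Option 3: keep a single layer out of the legend
p3 <- ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), show.legend = FALSE)
```

Option 3 only affects the layer it is set on, so in a plot with several layers mapped to fill or colour it would have to be repeated for each of them.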

Linear function for condition1 and cubic function for condition2 in one plot

I have data of participants that had numerous trials, where certain trials had one condition, and other trials were another.
My analyses show that for condition 1, there is a linear null effect (flat line), while for condition 2 there is a cubic effect. I want to plot them together.
The code below creates a plot that gives the cubic function for both groups:
ggplot(dat, aes(x=trial, y=y, group=condition, colour=condition)) +
geom_point() + geom_jitter(height=0.2) +
geom_smooth(alpha=0.1, method="lm", formula = y ~ poly(x,3, raw=TRUE)) +
labs(x="Trial", y="y") +
scale_x_discrete(breaks=c(1,9,18,27,36,45,54,63))
What I want is a linear function for condition 1 instead of the cubic one. I tried to force this through aes() calls within geom_smooth(), but this seems to give me a much flatter cubic function for condition 1:
ggplot(dat, aes(x=trial, y=y)) +
geom_point(aes(group=condition, colour=condition)) + geom_jitter(height=0.2, aes(group=condition, colour=condition)) +
geom_smooth(alpha=0.1, method="lm", formula = y ~ poly(x,3, raw=TRUE), aes(group=(condition="1"), colour=(condition="1"))) +
geom_smooth(alpha=0.1, method="lm", aes(group=(condition="2"), colour=(condition="2"))) +
labs(x="Trial", y="y") +
scale_x_discrete(breaks=c(1,9,18,27,36,45,54,63))
Obviously this is not the way to go. How would I accomplish this? Script for reproducible example (first 250 lines of the total dataset, so your figures will be different) below:
structure(list(id = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), trial = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L,
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L,
52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L,
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L,
58L, 59L, 60L, 61L, 62L, 63L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L,
48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L,
61L), condition = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
y = c(NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L,
1L, 0L, 1L, 1L, 1L, NA, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, NA, NA, NA, 0L, NA, 0L, NA, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, NA, 0L, 0L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, NA, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, NA,
0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, NA, NA, 1L, 1L,
1L, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L)), .Names = c("id",
"trial", "condition", "y"), row.names = c(NA, 250L), class = "data.frame")
Edit: The reason I'm not using geom_smooth() using gam or loess, is because there are multiple polynomials in condition 1, so it will show more than just the cubic function if I use that solution. I wish to show the cubic function, not the composite of multiple polynomials.
You could filter your data inside geom_smooth.
library(tidyverse)
ggplot(dat, aes(x=trial, y=y, colour=as.factor(condition))) +
geom_point() + geom_jitter(height=0.2) +
geom_smooth(data = filter(dat, condition == 2), alpha=0.1, method="lm", formula = y ~ poly(x,3, raw=TRUE)) +
geom_smooth(data = filter(dat, condition == 1), alpha=0.1, method="lm", formula = y ~ 1) +
labs(x="Trial", y="y") +
scale_x_continuous(breaks=c(1,9,18,27,36,45,54,63))
Which gives you this plot
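To see what the two smoothing layers fit, the corresponding lm() calls can be run directly. This is only a sketch on simulated data; `toy` and its values are made up and will not reproduce the OP's figures.

```r
set.seed(42)

# Hypothetical simulated data: 63 trials, a binary outcome, two conditions
toy <- data.frame(
  trial = rep(1:63, times = 2),
  condition = rep(c(1, 2), each = 63)
)
toy$y <- rbinom(nrow(toy), size = 1, prob = 0.7)

# Model behind geom_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE))
fit_cubic <- lm(y ~ poly(trial, 3, raw = TRUE), data = subset(toy, condition == 2))

# Model behind geom_smooth(method = "lm", formula = y ~ 1): an intercept only,
# i.e. a horizontal line at the mean of y
fit_flat <- lm(y ~ 1, data = subset(toy, condition == 1))

length(coef(fit_cubic))  # 4: intercept plus linear, quadratic and cubic terms
coef(fit_flat)           # a single intercept equal to the mean of y
```

If a sloped straight line is wanted for condition 1 rather than a flat one, `formula = y ~ x` would be the linear fit.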

Calculating Variance Inflation Factors (VIFs) based on object type in R

There seem to be two popular ways of calculating VIFs (Variance Inflation Factors, to detect collinearity among variables in regression) in R:
The vif() function in the car package, where the input is the model. This requires you to first fit a model before you can check for VIFs among variables in the model.
The corvif() function, where the input is the set of candidate explanatory variables themselves (i.e. a list of variables, before the model is even fitted). This function is part of the AED package (Zuur et al. 2009), which has been discontinued. It seems to work only on a list of variables, not on a fitted regression model.
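For purely numeric predictors, the VIF can also be computed from the variables alone in base R, which may help when comparing the two approaches. This is a sketch on simulated data; the predictors x1, x2 and x3 are hypothetical, and it is not the implementation of either vif() or corvif().

```r
set.seed(1)

# Hypothetical numeric predictors; x2 is deliberately correlated with x1
n <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors
vif_manual <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(X)))

vif_manual
vif_from_cor
```

Neither version needs a response variable, since the VIF depends only on how the predictors relate to each other. Factor predictors are not handled by this sketch.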
Here is a data example:
MyData<-structure(list(site = structure(c(3L, 1L, 5L, 1L, 2L, 3L, 2L,
4L, 1L, 2L, 2L, 3L, 4L, 3L, 2L, 2L, 4L, 1L, 1L, 3L, 3L, 1L, 4L,
3L, 1L, 3L, 4L, 5L, 1L, 3L, 1L, 2L, 4L, 2L, 1L, 1L, 5L, 3L, 1L,
3L, 4L, 3L, 1L, 4L, 4L, 2L, 5L, 2L, 1L, 4L, 1L, 1L, 1L, 4L, 4L,
3L, 5L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 4L, 5L, 1L, 5L, 1L, 4L, 1L,
4L, 1L, 2L, 5L, 2L, 3L, 1L, 5L, 4L, 1L, 1L, 3L, 2L, 1L, 3L, 5L,
3L, 3L, 5L, 2L, 1L, 3L, 5L, 4L, 5L, 5L, 1L, 3L, 2L, 5L, 4L, 3L,
3L, 2L, 5L, 2L, 1L, 1L, 3L, 3L, 5L, 5L, 5L, 3L, 1L, 1L, 5L, 5L,
5L, 2L, 3L, 5L, 1L, 3L, 3L, 4L, 4L, 4L, 5L, 2L, 3L, 1L, 4L, 2L,
4L, 3L, 4L, 3L, 3L, 4L, 1L, 3L, 4L, 1L, 4L, 4L, 5L, 4L, 4L, 1L,
4L, 1L, 2L, 1L, 2L, 4L, 2L, 4L, 3L, 5L, 1L, 2L, 3L, 1L, 1L, 4L,
3L, 1L, 1L, 1L, 4L, 3L, 5L, 4L, 2L, 1L, 4L, 1L, 2L, 1L, 1L, 5L,
1L, 5L, 3L, 1L, 5L, 3L, 5L, 3L, 5L, 3L, 1L, 5L, 1L, 1L, 1L, 3L,
1L, 4L, 4L, 2L, 5L, 4L, 1L, 3L, 2L, 4L, 5L, 4L, 5L, 5L, 3L, 2L,
2L, 4L, 2L, 5L, 4L, 1L, 5L, 5L, 4L, 4L, 3L, 1L, 3L, 4L, 4L, 1L,
1L, 1L, 3L, 3L, 1L, 1L, 3L, 4L, 4L, 1L, 5L, 3L, 5L, 5L, 3L, 5L,
5L, 1L, 4L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 4L, 3L, 3L, 4L, 3L,
4L, 3L, 3L, 4L, 1L, 5L, 4L, 3L, 1L, 2L, 2L, 5L, 1L, 3L, 3L, 4L,
1L, 4L, 3L, 1L, 2L, 5L, 5L, 4L, 1L, 3L, 4L, 4L, 3L, 5L, 4L, 5L,
2L, 5L, 4L, 2L, 5L, 1L, 2L, 4L, 1L, 5L, 3L, 5L, 4L, 1L, 4L, 4L,
2L, 3L, 5L, 4L, 3L, 4L, 2L, 1L, 1L, 5L, 3L, 3L, 1L, 3L, 1L, 3L,
3L, 5L, 2L, 4L, 3L, 1L, 1L, 4L, 4L, 3L, 3L, 3L, 4L, 5L, 1L, 5L,
3L, 3L, 1L, 1L, 3L, 2L, 5L, 1L, 3L, 1L, 5L, 3L, 4L, 4L, 2L, 1L,
2L, 4L, 1L, 4L, 4L, 3L, 3L, 5L, 3L, 2L, 2L, 4L, 2L, 1L, 1L, 3L,
3L, 4L, 3L, 1L, 4L, 2L, 1L, 2L, 4L, 3L, 4L, 1L, 1L, 4L, 4L, 3L,
5L, 1L), .Label = c("R1a", "R1b", "R2", "Za", "Zb"), class = "factor"),
species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
3L, 4L, 3L, 1L, 4L, 3L, 4L, 1L, 4L, 3L, 3L, 4L, 1L, 1L, 1L,
2L, 4L, 1L, 2L, 1L, 3L, 1L, 4L, 3L, 3L, 2L, 2L, 4L, 1L, 1L,
3L, 2L, 4L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L,
1L, 3L, 1L, 1L, 3L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
1L, 3L, 2L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L,
3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 4L, 1L, 1L,
1L, 4L, 1L, 1L, 4L, 1L, 1L, 4L, 1L, 1L, 1L, 3L, 3L, 1L, 1L,
1L, 4L, 1L, 1L, 1L, 1L, 4L, 3L, 2L, 1L, 3L, 1L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 3L,
3L, 1L, 4L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 1L,
1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 4L, 3L, 3L, 1L, 1L,
1L, 4L, 1L, 3L, 4L, 1L, 3L, 4L, 3L, 3L, 3L, 3L, 1L, 3L, 2L,
3L, 3L, 4L, 3L, 1L, 2L, 1L, 1L, 2L, 3L, 4L, 3L, 1L, 1L, 4L,
1L, 1L, 1L, 4L, 1L, 2L, 1L, 1L, 3L, 4L, 4L, 1L, 3L, 1L, 3L,
3L, 1L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 3L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L, 1L, 3L, 1L, 1L, 4L, 1L,
3L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 2L, 3L, 1L, 3L, 1L,
1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 1L, 1L,
1L, 4L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 2L, 4L, 1L, 1L, 4L,
1L, 1L, 3L, 4L, 1L, 1L, 4L, 2L, 3L, 3L, 1L, 1L, 1L, 3L, 1L,
3L, 1L, 3L, 4L, 4L, 1L, 3L, 1L, 3L, 1L, 4L, 1L, 1L, 1L, 4L,
1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 2L, 3L, 3L, 3L, 1L,
3L, 1L, 1L, 4L, 2L, 3L, 1L, 4L, 1L, 1L, 3L, 1L, 4L, 1L, 1L,
3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 1L,
4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Monogyna",
"Other", "Prunus", "Rosa"), class = "factor"), aspect = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L,
3L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 4L, 3L, 3L, 4L,
4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 2L, 4L,
4L, 2L, 4L, 1L, 1L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 4L, 4L, 4L,
2L, 4L, 1L, 3L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 1L, 4L, 1L, 4L,
4L, 4L, 1L, 3L, 3L, 1L, 4L, 3L, 4L, 4L, 3L, 4L, 5L, 4L, 4L,
4L, 4L, 4L, 3L, 2L, 4L, 2L, 1L, 2L, 4L, 4L, 4L, 4L, 1L, 4L,
4L, 1L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L, 4L, 3L, 4L, 3L,
5L, 3L, 2L, 4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 4L, 3L, 3L, 4L,
3L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 1L, 4L, 4L, 4L, 4L, 4L,
3L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 1L, 4L, 4L, 3L, 4L, 4L, 4L,
4L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 4L, 3L, 4L, 3L, 4L, 3L,
4L, 4L, 2L, 4L, 4L, 3L, 4L, 1L, 3L, 4L, 4L, 4L, 3L, 3L, 3L,
4L, 3L, 3L, 3L, 4L, 4L, 4L, 2L, 5L, 4L, 4L, 3L, 3L, 3L, 4L,
4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 4L, 3L, 2L,
4L, 4L, 4L, 1L, 4L, 3L, 3L, 3L, 4L, 3L, 2L, 4L, 4L, 4L, 4L,
3L, 4L, 4L, 3L, 3L, 1L, 4L, 3L, 1L, 4L, 4L, 3L, 4L, 4L, 4L,
4L, 3L, 4L, 1L, 4L, 1L, 3L, 4L, 3L, 3L, 4L, 2L, 4L, 3L, 4L,
3L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 2L, 3L, 4L, 4L, 3L,
2L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L, 1L, 4L, 2L, 4L,
4L, 4L, 4L, 1L, 4L, 5L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 3L,
4L, 3L, 3L, 3L, 4L, 3L, 2L, 4L, 4L, 3L, 4L, 4L, 4L, 5L, 1L,
3L, 2L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 3L, 4L,
4L, 4L), .Label = c("East", "Flat", "North", "South", "West"
), class = "factor"), height = c(515L, 60L, 60L, 30L, 70L,
70L, 40L, 70L, 50L, 75L, 160L, 85L, 40L, 90L, 70L, 210L,
30L, 60L, 45L, 60L, 410L, 50L, 40L, 210L, 140L, 120L, 70L,
35L, 30L, 90L, 40L, 240L, 40L, 55L, 120L, 200L, 65L, 40L,
95L, 140L, 220L, 70L, 40L, 30L, 50L, 95L, 50L, 50L, 50L,
70L, 160L, 45L, 35L, 50L, 70L, 230L, 110L, 300L, 50L, 105L,
60L, 50L, 60L, 70L, 30L, 60L, 30L, 110L, 80L, 80L, 30L, 60L,
70L, 80L, 60L, 40L, 220L, 140L, 110L, 40L, 40L, 40L, 90L,
125L, 90L, 100L, 270L, 420L, 60L, 70L, 53L, 40L, 80L, 90L,
30L, 40L, 65L, 40L, 110L, 90L, 40L, 190L, 110L, 70L, 52L,
120L, 95L, 50L, 50L, 140L, 75L, 30L, 50L, 60L, 125L, 60L,
80L, 35L, 55L, 140L, 140L, 240L, 65L, 40L, 200L, 80L, 60L,
65L, 120L, 80L, 230L, 150L, 40L, 50L, 60L, 210L, 50L, 130L,
140L, 210L, 60L, 50L, 90L, 120L, 55L, 50L, 20L, 50L, 40L,
70L, 40L, 100L, 80L, 85L, 60L, 50L, 20L, 200L, 40L, 70L,
50L, 200L, 60L, 43L, 30L, 60L, 40L, 70L, 40L, 40L, 40L, 50L,
110L, 70L, 30L, 50L, 85L, 70L, 40L, 100L, 40L, 50L, 100L,
40L, 70L, 40L, 40L, 50L, 210L, 50L, 140L, 80L, 75L, 90L,
40L, 50L, 60L, 50L, 80L, 50L, 60L, 40L, 60L, 170L, 60L, 80L,
80L, 15L, 40L, 70L, 45L, 45L, 45L, 110L, 200L, 30L, 60L,
40L, 60L, 160L, 40L, 90L, 80L, 30L, 40L, 270L, 50L, 50L,
60L, 60L, 50L, 30L, 70L, 170L, 50L, 30L, 50L, 60L, 40L, 60L,
60L, 140L, 80L, 80L, 220L, 45L, 80L, 130L, 50L, 40L, 220L,
40L, 70L, 60L, 80L, 50L, 200L, 115L, 50L, 90L, 400L, 50L,
360L, 40L, 60L, 60L, 65L, 100L, 50L, 55L, 60L, 50L, 130L,
40L, 130L, 40L, 40L, 120L, 66L, 55L, 100L, 75L, 60L, 80L,
60L, 90L, 160L, 50L, 210L, 35L, 60L, 40L, 55L, 50L, 90L,
220L, 60L, 120L, 62L, 60L, 40L, 60L, 70L, 60L, 90L, 50L,
50L, 30L, 110L, 70L, 80L, 90L, 210L, 70L, 65L, 160L, 100L,
25L, 55L, 40L, 60L, 110L, 70L, 50L, 60L, 70L, 60L, 60L, 170L,
45L, 60L, 120L, 40L, 60L, 130L, 40L, 170L, 50L, 80L, 60L,
150L, 90L, 60L, 120L, 120L, 80L, 30L, 110L, 230L, 190L, 70L,
110L, 50L, 60L, 82L, 60L, 30L, 60L, 200L, 90L, 30L, 140L,
60L, 70L, 70L, 100L, 60L, 415L, 115L, 90L, 60L, 60L, 80L,
60L, 55L, 90L, 65L, 60L, 40L, 40L, 90L, 50L, 70L, 70L, 120L,
40L, 50L, 110L, 45L, 30L, 95L, 30L, 70L), width = c(310L,
50L, 40L, 30L, 60L, 70L, 20L, 80L, 70L, 20L, 220L, 40L, 60L,
30L, 230L, 110L, 20L, 40L, 25L, 60L, 240L, 90L, 30L, 130L,
120L, 110L, 60L, 70L, 30L, 110L, 30L, 180L, 20L, 80L, 110L,
310L, 40L, 10L, 80L, 160L, 134L, 30L, 20L, 40L, 20L, 230L,
100L, 180L, 40L, 120L, 130L, 30L, 40L, 100L, 30L, 180L, 70L,
110L, 170L, 40L, 30L, 50L, 30L, 40L, 30L, 50L, 80L, 50L,
80L, 90L, 70L, 70L, 190L, 60L, 50L, 30L, 150L, 150L, 50L,
80L, 30L, 40L, 130L, 390L, 60L, 130L, 400L, 200L, 110L, 30L,
15L, 300L, 70L, 140L, 30L, 50L, 30L, 40L, 110L, 240L, 50L,
90L, 70L, 20L, 40L, 100L, 50L, 30L, 30L, 130L, 40L, 70L,
70L, 60L, 10L, 30L, 60L, 50L, 40L, 120L, 90L, 210L, 50L,
20L, 100L, 100L, 110L, 100L, 100L, 80L, 120L, 80L, 5L, 40L,
50L, 60L, 15L, 100L, 120L, 200L, 30L, 80L, 60L, 70L, 30L,
30L, 20L, 50L, 50L, 60L, 15L, 80L, 60L, 130L, 40L, 60L, 30L,
100L, 20L, 130L, 60L, 120L, 70L, 20L, 60L, 20L, 40L, 50L,
15L, 120L, 60L, 50L, 300L, 40L, 30L, 25L, 70L, 130L, 30L,
50L, 60L, 50L, 50L, 50L, 20L, 30L, 70L, 35L, 180L, 40L, 50L,
70L, 40L, 70L, 50L, 20L, 40L, 40L, 40L, 40L, 50L, 20L, 30L,
180L, 30L, 130L, 30L, 15L, 25L, 50L, 40L, 40L, 40L, 50L,
170L, 20L, 50L, 20L, 50L, 110L, 30L, 90L, 15L, 50L, 40L,
150L, 30L, 30L, 30L, 20L, 40L, 20L, 100L, 60L, 40L, 30L,
30L, 140L, 40L, 50L, 120L, 150L, 100L, 70L, 300L, 30L, 60L,
120L, 30L, 50L, 100L, 60L, 90L, 50L, 40L, 140L, 130L, 60L,
60L, 70L, 200L, 30L, 40L, 50L, 20L, 20L, 20L, 80L, 35L, 70L,
15L, 40L, 360L, 70L, 50L, 50L, 30L, 110L, 30L, 30L, 90L,
50L, 30L, 70L, 40L, 110L, 70L, 40L, 150L, 100L, 40L, 40L,
40L, 20L, 250L, 180L, 40L, 60L, 20L, 120L, 40L, 50L, 60L,
260L, 110L, 30L, 30L, 40L, 100L, 50L, 50L, 100L, 150L, 190L,
70L, 110L, 50L, 10L, 40L, 50L, 60L, 80L, 30L, 20L, 150L,
70L, 25L, 30L, 40L, 50L, 30L, 50L, 210L, 40L, 100L, 30L,
80L, 20L, 30L, 70L, 130L, 60L, 50L, 50L, 70L, 50L, 30L, 150L,
130L, 110L, 50L, 40L, 80L, 90L, 40L, 40L, 40L, 40L, 200L,
140L, 40L, 25L, 50L, 50L, 40L, 20L, 40L, 340L, 70L, 60L,
50L, 20L, 80L, 60L, 25L, 260L, 20L, 15L, 40L, 30L, 300L,
120L, 60L, 100L, 50L, 40L, 20L, 90L, 50L, 40L, 80L, 30L,
40L), length = c(450L, 80L, 55L, 50L, 90L, 90L, 30L, 90L,
90L, 30L, 240L, 50L, 70L, 40L, 380L, 200L, 40L, 40L, 35L,
110L, 250L, 120L, 70L, 150L, 130L, 140L, 90L, 90L, 40L, 390L,
40L, 190L, 40L, 110L, 140L, 360L, 50L, 30L, 130L, 500L, 200L,
30L, 25L, 60L, 30L, 350L, 110L, 180L, 70L, 180L, 200L, 40L,
70L, 110L, 70L, 180L, 90L, 150L, 400L, 100L, 60L, 70L, 70L,
60L, 30L, 50L, 80L, 180L, 110L, 100L, 110L, 110L, 210L, 80L,
70L, 40L, 500L, 210L, 50L, 80L, 40L, 50L, 350L, 400L, 150L,
200L, 400L, 280L, 240L, 40L, 50L, 360L, 140L, 140L, 50L,
50L, 40L, 50L, 210L, 370L, 70L, 110L, 80L, 50L, 50L, 100L,
80L, 50L, 35L, 140L, 60L, 90L, 110L, 60L, 130L, 180L, 70L,
70L, 40L, 230L, 130L, 290L, 90L, 40L, 100L, 100L, 120L, 150L,
110L, 80L, 220L, 90L, 5L, 50L, 50L, 60L, 30L, 150L, 120L,
200L, 60L, 170L, 80L, 90L, 40L, 50L, 70L, 50L, 60L, 100L,
15L, 90L, 70L, 150L, 60L, 90L, 50L, 120L, 20L, 220L, 80L,
140L, 120L, 30L, 60L, 40L, 40L, 70L, 30L, 180L, 60L, 110L,
300L, 50L, 60L, 50L, 110L, 160L, 40L, 70L, 70L, 60L, 70L,
50L, 25L, 30L, 215L, 70L, 220L, 70L, 80L, 90L, 60L, 130L,
60L, 20L, 60L, 50L, 40L, 60L, 100L, 40L, 70L, 210L, 40L,
500L, 40L, 30L, 50L, 80L, 40L, 60L, 80L, 50L, 220L, 20L,
70L, 50L, 50L, 180L, 50L, 90L, 15L, 120L, 80L, 170L, 30L,
30L, 60L, 20L, 60L, 30L, 140L, 80L, 40L, 50L, 40L, 200L,
80L, 80L, 120L, 160L, 210L, 120L, 400L, 60L, 60L, 180L, 70L,
70L, 150L, 70L, 110L, 70L, 80L, 250L, 140L, 90L, 60L, 180L,
400L, 60L, 50L, 60L, 40L, 30L, 50L, 100L, 40L, 110L, 30L,
80L, 400L, 70L, 50L, 80L, 30L, 180L, 70L, 60L, 100L, 70L,
50L, 100L, 60L, 220L, 70L, 70L, 200L, 110L, 50L, 110L, 50L,
60L, 250L, 220L, 60L, 80L, 35L, 210L, 70L, 70L, 110L, 320L,
280L, 60L, 50L, 60L, 100L, 70L, 70L, 170L, 170L, 230L, 80L,
130L, 90L, 10L, 60L, 70L, 60L, 120L, 40L, 50L, 160L, 100L,
30L, 40L, 40L, 90L, 30L, 80L, 240L, 100L, 170L, 60L, 120L,
20L, 40L, 70L, 150L, 80L, 50L, 90L, 130L, 70L, 60L, 480L,
150L, 130L, 90L, 70L, 150L, 100L, 70L, 50L, 40L, 60L, 400L,
200L, 80L, 30L, 120L, 70L, 50L, 40L, 40L, 360L, 90L, 70L,
60L, 40L, 110L, 80L, 25L, 270L, 40L, 25L, 50L, 30L, 320L,
150L, 100L, 100L, 60L, 40L, 50L, 100L, 50L, 50L, 200L, 30L,
80L), ground = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 3L,
1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 2L, 1L, 2L, 1L, 3L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 3L, 2L, 1L, 3L, 1L), .Label = c("Grass",
"GrassRock", "Rock"), class = "factor"), sun = structure(c(3L,
1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L,
3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 3L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
3L, 3L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 3L, 3L,
3L, 1L, 1L, 3L, 2L, 1L, 3L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L,
3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L,
3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 3L,
1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L,
3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L,
1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L,
1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 2L, 1L, 3L, 1L, 3L,
3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L,
3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 1L,
1L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L,
3L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 2L, 3L, 3L, 1L, 3L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 2L,
3L, 1L, 3L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 3L, 3L, 3L, 1L, 3L,
3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 1L, 1L, 3L, 3L, 1L,
3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L,
3L, 1L), .Label = c("Half", "Shade", "Sun"), class = "factor"),
leaf = structure(c(2L, 2L, 4L, 2L, 2L, 4L, 2L, 2L, 4L, 2L,
2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 4L, 2L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 1L,
1L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 2L, 4L, 2L, 2L,
2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 4L, 4L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 4L, 2L, 2L, 1L, 2L, 4L, 4L, 4L, 2L, 1L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 4L, 1L, 2L, 2L,
2L, 2L, 4L, 2L, 1L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 4L, 1L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 4L, 4L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 1L, 2L, 4L, 2L,
2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 4L,
2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 2L,
2L, 4L, 2L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 2L, 4L, 1L,
2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 1L, 2L, 4L, 4L, 2L, 1L, 2L,
4L, 4L, 1L, 4L, 2L, 2L, 2L, 2L, 4L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 4L, 2L, 2L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 2L,
2L, 2L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 2L,
4L, 1L, 2L, 4L, 2L, 1L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L,
2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 2L, 2L,
2L, 1L, 2L, 1L, 4L, 2L, 1L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L,
2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 4L,
1L, 2L, 4L, 2L, 2L, 2L, 4L, 1L, 2L, 1L, 2L, 2L, 2L, 4L, 1L,
2L, 2L, 2L, 1L, 2L, 4L, 2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 2L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L,
2L, 2L, 4L, 4L, 4L, 2L, 4L, 2L), .Label = c("Large", "Medium",
"Scarce", "Small"), class = "factor"), Presence = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
0L)), .Names = c("site", "species", "aspect", "height", "width",
"length", "ground", "sun", "leaf", "Presence"), row.names = c(NA,
393L), class = "data.frame")
After the model selection, this is the optimal model:
model <- glm(Presence ~ site + species + aspect + length + sun
+ leaf, data=MyData, family=binomial)
With respect to the 1st way referred to above, one can do the following:
library(car)
vif(model)
to obtain VIFs based on the model as an input.
But with respect to the 2nd way, one could look at the VIFs of the variables before fitting the model:
library(AED) # note that this package has been discontinued
vars <- cbind(MyData$site, MyData$species,
MyData$aspect , MyData$length ,
MyData$width, MyData$height,
MyData$ground, MyData$sun, MyData$leaf)
corvif(vars)
(the corvif() function code can be found here: http://www.highstat.com/Book2/HighstatLibV6.R)
The underlying mathematics of the two functions appears to be the same, but as written they accept different types of objects as input.
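For reference, the textbook definition both approaches build on can be sketched in base R: the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the remaining ones. This is only an illustration on the built-in mtcars data with numeric predictors; vif_one is a hypothetical helper name, and neither vif() nor corvif() is claimed to be implemented this way (both also handle factors, which this sketch does not).

```r
# Built-in mtcars data, used only for illustration (not the data from the question).
predictors <- mtcars[, c("disp", "hp", "wt")]

# Hypothetical helper: VIF of one numeric predictor.
# Regress it on all the other predictors, then apply 1 / (1 - R^2).
vif_one <- function(name, data) {
  others <- setdiff(names(data), name)
  aux <- lm(reformulate(others, response = name), data = data)
  1 / (1 - summary(aux)$r.squared)
}

vifs <- sapply(names(predictors), vif_one, data = predictors)
print(vifs)
```

No fitted response model is needed here, which is why a variables-only function is possible at all.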
My questions are:
Do you prefer to calculate VIFs based on a list of variables prior to model fitting, on a fitted model, or both?
Are there any functions (in addition to the two referred to already) that people recommend and/or use to calculate VIFs?
Is anyone aware of a single R function that works on both the list of variables and the fitted model as input?
My (opinionated) answer to the question of whether it's more appropriate to use vif on a model object or on the data itself: best practice would be to do it before the model is constructed, as part of the process of understanding the relationships within the data before modeling. But truth be told, I think most of the time it's done as an afterthought because of unexpected results (standard errors that blow up, usually).
If you want a function that can take either a fit object or a dimensioned data object (matrix or data frame), then I think you may need to "roll your own". I have used the rms/Hmisc pair of packages extensively, and there is also a vif in the 'rms' package, as well as a which.influence function that lets you know the combinations that are responsible for the multicollinearity. It only accepts a fit object. Because the versions that handle fit objects can look at both the result of vcov and the terms on the RHS of the formula, you would only need a single argument. However, if you want to specify which columns to examine in a dimensioned object, then you would need to provide function code to handle a second parameter.
I did a search with:
sos::findFn("vif")
... and the fourth page examined (function vif in package "HH") appears to offer a choice of which strategy to use: http://finzi.psych.upenn.edu/R/library/HH/html/vif.html
If you wanted to write your own, then you already have the code in the form of the corvif and myvif functions on the page you linked to. The corvif function uses the myvif function, which is model-based. So you could insert code to check for the presence of the first argument's class in the vector of methods returned by methods(vcov).
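A minimal sketch of such a "roll your own" function, in base R only, might look like the following. The name any_vif is hypothetical, and the class check is a plain is.data.frame()/is.matrix() test rather than the methods(vcov) lookup suggested above. It is an assumption-laden simplification: it reports one unweighted VIF per design-matrix column (so one per dummy variable, not a generalized VIF per factor), which is not what car::vif or corvif compute for factors or for a binomial glm.

```r
# Hypothetical helper: accepts either a data frame/matrix of predictors
# or a fitted model for which model.matrix() works (e.g. lm, glm).
any_vif <- function(x) {
  if (is.data.frame(x) || is.matrix(x)) {
    # Factors become integer codes, as cbind() does in the question.
    X <- data.matrix(x)
  } else {
    # Use the model's design matrix and drop the intercept column.
    X <- model.matrix(x)
    X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  }
  # For numeric columns, the diagonal of the inverse correlation
  # matrix equals 1 / (1 - R_j^2), i.e. the VIF of each column.
  diag(solve(cor(X)))
}

# Illustration on the built-in mtcars data (not the data from the question).
fit <- lm(mpg ~ disp + hp + wt, data = mtcars)
from_fit  <- any_vif(fit)
from_data <- any_vif(mtcars[, c("disp", "hp", "wt")])
print(from_fit)
print(from_data)
```

With purely numeric predictors the two calls agree, because the design matrix without its intercept is just the predictor columns.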
