I would like to plot two data frames on a single plot - r

I am fairly new to R and am trying to plot two data frames together using ggplot2.
I have two data frames.
One is called WorkSchedMonday and consists of 96 rows and 4 columns (a 10-row sample is shown here).
structure(c(9, 9, 9, 9, 18, 18, 36, 36, 36, 36, 64, 80, 96, 96,
112, 128, 168, 168, 296, 312, 14, 14, 14, 21, 21, 21, 21, 35,
49, 49, 12, 12, 6, 6, 0, 0, 0, 0, 6, 6), .Dim = c(10L, 4L), .Dimnames = list(
c("04:00", "04:15", "04:30", "04:45", "05:00", "05:15", "05:30",
"05:45", "06:00", "06:15"), c("WorkSchedAndIndivMondayAtHome",
"WorkSchedAndIndivMondayAtSingleWorkPlace", "WorkSchedAndIndivMondayAtVarietyOfPlaces",
"WorkSchedAndIndivMondayWorkingOnTheMove")))
The other is called WorkSchedTuesday and consists of 96 rows and 4 columns (again, a 10-row sample is shown).
structure(c(0, 0, 0, 0, 9, 9, 27, 27, 36, 36, 64, 80, 96, 96,
112, 128, 168, 168, 296, 312, 14, 14, 14, 21, 21, 21, 21, 35,
49, 49, 12, 12, 6, 6, 0, 0, 0, 0, 6, 6), .Dim = c(10L, 4L), .Dimnames = list(
c("04:00", "04:15", "04:30", "04:45", "05:00", "05:15", "05:30",
"05:45", "06:00", "06:15"), c("WorkSchedAndIndivTuesdayAtHome",
"WorkSchedAndIndivTuesdayAtSingleWorkPlace", "WorkSchedAndIndivTuesdayAtVarietyOfPlaces",
"WorkSchedAndIndivTuesdayWorkingOnTheMove")))
Using the following code, I plotted the two data frames.
WorkSchedWeek<-as.matrix(cbind(WorkSchedAndIndivMondayAtHome,WorkSchedAndIndivMondayAtSingleWorkPlace,WorkSchedAndIndivMondayAtVarietyOfPlaces, WorkSchedAndIndivMondayWorkingOnTheMove, WorkSchedAndIndivTuesdayAtHome,WorkSchedAndIndivTuesdayAtSingleWorkPlace,WorkSchedAndIndivTuesdayAtVarietyOfPlaces, WorkSchedAndIndivTuesdayWorkingOnTheMove))
####
melted_WorkSchedWeek<- melt(WorkSchedWeek)
plot<-ggplot(melted_WorkSchedWeek) + geom_col(aes(x = Var1,y = value,fill = Var2),position = "fill") + theme(legend.position="right", axis.text.x = element_text(angle = 90, hjust = 1))
plot + labs(x="Time", y="Probabilities", colour="Work schedules", fill="Work schedules")
However, I would like to create the above plot with ggplot2 (or lattice) so that the x axis shows time (04:00 to 03:45, i.e. 24 hours) for each day (Monday and Tuesday) and the y axis shows the probability distribution. The bars are filled by work schedule. Can somebody help me? Thanks.

You can use facet_grid to draw the two graphs side by side with a shared axis, but this requires merging your two data frames first.
To do this, we standardize the variable names, add a time column and a day column, and then use rbind:
WorkSchedMonday = data.frame(structure(c(9, 9, 9, 9, 18, 18, 36, 36, 36, 36, 64, 80, 96, 96,
112, 128, 168, 168, 296, 312, 14, 14, 14, 21, 21, 21, 21, 35,
49, 49, 12, 12, 6, 6, 0, 0, 0, 0, 6, 6), .Dim = c(10L, 4L), .Dimnames = list(
c("04:00", "04:15", "04:30", "04:45", "05:00", "05:15", "05:30",
"05:45", "06:00", "06:15"), c("WorkSchedAndIndivMondayAtHome",
"WorkSchedAndIndivMondayAtSingleWorkPlace", "WorkSchedAndIndivMondayAtVarietyOfPlaces",
"WorkSchedAndIndivMondayWorkingOnTheMove"))))
names(WorkSchedMonday) = c("AtHome", "SingleWork", "Variety", "OnTheMove")
WorkSchedMonday$time = rownames(WorkSchedMonday)
WorkSchedTuesday = data.frame(structure(c(0, 0, 0, 0, 9, 9, 27, 27, 36, 36, 64, 80, 96, 96,
112, 128, 168, 168, 296, 312, 14, 14, 14, 21, 21, 21, 21, 35,
49, 49, 12, 12, 6, 6, 0, 0, 0, 0, 6, 6), .Dim = c(10L, 4L), .Dimnames = list(
c("04:00", "04:15", "04:30", "04:45", "05:00", "05:15", "05:30",
"05:45", "06:00", "06:15"), c("WorkSchedAndIndivMondayAtHome",
"WorkSchedAndIndivMondayAtSingleWorkPlace", "WorkSchedAndIndivMondayAtVarietyOfPlaces",
"WorkSchedAndIndivMondayWorkingOnTheMove"))))
names(WorkSchedTuesday) = c("AtHome", "SingleWork", "Variety", "OnTheMove")
WorkSchedTuesday$time = rownames(WorkSchedTuesday)
WorkSchedMonday$day = "Monday"
WorkSchedTuesday$day = "Tuesday"
WorkSched = rbind(WorkSchedMonday, WorkSchedTuesday)
With that done, you can melt your dataframe like you did before and run the same ggplot, but with facet_grid along the variable that you want your graph to be separated by (day).
library(reshape2)  # provides melt()
library(ggplot2)
WorkSched_melt = melt(WorkSched, id.vars = c("time", "day"))
ggplot(WorkSched_melt, aes(x = time, y = value, fill = variable)) + geom_col(position = "fill") +
  facet_grid(. ~ day) + theme(legend.position="right", axis.text.x = element_text(angle = 90, hjust = 1))
As a general rule, avoid long, clunky variable names, and avoid storing a necessary variable (in this case, time) as row names.
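To make the y axis of the plot above concrete: position = "fill" rescales each bar so that its segments sum to 1. The sketch below reproduces that arithmetic in base R for the first three Monday rows from the question, with time kept as an ordinary column. The object names monday, counts and shares are illustrative and not part of the original answer.

```r
# First three Monday rows from the question; time is a regular column, not a row name
monday <- data.frame(
  time       = c("04:00", "04:15", "04:30"),
  AtHome     = c(9, 9, 9),
  SingleWork = c(64, 80, 96),
  Variety    = c(14, 14, 14),
  OnTheMove  = c(12, 12, 6)
)

# Numeric part as a matrix, labelled by time for readability
counts <- as.matrix(monday[, -1])
rownames(counts) <- monday$time

# Row-wise proportions: each value divided by its row total,
# which is what position = "fill" displays for each time slot
shares <- prop.table(counts, margin = 1)
round(shares, 3)
```

Each row of shares sums to 1, so the stacked bar for every time slot fills the whole y range regardless of how many individuals were counted.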

Here is a solution with the data preparation done with the dplyr package.
library(ggplot2)
library(dplyr)
WorkSchedWeek <- cbind(WorkSchedMonday, WorkSchedTuesday)
WorkSchedWeek <- as.data.frame(WorkSchedWeek)
WorkSchedWeek <- cbind.data.frame(Hour = row.names(WorkSchedWeek), WorkSchedWeek)
melted_WorkSchedWeek <- reshape2::melt(WorkSchedWeek, id.vars = "Hour")
melted_WorkSchedWeek %>%
  mutate(variable = sub("^WorkSchedAndIndiv", "", variable),
         Day = sub("(^.{3}).*", "\\1", variable),  # first three letters: "Mon" or "Tue"
         variable = sub("^.*day", "", variable)) %>%
  ggplot(aes(x = Hour, y = value, fill = variable)) +
  geom_col(position = "fill") +
  theme(legend.position = "right",
        axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~ Day)

Related

From Boxplot to Barplot in ggplot possible?

I have to make a ggplot bar plot with error bars and Tukey significance letters for plants grown with different fertilizer concentrations.
The data should be grouped by concentration, and the significance letters should be added automatically.
I already have code for the same problem with a boxplot, which works nicely. I tried several bar plot tutorials, but I always get the error: stat_count() can only have an x or y aesthetic.
So I wondered whether my boxplot code can be converted into bar plot code. I tried but couldn't do it :) If that isn't possible, how do I automatically add TukeyHSD significance letters to a ggplot bar plot?
This is my code for the boxplot with the Tukey letters:
value_max = Dünger %>% group_by(Duenger.g) %>% summarize(max_value = max(Höhe.cm))
hsd = HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger),
               trt = "Duenger.g", group = T)
sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]
J <- ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm)) +
  geom_boxplot(aes(fill = Duenger.g)) +
  scale_fill_discrete(labels = c("0.5g", '1g', "2g", "3g", "4g")) +
  geom_text(data = value_max, aes(x = Duenger.g, y = 0.1 + max_value, label = sig.letters$groups), vjust = 0) +
  stat_boxplot(geom = 'errorbar', width = 0.1) +
  ggtitle("Auswirkung von Dünger auf die Höhe von Pflanzen") +
  xlab("Dünger in g") + ylab("Höhe in cm")
J
This is how it looks:
boxplot with tukey
Data from dput:
structure(list(Duenger.g = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4), plant = c(1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 18, 19,
21, 23, 24, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40,
41, 42, 43, 44, 48, 49, 50, 53, 54, 55, 56, 57, 58, 61, 62, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 79, 80, 81, 83, 85, 86,
88, 89, 91, 93, 99, 100, 102, 103, 104, 105, 106, 107, 108, 110,
111, 112, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126,
127, 128, 130, 131, 132, 134, 136, 138, 139, 140, 141, 143, 144,
145, 146, 147, 149), height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 2.5,
4, 6, 10, 4.5, 7, 8.3, 11, 7, 8, 2.5, 7.4, 3, 14.5, 7, 12, 7.5,
30.5, 27, 6.5, 19, 10.4, 12.7, 27.3, 11, 11, 10.5, 10.5, 13,
53, 12.5, 12, 6, 12, 35, 8, 16, 56, 63, 69, 62, 98, 65, 77, 32,
85, 75, 33.7, 75, 55, 38.8, 39, 46, 35, 59, 44, 31.5, 49, 34,
52, 37, 43, 38, 28, 14, 28, 19, 20, 23, 17.5, 32, 16, 17, 24.7,
34, 50, 12, 14, 21, 33, 39.3, 41, 29, 35, 48, 40, 65, 35, 10,
26, 34, 41, 32, 38, 23.5, 22.2, 20.5, 29, 34, 45)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -105L))
Thank you
mirai
A bar chart and a boxplot are two different things. By default, geom_boxplot computes the boxplot statistics (stat="boxplot"). In contrast, geom_bar by default counts the number of observations (stat="count"), and the counts are mapped to y. Because your plot already maps a variable to y, this produces the error. Hence, simply replacing geom_boxplot with geom_bar will not give you the desired result. Instead, you could use e.g. stat_summary to create your bar chart with error bars. Additionally, I created a summary dataset to add the labels on top of the error bars.
library(ggplot2)
library(dplyr)
library(agricolae)
Dünger <- Dünger |>
rename("Höhe.cm" = height.cm) |>
mutate(Duenger.g = factor(Duenger.g))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = T)
sig.letters <- hsd$groups %>% mutate(Duenger.g = row.names(.))
duenger_sum <- Dünger |>
group_by(Duenger.g) |>
summarize(mean_se(Höhe.cm)) |>
left_join(sig.letters, by = "Duenger.g")
ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm, fill = Duenger.g)) +
stat_summary(geom = "bar", fun = "mean") +
stat_summary(geom = "errorbar", width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(data = duenger_sum, aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)
#> No summary function supplied, defaulting to `mean_se()`
But as the summary dataset already contains the means and the limits of the error bars, a second option would be:
ggplot(duenger_sum, aes(x = Duenger.g, y = y, fill = Duenger.g)) +
geom_col() +
geom_errorbar(aes(ymin = ymin, ymax = ymax), width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)
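To see what the bars and error bars in the answer above represent, here is a minimal base R sketch of the quantities that mean_se() returns: the group mean (y) and the mean minus and plus one standard error (ymin, ymax). It uses only the first five heights of the 0 g and 0.5 g groups from the question's data, so the numbers are illustrative rather than the full result; the names heights, dose and summary_df are assumptions made for this sketch.

```r
# Small subset of the question's data: five plants each for 0 g and 0.5 g fertilizer
heights <- c(5.7, 2.8, 5.5, 8, 3.5,      # 0 g
             7.5, 30.5, 27, 6.5, 19)     # 0.5 g
dose <- factor(rep(c("0", "0.5"), each = 5))

# Group means and standard errors (sd / sqrt(n), written via the variance)
group_mean <- tapply(heights, dose, mean)
group_se   <- tapply(heights, dose, function(x) sqrt(var(x) / length(x)))

# Same columns as the summary dataset in the answer: y, ymin, ymax
summary_df <- data.frame(
  Duenger.g = names(group_mean),
  y         = as.numeric(group_mean),
  ymin      = as.numeric(group_mean - group_se),
  ymax      = as.numeric(group_mean + group_se)
)
summary_df
```

A data frame of this shape can be passed straight to geom_col() and geom_errorbar(), which is why the second option in the answer no longer needs stat_summary.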

Create bar plot in ggplot2 - Place data frame values instead of count

I'd like to place this data onto a bar plot using ggplot2,
where the column "Clades" would be placed on the x axis and the values from each column (such as the values of 19A, for example) would be placed on the y axis.
I'm trying something like this:
cols = as.vector(names(snv_data)[2:19])
ggplot(df, aes(x=cols)) + geom_bar()
But I keep getting this:
I'm new to ggplot2 so any help is very welcome!
I'm doing this to try and get 7 plots (one for each column such as 19A, 20A, 20B, etc) where each plot would have the Clades on the X-axis and each value from each column as the "counts" on the Y-axis
dput:
structure(list(Clades = c("C.T", "A.G", "G.A", "G.C", "T.C",
"C.A", "G.T", "A.T", "T.A", "T.G", "A.C", "C.G", "A.del", "TAT.del",
"TCTGGTTTT.del", "TACATG.del", "AGTTCA.del", "GATTTC.del"), `19A` = c(413,
93, 21, 0, 49, 9, 238, 13, 3, 1, 0, 4, 1, 0, 0, 0, 0, 0), `20A` = c(7929,
1920, 1100, 419, 1025, 124, 3730, 124, 22, 45, 64, 17, 8, 19,
23, 39, 0, 0), `20B` = c(5283, 1447, 2325, 1106, 336, 117, 946,
137, 35, 53, 123, 11, 9, 10, 21, 1, 0, 0), `20E (EU1)` = c(13086,
1927, 650, 1337, 1864, 96, 2967, 243, 69, 92, 115, 1486, 27,
5, 0, 1, 0, 0), `20I (Alpha, V1)` = c(71142, 12966, 12047, 15587,
14935, 15382, 11270, 12211, 5284, 4273, 430, 99, 5674, 4536,
4974, 4592, 0, 0), `20J (Gamma, V3)` = c(2822, 654, 883, 409,
501, 213, 843, 399, 203, 27, 429, 198, 1, 0, 197, 0, 0, 0), `21J (Delta)` = c(166003,
49195, 26713, 1399, 25824, 15644, 95967, 2011, 329, 11034, 716,
21087, 10532, 198, 0, 14, 9809, 10503)), class = "data.frame", row.names = c("C.T",
"A.G", "G.A", "G.C", "T.C", "C.A", "G.T", "A.T", "T.A", "T.G",
"A.C", "C.G", "A.del", "TAT.del", "TCTGGTTTT.del", "TACATG.del",
"AGTTCA.del", "GATTTC.del"))
To add to the previous answer, here is how you can get 7 plots (1 for each Clade, which is how I interpreted the question) using facet_wrap():
library(tidyverse)
df <- df %>%
  pivot_longer(-Clades)
ggplot(data = df,
aes(x = Clades,
y = value)) +
geom_bar(aes(fill = Clades),
stat = 'identity') +
facet_wrap(~name, scales = 'free_y') +
theme(axis.text.x = element_blank())
As cazman said in the comments, you need to get your data in long form for it to work with ggplot2 (efficiently).
First, use pivot_longer(), and then use ggplot2:
library(tidyverse)
dat %>%
pivot_longer(-Clades) %>%
ggplot(aes(x=Clades, y=value, fill=name)) +
geom_col()
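For readers unsure what "long form" means here, the following base R sketch reshapes a three-row, two-column excerpt of the question's data by hand. It is only meant to show the resulting layout (one row per Clades/column pair); pivot_longer() produces the same information, possibly in a different row order. The names wide, value_cols and long are illustrative.

```r
# Excerpt of the question's data in wide form: one column per clade
wide <- data.frame(
  Clades = c("C.T", "A.G", "G.A"),
  `19A`  = c(413, 93, 21),
  `20A`  = c(7929, 1920, 1100),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

value_cols <- setdiff(names(wide), "Clades")

# Long form: the column name becomes a variable ("name"), the cell becomes "value"
long <- data.frame(
  Clades = rep(wide$Clades, times = length(value_cols)),
  name   = rep(value_cols, each = nrow(wide)),
  value  = unlist(wide[value_cols], use.names = FALSE)
)
long
```

With the data in this shape, ggplot2 can map name to a facet or a fill and value to y, instead of needing one layer per column.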

How do I use column index as x axis in R

I have a data frame with 7 columns and 100 observations
I divided observations into two groups
the question I'm working on is: b) Construct two time plots of the mean blood lead levels superimposed on the blood lead levels at each occasion for succimer and placebo groups.
This is my code so far:
library(tidyverse)
library(haven)
library(dplyr)
library(plyr)
library(foreign)
library(ggplot2)
tlc = read_dta(file = 'tlc.dta')
head(tlc)
## a)
placebo = subset(tlc, tlc$trt==0)
succimer = subset(tlc, tlc$trt==1)
summary(placebo[, 3:6])
summary(succimer[, 3:6])
placebo_mean=colMeans(placebo[ ,3:6])
placebo_std=apply(placebo[ ,3:6],2,sd)
placebo_var=placebo_std^2
succimer_mean=colMeans(succimer[ ,3:6])
succimer_std=apply(succimer[ ,3:6],2,sd)
succimer_var=succimer_std^2
## b)
## c)
placebo_cor=cor(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cor=cor(succimer[ , 3:6]) %>% round(digits = 3)
placebo_cov=cov(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cov=cov(succimer[ , 3:6]) %>% round(digits = 3)
So the purpose is to plot all observations, using the values as the y axis and the columns y0, y1, y4, y6 (representing week 0, week 1, week 4, week 6) as the x axis, and then to superimpose the mean of each group on the plot. I'm planning to use different colors to distinguish the two groups, so the final plot will have many points at each x coordinate, plus two short lines indicating the mean of each group at each x coordinate.
My question is: how do I use the column index as the x axis in R, with or without ggplot? I know this question may be too elementary, but it has caused me a lot of trouble as a beginner.
below is my data:
dput(tlc)
structure(list(id = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100), format.stata = "%9.0g"),
trt = structure(c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1,
0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%9.0g", class = "haven_labelled", labels = c(Placebo = 0,
Succimer = 1)), y0 = structure(c(30.7999992370605, 26.5,
25.7999992370605, 24.7000007629395, 20.3999996185303, 20.3999996185303,
28.6000003814697, 33.7000007629395, 19.7000007629395, 31.1000003814697,
19.7999992370605, 24.7999992370605, 21.3999996185303, 27.8999996185303,
21.1000003814697, 20.6000003814697, 24, 37.5999984741211,
35.2999992370605, 28.6000003814697, 31.8999996185303, 29.6000003814697,
21.5, 26.2000007629395, 21.7999992370605, 23, 22.2000007629395,
20.5, 25, 33.2999992370605, 26, 19.7000007629395, 27.8999996185303,
24.7000007629395, 28.7999992370605, 29.6000003814697, 32,
21.7999992370605, 24.3999996185303, 33.7000007629395, 24.8999996185303,
19.7999992370605, 26.7000007629395, 26.7999992370605, 20.2000007629395,
35.4000015258789, 25.2999992370605, 20.2000007629395, 24.5,
20.2999992370605, 20.3999996185303, 24.1000003814697, 27.1000003814697,
34.7000007629395, 28.5, 26.6000003814697, 24.5, 20.5, 25.2000007629395,
34.7000007629395, 30.2999992370605, 26.6000003814697, 20.7000007629395,
27.7000007629395, 24.2999992370605, 36.5999984741211, 28.8999996185303,
34, 32.5999984741211, 29.2000007629395, 26.3999996185303,
21.7999992370605, 27.2000007629395, 22.3999996185303, 32.5,
24.8999996185303, 24.6000003814697, 23.1000003814697, 21.1000003814697,
25.7999992370605, 30, 22.1000003814697, 20, 38.0999984741211,
28.8999996185303, 25.1000003814697, 19.7999992370605, 22.1000003814697,
23.5, 29.1000003814697, 30.2999992370605, 25.3999996185303,
30.6000003814697, 22.3999996185303, 31.2000007629395, 31.3999996185303,
41.0999984741211, 29.3999996185303, 21.8999996185303, 20.7000007629395
), format.stata = "%9.0g"), y1 = structure(c(26.8999996185303,
14.8000001907349, 23, 24.5, 2.79999995231628, 5.40000009536743,
20.7999992370605, 31.6000003814697, 14.8999996185303, 31.2000007629395,
17.5, 23.1000003814697, 26.2999992370605, 6.30000019073486,
20.2999992370605, 23.8999996185303, 16.7000007629395, 33.7000007629395,
25.5, 15.8000001907349, 27.8999996185303, 15.8000001907349,
6.5, 26.7999992370605, 12, 4.19999980926514, 11.5, 21.1000003814697,
3.90000009536743, 26.2000007629395, 21.3999996185303, 13.1999998092651,
21.6000003814697, 21.2000007629395, 26.3999996185303, 17.5,
30.2000007629395, 19.2999992370605, 16.3999996185303, 14.8999996185303,
20.8999996185303, 18.8999996185303, 6.40000009536743, 20.3999996185303,
10.6000003814697, 30.3999996185303, 23.8999996185303, 17.5,
10, 21, 17.2000007629395, 20.1000003814697, 14.8999996185303,
39, 32.5999984741211, 22.3999996185303, 5.09999990463257,
17.5, 25.1000003814697, 39.5, 29.3999996185303, 25.2999992370605,
19.2999992370605, 4, 24.2999992370605, 23.2999992370605,
28.8999996185303, 10.6999998092651, 19, 9.19999980926514,
15.3000001907349, 10.6000003814697, 28.5, 22, 25.1000003814697,
23.6000003814697, 25, 20.8999996185303, 5.59999990463257,
21.8999996185303, 27.6000003814697, 21, 22.7000007629395,
40.7999992370605, 12.5, 28.1000003814697, 11.6000003814697,
21.1000003814697, 7.90000009536743, 16.7999992370605, 3.5,
24.2999992370605, 28.2000007629395, 7.09999990463257, 10.8000001907349,
3.90000009536743, 15.1000003814697, 22.1000003814697, 7.59999990463257,
8.10000038146973), format.stata = "%9.0g"), y4 = structure(c(25.7999992370605,
19.5, 19.1000003814697, 22, 3.20000004768372, 4.5, 19.2000007629395,
28.5, 15.3000001907349, 29.2000007629395, 20.5, 24.6000003814697,
19.5, 18.5, 18.3999996185303, 19, 21.7000007629395, 34.4000015258789,
26.2999992370605, 22.8999996185303, 27.2999992370605, 23.7000007629395,
7.09999990463257, 25.2999992370605, 16.7999992370605, 4,
9.5, 17.3999996185303, 12.8000001907349, 34, 21, 14.6000003814697,
23.6000003814697, 22.8999996185303, 23.7999992370605, 21,
30.2000007629395, 16.3999996185303, 11.6000003814697, 14.5,
22.2000007629395, 18.8999996185303, 5.09999990463257, 19.2999992370605,
9, 26.5, 22.2000007629395, 17.3999996185303, 15.6000003814697,
16.7000007629395, 15.8999996185303, 17.8999996185303, 18.1000003814697,
28.7999992370605, 27.5, 21.7999992370605, 8.19999980926514,
19.6000003814697, 23.3999996185303, 38.5999984741211, 33.0999984741211,
25.1000003814697, 21.8999996185303, 4.19999980926514, 18.3999996185303,
40.4000015258789, 32.7999992370605, 12.6000003814697, 16.2999992370605,
8.30000019073486, 24.6000003814697, 14.3999996185303, 35,
19.1000003814697, 27.7999992370605, 21.2000007629395, 21.7000007629395,
21.7000007629395, 7.30000019073486, 23.6000003814697, 24,
8.60000038146973, 21.2000007629395, 38, 16.7000007629395,
27.5, 13, 21.5, 12.3999996185303, 15.1000003814697, 3, 22.7000007629395,
27, 17.2000007629395, 19.7999992370605, 7, 10.8999996185303,
25.2999992370605, 10.8000001907349, 25.7000007629395), format.stata = "%9.0g"),
y6 = structure(c(23.7999992370605, 21, 23.2000007629395,
22.5, 9.39999961853027, 11.8999996185303, 18.3999996185303,
25.1000003814697, 14.6999998092651, 30.1000003814697, 27.5,
30.8999996185303, 19, 16.2999992370605, 20.7999992370605,
17, 20.2999992370605, 31.3999996185303, 30.2999992370605,
25.8999996185303, 34.2000007629395, 23.3999996185303, 16,
24.7999992370605, 19.2000007629395, 16.2000007629395, 14.5,
21.1000003814697, 12.6999998092651, 28.2000007629395, 22.3999996185303,
11.6000003814697, 27.7000007629395, 21.8999996185303, 22,
24.2000007629395, 27.5, 17.6000003814697, 16.6000003814697,
63.9000015258789, 19.7999992370605, 15.5, 15.1000003814697,
23.7999992370605, 16, 28.1000003814697, 27.2000007629395,
18.6000003814697, 15.1999998092651, 13.5, 17.7000007629395,
18.7000007629395, 21.2999992370605, 34.7000007629395, 22.7999992370605,
21, 23.6000003814697, 18.3999996185303, 22.2000007629395,
43.2999992370605, 28.3999996185303, 27.8999996185303, 21.7999992370605,
11.6999998092651, 27.7999992370605, 39.2999992370605, 31.7999992370605,
21.2000007629395, 18.6000003814697, 18.3999996185303, 32.4000015258789,
18.7000007629395, 30.5, 18.7000007629395, 27.2999992370605,
21.1000003814697, 23.8999996185303, 19.8999996185303, 12.3000001907349,
24.7999992370605, 23.7000007629395, 24.6000003814697, 20.5,
32.7000007629395, 22.2000007629395, 24.7999992370605, 23.1000003814697,
20.6000003814697, 18.8999996185303, 18.7999992370605, 11.5,
20.1000003814697, 25.5, 18.7000007629395, 22.2000007629395,
17.7999992370605, 27.1000003814697, 4.09999990463257, 13,
12.3000001907349), format.stata = "%9.0g")), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
also I have tried this:
p=ggplot(tlc, aes(x=colnames(tlc[,3:6],do.NULL=TRUE)),
y=value)
p=p+geom_point()
No errors were reported when running this code, but R did report an error (Aesthetics must be either length 1 or the same as the data (100): x) when I called 'p' to plot it.
I don't have your data, but it sounds like you want something that looks like this:
Here is how I made it:
library(tidyverse)
# Setting up some fake data: 100 observations and 7 variables
set.seed(123)
some_data <- data.frame(y0 = rnorm(100),
y1 = runif(100),
y2 = rexp(100, 2),
y3 = rnorm(100, 2, 1),
y4 = rexp(100),
y5 = rnorm(100, 2,2),
y6 = runif(100, -5, 5))
# pivoting the data to longer format:
long_data <- some_data %>%
pivot_longer(cols = everything(),
names_to = "variable")
# building the base plot
p <- ggplot(long_data, aes(x = variable, y = value))
# adding the points - use position_jitter to give it some width if you want
p <- p + geom_point(position = position_jitter(width = 0.2))
# adding the bars at mean - play around with width, color, and size
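p <- p + stat_summary(geom = "errorbar",
                      fun = mean,
                      width = 0.4,
                      # after_stat() replaces the deprecated ..y.. notation (ggplot2 >= 3.3.0)
                      aes(ymax = after_stat(y), ymin = after_stat(y)),
                      color = "orange",
                      linewidth = 1.5)  # use size = 1.5 on ggplot2 < 3.4.0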
p # show plot
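The same idea can be adapted to the two treatment groups and the week columns from the question. The sketch below is one possible way to do it, not the answerer's code: it uses a small made-up data frame (toy_tlc) with the same column layout as tlc, turns the column names y0, y1, y4, y6 into a numeric week variable, and colours points and mean lines by group.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Made-up data with the same layout as tlc (id, trt, y0, y1, y4, y6); values are random
set.seed(1)
toy_tlc <- data.frame(
  id  = 1:6,
  trt = rep(c("Placebo", "Succimer"), each = 3),
  y0  = rnorm(6, 26, 5),
  y1  = rnorm(6, 20, 5),
  y4  = rnorm(6, 20, 5),
  y6  = rnorm(6, 22, 5)
)

# Long format: strip the "y" prefix so the column name becomes the week number
long_tlc <- toy_tlc %>%
  pivot_longer(cols = y0:y6, names_to = "week", names_prefix = "y") %>%
  mutate(week = as.integer(week))

# All observations as jittered points, group means superimposed as lines
p_groups <- ggplot(long_tlc, aes(x = week, y = value, colour = trt)) +
  geom_point(position = position_jitter(width = 0.15, height = 0), alpha = 0.6) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_continuous(breaks = c(0, 1, 4, 6)) +
  labs(x = "Week", y = "Blood lead level", colour = "Group")
p_groups
```

Treating week as a number places the occasions at 0, 1, 4 and 6 on the x axis; converting it to a factor instead would space them evenly, as in the answer above.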

Merge and Perfectly Align Histogram and Boxplot using ggplot2

Since yesterday I have been reading answers and websites in order to combine and align a histogram and a boxplot, both generated with the ggplot2 package, in one plot.
This question differs from others because the boxplot needs to be reduced in height and aligned to the left outer margin of the histogram.
Considering the following dataset:
my_df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100), value= c(18, 9, 3,
4, 3, 13, 12, 5, 8, 37, 64, 107, 11, 11, 8, 18, 5, 13, 13, 14,
11, 11, 9, 14, 11, 14, 12, 10, 11, 10, 5, 3, 8, 11, 12, 11, 7,
6, 6, 4, 11, 8, 14, 13, 14, 15, 10, 2, 4, 4, 8, 15, 21, 9, 5,
7, 11, 6, 11, 2, 6, 16, 5, 11, 21, 33, 12, 10, 13, 33, 35, 7,
7, 9, 2, 21, 32, 19, 9, 8, 3, 26, 37, 5, 6, 10, 18, 5, 70, 48,
30, 10, 15, 18, 7, 4, 19, 10, 4, 32)), row.names = c(NA, 100L
), class = "data.frame", .Names = c("id", "value"))
I generated the boxplot:
require(dplyr)
require(ggplot2)
my_df %>% select(value) %>%
ggplot(aes(x="", y = value)) +
geom_boxplot(fill = "lightblue", color = "black") +
coord_flip() +
theme_classic() +
xlab("") +
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
and I generated the histogram
my_df %>% select(id, value) %>%
ggplot() +
geom_histogram(aes(x = value, y = (..count..)/sum(..count..)),
position = "identity", binwidth = 1,
fill = "lightblue", color = "black") +
ylab("Relative Frequency") +
theme_classic()
The result I am looking to obtain is a single plot like:
Note that the boxplot must be reduced in height and the ticks must be exactly aligned in order to give a different perspective of the same visual.
You can use either egg, cowplot or patchwork packages to combine those two plots. See also this answer for more complex examples.
library(dplyr)
library(ggplot2)
plt1 <- my_df %>% select(value) %>%
ggplot(aes(x="", y = value)) +
geom_boxplot(fill = "lightblue", color = "black") +
coord_flip() +
theme_classic() +
xlab("") +
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())
plt2 <- my_df %>% select(id, value) %>%
ggplot() +
geom_histogram(aes(x = value, y = (..count..)/sum(..count..)),
position = "identity", binwidth = 1,
fill = "lightblue", color = "black") +
ylab("Relative Frequency") +
theme_classic()
egg
# install.packages("egg", dependencies = TRUE)
egg::ggarrange(plt2, plt1, heights = 2:1)
cowplot
# install.packages("cowplot", dependencies = TRUE)
cowplot::plot_grid(plt2, plt1,
ncol = 1, rel_heights = c(2, 1),
align = 'v', axis = 'lr')
patchwork
# install.packages("patchwork")  # patchwork is available on CRAN
library(patchwork)
plt2 + plt1 + plot_layout(nrow = 2, heights = c(2, 1))
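Since the question asks for the ticks to line up exactly, it may help to give both plots the same x limits explicitly before combining them. The following is a sketch under assumptions, not part of the original answer: it uses made-up data (toy), draws the boxplot horizontally by mapping value to x (supported in ggplot2 3.3.0 and later) instead of using coord_flip(), and applies the same coord_cartesian() limits to both panels.

```r
library(ggplot2)
library(patchwork)

# Made-up data standing in for my_df$value
set.seed(42)
toy <- data.frame(value = rpois(100, lambda = 12))

# One shared range, padded by one unit so the outermost bars are fully visible
x_limits <- range(toy$value) + c(-1, 1)

hist_plot <- ggplot(toy, aes(x = value)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 1, fill = "lightblue", colour = "black") +
  coord_cartesian(xlim = x_limits) +
  ylab("Relative Frequency") +
  theme_classic()

# Horizontal boxplot: continuous x, single empty category on y
box_plot <- ggplot(toy, aes(x = value, y = "")) +
  geom_boxplot(fill = "lightblue", colour = "black") +
  coord_cartesian(xlim = x_limits) +
  ylab("") +
  theme_classic() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

# Stack the panels; the histogram gets twice the height of the boxplot
combined <- hist_plot / box_plot + plot_layout(heights = c(2, 1))
combined
```

Because both panels share the same limits and patchwork aligns the panel areas, a given value sits at the same horizontal position in the histogram and in the boxplot.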

Area plot with missing values in base R

I want to draw an area plot for which the base of the polygon is zero and the data lines are connected to the base by vertical segments at every data break (that is the beginning, the end and possible NAs/NaN).
I drew this:
I had to force vertical downward segments where the series is interrupted by NAs, and I did this by replacing the NAs with 0s. But that doesn't produce vertical segments; it produces sloped polygon edges running to the following 0s. I solved the problem for the beginning and the end of the series by adding a y = 0 point on both sides of the series.
But this doesn't fix the problem if the NAs are inside the series.
Any ideas?
Here's some example code (different image):
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
plot.ts(y = pollen, x = weeks.vec, col = 'red', ylab = 'Pollen', xlab = 'Weeks', lwd = 3, xy.labels = F, xy.lines = T)
pollen[is.na(pollen)] <- 0
poly.y <- c(0,pollen,0)
poly.x <- c(weeks.vec[1], weeks.vec, weeks.vec[length(weeks.vec)])
polygon(y = poly.y, x = poly.x, density = NA,border = NA, col = rgb(1,0,0, .3))
I'd use ggplot2:
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
DF <- data.frame(pollen, weeks.vec)
library(ggplot2)
ggplot(DF, aes(x = weeks.vec, y = pollen)) +
geom_ribbon(aes(ymin = 0, ymax = pollen),
colour = NA, fill = "red", alpha = 0.3) +
geom_line(colour = "red") +
geom_point(colour = "red", size = 3) +
xlab("Week") + ylab("Pollen") +
theme_bw()
But if you must use base plots:
plot.ts(y = pollen, x = weeks.vec, col = 'red',
ylab = 'Pollen', xlab = 'Weeks', lwd = 3,
xy.labels = F, xy.lines = T)
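# label each run of data: the counter increases at every NA/NaN
g <- cumsum(!is.finite(pollen))
for (i in unique(g)) {
  # values and weeks belonging to run i (may still include the leading NaN)
  y <- pollen[g == i]
  x <- weeks.vec[g == i]
  # drop the non-finite entry so only real observations remain
  x <- x[is.finite(y)]
  y <- y[is.finite(y)]
  # close the polygon: along the data, then back along y = 0
  x <- c(x, rev(x))
  y <- c(y, y * 0)
  polygon(y = y, x = x, density = NA, border = NA, col = rgb(1,0,0, .3))
}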
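The grouping step in the loop above relies on cumsum() over a logical vector. This short sketch, using a shortened made-up pollen vector, shows how each missing value starts a new run label and how the finite values can then be split into separate runs. The name runs is illustrative.

```r
# Shortened example series with one gap in the middle and one at the end
pollen <- c(45, 257.4, NaN, 76.6, 42.67, NaN)

# TRUE at every NaN/NA; the cumulative sum increases there, labelling the runs
g <- cumsum(!is.finite(pollen))
g

# Keep only finite values and split them by run label
keep <- is.finite(pollen)
runs <- split(pollen[keep], g[keep])
runs
```

Each element of runs corresponds to one polygon drawn by the loop, so a gap inside the series yields two separate filled areas with vertical edges at the break.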
