Adding observation count in two-factor boxplot - r
I have already seen a similar question: Add number of observations per group in ggplot2 boxplot
But that solution doesn't work for my plot, because my boxplot is grouped by two factors.
Here is my data:
> dput(Simp_Shan_Baseline_Grp[1:5,1:20])
structure(list(Datasets = structure(1:5, .Label = c("30001.10001.Visit.1.Baseline.Day.0.1h..",
"30001.10002.Visit.1.Baseline.Day.0.1h..", "30001.10003.Visit.1.Baseline.Day.0.1h..",
"30001.10004.Visit.1.Baseline.Day.0.1h..", "30001.10005.Visit.1.Baseline.Day.0.1h..",
"30001.10006.Visit.1.Baseline.Day.0.1h..", "30004.10001.Visit.1.Baseline.Day.0.1h..",
"30004.10002.Visit.1.Baseline.Day.0.1h..", "30004.10003.Visit.1.Baseline.Day.0.1h..",
"30004.10004.Visit.1.Baseline.Day.0.1h..", "30004.10006.Visit.1.Baseline.Day.0.1h..",
"30004.10007.Visit.1.Baseline.Day.0.1h..", "30004.10008.Visit.1.Baseline.Day.0.1h..",
"30005.10001.Visit.1.Baseline.Day.0.1h..", "30005.10002.Visit.1.Baseline",
"30005.10003.Visit.1.Baseline.Day.0.1h..", "30005.10004.Visit.1.Baseline.Day.0.1h..",
"30005.10005.Visit.1.Baseline.Day.0.1h..", "30005.10006.Visit.1.Baseline.Day.0.1h..",
"30005.10007.Visit.1.Baseline.Day.0.1h..", "30005.10008.Visit.1.Baseline.Day.0.1h..",
"30005.10009.Visit.1.Baseline.Day.0.1h..", "30006.10001.Visit.1.Baseline",
"30006.10002.Visit.1.Baseline", "30007.10001.Visit.1.Baseline.Day.0.1h..",
"30007.10002.Visit.1.Baseline.Day.0.1h..", "30008.10001.Visit.1.Baseline.Day.0.1h..",
"30008.10002.Visit.1.Baseline.Day.0.1h..", "30008.10003.Visit.1.Baseline",
"30008.10004.Visit.1.Baseline", "30008.10005.Visit.1.Baseline.Day.0.1h..",
"30008.10006.Visit.1.Baseline.Day.0.1h..", "30008.10007.Visit.1.Baseline.Day.0.1h..",
"30008.10008.Visit.1.Baseline.Day.0.1h..", "30009.10001.Visit.1.Baseline.Day.0.1h..",
"30009.10002.Visit.1.Baseline.Day.0.1h..", "30009.10003.Visit.1.Baseline.Day.0.1h..",
"30009.10004.Visit.1.Baseline.Day.0.1h..", "30009.10005.Visit.1.Baseline.Day.0.1h..",
"30009.10007.Visit.1.Baseline.Day.0.1h..", "30010.10001.Visit.1.Baseline.Day.0.1h..",
"30010.10002.Visit.1.Baseline.Day.0.1h..", "32005.10001.Visit.1.Baseline.Day.0.1h..",
"33001.10001.Visit.1.Baseline.Day.0.1h..", "33001.10002.Visit.1.Baseline.Day.0.1h..",
"33001.10003.Visit.1.Baseline.Day.0.1h..", "33001.10004.Visit.1.Baseline.Day.0.1h..",
"33001.10005.Visit.1.Baseline.Day.0.1h..", "33001.10006.Visit.1.Baseline.Day.0.1h..",
"33002.10001.Visit.1.Baseline.Day.0.1h..", "33002.10002.Visit.1.Baseline.Day.0.1h..",
"33002.10003.Visit.1.Baseline.Day.0.1h..", "33002.10004.Visit.1.Baseline.Day.0.1h..",
"33004.10001.Visit.1.Baseline.Day.0.1h..", "33005.10001.Visit.1.Baseline",
"33005.10002.Visit.1.Baseline.Day.0.1h..", "33005.10004.Visit.1.Baseline.Day.0.1h..",
"33006.10001.Visit.1.Baseline.Day.0.1h..", "33006.10002.Visit.1.Baseline.Day.0.1h..",
"33006.10003.Visit.1.Baseline.Day.0.1h..", "33006.10005.Unscheduled.Visit.F.Day.0.8h..",
"33006.10006.Visit.1.Baseline.Day.0.1h..", "33009.10001.Visit.1.Baseline.Day.0.1h..",
"33009.10002.Visit.1.Baseline.Day.0.1h..", "33009.10003.Visit.1.Baseline.Day.0.1h..",
"33009.10004.Visit.1.Baseline.Day.0.1h..", "33009.10005.Visit.1.Baseline.Day.0.1h..",
"34003.10001.Visit.1.Baseline.Day.0.1h..", "34003.10002.Visit.1.Baseline.Day.0.1h..",
"34003.10003.Visit.1.Baseline.Day.0.1h..", "34003.10004.Visit.1.Baseline.Day.0.1h..",
"34003.10005.Visit.1.Baseline.Day.0.1h..", "34003.10006.Visit.1.Baseline.Day.0.1h..",
"34003.10007.Visit.1.Baseline.Day.0.1h..", "34003.10008.Visit.1.Baseline.Day.0.1h..",
"34004.10001.Visit.1.Baseline.Day.0.1h..", "34004.10002.Visit.1.Baseline.Day.0.1h..",
"34004.10003.Visit.1.Baseline.Day.0.1h..", "34004.10004.Visit.1.Baseline.Day.0.1h..",
"34004.10005.Visit.1.Baseline", "35104.10001.Visit.1.Baseline.Day.0.1h..",
"35106.10001.Unscheduled.Visit.R.Day.0.7h..", "35107.10001.Visit.1.Baseline.Day.0.1h..",
"35801.10001.Visit.1.Baseline", "35802.10002.Visit.1.Baseline.Day.0.1h..",
"35802.10003.Visit.1.Baseline.Day.0.1h..", "36001.10001.Visit.1.Baseline.Day.0.1h..",
"36001.10002.Visit.1.Baseline.Day.0.1h..", "36004.10003.Visit.1.Baseline.Day.0.1h..",
"36004.10004.Visit.1.Baseline.Day.0.1h..", "36004.10005.Visit.1.Baseline.Day.0.1h..",
"36004.10006.Visit.1.Baseline.Day.0.1h..", "36005.10001.Visit.1.Baseline.Day.0.1h..",
"36007.10001.Visit.1.Baseline.Day.0.1h..", "36008.10001.Visit.1.Baseline.Day.0.1h..",
"36008.10005.Visit.1.Baseline.Day.0.1h..", "36008.10007.Visit.1.Baseline.Day.0.1h..",
"36008.10012.Visit.1.Baseline.Day.0.1h..", "36008.10017.Visit.1.Baseline.Day.0.1h..",
"36008.10018.Visit.1.Baseline.Day.0.1h..", "36008.10020.Visit.1.Baseline.Day.0.1h..",
"36008.10021.Visit.1.Baseline.Day.0.1h..", "36008.10022.Visit.1.Baseline.Day.0.1h..",
"36009.10001.Visit.1.Baseline.Day.0.1h..", "36009.10002.Visit.1.Baseline.Day.0.1h..",
"36009.10003.Visit.1.Baseline.Day.0.1h..", "36009.10004.Visit.1.Baseline.Day.0.1h..",
"36009.10005.Visit.1.Baseline.Day.0.1h..", "36009.10006.Visit.1.Baseline.Day.0.1h..",
"36010.10001.Visit.1.Baseline.Day.0.1h..", "36010.10002.Visit.1.Baseline.Day.0.1h..",
"36010.10003.Visit.1.Baseline.Day.0.1h..", "38501.10001.Visit.1.Baseline.Day.0.1h..",
"38501.10002.Visit.1.Baseline.Day.0.1h..", "38501.10003.Visit.1.Baseline.Day.0.1h..",
"38505.10001.Visit.1.Baseline.Day.0.1h..", "38505.10002.Visit.1.Baseline.Day.0.1h..",
"38506.10001.Visit.1.Baseline.Day.0.1h..", "38506.10002.Visit.1.Baseline.Day.0.1h..",
"38506.10003.Visit.1.Baseline.Day.0.1h..", "38506.10004.Visit.1.Baseline.Day.0.1h..",
"38601.10001.Visit.1.Baseline.Day.0.1h..", "38601.10003.Visit.1.Baseline.Day.0.1h..",
"38601.10004.Visit.1.Baseline.Day.0.1h..", "38601.10006.Visit.1.Baseline.Day.0.1h..",
"38601.10007.Visit.1.Baseline", "38602.10001.Visit.1.Baseline.Day.0.1h..",
"38602.10002.Visit.1.Baseline.Day.0.1h..", "38603.10002.Visit.1.Baseline.Day.0.1h..",
"38603.10003.Visit.1.Baseline.Day.0.1h..", "39001.10001.Visit.1.Baseline",
"39001.10002.Visit.1.Baseline", "39001.10003.Visit.1.Baseline",
"39001.10004.Visit.1.Baseline.Day.0.1h..", "39001.10005.Visit.1.Baseline",
"39001.10006.Visit.1.Baseline.Day.0.1h..", "39001.10007.Visit.1.Baseline",
"39001.10008.Visit.1.Baseline", "39001.10009.Visit.1.Baseline",
"39001.10010.Visit.1.Baseline.Day.0.1h..", "39002.10001.Visit.1.Baseline.Day.0.1h..",
"39003.10001.Visit.1.Baseline", "39004.10001.Visit.1.Baseline.Day.0.1h..",
"39004.10002.Visit.1.Baseline.Day.0.1h..", "39004.10003.Visit.1.Baseline.Day.0.1h..",
"39005.10001.Visit.1.Baseline.Day.0.1h..", "39005.10002.Visit.1.Baseline.Day.0.1h..",
"39006.10001.Visit.1.Baseline.Day.0.1h..", "39006.10002.Visit.1.Baseline.Day.0.1h..",
"39006.10003.Visit.1.Baseline.Day.0.1h..", "39006.10004.Visit.1.Baseline.Day.0.1h..",
"39006.10005.Visit.1.Baseline.Day.0.1h..", "39006.10006.Visit.1.Baseline.Day.0.1h..",
"39007.10001.Visit.1.Baseline.Day.0.1h..", "39007.10002.Visit.1.Baseline.Day.0.1h..",
"39007.10003.Visit.1.Baseline.Day.0.1h..", "39010.10001.Visit.1.Baseline.Day.0.1h..",
"40001.10002.Visit.1.Baseline.Day.0.1h..", "40001.10003.Visit.1.Baseline.Day.0.1h..",
"40001.10005.Visit.1.Baseline", "40001.10006.Visit.1.Baseline",
"40001.10007.Visit.1.Baseline", "40001.10011.Visit.1.Baseline.Day.0.1h..",
"40001.10013.Visit.1.Baseline", "40001.10014.Visit.1.Baseline",
"40001.10015.Visit.1.Baseline.Day.0.1h..", "40001.10016.Visit.1.Baseline.Day.0.1h..",
"40001.10017.Visit.1.Baseline.Day.0.1h..", "40001.10019.Visit.1.Baseline.Day.0.1h..",
"40002.10001.Visit.1.Baseline.Day.0.1h..", "40002.10002.Visit.1.Baseline.Day.0.1h..",
"40002.10003.Visit.1.Baseline", "40002.10004.Visit.1.Baseline.Day.0.1h..",
"40002.10005.Visit.1.Baseline", "40002.10006.Visit.1.Baseline",
"40002.10007.Visit.1.Baseline.Day.0.1h..", "40002.10008.Visit.1.Baseline",
"40002.10009.Visit.1.Baseline.Day.0.1h..", "40002.10010.Visit.1.Baseline",
"40002.10012.Visit.1.Baseline", "40002.10013.Visit.1.Baseline.Day.0.1h..",
"40002.10014.Visit.1.Baseline", "40002.10015.Visit.1.Baseline.Day.0.1h..",
"40002.10016.Visit.1.Baseline", "40002.10017.Visit.1.Baseline",
"40003.10001.Visit.1.Baseline.Day.0.1h..", "40003.10002.Visit.1.Baseline.Day.0.1h..",
"40003.10003.Visit.1.Baseline.Day.0.1h..", "40003.10004.Visit.1.Baseline.Day.0.1h..",
"40003.10005.Visit.1.Baseline.Day.0.1h..", "40003.10006.Visit.1.Baseline",
"40003.10007.Visit.1.Baseline.Day.0.1h..", "40003.10008.Visit.1.Baseline.Day.0.1h..",
"40003.10009.Visit.1.Baseline", "40003.10010.Visit.1.Baseline.Day.0.1h..",
"40003.10011.Visit.1.Baseline.Day.0.1h..", "40003.10012.Visit.1.Baseline.Day.0.1h..",
"40003.10013.Visit.1.Baseline", "40003.10014.Visit.1.Baseline.Day.0.1h..",
"40003.10015.Visit.1.Baseline.Day.0.1h..", "40003.10016.Visit.1.Baseline.Day.0.1h..",
"41001.10001.Visit.1.Baseline.Day.0.1h..", "41001.10002.Visit.1.Baseline.Day.0.1h..",
"41001.10003.Visit.1.Baseline.Day.0.1h..", "41002.10001.Visit.1.Baseline.Day.0.1h..",
"41004.10001.Visit.1.Baseline.Day.0.1h..", "42001.10001.Visit.1.Baseline.Day.0.1h..",
"42001.10002.Visit.1.Baseline.Day.0.1h..", "42001.10004.Visit.1.Baseline.Day.0.1h..",
"42001.10005.Visit.1.Baseline.Day.0.1h..", "42001.10006.Visit.1.Baseline.Day.0.1h..",
"42001.10007.Visit.1.Baseline.Day.0.1h..", "42001.10008.Visit.1.Baseline.Day.0.1h..",
"42002.10001.Visit.1.Baseline.Day.0.1h..", "42002.10002.Visit.1.Baseline.Day.0.1h..",
"42002.10003.Visit.1.Baseline", "42002.10004.Visit.1.Baseline.Day.0.1h..",
"42003.10001.Visit.1.Baseline.Day.0.1h..", "42003.10002.Visit.1.Baseline.Day.0.1h..",
"42003.10004.Visit.1.Baseline.Day.0.1h..", "42004.10001.Visit.1.Baseline",
"42004.10002.Visit.1.Baseline", "42004.10003.Visit.1.Baseline.Day.0.1h..",
"42004.10004.Visit.1.Baseline.Day.0.1h..", "42005.10001.Visit.1.Baseline.Day.0.1h..",
"42005.10002.Visit.1.Baseline.Day.0.1h..", "42005.10003.Unscheduled.Visit.R.Day.0.7h..",
"42005.10004.Visit.1.Baseline.Day.0.1h..", "42005.10005.Visit.1.Baseline.Day.0.1h..",
"42005.10006.Visit.1.Baseline", "42005.10007.Visit.1.Baseline.Day.0.1h..",
"42005.10008.Visit.1.Baseline.Day.0.1h..", "43001.10001.Visit.1.Baseline.Day.0.1h..",
"43002.10001.Visit.1.Baseline.Day.0.1h..", "43002.10002.Visit.1.Baseline.Day.0.1h..",
"43003.10001.Visit.1.Baseline", "44003.10001.Visit.1.Baseline",
"44005.10002.Visit.1.Baseline.Day.0.1h..", "44005.10003.Visit.1.Baseline.Day.0.1h..",
"44008.10006.Visit.1.Baseline.Day.0.1h..", "44008.10009.Visit.1.Baseline.Day.0.1h..",
"44008.10011.Visit.1.Baseline", "44008.10013.Visit.1.Baseline",
"45004.10001.Visit.1.Baseline.Day.0.1h..", "45004.10003.Visit.1.Baseline",
"46001.10001.Visit.1.Baseline.Day.0.1h..", "46001.10002.Visit.1.Baseline.Day.0.1h..",
"46001.10003.Visit.1.Baseline.Day.0.1h..", "46001.10004.Visit.1.Baseline.Day.0.1h..",
"46001.10005.Visit.1.Baseline.Day.0.1h..", "46002.10001.Visit.1.Baseline.Day.0.1h..",
"46002.10003.Visit.1.Baseline.Day.0.1h..", "46004.10001.Visit.1.Baseline.Day.0.1h..",
"46005.10001.Visit.1.Baseline.Day.0.1h..", "46005.10003.Visit.1.Baseline.Day.0.1h..",
"48002.10001.Visit.1.Baseline.Day.0.1h..", "48002.10002.Visit.1.Baseline.Day.0.1h..",
"48003.10001.Visit.1.Baseline.Day.0.1h..", "48003.10002.Visit.1.Baseline.Day.0.1h..",
"48003.10003.Visit.1.Baseline.Day.0.1h..", "48003.10004.Visit.1.Baseline.Day.0.1h..",
"48004.10001.Visit.1.Baseline.Day.0.1h..", "48004.10003.Visit.1.Baseline.Day.0.1h..",
"48004.10005.Visit.1.Baseline", "48004.10006.Visit.1.Baseline.Day.0.1h..",
"48004.10007.Visit.1.Baseline.Day.0.1h..", "48004.10008.Visit.1.Baseline.Day.0.1h..",
"48004.10009.Visit.1.Baseline.Day.0.1h..", "48004.10011.Visit.1.Baseline.Day.0.1h..",
"48004.10012.Visit.1.Baseline.Day.0.1h..", "48004.10014.Visit.1.Baseline.Day.0.1h..",
"48004.10017.Visit.1.Baseline.Day.0.1h..", "48004.10018.Visit.1.Baseline.Day.0.1h..",
"48004.10019.Visit.1.Baseline.Day.0.1h..", "48004.10020.Visit.1.Baseline.Day.0.1h..",
"48004.10021.Visit.1.Baseline.Day.0.1h..", "48004.10022.Visit.1.Baseline.Day.0.1h..",
"48004.10023.Visit.1.Baseline.Day.0.1h..", "48008.10001.Visit.1.Baseline.Day.0.1h..",
"48008.10002.Visit.1.Baseline.Day.0.1h..", "48011.10001.Visit.1.Baseline.Day.0.1h..",
"48011.10002.Visit.1.Baseline", "48012.10002.Visit.1.Baseline.Day.0.1h..",
"48012.10004.Visit.1.Baseline.Day.0.1h..", "48012.10005.Visit.1.Baseline.Day.0.1h..",
"49001.10001.Unscheduled.Visit.R.Day.0.7h..", "49001.10002.Visit.1.Baseline.Day.0.1h..",
"49006.10002.Visit.1.Baseline.Day.0.1h..", "49006.10003.Visit.1.Baseline.Day.0.1h..",
"49006.10006.Visit.1.Baseline", "49006.10007.Visit.1.Baseline.Day.0.1h..",
"49006.10008.Visit.1.Baseline.Day.0.1h..", "49006.10009.Visit.1.Baseline.Day.0.1h..",
"49008.10001.Visit.1.Baseline.Day.0.1h..", "49008.10002.Visit.1.Baseline.Day.0.1h..",
"49008.10003.Visit.1.Baseline.Day.0.1h..", "49011.10001.Visit.1.Baseline.Day.0.1h..",
"49011.10002.Visit.1.Baseline.Day.0.1h..", "49011.10003.Visit.1.Baseline.Day.0.1h..",
"49012.10001.Visit.1.Baseline.Day.0.1h..", "49016.10002.Visit.1.Baseline.Day.0.1h..",
"70001.10001.Visit.1.Baseline.Day.0.1h..", "70001.10002.Visit.1.Baseline.Day.0.1h..",
"70001.10003.Visit.1.Baseline.Day.0.1h..", "70001.10004.Visit.1.Baseline.Day.0.1h..",
"70001.10005.Visit.1.Baseline.Day.0.1h..", "70001.10006.Visit.1.Baseline.Day.0.1h..",
"70001.10007.Visit.1.Baseline.Day.0.1h..", "70001.10008.Visit.1.Baseline.Day.0.1h..",
"70003.10001.Visit.1.Baseline.Day.0.1h..", "70003.10002.Visit.1.Baseline.Day.0.1h..",
"70003.10003.Visit.1.Baseline.Day.0.1h..", "70003.10004.Visit.1.Baseline.Day.0.1h..",
"70003.10005.Visit.1.Baseline.Day.0.1h..", "70003.10006.Visit.1.Baseline.Day.0.1h..",
"90002.10001.Visit.1.Baseline.Day.0.1h..", "90003.10001.Visit.1.Baseline.Day.0.1h..",
"90003.10002.Visit.1.Baseline.Day.0.1h..", "90003.10003.Visit.1.Baseline.Day.0.1h..",
"90003.10004.Visit.1.Baseline.Day.0.1h..", "90005.10001.Visit.1.Baseline.Day.0.1h..",
"90005.10002.Visit.1.Baseline.Day.0.1h.."), class = "factor"),
Simp = c(0.562967424, 0.771395613, 0.720549673, 0.520301987,
0.498477511), Day = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "D0", class = "factor"),
Visit = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("U",
"V1"), class = "factor"), Group = structure(c(1L, 1L, 1L,
1L, 1L), .Label = "1_", class = "factor"), Timepoints = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "1_D0", class = "factor"), Total.Sequence = c(16038L,
24250L, 13939L, 28722L, 19665L), specnumber = c(49L, 33L,
29L, 20L, 20L), Shan = c(1.237756795, 1.670299627, 1.617010117,
0.985164005, 0.960982468), TREATMENT = structure(c(2L, 1L,
1L, 1L, 1L), .Label = c("FIDAXOMICIN", "VANCOMYCIN"), class = "factor"),
SUBJID = c(3e+09, 3e+09, 3e+09, 3e+09, 3e+09), ARM = structure(c(2L,
1L, 1L, 1L, 1L), .Label = c("FIDAXOMICIN", "VANCOMYCIN"), class = "factor"),
TRT01PN = c(2L, 1L, 1L, 1L, 1L), SAFFL = structure(c(2L,
2L, 2L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
MFASFL = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("N",
"Y"), class = "factor"), MEASFL = structure(c(2L, 2L, 2L,
1L, 2L), .Label = c("N", "Y"), class = "factor"), SEX = structure(c(1L,
1L, 2L, 1L, 1L), .Label = c("F", "M"), class = "factor"),
AGE = c(86L, 76L, 60L, 83L, 85L), CRFL1 = structure(c(1L,
1L, 1L, 2L, 1L), .Label = c("Y", "N"), class = c("ordered",
"factor")), CRFL1_VF_YN = structure(c(4L, 1L, 2L, 1L, 2L), .Label = c("FIDAXOMICIN_N",
"FIDAXOMICIN_Y", "VANCOMYCIN_N", "VANCOMYCIN_Y"), class = "factor")), .Names = c("Datasets",
"Simp", "Day", "Visit", "Group", "Timepoints", "Total.Sequence",
"specnumber", "Shan", "TREATMENT", "SUBJID", "ARM", "TRT01PN",
"SAFFL", "MFASFL", "MEASFL", "SEX", "AGE", "CRFL1", "CRFL1_VF_YN"
), row.names = c(NA, 5L), class = "data.frame")
> head(Simp_Shan_Baseline_Grp[1:5,1:20])
Datasets Simp Day Visit Group
1 30001.10001.Visit.1.Baseline.Day.0.1h.. 0.5629674 D0 V1 1_
2 30001.10002.Visit.1.Baseline.Day.0.1h.. 0.7713956 D0 V1 1_
3 30001.10003.Visit.1.Baseline.Day.0.1h.. 0.7205497 D0 V1 1_
4 30001.10004.Visit.1.Baseline.Day.0.1h.. 0.5203020 D0 V1 1_
5 30001.10005.Visit.1.Baseline.Day.0.1h.. 0.4984775 D0 V1 1_
Timepoints Total.Sequence specnumber Shan TREATMENT SUBJID
1 1_D0 16038 49 1.2377568 VANCOMYCIN 3e+09
2 1_D0 24250 33 1.6702996 FIDAXOMICIN 3e+09
3 1_D0 13939 29 1.6170101 FIDAXOMICIN 3e+09
4 1_D0 28722 20 0.9851640 FIDAXOMICIN 3e+09
5 1_D0 19665 20 0.9609825 FIDAXOMICIN 3e+09
ARM TRT01PN SAFFL MFASFL MEASFL SEX AGE CRFL1 CRFL1_VF_YN
1 VANCOMYCIN 2 Y Y Y F 86 Y VANCOMYCIN_Y
2 FIDAXOMICIN 1 Y Y Y F 76 Y FIDAXOMICIN_N
3 FIDAXOMICIN 1 Y Y Y M 60 Y FIDAXOMICIN_Y
4 FIDAXOMICIN 1 Y Y N F 83 N FIDAXOMICIN_N
5 FIDAXOMICIN 1 Y Y Y F 85 Y FIDAXOMICIN_Y
Now the boxplot I tried:
ggplot(data = Simp_Shan_Baseline_Grp, aes(x=CRFL1, y=Shan)) + geom_boxplot(aes(fill=TREATMENT)) + stat_summary(fun.data = give.n, geom = "text", fun.y = median,
As you can see, counts are added only for the two levels of CRFL1 (Y/N), but I need a count for each box.
Update: yes, I have tried moving the aesthetics into the ggplot() call, and that solves the count problem. I now use the code ggplot(data = Simp_Shan_Baseline_Grp, aes(x=CRFL1, y=Shan, colour = factor(TREATMENT))) + geom_boxplot() + stat_summary(fun.data = give.n, geom = "text", fun.y = median, position = position_dodge(width = 0.75)).
With this code the boxes are only outlined in colour; I can't get filled boxes like in my previous image.
Ideally I would like to have a fill colour and black text for each box.
Can anybody please help me?
Thank you,
Mitra
You could use the following code
# load libraries
library(ggplot2)
library(dplyr)
# define helper give.n: the label is the number of observations,
# placed slightly above the median of each box
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(data.frame(y = median(x)*1.05, label = length(x)))
}
## some playing around with the original data
## if not changed it gives a very limited output
## first select only needed columns
Simp_Shan_Baseline_Grp2 <- Simp_Shan_Baseline_Grp %>%
  select(CRFL1, Shan, TREATMENT)
## then duplicate data given, and change some values
Simp_Shan_Baseline_Grp2 <- rbind(Simp_Shan_Baseline_Grp2, Simp_Shan_Baseline_Grp2)
Simp_Shan_Baseline_Grp2$CRFL1 <- factor(rep(c("Y", "N"), 5))
Simp_Shan_Baseline_Grp2$TREATMENT[3] <- "VANCOMYCIN"
Simp_Shan_Baseline_Grp2$TREATMENT[8] <- "VANCOMYCIN"
# shift the fill-aes to the initial ggplot call
# add position adjustment
ggplot(data = Simp_Shan_Baseline_Grp2, aes(x=CRFL1, y=Shan, fill=TREATMENT)) +
  geom_boxplot() +
  # fun.data already supplies y, so fun.y = median is not needed here
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = .750) )
This yields the following plot:
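If stat_summary() proves awkward, one alternative is to pre-compute the counts per box and draw them with geom_text(). The sketch below is only an illustration under stated assumptions, not the method from the answer above: the toy data frame and its values are made up, and box_n is a hypothetical helper table.

```r
library(ggplot2)

# Made-up stand-in for Simp_Shan_Baseline_Grp (assumption: same column names)
set.seed(1)
toy <- data.frame(
  CRFL1     = factor(rep(c("Y", "N"), each = 10)),
  TREATMENT = factor(rep(c("FIDAXOMICIN", "VANCOMYCIN"), times = 10)),
  Shan      = runif(20, 0.5, 2)
)

# One row per box: number of observations and a y position just above the median
box_n <- aggregate(Shan ~ CRFL1 + TREATMENT, data = toy, FUN = length)
names(box_n)[3] <- "n"
box_y <- aggregate(Shan ~ CRFL1 + TREATMENT, data = toy, FUN = median)
box_n$y <- box_y$Shan * 1.05

# Filled boxes, with black count labels dodged by the same width as the boxes
p <- ggplot(toy, aes(x = CRFL1, y = Shan, fill = TREATMENT)) +
  geom_boxplot() +
  geom_text(data = box_n,
            aes(y = y, label = n, group = TREATMENT),
            position = position_dodge(width = 0.75),
            colour = "black")
```

Printing p draws the plot. Keeping the counts in a separate table makes it easy to check them before plotting, at the cost of one extra step.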
Related
Credibility interval with respect two factors using ggplot2 in r
I have a problem plotting credibility intervals. My data structure is as follows; L1, L2, M, U1, U2 stand for the 0.025, 0.25, 0.5, 0.75 and 0.975 quantiles, respectively.
structure(list(approach = structure(c(1L, 2L, 1L, 2L, 1L, 2L), class = "factor", .Label = c("INLA", "rjags")), param = structure(c(1L, 2L, 3L, 1L, 2L, 3L), class = "factor", .Label = c("alpha", "beta", "sig2")), L1 = c(0.0844546867936143, 1.79242348175439, 0.163143886545317, 0.0754165380733685, 1.79067991488052, 3.66675821267498 ), L2 = c(0.60090835904286, 1.95337968870806, 0.898159977552433, 0.606017177641373, 1.95260448314298, 4.07080184844179), M = c(0.870204161297956, 2.03768437879748, 2.20651061559405, 0.87408237273113, 2.03725552264872, 4.32531027636171), U2 = c(1.13905085248391, 2.12210930874551, 4.26836270504725, 1.66260576926063, 2.28900567640091, 5.10063756831338 ), U1 = c(1.65214011950274, 2.28396345192398, 4.9109804477583, 1.1450384685802, 2.12117799328209, 4.55657971279654), AP = structure(c(1L, 4L, 5L, 2L, 3L, 6L), .Label = c("INLA.alpha", "rjags.alpha", "INLA.beta", "rjags.beta", "INLA.sig2", "rjags.sig2"), class = "factor")), .Names = c("approach", "param", "L1", "L2", "M", "U2", "U1", "AP"), row.names = c(NA, -6L), class = "data.frame")
I referred to another answer, but 'fill' seems to work only for the boxplot case. The code I have tried so far is:
CI$AP=interaction(CI$approach,CI$param)
p=ggplot(CI,aes(y=AP))+geom_point(aes(x=M))
p=p+geom_segment(aes(x=L1,xend=U1,y=AP,yend=AP))
p=p+geom_segment(aes(x=L2,xend=U2,y=AP,yend=AP),size=1.5)
It is far from what I want. Many thanks!
How about the following:
ggplot(df, aes(x = param, y = M, colour = approach)) +
  geom_point(position = position_dodge2(width = 0.3), size = 3) +
  geom_linerange(
    aes(ymin = L2, ymax = U2, x = param),
    position = position_dodge2(width = 0.3), size = 2) +
  geom_linerange(
    aes(ymin = L1, ymax = U1, x = param),
    position = position_dodge2(width = 0.3), size = 1) +
  coord_flip() +
  labs(x = "Parameter", y = "Estimate")
Sample data
df <- structure(list(approach = structure(c(1L, 2L, 1L, 2L, 1L, 2L), class = "factor", .Label = c("INLA", "rjags")), param = structure(c(1L, 2L, 3L, 1L, 2L, 3L), class = "factor", .Label = c("alpha", "beta", "sig2")), L1 = c(0.0844546867936143, 1.79242348175439, 0.163143886545317, 0.0754165380733685, 1.79067991488052, 3.66675821267498 ), L2 = c(0.60090835904286, 1.95337968870806, 0.898159977552433, 0.606017177641373, 1.95260448314298, 4.07080184844179), M = c(0.870204161297956, 2.03768437879748, 2.20651061559405, 0.87408237273113, 2.03725552264872, 4.32531027636171), U2 = c(1.13905085248391, 2.12210930874551, 4.26836270504725, 1.66260576926063, 2.28900567640091, 5.10063756831338 ), U1 = c(1.65214011950274, 2.28396345192398, 4.9109804477583, 1.1450384685802, 2.12117799328209, 4.55657971279654), AP = structure(c(1L, 4L, 5L, 2L, 3L, 6L), .Label = c("INLA.alpha", "rjags.alpha", "INLA.beta", "rjags.beta", "INLA.sig2", "rjags.sig2"), class = "factor")), .Names = c("approach", "param", "L1", "L2", "M", "U2", "U1", "AP"), row.names = c(NA, -6L), class = "data.frame")
Interpretation of error in sem.coef
I am trying to run a SEM with a random effect in piecewiseSEM. My model runs with no error, and sem.fit() also runs with no errors or warnings. However, when I run sem.coefs() I get the following warning:
1: In if (grepl("cbind", deparse(formula(x)))) all.vars(formula(x))[-c(1:2)] else all.vars(formula(x)) :
  the condition has length > 1 and only the first element will be used
Any ideas what this warning is about or what it means? Given that it's a warning and not an error, the code still runs and gives me estimates, but can I trust the estimates? Thanks!
EDIT
#code:
library(piecewiseSEM)
library(nlme)
avg.forb<-list(
lme(nitrogen_variation~nat+impervious+precip.variation,random=~1|site/species,control = lmeControl(opt = "optim"),forb),
lme(po4_variation~nat+impervious+precip.variaton,random=~1|site/species,control = lmeControl(opt = "optim"),forb),
lme(nitrogen~nat +impervious+precip.variation,random=~1|site/species,control = lmeControl(opt = "optim"), forb),
lme(po4 ~nat +impervious+precip.variation,random=~1|site/species,control = lmeControl(opt = "optim"),forb),
lme(avg.height~nat+impervious+po4+po4_variation+nitrogen+nitrogen_variation+precip.variation+n_i, random=~1|site/species,control =lmeControl(opt="optim"),forb),
lme(avg.culms~nat+impervious+po4+po4_variation+nitrogen+nitrogen_variation+precip.variation+n_i,random=~1|site/species,control = lmeControl(opt = "optim"), forb),
lme(avg.chloro~nat+impervious+po4+po4_variation+nitrogen+nitrogen_variation+precip.variation+n_i,random=~1|site/species, control =lmeControl(opt="optim"),forb),
lme(avg.sla~nat+impervious+po4+po4_variation+nitrogen+nitrogen_variation+precip.variation+n_i,random=~1|site/species, control = lmeControl(opt = "optim"),forb))
sem.fit(avg.forb, conditional=T, forb) #this code gives the above error message
#data subset:
structure(list(site = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("Baker", "Cronkelton", "Delaware"), class = "factor"), species = structure(c(1L, 4L, 6L, 2L, 3L, 5L), .Label = c("apocynum cannabinum", "aster ericoides", "aster lanceolatus var. interior", "cirsium arvense", "impatiens capensis", "typha angustifolia"), class = "factor"), n_i = structure(c(2L, 1L, 1L, 2L, 2L, 2L), .Label = c("i", "n"), class = "factor"),nat=structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1", "2"), class = "factor"), impervious = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor"), precip_variation = c(70.24882178, 70.24882178, 70.24882178, 21.92460821, 21.92460821, 18.90115299), po4 = c(-2.203425667, -2.204119983, -2.20481541, -1.845271793, -1.844967771, -2.417936637), po4_variation = c(0.8011, 0.801, 0.8009, 0.4839, 0.484, 0.5229), nitrogen = c(0.00627, 0.00626, 0.00625, 0.00432, 0.00433, 0.01018), nitrogen_variation = c(0.7739, 0.7738, 0.7737, 0.5435, 0.5436, -0.1251), avg.height = c(99.1, 113.5559506, 191.4111012, 73.72222025, 35.42222025, 59.52222025), avg.culms = c(0.492915384, 0.78612011, 0.884606749, 0.96483549, 0.819543936, 0.831087338), avg.sla = c(179.3510333, 149.0332471, 68.77888941, 334.2177912, 798.7581389, 443.2005556), avg.chloro = c(0.900670513, 0.790832282, 0.965532685, 0.565585484, 1.106203493, 0.970209082)), .Names = c("site", "species", "n_i", "nat", "impervious", "precip_variation", "po4", "po4_variation", "nitrogen", "nitrogen_variation", "avg.height", "avg.culms", "avg.sla", "avg.chloro"), row.names = c(NA, 6L), class = "data.frame")
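One plausible reading of the warning, offered as an assumption because the piecewiseSEM internals are not shown in the question: the quoted condition calls deparse() on the model formula, and deparse() splits long expressions into several strings. grepl() then returns one value per string, and if() only looks at the first one. The sketch below reproduces that mechanism with a formula of similar length to the avg.height model; the variable names pieces, hits, one_string and single_hit are made up for illustration.

```r
# A formula with many terms, similar in length to the avg.height model above.
# Creating a formula does not evaluate the variables, so no data is needed.
f <- avg.height ~ nat + impervious + po4 + po4_variation + nitrogen +
  nitrogen_variation + precip.variation + n_i

# deparse() breaks long expressions into several strings
pieces <- deparse(f)

# grepl() is vectorised, so it returns one logical value per piece
hits <- grepl("cbind", pieces)

# Collapsing the pieces first yields a single string and a length-one result
one_string <- paste(pieces, collapse = " ")
single_hit <- grepl("cbind", one_string)
```

If this is indeed the cause, the warning is triggered by the length of the formula and not by the fitted values, but that would need to be checked against the source of the package version in use.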
creating a factor-based dendrogram with R and ggplot2
This is not so much a coding question as a call for help with the general approach ;-) I prepared a table containing taxonomic information about organisms. I only want to use the "names" of these organisms, so there are no values from which a distance or a clustering could be computed (this is also all the information I have). I just want to use these factors to create a plot that shows the relationships. My data looks like this:
test2<-structure(list(genus = structure(c(4L, 2L, 7L, 8L, 6L, 1L, 3L, 5L, 5L), .Label = c("Aminobacter", "Bradyrhizobium", "Hoeflea", "Hyphomonas", "Mesorhizobium", "Methylosinus", "Ochrobactrum", "uncultured"), class = "factor"), family = structure(c(4L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 6L), .Label = c("Bradyrhizobiaceae", "Brucellaceae", "Hyphomicrobiaceae", "Hyphomonadaceae", "Methylocystaceae", "Phyllobacteriaceae"), class = "factor"), order = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Caulobacterales", "Rhizobiales"), class = "factor"), class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Alphaproteobacteria", class = "factor"), phylum = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Proteobacteria", class = "factor")), .Names = c("genus", "family", "order", "class", "phylum"), class = "data.frame", row.names = c(NA, 9L))
Is it necessary to set up artificial values to describe a distance between the levels?
Here is an attempt using the data.tree library.
First create a string variable in the form: Proteobacteria/Alphaproteobacteria/Caulobacterales/Hyphomonadaceae/Hyphomonas
library(data.tree)
test2$pathString <- with(test2,
                         paste(phylum, class, order, family, genus, sep = "/"))
tree_test2 = as.Node(test2)
plot(tree_test2)
Many things can be done afterwards, for example an interactive network:
library(networkD3)
test2_Network <- ToDataFrameNetwork(tree_test2, "name")
simpleNetwork(test2_Network)
or a graph-styled plot:
library(igraph)
plot(as.igraph(tree_test2, directed = TRUE, direction = "climb"))
Check out the vignette.
Using ggplot2:
library(ggraph)
graph = as.igraph(tree_test2, directed = TRUE, direction = "climb")
ggraph(graph, layout = 'kk') +
  geom_node_text(aes(label = name))+
  geom_edge_link(arrow = arrow(type = "closed", ends = "first",
                               length = unit(0.20, "inches"),
                               angle = 15)) +
  geom_node_point() +
  theme_graph()+
  coord_cartesian(xlim = c(-3,3), expand = TRUE)
or perhaps:
ggraph(graph, layout = 'kk') +
  geom_node_text(aes(label = name), repel = T)+
  geom_edge_link(angle_calc = 'along',
                 end_cap = circle(3, 'mm'))+
  geom_node_point(size = 5) +
  theme_graph()+
  coord_cartesian(xlim = c(-3,3), expand = TRUE)
plotting with different color between old and updated data
I reformulated my question and hope it is a bit clearer now. Here is my data:
ID   Type     X             Y             Sex
a1   Test     -12.12609861  208.6810478   XY
a2   Test     -1.32366642   63.0574351    XXY
a3   Test     -9.02867948   114.1501293   XY
b4   NewTest  0.01101428    0.87207664    XX
b5   Test     -1.14651604   -0.86714741   XX
b6   Test     -13.05848944  155.5109551   XY
x7   NewTest  -4.74479593   80.82528931   XY
x8   Test     -8.17386444   124.4765311   XY
x9   Test     1.14870262    -0.36606683   XX
x10  Test     1.20879037    0.80972607    XX
x11  Test     -1.04261274   0.35654895    XX
x12  Test     -11.73602     185.5326725   XY
I would like to plot the data in different colours according to whether the data is new or old. New data is added daily or weekly, so the colour change needs to be dynamic. N.B. the new data always starts with "NewTest" in the column "Type".
The code:
for_loop_start<- (nrow(whole_data)-1)
len_of_whole_data<- nrow(whole_data)
for (j in c(for_loop_start:1)){
  if (whole_data[j,2] == "NewTest"){
    break
  }
}
new_data <- with(whole_data,whole_data[j:len_of_whole_data,])
p <- ggplot(data=whole_data,aes(x=X,y=Y)) + geom_point(colour = "black")
ggplotly(p)
p <- p + geom_point(data= new_data, mapping=aes(x=X,y=Y,text=SampleID,colour = "darkgoldenrod2"))
ggplotly(p)
Answer to edited version of the question
If the last "NewTest" value in your "Type" column consistently marks the start of the new data, this should work:
dat <- structure(list(ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 10L, 11L, 12L, 7L, 8L, 9L), .Label = c("a1", "a2", "a3", "b4", "b5", "b6", "x10", "x11", "x12", "x7", "x8", "x9"), class = "factor"), Type = structure(c(2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("NewTest", "Test"), class = "factor"), X = c(-12.12609861, -1.32366642, -9.02867948, 0.01101428, -1.14651604, -13.05848944, -4.74479593, -8.17386444, 1.14870262, 1.20879037, -1.04261274, -11.73602), Y = c(208.6810478, 63.0574351, 114.1501293, 0.87207664, -0.86714741, 155.5109551, 80.82528931, 124.4765311, -0.36606683, 0.80972607, 0.35654895, 185.5326725 ), Sex = structure(c(3L, 2L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 3L), .Label = c("XX", "XXY", "XY"), class = "factor")), .Names = c("ID", "Type", "X", "Y", "Sex"), class = "data.frame", row.names = c(NA, -12L))
lim.id <- max(which(dat$Type == "NewTest")) - 1
dat$Age <- c(rep("old", lim.id), rep("new", nrow(dat) - lim.id))
ggplot(dat, aes(x=X, y=Y, color = Age)) +
  geom_point() +
  scale_color_manual(values = c("darkgoldenrod2", "black"))
Old answer
You could try to make a script that reads the modification time (see ?file.mtime) and use that to make a column which specifies whether the entry is "new" or "old":
dat <- structure(list(ID = 1:12, Type = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("control", "Test" ), class = "factor"), X = c(-12.12609861, -1.32366642, -9.02867948, 0.01101428, -1.14651604, -13.05848944, -4.74479593, -8.17386444, 1.14870262, 1.20879037, -1.04261274, -11.73602), Y = c(208.6810478, 63.0574351, 114.1501293, 0.87207664, -0.86714741, 155.5109551, 80.82528931, 124.4765311, -0.36606683, 0.80972607, 0.35654895, 185.5326725), Sex = structure(c(3L, 2L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 3L), .Label = c("XX", "XXY", "XY"), class = "factor")), .Names = c("ID", "Type", "X", "Y", "Sex"), class = "data.frame", row.names = c(NA, -12L))
dat$Time <- seq(as.Date("2017-07-12"), as.Date("2017-06-12"), length = nrow(dat))
dat$Time.type <- ifelse(as.Date(Sys.time()) - dat$Time < 12, "new", "old")
library(ggplot2)
ggplot(dat, aes(x=X, y=Y, color = Time.type)) +
  geom_point() +
  scale_color_manual(values = c("black", "darkgoldenrod2"))
You could also just set a variable which defines the ID at which to split your data frame for plotting (supposing df1 is your data frame):
lim.id <- 7 # here you can put whatever value you would like to split your data.frame on
plot1 <- ggplot() +
  geom_point(data = df1[df1$ID < lim.id, ], aes(x = X, y = Y), colour = "black")
plot1 <- plot1 +
  geom_point(data = df1[df1$ID >= lim.id, ], aes(x = X, y = Y), colour = "darkgoldenrod2")
plot2 <- ggplotly(plot1)
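The answers above pass an unnamed colour vector to scale_color_manual(), which is matched to the values of the grouping column in their sorted order. A possibly more robust variant, sketched here under assumptions, names the colours so that each one is tied to its label. The toy data frame, first_new and age_cols are made-up names and values used only for illustration.

```r
library(ggplot2)

# Made-up stand-in for dat: the last "NewTest" row marks the start of new data
toy <- data.frame(
  Type = c("Test", "NewTest", "Test", "NewTest", "Test", "Test"),
  X    = c(-12.1, 0.0, -1.1, -4.7, -8.2, 1.1),
  Y    = c(208.7, 0.9, -0.9, 80.8, 124.5, -0.4)
)

# Row index of the last "NewTest" marker; rows from there onwards are "new"
first_new <- max(which(toy$Type == "NewTest"))
toy$Age <- ifelse(seq_len(nrow(toy)) >= first_new, "new", "old")

# Named values tie each colour to a label, independent of level order
age_cols <- c(old = "black", new = "darkgoldenrod2")

p <- ggplot(toy, aes(x = X, y = Y, colour = Age)) +
  geom_point() +
  scale_colour_manual(values = age_cols)
```

Because the marker row is looked up each time the script runs, the split should follow the data as new rows are appended.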
plotting 3 variables on a single plot in ggplot2
Hi, I have an experiment which consists of three variables, and I would like to plot them all on a single plot. This is my df:
AB <- data.frame(block=c("A", "A", "A", "A", "B", "B", "B", "B" ), familiarity=c("fam", "fam", "unfam", "unfam" ), prime=c("P", "UP" ), RT=c("570.6929", "628.7446", "644.6268", "607.4312", "556.3581", "645.4821", "623.5624", "604.4113"))
Right now I can only split the data on one of the variables and draw two separate plots, like this, where A and B are the two levels of the third variable:
A <- AB[which(AB$block == "A"),]
B <- AB[which(AB$block == "B"),]
pa <- ggplot(data=A, aes(x=prime, y=RT, group=familiarity)) +
  geom_line(aes(linetype=familiarity), size=1) +
  expand_limits(y=c(500,650))
pb <- ggplot(data=B, aes(x=prime, y=RT, group=familiarity)) +
  geom_line(aes(linetype=familiarity), size=1) +
  expand_limits(y=c(500,650))
I would like to superimpose plot A over plot B, and have the third variable identified by colour. Any ideas?
Is this what you mean?
p_all <- ggplot(AB, aes(x=prime,y=RT,group=interaction(familiarity,block))) +
  geom_line(aes(linetype=familiarity,color=block))
Data used:
AB <- structure(list(block = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), familiarity = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), class = "factor", .Label = c("fam", "unfam")), prime = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L ), class = "factor", .Label = c("P", "UP")), RT = c(570.6929, 628.7446, 644.6268, 607.4312, 556.3581, 645.4821, 623.5624, 604.4113 )), .Names = c("block", "familiarity", "prime", "RT"), row.names = c(NA, -8L), class = "data.frame")
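Note that the data in the answer above stores RT as numbers, while the data.frame() call in the question supplies RT as strings. A minimal sketch of the conversion step, assuming the data is built exactly as typed in the question, could look like this:

```r
library(ggplot2)

# Data as typed in the question: RT is given as strings,
# and the shorter vectors are recycled to 8 rows
AB <- data.frame(
  block       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  familiarity = c("fam", "fam", "unfam", "unfam"),
  prime       = c("P", "UP"),
  RT          = c("570.6929", "628.7446", "644.6268", "607.4312",
                  "556.3581", "645.4821", "623.5624", "604.4113")
)

# Convert RT to numbers; going through as.character() also covers
# the case where the strings were turned into a factor
AB$RT <- as.numeric(as.character(AB$RT))

# One line per familiarity/block combination, coloured by block
p_all <- ggplot(AB, aes(x = prime, y = RT,
                        group = interaction(familiarity, block))) +
  geom_line(aes(linetype = familiarity, colour = block)) +
  expand_limits(y = c(500, 650))
```

Without the conversion, RT would be treated as a discrete variable and the y axis would show the strings as categories.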
If you have different datasets for those variables, then you can specify the data in each layer:
ggplot()+
  geom_line(data=A, aes(x=prime, y=RT, group=familiarity,linetype=familiarity), size=1) +
  geom_line(data=B, aes(x=prime, y=RT, group=familiarity,linetype=familiarity), size=1)+
  expand_limits(y=c(500,650))