Robust Independent T-test - r

This is my first time asking a question, so I apologize for any formatting issues or anything that makes this difficult to answer. Please let me know what I need to add so that the question can be answered.
I'm attempting to compare two groups of unequal size (one ~97, the other ~714). The reason for the large discrepancy is that I am looking at a program done by one class to see whether it is significantly different from what has occurred in previous classes. I've been reading about robust statistics recently and decided to use the Yuen bootstrap test (yuenbt) from the WRS2 package in RStudio for a more valid comparison, especially given the difference in sample size.
My call is
yuenbt(DataExample$PT500 ~ DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)
and it returns
Call:
yuenbt(formula = DataExample$PT500 ~ DataExample$ClassPT500,
tr = 0.2, nboot = 599, side = TRUE)
Test statistic: NA (df = NA), p-value = 0
Trimmed mean difference: -65
95 percent confidence interval:
NA NA
NAs are also returned for other variables that I've tried, and in some cases the confidence interval shows Inf. Any ideas why this is happening (is it the big difference in sample size?) and suggestions on what the next best step would be are greatly appreciated.
Here is a sample of data:
structure(list(PrePT500 = c(74, 105, 121, 128), PostPT500 = c(191,
264, 327, 314), PT500 = c(117, 159, 206, 186), PrePullups = c(0,
NA, NA, 2), PostPullups = c(3, NA, NA, 3), Pullups = c(3, NA,
NA, 1), PreSitups = c(46, 40, 25, 33), PostSitups = c(41, 61,
39, 49), Situps = c(-5, 21, 14, 16), PreMC = c(8, 16, 29, 19),
PostMC = c(41, 45, 60, 60), MC = c(33, 29, 31, 41), PrePushups = c(20,
16, 28, 30), PostPushups = c(40, 47, 50, 50), Pushups = c(20,
31, 22, 20), Pre1.5 = c(1048, 917, 902, 905), Post1.5 = c(846,
748, 696, 760), X1.5 = c(-202, -169, -206, -145), Pre220 = c(43,
50, 41, 45), Post220 = c(39, 40, 32, 34), X220 = c(-4, -10,
-9, -11), PreAgility = c(20.96, NA, 21.1, 19.88), PostAgility = c(19.69,
NA, 18.8, 20.79), Agility = c(-1.27, NA, -2.3, 0.91), PreBD = c(6.17,
7.82, 5.08, 7), PostBD = c(5, 4.87, 4.68, 6.2), BD = c(-1.17,
-2.95, -0.4, -0.8), PreCL = c(7.05, 13.6, 14.4, 8.8), PostCL = c(8.1,
8.9, 8.27, 7.6), CL = c(1.05, -4.7, -6.13, -1.2), PreSW = c(10.2,
NA, 20.34, 8), PostSW = c(11.4, NA, 9.3, 7.4), SW = c(1.2,
NA, -11.04, -0.6), Pre500 = c(115, 128, 107, 114), Post500 = c(105,
112, 93, 99), X500 = c(-10, -16, -14, -15), PreTotal = c(446,
91, 255, NA), PostTotal = c(493, 439, 503, NA), Total = c(47,
348, 248, NA), ClassPrePT500 = c(338, 213, 215, 243), ClassPostPT500 = c(430,
396, 333, 314), ClassPT500 = c(92, 183, 118, 71), ClassPrePullups = c(6,
5, 2, 0), ClassPostPullups = c(13, 7, 15, 0), ClassPullups = c(7,
2, 13, 0), ClassPreSitups = c(59, 42, 45, 53), ClassPostSitups = c(75,
70, 51, 53), ClassSitups = c(16, 28, 6, 0), ClassPreMC = c(60,
43, 31, 48), ClassPostMC = c(60, 60, 31, 60), ClassMC = c(0,
17, 0, 12), ClassPrePushups = c(50, 37, 26, 30), ClassPostPushups = c(50,
50, 47, 34), ClassPushups = c(0, 13, 21, 4), ClassPre1.5 = c(803,
810, 803, 741), ClassPost1.5 = c(700, 690, 664, 661), Class1.5 = c(-103,
-120, -139, -80), ClassPre220 = c(32, 41, 31, 40), ClassPost220 = c(31,
33, 30, 37), Class220 = c(-1, -8, -1, -3), ClassPreAgility = c(19,
23, 18, 22.1), ClassPostAgility = c(16.4, 18, 16.5, 20.3),
ClassAgility = c(-2.6, -5, -1.5, -1.8), ClassPreBD = c(6.4,
8.5, 5.8, 11.2), ClassPostBD = c(5.3, 5.8, 5.5, 7.5), ClassBD = c(-1.1,
-2.7, -0.3, -3.7), ClassPreCL = c(7.8, 9.3, 7.3, 9.6), ClassPostCL = c(7.6,
7.4, 7.4, 9.2), ClassCL = c(-0.2, -1.9, 0.100000000000001,
-0.4), ClassPreSW = c(8.5, 8.4, 7.7, NA), ClassPostSW = c(7.8,
8.1, 7.6, 8), ClassSW = c(-0.7, -0.300000000000001, -0.100000000000001,
NA), ClassPre500 = c(102, 104, 100, 108), ClassPost500 = c(94,
88, 98, 101), Class500 = c(-8, -16, -2, -7), ClassPreTotal = c(495,
418, 528, 264), ClassPostTotal = c(561, 539, 562, 482), ClassTotal = c(66,
121, 34, 218)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
Thank you in advance for any help.

The R function
yuenbt(x, y, tr=0.2, alpha=0.05, nboot=599, side=F) computes a 1 − α confidence interval for μ_t1 − μ_t2 using the bootstrap-t method, where the default amount of trimming (tr) is 0.2, the default value for α is 0.05, and the default value for nboot (B) is 599. So far, simulations suggest that in terms of probability coverage, there is little or no advantage to using B > 599 when α = 0.05. However, there is no recommended choice for B when α < 0.05 simply because little is known about how the bootstrap-t performs for this special case. Finally, the default value for side is FALSE, indicating that the equal-tailed two-sided confidence interval is to be used. Using side=TRUE results in the symmetric two-sided confidence interval.
Try:
yuenbt(DataExample$PT500, DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)
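One more thing that may be worth checking: as far as I know, the WRS2 version of yuenbt takes a formula of the form outcome ~ group plus a data argument, where the right-hand side is a two-level grouping variable rather than the second group's scores. If that is the case here, the two score columns would need to be stacked into long format first. A minimal sketch of that reshaping, assuming the two groups sit in separate columns padded with NA (the data below is simulated, and the names wide, long, score and group are made up for illustration):

```r
# Simulated stand-in for the wide layout: one column per group,
# the shorter group padded with NA (all values here are made up)
set.seed(1)
wide <- data.frame(
  PT500      = rnorm(30, mean = 170, sd = 40),
  ClassPT500 = c(rnorm(12, mean = 120, sd = 40), rep(NA, 18))
)

# Stack both columns into one outcome column plus a grouping factor
long <- data.frame(
  score = c(wide$PT500, wide$ClassPT500),
  group = factor(rep(c("previous", "class"), each = nrow(wide)))
)
long <- long[!is.na(long$score), ]  # drop the NA padding

# Run the bootstrap-t test only if WRS2 is installed
if (requireNamespace("WRS2", quietly = TRUE)) {
  print(WRS2::yuenbt(score ~ group, data = long, tr = 0.2, nboot = 599, side = TRUE))
}
```

With the real data, score would hold all ~811 observations and group would mark which class each one came from.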

Related

From Boxplot to Barplot in ggplot possible?

I have to make a ggplot barplot with error bars and Tukey significance letters for plants grown with different fertilizer concentrations.
The data should be grouped by concentration, and the significance letters should be added automatically.
I already have code for the same problem with a boxplot, which works nicely. I tried several barplot tutorials, but I always get the error: stat_count() can only have an x or y aesthetic.
So I wondered: is it possible to convert my boxplot code to barplot code? I tried but couldn't do it :) And if not, how do I automatically add TukeyHSD test significance letters to a ggplot barplot?
This is my Code for the boxplot with the tukey letters:
value_max <- Dünger %>%
  group_by(Duenger.g) %>%
  summarize(max_value = max(Höhe.cm))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = TRUE)
sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]
J <- ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm)) +
  geom_boxplot(aes(fill = Duenger.g)) +
  scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
  geom_text(data = value_max, aes(x = Duenger.g, y = 0.1 + max_value, label = sig.letters$groups), vjust = 0) +
  stat_boxplot(geom = "errorbar", width = 0.1) +
  ggtitle("Auswirkung von Dünger auf die Höhe von Pflanzen") +
  xlab("Dünger in g") +
  ylab("Höhe in cm")
J
This is how it looks:
[image: boxplot with Tukey letters]
Data from dput:
structure(list(Duenger.g = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4), plant = c(1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 18, 19,
21, 23, 24, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40,
41, 42, 43, 44, 48, 49, 50, 53, 54, 55, 56, 57, 58, 61, 62, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 79, 80, 81, 83, 85, 86,
88, 89, 91, 93, 99, 100, 102, 103, 104, 105, 106, 107, 108, 110,
111, 112, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126,
127, 128, 130, 131, 132, 134, 136, 138, 139, 140, 141, 143, 144,
145, 146, 147, 149), height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 2.5,
4, 6, 10, 4.5, 7, 8.3, 11, 7, 8, 2.5, 7.4, 3, 14.5, 7, 12, 7.5,
30.5, 27, 6.5, 19, 10.4, 12.7, 27.3, 11, 11, 10.5, 10.5, 13,
53, 12.5, 12, 6, 12, 35, 8, 16, 56, 63, 69, 62, 98, 65, 77, 32,
85, 75, 33.7, 75, 55, 38.8, 39, 46, 35, 59, 44, 31.5, 49, 34,
52, 37, 43, 38, 28, 14, 28, 19, 20, 23, 17.5, 32, 16, 17, 24.7,
34, 50, 12, 14, 21, 33, 39.3, 41, 29, 35, 48, 40, 65, 35, 10,
26, 34, 41, 32, 38, 23.5, 22.2, 20.5, 29, 34, 45)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -105L))
Thank you
mirai
A bar chart and a boxplot are two different things. By default, geom_boxplot computes the boxplot statistics (stat = "boxplot"). In contrast, geom_bar by default counts the number of observations (stat = "count"), which are then mapped to y. That's why you get an error, and why simply replacing geom_boxplot with geom_bar will not give you the desired result. Instead, you could use e.g. stat_summary to create your bar chart with error bars. Additionally, I created a summary dataset to add the labels on top of the error bars.
library(ggplot2)
library(dplyr)
library(agricolae)
Dünger <- Dünger |>
  rename("Höhe.cm" = height.cm) |>
  mutate(Duenger.g = factor(Duenger.g))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = TRUE)
sig.letters <- hsd$groups %>% mutate(Duenger.g = row.names(.))
duenger_sum <- Dünger |>
  group_by(Duenger.g) |>
  summarize(mean_se(Höhe.cm)) |>
  left_join(sig.letters, by = "Duenger.g")
ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm, fill = Duenger.g)) +
  stat_summary(geom = "bar", fun = "mean") +
  stat_summary(geom = "errorbar", width = .1) +
  scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
  geom_text(data = duenger_sum, aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
  labs(
    title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
    x = "Dünger in g", y = "Höhe in cm"
  )
#> No summary function supplied, defaulting to `mean_se()`
But since the summary dataset already contains the means and the values for the error bars, a second option would be:
ggplot(duenger_sum, aes(x = Duenger.g, y = y, fill = Duenger.g)) +
  geom_col() +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), width = .1) +
  scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
  geom_text(aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
  labs(
    title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
    x = "Dünger in g", y = "Höhe in cm"
  )
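In case it is unclear where the y, ymin and ymax columns of duenger_sum come from: mean_se() returns the mean together with the mean minus and plus one standard error. A small sketch that computes the same three values by hand in base R (the heights vector is just the first few values of the posted data, and the name manual is made up for illustration):

```r
# First few plant heights from the posted data, used as a toy sample
heights <- c(5.7, 2.8, 5.5, 8, 3.5)

# Mean and standard error of the mean, computed by hand
m  <- mean(heights)
se <- sd(heights) / sqrt(length(heights))

# Same layout that mean_se() produces: y, ymin, ymax
manual <- data.frame(y = m, ymin = m - se, ymax = m + se)
print(manual)

# Compare with ggplot2's helper if the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(ggplot2::mean_se(heights))
}
```

So the error bars in both plots show one standard error around the mean, not a confidence interval or the standard deviation.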

Heatmap with the data point categorized by their class label

I have a dataframe with columns for different attributes and a column for the class label. I am trying to create a Heatmap/matrix plot of all the attributes with the data points categorized by their class label.
If I turn my dataframe into a numeric matrix, I can use the heatmap function to create a heatmap:
q3 <- read.arff("diabetes.arff")
q3_m <- as.matrix(q3[,1:8])
heatmap(q3_m, Colv=NA, Rowv=NA)
However, I can't figure out how to order these by the class variable, as I had to remove it from the matrix because it isn't numeric.
If I transform the data into the long format, I can also make the following heatmap using ggplot:
q3_long <- pivot_longer(q3, preg:age, names_to = "Attribute",
                        values_to = "Value")
ggplot(data = q3_long, mapping = aes(x = Attribute, y = class, fill = Value)) +
  geom_raster() +
  xlab(label = "Attribute")
However, this averages the values of every case in a given class rather than showing every case as a separate row with its own fill.
How can I combine these approaches to get a heatmap that clusters the cases by class?
(Apologies in advance: I attempted to include images here, but I just joined Stack Overflow and therefore don't have the 10 reputation points needed to include images.)
Thanks for your help.
Edit: here is a sample of the data. It is also publicly available - the diabetes.arff dataset is automatically downloaded with Weka installation (https://waikato.github.io/weka-wiki/downloading_weka/).
structure(list(preg = c(6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10,
10, 1, 5, 7, 0, 7, 1, 1), plas = c(148, 85, 183, 89, 137, 116,
78, 115, 197, 125, 110, 168, 139, 189, 166, 100, 118, 107, 103,
115), pres = c(72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74,
80, 60, 72, 0, 84, 74, 30, 70), skin = c(35, 29, 0, 23, 35, 0,
32, 0, 45, 0, 0, 0, 0, 23, 19, 0, 47, 0, 38, 30), insu = c(0,
0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 175, 0, 230, 0,
83, 96), mass = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31, 35.3,
30.5, 0, 37.6, 38, 27.1, 30.1, 25.8, 30, 45.8, 29.6, 43.3, 34.6
), pedi = c(0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248,
0.134, 0.158, 0.232, 0.191, 0.537, 1.441, 0.398, 0.587, 0.484,
0.551, 0.254, 0.183, 0.529), age = c(50, 31, 32, 21, 33, 30,
26, 29, 53, 54, 30, 34, 57, 59, 51, 32, 31, 31, 33, 32), class = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 2L), .Label = c("tested_negative", "tested_positive"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
Maybe this is what you are looking for. To get a heatmap by cases you could add an id variable to your dataset which you could map on x and make use of faceting to cluster the cases by class:
library(tidyr)
library(ggplot2)
library(dplyr)
q3_long <- q3 %>%
  mutate(id = row_number(), id = factor(id)) %>%
  pivot_longer(-c(class, id), names_to = "Attribute", values_to = "Value")
ggplot(data = q3_long, mapping = aes(x = Attribute, y = id, fill = Value)) +
  geom_raster() +
  xlab(label = "Attribute") +
  facet_wrap(~class, scales = "free_y")
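If you would rather stay with the base heatmap function from the question, one possible approach is to sort the rows by class before building the matrix and mark the class with the RowSideColors argument. A rough sketch on a made-up miniature of the data (q3_demo, ord and side_cols are illustrative names, and scale = "column" is my own choice so that attributes on different scales stay comparable):

```r
# Made-up miniature of the diabetes data: three attributes plus the class factor
q3_demo <- data.frame(
  plas  = c(148, 85, 183, 89, 137, 116),
  mass  = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6),
  age   = c(50, 31, 32, 21, 33, 30),
  class = factor(c("tested_positive", "tested_negative", "tested_positive",
                   "tested_negative", "tested_positive", "tested_negative"))
)

# Row order that groups the cases by class
ord <- order(q3_demo$class)

# Numeric matrix with rows already sorted by class
m <- as.matrix(q3_demo[ord, c("plas", "mass", "age")])

# One colour per row, marking the class of that case
side_cols <- c("steelblue", "tomato")[as.integer(q3_demo$class[ord])]

# Rowv = NA and Colv = NA keep the rows and columns in the given order
heatmap(m, Rowv = NA, Colv = NA, scale = "column", RowSideColors = side_cols)
```

Each case stays on its own row, and the coloured strip at the side shows which class it belongs to.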

How do I use column index as x axis in R

I have a data frame with 7 columns and 100 observations.
I divided the observations into two groups.
The question I'm working on is: b) Construct two time plots of the mean blood lead levels superimposed on the blood lead levels at each occasion for succimer and placebo groups.
This is my code so far:
library(tidyverse)
library(haven)
library(dplyr)
library(plyr)
library(foreign)
library(ggplot2)
tlc = read_dta(file = 'tlc.dta')
head(tlc)
## a)
placebo = subset(tlc, tlc$trt==0)
succimer = subset(tlc, tlc$trt==1)
summary(placebo[, 3:6])
summary(succimer[, 3:6])
placebo_mean=colMeans(placebo[ ,3:6])
placebo_std=apply(placebo[ ,3:6],2,sd)
placebo_var=placebo_std^2
succimer_mean=colMeans(succimer[ ,3:6])
succimer_std=apply(succimer[ ,3:6],2,sd)
succimer_var=succimer_std^2
## b)
## c)
placebo_cor=cor(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cor=cor(succimer[ , 3:6]) %>% round(digits = 3)
placebo_cov=cov(placebo[ , 3:6]) %>% round(digits = 3)
succimer_cov=cov(succimer[ , 3:6]) %>% round(digits = 3)
So the goal is to plot all observations, using the values as the y axis and the columns y0, y1, y4, y6 (representing week 0, week 1, week 4, week 6) as the x axis, and then superimpose the mean of each group on the plot. I'm planning to use different colors to distinguish the two groups, so the final plot will have many points at each x coordinate, plus two short lines indicating the mean of each group at each x coordinate.
My question is: how do I use the column index as the x axis in R, with or without ggplot? I know this question may be too elementary, but it has caused me a lot of trouble as a beginner.
Below is my data:
dput(tlc)
structure(list(id = structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100), format.stata = "%9.0g"),
trt = structure(c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1,
0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1), format.stata = "%9.0g", class = "haven_labelled", labels = c(Placebo = 0,
Succimer = 1)), y0 = structure(c(30.7999992370605, 26.5,
25.7999992370605, 24.7000007629395, 20.3999996185303, 20.3999996185303,
28.6000003814697, 33.7000007629395, 19.7000007629395, 31.1000003814697,
19.7999992370605, 24.7999992370605, 21.3999996185303, 27.8999996185303,
21.1000003814697, 20.6000003814697, 24, 37.5999984741211,
35.2999992370605, 28.6000003814697, 31.8999996185303, 29.6000003814697,
21.5, 26.2000007629395, 21.7999992370605, 23, 22.2000007629395,
20.5, 25, 33.2999992370605, 26, 19.7000007629395, 27.8999996185303,
24.7000007629395, 28.7999992370605, 29.6000003814697, 32,
21.7999992370605, 24.3999996185303, 33.7000007629395, 24.8999996185303,
19.7999992370605, 26.7000007629395, 26.7999992370605, 20.2000007629395,
35.4000015258789, 25.2999992370605, 20.2000007629395, 24.5,
20.2999992370605, 20.3999996185303, 24.1000003814697, 27.1000003814697,
34.7000007629395, 28.5, 26.6000003814697, 24.5, 20.5, 25.2000007629395,
34.7000007629395, 30.2999992370605, 26.6000003814697, 20.7000007629395,
27.7000007629395, 24.2999992370605, 36.5999984741211, 28.8999996185303,
34, 32.5999984741211, 29.2000007629395, 26.3999996185303,
21.7999992370605, 27.2000007629395, 22.3999996185303, 32.5,
24.8999996185303, 24.6000003814697, 23.1000003814697, 21.1000003814697,
25.7999992370605, 30, 22.1000003814697, 20, 38.0999984741211,
28.8999996185303, 25.1000003814697, 19.7999992370605, 22.1000003814697,
23.5, 29.1000003814697, 30.2999992370605, 25.3999996185303,
30.6000003814697, 22.3999996185303, 31.2000007629395, 31.3999996185303,
41.0999984741211, 29.3999996185303, 21.8999996185303, 20.7000007629395
), format.stata = "%9.0g"), y1 = structure(c(26.8999996185303,
14.8000001907349, 23, 24.5, 2.79999995231628, 5.40000009536743,
20.7999992370605, 31.6000003814697, 14.8999996185303, 31.2000007629395,
17.5, 23.1000003814697, 26.2999992370605, 6.30000019073486,
20.2999992370605, 23.8999996185303, 16.7000007629395, 33.7000007629395,
25.5, 15.8000001907349, 27.8999996185303, 15.8000001907349,
6.5, 26.7999992370605, 12, 4.19999980926514, 11.5, 21.1000003814697,
3.90000009536743, 26.2000007629395, 21.3999996185303, 13.1999998092651,
21.6000003814697, 21.2000007629395, 26.3999996185303, 17.5,
30.2000007629395, 19.2999992370605, 16.3999996185303, 14.8999996185303,
20.8999996185303, 18.8999996185303, 6.40000009536743, 20.3999996185303,
10.6000003814697, 30.3999996185303, 23.8999996185303, 17.5,
10, 21, 17.2000007629395, 20.1000003814697, 14.8999996185303,
39, 32.5999984741211, 22.3999996185303, 5.09999990463257,
17.5, 25.1000003814697, 39.5, 29.3999996185303, 25.2999992370605,
19.2999992370605, 4, 24.2999992370605, 23.2999992370605,
28.8999996185303, 10.6999998092651, 19, 9.19999980926514,
15.3000001907349, 10.6000003814697, 28.5, 22, 25.1000003814697,
23.6000003814697, 25, 20.8999996185303, 5.59999990463257,
21.8999996185303, 27.6000003814697, 21, 22.7000007629395,
40.7999992370605, 12.5, 28.1000003814697, 11.6000003814697,
21.1000003814697, 7.90000009536743, 16.7999992370605, 3.5,
24.2999992370605, 28.2000007629395, 7.09999990463257, 10.8000001907349,
3.90000009536743, 15.1000003814697, 22.1000003814697, 7.59999990463257,
8.10000038146973), format.stata = "%9.0g"), y4 = structure(c(25.7999992370605,
19.5, 19.1000003814697, 22, 3.20000004768372, 4.5, 19.2000007629395,
28.5, 15.3000001907349, 29.2000007629395, 20.5, 24.6000003814697,
19.5, 18.5, 18.3999996185303, 19, 21.7000007629395, 34.4000015258789,
26.2999992370605, 22.8999996185303, 27.2999992370605, 23.7000007629395,
7.09999990463257, 25.2999992370605, 16.7999992370605, 4,
9.5, 17.3999996185303, 12.8000001907349, 34, 21, 14.6000003814697,
23.6000003814697, 22.8999996185303, 23.7999992370605, 21,
30.2000007629395, 16.3999996185303, 11.6000003814697, 14.5,
22.2000007629395, 18.8999996185303, 5.09999990463257, 19.2999992370605,
9, 26.5, 22.2000007629395, 17.3999996185303, 15.6000003814697,
16.7000007629395, 15.8999996185303, 17.8999996185303, 18.1000003814697,
28.7999992370605, 27.5, 21.7999992370605, 8.19999980926514,
19.6000003814697, 23.3999996185303, 38.5999984741211, 33.0999984741211,
25.1000003814697, 21.8999996185303, 4.19999980926514, 18.3999996185303,
40.4000015258789, 32.7999992370605, 12.6000003814697, 16.2999992370605,
8.30000019073486, 24.6000003814697, 14.3999996185303, 35,
19.1000003814697, 27.7999992370605, 21.2000007629395, 21.7000007629395,
21.7000007629395, 7.30000019073486, 23.6000003814697, 24,
8.60000038146973, 21.2000007629395, 38, 16.7000007629395,
27.5, 13, 21.5, 12.3999996185303, 15.1000003814697, 3, 22.7000007629395,
27, 17.2000007629395, 19.7999992370605, 7, 10.8999996185303,
25.2999992370605, 10.8000001907349, 25.7000007629395), format.stata = "%9.0g"),
y6 = structure(c(23.7999992370605, 21, 23.2000007629395,
22.5, 9.39999961853027, 11.8999996185303, 18.3999996185303,
25.1000003814697, 14.6999998092651, 30.1000003814697, 27.5,
30.8999996185303, 19, 16.2999992370605, 20.7999992370605,
17, 20.2999992370605, 31.3999996185303, 30.2999992370605,
25.8999996185303, 34.2000007629395, 23.3999996185303, 16,
24.7999992370605, 19.2000007629395, 16.2000007629395, 14.5,
21.1000003814697, 12.6999998092651, 28.2000007629395, 22.3999996185303,
11.6000003814697, 27.7000007629395, 21.8999996185303, 22,
24.2000007629395, 27.5, 17.6000003814697, 16.6000003814697,
63.9000015258789, 19.7999992370605, 15.5, 15.1000003814697,
23.7999992370605, 16, 28.1000003814697, 27.2000007629395,
18.6000003814697, 15.1999998092651, 13.5, 17.7000007629395,
18.7000007629395, 21.2999992370605, 34.7000007629395, 22.7999992370605,
21, 23.6000003814697, 18.3999996185303, 22.2000007629395,
43.2999992370605, 28.3999996185303, 27.8999996185303, 21.7999992370605,
11.6999998092651, 27.7999992370605, 39.2999992370605, 31.7999992370605,
21.2000007629395, 18.6000003814697, 18.3999996185303, 32.4000015258789,
18.7000007629395, 30.5, 18.7000007629395, 27.2999992370605,
21.1000003814697, 23.8999996185303, 19.8999996185303, 12.3000001907349,
24.7999992370605, 23.7000007629395, 24.6000003814697, 20.5,
32.7000007629395, 22.2000007629395, 24.7999992370605, 23.1000003814697,
20.6000003814697, 18.8999996185303, 18.7999992370605, 11.5,
20.1000003814697, 25.5, 18.7000007629395, 22.2000007629395,
17.7999992370605, 27.1000003814697, 4.09999990463257, 13,
12.3000001907349), format.stata = "%9.0g")), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
I have also tried this:
p=ggplot(tlc, aes(x=colnames(tlc[,3:6],do.NULL=TRUE)),
y=value)
p=p+geom_point()
Running this code produces no errors, but R reports an error (Aesthetics must be either length 1 or the same as the data (100): x) when I call 'p' to draw the plot.
I don't have your data, but it sounds like you want a plot of the points for each column with a bar at each mean.
Here is how I made it:
library(tidyverse)
# Setting up some fake data: 100 observations and 7 variables
set.seed(123)
some_data <- data.frame(
  y0 = rnorm(100),
  y1 = runif(100),
  y2 = rexp(100, 2),
  y3 = rnorm(100, 2, 1),
  y4 = rexp(100),
  y5 = rnorm(100, 2, 2),
  y6 = runif(100, -5, 5)
)
# pivoting the data to longer format:
long_data <- some_data %>%
  pivot_longer(cols = everything(), names_to = "variable")
# building the base plot
p <- ggplot(long_data, aes(x = variable, y = value))
# adding the points - use position_jitter to give it some width if you want
p <- p + geom_point(position = position_jitter(width = 0.2))
# adding the bars at mean - play around with width, color, and linewidth
# (after_stat() and linewidth replace the deprecated ..y.. and size in recent ggplot2)
p <- p + stat_summary(
  geom = "errorbar",
  fun = mean,
  width = 0.4,
  aes(ymax = after_stat(y), ymin = after_stat(y)),
  color = "orange",
  linewidth = 1.5
)
p # show plot
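Since the question also asks about doing this without ggplot, here is a rough base R sketch of the same idea, using the column index as the x position and the week labels on the axis. It assumes a data frame shaped like tlc (trt coded 0/1 and columns y0, y1, y4, y6); tlc_demo is simulated, not the real data, and it draws the group means as connected lines rather than short bars:

```r
# Simulated stand-in for tlc: 10 subjects, two treatment groups, four occasions
set.seed(42)
tlc_demo <- data.frame(
  id  = 1:10,
  trt = rep(c(0, 1), each = 5),
  y0  = rnorm(10, mean = 26, sd = 4),
  y1  = rnorm(10, mean = 19, sd = 6),
  y4  = rnorm(10, mean = 20, sd = 6),
  y6  = rnorm(10, mean = 22, sd = 6)
)
cols  <- c("y0", "y1", "y4", "y6")
weeks <- c(0, 1, 4, 6)
group_cols <- c("steelblue", "tomato")  # placebo, succimer

# x is the column index repeated once per subject; y stacks the columns in the same order
x <- rep(seq_along(cols), each = nrow(tlc_demo))
y <- unlist(tlc_demo[cols], use.names = FALSE)

# Points for every observation, coloured by treatment group
plot(x, y, col = rep(group_cols[tlc_demo$trt + 1], times = length(cols)),
     pch = 16, xaxt = "n", xlab = "Week", ylab = "Blood lead level")
axis(1, at = seq_along(cols), labels = weeks)

# Group means per occasion, drawn as one line per group
means <- aggregate(tlc_demo[cols], by = list(trt = tlc_demo$trt), FUN = mean)
for (g in seq_len(nrow(means))) {
  lines(seq_along(cols), unlist(means[g, cols]), col = group_cols[g], lwd = 2)
}
legend("topright", legend = c("Placebo", "Succimer"), col = group_cols, lwd = 2)
```

Replacing seq_along(cols) with weeks in the plot, axis and lines calls would space the occasions by actual time instead of by column position.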

Conditionally formatting the background color of a table (formattable in R)

I have a table similar to the below:
library(tibble)
library(formattable)
mytable <- tibble(
id = c(NA, 748, 17, 717, 39, 734, 10, 762),
NPS = c(65, 63, 56, 62, 73, 80, 50, 54),
`NPS Chge vs. month ago` = c(-2, -5, -2, -8, -1, 6, 7, -9),
`Cumulative Response` = c(766, 102, 154, 81, 239, 79, 50, 61),
`Response Rate` = c(0.25, 0.24, 0.25, 0.34, 0.21, 0.34, 0.32, 0.27),
`Response for Month` = c(161, 43, 7, 37, 7, 32, 15, 20)
)
formattable(mytable)
And I wish to apply conditional formatting to the background of the rows such that if the NPS score is below 60 the background is red, and otherwise green. With my limited knowledge of HTML, I figured I could use "td". Unfortunately, it appears to mess up the formatting of the table as a whole:
html_tag <- "td"
my_format <- formatter(html_tag, style = x ~ ifelse(mytable$NPS < 60, "background-color:red", "background-color:green"))
formattable(mytable, list(
area(col = 2:6) ~ my_format
))
The headers of the table are no longer aligned with the rest of the rows. What am I doing wrong? What should I use instead of "td"?
You can also change the background color conditionally without using HTML. A simple version of code without extra aesthetics could be like this:
library(dplyr)
library(formattable)
tibble(
  id = c(NA, 748, 17, 717, 39, 734, 10, 762),
  NPS = c(65, 63, 56, 62, 73, 80, 50, 54),
  `NPS Chge vs. month ago` = c(-2, -5, -2, -8, -1, 6, 7, -9),
  `Cumulative Response` = c(766, 102, 154, 81, 239, 79, 50, 61),
  `Response Rate` = c(0.25, 0.24, 0.25, 0.34, 0.21, 0.34, 0.32, 0.27),
  `Response for Month` = c(161, 43, 7, 37, 7, 32, 15, 20)
) %>%
  formattable(
    align = rep("c", ncol(.)),
    list(
      `NPS` = formatter("span", style = ~ style(
        display = "block",
        `background-color` = ifelse(NPS < 60, "red", "green")
      ))
    )
  )

Area plot with missing values in base R

I want to draw an area plot in which the base of the polygon is zero and the data line is connected to the base by vertical segments at every data break (that is, the beginning, the end, and any NA/NaN values).
I drew this:
I had to force vertical downward segments where the series is interrupted by NAs, and I did this by converting the NAs to 0s. But that doesn't produce vertical segments; it produces polygon edges that slope to the following 0s. I solved the problem for the beginning and the end of the series by adding a y = 0 point at each end of the series.
But this doesn't fix the problem if the NAs are inside the series.
Any ideas?
Here's some example code (different image):
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
plot.ts(y = pollen, x = weeks.vec, col = 'red', ylab = 'Pollen', xlab = 'Weeks', lwd = 3, xy.labels = F, xy.lines = T)
pollen[is.na(pollen)] <- 0
poly.y <- c(0,pollen,0)
poly.x <- c(weeks.vec[1], weeks.vec, weeks.vec[length(weeks.vec)])
polygon(y = poly.y, x = poly.x, density = NA,border = NA, col = rgb(1,0,0, .3))
I'd use ggplot2:
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
DF <- data.frame(pollen, weeks.vec)
library(ggplot2)
ggplot(DF, aes(x = weeks.vec, y = pollen)) +
  geom_ribbon(aes(ymin = 0, ymax = pollen),
              colour = NA, fill = "red", alpha = 0.3) +
  geom_line(colour = "red") +
  geom_point(colour = "red", size = 3) +
  xlab("Week") + ylab("Pollen") +
  theme_bw()
But if you must use base plots:
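plot.ts(y = pollen, x = weeks.vec, col = 'red',
        ylab = 'Pollen', xlab = 'Weeks', lwd = 3,
        xy.labels = FALSE, xy.lines = TRUE)
# each non-finite value starts a new group, so g labels the runs between gaps
g <- cumsum(!is.finite(pollen))
for (i in unique(g)) {
  y <- pollen[g == i]
  x <- weeks.vec[g == i]
  # drop the NA/NaN that opened this run
  x <- x[is.finite(y)]
  y <- y[is.finite(y)]
  # close the shape: go along the data, then back along the zero baseline
  x <- c(x, rev(x))
  y <- c(y, y * 0)
  polygon(y = y, x = x, density = NA, border = NA, col = rgb(1, 0, 0, .3))
}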
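The key step in the base version is the cumsum(!is.finite(pollen)) line, which gives every run of values between two gaps its own group number. A tiny illustration on a made-up vector (toy, g_toy, keep and runs are just example names):

```r
# Toy series with a gap in the middle and one at the end
toy <- c(1, 2, NaN, 3, 4, NaN)

# TRUE at each gap; the running total increases by one at every gap
g_toy <- cumsum(!is.finite(toy))
print(g_toy)

# Keep only the finite values and split them by run
keep <- is.finite(toy)
runs <- split(toy[keep], g_toy[keep])
print(runs)
```

Here the first run is 1, 2 and the second is 3, 4, so each run gets its own polygon with vertical edges down to zero at both ends.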
