Pre-defining number format of a data frame in R - r

Is there any way to pre-define a number format (e.g. rounding to a specified number of decimal places) for a data frame so that whenever I add a new column it follows the same format?
I tried format {base}, but it only changes the format of the existing columns, not the ones I add afterwards.
A reproducible example is given below:
mydf <- as.data.frame(matrix(rnorm(50), ncol=5))
mydf
V1 V2 V3 V4 V5
1 -1.3088022 -0.22088032 -1.8739405 1.65276442 1.21762297
2 1.1123253 -0.76042101 -0.1608188 0.39945804 -0.58674209
3 -0.9366654 0.92893610 -0.6905299 -0.37374892 -1.70539909
4 0.4619175 -0.28929198 1.0280021 -0.87998207 -0.34493824
5 -0.3741670 -0.61782368 -1.0435906 0.52166082 -0.29308408
6 -1.2283031 -0.37065379 0.8652538 0.05088202 -1.80997313
7 -1.1137726 -0.97878307 0.5045051 0.85442196 0.02932812
8 0.3373866 -0.46614754 -0.4642278 -0.38438002 -1.47251777
9 0.3245720 -0.06047061 -0.3273080 0.49145133 -0.86507348
10 1.6459180 -1.31076464 1.5627246 0.49841764 0.73895626
The following changes the format of the data frame:
mydf <- format(mydf, digits=2)
mydf
V1 V2 V3 V4 V5
1 -1.31 -0.22 -1.87 1.653 1.218
2 1.11 -0.76 -0.16 0.399 -0.587
3 -0.94 0.93 -0.69 -0.374 -1.705
4 0.46 -0.29 1.03 -0.880 -0.345
5 -0.37 -0.62 -1.04 0.522 -0.293
6 -1.23 -0.37 0.87 0.051 -1.810
7 -1.11 -0.98 0.50 0.854 0.029
8 0.34 -0.47 -0.46 -0.384 -1.473
9 0.32 -0.06 -0.33 0.491 -0.865
10 1.65 -1.31 1.56 0.498 0.739
This formatting is not applied, however, when I add a new column to the data frame; see below:
mydf$new <- rnorm(10)
mydf
V1 V2 V3 V4 V5 new
1 -1.31 -0.22 -1.87 1.653 1.218 0.30525117
2 1.11 -0.76 -0.16 0.399 -0.587 -1.83038790
3 -0.94 0.93 -0.69 -0.374 -1.705 0.34830499
4 0.46 -0.29 1.03 -0.880 -0.345 -0.66017888
5 -0.37 -0.62 -1.04 0.522 -0.293 0.03103741
6 -1.23 -0.37 0.87 0.051 -1.810 1.32809006
7 -1.11 -0.98 0.50 0.854 0.029 0.85428977
8 0.34 -0.47 -0.46 -0.384 -1.473 -0.51917266
9 0.32 -0.06 -0.33 0.491 -0.865 -0.37057104
10 1.65 -1.31 1.56 0.498 0.739 -1.32447706
I know I can adjust the digits using print {base}, but that does not change the underlying format of the data frame either. Any suggestions? Thanks in advance.
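Note that format() returns character columns, which is why the two-digit display "sticks" for the existing columns but cannot carry over to numeric columns added later. One workaround is to keep the data numeric and re-apply rounding after each modification; a minimal sketch (round_df is a hypothetical helper, not a base R function):
# Hypothetical helper: round every numeric column to a fixed number of digits
round_df <- function(df, digits = 2) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], round, digits = digits)
  df
}
mydf <- as.data.frame(matrix(rnorm(50), ncol = 5))
mydf <- round_df(mydf)  # round the existing columns
mydf$new <- rnorm(10)
mydf <- round_df(mydf)  # re-apply after adding a column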

Related

Fixing Tukey multicomparison

Hi, I was running a Tukey multiple-comparison test on the data below (data and code attached) after confirming, via two-way ANOVA (time and growth condition), that the data were normal with equal variances and that the interaction was significant. The results in R and the final bar chart are included as well. As you can see, the visualization could be improved and needs tidying up because of the redundant letters. I was advised to redo the same Tukey test but add some code to assign the samples at time point 0 hr as the reference/control (somewhat like a Dunnett test with a single control). I couldn't find any useful information on this online, so I would appreciate any help or suggestions!
data.frame(Exp1)
id growth_condition time fv fq npq in_situ rd
1 1 Control 0 0.81 0.56 0.72 0.797 1.000
2 2 Control 0 0.81 0.58 0.78 0.788 1.000
3 3 Control 0 0.80 0.59 0.76 0.793 1.000
4 4 High light+Chilled 0 0.82 0.57 0.85 0.799 1.000
5 5 High light+Chilled 0 0.81 0.59 0.75 0.796 1.000
6 6 High light+Chilled 0 0.81 0.56 0.69 0.782 1.000
7 7 Control 0.5 0.81 0.53 1.08 0.759 1.279
8 8 Control 0.5 0.81 0.56 0.72 0.759 0.668
9 9 Control 0.5 0.79 0.50 1.04 0.771 0.877
10 10 High light+Chilled 0.5 0.70 0.46 1.04 0.540 0.487
11 11 High light+Chilled 0.5 0.60 0.43 0.69 0.652 1.341
12 12 High light+Chilled 0.5 0.73 0.46 1.19 0.606 0.904
13 13 Control 8 0.82 0.52 1.20 0.753 0.958
14 14 Control 8 0.81 0.55 1.09 0.759 0.642
15 15 Control 8 0.80 0.55 1.07 0.747 0.612
16 16 High light+Chilled 8 0.44 0.28 0.58 0.230 0.471
17 17 High light+Chilled 8 0.35 0.21 0.45 0.237 0.777
18 18 High light+Chilled 8 0.54 0.35 0.68 0.186 0.342
19 19 Control 24 0.81 0.49 1.17 0.762 0.915
20 20 Control 24 0.82 0.67 1.25 0.749 0.876
21 21 Control 24 0.82 0.48 1.18 0.756 0.836
22 22 High light+Chilled 24 0.40 0.25 0.45 0.089 0.392
23 23 High light+Chilled 24 0.43 0.27 0.51 0.106 0.627
24 24 High light+Chilled 24 0.34 0.21 0.37 0.140 0.258
25 25 Control 48 0.81 0.48 1.05 0.773 0.662
26 26 Control 48 0.80 0.45 1.14 0.785 0.914
27 27 Control 48 0.82 0.47 1.09 0.792 0.912
28 28 High light+Chilled 48 0.73 0.45 0.90 0.750 0.800
29 29 High light+Chilled 48 0.70 0.51 0.79 0.626 1.305
30 30 High light+Chilled 48 0.66 0.43 0.74 0.655 0.579
Code:
res.Exp8 <- aov(npq ~ growth_condition * time, data = Exp1)
summary(res.Exp8)
t8 <- TukeyHSD(res.Exp8)
plot(t8)
multcompLetters4(res.Exp8, t8)  # multcompLetters4() is from the multcompView package
Results:
$`growth_condition:time`
Control:24 Control:8 Control:48 High light+Chilled:0.5 Control:0.5 High light+Chilled:48
"a" "ab" "abc" "abc" "abc" "bcd"
High light+Chilled:0 Control:0 High light+Chilled:8 High light+Chilled:24
"cde" "cde" "de" "e"

Error in running t-test via stat_compare_means function, and TukeyHSD

For the "rd" parameter, I got an error message while running t.test using the ggpubr::stat_compare_means() function. Moreover, TukeyHSD analysis of my data categorized all the individual group as "a", implying that there was no significance differences. This seems a bit weird as I'm expecting the opposite by looking at the plot (attached my plot). Moreover, there was no issue for identical t.test and TukeyHSD analysis of other parameters (fv,fq,npq, and in_situ etc. in data frame ). Please find my scripts and datas below, thanks.
This was just an example of similar plot from another parameter ("fv" in data frame) where the results of t.test from ggpubr::stat_compare_means()were shown above the error bars,identical script was being used here. expected plot
Exp1 <- read_csv("Raw data/Exp1.csv")
Exp1$time <- factor(Exp1$time)
Exp1$growth_condition <- factor(Exp1$growth_condition)
summary_anti_PsbS_both <- summarySE(data = Exp1, measurevar = "rd", groupvars = c("time", "growth_condition"))  # summarySE() is from the Rmisc package
data.frame(Exp1)
id growth_condition time fv fq npq in_situ rd
1 1 Control 0 0.81 0.56 0.72 0.797 1.000
2 2 Control 0 0.81 0.58 0.78 0.788 1.000
3 3 Control 0 0.80 0.59 0.76 0.793 1.000
4 4 High light+Chilled 0 0.82 0.57 0.85 0.799 1.000
5 5 High light+Chilled 0 0.81 0.59 0.75 0.796 1.000
6 6 High light+Chilled 0 0.81 0.56 0.69 0.782 1.000
7 7 Control 0.5 0.81 0.53 1.08 0.759 1.279
8 8 Control 0.5 0.81 0.56 0.72 0.759 0.668
9 9 Control 0.5 0.79 0.50 1.04 0.771 0.877
10 10 High light+Chilled 0.5 0.70 0.46 1.04 0.540 0.487
11 11 High light+Chilled 0.5 0.60 0.43 0.69 0.652 1.341
12 12 High light+Chilled 0.5 0.73 0.46 1.19 0.606 0.904
13 13 Control 8 0.82 0.52 1.20 0.753 0.958
14 14 Control 8 0.81 0.55 1.09 0.759 0.642
15 15 Control 8 0.80 0.55 1.07 0.747 0.612
16 16 High light+Chilled 8 0.44 0.28 0.58 0.230 0.471
17 17 High light+Chilled 8 0.35 0.21 0.45 0.237 0.777
18 18 High light+Chilled 8 0.54 0.35 0.68 0.186 0.342
19 19 Control 24 0.81 0.49 1.17 0.762 0.915
20 20 Control 24 0.82 0.67 1.25 0.749 0.876
21 21 Control 24 0.82 0.48 1.18 0.756 0.836
22 22 High light+Chilled 24 0.40 0.25 0.45 0.089 0.392
23 23 High light+Chilled 24 0.43 0.27 0.51 0.106 0.627
24 24 High light+Chilled 24 0.34 0.21 0.37 0.140 0.258
25 25 Control 48 0.81 0.48 1.05 0.773 0.662
26 26 Control 48 0.80 0.45 1.14 0.785 0.914
27 27 Control 48 0.82 0.47 1.09 0.792 0.912
28 28 High light+Chilled 48 0.73 0.45 0.90 0.750 0.800
29 29 High light+Chilled 48 0.70 0.51 0.79 0.626 1.305
30 30 High light+Chilled 48 0.66 0.43 0.74 0.655 0.579
Script for plot
ggplot(data = summary_anti_PsbS_both, mapping = aes(x = factor(time), y = rd, fill = growth_condition)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Time (hr)", y = "Relative density", fill = "Growth conditions") +
  ylim(0, 1.5) +
  geom_errorbar(aes(ymin = rd - se, ymax = rd + se), width = .2, position = position_dodge(width = 0.9)) +
  annotate(geom = "text", x = 1, y = 1.45, label = "n=3") +
  stat_compare_means(data = Exp1, label = "p.signif", label.y = 1.35, method = "t.test") +
  theme_bw() +
  theme(text = element_text(size = 15))
Error message
Warning message:
Computation failed in `stat_compare_means()`:
Problem while computing `p = purrr::map(...)`.
Script for TukeyHSD
res.both88 <- aov(rd ~ growth_condition * time, data = Exp1)
summary(res.both88)
t8 <- TukeyHSD(res.both88)
multcompLetters4(res.both88, t8)
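A likely cause, judging from the data shown: at time 0 every rd value is exactly 1.000, so both groups are constant there. t.test() fails when one group's data are essentially constant, which stat_compare_means() surfaces as the computation warning, and the constant baseline might also contribute to TukeyHSD lumping all groups under "a". A quick diagnostic sketch:
# Within-group variance of rd at each time point; a zero variance means
# t.test() cannot compute a statistic for that time point
with(Exp1, tapply(rd, list(time, growth_condition), var))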

Show different data in the top and bottom of a circlize plot

I have 2 data frames with different numbers of rows and columns, and I'd like to show both of them in a single circos plot with circlize.
My data looks like this:
df1 <- data.frame(replicate(7, sample(-200:200, 200, rep = TRUE)) / 100)
df2 <- data.frame(replicate(2, sample(-200:200, 200, rep = TRUE)) / 100)
#head(df1)
X1 X2 X3 X4 X5 X6 X7
1 -0.03 0.63 -0.33 0.73 -1.37 -1.39 1.96
2 -1.81 -1.24 -1.63 1.58 0.13 1.39 -0.76
3 0.02 -2.00 -1.93 -1.35 1.06 -0.58 -0.77
4 -1.11 -1.38 -0.66 -0.40 1.69 -0.47 -1.55
5 0.98 0.06 0.00 -0.35 1.97 1.74 0.72
6 1.51 -1.68 -0.44 -1.74 0.15 0.26 0.36
#head(df2)
X1 X2
1 0.16 -0.81
2 -1.38 -0.16
3 -0.22 -0.74
4 0.73 -0.82
5 0.58 -1.87
6 -0.63 1.50
I want to build a single circos plot where the top shows df1 and the bottom shows df2, but I can only plot the data frames individually. For instance, this is how I plot df1:
col_fun1 <- colorRamp2(c(min(df1), 0, max(df1)), c("blue", "white", "red"))
circos.heatmap(df1, col = col_fun1, cluster = TRUE, track.height = 0.2, rownames.side = "outside", rownames.cex = 0.6)
circos.clear()
How can I show df1 only in the top half and df2 only in the bottom half?
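circos.heatmap() draws a single matrix, but its split argument can assign different rows to different sectors. A hedged sketch (a workaround, not a documented top/bottom feature): pad df2 with NA columns so both data frames share the same column set, stack them row-wise, and split the rows so df1 fills one sector and df2 the other. The NA padding renders in the heatmap's default NA colour, and rotation/gaps may need tuning:
library(circlize)
# Pad df2 to df1's width, stack, and mark which rows belong to which sector
df2_pad <- cbind(as.matrix(df2), matrix(NA, nrow(df2), ncol(df1) - ncol(df2)))
colnames(df2_pad) <- colnames(df1)
combined <- rbind(as.matrix(df1), df2_pad)
split <- factor(rep(c("df1", "df2"), c(nrow(df1), nrow(df2))))
col_fun <- colorRamp2(c(min(combined, na.rm = TRUE), 0, max(combined, na.rm = TRUE)),
                      c("blue", "white", "red"))
# circos.par(start.degree = 90)  # uncomment/adjust to rotate which half is which
circos.heatmap(combined, split = split, col = col_fun, track.height = 0.2)
circos.clear()
With equal row counts the two sectors are exact halves of the circle; with unequal counts each sector's width is proportional to its number of rows.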

How to run a Shapiro test on multiple columns of a data.frame? And how to avoid 2 errors: all 'x' values are identical, and missing value where TRUE/FALSE needed

I have a data frame like this:
head(Betula, 10)
year start Start_DayOfYear end End_DayOfYear duration DateMax Max_DayOfYear BetulaPollenMax SPI Jan.NAO Jan.AO
1 1997 <NA> NA <NA> NA NA <NA> NA NA NA -0.49 -0.46
2 1998 <NA> 143 <NA> 184 41 <NA> 146 42 361 0.39 -2.08
3 1999 <NA> 148 <NA> 188 40 <NA> 158 32 149 0.77 0.11
4 2000 <NA> 135 <NA> 197 62 <NA> 156 173 917 0.60 1.27
5 2001 <NA> 143 <NA> 175 32 <NA> 154 113 457 0.25 -0.96
Jan.SO Feb.NAO Feb.AO Feb.SO Mar.NAO Mar.AO Mar.SO Apr.NAO Apr.AO Apr.SO DecJanFebMarApr.NAO DecJanFebMar.NAO
1 0.5 1.70 1.89 1.7 1.46 1.09 -0.4 -1.02 0.32 -0.6 0.14 0.43
2 -2.7 -0.11 -0.18 -2.0 0.87 -0.25 -2.4 -0.68 -0.04 -1.4 0.27 0.51
3 1.8 0.29 0.48 1.0 0.23 -1.49 1.3 -0.95 0.28 1.4 0.39 0.73
4 0.7 1.70 1.08 1.7 0.77 -0.45 1.3 -0.03 -0.28 1.2 0.49 0.62
5 1.0 0.45 -0.62 1.7 -1.26 -1.69 0.9 0.00 0.91 0.2 -0.28 -0.35
DecJanFeb.NAO DecJan.NAO JanFebMarApr.NAO JanFebMar.NAO JanFeb.NAO FebMarApr.NAO FebMar.NAO MarApr.NAO
1 0.08 -0.73 0.41 0.89 0.61 0.71 1.58 0.22
2 0.38 0.63 0.12 0.38 0.14 0.03 0.38 0.10
3 0.89 1.19 0.09 0.43 0.53 -0.14 0.26 -0.36
4 0.57 0.01 0.76 1.02 1.15 0.81 1.24 0.37
5 -0.04 -0.29 -0.14 -0.19 0.35 -0.27 -0.41 -0.63
DecJanFebMarApr.AO DecJanFebMar.AO DecJanFeb.AO DecJan.AO JanFebMarApr.AO JanFebMar.AO JanFeb.AO FebMarApr.AO
1 0.55 0.61 0.45 -0.27 0.71 0.84 0.72 1.10
2 -0.24 -0.29 -0.30 -0.37 -0.64 -0.84 -1.13 -0.16
3 0.08 0.04 0.54 0.58 -0.16 -0.30 0.30 -0.24
4 -0.15 -0.11 0.00 -0.54 0.41 0.63 1.18 0.12
5 -0.74 -1.15 -0.97 -1.14 -0.59 -1.09 -0.79 -0.47
FebMar.AO MarApr.AO DecJanFebMarApr.SO DecJanFebMar.SO DecJanFeb.SO DecJan.SO JanFebMarApr.SO JanFebMar.SO
1 1.49 0.71 0.04 0.20 0.40 -0.25 0.30 0.60
2 -0.22 -0.15 -1.42 -1.43 -1.10 -0.65 -2.13 -2.37
3 -0.51 -0.61 1.38 1.38 1.40 1.60 1.38 1.37
4 0.32 -0.37 1.14 1.13 1.07 0.75 1.23 1.23
5 -1.16 -0.39 0.60 0.70 0.63 0.10 0.95 1.20
JanFeb.SO FebMarApr.SO FebMar.SO MarApr.SO TmaxAprI TminAprI TmeanAprI RainfallAprI HumidityAprI SunshineAprI
1 1.10 0.23 0.65 -0.50 3.27 -3.86 -0.44 0.82 76.3 3.45
2 -2.35 -1.93 -2.20 -1.90 4.52 -3.28 -0.15 0.12 73.5 7.12
3 1.40 1.23 1.15 1.35 4.11 -3.86 -0.34 1.32 78.4 4.85
4 1.20 1.40 1.50 1.25 6.11 -1.31 1.93 0.80 71.9 4.20
5 1.35 0.93 1.30 0.55 1.46 -2.37 -1.04 2.83 84.4 1.21
CloudAprI WindAprI SeeLevelPressureAprI TmaxAprII TminAprII TmeanAprII RainfallAprII HumidityAprII
1 6.30 5.26 1008.63 12.12 2.11 6.17 0.23 76.5
2 3.93 3.86 1022.39 5.57 -0.44 1.82 0.83 77.9
3 5.02 3.23 1007.09 0.20 -6.36 -3.23 2.63 82.5
4 6.15 5.13 1012.21 2.74 -4.88 -2.35 0.34 76.0
5 7.50 3.90 1009.50 6.75 -3.22 1.16 0.32 71.5
SunshineAprII CloudAprII WindAprII SeeLevelPressureAprII TmaxAprIII TminAprIII TmeanAprIII RainfallAprIII
1 3.12 6.53 5.19 1024.31 7.35 0.33 3.37 0.33
2 2.41 6.85 3.70 1012.01 6.34 0.76 2.69 2.01
3 4.99 5.87 6.23 1019.66 8.65 0.73 4.23 0.70
4 6.63 5.17 5.84 1022.62 5.84 -1.81 2.02 0.00
5 6.11 4.82 3.92 1018.81 8.47 1.02 4.17 1.09
HumidityAprIII SunshineAprIII CloudAprIII WindAprIII SeeLevelPressureAprIII TmaxDecI TminDecI TmeanDecI
1 75.0 3.73 6.40 4.08 1009.91 -0.90 -5.88 -3.67
2 83.5 1.52 7.31 4.66 1008.33 5.33 0.01 2.46
3 73.4 6.62 5.12 3.16 1017.01 -0.24 -6.93 -3.64
4 69.0 8.80 4.80 4.99 1021.18 4.67 1.86 2.79
5 72.7 5.33 5.41 4.27 1005.48 3.69 -1.43 1.65
RainfallDecI HumidityDecI SunshineDecI CloudDecI WindDecI SeeLevelPressureDecI TmaxDecII TminDecII TmeanDecII
1 0.12 77.3 0.22 5.08 3.49 1003.15 7.99 0.77 4.10
2 1.10 73.5 0.04 6.29 5.21 999.94 0.24 -4.74 -2.67
3 2.41 82.3 0.00 6.70 4.92 998.64 1.22 -5.90 -2.05
4 3.13 88.1 0.00 7.97 4.00 997.82 2.76 -3.89 -0.54
5 1.60 79.1 0.07 5.44 5.76 996.35 10.82 4.36 6.90
RainfallDecII HumidityDecII SunshineDecII CloudDecII WindDecII SeeLevelPressureDecII TmaxDecIII TminDecIII
1 1.90 71.3 0 4.96 5.55 1007.16 4.78 -2.12
2 4.34 82.2 0 7.03 6.06 998.02 2.07 -4.60
3 1.94 78.6 0 6.53 5.82 1008.33 2.09 -2.48
4 1.45 77.2 0 6.57 5.26 1005.11 -1.49 -8.37
5 1.15 66.6 0 5.74 5.47 1030.02 1.40 -7.34
TmeanDecIII RainfallDecIII HumidityDecIII SunshineDecIII CloudDecIII WindDecIII SeeLevelPressureDecIII TmaxFebI
1 1.15 3.96 82.36 0 6.01 4.02 991.60 -0.23
2 -0.51 4.10 81.18 0 6.67 3.91 986.52 0.79
3 -0.61 1.97 81.27 0 6.21 5.53 982.13 2.19
4 -5.28 1.26 79.64 0 6.11 4.22 1019.63 3.27
5 -3.45 1.19 82.18 0 6.20 4.77 1015.53 2.42
TminFebI TmeanFebI RainfallFebI HumidityFebI SunshineFebI CloudFebI WindFebI SeeLevelPressureFebI TmaxFebII
1 -6.67 -3.57 0.84 84.3 1.11 6.81 5.35 990.51 2.97
2 -7.79 -4.49 2.31 72.2 1.88 4.73 4.53 990.39 3.31
3 -4.14 -1.77 0.42 73.3 1.29 6.02 5.57 1007.67 1.55
4 -2.48 0.04 2.28 77.0 0.46 6.84 4.29 982.97 -1.24
5 -3.52 -0.74 1.98 81.5 0.76 5.78 4.93 1008.29 6.71
TminFebII TmeanFebII RainfallFebII HumidityFebII SunshineFebII CloudFebII WindFebII SeeLevelPressureFebII
1 -2.31 -0.10 1.44 82.2 1.07 6.45 4.42 980.59
2 -4.85 -0.99 3.84 75.0 2.54 5.91 5.05 999.98
3 -5.76 -2.44 2.89 75.3 0.40 6.95 5.82 990.44
4 -8.47 -4.65 3.33 83.1 0.63 6.55 4.95 1000.10
5 -0.25 3.01 1.38 66.1 1.16 6.18 6.28 1001.46
TmaxFebIII TminFebIII TmeanFebIII RainfallFebIII HumidityFebIII SunshineFebIII CloudFebIII WindFebIII
1 0.05 -6.01 -3.35 4.60 83.50 1.29 6.58 4.71
2 -0.45 -7.43 -4.51 2.93 78.38 1.00 6.91 5.99
3 2.13 -4.51 -1.21 2.90 79.38 2.51 5.76 5.46
4 0.59 -3.79 -1.92 5.94 88.33 1.40 6.86 6.70
5 -2.68 -7.23 -5.05 1.39 83.88 1.13 7.41 5.69
SeeLevelPressureFebIII TmaxJanI TminJanI TmeanJanI RainfallJanI HumidityJanI SunshineJanI CloudJanI WindJanI
1 980.25 0.38 -5.57 -3.36 0.01 82.9 0.27 3.45 2.97
2 997.71 4.29 -0.03 2.08 3.70 82.9 0.00 7.39 5.01
3 988.45 1.02 -4.47 -1.87 2.22 82.3 0.00 6.94 4.29
4 987.21 0.04 -6.28 -3.03 4.99 85.8 0.00 5.84 4.75
5 1023.84 -0.33 -5.11 -3.17 0.66 81.2 0.00 7.08 3.88
SeeLevelPressureJanI TmaxJanII TminJanII TmeanJanII RainfallJanII HumidityJanII SunshineJanII CloudJanII
1 1023.71 0.09 -6.48 -2.50 4.29 86.5 0.01 7.23
2 984.57 -0.34 -6.49 -3.61 2.74 80.2 0.23 6.99
3 1004.06 0.32 -5.59 -3.03 5.28 83.3 0.00 6.68
4 983.42 8.38 1.46 4.97 0.64 69.3 0.10 6.13
5 1010.31 7.35 3.00 5.09 1.27 66.3 0.03 6.19
WindJanII SeeLevelPressureJanII TmaxJanIII TminJanIII TmeanJanIII RainfallJanIII HumidityJanIII SunshineJanIII
1 5.42 998.88 5.66 -2.39 1.97 1.03 74.27 0.65
2 6.38 1011.44 3.84 -3.32 -0.37 0.70 73.55 0.55
3 6.24 980.15 4.33 -5.19 -0.59 2.23 76.64 0.69
4 6.44 1019.41 4.09 -2.67 0.05 2.18 71.73 0.42
5 6.74 1006.10 4.43 -0.86 1.58 1.91 80.09 0.20
CloudJanIII WindJanIII SeeLevelPressureJanIII TmaxMarI TminMarI TmeanMarI RainfallMarI HumidityMarI
1 6.47 7.59 1004.59 2.83 -3.60 -0.72 2.14 79.9
2 5.25 4.72 1019.95 -5.31 -12.52 -9.52 2.28 72.6
3 5.34 4.65 1001.66 -0.70 -6.67 -4.47 1.39 81.0
4 5.85 4.83 1007.23 0.10 -7.91 -3.98 2.36 80.2
5 6.53 3.63 992.53 -0.38 -4.59 -2.27 3.00 86.4
SunshineMarI CloudMarI WindMarI SeeLevelPressureMarI TmaxMarII TminMarII TmeanMarII RainfallMarII HumidityMarII
1 0.85 6.77 6.64 986.96 -1.48 -8.43 -5.58 1.09 81.0
2 2.92 5.91 4.68 1013.17 6.53 -1.81 2.56 0.43 65.5
3 2.40 5.71 4.02 1014.62 0.53 -5.17 -2.90 5.20 82.8
4 0.91 7.02 5.87 1006.64 5.32 -0.94 1.23 1.11 74.4
5 0.19 7.82 4.49 999.35 1.60 -4.29 -1.89 0.95 79.3
SunshineMarII CloudMarII WindMarII SeeLevelPressureMarII TmaxMarIII TminMarIII TmeanMarIII RainfallMarIII
1 2.12 5.51 3.93 1021.57 3.88 -1.95 0.55 1.42
2 2.25 6.29 6.11 1008.31 3.95 -2.46 -0.15 1.30
3 1.00 6.61 5.77 1006.63 -0.68 -6.60 -4.07 0.70
4 2.16 6.61 6.45 1003.23 5.49 -0.68 1.65 1.58
5 4.07 5.21 3.14 1017.24 -0.66 -7.21 -4.00 1.37
HumidityMarIII SunshineMarIII CloudMarIII WindMarIII SeeLevelPressureMarIII
1 80.45 2.80 6.13 4.03 995.31
2 72.09 3.98 5.99 5.14 1000.32
3 78.73 2.34 6.46 3.81 1005.67
4 74.64 2.85 6.54 6.34 1013.45
5 79.45 4.71 5.65 4.95 1010.47
[ reached 'max' / getOption("max.print") -- omitted 5 rows ]
I would like to run the normality test on all columns at once. I tried
apply(x, shapiro.test)
Betula_shapiro <- apply(Betula, shapiro.test)
Error in FUN(X[[i]], ...) : is.numeric(x) is not TRUE
and it didn't work. I also tried this:
Betula <- apply(Betula[which(sapply(Betula, is.numeric))], 2, shapiro.test)
Error in FUN(newX[, i], ...) : all 'x' values are identical
f <- function(x) { if (diff(range(x)) == 0) list() else shapiro.test(x) }
Betula <- apply(Betula[which(sapply(Betula, is.numeric))], 2, f)
Error in if (diff(range(x)) == 0) list() else shapiro.test(x) :
missing value where TRUE/FALSE needed
So I did:
Betula_numerics_only <- Betula[which(sapply(Betula, is.numeric))]
selecting columns with at least 3 non-missing values and applying shapiro.test to them:
Betula_numerics_only_filled_columns <- Betula_numerics_only[which(apply(Betula_numerics_only, 2, function(f) sum(!is.na(f))>=3 ))]
Betula_shapiro<-apply(Betula_numerics_only_filled_columns, 2, shapiro.test)
Error in FUN(newX[, i], ...) : all 'x' values are identical
Could you please help me with this problem?
Since I was talking about readability in my comment, I felt I should provide something more readable as an answer.
Let's make some dummy data:
data_test <- data.frame(matrix(rnorm(100, 10, 1), ncol = 5, byrow = TRUE), stringsAsFactors = FALSE)
Let's apply shapiro.test to each column:
apply(data_test, 2, shapiro.test)
In case there are non-numeric columns, let's add a dummy character column for testing purposes
data_test$non_numeric <- sample(c("hello", "hi", "good morning"), NROW(data_test), replace = TRUE)
and try to apply the test again
apply(data_test, 2, shapiro.test)
which results in:
> apply(data_test, 2, shapiro.test)
Error: is.numeric(x) is not TRUE
To solve this we select only the numeric columns using sapply:
data_test[which(sapply(data_test, is.numeric))]
and combine it with the apply:
apply(data_test[which(sapply(data_test, is.numeric))], 2, shapiro.test)
Removing columns that are all NA:
data_test_numerics_only <- data_test[which(sapply(data_test, is.numeric))]
Selecting columns with at least 3 non-missing values and applying shapiro.test to them:
data_test_numerics_only_filled_colums <- data_test_numerics_only[which(apply(data_test_numerics_only, 2, function(f) sum(!is.na(f)) >= 3))]
apply(data_test_numerics_only_filled_colums, 2, shapiro.test)
Now that this is running, let's try once more with the real data :)
Remove the non-numeric columns:
Betula_numerics <- Betula[which(sapply(Betula, is.numeric))]
Remove columns with fewer than 3 non-missing values:
Betula_numerics_filled <- Betula_numerics[which(apply(Betula_numerics, 2, function(f) sum(!is.na(f)) >= 3))]
Remove columns with zero variance:
Betula_numerics_filled_not_constant <- Betula_numerics_filled[apply(Betula_numerics_filled, 2, function(f) var(f, na.rm = TRUE) != 0)]
Shapiro.test and hope for the best :)
apply(Betula_numerics_filled_not_constant, 2, shapiro.test)
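A follow-up note on the two errors above: the original f() failed because range(x) returns NA when x contains NAs, which makes the if() condition NA. A hedged alternative sketch that folds all three guards into one function and collects the p-values (lapply() over the data frame avoids apply()'s matrix coercion; safe_shapiro is a made-up name):
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]  # drop NAs so range() is defined
  if (length(x) < 3 || diff(range(x)) == 0) return(NULL)  # too short or constant
  shapiro.test(x)    # shapiro.test() needs 3 to 5000 non-missing values
}
results <- lapply(Betula[sapply(Betula, is.numeric)], safe_shapiro)
results <- Filter(Negate(is.null), results)  # keep only the testable columns
p_values <- sapply(results, function(r) r$p.value)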

How to skip NA when applying geometric-mean function

I have the following data frame:
1 8.03 0.37 0.55 1.03 1.58 2.03 15.08 2.69 1.63 3.84 1.26 1.9692516
2 4.76 0.70 NA 0.12 1.62 3.30 3.24 2.92 0.35 0.49 0.42 NA
3 6.18 3.47 3.00 0.02 0.19 16.70 2.32 69.78 3.72 5.51 1.62 2.4812459
4 1.06 45.22 0.81 1.07 8.30 196.23 0.62 118.51 13.79 22.80 9.77 8.4296220
5 0.15 0.10 0.07 1.52 1.02 0.50 0.91 1.75 0.02 0.20 0.48 0.3094169
7 0.27 0.68 0.09 0.15 0.26 1.54 0.01 0.21 0.04 0.28 0.31 0.1819510
I want to calculate the geometric mean for each row. My code is:
dat <- read.csv("MXreport.csv")
if (any(dat$X18S > 25)) { print("Fail!") } else { print("Pass!") }
datpass <- subset(dat, dat$X18S <= 25)
gene <- datpass[, 42:52]
gm_mean <- function(x) { prod(x)^(1 / length(x)) }
gene$score <- apply(gene, 1, gm_mean)
head(gene)
I got this output after typing this code:
1 8.03 0.37 0.55 1.03 1.58 2.03 15.08 2.69 1.63 3.84 1.26 1.9692516
2 4.76 0.70 NA 0.12 1.62 3.30 3.24 2.92 0.35 0.49 0.42 NA
3 6.18 3.47 3.00 0.02 0.19 16.70 2.32 69.78 3.72 5.51 1.62 2.4812459
4 1.06 45.22 0.81 1.07 8.30 196.23 0.62 118.51 13.79 22.80 9.77 8.4296220
5 0.15 0.10 0.07 1.52 1.02 0.50 0.91 1.75 0.02 0.20 0.48 0.3094169
7 0.27 0.68 0.09 0.15 0.26 1.54 0.01 0.21 0.04 0.28 0.31 0.1819510
The problem is that I get NA after applying the geometric-mean function to a row that contains an NA. How do I skip the NA and still calculate a geometric mean for that row?
When I used gene <- na.exclude(datpass[, 42:52]), it dropped the rows containing NA and did not calculate a geometric mean for them at all. That is not what I want: I want to calculate the geometric mean for the rows that contain NA as well. How do I do this?
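A minimal sketch: drop the NAs inside the row-wise function, so each row's geometric mean is taken over its non-missing values only. The log form below is mathematically equivalent to prod(x)^(1/length(x)) for the positive values shown, and is less prone to overflow:
gm_mean <- function(x) {
  x <- x[!is.na(x)]  # skip missing values in this row
  exp(mean(log(x)))  # geometric mean of the remaining values
}
gene$score <- apply(datpass[, 42:52], 1, gm_mean)  # apply over the original 11 columns
head(gene)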
