R function to test significance of within group means

My question is about this fictitious data. In R, I would like to test whether there is a significant difference among the three means (V1, V2 and V3) considered together, and also whether the mean of V1 is significantly different from the mean of V2.
id <- c(1,2,3,4,5,6,7,8,9,10)
V1<- c(50, 42, 58, 56, 25, 85, 12, 23, 89, 52)
V2<- c(65, 63, 52, 45, 89, 58, 74, 51, 26, 25)
V3<- c(68, 95, 62, 14, 12, 25, 48, 56, 32, 57)
sex <- c("F","F","F","F","F","M","F","F","M","M")
data<- data.frame(id,V1,V2,V3,sex)
I tried using ANOVA but was not successful.

If you want to use anova(), you need to pass it a fitted model, for example by wrapping your formula in lm().
id <- c(1,2,3,4,5,6,7,8,9,10)
V1<- c(50, 42, 58, 56, 25, 85, 12, 23, 89, 52)
V2<- c(65, 63, 52, 45, 89, 58, 74, 51, 26, 25)
V3<- c(68, 95, 62, 14, 12, 25, 48, 56, 32, 57)
sex <- c("F","F","F","F","F","M","F","F","M","M")
data<- data.frame(id,V1,V2,V3,sex)
anova(lm(id ~ V1 + V2 + V3, data = data))
Analysis of Variance Table
Response: id
Df Sum Sq Mean Sq F value Pr(>F)
V1 1 0.438 0.4382 0.0751 0.79330
V2 1 29.750 29.7497 5.0959 0.06478 .
V3 1 17.285 17.2846 2.9607 0.13610
Residuals 6 35.028 5.8379
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
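Note that the model above uses id as the response, so it does not compare the means of V1, V2 and V3 directly. A minimal sketch of one alternative, assuming the three columns are repeated measurements on the same ten subjects (the names long, rm_fit and pt are illustrative, not from the original post): reshape to long format, fit a repeated-measures ANOVA for the overall comparison, and use a paired t-test for V1 versus V2.

```r
id <- 1:10
V1 <- c(50, 42, 58, 56, 25, 85, 12, 23, 89, 52)
V2 <- c(65, 63, 52, 45, 89, 58, 74, 51, 26, 25)
V3 <- c(68, 95, 62, 14, 12, 25, 48, 56, 32, 57)
data <- data.frame(id, V1, V2, V3)

# Long format: one row per (subject, variable) pair
long <- data.frame(
  id       = factor(rep(data$id, times = 3)),
  variable = factor(rep(c("V1", "V2", "V3"), each = nrow(data))),
  value    = c(data$V1, data$V2, data$V3)
)

# Repeated-measures ANOVA: do the three means differ, allowing for subject effects?
# (assumption: each id was measured once under V1, V2 and V3)
rm_fit <- aov(value ~ variable + Error(id / variable), data = long)
summary(rm_fit)

# Paired t-test: does the mean of V1 differ from the mean of V2?
pt <- t.test(data$V1, data$V2, paired = TRUE)
pt
```

If the three columns were instead measured on independent groups, a one-way aov(value ~ variable, data = long) and an unpaired t.test() would be the analogous calls.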

Splitting a vector or list based on a value

I am trying to split the following list:
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)
What I would like to do is split the above using the number 1 as the break points.
x1 <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61)
x2 <- c(1, 74, 72, 66)
x3 <- c(1, 68, 5)
There must be a simple method to use but I am drawing a blank and my search-fu is weak and coming up empty.
Thanks for your help.
Use split with cumsum:
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)
split(x, f=cumsum(x==1))
#> $`1`
#> [1] 1 19 25 62 38 41 52 53 60 61
#>
#> $`2`
#> [1] 1 74 72 66
#>
#> $`3`
#> [1] 1 68 5
#>
#> $`4`
#> [1] 1
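The grouping works because cumsum(x == 1) increases by one at every 1, so each element is labelled with the number of break points seen so far. The result above still contains a fourth group holding only the trailing 1, which the question's desired output omits. A small sketch of one way to drop it, assuming any group that consists solely of the break value should be discarded (the name groups is illustrative):

```r
x <- c(1, 19, 25, 62, 38, 41, 52, 53, 60, 61, 1, 74, 72, 66, 1, 68, 5, 1)

# Label each element with the running count of 1s seen so far
grp <- cumsum(x == 1)

# Split on those labels, then keep only groups with more than the lone break value
groups <- split(x, grp)
groups <- groups[lengths(groups) > 1]
groups
```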

Web scraping and reshaping data

I have a problem when tidying a table from website scraping.
I want to get the table (with headers V1 to V5) from the link below, but I failed to convert it into the same format in RStudio.
This is what I'm doing:
url <- "https://www.r-bloggers.com/2018/08/using-control-charts-in-r/"
library(rvest)
library(tidyverse)
h <- read_html(url)
tab <- h %>% html_nodes("table")
tab <- tab[[2]] %>% html_table()
tab <- separate_rows(tab, 1, sep = " ")
tab <- tab[8:132,]
tab <- as.data.frame(tab)
tab1 <- data.frame(c("V1", "V2", "V3", "V4", "V5"))
tab1 <- tab1 %>% setNames("Cat")
tab2 <- cbind(tab1,tab)
tab3 <- tab2 %>% spread(key = Cat, X1)
Here is the result
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 125 rows:
* 1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86, 91, 96, 101, 106, 111, 116, 121
* 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97, 102, 107, 112, 117, 122
* 3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 73, 78, 83, 88, 93, 98, 103, 108, 113, 118, 123
* 4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, 99, 104, 109, 114, 119, 124
* 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125
So what should I do to get the same table as from the website?
And if you can think of a better way to get the table from this website, please tell me.
P.S. I'm learning R programming on my own, so please teach me!
Cheers.
Here's a way:
library(rvest)
url <- "https://www.r-bloggers.com/2018/08/using-control-charts-in-r/"
url %>%
  read_html %>%
  html_nodes('table') %>%
  .[[2]] %>%
  html_table() %>%
  dplyr::pull(X1) %>%
  stringr::str_extract_all('\\d+\\.\\d+') %>%
  .[[1]] %>%
  matrix(ncol = 5, byrow = TRUE) %>%
  as.data.frame() %>%
  type.convert() -> tab
tab
# V1 V2 V3 V4 V5
#1 1.45 1.56 1.40 1.45 1.33
#2 1.75 1.53 1.55 1.42 1.42
#3 1.60 1.41 1.35 1.52 1.36
#4 1.53 1.58 1.54 1.71 1.55
#5 1.48 1.34 1.64 1.59 1.46
#6 1.69 1.55 1.49 1.61 1.47
#...
#...
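The key step in the pipeline above is pulling every decimal number out of one long text cell and refilling them row by row into five columns. A minimal offline sketch of just that step, using base R instead of stringr and a hypothetical stand-in string (raw) rather than the scraped cell:

```r
# Hypothetical stand-in for the scraped cell: the numbers arrive as one string
raw <- "V1 V2 V3 V4 V5 1.45 1.56 1.40 1.45 1.33 1.75 1.53 1.55 1.42 1.42"

# Extract every token that looks like digits.digits
nums <- regmatches(raw, gregexpr("[0-9]+\\.[0-9]+", raw))[[1]]

# Fill a 5-column matrix row by row, then convert to a data frame
tab <- as.data.frame(matrix(as.numeric(nums), ncol = 5, byrow = TRUE))
tab
```

Because matrix() fills by row here, the number of extracted values has to be a multiple of five for the columns to line up.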

How to colour background sections of graphs in R to indicate time periods of interest

I would like to add colour 'chunks' to the background of my graphs in R to highlight nesting periods. My x axis is in days, so I'd like the colours to be set 'from-to' certain days.
I've created a crude manual version of how I'd like it to look on my graph (see image) but am unsure how to implement this within my code. I'd ideally like to have different colours for different chunks e.g. orange for one period and blue for another period of interest which can also be displayed in a legend on the right. My data is distance per day, which was then converted to standard deviations for graphing.
Code below for distance to stdev, then graphing using the standard plot() function:
ig16 <- read.csv(file='ig16distance.csv')
ig16$stdDist <- (ig16$Distance - mean(ig16$Distance))/sd(ig16$Distance)
plot(ig16$stdDist, type = "o",col = "red", xlab = "Days", ylab = "Stdev",
main = "IG0016")
Sample data below:
Day Distance
1 1 20.396078
2 2 21.540659
3 3 4.000000
4 4 16.492423
5 5 16.000000
6 6 34.000000
7 7 34.234486
8 8 0.000000
9 9 4.000000
10 10 0.000000
11 11 0.000000
12 12 0.000000
13 13 0.000000
14 14 22.203603
15 15 0.000000
16 16 0.000000
17 17 2.280351
18 18 2.280351
19 19 2.280351
20 20 2.280351
Any advice on code to achieve this would be much appreciated!
You can plot semi-transparent rectangles over the regions that you want to highlight. I will illustrate with some simple example data.
kings = c(60, 43, 67, 50, 56, 42, 50, 65, 68, 43, 65, 34, 47, 34, 49,
41, 13, 35, 53, 56, 16, 43, 69, 59, 48, 59, 86, 55, 68, 51, 33,
49, 67, 77, 81, 67, 71, 81, 68, 70, 77, 56)
plot(kings, type = "o",col = "red", xlab = "", ylab = "Years",
main = "Kings")
polygon(x=c(5,5,15,15), y=c(0,100,100,0), col="#0000FF22", border=NA)
polygon(x=c(25,25,35,35), y=c(0,100,100,0), col="#FF990022", border=NA)
You can also do this, a kind of event plot:
kings = c(60, 43, 67, 50, 56, 42, 50, 65, 68, 43, 65, 34, 47, 34, 49,
41, 13, 35, 53, 56, 16, 43, 69, 59, 48, 59, 86, 55, 68, 51, 33,
49, 67, 77, 81, 67, 71, 81, 68, 70, 77, 56)
plot(kings, ylim=c(-25,100), type = "o",col = "red", xlab = "", ylab = "Years", main = "Kings")
rect(5, -10,15,-20,col=rgb(0,0,1,.133),border=NA)
rect(25,-10,35,-20,col=rgb(1,.6,0,.133),border=NA)
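The same idea can be applied to the sample data from the question. The sketch below is only an illustration: the period boundaries, colours and labels in periods are hypothetical, and the legend is drawn inside the plot region rather than in the right margin. It draws an empty plot first, adds full-height shaded rectangles, then draws the line on top so the shading does not cover it.

```r
# Sample distances from the question (days 1 to 20)
distance <- c(20.396078, 21.540659, 4, 16.492423, 16, 34, 34.234486, 0, 4, 0,
              0, 0, 0, 22.203603, 0, 0, 2.280351, 2.280351, 2.280351, 2.280351)
std_dist <- (distance - mean(distance)) / sd(distance)

# Hypothetical periods of interest: start day, end day, fill colour, legend label
periods <- data.frame(from  = c(3, 12),
                      to    = c(7, 16),
                      col   = c("#FF990044", "#0000FF44"),
                      label = c("Period A", "Period B"),
                      stringsAsFactors = FALSE)

# Set up the axes without drawing the data yet
plot(std_dist, type = "n", xlab = "Days", ylab = "Stdev", main = "IG0016")

# par("usr") gives the plot limits as c(x1, x2, y1, y2)
usr <- par("usr")
rect(periods$from, usr[3], periods$to, usr[4], col = periods$col, border = NA)

# Draw the series on top of the shading and add a legend for the periods
lines(std_dist, type = "o", col = "red")
legend("topright", legend = periods$label, fill = periods$col, bty = "n")
```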

Creating named vector from a csv file did not work

I tried to create a named vector of GO ids, with the names taken from the first column of a tab-separated file, but it did not work.
> head(read.delim("~/GOmapping.tsv", sep = '\t'))
V1 V14
1 sp0000005 GO:0003723
2 sp0000006 GO:0016021
3 sp0000007 GO:0003700,GO:0006355,GO:0043565
4 sp0000016 GO:0046983
5 sp0000017 GO:0004672,GO:0005524,GO:0006468
6 sp0000022 GO:0003677,GO:0046983
> head(read.delim("~/GOmapping.tsv", sep = '\t'))[1]
V1
1 sp0000005
2 sp0000006
3 sp0000007
4 sp0000016
5 sp0000017
6 sp0000022
> head(read.delim("~/GOmapping.tsv", sep = '\t'))[2]
V14
1 GO:0003723
2 GO:0016021
3 GO:0003700,GO:0006355,GO:0043565
4 GO:0046983
5 GO:0004672,GO:0005524,GO:0006468
6 GO:0003677,GO:0046983
> geneID2GO <- read.delim("~/GOmapping.tsv", sep = '\t'))[2]
> geneID2GO <- read.delim("~/GOmapping.tsv", sep = '\t')[2]
> names(geneID2GO) <- read.delim("~/GOmapping.tsv", sep = '\t')[1]
> head(geneID2GO)
c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 57, 58, 59, 60, 6 ...
1 GO:0003723
2 GO:0016021
3 GO:0003700,GO:0006355,GO:0043565
4 GO:0046983
5 GO:0004672,GO:0005524,GO:0006468
6 GO:0003677,GO:0046983
What did I miss?
Thank you in advance.
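Indexing with [2] returns a one-column data frame rather than a vector, so names<- sets its column name instead of naming the elements. If you want a named vector as the result, extract the columns with [, 2] and [, 1] and coerce both to character: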
data <- read.delim("~/GOmapping.tsv", sep = '\t')
geneID2GO <- as.character(data[,2])
names(geneID2GO) <- as.character(data[,1])
head(geneID2GO)
sp0000005 sp0000006 sp0000007
"GO:0003723" "GO:0016021" "GO:0003700,GO:0006355,GO:0043565"
sp0000016
"GO:0046983"
Alternatively, you can display the result as follows:
cbind(geneID2GO)
geneID2GO
sp0000005 "GO:0003723"
sp0000006 "GO:0016021"
sp0000007 "GO:0003700,GO:0006355,GO:0043565"
sp0000016 "GO:0046983"
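The difference between the two indexing forms can be checked on a small stand-in for the file. This is a sketch using two rows copied from the question's output rather than the real GOmapping.tsv:

```r
# Small stand-in for the data read from the file
data <- data.frame(V1  = c("sp0000005", "sp0000007"),
                   V14 = c("GO:0003723", "GO:0003700,GO:0006355,GO:0043565"),
                   stringsAsFactors = FALSE)

# Single-bracket indexing with one index keeps the data frame structure
class(data[2])

# Row/column indexing with [, 2] extracts the column as a plain vector
class(data[, 2])

# Build the named vector in one step: values from column 2, names from column 1
geneID2GO <- setNames(as.character(data[, 2]), as.character(data[, 1]))
geneID2GO
```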

Trying to Repeat, but data is not a multiple

I am trying to label a data matrix with conditions. In my experiment I had 3 tubes: the first two were repeated 7 times and the third 6 times. How can I code the labels so that they account for the "missing" data?
dm$Strain<-dm$variable
dm$Strain<-rep(c("446-1", "446-2", "446-3"), each.out=193)
dm$Strain<-factor(dm$Strain)
levels(dm$Strain)
Error in $<-.data.frame(*tmp*, "Strain", value = c("446-1", "446-2", :
replacement has 3 rows, data has 19300
Data Setup in Wells:
1) Control = 1, 16, 31, 46, 61, 76, 91
2) LI 446-1 tube = 2, 17, 32, 47, 62, 77, 92
3) LI 446-1 10^7 = 3, 18, 33, 48, 63, 78, 93
4) LI 446-1 10^6 = 4, 19, 34, 49, 64, 79, 94
5) LI 446-1 10^5 = 5, 20, 35, 50, 65, 80, 95
6) Control = 6, 21, 36, 51, 66, 81, 96
7) LI-446-2 tube = 7, 22, 37, 52, 67, 82, 97
8) LI-446-2 10^7 = 8, 23, 38, 53, 68, 83, 98
9) LI-446-2 10^6 = 9, 24, 39, 54, 69, 84, 99
10) LI-446-2 10^5 = 10, 25, 40, 55, 70, 85, 100
11) Control = 11, 26, 41, 56, 71, 86
12) LI-446-3 tube = 12, 27, 42, 57, 72, 87
13) LI-446-3 10^7 = 13, 28, 43, 58, 73, 88
14) LI-446-3 10^6 = 14, 29, 44, 59, 74, 89
15) LI-446-3 10^5 = 15, 30, 45, 60, 75, 90
I have 19300 rows of data, where rows 1:193 correspond to Well 1 at 15 min intervals, rows 194:386 are Well 2 at 15 min intervals, etc., up to Well 100. However, 446-3 (i.e. 11-15 above) is repeated 6 times, while 446-1 and 446-2 are repeated 7 times.
str(dm)
'data.frame': 19300 obs. of 4 variables:
$ Time..mins.: int 15 30 45 60 75 90 105 120 135 150 ...
$ variable : Factor w/ 100 levels "Well_1","Well_2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0.439 0.204 0.191 0.187 0.185 0.19 0.187 0.19 0.188 0.191 ...
$ Media : Factor w/ 2 levels "BHI","BHI_salt": 1 1 1 1 1 1 1 1 1 1 ...
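One possible approach, sketched under the assumption that the well layout listed above repeats every 15 wells (positions 1-5 are 446-1, 6-10 are 446-2, 11-15 are 446-3): derive the strain from the well number instead of using rep(), so the shorter last cycle (wells 91-100) needs no special handling. The data frame dm below is a hypothetical stand-in containing only the variable column.

```r
# Hypothetical stand-in for dm: 100 wells with 193 time points each
wells <- paste0("Well_", 1:100)
dm <- data.frame(variable = factor(rep(wells, each = 193), levels = wells))

# Recover the well number from the factor label, e.g. "Well_17" -> 17
well <- as.integer(sub("Well_", "", as.character(dm$variable)))

# Position of the well within its 15-well cycle (1 to 15)
pos <- (well - 1) %% 15 + 1

# Positions 1-5, 6-10 and 11-15 map to the three strains
strains <- c("446-1", "446-2", "446-3")
dm$Strain <- factor(strains[(pos - 1) %/% 5 + 1], levels = strains)

# Rows per strain reflect the 7, 7 and 6 repeats
table(dm$Strain)
```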
