Replace ID variable values with counts of value occurrences

I have a data frame like:
DATE x y ID
06/10/2003 7.21 0.651 1
12/10/2003 5.99 0.428 1
18/10/2003 4.68 1.04 1
24/10/2003 3.47 0.363 1
30/10/2003 2.42 0.507 1
02/05/2010 2.72 0.47 2
05/05/2010 2.6 0.646 2
08/05/2010 2.67 0.205 2
11/05/2010 3.57 0.524 2
12/05/2010 0.428 4.68 3
13/05/2010 1.04 3.47 3
14/05/2010 0.363 2.42 3
18/10/2003 0.507 2.52 3
24/10/2003 0.418 4.68 3
30/10/2003 0.47 3.47 3
29/04/2010 0.646 2.42 4
18/10/2003 3.47 2.52 4
I have the number of rows per group for column ID as an integer vector: 5 4 6 2.
Is there a way to replace the group values in column ID with the values of this integer vector?
The output I am expecting is:
DATE x y ID
06/10/2003 7.21 0.651 5
12/10/2003 5.99 0.428 5
18/10/2003 4.68 1.04 5
24/10/2003 3.47 0.363 5
30/10/2003 2.42 0.507 5
02/05/2010 2.72 0.47 4
05/05/2010 2.6 0.646 4
08/05/2010 2.67 0.205 4
11/05/2010 3.57 0.524 4
12/05/2010 0.428 4.68 6
13/05/2010 1.04 3.47 6
14/05/2010 0.363 2.42 6
18/10/2003 0.507 2.52 6
24/10/2003 0.418 4.68 6
30/10/2003 0.47 3.47 6
29/04/2010 0.646 2.42 2
18/10/2003 3.47 2.52 2
I am quite new to R and tried to find a suitable replace function, but I am having a hard time. Any help is much appreciated.
The data above is just an example to illustrate my requirement.

A compact solution with the data.table-package:
library(data.table)
setDT(mydf)[, ID := .N, by = ID][]
which gives:
> mydf
DATE x y ID
1: 06/10/2003 7.210 0.651 5
2: 12/10/2003 5.990 0.428 5
3: 18/10/2003 4.680 1.040 5
4: 24/10/2003 3.470 0.363 5
5: 30/10/2003 2.420 0.507 5
6: 02/05/2010 2.720 0.470 4
7: 05/05/2010 2.600 0.646 4
8: 08/05/2010 2.670 0.205 4
9: 11/05/2010 3.570 0.524 4
10: 12/05/2010 0.428 4.680 6
11: 13/05/2010 1.040 3.470 6
12: 14/05/2010 0.363 2.420 6
13: 18/10/2003 0.507 2.520 6
14: 24/10/2003 0.418 4.680 6
15: 30/10/2003 0.470 3.470 6
16: 29/04/2010 0.646 2.420 2
17: 18/10/2003 3.470 2.520 2
What this does:
setDT(mydf) converts the dataframe to a data.table
by = ID groups by ID
ID := .N replaces the original value of ID with the count by group

You can use the ave() function to calculate how many rows each ID takes up. In the example below I created a new variable ID2, but you could replace the original ID if you want.
(I included code to create your data in R below, but when you ask questions in the future please include your data in the question by using the dput() function on the data object. That's what I did to make the code below.)
mydata <- structure(list(DATE = c("06/10/2003", "12/10/2003", "18/10/2003",
"24/10/2003", "30/10/2003", "02/05/2010", "05/05/2010", "08/05/2010",
"11/05/2010", "12/05/2010", "13/05/2010", "14/05/2010", "18/10/2003",
"24/10/2003", "30/10/2003", "29/04/2010", "18/10/2003"),
x = c(7.21, 5.99, 4.68, 3.47, 2.42, 2.72, 2.6, 2.67, 3.57, 0.428, 1.04, 0.363,
0.507, 0.418, 0.47, 0.646, 3.47),
y = c(0.651, 0.428, 1.04, 0.363, 0.507, 0.47, 646, 0.205, 0.524, 4.68, 3.47,
2.42, 2.52, 4.68, 3.47, 2.42, 2.52),
ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4)),
.Names = c("DATE", "x", "y", "ID"),
class = c("data.frame"),
row.names = c(NA, -17L))
# ave() takes an input object, an object of group IDs of the same length
# as the input object, and a function to apply to the input object split across groups
mydata$ID2 <- ave(mydata$ID, mydata$ID, FUN = length)
mydata
DATE x y ID ID2
1 06/10/2003 7.210 0.651 1 5
2 12/10/2003 5.990 0.428 1 5
3 18/10/2003 4.680 1.040 1 5
4 24/10/2003 3.470 0.363 1 5
5 30/10/2003 2.420 0.507 1 5
6 02/05/2010 2.720 0.470 2 4
7 05/05/2010 2.600 646.000 2 4
8 08/05/2010 2.670 0.205 2 4
9 11/05/2010 3.570 0.524 2 4
10 12/05/2010 0.428 4.680 3 6
11 13/05/2010 1.040 3.470 3 6
12 14/05/2010 0.363 2.420 3 6
13 18/10/2003 0.507 2.520 3 6
14 24/10/2003 0.418 4.680 3 6
15 30/10/2003 0.470 3.470 3 6
16 29/04/2010 0.646 2.420 4 2
17 18/10/2003 3.470 2.520 4 2
# if you want to replace the original ID variable, you can assign to it
# instead of adding a new variable
mydata$ID <- ave(mydata$ID, mydata$ID, FUN = length)
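If the counts are already available as a vector, as in the question, a lookup may be enough. The following is a minimal base R sketch on a toy ID column; the names id, counts, new_id, tab and new_id2 are made up for illustration, and the first option assumes the IDs are exactly 1, 2, ..., length(counts).

```r
# Toy ID column with the same group sizes as the question's example
id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4)
# Counts per group, as given in the question
counts <- c(5, 4, 6, 2)

# Option 1: use each ID as a position in the counts vector
# (assumes the IDs are exactly 1, 2, ..., length(counts))
new_id <- counts[id]

# Option 2: compute the counts with table() and look them up by name,
# which also works when the IDs are arbitrary labels
tab <- table(id)
new_id2 <- as.integer(tab[as.character(id)])

new_id
new_id2
```

Both options should give the same result as the ave() call above; the second does not rely on the IDs being consecutive integers.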

A solution with dplyr:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(ID2 = n()) %>%
ungroup() %>%
mutate(ID = ID2) %>%
select(-ID2)
Edit:
I've just found a solution that's a bit cleaner than the above:
df %>%
group_by(ID2 = ID) %>%
mutate(ID = n()) %>%
select(-ID2)
Result:
# A tibble: 17 x 4
DATE x y ID
<fctr> <dbl> <dbl> <int>
1 06/10/2003 7.210 0.651 5
2 12/10/2003 5.990 0.428 5
3 18/10/2003 4.680 1.040 5
4 24/10/2003 3.470 0.363 5
5 30/10/2003 2.420 0.507 5
6 02/05/2010 2.720 0.470 4
7 05/05/2010 2.600 0.646 4
8 08/05/2010 2.670 0.205 4
9 11/05/2010 3.570 0.524 4
10 12/05/2010 0.428 4.680 6
11 13/05/2010 1.040 3.470 6
12 14/05/2010 0.363 2.420 6
13 18/10/2003 0.507 2.520 6
14 24/10/2003 0.418 4.680 6
15 30/10/2003 0.470 3.470 6
16 29/04/2010 0.646 2.420 2
17 18/10/2003 3.470 2.520 2
Notes:
The reason behind ungroup() %>% mutate(ID = ID2) %>% select(-ID2) is that dplyr doesn't allow mutating grouping variables. So this would not work:
df %>%
group_by(ID) %>%
mutate(ID = n())
Error in mutate_impl(.data, dots) : Column ID can't be modified
because it's a grouping variable
If you don't care about replacing the original ID column, you can just do:
df %>%
group_by(ID) %>%
mutate(ID2 = n())
Alternative Result:
# A tibble: 17 x 5
# Groups: ID [4]
DATE x y ID ID2
<fctr> <dbl> <dbl> <int> <int>
1 06/10/2003 7.210 0.651 1 5
2 12/10/2003 5.990 0.428 1 5
3 18/10/2003 4.680 1.040 1 5
4 24/10/2003 3.470 0.363 1 5
5 30/10/2003 2.420 0.507 1 5
6 02/05/2010 2.720 0.470 2 4
7 05/05/2010 2.600 0.646 2 4
8 08/05/2010 2.670 0.205 2 4
9 11/05/2010 3.570 0.524 2 4
10 12/05/2010 0.428 4.680 3 6
11 13/05/2010 1.040 3.470 3 6
12 14/05/2010 0.363 2.420 3 6
13 18/10/2003 0.507 2.520 3 6
14 24/10/2003 0.418 4.680 3 6
15 30/10/2003 0.470 3.470 3 6
16 29/04/2010 0.646 2.420 4 2
17 18/10/2003 3.470 2.520 4 2

Related

Method in R to find difference between rows with varying row spacing

I want to add an extra column in a dataframe which displays the difference between certain rows, where the distance between the rows also depends on values in the table.
I found out that:
mutate(Col_new = Col_1 - lead(Col_1, n = x))
can find the difference for a fixed n, but only an integer can be used as input. How would you find the difference between rows when the distance between the rows varies?
I am trying to get the output in Col_new, which is the difference between the i and i+n row where n should take the value in column Count. (The data is rounded so there might be 0.01 discrepancies in Col_new).
col_1 count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.25 2 -0.99
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA

Data:
df <- data.frame(Col_1 = c(0.90, 1.58, 1.89, 1.84, 1.57, 1.30, 1.35,
1.56, 2.24, 3.14, 4.04, 4.72, 5.04, 4.99,
4.71, 4.44, 4.39, 4.70, 5.38, 6.28),
Count = sort(rep(1:4, 5)))
Some code that generates the intended output, but can undoubtedly be made more efficient:
library(dplyr)
df %>%
mutate(col_2 = sapply(1:4, function(s){lead(Col_1, n = s)})) %>%
rowwise() %>%
mutate(Col_new = Col_1 - col_2[Count]) %>%
select(-col_2)
Output:
# A tibble: 20 × 3
# Rowwise:
Col_1 Count Col_new
<dbl> <int> <dbl>
1 0.9 1 -0.68
2 1.58 1 -0.310
3 1.89 1 0.0500
4 1.84 1 0.27
5 1.57 1 0.27
6 1.3 2 -0.26
7 1.35 2 -0.89
8 1.56 2 -1.58
9 2.24 2 -1.8
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.0100
13 5.04 3 0.600
14 4.99 3 0.600
15 4.71 3 0.0100
16 4.44 4 -1.84
17 4.39 4 NA
18 4.7 4 NA
19 5.38 4 NA
20 6.28 4 NA

# inside mutate() the columns can be referenced directly, without df$
df %>% mutate(Col_new = case_when(
count == 1 ~ col_1 - lead(col_1, n = 1),
count == 2 ~ col_1 - lead(col_1, n = 2),
count == 3 ~ col_1 - lead(col_1, n = 3),
count == 4 ~ col_1 - lead(col_1, n = 4),
count == 5 ~ col_1 - lead(col_1, n = 5)
))
col_1 count Col_new
1 0.90 1 -0.68
2 1.58 1 -0.31
3 1.89 1 0.05
4 1.84 1 0.27
5 1.57 1 0.27
6 1.30 2 -0.26
7 1.25 2 -0.99
8 1.56 2 -1.58
9 2.24 2 -1.80
10 3.14 2 -1.58
11 4.04 3 -0.95
12 4.72 3 0.01
13 5.04 3 0.60
14 4.99 3 0.60
15 4.71 3 0.01
16 4.44 4 -1.84
17 4.39 4 NA
18 4.70 4 NA
19 5.38 4 NA
20 6.28 4 NA
This gives you the desired result, but it does not scale well: with 10 or more different counts you would need one branch per count, so another solution is required.
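One way to avoid a branch per count is to index the column by row position plus offset. This is only a sketch in base R, using the Col_1 and Count column names from the data shown earlier; target is a made-up helper name. It relies on the fact that indexing a numeric vector past its end returns NA.

```r
df <- data.frame(
  Col_1 = c(0.90, 1.58, 1.89, 1.84, 1.57, 1.30, 1.35, 1.56, 2.24, 3.14,
            4.04, 4.72, 5.04, 4.99, 4.71, 4.44, 4.39, 4.70, 5.38, 6.28),
  Count = rep(1:4, each = 5)
)

# Position of the row to subtract: current row number plus its own offset
target <- seq_len(nrow(df)) + df$Count

# Positions beyond the last row yield NA, so no special handling is needed
df$Col_new <- df$Col_1 - df$Col_1[target]
df
```

Because the offset is read from the Count column itself, the same two lines work for any number of distinct counts.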

pivot_longer on a mix of matrix columns and regular vector columns

I have a tibble where some columns are matrices. Here's a toy example:
library(dplyr)
library(tidyr)
dat <- structure(list(id = 0:5, matrix_column = structure(c(-1.34333431222985,
-1.54123232044003, -1.7260282725816, -1.8924463753132, -2.0376516335872,
-2.16069643164938, -0.250406602741403, -0.287716094522968, -0.32269823315914,
-0.354360193430544, -0.382155662949252, -0.405883260458378, 1.53709630050992,
1.76715755374983, 1.98313378488307, 2.17881959842109, 2.35072520728221,
2.4974704619887), .Dim = c(6L, 3L)), vector_column = c(10.453112322311,
10.3019556236512, 10.1273409693709, 9.91474471968391, 9.65093549479026,
9.32601906868098)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
The tibble looks like this.
> dat
# A tibble: 6 x 3
id matrix_column[,1] [,2] [,3] vector_column
<int> <dbl> <dbl> <dbl> <dbl>
1 0 -1.34 -0.250 1.54 10.5
2 1 -1.54 -0.288 1.77 10.3
3 2 -1.73 -0.323 1.98 10.1
4 3 -1.89 -0.354 2.18 9.91
5 4 -2.04 -0.382 2.35 9.65
6 5 -2.16 -0.406 2.50 9.33
If I apply pivot_longer from tidyr to the non-id columns, the values in vector_column get replicated to fill the two additional columns required to accommodate matrix_column.
dat %>%
pivot_longer(cols = -id, values_to = "new_column")
# A tibble: 12 x 3
id name new_column[,1] [,2] [,3]
<int> <chr> <dbl> <dbl> <dbl>
1 0 matrix_column -1.34 -0.250 1.54
2 0 vector_column 10.5 10.5 10.5
3 1 matrix_column -1.54 -0.288 1.77
4 1 vector_column 10.3 10.3 10.3
5 2 matrix_column -1.73 -0.323 1.98
6 2 vector_column 10.1 10.1 10.1
7 3 matrix_column -1.89 -0.354 2.18
8 3 vector_column 9.91 9.91 9.91
9 4 matrix_column -2.04 -0.382 2.35
10 4 vector_column 9.65 9.65 9.65
11 5 matrix_column -2.16 -0.406 2.50
12 5 vector_column 9.33 9.33 9.33
Is there a way to have the [,2] and the [,3] columns of new_column to be NA (instead of the same value of [,1]) when name equals vector_column?
Something like
# A tibble: 12 x 3
id name new_column[,1] [,2] [,3]
<int> <chr> <dbl> <dbl> <dbl>
1 0 matrix_column -1.34 -0.250 1.54
2 0 vector_column 10.5 NA NA
3 1 matrix_column -1.54 -0.288 1.77
4 1 vector_column 10.3 NA NA
My real life data have dozens of matrix columns and vector columns.

If you keep the data in its current format (a data frame that contains a matrix column), you will keep running into trouble when working with it. I would suggest converting the matrix into a data frame and adding its columns as separate columns.
library(dplyr)
library(tidyr)
dat$matrix_column %>%
data.frame() %>%
bind_cols(dat %>% select(-matrix_column)) %>%
pivot_longer(cols = -id, values_to = "new_column")
# id name new_column
# <int> <chr> <dbl>
# 1 0 X1 -1.34
# 2 0 X2 -0.250
# 3 0 X3 1.54
# 4 0 vector_column 10.5
# 5 1 X1 -1.54
# 6 1 X2 -0.288
# 7 1 X3 1.77
# 8 1 vector_column 10.3
# 9 2 X1 -1.73
#10 2 X2 -0.323
# … with 14 more rows

Extract every 11 rows from data frame [duplicate]

I have a data frame and I want to extract it in chunks of 11 rows: not just every 11th row, but a block of 11 consecutive rows each time, e.g.:
Subject Wt Dose Time conc
1 1 79.6 4.02 0.00 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.57 6.57
4 1 79.6 4.02 1.12 10.50
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
7 1 79.6 4.02 5.10 8.36
8 1 79.6 4.02 7.03 7.47
9 1 79.6 4.02 9.05 6.89
10 1 79.6 4.02 12.12 5.94
11 1 79.6 4.02 24.37 3.28
and then the next 11 rows, and then the 11 rows after that, and so on.
I tried this:
for (i in 1:nrow(Theoph)) {
everyEleven = Theoph[11,i]
everyEl
}
But it just gives me the first 11 rows and not the second chunk of 11 rows and so on

Maybe you can try split like below
everyEleven <- split(Theoph,ceiling(seq(nrow(Theoph))/11))
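To see what this produces, here is a short sketch on the built-in Theoph data set, which has 132 rows. The intermediate name chunk_id is introduced only for readability.

```r
# Chunk index: rows 1-11 get 1, rows 12-22 get 2, and so on
chunk_id <- ceiling(seq_len(nrow(Theoph)) / 11)

# split() returns a list with one data frame per chunk
everyEleven <- split(Theoph, chunk_id)

length(everyEleven)          # number of chunks
sapply(everyEleven, nrow)    # rows in each chunk
everyEleven[[2]]             # the second block of 11 rows
```

If the number of rows is not a multiple of 11, the last chunk simply holds the remaining rows.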

Try this, adapted from "split into multiple subset of dataframes with dplyr:group_by?":
library(tibble)
library(dplyr)
library(tidyr)
Make an indicative data frame, as the data in your question has only 11 rows.
tib <- tibble(sub = rep(1:3, each = 11),
var = runif(33))
tib1 <-
tib %>%
# create a grouping variable every 11 rows, unless a variable in your data already does this
mutate(grp = rep(1:3, each = 11)) %>%
group_by(grp) %>%
nest() %>%
# drop the grouping so that select() does not keep grp alongside data
ungroup() %>%
select(data) %>%
unlist(recursive = FALSE)
Gives you:
$data1
# A tibble: 11 x 2
sub var
<int> <dbl>
1 1 0.258
2 1 0.337
3 1 0.463
4 1 0.856
5 1 0.466
6 1 0.701
7 1 0.548
8 1 0.999
9 1 0.454
10 1 0.292
11 1 0.173
$data2
# A tibble: 11 x 2
sub var
<int> <dbl>
1 2 0.148
2 2 0.487
3 2 0.246
4 2 0.279
5 2 0.130
6 2 0.730
7 2 0.312
8 2 0.935
9 2 0.968
10 2 0.745
11 2 0.485
$data3
# A tibble: 11 x 2
sub var
<int> <dbl>
1 3 0.141
2 3 0.200
3 3 0.00000392
4 3 0.993
5 3 0.644
6 3 0.334
7 3 0.567
8 3 0.817
9 3 0.0342
10 3 0.718
11 3 0.527

Since in the sample data you provided there is a column Subject, which I assume represents the subject IDs and there are only 11 rows with the same value for Subject, you can use
split(Theoph, Theoph$Subject)

I will assume your data frame is 11*N rows long. Then:
N = nrow(Theoph) / 11
everyEleven = vector(mode = "list", length = N)
for(i in seq_len(N)){
# first and last row of the i-th chunk
start = (i - 1) * 11 + 1
end = i * 11
everyEleven[[i]] = Theoph[start:end, ]
}

We can use gl to create the grouping index
split(Theoph, as.integer(gl(nrow(Theoph), 11, nrow(Theoph))))

Calculating group differences in a "badly" partitioned data set

I tried to solve the problem with questions here on SO but I could not find a satisfying answer. My data frame has the structure
X = data_frame(
treat = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)),
id = seq(1:16),
x = rnorm(16),
y = rnorm(16),
z = rnorm(16)
)
Looks like
# A tibble: 16 x 5
treat id x y z
<int> <int> <dbl> <dbl> <dbl>
1 1 1 -0.0724 1.26 0.317
2 1 2 -0.486 -0.628 0.392
3 1 3 -0.406 -0.706 1.18
4 1 4 -1.35 -1.27 2.36
5 2 5 -0.0751 -0.0394 0.568
6 2 6 0.243 0.873 0.132
7 2 7 0.138 0.611 -0.700
8 2 8 -0.732 1.02 -0.811
9 3 9 -0.0278 1.78 0.568
10 3 10 0.526 1.18 1.03
11 3 11 1.43 0.0937 -0.0825
12 3 12 -0.299 -0.117 0.367
13 4 13 1.05 2.04 0.678
14 4 14 -1.93 0.201 0.250
15 4 15 0.624 1.09 0.852
16 4 16 0.502 0.119 -0.843
Every fourth row is a control, and I want to calculate the difference in x, y and z between the treatments and the controls. For example, for the first treatment I would like to calculate
-0.0724 - (-1.35) #x
1.26 - (-1.27) #y
0.317 - 2.36 #z
For the second treatment accordingly,
-0.486 - (-1.35) #x
-0.628 - (-1.27) #y
0.392 - 2.36 #z
... and so on.
I would like to use a dplyr / tidyverse solution but I have no idea how to do that in a "smooth" way. I found a solution already by using joins but this seems rather tedious compared to the "smooth" solution dplyr usually offers.

With dplyr, we can group_by treat and use mutate_at to select specific columns (x:z) and subtract the 4th value of each group from every value, using the nth function.
library(dplyr)
X %>%
group_by(treat) %>%
mutate_at(vars(x:z), funs(. - nth(., 4)))
#treat id x y z
# <dbl> <int> <dbl> <dbl> <dbl>
# 1 1 1 -0.631 0.971 0.206
# 2 1 2 -0.301 -1.49 0.189
# 3 1 3 1.49 1.17 0.133
# 4 1 4 0 0 0
# 5 2 5 1.39 -0.339 0.934
# 6 2 6 2.98 0.511 0.319
# 7 2 7 1.73 -0.297 0.0745
# 8 2 8 0 0 0
# 9 3 9 -1.05 -0.778 -2.86
#10 3 10 -0.805 -1.84 -2.38
#11 3 11 0.864 0.684 -3.43
#12 3 12 0 0 0
#13 4 13 -1.39 -0.843 1.67
#14 4 14 -1.68 1.55 -0.656
#15 4 15 -2.34 0.722 0.0638
#16 4 16 0 0 0
This can be also written as
X %>%
group_by(treat) %>%
mutate_at(vars(x:z), funs(. - .[4]))
data
set.seed(123)
X = data_frame(
treat = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)),
id = seq(1:16),
x = rnorm(16),
y = rnorm(16),
z = rnorm(16)
)
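In dplyr 1.0.0 and later, funs() and mutate_at() are superseded, so the same idea can be sketched with across(). This is an equivalent formulation under that assumption, not part of the original answer; diffs is a made-up name, and the data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)

set.seed(123)
X <- tibble(
  treat = rep(1:4, each = 4),
  id = 1:16,
  x = rnorm(16),
  y = rnorm(16),
  z = rnorm(16)
)

diffs <- X %>%
  group_by(treat) %>%
  # within each treat group, subtract the 4th (control) value from every value
  mutate(across(x:z, ~ .x - nth(.x, 4))) %>%
  ungroup()

diffs
```

As in the answer above, the control rows end up as 0 in x, y and z; they can be dropped afterwards with filter() if they are not needed.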

Calculate a mean per day per sample, then add a line of best fit

My Question:
How do I calculate the average (mean) per sample A, B, C per day (3 separate to 5) and then add a line of best fit through the mean from one day to the next?
I want to add this to a dot plot (ggplot2 geom_point). An example of the data is below, followed by the R script I used.
Data below:
Day Sample Measurement
3 A 0.648
3 A 0.661
3 A 0.65
3 A 0.594
3 A 0.548
3 A 0.653
3 A 0.648
3 A 0.672
3 A 0.661
3 A 0.66
3 A 0.647
3 A 0.629
3 A 0.691
3 A 0.534
3 A 0.567
3 A 0.634
3 A 0.579
3 B 0.689
3 B 0.598
3 B 0.658
3 B 0.662
3 B 0.599
3 B 0.678
3 B 0.65
3 B 0.617
3 B 0.673
3 B 0.67
3 B 0.666
3 B 0.595
3 B 0.604
3 B 0.59
3 B 0.569
3 B 0.614
3 C 0.624
3 C 0.623
3 C 0.606
3 C 0.66
3 C 0.623
3 C 0.669
3 C 0.642
3 C 0.658
3 C 0.645
3 C 0.653
3 C 0.501
3 C 0.552
3 C 0.663
3 C 0.589
3 C 0.602
5 A 0.811
5 A 0.822
5 A 0.811
5 A 0.824
5 A 0.773
5 A 0.823
5 A 0.815
5 A 0.819
5 A 0.754
5 A 0.81
5 A 0.796
5 A 0.818
5 A 0.797
5 A 0.811
5 A 0.812
5 A 0.817
5 A 0.821
5 B 0.827
5 B 0.798
5 B 0.819
5 B 0.81
5 B 0.826
5 B 0.821
5 B 0.805
5 B 0.821
5 B 0.825
5 B 0.821
5 B 0.816
5 B 0.814
5 B 0.823
5 B 0.81
5 B 0.823
5 B 0.762
5 B 0.825
5 B 0.821
5 B 0.825
5 B 0.812
R Code for ggplot:
p2 <- ggplot(data=data1, aes(x=Day, y=Fv.Fm..XE..Mean)) +
geom_point(aes(colour= Sample),
position = position_jitterdodge(dodge.width=0.75 , jitter.width=0.250)) +
# geom_line(aes(colour=Sample),
# position = position_jitterdodge(dodge.width=0.75)) +
scale_x_discrete(labels=c(3, 5, 7, 10, 14)) +
scale_y_continuous(limits=c(0.3 , 1.0))
p2
ggsave("p2.jpg")

First calculate the mean for each Sample and Day
library(tidyverse)
library(ggpmisc)
data1 <- read.table(text = txt, header = TRUE)
mean_data1 <- data1 %>%
group_by(Day, Sample) %>%
summarise(Mean = mean(Measurement, na.rm = TRUE))
mean_data1
#> # A tibble: 5 x 3
#> # Groups: Day [?]
#> Day Sample Mean
#> <int> <fct> <dbl>
#> 1 3 A 0.628
#> 2 3 B 0.633
#> 3 3 C 0.621
#> 4 5 A 0.808
#> 5 5 B 0.815
Then plot all Measurement values, faceted by Sample with facet_grid. Linear fit lines are added using geom_smooth. The stat_poly_eq function from the ggpmisc package displays the equation and R². Finally, we plot the Mean values.
p2 <- ggplot(data = data1, aes(x = Day, y = Measurement)) +
geom_point(aes(colour= Sample),
alpha = 0.7,
position = position_jitterdodge(dodge.width=0.75,
jitter.width=0.250)) +
scale_x_continuous(breaks=c(3, 5)) +
scale_y_continuous(limits=c(0.3 , 1.0))
formula <- y ~ x
p2 +
facet_grid(~ Sample) +
geom_smooth(method = "lm", formula = formula, se = FALSE) +
stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
label.x.npc = "left", label.y.npc = "top",
formula = formula, parse = TRUE, size = 4) +
geom_point(data = mean_data1, aes(Day, Mean, color = "Mean"),
size = 3) +
theme_bw() +
theme(aspect.ratio = 1)
Data used:
txt <- "Day Sample Measurement
3 A 0.648
3 A 0.661
3 A 0.65
3 A 0.594
3 A 0.548
3 A 0.653
3 A 0.648
3 A 0.672
3 A 0.661
3 A 0.66
3 A 0.647
3 A 0.629
3 A 0.691
3 A 0.534
3 A 0.567
3 A 0.634
3 A 0.579
3 B 0.689
3 B 0.598
3 B 0.658
3 B 0.662
3 B 0.599
3 B 0.678
3 B 0.65
3 B 0.617
3 B 0.673
3 B 0.67
3 B 0.666
3 B 0.595
3 B 0.604
3 B 0.59
3 B 0.569
3 B 0.614
3 C 0.624
3 C 0.623
3 C 0.606
3 C 0.66
3 C 0.623
3 C 0.669
3 C 0.642
3 C 0.658
3 C 0.645
3 C 0.653
3 C 0.501
3 C 0.552
3 C 0.663
3 C 0.589
3 C 0.602
5 A 0.811
5 A 0.822
5 A 0.811
5 A 0.824
5 A 0.773
5 A 0.823
5 A 0.815
5 A 0.819
5 A 0.754
5 A 0.81
5 A 0.796
5 A 0.818
5 A 0.797
5 A 0.811
5 A 0.812
5 A 0.817
5 A 0.821
5 B 0.827
5 B 0.798
5 B 0.819
5 B 0.81
5 B 0.826
5 B 0.821
5 B 0.805
5 B 0.821
5 B 0.825
5 B 0.821
5 B 0.816
5 B 0.814
5 B 0.823
5 B 0.81
5 B 0.823
5 B 0.762
5 B 0.825
5 B 0.821
5 B 0.825
5 B 0.812"
Created on 2018-03-17 by the reprex package (v0.2.0).
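The per-day, per-sample means can also be computed without the tidyverse. The following is a small base R sketch using aggregate() on a six-row excerpt of the data above; the excerpt and the name mean_data1 are used only for illustration.

```r
# A few rows taken from the data above (Day 3 and Day 5, Samples A and B)
data1 <- data.frame(
  Day = c(3, 3, 3, 5, 5, 5),
  Sample = c("A", "A", "B", "A", "A", "B"),
  Measurement = c(0.648, 0.661, 0.689, 0.811, 0.822, 0.827)
)

# One row per Day/Sample combination, holding the mean Measurement
mean_data1 <- aggregate(Measurement ~ Day + Sample, data = data1, FUN = mean)
mean_data1
```

The resulting data frame has the same shape as the summarise() output and could be passed to geom_point() or geom_line() in the same way.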
