If Group Contains Value Logic in R

I'm trying to get the age given certain criteria. If RNA is not NA, I would like the AvatarKey (the ID column below) to be associated with the minimum Age among the rows where RNA is present. If RNA is NA, I would like to take the minimum Age where DNA is not NA. If both are NA, the row should be removed.
Input:
ID DNA RNA Age
2 NA SL43 22.2
2 SL333 NA 55.7
2 SL333 SL43 43.7
6 SL333 NA 10.3
6 SL333 NA 65.6
6 NA NA 35.5
5 NA SL43 78.0
5 NA SL43 23.3
5 NA SL43 35.8
7 SL333 SL43 13.5
7 SL333 SL43 98.1
1 NA NA 55.6
Desired Output
ID DNA RNA Age
2 NA SL43 22.2
2 SL333 SL43 43.7
6 SL333 NA 10.3
5 NA SL43 23.3
7 SL333 SL43 13.5

Different order than your output, but does this work?
library(dplyr)
my_data %>%
filter(!is.na(DNA) | !is.na(RNA)) %>%
group_by(ID, DNA) %>%
arrange(DNA, -Age) %>%
slice(n())
ID DNA RNA Age
<int> <chr> <chr> <dbl>
1 2 SL333 SL43 43.7
2 2 NA SL43 22.2
3 5 NA SL43 23.3
4 6 SL333 NA 10.3
5 7 SL333 SL43 13.5
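To make the answer above reproducible, here is a minimal sketch that rebuilds the question's input as my_data and applies the same idea. It swaps the arrange() plus slice(n()) step for slice_min(), which is my substitution rather than the answerer's code, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Input table from the question, typed in by hand
my_data <- data.frame(
  ID  = c(2, 2, 2, 6, 6, 6, 5, 5, 5, 7, 7, 1),
  DNA = c(NA, "SL333", "SL333", "SL333", "SL333", NA,
          NA, NA, NA, "SL333", "SL333", NA),
  RNA = c("SL43", NA, "SL43", NA, NA, NA,
          "SL43", "SL43", "SL43", "SL43", "SL43", NA),
  Age = c(22.2, 55.7, 43.7, 10.3, 65.6, 35.5,
          78.0, 23.3, 35.8, 13.5, 98.1, 55.6),
  stringsAsFactors = FALSE
)

result <- my_data %>%
  # drop rows where both DNA and RNA are missing
  filter(!is.na(DNA) | !is.na(RNA)) %>%
  # NA counts as its own DNA group within each ID
  group_by(ID, DNA) %>%
  # keep the row with the smallest Age in each group
  slice_min(Age, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

This should leave five rows, one per ID and DNA combination, matching the desired output up to row order.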

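You can try the following. Note that grouping by ID, DNA and RNA keeps one row per distinct combination, so ID 2 also retains its (SL333, NA, 55.7) row, which the desired output omits: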
library(dplyr)
df %>%
group_by(ID, DNA, RNA) %>%
summarise(Age = min(Age)) %>%
ungroup() %>%
filter(!(is.na(DNA) & is.na(RNA)))

Related

how to copy part of rows based on group by 'id' in R?

I have a data frame such as below:
id Date Age Sex PP Duration cd nh W_B R_B
583 99/07/19 51 2 NA 1 0 0 6.2 4.26
583 99/07/23 51 2 NA NA NA NA 7 4.35
3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
3024 99/11/01 42 2 NA NA NA NA 5.2 5.47
3024 99/11/02 42 2 NA NA NA NA 7.1 5.54
I have to copy the values of the columns 'PP' through 'nh' into the other rows that share the same 'id'. My target data frame is as below:
id Date Age Sex PP Duration cd nh W_B R_B
583 99/07/19 51 2 NA 1 0 0 6.2 4.26
583 99/07/23 51 2 NA 1 0 0 7 4.35
3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
3024 99/11/01 42 2 4 6 NA 1 5.2 5.47
3024 99/11/02 42 2 4 6 NA 1 7.1 5.54
I would appreciate it if anybody could share their comments with me.
Best Regards
Another option using na.locf:
df <- read.table(text="id Date Age Sex PP Duration cd nh W_B R_B
583 99/07/19 51 2 NA 1 0 0 6.2 4.26
583 99/07/23 51 2 NA NA NA NA 7 4.35
3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
3024 99/11/01 42 2 NA NA NA NA 5.2 5.47
3024 99/11/02 42 2 NA NA NA NA 7.1 5.54", header=TRUE)
library(dplyr)
library(zoo)
df %>%
group_by(id) %>%
summarise(across(everything(), ~na.locf(., na.rm = FALSE, fromLast = FALSE)))
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 5 × 10
#> # Groups: id [2]
#> id Date Age Sex PP Duration cd nh W_B R_B
#> <int> <chr> <int> <int> <int> <int> <int> <int> <dbl> <dbl>
#> 1 583 99/07/19 51 2 NA 1 0 0 6.2 4.26
#> 2 583 99/07/23 51 2 NA 1 0 0 7 4.35
#> 3 3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
#> 4 3024 99/11/01 42 2 4 6 NA 1 5.2 5.47
#> 5 3024 99/11/02 42 2 4 6 NA 1 7.1 5.54
Created on 2022-07-02 by the reprex package (v2.0.1)
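A variation on the na.locf approach, as a sketch only: since dplyr 1.1.0, summarise() returning more than one row per group is deprecated, so mutate() is used here instead, which keeps every row. Restricting the fill to PP:nh is my choice, based on the columns the question asks about.

```r
library(dplyr)
library(zoo)

df <- read.table(text = "id Date Age Sex PP Duration cd nh W_B R_B
583 99/07/19 51 2 NA 1 0 0 6.2 4.26
583 99/07/23 51 2 NA NA NA NA 7 4.35
3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
3024 99/11/01 42 2 NA NA NA NA 5.2 5.47
3024 99/11/02 42 2 NA NA NA NA 7.1 5.54", header = TRUE)

filled <- df %>%
  group_by(id) %>%
  # carry the last non-NA value forward within each id;
  # na.rm = FALSE keeps leading NAs so the row count is unchanged
  mutate(across(PP:nh, ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

print(filled)
```

Columns that are NA throughout a group (PP for id 583, cd for id 3024) stay NA, because there is nothing to carry forward.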
Another option, using fill() from tidyr:
library(tidyverse)
df <- read_table("id Date Age Sex PP Duration cd nh W_B R_B
583 99/07/19 51 2 NA 1 0 0 6.2 4.26
583 99/07/23 51 2 NA NA NA NA 7 4.35
3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
3024 99/11/01 42 2 NA NA NA NA 5.2 5.47
3024 99/11/02 42 2 NA NA NA NA 7.1 5.54")
df %>%
group_by(id) %>%
fill(PP:nh, .direction = 'updown')
#> # A tibble: 5 × 10
#> # Groups: id [2]
#> id Date Age Sex PP Duration cd nh W_B R_B
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 583 99/07/19 51 2 NA 1 0 0 6.2 4.26
#> 2 583 99/07/23 51 2 NA 1 NA 0 7 4.35
#> 3 3024 99/10/30 42 2 4 6 NA 1 6.2 5.28
#> 4 3024 99/11/01 42 2 4 6 NA 1 5.2 5.47
#> 5 3024 99/11/02 42 2 4 6 NA 1 7.1 5.54
Created on 2022-07-02 by the reprex package (v2.0.1)

How to find mean value using multiple columns of a R data.frame?

I am trying to find the mean of A and Z for each row and save it as a separate column, but the code seems to compute a single average and fill every row with that value. Any suggestions on how to fix this?
library(tidyverse)
library(lubridate)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("2001-01-01"), to = as.Date("2003-12-31"), by = "day"),
A = runif(1095, 1,60),
Z = runif(1095, 5,100)) %>%
mutate(MeanofAandZ= mean(A:Z))
Are you looking for this:
DF %>% rowwise() %>% mutate(MeanofAandZ = mean(c_across(A:Z)))
# A tibble: 1,095 x 4
# Rowwise:
Date A Z MeanofAandZ
<date> <dbl> <dbl> <dbl>
1 2001-01-01 26.5 7.68 17.1
2 2001-01-02 54.9 33.1 44.0
3 2001-01-03 37.1 82.0 59.5
4 2001-01-04 6.91 18.0 12.4
5 2001-01-05 53.0 8.76 30.9
6 2001-01-06 26.1 7.63 16.9
7 2001-01-07 59.3 30.8 45.0
8 2001-01-08 39.9 14.6 27.3
9 2001-01-09 59.2 93.6 76.4
10 2001-01-10 30.7 89.1 59.9
You can do it in base R with rowMeans.
Full base R:
DF$MeanofAandZ <- rowMeans(DF[c("A", "Z")])
head(DF)
#> Date A Z MeanofAandZ
#> 1 2001-01-01 17.967074 76.92436 47.44572
#> 2 2001-01-02 47.510003 99.28325 73.39663
#> 3 2001-01-03 25.129638 64.33253 44.73109
#> 4 2001-01-04 53.098027 32.42556 42.76179
#> 5 2001-01-05 56.487570 23.99162 40.23959
#> 6 2001-01-06 3.687833 81.08720 42.38751
or inside a mutate:
library(dplyr)
DF <- DF %>% mutate(MeanofAandZ = rowMeans(cbind(A,Z)))
head(DF)
#> Date A Z MeanofAandZ
#> 1 2001-01-01 17.967074 76.92436 47.44572
#> 2 2001-01-02 47.510003 99.28325 73.39663
#> 3 2001-01-03 25.129638 64.33253 44.73109
#> 4 2001-01-04 53.098027 32.42556 42.76179
#> 5 2001-01-05 56.487570 23.99162 40.23959
#> 6 2001-01-06 3.687833 81.08720 42.38751
We can also do:
DF$MeanofAandZ <- Reduce(`+`, DF[c("A", "Z")])/2
Or using apply:
DF$MeanofAandZ <- apply(DF[c("A", "Z")], 1, mean)
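As for why the original mutate(MeanofAandZ = mean(A:Z)) gives every row the same value: inside mutate(), A:Z is the ordinary colon operator applied to two column vectors, not a column range. A small base R sketch with made-up vectors (the numbers are illustrative only) shows what happens:

```r
# Made-up stand-ins for the two columns
A <- c(1.5, 10, 20)
Z <- c(4.2, 30, 40)

# `:` only looks at the first element of each vector (and warns about it),
# so this is the sequence from A[1] towards Z[1] in steps of 1
s <- suppressWarnings(A:Z)

# mean() of that sequence is a single number, which mutate() would recycle
single_mean <- mean(s)

# A row-wise mean needs a function that works across columns
row_means <- rowMeans(cbind(A, Z))

print(s)
print(single_mean)
print(row_means)
```

That is why the answers above switch to rowwise() with c_across(), or to rowMeans().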

Sort a dataframe according to characters in R [duplicate]

I have the data frame (code) and I want to sort it by combName in numerical order.
> code
# A tibble: 1,108 x 2
combName sumLength
<chr> <dbl>
1 20-1 8.05
2 20-10 14.7
3 20-100 21.2
4 20-101 17.6
5 20-102 25.4
6 20-103 46.3
7 20-104 68.7
8 20-105 24.3
9 20-106 46.3
10 20-107 14.0
# ... with 1,098 more rows
Afterwards the left column should look like:
> code
# A tibble: 1,108 x 2
combName sumLength
<chr> <dbl>
1 20-1 8.05
2 20-2 ...
3 20-3 ...
4 20-4 ...
5 20-5 ...
...
10 20-10 14.7
# ... with 1,098 more rows
I do not know how to reach this format.
Does this work:
library(dplyr)
library(tidyr)
df
# A tibble: 10 x 2
combName sumLength
<chr> <dbl>
1 20-102 25.4
2 20-100 21.2
3 20-101 17.6
4 20-105 24.3
5 20-10 14.7
6 20-103 46.3
7 20-104 68.7
8 20-1 8.05
9 20-106 46.3
10 20-107 14
df %>% separate(combName, into = c('1','2'), sep = '-', remove = FALSE) %>%
type.convert(as.is = TRUE) %>% arrange(`1`,`2`) %>% select(-c(`1`,`2`))
# A tibble: 10 x 2
combName sumLength
<chr> <dbl>
1 20-1 8.05
2 20-10 14.7
3 20-100 21.2
4 20-101 17.6
5 20-102 25.4
6 20-103 46.3
7 20-104 68.7
8 20-105 24.3
9 20-106 46.3
10 20-107 14
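The sample above happens to sort the same way as text and as numbers, so here is a base R sketch on a smaller example where the two orders differ. The rows "20-2" and "19-5" and their sumLength values are hypothetical, added only to show the effect.

```r
code <- data.frame(
  combName  = c("20-10", "20-2", "20-100", "20-1", "19-5"),
  sumLength = c(14.7, 11.0, 21.2, 8.05, 9.9),  # 11.0 and 9.9 are made up
  stringsAsFactors = FALSE
)

# Split "20-10" into its two parts; one row per name, one column per part
parts <- do.call(rbind, strsplit(code$combName, "-", fixed = TRUE))

# Order by the first part, then the second, both compared as integers
ord <- order(as.integer(parts[, 1]), as.integer(parts[, 2]))
sorted <- code[ord, ]

print(sorted)
```

A plain sort(code$combName) would place "20-10" and "20-100" before "20-2", because strings are compared character by character.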

reshape untidy data frame, spreading rows to columns names [duplicate]

I have searched the threads but can't find a solution I understand that solves the problem with the data frame I have.
My current data frame (df):
# A tibble: 8 x 29
`Athlete` Monday...2 Tuesday...3 Wednesday...4 Thursday...5 Friday...6 Saturday...7 Sunday...8
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Date 29/06/2020 30/06/2020 43837.0 43868.0 43897.0 43928.0 43958.0
2 HR 47.0 54.0 51.0 56.0 59.0 NA NA
3 HRV 171.0 91.0 127.0 99.0 77.0 NA NA
4 Sleep Duration 9.11 7.12 8.59 7.15 8.32 NA NA
5 Sleep Efficien~ 92.0 94.0 89.0 90.0 90.0 NA NA
6 Recovery Score 98.0 66.0 96.0 72.0 46.0 NA NA
7 Life Stress NO NO NO NO NO NA NA
8 Sick NO NO NO NO NO NA NA
I have tried to use spread and pivot_wider, but I know they would require additional functions to get the desired output, which is beyond my level of understanding in R.
Do I need to u
Desired output:
Date HR HRV Sleep Duration Sleep Efficiency Recovery Score Life Stress Sick
29/06/2020 47.0 171.0 9.11
30/06/2020 54.0 91.0 7.12
43837.0 51.0 127.0 8.59
43868.0 56.0 99.0 7.15
43897.0 59.0 77.0 8.32
43928.0 NA NA NA
43958.0 NA NA NA
etc.
Thank you
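In base R you could do the following (df[[1]] extracts the first column as a plain vector for both data frames and tibbles):
type.convert(setNames(data.frame(t(df[-1]), row.names = NULL), df[[1]]), as.is = TRUE)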
Date HR HRV Sleep Duration Sleep Efficien~ Recovery Score Life Stress Sick
1 29/06/2020 47 171 9.11 92 98 NO NO
2 30/06/2020 54 91 7.12 94 66 NO NO
3 43837.0 51 127 8.59 89 96 NO NO
4 43868.0 56 99 7.15 90 72 NO NO
5 43897.0 59 77 8.32 90 46 NO NO
6 43928 NA NA NA NA NA <NA> <NA>
7 43958 NA NA NA NA NA <NA> <NA>
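Since the question mentions pivot_wider, here is a tidyr sketch of the same transpose on a cut-down copy of the data (three rows, two day columns, with the column names shortened to Monday and Tuesday for illustration). It goes long first, then wide, which is one possible route and not the only one.

```r
library(tidyr)

# Cut-down version of the question's data; everything is character,
# as in the original tibble
df <- data.frame(
  Athlete = c("Date", "HR", "HRV"),
  Monday  = c("29/06/2020", "47.0", "171.0"),
  Tuesday = c("30/06/2020", "54.0", "91.0"),
  stringsAsFactors = FALSE
)

# Step 1: one row per (measure, day) pair
long <- pivot_longer(df, cols = -Athlete, names_to = "day", values_to = "value")

# Step 2: spread the measures back out as columns, one row per day
wide <- pivot_wider(long, names_from = Athlete, values_from = value)

# Step 3: convert text columns that hold numbers into numeric columns
wide <- type.convert(wide, as.is = TRUE)

print(wide)
```

The extra day column records which original column each row came from; drop it with wide$day <- NULL if it is not needed.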

filter by observation that cumulate X% of values

Within every group, after sorting the values in decreasing order, I would like to keep only the observations whose cumulative share stays at or below X% of the group total, in my case 80 percent.
So from this dataframe below:
library(dplyr)
Group <- c("A","A","A","A","A","B","B","B","B","C","C","C","C","C","C")
value <- c(2,3,6,3,1,1,3,3,5,4,3,5,3,4,2)
data1 <- data.frame(Group, value)
data1 <- data1 %>%
  arrange(Group, desc(value)) %>%
  group_by(Group) %>%
  mutate(pct = round(100 * value / sum(value), 1)) %>%
  mutate(cumPct = cumsum(pct))
I would like to get the filtered data frame below, according to the conditions I described above:
Group value pct cumPct
1 A 6 40.0 40.0
2 A 3 20.0 60.0
3 A 3 20.0 80.0
4 B 5 41.7 41.7
5 B 3 25.0 66.7
6 C 5 23.8 23.8
7 C 4 19.0 42.8
8 C 4 19.0 61.8
9 C 3 14.3 76.1
You can arrange the data in descending order of value, calculate pct and cum_pct for each Group, and select the rows where cum_pct is less than or equal to 80.
library(dplyr)
data1 %>%
arrange(Group, desc(value)) %>%
group_by(Group) %>%
mutate(pct = value/sum(value) * 100,
cum_pct = cumsum(pct)) %>%
filter(cum_pct <= 80)
# Group value pct cum_pct
# <chr> <dbl> <dbl> <dbl>
#1 A 6 40 40
#2 A 3 20 60
#3 A 3 20 80
#4 B 5 41.7 41.7
#5 B 3 25 66.7
#6 C 5 23.8 23.8
#7 C 4 19.0 42.9
#8 C 4 19.0 61.9
#9 C 3 14.3 76.2
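One caveat with a cut-off like cum_pct <= 80: a cumulative percentage built from floating-point divisions can land a hair above the threshold. The sketch below computes the share from the cumulative sum directly and allows a small tolerance. The tol value is an arbitrary assumption of mine, not part of the answer above.

```r
library(dplyr)

Group <- c("A","A","A","A","A","B","B","B","B","C","C","C","C","C","C")
value <- c(2,3,6,3,1,1,3,3,5,4,3,5,3,4,2)
data1 <- data.frame(Group, value)

tol <- 1e-8  # hypothetical tolerance for floating-point noise

kept <- data1 %>%
  arrange(Group, desc(value)) %>%
  group_by(Group) %>%
  # cumulative share of the group total, in percent
  mutate(cum_pct = 100 * cumsum(value) / sum(value)) %>%
  # keep rows at or below 80%, allowing for rounding error
  filter(cum_pct <= 80 + tol) %>%
  ungroup()

print(kept)
```

On this data it should keep the same nine rows as the answer above (three in A, two in B, four in C).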

Resources