Create an id based on column value in R [duplicate] - r

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 1 year ago.
I am working with a two-column dataset in R that holds response values ("Response") for samples belonging to different groups ("Group"). I want to create a third ID column that identifies each sample with a number from 1 to [..] (the groups do not all have the same number of samples). Here are just a few lines as an example:
Thanks for your help.

Try this:
library(tidyverse)
your_data %>%
group_by(Group)%>%
mutate(ID = 1:n())

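We could use cur_group_id(), which numbers the groups themselves (one id per group, as in the new_id column of the output) rather than the rows within each group: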
library(dplyr)
df %>%
group_by(Group) %>%
mutate(new_id = cur_group_id())
Output:
  Group Response    Id new_id
  <chr>    <dbl> <dbl>  <int>
1 A          1.5     1      1
2 A          3.4     2      1
3 A          2.3     3      1
4 A          1.8     4      1
5 B          1.9     1      2
6 B          1.4     2      2
7 C          2.7     1      3
8 C          2.3     2      3
9 C          3.2     3      3
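To make the difference between the two answers concrete, here is a minimal sketch that computes both columns side by side. The data frame is rebuilt from the output shown above, and the name `ids` is only an illustrative choice.

```r
library(dplyr)

# Sample data rebuilt from the output shown above
df <- data.frame(
  Group    = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  Response = c(1.5, 3.4, 2.3, 1.8, 1.9, 1.4, 2.7, 2.3, 3.2)
)

ids <- df %>%
  group_by(Group) %>%
  mutate(
    Id     = row_number(),    # running number within each group
    new_id = cur_group_id()   # one number per group
  ) %>%
  ungroup()

print(ids)
```

If the goal is to number samples within each group, `Id` is the column to keep; `new_id` only tells the groups apart.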

Related

How do I subset patient data based on number of readings for a particular variable for each patient?

I keep trying to find an answer, but haven't had much luck. I'll add a sample of some similar data.
What I'd be trying to do here is exclude patient 1 and patient 4 from my subset, as they only have one reading for "Mobility Score". So far, I've been unable to work out a way of counting the number of readings under each variable for each patient. If the patient only has one or zero readings, I'd like to exclude them from a subset.
This is an imgur link to the sample data. I can't upload the real data, but it's similar to this
This can be done with dplyr and group_by. For more information see ?group_by and ?summarize
# Create random data
dta <- data.frame(patient = rep(c(1,2),4), MobiScor = runif(8, 0,20))
dta$MobiScor[sample(1:8,3)] <- NA
# Count all available Mobility scores per patient, keeping the original rows
library(dplyr)
dta %>% group_by(patient) %>% mutate(count = sum(!is.na(MobiScor)))
# Collapse to one row per patient with the count
dta %>% group_by(patient) %>% summarize(count = sum(!is.na(MobiScor)))
Example data
  patient  MobiScor
1       1 19.203898
2       2 13.684209
3       1 17.581468
4       2        NA
5       1        NA
6       2        NA
7       1  7.794959
8       2        NA
Result 1 (mutate)
  patient MobiScor count
    <dbl>    <dbl> <int>
1       1    19.2      3
2       2    13.7      1
3       1    17.6      3
4       2    NA        1
5       1    NA        3
6       2    NA        1
7       1     7.79     3
8       2    NA        1
Result 2 (summarize)
  patient count
    <dbl> <int>
1       1     3
2       2     1
You can count the number of non-NA in each group and then filter based on that.
This can be done in base R :
subset(df, ave(!is.na(Mobility_score), patient, FUN = sum) > 1)
Using dplyr
library(dplyr)
df %>% group_by(patient) %>% filter(sum(!is.na(Mobility_score)) > 1)
and data.table
library(data.table)
setDT(df)[, .SD[sum(!is.na(Mobility_score)) > 1], patient]
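As a quick check of the filtering idea, here is a small sketch using the dplyr variant. The values in `df` are made up for illustration (the real data was only linked as an image); only the column names `patient` and `Mobility_score` are taken from the answers above.

```r
library(dplyr)

# Hypothetical data: patient 1 has one reading, patient 4 has none
df <- data.frame(
  patient        = c(1, 2, 2, 3, 3, 3, 4),
  Mobility_score = c(5, 3, 4, NA, 2, 6, NA)
)

kept <- df %>%
  group_by(patient) %>%
  # keep a patient only if they have more than one non-missing reading
  filter(sum(!is.na(Mobility_score)) > 1) %>%
  ungroup()

print(kept)
```

With this input, patients 1 and 4 are dropped, and all rows of patients 2 and 3 are kept, including the row where the score is NA.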

Invert rows using dplyr [duplicate]

This question already has answers here:
Reorder the rows of data frame in dplyr
(2 answers)
dplyr arrange by reverse alphabetical order [duplicate]
(1 answer)
Closed 3 years ago.
How can I invert the rows of a dataframe/tibble using dplyr? I don't want to arrange it by a certain variable, but rather have it just inverted.
I.e. the tibble
# A tibble: 5 x 2
      a b
  <int> <chr>
1     1 one
2     2 two
3     3 three
4     4 four
5     5 five
should become
# A tibble: 5 x 2
      a b
  <int> <chr>
1     5 five
2     4 four
3     3 three
4     2 two
5     1 one
Just arrange() by descending row_number() like this:
my_tibble %>%
dplyr::arrange(-dplyr::row_number())
We can use desc
my_tibble %>%
arrange(desc(row_number()))
Or another option is slice
my_tibble %>%
slice(rev(row_number()))
Or the 'a' column
my_tibble %>%
arrange(desc(a))
# a b
#1 5 five
#2 4 four
#3 3 three
#4 2 two
#5 1 one
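For completeness, a self-contained sketch that rebuilds the tibble from the question and reverses it two ways: with arrange() as in the first answer, and with plain row indexing. The indexing variant is not from the answers above; it is shown as a base-style alternative, and the names `reversed` and `reversed_base` are illustrative.

```r
library(dplyr)

my_tibble <- tibble(a = 1:5, b = c("one", "two", "three", "four", "five"))

# dplyr: sort by descending row position
reversed <- my_tibble %>%
  arrange(desc(row_number()))

# Row indexing: seq_len() also behaves well for a zero-row input
reversed_base <- my_tibble[rev(seq_len(nrow(my_tibble))), ]

print(reversed)
```

Both approaches depend only on row position, so they work even when no column happens to be sorted.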

How to average all columns in dataset by group [duplicate]

This question already has answers here:
How to calculate mean of all columns, by group?
(6 answers)
Closed 4 years ago.
I'm using aggregate in R to try to summarize my dataset. I currently have 3-5 observations per ID and I need to average these so that I have one value (the mean) per ID. Some columns return all "NA" when I use aggregate.
So far, I've created a vector for each column to average it, then tried to use merge to combine all of them. Some columns are characters, so I tried converting them to numbers using as.numeric(as.character(column)), but that returns too many NA in the column.
library(dplyr)
Tr1 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr1))
Tr2 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr2))
Tr3 <- data %>% group_by(ID) %>% summarise(mean = mean(Tr3))
data2 <- merge(Tr1,Tr2,Tr3, by = ID)
From this code I get error codes:
There were 50 or more warnings (use warnings() to see the first 50)
then,
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
My original dataset looks like:
ID Tr1 Tr2 Tr3
 1   4   5   6
 1   5   3   9
 1   3   5   9
 4   5   1   8
 4   2   6   4
 6   2   8   6
 6   2   7   4
 6   7   1   9
and I am trying to find a code so that it looks like:
ID Tr1 Tr2 Tr3
 1 4   4.3 8
 4 3.5 3.5 6
 6 3.7 5.3 6.3
You can use summarise_all instead of multiple uses of summarise:
library(dplyr)
data %>%
group_by(ID) %>%
summarise_all(mean)
# A tibble: 3 x 4
     ID   Tr1   Tr2   Tr3
  <int> <dbl> <dbl> <dbl>
1     1  4     4.33  8
2     4  3.5   3.5   6
3     6  3.67  5.33  6.33
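In dplyr 1.0.0 and later, the scoped verbs such as summarise_all() are superseded by across(). A minimal sketch of the same computation in that style, using the data from the question (the object names `dat` and `means` are illustrative):

```r
library(dplyr)

# Data from the question
dat <- data.frame(
  ID  = c(1, 1, 1, 4, 4, 6, 6, 6),
  Tr1 = c(4, 5, 3, 5, 2, 2, 2, 7),
  Tr2 = c(5, 3, 5, 1, 6, 8, 7, 1),
  Tr3 = c(6, 9, 9, 8, 4, 6, 4, 9)
)

means <- dat %>%
  group_by(ID) %>%
  # across(everything(), ...) applies mean to every non-grouping column
  summarise(across(everything(), mean))

print(means)
```

If some columns are stored as character, as the question mentions, mean() will return NA with a warning for them, so they would need to be converted or excluded first.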

Dynamically Normalize all rows with first element within a group

Suppose I have the following data frame:
  year subject grade study_time
1    1       a    30         20
2    2       a    60         60
3    1       b    30         10
4    2       b    90        100
What I would like to do is be able to divide grade and study_time by their first record within each subject. I do the following:
df %>%
group_by(subject) %>%
mutate(RN = row_number()) %>%
mutate(study_time = study_time/study_time[RN ==1],
grade = grade/grade[RN==1]) %>%
select(-RN)
I would get the following output
  year subject grade study_time
1    1       a     1          1
2    2       a     2          3
3    1       b     1          1
4    2       b     3         10
It's fairly easy to do when I know what the variable names are. However, I'm trying to write a generalized function that works on any data.frame/data.table/tibble where I may not know the names of the variables I need to mutate; I'll only know the names of the variables not to mutate. I'm trying to get this done using tidyverse/data.table and I can't get anything to work.
Any help would be greatly appreciated.
We group by 'subject' and use mutate_at to change multiple columns by dividing the element by the first element
library(dplyr)
df %>%
group_by(subject) %>%
mutate_at(3:4, funs(./first(.)))
# A tibble: 4 x 4
# Groups: subject [2]
# year subject grade study_time
# <int> <chr> <dbl> <dbl>
#1 1 a 1 1
#2 2 a 2 3
#3 1 b 1 1
#4 2 b 3 10
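The question asks for a function that only knows which columns not to touch. One possible sketch, assuming dplyr 1.0.0 or later, uses across() with a negated any_of() selection. The function name `normalise_by_first` and its arguments `group_col` and `keep` are hypothetical, introduced here only for illustration.

```r
library(dplyr)

# Hypothetical helper: divide every column by its first value within each
# group, except the grouping column and the columns named in `keep`.
normalise_by_first <- function(data, group_col, keep = character()) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    # grouping columns are skipped by across(); `keep` is excluded by name
    mutate(across(!any_of(keep), ~ .x / first(.x))) %>%
    ungroup()
}

df <- data.frame(
  year       = c(1, 2, 1, 2),
  subject    = c("a", "a", "b", "b"),
  grade      = c(30, 60, 30, 90),
  study_time = c(20, 60, 10, 100)
)

result <- normalise_by_first(df, group_col = "subject", keep = "year")
print(result)
```

Because the columns are excluded by name rather than selected by position, the function does not depend on column order. It assumes all remaining columns are numeric.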

Tibble - add id-specific mean column [duplicate]

This question already has answers here:
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 4 years ago.
Suppose I have
tibble(id = c(1,1,2,2), data = c(1:4))
i.e.
id data
 1    1
 1    2
 2    3
 2    4
I want to add a column with id-specific means, i.e. I want to get to
id data id_means
 1    1      1.5
 1    2      1.5
 2    3      3.5
 2    4      3.5
How can I do this?
We can use mutate after grouping by 'id'
df1 %>%
group_by(id) %>%
mutate(id_means = mean(data))
# A tibble: 4 x 3
# Groups: id [2]
# id data id_means
# <dbl> <int> <dbl>
#1 1.00 1 1.50
#2 1.00 2 1.50
#3 2.00 3 3.50
#4 2.00 4 3.50
data
df1 <- tibble(id = c(1,1,2,2), data = c(1:4))
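If dplyr is not available, the same column can be built in base R with ave(), which applies a function within groups and returns a vector as long as the input. This is an alternative sketch, not part of the answer above, and it uses a plain data.frame instead of a tibble.

```r
# Same data as above, as a base data.frame
df1 <- data.frame(id = c(1, 1, 2, 2), data = 1:4)

# ave() computes the mean of `data` within each `id` and repeats it per row
df1$id_means <- ave(as.numeric(df1$data), df1$id, FUN = mean)

print(df1)
```

The as.numeric() call just makes it explicit that the result is a double even though `data` is stored as an integer.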
