Could someone help me with this transformation in R? I would like to transform this table
ID Condition Count
1  A         1
1  B         0
2  A         1
2  B         1
3  A         0
3  B         1
4  A         1
4  B         1
5  A         1
5  B         1
6  A         1
6  B         0
7  A         0
7  B         1
8  A         0
9  B         0
into this table, a like-against-like cross-tabulation that counts how many IDs have each combination of A and B values:
A B Count of ID
1 0 2
0 0 1
1 1 3
0 1 2
Any help would be appreciated. Thank you.
Phil,
You can do:
with(dat, split(Count, Condition)) |>
  table() |>
  data.frame()
A B Freq
1 0 0 1
2 1 0 2
3 0 1 2
4 1 1 3
Data:
dat <- structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7,
7, 8, 9), Condition = c("A", "B", "A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B", "A", "B"), Count = c(1, 0, 1, 1,
0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA,
-16L))
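The split()/table() pipeline above pairs the A and B values by position, so it relies on the rows being sorted by ID; here ID 8's lone A value ends up paired with ID 9's lone B value. A base R sketch that matches the two conditions by ID instead, using reshape() and treating a missing condition as 0 (an assumption, the same choice the tidyverse answer makes):

```r
# Data from the question: IDs 8 and 9 each have only one condition
dat <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9),
  Condition = rep(c("A", "B"), 8),
  Count = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
)

# One row per ID, with columns Count.A and Count.B (NA where a condition is missing)
wide <- reshape(dat, idvar = "ID", timevar = "Condition", direction = "wide")

# Assumption: a missing condition counts as 0
wide[is.na(wide)] <- 0

# Cross-tabulate the A/B combinations and count the IDs in each one
res <- as.data.frame(table(A = wide$Count.A, B = wide$Count.B))
names(res)[3] <- "Count of ID"
res
```

If ID 9 is really a typo for 8, the (0, 0) count drops to 1 and the result matches the table in the question.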
Here is a tidyverse solution. I filled missing values with 0. Please note that this gives a different count than in your table (did you mean the last two IDs to be 8, 8 rather than 8, 9?):
data <- read.table(text = "ID Condition Count
1 A 1
1 B 0
2 A 1
2 B 1
3 A 0
3 B 1
4 A 1
4 B 1
5 A 1
5 B 1
6 A 1
6 B 0
7 A 0
7 B 1
8 A 0
9 B 0", header = TRUE)
library(tidyr)
library(dplyr)
data %>%
  pivot_wider(
    id_cols = ID,
    names_from = Condition,
    values_from = Count,
    values_fill = 0
  ) %>%
  count(A, B, name = "Count of ID")
#> # A tibble: 4 × 3
#> A B `Count of ID`
#> <int> <int> <int>
#> 1 0 0 2
#> 2 0 1 2
#> 3 1 0 2
#> 4 1 1 3
Created on 2023-01-20 by the reprex package (v1.0.0)
I have a dataset with IDs and their associated timepoints. I want to keep only the IDs that have a specific combination of timepoints. If I filter using %in% or |, I also get IDs that have only part of that combination. How do I do this in R?
ID Timepoint
1  1
1  6
1  12
2  1
3  1
3  6
3  12
3  18
4  1
4  6
4  12
I want to keep the IDs that have all of the timepoints 1, 6 and 12 and exclude the other IDs.
The result would be IDs 1, 3 and 4.
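library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
df %>%
  filter(Timepoint %in% c(1, 6, 12)) %>%
  mutate(indicator = 1) %>%
  group_by(ID) %>%
  # add rows (with an NA indicator) for any of 1, 6, 12 that an ID is missing
  complete(Timepoint = c(1, 6, 12)) %>%
  # drop every ID that received such an NA row
  filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
  select(indicator)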
Output:
# A tibble: 9 × 2
# Groups: ID [3]
ID indicator
<dbl> <dbl>
1 1 1
2 1 1
3 1 1
4 3 1
5 3 1
6 3 1
7 4 1
8 4 1
9 4 1
We can use
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(all(c(1, 6, 12) %in% Timepoint)) %>%
  ungroup()
Output:
# A tibble: 10 x 2
ID Timepoint
<dbl> <dbl>
1 1 1
2 1 6
3 1 12
4 3 1
5 3 6
6 3 12
7 3 18
8 4 1
9 4 6
10 4 12
From your data, ID 2 has timepoint 1. So if you filter by timepoints 1, 6 and 12, the result will be IDs 1, 2, 3 and 4 instead of 1, 3 and 4.
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
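Note that %in% on its own keeps an ID as soon as any one of its timepoints matches, which is why ID 2 shows up. If, as the question says, only IDs that have all of 1, 6 and 12 should be kept, the check has to be done per ID. A base R sketch using tapply(), the same all(... %in% ...) idea as the dplyr answer, with both versions side by side:

```r
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
required <- c(1, 6, 12)

# Any-match: an ID qualifies if at least one of its timepoints is required
any_ids <- unique(df$ID[df$Timepoint %in% required])

# All-match: for each ID, check that every required timepoint is present
has_all <- tapply(df$Timepoint, df$ID, function(tp) all(required %in% tp))
all_ids <- as.numeric(names(has_all)[has_all])

# Keep every row of the qualifying IDs (ID 3 keeps its timepoint 18 row too)
df[df$ID %in% all_ids, ]
```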
I want to create an assign column based on rank and limit, by group.
In particular, each group has a priority rank (e.g., 1,2,3 or 1,3,6 or 3,4,5 etc.). Based on the rank (a smaller number means higher priority), I want to allocate the resource given in the limit column. Right now I am doing this by hand, but I want to express this exercise using the tidyverse. How do I allocate with mutate and group_by (or other methods)?
Using the tidyverse, you can use top_n after grouping. This keeps the top rows by rank in each group (desc(rank) makes the smallest rank the top one), where the number of rows to keep is given by limit. The rows kept are assigned 1 and then joined back to your original data, with the remaining rows set to 0.
Let me know if this provides the desired result.
library(tidyverse)
df %>%
  group_by(group) %>%
  top_n(limit[1], desc(rank)) %>%
  mutate(assign = 1) %>%
  right_join(df) %>%
  replace_na(list(assign = 0)) %>%
  arrange(group, rank)
Output
group rank limit assign
<chr> <dbl> <dbl> <dbl>
1 A 1 1 1
2 A 2 1 0
3 A 3 1 0
4 B 1 1 1
5 B 3 1 0
6 B 6 1 0
7 C 3 2 1
8 C 4 2 1
9 C 5 2 0
10 C 6 2 0
Data
df <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "C",
"C", "C"), rank = c(1, 2, 3, 1, 3, 6, 3, 4, 5, 6), limit = c(1,
1, 1, 1, 1, 1, 2, 2, 2, 2)), class = "data.frame", row.names = c(NA,
-10L))
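top_n() has been superseded since dplyr 1.0.0, with slice_min() and slice_max() as the suggested replacements. Instead of filtering and joining back, the flag can also be computed directly: within each group, a row gets 1 if its position in rank order is within limit. A base R sketch with ave(); inside a grouped mutate(), assign = as.numeric(row_number(rank) <= limit) should express the same idea:

```r
df <- data.frame(group = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
                 rank  = c(1, 2, 3, 1, 3, 6, 3, 4, 5, 6),
                 limit = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# Position of each row within its group when ordered by rank
# (ties are broken by row order via ties.method = "first")
pos <- ave(df$rank, df$group, FUN = function(r) rank(r, ties.method = "first"))

# Allocate: the first `limit` rows of each group get 1, the rest get 0
df$assign <- as.numeric(pos <= df$limit)
df
```

Unlike top_n(), which keeps every row tied at the cut-off, this version never assigns more than limit rows per group; whether tied rows should share the resource is a policy choice the question does not settle.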
I am trying to create a for loop that does the following:
for (i in 2:length(Exampledata$Levels)) {
  if (is.na(Exampledata$Levels[i])) {
    # find the last row where is.na(Exampledata$Levels) is FALSE
    # for that same ID, and put the day from that row into last_entry[i]
  }
}
Example data:
ID<-c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
day<-c(1,2,3,4,5,6,7,8,9,10)
values<-c(1,2,4,5,5,6,8,9,6,4)
Levels<-c("A","","A","C",'D','D',"C","y","","")
last_entry<-c(0,0,0,0,0,0,0,0,0,0)
What the data currently looks like:
    ID values Levels day last_entry
1  QYZ      1      A   1          0
2  MMM      2          2          0
3  QYZ      4      A   3          0
4  bb2      5      C   4          0
5  gm6      5      D   5          0
6  gm6      6      D   6          0
7  YOU      8      C   7          0
8  LLL      9      y   8          0
9  LLL      6          9          0
10 LLL      4         10          0
What I want it to look like:
    ID values Levels day last_entry
1  QYZ      1      A   1          0
2  MMM      2          2          0
3  QYZ      4      A   3          0
4  bb2      5      C   4          0
5  gm6      5      D   5          0
6  gm6      6      D   6          0
7  YOU      8      C   7          0
8  LLL      9      y   8          0
9  LLL      6          9          8
10 LLL      4         10          8
I have seen a lot of code that finds the last non-zero element or the last element where is.na() is FALSE, but none that does it by ID and extracts a value from that row. I also need to ignore cases where there is no entry for that ID.
Essentially I want to know the last day that a level was entered for that ID.
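One way to think about it: for each row, record the day if a level was entered and 0 otherwise, take a running maximum within each ID, and keep that value only on rows where Levels is empty. A base R sketch with ave() and cummax(), assuming day increases within each ID (as in the example) and that empty levels are stored as "":

```r
ID <- c("QYZ", "MMM", "QYZ", "bb2", "gm6", "gm6", "YOU", "LLL", "LLL", "LLL")
day <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Levels <- c("A", "", "A", "C", "D", "D", "C", "y", "", "")

# Day on which a level was entered, 0 on rows with an empty level
entered_day <- ifelse(Levels != "", day, 0)

# Running maximum within each ID = last day a level was entered so far
# (relies on day increasing within each ID)
last_seen <- ave(entered_day, ID, FUN = cummax)

# Only empty rows get a last entry; IDs with no earlier entry stay at 0
last_entry <- ifelse(Levels == "", last_seen, 0)
last_entry
```

Because it is a running value, each row only looks back at earlier rows of the same ID.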
Here's a solution using data.table:
library('data.table')
dt <- data.table(ID = c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL"),
                 Day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 values = c(1, 2, 4, 5, 5, 6, 8, 9, 6, 4),
                 Levels = c("A", NA, "A", "C", "D", "D", "C", "y", NA, NA),
                 last_entry = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
func <- function(days, levels) {
  # IDs with no missing Levels, or with no recorded Levels at all, get 0
  if (!any(is.na(levels)) || all(is.na(levels))) return(0)
  # otherwise: the day of the last non-missing Levels entry for this ID
  return(last(days[which(!is.na(levels))]))
}
# rows with a recorded level keep 0; missing ones get func()'s result, per ID
dt[, last_entry := ifelse(!is.na(Levels), 0, func(Day, Levels)), by = ID]
But if you're set on using a for loop:
ID <- c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
Day <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Levels <- c("A", NA, "A", "C", "D", "D", "C", "y", NA, NA)
last_entry <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
i.na <- which(is.na(Levels))  # positions of missing levels
for (id in unique(ID)) {
  i.id <- which(ID == id)  # rows belonging to this ID
  # skip IDs with no recorded level, or with nothing missing
  if (all(is.na(Levels[i.id])) || !any(is.na(Levels[i.id]))) next
  # base R tail() instead of last(), which needs data.table or dplyr loaded
  day <- tail(Day[i.id[!(i.id %in% i.na)]], 1)
  last_entry[i.na[i.na %in% i.id]] <- day
}
Here is one way using tidyr::fill. We set last_entry to day where Levels is non-empty and to NA where it is empty, then use fill to replace those NAs with the latest non-NA value within each ID, and finally set last_entry back to 0 for all non-empty Levels (and for IDs with only one row).
library(dplyr)
df %>%
  mutate(last_entry = ifelse(Levels != "", day, NA)) %>%
  group_by(ID) %>%
  tidyr::fill(last_entry) %>%
  mutate(last_entry = replace(last_entry, Levels != "" | n() == 1, 0))
# ID day values Levels last_entry
# <fct> <dbl> <dbl> <fct> <dbl>
# 1 QYZ 1 1 A 0
# 2 MMM 2 2 "" 0
# 3 QYZ 3 4 A 0
# 4 bb2 4 5 C 0
# 5 gm6 5 5 D 0
# 6 gm6 6 6 D 0
# 7 YOU 7 8 C 0
# 8 LLL 8 9 y 0
# 9 LLL 9 6 "" 8
#10 LLL 10 4 "" 8
We can also do
df %>%
  group_by(ID) %>%
  mutate(last_entry = purrr::map_dbl(row_number(), ~ if (Levels[.x] == "" & n() > 1)
    day[max(which(Levels[1:.x] != ""))] else 0))
data
ID<-c("QYZ","MMM","QYZ","bb2","gm6","gm6","YOU","LLL","LLL","LLL")
day<-c(1,2,3,4,5,6,7,8,9,10)
values<-c(1,2,4,5,5,6,8,9,6,4)
Levels<-c("A","","A","C",'D','D',"C","y","","")
last_entry<-c(0,0,0,0,0,0,0,0,0,0)
df <- data.frame(ID, day, values, Levels, last_entry)
If you want to do it properly, you may want to code "empty" cells to NA beforehand.
Exampledata[Exampledata == ""] <- NA
Then you can use by() from base R to look up the "day" of the last non-NA "Levels" entry in the data split by "ID".
res <- do.call(rbind, by(Exampledata, Exampledata$ID, function(x) {
  x$last_entry <- ifelse(is.na(x$Levels), x$day[tail(which(!is.na(x$Levels)), 1)], 0)
  x
}))
Since the rbind-ed result comes out ordered alphabetically by "ID", we can reorder it by day.
res <- res[order(res$day), ]
res
# ID day values Levels last_entry
# QYZ.1 QYZ 1 1 A 0
# MMM MMM 2 2 <NA> NA
# QYZ.3 QYZ 3 4 A 0
# bb2 bb2 4 5 C 0
# gm6.5 gm6 5 5 D 0
# gm6.6 gm6 6 6 D 0
# YOU YOU 7 8 C 0
# LLL.8 LLL 8 9 y 0
# LLL.9 LLL 9 6 <NA> 8
# LLL.10 LLL 10 4 <NA> 8
Now "LLL" has the desired last entries, and MMM gets NA, which is what it logically should have, since its "Levels" is NA and it has no last entry.
Data
Exampledata <- structure(list(ID = structure(c(5L, 4L, 5L, 1L, 2L, 2L, 6L, 3L,
3L, 3L), .Label = c("bb2", "gm6", "LLL", "MMM", "QYZ", "YOU"), class = "factor"),
day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), values = c(1, 2,
4, 5, 5, 6, 8, 9, 6, 4), Levels = structure(c(2L, NA, 2L,
3L, 4L, 4L, 3L, 5L, NA, NA), .Label = c("", "A", "C", "D",
"y"), class = "factor"), last_entry = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0)), row.names = c(NA, -10L), class = "data.frame")
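The rbind() step reorders the rows by ID, which is why they have to be sorted by day afterwards. A sketch of the same by-ID logic with split() and unsplit(), which puts the rows back in their original order. It uses character columns rather than the factors in the data above (an assumption that does not change the logic), and fill_last is only an illustrative helper name:

```r
Exampledata <- data.frame(
  ID = c("QYZ", "MMM", "QYZ", "bb2", "gm6", "gm6", "YOU", "LLL", "LLL", "LLL"),
  day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  values = c(1, 2, 4, 5, 5, 6, 8, 9, 6, 4),
  Levels = c("A", NA, "A", "C", "D", "D", "C", "y", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: handles the rows of a single ID
fill_last <- function(x) {
  # Day of the last non-NA Levels entry for this ID (NA if there is none)
  last_day <- tail(x$day[!is.na(x$Levels)], 1)
  if (length(last_day) == 0) last_day <- NA
  # Rows with a missing level get that day, all other rows get 0
  x$last_entry <- ifelse(is.na(x$Levels), last_day, 0)
  x
}

# Apply per ID, then reassemble the pieces in the original row order
res <- unsplit(lapply(split(Exampledata, Exampledata$ID), fill_last),
               Exampledata$ID)
res
```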