Take the letter that comes first in the alphabet (in R) - r

In the following dataframe df,
structure(list(Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"), code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"), `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"), `CLASS AIS` = c("E",
"C", "C", "D", "D", "A", "A"), `CLASS IPP` = c("C", "C", "C",
"E", "E", "B", "A"), `CLASS SJR` = c("D", "C", "C", "D", "D",
"B", "A")), row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L,
897L), class = "data.frame")
the letters denote a ranking. For example: A is the first position, B is the second and so on. The letters range between A and E. I would like to collapse the columns that begin with CLASS (i.e., the last four columns of the dataframe) in only one column keeping, for each row of the dataframe, only the letter that corresponds to the highest position in the ranking.
The desired result is:
Name code new column
1682 Gregory xx11-9090 C
1683 Jane 1367-88uu C
1768 Joey 117y-xxxh C
333 Mark cf56-gh67 D
443 Rachel 1888-ddf5 D
510 Phoebe rf52-628u A
897 Liza hj69-5kk5 A

You can use the apply statement to apply the min function to each row and then assign its output to a new column:
df$new_column <- apply(df[, grep("^CLASS", names(df))], 1, min, na.rm = TRUE)
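Since the CLASS columns are plain character vectors, the row-wise minimum can also be taken without apply(). This is only a sketch, assuming the letters are always upper-case A to E (so that string comparison agrees with the ranking) and that every ranking column starts with CLASS, as in the question. The helper name cls is made up for illustration.

```r
# Data from the question (original row names omitted for brevity)
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67",
           "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  check.names = FALSE
)

# Positions of the ranking columns (cls is an illustrative name)
cls <- grep("^CLASS", names(df))

# pmin() compares the columns element by element, i.e. row by row
df$new_column <- do.call(pmin, c(unname(as.list(df[cls])), na.rm = TRUE))

df[c("Name", "code", "new_column")]
```

pmin() is vectorised over the columns, so it avoids the conversion to a character matrix that apply() performs on a data frame.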

A possible solution in base R:
df$new_column <- apply(df, 1, \(x) sort(x[-(1:2)])[1])
df[,c(1,2,7)]
#> Name code new_column
#> 1682 Gregory xx11-9090 C
#> 1683 Jane 1367-88uu C
#> 1768 Joey 117y-xxxh C
#> 333 Mark cf56-gh67 D
#> 443 Rachel 1888-ddf5 D
#> 510 Phoebe rf52-628u A
#> 897 Liza hj69-5kk5 A
Using dplyr:
library(dplyr)
df %>%
rowwise %>%
mutate(new_column = c_across(starts_with("CLASS")) %>% sort %>% .[1]) %>%
select(Name, code, new_column) %>% ungroup
#> # A tibble: 7 × 3
#> Name code new_column
#> <chr> <chr> <chr>
#> 1 Gregory xx11-9090 C
#> 2 Jane 1367-88uu C
#> 3 Joey 117y-xxxh C
#> 4 Mark cf56-gh67 D
#> 5 Rachel 1888-ddf5 D
#> 6 Phoebe rf52-628u A
#> 7 Liza hj69-5kk5 A

Related

R - Creating a new variable based on multiple observations

My dataset represents patients which have been treated multiple times. The dataset is in a long format, patients either get treatment A, C or S or a combination. A and C are never combined.
Simply put, the data looks something like this:
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA))
I would like to create a new variable based on whether a patient had treatment A, treatment C, or neither, so that the end result looks something like:
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", "S"),
group = c("A", "A", "A", "C", "C", "C", "C", "C", "C", "C", "C", "A", "A", "A", "S", "S"))
How can I best approach this? I'm struggling with how to deal with multiple observations per ID.
Thank you!
You can use group_by() in combination with mutate() and case_when() to achieve this:
library(tidyverse)
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA))
df %>%
group_by(PatientID) %>%
mutate(groups = case_when("A" %in% treatment ~ "A",
"C" %in% treatment ~ "C",
TRUE ~ "S"))
#> # A tibble: 16 × 3
#> # Groups: PatientID [6]
#> PatientID treatment groups
#> <dbl> <chr> <chr>
#> 1 1 A A
#> 2 1 A A
#> 3 1 S A
#> 4 2 C C
#> 5 2 S C
#> 6 3 S C
#> 7 3 C C
#> 8 3 C C
#> 9 3 <NA> C
#> 10 4 C C
#> 11 4 <NA> C
#> 12 5 <NA> A
#> 13 5 S A
#> 14 5 A A
#> 15 6 S S
#> 16 6 <NA> S
Created on 2022-08-18 with reprex v2.0.2
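The same precedence rule (A first, then C, otherwise S) can be sketched in base R with ave(), which applies a function within each PatientID and keeps the original row order. The helper classify() is a hypothetical name, and the fallback to "S" for patients with neither A nor C is assumed from the desired output in the question.

```r
df <- data.frame(
  PatientID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6),
  treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA,
                "C", NA, NA, "S", "A", "S", NA)
)

# Hypothetical helper: "A" wins over "C"; anything else falls back to "S"
classify <- function(trt) {
  if ("A" %in% trt) "A" else if ("C" %in% trt) "C" else "S"
}

# Repeat the per-patient label once for every row of that patient
df$group <- ave(df$treatment, df$PatientID,
                FUN = function(trt) rep(classify(trt), length(trt)))
df
```

Because %in% never returns NA, missing treatments need no special handling here.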

Duplicate rows based on common value in other column?

I have a data frame with two columns: the first holds unique individual IDs, and the second holds an action shared by several individuals. That is, several people in the first column took the same action, which is shown in the second column.
I'd like to write R code that creates new rows pairing up the people in the first column who share a value in the second column.
That is, given this example:
person <- c("a", "b", "c", "d", "e", "f")
action <- c("x", "x", "x", "y", "y", "y")
data.frame(person, action)
I'd want to create this:
person1 <- c("a", "a", "b", "d", "d", "e")
person2 <- c("b", "c", "c", "e", "f", "f")
data.frame(person1, person2)
A method using group_modify() and combn():
library(dplyr)
df <- data.frame(person, action) # the data frame from the question
df %>%
group_by(action) %>%
group_modify(~ as_tibble(t(combn(pull(.x, person), 2))))
# A tibble: 6 × 3
# Groups: action [2]
action V1 V2
<chr> <chr> <chr>
1 x a b
2 x a c
3 x b c
4 y d e
5 y d f
6 y e f
How about this:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.1.2
person<-c("a", "b", "c", "d", "e", "f")
action<-c("x", "x", "x", "y", "y", "y")
dat <- data.frame(person, action)
dat %>%
group_by(action) %>%
summarise(person = as.data.frame(t(combn(person, 2)))) %>%
unnest(person) %>%
rename(person1=V1, person2=V2)
#> `summarise()` has grouped output by 'action'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups: action [2]
#> action person1 person2
#> <chr> <chr> <chr>
#> 1 x a b
#> 2 x a c
#> 3 x b c
#> 4 y d e
#> 5 y d f
#> 6 y e f
Created on 2022-04-21 by the reprex package (v2.0.1)
Here is a one liner in base R.
person <- c("a", "b", "c", "d", "e", "f")
action <- c("x", "x", "x", "y", "y", "y")
df <- data.frame(person, action)
setNames(
do.call(
rbind,
lapply(split(df, df$action),
function(x) as.data.frame(t(combn(x$person, 2))))),
c("person1", "person2"))
# person1 person2
# x.1 a b
# x.2 a c
# x.3 b c
# y.1 d e
# y.2 d f
# y.3 e f
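All of the answers above transpose the result of combn(). A small sketch of why, using a made-up vector people: combn(x, 2) returns a matrix with one pair per column, so t() is needed to get one pair per row.

```r
people <- c("a", "b", "c")  # illustrative input, one group from the question

pairs <- combn(people, 2)   # 2 x 3 character matrix: one pair per column
pairs_by_row <- t(pairs)    # 3 x 2 matrix: one pair per row

colnames(pairs_by_row) <- c("person1", "person2")
pairs_by_row
```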
Using base R
subset(merge(dat, dat, by = 'action'), person.x != person.y &
duplicated(paste(pmin(person.x, person.y), pmax(person.x, person.y))))
action person.x person.y
4 x b a
7 x c a
8 x c b
13 y e d
16 y f d
17 y f e

How to generate string counts in different samples by R

Let's say I have a data table as follows:
ID1  ID2  ID3
-------------
a    a    b
a    b    b
b    b    b
c    c    c
c    c    d
c    d    d
d         e
d         e
e
Then I want to convert it to the following structure:
Samples ID1 ID2 ID3
-------------------
a 2 1 0
b 1 2 3
c 3 2 1
d 2 1 2
e 1 0 2
Could anyone please help me with R or bash code to achieve such a transformation?
Try the R code below
> table(stack(df))
ind
values ID1 ID2 ID3
a 2 1 0
b 1 2 3
c 3 2 1
d 2 1 2
e 1 0 2
data
> dput(df)
structure(list(ID1 = c("a", "a", "b", "c", "c", "c", "d", "d",
"e"), ID2 = c("a", "b", "b", "c", "c", "d", NA, NA, NA), ID3 = c("b",
"b", "b", "c", "d", "d", "e", "e", NA)), class = "data.frame", row.names = c(NA,
-9L))
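For comparison, the same counts can be sketched column by column. This assumes the data frame from the dput() output above. The name lv is illustrative, and converting each column to a factor with shared levels is what makes absent letters show up as 0.

```r
df <- data.frame(
  ID1 = c("a", "a", "b", "c", "c", "c", "d", "d", "e"),
  ID2 = c("a", "b", "b", "c", "c", "d", NA, NA, NA),
  ID3 = c("b", "b", "b", "c", "d", "d", "e", "e", NA)
)

# All distinct values across the columns; sort() drops the NAs
lv <- sort(unique(unlist(df)))

# One table per column, all sharing the same levels, bound into a matrix
counts <- sapply(df, function(col) table(factor(col, levels = lv)))
counts
```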
An option with tidyverse - reshape to 'long' format with pivot_longer, get the count and reshape back to 'wide' format with pivot_wider
library(dplyr)
library(tidyr)
df %>%
pivot_longer(everything(), values_drop_na = TRUE, values_to = 'Samples') %>%
count(name, Samples) %>%
pivot_wider(names_from = name, values_from = n, values_fill = 0)
-output
# A tibble: 5 × 4
Samples ID1 ID2 ID3
<chr> <int> <int> <int>
1 a 2 1 0
2 b 1 2 3
3 c 3 2 1
4 d 2 1 2
5 e 1 0 2
data
df <- structure(list(ID1 = c("a", "a", "b", "c", "c", "c", "d", "d",
"e"), ID2 = c("a", "b", "b", "c", "c", "d", NA, NA, NA), ID3 = c("b",
"b", "b", "c", "d", "d", "e", "e", NA)), class = "data.frame",
row.names = c(NA,
-9L))

Filter directed co-occurrences in a table

I have co-occurrence data that can be represented in two columns. The entries in each column are from the same set of possibilities. Ultimately I am aiming to plot a directed network, but first I would like to split the table into those rows that are reciprocal (i.e. both X->Y and Y->X) and those that occur in just one direction (i.e. only Y->Z). Here is an example:
library(tidyverse)
# Example data
from <- c("A", "B", "F", "Q", "T", "S", "D", "E", "A", "T", "F")
to <- c("E", "D", "Q", "S", "F", "T", "B", "A", "D", "A", "E")
df <- tibble(from, to) # data_frame() is deprecated in favour of tibble()
df
# A tibble: 11 x 2
from to
<chr> <chr>
1 A E
2 B D
3 F Q
4 Q S
5 T F
6 S T
7 D B
8 E A
9 A D
10 T A
11 F E
and here is my desired output:
# Desired output 1 - reciprocal co-occurrences
df %>%
slice(c(1,2)) %>%
rename(item1 = from, item2 = to)
# A tibble: 2 x 2
item1 item2
<chr> <chr>
1 A E
2 B D
# Desired output 2 - single occurrences
df %>%
slice(c(3,4,5,6,9,10,11))
# A tibble: 7 x 2
from to
<chr> <chr>
1 F Q
2 Q S
3 T F
4 S T
5 A D
6 T A
7 F E
If the co-occurrences are reciprocal, it does not matter what order the entries are in; I only need their names. If the co-occurrences are not reciprocal, I need to know the direction.
This feels like a graph problem, so I have had a go, but I am unfamiliar with working with this type of data and most tutorials seem to cover undirected graphs. Looking at the tidygraph package, which I understand uses the igraph package, I have tried this:
library(tidygraph)
df %>%
as_tbl_graph(directed = TRUE) %>%
activate(edges) %>%
mutate(recip_occur = edge_is_mutual()) %>%
as_tibble() %>%
filter(recip_occur == TRUE)
# A tibble: 4 x 3
from to recip_occur
<int> <int> <lgl>
1 1 8 TRUE
2 2 7 TRUE
3 7 2 TRUE
4 8 1 TRUE
However, this separates the edges from the node names and repeats each reciprocal co-occurrence. Does anyone have experience with this sort of data?
try this:
data:
from <- c("A", "B", "F", "Q", "T", "S", "D", "E", "A", "T", "F")
to <- c("E", "D", "Q", "S", "F", "T", "B", "A", "D", "A", "E")
df <- tibble(from, to) # data_frame() is deprecated in favour of tibble()
code:
recursive_IND <-
1:nrow(df) %>%
sapply(function(x){
if(any((as.character(df[x,]) == t(df[,c(2,1)])) %>% {.[1,] & .[2,]}))
return(T) else return(F)
})
df[recursive_IND,][!(df[recursive_IND,] %>% apply(1,sort) %>% t %>% duplicated(.)),]
df[!recursive_IND,]
result:
# A tibble: 2 x 2
# from to
# <chr> <chr>
#1 A E
#2 B D
# A tibble: 7 x 2
# from to
# <chr> <chr>
#1 F Q
#2 Q S
#3 T F
#4 S T
#5 A D
#6 T A
#7 F E
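The same split can be sketched in base R with string keys instead of a row-by-row loop. This is an illustration rather than the answer's method: the names edges, fwd, bwd and is_recip are made up, and it assumes there are no self-loops (X->X) and that node names do not contain the separator "->".

```r
from <- c("A", "B", "F", "Q", "T", "S", "D", "E", "A", "T", "F")
to   <- c("E", "D", "Q", "S", "F", "T", "B", "A", "D", "A", "E")
edges <- data.frame(from, to)

# Key for each edge and for its reverse
fwd <- paste(edges$from, edges$to, sep = "->")
bwd <- paste(edges$to, edges$from, sep = "->")

# An edge is reciprocal when its reverse also appears in the table
is_recip <- fwd %in% bwd

# Keep one row per reciprocal pair (the one with from < to)
recip  <- edges[is_recip & edges$from < edges$to, ]
single <- edges[!is_recip, ]

recip
single
```

The single-direction rows keep their original from/to columns, so the direction is preserved.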

R: select top products in groups

I need to select the 3 top-selling products in each category, but if a category does not have 3 products I should add more products from the best available category ("a" being the best category, "c" the worst).
Every day the products change, so I would like to do this automatically. Previously I chose the top 3 products and, if fewer were available, I did not bother, but unfortunately the conditions changed. For that I used the following code:
Selected <- items %>% group_by(Cat) %>% dplyr::filter(row_number() <= 3) %>% ungroup
Sample data:
items <- data.frame(Cat = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results:
"a", "a", "a", "b", "b", "c", "c", "c", "c"
Sample data - 2:
items <- data.frame(Cat = c("a", "a", "a", "a", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results - 2:
"a", "a", "a", "a", "b", "c", "c", "c", "c"
Here is a possible answer. I'm not entirely sure if I'm getting what you are after - if not let me know.
items <- data.frame(Cat = c("a", "a", "a",
"b", "b",
"c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
First we order the data from best to worst category and number the rows within each category.
library(dplyr)
Selected <- items %>% group_by(Cat) %>%
mutate(id = row_number()) %>%
ungroup() %>% arrange(Cat)
Then we can keep the top 3 in each category and fill up with the remaining rows, from best to worst category
Selected %>% filter(id<=3) %>% # Select top 3 in each group
bind_rows(Selected %>% filter(id>3)) %>% # Merge with the ones that weren't selected
mutate(id=row_number()) %>%
filter(id <= 3*length(unique(Cat))) # Extract the right number
This produces
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 4 4
5 b 5 5
6 c 6 6
7 c 7 7
8 c 8 8
9 c 9 9
The second data example yields
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 5 4
5 c 6 5
6 c 7 6
7 c 8 7
8 a 4 8
9 c 9 9
which seems to be what you were after.
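The fill-up logic can also be sketched in base R with a single ordering step. This is a variant for illustration, not the answer's code: pos, top and ord are made-up names, and it assumes that categories sort alphabetically from best to worst and that a lower ranking value means a better-selling product.

```r
# Second sample data set from the question
items <- data.frame(Cat = c("a", "a", "a", "a", "b", rep("c", 10)),
                    ranking = 1:15)

# Position of each product within its category (1 = best)
pos <- ave(items$ranking, items$Cat,
           FUN = function(r) rank(r, ties.method = "first"))
top <- pos <= 3

# Top-3 rows first, then the leftovers; each block sorted by category and ranking
ord <- order(!top, items$Cat, items$ranking)

# Keep 3 products per category in total
selected <- head(items[ord, ], 3 * length(unique(items$Cat)))
selected
```

If there are fewer rows than 3 times the number of categories, head() simply returns all of them.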

Resources