Count the distinct values by row by group in R

I wish to calculate the running count of unique values by row within each group in R. The count should not include blank (NA) cells.
For example:
df<-data.frame(
Group=c("A1","A1","A1","A1","A1","B1","B1","B1"),
Segment=c("A",NA,"A","B","A",NA,"A","B")
)
INPUT:
+---------+--------+
| Group |Segment |
+---------+--------+
| A1 |A |
| A1 |NA |
| A1 |A |
| A1 |B |
| A1 |A |
| B1 |NA |
| B1 |A |
| B1 |B |
+---------+--------+
I solved the problem with a for loop, but on a big dataset it takes too long to produce the result.
Expected output in the distinct column:
+---------+--------+----------+
| Group |Segment | distinct |
+---------+--------+----------+
| A1 |A | 1 |
| A1 |NA | 1 |
| A1 |A | 1 |
| A1 |B | 2 |
| A1 |A | 2 |
| B1 |NA | 0 |
| B1 |A | 1 |
| B1 |B | 2 |
+---------+--------+----------+

duplicated is useful for this, although the NAs make it a bit tricky:
library(dplyr)
df %>%
group_by(Group) %>%
mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups: Group [2]
Group Segment distinct
<fct> <fct> <int>
1 A1 A 1
2 A1 NA 1
3 A1 A 1
4 A1 B 2
5 A1 A 2
6 B1 NA 0
7 B1 A 1
8 B1 B 2
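To see why the NA check is needed: duplicated() treats the first NA like any other first occurrence, so without !is.na() it would be counted as a distinct value. Below is a minimal base R sketch of the same running count, assuming the df from the question. The helper name running_distinct is introduced here for illustration only and is not part of the answer above.

```r
# Same data as in the question
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B")
)

# Hypothetical helper: a value counts only the first time it is seen,
# and never when it is NA
running_distinct <- function(x) cumsum(!duplicated(x) & !is.na(x))

# ave() applies the helper within each Group; it works on row indices
# so the result stays an integer vector
df$distinct <- ave(
  seq_along(df$Segment), df$Group,
  FUN = function(i) running_distinct(df$Segment[i])
)
df
```

This avoids the explicit for loop and needs no extra packages, though I have not compared its speed with the dplyr version on a large dataset.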

Related

How to assign a numeric value (in new column) based on groupings of other columns R

I would like to assign each unique combination of variables a value and list those values in a new column called ID, as shown below. For example, I would like patients who are TA cancer, N0 lymph, and immunotherapy 1 to get ID 1, patients who are TA, N0, and 2 to get ID 2, and so on. Below is a table of what the data looks like before and what I would like it to look like after. The data was loaded from a .csv file.
So to summarize:
Patients TA, N0, 1 ID = 1
Patients TA, N0, 2 ID = 2
Patients TA, Nx, 0 ID = 3
Patients TA, Nx, 1 ID = 4
Patients TA, N0, 0 ID = 5
Patients TA, Nx, 2 ID = 6
Before:
| Cancer | Lymph |Immunotherapy
| -------- | -------- |---------
| TA | N0 |1
| TA | N0 |2
| TA | N0 |1
| TA | Nx |0
| TA | Nx |1
| TA | N0 |0
| TA | Nx |1
| TA | Nx |2
After:
| Cancer | Lymph |Immunotherapy|ID
| -------- | -------- |--------- |-------
| TA | N0 |1 | 1
| TA | N0 |2 | 2
| TA | N0 |1 | 1
| TA | Nx |0 | 3
| TA | Nx |1 | 4
| TA | N0 |0 | 5
| TA | Nx |1 | 4
| TA | Nx |2 | 6
I attempted to use dplyr's group_by() and mutate() with no luck. Any help would be much appreciated. Thanks!
In base R:
d <- do.call(paste, df)
cbind(df, id = as.numeric(factor(d, unique(d))))
Cancer Lymph Immunotherapy id
1 TA N0 1 1
2 TA N0 2 2
3 TA N0 1 1
4 TA Nx 0 3
5 TA Nx 1 4
6 TA N0 0 5
7 TA Nx 1 4
8 TA Nx 2 6
library(dplyr)
df %>%
group_by(Cancer, Lymph, Immunotherapy) %>%
mutate(ID = cur_group_id()) %>%
ungroup()
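Alternatively:
df %>%
  left_join(
    # number each distinct combination in order of first appearance
    df %>%
      distinct(Cancer, Lymph, Immunotherapy) %>%
      mutate(ID = row_number()),
    by = c("Cancer", "Lymph", "Immunotherapy")
  )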
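One thing worth checking with the group_by() version: as far as I know, cur_group_id() numbers groups in the sorted order of the grouping keys, not in order of first appearance, so its IDs may not match the "After" table exactly. Below is a hedged base R sketch that numbers combinations by first appearance. The names key and id_first_seen are made up for this example, and the "\r" separator is only an assumption that this character never occurs in the data.

```r
# Data from the question, typed in by hand
df <- data.frame(
  Cancer        = rep("TA", 8),
  Lymph         = c("N0", "N0", "N0", "Nx", "Nx", "N0", "Nx", "Nx"),
  Immunotherapy = c(1, 2, 1, 0, 1, 0, 1, 2)
)

# Build one key per row; an unusual separator reduces the risk that
# two different combinations paste to the same string
key <- do.call(paste, c(df, sep = "\r"))

# match() against the unique keys gives IDs in order of first appearance
df$ID <- match(key, unique(key))
df
```

This is the same idea as the factor(d, unique(d)) answer, just written with match().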

R: Pivot from long to vector

Is there a way to summarize occurrences of variable values by another variable?
It's similar to pivoting from long to wide, but pivoting is done into a vector rather than into multiple variables
data have:
| var1 | var2 |
| :--: |:------:|
| 1 | 2 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 1 | 6 |
| 2 | 8 |
| 2 | 8 |
| 2 | 10 |
| 2 | 12 |
data want:
| var1 | var2 |
| :--: |:---------:|
| 1 | (2, 4, 6) |
| 2 | (8,10,12) |
We could create a list column after getting the unique elements
library(dplyr)
df1 %>%
distinct %>%
group_by(var1) %>%
summarise(var2 = list(var2))
A base R approach with aggregate
aggregate(. ~ var1, df, function(x) list(unique(x)))
var1 var2
1 1 2, 4, 6
2 2 8, 10, 12
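If a plain named list (or one string per group) is enough, split() plus unique() may be a simpler route. This is a sketch under the assumption that the data is in a data frame called df with the columns shown in the question; vals and as_text are names chosen here for illustration.

```r
df <- data.frame(
  var1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  var2 = c(2, 4, 4, 4, 6, 8, 8, 10, 12)
)

# One vector of unique var2 values per var1 group
vals <- lapply(split(df$var2, df$var1), unique)

# Optional: collapse each vector into a single comma-separated string
as_text <- sapply(vals, toString)
as_text
```

The list keeps the values numeric; the string version is only useful for display.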

How to make column value a new row in r

I have a dataframe that I would like to reformat by making the first column its own row above the values.
I want this:
| Type | Value1 | Value2 |
| -----| ------ | ------ |
| A | 1 | 3 |
| B | 2 | 2 |
To become this, with "A" and "B" each on their own row as merged cells:
| | Value1 | Value2 |
| -----| ------ | ------ |
| A |
| | 1 | 3 |
| B |
| | 2 | 2 |
We could use the insertRows function from the berryFunctions package.
For your original data you may need to adapt c(1,3), for example by using a sequence:
library(berryFunctions)
library(dplyr)
insertRows(df, c(1,3), new="") %>%
mutate(Type = lead(Type, default = ""))
Type Value1 Value2
1 A
2 1 3
3 B
4 2 2
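If installing berryFunctions is not an option, the same layout can probably be built in base R by stacking a "header" table and a "body" table and interleaving their rows. This is only a sketch: header, body and out are names invented here, and the value columns are converted to character so that blank cells can be stored as "".

```r
df <- data.frame(Type = c("A", "B"), Value1 = c(1, 2), Value2 = c(3, 2),
                 stringsAsFactors = FALSE)

# Rows holding only the Type label
header <- data.frame(Type = df$Type, Value1 = "", Value2 = "",
                     stringsAsFactors = FALSE)

# Rows holding only the values (as text, so "" is allowed in the column)
body <- data.frame(Type = "",
                   Value1 = as.character(df$Value1),
                   Value2 = as.character(df$Value2),
                   stringsAsFactors = FALSE)

# Interleave: header 1, body 1, header 2, body 2, ...
n <- nrow(df)
out <- rbind(header, body)[order(c(seq_len(n), seq_len(n))), ]
rownames(out) <- NULL
out
```

Because this works for any number of rows, there is no need to adapt the insert positions by hand.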

join each row to the whole second table in R dplyr [duplicate]

This question already has answers here:
Cartesian product with dplyr
I have two tables:
table 1:
| | a | b |
|---|----|----|
| 1 | a1 | b1 |
| 2 | a2 | b2 |
and
table 2:
| | c | d |
|---|----|----|
| 1 | c1 | d1 |
| 2 | c2 | d2 |
I want to join them so that each row of table 1 is bound column-wise with every row of table 2, to get this result:
| | a | b | c | d |
|---|----|----|----|----|
| 1 | a1 | b1 | c1 | d1 |
| 2 | a1 | b1 | c2 | d2 |
| 3 | a2 | b2 | c1 | d1 |
| 4 | a2 | b2 | c2 | d2 |
I feel like this is a duplicate question, but I could not find the right wording and search terms to find the answer.
There is no need to join; we can use tidyr::expand_grid:
library(dplyr)
library(tidyr)
table1 <- tibble(a = c("a1", "a2"),
b = c("b1", "b2"))
table2 <- tibble(c = c("c1","c2"),
d = c("d1", "d2"))
expand_grid(table1, table2)
#> # A tibble: 4 x 4
#> a b c d
#> <chr> <chr> <chr> <chr>
#> 1 a1 b1 c1 d1
#> 2 a1 b1 c2 d2
#> 3 a2 b2 c1 d1
#> 4 a2 b2 c2 d2
Created on 2021-09-17 by the reprex package (v2.0.1)
I found a crude answer:
table1$key <- 1
table2$key <- 1
result <- left_join(table1,table2, by="key") %>%
select(-key)
Any better answers are much appreciated.
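For completeness, base R's merge() can produce the same Cartesian product without a dummy key column: when by has length zero, it returns every pairing of rows. The sketch below assumes plain data frames named table1 and table2 like those in the question; crossed is a name chosen here. Recent dplyr versions (1.1.0 and later, as far as I know) also offer cross_join() for this.

```r
table1 <- data.frame(a = c("a1", "a2"), b = c("b1", "b2"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(c = c("c1", "c2"), d = c("d1", "d2"),
                     stringsAsFactors = FALSE)

# by = NULL asks merge() for the Cartesian product of the two tables
crossed <- merge(table1, table2, by = NULL)

# Sort so that the rows of table1 stay together, as in the desired output
crossed <- crossed[order(crossed$a, crossed$c), ]
rownames(crossed) <- NULL
crossed
```

The explicit sort is there because merge() does not promise the row order shown in the question.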

how to spread column data to rownames

I want to spread the name column.
d <- data.frame(ID = c(1,1,2,2,2,3,3),
name = c("a", "b", "a", "c", "d","c","d"))
| ID | name |
|-----|------|
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | c |
| 2 | d |
| 3 | c |
| 3 | d |
Using tidyr::spread() I can get the data.frame below:
d %>% tidyr::spread(name,name)
| ID| a | b | c | d |
| 1 | a | b | NA| NA|
| 2 | a | NA| c | d |
| 3 | NA| NA| c | d |
But I want to get a data.frame like this:
| ID | name1 | name2 | name3 |
|-----|-------|-------|-------|
| 1 | a | b | NA |
| 2 | a | c | d |
| 3 | c | d | NA |
We can create a new column and spread
library(tidyverse)
d %>%
group_by(ID) %>%
mutate(new = paste0("name", row_number())) %>%
spread(new, name)
# ID name1 name2 name3
#* <dbl> <fctr> <fctr> <fctr>
#1 1 a b NA
#2 2 a c d
#3 3 c d NA
It is relatively concise with dcast
library(data.table)
dcast(setDT(d), ID~paste0("name", rowid(ID)), value.var = "name")
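The same idea (number the rows within each ID, then go wide on that number) can also be sketched in base R with ave() and reshape(), in case neither tidyr nor data.table is available. The column name pos and the object wide are assumptions made for this example.

```r
d <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 3),
                name = c("a", "b", "a", "c", "d", "c", "d"),
                stringsAsFactors = FALSE)

# Position of each row within its ID: 1, 2, 1, 2, 3, 1, 2
d$pos <- ave(seq_along(d$ID), d$ID, FUN = seq_along)

# One row per ID, one column per position; sep = "" gives name1, name2, ...
wide <- reshape(d, idvar = "ID", timevar = "pos",
                direction = "wide", sep = "")
rownames(wide) <- NULL
wide
```

IDs with fewer names than the longest group get NA in the unused columns, as in the desired output.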
