How to spread column data to rownames - R

I want to spread the name column of this data.frame:
d <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 3),
                name = c("a", "b", "a", "c", "d", "c", "d"))
| ID | name |
|-----|------|
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | c |
| 2 | d |
| 3 | c |
| 3 | d |
Using tidyr::spread() I can get a data.frame like the one below:
d %>% tidyr::spread(name,name)
| ID | a  | b  | c  | d  |
|----|----|----|----|----|
| 1  | a  | b  | NA | NA |
| 2  | a  | NA | c  | d  |
| 3  | NA | NA | c  | d  |
but I want to get a data.frame like this:
| ID | name1 | name2 | name3 |
|-----|-------|-------|-------|
| 1 | a | b | NA |
| 2 | a | c | d |
| 3 | c | d | NA |

We can create a new column and then spread:
library(tidyverse)

d %>%
  group_by(ID) %>%
  mutate(new = paste0("name", row_number())) %>%
  spread(new, name)
# ID name1 name2 name3
#* <dbl> <fctr> <fctr> <fctr>
#1 1 a b NA
#2 2 a c d
#3 3 c d NA
It is relatively concise with dcast() from data.table:
library(data.table)
dcast(setDT(d), ID~paste0("name", rowid(ID)), value.var = "name")
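In newer versions of tidyr, spread() is superseded by pivot_wider(); a minimal sketch of the same reshape, assuming tidyr >= 1.0.0 (the helper column n is just an illustrative name):
library(dplyr)
library(tidyr)

d %>%
  group_by(ID) %>%
  mutate(n = row_number()) %>%   # row index within each ID -> name1, name2, ...
  ungroup() %>%
  pivot_wider(names_from = n, values_from = name, names_prefix = "name")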

Related

R: Pivot from long to vector

Is there a way to summarize occurrences of variable values by another variable?
It's similar to pivoting from long to wide, but the pivoting is done into a vector rather than into multiple variables.
data have:
| var1 | var2 |
| :--: |:------:|
| 1 | 2 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 1 | 6 |
| 2 | 8 |
| 2 | 8 |
| 2 | 10 |
| 2 | 12 |
data want:
| var1 | var2 |
| :--: |:---------:|
| 1 | (2, 4, 6) |
| 2 | (8,10,12) |
We could create a list column after getting the unique elements:
library(dplyr)

df1 %>%
  distinct() %>%
  group_by(var1) %>%
  summarise(var2 = list(var2))
A base R approach with aggregate, using the same df1:
aggregate(. ~ var1, df1, function(x) list(unique(x)))
  var1      var2
1    1   2, 4, 6
2    2 8, 10, 12
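A data.table sketch of the same list-column idea, assuming the data is in df1 as above:
library(data.table)

# one list element of unique var2 values per var1 group
setDT(df1)[, .(var2 = list(unique(var2))), by = var1]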

How to make column value a new row in r

I have a dataframe that I would like to reformat by making the first column its own row above the values.
I want this:
| Type | Value1 | Value2 |
| -----| ------ | ------ |
| A | 1 | 3 |
| B | 2 | 2 |
To become this, with the rows containing "A" and "B" acting as merged cells:
| | Value1 | Value2 |
| -----| ------ | ------ |
| A |
| | 1 | 3 |
| B |
| | 2 | 2 |
We could use the insertRows() function from the berryFunctions package. For your original data you may need to adapt c(1, 3), for example with a sequence of insertion positions (see the sketch after the output):
library(berryFunctions)
library(dplyr)

insertRows(df, c(1, 3), new = "") %>%
  mutate(Type = lead(Type, default = ""))
  Type Value1 Value2
1    A
2           1      3
3    B
4           2      2
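A sketch of how the insertion positions could be generalized to a data frame with more rows, assuming insertRows() takes positions in the expanded result (as the c(1, 3) call above suggests); pos is just an illustrative helper name:
# one blank row before each original row: positions 1, 3, 5, ...
pos <- seq(1, by = 2, length.out = nrow(df))

insertRows(df, pos, new = "") %>%
  mutate(Type = lead(Type, default = ""))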

R: How to filter column as long as it contains combination of values?

I have a df like this:
| VisitID | Item |
|---------|------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | D |
| 2 | B |
| 3 | B |
| 3 | C |
| 4 | D |
| 4 | C |
In R, how do I filter for VisitIDs as long as they contain Item A & B?
Expected Outcome:
| VisitID | Item |
|---------|------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | D |
| 2 | B |
I tried df %>% group_by(VisitID) %>% filter(any(Item == 'A' & Item == 'B')) but it doesn't work.
library(readr)

df <- read_delim("VisitID | Item
1 | A
1 | B
1 | C
1 | D
2 | A
2 | D
2 | B
3 | B
3 | C
4 | D
4 | C", delim = "|", trim_ws = TRUE)
Since you want both "A" and "B", you can use all():
library(dplyr)
df %>% group_by(VisitID) %>% filter(all(c("A", "B") %in% Item))
# VisitID Item
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
Or, if you want to use any(), apply it to each condition separately:
df %>% group_by(VisitID) %>% filter(any(Item == 'A') && any(Item == 'B'))
An option with data.table:
library(data.table)
setDT(df)[, .SD[all(c("A", "B") %in% Item)], VisitID]
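A base R sketch of the same filter, keeping only the VisitIDs whose set of Items contains both "A" and "B" (has_both and keep are illustrative helper names):
# TRUE/FALSE per VisitID: does its Item set contain both "A" and "B"?
has_both <- sapply(split(df$Item, df$VisitID),
                   function(x) all(c("A", "B") %in% x))
keep <- names(has_both)[has_both]

df[as.character(df$VisitID) %in% keep, ]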

R Find Total count broken down by Visit ID

In R, I have a dataframe:
library(readr)

df <- read_delim("Visit_ID | Visit_Count | Cluster
A | 2 | orange
A | 2 | green
B | 2 | green
B | 2 | green
C | 3 | orange
C | 3 | orange
C | 3 | green
D | 3 | orange
D | 3 | green
D | 3 | orange", delim = "|", trim_ws = TRUE)
I would like to get a breakdown of each Cluster's Visit_ID count by visit frequency. The resulting dataframe should look like this:
df_result <- read_delim("Cluster | VisitID_Frequency | Total_count
Orange | 1 | 1
Orange | 2 | 2
Orange | 3 | 0
Green | 1 | 3
Green | 2 | 1
Green | 3 | 0
", delim = "|", trim_ws = TRUE)
library(dplyr)

df %>%
  group_by(Visit_ID, Cluster) %>%
  summarise(visit_count = n()) %>%
  arrange(Cluster) %>%
  group_by(Cluster, visit_count) %>%
  count()
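That gives the non-zero rows. To also get the zero rows shown in df_result (e.g. Orange with frequency 3) and its column names, a sketch with tidyr::complete(), assuming the frequencies of interest are 1 to 3 (taken from the expected output):
library(dplyr)
library(tidyr)

df %>%
  count(Cluster, Visit_ID, name = "VisitID_Frequency") %>%     # rows per Cluster/Visit_ID pair
  count(Cluster, VisitID_Frequency, name = "Total_count") %>%  # how many IDs share each frequency
  complete(Cluster, VisitID_Frequency = 1:3,                   # add the missing frequencies ...
           fill = list(Total_count = 0))                       # ... with a count of 0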

Count the distinct value by row by group

I wish to calculate the count of unique values by row, by group, in R. The count should not include blank (NA) cells.
For example,
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B")
)
INPUT:
+---------+--------+
| Group |Segment |
+---------+--------+
| A1 |A |
| A1 |NA |
| A1 |A |
| A1 |B |
| A1 |A |
| B1 |NA |
| B1 |A |
| B1 |B |
+---------+--------+
I have used a for loop to solve this, but on a big dataset it takes a long time to get the result.
Expected output (the distinct column):
+---------+--------+----------+
| Group |Segment | distinct |
+---------+--------+----------+
| A1 |A | 1 |
| A1 |NA | 1 |
| A1 |A | 1 |
| A1 |B | 2 |
| A1 |A | 2 |
| B1 |NA | 0 |
| B1 |A | 1 |
| B1 |B | 1 |
+---------+--------+----------+
duplicated is useful for this, although the NAs make it a bit tricky:
library(dplyr)

df %>%
  group_by(Group) %>%
  mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups: Group [2]
Group Segment distinct
<fct> <fct> <int>
1 A1 A 1
2 A1 NA 1
3 A1 A 1
4 A1 B 2
5 A1 A 2
6 B1 NA 0
7 B1 A 1
8 B1 B 2
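A data.table sketch of the same cumulative count, again not counting NAs:
library(data.table)

# running count of distinct non-NA Segment values within each Group
setDT(df)[, distinct := cumsum(!duplicated(Segment) & !is.na(Segment)), by = Group]
df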
