In R, I have a dataframe:
df <- read_delim("Visit_ID | Visit_Count | Cluster
A | 2 | orange
A | 2 | green
B | 2 | green
B | 2 | green
C | 3 | orange
C | 3 | orange
C | 3 | green
D | 3 | orange
D | 3 | green
D | 3 | orange", delim = "|", trim_ws = TRUE)
I would like to get, for each Cluster, a breakdown of how many Visit_IDs occur at each visit frequency (the number of times a Visit_ID appears in that cluster). The resulting dataframe should look like this:
df_result <- read_delim("Cluster | VisitID_Frequency | Total_count
Orange | 1 | 1
Orange | 2 | 2
Orange | 3 | 0
Green | 1 | 3
Green | 2 | 1
Green | 3 | 0
", delim = "|", trim_ws = TRUE)
df %>% group_by(Visit_ID, Cluster) %>%
summarise(visit_count = n()) %>%
arrange(Cluster) %>%
group_by(Cluster, visit_count) %>%
count()
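The pipeline above only returns frequency/cluster combinations that actually occur, so the zero rows in the desired result are missing. A minimal sketch of one way to add them, assuming the frequencies of interest are 1 to 3 (taken from the expected output, not derived from the data) and using tidyr::complete(); the data frame is rebuilt with data.frame() only to keep the example self-contained:

```r
library(dplyr)
library(tidyr)

# Same rows as in the question, built directly as a data frame.
df <- data.frame(
  Visit_ID    = c("A", "A", "B", "B", "C", "C", "C", "D", "D", "D"),
  Visit_Count = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  Cluster     = c("orange", "green", "green", "green", "orange",
                  "orange", "green", "orange", "green", "orange")
)

df_result <- df %>%
  # rows per Visit_ID within each Cluster
  count(Visit_ID, Cluster, name = "VisitID_Frequency") %>%
  # number of Visit_IDs per Cluster and frequency
  count(Cluster, VisitID_Frequency, name = "Total_count") %>%
  # assumption: frequencies 1:3 are the ones to report; absent combinations get 0
  complete(Cluster, VisitID_Frequency = 1:3, fill = list(Total_count = 0L))
```

Note that the Cluster labels stay lower case here, as in the input data.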
Is there a way to summarize occurrences of variable values by another variable?
It's similar to pivoting from long to wide, but pivoting is done into a vector rather than into multiple variables
data have:
| var1 | var2 |
| :--: |:------:|
| 1 | 2 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 1 | 6 |
| 2 | 8 |
| 2 | 8 |
| 2 | 10 |
| 2 | 12 |
data want:
| var1 | var2 |
| :--: |:---------:|
| 1 | (2, 4, 6) |
| 2 | (8,10,12) |
We could create a list column after getting the unique elements
library(dplyr)
df1 %>%
distinct %>%
group_by(var1) %>%
summarise(var2 = list(var2))
A base R approach with aggregate
aggregate(. ~ var1, df, function(x) list(unique(x)))
var1 var2
1 1 2, 4, 6
2 2 8, 10, 12
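If the goal is a plain character column formatted like "(2, 4, 6)" rather than a list column, a possible variant is to collapse the unique values with toString(). This is only a sketch; df1 is rebuilt here from the table in the question:

```r
library(dplyr)

# Data from the question.
df1 <- data.frame(
  var1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  var2 = c(2, 4, 4, 4, 6, 8, 8, 10, 12)
)

out <- df1 %>%
  group_by(var1) %>%
  # unique values per group, joined with ", " and wrapped in parentheses
  summarise(var2 = paste0("(", toString(unique(var2)), ")"))
```

The result has one row per var1, with var2 stored as a string, so the individual values are no longer directly usable as numbers.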
I have a dataframe that I would like to reformat by making the first column its own row above the values.
I want this:
| Type | Value1 | Value2 |
| -----| ------ | ------ |
| A | 1 | 3 |
| B | 2 | 2 |
To become this, with the rows containing "A" and "B" as merged cells:
| | Value1 | Value2 |
| -----| ------ | ------ |
| A |
| | 1 | 3 |
| B |
| | 2 | 2 |
We could use the insertRows function from the berryFunctions package.
For your original data you may adapt c(1,3), for example with a sequence:
library(berryFunctions)
library(dplyr)
insertRows(df, c(1,3), new="") %>%
mutate(Type = lead(Type, default = ""))
Type Value1 Value2
1 A
2 1 3
3 B
4 2 2
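For more than two rows, the positions c(1,3) would presumably generalise to every odd row of the result, i.e. seq(1, by = 2, length.out = nrow(df)); that reading of "a sequence" is an assumption. The sketch below does not use berryFunctions. It builds the same interleaved layout in base R, on a hypothetical three-row data frame:

```r
# Hypothetical example data with three types.
df <- data.frame(Type = c("A", "B", "C"),
                 Value1 = c(1, 2, 3),
                 Value2 = c(3, 2, 1))
n <- nrow(df)

# Row positions at which the header rows end up: 1, 3, 5, ...
pos <- seq(1, by = 2, length.out = n)

# One block of header rows (Type only) and one block of value rows.
header <- data.frame(Type = df$Type, Value1 = "", Value2 = "")
values <- data.frame(Type = "",
                     Value1 = as.character(df$Value1),
                     Value2 = as.character(df$Value2))

# Stack both blocks, then interleave them: header 1, values 1, header 2, ...
out <- rbind(header, values)[order(rep(seq_len(n), 2), rep(1:2, each = n)), ]
rownames(out) <- NULL
```

As with the insertRows approach, all columns become character because of the empty strings.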
I have a df like this:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
3 | B |
3 | C |
4 | D |
4 | C |
In R, how do I filter for VisitIDs as long as they contain Item A & B?
Expected Outcome:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
I tried df %>% group_by(VisitID) %>% filter(any(Item == 'A' & Item == 'B')) but it doesn't work.
df <- read_delim("ID | Item
1 | A
1 | B
2 | A
3 | B
1 | C
4 | C
5 | B
3 | A
4 | A
5 | D", delim = "|", trim_ws = TRUE)
Since you want both "A" and "B" you can use all
library(dplyr)
df %>% group_by(VisitID) %>% filter(all(c("A", "B") %in% Item))
# VisitID Item
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
Or, if you want to use any, apply it to each condition separately.
df %>% group_by(VisitID) %>% filter(any(Item == 'A') && any(Item == 'B'))
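The attempt in the question fails because Item == 'A' & Item == 'B' is evaluated element by element, and no single element can equal both values at once. A small illustration on a plain vector (the items of VisitID 2 from the question; the variable names are only for illustration):

```r
# Items of VisitID 2 from the question.
Item <- c("A", "D", "B")

# Element-wise AND: asks whether one element is "A" and "B" simultaneously.
both_at_once <- any(Item == "A" & Item == "B")        # FALSE

# Two separate checks, combined afterwards.
each_present <- any(Item == "A") && any(Item == "B")  # TRUE

# Equivalent check with %in%.
all_present <- all(c("A", "B") %in% Item)             # TRUE
```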
An option with data.table
library(data.table)
setDT(df)[, .SD[all(c("A", "B") %in% Item)], VisitID]
This is what my data looks like:
+---------+--+----------+--+
| Subj_ID | | Location | |
+---------+--+----------+--+
| 1 | | 1 | |
| 1 | | 2 | |
| 1 | | 3 | |
| 2 | | 1 | |
| 2 | | 4 | |
| 2 | | 2 | |
| 3 | | 1 | |
| 3 | | 2 | |
| 3 | | 5 | |
+---------+--+----------+--+
In this dataset, only subject 1 has a location value of 3, so I want to label subject 1 as YES for intervention. Since subjects 2 and 3 don't have a location value of 3, they need to be labeled as NO.
This is what I want the data to look like.
+---------+--+----------+--------------+
| Subj_ID | | Location | Intervention |
+---------+--+----------+--------------+
| 1 | | 1 | YES |
| 1 | | 2 | YES |
| 1 | | 3 | YES |
| 2 | | 1 | NO |
| 2 | | 4 | NO |
| 2 | | 2 | NO |
| 3 | | 1 | NO |
| 3 | | 2 | NO |
| 3 | | 5 | NO |
+---------+--+----------+--------------+
Thanks in advance for the help! Dplyr preferred if possible.
An option with dplyr is, after grouping by 'Subj_ID', to check whether 3 is %in% Location, which returns a single TRUE/FALSE per group, and then convert that to a numeric index (1 or 2) to pick the matching value from c("NO", "YES")
library(dplyr)
df1 %>%
group_by(Subj_ID) %>%
mutate(Intervention = c("NO", "YES")[(3 %in% Location)+1])
# A tibble: 9 x 3
# Groups: Subj_ID [3]
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 YES
#2 1 2 YES
#3 1 3 YES
#4 2 1 NO
#5 2 4 NO
#6 2 2 NO
#7 3 1 NO
#8 3 2 NO
#9 3 5 NO
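The indexing trick in that mutate call can be seen step by step on a single group. A short sketch with the Location values of subject 1 (the intermediate variable names are only for illustration):

```r
# Location values of subject 1.
Location <- c(1, 2, 3)

has_three <- 3 %in% Location     # single logical: TRUE
idx <- has_three + 1             # TRUE is coerced to 1, so idx is 2
label <- c("NO", "YES")[idx]     # second element: "YES"

# For a group without a 3 the index is 1, which selects "NO".
label_other <- c("NO", "YES")[(3 %in% c(1, 4, 2)) + 1]
```

Inside mutate the single label is recycled to every row of the group.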
Or use any
df1 %>%
group_by(Subj_ID) %>%
mutate(Intervention = case_when(any(Location == 3) ~ "YES", TRUE ~ "NO"))
Or using base R
df1$Intervention <- with(df1, c("NO", "YES")[1 + (Subj_ID %in%
Subj_ID[Location == 3])])
data
df1 <- data.frame(Subj_ID = rep(1:3, each = 3),
Location = c(1:3, 1, 4, 2, 1, 2, 5))
We can use match within each Subj_ID to check whether 3 is present in Location.
library(dplyr)
df %>%
group_by(Subj_ID) %>%
mutate(Intervention = c('Yes', 'No')[is.na(match(3,Location)) + 1])
#Can also use
#mutate(Intervention = c('No', 'Yes')[(match(3,Location, nomatch = 0L) > 0) + 1])
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 Yes
#2 1 2 Yes
#3 1 3 Yes
#4 2 1 No
#5 2 4 No
#6 2 2 No
#7 3 1 No
#8 3 2 No
#9 3 5 No
data
df <- structure(list(Subj_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
Location = c(1, 2, 3, 1, 4, 2, 1, 2, 5)), class = "data.frame",
row.names = c(NA, -9L))
I want to spread the name column.
d <- data.frame(ID = c(1,1,2,2,2,3,3),
name = c("a", "b", "a", "c", "d","c","d"))
| ID | name |
|-----|------|
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | c |
| 2 | d |
| 3 | c |
| 3 | d |
Using tidyr::spread() I get the following data.frame:
d %>% tidyr::spread(name,name)
| ID| a | b | c | d |
| 1 | a | b | NA| NA|
| 2 | a | NA| c | d |
| 3 | NA| NA| c | d |
But I want a data.frame like this:
| ID | name1 | name2 | name3 |
|-----|-------|-------|-------|
| 1 | a | b | NA |
| 2 | a | c | d |
| 3 | c | d | NA |
We can create a new column and spread
library(tidyverse)
d %>%
group_by(ID) %>%
mutate(new = paste0("name", row_number())) %>%
spread(new, name)
# ID name1 name2 name3
#* <dbl> <fctr> <fctr> <fctr>
#1 1 a b NA
#2 2 a c d
#3 3 c d NA
It is relatively concise with dcast
library(data.table)
dcast(setDT(d), ID~paste0("name", rowid(ID)), value.var = "name")
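In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same idea with pivot_wider(), assuming a tidyr version that provides it (1.0.0 or later); the helper column new is the same one used in the spread answer:

```r
library(dplyr)
library(tidyr)

d <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 3),
                name = c("a", "b", "a", "c", "d", "c", "d"))

wide <- d %>%
  group_by(ID) %>%
  # number the names within each ID: name1, name2, ...
  mutate(new = paste0("name", row_number())) %>%
  ungroup() %>%
  # one column per position, missing positions become NA
  pivot_wider(names_from = new, values_from = name)
```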