Is there a way to summarize occurrences of variable values by another variable?
It's similar to pivoting from long to wide, but the pivoting is done into a vector rather than into multiple variables.
data have:
| var1 | var2 |
| :--: |:------:|
| 1 | 2 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 1 | 6 |
| 2 | 8 |
| 2 | 8 |
| 2 | 10 |
| 2 | 12 |
data want:
| var1 | var2 |
| :--: |:---------:|
| 1 | (2, 4, 6) |
| 2 | (8,10,12) |
We could create a list column after getting the unique elements:
library(dplyr)
df1 %>%
  distinct() %>%
  group_by(var1) %>%
  summarise(var2 = list(var2))
A base R approach with aggregate:
aggregate(. ~ var1, df1, function(x) list(unique(x)))
var1 var2
1 1 2, 4, 6
2 2 8, 10, 12
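data (a sketch of df1, reconstructed from the "data have" table above):
df1 <- data.frame(var1 = rep(1:2, c(5, 4)),
                  var2 = c(2, 4, 4, 4, 6, 8, 8, 10, 12))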
I have a dataframe that I would like to reformat by making the first column its own row above the values.
I want this:
| Type | Value1 | Value2 |
| -----| ------ | ------ |
| A | 1 | 3 |
| B | 2 | 2 |
To become this, with the rows containing "A" and "B" shown as merged cells:
| | Value1 | Value2 |
| -----| ------ | ------ |
| A |
| | 1 | 3 |
| B |
| | 2 | 2 |
We could use the insertRows function from the berryFunctions package. For your original data you may adapt c(1, 3), for example with a sequence (see the sketch after the output below):
library(berryFunctions)
library(dplyr)
insertRows(df, c(1, 3), new = "") %>%
  mutate(Type = lead(Type, default = ""))
Type Value1 Value2
1 A
2 1 3
3 B
4 2 2
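For a larger data frame, the positions in c(1, 3) can be built with a sequence instead of being hard-coded. A minimal sketch, assuming (as in the output above) that one blank row goes before each original row, with the example data reconstructed from the question's table:
library(berryFunctions)
library(dplyr)

# example data reconstructed from the question's table
df <- data.frame(Type = c("A", "B"), Value1 = 1:2, Value2 = c(3, 2))

# blank rows at positions 1, 3, 5, ... of the expanded result
insertRows(df, seq(1, by = 2, length.out = nrow(df)), new = "") %>%
  mutate(Type = lead(Type, default = ""))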
I have a df like this:
| VisitID | Item |
| :-----: | :--: |
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | D |
| 2 | B |
| 3 | B |
| 3 | C |
| 4 | D |
| 4 | C |
In R, how do I keep only the VisitIDs that contain both Item A and Item B?
Expected Outcome:
| VisitID | Item |
| :-----: | :--: |
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | D |
| 2 | B |
I tried df %>% group_by(VisitID) %>% filter(any(Item == 'A' & Item == 'B')) but it doesn't work.
library(readr)
df <- read_delim("VisitID | Item
1 | A
1 | B
1 | C
1 | D
2 | A
2 | D
2 | B
3 | B
3 | C
4 | D
4 | C", delim = "|", trim_ws = TRUE)
Since you want both "A" and "B", you can use all:
library(dplyr)
df %>% group_by(VisitID) %>% filter(all(c("A", "B") %in% Item))
# VisitID Item
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
Or, if you want to use any, use it separately for each item:
df %>% group_by(VisitID) %>% filter(any(Item == 'A') && any(Item == 'B'))
An option with data.table:
library(data.table)
setDT(df)[, .SD[all(c("A", "B") %in% Item)], VisitID]
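For completeness, a base R sketch of the same filter, assuming df is the data shown above:
# keep only the VisitIDs that occur with both "A" and "B"
df[df$VisitID %in% intersect(df$VisitID[df$Item == "A"],
                             df$VisitID[df$Item == "B"]), ]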
This is what my data looks like:
| Subj_ID | Location |
| :-----: | :------: |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 5 |
In this dataset, only subject 1 has a location value of 3, so I want to label subject 1 as YES for intervention. Since subjects 2 and 3 didn't have a location value of 3, they need to be labeled as NO.
This is what I want the data to look like.
| Subj_ID | Location | Intervention |
| :-----: | :------: | :----------: |
| 1 | 1 | YES |
| 1 | 2 | YES |
| 1 | 3 | YES |
| 2 | 1 | NO |
| 2 | 4 | NO |
| 2 | 2 | NO |
| 3 | 1 | NO |
| 3 | 2 | NO |
| 3 | 5 | NO |
Thanks in advance for the help! Dplyr preferred if possible.
An option with dplyr: after grouping by 'Subj_ID', check whether 3 is %in% Location (which returns a single TRUE/FALSE per group), then turn that into a numeric index to pick between "NO" and "YES".
library(dplyr)
df1 %>%
  group_by(Subj_ID) %>%
  mutate(Intervention = c("NO", "YES")[(3 %in% Location) + 1])
# A tibble: 9 x 3
# Groups: Subj_ID [3]
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 YES
#2 1 2 YES
#3 1 3 YES
#4 2 1 NO
#5 2 4 NO
#6 2 2 NO
#7 3 1 NO
#8 3 2 NO
#9 3 5 NO
Or use any
df1 %>%
  group_by(Subj_ID) %>%
  mutate(Intervention = case_when(any(Location == 3) ~ "YES", TRUE ~ "NO"))
Or using base R
df1$Intervention <- with(df1, c("NO", "YES")[1 + (Subj_ID %in% Subj_ID[Location == 3])])
data
df1 <- data.frame(Subj_ID = rep(1:3, each = 3),
                  Location = c(1:3, 1, 4, 2, 1, 2, 5))
We can use match for each Subj_ID to check if 3 is present in any Location.
library(dplyr)
df %>%
  group_by(Subj_ID) %>%
  mutate(Intervention = c('Yes', 'No')[is.na(match(3, Location)) + 1])
# Can also use
# mutate(Intervention = c('No', 'Yes')[(match(3, Location, nomatch = 0L) > 0) + 1])
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 Yes
#2 1 2 Yes
#3 1 3 Yes
#4 2 1 No
#5 2 4 No
#6 2 2 No
#7 3 1 No
#8 3 2 No
#9 3 5 No
data
df <- structure(list(Subj_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
Location = c(1, 2, 3, 1, 4, 2, 1, 2, 5)), class = "data.frame",
row.names = c(NA, -9L))
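For completeness, a base R sketch in the same spirit, assuming the df defined above:
# flag every row whose Subj_ID group contains a 3 in Location, then index into the labels
df$Intervention <- c("No", "Yes")[ave(df$Location == 3, df$Subj_ID, FUN = any) + 1]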
I want to spread the name column.
d <- data.frame(ID = c(1, 1, 2, 2, 2, 3, 3),
                name = c("a", "b", "a", "c", "d", "c", "d"))
| ID | name |
|-----|------|
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | c |
| 2 | d |
| 3 | c |
| 3 | d |
Using tidyr::spread() I can get the following data.frame:
d %>% tidyr::spread(name, name)
| ID | a | b | c | d |
| -- | - | - | - | - |
| 1 | a | b | NA | NA |
| 2 | a | NA | c | d |
| 3 | NA | NA | c | d |
But I want to get a data.frame like this:
| ID | name1 | name2 | name3 |
|-----|-------|-------|-------|
| 1 | a | b | NA |
| 2 | a | c | d |
| 3 | c | d | NA |
We can create a new column and spread
library(tidyverse)
d %>%
  group_by(ID) %>%
  mutate(new = paste0("name", row_number())) %>%
  spread(new, name)
# ID name1 name2 name3
#* <dbl> <fctr> <fctr> <fctr>
#1 1 a b NA
#2 2 a c d
#3 3 c d NA
It is relatively concise with dcast
library(data.table)
dcast(setDT(d), ID ~ paste0("name", rowid(ID)), value.var = "name")
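In more recent versions of tidyr (1.0.0 and later), spread() has been superseded by pivot_wider(); a minimal sketch of the same reshape:
library(dplyr)
library(tidyr)

d %>%
  group_by(ID) %>%
  mutate(new = paste0("name", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = new, values_from = name)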
EDIT:
Upon further examination, this dataset is way more insane than I previously believed.
Values have been encapsulated in the column names!
My dataframe looks like this:
| ID | Year1_A | Year1_B | Year2_A | Year2_B |
|----|---------|---------|---------|---------|
| 1 | a | b | 2a | 2b |
| 2 | c | d | 2c | 2d |
I am searching for a way to reformat it as such:
| ID | Year | _A | _B |
|----|------|-----|-----|
| 1 | 1 | a | b |
| 1 | 2 | 2a | 2b |
| 2 | 1 | c | d |
| 2 | 2 | 2c | 2d |
The answer below is great and works perfectly, but the dataframe needs more work: it somehow needs to be spread back out, so that each ID and Year combination becomes one row with the _A and _B values as their own columns.
My best idea was to do merge(df, df, by="ID") and then filter out the unwanted rows, but this is quickly becoming unwieldy.
df <- data.frame(ID = 1:2, Year1_A = c('a', 'c'), Year1_B = c('b','d' ), Year2_A = c('2a', '2c'), Year2_B = c('2b', '2d'))
library(tidyr)
# your example data
df <- data.frame(ID = 1:2, Year1_A = c('a', 'c'), Year1_B = c('b','d' ), Year2_A = c('2a', '2c'), Year2_B = c('2b', '2d'))
# the solution
df <- gather(df, Year, value, -ID)
# cleaning up
df$Year <- gsub("Year", "", df$Year)
Result:
> df
ID Year value
1 1 1_A a
2 2 1_A c
3 1 1_B b
4 2 1_B d
5 1 2_A 2a
6 2 2_A 2c
7 1 2_B 2b
8 2 2_B 2d
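To finish the reshape into the four-column layout asked for above, one possible sketch is to gather, split the column names, and spread back out (the regex assumes the Year<digit>_<suffix> naming shown in the question; starting again from the original wide data):
library(dplyr)
library(tidyr)

# example data as in the question
df <- data.frame(ID = 1:2, Year1_A = c('a', 'c'), Year1_B = c('b', 'd'),
                 Year2_A = c('2a', '2c'), Year2_B = c('2b', '2d'))

df %>%
  gather(Year, value, -ID) %>%
  # split "Year1_A" into Year = "1" and key = "_A"
  extract(Year, into = c("Year", "key"), regex = "Year(\\d+)(_.*)") %>%
  spread(key, value)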