I need to add 2 rows to a dataframe that have the same values as existing rows. For example, below I would need to add "a" = 3 with the same "b" values as "a" = 2, going from this:
| a | b |
| --| ------|
| 1 | higha |
| 1 | lowa |
| 2 | highb |
| 2 | lowb |
to this:
| a | b |
| --| ------|
| 1 | higha |
| 1 | lowa |
| 2 | highb |
| 2 | lowb |
| 3 | highb |
| 3 | lowb |
A one-liner in base R would be:
`rownames<-`(rbind(df, within(df[df$a == 2,], a <- 3)), NULL)
#> a b
#> 1 1 higha
#> 2 1 lowa
#> 3 2 highb
#> 4 2 lowb
#> 5 3 highb
#> 6 3 lowb
We may use
library(dplyr)
library(tidyr)
df %>%
uncount((a == 2)+1) %>%
mutate(a = replace(a, duplicated(b) & a == 2, 3)) %>%
arrange(a)
Output:
# A tibble: 6 × 2
a b
<dbl> <chr>
1 1 higha
2 1 lowa
3 2 highb
4 2 lowb
5 3 highb
6 3 lowb
Or with base R
i1 <- df$a == 2
df[nrow(df) + seq_len(sum(i1)),] <- data.frame(a = 3, b = df$b[i1])
data
df <- data.frame(a = rep(1:2, each = 2),
b = c("higha", "lowa", "highb", "lowb"))
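The answers above share one idea: take the rows where a is 2, change a to 3 in that copy, and append the copy to the original. A minimal step-by-step sketch in base R, using the same df as the data block (the names `copy` and `out` are only illustrative):

```r
df <- data.frame(a = rep(1:2, each = 2),
                 b = c("higha", "lowa", "highb", "lowb"),
                 stringsAsFactors = FALSE)

copy <- df[df$a == 2, ]   # rows to duplicate
copy$a <- 3               # give the copies the new key
out <- rbind(df, copy)    # append below the original rows
rownames(out) <- NULL     # renumber rows 1..n
out
```

Splitting the one-liner this way makes it easy to swap in a different source value (2) or target value (3).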
I have a df like this:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
3 | B |
3 | C |
4 | D |
4 | C |
In R, how do I filter for VisitIDs as long as they contain Item A & B?
Expected Outcome:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
I tried df %>% group_by(VisitID) %>% filter(any(Item == 'A' & Item == 'B')) but it doesn't work.
df <- read_delim("ID | Item
1 | A
1 | B
2 | A
3 | B
1 | C
4 | C
5 | B
3 | A
4 | A
5 | D", delim = "|", trim_ws = TRUE)
Since you want both "A" and "B" to be present in each group, you can use all:
library(dplyr)
df %>% group_by(VisitID) %>% filter(all(c("A", "B") %in% Item))
# VisitID Item
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
Or, if you want to use any, apply it to each condition separately:
df %>% group_by(VisitID) %>% filter(any(Item == 'A') && any(Item == 'B'))
An option with data.table:
library(data.table)
setDT(df)[, .SD[all(c("A", "B") %in% Item)], VisitID]
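The attempt in the question fails because `Item == 'A' & Item == 'B'` is evaluated row by row, and a single row can never be both "A" and "B". A small base R sketch of the difference, using the VisitID table from the question (the names `has_both`, `ids` and `res` are illustrative, and the tapply approach is just one possible way to do the group check without dplyr):

```r
df <- data.frame(
  VisitID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  Item    = c("A", "B", "C", "D", "A", "D", "B", "B", "C", "D", "C"),
  stringsAsFactors = FALSE
)

item <- df$Item[df$VisitID == 2]       # "A" "D" "B"
any(item == "A" & item == "B")         # FALSE: no single row is both
all(c("A", "B") %in% item)             # TRUE: both values occur in the group

# One logical per VisitID, then keep the rows of the matching IDs
has_both <- tapply(df$Item, df$VisitID, function(it) all(c("A", "B") %in% it))
ids <- names(has_both)[has_both]
res <- df[as.character(df$VisitID) %in% ids, ]
res
```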
This is what my data looks like:
+---------+--+----------+--+
| Subj_ID | | Location | |
+---------+--+----------+--+
| 1 | | 1 | |
| 1 | | 2 | |
| 1 | | 3 | |
| 2 | | 1 | |
| 2 | | 4 | |
| 2 | | 2 | |
| 3 | | 1 | |
| 3 | | 2 | |
| 3 | | 5 | |
+---------+--+----------+--+
In this dataset, only subject 1 has a location value of 3, so I want to label subject 1 as YES for intervention. Since subjects 2 and 3 don't have a location value of 3, they need to be labeled NO.
This is what I want the data to look like.
| Subj_ID | | Location | Intervention |
+---------+--+----------+--------------+
| 1 | | 1 | YES |
| 1 | | 2 | YES |
| 1 | | 3 | YES |
| 2 | | 1 | NO |
| 2 | | 4 | NO |
| 2 | | 2 | NO |
| 3 | | 1 | NO |
| 3 | | 2 | NO |
| 3 | | 5 | NO |
+---------+--+----------+--------------+
Thanks in advance for the help! Dplyr preferred if possible.
An option with dplyr: after grouping by 'Subj_ID', check whether 3 is %in% Location, which returns a single TRUE/FALSE per group. Adding 1 turns that into a numeric index (1 or 2) that picks "NO" or "YES" from the vector.
library(dplyr)
df1 %>%
group_by(Subj_ID) %>%
mutate(Intervention = c("NO", "YES")[(3 %in% Location)+1])
# A tibble: 9 x 3
# Groups: Subj_ID [3]
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 YES
#2 1 2 YES
#3 1 3 YES
#4 2 1 NO
#5 2 4 NO
#6 2 2 NO
#7 3 1 NO
#8 3 2 NO
#9 3 5 NO
Or use any
df1 %>%
group_by(Subj_ID) %>%
mutate(Intervention = case_when(any(Location == 3) ~ "YES", TRUE ~ "NO"))
Or using base R
df1$Intervention <- with(df1, c("NO", "YES")[1 + (Subj_ID %in%
Subj_ID[Location == 3])])
data
df1 <- data.frame(Subj_ID = rep(1:3, each = 3),
Location = c(1:3, 1, 4, 2, 1, 2, 5))
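The `c("NO", "YES")[flag + 1]` idiom relies on FALSE and TRUE being coerced to 0 and 1, so adding 1 gives a valid index into a length-2 vector. A short sketch on the same df1, comparing it with the more explicit ifelse (the names `flag`, `by_index` and `by_ifelse` are illustrative):

```r
df1 <- data.frame(Subj_ID = rep(1:3, each = 3),
                  Location = c(1:3, 1, 4, 2, 1, 2, 5))

# TRUE for every row of a subject that has Location == 3 somewhere
flag <- df1$Subj_ID %in% df1$Subj_ID[df1$Location == 3]

by_index  <- c("NO", "YES")[flag + 1]   # FALSE -> 1 -> "NO", TRUE -> 2 -> "YES"
by_ifelse <- ifelse(flag, "YES", "NO")  # same labels, spelled out

df1$Intervention <- by_index
df1
```

Both forms give the same labels here; ifelse is arguably easier to read, while the index form is a compact habit some answers prefer.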
We can use match for each Subj_ID to check if 3 is present in any Location.
library(dplyr)
df %>%
group_by(Subj_ID) %>%
mutate(Intervention = c('Yes', 'No')[is.na(match(3,Location)) + 1])
#Can also use
#mutate(Intervention = c('No', 'Yes')[(match(3,Location, nomatch = 0L) > 0) + 1])
# Subj_ID Location Intervention
# <int> <dbl> <chr>
#1 1 1 Yes
#2 1 2 Yes
#3 1 3 Yes
#4 2 1 No
#5 2 4 No
#6 2 2 No
#7 3 1 No
#8 3 2 No
#9 3 5 No
data
df <- structure(list(Subj_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
Location = c(1, 2, 3, 1, 4, 2, 1, 2, 5)), class = "data.frame",
row.names = c(NA, -9L))
At the moment I'm trying to figure out how to keep the names of the inner and outer lists nested within a tibble while unnesting.
The .id parameter of the unnest function is the closest I found, but it numbers the values instead of using the given names.
Here is a MWE with my idea of the final tibble:
library(dplyr)
library(tidyr)
df.1 <- tibble(
x = list("Foo","Bar"),
y = list(
list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)),
list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17))
)
)
df.2 <- unnest(df.1, .id = "name.outher")
df.3 <- unnest(df.2, .id = "name.inner")
# How do I get from this:
#
#-----------------------------------------------------------------------
# x | y |
#-----+----------------------------------------------------------------+
# Foo | list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)) |
#-----+----------------------------------------------------------------+
# Bar | list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17)) |
#-----------------------------------------------------------------------
#
# to this:
#
#---------------------------------------
# x | name.outher | y | name.inner |
#-----+-------------+-----+------------+
# Foo | a | 1 | aa |
#-----+-------------+-----+------------+
# Foo | a | 2 | ab |
#-----+-------------+-----+------------+
# Foo | b | 6 | ba |
#-----+-------------+-----+------------+
# Foo | b | 22 | bb |
#-----+-------------+-----+------------+
# Bar | c | 561 | ca |
#-----+-------------+-----+------------+
# Bar | c | 35 | cb |
#-----+-------------+-----+------------+
# Bar | d | 346 | da |
#-----+-------------+-----+------------+
# Bar | d | 17 | db |
#-------------------------------------
#
# instead of this:
#
#---------------------------------------
# x | name.outher | y | name.inner |
#-----+-------------+-----+------------+
# Foo | 1 | 1 | 1 |
#-----+-------------+-----+------------+
# Foo | 1 | 2 | 1 |
#-----+-------------+-----+------------+
# Foo | 1 | 6 | 2 |
#-----+-------------+-----+------------+
# Foo | 1 | 22 | 2 |
#-----+-------------+-----+------------+
# Bar | 2 | 561 | 3 |
#-----+-------------+-----+------------+
# Bar | 2 | 35 | 3 |
#-----+-------------+-----+------------+
# Bar | 2 | 346 | 4 |
#-----+-------------+-----+------------+
# Bar | 2 | 17 | 4 |
#---------------------------------------
Do you have any idea how I can preserve the names while unnesting this data structure?
We can melt
library(reshape2)
library(dplyr)
df.1 %>%
.$y %>%
melt %>%
select(x = L1, name.outher = L2, y = value, name.inner = L3)
# x name.outher y name.inner
#1 1 a 1 aa
#2 1 a 2 ab
#3 1 b 6 ba
#4 1 b 22 bb
#5 2 c 561 ca
#6 2 c 35 cb
#7 2 d 346 da
#8 2 d 17 db
Or use map and as_tibble
library(tidyverse)
df.1 %>%
pull(y) %>%
map_df(~ as_tibble(.x) %>%
map_df(~as_tibble(.x) %>%
gather(name.inner, y), .id = 'name.outer'),
.id = 'x')
# A tibble: 8 x 4
# x name.outer name.inner y
# <chr> <chr> <chr> <dbl>
#1 1 a aa 1
#2 1 a ab 2
#3 1 b ba 6
#4 1 b bb 22
#5 2 c ca 561
#6 2 c cb 35
#7 2 d da 346
#8 2 d db 17
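Both outputs above show 1 and 2 in the x column rather than "Foo" and "Bar". A base R sketch that walks the two list levels explicitly and carries the x label along is shown here. It assumes x and y are the plain lists from df.1 (x flattened to a character vector), and the helper names `rows`, `inner` and `res` are illustrative, not part of any package:

```r
x <- c("Foo", "Bar")
y <- list(
  list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)),
  list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17))
)

rows <- Map(function(xi, yi) {
  # one small data frame per outer name (a, b, ...)
  inner <- Map(function(outer_name, lst) {
    data.frame(x = xi,
               name.outer = outer_name,
               y = unlist(lst, use.names = FALSE),
               name.inner = names(lst),
               stringsAsFactors = FALSE)
  }, names(yi), yi)
  do.call(rbind, unname(inner))
}, x, y)

res <- do.call(rbind, unname(rows))
rownames(res) <- NULL
res
```

With a tibble, the same inputs could be taken from `unlist(df.1$x)` and `df.1$y`.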
Here is a reproducible example.
id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
I want something like this as end result.
+====+========+========+
| id | group1 | group2 |
+====+========+========+
| 1 | a | b |
+----+--------+--------+
| 1 | b | c |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
| 2 | a | b |
+----+--------+--------+
| 2 | b | - |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
Note that the order of the IDs matters. I have another column with a timestamp.
One solution with dplyr and rleid from data.table:
library(dplyr)
df %>%
mutate(id2 = data.table::rleid(id)) %>%
group_by(id2) %>%
mutate(group2 = lead(group))
# A tibble: 8 x 4
# Groups: id2 [3]
id group id2 group2
<dbl> <fct> <int> <fct>
1 1.00 a 1 b
2 1.00 b 1 c
3 1.00 c 1 d
4 1.00 d 1 NA
5 2.00 a 2 b
6 2.00 b 2 NA
7 1.00 c 3 d
8 1.00 d 3 NA
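The key step is numbering consecutive runs of equal id, so that the second block of id 1 becomes its own group. A sketch of the same idea without data.table, assuming a run counter built with cumsum and diff is an acceptable stand-in for rleid (the names `run` and `group2` follow the answer above; the rest is illustrative):

```r
id <- c(1, 1, 1, 1, 2, 2, 1, 1)
group <- c("a", "b", "c", "d", "a", "b", "c", "d")
df <- data.frame(id, group, stringsAsFactors = FALSE)

# New run starts at row 1 and wherever id differs from the previous row
run <- cumsum(c(TRUE, diff(df$id) != 0))

# Within each run, shift group up by one; the last row of a run gets NA
df$group2 <- ave(df$group, run, FUN = function(g) c(g[-1], NA))
df
```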
If I understood your question correctly, you can use the following function:
id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
add_group2 <- function(df) {
  n <- nrow(df)
  # shift group up by one row; the last row has no successor
  group2 <- c(as.character(df$group[-1]), "-")
  # mark rows whose next row belongs to a different id
  group2[which(c(df$id[-n] - df$id[-1], 0) != 0)] <- "-"
  data.frame(df, group2)
}
add_group2(df)
Result should be:
id group group2
1 1 a b
2 1 b c
3 1 c d
4 1 d -
5 2 a b
6 2 b -
7 1 c d
8 1 d -
There are so many posts on how to get the group-wise min or max with SQL. But how do you do it in R?
Let's say, you have got the following data frame
ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5
For every ID, I don't want the min t, but the value at the min t.
ID | value
a | 3
b | 2
df is your data.frame -
library(data.table)
setDT(df) # convert to data.table in place
df[, value[which.min(t)], by = ID]
Output -
> df[, value[which.min(t)], by = ID]
ID V1
1: a 3
2: b 2
You are looking for tapply:
df <- read.table(textConnection("
ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5"), header=TRUE, sep="|", strip.white=TRUE)
m <- tapply(1:nrow(df), df$ID, function(i) {
df$value[i[which.min(df$t[i])]]
})
# a b
# 3 2
Two more solutions (with sgibb's df):
sapply(split(df, df$ID), function(x) x$value[which.min(x$t)])
#a b
#3 2
library(plyr)
ddply(df, .(ID), function(x) x$value[which.min(x$t)])
# ID V1
#1 a 3
#2 b 2
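All of the answers above pick, per ID, the value sitting at the position of the smallest t. Another way to sketch this in base R is to sort by ID and t and keep the first row of each ID. The data frame below retypes the question's table by hand, and the names `ord` and `first` are illustrative:

```r
df <- data.frame(
  ID    = c("a", "a", "a", "a", "a", "b", "b", "b"),
  t     = c(1, 2, 3, 4, 5, 2, 3, 4),
  value = c(3, 5, 2, 1, 5, 2, 1, 5),
  stringsAsFactors = FALSE
)

ord <- df[order(df$ID, df$t), ]                      # smallest t first within each ID
first <- ord[!duplicated(ord$ID), c("ID", "value")]  # keep the first row per ID
rownames(first) <- NULL
first
```

Like which.min, this keeps only one row per ID if the minimum t is tied.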