Expand a numerical variable based on sequence and value to multi columns - r

I have data like this:
structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L,
1L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-9L))
These numbers are the time points up to which each participant was in the study. I want to expand this column into one binary indicator column per time point (1 if the participant was still in the study at that time, NA otherwise):
structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L),
t1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), t2 = c(1L, 1L,
1L, NA, 1L, 1L, NA, 1L, 1L), t3 = c(1L, 1L, NA, NA, NA, 1L,
NA, 1L, NA), t4 = c(NA, 1L, NA, NA, NA, NA, NA, 1L, NA)), class = "data.frame", row.names = c(NA,
-9L))

Another option with base R (dtt is the input data frame):
m <- matrix(nrow = nrow(dtt), ncol = max(dtt$time))
m[col(m) <= dtt$time] <- 1L
cbind(dtt, m)
# time 1 2 3 4
# 1 3 1 1 1 NA
# 2 4 1 1 1 1
# 3 2 1 1 NA NA
# 4 1 1 NA NA NA
# 5 2 1 1 NA NA
# 6 3 1 1 1 NA
# 7 1 1 NA NA NA
# 8 4 1 1 1 1
# 9 2 1 1 NA NA
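
The assignment above relies on recycling: the time vector is shorter than the matrix, so R reuses it down each column, and every cell ends up compared with its own row's time. A minimal sketch on a hypothetical three-row vector (the name time_in_study and the "t" column prefix are illustrative assumptions, not part of the answer):

```r
# Hypothetical stand-in for dtt$time
time_in_study <- c(2L, 1L, 3L)

m <- matrix(NA_integer_, nrow = length(time_in_study), ncol = max(time_in_study))

# col(m) gives each cell's column index. time_in_study is recycled in
# column-major order, so cell [i, j] is compared with time_in_study[i].
m[col(m) <= time_in_study] <- 1L

# Optional: name the columns t1, t2, ... as in the desired output
colnames(m) <- paste0("t", seq_len(ncol(m)))
m
```

Naming the matrix columns before cbind() avoids the bare 1, 2, 3, 4 column names shown in the output above.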

Here's a pretty direct approach with base R (calling your input df):
max_length = max(df$time)
rows = lapply(df$time, function(t) c(rep(1, t), rep(NA, max_length - t)))
result = cbind(df, do.call(rbind, rows))
names(result)[-1] = paste0("t", names(result)[-1])
result
# time t1 t2 t3 t4
# 1 3 1 1 1 NA
# 2 4 1 1 1 1
# 3 2 1 1 NA NA
# 4 1 1 NA NA NA
# 5 2 1 1 NA NA
# 6 3 1 1 1 NA
# 7 1 1 NA NA NA
# 8 4 1 1 1 1
# 9 2 1 1 NA NA
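
A possible variant of the same idea, sketched here with outer() instead of lapply() (this is a swapped-in technique, not the answer's code): compare every time value with every column index in one call, then turn TRUE/FALSE into 1/NA. The names k, ind, and result are illustrative.

```r
df <- data.frame(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L))

k <- seq_len(max(df$time))

# outer() builds a logical matrix: row i, column j is TRUE when time[i] >= j
ind <- outer(df$time, k, FUN = ">=")

# ifelse() keeps the matrix dimensions of its test argument
ind <- ifelse(ind, 1L, NA_integer_)
colnames(ind) <- paste0("t", k)

result <- cbind(df, ind)
result
```

Each row of result then has exactly as many 1s as its time value.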

Here's a tidyverse approach:
library(dplyr)
library(tidyr)
df %>%
  mutate(row = row_number()) %>%
  uncount(time, .remove = FALSE) %>%
  group_by(row) %>%
  mutate(col = row_number()) %>%
  pivot_wider(names_from = col, values_from = col,
              values_fn = length, names_prefix = 't') %>%
  ungroup() %>%
  select(-row)
# time t1 t2 t3 t4
# <int> <int> <int> <int> <int>
#1 3 1 1 1 NA
#2 4 1 1 1 1
#3 2 1 1 NA NA
#4 1 1 NA NA NA
#5 2 1 1 NA NA
#6 3 1 1 1 NA
#7 1 1 NA NA NA
#8 4 1 1 1 1
#9 2 1 1 NA NA

Related

Replace NA values in dataframe with variables in the next column (R)

I am new to R and still learning, and I could not find the answer I am looking for in any other thread.
I have a dataset with (for simplicity) 5 columns. Columns 1, 2, and 4 always have values, but in some rows column 3 doesn't. Below is an example:
Current
A B C D E
1 1 2 3
1 2 NA 4 5
1 2 3 4
1 3 NA 9 7
1 2 NA 5 6
I want to make it so that the NA's are replaced by the value in column D, and then the value in col E is shifted to D, etc.
Desired output:
A B C D E
1 1 2 3 NA
1 2 4 5 NA
1 2 3 4 NA
1 3 9 7 NA
1 2 5 6 NA
I tried approaches from several Stack Overflow threads, but none achieved what I wanted; na.omit, for example, drops the whole row. Any help is greatly appreciated.
Data
data <- structure(list(A = c(1L, 1L, 1L, 1L, 1L), B = c(1L, 2L, 2L, 3L,
2L), C = c(2L, NA, 3L, NA, NA), D = c(3L, 4L, 4L, 9L, 5L), E = c(NA,
5L, NA, 7L, 6L)), class = "data.frame", row.names = c(NA, -5L
))
Code
library(dplyr)
data %>%
  mutate(
    aux = C,
    C = if_else(is.na(aux), D, C),
    D = if_else(is.na(aux), E, D),
    E = NA
  ) %>%
  select(-aux)
Output
A B C D E
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA
Replacement operation all in one go:
dat[is.na(dat$C), c("C","D","E")] <- c(dat[is.na(dat$C), c("D","E")], NA)
dat
# A B C D E
#1 1 1 2 3 NA
#2 1 2 4 5 NA
#3 1 2 3 4 NA
#4 1 3 9 7 NA
#5 1 2 5 6 NA
Where dat was:
dat <- read.table(text="A B C D E
1 1 2 3
1 2 NA 4 5
1 2 3 4
1 3 NA 9 7
1 2 NA 5 6", fill=TRUE, header=TRUE)
Using shift_row_values
library(hacksaw)
shift_row_values(df1)
A B C D E
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA
data
df1 <- structure(list(A = c(1L, 1L, 1L, 1L, 1L), B = c(1L, 2L, 2L, 3L,
2L), C = c(2L, NA, 3L, NA, NA), D = c(3L, 4L, 4L, 9L, 5L), E = c(NA,
5L, NA, 7L, 6L)), class = "data.frame", row.names = c(NA, -5L
))
A base R universal approach using order without prior knowledge of NA positions.
setNames(data.frame(t(apply(data, 1, function(x)
x[order(is.na(x))]))), colnames(data))
A B C D E
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA
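
The key step in the approach above is order(is.na(x)): is.na(x) is FALSE for values and TRUE for NAs, and order() leaves ties in their original order, so the values keep their left-to-right order while the NAs move to the end. A minimal sketch on one hypothetical row (the names x, ord, and shifted are illustrative):

```r
# One row of the example data, as a named vector
x <- c(A = 1L, B = 2L, C = NA, D = 4L, E = 5L)

# FALSE sorts before TRUE, and ties keep their original order
ord <- order(is.na(x))

# Reorder the values but keep the original column names in place
shifted <- setNames(x[ord], names(x))
shifted
```

Note that apply() converts the data frame to a matrix first, so this row-wise approach is best suited to data whose columns all share one type.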
Using dplyr
library(dplyr)
t(data) %>%
data.frame() %>%
mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
t() %>%
as_tibble()
# A tibble: 5 × 5
A B C D E
<int> <int> <int> <int> <int>
1 1 1 2 3 NA
2 1 2 4 5 NA
3 1 2 3 4 NA
4 1 3 9 7 NA
5 1 2 5 6 NA
Data
data <- structure(list(A = c(1L, 1L, 1L, 1L, 1L), B = c(1L, 2L, 2L, 3L,
2L), C = c(2L, NA, 3L, NA, NA), D = c(3L, 4L, 4L, 9L, 5L), E = c(NA,
5L, NA, 7L, 6L)), class = "data.frame", row.names = c(NA, -5L
))

Assigning 1 or 0 in new column based on similar ID BUT sum not to exceed value in another column in R

See the table below. I want to assign 1 or 0 to a new column, new_col, such that within each hhid the number of 1s does not exceed that group's value in the nets column. The table shows the desired result (new_col does not exist yet):
hhid nets new_col
1 1 3 1
1 1 3 1
1 1 3 1
1 1 3 0
1 2 2 1
1 2 2 1
1 2 2 0
1 3 2 1
1 3 2 1
1 3 2 0
1 3 2 0
I tried the code below:
df %>% group_by(hhid) %>% mutate(new_col = ifelse(summarise(across(new_col), sum)<= df$nets),1,0)
Try this:
Data:
df <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-11L))
hhid nets
1 1 3
2 1 3
3 1 3
4 1 3
5 2 2
6 2 2
7 2 2
8 3 2
9 3 2
10 3 2
11 3 2
Code:
library(dplyr)
df %>%
  group_by(hhid) %>%
  mutate(new_col = ifelse(row_number() <= nets, 1, 0))
Output:
# A tibble: 11 x 3
# Groups: hhid [3]
hhid nets new_col
<int> <int> <dbl>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 3 0
5 2 2 1
6 2 2 1
7 2 2 0
8 3 2 1
9 3 2 1
10 3 2 0
11 3 2 0
Same solution but using data.table instead of dplyr
dt <- structure(list(hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L), nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA,
-11L), class = c("data.frame"))
library(data.table)
setDT(dt)
dt[, new_col := +(seq_len(.N) <= nets), by = hhid]
dt
hhid nets new_col
1: 1 3 1
2: 1 3 1
3: 1 3 1
4: 1 3 0
5: 2 2 1
6: 2 2 1
7: 2 2 0
8: 3 2 1
9: 3 2 1
10: 3 2 0
11: 3 2 0
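
The same within-group counter can also be sketched in base R, assuming no extra packages are wanted: ave() with seq_along numbers the rows inside each hhid, and comparing that position with nets gives the 1/0 flag. The name pos is an illustrative assumption.

```r
df <- data.frame(
  hhid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  nets = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)
)

# Position of each row within its hhid group (1, 2, 3, ...)
pos <- ave(seq_along(df$hhid), df$hhid, FUN = seq_along)

# 1 for the first `nets` rows of each household, 0 afterwards
df$new_col <- as.integer(pos <= df$nets)
df
```

Like the answers above, this fills the 1s from the top of each group, so the per-household sum is min(group size, nets).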

Replace values in multiple columns with NA based on value in a different column

I have a tibble...
# A tibble: 20 x 6
id X_1 Y_1 number X_2 Y_2
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 1 1 3
2 1 1 3 0 1 3
3 2 2 4 1 2 4
4 2 2 4 0 2 4
5 3 1 3 1 1 3
6 3 1 3 0 1 3
I want to make all values equal NA if the value in the number column equals 1, but only in columns ending "_1" (so X_1 and Y_1).
I would also like to do the opposite in _2 columns (i.e. rows where number equals zero become NA).
It should end up looking like this...
# A tibble: 20 x 6
id X_1 Y_1 number X_2 Y_2
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA NA 1 1 3
2 1 1 3 0 1 3
3 2 NA NA 1 2 4
4 2 2 4 0 2 4
5 3 NA NA 1 1 3
6 3 1 3 0 1 3
I tried the following...
df %>% mutate_at(vars(contains("_1")), .funs = list(~if_else(number == 1, NA_real_, .)))
But that didn't work.
I work mostly using tidyverse, so tidyverse solution would be preferable.
Here is a solution that actually tests whether the variable number is 0 or 1 (previous solutions tested whether the variables ending in "_1" or "_2" were 1 or 0).
library(dplyr)
df %>%
  mutate(across(ends_with("_1"), ~ na_if(number, 1)),
         across(ends_with("_2"), ~ na_if(number, 0)))
# A tibble: 6 x 6
id X_1 Y_1 number X_2 Y_2
<int> <int> <int> <int> <int> <int>
1 1 NA NA 1 1 1
2 1 0 0 0 NA NA
3 2 NA NA 1 1 1
4 2 0 0 0 NA NA
5 3 NA NA 1 1 1
6 3 0 0 0 NA NA
Edit (keep original values)
df %>%
mutate(across((ends_with("_1")), ~if_else(number == 1, NA_integer_, .))) %>%
mutate(across((ends_with("_2")), ~if_else(number == 0, NA_integer_, .)))
# A tibble: 6 x 6
id X_1 Y_1 number X_2 Y_2
<int> <int> <int> <int> <int> <int>
1 1 NA NA 1 1 3
2 1 1 3 0 NA NA
3 2 NA NA 1 2 4
4 2 2 4 0 NA NA
5 3 NA NA 1 1 3
6 3 1 3 0 NA NA
Data
df <- tibble::tribble(
~id, ~X_1, ~Y_1, ~number, ~X_2, ~Y_2,
1L, 1L, 3L, 1L, 1L, 3L,
1L, 1L, 3L, 0L, 1L, 3L,
2L, 2L, 4L, 1L, 2L, 4L,
2L, 2L, 4L, 0L, 2L, 4L,
3L, 1L, 3L, 1L, 1L, 3L,
3L, 1L, 3L, 0L, 1L, 3L
)
If your data is large, you may gain speed with the data.table package, like this:
library(data.table)
# first make your data a data.table, e.g. with setDT(mydata); it is called dt below
cols <- grep("_1$", names(dt), value = TRUE)
# test the number column, and set one column per iteration
for (col in cols) set(dt, i = which(dt$number == 1), j = col, value = NA)
cols <- grep("_2$", names(dt), value = TRUE)
for (col in cols) set(dt, i = which(dt$number == 0), j = col, value = NA)
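
For comparison, a base R sketch of the same masking, assuming a plain data frame with the question's column names: the row condition is tested on number, and the column sets are picked by suffix. The names cols_1 and cols_2 are illustrative.

```r
df <- data.frame(
  id     = c(1L, 1L, 2L, 2L, 3L, 3L),
  X_1    = c(1L, 1L, 2L, 2L, 1L, 1L),
  Y_1    = c(3L, 3L, 4L, 4L, 3L, 3L),
  number = c(1L, 0L, 1L, 0L, 1L, 0L),
  X_2    = c(1L, 1L, 2L, 2L, 1L, 1L),
  Y_2    = c(3L, 3L, 4L, 4L, 3L, 3L)
)

# Columns selected by their suffix
cols_1 <- grep("_1$", names(df), value = TRUE)
cols_2 <- grep("_2$", names(df), value = TRUE)

# which() drops any NA in `number`, which logical row indexing would reject
df[which(df$number == 1), cols_1] <- NA
df[which(df$number == 0), cols_2] <- NA
df
```

The original values are kept wherever the condition does not apply, and the integer column types are preserved.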

I want to make a new column where its first and last row with respect of 3 groups is 2 and NA otherwise

I have 3 groups
group1 group2 group3 time
1 1 1 3:0
1 1 1 4:0
1 1 1 9:0
1 2 1 6:0
1 2 2 5:0
1 2 2 2:0
1 2 2 1:0
2 1 1 3:0
2 3 2 1:0
new column
group1 group2 group3 time new
1 1 1 3:0 2
1 1 1 4:0 NA
1 1 1 9:0 2
1 2 1 6:0 2
1 2 2 5:0 2
1 2 2 2:0 NA
1 2 2 1:0 2
2 1 1 3:0 2
2 3 2 1:0 2
The first and last rows of each group defined by group_by(group1, group2, group3) should be 2, and all other rows NA. I know I can get this with slice and mutate, but I couldn't find the right syntax.
d %>%
group_by_at(vars(-time)) %>%
mutate(new = replace(NA, range(row_number()), 2))
## A tibble: 9 x 5
## Groups: group1, group2, group3 [5]
# group1 group2 group3 time new
# <int> <int> <int> <chr> <dbl>
#1 1 1 1 3:0 2
#2 1 1 1 4:0 NA
#3 1 1 1 9:0 2
#4 1 2 1 6:0 2
#5 1 2 2 5:0 2
#6 1 2 2 2:0 NA
#7 1 2 2 1:0 2
#8 2 1 1 3:0 2
#9 2 3 2 1:0 2
Check for row_number in ifelse
library(dplyr)
df %>%
group_by(group1, group2, group3) %>%
mutate(new = ifelse(row_number() %in% c(1L, n()), 2, NA))
# or, as suggested by @d.b:
#mutate(new = ifelse(row_number() %in% range(row_number()), 2, NA))
# group1 group2 group3 time new
# <int> <int> <int> <fct> <dbl>
#1 1 1 1 3:0 2
#2 1 1 1 4:0 NA
#3 1 1 1 9:0 2
#4 1 2 1 6:0 2
#5 1 2 2 5:0 2
#6 1 2 2 2:0 NA
#7 1 2 2 1:0 2
#8 2 1 1 3:0 2
#9 2 3 2 1:0 2
We could implement the same logic in base R or data.table
df$new <- with(df, ave(group1, group1, group2, group3, FUN = function(x)
ifelse(seq_along(x) %in% c(1L, length(x)), 2, NA)))
library(data.table)
setDT(df)[, new := ifelse(seq_along(time) %in% c(1L, .N), 2, NA),
.(group1, group2, group3)]
data
df <- structure(list(group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
group2 = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 3L), group3 = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), time = structure(c(3L, 4L,
7L, 6L, 5L, 2L, 1L, 3L, 1L), .Label = c("1:0", "2:0", "3:0",
"4:0", "5:0", "6:0", "9:0"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))
Here is one option with data.table with .I and it should be more efficient
library(data.table)
nm1 <- grep("^group\\d+$", names(df1), value = TRUE)
i1 <- setDT(df1)[, .I[c(1, .N)], by = nm1]$V1
df1[i1, new := 2][]
# group1 group2 group3 time new
#1: 1 1 1 3:0 2
#2: 1 1 1 4:0 NA
#3: 1 1 1 9:0 2
#4: 1 2 1 6:0 2
#5: 1 2 2 5:0 2
#6: 1 2 2 2:0 NA
#7: 1 2 2 1:0 2
#8: 2 1 1 3:0 2
#9: 2 3 2 1:0 2
Or using dplyr
library(dplyr)
df1 %>%
group_by_at(vars(starts_with('group'))) %>%
mutate(new = 2 * NA^ !row_number() %in% c(1, n()))
# A tibble: 9 x 5
# Groups: group1, group2, group3 [5]
# group1 group2 group3 time new
# <int> <int> <int> <fct> <dbl>
#1 1 1 1 3:0 2
#2 1 1 1 4:0 NA
#3 1 1 1 9:0 2
#4 1 2 1 6:0 2
#5 1 2 2 5:0 2
#6 1 2 2 2:0 NA
#7 1 2 2 1:0 2
#8 2 1 1 3:0 2
#9 2 3 2 1:0 2
data
df1 <- structure(list(group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
group2 = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 3L), group3 = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), time = structure(c(3L, 4L,
7L, 6L, 5L, 2L, 1L, 3L, 1L), .Label = c("1:0", "2:0", "3:0",
"4:0", "5:0", "6:0", "9:0"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))
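
Another way to find the first and last row of each group, sketched here in base R with duplicated() (a swapped-in technique, not taken from the answers above): a row is the first of its group if its key has not been seen before, and the last if it has not been seen when scanning from the bottom. The names key, first, and last are illustrative.

```r
df <- data.frame(
  group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  group2 = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 3L),
  group3 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L)
)

key <- df[c("group1", "group2", "group3")]

# First occurrence of each key combination, scanning top-down
first <- !duplicated(key)
# First occurrence scanning bottom-up, i.e. the last row of each group
last <- !duplicated(key, fromLast = TRUE)

df$new <- ifelse(first | last, 2, NA)
df
```

Groups with a single row are both first and last, so they get 2 as well.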

selecting the last row of a group [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 3 years ago.
I have a data frame.
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 4 2
1 1 5 2
1 2 1 1
1 2 2 1
1 2 3 2
2 1 1 1
2 1 2 1
2 1 3 2
2 1 4 2
For each person in each household, I want to renumber the trip column as follows:
whenever loop changes, the trip index should start again from 1.
output
household person trip loop
1 1 1 1
1 1 2 1
1 1 3 1
1 1 1 2
1 1 2 2
1 2 1 1
1 2 2 1
1 2 1 2
2 1 1 1
2 1 2 1
2 1 1 2
2 1 2 2
We can use
library(dplyr)
df1 %>%
group_by(household, person, loop) %>%
mutate(trip = row_number())
# A tibble: 12 x 4
# Groups: household, person, loop [6]
# household person trip loop
# <int> <int> <int> <int>
# 1 1 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 1 1 1 2
# 5 1 1 2 2
# 6 1 2 1 1
# 7 1 2 2 1
# 8 1 2 1 2
# 9 2 1 1 1
#10 2 1 2 1
#11 2 1 1 2
#12 2 1 2 2
data
df1 <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), person = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L), trip = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L,
3L, 4L), loop = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
2L)), class = "data.frame", row.names = c(NA, -12L))
Using data.table :
library(data.table)
setDT(df) # convert your data to a data.table by reference (no reassignment needed)
df[, trip := seq_len(.N), by = .(household, person, loop)]
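
A base R sketch of the same renumbering, assuming the data frame is called df1 as in the data block above: ave() applies seq_along within each household/person/loop combination.

```r
df1 <- data.frame(
  household = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  loop      = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L)
)

# Restart the counter for every household/person/loop combination
df1$trip <- ave(seq_len(nrow(df1)), df1$household, df1$person, df1$loop,
                FUN = seq_along)
df1
```

This numbers rows in their existing order, so it assumes the trips are already sorted within each loop.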
