Transpose datatable from columns to rows based on a condition in R

I have a dataset which includes data regarding the activities of different people in rows of 15 minutes. Something like this:
Id   Ins  Out
1    1    1
1    0    1
1    1    1
.    0    0
.    1    0
.    0    1
2    1    1
2    1    0
.    0    1
etc  1    1
Here each row corresponds to a 15-minute time slot, and the time slots of different people are stacked beneath each other. I want the data in the columns "Ins" and "Out" to sit in the same row as their respective id, so that for id = 1 one row holds all of that person's "Ins" values and another row holds all of the "Out" values. I tried to use transpose, but that transposes the whole table, making it very short and extremely wide.
The desired output is
id   type  var1  var2  var3  var4
1    ins   1     0     1     etc
1    out   1     1     1     etc
2    ins   1     1     0     etc
2    out   1     0     1     etc
etc.

You could:
1. group by Id and use row_number to get the needed column indices
2. pivot longer to put Ins and Out together
3. pivot wider to get the expected output
data <- read.table(text = '
Id Ins Out
1 1 1
1 0 1
1 1 1
1 0 0
1 1 0
1 0 1
2 1 1
2 1 0
2 0 1', header = TRUE)
library(tidyr)
library(dplyr)
data %>%
  group_by(Id) %>%
  mutate(Var = row_number()) %>%
  pivot_longer(cols = c("Ins", "Out"), names_to = 'type') %>%
  pivot_wider(names_from = Var, names_prefix = 'Var', values_from = value)
#> # A tibble: 4 x 8
#> # Groups:   Id [2]
#>      Id type   Var1  Var2  Var3  Var4  Var5  Var6
#>   <int> <chr> <int> <int> <int> <int> <int> <int>
#> 1     1 Ins       1     0     1     0     1     0
#> 2     1 Out       1     1     1     0     0     1
#> 3     2 Ins       1     1     0    NA    NA    NA
#> 4     2 Out       1     0     1    NA    NA    NA
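To see what each of the three steps contributes, the pipeline can be split at the intermediate long format. This is only a sketch on a smaller, made-up data frame (the names `long` and `wide` are illustrative, not part of the answer above):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Id 1 has three time slots, Id 2 has two
data <- data.frame(
  Id  = c(1, 1, 1, 2, 2),
  Ins = c(1, 0, 1, 1, 1),
  Out = c(1, 1, 1, 1, 0)
)

# Steps 1 and 2: number the slots within each Id, then stack Ins and Out
long <- data %>%
  group_by(Id) %>%
  mutate(Var = row_number()) %>%
  ungroup() %>%
  pivot_longer(cols = c(Ins, Out), names_to = "type")

# Step 3: spread the slot numbers into columns Var1, Var2, ...
wide <- long %>%
  pivot_wider(names_from = Var, names_prefix = "Var", values_from = value)

long  # one row per Id, slot and type (10 rows)
wide  # one row per Id and type (4 rows)
```

Ids with fewer time slots than the longest one get NA in the trailing columns, as in the output above.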

Related

extract duplicate row based on condition across column in R

I'm stuck trying to keep rows based on a condition in R. I want to apply the same condition across a large number of columns. In the example below, I want to collapse the rows that share an ID into a single row that has a '0' in every column where any of those rows has a '0'.
here is the data frame:
   ID A B C
1 001 1 1 1
2 002 0 1 0
3 002 1 0 0
4 003 0 1 1
5 003 1 0 1
6 003 0 0 1
I want to get this:
   ID A B C
1 001 1 1 1
2 002 0 0 0
3 003 0 0 1
Any help would be much appreciated, thanks!

You can try the following. Data:
# A tibble: 6 × 4
     ID     A     B     C
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1
2     2     0     1     0
3     2     1     0     0
4     3     0     1     1
5     3     1     0     1
6     3     0     0     1
Code:
library(dplyr)
library(tidyr)
data2 <- data %>%
  group_by(ID) %>%
  # Mark zeros in helper columns xA, xB, xC; everything else becomes NA
  mutate(across(c('A','B','C'), ~ ifelse(.x==0, 0, NA), .names = 'x{col}')) %>%
  # Carry each zero down within the ID, then turn the remaining NAs into 1
  fill(xA, xB, xC) %>%
  mutate(across(c('xA','xB','xC'), ~ ifelse(is.na(.x), 1, .x))) %>%
  slice_tail(n=1)
Output (the collapsed values are in xA, xB and xC; A, B and C still hold the last row of each ID):
# A tibble: 3 × 7
# Groups:   ID [3]
     ID     A     B     C    xA    xB    xC
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     1
2     2     1     0     0     0     0     0
3     3     0     0     1     0     0     1
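If the columns only ever contain 0 and 1, the same result could probably be reached more directly by taking the minimum of each column per ID. This is a different technique from the fill-based code above, shown here as a sketch under that 0/1 assumption:

```r
library(dplyr)

data <- data.frame(
  ID = c(1, 2, 2, 3, 3, 3),
  A  = c(1, 0, 1, 0, 1, 0),
  B  = c(1, 1, 0, 1, 0, 0),
  C  = c(1, 0, 0, 1, 1, 1)
)

# With 0/1 values, the per-ID minimum is 0 as soon as any row has a 0
result <- data %>%
  group_by(ID) %>%
  summarise(across(c(A, B, C), min), .groups = "drop")

result
```

For many columns, the selection inside across() can be replaced by a range such as A:C or by everything other than ID.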

Retain records where condition is met across two rows and two columns

I have a dataset similar to this:
df <-
read.table(textConnection("ID Column1 Column2
A 0 1
A 1 0
A 1 0
A 1 0
A 0 1
A 1 0
A 0 1
A 0 0
A 1 0
A 1 0
B 0 1
B 1 0
C 0 1
C 0 0
C 1 0"), header=TRUE)
I am looking to group by ID in dplyr and keep only the records where Column2 = '1' and the record directly underneath has Column1 = '1' (both records of such a pair should be kept). This may happen more than once per ID; all other records should be excluded. So the output from the above should be:
ID  Column1  Column2
A   0        1
A   1        0
A   0        1
A   1        0
B   0        1
B   1        0
Any help will be very much appreciated, thanks!

You could use lag and lead:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(lead(Column1) == 1 & Column2 == 1 |
           Column1 == 1 & lag(Column2) == 1) %>%
  ungroup()
# # A tibble: 6 × 3
#   ID    Column1 Column2
#   <chr>   <int>   <int>
# 1 A           0       1
# 2 A           1       0
# 3 A           0       1
# 4 A           1       0
# 5 B           0       1
# 6 B           1       0
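The group_by(ID) matters here: within a group, lead() returns NA on the last row and lag() returns NA on the first row, and filter() drops rows whose condition evaluates to NA, so a pair can never span two IDs. A small sketch on made-up data (the `toy` data frame and the `keep_pairs` helper are hypothetical names used only for this illustration):

```r
library(dplyr)

# Hypothetical toy data: the last row of A has Column2 == 1 and the
# first row of B has Column1 == 1, so they form a "pair" only across IDs
toy <- data.frame(
  ID      = c("A", "A", "B"),
  Column1 = c(1, 0, 1),
  Column2 = c(0, 1, 0)
)

# Same filter condition as in the answer above
keep_pairs <- function(d) {
  filter(d,
         lead(Column1) == 1 & Column2 == 1 |
           Column1 == 1 & lag(Column2) == 1)
}

grouped   <- toy %>% group_by(ID) %>% keep_pairs() %>% ungroup()
ungrouped <- toy %>% keep_pairs()

nrow(grouped)    # 0: no pair inside a single ID
nrow(ungrouped)  # 2: the pair spanning A and B is wrongly kept
```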

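Here is an alternative approach. Note that it pairs rows purely by position (rows 1-2, 3-4, and so on over the whole data frame) and only checks that each pair contains a 1 in both columns, so it relies on every matching pair starting on an odd row, as happens to be the case in the sample data:
library(dplyr)
df %>%
  group_by(ID, x = rep(row_number(), each=2, length.out = n())) %>%
  filter(sum(Column1)>=1 & sum(Column2)>=1) %>%
  ungroup() %>%
  select(-x)
  ID    Column1 Column2
  <chr>   <int>   <int>
1 A           0       1
2 A           1       0
3 A           0       1
4 A           1       0
5 B           0       1
6 B           1       0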

Is there an R function for preparing datasets for survival analysis like stset in Stata?

My dataset looks like this:
id start end failure x1
1 0 1 0 0
1 1 3 0 0
1 3 6 1 0
2 0 1 1 1
2 1 3 1 1
2 3 4 0 1
2 4 6 0 1
2 6 7 1 1
As you can see, the rows for id = 1 are already in the form expected by coxph in the survival package. For id = 2, however, a failure occurs at the beginning and at the end, but not in the middle.
Is there a general function to extract data from id = 2 and get the result like id = 1?
I think the result for id = 2 should look like the following:
id start end failure x1
1 0 1 0 0
1 1 3 0 0
1 3 6 1 0
2 3 4 0 1
2 4 6 0 1
2 6 7 1 1

A bit hacky, but should get the job done.
Data:
# Load data
library(tidyverse)
df <- read_table("
id start end failure x1
1 0 1 0 0
1 1 3 0 0
1 3 6 1 0
2 0 1 1 1
2 1 3 1 1
2 3 4 0 1
2 4 6 0 1
2 6 7 1 1
")
Data wrangling:
# Check for sub-groups within IDs and remove all but the last one
df <- df %>%
  # Group by ID
  group_by(
    id
  ) %>%
  mutate(
    # Check if a new sub-group is starting (after a failure)
    new_group = case_when(
      # First row is always group 0
      row_number() == 1 ~ 0,
      # If previous row was a failure, then a new sub-group starts here
      lag(failure) == 1 ~ 1,
      # Otherwise not
      TRUE ~ 0
    ),
    # Assign sub-group number by calculating cumulative sums
    group = cumsum(new_group)
  ) %>%
  # Keep only last sub-group for each ID
  filter(
    group == max(group)
  ) %>%
  ungroup() %>%
  # Remove working columns
  select(
    -new_group, -group
  )
Result:
> df
# A tibble: 6 × 5
id start end failure x1
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 0 0
2 1 1 3 0 0
3 1 3 6 1 0
4 2 3 4 0 1
5 2 4 6 0 1
6 2 6 7 1 1
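The same idea can be written more compactly, assuming failure only ever takes the values 0 and 1: lagging failure with a default of 0 gives exactly the new_group indicator, so its cumulative sum numbers the sub-groups. A sketch (the data frame is rebuilt with data.frame() here only to keep the example self-contained):

```r
library(dplyr)

df <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2, 2, 2),
  start   = c(0, 1, 3, 0, 1, 3, 4, 6),
  end     = c(1, 3, 6, 1, 3, 4, 6, 7),
  failure = c(0, 0, 1, 1, 1, 0, 0, 1),
  x1      = c(0, 0, 0, 1, 1, 1, 1, 1)
)

result <- df %>%
  group_by(id) %>%
  # A new sub-group starts on the row after each failure
  mutate(group = cumsum(lag(failure, default = 0))) %>%
  # Keep only the last sub-group of each id
  filter(group == max(group)) %>%
  ungroup() %>%
  select(-group)

result
```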

Add a column that count number of rows until the first 1, by group in R

I have the following dataset:
test_df=data.frame(Group=c(1,1,1,1,2,2),var1=c(1,0,0,1,1,1),var2=c(0,0,1,1,0,0),var3=c(0,1,0,0,0,1))
Group  var1  var2  var3
1      1     0     0
1      0     0     1
1      0     1     0
1      1     1     0
2      1     0     0
2      1     0     1
I want to add 3 columns (out1-3) for var1-3, which count the number of rows until the first 1, by Group, as shown below:
Group  var1  var2  var3  out1  out2  out3
1      1     0     0     1     3     2
1      0     0     1     1     3     2
1      0     1     0     1     3     2
1      1     1     0     1     3     2
2      1     0     0     1     0     2
2      1     0     1     1     0     2
I tried the R code below, repeating it for each of my 3 variables (my actual dataset contains many more than 3 columns), but it is not working:
test_var1 <- select(test_df, Group, var1) %>%
  group_by(Group) %>%
  mutate(out1 = row_number()) %>%
  filter(var1 != 0) %>%
  slice(1)

library(dplyr)
library(tidyr)
df <- data.frame(Group=c(1,1,1,1,2,2),
                 var1=c(1,0,0,1,1,1),
                 var2=c(0,0,1,1,0,0),
                 var3=c(0,1,0,0,0,1))
This works for any number of variables as long as the structure is the same as in the example (i.e. Group plus any number of variables that are 0 or 1):
df %>%
  mutate(rownr = row_number()) %>%
  pivot_longer(-c(Group, rownr)) %>%
  group_by(Group, name) %>%
  # Count the leading zeros plus one; a count larger than the group size means there is no 1 at all
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         out = ifelse(max(out) > n(), 0, max(out))) %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)
Returns:
  Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
  <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1     1          1          0          0        1        3        2
2     1          0          0          1        1        3        2
3     1          0          1          0        1        3        2
4     1          1          1          0        1        3        2
5     2          1          0          0        1        0        2
6     2          1          0          1        1        0        2
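A shorter variant that avoids the round trip through the long format might use match(), which returns the position of the first match or NA if there is none. This is a sketch of a different technique, not the code above; the out_ prefix and the use of 0 for "no 1 in this group" are choices made to mirror the expected output in the question:

```r
library(dplyr)

test_df <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2),
  var1  = c(1, 0, 0, 1, 1, 1),
  var2  = c(0, 0, 1, 1, 0, 0),
  var3  = c(0, 1, 0, 0, 0, 1)
)

result <- test_df %>%
  group_by(Group) %>%
  # Position of the first 1 within the group; 0 if the group has no 1
  mutate(across(starts_with("var"),
                ~ coalesce(match(1, .x), 0L),
                .names = "out_{.col}")) %>%
  ungroup()

result
```

The original var columns keep their names, and each single per-group value is recycled to every row of the group.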

If you only have 3 "out" variables, you can create the three columns one by one:
# 1. Your dataset (a single group only; this approach does not handle several groups)
df <- data.frame(Group=rep(1,4), var1=c(1,0,0,1), var2=c(0,0,1,1), var3=c(0,1,0,0))
# 2. Row number of the first "1" in each variable
# (which() gives numeric positions; rownames() would give character strings, which min() compares alphabetically)
df$out1 <- min(which(df$var1 == 1))
df$out2 <- min(which(df$var2 == 1))
df$out3 <- min(which(df$var3 == 1))
If you have more than 3 columns, a loop may be better, for example:
for (i in 1:3) {
  df[[paste0("out", i)]] <- min(which(df[[paste0("var", i)]] == 1))
}

Making a conditional variable based on last observation in temporal data

ID T V1
1 1 1
1 2 1
2 1 0
2 2 0
3 1 1
3 2 1
3 3 1
I need to make two variables from these data. The first (v2) needs to be 1 on the last observation only when V1 = 1; the second (v3) needs to be 1 on the last observation in all cases. Ideal final product:
ID T V1 v2 v3
1 1 1 0 0
1 2 1 1 1
2 1 0 0 0
2 2 0 0 1
3 1 1 0 0
3 2 1 0 0
3 3 1 1 1
Thanks in advance.

With the dplyr package, you can group your data by a variable (ID in your case) and perform operations within each group. Since one of your columns (T) already gives the rank of each observation within its group, you can compare it with n(), which returns the number of rows in the current group, to flag the last observation.
Suppose your data are in the data frame df:
df %>%
  group_by(ID) %>%
  mutate(
    # v2: last observation of the ID, and V1 equal to 1
    v2 = 1 * (`T` == n()) * (V1 == 1),
    # v3: last observation of the ID, whatever V1 is
    v3 = 1 * (`T` == n())
  )
# A tibble: 7 x 5
# Groups:   ID [3]
     ID     T    V1    v2    v3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0     0
2     1     2     1     1     1
3     2     1     0     0     0
4     2     2     0     0     1
5     3     1     1     0     0
6     3     2     1     0     0
7     3     3     1     1     1
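Comparing T with n() assumes that T runs from 1 to the group size without gaps. If that cannot be guaranteed, a sketch along the following lines, which sorts by T and then flags the last row with row_number(), may be more robust (the integer 0/1 encoding is an arbitrary choice for this example):

```r
library(dplyr)

df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3, 3),
  T  = c(1, 2, 1, 2, 1, 2, 3),
  V1 = c(1, 1, 0, 0, 1, 1, 1)
)

result <- df %>%
  arrange(ID, T) %>%
  group_by(ID) %>%
  mutate(
    # v2: last observation of the ID and V1 == 1
    v2 = as.integer(row_number() == n() & V1 == 1),
    # v3: last observation of the ID, regardless of V1
    v3 = as.integer(row_number() == n())
  ) %>%
  ungroup()

result
```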