How to use a for loop to create a sequence of columns - r

I need to create the variable x out of the variable y as below.
df$x<-0
df$x<-ifelse(df$y==0 | df$y==1, 1, 0)
df$x[is.na(df$x)] <- 0
However, I have y ranging from 1 to 52, which means I need to create x1 through x52. I am an avid Stata user, and this is pretty straightforward to do there with forval. However, I am having difficulty doing it in R. I thought about the following, but it didn't work out very well:
for (i in 1:52){
df$x[i] <- 0
.
.
.
}
I thought I could let R replace the i with the values from the loop, the same way Stata does.
Thanks.

Try something like this. Here is an example using dummy data:
set.seed(123)
#Data
df <- as.data.frame(matrix(rnorm(520),nrow = 10,ncol = 52))
names(df) <- paste0('y',1:52)
#new names
vals <- paste0('x',1:52)
#Loop
for(i in vals)
{
df[[i]]<-0
df[[i]]<-ifelse(df[[gsub('x','y',i)]]==0 | df[[gsub('x','y',i)]]==1, 1, 0)
df[[i]][is.na(df[[i]])] <- 0
}
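The same result can also be sketched without an explicit loop. The example below is a minimal base R sketch under assumptions: the dummy data and the helper names y_cols and x_cols are made up for illustration, and it relies on %in% returning FALSE for NA, which replaces the separate NA clean-up step.

```r
# Hypothetical dummy data: y1..y52 holding NA and the values 0 to 3
set.seed(123)
df <- as.data.frame(matrix(sample(c(NA, 0:3), 520, replace = TRUE), nrow = 10))
names(df) <- paste0("y", 1:52)

y_cols <- paste0("y", 1:52)  # existing columns
x_cols <- paste0("x", 1:52)  # columns to create

# For each y column, flag values equal to 0 or 1.
# %in% yields FALSE for NA, so NA rows become 0 directly.
df[x_cols] <- lapply(df[y_cols], function(y) as.integer(y %in% c(0, 1)))
```

Assigning a list of vectors to df[x_cols] creates all 52 columns in one step, which is roughly what Stata's forval loop would do one variable at a time.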

Suppose you had data that looked something like this:
data
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 1 3 0 NA 3 NA 3 1 2 3
2 1 1 NA NA 3 0 3 3 0 0
3 0 1 1 0 2 2 1 1 0 1
4 0 NA 2 1 3 2 NA 0 2 0
5 1 2 NA 2 0 1 2 3 2 3
6 3 NA 1 3 NA NA NA 3 NA 3
7 2 NA 3 3 NA 0 NA 1 1 1
8 NA 3 2 1 1 NA 1 0 1 2
9 0 1 0 NA NA 0 2 0 NA 2
10 1 0 3 0 3 2 NA 0 1 2
One approach might be to use dplyr::mutate with across:
library(tidyverse)
data %>%
mutate(across(everything(),~ ifelse(. %in% c(0,1), 1, 0),
.names = "y{.col}")) %>%
rename_all(~str_replace(.,"yx","y"))
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 y1 y2 y3 y4 y5 y6 y7 y8 y9 y10
1 2 1 2 2 2 2 0 1 1 0 0 1 0 0 0 0 1 1 1 1
2 3 2 3 1 3 3 3 2 0 1 0 0 0 1 0 0 0 0 1 1
3 0 2 2 1 0 2 3 0 2 0 1 0 0 1 1 0 0 1 0 1
4 2 2 3 1 0 0 1 1 2 3 0 0 0 1 1 1 1 1 0 0
5 1 3 1 2 2 2 3 2 3 3 1 0 1 0 0 0 0 0 0 0
6 3 1 1 1 0 3 2 2 1 2 0 1 1 1 1 0 0 0 1 0
7 1 1 3 1 3 1 1 0 1 2 1 1 0 1 0 1 1 1 1 0
8 1 2 3 3 2 1 2 2 2 0 1 0 0 0 0 1 0 0 0 1
9 1 2 3 0 2 3 0 0 2 1 1 0 0 1 0 0 1 1 0 1
10 2 0 1 0 3 2 3 2 2 3 0 1 1 1 0 0 0 0 0 0
Example data:
set.seed(123)
data <- as.data.frame(matrix(sample(c(NA,0:3),100,replace = TRUE),ncol =10))
names(data) <- paste0("x",1:10)

Related

Count of a string of values between values

I have a simple dataframe that is a set of ID columns and values of 0 or 1, for an example:
data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 1 0 1 0 0 1 1 1 0
2 0 0 0 1 0 1 0 0 1 0
3 0 1 1 1 1 0 1 1 1 1
4 0 0 0 1 1 1 1 1 1 0
5 1 0 1 0 1 1 0 1 1 0
6 0 1 1 1 1 1 0 1 1 1
I want to write code (or a loop) that, for every column, counts the number of consecutive 0's before the next 1 is encountered, and continues down the column. Ideally the output is a new data frame with the same column headers and a list of counts:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 3 1 2 1 2 1 1 1 NA 2
2 1 2 1 1 NA 1 2 NA NA 2
I'm not sure how to do this, and the resulting columns may have different lengths. If each column has to go into its own data frame, that's fine.
Here's a base R solution. I used a size-10 example instead of a size 1000 example so we can actually see what's going on and make sure it looks right.
set.seed(47)
d = data.frame(replicate(10,sample(0:1,10,rep=TRUE)))
d
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1 0 0 0 0 0 0 1 1 0 0
# 2 0 1 0 1 0 0 0 0 0 0
# 3 1 1 1 0 1 0 0 0 1 0
# 4 0 0 0 0 0 1 1 1 1 1
# 5 1 1 0 1 0 0 1 1 1 0
# 6 0 1 1 1 1 1 1 1 0 1
# 7 1 1 0 0 1 0 0 1 1 0
# 8 0 0 1 0 1 0 1 0 0 0
# 9 0 0 0 1 1 1 0 0 1 1
# 10 1 1 1 0 1 0 1 1 0 0
results = lapply(d, function(x) with(rle(x), lengths[values == 0]))
max_length = max(lengths(results))
results = lapply(results, function(x) {length(x) = max_length; x})
results = do.call(cbind, results)
results
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# [1,] 2 1 2 1 2 3 2 2 2 3
# [2,] 1 1 2 2 2 1 1 2 1 1
# [3,] 1 2 1 2 NA 2 1 NA 1 2
# [4,] 2 NA 1 1 NA 1 NA NA 1 1
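To see what rle contributes here, it may help to run it on a single short vector. This is only an illustrative sketch; the vector x and the names runs and zero_runs are invented for the example.

```r
x <- c(0, 0, 1, 0, 1, 1, 0)

runs <- rle(x)
runs$lengths  # 2 1 1 2 1  (length of each run)
runs$values   # 0 1 0 1 0  (value repeated in each run)

# Keep only the lengths of the runs made of zeros
zero_runs <- runs$lengths[runs$values == 0]
zero_runs     # 2 1 1

# Padding to a common length fills the tail with NA,
# which is what allows the per-column results to be cbind-ed
length(zero_runs) <- 4
zero_runs     # 2 1 1 NA
```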
One dplyr and purrr option could be:
map(.x = names(df),
~ df %>%
mutate(rleid = with(rle(get(.x)), rep(seq_along(lengths), lengths))) %>%
filter(get(.x) == 0) %>%
group_by(rleid = cumsum(!duplicated(rleid))) %>%
summarise(!!.x := n())) %>%
reduce(full_join, by = c("rleid" = "rleid"))
rleid X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 2 2 9 2 3 4 1 1
2 2 1 1 NA 3 NA 1 1 2 1 1
3 3 1 3 NA NA NA 2 1 NA 2 2
4 4 1 NA NA NA NA 1 NA NA 1 2
Sample data:
set.seed(123)
df <- data.frame(replicate(10, sample(0:1, 10, rep = TRUE)))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 1 0 0 1 1 0 0
2 1 0 1 1 0 0 0 1 1 1
3 0 1 1 1 0 1 0 1 0 0
4 1 1 1 1 0 0 0 0 1 1
5 1 0 1 0 0 1 1 0 0 0
6 0 1 1 0 0 0 0 0 0 0
7 1 0 1 1 0 0 1 0 1 1
8 1 0 1 0 0 1 1 1 1 0
9 1 0 0 0 0 1 1 0 1 0
10 0 1 0 0 1 0 0 0 0 1
Here's an alternate approach that uses the indices of the 1 values to determine the runs of zero (using Gregor's data):
library(purrr)
map(df, ~ {
y <- diff(c(0, which(.x == 1), nrow(df) + 1)) - 1
y[y != 0]
}) %>%
map_df(`length<-`, max(lengths(.)))
# A tibble: 4 x 10
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 2 1 2 3 2 2 2 3
2 1 1 2 2 2 1 1 2 1 1
3 1 2 1 2 NA 2 1 NA 1 2
4 2 NA 1 1 NA 1 NA NA 1 1
Or same in base R:
res <- lapply(df, function(x) {
y <- diff(c(0, which(x == 1), nrow(df) + 1)) - 1
y[y != 0]})
data.frame(do.call(cbind, lapply(res, `length<-`, max(lengths(res)))))
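The index arithmetic in this approach can be traced on one small vector. The sketch below uses a made-up vector x and assumes the column contains only 0 and 1 (no NA values); the intermediate names are for illustration only.

```r
x <- c(0, 0, 1, 0, 1, 1, 0)

ones <- which(x == 1)                 # positions of the 1s: 3 5 6
bounds <- c(0, ones, length(x) + 1)   # add sentinels before and after: 0 3 5 6 8
gaps <- diff(bounds) - 1              # zeros between neighbouring 1s: 2 1 0 1
zero_gaps <- gaps[gaps != 0]          # drop empty gaps (adjacent 1s): 2 1 1

# Cross-check against the rle-based count of zero runs
r <- rle(x)
rle_counts <- r$lengths[r$values == 0]
all(zero_gaps == rle_counts)          # TRUE
```

The two sentinels (0 and length(x) + 1) act as imaginary 1s just outside the vector, so leading and trailing runs of zeros are counted as well.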

Count several rows and make a new column in R

I want to sum several columns (x1 to x4) row by row and store the result in a new column (x1_x4). Can anyone help me?
df <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10),
x1 = c(0,NA,0,1,0,0,1,1,1,NA),
x2 = c(0,NA,1,0,0,NA,0,1,0,0),
x3 = c(0,NA,0,1,1,0,1,1,1,0),
x4 = c(0,NA,0,0,0,0,1,1,1,1))
You can use rowSums and test with apply if all are NA.
df$x1_x4 <- rowSums(df[2:5], na.rm = TRUE)
df$x1_x4[apply(is.na(df[2:5]), 1, all)] <- NA
# ID x1 x2 x3 x4 x1_x4
#1 1 0 0 0 0 0
#2 2 NA NA NA NA NA
#3 3 0 1 0 0 1
#4 4 1 0 1 0 2
#5 5 0 0 1 0 1
#6 6 0 NA 0 0 0
#7 7 1 0 1 1 3
#8 8 1 1 1 1 4
#9 9 1 0 1 1 3
#10 10 NA 0 0 1 1
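The second step is needed because rowSums with na.rm = TRUE returns 0, not NA, for a row in which every value is missing. A small sketch with a made-up data frame m shows the difference; rowSums(!is.na(m)) == 0 is used here as an alternative all-NA test, not as the method of the answer above.

```r
m <- data.frame(x1 = c(0, NA, 1),
                x2 = c(1, NA, NA))

total <- rowSums(m, na.rm = TRUE)
total                               # row 2 (all NA) comes out as 0

# A row is entirely missing when it has zero non-NA cells
all_na <- rowSums(!is.na(m)) == 0
total[all_na] <- NA
total                               # row 2 is now NA, rows 1 and 3 stay 1
```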
One dplyr solution could be:
df %>%
rowwise() %>%
mutate(x1_x4 = any(!is.na(c_across(-ID)))^NA * sum(c_across(-ID), na.rm = TRUE))
ID x1 x2 x3 x4 x1_x4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0 0
2 2 NA NA NA NA NA
3 3 0 1 0 0 1
4 4 1 0 1 0 2
5 5 0 0 1 0 1
6 6 0 NA 0 0 0
7 7 1 0 1 1 3
8 8 1 1 1 1 4
9 9 1 0 1 1 3
10 10 NA 0 0 1 1
vars <- paste0("x", 1:4)
df$x1_x4 <- rowSums(df[vars], na.rm = TRUE)
df[rowSums(is.na(df[vars]), na.rm = TRUE) == 4, "x1_x4"] <- NA
df
# ID x1 x2 x3 x4 x1_x4
# 1 1 0 0 0 0 0
# 2 2 NA NA NA NA NA
# 3 3 0 1 0 0 1
# 4 4 1 0 1 0 2
# 5 5 0 0 1 0 1
# 6 6 0 NA 0 0 0
# 7 7 1 0 1 1 3
# 8 8 1 1 1 1 4
# 9 9 1 0 1 1 3
# 10 10 NA 0 0 1 1
Base R one (obfuscated) expression:
within(df, {x1_x4 <- apply(df[,grepl("^x", names(df))], 1,
function(x){ifelse(all(is.na(x)), NA_integer_, sum(x, na.rm = TRUE))})})

Row-wise operation by group over time R

Problem:
I am trying to create a variable x2 that equals 1 on the rows, within each ID group, where x1 switches from 1 to 0 over time.
Additionally, x2 is set to 1 for every consecutive 0 in the run that follows the switch.
I tried to figure out how to do this using library(dplyr), but could not figure out how to look at previous records within the group.
Input Data:
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<-c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1")
df<-data.frame(ID,time,x1)
Required Output:
ID time x1 x2
1 1 0 0
1 2 1 0
1 3 1 0
1 4 1 0
1 5 1 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
3 1 1 0
3 2 0 1
3 3 0 1
4 1 1 0
4 2 1 0
5 1 1 0
5 2 0 1
5 3 1 0
It is better to store 'x1' as a numeric column:
library(data.table)
setDT(df)[, x2 := (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)), ID]
df
# ID time x1 x2
# 1: 1 1 0 0
# 2: 1 2 1 0
# 3: 1 3 1 0
# 4: 1 4 1 0
# 5: 1 5 1 0
# 6: 2 1 0 0
# 7: 2 2 0 0
# 8: 2 3 0 0
# 9: 2 4 0 0
#10: 3 1 1 0
#11: 3 2 0 1
#12: 3 3 0 1
#13: 4 1 1 0
#14: 4 2 1 0
#15: 5 1 1 0
#16: 5 2 0 1
#17: 5 3 1 0
data
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
If you want a dplyr answer, you can use @akrun's code in mutate after grouping by ID:
library(dplyr)
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
df <- df %>%
group_by(ID) %>%
mutate(x2 = (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)))
df
# ID time x1 x2
# 1 1 0 0
# 1 2 1 0
# 1 3 1 0
# 1 4 1 0
# 1 5 1 0
# 2 1 0 0
# 2 2 0 0
# 2 3 0 0
# 2 4 0 0
# 3 1 1 0
# 3 2 0 1
# 3 3 0 1
# 4 1 1 0
# 4 2 1 0
# 5 1 1 0
# 5 2 0 1
# 5 3 1 0
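For a version with no package dependencies, the same expression can be applied per group with base R's ave. This is a sketch under assumptions: the data frame d and the helper flag_switch are names invented here, and x1 is assumed to be numeric and already sorted by time within each ID.

```r
d <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5),
  x1 = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1)
)

# Hypothetical helper wrapping the expression used in the answers above:
#   cumsum(c(FALSE, diff(v) < 0)) becomes positive from the first 1 -> 0 drop on
#   (cumsum(v) < 2) switches the flag off once a second 1 has been seen
flag_switch <- function(v) (cumsum(v) < 2) * cumsum(c(FALSE, diff(v) < 0))

# ave() applies the function within each ID and returns values in row order
d$x2 <- ave(d$x1, d$ID, FUN = flag_switch)
d$x2
```

For ID 5 (x1 = 1, 0, 1) the two factors are TRUE, TRUE, FALSE and 0, 1, 1, so the product is 0, 1, 0, matching the required output.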

merge one data frame by row with another data frame as a template

I want to merge each row of the data.frame my.samples to another data.frame my.template to obtain the desired.result.
The template my.template could be created with expand.grid. So, even though this is a minimal example the output data set desired.result is still large.
I have posted below several attempts that did not work and one attempt that does work. However, the code that works seems overly complex.
Thank you for any advice. I prefer base R. There are numerous other posts about merging data frames. I looked at quite a few, but did not see this scenario addressed. Sorry if I overlooked it.
my.samples <- read.table(text = '
obs X1 X2 X3 z
1 2 1 0 1
2 0 0 0 1
3 0 1 2 1
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
desired.result <- read.table(text = '
obs X1 X2 X3 z
1 0 0 0 0
1 0 0 1 0
1 0 0 2 0
1 0 1 0 0
1 0 1 1 0
1 0 1 2 0
1 0 2 0 0
1 0 2 1 0
1 0 2 2 0
1 1 0 0 0
1 1 0 1 0
1 1 0 2 0
1 1 1 0 0
1 1 1 1 0
1 1 1 2 0
1 1 2 0 0
1 1 2 1 0
1 1 2 2 0
1 2 0 0 0
1 2 0 1 0
1 2 0 2 0
1 2 1 0 1
1 2 1 1 0
1 2 1 2 0
1 2 2 0 0
1 2 2 1 0
1 2 2 2 0
2 0 0 0 1
2 0 0 1 0
2 0 0 2 0
2 0 1 0 0
2 0 1 1 0
2 0 1 2 0
2 0 2 0 0
2 0 2 1 0
2 0 2 2 0
2 1 0 0 0
2 1 0 1 0
2 1 0 2 0
2 1 1 0 0
2 1 1 1 0
2 1 1 2 0
2 1 2 0 0
2 1 2 1 0
2 1 2 2 0
2 2 0 0 0
2 2 0 1 0
2 2 0 2 0
2 2 1 0 0
2 2 1 1 0
2 2 1 2 0
2 2 2 0 0
2 2 2 1 0
2 2 2 2 0
3 0 0 0 0
3 0 0 1 0
3 0 0 2 0
3 0 1 0 0
3 0 1 1 0
3 0 1 2 1
3 0 2 0 0
3 0 2 1 0
3 0 2 2 0
3 1 0 0 0
3 1 0 1 0
3 1 0 2 0
3 1 1 0 0
3 1 1 1 0
3 1 1 2 0
3 1 2 0 0
3 1 2 1 0
3 1 2 2 0
3 2 0 0 0
3 2 0 1 0
3 2 0 2 0
3 2 1 0 0
3 2 1 1 0
3 2 1 2 0
3 2 2 0 0
3 2 2 1 0
3 2 2 2 0
', header = TRUE)
# this works for one obs at a time
merge(my.samples[1,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
# this does not work
apply(my.samples, 1, function(x) merge(x, my.template, by=c('X1', 'X2', 'X3'), all=TRUE))
# this does not work
my.output <- matrix(0, nrow=(3^3 * max(my.samples$obs)), ncol=5)
for(i in 1:max(desired.result$obs)) {
x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
my.output[((i-1) * 3^3 +1) : ((i-1) * 3^3 + 3^3), 1:5] <- x
}
# this works
for(i in 1:max(desired.result$obs)) {
x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
x$obs <- i
x$z[is.na(x$z)] <- 0
if(i == 1) {my.output = x}
if(i > 1) {my.output = rbind(my.output, x)}
}
my.output
all.equal(my.output[1:3], desired.result[,2:4])
I believe this should work:
#expand template
full<-do.call(rbind, lapply(unique(my.samples$obs),
function(x) cbind(obs=x, my.template)))
#merge
result<-merge(full, my.samples, all.x=T)
#change NA's to 0
result$z[is.na(result$z)]<-0
#> all(result==desired.result)
#[1] TRUE
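The idea can be checked on a smaller case. The sketch below uses a made-up 2 x 2 template built with expand.grid and a two-row sample; it also swaps the do.call(rbind, lapply(...)) step for merge on two data frames that share no column names, which base R documents as returning the Cartesian product.

```r
# Hypothetical reduced inputs
samples <- data.frame(obs = 1:2,
                      X1 = c(1L, 0L),
                      X2 = c(0L, 0L),
                      z = c(1, 1))
template <- expand.grid(X1 = 0:1, X2 = 0:1)

# No shared column names, so merge() pairs every obs with every template row
full <- merge(data.frame(obs = samples$obs), template)

# Left join on obs, X1, X2: only the sampled combination gets a z value
result <- merge(full, samples, all.x = TRUE)
result$z[is.na(result$z)] <- 0

result[order(result$obs, result$X1, result$X2), ]
```

Each obs ends up with all four template rows, with z equal to 1 only on the combination that was actually sampled.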
I like the answer posted by @MrFlick, but when I added another column to my.samples I discovered that I had to modify the code. Below is what I came up with.
my.samples <- read.table(text = '
obs X1 X2 X3 z aa
1 2 1 0 1 20
2 0 0 0 1 -10
3 0 1 2 1 10
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
obs.aa <- my.samples[, c(1, ncol(my.samples))]
my.template2 <- merge(my.template, obs.aa)
my.template3 <- merge(my.template2, my.samples, by=c('obs', 'aa', paste0('X', 1:(ncol(my.samples)-3))), all = TRUE)
my.template3$z[is.na(my.template3$z)] <- 0
my.template3

selecting rows according to all covariates combinations of a different dataframe

I am currently trying to figure out how to select all the rows of a long dataframe (long) that present the same x1 and x2 combinations characterizing another dataframe (short).
The simplified data are:
long <- read.table(text = "
id_type x1 x2
1 0 0
1 0 1
1 1 0
1 1 1
2 0 0
2 0 1
2 1 0
2 1 1
3 0 0
3 0 1
3 1 0
3 1 1
4 0 0
4 0 1
4 1 0
4 1 1",
header=TRUE)
and
short <- read.table(text = "
x1 x2
0 0
0 1",
header=TRUE)
The expected output would be:
id_type x1 x2
1 0 0
1 0 1
2 0 0
2 0 1
3 0 0
3 0 1
4 0 0
4 0 1
I have tried to use:
out <- long[unique(long[,c("x1", "x2")]) %in% unique(short[,c("x1", "x2")]), ]
but %in% is clearly being used incorrectly here. Thank you very much for any help!
You are requesting an inner join:
> merge(long, short)
x1 x2 id_type
1 0 0 1
2 0 0 2
3 0 0 3
4 0 0 4
5 0 1 1
6 0 1 2
7 0 1 3
8 0 1 4
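Note that merge reorders both rows and columns. If the original layout of long matters, one possible alternative is to build a row key and filter with %in%. The attempt in the question fails because %in% treats a data frame as a list of columns, so it never compares rows. The sketch below rebuilds the example data; the names key_long, key_short and out are invented for illustration, and pasting values into a key assumes the separator cannot occur inside the values.

```r
long <- data.frame(id_type = rep(1:4, each = 4),
                   x1 = rep(c(0, 0, 1, 1), times = 4),
                   x2 = rep(c(0, 1), times = 8))
short <- data.frame(x1 = c(0, 0),
                    x2 = c(0, 1))

# One string per row, e.g. "0 1", so rows can be compared as single values
key_long  <- paste(long$x1, long$x2)
key_short <- paste(short$x1, short$x2)

# Keep the rows of long whose (x1, x2) pair appears in short
out <- long[key_long %in% key_short, ]
out
```

Unlike merge, this keeps the rows in their original order with id_type as the first column.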
