Looping through items in dataframe while using apply function - r

I have a data frame with 104 items ("items1" to "items104"). Each item was administered at different ages, so "items1", for instance, is split into the columns items1.12, items1.18, items1.24, items1.30 and items1.36. This is the case for all 104 items. Here is a small example of the data:
ID <- c("4000", "4001", "4006", "4007", "4009", "4010")
items1.12 <- c(1, 1, 1, 1, 1, 1)
items1.18 <- c(1, 1, 1, 1, 1, 1)
items1.24 <- c(1, 1, 1, 1, 1, 1)
items1.30 <- c(1, 1, NA, 1, 1, 1)
items1.36 <- c(1, 1, 1, 1, 1, 1)
items2.12 <- c(2, 2, 1, 1, 2, 1)
items2.18 <- c(2, 2, 1, 1, 2, 1)
items2.24 <- c(2, 2, 1, 1, 2, 1)
items2.30 <- c(2, 2, 1, 1, 2, 1)
items2.36 <- c(2, 2, 1, 1, 2, 1)
wide <- data.frame(ID, items1.12, items1.18, items1.24, items1.30, items1.36, items2.12, items2.18, items2.24, items2.30, items2.36)
ID items1.12 items1.18 items1.24 items1.30 items1.36 items2.12 items2.18 items2.24 items2.30 items2.36
4000 1 1 1 1 1 2 2 2 2 2
4001 1 1 1 1 1 2 2 2 2 2
4006 1 1 1 NA 1 1 1 1 1 1
4007 1 1 1 1 1 1 1 1 1 1
4009 1 1 1 1 1 2 2 2 2 2
4010 1 1 1 1 1 1 1 1 1 1
I would like to run the following code for each item. Here is the version for "items1":
wide$items1.new <- apply(!is.na(wide[,paste("items1.", c(12,18,24,30,36), sep = "")]), 1, max)
wide$items1.new.2 <- NA
for (i in unique(wide$ID)) {
  select <- i == wide$ID
  # NA if every age is missing for this ID, otherwise the mean of the observed ages
  if (wide$items1.new[select] == 0) {
    wide$items1.new.2[select] <- NA
  } else {
    wide$items1.new.2[select] <- rowMeans(wide[select, c("items1.12", "items1.18", "items1.24", "items1.30", "items1.36")], na.rm = TRUE)
  }
}
wide <- subset(wide, select = -c(items1.new, items1.12, items1.18, items1.24, items1.30, items1.36))
names(wide)[names(wide) == 'items1.new.2'] <- "item1"
Here is the same code for "items2":
wide$items2.new <- apply(!is.na(wide[,paste("items2.", c(12,18,24,30,36), sep = "")]), 1, max)
wide$items2.new.2 <- NA
for (i in unique(wide$ID)) {
  select <- i == wide$ID
  # NA if every age is missing for this ID, otherwise the mean of the observed ages
  if (wide$items2.new[select] == 0) {
    wide$items2.new.2[select] <- NA
  } else {
    wide$items2.new.2[select] <- rowMeans(wide[select, c("items2.12", "items2.18", "items2.24", "items2.30", "items2.36")], na.rm = TRUE)
  }
}
wide <- subset(wide, select = -c(items2.new, items2.12, items2.18, items2.24, items2.30, items2.36))
names(wide)[names(wide) == 'items2.new.2'] <- "item2"
Here is what I would like to end with:
ID item1 item2
4000 1 2
4001 1 2
4006 1 1
4007 1 1
4009 1 2
4010 1 1
I would like to do this for items1 to items104 in my dataset. I can't imagine the solution would be very complicated, but I would really appreciate some help as I'm new to R. Thank you so much.

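In base R, this can be done with split.default: sub() strips the age suffix (.12, .18, ...) from each column name, split.default groups the columns by the remaining prefix, and rowMeans is applied to each group: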
cbind(wide['ID'], sapply(split.default(wide[-1],
sub("\\.\\d+$", "" , names(wide)[-1])), rowMeans, na.rm = TRUE))
Output:
ID items1 items2
1 4000 1 2
2 4001 1 2
3 4006 1 1
4 4007 1 1
5 4009 1 2
6 4010 1 1
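Two details differ from the loop in the question: rowMeans(na.rm = TRUE) returns NaN rather than NA when every age is missing for a respondent, and the result keeps the names items1, items2 instead of item1, item2. In addition, split.default sorts the groups alphabetically, so with 104 items the order would be items1, items10, items100, .... A minimal sketch, assuming you want NA for all-missing rows, the names item1 ... item104, and the original column order:

```r
# Example data from the question (items1.30 is NA for ID 4006)
wide <- data.frame(
  ID = c("4000", "4001", "4006", "4007", "4009", "4010"),
  items1.12 = c(1, 1, 1, 1, 1, 1), items1.18 = c(1, 1, 1, 1, 1, 1),
  items1.24 = c(1, 1, 1, 1, 1, 1), items1.30 = c(1, 1, NA, 1, 1, 1),
  items1.36 = c(1, 1, 1, 1, 1, 1),
  items2.12 = c(2, 2, 1, 1, 2, 1), items2.18 = c(2, 2, 1, 1, 2, 1),
  items2.24 = c(2, 2, 1, 1, 2, 1), items2.30 = c(2, 2, 1, 1, 2, 1),
  items2.36 = c(2, 2, 1, 1, 2, 1)
)

# Item prefix of each column: "items1.12" -> "items1"
prefix <- sub("\\.\\d+$", "", names(wide)[-1])

# A factor with levels in order of appearance keeps items1, items2, ... in
# column order instead of alphabetical order (items1, items10, ...)
groups <- split.default(wide[-1], factor(prefix, levels = unique(prefix)))

# Row mean per item; rowMeans(na.rm = TRUE) gives NaN when all ages are NA,
# so turn those into NA as the original loop did
means <- sapply(groups, function(g) {
  m <- rowMeans(g, na.rm = TRUE)
  m[is.nan(m)] <- NA
  m
})

# Rename "items1" -> "item1", "items2" -> "item2", ...
colnames(means) <- sub("^items", "item", colnames(means))
result <- cbind(wide["ID"], means)
result
```

Nothing here depends on the number of items, so the same code should handle all 104 prefixes.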

Related

change numeric vector

I have a named numeric vector (see below). I would like to change every element whose name starts with high_ to 1 and every element whose name starts with low_ to 2.
x <- c(high_X17 = 3, high_X18 = 4, high_X19 = 5, high_X20 = 3, high_X21 = 1,
       high_X22 = 1, high_X23 = 2, high_X24 = 2, low_X25 = 6, low_X26 = 4,
       low_X27 = 6, low_X28 = 5, low_X29 = 2, low_X30 = 1, low_X31 = 1,
       low_X32 = 2)
Desired result:
high_X17 high_X18 high_X19 high_X20 high_X21 high_X22 high_X23 high_X24
       1        1        1        1        1        1        1        1
 low_X25  low_X26  low_X27  low_X28  low_X29  low_X30  low_X31  low_X32
       2        2        2        2        2        2        2        2
Try the code below (assigning to x[] keeps the names of x):
x[] <- startsWith(names(x), "low_") + 1
You can use:
x[] <- as.integer(sub('_.*', '', names(x)) == 'low') + 1
x
#high_X17 high_X18 high_X19 high_X20 high_X21 high_X22 high_X23 high_X24
# 1 1 1 1 1 1 1 1
# low_X25 low_X26 low_X27 low_X28 low_X29 low_X30 low_X31 low_X32
# 2 2 2 2 2 2 2 2
sub('_.*', '', names(x)) removes the underscore and everything after it, leaving only 'high' or 'low'.
Using grepl:
x[] <- grepl("^low_", names(x)) + 1
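If there are more than two prefixes, a named lookup vector generalizes the sub() idea above. A minimal sketch, assuming every name has the form prefix_something; the helper vector codes is illustrative, and any prefix missing from it would become NA:

```r
x <- c(high_X17 = 3, high_X18 = 4, high_X19 = 5, high_X20 = 3, high_X21 = 1,
       high_X22 = 1, high_X23 = 2, high_X24 = 2, low_X25 = 6, low_X26 = 4,
       low_X27 = 6, low_X28 = 5, low_X29 = 2, low_X30 = 1, low_X31 = 1,
       low_X32 = 2)

# New code for each name prefix; add entries here for further prefixes
codes <- c(high = 1, low = 2)

# sub() keeps only the part before "_" ("high" or "low"), which is then
# used to look up the replacement value; x[] <- keeps the original names
x[] <- codes[sub("_.*", "", names(x))]
x
```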

How to write a for loop to create multiple new variables in R?

Suppose I have this example dataset df with only numeric variables:
dx_order1 <- c(1, 1, NA, 1, 1)
dx_order2 <- c(2, 2, 2, 2, NA)
df <- data.frame(dx_order1, dx_order2)
I want to recode the variables. For the dx_order1 variable, I want to recode 1 as 1 and everything else as 0. Similarly, for the dx_order2 variable, I want to recode 2 as 1 and everything else as 0. Say that the new variables are called diag_order1 and diag_order2.
I know how to do this manually, one variable at a time. The code below does the job:
df$diag_order1 <- ifelse(is.na(df$dx_order1), 0, 1)
df$diag_order2 <- ifelse(is.na(df$dx_order2), 0, 1)
I was wondering how I can achieve the same outcome with a for loop. If I have a lot of similar variables, this kind of manual coding is not practical, so any advice on how to use a loop to speed up the process would be appreciated.
You don't need a loop here: you can use is.na to set every non-missing value to 1 and every NA to 0. For example:
Data
df <- data.frame(dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
df[!is.na(df)] <- 1
df[is.na(df)] <- 0
Or, if you have other columns with NA but only want to recode certain ones, select those columns first:
df2 <- data.frame(letter_col = c(NA, letters[1:4]),
dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
# any columns starting with dx
cols <- names(df2)[grepl("^dx", names(df2))]
df2[, cols][!is.na(df2[, cols])] <- 1
df2[, cols][is.na(df2[, cols])] <- 0
You can use across with mutate in dplyr like this (str_extract comes from stringr):
library(dplyr)
library(stringr)
df2 <- data.frame(letter_col = c(NA, letters[1:4]),
dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
> df2
letter_col dx_order1 dx_order2
1 <NA> 1 2
2 a 1 2
3 b NA 2
4 c 1 2
5 d 1 NA
df2 %>% mutate(across(starts_with("dx"), ~case_when(. == as.numeric(str_extract(cur_column(), "\\d$")) ~ 1,
is.na(.) ~ 0,
TRUE ~ 0), .names = "diag_{.col}"))
letter_col dx_order1 dx_order2 diag_dx_order1 diag_dx_order2
1 <NA> 1 2 1 1
2 a 1 2 1 1
3 b NA 2 0 1
4 c 1 2 1 1
5 d 1 NA 1 0
This assumes, as in your question, that each dx column can contain its own suffix value (1 for dx_order1, 2 for dx_order2), NA, or something else; everything other than the suffix value is recoded to 0.
You can coerce the logical result of is.na to integer with the unary +. is.na works directly on a data frame, so every non-missing value becomes 1 and every NA becomes 0.
df <- data.frame(dx_order1 = c(1,1, NA, 1, 1),
dx_order2 = c(2, 2, 2, 2, NA))
df[] <- +!is.na(df)
df
# dx_order1 dx_order2
#1 1 1
#2 1 1
#3 0 1
#4 1 1
#5 1 0
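Since the question asks specifically for a for loop, here is a minimal base R sketch. It assumes, as in the question, that every column to recode is named dx_order followed by the value to flag, and it creates the new diag_ columns instead of overwriting the originals:

```r
df <- data.frame(dx_order1 = c(1, 1, NA, 1, 1),
                 dx_order2 = c(2, 2, 2, 2, NA))

for (col in grep("^dx_order", names(df), value = TRUE)) {
  # Value to flag: the number at the end of the column name (1, 2, ...)
  target <- as.numeric(sub("^dx_order", "", col))
  # dx_order1 -> diag_order1, dx_order2 -> diag_order2
  new_col <- sub("^dx_", "diag_", col)
  # %in% returns FALSE (not NA) for missing values, so NA is recoded to 0 too
  df[[new_col]] <- as.integer(df[[col]] %in% target)
}
df
```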

Subtracting columns in a loop

I've got a data frame like this:
df:
A B C
1 1 2 3
2 2 2 4
3 2 2 3
I would like to subtract from each column the column before it (A - 0, B - A, C - B), so my result should look like this:
df:
A B C
1 1 1 1
2 2 0 2
3 2 0 1
I tried the following loop, but it didn't work.
for (i in 1:3) {
j <- data[,i+1] - data[,i]
}
Try this:
df - cbind(0, df[-ncol(df)])
# A B C
# 1 1 1 1
# 2 2 0 2
# 3 2 0 1
Data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
We can also subtract the data frame without its last column from the data frame without its first column, and assign the result back to every column except the first:
df[-1] <- df[-1]-df[-length(df)]
data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
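For comparison, the loop in the question fails for several reasons: it refers to data instead of df, it overwrites j on every iteration instead of storing the result, and on the last iteration data[, i+1] asks for a fourth column that does not exist. A minimal corrected loop, assuming the same df as above:

```r
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))

# Column A is compared with 0, so it is copied unchanged
out <- df
for (i in 2:ncol(df)) {
  # Read from the untouched df so each difference uses the original values
  out[, i] <- df[, i] - df[, i - 1]
}
out
```

Writing into a separate copy (out) matters: if the loop wrote into df directly, column C would be computed from the already-differenced column B.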

How do I create a simple table in R using for loops?

I was asked to create a table with three columns, A, B and C and eight rows. Column A must go 1, 1, 1, 1, 2, 2, 2, 2. Column B must alternate 1, 2, 1, 2, 1, 2, 1, 2. And column C must go 1, 1, 2, 2, 1, 1, 2, 2. I am able to produce the A column data fine, but don't know how to get B or C. This is the code I have so far:
dataSheet <- matrix(nrow = 0, ncol = 3)
colnames(dataSheet) <- c('A', 'B', 'C')
A <- 1
B <- 1
C <- 1
for (A in 1:4){
A=1
dataSheet <- rbind(dataSheet, c(A, B, C))
}
for (A in 5:8){
A=2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
This is a good excuse to get familiar with the rep() function: it handles this question easily, and many more complicated ones too if you're clever enough:
dt <- data.frame(A = rep(1:2, each = 4),
B = rep(1:2, times = 4),
C = rep(1:2, each = 2))
dt
#> A B C
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 1 2
#> 4 1 2 2
#> 5 2 1 1
#> 6 2 2 1
#> 7 2 1 2
#> 8 2 2 2
Created on 2019-01-26 by the reprex package (v0.2.1)
Simply use R's vectorization for this task:
A <- c(1, 1, 1, 1, 2, 2, 2, 2)
B <- c(1, 2, 1, 2, 1, 2, 1, 2) # or rep(1:2, 4)
C <- c(1, 1, 2, 2, 1, 1, 2, 2)
cbind(A,B,C)
Maybe something along the lines of the following will be acceptable to your professor. It starts from an empty dataSheet and builds the table row by row:
dataSheet <- matrix(nrow = 0, ncol = 3, dimnames = list(NULL, c('A', 'B', 'C')))
for (i in 1:8){
  A <- if(i <= 4) 1 else 2
  B <- if(i %% 2) 1 else 2
  C <- if(i %% 4 %in% c(1, 2)) 1 else 2
  dataSheet <- rbind(dataSheet, c(A, B, C))
}
dataSheet
#      A B C
#[1,] 1 1 1
#[2,] 1 2 1
#[3,] 1 1 2
#[4,] 1 2 2
#[5,] 2 1 1
#[6,] 2 2 1
#[7,] 2 1 2
#[8,] 2 2 2
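Another option, not used in the answers above, is base R's expand.grid(), which builds every combination of its arguments and varies the first argument fastest. A minimal sketch: listing B (fastest), then C, then A (slowest), and reordering the columns afterwards, reproduces the requested pattern:

```r
# expand.grid() varies its first argument fastest: B alternates every row,
# C every two rows, A every four rows; then put the columns in A, B, C order
tab <- expand.grid(B = 1:2, C = 1:2, A = 1:2)[, c("A", "B", "C")]
tab
```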

exchange two columns and remove the duplicates in a data frame using R

Here is an example to explain what I want to do. I have a data frame like:
X Y
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
I want to change it to another format:
X1 Y1 X2 Y2
1 1 1 1
1 2 2 1
1 3 3 1
......
Take two rows in the first table, say X=1, Y=2 and X=2, Y=1: their values are simply swapped. I want to put each such pair of rows into a single row, as shown in the second table, and then remove the duplicates, so the 'thin and long' table becomes 'short and fat'. I know how to do this with two for loops, but in R that takes forever. Can anyone suggest a faster way?
Here is a minimal example:
The original table is:
X Y
1 2
2 1
The transformed table I want is:
X1 Y1 X2 Y2
1 2 2 1
So rows in the first table that are swaps of each other are merged into one row of the second table, and the redundant row is removed.
Maybe the following base R code works. It keeps the rows with Y >= X and appends the same columns in reverse order:
dfout <- `names<-`(cbind(r <- subset(df,df$Y>=df$X),rev(r)),
c("X1","Y1","X2","Y2"))
which gives
> dfout
X1 Y1 X2 Y2
1 1 1 1 1
2 1 2 2 1
3 1 3 3 1
5 2 2 2 2
6 2 3 3 2
9 3 3 3 3
DATA
df <- structure(list(X = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Y = c(1, 2,
3, 1, 2, 3, 1, 2, 3)), class = "data.frame", row.names = c(NA,
-9L))
library(tidyverse)
df <- tibble(x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
             y1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3))
# keep one row of each swapped pair, then add the swapped columns
df <- df %>% filter(y1 >= x1) %>% mutate(x2 = y1, y2 = x1)
I think this does the trick.
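Both answers keep the rows with Y >= X, which assumes that every pair also appears in the other orientation. A minimal base R sketch that does not rely on that assumption (an alternative technique, not taken from the answers): sort each pair with pmin/pmax so that (1, 2) and (2, 1) become identical rows, drop duplicates with unique(), then add the swapped columns:

```r
df <- data.frame(X = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Y = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Canonical order (smaller value first): swapped rows become duplicates
canon <- unique(data.frame(X1 = pmin(df$X, df$Y), Y1 = pmax(df$X, df$Y)))

# The second pair is the first one with its values swapped
canon$X2 <- canon$Y1
canon$Y2 <- canon$X1
rownames(canon) <- NULL
canon
```

Because pmin and pmax are vectorized, this avoids the two nested for loops from the question entirely.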
