I am calculating split-half reliability for certain behavioral items in my dataset and first need to grab the mean of the first 2 non-NA values per respondent followed by the last two non-NA values for each person (each row). I know there are ways to do this using packages runner, zoo and others by column, but I've yet to find a solution within rows.
For context, I designed a survey in which items were randomized in order to reduce item-level effects. Participants saw 1/2 of a random subset of items from a particular measurement scale at one point in the survey and the other 1/2 at a different point. Therefore, each participant will have the same number of non-NA and NA values at each of the two time points.
For instance, say I have 8 items total. Data for persons 1, 2, and 3 at time point 1 reads:
x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA
The resulting new variables (avg1 and avg2) should read:
x1 x2 x3 x4 x5 x6 x7 x8 avg1 avg2
1 NA NA 2 NA 1 1 NA 1.5 1
NA 4 3 3 NA NA 4 NA 3.5 3.5
3 2 1 NA NA NA 3 NA 2.5 2
Any help is appreciated, thanks!
Here is one potential solution:
m <- as.matrix(read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA ",
header = TRUE))
# Only keep non-NA values
m2 <- t(apply(m,1,function(x) c(x[!is.na(x)])))
# Select the first two non-NA values
m3 <- m2[,1:2]
# Select the second-last and last non-NA values
m4 <- m2[,(ncol(m2)-1):(ncol(m2))]
# Bind the matrix to the mean of the first two and the mean of the last two non-NA values
cbind(m, "avg1" = rowMeans(m3), "avg2" = rowMeans(m4))
#> x1 x2 x3 x4 x5 x6 x7 x8 avg1 avg2
#> [1,] 1 NA NA 2 NA 1 1 NA 1.5 1.0
#> [2,] NA 4 3 3 NA NA 4 NA 3.5 3.5
#> [3,] 3 2 1 NA NA NA 3 NA 2.5 2.0
Created on 2022-03-11 by the reprex package (v2.0.1)
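The approach above relies on every row having the same number of non-NA values, which holds for the example (four per row). What follows is a minimal sketch that computes the two means row by row without that assumption. `first_last_means` is a hypothetical helper name, and `n = 2` is taken from the question.

```r
# Hypothetical helper: mean of the first n and last n non-NA values of a vector
first_last_means <- function(x, n = 2) {
  v <- x[!is.na(x)]              # drop NAs, keep the original order
  c(avg1 = mean(head(v, n)),     # first n non-NA values
    avg2 = mean(tail(v, n)))     # last n non-NA values
}

m <- as.matrix(read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA", header = TRUE))

# apply() over rows returns a 2 x nrow matrix, so transpose before binding
res <- cbind(m, t(apply(m, 1, first_last_means)))
res
```

If a row has fewer than `n` non-NA values, `head()` and `tail()` simply use what is there, and a row with none yields NaN, so it may be worth checking the counts first.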
I'm hoping that someone can help me :)
I have a data frame with about 1000 columns.
Within that, I have columns named like this:
X1,X2,X3,X4,X5,X6 etc... Y1,Y2,Y3,Y4,Y5,Y6 etc...
df <- data.frame("X1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"X2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"X3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"X4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"X5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"X6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"Y1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"Y2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"Y3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"Y4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"Y5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"Y6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"))
In certain columns, I want to replace "Yes" with 1 and "No" with 0, and replace anything else with NA.
I have tried this:
names = c("X","Y")
for (name in names){
try(
for (j in 1:6){
j <- toString(j)
colname <- paste(name , j, sep="")
df$colname <- gsub("Yes", as.integer(1), df$colname)
df$colname <- gsub("No", as.integer(0), df$colname)
})}
However, this is not working, throwing error message:
Error in `$<-.data.frame`(`*tmp*`, "colname", value = character(0)) : replacement has 0 rows, data has 13
My first question is: Why are the column names not referencing properly?
Second question is: How do I replace anything that's not a 0 or 1 in those columns with an "NA"?
This is possibly a really simple thing that I'm overlooking, but I can't quite figure out how to do it.
Any help would be greatly appreciated.
Many thanks in advance,
Rich
I wouldn't use a loop or gsub here, you can use this:
df[] <- lapply(df, function(x) x <- car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))
This iterates over each column in your dataframe and recodes the values as you want. This is also easier to expand if you get more values in the future.
If you only want certain columns, you can modify it like this:
df[, col_list] <- lapply(df[, col_list], function(x) x <- car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))
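Where col_list is the vector of the variables you want to change. You could grep for them using col_list <- grep('^[XY]', names(df), value = TRUE). Note that the pattern '^X|Y' would also match any name that merely contains a Y, because the ^ anchor applies only to the X alternative.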
Since your data has only 'Yes', 'No' and 'NA' values you can also directly replace them.
#Column numbers to replace
cols <- grep('^[XY]\\d+', names(df))
#Replace "NA" with real NA
df[cols][df[cols] == 'NA'] <- NA
#Replace "Yes" with 1
df[cols][df[cols] == 'Yes'] <- 1
#Replace "No" with 0
df[cols][df[cols] == 'No'] <- 0
#Convert the columns to their natural types (as.is = TRUE avoids creating factors).
df <- type.convert(df, as.is = TRUE)
df
# X1 X2 X3 X4 X5 X6 Y1 Y2 Y3 Y4 Y5 Y6
#1 1 1 1 1 1 1 1 1 1 1 1 1
#2 0 NA NA 0 NA NA 0 NA NA 0 NA NA
#3 1 NA NA 1 NA NA 1 NA NA 1 NA NA
#4 NA NA NA NA NA NA NA NA NA NA NA NA
#5 NA NA 1 NA NA 1 NA NA 1 NA NA 1
#6 NA 1 0 NA 1 0 NA 1 0 NA 1 0
#7 1 NA 1 1 NA 1 1 NA 1 1 NA 1
#8 0 NA NA 0 NA NA 0 NA NA 0 NA NA
#9 1 NA 1 1 NA 1 1 NA 1 1 NA 1
#10 NA NA NA NA NA NA NA NA NA NA NA NA
#11 NA 1 NA NA 1 NA NA 1 NA NA 1 NA
#12 NA NA NA NA NA NA NA NA NA NA NA NA
#13 NA NA 1 NA NA 1 NA NA 1 NA NA 1
If you are using R < 4.0.0, data.frame() creates factors from strings by default, so you first need to convert the columns to character.
df[] <- lapply(df, as.character)
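On the first question in the post (why the column names are not referenced properly): `df$colname` looks for a column literally named `colname`, because `$` does not evaluate the variable, whereas `df[[colname]]` does. Below is a minimal sketch of the original loop rewritten that way. The small data frame `toy` is made up for illustration and is not the asker's data.

```r
# Made-up example data; "NA" and "Maybe" are plain strings here
toy <- data.frame(X1 = c("Yes", "No", "NA"),
                  X2 = c("NA", "Yes", "No"),
                  Y1 = c("No", "Yes", "Maybe"),
                  Y2 = c("Yes", "Yes", "No"),
                  stringsAsFactors = FALSE)

for (name in c("X", "Y")) {
  for (j in 1:2) {
    colname <- paste0(name, j)
    old <- toy[[colname]]                 # [[ ]] evaluates colname; $ would not
    new <- rep(NA_integer_, length(old))  # anything that is not Yes/No stays NA
    new[which(old == "Yes")] <- 1L
    new[which(old == "No")]  <- 0L
    toy[[colname]] <- new
  }
}
toy
```

Starting from a vector of NA and filling in only the known values also covers the second question, since any unexpected string ends up as a real NA.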
I have a dataframe that looks like this:
x1 y1 z1 x2 y2 z2
1 6 7 8 5 4 10
2 7 8 9 6 5 11
3 8 9 10 7 6 12
4 9 10 11 8 7 13
5 10 11 12 9 8 14
6 11 12 13 10 9 15
Now I want to change the values in x1 and x2 according to this rule: every value in x1 or x2 that is greater than 8 should have 8 subtracted from it, and every value in x1 or x2 that is 8 or smaller should be replaced by NA. Additionally, if a value in x1 or x2 is replaced by NA, the corresponding values in y1/z1 or y2/z2 should also be set to NA. The dataframe should look like this:
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
The code to generate the dataframe
df1<-data.frame("x1"=6:11,"y1"=7:12,"z1"=8:13,"x2"=5:10,"y2"=4:9,"z2"=10:15)
We create two logical indexes, one for 'x1' and one for 'x2', and assign the values based on those indexes
i1 <- df1$x1 <=8 #x1 index
i2 <- df1$x2 <=8 #x2 index
nm1 <- grep("1$", names(df1)) #column index for suffix 1 in column names
nm2 <- grep("2$", names(df1)) #column index for suffix 2 in column names
df1[i1,nm1] <- NA #set the values for suffix 1 columns to NA
df1[i2, nm2] <- NA #set the values for suffix 2 columns to NA
df1[c('x1', 'x2')] <- df1[c('x1', 'x2')] - 8 #subtract 8 from the 'x' columns
df1
# x1 y1 z1 x2 y2 z2
#1 NA NA NA NA NA NA
#2 NA NA NA NA NA NA
#3 NA NA NA NA NA NA
#4 1 10 11 NA NA NA
#5 2 11 12 1 8 14
#6 3 12 13 2 9 15
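If there are more suffix groups than two, the same index-based idea can be written as a loop over the suffixes. This is only a sketch under the assumption that each group is identified by a trailing digit and that the controlling column is always named `x` plus that digit.

```r
df1 <- data.frame(x1 = 6:11, y1 = 7:12, z1 = 8:13,
                  x2 = 5:10, y2 = 4:9,  z2 = 10:15)

for (s in c("1", "2")) {
  xcol <- paste0("x", s)
  grp  <- grep(paste0(s, "$"), names(df1))  # all columns sharing this suffix
  drop <- df1[[xcol]] <= 8                  # rows that fail the x > 8 rule
  df1[drop, grp] <- NA                      # blank the whole group in those rows
  df1[[xcol]] <- df1[[xcol]] - 8            # NA - 8 stays NA
}
df1
```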
This can also be written with ifelse(): first apply the rule to x1 and x2, then set the matching y and z values to NA wherever x became NA.
# Apply the rule to x1 and x2: subtract 8 if greater than 8, otherwise NA
df1$x1 <- ifelse(df1$x1 > 8, df1$x1 - 8, NA)
df1$x2 <- ifelse(df1$x2 > 8, df1$x2 - 8, NA)
# Propagate the NAs to the y columns (x1 and x2 are already transformed, so test for NA)
df1$y1 <- ifelse(is.na(df1$x1), NA, df1$y1)
df1$y2 <- ifelse(is.na(df1$x2), NA, df1$y2)
# Propagate the NAs to the z columns
df1$z1 <- ifelse(is.na(df1$x1), NA, df1$z1)
df1$z2 <- ifelse(is.na(df1$x2), NA, df1$z2)
library(dplyr)
# Apply the rule to x1 and x2 (df1 is the data frame from the question)
df1[, c("x1", "x2")] <- sapply(df1[, c("x1", "x2")], function(x) ifelse(x > 8, x - 8, NA))
# Set y and z to NA wherever the matching x is NA
df1 %>%
  mutate(y1 = replace(y1, is.na(x1), NA)) %>%
  mutate(z1 = replace(z1, is.na(x1), NA)) %>%
  mutate(y2 = replace(y2, is.na(x2), NA)) %>%
  mutate(z2 = replace(z2, is.na(x2), NA))
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
I have a dataframe x with these values:
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA
A simple question: How do I get the highest value? (11)
Use max() with the na.rm argument set to TRUE:
dat <- read.table(text="
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA", header=TRUE)
Get the maximum:
max(dat, na.rm=TRUE)
[1] 11
To find the maximum of a single column, you might want to unlist it first:
max(unlist(myDataFrame$myColumn), na.rm = TRUE)
You could also write a column maximum function, colMax:
colMax <- function(data) sapply(data, max, na.rm = TRUE)
Use colMax function on sample data:
colMax(x)
# x1 x2 x3
# 5.0 9.0 11.0