Subtracting Columns Except When There is NA - r

I am trying to create a new variable that subtracts two columns only when neither column has NA, and is NA whenever one of the columns has NA. When I try to just subtract the columns, I only get a column of NA. For instance, I am writing the command:
d$x3 <- d$x2 - d$x1
When I use the command above, I get:
x1 x2 x3
1 3 NA
1 NA NA
NA 3 NA
NA NA NA
Based on some other posts online, I tried a workaround where I changed x1 to negative numbers and then used the rowSums command, but then I got this:
x3 <- rowSums(df[,c("x1","x2")], na.rm = TRUE)
x1 x2 x3
-1 3 2
-1 NA -1
NA 3 3
NA NA 0
What I am trying to produce is:
x1 x2 x3
1 3 2
1 NA NA
NA 3 NA
NA NA NA
Thanks for any help!
df <- read.table( text="x1 x2
1 3
1 NA
NA 3
NA NA", header = TRUE)
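For the desired output, plain subtraction should already be enough: R's arithmetic operators propagate NA, so any row with a missing operand gets NA. The sketch below uses the `df` from the post (the command in the question refers to `d`, which may be a different object). One likely cause of an all-NA result, and this is an assumption since the post doesn't show `str(d)`, is factor columns: `-` on factors returns NA with a warning. The `rowSums()` workaround also works once `na.rm` is left at its default `FALSE`; the extra column name `x3_rowsums` is hypothetical.

```r
df <- read.table(text = "x1 x2
1 3
1 NA
NA 3
NA NA", header = TRUE)

# Arithmetic propagates NA: a row with any missing operand yields NA
df$x3 <- df$x2 - df$x1

# rowSums() gives the same result with na.rm = FALSE (the default);
# na.rm = TRUE is what turned the NAs into partial sums in the workaround
df$x3_rowsums <- rowSums(cbind(df$x2, -df$x1))

# If x1/x2 were factors, convert them to numbers first, e.g.:
# df$x1 <- as.numeric(as.character(df$x1))
df
```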

Related

How to calculate mean of first 2 and last 2 non-NAs by row in R?

I am calculating split-half reliability for certain behavioral items in my dataset and first need to grab the mean of the first two non-NA values per respondent, followed by the mean of the last two non-NA values for each person (each row). I know there are ways to do this by column using packages such as runner and zoo, but I have yet to find a solution within rows.
For context, I designed a survey in which items were randomized in order to reduce item-level effects. Participants saw half of a random subset of items from a particular measurement scale at one point in the survey and the other half at a different point. Therefore, each participant will have the same number of non-NA as NA values at each of the two time points.
For instance, say I have 8 items total. Data for persons 1, 2, and 3 at time point 1 reads:
x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA
The resulting new variables (avg1 and avg2) should read:
x1 x2 x3 x4 x5 x6 x7 x8 avg1 avg2
1 NA NA 2 NA 1 1 NA 1.5 1
NA 4 3 3 NA NA 4 NA 3.5 3.5
3 2 1 NA NA NA 3 NA 2.5 2
Any help is appreciated, thanks!
Here is one potential solution:
m <- as.matrix(read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA ",
header = TRUE))
# Only keep non-NA values; apply() returns a matrix here only because
# every row has the same number of non-NA values (4)
m2 <- t(apply(m, 1, function(x) x[!is.na(x)]))
# Select the first two non-NA values
m3 <- m2[,1:2]
# Select the second-last and last non-NA values
m4 <- m2[,(ncol(m2)-1):(ncol(m2))]
# Bind the matrix to the mean of the first two and the mean of the last two non-NA values
cbind(m, "avg1" = rowMeans(m3), "avg2" = rowMeans(m4))
#> x1 x2 x3 x4 x5 x6 x7 x8 avg1 avg2
#> [1,] 1 NA NA 2 NA 1 1 NA 1.5 1.0
#> [2,] NA 4 3 3 NA NA 4 NA 3.5 3.5
#> [3,] 3 2 1 NA NA NA 3 NA 2.5 2.0
Created on 2022-03-11 by the reprex package (v2.0.1)
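The `t(apply(...))` step above yields a matrix only because every row has exactly four non-NA values; if the counts differ, `apply()` returns a list instead and the later steps fail. A minimal sketch that works row by row regardless of the count, using a hypothetical helper `first_last_means()` (with a hypothetical parameter `k`) built on base `head()` and `tail()`:

```r
m <- as.matrix(read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8
1 NA NA 2 NA 1 1 NA
NA 4 3 3 NA NA 4 NA
3 2 1 NA NA NA 3 NA", header = TRUE))

# Hypothetical helper: mean of the first k and of the last k non-NA values in one row
first_last_means <- function(x, k = 2) {
  v <- x[!is.na(x)]  # drop NAs, keeping the original column order
  c(avg1 = mean(head(v, k)), avg2 = mean(tail(v, k)))
}

# Each row returns a length-2 vector, so apply() always gives a 2-row matrix
res <- cbind(m, t(apply(m, 1, first_last_means)))
res

# A row with five non-NA values works too
first_last_means(c(5, NA, 1, 2, NA, 7, 9))  # avg1 = 3, avg2 = 8
```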

Combining gsub() and using variable names as columns in R [duplicate]

This question already has answers here:
replace Yes, No to 1, 0 in multiple columns in r [duplicate]
I'm hoping that someone can help me :)
I have a data frame with about 1000 columns.
Within that, I have columns named like this:
X1,X2,X3,X4,X5,X6 etc... Y1,Y2,Y3,Y4,Y5,Y6 etc...
df <- data.frame("X1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"X2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"X3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"X4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"X5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"X6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"Y1" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"Y2" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"Y3" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"),
"Y4" = c("Yes","No","Yes","NA","NA","NA","Yes","No","Yes","NA","NA","NA","NA"),
"Y5" = c("Yes","NA","NA","NA","NA","Yes","NA","NA","NA","NA","Yes","NA","NA"),
"Y6" = c("Yes","NA","NA","NA","Yes","No","Yes","NA","Yes","NA","NA","NA", "Yes"))
In certain columns, I want to replace "Yes" with 1 and "No" with 0, and replace anything else with NA.
I have tried this:
names = c("X","Y")
for (name in names){
try(
for (j in 1:6){
j <- toString(j)
colname <- paste(name , j, sep="")
df$colname <- gsub("Yes", as.integer(1), df$colname)
df$colname <- gsub("No", as.integer(0), df$colname)
})}
However, this is not working, throwing error message:
Error in `$<-.data.frame`(`*tmp*`, "colname", value = character(0)) : replacement has 0 rows, data has 13
My first question is: Why are the column names not referencing properly?
Second question is: How do I replace anything that's not a 0 or 1 in those columns with an "NA"?
This is possibly a really simple thing that I'm overlooking, but I can't quite figure out how to do it.
Any help would be greatly appreciated.
Many thanks in advance,
Rich
I wouldn't use a loop or gsub here; you can use this instead (it needs the car package installed):
df[] <- lapply(df, function(x) car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))
This iterates over each column in your data frame and recodes the values as you want. It is also easier to extend if you get more values in the future.
If you only want certain columns, you can modify it like this:
df[col_list] <- lapply(df[col_list], function(x) car::recode(x, "'Yes'=1; 'No'=0; 'NA'=NA"))
where col_list is the vector of the variables you want to change. You could grep for them using col_list <- grep('^[XY]\\d+$', names(df), value = TRUE)
Since your data has only 'Yes', 'No' and 'NA' values, you can also replace them directly.
#Column numbers to replace
cols <- grep('^[XY]\\d+', names(df))
#Replace "NA" with real NA
df[cols][df[cols] == 'NA'] <- NA
#Replace "Yes" with 1
df[cols][df[cols] == 'Yes'] <- 1
#Replace "No" with 0
df[cols][df[cols] == 'No'] <- 0
# Convert the character columns to numeric
df <- type.convert(df, as.is = TRUE)
df
# X1 X2 X3 X4 X5 X6 Y1 Y2 Y3 Y4 Y5 Y6
#1 1 1 1 1 1 1 1 1 1 1 1 1
#2 0 NA NA 0 NA NA 0 NA NA 0 NA NA
#3 1 NA NA 1 NA NA 1 NA NA 1 NA NA
#4 NA NA NA NA NA NA NA NA NA NA NA NA
#5 NA NA 1 NA NA 1 NA NA 1 NA NA 1
#6 NA 1 0 NA 1 0 NA 1 0 NA 1 0
#7 1 NA 1 1 NA 1 1 NA 1 1 NA 1
#8 0 NA NA 0 NA NA 0 NA NA 0 NA NA
#9 1 NA 1 1 NA 1 1 NA 1 1 NA 1
#10 NA NA NA NA NA NA NA NA NA NA NA NA
#11 NA 1 NA NA 1 NA NA 1 NA NA 1 NA
#12 NA NA NA NA NA NA NA NA NA NA NA NA
#13 NA NA 1 NA NA 1 NA NA 1 NA NA 1
If you are using R < 4.0.0, data.frame() creates factor columns by default (stringsAsFactors = TRUE), so you first need to convert the data to character:
df[] <- lapply(df, as.character)
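On the first question: `$` does not evaluate what follows it, so `df$colname` refers to a column literally named "colname". That column doesn't exist, so `gsub()` receives NULL and returns `character(0)`, which is the zero-row replacement in the error. `df[[colname]]` uses the string stored in `colname` instead. A minimal sketch of the original loop with `[[`, on a small hypothetical data frame; it swaps `gsub()` for `match()` so that anything other than "Yes"/"No" becomes NA, which also covers the second question:

```r
# Small hypothetical data frame standing in for the 13-row one in the question
df <- data.frame(X1 = c("Yes", "No", "NA"), X2 = c("No", "Yes", "maybe"),
                 Y1 = c("NA", "Yes", "No"), Y2 = c("Yes", "Yes", "No"),
                 stringsAsFactors = FALSE)

for (name in c("X", "Y")) {
  for (j in 1:2) {
    colname <- paste0(name, j)  # e.g. "X1"
    # [[ ]] evaluates colname; df$colname would look for a column named "colname".
    # match() gives 1 for "Yes", 2 for "No" and NA otherwise, so indexing
    # c(1L, 0L) with it maps Yes -> 1, No -> 0 and everything else -> NA
    df[[colname]] <- c(1L, 0L)[match(df[[colname]], c("Yes", "No"))]
  }
}
df
```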

Create new variables based on list, then populate based on whether row contains variable name [duplicate]

This question already has answers here:
Add empty columns to a dataframe with specified names from a vector
I have some data:
df = data.frame(matrix(rnorm(20), nrow=10))
X1 X2
1 1.17596402 0.06138821
2 -1.76439330 1.03674803
3 -0.39069424 0.61616793
4 0.68375346 0.27435354
5 0.27426476 -1.71226109
6 -0.06153577 1.14514453
7 -0.37067621 -0.61243104
8 1.11107852 0.47788971
9 -1.73036658 0.31545148
10 -1.83155718 -0.14433432
I want to add a new variable for every element of a vector whose contents can change:
list = c("a","b","c")
The result should be:
X1 X2 a b c
1 1.17596402 0.06138821 NA NA NA
2 -1.76439330 1.03674803 NA NA NA
3 -0.39069424 0.61616793 NA NA NA
4 0.68375346 0.27435354 NA NA NA
5 0.27426476 -1.71226109 NA NA NA
6 -0.06153577 1.14514453 NA NA NA
7 -0.37067621 -0.61243104 NA NA NA
8 1.11107852 0.47788971 NA NA NA
9 -1.73036658 0.31545148 NA NA NA
10 -1.83155718 -0.14433432 NA NA NA
I can do this using the suggestion below:
df[list] <- NA
But now, I want to search every row for the variable name as a value and flag if it contains that value. For example:
X1 X2 a b c
1 a b 1 1 0
2 a c 1 0 1
So the code would search for "a" in all columns and flag if any column contains "a". How do I do this?
You can use
df[list] <- NA
The result:
X1 X2 a b c
1 -2.07205164 -0.93585363 NA NA NA
2 1.11014587 0.23468072 NA NA NA
3 -1.17909665 0.04741478 NA NA NA
4 0.23955056 1.02029880 NA NA NA
5 -0.79212220 -1.13485661 NA NA NA
6 -0.57571547 0.33069641 NA NA NA
7 -0.70063920 -0.17251563 NA NA NA
8 1.90625189 0.30277177 NA NA NA
9 0.09029121 -0.72104778 NA NA NA
10 -1.36324313 -1.48041873 NA NA NA
If you want to add only the variables that are not present in df, you can use:
df[list[!list %in% names(df)]] <- NA
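The answers above cover adding the empty columns but not the flagging step. A minimal sketch: comparing the searched columns against each name yields a logical matrix, and `rowSums()` counts the matches per row. The two-row data frame is hypothetical (taken from the example in the question), and the vector is called `vars` rather than `list` so it doesn't mask base `list()`:

```r
# Hypothetical data matching the example in the question
df <- data.frame(X1 = c("a", "a"), X2 = c("b", "c"), stringsAsFactors = FALSE)
vars <- c("a", "b", "c")

for (v in vars) {
  # df[c("X1", "X2")] == v is a logical matrix; a row sum > 0 means
  # that at least one searched column in the row equals v
  df[[v]] <- as.integer(rowSums(df[c("X1", "X2")] == v, na.rm = TRUE) > 0)
}
df
```

Only `X1` and `X2` are searched, so the newly added flag columns never match each other.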

Replacing changing values columnwise in a DF

I have a dataframe that looks like this:
x1 y1 z1 x2 y2 z2
1 6 7 8 5 4 10
2 7 8 9 6 5 11
3 8 9 10 7 6 12
4 9 10 11 8 7 13
5 10 11 12 9 8 14
6 11 12 13 10 9 15
Now I want to change the values in x1 and x2 according to this rule: every value in x1 or x2 that is greater than 8 should have 8 subtracted from it, and every value in x1 or x2 that is 8 or smaller should be replaced by NA. Additionally, if a value in x1 or x2 is replaced by NA, the corresponding value in y1/y2 and z1/z2 should also be set to NA. The data frame should look like this:
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
The code to generate the dataframe
df1<-data.frame("x1"=6:11,"y1"=7:12,"z1"=8:13,"x2"=5:10,"y2"=4:9,"z2"=10:15)
We create two logical indexes based on 'x1' and 'x2' and assign the values using those indexes:
i1 <- df1$x1 <=8 #x1 index
i2 <- df1$x2 <=8 #x2 index
nm1 <- grep("1$", names(df1)) #column index for suffix 1 in column names
nm2 <- grep("2$", names(df1)) #column index for suffix 2 in column names
df1[i1,nm1] <- NA #set the values for suffix 1 columns to NA
df1[i2, nm2] <- NA #set the values for suffix 2 columns to NA
df1[c('x1', 'x2')] <- df1[c('x1', 'x2')] - 8 #subtract 8 from the 'x' columns
df1
# x1 y1 z1 x2 y2 z2
#1 NA NA NA NA NA NA
#2 NA NA NA NA NA NA
#3 NA NA NA NA NA NA
#4 1 10 11 NA NA NA
#5 2 11 12 1 8 14
#6 3 12 13 2 9 15
We have a condition on two variables, followed by a series of reactions in the rows where that condition does not hold.
# Apply the condition to x1 and x2: subtract 8 if greater than 8, otherwise NA
df1$x1 <- ifelse(df1$x1 > 8, df1$x1 - 8, NA)
df1$x2 <- ifelse(df1$x2 > 8, df1$x2 - 8, NA)
# Propagate the NAs in x1/x2 to the matching y columns
df1$y1 <- ifelse(is.na(df1$x1), NA, df1$y1)
df1$y2 <- ifelse(is.na(df1$x2), NA, df1$y2)
# ... and to the matching z columns
df1$z1 <- ifelse(is.na(df1$x1), NA, df1$z1)
df1$z2 <- ifelse(is.na(df1$x2), NA, df1$z2)
library(dplyr)
# Recode the x columns: subtract 8 if greater than 8, otherwise NA
df1[, c("x1", "x2")] <- sapply(df1[, c("x1", "x2")], function(x) ifelse(x > 8, x - 8, NA))
df1 %>%
  mutate(y1 = replace(y1, is.na(x1), NA)) %>%
  mutate(z1 = replace(z1, is.na(x1), NA)) %>%
  mutate(y2 = replace(y2, is.na(x2), NA)) %>%
  mutate(z2 = replace(z2, is.na(x2), NA))
x1 y1 z1 x2 y2 z2
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 1 10 11 NA NA NA
5 2 11 12 1 8 14
6 3 12 13 2 9 15
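The logical-index approach from the first answer extends to any number of numbered groups by looping over the suffixes instead of writing one line per group. A sketch, assuming every group is named x/y/z followed by the same suffix; `suffixes`, `group` and `to_na` are illustrative names:

```r
df1 <- data.frame(x1 = 6:11, y1 = 7:12, z1 = 8:13, x2 = 5:10, y2 = 4:9, z2 = 10:15)

# Group suffixes taken from the column names: "1", "2"
suffixes <- unique(sub("^[xyz]", "", names(df1)))

for (s in suffixes) {
  x <- paste0("x", s)                   # e.g. "x1"
  group <- paste0(c("x", "y", "z"), s)  # e.g. c("x1", "y1", "z1")
  to_na <- df1[[x]] <= 8                # rows whose x value is 8 or smaller
  df1[to_na, group] <- NA               # set the whole x/y/z group to NA there
  df1[[x]] <- df1[[x]] - 8              # NA stays NA; the other x values drop by 8
}
df1
```

The index is computed before anything is modified, which is why the order of the two assignments inside the loop does not matter for the result.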

How to find highest value in a data frame?

I have a data frame x with these values:
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA
A simple question: How do I get the highest value? (11)
Use max() with the na.rm argument set to TRUE:
dat <- read.table(text="
x1 x2 x3
1 NA 4 1
2 NA 3 NA
3 4 NA 2
4 NA 1 11
5 NA 2 NA
6 5 NA 1
7 5 9 NA
8 NA 2 NA", header=TRUE)
Get the maximum:
max(dat, na.rm=TRUE)
[1] 11
To find the maximum of a single column, you might want to unlist it first:
max(unlist(myDataFrame$myColumn), na.rm = TRUE)
You could write a column maximum function, colMax:
colMax <- function(data) sapply(data, max, na.rm = TRUE)
Use the colMax function on the sample data:
colMax(x)
# x1 x2 x3
# 5.0 9.0 11.0
