I want to replace the values of columns by NA if the sum of their rows is equal to 0. Imagine the following columns:
a b
0 0
1 5
2 8
3 7
0 0
5 8
I would like to replace these by:
a b
NA NA
1 5
2 8
3 7
NA NA
5 8
I've been looking for answers on many pages but have not found any solution.
Here is what I have tried so far:
df[, 31:36][df[, 31:36] == 0] <- NA  # df is my data frame; 31:36 are the columns I want to apply the replacement to
This replaces every value equal to 0 with NA, not just the values in rows whose sum is 0.
I've also tried other alternatives using rowSums() but have not found a solution.
Any help would be greatly appreciated.
Thanks
How about this?
a <- df[, 1]
b <- df[, 2]
# Work out which rows sum to 0 once, before either column is modified
zero_sum <- a + b == 0
a[zero_sum] <- NA
b[zero_sum] <- NA
df[, 1] <- a
df[, 2] <- b
The check has to be computed before any values are replaced; otherwise, when you come to the second column, you would be adding NA + 0, which equals NA, not 0.
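For more than two columns, the same idea can be written with rowSums(). This is only a minimal sketch: the small data frame is a stand-in for the real one (where cols would be 31:36), and it assumes the target columns contain no NAs to begin with.

```r
# Stand-in for the real data; in the question the target columns are 31:36
df <- data.frame(a = c(0, 1, 2, 3, 0, 5),
                 b = c(0, 5, 8, 7, 0, 8))
cols <- c("a", "b")

# TRUE for rows whose target columns sum to 0 (computed before any replacement)
zero_rows <- rowSums(df[cols]) == 0

# Replace the target columns with NA in those rows only
df[zero_rows, cols] <- NA
df
```

Note that a row such as -1, 1 also sums to 0, so if negative values are possible a test like rowSums(df[cols] != 0) == 0 may be closer to what is wanted.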
An idiomatic way of doing this using dplyr would be:
library(dplyr)
tb <- tibble(
a = c(0, 1:3, 0, 5),
b = c(0, 5, 8, 7, 0, 8)
)
tb <- tb %>%
# creates a "rowsum" column storing the sum of columns 1:2
mutate(rowsum = rowSums(.[1:2])) %>%
# applies, to columns 1:2, a function that puts NA when the sum of the rows is 0
mutate_at(1:2, ~ ifelse(rowsum == 0, NA, .)) %>%
# removes rowsum
select(-rowsum)
Of course you could replace 1:2 with 31:36 when applying the code to your actual table.
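With dplyr 1.0 or later, the same pipeline could probably be written with across() instead of mutate_at(). A sketch under that assumption; the names target, zero_sum and tb2 are made up for illustration:

```r
library(dplyr)

tb <- tibble(
  a = c(0, 1:3, 0, 5),
  b = c(0, 5, 8, 7, 0, 8)
)
target <- c("a", "b")  # hypothetical: the columns to check and blank out

tb2 <- tb %>%
  # flag rows whose target columns sum to 0
  mutate(zero_sum = rowSums(across(all_of(target))) == 0) %>%
  # put NA in the target columns of the flagged rows
  mutate(across(all_of(target), ~ ifelse(zero_sum, NA, .x))) %>%
  # drop the helper column
  select(-zero_sum)
tb2
```

Selecting by name with all_of() avoids depending on column positions such as 31:36, which shift if columns are added or removed.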
I have a large dataframe (df) containing 500+ rows, 50+ columns/variables but only want to target specific variables.
targ_vars <- c("all3a1", "3a1_arc",
"all3b1", "3b1_arc",
"all3c1", "3c1_arc")
The vector above contains the variables which have frequency data (i.e. multiple rows with 1,2,3 etc.)
I want to add a new count column in the original large dataframe (df) which contains the row sum of any non-NA value specifically for those select variables in "targ_vars".
Again, I'm not trying to add the value of the actual frequency data across each of those variables, but more so just a count of the non-NA values per row (i.e. 1,2,NA,7,NA,1 = total row count of 4 non-NA).
I've gotten as far as this:
df <- df %>%
select(targ_vars) %>%
mutate(targ_var_count = rowSums(!is.na(.), na.rm = TRUE))
The problem is I'm not sure how to "deselect" the variables I used to run the mutate calculation. The line above would result in overwriting the entire original dataframe (df) containing 50+ columns/vars, and placing back only the selected 6 variables in (targ_vars) plus the new (targ_var_count) variable that mutate calculated.
Essentially, I just want to focus on that last mutate line, and plop that new count column back into the original (df).
I tried something like the one below but it ended up giving me a list when I call "df$targcount" instead of just 1 rowSum column:
df$targcount <- df %>%
select(targ_vars) %>%
mutate(targcount = rowSums(!is.na(.), na.rm = TRUE))
Any help/tips would be appreciated.
You could use dplyr::across to get the count of non-NA values for just your targ_vars columns.
Using some fake random example data:
set.seed(123)
dat <- data.frame(
a = sample(c(0, NA), 10, replace = TRUE),
b = sample(c(0, NA), 10, replace = TRUE),
c = sample(c(0, NA), 10, replace = TRUE),
d = sample(c(0, NA), 10, replace = TRUE)
)
targ_vars <- c("c", "d")
library(dplyr, warn.conflicts = FALSE)
dat %>%
mutate(targcount = rowSums(across(all_of(targ_vars), ~ !is.na(.x))))
#> a b c d targcount
#> 1 0 NA 0 0 2
#> 2 0 NA NA NA 0
#> 3 0 NA 0 0 2
#> 4 NA 0 0 NA 1
#> 5 0 NA 0 NA 1
#> 6 NA 0 0 0 2
#> 7 NA NA NA 0 1
#> 8 NA 0 NA 0 1
#> 9 0 0 0 0 2
#> 10 0 0 NA NA 0
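The attempt in the question produced a list because mutate() returns a whole data frame, and that data frame was then stored in df$targcount. In base R the count can be assigned directly. A minimal sketch with made-up data (dat and its values are illustrative only):

```r
# Illustrative data: only b and c are counted
dat <- data.frame(a = c(1, NA, 3),
                  b = c(NA, NA, 2),
                  c = c(5, NA, NA))
targ_vars <- c("b", "c")

# !is.na() gives a logical matrix; rowSums() counts the TRUEs per row
dat$targcount <- rowSums(!is.na(dat[targ_vars]))
dat
```

All other columns of dat are left untouched, so nothing needs to be "deselected" afterwards.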
I have a data frame in R that looks like
1 3 NULL,
2 NULL 5,
NULL NULL 9
I want to iterate through each row and add the two numbers that are present. If there aren't two numbers present, I want to throw an error. How do I refer to specific rows and cells in R? To iterate through the rows I have a for loop. Sorry, I'm not sure how to format a matrix above.
for(i in 1:nrow(df))
Data:
df <- data.frame(
v1 = c(1, 2, NA),
v2 = c(3, NA, NA),
v3 = c(NA, 5, 9)
)
Use rowSums:
df$sum <- rowSums(df, na.rm = TRUE)
Result:
df
v1 v2 v3 sum
1 1 3 NA 4
2 2 NA 5 7
3 NA NA 9 9
If you do need a for loop:
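cols <- c("v1", "v2", "v3")  # sum only the original columns, not the "sum" column itself
df$sum <- NA_real_           # create the result column before the loop
for (i in seq_len(nrow(df))) {
df$sum[i] <- rowSums(df[i, cols], na.rm = TRUE)
}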
If you have something with NULL you can make it a data.frame, but that will make the columns with NULL a character vector. You have to convert those to numeric, which will then introduce NA for NULL.
rowSums will then create the sum you want.
df <- read.table(text=
"
a b c
1 3 NULL
2 NULL 5
NULL NULL 9
", header = TRUE, stringsAsFactors = FALSE)
# make columns numeric; this turns each NULL string into NA (with a coercion warning)
df <- data.frame(lapply(df, as.numeric))
cbind(df, sum = rowSums(df, na.rm = TRUE))
# a b c sum
# 1 1 3 NA 4
# 2 2 NA 5 7
# 3 NA NA 9 9
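Neither approach above raises the error the question asks for when a row does not hold two numbers. One way to add that check is to count the non-NA cells per row first. This is a sketch only; sum_pairs is a hypothetical helper name:

```r
df <- data.frame(
  v1 = c(1, 2, NA),
  v2 = c(3, NA, NA),
  v3 = c(NA, 5, 9)
)

# Hypothetical helper: sums each row, but stops if any row
# does not contain exactly two non-NA values
sum_pairs <- function(d) {
  n_present <- rowSums(!is.na(d))
  bad <- which(n_present != 2)
  if (length(bad) > 0) {
    stop("rows without exactly two numbers: ", paste(bad, collapse = ", "))
  }
  rowSums(d, na.rm = TRUE)
}

# Rows 1 and 2 each hold two numbers, so this returns their sums
sum_pairs(df[1:2, ])

# Row 3 holds a single number, so the full data frame triggers the error;
# tryCatch() captures the message instead of halting the script
result <- tryCatch(sum_pairs(df), error = function(e) conditionMessage(e))
result
```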
I have the following situation:
df1
a b c d
1 2 3 4
df2
a c
5 6
The result I want is to fill up the second data.frame with the missing columns from df1 and fill them with zeros. So the result should be:
df3
a b c d
5 0 6 0
The data frames are quite big, which is why an automated way of doing this would be great.
We can use setdiff to find out columns which are not present in df2 and assign the value 0 to those columns.
df2[setdiff(names(df1), names(df2))] <- 0
# a c b d
#1 5 6 0 0
If we want to maintain the same order of columns as in df1 we can later do
df2[names(df1)]
# a b c d
#1 5 0 6 0
There's probably a more elegant solution, but I think this works for your situation.
If you're not too fussed about mixing your workflow up with dplyr and data.table syntax, you can use setdiff() to identify non-matching column names, and use data.table syntax to create those zero-value columns efficiently without using loops or apply() functions. Once you've made sure this works for all the possible situations, you can wrap it in a function and scale this across more datasets.
library(magrittr)  # provides the %>% pipe used below
df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)
# Variables in df1 but not in df2
diff_vars <- dplyr::setdiff(names(df1),names(df2))
df2 %>%
data.table::data.table() %>%
.[,c(diff_vars):=0] %>%
tibble::as_tibble() # Can choose to keep this in data.table
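Wrapping the setdiff() approach in a function, as suggested above, might look like the following base R sketch. The function name add_missing_cols and its arguments are hypothetical:

```r
# Hypothetical helper: adds to `target` every column that `template` has
# but `target` lacks, filled with `fill`, and returns the columns in
# `template` order
add_missing_cols <- function(target, template, fill = 0) {
  missing <- setdiff(names(template), names(target))
  if (length(missing) > 0) {
    target[missing] <- fill
  }
  # Note: columns of `target` that are not in `template` are dropped here
  target[names(template)]
}

df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

df3 <- add_missing_cols(df2, df1)
df3
```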
df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)
library(tidyverse)
right_join(df1, df2)
a b c d
1 5 NA 6 NA
You'll then have to change the NAs to 0.
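That last step can be done with logical indexing on the joined result. A minimal sketch; the by argument is spelled out here only to silence the join message, and res is an illustrative name:

```r
library(dplyr)

df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

# Keep the rows of df2 and pick up the extra columns of df1 (as NA)
res <- right_join(df1, df2, by = c("a", "c"))

# Replace every NA in the result with 0
res[is.na(res)] <- 0
res
```

Be aware that this also turns any NA that was already present in df2 into 0, which may or may not be wanted.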
I am looking to do the following in a more elegant manner in R. I believe there is a way but just can't wrap my head around it. The problem is as follows.
I have a df which contains NAs. I want to turn the NAs into zeros for any group in A that has at least one non-NA value in B; if every value in the group is NA (so the group's sum is NA), they should stay NA. The example below should make it clear.
A<-c("A", "A", "A", "A",
"B","B","B","B",
"C","C","C","C")
B<-c(1,NA,NA,1,NA,NA,NA,NA,2,1,2,3)
data<-data.frame(A,B)
This is what the data looks like:
A B
1 A 1
2 A NA
3 A NA
4 A 1
5 B NA
6 B NA
7 B NA
8 B NA
9 C 2
10 C 1
11 C 2
12 C 3
I am looking to get the following result:
A B
1 A 1
2 A 0
3 A 0
4 A 1
5 B NA
6 B NA
7 B NA
8 B NA
9 C 2
10 C 1
11 C 2
12 C 3
I know I can use an inner join by creating a table first and then writing an IF statement based on that table, but I was wondering if there is a way to do it in one or two lines of code in R.
Following is the solution related to the inner join I was referring to
sum_NA <- function(x) if(all(is.na(x))) NA_integer_ else sum(x, na.rm=TRUE)
data2 <- data %>% group_by(A) %>% summarize(x = sum_NA(B), Y =
ifelse(is.na(x), TRUE, FALSE))
data2
data2_1 <- right_join(data, data2, by = "A")
data <- mutate(data2_1, B = ifelse(Y == FALSE & is.na(B), 0,B))
data <- select(data, - Y,-x)
data
Maybe a solution like this would work:
data$B[is.na(data$B) & data$A %in% unique(na.omit(data)$A)] <- 0
Here you're asking:
if B is NA
if A is within letters that have non-NA values
Then make those values 0.
Or similarly, with ifelse():
data$B <- ifelse(is.na(data$B) & data$A %in% unique(na.omit(data)$A), 0, data$B)
or, with dplyr:
library(dplyr)
data %>%
mutate(B=ifelse(is.na(B) & A %in% unique(na.omit(data)$A), 0, B))
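A different way to express the same rule is to work group by group with base R's ave(). This is a sketch rather than a drop-in replacement; fill_group is a hypothetical helper name:

```r
A <- c("A", "A", "A", "A",
       "B", "B", "B", "B",
       "C", "C", "C", "C")
B <- c(1, NA, NA, 1, NA, NA, NA, NA, 2, 1, 2, 3)
data <- data.frame(A, B)

# Hypothetical helper: leave an all-NA group alone,
# otherwise turn its NAs into 0
fill_group <- function(x) {
  if (all(is.na(x))) x else replace(x, is.na(x), 0)
}

# ave() applies fill_group to B within each level of A
# and returns the values in the original row order
data$B <- ave(data$B, data$A, FUN = fill_group)
data
```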
I ran this code to calculate log returns from a data frame "df" and created another data frame "df2":
df2 <- as.data.frame(sapply(df[2:22], function(x) diff(log(x))))
My data frame "df2" now has 21 columns (INDEX, S1, S2, ..., S20) and 4321 rows. I would like to replace zero values with the value from the row above in the same column.
E.g. column 1 / row 100 has a "0" stored as value. I want to have the value from column 1 / row 99 to be copied to row 100.
How can I do it in a simple way? Since I am quite new to R, I try to improve my knowledge. Any kind of help is highly appreciated!
If a value is 0 then replace this value with the lagged value (the preceding value) of the same column.
library(dplyr)
library(tidyr)
library(purrr)
df$col <- ifelse(df$col == 0, lag(df$col), df$col)
For columns with consecutive zeros (as per your comment), replace 0 with NA, then fill down (i.e. replace all NAs with the preceding non-NA value):
e.g.
df <- data.frame(col = c(1, 1, 0, 0, 0, 3, 3, 0, 1, 0))
df$col <- na_if(df$col, 0)
df
col
1 1
2 1
3 NA
4 NA
5 NA
6 3
7 3
8 NA
9 1
10 NA
fill(df, col, .direction = "down")
col
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 1
10 1
To do this for all your columns, first replace 0s with NAs across every column. The code below uses mutate() with across() from dplyr for that step, because dmap has moved from purrr to the purrrlyr package. Then use fill to fill in the NAs with the preceding value in each column. col1:col3 refers to the range of columns that you want to apply fill to (i.e. columns 1-3); change the column names to the ones in your data frame.
df <- df %>% mutate(across(everything(), ~ na_if(.x, 0)))
fill(df, col1:col3, .direction = "down")
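Put together on a small made-up data frame, the two steps might look like this. It is a sketch assuming dplyr 1.0 or later and tidyr; the column names and values are illustrative only:

```r
library(dplyr)
library(tidyr)

# Made-up data with single, consecutive and leading zeros
df <- data.frame(
  col1 = c(1, 0, 0, 3),
  col2 = c(0, 2, 0, 0),
  col3 = c(5, 0, 6, 0)
)

filled <- df %>%
  # step 1: turn every 0 into NA
  mutate(across(everything(), ~ na_if(.x, 0))) %>%
  # step 2: carry the last non-NA value downwards in each column
  fill(everything(), .direction = "down")
filled
```

A zero in the first row (col2 here) has no preceding value to copy, so it remains NA after filling and needs separate handling if that matters.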