I have a data frame in R:
a b c d e
1 2 3 23 1
4 5 6 -Inf 2
7 8 9 2 8
10 11 12 -Inf NaN
I'd like to replace each value in column e with NA whenever the corresponding value in column d is -Inf, like this:
a b c d e
1 2 3 23 1
4 5 6 -Inf NA
7 8 9 2 8
10 11 12 -Inf NA
Any help is appreciated. I haven't been able to do it without loops, and it's taking a long time on the full data frame.
ifelse is vectorized, so we can use it without a loop:
dat$e <- ifelse(dat$d == -Inf, NA, dat$e)
DATA
dat <- read.table(text = "a b c d e
1 2 3 23 1
4 5 6 -Inf 2
7 8 9 2 8
10 11 12 -Inf NaN", header = TRUE)
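For a single replacement like this, plain logical subsetting is another loop-free option. A minimal base R sketch, assuming the same dat as above; which() drops any NA comparisons, in case d itself contains NA:

```r
dat <- read.table(text = "a b c d e
1 2 3 23 1
4 5 6 -Inf 2
7 8 9 2 8
10 11 12 -Inf NaN", header = TRUE)

# Overwrite e only in the rows where d is exactly -Inf
dat$e[which(dat$d == -Inf)] <- NA
dat
```

Unlike ifelse(), this touches only the selected rows and leaves the rest of e as it is.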
Using data.table:
library(data.table)
setDT(dat)[is.infinite(d), e := NA] # is.infinite() also matches +Inf; use d == -Inf to match -Inf only
A solution with dplyr:
library(tidyverse)
df <- tribble(
~a, ~b, ~c, ~d, ~e,
1, 2, 3, 23, 1,
4, 5, 6, -Inf, 2,
7, 8, 9, 2, 8,
10, 11, 12, -Inf, NaN)
df1 <- df %>%
  dplyr::mutate(e = case_when(d == -Inf ~ NA_real_,
                              TRUE ~ e))
Related
I have a data frame that looks like:
df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA))
I want it to look like this:
data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
y = c(0, 2, 2, 0, 0, 3, 3, 3, 0, 1, 0, 0))
#> x y
#> 1 1 0
#> 2 2 2
#> 3 3 2
#> 4 4 0
#> 5 5 0
#> 6 6 3
#> 7 7 3
#> 8 8 3
#> 9 9 0
#> 10 10 1
#> 11 11 0
#> 12 12 0
I have solved it with a while loop, but I am looking for a more R-like solution.
This is the loop solution:
df[is.na(df)] <- 0 # replace all NA with 0
i = 1
while (i < nrow(df)){
  if (df$y[i] < 2){ # nothing to repeat if y is 0 or 1
    i = i+1
  } else {
    df$y[(i+1):(i+df$y[i]-1)] <- df$y[i] # copy y into the next y - 1 rows
    i = i+df$y[i]
  }
}
Bonus question: could it be done within a pipe and for multiple columns (e.g. a column z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))?
You can create a vector of zeros with numeric(), get the non-missing values and their positions with complete.cases(), and then use rep() to repeat each value and sequence() to build the positions it should fill:
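fill_n_repeat <- function(x){
  value = x[complete.cases(x)]     # the non-NA values, e.g. 2, 3, 1
  idx = which(complete.cases(x))   # their positions, e.g. 2, 6, 10
  v = numeric(length(x))           # start from all zeros
  # for each value, value[k] consecutive positions starting at idx[k]
  # (the 'from' argument of sequence() needs R >= 4.0.0)
  v[sequence(value, idx)] <- rep(value, value)
  v
}
library(dplyr)
df$z <- c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA) # the z column from the bonus question
df %>%
  mutate(across(y:z, fill_n_repeat))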
x y z
1 1 0 1
2 2 2 0
3 3 2 0
4 4 0 0
5 5 0 4
6 6 3 4
7 7 3 4
8 8 3 4
9 9 0 0
10 10 1 2
11 11 0 2
12 12 0 0
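To see what the sequence()/rep() step builds, here is the y column worked through by hand, followed by a base R way (no dplyr) to apply the same function to several columns. This is a sketch that assumes R >= 4.0.0, where sequence() gained its from argument:

```r
y <- c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA)
value <- y[complete.cases(y)]    # 2 3 1
idx <- which(complete.cases(y))  # 2 6 10
sequence(value, idx)             # positions: 2 3 6 7 8 10
rep(value, value)                # values:    2 2 3 3 3 1

fill_n_repeat <- function(x) {
  value <- x[complete.cases(x)]
  idx <- which(complete.cases(x))
  v <- numeric(length(x))
  v[sequence(value, idx)] <- rep(value, value)
  v
}

df <- data.frame(x = 1:12,
                 y = y,
                 z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))
# Apply the function to each selected column, without dplyr
df[c("y", "z")] <- lapply(df[c("y", "z")], fill_n_repeat)
df
```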
Group the rows so that each non-NA value starts a new group. Within each group, output 0's if the first element is NA; otherwise output the first element that many times, followed by 0's. This uses base R only, but if you prefer dplyr, replace transform with mutate and everything else stays the same.
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
transform(df, y = ave(y, cumsum(!is.na(y)), FUN = f))
giving:
x y
1 1 0
2 2 2
3 3 2
4 4 0
5 5 0
6 6 3
7 7 3
8 8 3
9 9 0
10 10 1
11 11 0
12 12 0
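The grouping vector that ave() works on can be inspected directly. A small sketch with the y column from the question, showing the groups and what f returns for them:

```r
y <- c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA)
grp <- cumsum(!is.na(y))  # 0 1 1 1 1 2 2 2 2 3 3 3: each non-NA starts a group
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
split(y, grp)             # the pieces f() sees, one per group
res <- ave(y, grp, FUN = f)
res
```

Group 0 holds the leading NA, which is why f has to handle a group whose first element is NA.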
If there are several columns, let ix contain the names (or numbers) of the columns to be processed and run the same f over each of them:
ix <- "y"
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
f2 <- function(i) ave(df[[i]], cumsum(!is.na(df[[i]])), FUN = f)
replace(df, ix, lapply(ix, f2))
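For the bonus question, the same helpers can be run over both y and z by listing both names in ix. A sketch, assuming df also has the z column from the bonus question:

```r
df <- data.frame(x = 1:12,
                 y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA),
                 z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))
ix <- c("y", "z")  # columns to process
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
f2 <- function(i) ave(df[[i]], cumsum(!is.na(df[[i]])), FUN = f)
out <- replace(df, ix, lapply(ix, f2))
out
```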
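Alternatively, try the code below, which needs no custom function (only dplyr and tidyr). Because group_by(y, z) pools all NA rows that follow the same value, it assumes each non-NA value occurs only once in y; otherwise row_number() does not restart for the second run:
library(dplyr)
library(tidyr)
df2 <- df %>% mutate(z=y) %>% fill(z) %>% group_by(y,z) %>%
mutate(row=row_number()+1, y=ifelse(z>=row,z,y)) %>% ungroup() %>%
mutate(y = coalesce(y, 0)) %>% # remaining NAs become 0, as in the desired output
select(-z,-row)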
I am trying to fill the missing values of df1 with the values from df2.
Whenever both data frames have a valid value for the same cell, I need to keep the value from df1.
If df2 has a column that is not present in df1 (here z), this new column has to be added to df1.
Here is a simple example:
id <- c(1, 2, 3, 4, 5)
x <- c(10, NA, 20, 50, 70)
y <- c(3, 5, NA, 6, 9)
df1 <- data.frame(id, x, y)
id <- c(2, 3, 5)
x <- c(10, NA, NA)
z <- c(NA, 6, 7)
df2 <- data.frame(id, x, z)
I would like to obtain "df3":
id x y z
1 1 10 3 NA
2 2 10 5 NA
3 3 20 6 6
4 4 50 6 NA
5 5 70 9 7
I tried several "merge" options that didn't work.
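A 'merge' option after a few extract-and-replace steps could be the following. Note that df1[df2$id, ][idx] <- df2[idx] matches cells by position rather than by column name (and treats df2$id as row numbers of df1): since the third column of df2 is z, the missing y for id 3 is filled with z's value 6, which is what the desired output shows.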
idx <- is.na(df1[df2$id,])
df1[df2$id,][idx] <- df2[idx]
out <- merge(df1, df2[, c("id", "z")], by = "id", all.x = TRUE)
Result
out
# id x y z
#1 1 10 3 NA
#2 2 10 5 NA
#3 3 20 6 6
#4 4 50 6 NA
#5 5 70 9 7
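If the cells should instead be matched by column name, a base R sketch could look like the following. It assumes that every id in df2 also occurs in df1 and that id is the only key column. With name matching, y for id 3 stays NA, because df2 has no y column.

```r
df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  x = c(10, NA, 20, 50, 70),
                  y = c(3, 5, NA, 6, 9))
df2 <- data.frame(id = c(2, 3, 5),
                  x = c(10, NA, NA),
                  z = c(NA, 6, 7))

rows <- match(df2$id, df1$id)  # row of df1 for each row of df2
shared <- setdiff(intersect(names(df1), names(df2)), "id")
for (col in shared) {
  old <- df1[[col]][rows]
  df1[[col]][rows] <- ifelse(is.na(old), df2[[col]], old)  # fill only df1's NAs
}
new_cols <- c("id", setdiff(names(df2), names(df1)))  # "id" plus the new column z
out <- merge(df1, df2[new_cols], by = "id", all.x = TRUE)
out
```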
I have a data set in R with some NAs in the variables. I would like to fill each NA with the immediately preceding value. Below is an example:
dados <- tibble::tibble(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2)
)
# A tibble: 10 x 2
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 NA NA
5 2 8
6 1 5
7 NA NA
8 NA NA
9 9 1
10 3 2
In this case, the 4th value of the variable x would be filled with a 5 and so on.
Thank you!
We could use fill from the tidyr package:
library(tidyr)
library(dplyr)
dados %>%
fill(c(x,y), .direction = "down")
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 5 9
5 2 8
6 1 5
7 1 5
8 1 5
9 9 1
10 3 2
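We can use coalesce() with lag(). Note that lag() only looks back one row, so in a run of consecutive NAs only the first one is filled (row 8 stays NA):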
library(dplyr)
dados %>%
mutate(across(x:y, ~ coalesce(., lag(.))))
# A tibble: 10 x 2
x y
<dbl> <dbl>
1 2 4
2 3 1
3 5 9
4 5 9
5 2 8
6 1 5
7 1 5
8 NA NA
9 9 1
10 3 2
The same one-row limitation applies to case_when() with lag():
library(dplyr)
dados %>%
mutate(x = case_when(is.na(x) ~ lag(x),
TRUE ~ x),
y = case_when(is.na(y) ~ lag(y),
TRUE ~ y))
The following only works if the first value in a column is not NA; for the sake of clear and simple code, I leave that case as an exercise. For one column, we can solve it like this:
library(tibble)
dados <- tibble::tibble(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2)
)
#where are the NA?
pos <- dados$x |>
is.na() |>
which()
# replace
while(any(is.na(dados$x)))
dados$x[pos] <- dados$x[pos-1]
dados
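A base R alternative fills runs of consecutive NAs in a single pass and leaves leading NAs as NA instead of relying on a loop. This is a sketch; locf() ("last observation carried forward") is a name made up for this example, not a base R function:

```r
dados <- data.frame(x = c(2, 3, 5, NA, 2, 1, NA, NA, 9, 3),
                    y = c(4, 1, 9, NA, 8, 5, NA, NA, 1, 2))

locf <- function(x) {
  # position of the most recent non-NA value at or before each element (0 if none yet)
  last <- cummax(seq_along(x) * !is.na(x))
  last[last == 0] <- NA  # nothing to carry forward before the first non-NA value
  x[last]
}

dados[] <- lapply(dados, locf)  # dados[] keeps the data frame structure
dados
```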
I want to bind df1 and df2 by row, keeping the same column names, to obtain df3.
library(tidyverse)
df1 <- tibble(a = c(1, 2, 3),
b = c(4, 5, 6),
c = c(1, 5, 7))
df2 <- tibble(a = c(8, 9),
b = c(5, 6))
# how to bind these tibbles by row to get
df3 <- tibble(a = c(1, 2, 3, 8, 9),
b = c(4, 5, 6, 5, 6),
c = c(1, 5, 7, NA, NA))
Created on 2020-10-30 by the reprex package (v0.3.0)
Try bind_rows() from dplyr, which fills columns missing from one of the inputs with NA (credit to @AbdessabourMtk):
df3 <- dplyr::bind_rows(df1,df2)
Output:
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
A base R option
df2[setdiff(names(df1),names(df2))]<-NA
df3 <- rbind(df1,df2)
giving
> df3
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 4 1
2 2 5 5
3 3 6 7
4 8 5 NA
5 9 6 NA
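The base R option above only adds the columns that df2 lacks. If either data frame may be missing columns, the same idea can be applied in both directions. A sketch with a made-up helper name, bind_rows_fill() (hypothetical, not from any package), using plain data frames:

```r
bind_rows_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  # add every column a frame lacks, filled with NA
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[all_cols], b[all_cols])  # same column order on both sides
}

df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(1, 5, 7))
df2 <- data.frame(a = c(8, 9), b = c(5, 6))
df3 <- bind_rows_fill(df1, df2)
df3
```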
We can use rbindlist from data.table
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)
Output:
# a b c
#1: 1 4 1
#2: 2 5 5
#3: 3 6 7
#4: 8 5 NA
#5: 9 6 NA
My data frame:
a<-c(2, 4, 6, 6, 8, 10, 12, 13, 14)
c<-c(2, 2, 2, 2, 2, 2, 4, 4,4)
d<-c(10, 10, 10, 30, 30, 30, 50, 50, 50)
ID<-rep(c("no","bo", "fo"), each=3)
mydata<-data.frame(ID, a, c, d)
library(reshape2) # for melt()
gg.df <- melt(mydata, id="ID", variable.name="variable")
I want to keep only the rows where ID is "no". I have tried:
gg.df[,"variable"=="no"]
which returns
data frame with 0 columns and 27 rows
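Your attempt compares the two strings "variable" and "no", which is simply FALSE, so it selects zero columns. Use the subset function instead: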
no.df <- subset( x = gg.df
, subset = ID == "no"
)
ID variable value
1 no a 2
2 no a 4
3 no a 6
10 no c 2
11 no c 2
12 no c 2
19 no d 10
20 no d 10
21 no d 10
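Or, with bracket indexing, refer to the column through the data frame. A bare ID would pick up the length-9 ID vector from the global environment, and it only selects the right rows here because, recycled over the 27 rows, it happens to line up with gg.df$ID:
gg.df[gg.df$ID == "no", ]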
ID variable value
1 no a 2
2 no a 4
3 no a 6
10 no c 2
11 no c 2
12 no c 2
19 no d 10
20 no d 10
21 no d 10
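To see the difference without reshape2, this sketch rebuilds the same long table by hand (assuming it matches what melt() returns here: ID repeated for each of the measured columns a, c and d) and compares the failing and working indexing:

```r
a <- c(2, 4, 6, 6, 8, 10, 12, 13, 14)
c <- c(2, 2, 2, 2, 2, 2, 4, 4, 4)
d <- c(10, 10, 10, 30, 30, 30, 50, 50, 50)
ID <- rep(c("no", "bo", "fo"), each = 3)
gg.df <- data.frame(ID = rep(ID, times = 3),
                    variable = rep(c("a", "c", "d"), each = 9),
                    value = c(a, c, d))  # R still finds the c() function here

"variable" == "no"                  # FALSE: two constant strings are compared
ncol(gg.df[, "variable" == "no"])   # 0 columns selected
no.df <- gg.df[gg.df$ID == "no", ]  # compare the ID column itself
no.df
```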