Replacing character values with NA in a data frame - r

I have a data frame containing (in random places) a character value (say "foo") that I want to replace with a NA.
What's the best way to do so across the whole data frame?

This:
df[df == "foo"] <- NA
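To see what this does, here is a minimal sketch on a made-up data frame (the column names and values are assumptions for illustration only). df == "foo" builds a logical matrix with the same shape as df, and the assignment sets every TRUE cell to NA:

```r
# Toy data; names and values are assumptions for illustration only
df <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "d"),
  stringsAsFactors = FALSE
)

# Logical matrix marking every cell equal to "foo"
hits <- df == "foo"

# Assign NA at the marked cells; all other cells are left alone
df[hits] <- NA
df
```

Columns that never match, like id here, keep their values and type.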

One way to nip this in the bud is to convert that character to NA when you first read the data in.
df <- read.csv("file.csv", na.strings = c("foo", "bar"))
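As a small sketch of how na.strings behaves, the example below reads inline CSV text through the text argument instead of a real file (the contents are made up for illustration):

```r
# Inline CSV text stands in for a real file; the contents are made up
csv_text <- "id,x,y
1,a,foo
2,bar,c
3,b,d"

# Every field equal to "foo" or "bar" is read as NA
df <- read.csv(text = csv_text, na.strings = c("foo", "bar"),
               stringsAsFactors = FALSE)
df
```

Keep in mind that supplying na.strings replaces the default value "NA", so add "NA" to the vector if the file also uses it for missing values.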

Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".
library(dplyr)
set.seed(1234)
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = TRUE),
y = sample(c("c", "d", "foo"), 6, replace = TRUE),
z = sample(c("e", "f", "foo"), 6, replace = TRUE),
stringsAsFactors = FALSE
)
df
#> id x y z
#> 1 1 a c e
#> 2 2 b c foo
#> 3 3 b d e
#> 4 4 b d foo
#> 5 5 foo foo e
#> 6 6 b d e
na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA "b"
If you need to do this for multiple columns, use mutate with across (dplyr v1.0.0+) and pass "foo" inside a lambda; supplying extra arguments to across through ... is deprecated as of dplyr 1.1.0.
df %>%
mutate(across(c(x, y, z), ~ na_if(.x, "foo")))
#> id x y z
#> 1 1 a c e
#> 2 2 b c <NA>
#> 3 3 b d e
#> 4 4 b d <NA>
#> 5 5 <NA> <NA> e
#> 6 6 b d e

Another option is is.na<-:
is.na(df) <- df == "foo"
This may look counter-intuitive, but is.na<- assigns NA to the elements of df selected by the logical index on the right-hand side.
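A short sketch of the same idea, first on a plain vector and then on a small made-up data frame (the data are assumptions for illustration):

```r
# Vector case: the right-hand side says which elements become NA
v <- c("a", "foo", "b")
is.na(v) <- v == "foo"
v

# Data frame case with made-up data: same idea, with a logical matrix
df <- data.frame(x = c("a", "foo"), y = c("foo", "b"),
                 stringsAsFactors = FALSE)
is.na(df) <- df == "foo"
df
```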

This could be done with dplyr::mutate_all() and replace:
library(dplyr)
df <- tibble(a = c('foo', 2, 3), b = c(1, 'foo', 3), c = c(1,2,'foobar'), d = c(1, 2, 3))
df
# A tibble: 3 x 4
a b c d
<chr> <chr> <chr> <dbl>
1 foo 1 1 1
2 2 foo 2 2
3 3 3 foobar 3
df <- mutate_all(df, ~ replace(., . == 'foo', NA))
df
# A tibble: 3 x 4
a b c d
<chr> <chr> <chr> <dbl>
1 <NA> 1 1 1
2 2 <NA> 2 2
3 3 3 foobar 3
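Another dplyr option, which worked on a whole data frame in older dplyr releases (newer releases may reject a data frame here, in which case use one of the column-wise forms instead), is: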
df <- na_if(df, 'foo')

If you do not know the column names, or have a large number of columns to select, is.character() might be of use.
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = TRUE),
y = sample(c("c", "d", "foo"), 6, replace = TRUE),
z = sample(c("e", "f", "foo"), 6, replace = TRUE),
stringsAsFactors = FALSE
)
df
# id x y z
# 1 1 b d e
# 2 2 a foo foo
# 3 3 a d foo
# 4 4 b foo foo
# 5 5 foo foo e
# 6 6 foo foo f
df %>%
mutate_if(is.character, list(~na_if(., "foo")))
# id x y z
# 1 1 b d e
# 2 2 a <NA> <NA>
# 3 3 a d <NA>
# 4 4 b <NA> <NA>
# 5 5 <NA> <NA> e
# 6 6 <NA> <NA> f
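If you would rather not depend on dplyr, the same "character columns only" idea can be sketched in base R. This is an alternative technique, not the code from the answer above, and the data are made up for illustration:

```r
df <- data.frame(
  id = 1:4,
  x = c("a", "foo", "b", "foo"),
  y = c("foo", "c", "d", "c"),
  stringsAsFactors = FALSE
)

# Flag the character columns, whatever they are called
is_chr <- vapply(df, is.character, logical(1))

# Replace "foo" with NA in those columns only
df[is_chr] <- lapply(df[is_chr], function(col) replace(col, col == "foo", NA))
df
```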

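An alternative is an explicit loop over the columns. The example below goes the other way, replacing empty strings and NA with "ALL"; to replace "foo" with NA instead, test for DF[, i] == "foo" and assign NA:
for (i in seq_len(ncol(DF))) {
  # which() drops NA comparisons, so only true matches are replaced
  DF[which(DF[, i] == ""), i] <- "ALL"
  DF[which(is.na(DF[, i])), i] <- "ALL"
}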

Related

Stack every other column in R

I'm looking to stack every other column under the previous column in R. Any suggestions?
For example:
c1 c2 c3 c4
A 1 D 4
B 2 E 5
C 3 F 6
dat <- data.frame(
c1 = c("A", "B", "C"),
c2 = c(1, 2, 3),
c3 = c("D", "E", "F"),
c4 = c(4, 5, 6))
To look like this:
c1 c2
A D
1 4
B E
2 5
C F
3 6
dat2 <- data.frame(
c1 = c("A", 1, "B", 2, "C", 3),
c2 = c("D", 4, "E", 5, "F", 6))
Thanks in advance.
A basic way with stack():
as.data.frame(sapply(seq(1, ncol(dat), 2), \(x) stack(dat[x:(x+1)])[[1]]))
# V1 V2
# 1 A D
# 2 B E
# 3 C F
# 4 1 4
# 5 2 5
# 6 3 6
You could also rename the columns to a structured format and pass the data into tidyr::pivot_longer():
library(dplyr)
library(tidyr)
dat %>%
rename_with(~ paste0('c', ceiling(seq_along(.x) / 2), '_', 1:2)) %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(everything(), names_to = c(".value", NA), names_sep = '_')
# # A tibble: 6 × 2
# c1 c2
# <chr> <chr>
# 1 A D
# 2 1 4
# 3 B E
# 4 2 5
# 5 C F
# 6 3 6
The rename line transforms c1 - c4 to c1_1, c1_2, c2_1, c2_2.
For an even number of columns you can do something like:
do.call(cbind, lapply(seq(length(dat)/2), \(x) stack(dat[ ,x + ((x-1):x)])[1])) |>
setNames(names(dat)[seq(length(dat)/2)])
c1 c2
1 A D
2 B E
3 C F
4 1 4
5 2 5
6 3 6
This gets the first column when calling stack and cbinds all the stacked columns.
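Note that the stack() outputs above place all letters before all numbers within each column, whereas the question's dat2 alternates them row by row. If the alternating order matters, one possible base R sketch (not taken from the answers above) interleaves each pair of columns with rbind(). The output names c1, c2 are generated here as an assumption, to match the question:

```r
dat <- data.frame(
  c1 = c("A", "B", "C"),
  c2 = c(1, 2, 3),
  c3 = c("D", "E", "F"),
  c4 = c(4, 5, 6)
)

# First column of each pair: 1, 3, ...
starts <- seq(1, ncol(dat), by = 2)

# rbind() puts the two columns in rows of a matrix; as.vector() then
# reads it column by column, which alternates the two sources
interleaved <- lapply(starts, function(j) {
  as.vector(rbind(as.character(dat[[j]]), as.character(dat[[j + 1]])))
})
names(interleaved) <- paste0("c", seq_along(starts))

dat2 <- as.data.frame(interleaved, stringsAsFactors = FALSE)
dat2
```

Both columns come out as character, as in the question's dat2, because letters and numbers share a column.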

Spread multiple values to unique values in data frame in R [duplicate]

Suppose I have a data frame with a list of names:
x <- c("a", "b", "c")
x <- as.data.frame(x)
# > x
# 1 a
# 2 b
# 3 c
I want to spread each unique name (x, below) to each name (y, below) and create a new column before the original column so that the new data frame looks like this:
# > z
# x y
# a a
# a b
# a c
# b a
# b b
# b c
# c a
# c b
# c c
This is for creating a "from" "to" edge list in igraph where the network is full.
How could I do this? Is there a simple tidyverse solution that I'm missing?
You can use tidyr::expand_grid or tidyr::crossing:
tidyr::expand_grid(a = x$x, b = x$x)
#tidyr::crossing(a = x$x, b = x$x)
# a b
# <chr> <chr>
#1 a a
#2 a b
#3 a c
#4 b a
#5 b b
#6 b c
#7 c a
#8 c b
#9 c c
This is similar to base R's expand.grid; only the row order is different.
expand.grid(a = x$x, b = x$x)
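To make the ordering difference concrete: expand.grid() varies its first argument fastest. A minimal sketch for getting the row order asked for in the question is to list the fast-moving column first and then reorder the columns (v is a stand-in name for the vector of names):

```r
v <- c("a", "b", "c")

# expand.grid() varies its first argument fastest, so list the
# fast-moving column first and then put the columns back in order
z <- expand.grid(y = v, x = v, stringsAsFactors = FALSE)[, c("x", "y")]
z
```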
Using dplyr and tidyr, you could do:
x %>%
mutate(y = x) %>%
complete(y, x)
y x
<fct> <fct>
1 a a
2 a b
3 a c
4 b a
5 b b
6 b c
7 c a
8 c b
9 c c
A base R solution:
names <- c("a", "b", "c")
x = rep(names, each=length(names))
y = rep(names, length(names))
df = data.frame(x,y)
df
x y
1 a a
2 a b
3 a c
4 b a
5 b b
6 b c
7 c a
8 c b
9 c c
You can also use the expand function to return every possible combination of the two columns:
library(tidyr)
x %>%
mutate(y = x) %>%
expand(x, y)
# A tibble: 9 x 2
x y
<chr> <chr>
1 a a
2 a b
3 a c
4 b a
5 b b
6 b c
7 c a
8 c b
9 c c
You can also use the crossing function:
x <- c("a", "b", "c")
x <- as.data.frame(x)
x$y <- c("a", "b", "c")
crossing(x$x, x$y) # But you can't just use it within a pipeline since the first argument is not data
# A tibble: 9 x 2
`x$x` `x$y`
<chr> <chr>
1 a a
2 a b
3 a c
4 b a
5 b b
6 b c
7 c a
8 c b
9 c c
If you really want to use igraph, here is one option (x here is the original character vector c("a", "b", "c"), not the data frame):
make_full_graph(
length(x),
directed = TRUE,
loops = TRUE
) %>%
set_vertex_attr(name = "name", value = x) %>%
get.data.frame()
which gives
from to
1 a a
2 a b
3 a c
4 b a
5 b b
6 b c
7 c a
8 c b
9 c c

How to add a row to data frame based on a condition

I have a data frame to which I want to add rows based on a condition: column a equals C and column b equals 3 or 5.
Here is my dataframe
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = TRUE)
Whenever the condition is TRUE, I want to add a row directly below the matching row, with a = "add" and b = 3. I have tried the following:
rbind(df, data.frame(a="add", b = "3"))
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
# 5 C 5
# 6 A 6
# 7 C 7
# 8 E 8
# 9 add 3
This is not the output I want. The output I want is
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
How can I do that? I am new to R and thank you for your help.
lens = ifelse(df$b %in% c(3, 5) & df$a == "C", 2, 1)
ind = rep(1:NROW(df), lens)
df2 = df[ind,]
df2$a = as.character(df2$a)
df2$a[cumsum(lens)[which(lens == 2)]] = "add"
df2$b[cumsum(lens)[which(lens == 2)]] = 3
df2
# a b
#1 A 1
#2 B 2
#3 C 3
#3.1 add 3
#4 D 4
#5 C 5
#5.1 add 3
#6 A 6
#7 C 7
#8 E 8
A solution using the tidyverse package.
library(tidyverse)
df2 <- df %>%
mutate(Group = lag(cumsum(a == "C" & b %in% c(3, 5)), default = FALSE)) %>%
group_split(Group) %>%
map_dfr(~ .x %>% bind_rows(tibble(a = "add", b = 3))) %>%
slice(-n()) %>%
select(-Group)
df2
# # A tibble: 10 x 2
# a b
# <chr> <dbl>
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
In base R, we can find the positions where a is "C" and b is 3 or 5, repeat those rows in the data frame, and then overwrite the repeated copies with the required values.
pos <- which(df$a == "C" & df$b %in% c(3, 5))
df <- df[sort(c(seq(nrow(df)), pos)), ]
df[seq_along(pos) + pos, ] <- list("add", 3)
row.names(df) <- NULL
df
# a b
#1 A 1
#2 B 2
#3 C 3
#4 add 3
#5 D 4
#6 C 5
#7 add 3
#8 A 6
#9 C 7
#10 E 8
data
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = FALSE)
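The index arithmetic in the base R answer can be easier to follow step by step. The sketch below uses the same data; idx, new_rows and out are names introduced here for illustration only:

```r
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
                 b = seq(8), stringsAsFactors = FALSE)

pos <- which(df$a == "C" & df$b %in% c(3, 5))  # rows 3 and 5 match
idx <- sort(c(seq(nrow(df)), pos))             # 1 2 3 3 4 5 5 6 7 8

# Each earlier insertion shifts later rows down by one, so the k-th
# duplicate ends up at pos[k] + k
new_rows <- pos + seq_along(pos)

out <- df[idx, ]
out[new_rows, ] <- list("add", 3)
row.names(out) <- NULL
out
```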

Fill in cells with alternating pattern

I am trying to fill in blank cells with the values of the rows above, similar to the na.locf function, but I have a pattern that needs to be matched. I don't necessarily know how many rows there are between new values (i.e. between a,b and c,d).
I have tried na.locf and searched around for a solution, to no avail.
df <- data.frame(col1 = c("a","b", NA, NA, NA, NA, "c", "d", NA, NA))
df
# col1
# 1 a
# 2 b
# 3 <NA>
# 4 <NA>
# 5 <NA>
# 6 <NA>
# 7 c
# 8 d
# 9 <NA>
# 10 <NA>
Solution I would like:
df
col1
a
b
a
b
a
b
c
d
c
d
ave(df$col1,
with(rle(!is.na(df$col1)), rep(cumsum(values), lengths)),
FUN = function(x){
rep(x[!is.na(x)], length.out = length(x))
})
# [1] a b a b a b c d c d
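The grouping expression in the ave() call is the dense part, so here is the same computation unpacked into steps on a plain vector. The intermediate names r, grp and filled are introduced here only for illustration:

```r
col1 <- c("a", "b", NA, NA, NA, NA, "c", "d", NA, NA)

# Runs of filled (TRUE) and blank (FALSE) cells
r <- rle(!is.na(col1))
r$values
r$lengths

# cumsum() bumps the counter at each filled run, so a blank run
# shares the group id of the filled run just before it
grp <- rep(cumsum(r$values), r$lengths)
grp

# Within each group, recycle the non-missing values to the group length
filled <- ave(col1, grp,
              FUN = function(x) rep(x[!is.na(x)], length.out = length(x)))
filled
```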
Here's a way with dplyr. You can drop the group column if needed.
df %>%
group_by(group = cumsum(is.na(lag(col1)) & !is.na(col1))) %>%
mutate(
col1 = rep(col1[!is.na(col1)], length.out = n())
) %>%
ungroup()
# A tibble: 10 x 2
col1 group
<chr> <int>
1 a 1
2 b 1
3 a 1
4 b 1
5 a 1
6 b 1
7 c 2
8 d 2
9 c 2
10 d 2
