I have a data frame to which I want to add rows based on a condition: column a is equal to "C" and column b is equal to 3 or 5.
Here is my dataframe
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = TRUE)
Whenever the condition is TRUE, I want to add a row directly below the matching row, with a = "add" and b = 3. I have tried the following:
rbind(df, data.frame(a="add", b = "3"))
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
# 5 C 5
# 6 A 6
# 7 C 7
# 8 E 8
# 9 add 3
This is not the output I want. The output I want is
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
How can I do that? I am new to R and thank you for your help.
lens = ifelse(df$b %in% c(3, 5) & df$a == "C", 2, 1)
ind = rep(1:NROW(df), lens)
df2 = df[ind,]
df2$a = as.character(df2$a)
df2$a[cumsum(lens)[which(lens == 2)]] = "add"
df2$b[cumsum(lens)[which(lens == 2)]] = 3
df2
# a b
#1 A 1
#2 B 2
#3 C 3
#3.1 add 3
#4 D 4
#5 C 5
#5.1 add 3
#6 A 6
#7 C 7
#8 E 8
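To see why this works, it helps to look at the intermediate values. The sketch below rebuilds the question's data with character columns (an assumption made to skip the factor conversion) and uses illustrative names (`hit`, `new_pos`) that are not part of the answer above.

```r
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
                 b = 1:8, stringsAsFactors = FALSE)

# TRUE for rows that should be followed by a new row
hit <- df$a == "C" & df$b %in% c(3, 5)
# 2 copies of each matching row, 1 copy of every other row
lens <- ifelse(hit, 2, 1)
# Row index with the matching rows repeated: 1 2 3 3 4 5 5 6 7 8
ind <- rep(seq_len(nrow(df)), lens)
# Positions of the second copies in the expanded data: 4 7
new_pos <- cumsum(lens)[hit]

df2 <- df[ind, ]
# Overwrite the duplicated rows with the values to insert
df2[new_pos, ] <- list("add", 3)
rownames(df2) <- NULL
```

The running total of `lens` gives the last position occupied by each original row, so for a duplicated row it points at the second copy, which is the one to overwrite.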
A solution using the tidyverse package.
library(tidyverse)
df2 <- df %>%
mutate(Group = lag(cumsum(a == "C" & b %in% c(3, 5)), default = FALSE)) %>%
group_split(Group) %>%
map_dfr(~ .x %>% bind_rows(tibble(a = "add", b = 3))) %>%
slice(-n()) %>%
select(-Group)
df2
# # A tibble: 10 x 2
# a b
# <chr> <dbl>
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
In base R, we can find the positions where a is "C" and b is 3 or 5, repeat those rows in the data frame, and then overwrite the repeated copies with the required values.
pos <- which(df$a == "C" & df$b %in% c(3, 5))
df <- df[sort(c(seq(nrow(df)), pos)), ]
df[seq_along(pos) + pos, ] <- list("add", 3)
row.names(df) <- NULL
df
# a b
#1 A 1
#2 B 2
#3 C 3
#4 add 3
#5 D 4
#6 C 5
#7 add 3
#8 A 6
#9 C 7
#10 E 8
data
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = FALSE)
I have a dataset like
df <- data.frame(id = c("a","a","b","b","c","d","e","f"),
val = c(1,2,3,4,5,6,7,8),
extracol = c("x",NA,"y","z","t","v","u","p"))
id val extracol
1 a 1 x
2 a 2 <NA>
3 b 3 y
4 b 4 z
5 c 5 t
6 d 6 v
7 e 7 u
8 f 8 p
and I want to sum (and aggregate) the values according to the column id but only for "a". So I want to get something like:
id val extracol
1 a 3 x
2 b 3 y
3 b 4 z
4 c 5 t
5 d 6 v
6 e 7 u
7 f 8 p
I really don't care if I get "x" or NA in the extracol. Any suggestion?
This would work:
library(dplyr)
df <- data.frame(id = c("a","a","b","b","c","d","e","f"),
val = c(1,2,3,4,5,6,7,8),
extracol = c("x",NA,"y","z","t","v","u","p"))
# keep only a
a = df%>% filter(id == "a")
# aggregate a
a_agg= a %>% group_by(id) %>% summarise(val = sum(val), extracol = first(extracol))
# drop a
df = df %>% filter(id != "a")
# append a
df = rbind(df, a_agg)
df
id val extracol
1 b 3 y
2 b 4 z
3 c 5 t
4 d 6 v
5 e 7 u
6 f 8 p
7 a 3 x
A base R option
with(
df,
rbind(
data.frame(
id = "a",
val = sum(val[id == "a"]),
extracol = na.omit(extracol[id == "a"])
),
df[id != "a", ]
)
)
gives
id val extracol
1 a 3 x
3 b 3 y
4 b 4 z
5 c 5 t
6 d 6 v
7 e 7 u
8 f 8 p
I am working with data that has row headers (row names). I would like to add these headers to the data as a regular column.
EX:
I want to change this
X
A 1
B 2
C 3
D 4
E 5
into this
Y X
A 1
B 2
C 3
D 4
E 5
I want to keep Y and X as headers but turn A, B, C, D, E into column values.
We can use rownames_to_column from tibble
library(tibble)
library(dplyr)
df1 %>%
rownames_to_column("Y")
-output
# Y X
#1 A 1
#2 B 2
#3 C 3
#4 D 4
#5 E 5
Or with data.frame
data.frame(Y = row.names(df1), X = df1$X)
-output
# Y X
#1 A 1
#2 B 2
#3 C 3
#4 D 4
#5 E 5
NOTE: Both are one-liners.
data
df1 <- structure(list(X = 1:5), class = "data.frame", row.names = c("A",
"B", "C", "D", "E"))
You can also try:
#Code
new <- as.data.frame(cbind(Y=rownames(df),df),row.names = NULL)
rownames(new)<-NULL
Output:
new
Y X
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
Some data used:
#Data
df <- structure(list(X = 1:5), class = "data.frame", row.names = c("A",
"B", "C", "D", "E"))
I have a data frame containing (in random places) a character value (say "foo") that I want to replace with a NA.
What's the best way to do so across the whole data frame?
This:
df[df == "foo"] <- NA
One way to nip this in the bud is to convert that string to NA when you first read the data in.
df <- read.csv("file.csv", na.strings = c("foo", "bar"))
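As a quick way to try this without a file on disk, the sketch below passes some made-up CSV content through the `text` argument of `read.csv`. The column names and values are hypothetical.

```r
# Hypothetical CSV content, passed inline instead of being read from "file.csv"
csv_text <- "id,x,y
1,a,foo
2,foo,d
3,b,bar"

# Every field equal to "foo" or "bar" is read as NA
df <- read.csv(text = csv_text, na.strings = c("foo", "bar"),
               stringsAsFactors = FALSE)
```

Here `df$x` ends up as "a", NA, "b" and `df$y` as NA, "d", NA, so no clean-up step is needed afterwards.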
Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".
library(dplyr)
set.seed(1234)
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = T),
y = sample(c("c", "d", "foo"), 6, replace = T),
z = sample(c("e", "f", "foo"), 6, replace = T),
stringsAsFactors = F
)
df
#> id x y z
#> 1 1 a c e
#> 2 2 b c foo
#> 3 3 b d e
#> 4 4 b d foo
#> 5 5 foo foo e
#> 6 6 b d e
na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA "b"
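If you need to do this for multiple columns, you can apply na_if to each of them inside mutate with across (dplyr v1.0.0+). A lambda is used because passing extra arguments through across was deprecated in dplyr 1.1.0.
df %>%
mutate(across(c(x, y, z), ~ na_if(.x, "foo")))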
#> id x y z
#> 1 1 a c e
#> 2 2 b c <NA>
#> 3 3 b d e
#> 4 4 b d <NA>
#> 5 5 <NA> <NA> e
#> 6 6 b d e
Another option is is.na<-:
is.na(df) <- df == "foo"
Note that its use may seem a bit counter-intuitive, but it actually assigns NA values to df at the index on the right hand side.
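A small worked example may make the assignment form easier to read. The data frame and the name `mask` below are made up for illustration.

```r
df <- data.frame(id = 1:3,
                 x = c("a", "foo", "b"),
                 y = c("foo", "c", "foo"),
                 stringsAsFactors = FALSE)

# Logical matrix with TRUE wherever a cell equals "foo"
mask <- df == "foo"
# Set exactly those cells to NA; every other cell is left untouched
is.na(df) <- mask
```

After this, `df$x` is "a", NA, "b" and `df$y` is NA, "c", NA, while the integer column `id` is unchanged.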
This could be done with dplyr::mutate_all() and replace:
library(dplyr)
df <- tibble(a = c('foo', 2, 3), b = c(1, 'foo', 3), c = c(1,2,'foobar'), d = c(1, 2, 3))
> df
# A tibble: 3 x 4
a b c d
<chr> <chr> <chr> <dbl>
1 foo 1 1 1
2 2 foo 2 2
3 3 3 foobar 3
# funs() is deprecated; a formula lambda does the same job
df <- mutate_all(df, ~ replace(., . == 'foo', NA))
> df
# A tibble: 3 x 4
a b c d
<chr> <chr> <chr> <dbl>
1 <NA> 1 1 1
2 2 <NA> 2 2
3 3 3 foobar 3
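Another dplyr option is to call na_if on the whole data frame. This worked in older dplyr releases; from dplyr 1.1.0 on, na_if checks that its second argument is compatible with the type of the first, so it needs to be applied per column instead: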
df <- na_if(df, 'foo')
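For recent dplyr versions, a column-wise variant may be the safer choice. The sketch below assumes dplyr 1.0.0 or later and uses a small made-up tibble; it restricts the replacement to character columns so that numeric columns are never compared with a string.

```r
library(dplyr)

df <- tibble(a = c("foo", "2", "3"),
             b = c("1", "foo", "3"),
             d = c(1, 2, 3))

# Apply na_if() column by column, but only to character columns
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "foo")))
```

Selecting with `where(is.character)` also covers the case where the column names are not known in advance.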
If you do not know the column names, or have a large number of columns to select, is.character() might be of use.
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = T),
y = sample(c("c", "d", "foo"), 6, replace = T),
z = sample(c("e", "f", "foo"), 6, replace = T),
stringsAsFactors = F
)
df
# id x y z
# 1 1 b d e
# 2 2 a foo foo
# 3 3 a d foo
# 4 4 b foo foo
# 5 5 foo foo e
# 6 6 foo foo f
df %>%
mutate_if(is.character, list(~na_if(., "foo")))
# id x y z
# 1 1 b d e
# 2 2 a <NA> <NA>
# 3 3 a d <NA>
# 4 4 b <NA> <NA>
# 5 5 <NA> <NA> e
# 6 6 <NA> <NA> f
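An alternative is to loop over the columns. The example below replaces empty strings and NA values with "ALL" in a data frame DF; swap in "foo" and NA for the case in the question:
for (i in seq_len(ncol(DF))) {
# replace empty strings in column i
DF[which(DF[, i] == ""), i] <- "ALL"
# replace missing values in column i
DF[which(is.na(DF[, i])), i] <- "ALL"
}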
I have a data.frame df1 and a data.frame df2. How do I use df2 to mutate/transform df1 into the merged data.frame below, where the column name is filled with the value of df2$name whenever df1$id >= df2$start and df1$id <= df2$end?
df1 = data.frame(id = 1:10, c = letters[1:10])
df2 = data.frame(name = LETTERS[1:3], start = c(2, 5, 8), end = c(4,7, 9))
merged = data.frame(id = df1$id, c = df1$c, name = c(NA, "A", "A", "A", "B", "B", "B", "C", "C", NA) )
Visually:
> df1
id c
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
> df2
name start end
1 A 2 4
2 B 5 7
3 C 8 9
> merged
id c name
1 1 a <NA>
2 2 b A
3 3 c A
4 4 d A
5 5 e B
6 6 f B
7 7 g B
8 8 h C
9 9 i C
10 10 j <NA>
We can use non-equi join with data.table and assign a new column with the corresponding values of 'name' where the conditional join is met
library(data.table)
# both bounds are inclusive, matching id >= start and id <= end
setDT(df1)[df2, cn := name, on = .(id >= start, id <= end)]
df1
# id c cn
# 1: 1 a <NA>
# 2: 2 b A
# 3: 3 c A
# 4: 4 d A
# 5: 5 e B
# 6: 6 f B
# 7: 7 g B
# 8: 8 h C
# 9: 9 i C
#10: 10 j <NA>
Or another option is fuzzyjoin
library(fuzzyjoin)
library(dplyr)
fuzzy_left_join(df1, df2, by = c('id' = 'start', 'id' = 'end'),
match_fun = list(`>=`, `<=`)) %>%
select(id, c, cn = name)
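If neither package is available, the same range lookup can be sketched in base R. This is only an illustration that assumes the ranges in df2 do not overlap; the helper name `idx` is made up.

```r
df1 <- data.frame(id = 1:10, c = letters[1:10])
df2 <- data.frame(name = LETTERS[1:3], start = c(2, 5, 8), end = c(4, 7, 9))

# For each id, find the row of df2 whose [start, end] range contains it;
# with non-overlapping ranges at most one row can match
idx <- sapply(df1$id, function(i) {
  hit <- which(i >= df2$start & i <= df2$end)
  if (length(hit) == 1) hit else NA_integer_
})

merged <- df1
# Indexing with NA yields NA, so ids outside every range get a missing name
merged$name <- as.character(df2$name)[idx]
```

This loops over every id, so for large inputs the join-based approaches above are likely to scale better.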
I am trying to fill in blank cells with the values of the rows above. This is similar to the na.locf function, but I have a pattern that needs to be repeated. I don't necessarily know how many rows there are between new values (i.e. between a,b and c,d).
I have used the na.locf and searched around for a solution to no avail.
df <- data.frame(col1 = c("a","b", NA, NA, NA, NA, "c", "d", NA, NA))
df
# col1
# 1 a
# 2 b
# 3 <NA>
# 4 <NA>
# 5 <NA>
# 6 <NA>
# 7 c
# 8 d
# 9 <NA>
# 10 <NA>
Solution I would like:
df
col1
a
b
a
b
a
b
c
d
c
d
ave(df$col1,
with(rle(!is.na(df$col1)), rep(cumsum(values), lengths)),
FUN = function(x){
rep(x[!is.na(x)], length.out = length(x))
})
# [1] a b a b a b c d c d
Here's a way with dplyr. You can drop the group column if needed.
df %>%
group_by(group = cumsum(is.na(lag(col1)) & !is.na(col1))) %>%
mutate(
col1 = rep(col1[!is.na(col1)], length.out = n())
) %>%
ungroup()
# A tibble: 10 x 2
col1 group
<chr> <int>
1 a 1
2 b 1
3 a 1
4 b 1
5 a 1
6 b 1
7 c 2
8 d 2
9 c 2
10 d 2
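Dropping the helper column only takes one more step at the end of the pipeline. The sketch below repeats the pipeline on the question's data (built with character columns, an assumption) and stores the result under the illustrative name `res`.

```r
library(dplyr)

df <- data.frame(col1 = c("a", "b", NA, NA, NA, NA, "c", "d", NA, NA),
                 stringsAsFactors = FALSE)

res <- df %>%
  # a new group starts at each non-NA value that follows an NA (or the start)
  group_by(group = cumsum(is.na(lag(col1)) & !is.na(col1))) %>%
  # recycle the non-NA values of the group over all of its rows
  mutate(col1 = rep(col1[!is.na(col1)], length.out = n())) %>%
  ungroup() %>%
  # remove the helper column once the fill is done
  select(-group)
```

The `ungroup()` call has to come first, because dplyr keeps grouping columns in the result while the data is still grouped.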