I'm looking to stack every other column under the previous column in R. Any suggestions?
For example:
c1 c2 c3 c4
A  1  D  4
B  2  E  5
C  3  F  6
dat <- data.frame(
c1 = c("A", "B", "C"),
c2 = c(1, 2, 3),
c3 = c("D", "E", "F"),
c4 = c(4, 5, 6))
To look like this:
c1 c2
A  D
1  4
B  E
2  5
C  F
3  6
dat2 <- data.frame(
c1 = c("A", 1, "B", 2, "C", 3),
c2 = c("D", 4, "E", 5, "F", 6))
Thanks in advance.
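A basic way with stack(). Note that each output column holds one whole input column followed by the next (A, B, C, 1, 2, 3) rather than alternating them row by row: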
as.data.frame(sapply(seq(1, ncol(dat), 2), \(x) stack(dat[x:(x+1)])[[1]]))
# V1 V2
# 1 A D
# 2 B E
# 3 C F
# 4 1 4
# 5 2 5
# 6 3 6
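To see why the result above lists all of one column before the next, here is a small sketch of what stack() returns for the first pair of columns. It assumes the dat from the question; stack() concatenates the selected columns end to end, so the mixed character and numeric values are coerced to character.

```r
dat <- data.frame(
  c1 = c("A", "B", "C"),
  c2 = c(1, 2, 3),
  c3 = c("D", "E", "F"),
  c4 = c(4, 5, 6)
)

# stack() puts the selected columns one after another in `values`
# and records which column each value came from in the factor `ind`
s <- stack(dat[1:2])
s$values       # "A" "B" "C" "1" "2" "3"
levels(s$ind)  # "c1" "c2"
```

Because the values come out column by column, taking s[[1]] for each pair gives the block layout shown above, not the interleaved layout of dat2.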
You could also rename the columns with a structured naming pattern and pass the result to tidyr::pivot_longer():
library(dplyr)
library(tidyr)
dat %>%
rename_with(~ paste0('c', ceiling(seq_along(.x) / 2), '_', 1:2)) %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(everything(), names_to = c(".value", NA), names_sep = '_')
# # A tibble: 6 × 2
# c1 c2
# <chr> <chr>
# 1 A D
# 2 1 4
# 3 B E
# 4 2 5
# 5 C F
# 6 3 6
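The rename_with() step renames c1, c2, c3, c4 to c1_1, c1_2, c2_1, c2_2. pivot_longer() then uses the part before the underscore as the output column name (".value") and drops the part after it (NA). The columns are converted to character first because letters and numbers end up in the same output column.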
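The naming formula can be checked on its own with a plain character vector: ceiling(seq_along(.x) / 2) numbers the column pairs (1, 1, 2, 2), and the recycled 1:2 marks the position within each pair. This is only an illustration of the expression used inside rename_with().

```r
old <- c("c1", "c2", "c3", "c4")
# Pair number for each column: 1, 1, 2, 2
pair <- ceiling(seq_along(old) / 2)
# Position inside the pair; 1:2 is recycled across the four columns
new <- paste0("c", pair, "_", 1:2)
new  # "c1_1" "c1_2" "c2_1" "c2_2"
```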
For an even number of columns you can do something like:
do.call(cbind, lapply(seq(length(dat)/2), \(x) stack(dat[ ,x + ((x-1):x)])[1])) |>
setNames(names(dat)[seq(length(dat)/2)])
c1 c2
1 A D
2 B E
3 C F
4 1 4
5 2 5
6 3 6
This keeps only the first column (values) of each stack() result and then cbind()s the stacked pairs together.
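If you want the row-by-row interleaving shown in dat2 without extra packages, one base R sketch (an alternative technique, not one of the answers above) is to rbind() each pair of columns and read the resulting 2 x n matrix column by column. It assumes an even number of columns and converts everything to character first.

```r
dat <- data.frame(
  c1 = c("A", "B", "C"),
  c2 = c(1, 2, 3),
  c3 = c("D", "E", "F"),
  c4 = c(4, 5, 6)
)

# Convert all columns to character so letters and numbers can share a column
chr <- lapply(dat, as.character)
# Index of the first column in each pair: 1, 3, ...
firsts <- seq(1, length(chr), by = 2)
# rbind() stacks the pair as two matrix rows; c() reads the matrix
# column by column, which alternates the two input columns
out <- lapply(firsts, function(i) c(rbind(chr[[i]], chr[[i + 1]])))
names(out) <- names(dat)[seq_along(firsts)]
dat2 <- as.data.frame(out, stringsAsFactors = FALSE)
dat2
```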
I have a data frame containing (in random places) a character value (say "foo") that I want to replace with a NA.
What's the best way to do so across the whole data frame?
This:
df[df == "foo"] <- NA
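A quick check on a small, made-up data frame (values are illustrative only) shows that the comparison also works when some columns are numeric: df == "foo" returns a logical matrix, and numeric columns are compared as character, so they never match.

```r
df <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "d"),
  stringsAsFactors = FALSE
)
# df == "foo" is a logical matrix with the same shape as df;
# indexing with it replaces only the matching cells
df[df == "foo"] <- NA
df
```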
One way to nip this in the bud is to convert that character to NA when you read the data in in the first place.
df <- read.csv("file.csv", na.strings = c("foo", "bar"))
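To try na.strings without a file, read.csv() also accepts the CSV contents through its text argument. The inline data below is made up for illustration.

```r
csv <- "id,x,y
1,a,foo
2,bar,c
3,b,d"
# Any field exactly equal to "foo" or "bar" is read in as NA
df <- read.csv(text = csv, na.strings = c("foo", "bar"))
df
```

Note that setting na.strings replaces its default value ("NA"), so include "NA" in the vector if the file also uses that marker.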
Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".
library(dplyr)
set.seed(1234)
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = TRUE),
y = sample(c("c", "d", "foo"), 6, replace = TRUE),
z = sample(c("e", "f", "foo"), 6, replace = TRUE),
stringsAsFactors = FALSE
)
df
#> id x y z
#> 1 1 a c e
#> 2 2 b c foo
#> 3 3 b d e
#> 4 4 b d foo
#> 5 5 foo foo e
#> 6 6 b d e
na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA "b"
If you need to do this for multiple columns, you can call na_if() inside mutate() with across() (dplyr v1.0.0+).
df %>%
mutate(across(c(x, y, z), ~ na_if(.x, "foo")))
#> id x y z
#> 1 1 a c e
#> 2 2 b c <NA>
#> 3 3 b d e
#> 4 4 b d <NA>
#> 5 5 <NA> <NA> e
#> 6 6 b d e
Another option is is.na<-:
is.na(df) <- df == "foo"
Its use may seem a bit counter-intuitive, but it assigns NA to df at every position where the right-hand side is TRUE.
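For a plain vector, is.na(x) <- cond behaves like x[cond] <- NA, and the data frame form works the same way with a logical matrix on the right-hand side. The small vectors and data frame here are made up for illustration.

```r
# Vector case: mark the matching elements as missing
x <- c("a", "foo", "b", "foo")
is.na(x) <- x == "foo"
x

# Data frame case: df == "foo" is a logical matrix selecting the cells to blank
df <- data.frame(x = c("a", "foo"), y = c("foo", "b"), stringsAsFactors = FALSE)
is.na(df) <- df == "foo"
df
```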
This could be done with dplyr::mutate_all() and replace:
library(dplyr)
df <- tibble(a = c('foo', 2, 3), b = c(1, 'foo', 3), c = c(1, 2, 'foobar'), d = c(1, 2, 3))
df
# # A tibble: 3 x 4
# a b c d
# <chr> <chr> <chr> <dbl>
# 1 foo 1 1 1
# 2 2 foo 2 2
# 3 3 3 foobar 3
# Only exact matches are replaced, so "foobar" is kept
df <- mutate_all(df, ~ replace(., . == 'foo', NA))
df
# # A tibble: 3 x 4
# a b c d
# <chr> <chr> <chr> <dbl>
# 1 <NA> 1 1 1
# 2 2 <NA> 2 2
# 3 3 3 foobar 3
Another dplyr option is:
df <- na_if(df, 'foo')
Assuming you do not know the column names, or have a large number of columns to select, is.character() might be of use.
df <- data.frame(
id = 1:6,
x = sample(c("a", "b", "foo"), 6, replace = TRUE),
y = sample(c("c", "d", "foo"), 6, replace = TRUE),
z = sample(c("e", "f", "foo"), 6, replace = TRUE),
stringsAsFactors = FALSE
)
df
# id x y z
# 1 1 b d e
# 2 2 a foo foo
# 3 3 a d foo
# 4 4 b foo foo
# 5 5 foo foo e
# 6 6 foo foo f
df %>%
mutate_if(is.character, list(~na_if(., "foo")))
# id x y z
# 1 1 b d e
# 2 2 a <NA> <NA>
# 3 3 a d <NA>
# 4 4 b <NA> <NA>
# 5 5 <NA> <NA> e
# 6 6 <NA> <NA> f
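The same idea (restricting the replacement to character columns) can be sketched in base R without dplyr: vapply() finds the character columns and replace() swaps "foo" for NA in each of them. The small data frame is made up for illustration; %in% is used instead of == so that existing NAs do not produce an NA index.

```r
df <- data.frame(
  id = 1:4,
  x = c("a", "foo", "b", NA),
  y = c("foo", "c", "foo", "d"),
  stringsAsFactors = FALSE
)
# Logical index of the character columns (the integer id column is skipped)
chr_cols <- vapply(df, is.character, logical(1))
# %in% returns FALSE (not NA) for existing NAs, giving replace() a clean index
df[chr_cols] <- lapply(df[chr_cols], function(col) replace(col, col %in% "foo", NA))
df
```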
An alternative is to loop over the columns:
for (i in seq_len(ncol(df))) {
  # Replace every "foo" in column i with NA
  df[which(df[, i] == "foo"), i] <- NA
}
I have a data frame to which I want to add rows based on the following condition: column a is equal to "C" and column b is equal to 3 or 5.
Here is my dataframe
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = TRUE)
Whenever the condition is TRUE, I want to add a new row directly below the matching row, with a = "add" and b = 3. I have tried the following:
rbind(df, data.frame(a="add", b = "3"))
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
# 5 C 5
# 6 A 6
# 7 C 7
# 8 E 8
# 9 add 3
This is not the output I want. The output I want is:
# a b
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
How can I do that? I am new to R, so thank you for your help.
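# 2 for rows that need a new row after them, 1 otherwise
lens = ifelse(df$b %in% c(3, 5) & df$a == "C", 2, 1)
# Repeat each row index that many times
ind = rep(1:NROW(df), lens)
df2 = df[ind,]
# a is a factor (stringsAsFactors = TRUE), so convert it before assigning "add"
df2$a = as.character(df2$a)
# cumsum(lens) is the position of the second copy of each duplicated row
df2$a[cumsum(lens)[which(lens == 2)]] = "add"
df2$b[cumsum(lens)[which(lens == 2)]] = 3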
df2
# a b
#1 A 1
#2 B 2
#3 C 3
#3.1 add 3
#4 D 4
#5 C 5
#5.1 add 3
#6 A 6
#7 C 7
#8 E 8
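To see how the index vectors in this answer line up, here are the intermediate values for the question's data. The comments list the values rather than R's exact printed output.

```r
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
                 b = seq(8), stringsAsFactors = TRUE)
# 2 marks rows that get a new row inserted after them
lens <- ifelse(df$b %in% c(3, 5) & df$a == "C", 2, 1)
# lens: 1 1 2 1 2 1 1 1
# Row indices, with each matching row repeated
ind <- rep(seq_len(nrow(df)), lens)
# ind: 1 2 3 3 4 5 5 6 7 8
# cumsum(lens) is the last output position of each input row,
# so for the duplicated rows it points at the second copy
new_pos <- cumsum(lens)[lens == 2]
# new_pos: 4 7
```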
A solution using the tidyverse package.
library(tidyverse)
df2 <- df %>%
mutate(Group = lag(cumsum(a == "C" & b %in% c(3, 5)), default = FALSE)) %>%
group_split(Group) %>%
map_dfr(~ .x %>% bind_rows(tibble(a = "add", b = 3))) %>%
slice(-n()) %>%
select(-Group)
df2
# # A tibble: 10 x 2
# a b
# <chr> <dbl>
# 1 A 1
# 2 B 2
# 3 C 3
# 4 add 3
# 5 D 4
# 6 C 5
# 7 add 3
# 8 A 6
# 9 C 7
# 10 E 8
In base R, we can find the positions where a is "C" and b is 3 or 5, repeat those rows in the data frame, and then overwrite the repeated copies with the required values.
pos <- which(df$a == "C" & df$b %in% c(3, 5))
df <- df[sort(c(seq(nrow(df)), pos)), ]
df[seq_along(pos) + pos, ] <- list("add", 3)
row.names(df) <- NULL
df
# a b
#1 A 1
#2 B 2
#3 C 3
#4 add 3
#5 D 4
#6 C 5
#7 add 3
#8 A 6
#9 C 7
#10 E 8
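Data (stringsAsFactors = FALSE keeps a as character, so "add" can be assigned to it; with a factor you would get NA and an "invalid factor level" warning):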
df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
b = c(seq(8)), stringsAsFactors = FALSE)
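The base R steps can be wrapped in a small helper for reuse. insert_after() is a hypothetical name chosen for this sketch, not a function from any package; it assumes the columns of new_row can be assigned into df (for example, a character rather than factor column a).

```r
# Hypothetical helper: insert `new_row` (a list, one element per column)
# directly after each row number in `pos`
insert_after <- function(df, pos, new_row) {
  pos <- sort(pos)
  # Every row once, plus one extra copy of each row in `pos`
  out <- df[sort(c(seq_len(nrow(df)), pos)), , drop = FALSE]
  # The k-th extra copy is pushed down by the k copies inserted so far,
  # so it sits at pos[k] + k
  out[seq_along(pos) + pos, ] <- new_row
  row.names(out) <- NULL
  out
}

df <- data.frame(a = c("A", "B", "C", "D", "C", "A", "C", "E"),
                 b = seq(8), stringsAsFactors = FALSE)
res <- insert_after(df, which(df$a == "C" & df$b %in% c(3, 5)), list("add", 3))
res
```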
I am trying to fill in blank cells with the values of the rows above. This is similar to the na.locf() function, but I have a pattern that needs to be matched: I don't necessarily know how many rows there are between new values (i.e. between a,b and c,d).
I have used the na.locf and searched around for a solution to no avail.
df <- data.frame(col1 = c("a", "b", NA, NA, NA, NA, "c", "d", NA, NA))
df
# col1
# 1 a
# 2 b
# 3 <NA>
# 4 <NA>
# 5 <NA>
# 6 <NA>
# 7 c
# 8 d
# 9 <NA>
# 10 <NA>
Solution I would like:
df
# col1
# 1 a
# 2 b
# 3 a
# 4 b
# 5 a
# 6 b
# 7 c
# 8 d
# 9 c
# 10 d
ave(df$col1,
with(rle(!is.na(df$col1)), rep(cumsum(values), lengths)),
FUN = function(x){
rep(x[!is.na(x)], length.out = length(x))
})
# [1] a b a b a b c d c d
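The grouping vector passed to ave() is where the work happens. Here it is step by step on the question's col1 values: rle() finds the runs of non-missing and missing values, and cumsum() over the run values starts a new group at each run of non-missing values.

```r
col1 <- c("a", "b", NA, NA, NA, NA, "c", "d", NA, NA)
r <- rle(!is.na(col1))
# r$values:  TRUE FALSE TRUE FALSE
# r$lengths: 2 4 2 2
grp <- rep(cumsum(r$values), r$lengths)
# grp: 1 1 1 1 1 1 2 2 2 2
# Within each group, recycle the non-missing values to the group's length
filled <- ave(col1, grp, FUN = function(x) rep(x[!is.na(x)], length.out = length(x)))
filled
```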
Here's a way with dplyr. You can drop the group column afterwards if you don't need it.
df %>%
group_by(group = cumsum(is.na(lag(col1)) & !is.na(col1))) %>%
mutate(
col1 = rep(col1[!is.na(col1)], length.out = n())
) %>%
ungroup()
# A tibble: 10 x 2
col1 group
<chr> <int>
1 a 1
2 b 1
3 a 1
4 b 1
5 a 1
6 b 1
7 c 2
8 d 2
9 c 2
10 d 2
In RStudio, I have a data frame with 4 columns, and I need the list of every distinct triplet of the first 3 columns, sorted in decreasing order of the sum of the 4th column. For example, with:
A B C 2
D E F 5
A B C 4
G H I 5
D E F 3
I need as a result:
D E F 8
A B C 6
G H I 5
I've tried the following approaches, but I can't manage to get exactly the result I need:
df_list<-df_raw_data %>%
group_by(param1, param2, param3) %>%
summarise_all(total = sum(param4))
arrange(df_list, desc(total))
and:
df_list<-unique(df_raw_data[, c('param1', 'param2', 'param3')])
cbind(df_list, total)
for(i in 1:nrow(df_raw_data))
{
filter ???????????
}
I would prefer to use the dplyr package since it's a more elegant solution.
EDIT: Okay, thanks for your working answers. I think that I've lost some time figuring out that the plyr package shouldn't be loaded after dplyr...
We can use group_by_at to select the columns to group.
library(dplyr)
dat2 <- dat %>%
group_by_at(vars(-V4)) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
Or use group_by_if to select columns to group based on column types.
dat2 <- dat %>%
group_by_if(is.character) %>%
summarise(V4 = sum(V4)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# V1 V2 V3 V4
# <chr> <chr> <chr> <int>
# 1 A B C 6
# 2 D E F 8
# 3 G H I 5
DATA
dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3",
header = FALSE, stringsAsFactors = FALSE)
Would this be what you are looking for?
library(dplyr)
df <- tibble(var1 = c("A", "D", "A", "G", "D"),
var2 = c("B", "E", "B", "H", "E"),
var3 = c("C", "F", "C", "I", "F"),
var4 = c(2, 5, 4, 5, 3))
df %>% group_by(var1, var2, var3) %>%
summarise(sum = sum(var4)) %>%
arrange(desc(sum))
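For comparison, a base R sketch of the full task (group, sum, then sort in decreasing order of the total) with aggregate() and order(), using the same data as the read.table() answer. The group_by_at() and group_by_if() answers return the totals unsorted; order() handles that step here.

```r
dat <- read.table(text = "A B C 2
D E F 5
A B C 4
G H I 5
D E F 3", header = FALSE, stringsAsFactors = FALSE)

# Sum V4 within each distinct (V1, V2, V3) triplet
res <- aggregate(V4 ~ V1 + V2 + V3, data = dat, FUN = sum)
# Sort by the total, largest first
res <- res[order(res$V4, decreasing = TRUE), ]
row.names(res) <- NULL
res
```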