Collapsing a dataframe in R [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I'm attempting to collapse a dataframe onto itself. The aggregate function seems like my best bet, but I'm not sure how to have some columns summed while others stay the same.
My dataframe, df1, with columns col1 to col4, looks like this:
A 1 3 2
A 2 3 4
B 1 2 4
B 4 2 2
How can I use the aggregate function or the ddply function to create something that looks like this:
A 3 3 6
B 5 2 6

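We can use dplyr (summarise_each() and funs() are deprecated; across() replaces them):
library(dplyr)
df1 %>%
group_by(col1) %>%
# keep a column's value if it is constant within the group, otherwise sum it
summarise(across(everything(), ~ if (n_distinct(.x) == 1) .x[1] else sum(.x)))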
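For comparison, here is a base R sketch of the same "keep constant columns, sum the rest" rule, using aggregate() with a small helper. The data frame is rebuilt from the question (the column names are taken from the answers), and the helper name collapse_col is hypothetical.

```r
# Example data from the question; column names col1..col4 follow the answers
df1 <- data.frame(
  col1 = c("A", "A", "B", "B"),
  col2 = c(1, 2, 1, 4),
  col3 = c(3, 3, 2, 2),
  col4 = c(2, 4, 4, 2)
)

# Hypothetical helper: keep the value if it is constant within the group,
# otherwise sum it. Caveat: a column meant to be summed that happens to repeat
# a single value within a group (e.g. 2, 2) is kept as 2 instead of summed to 4.
collapse_col <- function(x) if (length(unique(x)) == 1) x[1] else sum(x)

# `. ~ col1` applies collapse_col to every other column, grouped by col1
res <- aggregate(. ~ col1, data = df1, FUN = collapse_col)
res
# expected values: A -> 3 3 6, B -> 5 2 6
```

Because of the caveat above, grouping by the constant column (as in the next answer) is the more robust choice when you know in advance which columns stay the same.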
Another option, if 'col3' is constant within each group, is to keep it in group_by() and then summarise the other columns:
df1 %>%
group_by(col1, col3) %>%
summarise(across(everything(), sum))
# col1 col3 col2 col4
# <chr> <int> <int> <int>
#1 A 3 3 6
#2 B 2 5 6
Or with base R's aggregate():
aggregate(.~col1+col3, df1, FUN = sum)
# col1 col3 col2 col4
#1 B 2 5 6
#2 A 3 3 6

Related

In R, I want to add a count variable to a dataframe [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 2 years ago.
I have this dataframe in R:
V1
A
A
C
C
C
I want to add a variable like this:
V1 V2
A 1
A 2
C 1
C 2
C 3
Thanks!!
Another dplyr approach:
library(dplyr)
df %>% group_by(V1) %>% mutate(V2 = seq_along(V1))
Does this answer your question?
> df <- data.frame(V1 = c('A','A','C','C','C'))
> df
V1
1 A
2 A
3 C
4 C
5 C
> df %>% group_by(V1) %>% mutate(v2 = row_number())
# A tibble: 5 x 2
# Groups: V1 [2]
V1 v2
<fct> <int>
1 A 1
2 A 2
3 C 1
4 C 2
5 C 3
>
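Without dplyr, a minimal base R sketch of the same within-group numbering uses ave(), which applies a function to each group separately and writes the results back in the original row order. The data frame is rebuilt from the question.

```r
df <- data.frame(V1 = c("A", "A", "C", "C", "C"))

# ave() splits the row positions by V1, applies seq_along to each group
# (giving 1, 2, ... per group) and returns the results in the original order
df$V2 <- ave(seq_along(df$V1), df$V1, FUN = seq_along)
df
# expected V2: 1 2 1 2 3
```

Because ave() works by group membership rather than position, rows of the same group do not need to be contiguous; each occurrence is numbered in the order it appears.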

count number of events grouped by id [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 2 years ago.
DF<-data.frame(id=c(1,1,2,3,3),code=c("A","A","A","E","E"))
> DF
id code
1 1 A
2 1 A
3 2 A
4 3 E
5 3 E
Now I want to count the number of distinct ids for each code. Desired output:
# A tibble: 2 x 2
code count
1 A 2
2 E 1
I've been trying:
> DF%>%group_by(code)%>%summarize(count=n())
# A tibble: 2 x 2
code count
<fct> <int>
1 A 3
2 E 2
> DF%>%group_by(code,id)%>%summarize(count=n())
# A tibble: 3 x 3
# Groups: code [2]
code id count
<fct> <dbl> <int>
1 A 1 2
2 A 2 1
3 E 3 2
>
Neither of these gives me the desired output.
Best H
Being pedantic, I'd rephrase your question as "count the number of distinct IDs per code". With that mindset, the answer becomes clearer.
DF %>%
group_by(code) %>%
summarize(count = n_distinct(id))
With data.table, convert the data frame with setDT() and use uniqueN() (the data.table counterpart of dplyr's n_distinct()) grouped by 'code':
library(data.table)
setDT(DF)[, .(count = uniqueN(id)), code]
# code count
#1: A 2
#2: E 1
A simple base R solution also works:
# Data
DF<-data.frame(id=c(1,1,2,3,3),code=c("A","A","A","E","E"))
# Classic base R solution
aggregate(id~code,data=DF,FUN = function(x) length(unique(x)))
code id
1 A 2
2 E 1
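The same count can also be reached in two explicit steps: remove duplicate (id, code) pairs, then count the remaining rows per code. This base R sketch makes the "count distinct" reasoning visible; the names pairs and counts are only illustrative.

```r
DF <- data.frame(id = c(1, 1, 2, 3, 3), code = c("A", "A", "A", "E", "E"))

# Drop repeated (id, code) pairs so each id appears at most once per code
pairs <- unique(DF[, c("id", "code")])

# Counting the remaining rows per code now gives the number of distinct ids
counts <- table(pairs$code)
counts
# expected: A = 2, E = 1
```

This also shows why the first attempt in the question returned 3 and 2: it counted rows per code before the duplicate (id, code) pairs were removed.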

How to merge rows in a dataframe [duplicate]

This question already has answers here:
Group by multiple columns and sum other multiple columns
(7 answers)
Sum multiple variables by group [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame:
df1 <- data.frame(col1=c("x","y","y","z","z","z"), col2=c(0,1,0,0,0,1), col3=c(0,0,1,0,0,0), col4=c(1,0,0,0,1,0), col5=c(0,1,0,0,0,0))
I want to get a data frame like this:
df2 <- data.frame(col1=c("x","y","z"), col2=c(0,1,1), col3=c(0,1,0),col4=c(1,0,1), col5=c(0,1,0))
Could anyone help me, please? Thank you
A solution using dplyr: group by col1, then sum every other column.
library(dplyr)
df <- df1 %>%
group_by(col1) %>%
summarize(across(everything(), sum)) %>%
ungroup()
df
# # A tibble: 3 x 5
# col1 col2 col3 col4 col5
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 x 0 0 1 0
# 2 y 1 1 0 1
# 3 z 1 0 1 0
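A base R sketch of the same grouping and summing, using aggregate() and checking the result against the df2 from the question:

```r
df1 <- data.frame(col1 = c("x", "y", "y", "z", "z", "z"),
                  col2 = c(0, 1, 0, 0, 0, 1), col3 = c(0, 0, 1, 0, 0, 0),
                  col4 = c(1, 0, 0, 0, 1, 0), col5 = c(0, 1, 0, 0, 0, 0))
df2 <- data.frame(col1 = c("x", "y", "z"), col2 = c(0, 1, 1),
                  col3 = c(0, 1, 0), col4 = c(1, 0, 1), col5 = c(0, 1, 0))

# `. ~ col1` means: every column other than col1, grouped by col1
res <- aggregate(. ~ col1, data = df1, FUN = sum)
res
```

If the intent is a 0/1 flag ("does any row in the group have a 1?") rather than a count, FUN = max would cap each result at 1; for this particular data both functions give the same answer, so which one is right is an assumption about the question.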

Use bind_rows to convert list of vectors to dataframe [duplicate]

This question already has answers here:
Tidyverse approach to binding unnamed list of unnamed vectors by row - do.call(rbind,x) equivalent
(8 answers)
Closed 2 years ago.
I have a list of vectors that I would like to convert into a dataframe.
Code
a <- list( c(1,2,3,4),
c(1,2,3,4),
c(4,5,6,3),
c(6,3,2,6))
With the help of this post, I was able to do so as follows:
library(tidyverse)
a %>%
reduce(rbind) %>%
as.data.frame()
> a %>% reduce(rbind) %>% as.data.frame()
V1 V2 V3 V4
out 1 2 3 4
elt 1 2 3 4
elt.1 4 5 6 3
elt.2 6 3 2 6
I would like to use dplyr's bind_rows() function (a %>% bind_rows()), as it seems more convenient. However, this generates an error:
Error: Argument 1 must have names.
Questions
What is happening here?
How can I prevent it from happening ;) ?
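bind_rows() matches values to columns by name, so unnamed vectors cannot be assigned to columns. One option is to name each vector first and row-bind with purrr's map_dfr():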
map_dfr(a, ~ set_names(.x, paste0("V", seq_along(.x))))
V1 V2 V3 V4
<dbl> <dbl> <dbl> <dbl>
1 1 2 3 4
2 1 2 3 4
3 4 5 6 3
4 6 3 2 6
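For reference, the base R idiom named in the linked duplicate, do.call(rbind, x), needs no names at all. A minimal sketch:

```r
a <- list(c(1, 2, 3, 4),
          c(1, 2, 3, 4),
          c(4, 5, 6, 3),
          c(6, 3, 2, 6))

# do.call() passes each list element as a separate argument to rbind(),
# giving a 4 x 4 numeric matrix; as.data.frame() then names the columns V1..V4
df_a <- as.data.frame(do.call(rbind, a))
df_a
```

Because the vectors reach rbind() as values rather than as named variables, the result has plain row names 1 to 4 instead of the out/elt labels produced by the reduce(rbind) version.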

Invert rows using dplyr [duplicate]

This question already has answers here:
Reorder the rows of data frame in dplyr
(2 answers)
dplyr arrange by reverse alphabetical order [duplicate]
(1 answer)
Closed 3 years ago.
How can I reverse the row order of a dataframe/tibble using dplyr? I don't want to arrange it by a particular variable; I just want the rows inverted.
For example, the tibble
# A tibble: 5 x 2
a b
<int> <chr>
1 1 one
2 2 two
3 3 three
4 4 four
5 5 five
should become
# A tibble: 5 x 2
a b
<int> <chr>
1 5 five
2 4 four
3 3 three
4 2 two
5 1 one
Just arrange() by descending row_number() like this:
my_tibble %>%
dplyr::arrange(-dplyr::row_number())
We can also use desc():
my_tibble %>%
arrange(desc(row_number()))
Another option is slice():
my_tibble %>%
slice(rev(row_number()))
Or, since column 'a' is already in ascending order here, arrange by it in descending order:
my_tibble %>%
arrange(desc(a))
# a b
#1 5 five
#2 4 four
#3 3 three
#4 2 two
#5 1 one
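Outside dplyr, the rows can be reversed by indexing them in reverse order. A minimal base R sketch, using a plain data.frame named my_df (a hypothetical stand-in for the tibble in the question):

```r
my_df <- data.frame(a = 1:5, b = c("one", "two", "three", "four", "five"))

# Index rows from last to first; seq_len() also behaves correctly for 0 rows,
# whereas nrow(my_df):1 would produce c(0, 1) on an empty data frame
reversed <- my_df[rev(seq_len(nrow(my_df))), ]

# Subsetting keeps the old row names (5, 4, ...); reset them to 1..n
rownames(reversed) <- NULL
reversed
```

Unlike arrange(desc(a)), this does not depend on any column being sorted, which matches the question's wish to invert the rows without arranging by a variable.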
