How to group a data.frame each integer number of rows? [duplicate]

This question already has answers here:
Calculate the mean of every 13 rows in data frame
(4 answers)
Closed 1 year ago.
This seems like a very simple question, but I can't come up with an efficient approach.
I have a data frame in R composed as follows:
a column position, generated as seq(from = 1, to = nrow(df), by = 1)
a column value, with some values associated with each position
I want to group the data frame every k rows (k being an integer input) and then calculate the mean of each group.
The dplyr function group_by does not let me group by a fixed number of rows.
How can I do that? Is there a way to avoid creating the column position at all?

Here is one option with gl from base R. Specify the n and k values; n is the total number of rows in the dataset.
library(dplyr)
k1 <- 5
df1 %>%
  group_by(grp = as.integer(gl(n(), k = k1, n()))) %>%
  summarise(value = mean(value))
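For a concrete run, here is a minimal sketch; the data frame df1 below is an assumed example, not from the question. The tapply line also shows a base-R route that avoids creating a position column entirely, using ceiling division to build the group index on the fly:

```r
library(dplyr)

# assumed sample data: 10 rows, values 1..10
df1 <- data.frame(value = 1:10)
k1 <- 5

df1 %>%
  group_by(grp = as.integer(gl(n(), k = k1, n()))) %>%
  summarise(value = mean(value))
#> grp 1 -> mean(1:5) = 3, grp 2 -> mean(6:10) = 8

# base R, no position column needed: group index from row number
tapply(df1$value, ceiling(seq_along(df1$value) / k1), mean)
#> 1: 3, 2: 8
```

Note that gl() recycles level labels, so the last group may be shorter than k when nrow(df1) is not a multiple of k; both approaches handle that case the same way.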

Count the number of observations in the data frame in R [duplicate]

This question already has answers here:
Count number of unique levels of a variable
(7 answers)
Count number of distinct values in a vector
(6 answers)
Closed 2 months ago.
I want to know how to count the number of distinct observations using R.
For example, let's say I have a data df as follows:
df <- data.frame(id = c(1,1,1,2,2,2,2,3,3,5,5,5,9,9))
Even though the largest value of id is 9, there are only 5 distinct values: 1, 2, 3, 5, and 9. I want to count how many distinct values exist in id.
In base R:
length(unique(df$id))
[1] 5
Here, unique keeps only the distinct values, and length then counts the number of values in the vector.
In dplyr:
df %>%
  summarise(n = length(unique(id)))
Alternatively:
nrow(distinct(df))
Here, distinct subsets the whole data frame (not just the column id!) to unique rows before nrow counts the number of remaining rows.
Here are another two options:
df <- data.frame(id = c(1,1,1,2,2,2,2,3,3,5,5,5,9,9))
sum(!duplicated(df$id))
#> [1] 5
library(dplyr)
n_distinct(df$id)
#> [1] 5
Created on 2022-07-09 by the reprex package (v2.0.1)
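The distinction between the column-wise and row-wise answers above matters once the data frame has more than one column. A minimal sketch (df2 is an assumed example, not from the question):

```r
library(dplyr)

# assumed example: id repeats, but the (id, grp) row pairs are all distinct
df2 <- data.frame(id = c(1, 1, 2), grp = c("a", "b", "a"))

n_distinct(df2$id)   # distinct values in one column -> 2
nrow(distinct(df2))  # distinct whole rows           -> 3
```

So use n_distinct (or length(unique())) when you care about one column, and nrow(distinct()) only when whole-row uniqueness is what you mean.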

Conditional sum of a column according to the value of another column when grouping [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 1 year ago.
I'm trying to sum PONDERA when ESTADO==1 and then group by AGLOMERADO
new <- recorte %>% group_by(AGLOMERADO) %>%
summarise(TOTocupied=sum(recorte[recorte$ESTADO==1,"PONDERA"]))
The sum is working correctly, but I can't get the result to be grouped by AGLOMERADO, it gives me back the same result for each AGLOMERADO:
AGLOMERADO TOTocupied
1 100
2 100
3 100
What am I doing wrong?
Don't use $ inside a dplyr pipe. There is also no need to refer to the data frame again, since the pipe already supplies it.
You can try -
library(dplyr)
new <- recorte %>%
  group_by(AGLOMERADO) %>%
  summarise(TOTocupied = sum(PONDERA[ESTADO == 1], na.rm = TRUE))
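To see why this fixes the original problem, here is a minimal sketch with assumed toy data (the recorte values below are made up to mirror the question's columns). Because PONDERA and ESTADO are referred to bare, they are evaluated within each AGLOMERADO group rather than over the whole data frame:

```r
library(dplyr)

# assumed sample data mirroring the question's columns
recorte <- data.frame(
  AGLOMERADO = c(1, 1, 2, 2),
  ESTADO     = c(1, 0, 1, 1),
  PONDERA    = c(10, 5, 20, 30)
)

recorte %>%
  group_by(AGLOMERADO) %>%
  summarise(TOTocupied = sum(PONDERA[ESTADO == 1], na.rm = TRUE))
#> AGLOMERADO 1 -> 10, AGLOMERADO 2 -> 50
```

The original code summed over recorte$PONDERA, which ignores the grouping and therefore returns the same grand total for every group.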

How to extract unique values from a data frame in r [duplicate]

This question already has answers here:
list unique values for each column in a data frame
(2 answers)
Closed 2 years ago.
I would like to extract the unique values from this data frame as an example
test <- data.frame(position = c("chr1_13529", "chr1_13529", "chr1_13538"),
                   genomic_regions = c("gene", "intergenic", "intergenic"))
The resulting data frame should give me only
chr1_13538 intergenic
Basically I want to extract rows that have a unique position
Here is a tidyverse/dplyr solution.
You are just grouping by position, counting occurrences, and keeping those that have only one occurrence.
library(tidyverse)
test %>%
  group_by(position) %>%
  mutate(count = n()) %>%
  filter(count == 1) %>%
  select(-count)
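The helper column isn't strictly needed: filter() can call n() directly inside a grouped data frame, which shortens the pipeline. A sketch using the same test data as above:

```r
library(dplyr)

test <- data.frame(position = c("chr1_13529", "chr1_13529", "chr1_13538"),
                   genomic_regions = c("gene", "intergenic", "intergenic"))

# keep only groups of size 1, i.e. positions that appear exactly once
test %>%
  group_by(position) %>%
  filter(n() == 1) %>%
  ungroup()
#> one row: chr1_13538  intergenic
```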
Here is a base R approach:
There are two parts:
We find the positions that occur at least twice using duplicated.
We keep the rows whose position is not in that set of duplicated positions.
Then we subset test on that condition.
test[!test$position %in% test$position[duplicated(test$position)],]
# position genomic_regions
#3 chr1_13538 intergenic

Deleting duplicate rows based on logical operation in R [duplicate]

This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Select the row with the maximum value in each group
(19 answers)
Closed 3 years ago.
I have data like this:
ID               SHape Length
180139746001000  2
180139746001000  1
I want to delete the duplicate row that has the smaller shape length.
Can anyone help me with this?
With
library(data.table)
df <- data.table(matrix(c(102:106, 106:104, 1:3, 1:3, 5:6), nrow = 8))
setnames(df, c("ID", "ShapeLength"))
just sort by the length column and use duplicated, keeping the last (largest) row per ID:
setkey(df, ShapeLength)
df[!duplicated(ID, fromLast = TRUE)]
You can select the highest shape length for each ID by performing
library(dplyr)
df %>%
  group_by(ID) %>%
  arrange(desc(SHape.Length)) %>%
  slice(1) %>%
  ungroup()
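With dplyr 1.0 or later, slice_max() expresses "keep the row with the largest value per group" directly, without an arrange step. A sketch using assumed sample data shaped like the question's table:

```r
library(dplyr)

# assumed sample data: one ID duplicated with two shape lengths
df <- data.frame(
  ID           = c("180139746001000", "180139746001000"),
  SHape.Length = c(2, 1)
)

df %>%
  group_by(ID) %>%
  slice_max(SHape.Length, n = 1, with_ties = FALSE) %>%
  ungroup()
#> one row per ID, keeping SHape.Length == 2
```

with_ties = FALSE guarantees exactly one row per ID even when two rows share the maximum length.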

Sorting Column in R [duplicate]

This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
I have data that includes a treatment group, which is indicated by a 1, and a control group, which is indicated by a 0. This is all contained under the variable treat_invite. How can I separate these and take the mean of pct_missing for the 1's and 0's? I've attached an image for clarification.
Assuming your data frame is called df:
df <- df %>%
  group_by(treat_invite) %>%
  mutate(MeanPCTMissing = mean(PCT_missing))
Or, if you want just the summary table (rather than the original table with an additional column):
df <- df %>%
  group_by(treat_invite) %>%
  summarise(MeanPCTMissing = mean(PCT_missing))
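Since the question's data is only shown as an image, here is a minimal sketch with assumed values (the df below is invented; only the column names treat_invite and PCT_missing come from the answer). It shows the summarise() variant producing one mean per group:

```r
library(dplyr)

# assumed sample data: treat_invite is the 0/1 treatment indicator
df <- data.frame(
  treat_invite = c(1, 1, 0, 0),
  PCT_missing  = c(0.2, 0.4, 0.1, 0.3)
)

df %>%
  group_by(treat_invite) %>%
  summarise(MeanPCTMissing = mean(PCT_missing))
#> treat_invite 0 -> 0.2, treat_invite 1 -> 0.3
```

mutate() would instead keep all four rows and attach the group mean to each one, which is useful when you need the mean alongside the original observations.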