This question already has answers here:
list unique values for each column in a data frame
(2 answers)
Closed 2 years ago.
I would like to extract the unique values from this example data frame:
test <- data.frame(position=c("chr1_13529", "chr1_13529", "chr1_13538"),
genomic_regions=c("gene", "intergenic", "intergenic"))
The resulting data frame should contain only:
chr1_13538 intergenic
Basically, I want to extract the rows whose position occurs only once.
Here is a tidyverse/dplyr solution.
You are just grouping by position, counting occurrences, and keeping only the positions that occur exactly once.
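library(tidyverse)
test %>%
  group_by(position) %>%   # one group per position
  mutate(count = n()) %>%  # number of rows sharing that position
  filter(count == 1) %>%   # keep positions that occur exactly once
  select(-count)           # drop the helper column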
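The helper column is optional: dplyr's n() can be called directly inside filter(), where it is evaluated per group. A minimal sketch, assuming dplyr is installed and rebuilding the question's test data:

```r
library(dplyr)

test <- data.frame(position = c("chr1_13529", "chr1_13529", "chr1_13538"),
                   genomic_regions = c("gene", "intergenic", "intergenic"))

# n() is evaluated once per group, so this keeps only positions seen exactly once
unique_rows <- test %>%
  group_by(position) %>%
  filter(n() == 1) %>%
  ungroup()

print(unique_rows)
```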
Here is a base R approach. There are two parts:
1. Use duplicated() to build a vector of the positions that occur at least twice.
2. Find the rows whose position is not in that vector.
Then we subset test on the condition from step 2.
test[!test$position %in% test$position[duplicated(test$position)],]
# position genomic_regions
#3 chr1_13538 intergenic
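A single-pass base R variant (a swapped-in idiom, not the method used above) combines duplicated() with its fromLast = TRUE counterpart, so that every copy of a repeated position is flagged, not only the later ones. A sketch on the question's test data:

```r
test <- data.frame(position = c("chr1_13529", "chr1_13529", "chr1_13538"),
                   genomic_regions = c("gene", "intergenic", "intergenic"))

# duplicated() flags each repeat after the first occurrence;
# fromLast = TRUE flags each repeat before the last occurrence.
# OR-ing the two marks every row whose position appears more than once.
dup_any <- duplicated(test$position) | duplicated(test$position, fromLast = TRUE)
result <- test[!dup_any, ]
result
```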
This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 2 years ago.
My data has the columns continent, country and dif (a number). I am trying to find the maximum of dif for each continent; my code is below. How can I get the country names at the same time?
dt_dif %>%
group_by(continent) %>%
summarize(max_dif = max(dif))
We can use slice() with which.max() to return, for each continent, the row of the dataset with the maximum value of 'dif':
library(dplyr)
dt_dif %>%
group_by(continent) %>%
slice(which.max(dif))
Or, using filter() (this keeps every tied row, whereas which.max() returns only the first maximum):
dt_dif %>%
group_by(continent) %>%
filter(dif == max(dif))
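In dplyr 1.0.0 and later, slice_max() wraps this pattern. A minimal sketch; the dt_dif below is hypothetical toy data, since the question does not show its actual contents:

```r
library(dplyr)

# Hypothetical toy data; the real dt_dif is not shown in the question
dt_dif <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  country   = c("Japan", "India", "France", "Spain", "Italy"),
  dif       = c(3.5, 7.1, 2.0, 4.4, 3.9)
)

# slice_max() keeps the row(s) with the largest dif in each group;
# with_ties = FALSE returns exactly one row per continent
top_by_continent <- dt_dif %>%
  group_by(continent) %>%
  slice_max(dif, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_by_continent)
```

With the default with_ties = TRUE, slice_max() behaves like the filter() version and keeps all tied rows.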
This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 3 years ago.
With data like the example below, I have hourly data for each day and each area/loc pair. I need to find, for each day and area/loc pair, the rows in which the value of a is maximum.
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1
Expected output:
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20190101,ar01,loc01,00,98,10.9,22.5
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,23,97.2,10.9,22.1
I can group the data with dplyr, e.g. df %>% group_by(day, area, loc), but how do I get the matching rows from there?
You can try:
library(dplyr)
df %>%
group_by(day, area, loc) %>%
filter(a == max(a))
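A self-contained version of this answer, reading the sample data from the question. It assumes the columns are laid out as day, area, loc, hour, a, b, c, which is how the values in each row are ordered:

```r
library(dplyr)

# Sample data from the question (column order assumed: day, area, loc, hour, a, b, c)
csv_text <- "day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1"
df <- read.csv(text = csv_text)

max_rows <- df %>%
  group_by(day, area, loc) %>%
  filter(a == max(a)) %>%  # keep every row that attains its group's maximum
  ungroup()

print(max_rows)
```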
This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
My data includes a treatment group, coded 1, and a control group, coded 0, both stored in the variable treat_invite. How can I separate these groups and take the mean of pct_missing for the 1s and the 0s?
Assuming your data frame is called df and dplyr is loaded:
df <- df %>% group_by(treat_invite) %>% mutate(MeanPCTMissing = mean(PCT_missing))
Or, if you just want the summary table (rather than the original table with an additional column):
summary_df <- df %>% group_by(treat_invite) %>% summarise(MeanPCTMissing = mean(PCT_missing))
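For comparison, base R's aggregate() gives the same per-group means without dplyr. A sketch with a small hypothetical data frame; the column is spelled pct_missing as in the question, so adjust it to match your real column name:

```r
# Hypothetical data: treat_invite is 1 for treatment, 0 for control
df <- data.frame(treat_invite = c(1, 0, 1, 0, 1),
                 pct_missing  = c(10, 20, 30, 40, 20))

# One row per treat_invite value, holding the mean of pct_missing
res <- aggregate(pct_missing ~ treat_invite, data = df, FUN = mean)
print(res)
```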
This question already has answers here:
Select rows with min value by group
(10 answers)
Subset data based on Minimum Value
(2 answers)
Closed 4 years ago.
I would like to find the minimum SkinTemp value and the corresponding Time when it occurs for each id.
df<-data.frame(Time=seq(65),
SkinTemp=rnorm(65,37,0.5),
id=rep(1:10,c(5,4,10,6,7,8,9,8,4,4)))
I have successfully found the minimum value for each group but can't quite work out how to find the corresponding Time:
a<-aggregate(data=df,SkinTemp~id, min)
or
df %>% group_by(id) %>% summarise(minSkinTemp = min(SkinTemp))
I'm missing something like which.min, but I haven't found any examples of this being used with aggregate. Any thoughts?
After grouping by 'id', we can use slice() with which.min() to get the row that has the minimum value of 'SkinTemp':
library(dplyr)
df %>%
group_by(id) %>%
slice(which.min(SkinTemp))
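To stay closer to the question's aggregate() attempt, base R offers two options: ave(), which broadcasts each group's minimum back onto its rows, or merging the aggregate() result back onto df. A sketch using the question's simulated data; set.seed() is added only to make it reproducible:

```r
set.seed(1)  # added only for reproducibility
df <- data.frame(Time = seq(65),
                 SkinTemp = rnorm(65, 37, 0.5),
                 id = rep(1:10, c(5, 4, 10, 6, 7, 8, 9, 8, 4, 4)))

# Option 1: ave() returns each row's group minimum, aligned with df's rows,
# so comparing it with SkinTemp flags the minimum row of each id
min_rows <- df[df$SkinTemp == ave(df$SkinTemp, df$id, FUN = min), ]

# Option 2: keep the aggregate() call and merge it back onto df;
# merge() joins on the shared columns (id and SkinTemp), which recovers Time
a <- aggregate(SkinTemp ~ id, data = df, FUN = min)
min_rows2 <- merge(a, df)

print(min_rows)
print(min_rows2)
```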