Deleting duplicate rows based on logical operation in R [duplicate]

This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Select the row with the maximum value in each group
(19 answers)
Closed 3 years ago.
I have data like this:
ID               SHape Length
180139746001000  2
180139746001000  1
I want to delete the duplicate rows, keeping only the row with the larger shape length for each ID.
Can anyone help me with this?

With
library(data.table)
df <- data.table(matrix(c(102:106, 106:104, 1:3, 1:3, 5:6), nrow = 8))
colnames(df) <- c("ID", "Shape Length")
sort by shape length, then use duplicated to keep the last (largest) row for each ID:
setkeyv(df, "Shape Length")
df[!duplicated(ID, fromLast = TRUE)]
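The same sort-then-deduplicate idea can be sketched in base R without any packages. The toy data, the third ID, and the column name Shape_Length are assumptions made for illustration; they are not from the question.

```r
# Toy data (hypothetical): one duplicated ID and one unique ID.
df <- data.frame(
  ID = c("180139746001000", "180139746001000", "180139746001001"),
  Shape_Length = c(2, 1, 5),
  stringsAsFactors = FALSE
)

# Sort by ID, then by descending shape length, so the largest row comes first per ID.
sorted <- df[order(df$ID, -df$Shape_Length), ]

# duplicated() flags every occurrence after the first, so negating it keeps the largest.
result <- sorted[!duplicated(sorted$ID), ]
print(result)
```

Sorting in descending order and keeping the first occurrence is equivalent to sorting ascending and using fromLast = TRUE.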

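You can select the highest shape length for each ID with dplyr. Note the desc(): sorting in ascending order and taking the first row would return the lowest value instead.
library(dplyr)
df %>%
  group_by(ID) %>%
  arrange(desc(SHape.Length)) %>%
  slice(1) %>%
  ungroup()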
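As a possible alternative, recent dplyr versions (1.0.0 and later) provide slice_max(), which avoids the explicit sort. This is a minimal sketch on made-up data; the tibble shapes and the column name Shape_Length are assumptions.

```r
library(dplyr)

# Hypothetical data: ID "a" appears twice, ID "b" once.
shapes <- tibble(
  ID = c("a", "a", "b"),
  Shape_Length = c(2, 1, 5)
)

# Keep exactly one row per ID: the one with the largest Shape_Length.
# with_ties = FALSE guarantees a single row even if the maximum is repeated.
result <- shapes %>%
  group_by(ID) %>%
  slice_max(Shape_Length, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```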

Related

How to extract unique values from a data frame in r [duplicate]

This question already has answers here:
list unique values for each column in a data frame
(2 answers)
Closed 2 years ago.
I would like to extract the unique values from this data frame as an example
test <- data.frame(position=c("chr1_13529", "chr1_13529", "chr1_13538"),
genomic_regions=c("gene", "intergenic", "intergenic"))
The resulting data frame should contain only
chr1_13538 intergenic
Basically, I want to extract the rows whose position occurs only once.

Here is a tidyverse/dplyr solution.
You group by position, count the occurrences in each group, and keep only the groups with exactly one occurrence.
library(tidyverse)
test %>%
group_by(position) %>%
mutate(count = n()) %>%
filter(count == 1) %>%
select(-count)

Here is a base R approach. There are two parts:
1. We create a list of positions that occur at least twice using duplicated.
2. We look for positions that are not in that list of duplicated positions.
Then we subset test on condition 2.
test[!test$position %in% test$position[duplicated(test$position)],]
# position genomic_regions
#3 chr1_13538 intergenic
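A related base R idiom, shown here only as a sketch, is to call duplicated() in both directions: scanning forward flags later copies, scanning backward (fromLast = TRUE) flags earlier ones, and a row flagged by neither is unique. The data frame is the one from the question.

```r
test <- data.frame(
  position = c("chr1_13529", "chr1_13529", "chr1_13538"),
  genomic_regions = c("gene", "intergenic", "intergenic"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose position appears more than once, in either direction.
is_repeated <- duplicated(test$position) | duplicated(test$position, fromLast = TRUE)

# Keep only the rows whose position occurs exactly once.
result <- test[!is_repeated, ]
print(result)
```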

How to output whole information when using group_by and summarize [duplicate]

This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 2 years ago.
My data has continent, country and dif (a number). I am trying to find the maximum of dif for each continent, and my code is below. How can I get the country names at the same time?
dt_dif %>%
group_by(continent)%>%
summarize(max_dif = max(dif))

We can use slice to return, for each continent, the row of the dataset with the max value of 'dif':
library(dplyr)
dt_dif %>%
group_by(continent) %>%
slice(which.max(dif))
Or using filter, which keeps every tied row when the maximum occurs more than once in a group:
dt_dif %>%
group_by(continent) %>%
filter(dif == max(dif))
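The two approaches differ only when a group's maximum is tied. The sketch below uses a made-up dt_dif (the countries and values are assumptions) in which one continent has a tie.

```r
library(dplyr)

# Hypothetical data: Asia has a tie at dif = 3.
dt_dif <- tibble(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  country = c("A", "B", "C", "D"),
  dif = c(3, 3, 1, 4)
)

# which.max() returns the index of the first maximum, so one row per continent.
first_max <- dt_dif %>%
  group_by(continent) %>%
  slice(which.max(dif)) %>%
  ungroup()

# The equality test matches every row equal to the group maximum, so ties are kept.
all_max <- dt_dif %>%
  group_by(continent) %>%
  filter(dif == max(dif)) %>%
  ungroup()

print(first_max)
print(all_max)
```

Which behaviour is preferable depends on whether tied countries should all be reported.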

R - find rows corresponding to maximum value of a column among multiple rows [duplicate]

This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 3 years ago.
I have data like the sample below, with hourly rows for each day and each area, loc pair. For each day, area and loc, I need to find the row in which the value of a is the maximum. (The header is ordered here to match the data rows: the second and third fields are area and loc, and the fourth is the hour.)
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1
Expected output:
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20190101,ar01,loc01,00,98,10.9,22.5
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,23,97.2,10.9,22.1
I could do an aggregation using dplyr, like df %>% group_by(day, area, loc), but how do I get the result rows from there?

You can try:
library(dplyr)
df %>%
group_by(day, area, loc) %>%
  filter(a == max(a))
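To check this against the question's sample, the data can be read from an inline string. This sketch assumes the header order day,area,loc,hour,a,b,c, which is what the data rows imply (the second and third fields are clearly area and loc).

```r
library(dplyr)

# Sample rows from the question; header order is an assumption matching the data.
df <- read.csv(text = "day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1")

# Keep, within each day/area/loc group, the row(s) where a equals the group maximum.
result <- df %>%
  group_by(day, area, loc) %>%
  filter(a == max(a)) %>%
  ungroup()

print(result)
```

filter() preserves the original row order, so the four rows come back in the same order as the expected output in the question.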

Sorting Column in R [duplicate]

This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
I have data that includes a treatment group, which is indicated by a 1, and a control group, which is indicated by a 0. This is all contained under the variable treat_invite. How can I separate these and take the mean of pct_missing for the 1's and 0's? I've attached an image for clarification.

Assuming your data frame is called df:
df <- df %>% group_by(treat_invite) %>% mutate(MeanPCTMissing = mean(PCT_missing))
Or, if you just want the summary table (rather than the original table with an additional column):
df <- df %>% group_by(treat_invite) %>% summarise(MeanPCTMissing = mean(PCT_missing))
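For comparison, the summary table can also be produced in base R with aggregate(). The data below is invented for illustration, and the column name pct_missing follows the question's wording; the actual name in the asker's data may differ in case.

```r
# Hypothetical data: 1 = treatment, 0 = control.
df <- data.frame(
  treat_invite = c(1, 0, 1, 0),
  pct_missing = c(0.10, 0.30, 0.20, 0.50)
)

# The formula reads "pct_missing by treat_invite"; FUN is applied to each group.
group_means <- aggregate(pct_missing ~ treat_invite, data = df, FUN = mean)
print(group_means)
```

The result has one row per value of treat_invite, sorted by that value.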

Find minimum y value and its corresponding x value for each id [duplicate]

This question already has answers here:
Select rows with min value by group
(10 answers)
Subset data based on Minimum Value
(2 answers)
Closed 4 years ago.
I would like to find the minimum SkinTemp value and the corresponding Time when it occurs for each id.
df<-data.frame(Time=seq(65),
SkinTemp=rnorm(65,37,0.5),
id=rep(1:10,c(5,4,10,6,7,8,9,8,4,4)))
I have successfully found the minimum value for each group but can't quite work out how to find the corresponding Time:
a<-aggregate(data=df,SkinTemp~id, min)
or
df %>% group_by(id) %>% summarise(minSkinTemp = min(SkinTemp))
I'm missing something like which.min, but I haven't found any examples of this being used with aggregate. Any thoughts?

We can slice with which.min to get the row that has the minimum value of 'SkinTemp' after grouping by 'id':
library(dplyr)
df %>%
group_by(id) %>%
slice(which.min(SkinTemp))
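Since the question asks about using which.min outside dplyr, here is one possible base R sketch: split the data by id, pick the minimum row in each piece, and bind the pieces back together. The set.seed() call is an addition so the random data is reproducible.

```r
set.seed(1)  # added for reproducibility; not in the question
df <- data.frame(
  Time = seq(65),
  SkinTemp = rnorm(65, 37, 0.5),
  id = rep(1:10, c(5, 4, 10, 6, 7, 8, 9, 8, 4, 4))
)

# For each id, keep the single row where SkinTemp is lowest (first one on ties).
rows <- lapply(split(df, df$id), function(g) g[which.min(g$SkinTemp), ])

# Combine the per-id rows into one data frame with Time, SkinTemp and id.
result <- do.call(rbind, rows)
print(result)
```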
