Changing class from character to integer while retaining all the data inside - R

q7 <- dbGetQuery(conn,
"SELECT TailNum AS TailNum, AVG(ontime.DepDelay) AS avg_delay, ontime.Year AS Year, planes.Year AS yearmade
FROM planes JOIN ontime USING(tailnum)
WHERE ontime.Cancelled = 0 AND planes.Year != '' AND planes.Year != 'None' AND ontime.Diverted = 0 AND ontime.DepDelay > 0
GROUP BY TailNum
ORDER BY avg_delay")
Code that I have tried:
q7 <- data.frame(
  yearmade = q7$yearmade, stringsAsFactors = FALSE)
^ the rebuilt data frame
Hi! Basically, I would like to create a new column that holds Year minus yearmade. But before I can do that, I found that the column drawn from the other table into this dataframe (yearmade) shows up as character. Is there any way to change its class while retaining the original data inside?

First use as.numeric() to change yearmade into a numeric variable. Then you can simply compute the difference between Year and yearmade.
I believe this will work for you.
set.seed(1)
Year <- 2000:2022
yearmade <- sample(c('2000', '1999', '1998'), length(Year), replace = TRUE)
TailNum <- sample(c('N3738B', 'N3737C', 'N37342'), length(Year), replace = TRUE)
avg_delay <- 1:length(Year)
q7 <- data.frame(TailNum, avg_delay, Year, yearmade)
# compute difference and add to data frame
q7$year_diff <- q7$Year - as.numeric(q7$yearmade)
This retains the original data, but introduces a new column year_diff.
> str(q7)
'data.frame': 23 obs. of 5 variables:
$ TailNum : chr "N3738B" "N3738B" "N3737C" "N3738B" ...
$ avg_delay: int 1 2 3 4 5 6 7 8 9 10 ...
$ Year : int 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
$ yearmade : chr "2000" "1998" "2000" "1999" ...
$ year_diff: num 0 3 2 4 4 7 8 8 9 11 ...
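Applied to your actual q7, the same idea is two lines (a sketch; plane_age is just a hypothetical name for the new column):
q7$yearmade <- as.numeric(q7$yearmade)   # the class changes, the values inside are retained
q7$plane_age <- q7$Year - q7$yearmade    # hypothetical name for the Year - yearmade column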

Related

How to mutate columns in place but keep same column types in R

Mutate in place is working fine as I set multiple dataframe columns blank if another dataframe column is blank. However, the mutated columns' types are changed. How to do this without changing column types?
Starting with data1, the mutate gives me data2 with changed column types.
Any ideas how to do this without changing any column types? Perhaps save all column types before the mutate and then set them back after the mutate?
Here's my code to create data1 and mutate to data2:
options(stringsAsFactors = FALSE)
library(dplyr)  # for %>%, mutate(), and across()
col_1_ferment <- c(452, 768, 856, 192, 905, 752)          # numeric type
col_1_crutch <- c('15', '34', '56', '49', '28', '37')     # character type
col_1_grease <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)  # logical (boolean) type
col_1_pump <- as.factor(c("3", "6", "3", "2", "1", "2"))  # factor type
indicator_col <- c(2, NA, 2, 1, 1, 2)                     # numeric type
data1 <- data.frame(col_1_ferment, col_1_crutch, col_1_grease, col_1_pump, indicator_col, check.rows = TRUE)
data2 <- data1 %>% mutate(dplyr::across(starts_with("col_1_"), ~ ifelse(is.na(indicator_col), "", .x)))
You can use NA instead of ""
data2 <- data1 %>% mutate(dplyr::across(starts_with("col_1_"), ~ ifelse(is.na(indicator_col), NA, .x)))
str(data2)
'data.frame': 6 obs. of 5 variables:
$ col_1_ferment: num 452 NA 856 192 905 752
$ col_1_crutch : chr "15" NA "56" "49" ...
$ col_1_grease : logi TRUE NA FALSE FALSE TRUE FALSE
$ col_1_pump : int 3 NA 3 2 1 2
$ indicator_col: num 2 NA 2 1 1 2
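Note in the str() output above that base ifelse() still dropped the factor class of col_1_pump (it prints as int). If you need every column class preserved exactly, factors included, one type-stable alternative is replace(), which keeps each column's attributes and lets the NA be coerced per column — a sketch on the same data1:
data2 <- data1 %>%
  mutate(across(starts_with("col_1_"), ~ replace(.x, is.na(indicator_col), NA)))
str(data2)   # col_1_pump now stays a factor; the other classes are unchanged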

I get the error "missing values are not allowed in subscripted assignments of data frames"

I am new to R and I am writing R code for a personal project/exercise. The data I am using is from a survey on the ethnic identity of people from Hong Kong; I used the 2019 data from http://data.hkupop.hku.hk/v3/hkupop/ethnic_identity/ch.html.
After removing NA values and reducing the columns to those I need,
I noticed that the data is highly imbalanced, so I tried to use under-sampling with ROSE and SMOTE (the number of observations had dropped greatly, from 1015 to 573).
I removed the following column numbers from the set:
df_f <- df[,-c(1,2,5,6,8,9,11,12,14,15,17,18,20,21,25,26,27,29,32,33,34,35,37)]
However, this is not binary data, so I had to combine the factor levels of eth_id into 0 = 1 & 3 (Hong Konger and Hong Kong Chinese) and 1 = 2 & 4 (Chinese and Chinese Hong Kong citizen).
How I combined the factors
library(car)    # assumed: recode() with this string-recipe syntax is car::recode
df_p$eth_id <- recode(df_p$eth_id, "c('1', '3')='1+3'; c('2', '4')='2+4'")
library(plyr)
df_p$eth_id <- revalue(df_p$eth_id, c('1+3' = 0))
df_p$eth_id <- revalue(df_p$eth_id, c('2+4' = 1))
0 = Hong Kong Citizen + Hong Kong Chinese Citizen
1 = Chinese Citizen + Chinese Hong Kong Citizen
How I renamed the columns
df_f <- df_f %>%
rename(
eth_id = Q001,
HongKonger = Q002A,
Chinese = Q003A,
PRC = Q004A,
CH_race = Q005A,
Asian = Q006A,
global = Q007A,
class1 = mid,
housing1 = type,
housing2 = housingv2,
pi = inclin
)
How I processed my NAs and unnecessary outliers
For the columns [, 2:7], I changed their NA values to 0. For example, df_f$HongKonger <- ifelse(is.na(df_f$HongKonger), 0, df_f$HongKonger), and so on and so forth for the rest.
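In one step, that repeated pattern is (a sketch of the repetition elided above):
df_f[, 2:7] <- lapply(df_f[, 2:7], function(x) ifelse(is.na(x), 0, x))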
And for the others, I removed the NAs like this:
df_p <- na.omit(df_p, cols= c("eth_id","sex","agegp","edugp","occgp","class","class2","housing1","housing2","pi"), invert=FALSE)
At this point of my data set, I was left with 14 columns and I renamed them (please refer to above). I uploaded the final structure of my data below which I used for ROSE and SMOTE :-)
Furthermore, I also removed rows that were outliers like:
# Remove an unidentifiable ethnic_identity (8881, or level = 5)
df_f <- df_f[!df_f$Q001 == "8881", ]
table(df_f$Q001)
df_f <- df_f[!df_f$eth_id == "Don't know / hard to say",]
This code must be written carefully: depending on whether you run it before or after the renaming, refer to the column as Q001 or eth_id accordingly.
Now, I kept on getting this error when I run ROSE:
Error in `[<-.data.frame`(`*tmp*`, , indY, value = c(1L, 1L, 1L, 1L, 1L, : missing values are not allowed in subscripted assignments of data frames.
This is very misleading because I made sure to remove the NA values completely (all the existing questions about this error concern NA issues, which should not apply here), and I even changed all my factor values to numeric values
(because I thought the program was not understanding the factor values).
I am also getting this error message for SMOTE: Error in names(dn) <- dnn : attempt to set an attribute on NULL. This makes me even more confused, to the point that I doubt whether the data itself is suitable for machine learning.
Here is the final structure of my data for your reference:
'data.frame': 573 obs. of 14 variables:
$ eth_id : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 1 1 1 ...
$ HongKonger: num 9 0 0 0 0 2 0 2 0 8 ...
$ Chinese : num 9 9 1 3 7 0 7 9 0 0 ...
$ PRC : num 8 9 1 3 7 3 1 0 1 0 ...
$ CH_race : num 12 10 0 3 7 3 0 7 3 4 ...
$ Asian : num 0 7 6 0 0 2 2 0 0 6 ...
$ global : num 0 0 0 0 0 3 7 0 10 0 ...
$ sex : num 1 2 2 1 2 1 1 2 1 2 ...
$ agegp : num 6 5 2 2 6 5 2 4 6 1 ...
$ edugp : num 2 3 2 3 1 2 2 2 3 3 ...
$ class1 : num 3 3 3 5 3 3 4 4 4 3 ...
$ housing1 : num 1 1 2 2 1 2 1 2 1 1 ...
$ housing2 : num 3 3 1 4 3 1 2 1 3 3 ...
$ pi : num 3 2 1 2 1 1 1 4 1 1 ...
- attr(*, "na.action")= 'omit' Named int 14 24 46 52 58 67 77 84 94 129 ...
..- attr(*, "names")= chr "25" "44" "82" "90" ...
How I divided the data into train and test sets
library(caret)   # assumed: createDataPartition() comes from caret
set.seed(123)
index <- createDataPartition(df_p$eth_id, p = 0.7, list = FALSE)
train_data <- df_p[index, ]
test_data <- df_p[-index, ]
head(test_data)
str(train_data)
How I used ROSE for under-sampling
library(ROSE)
ovun.sample(formula = train_data$eth_id ~ ., data = train_data, method="under", N = 250,seed = 123)$data
How I used ROSE for "both"
ovun.sample(formula = train_data$eth_id ~ . , data = train_data, method="both",
na.action=options("na.omit")$na.action,p=0.5,seed = 123)$data
How I used SMOTE
SMOTE(form = train_data$eth_id ~., data = train_data, perc.over = 100, k = 5, perc.under = 200)
I keep getting:
1) for ROSE: Error in `[<-.data.frame`(`*tmp*`, , indY, value = c(1L, 1L, 1L, 1L, 1L, : missing values are not allowed in subscripted assignments of data frames
2) for SMOTE: Error in names(dn) <- dnn : attempt to set an attribute on NULL
I am also confused about whether changing all the factors into numeric values keeps the data valid.
Thank you in advance for sharing your knowledge.

Making multiple named data frames with loop

In the process of learning. Didn't ask my first question well, so I'm trying again and doing my best to be more clear.
I'm trying to create a series of data frames for a reproducible question for my larger issue. I would like to make 4 data frames, each named differently by the year. Eventually I will merge these four data frames to explain where I am encountering my issue.
Here is the most recent solution. It runs, but it creates a list of four data frames rather than any individual frames in the global environment.
datafrom <- list()
years <- c(2006,2008,2010,2012)
for (i in 1:length(years)) {
  UniqueID <- as.character(1:10)  # not all numeric in the real data - kept as a character vector
  Name <- LETTERS[seq(from = 1, to = 10)]
  Entity_Type <- factor(c("This", "That"))
  Data1 <- rnorm(10)
  Data2 <- rnorm(10)
  Data3 <- rnorm(10)
  Data4 <- rnorm(10)
  Year <- years[i]
  datafrom[[i]] <- data.frame(UniqueID, Name, Entity_Type, Data1, Data2, Data3, Data4, Year)
}
I would like 4 separate data frames, each named datafrom2006, datafrom2008, etc.
Many thanks in advance for your patience with my learning.
I'll demonstrate a few (of many) techniques here, and I'll call them (1) brute force, (2) list-based, and (3) single long-form data.frame.
I'll add to the example the use of a function that you want to apply to each data.frame. Though contrived, it helps make the point:
## some constants used throughout
years <- c(2006, 2008, 2010, 2012)
n <- 10
myfunc <- function(x) {
  interestingPart <- x[ , grepl('^Data', colnames(x)) ]
  sapply(interestingPart, mean)
}
Brute Force
Yes, you can create multiple like-named and same-structure data.frames from a loop, though it is typically frowned upon by many experienced (R?) programmers:
set.seed(42)
for (yr in years) {
  tmpdf <- data.frame(UniqueID = as.character(1:n),
                      Name = LETTERS[1:n],
                      Entity_Type = factor(c('this', 'that')),
                      Data1 = rnorm(n),
                      Data2 = rnorm(n),
                      Data3 = rnorm(n),
                      Data4 = rnorm(n),
                      Year = yr)
  assign(sprintf('datafrom%s', yr), tmpdf)
}
rm(yr, tmpdf)
ls()
## [1] "datafrom2006" "datafrom2008" "datafrom2010" "datafrom2012" "myfunc"
## [6] "n" "years"
head(datafrom2006, n=2)
## UniqueID Name Entity_Type Data1 Data2 Data3 Data4 Year
## 1 1 A this 1.3709584 1.3048697 -0.3066386 0.4554501 2006
## 2 2 B that -0.5646982 2.2866454 -1.7813084 0.7048373 2006
In order to see the results for each data.frame, one would typically (though not always) do something like this:
myfunc(datafrom2006)
## Data1 Data2 Data3 Data4
## 0.5472968 -0.1634567 -0.1780795 -0.3639041
myfunc(datafrom2008)
## Data1 Data2 Data3 Data4
## -0.02021535 0.01839391 0.53907680 -0.21787537
myfunc(datafrom2010)
## Data1 Data2 Data3 Data4
## 0.25110630 -0.08719458 0.22924781 -0.19857243
myfunc(datafrom2012)
## Data1 Data2 Data3 Data4
## -0.7949660 0.2102418 -0.2022066 -0.2458678
List-Based
set.seed(42)
datafrom <- sapply(as.character(years), function(yr) {
  data.frame(UniqueID = as.character(1:n),
             Name = LETTERS[1:n],
             Entity_Type = factor(c('this', 'that')),
             Data1 = rnorm(n),
             Data2 = rnorm(n),
             Data3 = rnorm(n),
             Data4 = rnorm(n),
             Year = yr)
}, simplify = FALSE)
str(datafrom)
## List of 4
## $ 2006:'data.frame': 10 obs. of 8 variables:
## ..$ UniqueID : Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
## ..$ Name : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
## ..$ Entity_Type: Factor w/ 2 levels "that","this": 2 1 2 1 2 1 2 1 2 1
## ..$ Data1 : num [1:10] 1.371 -0.565 0.363 0.633 0.404 ...
## ..$ Data2 : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
## ..$ Data3 : num [1:10] -0.307 -1.781 -0.172 1.215 1.895 ...
## ..$ Data4 : num [1:10] 0.455 0.705 1.035 -0.609 0.505 ...
## ..$ Year : Factor w/ 1 level "2006": 1 1 1 1 1 1 1 1 1 1
## $ 2008:'data.frame': 10 obs. of 8 variables:
## ..$ UniqueID : Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
#### ...snip...
head(datafrom[[1]], n=2)
## UniqueID Name Entity_Type Data1 Data2 Data3 Data4 Year
## 1 1 A this 1.3709584 1.3048697 -0.3066386 0.4554501 2006
## 2 2 B that -0.5646982 2.2866454 -1.7813084 0.7048373 2006
head(datafrom[['2008']], n=2)
## UniqueID Name Entity_Type Data1 Data2 Data3 Data4 Year
## 1 1 A this 0.2059986 0.32192527 -0.3672346 -1.04311894 2008
## 2 2 B that -0.3610573 -0.78383894 0.1852306 -0.09018639 2008
However, with this you can test your function performance with just one:
myfunc(datafrom[[1]])
myfunc(datafrom[['2010']])
and then run the function on all of them very simply:
lapply(datafrom, myfunc)
## $`2006`
## Data1 Data2 Data3 Data4
## 0.5472968 -0.1634567 -0.1780795 -0.3639041
## $`2008`
## Data1 Data2 Data3 Data4
## -0.02021535 0.01839391 0.53907680 -0.21787537
## $`2010`
## Data1 Data2 Data3 Data4
## 0.25110630 -0.08719458 0.22924781 -0.19857243
## $`2012`
## Data1 Data2 Data3 Data4
## -0.7949660 0.2102418 -0.2022066 -0.2458678
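(If you do still want four separate objects in the global environment, the list can be expanded afterwards with base R's list2env() — a small sketch bridging the two approaches:)
list2env(setNames(datafrom, paste0('datafrom', names(datafrom))), envir = .GlobalEnv)
## creates datafrom2006 ... datafrom2012, just like the brute-force version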
Long-form Data
If instead you keep all of the data in the same data.frame, using your already-defined column of Year, you can still segment it for exploring individual years:
longdf <- do.call('rbind.data.frame', datafrom)
rownames(longdf) <- NULL
longdf[c(1,11,21,31),]
## UniqueID Name Entity_Type Data1 Data2 Data3 Data4 Year
## 1 1 A this 1.3709584 1.3048697 -0.3066386 0.45545012 2006
## 11 1 A this 0.2059986 0.3219253 -0.3672346 -1.04311894 2008
## 21 1 A this 1.5127070 1.3921164 1.2009654 -0.02509255 2010
## 31 1 A this -1.4936251 0.5676206 -0.0861073 -0.04069848 2012
Simple subsets:
subset(longdf, Year == 2006), though subset has its pros and cons.
by(longdf, longdf$Year, myfunc)
If using library(dplyr), try longdf %>% filter(Year == 2010) %>% myfunc()
(Side note: when trying to plot aggregate data, it's often easier when the data is in this form, especially when using ggplot2-like layering and aesthetics.)
Rationale Against "Brute Force"
In answer to your comment question: when making different variables with the same structure, it is easy to deduce that you will be doing the same thing to each of them, in turn or immediately consecutively. As a general programming principle, many try to generalize what they do so that if it can be done once, it can be done an arbitrary number of times without (heavily) adjusting the code. For instance, compare what was necessary in applying myfunc in the two examples above.
Further, if you later want to aggregate the results from your calls to myfunc, it is more laborious in the "brute force" example (as you must capture each return and combine manually), whereas the other two techniques can use simpler summarizing functions (e.g., another lapply, or perhaps Reduce or Filter).
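For instance, aggregating all four years of results from the list-based version is a one-liner that stacks the four per-year vectors shown above into a single matrix (rows 2006/2008/2010/2012, columns Data1..Data4):
do.call(rbind, lapply(datafrom, myfunc))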

Load data from a .csv file and then save it in a dictionary in R

I need to load data from a .csv file and then save it in a dictionary in R.
There are tens of thousands of lines of data that need to be loaded from a .csv file.
The data format:
country,region,value
1 , north , 101
1 , north , 219
2 , south , 308
2 , south , 862
... , ... , ...
My expected result, saved in an R data structure:
country , region, list of values
1 north 101 , 219
2 south 308 , 862
This is so that I can retrieve the values associated with the same country and region. Each row may have a different country and region, so I need to group the values that share the same country and region together.
Any help would be appreciated.
It's not clear exactly what you are willing to assume about the input data, nor exactly what the desired output is. Perhaps
tmp <- read.csv(text="country,region,value
1 , north , 101
1 , north , 219
2 , south , 308
2 , south , 862")
dups <- duplicated(tmp[1:2])
dat <- data.frame(tmp[!dups, 1:2], value = paste(tmp[!dups, 3], tmp[dups, 3], sep = " , "))
dat
## country region value
## 1 1 north 101 , 219
## 3 2 south 308 , 862
(Note this pairing assumes each country/region combination appears exactly twice.)
If I were you, I would stick with keeping your data in its "long" form. But if you really want to "aggregate" the data this way, you can look at the aggregate function:
Option 1: Values stored as a list in a column. Fun, but hell to deal with later on.
aggregate(value ~ country + region, tmp, I, simplify=FALSE)
# country region value
# 1 1 north 101, 219
# 2 2 south 308, 862
str(.Last.value)
# 'data.frame': 2 obs. of 3 variables:
# $ country: num 1 2
# $ region : Factor w/ 2 levels " north "," south ": 1 2
# $ value :List of 2
# ..$ 1:Class 'AsIs' int [1:2] 101 219
# ..$ 3:Class 'AsIs' int [1:2] 308 862
Option 2: Values stored as a single comma separated character vector column. Less hell to deal with later on, but would likely require further processing (splitting up again) to be of much use.
aggregate(value ~ country + region, tmp, paste, collapse = ",")
# country region value
# 1 1 north 101,219
# 2 2 south 308,862
str(.Last.value)
# 'data.frame': 2 obs. of 3 variables:
# $ country: num 1 2
# $ region : Factor w/ 2 levels " north "," south ": 1 2
# $ value : chr "101,219" "308,862"
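Since the question literally asks for a dictionary: the closest R analogue is a named list, and split() builds one directly. A sketch reusing tmp from above (trimws() strips the padding spaces the csv leaves in region):
vals <- split(tmp$value, paste(tmp$country, trimws(tmp$region)))
vals[["1 north"]]
# [1] 101 219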

A complex merge in R to flag unmatched observations?

I'm trying to join two datasets together. Call them x and y. I believe that the ID variables in y are a subset of the ID variables in x. But not in the pure sense because I know that x contains more IDs than y but I don't know the mapping. That is, some (but not all) of the IDs in x and y can be matched 1:1.
My ultimate goal is to figure out where this 1:1 mapping fails and flag these observations. I thought merge would be the way to go but maybe not. An example is below:
id <- c(1:10, 1:100)
X1 <- rnorm(110, mean = 0, sd = 1)
year <- c("2004","2005","2006","2001","2002")
year <- rep(year, 22)
month = c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr")
month <- rep(month, 11)
#dataset X
x <- cbind(id, X1, month, year)
#dataset Y
id2 <- c(1:10, 200)
Y1 <- rnorm(11, mean = 0 , sd = 1)
y <- cbind(id2,Y1)
#merge on the IDs; but we get an error because when id2 == 200 in y we don't
#have a match in x
result <- merge(x, y, by.x="id", by.y = "id2", all =TRUE)
The merge threw an error because id2 == 200 had no match in the x dataset. Unfortunately, I lost the ID and all the information as well! (it should equal 200 in row 111):
tail(result)
id X1 month year Y1
106 95 -0.0748386054887876 Nov 2002 NA
107 96 0.196765325477989 Dec 2004 NA
108 97 0.527922135906927 Jan 2005 NA
109 98 0.197927230533413 Feb 2006 NA
110 99 -0.00720474886698309 Mar 2001 NA
111 <NA> <NA> <NA> <NA> -0.9664941
What's more, I get duplicate observations on the ID variable in the merged file. The id2 == 1 observation only existed once but it just copied it twice (e.g. Y1 takes on the value 1.55 twice).
head(result)
id X1 month year Y1
1 1 -0.67371266313441 Jul 2004 1.553220
2 1 -0.318666983469993 Jul 2004 1.553220
3 10 -0.608192898092431 Apr 2002 1.234325
4 10 -0.72299929212347 Apr 2002 1.234325
5 100 -0.842111221826554 Apr 2002 NA
6 11 -0.16316681842082 Jul 2004 NA
This merge has made things more complicated than I intended. I was hoping I could examine every observation in x and figure out where the id matched id2 in y and flag the ones that didn't. So I would get a new vector, call it flag, that takes on a value 1 if x$id had a match in y$id2 and zero otherwise. This way, I could know where the 1:1 mapping failed. I could potentially get some traction on this by re-coding the NAs, but what about the error that gets thrown when id2 == 200? It just discards the information.
I have tried appending by rows with no luck, and it looks like I should give up on merge as well; perhaps it's better to write a loop or function to do something along these lines:
for every observation in x
id2 = which(id2) corresponds to id-month-year
flag = 1 if length of above is == 1, 0 otherwise
etc.
Hopefully this all makes sense. I'd be very grateful for any help or guidance.
If you are looking for which things in x$id are in y$id2, then you can use
x$id %in% y$id2
to get a logical vector returning matches. It does not guarantee a 1-to-1 correspondence, however; just a 1-to-many. You can then add this vector to your data frame
x$match.y <- x$id %in% y$id2
to see what rows of x have a corresponding ID in y.
To see which observations are 1-to-1, you could do something like
y$id2[duplicated(y$id2)] #vector of duplicate elements in y$id2
(x$id %in% y$id2) & !(x$id %in% y$id2[duplicated(y$id2)])
to filter out elements that appear more than once in y$id2. You can also add this to x:
x$match.y.unique <- (x$id %in% y$id2) & !(x$id %in% y$id2[duplicated(y$id2)])
The same procedure can be done for y to determine what rows of y match in x, and which ones match uniquely.
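For example, the mirror-image check for y looks like this (a sketch, assuming x and y are data frames rather than the cbind matrices — see the next answer for why that matters):
y$match.x <- y$id2 %in% x$id
table(y$match.x)   # the single FALSE is id2 == 200, the row with no partner in x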
The reason your merge failed was that you gave it two different structures (a character matrix for x and a numeric matrix for y). Using cbind when data.frame should be chosen is a common strategy for failure.
> str(x)
chr [1:110, 1:4] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "1" "2" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "id" "X1" "month" "year"
> str(y)
num [1:11, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "id2" "Y1"
If you used the data.frame function (since dataframes are what merge is supposed to be working with) it would have succeeded:
> x <- data.frame(id, X1, month, year); y <- data.frame(id2,Y1)
> str( result <- merge(x, y, by.x="id", by.y = "id2", all =TRUE) )
'data.frame': 111 obs. of 5 variables:
$ id : num 1 1 2 2 3 3 4 4 5 5 ...
$ X1 : num 1.5063 2.5035 0.7889 -0.4907 -0.0446 ...
$ month: Factor w/ 10 levels "Apr","Aug","Dec",..: 6 6 2 2 10 10 9 9 8 8 ...
$ year : Factor w/ 5 levels "2001","2002",..: 3 3 4 4 5 5 1 1 2 2 ...
$ Y1 : num 1.449 1.449 -0.134 -0.134 -0.828 ...
> tail( result <- merge(x, y, by.x="id", by.y = "id2", all =TRUE) )
id X1 month year Y1
106 96 -0.3869157 Dec 2004 NA
107 97 0.6373009 Jan 2005 NA
108 98 -0.7735626 Feb 2006 NA
109 99 -1.3537915 Mar 2001 NA
110 100 0.2626190 Apr 2002 NA
111 200 NA <NA> <NA> -1.509818
If you have duplicates in your 'x' argument, then you should get duplicates in the result. It's then your responsibility to use !duplicated in whatever manner you deem appropriate (either before or after the merge), but you cannot expect merge to be making decisions like that for you.
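And once the merge has gone through on proper data frames, the flag the question asks for can be read straight off the NA pattern (a sketch; it relies on X1 and Y1 containing no genuine NAs of their own):
result$in_x <- !is.na(result$X1)   # FALSE only for ids present just in y (id 200)
result$in_y <- !is.na(result$Y1)   # FALSE for the ids in x that y lacks
subset(result, !in_x | !in_y)      # rows where the 1:1 mapping fails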
