Remove duplicates from a column in R [duplicate] - r

This question already has answers here:
Remove duplicated rows
(10 answers)
Closed 7 years ago.
I have a long column (9500 rows in Excel) containing many gene IDs, and I want to remove the duplicates.
ID
BXDC2
BXDC5
BXDC5
BZRPL1
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125
C10orf125
And I want the result to be:
ID
BXDC2
BXDC5
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125
Can anybody help me with an R script :-)?

You can use duplicated or unique. Here, I am assuming that the data frame is called df1 and the column name is 'ID'.
df1[!duplicated(df1$ID),,drop=FALSE]
Or
library(data.table)#v1.9.4+
unique(setDT(df1), by='ID')
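To see the base R approach on the data from the question, here is a minimal sketch. The data frame name df1 is the same assumption as above, and the IDs are typed in by hand rather than read from Excel.

```r
# Sample data copied from the question (11 rows, 8 distinct IDs)
df1 <- data.frame(
  ID = c("BXDC2", "BXDC5", "BXDC5", "BZRPL1", "BZRPL1", "C10orf11",
         "C10orf116", "C10orf119", "C10orf120", "C10orf125", "C10orf125"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each ID;
# negating it keeps the first occurrence of every ID.
# drop = FALSE keeps the result a data frame even with a single column.
dedup <- df1[!duplicated(df1$ID), , drop = FALSE]

# If only the vector of IDs is needed, unique() is enough
ids <- unique(df1$ID)

print(dedup)
```

Both versions keep the first occurrence of each ID and preserve the original order.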

Related

How to delete rows in r [duplicate]

This question already has answers here:
How do I delete rows in a data frame?
(10 answers)
Closed 1 year ago.
I've been trying to remove the observations for one country from my data frame (ESS6). I have been able to drop whole variables with -c(variable), but that doesn't help here, since I only want to remove the rows where the country variable (cntry) has a certain value.
Thank you for your help :)
Try the filter() function from dplyr, which keeps only the rows that match a condition.
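A minimal sketch of what that could look like. The toy ESS6 data, the happy column and the country code "DE" are made up for illustration; only the names ESS6 and cntry come from the question.

```r
library(dplyr)

# Hypothetical stand-in for the real ESS6 data
ESS6 <- data.frame(
  cntry = c("DE", "FR", "DE", "SE"),
  happy = c(7, 8, 6, 9),
  stringsAsFactors = FALSE
)

# Keep every row whose country is NOT "DE" (i.e. remove that country)
ESS6_sub <- dplyr::filter(ESS6, cntry != "DE")

# Base R equivalent, in case dplyr is not available
ESS6_base <- ESS6[ESS6$cntry != "DE", , drop = FALSE]

print(ESS6_sub)
```

To remove several countries at once, a condition such as !(cntry %in% c("DE", "FR")) should work the same way.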

How do I subset a dataframe's columns if the data is all the same? [duplicate]

This question already has answers here:
How to remove columns with same value in R
(4 answers)
Closed 2 years ago.
I have a really large dataset and I want to filter out some of the columns because they hold the same value in every row (e.g. company name is always "Walmart"). I could go through and do this manually, but I'm looking for code that does it automatically.
I had in mind a function to subset based on if sum(unique(colnam)) == 1 but not sure how to get it to work. Thanks.
which(sapply(dat, function(col) length(unique(col)) == 1))
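The line above only reports which columns are constant. A minimal sketch of actually dropping them, assuming the data frame is called dat as in the answer (the toy columns are made up):

```r
# Hypothetical data: 'company' is constant, the other columns vary
dat <- data.frame(
  company = rep("Walmart", 3),
  store   = c(1, 2, 3),
  sales   = c(10, 20, 15),
  stringsAsFactors = FALSE
)

# TRUE for every column that holds a single distinct value.
# Note that unique() counts NA as a value of its own.
is_constant <- vapply(dat, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that vary; drop = FALSE keeps a data frame
# even if just one column survives.
dat_varying <- dat[, !is_constant, drop = FALSE]

print(names(dat_varying))
```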

How do I shorten row names in R? Please read below [duplicate]

This question already has answers here:
R - remove anything after comma from column
(5 answers)
Closed 2 years ago.
I have a table of values in R where the row names are very long. I want to shorten them. My row names look like this:
GSM1051550_7800246087_R02C01
I want to rename every row to only have the first part of the name, i.e., GSM1051550. How can I do this in R?
Building on jay.sf's comment (assuming your table is named ABC):
row.names(ABC) <- sub("\\_.*", "", row.names(ABC))
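Here is a small worked sketch of the same idea. The second row name and the value column are invented for illustration. One caveat: a data frame does not allow duplicate row names, so it is worth checking that the shortened names are still unique before assigning them.

```r
# Hypothetical table; only the first row name comes from the question
ABC <- data.frame(
  value = c(1.2, 3.4),
  row.names = c("GSM1051550_7800246087_R02C01",
                "GSM1051551_7800246087_R03C01")
)

# Remove the first underscore and everything after it.
# The underscore is not a regex metacharacter, so it needs no escaping.
short <- sub("_.*", "", row.names(ABC))

# Assigning duplicated row names to a data frame is an error, so check first
stopifnot(anyDuplicated(short) == 0)

row.names(ABC) <- short
print(row.names(ABC))
```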

How to delete specific columns in R [duplicate]

This question already has an answer here:
How do I delete columns in a dataframe if the name begins with X? [duplicate]
(1 answer)
Closed 3 years ago.
I have a data frame containing 2000 columns. The majority of the columns have names like "X111", "X222", "X123", and I want to remove the columns whose names start with X.
df[, -grep("^X", names(df))]
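grep returns the positions of the column names that start with X (^ anchors the pattern to the start of the name), and the minus sign drops those columns.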
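One thing to watch with negative grep indices: if no column name matches, grep returns an empty vector and the subset ends up with no columns at all. A slightly more defensive sketch uses grepl, which returns a logical vector instead (the toy data frame is made up):

```r
# Hypothetical data frame with two "X" columns and two others
df <- data.frame(
  X111 = 1:3,
  X222 = 4:6,
  id   = 7:9,
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# TRUE for the columns to keep, i.e. names that do NOT start with X
keep <- !grepl("^X", names(df))

# drop = FALSE keeps a data frame even if only one column remains
df_clean <- df[, keep, drop = FALSE]

print(names(df_clean))
```

With grepl, a data frame that has no X columns simply comes back unchanged.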

How can I change the names of multiple columns in r using paste0 [duplicate]

This question already has answers here:
Renaming multiple columns with indices in R
(3 answers)
Closed 3 years ago.
I have a dataframe which has 50 columns and I am trying to change the name of half of the columns to include the word "female_" in the title. What code can I use to change the name of multiple columns?
paste0 is vectorized, so you can prepend a string to a whole vector of column names in one call and assign the result back to those names.
names(df1)[1:25] <- paste0("female_", names(df1)[1:25])
NOTE: Here, we are taking the first 25 column names, since the question does not say which columns should be renamed.
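A smaller worked sketch of the same assignment, using a made-up four-column data frame and renaming its first two columns:

```r
# Hypothetical data frame; the column names are for illustration only
df1 <- data.frame(age = 1:2, height = 3:4, weight = 5:6, income = 7:8)

# Positions of the columns to rename (assumed here; adjust as needed)
idx <- 1:2

# paste0() prepends "female_" to each selected name in one vectorized call
names(df1)[idx] <- paste0("female_", names(df1)[idx])

print(names(df1))
```

If the columns to rename are not contiguous, idx can be any integer vector, or a vector of positions from match() against the names you want to change.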

Resources