Remove duplicated rows in dataframe in R [duplicate]

I have the following problem:
My dataframe has a lot of columns. I would like to remove rows that have the same values in columns X, Y and Z.
See my dataframe:
A B C X Y Z
1 2 3 4 5 6
2 5 4 4 5 6
In the dataframe above I would like to delete the first row, because X, Y and Z are the same in both rows.
I tried this, but it returned something different:
newtable <- df[!duplicated(df$X, df$Z, df$Z), ]
Thanks a lot!

According to ?duplicated, the usage is
duplicated(x, incomparables = FALSE, ...)
where
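x: a vector or a data frame or an array or NULL.
That is, x is a single argument, so duplicated() cannot take one column per argument. In your call only df$X is checked for duplicates, and the second vector is matched to incomparables. One option is to subset the relevant columns and pass that data frame as x: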
df[!duplicated(df[c("X", "Y", "Z")]), ]
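Note that duplicated() flags the later occurrence by default, so the line above keeps the first of the matching rows and drops the second. The question asks to delete the first row instead; a minimal sketch of that, assuming the example data from the question and using the fromLast argument of duplicated() (the names keep_first and keep_last are only for illustration):

```r
# Example data from the question (two rows that agree on X, Y and Z)
df <- data.frame(A = c(1, 2), B = c(2, 5), C = c(3, 4),
                 X = c(4, 4), Y = c(5, 5), Z = c(6, 6))

# Default: the first occurrence is kept, later duplicates are dropped
keep_first <- df[!duplicated(df[c("X", "Y", "Z")]), ]

# fromLast = TRUE: the last occurrence is kept, earlier duplicates are dropped
keep_last <- df[!duplicated(df[c("X", "Y", "Z")], fromLast = TRUE), ]
```

With two matching rows, keep_first holds the row with A = 1 and keep_last the row with A = 2.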


Finding the maximum value for each row and extract column names [duplicate]

Say we have the following matrix,
x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
What I'm trying to do is:
1- Find the maximum value of each row. For this part, I'm doing the following,
df <- apply(X=x, MARGIN=1, FUN=max)
2- Then, I want to extract the column names of the maximum values and put them next to the values. Following the reproducible example, it would be "C" for the three rows.
Any assistance would be wonderful.
You can use apply like this:
maxColumnNames <- apply(x,1,function(row) colnames(x)[which.max(row)])
Since you have a numeric matrix, you can't add the names as an extra column (the whole matrix would be converted to a character matrix).
Instead, you can convert it to a data.frame and do
resDf <- cbind(data.frame(x),data.frame(maxColumnNames = maxColumnNames))
resulting in
resDf
  A B C maxColumnNames
X 1 4 7              C
Y 2 5 8              C
Z 3 6 9              C
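As an alternative to the apply() call, base R's max.col() returns the column index of the maximum in each row directly. This is only a sketch on the example matrix; ties.method = "first" is chosen here so that ties resolve the same way which.max() does, and the name res is just for illustration:

```r
x <- matrix(1:9, nrow = 3, dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

# Column index of the row-wise maximum, taking the first column on ties
maxColumnNames <- colnames(x)[max.col(x, ties.method = "first")]

# Row-wise maxima alongside the matching column names
res <- data.frame(x, maxValue = apply(x, 1, max), maxColumnNames = maxColumnNames)
```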

How to add row of a dataframe in r if I have created named list having same name as columns of dataframe? [duplicate]

df <-data.frame(x=1:2,y=5:6)
row <- list(x=10,y=20)
add_row(df,row)
Error: New rows can't add columns.
x Can't find column row in .data.
Run rlang::last_error() to see where the error occurred.
but
add_row(df,x=10,y=20)
x y
1 1 5
2 2 6
3 10 20
works. How can I add the named list to df as a row?
Using rbind, as suggested by @DanY, is an easy solution. To use add_row, you can change row to a tibble or data.frame:
library(tibble)
df <-data.frame(x=1:2,y=5:6)
row <- tibble(x=10,y=20)
add_row(df, row)
# x y
#1 1 5
#2 2 6
#3 10 20
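For completeness, here is a small sketch of the rbind route in base R, which needs no extra package. It assumes the list names match the column names of df; the names res_list and res_df are only for illustration:

```r
df <- data.frame(x = 1:2, y = 5:6)
row <- list(x = 10, y = 20)

# rbind() on a data frame accepts a named list as a new row
res_list <- rbind(df, row)

# Equivalent: convert the list to a one-row data frame first
res_df <- rbind(df, as.data.frame(row))
```

Both results have three rows, with the new values in the last row.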

Delete rows in data frame based on multiple columns from another data frame in R [duplicate]

I would like to remove rows that have specific values for columns that match values in another data frame.
a<-c(1,1,2,2,2,4,5,5,5,5)
b<-c(10,10,22,30,30,30,40,40,40,40)
c<-c(1,2,1,2,2,2,2,1,1,2)
d<-rnorm(10)
data<-data.frame(a,b,c,d)
a<-c(2,5)
b<-c(30,40)
c<-c(2,1)
x<-data.frame(a,b,c)
So that y can become:
a b c d
1 10 1 -0.2509255
1 10 2 0.4142277
2 22 1 -0.1340514
4 30 2 -1.5372009
5 40 2 1.9001932
5 40 2 -1.2825212
I tried the following, which did not work:
y<-data[!data$a==a & !data$b==b & !data$c==c,]
y<-subset(data, !data$a==x$a & !data$b==x$b & !data$c==x$c)
I also tried to just flag the ones that should be removed in order to subset in a second step, but this did not work either:
y<-data
y$rm<-ifelse(y$a==x$a & y$b==x$b & y$c==x$c, 1, 0)
The real "data" and "x" are much longer, and there is a variable number of rows in data that match each row in x.
We can use anti_join from dplyr. It returns all rows from 'data' that have no match in 'x'. The columns to match on are given in the by argument.
library(dplyr)
anti_join(data, x, by=c('a', 'b', 'c'))
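If dplyr is not available, roughly the same result can be sketched in base R by building one key string per row from the three columns and filtering with %in%. This is an illustrative alternative, not the answer's method; the separator "|" and the names key_data and key_x are assumptions, and the separator must not occur in the data:

```r
set.seed(1)  # only so the random column d is reproducible
data <- data.frame(a = c(1, 1, 2, 2, 2, 4, 5, 5, 5, 5),
                   b = c(10, 10, 22, 30, 30, 30, 40, 40, 40, 40),
                   c = c(1, 2, 1, 2, 2, 2, 2, 1, 1, 2),
                   d = rnorm(10))
x <- data.frame(a = c(2, 5), b = c(30, 40), c = c(2, 1))

# One key per row, built from the columns to match on
cols <- c("a", "b", "c")
key_data <- do.call(paste, c(data[cols], sep = "|"))
key_x <- do.call(paste, c(x[cols], sep = "|"))

# Keep only the rows of data whose key does not appear in x
y <- data[!key_data %in% key_x, ]
```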

filter R data frame with one column - keep data frame format [duplicate]

I am looking for a simple way to display a subset of a one-column data frame.
Let's assume I have a data frame:
> df <- data.frame(a = 1:100)
Now, I only need the first 10 rows. If I subset it by index, I get a vector instead of a data frame:
> df[1:10,]
[1] 1 2 3 4 5 6 7 8 9 10
I tried to use 'subset', but omitting its 'subset' parameter results in an error (only for one-column data frames?):
subset(df[1:10,])
Error in subset.default(df[1:10, ]) :
argument "subset" is missing, with no default
There should be a very easy solution to achieve a subset (still a data frame) filtered by row index, no?
I am looking for a solution with basic R commands (it should not depend on any special library).
You can use drop=FALSE, which prevents the dimensions of the array from being dropped:
df[1:10, , drop=FALSE]
a
1 1
2 2
3 3
4 4
5 5
...
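For subset you need to add a condition. In your call, df[1:10,] had already been reduced to a vector, which is why the error comes from subset.default.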
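A short sketch of the base R options side by side, assuming the one-column df from the question (the result names are only for illustration). All three keep the data frame class:

```r
df <- data.frame(a = 1:100)

# Row index with drop = FALSE keeps the data frame structure
by_index <- df[1:10, , drop = FALSE]

# head() returns the first n rows as a data frame
first_ten <- head(df, 10)

# subset() needs a logical condition; here it is applied to the data frame itself
by_condition <- subset(df, a <= 10)
```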

R, accessing a column vector of a matrix by name [duplicate]

In R I can access the data in a column of a matrix with the following:
mat2[,1]
Each column of mat2 has a name. How can I retrieve the data from the first column by using the name attribute instead of [,1]?
For example, suppose my first column had the name "saturn". I want something like
mat2[,1] == mat2[saturn]
The following should do it:
mat2[,'saturn']
For example:
> x <- matrix(1:21, nrow=7, ncol=3)
> colnames(x) <- paste('name', 1:3)
> x[,'name 1']
[1] 1 2 3 4 5 6 7
Bonus information (adding to the first answer)
x[,c('name 1','name 2')]
would return two columns just as if you had done
x[,1:2]
And finally, the same operations can be used to subset rows
x[1:2,]
And if rows were named...
x[c('row 1','row 2'),]
Note the position of the comma within the brackets and with respect to the indices.
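To make the row case concrete, here is a small sketch that gives the example matrix row names as well. The row names 'row 1' to 'row 7' are an assumption added for illustration, as are the result names:

```r
x <- matrix(1:21, nrow = 7, ncol = 3)
colnames(x) <- paste('name', 1:3)
rownames(x) <- paste('row', 1:7)   # hypothetical row names

# A single element, addressed by row name and column name
one_value <- x['row 1', 'name 2']

# Two named rows, all columns
two_rows <- x[c('row 1', 'row 2'), ]

# A single named column kept as a one-column matrix instead of a vector
one_col <- x[, 'name 1', drop = FALSE]
```

Here one_value is 8, because the matrix is filled column by column and the second column starts at 8.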
