Select duplicate rows by comparing multiple columns in R [duplicate] - r

This question already has answers here:
Find duplicate values in R [duplicate]
(5 answers)
Closed 4 years ago.
I have an issue selecting duplicate rows in R. My data frame has 14 columns and 1 million rows. I need to compare rows against each other: rows that are identical count as duplicates, and I want to retrieve those duplicate rows. My data frame looks like this:
Data frame sample
The last two rows are identical, so they need to be marked with a flag value of 1.
I don't know how to start with this.
I have tried this code:
df <- unique(data[,1:97]) # this gives me the set of unique rows, not the number of duplicates.
dim(data[duplicated(data),])[1] # this gives me the number of duplicates, but not their ids.
I need to know the ids of the duplicate rows.
My intention is to check each row and return the total number of duplicate rows or their line numbers.

Look into the duplicated() function. It returns a logical vector that marks every row repeating an earlier row, so it can be used to remove the duplicated rows or, inversely, to keep only them.
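As a rough illustration of that suggestion, here is a minimal sketch on a made-up four-row data frame (the column names id and value are assumptions, not the asker's real columns). It flags every copy of a repeated row, collects the row numbers, and counts the extra copies.

```r
# Toy data frame (hypothetical); rows 3 and 4 are identical
data <- data.frame(id = c(1, 2, 3, 3), value = c("a", "b", "c", "c"))

# Number of extra copies, i.e. rows that repeat an earlier row
n_extra <- sum(duplicated(data))

# duplicated() alone skips the first occurrence of each repeated row,
# so scan from both ends to mark every copy
is_dup <- duplicated(data) | duplicated(data, fromLast = TRUE)

dup_rows  <- which(is_dup)        # row numbers of all duplicated rows
data$flag <- as.integer(is_dup)   # 1 = row has an identical copy, 0 = unique
```

Whether the first occurrence should also get flag 1 depends on what "duplicate" means for the analysis; dropping the fromLast part flags only the later copies.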

Related

Count occurrences of value in a set of variables in R (per column) [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 1 year ago.
I have this data and I want to figure out how many ones and how many zeros are in each column (e.g. Arts and Crafts). I have been trying different things, but nothing has worked. Does anyone have any suggestions?
You can use the table() function in R, which counts how many times each distinct value occurs. I have also used unlist() here to convert a list to a vector.
df1 <- read.csv("Your_CSV_file_name_here.csv")
table(unlist(df1$ArtsAndCrafts))
If you want to count the zeros and ones row-wise instead, you can refer to this question on Stack Overflow.
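To get the counts for every column at once rather than one column at a time, something along these lines might work. It is only a sketch: the data frame below is a made-up stand-in for the CSV, and the Sports column is an assumption.

```r
# Hypothetical stand-in for the CSV: two 0/1 indicator columns
df1 <- data.frame(ArtsAndCrafts = c(1, 0, 1, 1), Sports = c(0, 0, 1, 0))

# table() per column; fixing the factor levels keeps a count of 0
# for a value that never occurs in a column
counts <- sapply(df1, function(col) table(factor(col, levels = c(0, 1))))

counts  # rows are the values "0" and "1", columns are the variables
```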

Adding new column to R tibble based on values in existing column [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 1 year ago.
I have a tibble of participant data from an experiment, and I need to replace a column of identifiable IDs with a new column of anonymous IDs. I have found the ids package which can generate random IDs for me, but I'm not sure how to do this so that they match up with the existing ones.
Specifically, each participant has multiple rows in the tibble (but not always the same number of rows per participant), so I need to insert a column of random IDs such that all of e.g. Bob's rows will get the ID 123, and all of Alice's rows will get the ID 456.
I was assuming it might be best to do this with apply, but I'm just not sure what the function should be so that I don't get a different random ID for every row.
data$randomID <- apply(data, 1, function(x) {random_id(bytes=5)} )
data$randomID <- as.integer(factor(data$ID, levels = unique(data$ID))) # one consecutive integer per distinct ID, shared by all of that participant's rows
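If the new IDs should be random rather than consecutive, one possible approach is to build a lookup with one code per distinct participant and then map it back onto the rows. The sketch below uses base R's sample() instead of the ids package, and the participant names and the three-digit code range are assumptions for illustration.

```r
set.seed(1)  # only to make the sketch reproducible
data <- data.frame(ID = c("Bob", "Alice", "Bob", "Carol", "Alice"))

participants <- unique(data$ID)

# One distinct random code per participant; sampling without
# replacement guarantees that no two participants share a code
codes <- sample(100:999, length(participants))

# match() finds each row's participant in the lookup,
# so all rows of one participant receive the same code
data$randomID <- codes[match(data$ID, participants)]
```

The same lookup pattern should also work with codes produced by another generator, as long as it returns one unique value per participant.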

Remove rows with at least 1 value less than a certain limit in R [duplicate]

This question already has answers here:
R keep rows with at least one column greater than value
(3 answers)
Delete rows in R if a cell contains a value larger than x
(1 answer)
Closed 2 years ago.
I have a matrix, and I want to remove all rows that contain at least one element less than a given value, 3 in this example. Sample data:
A=matrix(c(10,2,4,8,5,4,8,10,5),byrow=TRUE,ncol=3)
#Remove rows that contain at least one value less than 3
final_matrix=matrix(c(8,5,4,8,10,5), byrow=TRUE,ncol=3)
How do I get from the initial matrix A to the final matrix? My real matrix has thousands of rows and tens of columns; this is a toy example. I tried A=A[A>3,], but I get the error "logical subscript too long".
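The error most likely arises because A>3 is a logical matrix with one entry per cell, while the row index expects one entry per row. A minimal sketch of one way around this, reducing the comparison to a single TRUE/FALSE per row first:

```r
A <- matrix(c(10, 2, 4, 8, 5, 4, 8, 10, 5), byrow = TRUE, ncol = 3)

# A < 3 is a logical matrix; rowSums() counts the offending cells per row,
# so a row is kept only when that count is zero
keep <- rowSums(A < 3) == 0

# drop = FALSE keeps the result a matrix even if only one row survives
final_matrix <- A[keep, , drop = FALSE]
```

An equivalent but usually slower formulation is apply(A >= 3, 1, all) for the keep vector.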

Subsetting number of observations [duplicate]

This question already has answers here:
Remove last N rows in data frame with the arbitrary number of rows
(4 answers)
Closed 2 years ago.
I have a dataset consisting of 250 observations. I want to select all observations except the last. I know I can do this with the following code, but how can I do it if I do not know the exact number of observations?
data(mtcars)
## selecting all observations except the last
mtcars_lag<-mtcars[1:31,]
## skipping first observation and selecting all
mtcars_forward<-mtcars[2:32,]
nrow() gives you the number of observations in the dataset, so mtcars_subset <- mtcars[1:(nrow(mtcars)-1), ] will fetch all observations except the last one.
EDIT: Added parentheses in line with suggestion from MrFlick.
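As an alternative to indexing with nrow(), head() and tail() accept a negative n, which means "everything except that many rows". A short sketch on mtcars:

```r
data(mtcars)
n <- nrow(mtcars)

mtcars_lag     <- head(mtcars, -1)  # all rows except the last
mtcars_forward <- tail(mtcars, -1)  # all rows except the first
```

One reason to prefer this form: for a single-row data frame, 1:(nrow(x)-1) evaluates to 1:0 and still selects row 1, whereas head(x, -1) returns zero rows.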

How do I find how many times a value has been repeated in a column - R [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Count unique values in R
I have just one column full of names, and I need to know how many times each name appears in this column.
I can do a
summary(dfUL)
where dfUL is my user list data frame
This gives me a summary with the number of times a particular value is repeated, but only for the top 6. How can I do that for the entire data frame?
Have you already tried table(dfUL)?
Another possibly useful function is match(), which returns the position of the first match.
match(x,dfUL$somecol) #Where x is the value in somecol you are looking for
match(max(dfUL$somecol),dfUL$somecol) #Returns the row number of the maximum value of somecol
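A small sketch of the table() suggestion on a made-up user list (the column name name and its values are assumptions). Unlike summary(), it reports every distinct value, and sorting puts the most frequent names first.

```r
# Hypothetical user list with a single column of names
dfUL <- data.frame(name = c("ann", "bob", "ann", "cat", "ann", "bob"))

counts <- table(dfUL$name)                 # one count per distinct name
counts <- sort(counts, decreasing = TRUE)  # most frequent first

# Convert to a data frame if the counts are needed for further processing
count_df <- as.data.frame(counts, stringsAsFactors = FALSE)
```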
