Keep only non-duplicated rows (and remove all others) [duplicate] - r

This question already has answers here:
How can I remove all duplicates so that NONE are left in a data frame?
(3 answers)
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed 5 years ago.
I know this question has been asked in all sorts of variants, but I could not
extract the solution to my specific problem. Given a data frame like this:
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)
This results in:
a b
1 A 1
2 A 1
3 A 2
4 B 4
5 B 1
6 B 1
7 C 2
8 C 2
I want to keep only rows 3 (A 2) and 4 (B 4), i.e. the rows that occur exactly once.
I have tried all combinations of unique(), duplicated(), !duplicated() and
distinct(), but could not get the desired result, since there seems to be no
combination of logical TRUE and FALSE that selects only the non-duplicated rows. Thanks in advance!
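One possible approach, shown here only as a minimal sketch: duplicated() marks the second and later copies of a row, and duplicated(..., fromLast = TRUE) marks the first and earlier copies, so combining both flags every row that has a copy anywhere. The names is_dup and res are made up for this illustration.

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# TRUE for every row that has at least one identical copy elsewhere:
# scanning forward catches later copies, scanning backward catches earlier ones
is_dup <- duplicated(df) | duplicated(df, fromLast = TRUE)

# keep only the rows that occur exactly once
res <- df[!is_dup, ]
res
```

With the sample data this should leave rows 3 (A 2) and 4 (B 4).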

Related

Is there a way in R to make all possible combinations between rows of different columns? [duplicate]

This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
Generate list of all possible combinations of elements of vector
(10 answers)
Closed 2 years ago.
I have a df with one column, and I would like to build all combinations of the values in this column to get a new df with two columns, like the simple example below (note: my df has ~5000 rows):
df
CG
1
2
3
I would like a result similar to this:
> head(df1)
C1 C2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Could someone help me?
Thank you in advance.
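A minimal sketch of one way to get this, assuming base R's expand.grid() is acceptable: it returns every combination of the supplied vectors, with the first argument varying fastest, so C2 is passed first and the columns are reordered afterwards to match the layout above. The three-row df is a stand-in for the real data.

```r
# stand-in for the real one-column data frame
df <- data.frame(CG = 1:3)

# expand.grid() varies its first argument fastest, so pass C2 first
# and then put the columns back in the C1, C2 order
df1 <- expand.grid(C2 = df$CG, C1 = df$CG)[, c("C1", "C2")]
df1
```

Note that the result has nrow(df)^2 rows, so ~5000 input rows would give about 25 million combinations.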

unique string count in a sequence [duplicate]

This question already has answers here:
transitions in a sequence
(2 answers)
Closed 2 years ago.
I am trying to get the unique counts of the strings in a sequence.
For example,
A<- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
library('stringr')
B<- str_count(A, "-")
df<- data.frame(A, B)
I am expecting the output below, where C is the total diversity, i.e. the number of different states in the sequence. Any thoughts or suggestions? I looked around on SO but couldn't find a reasonable solution.
df$C
4
3
2
1
I would do this using unique:
df$res <- sapply(str_split(A,"-"),function(x) length(unique(x)))
df
A B res
1 CCE-CRE-DEE-DEE 3 3
2 FOE-FOE-GOE-GOE-GOE-ISE 5 3
3 ISE-PCE 1 2
4 ISE 0 1
I suppose that what you expect for CCE-CRE-DEE-DEE is actually 3, since it contains only three distinct states (CCE, CRE and DEE).
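If you would rather not depend on stringr, the same idea can be sketched in base R. This is only an illustrative variant of the answer above; n_unique is a name made up for the example.

```r
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# split each string on the literal "-" and count the distinct pieces
parts <- strsplit(A, "-", fixed = TRUE)
n_unique <- lengths(lapply(parts, unique))
n_unique
```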

Is there a good way to compare 2 data tables but compare the data from i to data of i+1 in second data table [duplicate]

This question already has answers here:
Remove duplicated rows
(10 answers)
Closed 2 years ago.
I have tried various functions including compare and all.equal but I am having difficulty finding a test to see if variables are the same.
For context, I have a data.frame which in some cases has a duplicate result. I have tried copying the data.frame so I can compare it with itself. I would like to remove the duplicates.
One approach I considered was to take row A from data frame 1 and subtract it from row B of data frame 2. If the difference is zero, I planned to remove one of the two rows.
Is there an approach I can use to do this without copying my data?
Any help would be great, I'm new to R coding.
Suppose I had a data.frame named data:
data
Col1 Col2
A 1 3
B 2 7
C 2 7
D 2 8
E 4 9
F 5 12
I can use the duplicated function to identify duplicated rows and not select them:
data[!duplicated(data),]
Col1 Col2
A 1 3
B 2 7
D 2 8
E 4 9
F 5 12
I can also perform the same action on a single column:
data[!duplicated(data$Col1),]
Col1 Col2
A 1 3
B 2 7
E 4 9
F 5 12
Sample Data
data <- data.frame(Col1 = c(1,2,2,2,4,5), Col2 = c(3,7,7,8,9,12))
rownames(data) <- LETTERS[1:6]
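If the goal is specifically to compare each row i with row i + 1, as the title suggests, and to drop a row only when it repeats the row directly before it, a minimal sketch without copying the data could look like this. The names same_as_prev and result are made up for the example, and the comparison assumes there are no NA values.

```r
data <- data.frame(Col1 = c(1, 2, 2, 2, 4, 5), Col2 = c(3, 7, 7, 8, 9, 12))
rownames(data) <- LETTERS[1:6]

# for rows 2..n, check whether every column equals the previous row;
# the first row has no predecessor, so it is never flagged
same_as_prev <- c(FALSE, vapply(
  seq_len(nrow(data))[-1],
  function(i) all(unlist(data[i, ]) == unlist(data[i - 1, ])),
  logical(1)
))

result <- data[!same_as_prev, ]
result
```

Unlike duplicated(), this only removes a repeat when it is adjacent to its original; for the sample data both approaches keep the same rows.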

Combining multiple columns in one R [duplicate]

This question already has answers here:
Flatting a dataframe with all values of a column into one
(3 answers)
Closed 5 years ago.
How can I combine all of a data frame's columns into a single column in an efficient way? I mean without listing the column names, using dplyr or tidyr in R, because I have too many columns (10,000+).
For example, converting this data frame
> Multiple_dataframe
a b c
1 4 7
2 5 8
3 6 9
to
> Uni_dataframe
d
1
2
3
4
5
6
7
8
9
I looked around Stack Overflow but without success.
We can use unlist
Uni_dataframe <- data.frame(d = unlist( Multiple_dataframe, use.names = FALSE))
Or using dplyr/tidyr (as the question is specific about it)
library(tidyverse)
Uni_dataframe <- gather(Multiple_dataframe, key, d) %>%
  select(-key)

Transform a column into variables in R [duplicate]

This question already has answers here:
Aggregating by unique identifier and concatenating related values into a string [duplicate]
(4 answers)
Closed 5 years ago.
My current dataset :
order product
1 a
1 b
1 c
2 b
2 d
3 a
3 c
3 e
What I want:
product order
a 1,3
b 1,2
c 1,3
d 2
e 3
I have tried cast and reshape, but they didn't work.
I recently spent way too much time trying to do something similar. What you need here, I believe, is a list-column. The code below will do that, but it turns the order number into a character value.
library(tidyverse)
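df <- tibble(order = c(1,1,1,2,2,3,3,3), product = c('a','b','c','b','d','a','c','e')) %>%
  group_by(product) %>%
  # use the grouped column directly; .$order would refer to the whole data frame
  summarise(order = toString(order)) %>%
  # split the "1, 3" strings back into a list-column of character vectors
  mutate(order = str_split(order, ', '))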
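If a plain comma-separated string per product is enough (as in the desired output in the question), a base R sketch with aggregate() may be simpler. The names orders and res are made up for this illustration.

```r
orders <- data.frame(
  order   = c(1, 1, 1, 2, 2, 3, 3, 3),
  product = c("a", "b", "c", "b", "d", "a", "c", "e")
)

# one row per product, with its order numbers pasted into a single string
res <- aggregate(order ~ product, data = orders,
                 FUN = function(x) paste(x, collapse = ","))
res
```

As with the list-column version, the order numbers end up as character values.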
