number similar/duplicated rows in R [duplicate] - r

Hi, I'm using R and I have data like this:
1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3
I want to give identical rows the same number. For the above example:
1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3
Does anyone know how to do this in R (for both the numeric case and the character case)?
Your help is really appreciated!

This is your data:
df <- data.frame(a=c(1,1,3,1,3),
b=c(2,2,4,2,4),
c=c(3,1,1,3,1),
d=c(4,2,2,4,2),
e=c(5,2,3,5,3))
Approach 1:
The approach below uses the data.table package:
library(data.table)
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))
This would give you the following (identical rows share an id, although the ids are not numbered in order of first appearance):
# a b c d e id
#1 1 2 3 4 5 1
#2 1 2 1 2 2 3
#3 3 4 1 2 3 2
#4 1 2 3 4 5 1
#5 3 4 1 2 3 2
Approach 2:
Another approach is by using the plyr package, as follows:
library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))
This will give you the following output (note that ddply sorts the rows by the grouping columns):
# a b c d e id
#1 1 2 1 2 2 1
#2 1 2 3 4 5 2
#3 1 2 3 4 5 2
#4 3 4 1 2 3 3
#5 3 4 1 2 3 3
Hope it helps.
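Neither output above numbers the rows in order of first appearance, which is what the question asked for (1, 2, 3, 1, 3). A minimal base R sketch that does so is shown here. It is not part of the original answers; the `key` variable and the "\r" separator are illustrative choices, and the sketch assumes that no cell value contains the separator character.

```r
# Data from the question
df <- data.frame(a = c(1, 1, 3, 1, 3),
                 b = c(2, 2, 4, 2, 4),
                 c = c(3, 1, 1, 3, 1),
                 d = c(4, 2, 2, 4, 2),
                 e = c(5, 2, 3, 5, 3))

# Build one string key per row by pasting all columns together.
# unname() avoids clashes between column names and paste() arguments.
key <- do.call(paste, c(unname(as.list(df)), sep = "\r"))

# match() returns the position of each key among the unique keys,
# so ids follow the order in which each distinct row first appears.
df$id <- match(key, unique(key))
df$id
```

Because the key is built with paste(), the same sketch should work when the columns are character rather than numeric.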

Related

Generate data frame with parameters [duplicate]

I have a data frame of ids with a number column:
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I'd like to create a new data frame from it in which each id has every unique nr from df. As you may notice, id 3 has only nr 1, but not 2 and 3. So the result should be:
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)
You can use expand.grid as:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
arrange(id)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
We can do:
tidyr::expand(df,id,nr)
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
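If you prefer to avoid extra packages, the same grid can be built with base R only. This is a small sketch rather than one of the original answers: placing nr first makes it vary fastest, so the rows come out already ordered by id, and the columns are then put back in the requested order.

```r
# Data from the question
df <- read.table(text = "
id nr
1 1
2 1
1 2
3 1
1 3
", header = TRUE)

# expand.grid() varies its first argument fastest, so listing nr first
# yields rows grouped by id without a separate sorting step.
result <- expand.grid(nr = sort(unique(df$nr)),
                      id = sort(unique(df$id)))[, c("id", "nr")]
result
```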

Data merge with data.table for repeating unique values

I am trying to merge two columns in data table 'A' with a column in another data table 'B' that holds the unique values of a column. I want to merge in such a way that every unique combination of the two variables in 'A' is repeated for each unique value of the column in 'B'.
I tried merge, but it doesn't give me all the values. I also tried the automatic recycling in data.table, but this doesn't give me the result either.
Input:
data.table A
X Y
1 1
1 2
1 3
2 1
3 1
4 4
4 5
5 6
data.table B
Z
1
2
Expected output
X Y Z
1 1 1
1 1 2
1 2 1
1 2 2
1 3 1
1 3 2
2 1 1
2 1 2
3 1 1
3 1 2
4 4 1
4 4 2
4 5 1
4 5 2
5 6 1
5 6 2
We can make use of crossing from tidyr
library(tidyr)
crossing(A, B)
# X Y Z
#1 1 1 1
#2 1 1 2
#3 1 2 1
#4 1 2 2
#5 1 3 1
#6 1 3 2
#7 2 1 1
#8 2 1 2
#9 3 1 1
#10 3 1 2
#11 4 4 1
#12 4 4 2
#13 4 5 1
#14 4 5 2
#15 5 6 1
#16 5 6 2
Or with merge from base R, but the order will be slightly different
merge(A, B)
To get the expected order, swap the arguments and then reorder the columns
merge(B, A)[c(names(A), names(B))]
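The answers above do not define A and B, so here is a self-contained sketch of the same cross join using plain data frames and row indexing. It is an illustration under that assumption, not the original author's method; `idx_a`, `idx_b` and `out` are made-up names.

```r
# Input tables from the question, as plain data frames
A <- data.frame(X = c(1, 1, 1, 2, 3, 4, 4, 5),
                Y = c(1, 2, 3, 1, 1, 4, 5, 6))
B <- data.frame(Z = c(1, 2))

# Repeat every row of A once per row of B, and cycle through B's rows
idx_a <- rep(seq_len(nrow(A)), each = nrow(B))
idx_b <- rep(seq_len(nrow(B)), times = nrow(A))

# Bind the repeated rows side by side; drop = FALSE keeps data frames
out <- cbind(A[idx_a, , drop = FALSE], B[idx_b, , drop = FALSE])
rownames(out) <- NULL
out
```

The result has nrow(A) * nrow(B) rows, with Z cycling through B's values for each row of A, which is the row order of the expected output.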

Find minimal value for a multiple same keys in table [duplicate]

I have a table that contains multiple rows of different data for a key made up of multiple columns.
The table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I already found out how to remove fully duplicated rows using the unique command on multiple columns, so data duplication is not a problem.
For every key in the table (columns A and B in the example), I would like to keep only the row with the minimum value in the third column (column C).
In the end the table should look like this:
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help. It is really appreciated.
If anything is unclear, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
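df <- read.table(con, header = TRUE)
# Sort by the key (A, B) and then by C, and keep the sorted result
df <- df[with(df, order(A, B, C)), ]
# The first row of each (A, B) key now holds the minimum C
df[!duplicated(df[1:2]), ]
# A B C
# 1 1 1 2
# 4 1 2 4
# 3 2 1 4
# 5 2 2 3
# 6 2 3 1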
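As an alternative that does not depend on sorting the rows first, the minimum per key can be computed directly with aggregate() from base R. This is a sketch on the data from the question, entered here without the row-name column; `res` is an illustrative name.

```r
# Data from the question (row names omitted)
df <- data.frame(A = c(1, 1, 2, 1, 2, 2, 2, 2),
                 B = c(1, 1, 1, 2, 2, 3, 3, 3),
                 C = c(2, 3, 4, 4, 3, 1, 2, 2))

# One row per (A, B) key, holding the smallest C found for that key
res <- aggregate(C ~ A + B, data = df, FUN = min)
res
```

Note that this returns only the key columns and the minimum. If the table had further columns you wanted to keep, the order-then-duplicated approach would be the better fit.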

r repeat sequence number sequence while keeping the order of the sequence

I want to repeat a sequence to a specific length:
The sequence is 1:4, and I want to repeat it up to the number of rows in a data frame.
Let's say the data frame has 24 rows.
I tried the following:
test <- rep(1:4, each=24/4)
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
The length is fine, but I want to retain the order of the sequence:
1 2 3 4 1 2 3 4 1 2 3 4.....
You need to use times instead of each
rep(1:4, times=24/4)
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
We can also pass it without naming the argument, because the second positional argument of rep is times
rep(1:4, 24/4)
#[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
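Both calls above rely on 24 being a multiple of 4. When the number of rows is not a multiple of the sequence length, the length.out argument of rep cuts the repetition off at the required length. A short sketch, where `n` stands in for nrow() of your data frame:

```r
n <- 10  # stand-in for nrow(your_data_frame); not a multiple of 4

# Repeat 1:4 in order and stop once n values have been produced
test <- rep(1:4, length.out = n)
test
#  [1] 1 2 3 4 1 2 3 4 1 2
```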

anyone knows how to get the counts of every element of a column in itself [duplicate]

for example, this is my data
mydata
v
1 1
2 1
3 2
4 2
5 2
6 3
Is there any function that can generate a vector or column like this:
v counts
1 1 2
2 1 2
3 2 3
4 2 3
5 2 3
6 3 1
I tried using sum(), but it failed:
mydata$counts <- sum(mydata$v == mydata$v)
Another base R option with ave:
within(mydata, counts <- ave(v, v, FUN=length))
library(dplyr)
mutate(group_by(mydata,v),count=(length(v)))
Using base R:
mydata$counts <- with(mydata, table(v)[as.character(v)])
Using ddply
library(plyr)
ddply(mydata, .(v), mutate, counts = length(v))
# v counts
#1 1 2
#2 1 2
#3 2 3
#4 2 3
#5 2 3
#6 3 1
Or lapply
do.call(rbind, lapply(split(mydata, mydata$v),
function(x){ x$counts = length(x$v); x}))
# v counts
#1.1 1 2
#1.2 1 2
#2.3 2 3
#2.4 2 3
#2.5 2 3
#3 3 1
> mydata
v
1 1
2 1
3 2
4 2
5 2
6 3
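I would use something like this, which gives one row per distinct value (a is assumed here to be the column mydata$v):
a <- mydata$v
as.data.frame(table(a))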
a Freq
1 1 2
2 2 3
3 3 1
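To see why the sum() attempt in the question fails: mydata$v == mydata$v compares each element with itself, so every comparison is TRUE and the sum is simply the number of rows. Comparing the whole column against one value at a time gives the per-row count. The sketch below is illustrative rather than one of the original answers, and it is quadratic in the number of rows, so the ave or table approaches above are likely the better choice for large data.

```r
mydata <- data.frame(v = c(1, 1, 2, 2, 2, 3))

# The original attempt: every element equals itself, so this counts all rows
total <- sum(mydata$v == mydata$v)
total
# [1] 6

# For each value x, count how many elements of the column equal x
mydata$counts <- sapply(mydata$v, function(x) sum(mydata$v == x))
mydata$counts
# [1] 2 2 3 3 3 1
```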
