Generate data frame with parameters - r

I have a data frame of ids with a number column:
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I'd like to create a new data frame from it in which every id is paired with every unique nr found in df. As you may notice, id 3 has only nr 1, but not 2 or 3. So the result should be:
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)

You can use expand.grid as:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
  arrange(id, nr)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3

We can do:
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
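
The call that produced the tibble above is not shown. A minimal sketch of one way to get such a result, assuming the tidyr package is installed and assuming tidyr::expand() is an acceptable stand-in (it may not be what this answer actually used):

```r
library(tidyr)

# Sample data from the question
df <- data.frame(id = c(1L, 2L, 1L, 3L, 1L),
                 nr = c(1L, 1L, 2L, 1L, 3L))

# expand() returns every combination of the unique values of id and nr
result <- tidyr::expand(df, id, nr)
result
```

The result should contain one row per (id, nr) pair, so each of the three ids appears with each of the three nr values.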


How to keep only first value in every sequence of duplicated values in R

I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But it only keeps the first occurrence of each value, not the first row of every run of repeated values.
Example data, where the x column contains the repeated sequences, is shown below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achieve is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help or point me to any useful existing topics as I haven't managed to find it?
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
Or also with helper functions from data.table.
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
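Using rle and match. Note that match() returns the first position of each value, so the later runs of 2, 1 and 3 reuse rows 1, 7 and 4 (row names 1.1, 7.1 and 4.1 below); the result only looks right here because y is constant.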
df[match(with(rle(df$x), values), df$x), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 1.1 2 1
# 7.1 1 1
# 4.1 3 1
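
Since match() always returns the first position of a value, a variant that indexes the true start of every run may be safer when the other columns differ between runs. A minimal base R sketch, assuming only the question's sample data (the names runs, starts and first_rows are illustrative). Unlike the diff() approach, it should also work when x is a character column:

```r
# Sample data from the question
x <- c(2, 2, 2, 3, 3, 3, 1, 1, 1, 5, 5, 5, 2, 2, 2, 1, 1, 1, 3, 3, 3)
y <- rep(1, length(x))
df <- data.frame(x, y)

# Run lengths of consecutive equal values in x
runs <- rle(df$x)

# Start position of each run: 1, then 1 plus the cumulative lengths of earlier runs
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Keep only the first row of every run
first_rows <- df[starts, ]
first_rows
```

With this data the selected rows are 1, 4, 7, 10, 13, 16 and 19, so the original row names are preserved.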

Remove groups with only one individual in R

Consider the following dataset. The data is grouped with either one or two people per group. However, an individual may have several entries.
> df1
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 2 3 1
6 2 3 1
7 3 5 1
8 3 5 1
9 3 6 1
10 3 6 1
11 4 7 0
12 4 7 1
In the data above, groups 1 and 3 have 2 individuals each, whereas groups 2 and 4 have 1 individual each.
> aggregate(data = df1, individualID ~ group, function(x) length(unique(x)))
group individualID
1 1 2
2 2 1
3 3 2
4 4 1
How can I subset the data to keep only groups that have more than 1 individual, i.e. omit groups with 1 individual?
I should end up with only group 1 and group 3.
You could make a lookup table to identify the groups that have more than one unique individualID (similar to what you did with aggregate), then filter df1 based on that:
library(dplyr)
lookup <- df1 %>%
  group_by(group) %>%
  summarise(count = n_distinct(individualID)) %>%
  filter(count > 1)
df1 %>% filter(group %in% unique(lookup$group))
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 3 5 1
6 3 5 1
7 3 6 1
8 3 6 1
Or, as @MrGumble suggests above, you could also merge df1 after creating lookup:
merge(df1, lookup)
group individualID X count
1 1 1 0 2
2 1 1 1 2
3 1 2 1 2
4 1 2 1 2
5 3 6 1 2
6 3 6 1 2
7 3 5 1 2
8 3 5 1 2
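
The same filter can be sketched without a separate lookup table. This is a minimal base R variant, assuming the df1 from the question (rebuilt here so the block is self-contained; n_ind and kept are illustrative names):

```r
# Data from the question
df1 <- data.frame(
  group        = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  individualID = c(1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7),
  X            = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1)
)

# For every row, the number of distinct individuals in that row's group
n_ind <- ave(df1$individualID, df1$group,
             FUN = function(ids) length(unique(ids)))

# Keep only rows from groups with more than one individual
kept <- df1[n_ind > 1, ]
kept
```

Because ave() returns one value per row, the logical index lines up with df1 directly and the original row order is kept.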

R: Assign incremental ids based on the groups

I have the following sample data frame:
> test = data.frame(UserId = sample(1:5, 10, replace = T)) %>% arrange(UserId)
> test
   UserId
1 1
2 1
3 1
4 1
5 1
6 3
7 4
8 4
9 4
10 5
I now want another column called loginCount for each user, i.e. an incremental id within each group, as shown further below. Using mutate as follows creates one id per group, but how do I get incremental ids within each group, independent of the other groups?
> test %>% mutate(loginCount = group_indices_(test, .dots = "UserId"))
UserId loginCount
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 3 2
7 4 3
8 4 3
9 4 3
10 5 4
I want something like shown below:
UserId loginCount
1 1
1 2
1 3
1 4
1 5
3 1
4 1
4 2
4 3
5 1
You could group and use row_number:
library(dplyr)
test %>%
  arrange(UserId) %>%
  group_by(UserId) %>%
  mutate(loginCount = row_number())
# A tibble: 10 x 2
# Groups: UserId [4]
UserId loginCount
<int> <int>
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1
One solution uses base R tapply() (this relies on test already being sorted by UserId):
test$loginCount <- unlist(tapply(rep(1, nrow(test)), test$UserId, cumsum))
> test
UserId loginCount
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1
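
If the rows are not guaranteed to be sorted by UserId, a base R alternative is ave(), which writes each group's counter back to the original row positions. A minimal sketch, assuming a fixed UserId vector that matches the output shown in the question (the original sample is random):

```r
# Fixed sample matching the output shown in the question
test <- data.frame(UserId = c(1, 1, 1, 1, 1, 3, 4, 4, 4, 5))

# ave() applies seq_along() within each UserId and returns the
# results in the original row positions
test$loginCount <- ave(seq_along(test$UserId), test$UserId, FUN = seq_along)
test

# The same call on unsorted ids still counts within each id
unsorted <- c(4, 1, 4, 1)
ave(seq_along(unsorted), unsorted, FUN = seq_along)
```

For the sorted sample this should reproduce the loginCount column 1, 2, 3, 4, 5, 1, 1, 2, 3, 1.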

number similar/duplicated rows in R

Hi, I'm using R and I have data like this:
1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3
I want to give identical rows the same number. For the example above:
1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3
Does anyone know how to do this in R (for both the numeric and the character case)?
Your help is really appreciated!
This is your data:
df <- data.frame(a=c(1,1,3,1,3), b=c(2,2,4,2,4), c=c(3,1,1,3,1), d=c(4,2,2,4,2), e=c(5,2,3,5,3))
Approach 1:
You would need the data.table package to perform the below approach:
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))
This would give you the following:
# a b c d e id
#1 1 2 3 4 5 1
#2 1 2 1 2 2 3
#3 3 4 1 2 3 2
#4 1 2 3 4 5 1
#5 3 4 1 2 3 2
Approach 2:
Another approach is by using the plyr package, as follows:
library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))
This will give you the following output:
# a b c d e id
#1 1 2 1 2 2 1
#2 1 2 3 4 5 2
#3 1 2 3 4 5 2
#4 3 4 1 2 3 3
#5 3 4 1 2 3 3
Hope it helps.
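
Neither output above reproduces the exact numbering asked for in the question (1, 2, 3, 1, 3 in the original row order). A minimal base R sketch that numbers rows by first appearance, assuming the five-column example data and assuming the separator character never occurs in the data (key is an illustrative name). It should work for numeric and character columns alike, since every row is collapsed into a string:

```r
# Example data from the question
df <- data.frame(a = c(1, 1, 3, 1, 3),
                 b = c(2, 2, 4, 2, 4),
                 c = c(3, 1, 1, 3, 1),
                 d = c(4, 2, 2, 4, 2),
                 e = c(5, 2, 3, 5, 3))

# Collapse every row into one string key; the separator is assumed
# not to occur anywhere in the data
key <- do.call(paste, c(df, sep = "\r"))

# Position of each key among the distinct keys, in order of first appearance
df$id <- match(key, unique(key))
df$id
```

Identical rows get the same id, and the row order of df is left unchanged.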

Find minimal value for a multiple same keys in table

I have a table that contains several rows of data for each key, where the key consists of multiple columns.
The table looks like this:
  A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I have also found out how to remove all duplicate rows using the unique command on multiple columns, so duplicated data is not a problem.
For every key (columns A and B in the example), I would like to find only the minimum value in the third column (column C).
In the end, the table should look like this:
  A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help. It is really appreciated.
If you have any questions, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
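df <- read.table(con, header = TRUE)
ord <- df[with(df, order(A, B, C)), ]         # smallest C first within each (A, B) key
res <- ord[!duplicated(ord[c("A", "B")]), ]   # keep the first row of each key
res[order(as.integer(rownames(res))), ]       # restore the original row order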
# A B C
# 1 1 1 2
# 3 2 1 4
# 4 1 2 4
# 5 2 2 3
# 6 2 3 1
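
A shorter route to the per-key minimum is aggregate(). This is a minimal sketch, assuming the A, B and C columns from the question (mins is an illustrative name); note that it returns a fresh data frame, so the original row names are not kept:

```r
# Data from the question
df <- data.frame(A = c(1, 1, 2, 1, 2, 2, 2, 2),
                 B = c(1, 1, 1, 2, 2, 3, 3, 3),
                 C = c(2, 3, 4, 4, 3, 1, 2, 2))

# One row per (A, B) key, holding the minimum of C for that key
mins <- aggregate(C ~ A + B, data = df, FUN = min)
mins
```

If further columns have to be carried along with the minimum, the order-and-deduplicate approach is probably the better fit, because aggregate() only returns the grouping columns and the summarised value.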
