Generate data frame with parameters [duplicate] - r

This question already has answers here:
Fill missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 3 years ago.
I have a data frame of ids with number column
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I´d like to create new dataframe from it, where each id will have unique nr from df dataframe. As you may notice, id 3 have only nr 1, but no 2 and 3. So result should be.
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)

You can use expand.grid as:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
arrange(id)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3

We can do:
tidyr::expand(df,id,nr)
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3

Related

How to keep only first value in every sequence of duplicated values in R [duplicate]

This question already has answers here:
Select first row in each contiguous run by group
(4 answers)
Closed 5 months ago.
I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But it only works for the first instance of each sequence.
An example data where x column contains the repeated sequence can be seen below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achive is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help or point me to any useful existing topics as I haven't managed to find it?
Thanks
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
Or also with helper functions from data.table.
library(data.table)
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
Using rle and match.
df[match(with(rle(df$x), values), df$x), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 1.1 2 1
# 7.1 1 1
# 4.1 3 1

Remove groups with only one individual in R [duplicate]

This question already has an answer here:
Select groups with more than one distinct value per group [duplicate]
(1 answer)
Closed 1 year ago.
Consider the following dataset. The data is grouped with either one or two people per group. However, an individual may have several entries.
df1<-data.frame(group,individualID,X)
> df1
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 2 3 1
6 2 3 1
7 3 5 1
8 3 5 1
9 3 6 1
10 3 6 1
11 4 7 0
12 4 7 1
From the above Group 1 and group 3 have 2 individuals whereas group 2 and group 4 have 1 individual each.
> aggregate(data = df1, individualID ~ group, function(x) length(unique(x)))
group individualID
1 1 2
2 2 1
3 3 2
4 4 1
How can I subset the data to have only groups that have more than 1 individual. i.e. omit groups with 1 individual.
I should end up with only group 1 and group 3.
You could make a lookup table to identify the groups that have more than one unique individualID (similar to what you did with aggregate), then filter df1 based on that:
library(dplyr)
lookup <- df1 %>%
group_by(group) %>%
summarise(count = n_distinct(individualID)) %>%
filter(count > 1)
df1 %>% filter(group %in% unique(lookup$group))
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 3 5 1
6 3 5 1
7 3 6 1
8 3 6 1
Or, as #MrGumble suggests above, you could also merge df1 after creating lookup:
merge(df1, lookup)
group individualID X count
1 1 1 0 2
2 1 1 1 2
3 1 2 1 2
4 1 2 1 2
5 3 6 1 2
6 3 6 1 2
7 3 5 1 2
8 3 5 1 2

R: Assign incremental ids based on the groups [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 3 years ago.
I have the following sample data frame:
> test = data.frame(UserId = sample(1:5, 10, replace = T)) %>% arrange(UserId)
> test
UserId
1 1
2 1
3 1
4 1
5 1
6 3
7 4
8 4
9 4
10 5
I now want another column called loginCount for that user, which is something like assigning incremental ids within each group, something like below. Using the mutate like below creates id within each group, but how do I get the incremental ids within each group independent of each other ?
> test %>% mutate(loginCount = group_indices_(test, .dots = "UserId"))
UserId loginCount
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 3 2
7 4 3
8 4 3
9 4 3
10 5 4
I want something like shown below:
UserId loginCount
1 1
1 2
1 3
1 4
1 5
3 1
4 1
4 2
4 3
5 1
You could group and use row_number:
test %>%
arrange(UserId) %>%
group_by(UserId) %>%
mutate(loginCount = row_number()) %>%
ungroup()
# A tibble: 10 x 2
# Groups: UserId [4]
UserId loginCount
<int> <int>
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1
One solution using base R tapply()
test$loginCount <- unlist(tapply(rep(1, nrow(test)), test$UserId, cumsum))
> test
UserId loginCount
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1

number similar/duplicated rows in R [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
How to convert three columns into single one
(2 answers)
Assign unique ID per multiple columns of data table
(2 answers)
Closed 4 years ago.
Hi I'm using R and I have a data like this:
1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3
I want to number the identical lines together with the same number, for the above ex
1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3
Does any know how to do this in R (for both numeric case and character case)?
Your help is really appreciated!
This is your data:
df <- data.frame(a=c(1,1,3,1,3),
b=c(2,2,4,2,4),
c=c(3,1,1,3,1),
d=c(4,2,2,4,2),
e=c(5,2,3,5,3))
Approach 1:
You would need the data.table package to perform the below approach:
library(data.table)
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))
This would give you the following:
# a b c d e id
#1 1 2 3 4 5 1
#2 1 2 1 2 2 3
#3 3 4 1 2 3 2
#4 1 2 3 4 5 1
#5 3 4 1 2 3 2
Approach 2:
Another approach is by using the plyr package, as follows:
library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))
This will give you the following output:
# a b c d e id
#1 1 2 1 2 2 1
#2 1 2 3 4 5 2
#3 1 2 3 4 5 2
#4 3 4 1 2 3 3
#5 3 4 1 2 3 3
Hope it helps.

Find minimal value for a multiple same keys in table [duplicate]

This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Closed 5 years ago.
I have a table which contains multiple rows of the different data for a key of multiple columns.
Table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I also discovered how to remove all of the duplicate elements using unique command for multiple colums, so the data duplication is not a problem.
I would like to know how to for every key(columns A and B in example) in the table to find only the minimum value in third column(C column in table)
At the end table should look like this
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help. It is really appreciated
In any question, feel free to ask
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
df <- read.table(con, header = T)
df[with(df, order(A, B, C)), ]
df[!duplicated(df[1:2]),]
# A B C
# 1 1 1 2
# 3 2 1 4
# 4 1 2 4
# 5 2 2 3
# 6 2 3 1

Resources