This question already has answers here:
Aggregating by unique identifier and concatenating related values into a string [duplicate]
(4 answers)
Closed 5 years ago.
My current dataset:
order product
1 a
1 b
1 c
2 b
2 d
3 a
3 c
3 e
What I want:
product order
a 1,3
b 1,2
c 1,3
d 2
e 3
I have tried cast and reshape, but they didn't work.
I recently spent way too much time trying to do something similar. What you need here, I believe, is a list-column. The code below will do that, but it turns the order number into a character value.
library(tidyverse)
df <- tibble(order = c(1, 1, 1, 2, 2, 3, 3, 3),
             product = c('a', 'b', 'c', 'b', 'd', 'a', 'c', 'e')) %>%
  group_by(product) %>%
  summarise(order = toString(order)) %>%  # bare column here, not .$order (which ignores the grouping)
  mutate(order = str_split(order, ', '))
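If you don't need the tidyverse, base R's aggregate() can build the comma-separated strings directly. A minimal sketch on the same example data (the list-column step would still need strsplit() or str_split() afterwards):

```r
data <- data.frame(order = c(1, 1, 1, 2, 2, 3, 3, 3),
                   product = c('a', 'b', 'c', 'b', 'd', 'a', 'c', 'e'))

# collapse the order values for each product into one string
res <- aggregate(order ~ product, data = data, FUN = toString)
# res$order is c("1, 3", "1, 2", "1, 3", "2", "3")
```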
This question already has answers here:
transitions in a sequence
(2 answers)
Closed 2 years ago.
I am trying to get the unique counts of the strings in a sequence.
For example,
A<- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
library('stringr')
B<- str_count(A, "-")
df<- data.frame(A, B)
I am expecting output as follows:
df$C
4
3
2
1
C here is the total diversity, i.e. the number of different states in the sequence. Any thoughts or suggestions? I looked around on SO but couldn't find a reasonable solution.
I would do this using unique:
df$res <- sapply(str_split(A, "-"), function(x) length(unique(x)))
df
A B res
1 CCE-CRE-DEE-DEE 3 3
2 FOE-FOE-GOE-GOE-GOE-ISE 5 3
3 ISE-PCE 1 2
4 ISE 0 1
I suppose that the output you actually expect for CCE-CRE-DEE-DEE is 3, not 4.
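For what it's worth, the same count can be written a little more compactly in base R with strsplit() and lengths(), assuming the same A as above:

```r
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# split on "-", drop duplicates per element, then count what's left
res <- lengths(lapply(strsplit(A, "-", fixed = TRUE), unique))
res  # 3 3 2 1
```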
This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 3 years ago.
I have two datasets. The Ids in them are unordered, and several Ids are present in one dataset but not in the other.
What I want in the end is a CSV file containing the non-common Ids from both datasets' Id columns.
Dataset 1
Id Quant
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Dataset 2
Id Quant2
6 d
4 a
5 f
2 e
1 a
3 b
You can use the dplyr package, which has an anti_join function for precisely this task:
library(dplyr)
anti_join(dataset1, dataset2, by = "Id")
This will return all rows of dataset1 that have no matching Id in dataset2. Similarly, you can look at
anti_join(dataset2, dataset1, by = "Id")
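To get the actual CSV the question asks for, you can collect the Ids that are missing in each direction and write them out. If you'd rather not load dplyr just for this, base R's setdiff() on the Id columns does the same job; a sketch with the example data (the file name non_common_ids.csv is made up):

```r
dataset1 <- data.frame(Id = 1:7, Quant = letters[1:7])
dataset2 <- data.frame(Id = c(6, 4, 5, 2, 1, 3),
                       Quant2 = c("d", "a", "f", "e", "a", "b"))

# Ids present in exactly one of the two datasets
non_common <- c(setdiff(dataset1$Id, dataset2$Id),
                setdiff(dataset2$Id, dataset1$Id))
non_common  # 7

write.csv(data.frame(Id = non_common), "non_common_ids.csv", row.names = FALSE)
```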
This question already has answers here:
How can I remove all duplicates so that NONE are left in a data frame?
(3 answers)
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed 5 years ago.
I know this question has been asked in all sorts of variants, but I could not
extract the solution to my specific problem. Given a data frame like this:
a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c(1,1,2,4,1,1,2,2)
df <-data.frame(a,b)
This results in:
a b
1 A 1
2 A 1
3 A 2
4 B 4
5 B 1
6 B 1
7 C 2
8 C 2
I want to keep only rows 3 (A 2) and 4 (B 4).
I have tried various combinations of unique(), duplicated(), !duplicated() and
distinct(), but could not get the desired result: no combination of logical
TRUE and FALSE seems to filter out only the non-duplicated rows. Thanks in advance!
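For the record, the idiom the linked duplicates boil down to is combining duplicated() in both directions, so a row survives only if it occurs exactly once:

```r
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# a row is kept only if it is duplicated neither forwards nor backwards
df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
#   a b
# 3 A 2
# 4 B 4
```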
This question already has answers here:
Flatting a dataframe with all values of a column into one
(3 answers)
Closed 5 years ago.
How can I combine all of a data frame's columns into a single column, efficiently? By "efficiently" I mean without referring to the columns by name, using dplyr or tidyr in R, because I have too many columns (10,000+).
For example, converting this data frame
> Multiple_dataframe
a b c
1 4 7
2 5 8
3 6 9
to
> Uni_dataframe
d
1
2
3
4
5
6
7
8
9
I looked around Stack Overflow but without success.
We can use unlist
Uni_dataframe <- data.frame(d = unlist( Multiple_dataframe, use.names = FALSE))
Or using dplyr/tidyr (since the question asks about them specifically):
library(tidyverse)
Uni_dataframe <- gather(Multiple_dataframe, key, d) %>%
select(-key)
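In recent tidyr, gather() is superseded; a pivot_longer() equivalent (a sketch assuming tidyr >= 1.0) would be:

```r
library(tidyr)
library(dplyr)

Multiple_dataframe <- data.frame(a = 1:3, b = 4:6, c = 7:9)

Uni_dataframe <- Multiple_dataframe %>%
  pivot_longer(everything(), values_to = "d") %>%
  arrange(name) %>%  # pivot_longer goes row-wise; this restores column-wise order
  select(d)
```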
This question already has answers here:
Frequency counts in R [duplicate]
(2 answers)
Closed 7 years ago.
I'm sure this question has been asked before, but I can't seem to find an answer anywhere, so I apologize if this is a duplicate.
I'm looking for R code that allows me to aggregate a variable in R, but while doing so creates new columns that count instances of levels of a factor.
For example, let's say I have the data below:
Week Var1
1 a
1 b
1 a
1 b
1 b
2 c
2 c
2 a
2 b
2 c
3 b
3 a
3 b
3 a
First, I want to aggregate by week; I'm sure this can be done with group_by in dplyr. I then need the code to create a new column each time a new level appears in Var1. Finally, I need counts of each level of Var1 within each week. I could probably figure out a way to do this manually, but I'm looking for an automated solution, as I will have thousands of unique values in Var1. The result would be something like this:
Week a b c
1 2 3 0
2 1 1 3
3 2 2 0
I think, from the way you worded your question, that you've been looking for the wrong thing or for something too complicated. It's a simple data-reshaping problem, and as such can be solved with reshape2:
library(reshape2)
#create wide dataframe (from long)
res <- dcast(data, Week ~ Var1, value.var = "Var1",
             fun.aggregate = length)
> res
Week a b c
1 1 2 3 0
2 2 1 1 3
3 3 2 2 0
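If a base R route is acceptable, table() gives the same counts in one call; a sketch assuming the example data is in a data frame called data:

```r
data <- data.frame(
  Week = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  Var1 = c('a', 'b', 'a', 'b', 'b', 'c', 'c', 'a', 'b', 'c', 'b', 'a', 'b', 'a')
)

# cross-tabulate week against level; absent combinations come out as 0
res <- as.data.frame.matrix(table(data$Week, data$Var1))
res
#   a b c
# 1 2 3 0
# 2 1 1 3
# 3 2 2 0
```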