Filter multiple occurrences based on group [duplicate] - r

I have a dataset like the one below:
df <- data.frame(
  Supplier_id = c("1", "2", "7", "7", "7", "4", "5", "8", "12", "7"),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy", "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018", "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly", "Kelly", "Amanda", "Echo", "Shao", "Shao")
)
df$Supplier_id <- as.numeric(as.character(df$Supplier_id))
So df looks like this:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|--------------|
| 1 | Tian | 1/17/2019 | Unclassified |
| 2 | Yan | 4/30/2019 | Unclassified |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 4 | Amy | 5/21/2018 | Kelly |
| 5 | Lauren | 5/23/2018 | Amanda |
| 8 | Cassy | 5/24/2018 | Echo |
| 12 | Shaan | 6/15/2018 | Shao |
| 7 | Goldy | 6/20/2018 | Shao |
Now I want to drop the Supplier_ids that occur only once within each Buyer. For example, Supplier_ids 1 and 2 both belong to the buyer 'Unclassified', but because they are different ids, each occurring once, I do not want them in the final output. The buyer 'Kelly' has two Supplier_ids, 7 and 4, where 7 occurs three times and 4 only once, so the output should keep only the records with Supplier_id 7. The grouping must be done per Buyer: Supplier_id 7 appears under both 'Kelly' and 'Shao', but it should be counted separately for each buyer, not across them.
The expected output should be:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|--------------|
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
I have tried using group_by and filter, but this does not work because each buyer has several distinct Supplier_ids. I have also tried duplicated(), but I am not sure how to group the Supplier_ids within each buyer.
df <- df %>% group_by(Buyer) %>% filter(Supplier_id > 1)
and also this
df2=df[duplicated(df[1]) | duplicated(df[1], fromLast=TRUE),]
EDIT: The original dataset has many such instances, with any number of occurrences of different Supplier_ids for each buyer.
What other way is there to get the desired output?

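I think you need:
library(dplyr)
df %>% group_by(Supplier_id, Buyer) %>% filter(n() > 1)
Grouping by both Supplier_id and Buyer makes n() the number of rows for that supplier within that buyer, so the filter keeps only the combinations that occur more than once.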
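The same grouping idea can also be sketched in base R, which may be handy if dplyr is not available. This is only an illustration under the assumption that the data looks like the example in the question; the names `group_size` and `result` are made up for this sketch.

```r
# Example data from the question (Supplier_id entered directly as numeric).
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy", "Amy",
               "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018",
           "11/29/2018", "5/21/2018", "5/23/2018", "5/24/2018",
           "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its (Supplier_id, Buyer) pair.
group_size <- ave(seq_len(nrow(df)), df$Supplier_id, df$Buyer, FUN = length)

# Keep only the rows whose pair occurs more than once.
result <- df[group_size > 1, ]
print(result)
```

On the example data this should leave only the three Goldy/Kelly rows; the Goldy/Shao row is dropped because that pair occurs once.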

Related

Is there a way in R to create a column based on order of multiple values in one another column in dataframe? [duplicate]

I would like to create a column in my R data frame based on the order in which multiple values occur in one column.
For example, my data frame has an id column and an item column, and the values in the order column are what I would like to add. Is there a way to tell R to look at the order of the values in the item column so that it produces "ABCD" or "ADCB" (or any other order) as the cell value in the third column?
| id | item | order |
|----|------|-------|
| 11 | A | ABCD |
| 11 | A | ABCD |
| 11 | B | ABCD |
| 11 | B | ABCD |
| 11 | C | ABCD |
| 11 | C | ABCD |
| 11 | D | ABCD |
| 11 | D | ABCD |
| 12 | A | ADCB |
| 12 | A | ADCB |
| 12 | D | ADCB |
| 12 | D | ADCB |
| 12 | C | ADCB |
| 12 | C | ADCB |
| 12 | B | ADCB |
| 12 | B | ADCB |
...
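One possible way to build such a column, sketched in base R: within each id, take the distinct item values in their order of first appearance and paste them together. The data frame name `dat` is an assumption for this sketch, and the data is retyped from the table above.

```r
# Example data retyped from the table in the question.
dat <- data.frame(
  id = rep(c(11, 12), each = 8),
  item = c("A", "A", "B", "B", "C", "C", "D", "D",
           "A", "A", "D", "D", "C", "C", "B", "B"),
  stringsAsFactors = FALSE
)

# unique() keeps values in order of first appearance; paste() collapses them
# into one string, which ave() repeats for every row of the same id.
dat$order <- ave(dat$item, dat$id,
                 FUN = function(x) paste(unique(x), collapse = ""))
print(dat)
```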

How to add a column from one dataframe to another dataframe when two other columns match [duplicate]

I have two datasets, db1 and db2, like the following ones:
db1
+---------+-------+-------+------+------+-----------------+
| Authors| IDs | Title | Year | ISSN | Other columns...|
+---------+-------+-------+------+------+-----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | |
| Ares K.| 70058 | 2 | 2012 |15874 | |
| Anto E.| 71030 | 3 | 2011 |16999 | |
| A Banul| 57196 | 1 | 2011 |21546 | |
| A Berat| 56372 | 2 | 2011 |12554 | |
+---------+-------+-------+------+------+-----------------+
and
db2
+---------+-------+-------+------+------+-------+---------------------------+
| Authors| IDs | Title | Year | ISSN | IF | Other different columns...|
+---------+-------+-------+------+------+-------+---------------------------+
| Abad J.| 16400 | 1 | 2013 |14589 | 2,3 | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |14587 | 1,2 | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |75846 | 4,5 | |
+---------+-------+-------+------+------+-------+---------------------------+
Basically, what I want is to add the column IF from db2 to db1 where the two columns Year and ISSN have the same values. So what I want to achieve is the following output in my example:
db1
+---------+-------+-------+------+------+-------+----------------+
| Authors| IDs | Title | Year | ISSN | IF |Other columns...|
+---------+-------+-------+------+------+-------+----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | NA | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |16999 | NA | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |12554 | NA | |
+---------+-------+-------+------+------+-------+----------------+
I have tried merge, but since the two datasets also have other, different columns, I end up with a very big dataset.
What I want is to use the match function, but with more than one condition applied at the same time.
Any ideas?
dplyr::left_join(db1, dplyr::select(db2, Year, ISSN, IF), by = c("Year", "ISSN"))
Selecting only Year, ISSN and IF from db2 before joining keeps the other db2 columns out of the result. This should work provided db1 has no IF column of its own and each Year/ISSN pair appears at most once in db2.
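Since the question asks about match with more than one condition, here is a hedged base R sketch of that idea: build one combined key per row from Year and ISSN and match on it. The data frames below keep only a few columns from the example, and the `key1`/`key2` names and the "_" separator are assumptions for illustration.

```r
# Reduced versions of db1 and db2 from the question.
db1 <- data.frame(
  Authors = c("Abad J.", "Ares K.", "Anto E.", "A Banul", "A Berat"),
  Year = c(2014, 2012, 2011, 2011, 2011),
  ISSN = c(14589, 15874, 16999, 21546, 12554),
  stringsAsFactors = FALSE
)
db2 <- data.frame(
  Year = c(2013, 2012, 2011, 2011, 2011),
  ISSN = c(14589, 15874, 14587, 21546, 75846),
  IF = c("2,3", "3,3", "1,2", "7,8", "4,5"),
  stringsAsFactors = FALSE
)

# Combine the two conditions into a single key for each row.
key1 <- paste(db1$Year, db1$ISSN, sep = "_")
key2 <- paste(db2$Year, db2$ISSN, sep = "_")

# match() returns the position of each db1 key in db2, or NA if absent,
# so rows without a matching Year/ISSN pair get NA in IF.
db1$IF <- db2$IF[match(key1, key2)]
print(db1)
```

Unlike merge, this keeps db1 in its original row order and adds only the one column. If a key occurs several times in db2, match uses the first occurrence.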

How to select rows based on 3 IF statements?

I have a dataset of patients with five columns: ID, PatientID, PhaseCode, EXAMDATE and EXCHANGE.
ID | PatientID | PhaseCode | EXAMDATE | EXCHANGE
--------------------------------------------------------
1 | 7366 | ADNI1 | 21/08/2015 | 1
2 | 7366 | ADNIGO | 21/08/2015 | 3
3 | 7366 | ADNI2 | 21/08/2015 | 2
4 | 7363 | ADNI1 | 21/08/2015 | 1
5 | 7363 | ADNI1 | 21/08/2015 | 1
6 | 7366 | ADNI1 | 21/08/2015 | 4
7 | 7366 | ADNIGO | 21/08/2015 | 5
8 | 7366 | ADNIGO | 21/08/2015 | 0
9 | 7366 | ADNI2 | 21/08/2015 | 1
The data was recorded in three phases (ADNI1, ADNIGO, ADNI2). As you may have noticed, a patient may have the same phase repeated more than once, or may have records for only one phase.
I need help selecting the patients that have records for all of the phases. For example, if a patient has no record for ADNI2, I would like to remove that patient. The condition is something like: if patient 7366 has records where PhaseCode equals ADNI1, ADNIGO and ADNI2, then include that patient in the dataset.
Please kindly help.
We can use a little tidyr and dplyr. First we complete all combinations of PhaseCode/PatientID, then we group_by PatientID, then we remove those Patients which have any NA from the completion:
library(tidyr)
library(dplyr)
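dat %>%
  complete(PhaseCode, PatientID) %>%
  group_by(PatientID) %>%
  filter(!any(is.na(ID)))
Alternatively, in base R (here the data frame is called d), keep the patients that have three distinct phase codes:
subset(d, as.character(PatientID) %in%
  names(which(tapply(PhaseCode, PatientID, function(x) length(unique(x))) == 3)))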
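A variation on the base R idea, sketched under the assumption that the required phases are known in advance: instead of counting distinct phase codes, check that every named phase is present for the patient. The names `required`, `complete_ids` and `result` are made up for this sketch, and the data holds only the columns needed here.

```r
# Example data retyped from the question (EXAMDATE and EXCHANGE omitted).
dat <- data.frame(
  ID = 1:9,
  PatientID = c(7366, 7366, 7366, 7363, 7363, 7366, 7366, 7366, 7366),
  PhaseCode = c("ADNI1", "ADNIGO", "ADNI2", "ADNI1", "ADNI1",
                "ADNI1", "ADNIGO", "ADNIGO", "ADNI2"),
  stringsAsFactors = FALSE
)

# Phases every retained patient must have at least one record for.
required <- c("ADNI1", "ADNIGO", "ADNI2")

# One TRUE/FALSE per patient: are all required phases among their records?
has_all <- tapply(dat$PhaseCode, dat$PatientID,
                  function(x) all(required %in% x))

# tapply names its result by patient id, so compare ids as character.
complete_ids <- names(which(has_all))
result <- dat[as.character(dat$PatientID) %in% complete_ids, ]
print(result)
```

Naming the phases explicitly avoids keeping a patient who happens to have three distinct codes that are not the three of interest.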

select sql table rows as columns for survey application

I am developing a survey application, a very simple one that has two tables.
table_survey_answers
+------------+------------+----------------+
| customerid | questionID | answer |
+------------+------------+----------------+
| 1 | 100 | Good |
| 1 | 101 | Acceptable |
| 1 | 102 | Excellent |
| 2 | 100 | Not acceptable |
| 2 | 101 | Acceptable |
| 2 | 102 | Good |
+------------+------------+----------------+
table_questions
+------------+-----------------------------------+
| QuestionID | Question |
+------------+-----------------------------------+
| 100 | Kindly rate our customer service? |
| 101 | How fast is our product delivery? |
| 102 | Quality of the Product A? |
+------------+-----------------------------------+
Now I want to display the survey results as follows in an ASP.NET GridView.
+------------+-----------------------------------+-----------------------------------+---------------------------+
| CustomerID | Kindly rate our customer service? | How fast is our product delivery? | Quality of the Product A? |
+------------+-----------------------------------+-----------------------------------+---------------------------+
| 1 | Good | Acceptable | Excellent |
| 2 | Not acceptable | Acceptable | Good |
+------------+-----------------------------------+-----------------------------------+---------------------------+
I have already created the tables that hold the survey responses. The only thing I want is to show the results in the GridView in the format explained above.
Use PIVOT, which transposes your rows into columns:
SELECT *
FROM   (SELECT customerid,
               answer,
               Question
        FROM   table_questions a
               JOIN table_survey_answers b
                 ON a.QuestionID = b.questionID) a
PIVOT (Max(answer)
       FOR Question IN ([Kindly rate our customer service?],
                        [How fast is our product delivery?],
                        [Quality of the Product A?])) piv

Select single row per unique field value with SQL Developer

I have thousands of rows of data, a segment of which looks like:
+-------------+-----------+-------+
| Customer ID | Company | Sales |
+-------------+-----------+-------+
| 45678293 | Sears | 45 |
| 01928573 | Walmart | 6 |
| 29385068 | Fortinoes | 2 |
| 49582015 | Walmart | 1 |
| 49582015 | Joe's | 1 |
| 19285740 | Target | 56 |
| 39506783 | Target | 4 |
| 39506783 | H&M | 4 |
+-------------+-----------+-------+
Whenever a customer ID occurs more than once, the value in 'Sales' is the same but the value in 'Company' is different (this is true throughout the entire table). I need each value in 'Customer ID' to appear only once, so I need a single row for each customer ID.
In other words, I'd like for the above table to look like:
+-------------+-----------+-------+
| Customer ID | Company | Sales |
+-------------+-----------+-------+
| 45678293 | Sears | 45 |
| 01928573 | Walmart | 6 |
| 29385068 | Fortinoes | 2 |
| 49582015 | Walmart | 1 |
| 19285740 | Target | 56 |
| 39506783 | Target | 4 |
+-------------+-----------+-------+
If anyone knows how I can go about doing this, I'd much appreciate some help.
Thanks!
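It would have been helpful if you had posted the SQL that generates that data, but it might go something like this:
SELECT customer_id,
       MAX(company) AS company,
       COUNT(*) AS sales
FROM customers <your joins and where clause>
GROUP BY customer_id
This assumes that each customer can have several companies and that the sales data lives in a different, joined table, so COUNT(*) counts the sales rows. MAX(company) keeps one company per customer, namely the alphabetically last name. If Sales is already a column with the same value on every row for a customer, MAX(sales) would return that value instead of a count.
Hope this helps.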
