How to add a column from one dataframe to another dataframe when two other columns match [duplicate] - r

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 3 years ago.
I have two datasets, db1 and db2, like the following ones:
db1
+---------+-------+-------+------+------+-----------------+
| Authors| IDs | Title | Year | ISSN | Other columns...|
+---------+-------+-------+------+------+-----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | |
| Ares K.| 70058 | 2 | 2012 |15874 | |
| Anto E.| 71030 | 3 | 2011 |16999 | |
| A Banul| 57196 | 1 | 2011 |21546 | |
| A Berat| 56372 | 2 | 2011 |12554 | |
+---------+-------+-------+------+------+-----------------+
and
db2
+---------+-------+-------+------+------+-------+---------------------------+
| Authors| IDs | Title | Year | ISSN | IF | Other different columns...|
+---------+-------+-------+------+------+-------+---------------------------+
| Abad J.| 16400 | 1 | 2013 |14589 | 2,3 | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |14587 | 1,2 | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |75846 | 4,5 | |
+---------+-------+-------+------+------+-------+---------------------------+
Basically, what I want is to add the column IF from db2 to db1 when the two columns Year and ISSN have the same values in both. So the output I want to achieve in my example is:
db1
+---------+-------+-------+------+------+-------+----------------+
| Authors| IDs | Title | Year | ISSN | IF |Other columns...|
+---------+-------+-------+------+------+-------+----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | NA | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |16999 | NA | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |12554 | NA | |
+---------+-------+-------+------+------+-------+----------------+
I have tried with merge but, since the two datasets also have other, different columns, I obtain a very big dataset.
What I want is to use the function match, but with more than one condition applied at the same time.
Any ideas?

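library(dplyr)
db1 <- left_join(db1, db2 %>% select(Year, ISSN, IF), by = c("Year", "ISSN"))
Naming the keys in `by` makes the join match on Year and ISSN only, whatever other columns the two data frames share; rows of db1 with no matching Year/ISSN pair get NA in IF. If the same Year/ISSN pair appears more than once in db2, the matching db1 rows will be duplicated.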
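Since the question asks specifically about `match()` with more than one condition, here is a minimal base R sketch of that idea: build one composite key per row by pasting Year and ISSN together, then `match()` db1's keys against db2's. The data frames are cut down to the relevant columns, and the IF values are written with a decimal point (2.3 rather than 2,3) so they can be stored as numbers; both are assumptions made for the example.

```r
# Reduced copies of the two tables from the question (only the relevant columns)
db1 <- data.frame(
  Authors = c("Abad J.", "Ares K.", "Anto E.", "A Banul", "A Berat"),
  Year = c(2014, 2012, 2011, 2011, 2011),
  ISSN = c(14589, 15874, 16999, 21546, 12554),
  stringsAsFactors = FALSE
)
db2 <- data.frame(
  Year = c(2013, 2012, 2011, 2011, 2011),
  ISSN = c(14589, 15874, 14587, 21546, 75846),
  IF = c(2.3, 3.3, 1.2, 7.8, 4.5)
)

# One key per row that encodes both conditions; the "_" separator keeps
# the Year and ISSN parts from running into each other
key1 <- paste(db1$Year, db1$ISSN, sep = "_")
key2 <- paste(db2$Year, db2$ISSN, sep = "_")

# match() gives the position of the first db2 row with the same key, or NA
db1$IF <- db2$IF[match(key1, key2)]
db1
```

With this sample data only the Ares K. and A Banul rows find a partner, so IF is 3.3 and 7.8 there and NA elsewhere, as in the desired output. Unlike a join, `match()` never adds rows: if a key occurs twice in db2, only its first occurrence is used.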

Related

Is there a way in R to create a column based on order of multiple values in one another column in dataframe? [duplicate]

This question already has answers here:
Aggregating all unique values of each column of data frame
(2 answers)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 1 year ago.
I would like to create a column in my R data frame based on the order in which multiple values occur in one column.
For example, my data frame has an id column and an item type column, and the order column holds the values I would like to add. Is there a way to tell R to look at the order in which values appear in the item column for each id, so that it returns "ABCD" or "ADCB" (or whatever the order is) as the cell value in the third column?
| id | item | order |
|----|------|-------|
| 11 | A | ABCD |
| 11 | A | ABCD |
| 11 | B | ABCD |
| 11 | B | ABCD |
| 11 | C | ABCD |
| 11 | C | ABCD |
| 11 | D | ABCD |
| 11 | D | ABCD |
| 12 | A | ADCB |
| 12 | A | ADCB |
| 12 | D | ADCB |
| 12 | D | ADCB |
| 12 | C | ADCB |
| 12 | C | ADCB |
| 12 | B | ADCB |
| 12 | B | ADCB |
...
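The linked duplicates cover collapsing a column into one string per group. Applied here, a minimal base R sketch, assuming the rows of each id are already in the order that matters: `unique()` keeps values in order of first appearance, `paste(collapse = "")` glues them into one string, and `ave()` writes that string back onto every row of the group. The sample data reproduces the table above.

```r
df <- data.frame(
  id = rep(c(11, 12), each = 8),
  item = c("A", "A", "B", "B", "C", "C", "D", "D",
           "A", "A", "D", "D", "C", "C", "B", "B"),
  stringsAsFactors = FALSE
)

# For each id: distinct items in first-appearance order, collapsed to e.g. "ABCD",
# then repeated on every row of that id
df$order <- ave(df$item, df$id, FUN = function(x) paste(unique(x), collapse = ""))
df
```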

Functions by groups in another column in R [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have 2 questions regarding groups in a dataframe in R.
Imagine I have a data frame (df) like this:
| CONT | COUNTRY | GDP | AVG_GDP |
|------|---------|-----|---------|
| AF | EGYPT | 3 | 2 |
| AF | SUDAN | 2 | 2 |
| AF | ZAMBIA | 1 | 2 |
| AM | CANADA | 4 | 5 |
| AM | MEXICO | 2 | 5 |
| AM | USA | 9 | 5 |
| EU | FRANCE | 5 | 4 |
| EU | ITALY | 4 | 4 |
| EU | SPAIN | 3 | 4 |
How can I calculate the average GDP by continent and then put it in the AVG_GDP column, so that it looks like the table above?
The second question is how can I sum the GDP by continent so that it looks like this:
| CONT | SUM_GDP |
|------|---------|
| AF | 6 |
| AM | 15 |
| EU | 12 |
For this last question, I think that in base R the sums could be obtained with something like SUM_GDP <- aggregate(df$GDP, by = list(CONT = df$CONT), FUN = sum), but maybe there is another way to build it as a new data frame.
Thank you in advance.
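A minimal base R sketch covering both parts, using the sample data from the tables above. `ave()` returns a vector as long as its input, with each value replaced by its group's result, which is what the AVG_GDP column needs. `aggregate()` returns one row per group, so its result belongs in a new data frame rather than in a new column of df; renaming the summed column to SUM_GDP is only to match the desired output.

```r
df <- data.frame(
  CONT = c("AF", "AF", "AF", "AM", "AM", "AM", "EU", "EU", "EU"),
  COUNTRY = c("EGYPT", "SUDAN", "ZAMBIA", "CANADA", "MEXICO", "USA",
              "FRANCE", "ITALY", "SPAIN"),
  GDP = c(3, 2, 1, 4, 2, 9, 5, 4, 3),
  stringsAsFactors = FALSE
)

# 1) Continent mean repeated on every row of that continent
df$AVG_GDP <- ave(df$GDP, df$CONT, FUN = mean)

# 2) One row per continent with the summed GDP, as a separate data frame
sums <- aggregate(GDP ~ CONT, data = df, FUN = sum)
names(sums)[names(sums) == "GDP"] <- "SUM_GDP"
sums
```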

Filter multiple occurrences based on group [duplicate]

This question already has answers here:
dplyr - filter by group size
(7 answers)
Keep only groups of data with multiple observations
(2 answers)
Closed 3 years ago.
I have a dataset like the one below:
df <- data.frame(
  Supplier_id = c("1", "2", "7", "7", "7", "4", "5", "8", "12", "7"),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy", "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018", "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly", "Kelly", "Amanda", "Echo", "Shao", "Shao")
)
df$Supplier_id <- as.numeric(as.character(df$Supplier_id))
Thus, df looks like this:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|--------------|
| 1 | Tian | 1/17/2019 | Unclassified |
| 2 | Yan | 4/30/2019 | Unclassified |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 4 | Amy | 5/21/2018 | Kelly |
| 5 | Lauren | 5/23/2018 | Amanda |
| 8 | Cassy | 5/24/2018 | Echo |
| 12 | Shaan | 6/15/2018 | Shao |
| 7 | Goldy | 6/20/2018 | Shao |
Now, I want to filter out the Supplier_id values that occur only once for a given Buyer. For example, in the dataset above, Supplier_id 1 and 2 both belong to the 'Unclassified' buyer, but because they are different ids, each occurs only once, so I do not want them in my final output. However, the buyer 'Kelly' has two supplier ids, 7 and 4, where 7 occurs 3 times and 4 only once, so the output table should contain the records with Supplier_id 7. The grouping should be based on Buyer: Supplier_id 7 exists for both 'Kelly' and 'Shao', but it should be counted separately for each of these buyers, not considered together.
The expected output should be:
| Supplier_id | Supplier | Date | Buyer |
|-------------|----------|------------|-------|
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
| 7 | Goldy | 11/29/2018 | Kelly |
I have tried using group_by and filter, but this does not work because each buyer has distinct supplier ids. I have also tried using duplicated, but I am not sure how to group the supplier ids for each buyer.
df <-df %>% group_by(Buyer) %>% filter(Supplier_id>1)
and also this
df2=df[duplicated(df[1]) | duplicated(df[1], fromLast=TRUE),]
EDIT: The original dataset has many such cases, with n occurrences of different supplier ids for each buyer.
What would be another way to get the desired output?
I think you need:
library(dplyr)
df %>% group_by(Supplier_id, Buyer) %>% filter(n() > 1) %>% ungroup()
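For comparison, a base R sketch of the same rule (keep a row only if its Supplier_id/Buyer pair occurs more than once), rebuilt from the sample data in the question. `ave()` with two grouping vectors counts the rows of each pair and returns that count on every row, so a plain logical index does the filtering. The Goldy row under Shao is dropped because the pair (7, Shao) occurs only once.

```r
df <- data.frame(
  Supplier_id = c(1, 2, 7, 7, 7, 4, 5, 8, 12, 7),
  Supplier = c("Tian", "Yan", "Goldy", "Goldy", "Goldy",
               "Amy", "Lauren", "Cassy", "Shaan", "Goldy"),
  Date = c("1/17/2019", "4/30/2019", "11/29/2018", "11/29/2018", "11/29/2018",
           "5/21/2018", "5/23/2018", "5/24/2018", "6/15/2018", "6/20/2018"),
  Buyer = c("Unclassified", "Unclassified", "Kelly", "Kelly", "Kelly",
            "Kelly", "Amanda", "Echo", "Shao", "Shao"),
  stringsAsFactors = FALSE
)

# Number of rows sharing each row's (Supplier_id, Buyer) pair
pair_count <- ave(seq_len(nrow(df)), df$Supplier_id, df$Buyer, FUN = length)

# Keep only the pairs that occur more than once
out <- df[pair_count > 1, ]
out
```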

How to subset a dataframe using a column from another dataframe in r?

I have 2 dataframes
Dataframe1:
|    | Cue       | Ass_word  | Condition | Freq | Cue_Ass_word        |
|----|-----------|-----------|-----------|------|---------------------|
| 1  | ACCENDERE | ACCENDINO | A         | 1    | ACCENDERE_ACCENDINO |
| 2  | ACCENDERE | ALLETTARE | A         | 0    | ACCENDERE_ALLETTARE |
| 3  | ACCENDERE | APRIRE    | A         | 1    | ACCENDERE_APRIRE    |
| 4  | ACCENDERE | ASCENDERE | A         | 1    | ACCENDERE_ASCENDERE |
| 5  | ACCENDERE | ATTIVARE  | A         | 0    | ACCENDERE_ATTIVARE  |
| 6  | ACCENDERE | AUTO      | A         | 0    | ACCENDERE_AUTO      |
| 7  | ACCENDERE | ACCENDINO | B         | 2    | ACCENDERE_ACCENDINO |
| 8  | ACCENDERE | ALLETTARE | B         | 3    | ACCENDERE_ALLETTARE |
| 9  | ACCENDERE | ACCENDINO | C         | 2    | ACCENDERE_ACCENDINO |
| 10 | ACCENDERE | ALLETTARE | C         | 0    | ACCENDERE_ALLETTARE |
Dataframe2:
|    | Group.1             | x  |
|----|---------------------|----|
| 1  | ACCENDERE_ACCENDINO | 5  |
| 13 | ACCENDERE_FUOCO     | 22 |
| 16 | ACCENDERE_LUCE      | 10 |
| 24 | ACCENDERE_SIGARETTA | 6  |
....
I want to exclude from Dataframe1 all the rows whose word pair (Cue_Ass_word) does not appear in the Group.1 column of Dataframe2.
In other words, how can I subset Dataframe1 using the strings in Dataframe2$Group.1?
It's not quite clear what you mean, but is this what you need?
Dataframe1[Dataframe1$Cue_Ass_word %in% Dataframe2$Group.1, ]
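A small runnable check on a few rows taken from the tables in the question. Two details matter here: the column is `Group.1` (with a dot), and `Dataframe2$Group1` returns `NULL`, so `%in%` would match nothing; and keeping the rows whose pair is present means there is no `!` in front of `%in%`, since `!` would instead keep exactly the rows the question wants to drop.

```r
# A few rows from the tables in the question
Dataframe1 <- data.frame(
  Cue = rep("ACCENDERE", 4),
  Ass_word = c("ACCENDINO", "ALLETTARE", "APRIRE", "ACCENDINO"),
  Condition = c("A", "A", "A", "B"),
  Freq = c(1, 0, 1, 2),
  stringsAsFactors = FALSE
)
Dataframe1$Cue_Ass_word <- paste(Dataframe1$Cue, Dataframe1$Ass_word, sep = "_")

Dataframe2 <- data.frame(
  Group.1 = c("ACCENDERE_ACCENDINO", "ACCENDERE_FUOCO",
              "ACCENDERE_LUCE", "ACCENDERE_SIGARETTA"),
  x = c(5, 22, 10, 6),
  stringsAsFactors = FALSE
)

# Keep rows whose word pair appears in Group.1; all other rows are dropped
kept <- Dataframe1[Dataframe1$Cue_Ass_word %in% Dataframe2$Group.1, ]
kept
```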

delete whole row of gridview

How can I delete whole rows of a GridView from the C# code-behind? For example, given:
+-----+-----+------+
|Col1 | Col2| Col3 |
| | | |
| a | 1 | 5 |
| | | |
| a | 2 | 6 |
| | | |
| a | 3 | 7 |
| | | |
| a | 4 | 8 |
+-----+-----+------+
I want to delete whole rows from the GridView so that it becomes:
+-----+-----+------+
|Col1 | Col2| Col3 |
| | | |
| a | 4 | 8 |
+-----+-----+------+
Only the last duplicate row should be left and all the others deleted (each deleted row is removed across all columns of the GridView).
Does anyone know how to achieve this?
Have a look at GridView.DeleteRow(). MSDN documentation is here: GridView.DeleteRow Method
