Is there a way in R to create a column based on order of multiple values in one another column in dataframe? [duplicate] - r

This question already has answers here:
Aggregating all unique values of each column of data frame
(2 answers)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 1 year ago.
I would like to create a column in my R data frame based on the order in which multiple values occur in one column.
For example, my data frame has an id column and an item type column, and the values of the order column is what I would like to add. Is there a way to tell R to look at the order of values in the item column so that it can spit out "ABCD" or "ADCB" (any other order) as the cell value under the 3rd column?
| id | item | order |
| 11 | A | ABCD |
| 11 | A | ABCD |
| 11 | B | ABCD |
| 11 | B | ABCD |
| 11 | C | ABCD |
| 11 | C | ABCD |
| 11 | D | ABCD |
| 11 | D | ABCD |
| 12 | A | ADCB |
| 12 | A | ADCB |
| 12 | D | ADCB |
| 12 | D | ADCB |
| 12 | C | ADCB |
| 12 | C | ADCB |
| 12 | B | ADCB |
| 12 | B | ADCB |
...

Related

Functions by groups in another column in R [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have 2 questions regarding groups in a dataframe in R.
Imagine I have a dataframe (df) like this
| CONT | COUNTRY | GDP | AVG_GDP |
|------|---------|-----|---------|
| AF | EGYPT | 3 | 2 |
| AF | SUDAN | 2 | 2 |
| AF | ZAMBIA | 1 | 2 |
| AM | CANADA | 4 | 5 |
| AM | MEXICO | 2 | 5 |
| AM | USA | 9 | 5 |
| EU | FRANCE | 5 | 4 |
| EU | ITALY | 4 | 4 |
| EU | SPAIN | 3 | 4 |
How can I calculate the average of GDP by continents and then put it in the AVG_GDP column so it looks like in the table above?
The second question is how can I sum the GDP by continents so it looks like this:
| CONT | SUM_GDP |
|------|---------|
| AF | 6 |
| AM | 15 |
| EU | 12 |
For this last question I think that in base R the second column would be obtained with something like df$SUM_GDP <- aggregate(df$GDP, by=list(df$CONT), FUN=sum) but maybe there is another way to make it in a new dataframe.
Thank you in advance

How to add a column from one dataframe to another dataframe when two other columns match [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 3 years ago.
I have two datasets, db1 and db2, like the following ones:
db1
+---------+-------+-------+------+------+-----------------+
| Authors| IDs | Title | Year | ISSN | Other columns...|
+---------+-------+-------+------+------+-----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | |
| Ares K.| 70058 | 2 | 2012 |15874 | |
| Anto E.| 71030 | 3 | 2011 |16999 | |
| A Banul| 57196 | 1 | 2011 |21546 | |
| A Berat| 56372 | 2 | 2011 |12554 | |
+---------+-------+-------+------+------+-----------------+
and
db2
+---------+-------+-------+------+------+-------+---------------------------+
| Authors| IDs | Title | Year | ISSN | IF | Other different columns...|
+---------+-------+-------+------+------+-------+---------------------------+
| Abad J.| 16400 | 1 | 2013 |14589 | 2,3 | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |14587 | 1,2 | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |75846 | 4,5 | |
+---------+-------+-------+------+------+-------+---------------------------+
Basically, what i want is to add to db1 the column IF from db2 when the two columns Year and ISSN have the same values. So what i want to achive is the following output in my example:
db1
+---------+-------+-------+------+------+-------+----------------+
| Authors| IDs | Title | Year | ISSN | IF |Other columns...|
+---------+-------+-------+------+------+-------+----------------+
| Abad J.| 16400 | 1 | 2014 |14589 | NA | |
| Ares K.| 70058 | 2 | 2012 |15874 | 3,3 | |
| Anto E.| 71030 | 3 | 2011 |16999 | NA | |
| A Banul| 57196 | 1 | 2011 |21546 | 7,8 | |
| A Berat| 56372 | 2 | 2011 |12554 | NA | |
+---------+-------+-------+------+------+-------+----------------+
i have tried with merge but, since i have also different columns, i obtain a very big dataset.
What i want is to use the function match but with more than one condition applied at the same time.
Any guess ?
dplyr::left_join(db1, db2 %>% dplyr::select(Year, ISSN, IF))
This should work providing the two dataframes have no other columns in common besides the ones you've shown here.

How to get a query result into a key value form in HiveQL

I have tried different things, but none succeeded. I have the following issue, and would be very gratefull if someone could help me.
I get the data from a view as several billions of records, for different measures
A)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to aggregate it by each measure. And so long so fine. I got this figured out.
B)
| s_c_m1 | s_c_m2 | s_c_m3 | s_c_m4 | s_p_m1 | s_p_m2 | s_p_m3 | s_p_m4 |
|--------+--------+--------+--------+--------+--------+--------+--------|
| 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
|--------+--------+--------+--------+--------+--------+--------+--------|
Then I need to get the data in the following form. I need to turn it into a key-value form.
C)
| measure | c | p |
|---------+----+----|
| m1 | 3 | 15 |
| m2 | 6 | 18 |
| m3 | 9 | 21 |
| m4 | 12 | 24 |
|---------+----+----|
The first 4 columns from B) would form in C) the first column, and the second 4 columns would form another column.
Is there an elegant way, that could be easily maintainable? The perfect solution would be if another measure would be introduced in A) and B), there no modification would be required and it would automatically pick up the difference.
I know how to get this done in SqlServer and Postgres, but here I am missing the expirience.
I think you should use map for this

How to subset a dataframe using a column from another dataframe in r?

I have 2 dataframes
Dataframe1:
| Cue | Ass_word | Condition | Freq | Cue_Ass_word |
1 | ACCENDERE | ACCENDINO | A | 1 | ACCENDERE_ACCENDINO
2 | ACCENDERE | ALLETTARE | A | 0 | ACCENDERE_ALLETTARE
3 | ACCENDERE | APRIRE | A | 1 | ACCENDERE_APRIRE
4 | ACCENDERE | ASCENDERE | A | 1 | ACCENDERE_ASCENDERE
5 | ACCENDERE | ATTIVARE | A | 0 | ACCENDERE_ATTIVARE
6 | ACCENDERE | AUTO | A | 0 | ACCENDERE_AUTO
7 | ACCENDERE | ACCENDINO | B | 2 | ACCENDERE_ACCENDINO
8 | ACCENDERE| ALLETTARE | B | 3 | ACCENDERE_ALLETTARE
9 | ACCENDERE| ACCENDINO | C | 2 | ACCENDERE_ACCENDINO
10 | ACCENDERE| ALLETTARE | C | 0 | ACCENDERE_ALLETTARE
Dataframe2:
| Group.1 | x
1 | ACCENDERE_ACCENDINO | 5
13 | ACCENDERE_FUOCO | 22
16 | ACCENDERE_LUCE | 10
24 | ACCENDERE_SIGARETTA | 6
....
I want to exclude from Dataframe1 all the rows that contain words (Cue_Ass_word) that are not reported in the column Group.1 in Dataframe2.
In other words, how can I subset Dataframe1 using the strings reported in Dataframe2$Group.1?
It's not quite clear what you mean, but is this what you need?
Dataframe1[!(Dataframe1$Cue_Ass_word %in% Dataframe2$Group1),]

R : Create a factor from two variables containing ranks and levels

Everything is in the title, I got from a database many columns, paired two-by-two containing codes and labels for some variables, I want an easy way to create half as many factors, with, for each factor levels/codes matching to the original two variables.
Here is an exemple of original data for two factors
| customer_type | customer_type_name | customer_status | customer_status_name |
|----------------------|----------------------|----------------------|----------------------|
| 1 | A | 2 | Beta |
| 2 | B | 2 | Beta |
| 3 | C | 1 | Alpha |
| 2 | B | 3 | Gamma |
| 1 | A | 4 | Delta |
| 3 | C | 2 | Beta |
i.e. a simpler way (simpler to call in a function for lots of variables) to do from dataframe "accounts"
a<-accounts[,c("customertypecode","customertypecodename")]
a<-a[!duplicated(a),]
a<-a[order(a$customertypecode),]
accounts$customertypecode<-factor(accounts$customertypecode,labels=a$customertypecodename[!is.na(a$customertypecodename)])

Resources