Recode Variable in R after matching with another Data Frame - r

I have 2 dataframes in R,
DF1
|attr1|attr2|attr3|
|5 |4 |9 |
|4 |30 |2 |
|5 |18 |1 |
|3 |1 |7 |
|6 |30 |0 |
|8 |18 |12 |
Now I'm trying to recode the values in the attr2 column of this dataframe so that a value becomes 1 if it is present in the Var1 column of DF2, and 0 otherwise. The second dataframe is simply a frequency count of the top 2 values within attr2.
DF2
|Var1|Freq|
|30 |2 |
|18 |2 |
I want the result to be in the format of something as follows:
|attr1|attr2|attr3|
|5 |0 |9 |
|4 |1 |2 |
|5 |1 |1 |
|3 |0 |7 |
|6 |1 |0 |
|8 |1 |12 |
Thanks for the help!

We can use
library(dplyr)
DF1 %>%
mutate(attr2 = as.integer(attr2 %in% DF2$Var1))
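For readers without dplyr, the same recode can be sketched in base R. This is only an illustration built from the tables in the question; the name recoded is introduced here and is not part of the original answer.

```r
# Data copied from the question.
DF1 <- data.frame(
  attr1 = c(5, 4, 5, 3, 6, 8),
  attr2 = c(4, 30, 18, 1, 30, 18),
  attr3 = c(9, 2, 1, 7, 0, 12)
)
DF2 <- data.frame(Var1 = c(30, 18), Freq = c(2, 2))

# %in% gives TRUE/FALSE for each attr2 value; as.integer() maps that to 1/0.
recoded <- DF1                     # hypothetical name, keeps DF1 untouched
recoded$attr2 <- as.integer(DF1$attr2 %in% DF2$Var1)
print(recoded)
```

The attr2 column of recoded should then hold 0, 1, 1, 0, 1, 1, matching the desired result in the question.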

Related

Change outliers from black to colour in grouped box plot in ggplot2

I have a grouped box plot in which I want to change the outlier dots from the default of black to the colour of the boxes keeping everything else the same. There is a previous thread that provides a solution for this for a standard box plot that I am able to implement.
Coloring boxplot outlier points in ggplot2?
However, I want to do it for a grouped box plot.
Below is some example data and code for the grouped box plot.
|ID |Time |Metabolite | Concentration|
|:--|:----|:----------|-------------:|
|1 |1 |A | 40|
|1 |1 |B | 36|
|1 |1 |C | 28|
|1 |2 |A | 13|
|1 |2 |B | 150|
|1 |2 |C | 32|
|1 |3 |A | 45|
|1 |3 |B | 15|
|1 |3 |C | 15|
|2 |1 |A | 7|
|2 |1 |A | 9|
|2 |1 |B | 236|
|2 |1 |C | 33|
|2 |2 |A | 33|
|2 |2 |B | 48|
|2 |2 |C | 39|
|2 |3 |A | 15|
|2 |3 |C | 126|
|3 |1 |A | 13|
|3 |1 |B | 41|
|3 |1 |C | 37|
|3 |2 |A | 3|
|3 |2 |B | 218|
|3 |2 |C | 27|
|3 |3 |A | 7|
|3 |3 |B | 27|
|3 |3 |C | 3|
|4 |1 |A | 4|
|4 |1 |B | 7|
|4 |1 |C | 33|
|4 |2 |A | 133|
|4 |2 |B | 4|
|4 |2 |C | 10|
|4 |3 |A | 122|
|4 |3 |B | 27|
|4 |3 |C | 14|
|5 |1 |A | 7|
|5 |1 |B | 22|
|5 |1 |C | 43|
|5 |2 |A | 3|
|5 |2 |B | 6|
|5 |2 |C | 158|
|5 |3 |A | 48|
|5 |3 |B | 7|
|5 |3 |C | 24|
|6 |1 |A | 15|
|6 |1 |B | 30|
|6 |1 |C | 15|
|6 |2 |A | 27|
|6 |2 |B | 187|
|6 |2 |C | 9|
|6 |3 |A | 31|
|6 |3 |B | 40|
|6 |3 |C | 41|
|7 |1 |A | 37|
|7 |1 |B | 30|
|7 |1 |C | 28|
|7 |2 |A | 142|
|7 |2 |B | 40|
|7 |2 |C | 7|
|7 |3 |A | 45|
|7 |3 |B | 3|
|8 |3 |C | 45|
|8 |1 |A | 34|
|8 |1 |B | 8|
|8 |1 |C | 46|
|8 |2 |A | 167|
|8 |2 |B | 25|
|8 |2 |C | 34|
|8 |3 |A | 27|
|9 |3 |B | 28|
|9 |3 |C | 36|
|9 |1 |A | 44|
|9 |1 |B | 26|
|9 |1 |C | 20|
|9 |2 |A | 11|
|9 |2 |B | 18|
|9 |2 |C | 176|
|9 |3 |A | 1|
|9 |3 |B | 40|
|9 |3 |C | 10|
|10 |1 |A | 8|
|10 |1 |B | 49|
|10 |1 |C | 193|
|10 |2 |A | 13|
|10 |2 |B | 13|
|10 |2 |C | 28|
|10 |3 |A | 50|
|10 |3 |B | 47|
|10 |3 |C | 46|
|11 |1 |A | 21|
|11 |1 |B | 34|
|11 |1 |C | 28|
|11 |2 |A | 13|
|11 |2 |B | 32|
|11 |2 |C | 47|
|11 |3 |A | 15|
|11 |3 |B | 42|
|11 |3 |C | 9|
library(ggplot2)
ggplot(df, aes(x=Time, y=Concentration, fill=Metabolite)) +
  geom_boxplot()

Is it possible to output table ordered by group and limited per group?

I have a database with a table of cars that has a number of different columns. I need to output the contents of that table ordered by the Make of each car. Only three cars from each make need to be output, alongside the total of the score columns in each car's row. I also need the output ordered by that total in descending order, accompanied by a column called Ranking that counts up from 1 to however many rows are output.
Below is a sample from my database table
|Timestamp |Email |Name |Year|Make |Model |Car_ID|Judge_ID|Judge_Name|Racer_Turbo|Racer_Supercharged|Racer_Performance|Racer_Horsepower|Car_Overall|Engine_Modifications|Engine_Performance|Engine_Chrome|Engine_Detailing|Engine_Cleanliness|Body_Frame_Undercarriage|Body_Frame_Suspension|Body_Frame_Chrome|Body_Frame_Detailing|Body_Frame_Cleanliness|Mods_Paint|Mods_Body|Mods_Wrap|Mods_Rims|Mods_Interior|Mods_Other|Mods_ICE|Mods_Aftermarket|Mods_WIP|Mods_Overall|
|--------------|-------------------------|----------|----|--------|---------|------|--------|----------|-----------|------------------|-----------------|----------------|-----------|--------------------|------------------|-------------|----------------|------------------|------------------------|---------------------|-----------------|--------------------|----------------------|----------|---------|---------|---------|-------------|----------|--------|----------------|--------|------------|
|8/5/2018 14:10|honoland13#japanpost.jp |Hernando |2015|Acura |TLX |48 |J04 |Bob |0 |0 |2 |2 |4 |4 |0 |2 |4 |4 |2 |4 |2 |2 |2 |2 |2 |0 |4 |4 |4 |6 |2 |0 |4 |
|8/5/2018 15:11|nlighterness2q#umn.edu |Noel |2015|Jeep |Wrangler |124 |J02 |Carl |0 |6 |4 |2 |4 |6 |6 |4 |4 |4 |6 |6 |6 |6 |6 |4 |6 |6 |6 |6 |6 |4 |6 |4 |6 |
|8/5/2018 17:10|eguest47#microsoft.com |Edan |2015|Lexus |Is250 |222 |J05 |Adrian |0 |0 |0 |0 |0 |0 |0 |0 |6 |6 |6 |0 |0 |6 |6 |6 |0 |0 |0 |0 |0 |0 |0 |0 |4 |
|8/5/2018 17:34|hchilley40#fema.gov |Hieronymus|1993|Honda |Civic eG |207 |J06 |Aaron |0 |0 |2 |2 |2 |2 |2 |2 |0 |4 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |0 |0 |2 |2 |0 |
|8/5/2018 14:30|nnowick3d#tuttocitta.it |Nickolas |2016|Ford |Mystang |167 |J02 |Carl |0 |0 |2 |2 |0 |2 |2 |0 |0 |0 |0 |2 |0 |2 |2 |2 |0 |0 |2 |0 |0 |0 |0 |0 |2 |
|8/5/2018 16:12|mdearl39#amazon.co.uk |Martin |2013|Hyundai |Gen coupe|159 |J04 |Bob |0 |0 |2 |0 |0 |0 |2 |0 |0 |0 |0 |2 |0 |2 |2 |0 |2 |0 |2 |0 |0 |0 |0 |0 |0 |
|8/5/2018 17:00|alynamg#blogtalkradio.com|Aldridge |2009|Infiniti|G37 |20 |J06 |Aaron |2 |0 |2 |2 |0 |0 |2 |0 |0 |2 |2 |2 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |2 |0 |2 |
|8/5/2018 16:11|abowton3k#spiegel.de |Ambros |2009|Honda |Oddesy |178 |J06 |Aaron |2 |0 |2 |2 |2 |2 |2 |0 |4 |4 |2 |2 |2 |4 |4 |4 |2 |2 | |6 |4 |4 |6 |4 |6 |
|8/5/2018 17:29|qesterbrookn#bandcamp.com|Quincy |2012|Hyundai |Celoster |30 |J04 |Bob |0 |0 |2 |2 |2 |2 |2 |4 |6 |6 |4 |2 |4 |4 |6 |6 |4 |0 |2 |0 |0 |0 |2 |2 |4 |
The expected output is something like this below
|Ranking |Car_ID|Year |Make |Model |Total|
|--------|------|-------|------|-----------|-----|
|1 |48 |2015 |Acura |TLX |89 |
|2 |66 |2012 |Acura |MDX |75 |
|3 |101 |2022 |Acura |TLX |70 |
|4 |22 |2011 |Chevy |Camaro |112 |
|5 |40 |2015 |Chevy |Corvette |99 |
|6 |205 |2022 |Chevy |Corvette |66 |
|7 |111 |2006 |Ford |Mustang |94 |
|8 |97 |2003 |Ford |GT |88 |
|9 |71 |2008 |Ford |Fiesta ST |80 |
Here's the command I've been able to put together, which does something similar to what I need, but I can't figure out how to add the ranking column and order by the total in descending order.
SELECT Car_ID, Year, Make, Model, Racer_Turbo + Racer_Supercharged + ... + Mods_Overall FROM Carstable order by Make limit 3;
This query returned only three rows in total instead of three per make. I also can't figure out where to put the DESC keyword so that the rows are listed in descending order of the total column, or how to produce the ranking column. Any ideas?
Use a CTE that returns the column Total for each row, and the ROW_NUMBER() window function both to pick the top 3 rows (by Total) for each Make and to create the column Ranking:
WITH cte AS (
  SELECT *,
    Racer_Turbo + Racer_Supercharged + Racer_Performance + Racer_Horsepower +
    Car_Overall +
    Engine_Modifications + Engine_Performance + Engine_Chrome + Engine_Detailing + Engine_Cleanliness +
    Body_Frame_Undercarriage + Body_Frame_Suspension + Body_Frame_Chrome + Body_Frame_Detailing + Body_Frame_Cleanliness +
    Mods_Paint + Mods_Body + Mods_Wrap + Mods_Rims + Mods_Interior + Mods_Other + Mods_ICE + Mods_Aftermarket + Mods_WIP + Mods_Overall Total
  FROM carstable
)
SELECT ROW_NUMBER() OVER (ORDER BY Make, Total DESC) Ranking,
       Car_ID, Year, Make, Model, Total
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Make ORDER BY Total DESC) rn FROM cte) t
WHERE rn <= 3
ORDER BY Make, Total DESC;
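To make the logic of the query easier to follow, here is a rough base R sketch of the same three steps: compute Total, number the rows within each Make by descending Total, then keep the first three and number the survivors. The miniature cars table and its values are invented for illustration, and only two of the score columns are kept.

```r
# Hypothetical miniature of the cars table (values are made up).
cars <- data.frame(
  Car_ID       = c(48, 66, 101, 7, 22, 40, 205),
  Make         = c("Acura", "Acura", "Acura", "Acura", "Chevy", "Chevy", "Chevy"),
  Racer_Turbo  = c(40, 35, 30, 10, 60, 50, 30),
  Mods_Overall = c(49, 40, 40, 5, 52, 49, 36)
)

# Step 1 (the CTE): one Total per row.
cars$Total <- cars$Racer_Turbo + cars$Mods_Overall

# Step 2 (PARTITION BY Make ORDER BY Total DESC): sort, then number rows per Make.
cars <- cars[order(cars$Make, -cars$Total), ]
cars$rn <- ave(cars$Total, cars$Make, FUN = seq_along)

# Step 3 (WHERE rn <= 3, outer ROW_NUMBER): keep the top 3 and add Ranking.
top3 <- cars[cars$rn <= 3, ]
top3$Ranking <- seq_len(nrow(top3))
result <- top3[, c("Ranking", "Car_ID", "Make", "Total")]
print(result)
```

The fourth Acura (Car_ID 7) has the lowest total for its make, so it is the only row dropped.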

conditionally transpose select rows using tidyverse

I have a dataset that I'm working with that I'm attempting to reshape using tidyverse.
From:
|Name |eval |test |type | score|
|:----|:------|:----|:---------|-----:|
|John |first |1 |pretest | 10|
|John |first |1 |posttest | 15|
|John |first |2 |pretest | 20|
|John |first |2 |posttest | 30|
|John |second |1 |pretest | 35|
|John |second |1 |posttest | 50|
|John |second |2 |pretest | 5|
|John |second |2 |posttest | 10|
|Jane |first |1 |pretest | 40|
|Jane |first |1 |posttest | 20|
|Jane |first |2 |pretest | 10|
|Jane |first |2 |posttest | 20|
To:
|Name |eval |new_name | pre_test| post_test|
|:----|:------|:-------------|--------:|---------:|
|John |first |John_first_1 | 10| 15|
|John |first |John_first_2 | 20| 30|
|John |second |John_second_1 | 35| 50|
|John |second |John_second_2 | 5| 10|
|Jane |first |Jane_first_1 | 40| 20|
|Jane |first |Jane_first_2 | 10| 20|
I tried using group_by on Name, eval, and test so that each group would essentially be pre_test vs. post_test for a given person.
I also tried using unite on Name, eval, test, and type, but if I do a spread after that, each unique name ends up as its own column.
I also tried doing a unite on Name, eval, and test first, and then a spread using key = (new united name) and value = score, but the output isn't what I wanted.
I know a loop function can be written to take every other value and put into a new column, but I'm trying to see if there's a tidyverse way to go about this.
Thanks!!
library(tidyverse)
Name <- c('John', 'John', 'John', 'John',
'John', 'John', 'John', 'John',
'Jane', 'Jane', 'Jane', 'Jane')
eval <- c('first', 'first', 'first', 'first',
'second', 'second', 'second', 'second',
'first', 'first', 'first', 'first')
test <- c('1', '1', '2', '2',
'1', '1', '2', '2',
'1', '1', '2', '2')
type <- c('pretest', 'posttest', 'pretest', 'posttest',
'pretest', 'posttest', 'pretest', 'posttest',
'pretest', 'posttest', 'pretest', 'posttest')
score <- c(10, 15, 20, 30, 35, 50, 5, 10, 40, 20, 10, 20)
df <- data.frame(Name, eval, test, type, score)
df %>%
unite(temp, Name, eval, test) %>%
spread(key=type, value=score)
Edit to show the original table that akrun's code worked on
From:
|Name |eval |test |type | score|
|:----|:------|:----|:---------|-----:|
|John |first |1 |pretest | 10|
|John |first |1 |posttest | 15|
|John |first |2 |pretest | 20|
|John |first |2 |postttest | 30|
|John |second |1 |pretest | 35|
|John |second |1 |posttest | 50|
|John |second |2 |pretest | 5|
|John |second |2 |postttest | 10|
|Jane |first |1 |pretest | 40|
|Jane |first |1 |posttest | 20|
|Jane |first |2 |pretest | 10|
|Jane |first |2 |postttest | 20|
We can collapse the repeated 't's in the 'type' column so that the values are consistent (note that this also turns 'posttest' into 'postest', as the output shows), then use unite with remove = FALSE to keep the initial columns as well, and finally spread
library(dplyr)
library(tidyr)
library(stringr)
df %>%
mutate(type = str_replace(type, "t{2,}", "t")) %>%
unite(new_name, Name, eval, test, remove = FALSE) %>%
spread(type, score)
# new_name Name eval test postest pretest
#1 Jane_first_1 Jane first 1 20 40
#2 Jane_first_2 Jane first 2 20 10
#3 John_first_1 John first 1 15 10
#4 John_first_2 John first 2 30 20
#5 John_second_1 John second 1 50 35
#6 John_second_2 John second 2 10 5
In tidyr 1.0.0, pivot_wider was introduced as a more general version of spread (which may be deprecated in the future). So, instead of the spread line at the end, use
...%>%
pivot_wider(names_from = type, values_from = score)
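As a self-contained sketch of that pivot_wider variant, the pipeline below rebuilds the clean table from the question (so no str_replace step is needed) and assumes tidyr 1.0.0 or later is installed. The name wide is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# The clean table from the question, rebuilt compactly with rep().
df <- data.frame(
  Name  = rep(c("John", "Jane"), times = c(8, 4)),
  eval  = rep(c("first", "second", "first"), each = 4),
  test  = rep(rep(c("1", "2"), each = 2), times = 3),
  type  = rep(c("pretest", "posttest"), times = 6),
  score = c(10, 15, 20, 30, 35, 50, 5, 10, 40, 20, 10, 20)
)

# unite() builds new_name; pivot_wider() turns each type into its own column.
wide <- df %>%
  unite(new_name, Name, eval, test, remove = FALSE) %>%
  pivot_wider(names_from = type, values_from = score)

print(wide)
```

The result should have one row per Name/eval/test combination, with pretest and posttest as separate columns.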
How about something like....
library(tidyverse)
data <- tibble(
Name = c(rep("John", 8), rep("Jane", 4)),
eval = c(rep("first", 4), rep("second", 4), rep("first", 4)),
type = rep(c("pretest", "posttest"), 6),
score = c(10, 15, 20, 30, 35, 50, 5, 10, 40, 20, 10, 20)
)
data %>%
group_by(Name, eval, type) %>%
mutate(num = 1:n(),
new_name = str_c(Name, "_", eval, "_", num)) %>%
ungroup() %>%
dplyr::select(new_name, type, score) %>%
spread(type, score)
Which yields:
# A tibble: 6 x 3
new_name posttest pretest
<chr> <dbl> <dbl>
1 Jane_first_1 20 40
2 Jane_first_2 20 10
3 John_first_1 15 10
4 John_first_2 30 20
5 John_second_1 50 35
6 John_second_2 10 5

r parser translating symbol_function_call as a symbol

If I parse do.call(knitr::kable,args=args), the function kable inside do.call is parsed as a SYMBOL and not as a SYMBOL_FUNCTION_CALL.
Why shouldn't it be the latter?
tf <- tempfile()
cat('do.call(knitr::kable,args=args)',file = tf)
# keep.source = TRUE is needed for parse data when running non-interactively
parsed <- utils::getParseData(parse(tf, keep.source = TRUE))
knitr::kable(parsed)
| | line1| col1| line2| col2| id| parent|token |terminal |text |
|:--|-----:|----:|-----:|----:|--:|------:|:--------------------|:--------|:-------|
|18 | 1| 1| 1| 31| 18| 0|expr |FALSE | |
|1 | 1| 1| 1| 7| 1| 3|SYMBOL_FUNCTION_CALL |TRUE |do.call |
|3 | 1| 1| 1| 7| 3| 18|expr |FALSE | |
|2 | 1| 8| 1| 8| 2| 18|'(' |TRUE |( |
|7 | 1| 9| 1| 20| 7| 18|expr |FALSE | |
|4 | 1| 9| 1| 13| 4| 7|SYMBOL_PACKAGE |TRUE |knitr |
|5 | 1| 14| 1| 15| 5| 7|NS_GET |TRUE |:: |
|6 | 1| 16| 1| 20| 6| 7|SYMBOL |TRUE |kable |
|8 | 1| 21| 1| 21| 8| 18|',' |TRUE |, |
|11 | 1| 22| 1| 25| 11| 18|SYMBOL_SUB |TRUE |args |
|12 | 1| 26| 1| 26| 12| 18|EQ_SUB |TRUE |= |
|13 | 1| 27| 1| 30| 13| 15|SYMBOL |TRUE |args |
|15 | 1| 27| 1| 30| 15| 18|expr |FALSE | |
|14 | 1| 31| 1| 31| 14| 18|')' |TRUE |) |
If you just have kable, it's a symbol. That symbol could point to a function or a value. It's not clear what it is until you actually evaluate it.
However, if you have kable(), it's clear that you expect kable to be a function and that you are calling it.
The do.call hides that intent from the parser: it cannot tell that you are trying to call a function, and the call isn't realized until run time.
Things can get funny if you do something like
sum <- 5
sum
# [1] 5
sum(1:3)
# [1] 6
Here sum is behaving both like a regular variable and like a function. We've actually created a variable in our global environment that shadows the sum function from base. But because the parser treats sum and sum() differently, we can still get at both meanings.
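The distinction can be checked directly with getParseData. The helper tokens_of below is a hypothetical name introduced for this sketch; it only lists the terminal tokens for a piece of code, and keep.source = TRUE is passed explicitly so the parse data is available outside an interactive session.

```r
# Hypothetical helper: terminal tokens (token type and text) for a code string.
tokens_of <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd[pd$terminal, c("token", "text")]
}

bare   <- tokens_of("kable")                 # a bare name
called <- tokens_of("kable(x)")              # a direct call
via_do <- tokens_of("do.call(kable, args)")  # passed as an argument

print(bare)
print(called)
print(via_do)
```

Only in the direct call is kable followed by an opening parenthesis, and that is the one case where it is tagged SYMBOL_FUNCTION_CALL. Inside do.call it is just an argument, so it stays a SYMBOL, while do.call itself gets the function-call token.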

R: two data frame merge with 2 variables and several other conditions

I am a beginner in R. Here is an example of a data table (C) that I created using JMP. I joined tables A and B on the A1 and B;C columns to create C. In table B, the rows whose OP column contains CLO are dropped during the join, while the column J from table A is updated during the join.
I am trying to create the data frame C using the merge command in R. I used the following expression:
C <- merge(B,A, BY=c("A1","B;C"),all.x = TRUE)
but I don't seem to get the data frame C. I would appreciate any help from the community to solve this.
Data Frame A
A1 | B;C | D |E |F |G | H | I |J |K |L | M |
------|------|---|--|---|---|---|------------|---|----|----|---|
ABCD |SD;TH |HO |2 |FA | |ENG| 201808:SPR |54 |PRO |VAC |MAA|
JCBW |RF;TH |HO |2 |FU |VIN|FUT| 504278:SPR |4 |PRO |VAC |MAA|
TVGH |ED;UJ |HO |2 |FU |VIN|FUT| 504276:SPR |4 |PRO |VAC |MAA|
IGHE |WR;RE |HO |3 |IN | |SPE| 504278:SPR |73 |PRO |VAC |MAA|
UUUU |DF;TH |HO |3 |FU | |FUT| 357193:IT |13 |INT |VAC |MAA|
JFLD |YO;TH |HO |3 |CH |BRI|CHE| 476306:SPR |6 |PRO |VAC |MAA|
Data frame B
OWN|COM|OP |GR |J | A1 | B;C | D|E |F |G |H | I |K |L |M
---|---|---|---|--|-----|-----|--|--|--|---|---|-----------|---|---|----
SUP|X |CLO|ARE|16|59HUW|BB;TH|HO|8 |FA|MIC|SPE|90278:SPR |INT|VAC|MAA
SUP|X |OPE|ARE|75|ABCD |SD;TH|HO|8 |FU|MIC|ENG|201808:SPR |INT|VAC|MAA
SUP|X |CLO|ARE|4 |59HVG|BB;RE|HO|8 |FA|MIC|SPE|6074278:SPR|INT|VAC|MAA
PAD|X |CLO|PEN|30|9RHSG|BV;TH|HO|2 |FA| |SPE|201808:SPR |PRO|VAC|MAA
PAD|X |OPE|PEN|99|UUUU |DF;TH|HO|8 |FU|MIC|FUT|357193:IT |PRO|VAC|MAA
PAD|X |OPE|PEN|65|IGHE |WR;RE|HO|8 |IN| |SPE|504278:SPR |PRO|VAC|MAA
PAD|X |CLO|PEN|13|S9K7E|FN;TH|HO|8 |FA|MIC|FUT|394290:SPR |PRO|VAC|MAA
Data frame C
OWN|COM|OP |GR |J |A1 | B;C |D |E |F | G |H | I | K |L |M
---|---|---|---|---|----|-----|--|--|--|---|---|----------|---|---|----
SUP|x |OPE|ARE|99 |ABCD|SD;TH|HO|8 |FU|MIC|ENG|201808:SPR|INT|VAC|MAA
PAD|x |OPE|PEN|120|UUUU|DF;TH|HO|8 |FU|MIC|FUT|357193:IT |PRO|VAC|MAA
PAD|x |OPE|PEN|73 |IGHE|WR;RE|HO|8 |IN| |SPE|504278:SPR|PRO|VAC|MAA
| | | |4 |JCBW|RF;TH|HO|2 |FU|VIN|FUT|504278:SPR|PRO|VAC|MAA
| | | |25 |TVGH|ED;UJ|HO|2 |FU|VIN|FUT|504276:SPR|PRO|VAC|MAA
| | | |15 |JFLD|YO;TH|HO|3 |CH|BRI|CHE|476306:SPR|PRO|VAC|MAA
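One thing that may be worth checking in the expression above: R argument names are case-sensitive, so BY= is not matched to merge's by argument and the join falls back to all columns the two data frames share. The sketch below is a guess at the intended join rather than a confirmed solution. It uses a three-row, hypothetical cut-down of A and B, drops the CLO rows first, and keeps both J columns side by side because the rule for updating J is not stated in the question.

```r
# Hypothetical cut-down versions of A and B; check.names = FALSE keeps "B;C" as-is.
A <- data.frame(
  A1    = c("ABCD", "JCBW", "UUUU"),
  `B;C` = c("SD;TH", "RF;TH", "DF;TH"),
  J     = c(54, 4, 13),
  check.names = FALSE, stringsAsFactors = FALSE
)
B <- data.frame(
  OP    = c("CLO", "OPE", "OPE"),
  J     = c(16, 75, 99),
  A1    = c("59HUW", "ABCD", "UUUU"),
  `B;C` = c("BB;TH", "SD;TH", "DF;TH"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Assumption: "dropped" means rows with OP == "CLO" are excluded before joining.
B_open <- B[B$OP != "CLO", ]

# Lowercase `by`; all.x = TRUE keeps every row of A, matched or not.
C <- merge(A, B_open, by = c("A1", "B;C"), all.x = TRUE,
           suffixes = c(".A", ".B"))
print(C)
```

Rows of A without a partner in B (JCBW here) come back with NA in the columns taken from B, which corresponds to the blank cells in the desired table C.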
