I have a grouped box plot in which I want to change the outlier dots from the default black to the colour of the boxes, keeping everything else the same. A previous thread, "Coloring boxplot outlier points in ggplot2?", provides a solution for a standard box plot, and I am able to implement it.
However, I want to do the same for a grouped box plot.
Below is some example data and code for the grouped box plot.
|ID |Time |Metabolite | Concentration|
|:--|:----|:----------|-------------:|
|1 |1 |A | 40|
|1 |1 |B | 36|
|1 |1 |C | 28|
|1 |2 |A | 13|
|1 |2 |B | 150|
|1 |2 |C | 32|
|1 |3 |A | 45|
|1 |3 |B | 15|
|1 |3 |C | 15|
|2 |1 |A | 7|
|2 |1 |A | 9|
|2 |1 |B | 236|
|2 |1 |C | 33|
|2 |2 |A | 33|
|2 |2 |B | 48|
|2 |2 |C | 39|
|2 |3 |A | 15|
|2 |3 |C | 126|
|3 |1 |A | 13|
|3 |1 |B | 41|
|3 |1 |C | 37|
|3 |2 |A | 3|
|3 |2 |B | 218|
|3 |2 |C | 27|
|3 |3 |A | 7|
|3 |3 |B | 27|
|3 |3 |C | 3|
|4 |1 |A | 4|
|4 |1 |B | 7|
|4 |1 |C | 33|
|4 |2 |A | 133|
|4 |2 |B | 4|
|4 |2 |C | 10|
|4 |3 |A | 122|
|4 |3 |B | 27|
|4 |3 |C | 14|
|5 |1 |A | 7|
|5 |1 |B | 22|
|5 |1 |C | 43|
|5 |2 |A | 3|
|5 |2 |B | 6|
|5 |2 |C | 158|
|5 |3 |A | 48|
|5 |3 |B | 7|
|5 |3 |C | 24|
|6 |1 |A | 15|
|6 |1 |B | 30|
|6 |1 |C | 15|
|6 |2 |A | 27|
|6 |2 |B | 187|
|6 |2 |C | 9|
|6 |3 |A | 31|
|6 |3 |B | 40|
|6 |3 |C | 41|
|7 |1 |A | 37|
|7 |1 |B | 30|
|7 |1 |C | 28|
|7 |2 |A | 142|
|7 |2 |B | 40|
|7 |2 |C | 7|
|7 |3 |A | 45|
|7 |3 |B | 3|
|8 |3 |C | 45|
|8 |1 |A | 34|
|8 |1 |B | 8|
|8 |1 |C | 46|
|8 |2 |A | 167|
|8 |2 |B | 25|
|8 |2 |C | 34|
|8 |3 |A | 27|
|9 |3 |B | 28|
|9 |3 |C | 36|
|9 |1 |A | 44|
|9 |1 |B | 26|
|9 |1 |C | 20|
|9 |2 |A | 11|
|9 |2 |B | 18|
|9 |2 |C | 176|
|9 |3 |A | 1|
|9 |3 |B | 40|
|9 |3 |C | 10|
|10 |1 |A | 8|
|10 |1 |B | 49|
|10 |1 |C | 193|
|10 |2 |A | 13|
|10 |2 |B | 13|
|10 |2 |C | 28|
|10 |3 |A | 50|
|10 |3 |B | 47|
|10 |3 |C | 46|
|11 |1 |A | 21|
|11 |1 |B | 34|
|11 |1 |C | 28|
|11 |2 |A | 13|
|11 |2 |B | 32|
|11 |2 |C | 47|
|11 |3 |A | 15|
|11 |3 |B | 42|
|11 |3 |C | 9|
ggplot(df, aes(x = Time, y = Concentration, fill = Metabolite)) +
  geom_boxplot()
I am currently trying to create a column that reflects a sequence from a recursive hierarchy in PySpark. This is what the data looks like.
data = [(1,"A",None),(1,"B","A"),(1,"C","A"),(1,"D","C"),(1,"E","B"),(2,"A",None),(2,"B",None),(2,"C","A"),(2,"D","A"),(2,"E","D")]
df = spark.createDataFrame(data, "ID integer, Child string, Parent string")
+---+-----+------+
| ID|Child|Parent|
+---+-----+------+
| 1| A| null|
| 1| B| A|
| 1| C| A|
| 1| D| C|
| 1| E| B|
| 2| A| null|
| 2| B| null|
| 2| C| A|
| 2| D| A|
| 2| E| D|
+---+-----+------+
The expected result:
+---+-----+------+--------+
| ID|Child|Parent|Sequence|
+---+-----+------+--------+
| 1| A| null| 1|
| 1| B| A| 2|
| 1| C| A| 2|
| 1| D| C| 3|
| 1| E| B| 3|
| 2| A| null| 1|
| 2| B| null| 0|
| 2| C| A| 2|
| 2| D| A| 2|
| 2| E| D| 3|
+---+-----+------+--------+
What would be the best way to approach this?
I am aware that in SQL you can do this with a recursive CTE, but as far as I have been able to find, there is no similar way to do it in PySpark.
Recursively joining DataFrames seems to be the way to accomplish this; however, it seems expensive and overly complex.
Is there a more native/efficient way to accomplish this?
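For reference, the level-by-level logic behind the expected result can be sketched in plain Python. This is not a Spark API and only illustrates the idea: each pass of the loop corresponds to one round of the iterative self-join mentioned above. The function name `hierarchy_levels` is hypothetical, and the rule that a root without children gets 0 is an assumption inferred from the expected table (row `2, B`).

```python
from collections import defaultdict

# Sample rows from the question: (ID, Child, Parent)
data = [(1, "A", None), (1, "B", "A"), (1, "C", "A"), (1, "D", "C"), (1, "E", "B"),
        (2, "A", None), (2, "B", None), (2, "C", "A"), (2, "D", "A"), (2, "E", "D")]


def hierarchy_levels(rows):
    """Return {(id, child): level} for rows of (id, child, parent)."""
    parent_of = {}
    has_children = defaultdict(bool)
    for id_, child, parent in rows:
        parent_of[(id_, child)] = parent
        if parent is not None:
            has_children[(id_, parent)] = True

    levels = {}
    # Seed the roots. Assumption taken from the expected output:
    # a root with children starts at 1, an isolated root gets 0.
    for key, parent in parent_of.items():
        if parent is None:
            levels[key] = 1 if has_children[key] else 0

    # Each pass resolves the nodes whose parent was resolved in an earlier pass,
    # which is what one round of a self-join would do.
    pending = {key for key in parent_of if key not in levels}
    while pending:
        resolved = {key for key in pending if (key[0], parent_of[key]) in levels}
        if not resolved:
            break  # orphaned or cyclic rows are left without a level
        for key in resolved:
            levels[key] = levels[(key[0], parent_of[key])] + 1
        pending -= resolved
    return levels


levels = hierarchy_levels(data)
for id_, child, parent in data:
    print(id_, child, parent, levels.get((id_, child)))
```

The number of passes equals the depth of the deepest hierarchy, so the same bound applies to the number of joins an iterative DataFrame version would need.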
I have a database with a table of cars; the table has a number of different columns. I need to output the contents of that table ordered by the Make of each car, with only three cars from each make, alongside the total of the scoring columns in each car's row. Within each make the rows should be ordered by that total in descending order, and the output needs a column called Ranking that counts up from 1 to however many rows are returned.
Below is a sample from my database table
|Timestamp |Email |Name |Year|Make |Model |Car_ID|Judge_ID|Judge_Name|Racer_Turbo|Racer_Supercharged|Racer_Performance|Racer_Horsepower|Car_Overall|Engine_Modifications|Engine_Performance|Engine_Chrome|Engine_Detailing|Engine_Cleanliness|Body_Frame_Undercarriage|Body_Frame_Suspension|Body_Frame_Chrome|Body_Frame_Detailing|Body_Frame_Cleanliness|Mods_Paint|Mods_Body|Mods_Wrap|Mods_Rims|Mods_Interior|Mods_Other|Mods_ICE|Mods_Aftermarket|Mods_WIP|Mods_Overall|
|--------------|-------------------------|----------|----|--------|---------|------|--------|----------|-----------|------------------|-----------------|----------------|-----------|--------------------|------------------|-------------|----------------|------------------|------------------------|---------------------|-----------------|--------------------|----------------------|----------|---------|---------|---------|-------------|----------|--------|----------------|--------|------------|
|8/5/2018 14:10|honoland13#japanpost.jp |Hernando |2015|Acura |TLX |48 |J04 |Bob |0 |0 |2 |2 |4 |4 |0 |2 |4 |4 |2 |4 |2 |2 |2 |2 |2 |0 |4 |4 |4 |6 |2 |0 |4 |
|8/5/2018 15:11|nlighterness2q#umn.edu |Noel |2015|Jeep |Wrangler |124 |J02 |Carl |0 |6 |4 |2 |4 |6 |6 |4 |4 |4 |6 |6 |6 |6 |6 |4 |6 |6 |6 |6 |6 |4 |6 |4 |6 |
|8/5/2018 17:10|eguest47#microsoft.com |Edan |2015|Lexus |Is250 |222 |J05 |Adrian |0 |0 |0 |0 |0 |0 |0 |0 |6 |6 |6 |0 |0 |6 |6 |6 |0 |0 |0 |0 |0 |0 |0 |0 |4 |
|8/5/2018 17:34|hchilley40#fema.gov |Hieronymus|1993|Honda |Civic eG |207 |J06 |Aaron |0 |0 |2 |2 |2 |2 |2 |2 |0 |4 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |0 |0 |2 |2 |0 |
|8/5/2018 14:30|nnowick3d#tuttocitta.it |Nickolas |2016|Ford |Mystang |167 |J02 |Carl |0 |0 |2 |2 |0 |2 |2 |0 |0 |0 |0 |2 |0 |2 |2 |2 |0 |0 |2 |0 |0 |0 |0 |0 |2 |
|8/5/2018 16:12|mdearl39#amazon.co.uk |Martin |2013|Hyundai |Gen coupe|159 |J04 |Bob |0 |0 |2 |0 |0 |0 |2 |0 |0 |0 |0 |2 |0 |2 |2 |0 |2 |0 |2 |0 |0 |0 |0 |0 |0 |
|8/5/2018 17:00|alynamg#blogtalkradio.com|Aldridge |2009|Infiniti|G37 |20 |J06 |Aaron |2 |0 |2 |2 |0 |0 |2 |0 |0 |2 |2 |2 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |2 |0 |2 |
|8/5/2018 16:11|abowton3k#spiegel.de |Ambros |2009|Honda |Oddesy |178 |J06 |Aaron |2 |0 |2 |2 |2 |2 |2 |0 |4 |4 |2 |2 |2 |4 |4 |4 |2 |2 | |6 |4 |4 |6 |4 |6 |
|8/5/2018 17:29|qesterbrookn#bandcamp.com|Quincy |2012|Hyundai |Celoster |30 |J04 |Bob |0 |0 |2 |2 |2 |2 |2 |4 |6 |6 |4 |2 |4 |4 |6 |6 |4 |0 |2 |0 |0 |0 |2 |2 |4 |
The expected output is something like this below
|Ranking |Car_ID|Year |Make |Model |Total|
|--------|------|-------|------|-----------|-----|
|1 |48 |2015 |Acura |TLX |89 |
|2 |66 |2012 |Acura |MDX |75 |
|3 |101 |2022 |Acura |TLX |70 |
|4 |22 |2011 |Chevy |Camaro |112 |
|5 |40 |2015 |Chevy |Corvette |99 |
|6 |205 |2022 |Chevy |Corvette |66 |
|7 |111 |2006 |Ford |Mustang |94 |
|8 |97 |2003 |Ford |GT |88 |
|9 |71 |2008 |Ford |Fiesta ST |80 |
Here's the command I've been able to put together, which does something similar to what I need, but I can't figure out how to add the ranking column and order by the total in descending order.
SELECT Car_ID, Year, Make, Model, Racer_Turbo + Racer_Supercharged + ... + Mods_Overall FROM Carstable order by Make limit 3;
This query returned only three results instead of all of them. I also can't figure out where to put the DESC keyword to list the rows in descending order of the total column, or how to create the ranking column. Any ideas?
Use a CTE which returns the column Total for each row and ROW_NUMBER() window function to pick the first 3 rows for each Make and to create the column Ranking:
WITH cte AS (
  SELECT *,
         Racer_Turbo + Racer_Supercharged + Racer_Performance + Racer_Horsepower +
         Car_Overall +
         Engine_Modifications + Engine_Performance + Engine_Chrome + Engine_Detailing + Engine_Cleanliness +
         Body_Frame_Undercarriage + Body_Frame_Suspension + Body_Frame_Chrome + Body_Frame_Detailing + Body_Frame_Cleanliness +
         Mods_Paint + Mods_Body + Mods_Wrap + Mods_Rims + Mods_Interior + Mods_Other + Mods_ICE + Mods_Aftermarket + Mods_WIP + Mods_Overall AS Total
  FROM carstable
)
SELECT ROW_NUMBER() OVER (ORDER BY Make, Total DESC) AS Ranking,
       Car_ID, Year, Make, Model, Total
-- Total DESC in the inner window so that rn <= 3 keeps the three highest totals per Make
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Make ORDER BY Total DESC) AS rn FROM cte) AS t
WHERE rn <= 3
ORDER BY Make, Total DESC;
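To try the pattern without the full table, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a simplified `carstable` that stores `Total` directly instead of summing the scoring columns, and an SQLite build with window-function support (3.25 or later). The first six rows reuse values from the expected output in the question; the two low-scoring rows (Car_ID 900 and 901) are hypothetical and exist only to show that a fourth car per make is filtered out. The inner window orders by `Total DESC` so that `rn <= 3` keeps the three highest totals per make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE carstable "
    "(Car_ID INTEGER, Year INTEGER, Make TEXT, Model TEXT, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO carstable VALUES (?, ?, ?, ?, ?)",
    [
        (48, 2015, "Acura", "TLX", 89),
        (66, 2012, "Acura", "MDX", 75),
        (101, 2022, "Acura", "TLX", 70),
        (111, 2006, "Ford", "Mustang", 94),
        (97, 2003, "Ford", "GT", 88),
        (71, 2008, "Ford", "Fiesta ST", 80),
        (900, 2010, "Acura", "RDX", 10),  # hypothetical fourth Acura
        (901, 2001, "Ford", "Focus", 5),  # hypothetical fourth Ford
    ],
)

query = """
SELECT ROW_NUMBER() OVER (ORDER BY Make, Total DESC) AS Ranking,
       Car_ID, Year, Make, Model, Total
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Make ORDER BY Total DESC) AS rn
    FROM carstable
) AS t
WHERE rn <= 3
ORDER BY Make, Total DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the `WHERE` filter is applied before the outer window function is evaluated, `Ranking` counts only the rows that survive the top-3 filter.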
See the demo.
If I parse do.call(what=knitr::kable,args=args), the function kable inside do.call is parsed as a SYMBOL and not as a SYMBOL_FUNCTION_CALL.
Why shouldn't it be the latter?
tf <- tempfile()
cat('do.call(knitr::kable,args=args)',file = tf)
parsed <- utils::getParseData(parse(tf))
knitr::kable(parsed)
| | line1| col1| line2| col2| id| parent|token |terminal |text |
|:--|-----:|----:|-----:|----:|--:|------:|:--------------------|:--------|:-------|
|18 | 1| 1| 1| 31| 18| 0|expr |FALSE | |
|1 | 1| 1| 1| 7| 1| 3|SYMBOL_FUNCTION_CALL |TRUE |do.call |
|3 | 1| 1| 1| 7| 3| 18|expr |FALSE | |
|2 | 1| 8| 1| 8| 2| 18|'(' |TRUE |( |
|7 | 1| 9| 1| 20| 7| 18|expr |FALSE | |
|4 | 1| 9| 1| 13| 4| 7|SYMBOL_PACKAGE |TRUE |knitr |
|5 | 1| 14| 1| 15| 5| 7|NS_GET |TRUE |:: |
|6 | 1| 16| 1| 20| 6| 7|SYMBOL |TRUE |kable |
|8 | 1| 21| 1| 21| 8| 18|',' |TRUE |, |
|11 | 1| 22| 1| 25| 11| 18|SYMBOL_SUB |TRUE |args |
|12 | 1| 26| 1| 26| 12| 18|EQ_SUB |TRUE |= |
|13 | 1| 27| 1| 30| 13| 15|SYMBOL |TRUE |args |
|15 | 1| 27| 1| 30| 15| 18|expr |FALSE | |
|14 | 1| 31| 1| 31| 14| 18|')' |TRUE |) |
If you just have kable, it's a symbol. That symbol could point to a function or a value. It isn't clear which until you actually evaluate it.
However, if you have kable(), it's clear that you expect kable to be a function and that you are calling it.
The do.call hides from the parser the fact that you are trying to call a function; that intention isn't realized until run time.
Things can get funny if you do something like
sum <- 5
sum
# [1] 5
sum(1:3)
# [1] 6
Here sum behaves both like a regular variable and like a function. We've actually created a variable in our global environment that shadows the sum function from base. But because the parser treats sum and sum() differently, we can still get at both meanings.
I have 2 dataframes in R,
DF1
|attr1|attr2|attr3|
|-----|-----|-----|
|5 |4 |9 |
|4 |30 |2 |
|5 |18 |1 |
|3 |1 |7 |
|6 |30 |0 |
|8 |18 |12 |
Now, I'm trying to recode the values in the attr2 column of this dataframe so that a value is recoded as 1 if it appears in the first column (Var1) of DF2, and as 0 otherwise. The second dataframe is simply a count of the top 2 unique values within attr2.
DF2
|Var1|Freq|
|----|----|
|30 |2 |
|18 |2 |
I want the result to be in the format of something as follows:
|attr1|attr2|attr3|
|-----|-----|-----|
|5 |0 |9 |
|4 |1 |2 |
|5 |1 |1 |
|3 |0 |7 |
|6 |1 |0 |
|8 |1 |12 |
Thanks for the help!
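We can use %in% to test whether each attr2 value appears in DF2$Var1, and as.integer to convert the resulting logical to 1/0:
library(dplyr)
DF1 %>%
  mutate(attr2 = as.integer(attr2 %in% DF2$Var1))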
I am a beginner in R. Here is an example of a data table (C) that I created using JMP. I joined tables A and B on the A1 and B;C columns to create C. During the join, the rows of table B in which the OP column contains CLO are dropped, while the J column from table A is updated.
I am trying to create the dataframe C using the merge command in R. I used the following expression:
C <- merge(B,A, BY=c("A1","B;C"),all.x = TRUE) but I don't seem to get the Data frame C. I would appreciate any help from the community to solve this.
Data Frame A
A1 | B;C | D |E |F |G | H | I |J |K |L | M |
------|------|---|--|---|---|---|------------|---|----|----|---|
ABCD |SD;TH |HO |2 |FA | |ENG| 201808:SPR |54 |PRO |VAC |MAA|
JCBW |RF;TH |HO |2 |FU |VIN|FUT| 504278:SPR |4 |PRO |VAC |MAA|
TVGH |ED;UJ |HO |2 |FU |VIN|FUT| 504276:SPR |4 |PRO |VAC |MAA|
IGHE |WR;RE |HO |3 |IN | |SPE| 504278:SPR |73 |PRO |VAC |MAA|
UUUU |DF;TH |HO |3 |FU | |FUT| 357193:IT |13 |INT |VAC |MAA|
JFLD |YO;TH |HO |3 |CH |BRI|CHE| 476306:SPR |6 |PRO |VAC |MAA|
Data frame B
OWN|COM|OP |GR |J | A1 | B;C | D|E |F |G |H | I |K |L |M
---|---|---|---|--|-----|-----|--|--|--|---|---|-----------|---|---|----
SUP|X |CLO|ARE|16|59HUW|BB;TH|HO|8 |FA|MIC|SPE|90278:SPR |INT|VAC|MAA
SUP|X |OPE|ARE|75|ABCD |SD;TH|HO|8 |FU|MIC|ENG|201808:SPR |INT|VAC|MAA
SUP|X |CLO|ARE|4 |59HVG|BB;RE|HO|8 |FA|MIC|SPE|6074278:SPR|INT|VAC|MAA
PAD|X |CLO|PEN|30|9RHSG|BV;TH|HO|2 |FA| |SPE|201808:SPR |PRO|VAC|MAA
PAD|X |OPE|PEN|99|UUUU |DF;TH|HO|8 |FU|MIC|FUT|357193:IT |PRO|VAC|MAA
PAD|X |OPE|PEN|65|IGHE |WR;RE|HO|8 |IN| |SPE|504278:SPR |PRO|VAC|MAA
PAD|X |CLO|PEN|13|S9K7E|FN;TH|HO|8 |FA|MIC|FUT|394290:SPR |PRO|VAC|MAA
Data frame C
OWN|COM|OP |GR |J |A1 | B;C |D |E |F | G |H | I | K |L |M
---|---|---|---|---|----|-----|--|--|--|---|---|----------|---|---|----
SUP|x |OPE|ARE|99 |ABCD|SD;TH|HO|8 |FU|MIC|ENG|201808:SPR|INT|VAC|MAA
PAD|x |OPE|PEN|120|UUUU|DF;TH|HO|8 |FU|MIC|FUT|357193:IT |PRO|VAC|MAA
PAD|x |OPE|PEN|73 |IGHE|WR;RE|HO|8 |IN| |SPE|504278:SPR|PRO|VAC|MAA
| | | |4 |JCBW|RF;TH|HO|2 |FU|VIN|FUT|504278:SPR|PRO|VAC|MAA
| | | |25 |TVGH|ED;UJ|HO|2 |FU|VIN|FUT|504276:SPR|PRO|VAC|MAA
| | | |15 |JFLD|YO;TH|HO|3 |CH|BRI|CHE|476306:SPR|PRO|VAC|MAA