r parser translating symbol_function_call as a symbol - r

If I parse do.call(what=knitr::kable,args=args) the function kable in do.call is parsed to as a SYMBOL and not as a SYMBOL_FUNCTION_CALL.
Why shouldn't it be the later?
tf <- tempfile()
cat('do.call(knitr::kable,args=args)',file = tf)
parsed <- utils::getParseData(parse(tf))
knitr::kable(parsed)
| | line1| col1| line2| col2| id| parent|token |terminal |text |
|:--|-----:|----:|-----:|----:|--:|------:|:--------------------|:--------|:-------|
|18 | 1| 1| 1| 31| 18| 0|expr |FALSE | |
|1 | 1| 1| 1| 7| 1| 3|SYMBOL_FUNCTION_CALL |TRUE |do.call |
|3 | 1| 1| 1| 7| 3| 18|expr |FALSE | |
|2 | 1| 8| 1| 8| 2| 18|'(' |TRUE |( |
|7 | 1| 9| 1| 20| 7| 18|expr |FALSE | |
|4 | 1| 9| 1| 13| 4| 7|SYMBOL_PACKAGE |TRUE |knitr |
|5 | 1| 14| 1| 15| 5| 7|NS_GET |TRUE |:: |
|6 | 1| 16| 1| 20| 6| 7|SYMBOL |TRUE |kable |
|8 | 1| 21| 1| 21| 8| 18|',' |TRUE |, |
|11 | 1| 22| 1| 25| 11| 18|SYMBOL_SUB |TRUE |args |
|12 | 1| 26| 1| 26| 12| 18|EQ_SUB |TRUE |= |
|13 | 1| 27| 1| 30| 13| 15|SYMBOL |TRUE |args |
|15 | 1| 27| 1| 30| 15| 18|expr |FALSE | |
|14 | 1| 31| 1| 31| 14| 18|')' |TRUE |) |

If you just have ktable its a symbol. That symbol could point to a function or a value. It's not clear until you actually evaluate it what it is.
However if you have ktable(), it's clear that you expect ktable to be a function and that you are calling it.
The do.call obscures the parser's ability to recognize that you are trying to call a function and that intention isn't realized till run-time.
Things can get funny if you do something like
sum <- 5
sum
# [1] 5
sum(1:3)
# [1] 6
Here sum is behaving both like a regular variable and a function. We've actually created a shadow variable in our global environment that masks the sum function from base. But because the parse treats sum and sum() differently we can still get at both meanings.

Related

Get sequence column from recursive hierarchy in Pyspark

I am currently trying to create a column that reflects a sequence from a recursive hierarchy in Pyspark. This is how the data looks like.
data = [(1,"A",None),(1,"B","A"),(1,"C","A"),(1,"D","C"),(1,"E","B"),(2,"A",None),(2,"B",None),(2,"C","A"),(2,"D","A"),(2,"E","D")]
df = spark.createDataFrame(data, "ID integer, Child string, Parent string")
+---+-----+------+
| ID|Child|Parent|
+---+-----+------+
| 1| A| null|
| 1| B| A|
| 1| C| A|
| 1| D| C|
| 1| E| B|
| 2| A| null|
| 2| B| null|
| 2| C| A|
| 2| D| A|
| 2| E| D|
+---+-----+------+
The expected result:
+---+-----+------+--------+
| ID|Child|Parent|Sequence|
+---+-----+------+--------+
| 1| A| null| 1|
| 1| B| A| 2|
| 1| C| A| 2|
| 1| D| C| 3|
| 1| E| B| 3|
| 2| A| null| 1|
| 2| B| null| 0|
| 2| C| A| 2|
| 2| D| A| 2|
| 2| E| D| 3|
+---+-----+------+--------+
What would be the best way to approach this?
I am aware that in SQL you can do this with recursive CTE, but there is no similar way to do it via Pyspark according to my investigation.
Recursively joining Dataframes seems to be the way to accomplish this, however it does seem expensive and overcomplex.
Is there a more native/efficient way to accomplish this?

Change outliers from black to colour in grouped box plot in ggplot2

I have a grouped box plot in which I want to change the outlier dots from the default of black to the colour of the boxes keeping everything else the same. There is a previous thread that provides a solution for this for a standard box plot that I am able to implement.
Coloring boxplot outlier points in ggplot2?
However, I want to do it for a grouped box plot.
Below is some example data and code for the grouped box plot.
|ID |Time |Metabolite | Concentration|
|:--|:----|:----------|-------------:|
|1 |1 |A | 40|
|1 |1 |B | 36|
|1 |1 |C | 28|
|1 |2 |A | 13|
|1 |2 |B | 150|
|1 |2 |C | 32|
|1 |3 |A | 45|
|1 |3 |B | 15|
|1 |3 |C | 15|
|2 |1 |A | 7|
|2 |1 |A | 9|
|2 |1 |B | 236|
|2 |1 |C | 33|
|2 |2 |A | 33|
|2 |2 |B | 48|
|2 |2 |C | 39|
|2 |3 |A | 15|
|2 |3 |C | 126|
|3 |1 |A | 13|
|3 |1 |B | 41|
|3 |1 |C | 37|
|3 |2 |A | 3|
|3 |2 |B | 218|
|3 |2 |C | 27|
|3 |3 |A | 7|
|3 |3 |B | 27|
|3 |3 |C | 3|
|4 |1 |A | 4|
|4 |1 |B | 7|
|4 |1 |C | 33|
|4 |2 |A | 133|
|4 |2 |B | 4|
|4 |2 |C | 10|
|4 |3 |A | 122|
|4 |3 |B | 27|
|4 |3 |C | 14|
|5 |1 |A | 7|
|5 |1 |B | 22|
|5 |1 |C | 43|
|5 |2 |A | 3|
|5 |2 |B | 6|
|5 |2 |C | 158|
|5 |3 |A | 48|
|5 |3 |B | 7|
|5 |3 |C | 24|
|6 |1 |A | 15|
|6 |1 |B | 30|
|6 |1 |C | 15|
|6 |2 |A | 27|
|6 |2 |B | 187|
|6 |2 |C | 9|
|6 |3 |A | 31|
|6 |3 |B | 40|
|6 |3 |C | 41|
|7 |1 |A | 37|
|7 |1 |B | 30|
|7 |1 |C | 28|
|7 |2 |A | 142|
|7 |2 |B | 40|
|7 |2 |C | 7|
|7 |3 |A | 45|
|7 |3 |B | 3|
|8 |3 |C | 45|
|8 |1 |A | 34|
|8 |1 |B | 8|
|8 |1 |C | 46|
|8 |2 |A | 167|
|8 |2 |B | 25|
|8 |2 |C | 34|
|8 |3 |A | 27|
|9 |3 |B | 28|
|9 |3 |C | 36|
|9 |1 |A | 44|
|9 |1 |B | 26|
|9 |1 |C | 20|
|9 |2 |A | 11|
|9 |2 |B | 18|
|9 |2 |C | 176|
|9 |3 |A | 1|
|9 |3 |B | 40|
|9 |3 |C | 10|
|10 |1 |A | 8|
|10 |1 |B | 49|
|10 |1 |C | 193|
|10 |2 |A | 13|
|10 |2 |B | 13|
|10 |2 |C | 28|
|10 |3 |A | 50|
|10 |3 |B | 47|
|10 |3 |C | 46|
|11 |1 |A | 21|
|11 |1 |B | 34|
|11 |1 |C | 28|
|11 |2 |A | 13|
|11 |2 |B | 32|
|11 |2 |C | 47|
|11 |3 |A | 15|
|11 |3 |B | 42|
|11 |3 |C | 9|
ggplot(df, aes(x=Time, y=Concentration, fill=Metabolite)) +
geom_boxplot()

unable to drop column in dataframe, R

I am reading an api using R and was able to retrieve the data, however, the data itself is retrieved in JSON format with what seems to be in 3 sections.
Converting the json data to a df using
fromJSON(data)
It was successful.
The next step was to drop the unnecessary columns, which is the my current issue and not able drop any columns to which I just recently found out how the data is formatted in the data frame.
lead <- GET("some api", add_headers(Authorization="Bearer some apikey"))
data <- content(lead,"text") #display the data looks something like below
The data seems to be formatted into 3 sections where I believe the issue is preventing columns to be dropped
$'data'
$included
$link
I've looked at multiple stackoverflow posts and r guides on dropping column name in df, but was unsuccessful
> df <- df[,-1]
Error in df[, -1] : incorrect number of dimensions
> df <- subset(df, select = -c("$'data'$id"))
Error in subset.default(df, select = -c("$'data'$id")) :
argument "subset" is missing, with no default
The end goal is to drop everything in $links, a few columns in $included, and a few columns in $data
Completely remove links:
df$links <- NULL
Remove columns 1 and 2 from included:
df$included[,c(1, 2)] <- NULL
Remove columns 3 and 4 from data:
df$data[,c(3, 4)] <- NULL
For a working example, see below with iris and mtcars:
exr <- list(iris = iris, mtcars = mtcars)
head(exr$mtcars)
| | mpg| cyl| disp| hp| drat| wt| qsec| vs| am| gear| carb|
|:-----------------|----:|---:|----:|---:|----:|-----:|-----:|--:|--:|----:|----:|
|Mazda RX4 | 21.0| 6| 160| 110| 3.90| 2.620| 16.46| 0| 1| 4| 4|
|Mazda RX4 Wag | 21.0| 6| 160| 110| 3.90| 2.875| 17.02| 0| 1| 4| 4|
|Datsun 710 | 22.8| 4| 108| 93| 3.85| 2.320| 18.61| 1| 1| 4| 1|
|Hornet 4 Drive | 21.4| 6| 258| 110| 3.08| 3.215| 19.44| 1| 0| 3| 1|
|Hornet Sportabout | 18.7| 8| 360| 175| 3.15| 3.440| 17.02| 0| 0| 3| 2|
|Valiant | 18.1| 6| 225| 105| 2.76| 3.460| 20.22| 1| 0| 3| 1|
exr$mtcars[,c(1,2)] <- NULL
head(exr$mtcars)
| | disp| hp| drat| wt| qsec| vs| am| gear| carb|
|:-----------------|----:|---:|----:|-----:|-----:|--:|--:|----:|----:|
|Mazda RX4 | 160| 110| 3.90| 2.620| 16.46| 0| 1| 4| 4|
|Mazda RX4 Wag | 160| 110| 3.90| 2.875| 17.02| 0| 1| 4| 4|
|Datsun 710 | 108| 93| 3.85| 2.320| 18.61| 1| 1| 4| 1|
|Hornet 4 Drive | 258| 110| 3.08| 3.215| 19.44| 1| 0| 3| 1|
|Hornet Sportabout | 360| 175| 3.15| 3.440| 17.02| 0| 0| 3| 2|
|Valiant | 225| 105| 2.76| 3.460| 20.22| 1| 0| 3| 1|

Query to update a SQL table column from datetime to minutes and milliseconds

I'm using SQLite browser, I'm trying to find a query that can update my table from:
Table is called main
| |time_one |time_two|
| 1| 2016-08-21 07:01:04| |
| 2| 2016-08-21 08:01:03| |
| 3| 2016-08-17 09:11:54| |
| 4| 2016-08-18 11:01:59| |
| 5| 2016-08-19 12:01:04| |
| 6| 2016-08-20 01:01:04| |
The result I'm looking for is the minute, second and millisecond:
| |time_one |time_two|
| 1| 2016-08-21 07:01:04| 01:04|
| 2| 2016-08-21 08:01:03| 01:03|
| 3| 2016-08-17 09:11:54| 11:54|
| 4| 2016-08-18 11:01:59| 01:59|
| 5| 2016-08-19 12:01:04| 01:04|
| 6| 2016-08-20 01:01:04| 01:04|
This is the function I used to show the time is
Select strftime('%M:%f', time_1)
From main
;
Now I just want to update time_2 with what is shown from the above command so I can group the ones with the same minutes and seconds together.
Just put that computation into the UPDATE statement:
UPDATE main SET time_2 = strftime('%M:%f', time_1);

importing data from log to sql table with delimter

have a sql table with the columns and want to import it into the sql table, at the same time remove the hypnens under the column and the last delimter. please help.
timestamp |cp |type |count |fail_count |succ |fail |
-------------------|---|--------|-----------|-----------|-----------|-----------|
2014.12.15 00:00:00| 1| 5| 5| 0| 143| 0|
2014.12.15 01:00:00| 1| 5| 30| 0| 945| 0|
2014.12.15 02:00:00| 1| 5| 30| 0| 1055| 0|
2014.12.15 03:00:00| 1| 5| 24| 0| 816| 0|
2014.12.15 04:00:00| 1| 5| 28| 0| 882| 0|
2014.12.15 05:00:00| 1| 5| 6| 0| 155| 0|
2014.12.15 06:00:00| 1| 5| 12| 0| 236| 0|

Resources