Julia Gadfly can't scale axis when Scale.x_log10 - plot

I am new to Julia and am trying to make a simple x-y plot with the Gadfly package.
I want to plot the x-axis on a log scale and set its minimum and maximum values at the same time.
plot(layer(rdsPmos, x="A", y="B", Geom.line), Scale.x_log10(minvalue= 10),
Theme(default_point_size = 1.5px))
This doesn't produce any error message. The resulting plot has a log-scale x-axis, but minvalue doesn't seem to have any effect.
I also tried writing it like this:
plot(layer(rdsPmos, x="A", y="B", Geom.line), Scale.x_log10, Scale.x_continuous(minvalue= 10), Theme(default_point_size = 1.5px))
The result is that minvalue works but the log scale is lost.

My tests show that the minvalue and maxvalue options work in such a way that no data is ever excluded from the viewport (this holds for both x_continuous and x_log10). So if you want a narrower viewport, one way is to filter the data first:
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrames.DataFrame
| Row | A | B |
|-----|----|----|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | 8 |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | 16 |
| 9 | 9 | 18 |
| 10 | 10 | 20 |
minvalue has no effect here, and this has nothing to do with the Scale type:
julia> plot(layer(df, x="A", y="B" ,Geom.line), Scale.x_log10(minvalue=5), Theme(default_point_size = 1.5px))
julia> plot(layer(df, x="A", y="B" ,Geom.line), Scale.x_continuous(
minvalue=5), Theme(default_point_size = 1.5px))
minvalue does take effect on filtered data:
julia> plot(layer(df[df[:A].>5,:], x="A", y="B" ,Geom.line), Scale.x_log10(minvalue=5), Theme(default_point_size = 1.5px))


R Programming: How to drop variable labels as first column name in the table output from the fre function of the expss package?

I'm doing exploratory analysis of survey data and the dataframe is a haven labelled dataset, that is, each variable already has value labels and variable labels.
I want to store frequency tables in a list, and then name each list element after the variable label. I'm using the expss package. The problem is that the first column name of each output table contains this description: values2labels(Df$var). How can this description be dropped from the table?
Reproducible example:
# Dataframe
df <- data.frame(sex = c(1, 1, 2, 2, 1, 2, 2, 2, 1, 2),
agegroup= c(1, 3, 1, 2, 3, 3, 2, 2, 2, 1),
weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))
library(expss)
# Variable labels
var_lab(df$sex) <-"Sex"
var_lab(df$agegroup) <-"Age group"
# Value labels
val_lab(df$sex) <- make_labels("1 Male
2 Female")
val_lab(df$agegroup) <- make_labels("1 1-29
2 30-49
3 50 and more")
# Save variable labels
var_labels1 <- var_lab(df$sex)
var_labels2 <- var_lab(df$agegroup)
# Drop variable labels
var_lab(df$sex) <- NULL
var_lab(df$agegroup) <- NULL
# Save frequencies
f1 <- fre(values2labels(df$sex))
f2 <- fre(values2labels(df$agegroup))
# Note: I use the function 'values2labels' from the 'expss' package in order to display the value
# labels instead of the values of the variable. In this example, since I manually created the value
# labels, I don't need that function, but when I import haven labelled data, I need it to
# display value labels by default.
# Add frequencies on list
my_list <- list(f1, f2)
# Name lists elements as variable labels
names(my_list) <- c(var_labels1,
var_labels2)
In the following output, how can I get rid of the first column name on both tables: values2labels(df$sex) and values2labels(df$agegroup) ?
$Sex
| values2labels(df$sex) | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
| --------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
| Female | 6 | 60 | 60 | 60 | 60 |
| Male | 4 | 40 | 40 | 40 | 100 |
| #Total | 10 | 100 | 100 | 100 | |
| <NA> | 0 | | 0 | | |
$`Age group`
| values2labels(df$agegroup) | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
| -------------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
| 1-29 | 3 | 30 | 30 | 30 | 30 |
| 30-49 | 4 | 40 | 40 | 40 | 70 |
| 50 and more | 3 | 30 | 30 | 30 | 100 |
| #Total | 10 | 100 | 100 | 100 | |
| <NA> | 0 | | 0 | | |
You need to set var_lab to empty string instead of NULL:
library(expss)
a = 1:2
val_lab(a) = c("One" = 1, "Two" = 2)
var_lab(a) = ""
fre(values2labels(a))
# | | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
# | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
# | One | 1 | 50 | 50 | 50 | 50 |
# | Two | 1 | 50 | 50 | 50 | 100 |
# | #Total | 2 | 100 | 100 | 100 | |
# | <NA> | 0 | | 0 | | |

Possible to invert the randomForest function in R?

I computed a random forest to predict a target value in a large data structure.
The matrix contains some thousand rows, about 20 input variables and one output/target/response variable.
For example, the dataframe df is like:
| V1 | V2 | V3 | V4 | ... | Rsp |
---------------------------------
| 1 | 8 | 2 | 3 | ... | 1.5 |
| 2 | 4 | 3 | 4 | ... | 1.3 |
| 5 | 7 | 6 | 3 | ... | 1.4 |
| 2 | 8 | 8 | 4 | ... | 1.9 |
| 9 | 3 | 1 | 6 | ... | 2.1 |
. . . . . .
I calculated the forest:
df.r <- randomForest(Rsp ~ . , data = df , subset = train , mtry = 50, ntree=200)
p <- predict(df.r, df[-train,])
I want to minimize the response in order to get the best combinations of input variables. But because the input and output are noisy, I cannot directly take the variables at the minimum response value.
So my question is: is it possible to traverse the trees bottom-up? Is it possible to get the combinations of variables that give me a low response value?

How to find the MAX of a calculated value in a window?

I have a simple database table with three columns: id, x, y. x and y are just the coordinates of points on a line. I want to use SQLite window functions to slide a window of three rows over the table, and for each window get the distance to the y value that is furthest from the y value of the first coordinate (row) in the window.
An example:
| id | x | y |
|----|---|---|
| 1 | 1 | .5|
| 2 | 2 | .9|
| 3 | 3 | .7|
| 4 | 4 |1.1|
| 5 | 5 | 1 |
So the first partition would consist of:
| id | x | y |
|----|---|---|
| 1 | 1 | .5|
| 2 | 2 | .9|
| 3 | 3 | .7|
And the desired result would be:
| id | x | y | d |
|----|---|---|---|
| 1 | 1 | .5| .4|
| 2 | 2 | .9|   |
| 3 | 3 | .7|   |
This is because the window with id = 1 as the CURRENT ROW has a maximum variation of .4: the largest distance from the y value of the first row in the window, .5, is to .9, which gives .4.
The final expected result:
| id | x | y | d |
|----|---|---|---|
| 1 | 1 | .5| .4|
| 2 | 2 | .9| .2|
| 3 | 3 | .7| .4|
| 4 | 4 |1.1| .1|
| 5 | 5 | 1 | |
I've tried using a window definition like: WINDOW win1 AS (ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING), which gives me the correct window.
With the window defined, I tried doing something like:
SELECT
max(abs(y - first_value(y) OVER win1)) AS d
FROM t
WINDOW win1 AS (ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
But I get an error for misuse of first_value.
I think the problem I have is this is not the proper approach to calculate over each row of a partition, but I could not find another solution or approach that matches what I am trying to do here.
For each row of your table you define a window starting from the current row up to the next 2 rows.
In your code y is the value in the current row and first_value() is the 1st value of y of the current window which is also the value of y of the current row.
So even if your code was syntactically correct the difference you calculate would always return 0.
It's easier to solve your problem with LEAD() window function:
WITH cte AS (
SELECT *,
LEAD(y, 1) OVER (ORDER BY id) AS y1,
LEAD(y, 2) OVER (ORDER BY id) AS y2
FROM tablename
)
SELECT
id, x, y,
MAX(ABS(y - y1), COALESCE(ABS(y - y2), 0)) d
FROM cte
Results:
id x y d
1 1 0.5 0.4
2 2 0.9 0.2
3 3 0.7 0.4
4 4 1.1 0.1
5 5 1.0
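The LEAD() approach above can be checked from Python with the standard sqlite3 module. This is only a minimal sketch under a few assumptions: the table is called tablename as in the answer, the rows are ordered by id (the ORDER BY id inside OVER is added here so the result does not depend on storage order), the function name max_window_distance is made up for illustration, and the linked SQLite library is 3.25 or newer so that window functions are available. The ROUND call is only there to hide floating-point noise.

```python
import sqlite3


def max_window_distance(rows):
    """Return (id, x, y, d) tuples, where d is the largest |y - y_next|
    over the next two rows (NULL/None when there is no following row)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, x INTEGER, y REAL)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT *,
                   LEAD(y, 1) OVER (ORDER BY id) AS y1,
                   LEAD(y, 2) OVER (ORDER BY id) AS y2
            FROM tablename
        )
        SELECT id, x, y,
               -- scalar MAX(a, b) returns NULL if either argument is NULL,
               -- so the last row (no following row) gets d = NULL
               ROUND(MAX(ABS(y - y1), COALESCE(ABS(y - y2), 0)), 6) AS d
        FROM cte
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question
sample_rows = [(1, 1, 0.5), (2, 2, 0.9), (3, 3, 0.7), (4, 4, 1.1), (5, 5, 1.0)]
for row in max_window_distance(sample_rows):
    print(row)
```

With the sample data this should give d values of roughly 0.4, 0.2, 0.4 and 0.1, and None for the last row, matching the table in the answer.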

Combining aggregate functions in sqlite

Assuming the following table and using sqlite I have the following question:
Node |Loadcase | Fx | Cluster
---------------------------------
1 | 1 | 50 | A
2 | 1 | -40 | A
3 | 1 | 60 | B
4 | 1 | 80 | C
1 | 2 | 50 | A
2 | 2 | -50 | A
3 | 2 | 80 | B
4 | 2 | -100 | C
I am trying to write a query which fetches the maximum absolute value of Fx and the Load case for each Node 1-4.
An additional requirement is that Fx values belonging to the same Cluster shall be summed up (per Loadcase) before the maximum is taken.
In the example above I would expect the following results:
Node | Loadcase | MaxAbsClusteredFx
-----|-----------|-------------------
1 | 1 | 10
2* | |
3 | 2 | 80
4 | 2 | 100
* N/A because it is summed up with node 1; both belong to cluster A.
Query:
For Node 1 I would execute a query similar to this
SELECT Loadcase,abs(Fx GROUP BY Cluster) FROM MyTable WHERE abs(Fx GROUP BY Cluster) = max(abs(Fx GROUP BY Cluster)) AND Node = 1
I keep getting " Error while executing query: near "Forces": syntax error " or alike.
Thankful for any help!
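One possible way to get the result described above, sketched with Python's sqlite3 module. This is not a confirmed answer from the thread, just an illustration under stated assumptions: the table is MyTable with the columns shown, each cluster is reported under its lowest node number, ties between load cases are broken by the lower Loadcase, and SQLite is 3.25 or newer (for ROW_NUMBER). The function name max_abs_clustered_fx is made up for this example.

```python
import sqlite3

# Sample data from the question: (Node, Loadcase, Fx, Cluster)
ROWS = [
    (1, 1, 50, "A"), (2, 1, -40, "A"), (3, 1, 60, "B"), (4, 1, 80, "C"),
    (1, 2, 50, "A"), (2, 2, -50, "A"), (3, 2, 80, "B"), (4, 2, -100, "C"),
]


def max_abs_clustered_fx(rows):
    """Sum Fx per (Cluster, Loadcase), then keep the load case with the
    largest absolute sum for each cluster."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (Node INTEGER, Loadcase INTEGER, Fx INTEGER, Cluster TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH summed AS (
            -- step 1: sum Fx over all nodes of a cluster, per load case
            SELECT Cluster, Loadcase, MIN(Node) AS Node, SUM(Fx) AS Fx
            FROM MyTable
            GROUP BY Cluster, Loadcase
        ),
        ranked AS (
            -- step 2: rank load cases within each cluster by |summed Fx|
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY Cluster
                       ORDER BY ABS(Fx) DESC, Loadcase
                   ) AS rn
            FROM summed
        )
        SELECT Node, Loadcase, ABS(Fx) AS MaxAbsClusteredFx
        FROM ranked
        WHERE rn = 1
        ORDER BY Node
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in max_abs_clustered_fx(ROWS):
    print(row)
```

For the sample table this yields one row per cluster: node 1 with load case 1 and value 10, node 3 with load case 2 and value 80, and node 4 with load case 2 and value 100. Node 2 does not appear because it is merged into cluster A.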

Calculate Grid Position

I'm trying to figure out a way to calculate the positions in the grid like I have below. I know the row, column, totalColumns, totalRows. For example, given column = 2, row = 0, totalColumns = 4, totalRows = 3, the position is B (11)
Cols
+ + + + +
| 0 | 1 | 2 | 3 |
+--+---|---|---|---|---
0 | 9 | A | B | C |
+--+---|---|---|---|--- Rows
1 | 5 | 6 | 7 | 8 |
+--+---|---|---|---|---
2 | 1 | 2 | 3 | 4 |
+--+---|---|---|---|---
ah, well, i guess you have better things to do than school ;))
hex(tr*tc-r*tc-tc+c+1)
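The formula above, with tr = totalRows, tc = totalColumns, r = row and c = column, can be written out as a small Python function. This is just a sketch of the same arithmetic; the name grid_position is made up here. The expression tr*tc - r*tc - tc + c + 1 is the same as (tr - 1 - r)*tc + c + 1: count the full rows below the current one, then add the 1-based column.

```python
def grid_position(row, column, total_rows, total_columns):
    """Position in a grid numbered from 1 at the bottom-left cell,
    left to right, with row 0 at the top."""
    rows_below = total_rows - 1 - row          # full rows underneath this one
    return rows_below * total_columns + column + 1


# Example from the question: column = 2, row = 0, totalColumns = 4, totalRows = 3
pos = grid_position(row=0, column=2, total_rows=3, total_columns=4)
print(pos, format(pos, "X"))  # 11 B
```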
