Making a boxplot based on continuous values

Using two classifications, I want to create a boxplot to illustrate the variation in starting and ending times. How could I do this with ggplot?
Sample data:
structure(list(day = c("Mo", "Tue", "Wed", "Thur", "Fri", "Mo",
"Tue", "Wed", "Thur", "Fri"), start_time1 = c(9.75, 6.5, 6.5,
6.5, 6.5, 8.5, 8.5, 8.5, 8.5, 8.75), end_time1 = c(14.75, 14.75,
8.75, 8.75, 14.75, 17.75, 17.25, 17.25, 16.5, 17.5), Pattern = c(0,
0, 0, 0, 0, 1, 1, 1, 1, 1)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L), spec = structure(list(
cols = list(day = structure(list(), class = c("collector_character",
"collector")), start_time1 = structure(list(), class = c("collector_double",
"collector")), end_time1 = structure(list(), class = c("collector_double",
"collector")), Pattern = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))

It is not very clear what you want to do, so this is a guess.
Personally, I would put the day on the x-axis and the times on the y-axis, use the fill colour to distinguish the Patterns, and facet to separate the starting times from the ending times:
library(ggplot2)
library(reshape2)  # melt() is assumed to come from reshape2; the original did not load it
# dff is the sample data frame from the question
m <- melt(dff, id.vars = c("day", "Pattern"))
m$Pattern <- as.factor(m$Pattern)
ggplot(m, aes(x = day, y = value, fill = Pattern)) +
  geom_boxplot() +
  facet_wrap(~variable) +
  labs(y = "times")
The output depends on the length and variability of the data. With the sample you provided, the plot is not very informative, because each day/Pattern combination has only one observation. Injecting some randomness into the data makes the plot more useful.
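A minimal sketch of what "injecting randomness" could look like. The 20 replicates per cell, the centre values and the noise level (sd = 0.5) are assumptions chosen for illustration, loosely based on the sample; they are not taken from real data.

```r
library(ggplot2)

set.seed(42)  # reproducible noise

days <- c("Mo", "Tue", "Wed", "Thur", "Fri")

# Hypothetical design: 20 simulated observations per day/Pattern/variable cell
sim <- expand.grid(
  day = days,
  Pattern = c(0, 1),
  variable = c("start_time1", "end_time1"),
  rep = seq_len(20),
  stringsAsFactors = FALSE
)

# Assumed typical times (loosely read off the sample), plus Gaussian noise
centre <- ifelse(
  sim$variable == "start_time1",
  ifelse(sim$Pattern == 0, 6.5, 8.5),
  ifelse(sim$Pattern == 0, 14.75, 17.25)
)
sim$value <- centre + rnorm(nrow(sim), sd = 0.5)

sim$day <- factor(sim$day, levels = days)  # keep weekday order on the x-axis
sim$Pattern <- factor(sim$Pattern)
sim$variable <- factor(sim$variable, levels = c("start_time1", "end_time1"))

p <- ggplot(sim, aes(x = day, y = value, fill = Pattern)) +
  geom_boxplot() +
  facet_wrap(~variable) +
  labs(y = "times")
print(p)
```

With several observations per box, the spread of start and end times per day and Pattern becomes visible, which a single observation per cell cannot show.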

Related

Count edges using the adjacency matrix

Based on the adjacency matrix, I would like to count the number of unique edges in a network. In the below example I coloured the unique edges between the different nodes. But I don't know how to proceed.
Sample data
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Assuming that this is an undirected graph such that 0 indicates no edge and a positive number indicates an edge, convert the input DF to a logical matrix and from that to an igraph object. Then get its edges and the names of those edges. (Another possible output is by using as_edgelist(g) to get a 2 column matrix such that each row defines an edge.)
If it were intended that the graph be directed then replace "undirected" with "directed" and in that case a character vector of 13 edge names will be produced instead of the 9 undirected edges shown below.
library(igraph)
m <- as.matrix(DF[-1])
rownames(m) <- colnames(m)
g <- graph_from_adjacency_matrix(m > 0, "undirected")
e <- E(g)
attr(e, "vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
Alternatively, as a pipeline:
library(igraph)
library(tibble)
DF %>%
column_to_rownames("...1") %>%
as.matrix %>%
sign %>%
graph_from_adjacency_matrix("undirected") %>%
E %>%
attr("vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
The graph of g looks like this. (If "directed" had been chosen above then the edges would have arrowheads on them.)
set.seed(123)
plot(g)
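If only the number of edges is needed rather than their names, the count can be cross-checked in base R without igraph. This is a sketch under the same assumption as above (any positive entry is an edge); the matrix is retyped from the sample data, and the variable names are made up for illustration. With the igraph object, ecount(g) should report the same number.

```r
# Adjacency matrix from the question; rows and columns are m1..m4.
# matrix() fills column by column, so each line below is one column.
m <- matrix(
  c(0.2, 0.2, 0.2, 0.3,   # column m1
    0.1, 0.2, 0.2, 0.6,   # column m2
    0.5, 0.2, 1.0, 0.0,   # column m3
    0.3, 0.0, 0.0, 0.1),  # column m4
  nrow = 4,
  dimnames = list(paste0("m", 1:4), paste0("m", 1:4))
)

adj <- m > 0  # TRUE where an edge exists

# Directed: every positive entry is its own edge
n_directed <- sum(adj)

# Undirected: i-j and j-i are the same edge, so symmetrise first and
# count only the upper triangle (including self-loops on the diagonal)
und <- adj | t(adj)
n_undirected <- sum(und[upper.tri(und, diag = TRUE)])

n_directed
n_undirected
```

For the sample data this gives 13 directed and 9 undirected edges, which agrees with the edge names listed above.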
Note
DF <-
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Loop in tidyverse

I am learning the tidyverse and working with a time-series dataset, from which I selected the columns that start with sec. I would like to identify the values in those columns that equal 123, keep them, and replace the rest with 0. But I don't know how to loop over sec1:sec4. Also, how can I sum() per column?
df1<-df %>%
select(starts_with("sec")) %>%
select(ifelse("sec1:sec4"==123, 1, 0))
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))

I think you need mutate() with across() for this. The code below applies the function to each column starting with sec, keeping every value equal to 123 and replacing all others with 0.
library(dplyr)
df1 <- df %>%
  select(starts_with("sec")) %>%
  mutate(across(starts_with("sec"), .fns = function(x) ifelse(x == 123, x, 0)))
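The question also asks how to sum per column, which the answer above does not cover. One possible sketch, assuming dplyr 1.0 or later: summarise() with across() applies sum to each selected column. The data frame is retyped from the sample; col_sums and n_hits are names invented here.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  sec1 = c(1, 123, 1),
  sec2 = c(123, 1, 1),
  sec3 = c(123, 0, 0),
  sec4 = c(1, 123, 1)
)

# Keep values equal to 123, replace everything else with 0
df1 <- df %>%
  mutate(across(starts_with("sec"), ~ ifelse(.x == 123, .x, 0)))

# Column sums of the recoded data (one row, one value per column)
col_sums <- df1 %>%
  summarise(across(starts_with("sec"), sum))

# Alternative: count how often 123 occurs in each original column
n_hits <- df %>%
  summarise(across(starts_with("sec"), ~ sum(.x == 123)))

col_sums
n_hits
```

No explicit loop is needed: across() does the iteration over sec1 to sec4. In base R, colSums(df1) would give the same totals as a named vector.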

Standard deviation based on group id from a data frame

This relates to one of my previous questions. My end goal is to rank items based on the serial variable, using a standard deviation value for the start and end of the day. To summarise, I would like to calculate both of them (start and end of day) and then mark the result with a 1 if the standard deviations are less than 0.5. What is the best way to do this in R?
Rule that I would like to implement in R:
=IF(AND(STDEV.S(D2,D3,D4)<0.5,STDEV.P(E2, E3, E4)<0.5),1,0)
Sample data
df<-structure(list(serial = c(11011209, 11011209, 11011209, 11011209,
11011209, 11011210, 11011210, 11011210, 11011210), pnum = c(1,
1, 1, 2, 2, 2, 2, 2, 2), Day = c("Tue", "Wed", "Thur", "Wed",
"Thur", "Mo", "Tue", "Wed", "Thur"), Start = c(7, 7, 7, 8, 8,
9.75, 6.5, 6.5, 6.5), End = c(14.5, 14.5, 14.5, 15.75, 15.75,
17.75, 14.75, 14.75, 8.75)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
cols = list(serial = structure(list(), class = c("collector_double",
"collector")), pnum = structure(list(), class = c("collector_double",
"collector")), Day = structure(list(), class = c("collector_character",
"collector")), Start = structure(list(), class = c("collector_double",
"collector")), End = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))

After grouping by 'serial' and 'pnum', create 'Pattern' by checking whether the sd of both the 'Start' and 'End' columns is less than 0.5, combining the two conditions into a single expression with &:
library(dplyr)
df %>%
group_by(serial, pnum) %>%
mutate(Pattern = +(sd(Start) < 0.5 & sd(End) < 0.5)) %>%
ungroup
Or instead of specifying each column separately, use if_all
df %>%
group_by(serial, pnum) %>%
mutate(Pattern = +(if_all(c(Start, End), ~ sd(.) < 0.5))) %>%
ungroup
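One detail worth noting: the Excel rule in the question mixes STDEV.S (sample) for Start with STDEV.P (population) for End, whereas R's sd() always uses the sample formula with an n - 1 denominator. Below is a base R sketch that mirrors the Excel rule more literally. sd_pop is a hypothetical helper written here, not a built-in, and only the relevant columns of the sample are retyped.

```r
# Population standard deviation (denominator n), like Excel's STDEV.P.
# Hypothetical helper; R has no built-in for this.
sd_pop <- function(x) sqrt(mean((x - mean(x))^2))

# Relevant columns of the sample data from the question
df <- data.frame(
  serial = c(rep(11011209, 5), rep(11011210, 4)),
  pnum   = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  Start  = c(7, 7, 7, 8, 8, 9.75, 6.5, 6.5, 6.5),
  End    = c(14.5, 14.5, 14.5, 15.75, 15.75, 17.75, 14.75, 14.75, 8.75)
)

# One grouping factor per serial/pnum combination that actually occurs
grp <- interaction(df$serial, df$pnum, drop = TRUE)

# ave() returns the group statistic repeated for every row of the group
start_sd <- ave(df$Start, grp, FUN = sd)      # sample sd, like STDEV.S
end_sd   <- ave(df$End,   grp, FUN = sd_pop)  # population sd, like STDEV.P

df$Pattern <- as.integer(start_sd < 0.5 & end_sd < 0.5)
df
```

For this sample both variants flag the same rows, since the first two groups have zero spread and the third is far above 0.5. The choice only matters for groups whose spread is close to the threshold.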

Construct a loop based on multiple conditions in a column R

I have a df attached and I would like to create a loop that would apply a specific sequence based on conditions in column "x9". I would like to be able to set the sequence myself so I can try different sequences for this data frame, I will explain more below.
I have a df of losses and wins for an algorithm. On the first instance of a win I want to take the value in "x9" and divide it by the sequence value. I want to keep iterating through the sequence values until a loss is achieved. Once a loss is achieved the sequence will restart.
Risk control is the column I am attempting to create, it takes values from "x9" and divides them by the sequence value. I want to have the ability to alter the sequence values.
In short I need assistance in:
Constructing a sequence to apply to my df, would like to be able to alter this to try different sequences;
Take values in "x9" and create a new column that would apply the sequence values set. The sequence is taking the value in "x9" and dividing it by the sequence number;
Construct a loop to iterate through the entire df to apply this over all of the values.
I would appreciate any help / insight anyone can provide.
structure(list(x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), x2 = c("2016.01.04 01:05",
"2016.01.04 01:12", "2016.01.04 01:13", "2016.01.04 01:17", "2016.01.04 01:20",
"2016.01.04 01:23", "2016.01.04 01:25", "2016.01.04 01:30", "2016.01.04 01:31",
"2016.01.04 01:59"), x3 = c("buy", "close", "buy", "close", "buy",
"close", "buy", "t/p", "buy", "close"), x4 = c(1, 1, 2, 2, 3,
3, 4, 4, 5, 5), x5 = c(8.46, 8.46, 8.6, 8.6, 8.69, 8.69, 8.83,
8.83, 9, 9), x6 = c(1.58873, 1.58955, 1.5887, 1.58924, 1.58862,
1.58946, 1.58802, 1.58902, 1.58822, 1.58899), x7 = c(1.57873,
1.57873, 1.5787, 1.5787, 1.57862, 1.57862, 1.57802, 1.57802,
1.57822, 1.57822), x8 = c(1.58973, 1.58973, 1.5897, 1.5897, 1.58962,
1.58962, 1.58902, 1.58902, 1.58922, 1.58922), x9 = c("$0.00",
"$478.69", "$0.00", "$320.45", "$0.00", "$503.70", "$0.00", "$609.30",
"$0.00", "$478.19"), x10 = c("$30,000.00", "$30,478.69", "$30,478.69",
"$30,799.14", "$30,799.14", "$31,302.84", "$31,302.84", "$31,912.14",
"$31,912.14", "$32,390.33"), `Risk Control` = c(NA, "$478.69",
NA, "$320.45", NA, "$251.85", NA, "$304.65", NA, "$159.40"),
Sequence = c(NA, 1, NA, 1, NA, 2, NA, 2, NA, 3)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(
cols = list(x1 = structure(list(), class = c("collector_double",
"collector")), x2 = structure(list(), class = c("collector_character",
"collector")), x3 = structure(list(), class = c("collector_character",
"collector")), x4 = structure(list(), class = c("collector_double",
"collector")), x5 = structure(list(), class = c("collector_double",
"collector")), x6 = structure(list(), class = c("collector_double",
"collector")), x7 = structure(list(), class = c("collector_double",
"collector")), x8 = structure(list(), class = c("collector_double",
"collector")), x9 = structure(list(), class = c("collector_character",
"collector")), x10 = structure(list(), class = c("collector_character",
"collector")), `Risk Control` = structure(list(), class = c("collector_character",
"collector")), ...12 = structure(list(), class = c("collector_logical",
"collector")), Sequence = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"))

Maybe there are better ways but I believe the following function does what the question asks for. It takes two arguments, a vector x to be processed and a sequence Seq. The return value is the risk control described in the question.
constructRisk <- function(x, Seq){
  stopifnot(length(x) > 0)
  stopifnot(length(Seq) > 0)
  n <- length(x)
  m <- length(Seq)
  y <- numeric(n)
  iSeq <- 1L  # current position in Seq
  for(i in seq_len(n)){
    y[i] <- x[i]/Seq[iSeq]
    if(!is.na(y[i])){
      # a loss (negative value) restarts the sequence
      if(y[i] < 0) iSeq <- 0L
    }
    iSeq <- iSeq + 1L
    # wrap around once the end of the sequence is reached
    if(iSeq > m) iSeq <- 1L
  }
  y
}
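To try a sequence of your own, one option is to pass only the closed trades (the non-zero x9 values) together with a short sequence vector. The sketch below repeats the function so that it runs on its own; my_seq and the second toy input are made-up values for illustration.

```r
constructRisk <- function(x, Seq){
  stopifnot(length(x) > 0)
  stopifnot(length(Seq) > 0)
  n <- length(x)
  m <- length(Seq)
  y <- numeric(n)
  iSeq <- 1L
  for(i in seq_len(n)){
    y[i] <- x[i]/Seq[iSeq]
    if(!is.na(y[i])){
      if(y[i] < 0) iSeq <- 0L  # loss: restart the sequence
    }
    iSeq <- iSeq + 1L
    if(iSeq > m) iSeq <- 1L    # wrap around at the end of Seq
  }
  y
}

# Non-zero x9 values from the sample, already converted to numeric
wins <- c(478.69, 320.45, 503.70, 609.30, 478.19)

# Hypothetical sequence; change it freely to test other schemes
my_seq <- c(1, 1, 2, 2, 3)
rc <- round(constructRisk(wins, my_seq), 2)
rc

# Toy input with a loss in third position: the divisor goes 1, 2, 4,
# then restarts at 1 after the negative value
with_loss <- constructRisk(c(100, 200, -50, 300), c(1, 2, 4))
with_loss
```

The first call reproduces the Risk Control values posted in the question (478.69, 320.45, 251.85, 304.65, 159.40). Note that the function advances the sequence on every element it receives, so zero rows should be filtered out first if they are not meant to consume a sequence step.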
Note that since the posted data has column x9 with dollar signs and is, therefore, of class "character", the test below is on a numeric version of it, X9. And the same goes for the risk control column, as posted.
X9 <- as.numeric(sub("\\$", "", df1$x9))
RskCntr <- as.numeric(sub("\\$", "", df1$`Risk Control`))
RC <- constructRisk(X9, df1$Sequence)
all.equal(RskCntr, RC)
#[1] "Mean relative difference: 2.091175e-05"
all.equal(RskCntr, round(RC, 2))
#[1] TRUE
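A caveat on the conversion: sub("\\$", "", ...) is enough for x9 in this sample, but values with thousands separators, such as those in x10, would become NA. A small sketch of a more tolerant parser follows; parse_dollars is a hypothetical helper name.

```r
# Strip dollar signs and thousands separators before converting.
# Hypothetical helper; assumes "." is the decimal mark.
parse_dollars <- function(x) as.numeric(gsub("[$,]", "", x))

x10 <- c("$30,000.00", "$30,478.69", "$32,390.33")
parsed <- parse_dollars(x10)
parsed

# For comparison: removing only the dollar sign leaves the comma in place,
# so as.numeric() cannot parse the value and returns NA
dollar_only <- suppressWarnings(as.numeric(sub("\\$", "", x10[1])))
dollar_only
```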

ggplot loop deal with special characters

Hi there, I'm trying to plot a defined number of graphs using gridExtra.
This works, but unfortunately it does not deal with special characters in the column names. I tried to work around this by using R-friendly names and adding the actual name as a subtitle:
library(gridExtra)
library(ggplot2)
Dataframe<-read.csv2("File_with_R_friendly_names.csv")
names<-read.csv2("File_with_actual_names.csv")
bar<-colnames(names)
list_of_plots<-lapply(names(Dataframe)[2:10], function(i) {
ggplot(Dataframe, aes_string(x="X1", y=i)) + geom_point()+labs(x=i, y="Intensity", subtitle=bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol=3))
If I use bar[2], every graph gets an actual name, but it is the same one for all of them; if I use bar[i], all graphs get NA.
The names I use to suit R are
Met1, Met2, Met3, Met4, Met5, Met6, Met7, Met8, Met9 and Met10
Examples of names that I need on the plots are:
-(-)-Corey lactone
-(2R)-2,3-Dihydroxypropanoic acid
-(D-(+)-Glyceric acid?)
-1,5-Naphthalenediamine
-12-Aminododecanoic acid
-2,5-di-tert-Butylhydroquinone
-2,6-di-tert-Butylphenol
-2-Amino-N,N-diethylacetamide
-2-Ethyl-2-phenylmalonamide
-2-Naphthalenesulfonic acid
Here is the dput to reproduce the bar (names):
bar<-c("X1", "(-)-Corey lactone", "(2R)-2,3-Dihydroxypropanoic acid (D-(+)-Glyceric acid?)", "1,5-Naphthalenediamine", "12-Aminododecanoic acid", "2,5-di-tert-Butylhydroquinone", "2,6-di-tert-Butylphenol", "2-Amino-N,N-diethylacetamide", "2-Ethyl-2-phenylmalonamide", "2-Naphthalenesulfonic acid")
Here is the dput to reproduce the dataframe:
Dataframe<-structure(list(X1 = c(0, 0, 0.25, 0.25, 0.5, 0.5, 1, 1, 2, 2),
Met1 = c(0, 0, 38096319.85, 45978353.93, 35077691.7, 42146132.41,
62606961.17, 32786049.6, 51054004.82, 48898547.32), Met2 = c(0,
0, 1288905.771, 948466.4001, 645979.6463, 1228663.251, 1137957.136,
940928.9344, 1443680.706, 1755726.385), Met3 = c(0, 0, 575887.464,
693692.0349, 1362477.6, 1515767.293, 2241120.502, 2417932.908,
3866432.112, 3894701.876), Met4 = c(0, 0, 16737068.73, 21915551.3,
12088089.1, 16003037.3, 17720785.29, 11957614.24, 13127281.5,
14192542.13), Met5 = c(0, 0, 4556006.426, 4782909.936, 4484706.271,
8019957.826, 5112289.476, 8537488.48, 6680688.948, 5959748.061
), Met6 = c(0, 0, 16874476.32, 15721984.25, 18093323.61,
18619817.92, 22055835.04, 19754379.11, 29211315.88, 27321333.35
), Met7 = c(0, 0, 6604385.457, 6396794.568, 13823034.64,
15449539.63, 26013299.82, 20262673.28, 35301685.57, 33367520.66
), Met8 = c(0, 0, 6727973.448, 7166827.569, 13238311.46,
13986568.69, 20957194.23, 19186953.76, 34513697.47, 31192991.75
), Met9 = c(0, 0, 2373752.304, 3259738.104, 1998529.732,
2387445.15, 2479309.442, 26924139.6, 4611277.427, 2439602.098
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L), .Names = c("X1", "Met1", "Met2", "Met3", "Met4", "Met5",
"Met6", "Met7", "Met8", "Met9"), spec = structure(list(cols = structure(list(
X1 = structure(list(), class = c("collector_double", "collector"
)), Met1 = structure(list(), class = c("collector_double",
"collector")), Met2 = structure(list(), class = c("collector_double",
"collector")), Met3 = structure(list(), class = c("collector_double",
"collector")), Met4 = structure(list(), class = c("collector_double",
"collector")), Met5 = structure(list(), class = c("collector_double",
"collector")), Met6 = structure(list(), class = c("collector_double",
"collector")), Met7 = structure(list(), class = c("collector_double",
"collector")), Met8 = structure(list(), class = c("collector_double",
"collector")), Met9 = structure(list(), class = c("collector_double",
"collector"))), .Names = c("X1", "Met1", "Met2", "Met3",
"Met4", "Met5", "Met6", "Met7", "Met8", "Met9")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

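bar[i] returns NA because i is a column name such as "Met1", not a number, and bar has no names to match it against. Iterate over the column positions instead, so that the same index picks both the column and its display name:
list_of_plots <- lapply(2:10, function(i) {
  col <- names(Dataframe)[i]  # R-friendly column name, e.g. "Met1"
  ggplot(Dataframe, aes_string(x = "X1", y = col)) +
    geom_point() +
    labs(x = col, y = "Intensity", subtitle = bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol = 3))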
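Another way to tie the two sets of names together is a named lookup vector, so that the R-friendly column name itself retrieves the display name. The sketch below uses a reduced stand-in for the question's data (five rows, three metabolite columns) and the .data pronoun, which ggplot2 3.0 and later provide as a replacement for aes_string(); display_name is a name invented here.

```r
library(ggplot2)

# Reduced stand-in for the question's data: every second row, first three
# metabolite columns only
Dataframe <- data.frame(
  X1   = c(0, 0.25, 0.5, 1, 2),
  Met1 = c(0, 38096319.85, 35077691.7, 62606961.17, 51054004.82),
  Met2 = c(0, 1288905.771, 645979.6463, 1137957.136, 1443680.706),
  Met3 = c(0, 575887.464, 1362477.6, 2241120.502, 3866432.112)
)
bar <- c("X1", "(-)-Corey lactone",
         "(2R)-2,3-Dihydroxypropanoic acid (D-(+)-Glyceric acid?)",
         "1,5-Naphthalenediamine")

# Named lookup: R-friendly column name -> display name with special characters
display_name <- setNames(bar, names(Dataframe))

list_of_plots <- lapply(names(Dataframe)[-1], function(col) {
  ggplot(Dataframe, aes(x = X1, y = .data[[col]])) +
    geom_point() +
    labs(x = col, y = "Intensity", subtitle = display_name[[col]])
})

print(list_of_plots[[1]])
```

The resulting list can be passed to grid.arrange() exactly as in the question. This relies on bar and the columns of Dataframe being in the same order, as they are in the posted dput output.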
