Multiple plot layers in ggvis - r

I want to create a ggvis plot which has lines and SOME points. That is, I want to plot say three lines and a single point on each. I've tried:
data %>%
ggvis(x = ~x, y = ~value, stroke = ~variable) %>%
group_by(variable) %>% layer_lines() %>%
layer_points(x = points[,1], y = points[,2])
where points is a dataframe with the points I want plotted, but this doesn't work.
Any ideas?
Thanks
Here's sample data:
structure(list(x = c(0, 100, 200, 300, 400, 500, 600, 700, 800,
900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900,
2000), total = c(0, 12561.8501420367, 22436.5635024327, 31648.8752584019,
39817.0210051915, 46555.4211041481, 51777.569893653, 56035.9877883034,
59440.4188225383, 62168.6432420093, 64700.5282644047, 67507.9276458711,
69915.0080895388, 71993.1498628426, 73801.8840967967, 75218.7298009381,
76055.2047330129, 76752.2339780277, 77351.4962240531, 77893.1254687784,
78434.7547135037)), .Names = c("x", "total"), row.names = c(NA,
-21L), class = "data.frame")
Here are points:
structure(list(V1 = c(46.3220011507472, 193.838339639222, 259.839658412055,
499.999999202025), y = c(6040.39184354985, 16023.4572876059,
26201.4660177994, 46555.4211041481)), .Names = c("V1", "y"), row.names = c("V1",
"V2", "V3", "total_spend"), class = "data.frame")

Related

Create mean value plot without missing values count to total

Using a dataframe with missing values:
structure(list(id = c("id1", "test", "rew", "ewt"), total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8), total_frq_4 = c(36, NA, 104, NA)), row.names = c(NA, 4L), class = "data.frame")
How is is possible to create a bar plot with the mean for every column, excluding the id column, but without filling the missing values with 0 but leaving out the row with missing values example for total_frq_3 24+25+8 = 57/3 = 19
You can use colMeans function and pass it the appropriate argument to ignore NA.
library(ggplot2)
xy <- structure(list(id = c("id1", "test", "rew", "ewt"),
total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8),
total_frq_4 = c(36, NA, 104, NA)),
row.names = c(NA, 4L),
class = "data.frame")
xy.means <- colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE)
xy.means <- as.data.frame(xy.means)
xy.means$total <- rownames(xy.means)
ggplot(xy.means, aes(x = total, y = xy.means)) +
theme_bw() +
geom_col()
Or just use base image graphic
barplot(height = colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE))

save list elements into separated dfs in R

I have a list containing 180.000 elements each represents data about an investor and a specific traded asset.
I want to save all the elements of the list into single dataframes called df into a specific folder "dev/test-data/investors-singleass/" , so that I can later on apply a specific function on all the dfs of the folder
The list of my data has a structure similar to this
list(`4Z627.004125` = structure(list(investor = c("4Z627", "4Z627",
"4Z627"), asset = c("004125", "004125", "004125"), datetime = c("2015-05-12",
"2015-05-28", "2016-08-19"), Avgprice = c(169.4, 168, 162), operation = c(2000,
1000, -3000), portfolio = c(2000, 3000, 0), last_port = c(0,
2000, 3000), marketprice = c(169.4, 166.5, 161.75), portprice = c(169.4,
168.933333333333, 0), G = c(0, 0, 1), gainminus = c(2, 0, 0),
numasset = c(5, 8, 13)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
Basically each elements of the list is an "investor" ID and an "asset" code for which i then have multiple other columns to work with
I would do it like this based on link
df1 <- list(`4Z627.004125` = structure(list(investor = c("4Z627", "4Z627",
"4Z627"), asset = c("004125", "004125", "004125"),
datetime = c("2015-05-12", "2015-05-28", "2016-08-19"),
Avgprice = c(169.4, 168, 162),
operation = c(2000, 1000, -3000), portfolio = c(2000, 3000, 0),
last_port = c(0,2000, 3000), marketprice = c(169.4, 166.5, 161.75),
portprice = c(169.4,
168.933333333333, 0), G = c(0, 0, 1), gainminus = c(2, 0, 0),
numasset = c(5, 8, 13)), row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame")),
`4Z628.004128` = structure(list(investor = c("4Z627", "4Z627",
"4Z627"), asset = c("004125", "004125", "004125"),
datetime = c("2015-05-12", "2015-05-28", "2016-08-19"),
Avgprice = c(169.4, 168, 162),
operation = c(2000, 1000, -3000), portfolio = c(2000, 3000, 0),
last_port = c(0,2000, 3000), marketprice = c(169.4, 166.5, 161.75),
portprice = c(169.4,
168.933333333333, 0), G = c(0, 0, 1), gainminus = c(2, 0, 0),
numasset = c(5, 8, 13)), row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame")))
library(purrr)
iwalk(df1, ~saveRDS(.x, paste0("dev/test-data/investors-singleass/", .y, '.RData')))
You can get the data back into R with
library(dplyr)
df <- list.files(path = "dev/test-data/investors-singleass/", pattern = ".RData") %>%
map_dfr(readRDS)

Annotate ggplot based on a second data frame

I have a faceted plot made with ggplot that is already working, it shows data about river altitude against years. I'm trying to add arrows based on a second dataframe which details when floods occurred.
Here's the current plot:
I would like to draw arrows in the top part of each graph based on date information in my second dataframe where each row corresponds to a flood and contains a date.
The link between the two dataframes is the Station_code column, each river has one or more stations which is indicated by this data (in this case only the Var river has two stations).
Here is the dput of the data frame used to create the original plot:
structure(list(River = c("Durance", "Durance", "Durance", "Durance",
"Roya", "Var"), Reach = c("La Brillanne", "Les Mées", "La Brillanne",
"Les Mées", "Basse vallée", "Basse vallée"), Area_km = c(465,
465, 465, 465, 465, 465), Type = c("restored", "target", "restored",
"target", "witness", "restored"), Year = c(2017, 2017, 2012,
2012, 2018, 2011), Restoration_year = c(2013, 2013, 2013, 2013,
NA, 2009), Station_code = c("X1130010", "X1130010", "X1130010",
"X1130010", "Y6624010", "Y6442015"), BRI_adi_moy_sstransect = c(0.00375820736746399,
0.00244752138003355, 0.00446807607783864, 0.0028792618981479,
0.00989200896930529, 0.00357247516596474), SD_sstransect = c(0.00165574247612667,
0.0010044634990875, 0.00220534492332107, 0.00102694633805149,
0.00788573233793128, 0.00308489160008849), min_BRI_sstransect = c(0.00108123849595469,
0.00111493913953216, 0.000555500340370182, 0.00100279590198288,
0, 0), max_BRI_sstransect = c(0.0127781240385231, 0.00700537285706352,
0.0210216858227621, 0.00815151653110584, 0.127734814926934, 0.0223738711013954
), Nb_sstr_unique_m = c(0.00623321576795815, 0.00259754717331206,
0.00117035034437559, 0.00209845092352825, 0.0458628969163946,
3.60620609570031), BRI_adi_moy_transect = c(0.00280232169999531,
0.00173868254527501, 0.00333818552810438, 0.00181398859573415,
0.00903651639185542, 0.00447856455432537), SD_transect = c(0.00128472161839638,
0.000477209421076879, 0.00204050725984513, 0.000472466654940182,
0.00780731734792112, 0.00310039904793707), min_BRI_transect = c(0.00108123849595469,
0.00106445386542223, 0.000901992689363725, 0.000855135344651009,
0.000944414463851629, 0.000162012161197014), max_BRI_transect = c(0.00709151795418251,
0.00434366293208643, 0.011717024999411, 0.0031991369873946, 0.127734814926934,
0.0187952134332499), Nb_tr_unique_m = c(0, 0, 0, 0, 0, 0), Error_reso = c(0.0011,
8e-04, 0.0018, 0.0011, 0.0028, 0.0031), W_BA = c(296.553323029366,
411.056574923547, 263.944186046512, 363.32874617737, 88.6420798065296,
158.66866970576), W_BA_sd = c(84.1498544481585, 65.3909073242282,
100.067554749308, 55.5534084807705, 35.2337070278364, 64.6978349498119
), W_BA_min = c(131, 206, 33, 223, 6, 45), W_BA_max = c(472,
564, 657, 513, 188, 381), W_norm = c(5.73271228619998, 7.9461900926133,
5.10234066090722, 7.02355699765464, 5.09378494746752, 4.81262001531126
), W_norm_sd = c(1.62671218635823, 1.2640804493236, 1.93441939783807,
1.07391043231191, 2.02469218788178, 1.96236658443141), W_norm_min = c(2.53237866910643,
3.98221378500706, 0.637927450996277, 4.31084307794454, 0.344787822572658,
1.36490651299098), W_norm_max = c(9.12429566273463, 10.9027600715727,
12.7005556152895, 9.91687219276031, 10.8033517739433, 11.5562084766569
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
And here is the dput of the date frame containing the flooding date:
structure(list(Station_code = c("Y6042010", "Y6042010", "Y6042010",
"Y6042010", "Y6042010", "Y6042010"), Date = structure(c(12006,
12007, 12016, 12017, 13416, 13488), class = "Date"), Qm3s = c(156,
177, 104, 124, 125, 90.4), Qual = c(5, 5, 5, 5, 5, 5), Year = c(2002,
2002, 2002, 2002, 2006, 2006), Month = c(11, 11, 11, 11, 9, 12
), Station_river = c("Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux",
"Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux"), River = c("Var",
"Var", "Var", "Var", "Var", "Var"), Mod_inter = c(13.32, 13.32,
13.32, 13.32, 13.32, 13.32), Qm3s_norm = c(11.7117117117117,
13.2882882882883, 7.80780780780781, 9.30930930930931, 9.38438438438438,
6.78678678678679), File_name = c("Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat", "Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat"), Station_name = c("#Entrevaux", "#Entrevaux",
"#Entrevaux", "#Entrevaux", "#Entrevaux", "#Entrevaux"), Reach = c("Daluis",
"Daluis", "Daluis", "Daluis", "Daluis", "Daluis"), Restauration_year = c(2009,
2009, 2009, 2009, 2009, 2009), `Area_km[BH]` = c(676, 676, 676,
676, 676, 676), Starting_year = c(1920, 1920, 1920, 1920, 1920,
1920), Ending_year = c("NA", "NA", "NA", "NA", "NA", "NA"), Accuracy = c("good",
"good", "good", "good", "good", "good"), Q2 = c(86, 86, 86, 86,
86, 86), Q5 = c(120, 120, 120, 120, 120, 120), Q10 = c(150, 150,
150, 150, 150, 150), Q20 = c(170, 170, 170, 170, 170, 170), Q50 = c(200,
200, 200, 200, 200, 200), Data_producer = c("DREAL_PACA", "DREAL_PACA",
"DREAL_PACA", "DREAL_PACA", "DREAL_PACA", "DREAL_PACA"), Coord_X_L2e_Z32 = c(959313,
959313, 959313, 959313, 959313, 959313), Coord_Y_L2e_Z32 = c(1893321,
1893321, 1893321, 1893321, 1893321, 1893321), Coord_X_L93 = c(1005748.88,
1005748.88, 1005748.88, 1005748.88, 1005748.88, 1005748.88),
Coord_Y_L93 = c(6324083.97, 6324083.97, 6324083.97, 6324083.97,
6324083.97, 6324083.97), New_FN = c("Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv", "Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv"), NA_perc = c(14.92,
14.92, 14.92, 14.92, 14.92, 14.92), Q2_norm = c(6.45645645645646,
6.45645645645646, 6.45645645645646, 6.45645645645646, 6.45645645645646,
6.45645645645646), Q5_norm = c(9.00900900900901, 9.00900900900901,
9.00900900900901, 9.00900900900901, 9.00900900900901, 9.00900900900901
), Q10_norm = c(11.2612612612613, 11.2612612612613, 11.2612612612613,
11.2612612612613, 11.2612612612613, 11.2612612612613), Q20_norm = c(12.7627627627628,
12.7627627627628, 12.7627627627628, 12.7627627627628, 12.7627627627628,
12.7627627627628), Q50_norm = c(15.015015015015, 15.015015015015,
15.015015015015, 15.015015015015, 15.015015015015, 15.015015015015
)), row.names = c(NA, -6L), groups = structure(list(Station_code = "Y6042010",
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
EDIT
Here is an example of what I would like to do on the plot:
This is the code I use currently to do the plot:
ggplot(data = tst_formule[tst_formule$River != "Roya",], aes(x = Year, y = BRI_adi_moy_transect, shape = River, col = Type)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - SD_transect, ymax = BRI_adi_moy_transect + SD_transect), size = 0.7, width = 0.3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - Error_reso, ymax = BRI_adi_moy_transect + Error_reso, linetype = "Error due to resolution"), size = 0.3, width = 0.3, colour = "black") +
scale_linetype_manual(name = NULL, values = 2) +
scale_shape_manual(values = c(15, 18, 17, 16)) +
scale_colour_manual(values = c("chocolate1", "darkcyan")) +
new_scale("linetype") +
geom_vline(aes(xintercept = Restoration_year, linetype = "Restoration"), colour = "chocolate1") +
scale_linetype_manual(name = NULL, values = 5) +
new_scale("linetype") +
geom_hline(aes(yintercept = 0.004, linetype = "Threshold"), colour= 'black') +
scale_linetype_manual(name = NULL, values = 4) +
scale_y_continuous("BRI*", limits = c(min(tst_formule$BRI_adi_moy_transect - tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect - tst_formule$Error_reso ), max(tst_formule$BRI_adi_moy_transect + tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect + tst_formule$Error_reso))) +
scale_x_continuous(limits = c(min(tst_formule$Year - 1),max(tst_formule$Year + 1)), breaks = scales::breaks_pretty(n = 6)) +
theme_bw() +
facet_wrap(vars(River)) +
theme(legend.spacing.y = unit(-0.01, "cm")) +
guides(shape = guide_legend(order = 1),
colour = guide_legend(order = 2),
line = guide_legend(order = 3))
After tests and more research, I managed to do it by adding the second dataframe in geom_text():
new_scale("linetype") +
geom_segment(data = Flood_plot, aes(x = Date, xend = Date, y = 0.025, yend = 0.020, linetype = "Morphogenic flood"), arrow = arrow(length = unit(0.2, "cm")), inherit.aes = F, guide = guide_legend(order = 6)) +
scale_linetype_manual(name = NULL, values = 1) +
new_scale() creates a new linetype definition after the ones I created before, geom_segment() allows to draw arrows which I wanted but it works with geom_text() and scale_linetype_manual() draws the arrow in the legend without the mention "linetype" above. The second dataframe has the same column (River) as the 1st one to wrap and create the panels.

Create a Range Bar Chart in R

I would like to create a range bar chart in R.
I found a similar question that was posted about 4 years ago, but the answer was a bit awkward and cumbersome. It did the job, but I was hoping that a cleaner approach might be possible now. Using ggplot2 would be ideal.
Floating bar chart with dates as the plotted values in r
If possible, I would also like to include some sort of point data, or some other way to indicate a summary statistic, like the mean or median for that particular range.
Compared to the attached picture, the Item IDs would take the place of the months given along the y-axis and the Start and End values would be represented horizontally along the x-axis.
Although the data doesn't have to be structured this way, here is a sample dataframe to make things easier:
structure(list(Item = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20), Start = c(500, 550, 500,
450, 400, 400, 500, 400, 300, 300, 350, 250, 300, 200, 200, 100,
100, 50, 0, 0), End = c(550, 600, 550, 550, 700, 600, 600, 700,
850, 600, 650, 650, 750, 900, 800, 900, 1000, 950, 900, 1000),
Median = c(525, 575, 525, 500, 550, 500, 550, 550, 575, 450,
500, 450, 525, 550, 500, 500, 550, 500, 450, 500)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
cols = list(Item = structure(list(), class = c("collector_double",
"collector")), Start = structure(list(), class = c("collector_double",
"collector")), End = structure(list(), class = c("collector_double",
"collector")), Median = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))```
The example picture was taken from the range bar chart entry at the "AnyChart" website.
As suggested by RAB, you can try to use geom_segment with your dataframe df:
library(ggplot2)
ggplot(data = df)+
geom_segment(aes(x = Item, xend = Item, y = Start, yend = End), size = 5, colour = "red", alpha = 0.6) +
geom_segment(aes(x = Item, xend = Item, y = Median-1, yend = Median+1), size = 5, colour = "black") +
coord_flip() +
scale_x_discrete(limits = as.character(ddf$Item))+
ylab("Value")
Alternatively, you can also use geom_crossbar:
ggplot(data = ddf, aes(x = as.factor(Item), y = Median)) +
geom_crossbar(aes(ymin = Start, ymax = End), width = 0.5, fill = "grey") +
coord_flip() +
xlab("Item") +
ylab("Value")

R Plotly: Connecting bars in waterfall chart

I'd like to create a waterfall chart similar to the one below in plotly with connecting bars between the waterfall segments, and it appears to be possible based on the below.
#would like a plot similar to this with connecting bars
library(waterfalls)
library(plotly)
waterfall(.data = data.frame(category = letters[1:5],
value = c(100, -20, 10, 20, 110)),
calc_total = T,
fill_by_sign = T)
#appears to be possible
p <- waterfall(.data = data.frame(category = letters[1:5],
value = c(100, -20, 10, 20, 110)),
calc_total = T,
fill_by_sign = T)
ggplotly(p)
Looking at the plotly graph it seems as if the letters labels are actually numbers(?). Though I'm not quite sure how to create the bar chart with number indices. Below is my try at getting it right. Could someone point me in the right direction here?
library(waterfalls)
library(plotly)
shap <- structure(list(values = c(5.82983875274658, 0, 0, 0, -0.0259701404720545, -0.103678397834301, -1.02624976634979, 0),
names = structure(1:8, .Label = c("bias", "speciessetosa", "speciesversicolor", "speciesvirginica", "sepal_width", "petal_width", "petal_length", "Ttl. Target Pred"), class = c("ordered", "factor")),
base = c(0, 5.82983875274658, 5.82983875274658, 5.82983875274658, 5.82983875274658, 5.80386861227453, 5.70019021444023, 0),
positive = c(5.82983875274658, 0, 0, 0, 0, 0, 0, 0),
negative = c(0, 0, 0, 0, -0.0259701404720545, -0.103678397834301, -1.02624976634979, 0),
shap_total = c(0, 0, 0, 0, 0, 0, 0, 4.67394044809043),
position = c(2.91491937637329, 5.82983875274658, 5.82983875274658, 5.82983875274658, 5.81685368251055, 5.75202941335738, 5.18706533126533, 2.33697022404522),
text_vals = c("5.83", "0", "0", "0", "-0.03", "-0.1", "-1.03", "4.67"), row_num = 1:8),
class = "data.frame", row.names = c(NA, -8L))
p <- plotly::plot_ly(shap, y = ~names, x = ~base, type = 'bar', marker = list(color = 'rgba(1,1,1, 0.0)')) %>%
add_trace(x = ~positive, marker = list(color = 'rgba(50, 171, 96, 0.7)',
line = list(color = 'rgba(50, 171, 96, 1.0)',
width = 2))) %>%
add_trace(x = ~negative, marker = list(color = 'rgba(219, 64, 82, 0.7)',
line = list(color = 'rgba(219, 64, 82, 1.0)',
width = 2))) %>%
add_trace(x = ~shap_total, marker = list(color = 'rgba(55, 128, 191, 0.7)',
line = list(color = 'rgba(55, 128, 191, 1.0)',
width = 2))) %>%
layout(title = 'SHAP Value Prediction Contributions',
xaxis = list(title = "Prediction Contribution"),
yaxis = list(title = ""),
barmode = 'stack',
showlegend = FALSE) %>%
add_annotations(text = ~text_vals,
y = ~names,
x = ~position,
xref = "x",
yref = "y",
font = list(family = 'Arial',
size = 12,
color = 'rgba(0, 0, 0, 1)'),
showarrow = FALSE)
p

Resources