Create a Range Bar Chart in R - r

I would like to create a range bar chart in R.
I found a similar question that was posted about 4 years ago, but the answer was a bit awkward and cumbersome. It did the job, but I was hoping that a cleaner approach might be possible now. Using ggplot2 would be ideal.
Floating bar chart with dates as the plotted values in r
If possible, I would also like to include some sort of point data, or some other way to indicate a summary statistic, like the mean or median for that particular range.
Compared to the attached picture, the Item IDs would take the place of the months given along the y-axis and the Start and End values would be represented horizontally along the x-axis.
Although the data doesn't have to be structured this way, here is a sample dataframe to make things easier:
structure(list(Item = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20), Start = c(500, 550, 500,
450, 400, 400, 500, 400, 300, 300, 350, 250, 300, 200, 200, 100,
100, 50, 0, 0), End = c(550, 600, 550, 550, 700, 600, 600, 700,
850, 600, 650, 650, 750, 900, 800, 900, 1000, 950, 900, 1000),
Median = c(525, 575, 525, 500, 550, 500, 550, 550, 575, 450,
500, 450, 525, 550, 500, 500, 550, 500, 450, 500)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
cols = list(Item = structure(list(), class = c("collector_double",
"collector")), Start = structure(list(), class = c("collector_double",
"collector")), End = structure(list(), class = c("collector_double",
"collector")), Median = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))```
The example picture was taken from the range bar chart entry at the "AnyChart" website.

As suggested by RAB, you can try to use geom_segment with your dataframe df:
library(ggplot2)
ggplot(data = df)+
geom_segment(aes(x = Item, xend = Item, y = Start, yend = End), size = 5, colour = "red", alpha = 0.6) +
geom_segment(aes(x = Item, xend = Item, y = Median-1, yend = Median+1), size = 5, colour = "black") +
coord_flip() +
scale_x_discrete(limits = as.character(ddf$Item))+
ylab("Value")
Alternatively, you can also use geom_crossbar:
ggplot(data = ddf, aes(x = as.factor(Item), y = Median)) +
geom_crossbar(aes(ymin = Start, ymax = End), width = 0.5, fill = "grey") +
coord_flip() +
xlab("Item") +
ylab("Value")

Related

Create mean value plot without missing values count to total

Using a dataframe with missing values:
structure(list(id = c("id1", "test", "rew", "ewt"), total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8), total_frq_4 = c(36, NA, 104, NA)), row.names = c(NA, 4L), class = "data.frame")
How is is possible to create a bar plot with the mean for every column, excluding the id column, but without filling the missing values with 0 but leaving out the row with missing values example for total_frq_3 24+25+8 = 57/3 = 19
You can use colMeans function and pass it the appropriate argument to ignore NA.
library(ggplot2)
xy <- structure(list(id = c("id1", "test", "rew", "ewt"),
total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8),
total_frq_4 = c(36, NA, 104, NA)),
row.names = c(NA, 4L),
class = "data.frame")
xy.means <- colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE)
xy.means <- as.data.frame(xy.means)
xy.means$total <- rownames(xy.means)
ggplot(xy.means, aes(x = total, y = xy.means)) +
theme_bw() +
geom_col()
Or just use base image graphic
barplot(height = colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE))

Place elements from vector on histogram bins (R ggplot)

I have a ggplot histogram, showing two histograms of a continuous variable, one for each level of a group.
Through use of ggplot_build, I now also have vectors where each element is the proportional count of one group (1) versus the other (0), per bin.
So for the following histogram built with
ggplot(data,aes(x=nonfordist)) + geom_histogram(aes(fill=presence),
position="identity",alpha=0.5,bins=30)+ coord_cartesian(xlim=c(NA,1750))
I have the following list, showing sequential proportions of group1/group0 per bin
list(0.398927744608261, 0.35358629130967, 0.275296034083078,
0.247361252979231, 0.260224274406332, 0.22107969151671, 0.252847380410023,
0.230055658627087, 0.212244897959184, 0.242105263157895,
0.235294117647059, 0.115384615384615, 0.2, 0.421052631578947,
0.4375, 0.230769230769231, 0.222222222222222, 0.5, 0, 0,
0, NaN, 1, 1, 0, 0, NaN, NaN, NaN, Inf)
What I want now is to plot the elements of this list on the corresponding bins, preferably above the bars showing the counts for group1.
I do not want to include the proportions for bins that fall outside of the histogram due to my xlim command.
You could use stat_bin with a text geom, using the same breaks as you do for your histogram. We don't have your actual data, so I've tried to approximate it here (see footnote for reproducible data). You haven't told us what your list of proportions is called, so I have named it props in this example.
ggplot(data,aes(x=nonfordist)) +
geom_histogram(aes(fill = presence),
breaks = seq(-82.5, by = 165, length = 11),
position = "identity", alpha = 0.5, bins = 30) +
stat_bin(data = data[data$presence == 1, ], geom = "text",
breaks = seq(-82.5, by = 165, length = 11),
label = round(unlist(props)[1:10], 2), vjust = -0.5) +
coord_cartesian(xlim = c(NA, 1750))
Approximation of data
data <- data.frame(
nonfordist = rep(165 * c(0:10, 0:10),
c(24800, 20200, 16000, 6000, 2800, 1300, 700, 450, 100,
50, 30, 9950, 7400, 4500, 600, 300, 150, 80, 50, 30, 20,
10)),
presence = factor(rep(c(0, 1), c(72430, 23090))))

Annotate ggplot based on a second data frame

I have a faceted plot made with ggplot that is already working, it shows data about river altitude against years. I'm trying to add arrows based on a second dataframe which details when floods occurred.
Here's the current plot:
I would like to draw arrows in the top part of each graph based on date information in my second dataframe where each row corresponds to a flood and contains a date.
The link between the two dataframes is the Station_code column, each river has one or more stations which is indicated by this data (in this case only the Var river has two stations).
Here is the dput of the data frame used to create the original plot:
structure(list(River = c("Durance", "Durance", "Durance", "Durance",
"Roya", "Var"), Reach = c("La Brillanne", "Les Mées", "La Brillanne",
"Les Mées", "Basse vallée", "Basse vallée"), Area_km = c(465,
465, 465, 465, 465, 465), Type = c("restored", "target", "restored",
"target", "witness", "restored"), Year = c(2017, 2017, 2012,
2012, 2018, 2011), Restoration_year = c(2013, 2013, 2013, 2013,
NA, 2009), Station_code = c("X1130010", "X1130010", "X1130010",
"X1130010", "Y6624010", "Y6442015"), BRI_adi_moy_sstransect = c(0.00375820736746399,
0.00244752138003355, 0.00446807607783864, 0.0028792618981479,
0.00989200896930529, 0.00357247516596474), SD_sstransect = c(0.00165574247612667,
0.0010044634990875, 0.00220534492332107, 0.00102694633805149,
0.00788573233793128, 0.00308489160008849), min_BRI_sstransect = c(0.00108123849595469,
0.00111493913953216, 0.000555500340370182, 0.00100279590198288,
0, 0), max_BRI_sstransect = c(0.0127781240385231, 0.00700537285706352,
0.0210216858227621, 0.00815151653110584, 0.127734814926934, 0.0223738711013954
), Nb_sstr_unique_m = c(0.00623321576795815, 0.00259754717331206,
0.00117035034437559, 0.00209845092352825, 0.0458628969163946,
3.60620609570031), BRI_adi_moy_transect = c(0.00280232169999531,
0.00173868254527501, 0.00333818552810438, 0.00181398859573415,
0.00903651639185542, 0.00447856455432537), SD_transect = c(0.00128472161839638,
0.000477209421076879, 0.00204050725984513, 0.000472466654940182,
0.00780731734792112, 0.00310039904793707), min_BRI_transect = c(0.00108123849595469,
0.00106445386542223, 0.000901992689363725, 0.000855135344651009,
0.000944414463851629, 0.000162012161197014), max_BRI_transect = c(0.00709151795418251,
0.00434366293208643, 0.011717024999411, 0.0031991369873946, 0.127734814926934,
0.0187952134332499), Nb_tr_unique_m = c(0, 0, 0, 0, 0, 0), Error_reso = c(0.0011,
8e-04, 0.0018, 0.0011, 0.0028, 0.0031), W_BA = c(296.553323029366,
411.056574923547, 263.944186046512, 363.32874617737, 88.6420798065296,
158.66866970576), W_BA_sd = c(84.1498544481585, 65.3909073242282,
100.067554749308, 55.5534084807705, 35.2337070278364, 64.6978349498119
), W_BA_min = c(131, 206, 33, 223, 6, 45), W_BA_max = c(472,
564, 657, 513, 188, 381), W_norm = c(5.73271228619998, 7.9461900926133,
5.10234066090722, 7.02355699765464, 5.09378494746752, 4.81262001531126
), W_norm_sd = c(1.62671218635823, 1.2640804493236, 1.93441939783807,
1.07391043231191, 2.02469218788178, 1.96236658443141), W_norm_min = c(2.53237866910643,
3.98221378500706, 0.637927450996277, 4.31084307794454, 0.344787822572658,
1.36490651299098), W_norm_max = c(9.12429566273463, 10.9027600715727,
12.7005556152895, 9.91687219276031, 10.8033517739433, 11.5562084766569
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
And here is the dput of the date frame containing the flooding date:
structure(list(Station_code = c("Y6042010", "Y6042010", "Y6042010",
"Y6042010", "Y6042010", "Y6042010"), Date = structure(c(12006,
12007, 12016, 12017, 13416, 13488), class = "Date"), Qm3s = c(156,
177, 104, 124, 125, 90.4), Qual = c(5, 5, 5, 5, 5, 5), Year = c(2002,
2002, 2002, 2002, 2006, 2006), Month = c(11, 11, 11, 11, 9, 12
), Station_river = c("Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux",
"Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux"), River = c("Var",
"Var", "Var", "Var", "Var", "Var"), Mod_inter = c(13.32, 13.32,
13.32, 13.32, 13.32, 13.32), Qm3s_norm = c(11.7117117117117,
13.2882882882883, 7.80780780780781, 9.30930930930931, 9.38438438438438,
6.78678678678679), File_name = c("Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat", "Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat"), Station_name = c("#Entrevaux", "#Entrevaux",
"#Entrevaux", "#Entrevaux", "#Entrevaux", "#Entrevaux"), Reach = c("Daluis",
"Daluis", "Daluis", "Daluis", "Daluis", "Daluis"), Restauration_year = c(2009,
2009, 2009, 2009, 2009, 2009), `Area_km[BH]` = c(676, 676, 676,
676, 676, 676), Starting_year = c(1920, 1920, 1920, 1920, 1920,
1920), Ending_year = c("NA", "NA", "NA", "NA", "NA", "NA"), Accuracy = c("good",
"good", "good", "good", "good", "good"), Q2 = c(86, 86, 86, 86,
86, 86), Q5 = c(120, 120, 120, 120, 120, 120), Q10 = c(150, 150,
150, 150, 150, 150), Q20 = c(170, 170, 170, 170, 170, 170), Q50 = c(200,
200, 200, 200, 200, 200), Data_producer = c("DREAL_PACA", "DREAL_PACA",
"DREAL_PACA", "DREAL_PACA", "DREAL_PACA", "DREAL_PACA"), Coord_X_L2e_Z32 = c(959313,
959313, 959313, 959313, 959313, 959313), Coord_Y_L2e_Z32 = c(1893321,
1893321, 1893321, 1893321, 1893321, 1893321), Coord_X_L93 = c(1005748.88,
1005748.88, 1005748.88, 1005748.88, 1005748.88, 1005748.88),
Coord_Y_L93 = c(6324083.97, 6324083.97, 6324083.97, 6324083.97,
6324083.97, 6324083.97), New_FN = c("Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv", "Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv"), NA_perc = c(14.92,
14.92, 14.92, 14.92, 14.92, 14.92), Q2_norm = c(6.45645645645646,
6.45645645645646, 6.45645645645646, 6.45645645645646, 6.45645645645646,
6.45645645645646), Q5_norm = c(9.00900900900901, 9.00900900900901,
9.00900900900901, 9.00900900900901, 9.00900900900901, 9.00900900900901
), Q10_norm = c(11.2612612612613, 11.2612612612613, 11.2612612612613,
11.2612612612613, 11.2612612612613, 11.2612612612613), Q20_norm = c(12.7627627627628,
12.7627627627628, 12.7627627627628, 12.7627627627628, 12.7627627627628,
12.7627627627628), Q50_norm = c(15.015015015015, 15.015015015015,
15.015015015015, 15.015015015015, 15.015015015015, 15.015015015015
)), row.names = c(NA, -6L), groups = structure(list(Station_code = "Y6042010",
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
EDIT
Here is an example of what I would like to do on the plot:
This is the code I use currently to do the plot:
ggplot(data = tst_formule[tst_formule$River != "Roya",], aes(x = Year, y = BRI_adi_moy_transect, shape = River, col = Type)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - SD_transect, ymax = BRI_adi_moy_transect + SD_transect), size = 0.7, width = 0.3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - Error_reso, ymax = BRI_adi_moy_transect + Error_reso, linetype = "Error due to resolution"), size = 0.3, width = 0.3, colour = "black") +
scale_linetype_manual(name = NULL, values = 2) +
scale_shape_manual(values = c(15, 18, 17, 16)) +
scale_colour_manual(values = c("chocolate1", "darkcyan")) +
new_scale("linetype") +
geom_vline(aes(xintercept = Restoration_year, linetype = "Restoration"), colour = "chocolate1") +
scale_linetype_manual(name = NULL, values = 5) +
new_scale("linetype") +
geom_hline(aes(yintercept = 0.004, linetype = "Threshold"), colour= 'black') +
scale_linetype_manual(name = NULL, values = 4) +
scale_y_continuous("BRI*", limits = c(min(tst_formule$BRI_adi_moy_transect - tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect - tst_formule$Error_reso ), max(tst_formule$BRI_adi_moy_transect + tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect + tst_formule$Error_reso))) +
scale_x_continuous(limits = c(min(tst_formule$Year - 1),max(tst_formule$Year + 1)), breaks = scales::breaks_pretty(n = 6)) +
theme_bw() +
facet_wrap(vars(River)) +
theme(legend.spacing.y = unit(-0.01, "cm")) +
guides(shape = guide_legend(order = 1),
colour = guide_legend(order = 2),
line = guide_legend(order = 3))
After tests and more research, I managed to do it by adding the second dataframe in geom_text():
new_scale("linetype") +
geom_segment(data = Flood_plot, aes(x = Date, xend = Date, y = 0.025, yend = 0.020, linetype = "Morphogenic flood"), arrow = arrow(length = unit(0.2, "cm")), inherit.aes = F, guide = guide_legend(order = 6)) +
scale_linetype_manual(name = NULL, values = 1) +
new_scale() creates a new linetype definition after the ones I created before, geom_segment() allows to draw arrows which I wanted but it works with geom_text() and scale_linetype_manual() draws the arrow in the legend without the mention "linetype" above. The second dataframe has the same column (River) as the 1st one to wrap and create the panels.

How to create a factor variable from a data.frame and plot the columns on a side-by-side plot

Some of you this could be an easy question.
I have 2 data frames:
dput(head(Activitieslessthan35))
structure(list(`Main job: Working time in main job` = c(470,
440, 430, 430, 410, 150), Sleep = c(420, 450, 450, 420, 450,
460), `Unspecified TV video or DVD watching` = c(60, 40, 210,
190, 60, 0), Eating = c(80, 60, 40, 70, 60, 130), `Other personal care:Wash and dress` = c(60,
60, 50, 50, 70, 50), `Travel to work from home and back only` = c(60,
60, 50, 90, 90, 30), `Unspecified radio listening` = c(140, 180,
50, 90, 140, 160), `Other specified social life` = c(350, 270,
310, 330, 710, 440), `Socialising with family` = c(350, 270,
360, 330, 730, 540), `Food preparation and baking` = c(410, 310,
420, 380, 1000, 950)), row.names = c(NA, 6L), class = "data.frame")
and
dput(head(ActivitiesMoreOrEqual35))
structure(list(`Main job: Working time in main job` = c(360,
420, 390, 490, 540, 390), Sleep = c(590, 480, 310, 560, 280,
370), `Unspecified TV video or DVD watching` = c(100, 60, 130,
120, 60, 30), Eating = c(70, 100, 70, 40, 190, 80), `Other personal care:Wash and dress` = c(10,
30, 100, 60, 270, 90), `Travel to work from home and back only` = c(0,
50, 260, 50, 0, 0), `Unspecified radio listening` = c(50, 80,
260, 80, 210, 200), `Other specified social life` = c(190, 320,
790, 250, 580, 420), `Travel in the course of work` = c(50, 80,
260, 70, 120, 200), `Food preparation and baking` = c(440, 570,
820, 570, 820, 590)), row.names = c(NA, 6L), class = "data.frame")
I would like to convert the data.frames into factors - for example to have a factor variable called Activitieslessthan35 with colums of the data frame to be used as levels such as `Main job: Working time in main job', 'Sleep', etc. Later I would like also to plot (the sum) the levels of the factors on a side-by-side bar plot.
I don't know if you care transform a data.frame into factor variable as well how to change the format of the data.frames to create the plot
Any suggestion is welcome
If I understand well, you want to have both of your dataframe in a long format of two columns, one column containing all colnames of your dataframe, and the second column with all values, then summarise each "factor" of the first column, merging both dataframes and plotting both dataframes into a single plot. Am I right ?
Here a way to do it. I called df the dataframe Activitieslessthan35 and df2 the
dataframe ActivitiesMoreOrEqual35.
First, we are going to transpose to a long format each of your dataframe using pivot_longer
library(tidyr)
library(dplyr)
df <- df %>% pivot_longer(everything(), names_to = "Activities", values_to = "Values_less_than35")
df2 <- df2 %>% pivot_longer(everything(),names_to = "Activities", values_to = "Values_More_than_35")
Then, we will calculate the sum value for each factor of each of your dataframe:
df_sum = df%>% group_by(Activities) %>% summarise(Values_less_than35 = sum(Values_less_than35))
df2_sum = df2 %>% group_by(Activities) %>% summarise(Values_More_than_35 = sum(Values_More_than_35))
Then, we are merging both dataframe into a singe one by using "Activities" as merging columns
final_df = merge(df_sum,df2_sum, by.x = "Activities", by.y = "Activities", all = TRUE)
Finally, we are transposing one last time final_df in order to have values in the correct shape for plotting them with ggplot2
final_df <- final_df %>% pivot_longer(., -Activities, names_to = "Variable", values_to = "Value")
And now we can plot your final dataframe using ggplot2
library(ggplot2)
ggplot(final_df, aes(x = stringr::str_wrap(Activities, 15), y = Value, fill = Variable)) +
geom_col(stat = "identity", position = position_dodge()) +
coord_flip()+
xlab("")
And you get the following plot:
Does it look what you are expecting ?

Multiple plot layers in ggvis

I want to create a ggvis plot which has lines and SOME points. That is, I want to plot say three lines and a single point on each. I've tried:
data %>%
ggvis(x = ~x, y = ~value, stroke = ~variable) %>%
group_by(variable) %>% layer_lines() %>%
layer_points(x = points[,1], y = points[,2])
where points is a dataframe with the points I want plotted, but this doesn't work.
Any ideas?
Thanks
Here's sample data:
structure(list(x = c(0, 100, 200, 300, 400, 500, 600, 700, 800,
900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900,
2000), total = c(0, 12561.8501420367, 22436.5635024327, 31648.8752584019,
39817.0210051915, 46555.4211041481, 51777.569893653, 56035.9877883034,
59440.4188225383, 62168.6432420093, 64700.5282644047, 67507.9276458711,
69915.0080895388, 71993.1498628426, 73801.8840967967, 75218.7298009381,
76055.2047330129, 76752.2339780277, 77351.4962240531, 77893.1254687784,
78434.7547135037)), .Names = c("x", "total"), row.names = c(NA,
-21L), class = "data.frame")
Here are points:
structure(list(V1 = c(46.3220011507472, 193.838339639222, 259.839658412055,
499.999999202025), y = c(6040.39184354985, 16023.4572876059,
26201.4660177994, 46555.4211041481)), .Names = c("V1", "y"), row.names = c("V1",
"V2", "V3", "total_spend"), class = "data.frame")

Resources