I have a ggplot histogram, showing two histograms of a continuous variable, one for each level of a group.
Through use of ggplot_build, I now also have vectors where each element is the proportional count of one group (1) versus the other (0), per bin.
So for the following histogram built with
ggplot(data,aes(x=nonfordist)) + geom_histogram(aes(fill=presence),
position="identity",alpha=0.5,bins=30)+ coord_cartesian(xlim=c(NA,1750))
I have the following list, showing sequential proportions of group1/group0 per bin
list(0.398927744608261, 0.35358629130967, 0.275296034083078,
0.247361252979231, 0.260224274406332, 0.22107969151671, 0.252847380410023,
0.230055658627087, 0.212244897959184, 0.242105263157895,
0.235294117647059, 0.115384615384615, 0.2, 0.421052631578947,
0.4375, 0.230769230769231, 0.222222222222222, 0.5, 0, 0,
0, NaN, 1, 1, 0, 0, NaN, NaN, NaN, Inf)
What I want now is to plot the elements of this list on the corresponding bins, preferably above the bars showing the counts for group1.
I do not want to include the proportions for bins that fall outside of the histogram due to my xlim command.
You could use stat_bin with a text geom, using the same breaks as you do for your histogram. We don't have your actual data, so I've tried to approximate it here (see footnote for reproducible data). You haven't told us what your list of proportions is called, so I have named it props in this example.
ggplot(data,aes(x=nonfordist)) +
geom_histogram(aes(fill = presence),
breaks = seq(-82.5, by = 165, length = 11),
position = "identity", alpha = 0.5, bins = 30) +
stat_bin(data = data[data$presence == 1, ], geom = "text",
breaks = seq(-82.5, by = 165, length = 11),
label = round(unlist(props)[1:10], 2), vjust = -0.5) +
coord_cartesian(xlim = c(NA, 1750))
Approximation of data
data <- data.frame(
nonfordist = rep(165 * c(0:10, 0:10),
c(24800, 20200, 16000, 6000, 2800, 1300, 700, 450, 100,
50, 30, 9950, 7400, 4500, 600, 300, 150, 80, 50, 30, 20,
10)),
presence = factor(rep(c(0, 1), c(72430, 23090))))
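If you would rather not pull the proportions out of ggplot_build, here is a minimal base R sketch that recomputes them with the same breaks. It uses the approximated data above, not your real data, and the names props_vec and bin_labels are made up for this example. It also blanks out NaN and Inf so that empty bins get no label.

```r
# Approximated data from the answer above (not the asker's real data)
data <- data.frame(
  nonfordist = rep(165 * c(0:10, 0:10),
                   c(24800, 20200, 16000, 6000, 2800, 1300, 700, 450, 100,
                     50, 30, 9950, 7400, 4500, 600, 300, 150, 80, 50, 30, 20,
                     10)),
  presence = factor(rep(c(0, 1), c(72430, 23090))))

# Same breaks as the histogram: 11 breaks define 10 bins
breaks <- seq(-82.5, by = 165, length.out = 11)

# Assign each observation to a bin; values beyond the last break become NA
bin <- cut(data$nonfordist, breaks = breaks)

# Bin-by-group counts, then the ratio group 1 / group 0 per bin
counts <- table(bin, data$presence)
props_vec <- as.vector(counts[, "1"] / counts[, "0"])

# Blank out NaN/Inf so that bins with no group 0 observations get no label
bin_labels <- ifelse(is.finite(props_vec), sprintf("%.2f", props_vec), "")
```

bin_labels can then be passed as the label argument of the stat_bin layer in place of round(unlist(props)[1:10], 2).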
I have a working faceted plot made with ggplot that shows river altitude data against years. I'm trying to add arrows based on a second data frame that records when floods occurred.
Here's the current plot:
I would like to draw arrows in the top part of each graph based on date information in my second dataframe where each row corresponds to a flood and contains a date.
The link between the two dataframes is the Station_code column, each river has one or more stations which is indicated by this data (in this case only the Var river has two stations).
Here is the dput of the data frame used to create the original plot:
structure(list(River = c("Durance", "Durance", "Durance", "Durance",
"Roya", "Var"), Reach = c("La Brillanne", "Les Mées", "La Brillanne",
"Les Mées", "Basse vallée", "Basse vallée"), Area_km = c(465,
465, 465, 465, 465, 465), Type = c("restored", "target", "restored",
"target", "witness", "restored"), Year = c(2017, 2017, 2012,
2012, 2018, 2011), Restoration_year = c(2013, 2013, 2013, 2013,
NA, 2009), Station_code = c("X1130010", "X1130010", "X1130010",
"X1130010", "Y6624010", "Y6442015"), BRI_adi_moy_sstransect = c(0.00375820736746399,
0.00244752138003355, 0.00446807607783864, 0.0028792618981479,
0.00989200896930529, 0.00357247516596474), SD_sstransect = c(0.00165574247612667,
0.0010044634990875, 0.00220534492332107, 0.00102694633805149,
0.00788573233793128, 0.00308489160008849), min_BRI_sstransect = c(0.00108123849595469,
0.00111493913953216, 0.000555500340370182, 0.00100279590198288,
0, 0), max_BRI_sstransect = c(0.0127781240385231, 0.00700537285706352,
0.0210216858227621, 0.00815151653110584, 0.127734814926934, 0.0223738711013954
), Nb_sstr_unique_m = c(0.00623321576795815, 0.00259754717331206,
0.00117035034437559, 0.00209845092352825, 0.0458628969163946,
3.60620609570031), BRI_adi_moy_transect = c(0.00280232169999531,
0.00173868254527501, 0.00333818552810438, 0.00181398859573415,
0.00903651639185542, 0.00447856455432537), SD_transect = c(0.00128472161839638,
0.000477209421076879, 0.00204050725984513, 0.000472466654940182,
0.00780731734792112, 0.00310039904793707), min_BRI_transect = c(0.00108123849595469,
0.00106445386542223, 0.000901992689363725, 0.000855135344651009,
0.000944414463851629, 0.000162012161197014), max_BRI_transect = c(0.00709151795418251,
0.00434366293208643, 0.011717024999411, 0.0031991369873946, 0.127734814926934,
0.0187952134332499), Nb_tr_unique_m = c(0, 0, 0, 0, 0, 0), Error_reso = c(0.0011,
8e-04, 0.0018, 0.0011, 0.0028, 0.0031), W_BA = c(296.553323029366,
411.056574923547, 263.944186046512, 363.32874617737, 88.6420798065296,
158.66866970576), W_BA_sd = c(84.1498544481585, 65.3909073242282,
100.067554749308, 55.5534084807705, 35.2337070278364, 64.6978349498119
), W_BA_min = c(131, 206, 33, 223, 6, 45), W_BA_max = c(472,
564, 657, 513, 188, 381), W_norm = c(5.73271228619998, 7.9461900926133,
5.10234066090722, 7.02355699765464, 5.09378494746752, 4.81262001531126
), W_norm_sd = c(1.62671218635823, 1.2640804493236, 1.93441939783807,
1.07391043231191, 2.02469218788178, 1.96236658443141), W_norm_min = c(2.53237866910643,
3.98221378500706, 0.637927450996277, 4.31084307794454, 0.344787822572658,
1.36490651299098), W_norm_max = c(9.12429566273463, 10.9027600715727,
12.7005556152895, 9.91687219276031, 10.8033517739433, 11.5562084766569
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
And here is the dput of the data frame containing the flood dates:
structure(list(Station_code = c("Y6042010", "Y6042010", "Y6042010",
"Y6042010", "Y6042010", "Y6042010"), Date = structure(c(12006,
12007, 12016, 12017, 13416, 13488), class = "Date"), Qm3s = c(156,
177, 104, 124, 125, 90.4), Qual = c(5, 5, 5, 5, 5, 5), Year = c(2002,
2002, 2002, 2002, 2006, 2006), Month = c(11, 11, 11, 11, 9, 12
), Station_river = c("Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux",
"Var#Entrevaux", "Var#Entrevaux", "Var#Entrevaux"), River = c("Var",
"Var", "Var", "Var", "Var", "Var"), Mod_inter = c(13.32, 13.32,
13.32, 13.32, 13.32, 13.32), Qm3s_norm = c(11.7117117117117,
13.2882882882883, 7.80780780780781, 9.30930930930931, 9.38438438438438,
6.78678678678679), File_name = c("Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat", "Var#Entrevaux.dat", "Var#Entrevaux.dat",
"Var#Entrevaux.dat"), Station_name = c("#Entrevaux", "#Entrevaux",
"#Entrevaux", "#Entrevaux", "#Entrevaux", "#Entrevaux"), Reach = c("Daluis",
"Daluis", "Daluis", "Daluis", "Daluis", "Daluis"), Restauration_year = c(2009,
2009, 2009, 2009, 2009, 2009), `Area_km[BH]` = c(676, 676, 676,
676, 676, 676), Starting_year = c(1920, 1920, 1920, 1920, 1920,
1920), Ending_year = c("NA", "NA", "NA", "NA", "NA", "NA"), Accuracy = c("good",
"good", "good", "good", "good", "good"), Q2 = c(86, 86, 86, 86,
86, 86), Q5 = c(120, 120, 120, 120, 120, 120), Q10 = c(150, 150,
150, 150, 150, 150), Q20 = c(170, 170, 170, 170, 170, 170), Q50 = c(200,
200, 200, 200, 200, 200), Data_producer = c("DREAL_PACA", "DREAL_PACA",
"DREAL_PACA", "DREAL_PACA", "DREAL_PACA", "DREAL_PACA"), Coord_X_L2e_Z32 = c(959313,
959313, 959313, 959313, 959313, 959313), Coord_Y_L2e_Z32 = c(1893321,
1893321, 1893321, 1893321, 1893321, 1893321), Coord_X_L93 = c(1005748.88,
1005748.88, 1005748.88, 1005748.88, 1005748.88, 1005748.88),
Coord_Y_L93 = c(6324083.97, 6324083.97, 6324083.97, 6324083.97,
6324083.97, 6324083.97), New_FN = c("Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv", "Var#Entrevaux.csv",
"Var#Entrevaux.csv", "Var#Entrevaux.csv"), NA_perc = c(14.92,
14.92, 14.92, 14.92, 14.92, 14.92), Q2_norm = c(6.45645645645646,
6.45645645645646, 6.45645645645646, 6.45645645645646, 6.45645645645646,
6.45645645645646), Q5_norm = c(9.00900900900901, 9.00900900900901,
9.00900900900901, 9.00900900900901, 9.00900900900901, 9.00900900900901
), Q10_norm = c(11.2612612612613, 11.2612612612613, 11.2612612612613,
11.2612612612613, 11.2612612612613, 11.2612612612613), Q20_norm = c(12.7627627627628,
12.7627627627628, 12.7627627627628, 12.7627627627628, 12.7627627627628,
12.7627627627628), Q50_norm = c(15.015015015015, 15.015015015015,
15.015015015015, 15.015015015015, 15.015015015015, 15.015015015015
)), row.names = c(NA, -6L), groups = structure(list(Station_code = "Y6042010",
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
EDIT
Here is an example of what I would like to do on the plot:
This is the code I use currently to do the plot:
ggplot(data = tst_formule[tst_formule$River != "Roya",], aes(x = Year, y = BRI_adi_moy_transect, shape = River, col = Type)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - SD_transect, ymax = BRI_adi_moy_transect + SD_transect), size = 0.7, width = 0.3) +
geom_errorbar(aes(ymin = BRI_adi_moy_transect - Error_reso, ymax = BRI_adi_moy_transect + Error_reso, linetype = "Error due to resolution"), size = 0.3, width = 0.3, colour = "black") +
scale_linetype_manual(name = NULL, values = 2) +
scale_shape_manual(values = c(15, 18, 17, 16)) +
scale_colour_manual(values = c("chocolate1", "darkcyan")) +
new_scale("linetype") +
geom_vline(aes(xintercept = Restoration_year, linetype = "Restoration"), colour = "chocolate1") +
scale_linetype_manual(name = NULL, values = 5) +
new_scale("linetype") +
geom_hline(aes(yintercept = 0.004, linetype = "Threshold"), colour= 'black') +
scale_linetype_manual(name = NULL, values = 4) +
scale_y_continuous("BRI*", limits = c(min(tst_formule$BRI_adi_moy_transect - tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect - tst_formule$Error_reso ), max(tst_formule$BRI_adi_moy_transect + tst_formule$SD_transect, tst_formule$BRI_adi_moy_transect + tst_formule$Error_reso))) +
scale_x_continuous(limits = c(min(tst_formule$Year - 1),max(tst_formule$Year + 1)), breaks = scales::breaks_pretty(n = 6)) +
theme_bw() +
facet_wrap(vars(River)) +
theme(legend.spacing.y = unit(-0.01, "cm")) +
guides(shape = guide_legend(order = 1),
colour = guide_legend(order = 2),
line = guide_legend(order = 3))
After some tests and more research, I managed to do it by passing the second data frame to geom_segment():
new_scale("linetype") +
geom_segment(data = Flood_plot, aes(x = Date, xend = Date, y = 0.025, yend = 0.020, linetype = "Morphogenic flood"), arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE) +
scale_linetype_manual(name = NULL, values = 1, guide = guide_legend(order = 6)) +
new_scale() starts a new linetype scale after the ones I had already defined. geom_segment() draws the arrows I wanted (the same approach also works with geom_text()), and scale_linetype_manual() puts the arrow in the legend without the title "linetype" above it. The second data frame has the same column (River) as the first one, so the arrows are placed in the right facet panels.
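One thing to watch: the main plot puts the numeric Year on the x axis, while the flood data frame stores a Date. If the arrows do not show up, a possible fix is to convert the dates to decimal years first. This is only a sketch under that assumption: the four rows are a subset of the dput above, and Year_dec is a column name invented for this example.

```r
# Hypothetical subset of the flood data frame from the question
Flood_plot <- data.frame(
  River = "Var",
  Date  = as.Date(c(12006, 12007, 13416, 13488), origin = "1970-01-01")
)

# Calendar year and day of the year for each flood
yr  <- as.numeric(format(Flood_plot$Date, "%Y"))
doy <- as.numeric(format(Flood_plot$Date, "%j"))

# Number of days in each year (366 in leap years)
leap <- (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0
days_in_year <- ifelse(leap, 366, 365)

# Decimal year: the year plus the fraction of it already elapsed
Flood_plot$Year_dec <- yr + (doy - 1) / days_in_year
```

In geom_segment() you would then map aes(x = Year_dec, xend = Year_dec, ...) instead of Date, so the arrows share the numeric scale of the main layers.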
For some of you this could be an easy question.
I have 2 data frames:
dput(head(Activitieslessthan35))
structure(list(`Main job: Working time in main job` = c(470,
440, 430, 430, 410, 150), Sleep = c(420, 450, 450, 420, 450,
460), `Unspecified TV video or DVD watching` = c(60, 40, 210,
190, 60, 0), Eating = c(80, 60, 40, 70, 60, 130), `Other personal care:Wash and dress` = c(60,
60, 50, 50, 70, 50), `Travel to work from home and back only` = c(60,
60, 50, 90, 90, 30), `Unspecified radio listening` = c(140, 180,
50, 90, 140, 160), `Other specified social life` = c(350, 270,
310, 330, 710, 440), `Socialising with family` = c(350, 270,
360, 330, 730, 540), `Food preparation and baking` = c(410, 310,
420, 380, 1000, 950)), row.names = c(NA, 6L), class = "data.frame")
and
dput(head(ActivitiesMoreOrEqual35))
structure(list(`Main job: Working time in main job` = c(360,
420, 390, 490, 540, 390), Sleep = c(590, 480, 310, 560, 280,
370), `Unspecified TV video or DVD watching` = c(100, 60, 130,
120, 60, 30), Eating = c(70, 100, 70, 40, 190, 80), `Other personal care:Wash and dress` = c(10,
30, 100, 60, 270, 90), `Travel to work from home and back only` = c(0,
50, 260, 50, 0, 0), `Unspecified radio listening` = c(50, 80,
260, 80, 210, 200), `Other specified social life` = c(190, 320,
790, 250, 580, 420), `Travel in the course of work` = c(50, 80,
260, 70, 120, 200), `Food preparation and baking` = c(440, 570,
820, 570, 820, 590)), row.names = c(NA, 6L), class = "data.frame")
I would like to convert the data frames into factors: for example, a factor variable called Activitieslessthan35 whose levels are the columns of the data frame, such as 'Main job: Working time in main job', 'Sleep', etc. Later I would also like to plot the sum for each level of the factors in a side-by-side bar plot.
I don't know whether a data.frame can be transformed into a factor variable, or how to change the format of the data frames to create the plot.
Any suggestions are welcome.
If I understand correctly, you want both of your data frames in a long format with two columns: one containing the column names of the data frame and one containing the values. You then want to sum the values for each "factor" in the first column, merge both data frames, and plot them in a single plot. Am I right?
Here is a way to do it. I called the data frame Activitieslessthan35 df and the
data frame ActivitiesMoreOrEqual35 df2.
First, we reshape each of your data frames to a long format using pivot_longer:
library(tidyr)
library(dplyr)
df <- df %>% pivot_longer(everything(), names_to = "Activities", values_to = "Values_less_than35")
df2 <- df2 %>% pivot_longer(everything(),names_to = "Activities", values_to = "Values_More_than_35")
Then, we calculate the sum for each activity in each of your data frames:
df_sum = df%>% group_by(Activities) %>% summarise(Values_less_than35 = sum(Values_less_than35))
df2_sum = df2 %>% group_by(Activities) %>% summarise(Values_More_than_35 = sum(Values_More_than_35))
Then, we merge both data frames into a single one, using "Activities" as the merging column:
final_df = merge(df_sum,df2_sum, by.x = "Activities", by.y = "Activities", all = TRUE)
Finally, we reshape final_df to a long format one last time so that the values are in the right shape for plotting with ggplot2:
final_df <- final_df %>% pivot_longer(., -Activities, names_to = "Variable", values_to = "Value")
And now we can plot your final data frame using ggplot2:
library(ggplot2)
ggplot(final_df, aes(x = stringr::str_wrap(Activities, 15), y = Value, fill = Variable)) +
geom_col(position = position_dodge()) +
coord_flip()+
xlab("")
And you get the following plot:
Does it look like what you were expecting?
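If you prefer to avoid the reshaping packages, the same long table can be built in base R with colSums. This is a minimal sketch on a hypothetical subset of your two data frames (three columns each, values copied from your dput); activities that exist in only one data frame get NA in the other, as with merge(..., all = TRUE).

```r
# Hypothetical subset of the two data frames from the question
Activitieslessthan35 <- data.frame(
  Sleep = c(420, 450, 450, 420, 450, 460),
  Eating = c(80, 60, 40, 70, 60, 130),
  `Socialising with family` = c(350, 270, 360, 330, 730, 540),
  check.names = FALSE)
ActivitiesMoreOrEqual35 <- data.frame(
  Sleep = c(590, 480, 310, 560, 280, 370),
  Eating = c(70, 100, 70, 40, 190, 80),
  `Travel in the course of work` = c(50, 80, 260, 70, 120, 200),
  check.names = FALSE)

# One sum per activity (column) in each data frame
sums_lt <- colSums(Activitieslessthan35)
sums_ge <- colSums(ActivitiesMoreOrEqual35)

# All activities that appear in either data frame
activities <- union(names(sums_lt), names(sums_ge))

# Long table: one row per activity and group; missing activities become NA
final_df <- data.frame(
  Activities = rep(activities, times = 2),
  Variable = rep(c("Values_less_than35", "Values_More_than_35"),
                 each = length(activities)),
  Value = c(unname(sums_lt[activities]), unname(sums_ge[activities])))
```

final_df has the same three columns as in the steps above, so the ggplot call works on it unchanged.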
I want to create a ggvis plot which has lines and SOME points. That is, I want to plot say three lines and a single point on each. I've tried:
data %>%
ggvis(x = ~x, y = ~value, stroke = ~variable) %>%
group_by(variable) %>% layer_lines() %>%
layer_points(x = points[,1], y = points[,2])
where points is a dataframe with the points I want plotted, but this doesn't work.
Any ideas?
Thanks
Here's sample data:
structure(list(x = c(0, 100, 200, 300, 400, 500, 600, 700, 800,
900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900,
2000), total = c(0, 12561.8501420367, 22436.5635024327, 31648.8752584019,
39817.0210051915, 46555.4211041481, 51777.569893653, 56035.9877883034,
59440.4188225383, 62168.6432420093, 64700.5282644047, 67507.9276458711,
69915.0080895388, 71993.1498628426, 73801.8840967967, 75218.7298009381,
76055.2047330129, 76752.2339780277, 77351.4962240531, 77893.1254687784,
78434.7547135037)), .Names = c("x", "total"), row.names = c(NA,
-21L), class = "data.frame")
Here are points:
structure(list(V1 = c(46.3220011507472, 193.838339639222, 259.839658412055,
499.999999202025), y = c(6040.39184354985, 16023.4572876059,
26201.4660177994, 46555.4211041481)), .Names = c("V1", "y"), row.names = c("V1",
"V2", "V3", "total_spend"), class = "data.frame")