ggvis: x-axis with whole number scale - r

I want the x-axis of a ggvis plot to show whole numbers only.
MWE:
df <-
structure(list(Factor = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
X = c(15.5133333333333, 14.63, 14.41, 14.1266666666667, 13.1833333333333,
12.9466666666667, 13.6133333333333, 13.55, 13.5333333333333,
11.5566666666667, 11.3066666666667, 11.4566666666667), Y = c(20L,
20L, 20L, 40L, 40L, 40L, 70L, 70L, 70L, 100L, 100L, 100L)), .Names = c("Factor",
"X", "Y"), row.names = c(NA, -12L), class = "data.frame")
library(ggvis)
ggvis(data=df
, x= ~X
, y= ~Y
, fill= ~Factor
, stroke = ~Factor) %>%
arrange(Y) %>%
group_by(Factor) %>%
layer_points(shape=~Factor) %>%
layer_paths(fill := NA) %>%
add_axis('x', orient=c('bottom'), format='####')
One possibility is to use values=seq(from=10, to=16, by=1) in add_axis(), but this approach is not automated.

Setting the format argument to 'd' will display only integer values in the axis label:
library(ggvis)
library(dplyr)
##
ggvis(data=df
, x= ~X
, y= ~Y
, fill= ~Factor
, stroke = ~Factor) %>%
arrange(Y) %>%
group_by(Factor) %>%
layer_points(shape=~Factor) %>%
layer_paths(fill := NA) %>%
add_axis('x', orient=c('bottom'), format='d')
More information on d3 formatting specifications is available on this page, as mentioned in the Common Properties section of this ggvis guide.
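If the tick positions themselves should fall on whole numbers (rather than only the labels being formatted as integers), the values= idea from the question can probably be automated by deriving the breaks from the data. A minimal base R sketch, assuming the integer range that just covers X is what is wanted; int_breaks is a name made up for illustration:

```r
# X values copied from the question's data frame
x <- c(15.5133333333333, 14.63, 14.41, 14.1266666666667, 13.1833333333333,
       12.9466666666667, 13.6133333333333, 13.55, 13.5333333333333,
       11.5566666666667, 11.3066666666667, 11.4566666666667)

# Whole-number breaks spanning the data: round the minimum down and the maximum up
int_breaks <- seq(from = floor(min(x)), to = ceiling(max(x)), by = 1)
print(int_breaks)

# Assumed usage in the ggvis chain (values and format are both add_axis() arguments):
#   add_axis('x', orient = 'bottom', values = int_breaks, format = 'd')
```

Because the breaks are computed from min(x) and max(x), nothing has to be retyped when the data change.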

Related

mistake when using dplyr, trying to plot a variable in proportion of the total

I have a dataset which has the following structure < dput(head(df)) > :
structure(list(type_de_sejour = c("Amb", "Hosp",
"Hosp", "Amb", "Hosp", "Sea"),
specialite = c("ANES", "ANES",
"Autres", "CARD", "CARD", "CARD"
), CA_annee_N = c(2712L, 122180L, 0L, 822615L, 6905494L,
0L), nb_sejours_N = c(8L, 32L, 0L, 1052L, 2776L, 0L), nb_doc_N = c(5L,
8L, 0L, 12L, 15L, 0L), CA_annee_N1 = c(4231L, 78858L, 6587L,
327441L, 6413083L, 0L), nb_sejours_N1 = c(13L, 29L, 2L, 532L,
2819L, 0L), nb_doc_N1 = c(6L, 9L, 1L, 12L, 12L, 0L
), CA_annee_N2 = c(4551L, 27432L, 0L, 208326L, 7465440L,
575L), nb_sejours_N2 = c(15L, 8L, 0L, 463L, 3393L, 1L), nb_doc_N2 = c(6L,
4L, 0L, 11L, 13L, 1L), site = c("FR", "FR", "FR", "FR",
"FR", "FR")), row.names = c(NA, 6L), class = "data.frame")
I am trying to plot a graph showing the percentage each "specialite" (distinguishing per "site", ideally by faceting or doing 2 plots, one per site) represents in the total "nb_sejours_N", after having filtered by type_de_sejour == "Amb".
I have tried the following code :
df %>%
mutate(volume_N == nb_sejours_N,
volume_N1 == nb_sejours_N1,
volume_N2 == nb_sejours_N2)%>%
filter(type_de_sejour == "Amb")%>%
group_by(site) %>%
mutate(proportion_N = volume_N/sum(volume_N, na.rm = TRUE),
proportion_N1 = volume_N1/sum(volume_N1, na.rm = TRUE),
proportion_N2 = volume_N2/sum(volume_N2, na.rm = TRUE))
Unfortunately, it doesn't work, so I can't go any further. I would also like to know whether anyone has efficient code to plot what I'm trying to represent.
I believe the following works:
library(dplyr)
library(tidyr)
library(ggplot2)
# creating plot
p = df %>% filter(type_de_sejour == "Amb") %>%
pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>%
ggplot(aes(fill=name, y=visit, x=name)) + geom_bar(position="stack", stat="identity")
# creating summary of totals for each column
totals = df %>% filter(type_de_sejour == "Amb") %>%
pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>%
group_by(name) %>% summarise(total = sum(visit))
# adding totals on top of bars to plot
p + geom_text(aes(name, total, label = total, fill = NULL), data = totals)
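As for why the original attempt fails: mutate(volume_N == nb_sejours_N) uses the comparison operator == where an assignment with = is needed, so R looks for a volume_N column that does not exist yet. A minimal sketch of the proportion calculation, assuming the share of each specialite within its site is what is wanted; df_demo holds only the relevant columns of the six rows posted in the question, and props and p_prop are names made up for illustration:

```r
library(dplyr)
library(ggplot2)

# Relevant columns of the six rows shown in the question
df_demo <- data.frame(
  type_de_sejour = c("Amb", "Hosp", "Hosp", "Amb", "Hosp", "Sea"),
  specialite     = c("ANES", "ANES", "Autres", "CARD", "CARD", "CARD"),
  nb_sejours_N   = c(8L, 32L, 0L, 1052L, 2776L, 0L),
  site           = c("FR", "FR", "FR", "FR", "FR", "FR")
)

props <- df_demo %>%
  filter(type_de_sejour == "Amb") %>%
  group_by(site) %>%
  # '=' creates the column; '==' would only compare two existing columns
  mutate(proportion_N = nb_sejours_N / sum(nb_sejours_N, na.rm = TRUE)) %>%
  ungroup()

# One panel per site, one bar per specialite
p_prop <- ggplot(props, aes(x = specialite, y = proportion_N)) +
  geom_col() +
  facet_wrap(~ site)
```

The same pattern should extend to nb_sejours_N1 and nb_sejours_N2 by adding one mutate() line per column.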

Can I make a grouped barplot for percentages in R using ggplot?

This seems like a common plot, but I have not been able to work it out. Can I produce a plot that shows the percentage of occurrences of each Blocked lane within each Duration? My data is
data<- structure(list(Lanes.Cleared.Duration = c(48, 55, 20, 38, 22,
32, 52, 21, 39, 14, 69, 13, 14, 13, 25), Blocked.Lanes = c(1L,
2L, 1L, 2L, 5L, 3L, 3L, 1L, 3L, 2L, 2L, 2L, 2L, 3L, 1L), Durations = structure(c(3L,
3L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 4L, 2L, 2L, 2L, 2L), .Label = c("<10",
"<30", "<60", "<90", "<120", ">120"), class = "factor")), row.names = c(NA,
-15L), na.action = structure(c(`17` = 17L, `26` = 26L, `28` = 28L,
`103` = 103L, `146` = 146L, `166` = 166L, `199` = 199L, `327` = 327L,
`368` = 368L, `381` = 381L, `431` = 431L, `454` = 454L, `462` = 462L,
`532` = 532L, `554` = 554L, `703` = 703L, `729` = 729L, `768` = 768L,
`769` = 769L, `785` = 785L, `970` = 970L, `1043` = 1043L, `1047` = 1047L,
`1048` = 1048L, `1081` = 1081L, `1125` = 1125L), class = "omit"), class = "data.frame")
I tried the following code and it gave me Real Duration rather than percentage. Here is my code.
data %>%
ggplot(aes(fill=factor(Blocked.Lanes), y=Lanes.Cleared.Duration, x=Durations)) +
geom_bar(position="dodge", stat="identity")
My result should show the percentage of occurrence of each Blocked lane inside each Duration.
I tried to group by Durations but it did not work.
Not quite elegant, but you can do a tally by duration and blocked lane first, and then do a percentage with grouped duration.
library(dplyr)
library(ggplot2)
df1 <- data %>% group_by(Durations, Blocked.Lanes) %>% tally()
df1 <- df1 %>% ungroup %>% group_by(Durations) %>% mutate(perc = n/sum(n))
ggplot(df1, aes(fill=factor(Blocked.Lanes), y=perc, x=Durations)) +
geom_bar(position="dodge", stat="identity")
You can do:
library(tidyverse)
data %>%
count(Durations, Blocked.Lanes) %>%
group_by(Durations) %>%
mutate(n = prop.table(n) * 100) %>%
ggplot(aes(fill = factor(Blocked.Lanes), y = n, x = Durations)) +
geom_bar(position = "dodge", stat = "identity") +
ylab("Percentage of Blocked Lane") +
guides(fill = guide_legend(title = "Blocked Lane"))
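To check the numbers behind either plot, the same percentages can be computed in base R with table() and prop.table(). This is only a cross-check sketch, not part of the answers above; the two vectors are copied from the question's data, and lv, counts and pct are names made up for illustration:

```r
# Durations and Blocked.Lanes copied from the question's data frame
lv <- c("<10", "<30", "<60", "<90", "<120", ">120")
durations <- factor(lv[c(3, 3, 2, 3, 2, 3, 3, 2, 3, 2, 4, 2, 2, 2, 2)], levels = lv)
blocked <- c(1, 2, 1, 2, 5, 3, 3, 1, 3, 2, 2, 2, 2, 3, 1)

# Count each (Duration, Blocked lane) pair; drop unused Duration levels
# so that empty rows do not turn into NaN when dividing
counts <- table(Durations = droplevels(durations), Blocked.Lanes = blocked)

# margin = 1 divides each row by its row total, i.e. percentages within each Duration
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 1))
```

Each row of pct should sum to 100, which is the property the grouped bars are meant to show.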

Altering thickness of line graph based on counts

Dataframe "id" has the columns year, id, and matriline, where each row is an incident. I wanted to count the number of incidents by matriline per year, so I did:
events.bymatr =
id %>%
group_by(year, matr, .drop = FALSE) %>%
dplyr::summarise(n = n()) %>%
ungroup()
events.bymatr
I plotted a line graph of the number of incidents over time, by matriline.
ggplot(events.bymatr, aes(x=year, y=n, group=matr)) + geom_line(aes(color=matr))
My question is twofold:
Is there a way I could recreate this line graph where the thickness of the lines is determined by how many IDs there were, per matriline? I imagine this would involve reshaping my data above but when I tried to group_by(year,matr,id,.drop=FALSE) my data came out all wonky.
I want to change the color palette so that each color is very distinct - how do I attach a new color palette? I tried using this c25 palette with this code but it makes all my lines disappear.
ggplot(events.bymatr, aes(x=year, y=n, group=matr)) + geom_line(aes(color=c25))
Thanks so much in advance!
Output of "id" (shortened to just the first five rows per column):
> dput(id)
structure(list(date = structure(c(8243, 8243, 8243, 8248, 8947,
class = "Date"), year = c(1992L, 1992L, 1992L, 1992L, 1994L),
event.id = c(8L, 8L, 8L, 10L, 11L), id = structure(c(51L, 55L, 59L,
46L, 51L), .Label = c("J11", "J16", "J17", "J2", "J22"),
class = "factor"), sex = structure(c(1L, 2L, 2L, 1L, 1L),
.Label = c("0", "1"), class = "factor"), age = c(28L, 12L, 6L, 42L,
30L), matr = structure(c(20L, 20L, 20L, 11L, 20L), .Label = c("J2",
"J4", "J7", "J9", "K11"), class = "factor"),
matralive = structure(c(2L, 2L, 2L, 2L, 2L),
.Label = c("0", "1"), class = "factor"), pod = structure(c(3L, 3L,
3L, 3L, 3L), .Label = c("J", "K", "L"), class = "factor")),
row.names = c(NA, -134L), class = c("tbl_df", "tbl", "data.frame"))
Output of events.bymatr:
> dput(events.bymatr)
structure(list(year = c(1992L, 1992L, 1992L, 1992L, 1992L),
matr = structure(c(1L, 2L, 3L, 4L, 5L), .Label = c("J2", "J4",
"J7", "J9", "K11"), class = "factor"), n = c(0L, 0L, 0L, 0L, 0L)),
row.names = c(NA, -380L), class = c("tbl_df", "tbl",
"data.frame"))
As @r2evans noted, it is surprisingly hard to distinguish clearly among more than a handful of colors. I used an example 20-color scale here that does a pretty good job, but even so a few can be tricky to distinguish. Here's an attempt using the storms dataset included with dplyr.
library(dplyr)
library(ggplot2)
storms %>%
group_by(name, year) %>%
summarize(n = n(), .groups = "drop") %>% # = number of rows per name per year
tidyr::complete(name, year = 1975:2015, fill = list(n = 0)) %>%
group_by(name) %>%
mutate(total = sum(n)) %>% # = number of name overall
ungroup() %>%
filter(total %% 12 == 0) %>% # Arbitrary, to reduce scope of data for example
ggplot(aes(year, n, color = name, size = total, group = name)) +
geom_line() +
guides(color = guide_legend(override.aes = list(size = 3))) +
ggthemes::scale_color_tableau(palette = "Tableau 20")
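Applied to the question's own setup, the two parts could look roughly like the sketch below. It assumes line thickness should reflect the number of distinct ids per matriline, and that a custom palette is a character vector of colors. Such a vector is normally supplied through scale_color_manual(values = ...) rather than inside aes(), which expects a mapping to a column. id_demo is toy data invented for illustration (not the poster's data), and pal stands in for the c25 vector:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the "id" data frame (hypothetical values)
id_demo <- data.frame(
  year = c(1992, 1992, 1992, 1993, 1993, 1994),
  id   = c("a1", "a2", "b1", "a1", "b1", "a3"),
  matr = c("A", "A", "B", "A", "B", "A")
)

# Incidents per matriline per year, as in the question
events <- id_demo %>%
  group_by(year, matr) %>%
  summarise(n = n(), .groups = "drop")

# Number of distinct ids per matriline, joined back on for the thickness mapping
ids_per_matr <- id_demo %>%
  group_by(matr) %>%
  summarise(n_ids = n_distinct(id), .groups = "drop")
plot_data <- left_join(events, ids_per_matr, by = "matr")

# Hypothetical palette; a longer vector such as c25 would be passed the same way
pal <- c(A = "dodgerblue2", B = "#E31A1C")

p_lines <- ggplot(plot_data, aes(year, n, group = matr, color = matr, size = n_ids)) +
  geom_line() +                      # recent ggplot2 versions prefer linewidth over size for lines
  scale_color_manual(values = pal)   # the palette is attached via the scale, not via aes()
```

With an unnamed palette vector, scale_color_manual() assigns colors in the order of the factor levels, so the vector needs at least as many entries as there are matrilines.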

How could I plot a continuous line in a bar-line Plotly object in R?

Here is my code that I am using:
library(dplyr)
library(plotly)
DF <- data.frame(
Month = lubridate::as_date(c("2019-Dec-01", "2019-Dec-01","2020-Jan-01","2020-Jan-01","2020-Jan-01", "2020-Feb-01", "2020-Feb-01")),
#Week = c(4L, 5L, 5L, 1L, 1L, 1L, 2L, 1L),
Cat = c("A", "C", "A", "B", "C", "A", "C"),
n = c(38L, 10L, 19L, 20L, 12L, 14L, 20L)
)
DF1 <- data.frame(
Month = lubridate::as_date(c("2019-Dec-01", "2020-Jan-01", "2020-Feb-01")),
n = c(20L, 41L, 9L)
)
plot_ly() %>%
add_bars(data = DF, x = ~Month, y = ~n, type = 'bar', split = ~Cat) %>%
add_lines(data = DF1, x = ~Month, y = ~n, color = I("black")) %>%
layout(barmode = "stack", xaxis = list(type = 'date', tickformat = '%Y-%b', tickangle = 90, nticks = 4))
In the resulting plot, the line starts in mid-December and ends in mid-February.
Is it possible to make the line start at the far left and end at the far right, so that it looks more continuous?
I experimented with the dates in DF and DF1, and it seems possible to achieve what you are looking for by moving the bars (DF) to the middle of each month (the 15th) and placing the line points (DF1) at the beginning of December, the middle of January, and the end of February. I offset the first and last line points by 2 days from the month boundaries so the line looks nicer:
DF <- data.frame(
Month = lubridate::as_date(c("2019-Dec-15", "2019-Dec-15","2020-Jan-15","2020-Jan-15","2020-Jan-15", "2020-Feb-15", "2020-Feb-15")),
#Week = c(4L, 5L, 5L, 1L, 1L, 1L, 2L, 1L),
Cat = c("A", "C", "A", "B", "C", "A", "C"),
n = c(38L, 10L, 19L, 20L, 12L, 14L, 20L)
)
DF1 <- data.frame(
Month = lubridate::as_date(c("2019-Dec-03", "2020-Jan-15", "2020-Feb-27")),
n = c(20L, 41L, 9L)
)
If you want the tick labels to appear correctly (this took me a while to figure out), set them explicitly:
plot_ly() %>%
add_bars(data = DF, x = ~Month, y = ~n, type = 'bar', split = ~Cat) %>%
add_lines(data = DF1, x = ~Month, y = ~n, color = I("black")) %>%
layout(barmode = "stack", xaxis = list(
type = 'date',
ticktext = list("2019-Dec", "2020-Jan", "2020-Feb"),
tickvals = list(as.Date("2019-12-15"),as.Date("2020-01-15"),as.Date("2020-02-15"))
))
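The shifted dates do not have to be typed by hand. A minimal base R sketch that derives them from the month starts, assuming the same 15th-of-month bars and 2-day offsets used above; months, bar_dates, last_day and line_dates are names made up for illustration:

```r
# First day of each month shown in the plot
months <- as.Date(c("2019-12-01", "2020-01-01", "2020-02-01"))

# Bars sit on the 15th of each month
bar_dates <- months + 14

# Last day of the final month: first day of the following month minus one day
last_day <- seq(months[length(months)], by = "month", length.out = 2)[2] - 1

# Line points: mid-month in general, but pulled out towards the plot edges
# at both ends, staying 2 days inside the month boundaries
line_dates <- bar_dates
line_dates[1] <- months[1] + 2
line_dates[length(line_dates)] <- last_day - 2

print(bar_dates)
print(line_dates)
```

bar_dates would then replace the Month values in DF (repeated once per category), and line_dates the Month column of DF1.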

Format column within dplyr chain

I have this data set:
dat <-
structure(list(date = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L), .Label = c("3/31/2014", "4/1/2014", "4/2/2014",
"4/3/2014"), class = "factor"), site = structure(c(1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor"),
clicks = c(73L, 64L, 80L, 58L, 58L, 61L, 70L, 60L, 84L, 65L,
77L), impressions = c(55817L, 78027L, 77017L, 68797L, 92437L,
94259L, 88418L, 55420L, 69866L, 86767L, 92088L)), .Names = c("date",
"site", "clicks", "impressions"), class = "data.frame", row.names = c(NA,
-11L))
dat
date site clicks impressions
1 3/31/2014 a 73 55817
2 3/31/2014 b 64 78027
3 3/31/2014 a 80 77017
4 4/1/2014 b 58 68797
...
Is it possible to include the date formatting of one column within the chain? (I've also tried using with, but that only returns the date column.)
library(dplyr)
> dat %.%
+ select(date, clicks, impressions) %.%
+ group_by(date) %.%
+ summarise(clicks = sum(clicks),
+ impressions = sum(impressions)) %.%
+ as.Date(Date, format = '%m/%d/%Y')
Error in as.Date.default(`__prev`, Date, format = "%m/%d/%Y") :
do not know how to convert '__prev' to class “Date”
If I don't include the formatting within the chain, it works. I know it's simple to write this outside of the chain, but I would like to confirm if this is doable.
dat %.%
select(date, clicks, impressions) %.%
group_by(date) %.%
summarise(clicks = sum(clicks),
impressions = sum(impressions))
dat$date <- as.Date(dat$date, format = '%m/%d/%Y')
Is this what you want?
dat %>%
select(date, clicks, impressions) %>%
group_by(date) %>%
summarise(clicks = sum(clicks),
impressions = sum(impressions)) %>%
mutate(date = as.Date(date, format = '%m/%d/%Y'))
Sometimes the Error: cannot modify grouping variable message comes when you're trying to run group_by() operations on something that has already been grouped. You might try including ungroup first. In the syntax of Robert's answer:
dat %>%
ungroup %>%
select(date, clicks, impressions) %>%
group_by(date) %>%
summarize(clicks = sum(clicks),
impressions = sum(impressions)) %>%
mutate(date = as.Date(date, format = "%m/%d/%Y"))
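The reason the original chain fails, as far as I can tell, is that the pipe passes the whole summarised data frame as the first argument of as.Date(), which only knows how to convert vectors. Wrapping the conversion in mutate() applies it to the single column instead. A small sketch using the first four rows of dat from the question; dat_demo and daily are names made up for illustration, and the final arrange() is an optional extra:

```r
library(dplyr)

# First four rows of the question's data (site column omitted)
dat_demo <- data.frame(
  date        = c("3/31/2014", "3/31/2014", "3/31/2014", "4/1/2014"),
  clicks      = c(73L, 64L, 80L, 58L),
  impressions = c(55817L, 78027L, 77017L, 68797L)
)

daily <- dat_demo %>%
  group_by(date) %>%
  summarise(clicks = sum(clicks),
            impressions = sum(impressions)) %>%
  # mutate() hands as.Date() just the date column, not the whole data frame
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  # once the column is a real Date, sorting is chronological rather than alphabetical
  arrange(date)

print(daily)
```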
