I really like the possibilities this package offers and would like to use it in a Shiny app. However, I am struggling to recreate a ggplot2 plot in echarts4r.
library(tidyverse)
library(echarts4r)
data = tibble(time = factor(sort(rep(c(4,8,24), 30)), levels = c(4,8,24)),
              dose = factor(rep(c(1,2,3), 30), levels = c(1,2,3)),
              id = rep(sort(rep(LETTERS[1:10], 3)),3),
              y = rnorm(n = 90, mean = 5, sd = 3))
This is the plot I am aiming to recreate:
ggplot(data = data, mapping = aes(x = time, y = y, group = id)) +
  geom_point() +
  geom_line() +
  facet_wrap(~dose)
The problem I am having is reproducing in echarts4r the grouping that group = id gives me in ggplot syntax. I am trying to use e_facet on data grouped with group_by(), but I cannot (or don't know how to) add a second grouping that connects the dots per id, the way geom_line() does.
data %>%
  group_by(dose) %>%
  e_charts(time) %>%
  e_line(y) %>%
  e_facet(rows = 1, cols = 3)
You can do this with echarts4r.
There are two methods that I know of that work, one uses e_list. I think that method would make this more complicated than it needs to be, though.
It might be useful to know that e_facet, e_arrange, and e_grid all fall under echarts grid functionality—you know, sort of like everything that ggplot2 does falls under base R's grid.
I used group_split from dplyr and imap from purrr to create the faceted graph. You'll notice that I didn't use e_facet due to its constraints.
group_split is interchangeable with base R's split and either could have been used.
I used imap so I could map over the groups and have the benefit of using an index. If you're familiar with the use of enumerate in a Python for statement or a forEach in JavaScript, this sort of works the same way. In the map call, j is a data frame; k is an index value. I appended the additional arguments needed for e_arrange, then made the plot.
library(tidyverse) # has both dplyr and purrrrrr (how many r's?)
library(echarts4r)
data %>% group_split(dose) %>%
  imap(function(j, k) {
    j %>% group_by(id) %>%
      e_charts(time, name = paste0("chart_", k)) %>%
      e_line(y, name = paste0("Dose ", k)) %>%
      e_color(color = "black")
  }) %>% append(c(rows = 1, cols = 3)) %>%
  do.call(e_arrange, .)
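To see what imap hands to the function, here is a minimal standalone sketch. The toy list groups is made up for illustration; it stands in for the unnamed list that group_split returns, in which case the second argument is the integer position.

```r
library(purrr)

# Toy stand-in for the list of data frames that group_split() returns
groups <- list(data.frame(y = 1:2), data.frame(y = 3:5))

# j is the list element (a data frame), k is its position in the list
labels <- imap_chr(groups, function(j, k) {
  paste0("chart_", k, ": ", nrow(j), " rows")
})

print(labels)
```

If the list had names, k would be the name instead of the position, so the paste0("Dose ", k) label in the answer relies on the list being unnamed.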
While exploring with ggplot a common workflow is to do some data manipulation and then pipe directly into a ggplot(). When you do that, all of that manipulation flows nicely through the ggplot into the geoms with no variables needed. Like so:
data %>%
  filter(route %in% c('01','08','15')) %>%
  ggplot() +
  geom_sf() +
  geom_sf_text(aes(label=route))
If, however, you want to use ggmap() for a nice background, there doesn't appear to be a way to use the piped workflow. You have to save the manipulations to a variable first, which isn't a huge deal, but I would love to know if there's a way to avoid it.
This doesn't work:
background = get_stamenmap(....) # somewhat irrelevant to the question
data %>%
  filter(route %in% c('01','08','15')) %>%
  ggmap(background) +
  geom_sf() +
  geom_sf_text(aes(label=route))
I thought that forcing the data into ggmap() might help, but it doesn't:
background = get_stamenmap(....) # somewhat irrelevant to the question
data %>%
  filter(route %in% c('01','08','15')) %>%
  ggmap(background, data=.) +
  geom_sf() +
  geom_sf_text(aes(label=route))
Or maybe there's some other way to combine ggplot() and ggmap() to accomplish it? I know I can save off the manipulated data as a variable and then hard-code that into each geom_sf() layer, but it's just not as convenient and thought I might be missing something simple.
The magrittr::%>% infix operator expects to pass the data as the first argument of the first expression in a +-chain as you have here. Unfortunately, you want to pass it to one of the not-first expressions. You can use a {-block.
library(ggmap)
library(ggplot2)
library(dplyr) # for %>%, could do magrittr as well
### from ?get_stamenmap
bbox <- c(left = -97.1268, bottom = 31.536245, right = -97.099334, top = 31.559652)
background <- get_stamenmap(bbox, zoom = 14)
### from my brain
set.seed(42)
dat <- data.frame(x=runif(4, bbox[1], bbox[3]), y=runif(4, bbox[2], bbox[4]), lbl = sample(LETTERS, 4))
dat
# x y lbl
# 1 -97.10167 31.55127 Q
# 2 -97.10106 31.54840 O
# 3 -97.11894 31.55349 X
# 4 -97.10399 31.53940 G
dat %>% {
  ggmap(background) +
    geom_point(aes(x, y), data = .) +
    geom_text(aes(x, y, label = lbl), data = ., color = "red",
              hjust = 0, vjust = 0)
}
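The same brace behaviour can be checked without any map tiles. This is only a small sketch with a made-up vector x: inside the braces the dot can be used as often as needed, and magrittr does not also insert it as a first argument.

```r
library(magrittr)

x <- c(1, 4, 9)

# Inside { }, the dot is available anywhere in the expression,
# and it is NOT additionally passed as the first argument
res <- x %>% { sqrt(.) + length(.) }

print(res)
```

That is why the data can be handed to geom_point() and geom_text() above while ggmap() only receives background.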
I have a large dataset with 30 different variables. I want to investigate some characteristics of each variable by making a histogram for each variable.
For example, for my variable A this now looks like:
hist = qplot(A, data = full_data_noNO, geom = "histogram",
             binwidth = 50, fill = I("lightblue")) +
  theme_light()
Now I want to do this for all my variables. Does anyone know how I can loop through the names of all the variables in my data frame (so A should change on each iteration)?
Also, I want to loop through all variables in this code for the same purpose:
avg_price = full_data_noNO %>%
  group_by(Month, Country) %>%
  dplyr::summarize(total = mean(A, na.rm = TRUE))
You could reference your variables by column number:
histograms = list()
for (i in seq_len(ncol(full_data_noNO))) {
  # [[i]] extracts the column as a plain vector for data frames and tibbles alike
  histograms[[i]] = qplot(full_data_noNO[[i]], geom = "histogram",
                          binwidth = 50, fill = I("lightblue")) +
    theme_light()
}
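If you would rather loop over the column names than the positions, one possible sketch uses the .data pronoun inside aes(). The toy data frame and its columns A and B are invented here as a stand-in for full_data_noNO, and ggplot() is used in place of qplot().

```r
library(ggplot2)

# Toy stand-in for full_data_noNO (values are made up)
toy <- data.frame(A = c(100, 150, 400), B = c(20, 250, 300))

vars <- names(toy)

# .data[[v]] looks up the column whose name is stored in the string v
plots <- lapply(vars, function(v) {
  ggplot(toy, aes(x = .data[[v]])) +
    geom_histogram(binwidth = 50, fill = "lightblue") +
    theme_light() +
    labs(x = v)
})
names(plots) <- vars
```

Each plot can then be pulled out by name, for example plots[["A"]].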
If all your variables are numeric, then you can do the following to produce a list of all plots, which you can then explore one by one with list indexing:
library(tidyverse)
list_of_plots <-
  full_data_noNO %>%
  map(~ qplot(x = ., geom = "histogram"))
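The second part of the question (the grouped mean for every variable) is not covered above. A possible sketch, assuming dplyr 1.0 or later and numeric columns, uses across(); the toy data frame below is made up and only stands in for full_data_noNO.

```r
library(dplyr)

# Toy stand-in for full_data_noNO (values are made up)
toy <- data.frame(
  Month   = c(1, 1, 2, 2),
  Country = c("NL", "NL", "NL", "NL"),
  A       = c(10, 20, NA, 40),
  B       = c(1, 3, 5, 7)
)

# across() applies the same summary to every selected column;
# grouping columns are left out of the selection automatically
avg_all <- toy %>%
  group_by(Month, Country) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(avg_all)
```

This gives one row per Month/Country combination and one mean column per variable, instead of a separate summarise call for each.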
I have often wondered if you can get ggplot to do on-the-fly calculations by the facet groups of the plot in a similar way that they would be done using dplyr::group_by. So in the example below is it possible to calculate the cumsum for each different category, rather than the overall cumsum without altering df first?
library(ggplot2)
df <- data.frame(X = rep(1:20,2), Y = runif(40), category = rep(c("A","B"), each = 20))
ggplot(df, aes(x = X, y = cumsum(Y), colour = category))+geom_line()
I can obviously do an easy workaround using dplyr, however as I do this frequently I was keen to know if there is a way to prevent having to specify the grouping variables multiple times (here in group_by and aes(colour = …).
Working alternative, but not what I'm asking for in this case
library(dplyr)
library(ggplot2)
df %>% group_by(category) %>% mutate(Ysum = cumsum(Y)) %>%
  ggplot(aes(x = X, y = Ysum, colour = category)) + geom_line()
Edit: (To answer in response to the #42- comment) I am mainly asking out of curiosity if this is possible, not because the alternative doesn't work. I also think it would be neater in my code if I am making a number of plots which are summing (or other similar calculations) different variables based on different columns or in different datasets, rather than continuously having to group, mutate then plot. I could write a function to do it for me, but I thought it might be inbuilt functionality that I am missing (the ggplot help doesn't go into the real details).
I have added stat_apply_group() and stat_apply_panel() to the development version of my package 'ggpmisc'. It will take some time before this update makes it to CRAN as the previous update has just been accepted.
For the time being 'ggpmisc' should be installed from Bitbucket for the new stats to be available.
devtools::install_bitbucket("aphalo/ggpmisc", ref = "no-debug")
Then this solves the question:
library(ggplot2)
library(ggpmisc)
set.seed(123456)
df <- data.frame(X = rep(1:20,2),
                 Y = runif(40),
                 category = rep(c("A","B"), each = 20))
ggplot(df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = cumsum)
Applying cumsum() within the ggplot code instead of using a 'dplyr' "pipe" as in the second example saves us from having to specify the grouping twice.
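To make concrete what "per group" means here, a small base R sketch with made-up numbers compares the overall cumsum from the question with a per-category one. It uses ave() purely for illustration and is not how stat_apply_group() is implemented.

```r
# Toy data (values are made up so the sums are easy to follow)
df <- data.frame(
  X = rep(1:3, 2),
  Y = c(1, 2, 3, 10, 20, 30),
  category = rep(c("A", "B"), each = 3)
)

# Overall cumsum: keeps accumulating across the category boundary
overall <- cumsum(df$Y)

# Per-group cumsum: restarts at the first row of each category
by_group <- ave(df$Y, df$category, FUN = cumsum)

print(data.frame(df, overall, by_group))
```

The plot in the question draws the overall version, whereas the per-group version is what the stat (or the group_by workaround) draws.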
Let's say two different raters are evaluating behavioral problems. They use the same scale (from 0 to 50) and the child being evaluated is the same for both raters. In social sciences, this method is common and there are some useful statistics, such as correlation coefficient and Intra-Class Correlation.
In addition, one graph that comes to mind is the scatter plot: on the x-axis I'll plot the behavioral problem scores from the first rater, and on the y-axis the scores from the second rater.
ggplot2 creates amazing plots; however, some simple routines and actions become really difficult to do.
Please see the code below and the r base plot. I would like to know if ggplot can create this plot as well.
Thanks much
set.seed(123)
ds <- data.frame(behavior_problems = rnorm(100,30,2), evaluator=sample(1:2))
plot(ds$behavior_problems[ds$evaluator == '1'] ,
y = ds$behavior_problems[ds$evaluator == '2'])
== I had to edit to make clear why a scatter-plot would be informative==
I think the key problem here is the way in which you have set up the data frame. It is not clear that each individual gets a pair of scores, one from each evaluator. So the first thing to do is add an ID for each individual: 50 IDs in your example data.
library(tidyverse)
ds %>%
  mutate(id = rep(1:50, each = 2))
Now we can use tidyr::spread to create a column for each evaluator. But numbers for column names are not a great idea, so we'll rename them to e1 and e2.
ds %>%
  mutate(id = rep(1:50, each = 2)) %>%
  spread(evaluator, behavior_problems) %>%
  rename(e1 = `1`, e2 = `2`)
Now we have column names that can be supplied to ggplot:
ds %>%
  mutate(id = rep(1:50, each = 2)) %>%
  spread(evaluator, behavior_problems) %>%
  rename(e1 = `1`, e2 = `2`) %>%
  ggplot(aes(e1, e2)) +
  geom_point()
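In newer versions of tidyr (1.0 and later), the spread and rename steps can be combined with pivot_wider(). This is only a sketch on a made-up three-child data frame, not the ds from the question.

```r
library(tidyr)

# Toy long-format data: one row per child and evaluator (values are made up)
long <- data.frame(
  id = rep(1:3, each = 2),
  evaluator = rep(1:2, times = 3),
  behavior_problems = c(30, 31, 28, 29, 32, 27)
)

# names_prefix builds the e1/e2 column names directly,
# so no separate rename() step is needed
wide <- pivot_wider(long,
                    names_from = evaluator,
                    values_from = behavior_problems,
                    names_prefix = "e")

print(wide)
```

The resulting e1 and e2 columns can be passed to ggplot(aes(e1, e2)) exactly as above.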
If this seems like a "long way around", it's because ggplot2 works better with "long" data (before the spread) than "wide" (after the spread). To illustrate, here's another way to visualize the difference in scores by individual, which I think works quite well:
ds %>%
  mutate(id = rep(1:50, each = 2),
         evaluator = factor(evaluator)) %>%
  ggplot(aes(id, behavior_problems)) +
  geom_point(aes(color = evaluator)) +
  geom_line(aes(group = id))
One pattern I do a lot is to facet plots on cuts of numeric values. facet_wrap in ggplot2 doesn't allow you to call a function from within, so you have to create a temporary factor variable. This is okay using mutate from dplyr. The advantage of this is that you can play around doing EDA and varying the number of quantiles, or changing to set cut points etc. and view the changes in one line. The downside is that the facets are only labelled by the factor level; you have to know, for example, that it's a temperature. This isn't too bad for yourself, but even I get confused if I'm doing a facet_grid on two such variables and have to remember which is which. So, it's really nice to be able to relabel the facets by including a meaningful name.
The key point of this problem is that the levels will change as you change the number of quantiles etc.; you don't know what they are in advance. You could use the base levels() function, but that means augmenting the data frame with the cut variable, then calling levels(), then passing this augmented data frame to ggplot().
So, using plyr::mapvalues, we can wrap all this into a dplyr::mutate, but the required arguments for mapvalues() make it quite clunky. Having to retype "Temp.f" many times is not very "dplyr"!
Is there a neater way of renaming such factor levels "on the fly"? I hope this description is clear enough and the code example below helps.
library(ggplot2)
library(plyr)
library(dplyr)
library(Hmisc)
df <- data.frame(Temp = seq(-100, 100, length.out = 1000), y = rnorm(1000))
# facet_wrap doesn't allow functions so have to create new, temporary factor
# variable Temp.f
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
# fine, but facet headers aren't very clear,
# we want to highlight that they are temperature
ggplot(df %>% mutate(Temp.f = paste0("Temp: ", cut2(Temp, g = 4)))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
# use of paste0 is undesirable because it creates a character vector and
# facet_wrap then recodes the levels in the wrong numerical order
# This has the desired effect, but is very long!
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4), Temp.f = mapvalues(Temp.f, levels(Temp.f), paste0("Temp: ", levels(Temp.f))))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f)
I think you can do this from within facet_wrap using a custom labeller function, like so:
myLabeller <- function(x){
  lapply(x, function(y){
    paste("Temp:", y)
  })
}
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) +
  geom_histogram(aes(x = y)) +
  facet_wrap(~Temp.f, labeller = myLabeller)
That labeller is clunky, but it is at least an example. You could write one for each variable that you are going to use (e.g. tempLabeller, yLabeller, etc.).
A slight tweak makes this even better: it automatically uses the name of the thing you are facetting on:
betterLabeller <- function(x){
  lapply(names(x), function(y){
    paste0(y, ": ", x[[y]])
  })
}
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) +
  geom_histogram(aes(x = y)) +
  facet_wrap(~Temp.f, labeller = betterLabeller)
Okay, with thanks to Mark Peterson for pointing me towards the labeller argument/function, the exact answer I'm happy with is:
ggplot(df %>% mutate(Temp.f = cut2(Temp, g = 4))) + geom_histogram(aes(x = y)) + facet_wrap(~Temp.f, labeller = labeller(Temp.f = label_both))
I'm a fan of lazy and "label_both" means I can simply create a meaningful temporary (or overwrite the original) variable column and both the name and the value are given. Rolling your own labeller function is more powerful, but using label_both is a good, easy option.
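If you do want the prefix baked into the factor itself rather than added by a labeller, a short base R helper avoids the mapvalues() call. prefix_levels is a hypothetical name invented for this sketch, and base cut() with fixed breaks stands in for Hmisc::cut2().

```r
# Hypothetical helper: prefix every level while keeping the level order intact
prefix_levels <- function(f, prefix) {
  levels(f) <- paste0(prefix, levels(f))
  f
}

temp <- c(-50, 5, 50, 150)

# cut() with explicit breaks stands in for cut2(Temp, g = 4)
f <- cut(temp, breaks = c(-100, 0, 100, 200))

relabelled <- prefix_levels(f, "Temp: ")

print(levels(relabelled))
```

Because only the level labels change, the facets keep their numeric order, unlike the paste0() version that goes through a character vector. Inside mutate it would read Temp.f = prefix_levels(cut2(Temp, g = 4), "Temp: ").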