plot multiple variables by group [r]


I want to produce several plots, each showing the observations of one variable for different time periods as a function of distance.
A short example of my df:
year <- c("2018","2018","2018","2018","2019","2019","2019","2019")
polutatnt <- c("NO2","NO2","SO2","SO2","NO2","NO2","SO2","SO2")
radius <- c("500m", "1000m","500m", "1000m","500m", "1000m","500m", "1000m")
value <- c(0.5,0.8,0.1,-0.2,0.3,-0.6,0.2,-0.2)
df <- data.frame(year,polutatnt,radius,value)
I would like one plot per pollutant, with one line per year as a function of distance. I tried this code, but I get a warning and empty plots:
ggplot(df, aes(radius, value, col = year)) +
geom_line() + facet_grid(polutatnt ~.)
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?

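Based on your description, this is what you want. Because radius is a discrete variable, ggplot2 by default groups the data by every combination of the discrete variables in the plot, so each group held a single observation and no line could be drawn. Setting group = year in geom_line() fixes that:
[EDIT] All of the blue points are now linked, and so are all of the red points.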
ggplot(df, aes(radius, value, color = year, group=polutatnt, shape=year)) +
geom_point(size=3) + geom_line(aes(group = year)) + facet_grid(polutatnt ~.)
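One detail the answer does not address: radius is a character column, so ggplot2 orders it alphabetically and "1000m" lands to the left of "500m". A minimal sketch of one way to get the distances in increasing order, assuming the two radius values from the question are the only ones (the factor levels below are written by hand, which is an assumption about the full data):

```r
library(ggplot2)

year <- c("2018","2018","2018","2018","2019","2019","2019","2019")
polutatnt <- c("NO2","NO2","SO2","SO2","NO2","NO2","SO2","SO2")
radius <- c("500m", "1000m","500m", "1000m","500m", "1000m","500m", "1000m")
value <- c(0.5,0.8,0.1,-0.2,0.3,-0.6,0.2,-0.2)
df <- data.frame(year, polutatnt, radius, value)

# Order the radius levels by distance instead of alphabetically.
df$radius <- factor(df$radius, levels = c("500m", "1000m"))

# One line per year within each pollutant panel.
p <- ggplot(df, aes(radius, value, colour = year, shape = year, group = year)) +
  geom_point(size = 3) +
  geom_line() +
  facet_grid(polutatnt ~ .)
p
```

With more radius values, the levels could be sorted by their numeric part rather than typed out.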

Related

data points misaligned when using a third value with position jitterdodge

Edited with sample data:
When I try to plot a grouped boxplot together with jittered points using position=position_jitterdodge(), and add an additional group indicated by e.g. shape, I end up with a graph where the jittered points are misaligned within the individual groups:
n <- 16
data <- data.frame(
age = factor(rep(c('young', 'old'), each=8)),
group=rep(LETTERS[1:2], n/2),
yval=rnorm(n)
)
ggplot(data, aes(x=group, y=yval))+
geom_boxplot(aes(color=group), outlier.shape = NA)+
geom_point(aes(color=group, shape=age, fill=group),size = 1.5, position=position_jitterdodge())+
scale_shape_manual(values = c(21,24))+
scale_color_manual(values=c("black", "#015393"))+
scale_fill_manual(values=c("white", "#015393"))+
theme_classic()
Is there a way to suppress that additional separation?
Thank you!

OP, I think I get what you are trying to explain. The points are being grouped according to age as well, rather than being treated as one set per group. The reason is that you have not specified what to group together. To jitter the points, ggplot2 first groups them according to some aesthetic and then applies the jitter. If you don't specify the grouping, ggplot2 makes a guess as to how you want the points grouped.
In this case, it is grouping according to age and group, since both are defined to be used in the aesthetics (x=, fill=, and color= are assigned to group and shape= is assigned to age).
To define that you only want to group the points by the column group, you can use the group= aesthetic modifier. (reposting your data with a seed so you see the same thing)
set.seed(8675309)
n <- 16
data <- data.frame(
age = factor(rep(c('young', 'old'), each=8)),
group=rep(LETTERS[1:2], n/2),
yval=rnorm(n)
)
ggplot(data, aes(x=group, y=yval))+
geom_boxplot(aes(color=group), outlier.shape = NA)+
geom_point(aes(color=group, shape=age, fill=group, group=group),size = 1.5, position=position_jitterdodge())+
scale_shape_manual(values = c(21,24))+
scale_color_manual(values=c("black", "#015393"))+
scale_fill_manual(values=c("white", "#015393"))+
theme_classic()
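To see where the extra separation comes from, the default grouping can be imitated in base R. This is only an illustrative sketch: it mimics the documented rule that the default group is the interaction of all discrete variables mapped in a layer, and it does not call ggplot2 internals.

```r
n <- 16
data <- data.frame(
  age = factor(rep(c("young", "old"), each = 8)),
  group = rep(LETTERS[1:2], n / 2)
)

# Point layer without group=: both group and age are mapped to discrete
# aesthetics, so the points fall into group x age combinations.
default_groups <- interaction(data$group, data$age, drop = TRUE)
nlevels(default_groups)

# Point layer with group = group: only the group column defines the groups,
# which matches the boxplot layer.
explicit_groups <- factor(data$group)
nlevels(explicit_groups)
```

The first count is larger than the second, which is why the dodge splits each box position into extra columns of points until group= is set.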

Draw line between points with groups in ggplot

I have a time series in which each point has a time, a value and a group it belongs to. I am trying to plot it with time on the x axis and value on the y axis, with the line changing color depending on the group.
I tried using geom_path and geom_line, but they end up linking points to points within groups. I found out that when I use a continuous variable for the groups, I have a normal line; however when I use a factor or a categorical variable, I have the link problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but 1/ it takes a few extra lines to change the legend to the labels I want, and 2/ out of curiosity I would like to know if I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?

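As pointed out by Richard Telford and Carles Sans Fuentes, adding group = 1 to the ggplot aesthetics does the job. It overrides the default grouping by the discrete color variable, so all points are joined in a single line whose segments are colored by group. The code should be: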
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

ggplot2: add RMSE values of two models to each facet

Using this data.frame
DATA
#import_data
df <- read.csv(url("https://www.dropbox.com/s/1fdi26qy4ozs4xq/df_RMSE.csv?raw=1"))
and this script
library(ggplot2)
ggplot(df, aes( measured, simulated, col = indep_cumulative))+
geom_point()+
geom_smooth(method ="lm", se = F)+
facet_grid(drain~scenario)
I got this plot
I want to add RMSE for each of the two models (independent and accumulative; two values only) to the top left in each facet.
I tried
geom_text(data = df , aes(measured, simulated, label= RMSE))
It resulted in RMSE values being added to each point in the facets.
I will appreciate any help with adding the two RMSE values only to the top left of each facet.

If you want to plot two numbers per facet, you need some data preparation to keep the labels from overlapping.
library(dplyr)
df <- df %>%
mutate(label_vjust=if_else(indep_cumulative == "accumulative",
1, 2.2))
In your question you explicitly told ggplot2 to add label=RMSE at points with x=measured and y=simulated. To add labels at top left corner you could use x=-Inf and y=Inf. So the code will look like this:
ggplot(df, aes(measured, simulated, colour = indep_cumulative)) +
geom_point() +
geom_smooth(method ="lm", se = F) +
geom_text(aes(x=-Inf, y=Inf, label=RMSE, vjust=label_vjust),
hjust=0) +
facet_grid(drain~scenario)
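Note that the code above draws the label once for every row of df, so each RMSE value is printed many times on top of itself. A possible variation is to pass geom_text() a small data frame with one row per facet and model. The sketch below uses invented toy data with the column names from the question, since the original CSV is not reproduced here, and it assumes RMSE is constant within each drain/scenario/model combination.

```r
library(ggplot2)
library(dplyr)

set.seed(1)
# Toy stand-in for the CSV: column names follow the question, values are invented.
df <- expand.grid(
  drain = c("d1", "d2"),
  scenario = c("s1", "s2"),
  indep_cumulative = c("independent", "accumulative"),
  rep = 1:10,
  stringsAsFactors = FALSE
) %>%
  mutate(measured = runif(n()),
         simulated = measured + rnorm(n(), sd = 0.1)) %>%
  group_by(drain, scenario, indep_cumulative) %>%
  mutate(RMSE = round(sqrt(mean((simulated - measured)^2)), 3)) %>%
  ungroup()

# One label row per facet and model instead of one per observation.
labels <- df %>%
  distinct(drain, scenario, indep_cumulative, RMSE) %>%
  mutate(label_vjust = if_else(indep_cumulative == "accumulative", 1, 2.2))

p <- ggplot(df, aes(measured, simulated, colour = indep_cumulative)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_text(data = labels,
            aes(x = -Inf, y = Inf, label = RMSE, vjust = label_vjust),
            hjust = 0, show.legend = FALSE) +
  facet_grid(drain ~ scenario)
p
```

Because labels keeps the drain, scenario and indep_cumulative columns, the text is still routed to the right facet and inherits the right colour.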

setting a unique span parameter for each geom_smooth() on a multi-facet ggplot

I was wondering if it's possible to set a different span parameter for each facet of my ggplot object. I have four sets of related industry data that I would like to compare on a single ggplot object. I would like to modify the span for each geom_smooth() line to more accurately model my data.
library(ggplot2)
library(reshape2)
a=rnorm(50,0,1)
b=rnorm(50,0,3)
ind=1:100
df=data.frame(ind,sort(a),sort(b))
df1=melt(df, id='ind')
t=ggplot(df1, aes(x=ind,y=value, color=variable))+
geom_smooth(color='black', span=.5)+
geom_point(color='black')+
facet_wrap(~variable,ncol=2)
For instance, is it possible to have a span of .5 for the first facet and a span of .8 for the second facet?

You can filter your data and pass only the filtered subset to each geom_smooth (filter and %>% come from dplyr):
library(dplyr)
ggplot(df1, mapping = aes(x=ind, y=value, color=variable)) +
geom_point(color='black') +
geom_smooth(data = df1 %>% filter(variable=='sort.a'), span=0.5, method='loess') +
geom_smooth(data = df1 %>% filter(variable=='sort.b'), span=0.3, method='loess') +
facet_wrap(~variable,ncol=2)

How to specify ggplot2 boxplot fill colour for continuous data?

I want to plot a ggplot2 boxplot using all columns of a data.frame, and I want to reorder the columns by the median for each column, rotate the x-axis labels, and fill each box with the colour corresponding to the same median. I can't figure out how to do the last part. There are plenty of examples where the fill colour corresponds to a factor variable, but I haven't seen a clear example of using a continuous variable to control fill colour. (The reason I'm trying to do this is that the resultant plot will provide context for a force-directed network graph with nodes that will be colour-coded in the same way as the boxplot -- the colour will then provide a mapping between the two plots.) It would be nice if I could re-use the value-to-colour mapping for later plots so that colours are consistent between plots. So, for example, the box corresponding to the column variable with a high median value will have a colour that denotes this mapping and matches perfectly the colour for the same column variable in other plots (such as the corresponding node in a force-directed network graph).
So far, I have something like this:
# Melt the data.frame:
DT.m <- melt(results, id.vars = NULL) # using reshape2
# I can now make a boxplot for every column in the data.frame:
g <- ggplot(DT.m, aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
stat_summary(fun.y=mean, colour="darkred", geom="point") +
geom_boxplot(???, alpha=0.5)
The colour fill information is what I'm stuck on. "value" is a continuous variable in the range [0,1] and there are 55 columns in my data.frame. Various approaches I've tried seem to result in the boxes being split vertically down the middle, and I haven't got any further. Any ideas?

You can do this by adding the median-by-group to your data frame and then mapping the new median variable to the fill aesthetic. Here's an example with the built-in mtcars data frame. By using this same mapping across different plots, you should get the same colors:
library(ggplot2)
library(dplyr)
ggplot(mtcars %>% group_by(carb) %>%
mutate(medMPG = median(mpg)),
aes(x = reorder(carb, mpg, FUN=median), y = mpg)) +
geom_boxplot(aes(fill=medMPG)) +
stat_summary(fun=mean, colour="darkred", geom="point") + # fun replaces fun.y, deprecated since ggplot2 3.3.0
scale_fill_gradient(low=hcl(15,100,75), high=hcl(195,100,75))
If you have various data frames with different ranges of medians, you can still use the method above, but to get a consistent mapping of color to median across all your plots, you'll need to also set the same limits for scale_fill_gradient in each plot. In this example, the median of mpg (by carb grouping) varies from 15.0 to 22.8. But let's say across all my data sets, it varies from 13.3 to 39.8. Then I could add this to all my plots:
scale_fill_gradient(limits=c(13.3, 39.8),
low=hcl(15,100,75), high=hcl(195,100,75))
This is just for illustration. For ease of maintenance if your data might change, you'll want to set the actual limits programmatically.
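A minimal sketch of setting the limits programmatically, using base R to collect the group medians. The datasets list is a hypothetical placeholder for all the data frames you intend to plot (here it holds only mtcars), and it assumes every data frame uses the same mpg and carb column names.

```r
library(ggplot2)

# Hypothetical list of every data set that will be plotted; here just mtcars.
datasets <- list(mtcars)

# Median of mpg within each carb group, collected across all data sets.
all_medians <- unlist(lapply(datasets, function(d) tapply(d$mpg, d$carb, median)))

# Shared limits for the fill scale, derived from the data rather than typed in.
fill_limits <- range(all_medians)

# Reusable scale: add shared_fill to each plot so colours match across plots.
shared_fill <- scale_fill_gradient(limits = fill_limits,
                                   low = hcl(15, 100, 75), high = hcl(195, 100, 75))
```

For mtcars alone this reproduces the 15.0 to 22.8 range mentioned above.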

I built on eipi10's solution and obtained the following code which does what I want:
# "results" is a 55-column data.frame containing
# bootstrapped estimates of the Gini impurity for each column variable
# (But can synthesize fake data for testing with a bunch of rnorms)
DT.m <- melt(results, id.vars = NULL) # using reshape2
g <- ggplot(DT.m %>% group_by(variable) %>%
mutate(median.gini = median(value)),
aes(x = reorder(variable, value, FUN=median), y = value)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_boxplot(aes(fill=median.gini)) +
stat_summary(fun=mean, colour="darkred", geom="point") + # fun replaces fun.y, deprecated since ggplot2 3.3.0
scale_fill_gradientn(colours = heat.colors(9)) +
ylab("Gini impurity") +
xlab("Feature") +
guides(fill=guide_colourbar(title="Median\nGini\nimpurity"))
plot(g)
Later, for the second plot:
medians <- lapply(results, median)
color <- colorRampPalette(colors =
heat.colors(9))(1000)[cut(unlist(medians),1000,labels = F)]
color is then a character vector containing the colours of the nodes in my subsequent network graph, and these colours match those in the boxplot. Job done!
