Question: Use a factor's index to plot variables - r

I'm very new to R so I'm sorry if this is something really simple.
I've had a look on a bunch of cheat sheets and can't see anything obvious.
I have a simple set of data that has date, temperature, and 4 different factors (based on the bloom of a tree // 1 = "", 2 = "bloom", 3 = "full", 4 = "scatter")
What I want to do, but have no idea how, is to do a scatter plot of the date and temperature of each factor individually.

One approach is to use ggplot2 with facet_wrap. First, set the level names of the Bloom factor so the panels are labelled usefully.
Then plot the data with ggplot, grouping and colouring by the Bloom factor, and add facet_wrap with the formula . ~ Bloom so that each level of Bloom gets its own panel.
library(ggplot2)
levels(TreeData$Bloom) <- c("None","Bloom","Full","Scatter")
ggplot(TreeData, aes(x=Date,y=Temp,group = Bloom, color = Bloom)) +
geom_point(show.legend = FALSE) +
facet_wrap(. ~ Bloom)
Per your comment, if you wanted individual graphs you could use base R subsetting with TreeData[TreeData$Bloom == "Full",]. Note that "Full" is the factor level we set earlier.
ggplot(TreeData[TreeData$Bloom == "Full",], aes(x=Date,y=Temp)) +
geom_point() + labs(title="Full Bloom")
Data
set.seed(1)
TreeData <- data.frame(Date = rep(seq.Date(from = as.Date("2019-04-01"), to = as.Date("2019-08-01"), by = "week"), each = 10), Temp = round(runif(n = 180, min = 22, max = 38)), Bloom = as.factor(sample(1:4, 180, replace = TRUE)))
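If a separate graph is wanted for every level rather than just "Full", the subsetting step above can be repeated with split() and lapply(). This is a minimal sketch, not part of the original answer: it rebuilds the sample data with the level names already attached, and the names by_bloom and plots are made up for illustration.

```r
library(ggplot2)

set.seed(1)
TreeData <- data.frame(
  Date = rep(seq.Date(from = as.Date("2019-04-01"),
                      to = as.Date("2019-08-01"), by = "week"), each = 10),
  Temp = round(runif(n = 180, min = 22, max = 38)),
  Bloom = factor(sample(1:4, 180, replace = TRUE),
                 levels = 1:4,
                 labels = c("None", "Bloom", "Full", "Scatter"))
)

# One data frame per factor level, named after the level
by_bloom <- split(TreeData, TreeData$Bloom)

# One scatter plot per level, titled with the level name
plots <- lapply(names(by_bloom), function(level) {
  ggplot(by_bloom[[level]], aes(x = Date, y = Temp)) +
    geom_point() +
    labs(title = level)
})
names(plots) <- names(by_bloom)
```

plots[["Full"]] then draws the same subset as the single-level example, and the other three levels are available under their own names.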

Related

How to reorder bars in barplot using ggplot 2 [duplicate]

This question already has answers here:
Order Bars in ggplot2 bar graph
I wanted to move my bars according to this particular order for the beetle number i.e., from 0 to 1-5 to 6-10 to 11-15 to Above 15. I also wanted to place Village first and the Municipality. The plots should also be arranged in terms of the age of the building. Under 5 years first, then 5-10 years followed by Above 10 years
ggplot(g,aes(x=Locality.Division))+
geom_bar(aes(fill=Number.of.Beetle),position="dodge")+
facet_wrap(~Building.Age)
#> Error in ggplot(g, aes(x = Locality.Division)): could not find function "ggplot"
Created on 2021-05-30 by the reprex package (v2.0.0)
The order of the bars is determined by the order of the factor levels of the variable.
In your data, Number.of.Beetle is a character variable. ggplot() converts it to a factor with factor(), which by default sorts the levels alphabetically. To specify a different order, convert the variable to a factor yourself before plotting:
library(dplyr)  # provides mutate()
# the levels must match the values in the data exactly
g <- mutate(g,
Number.of.Beetle = factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+"))
)
If the order is shown backwards, then also use forcats::fct_rev() to reverse the order:
g <- mutate(g,
Number.of.Beetle = forcats::fct_rev(factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+")))
)
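The question's labels also include 0 and Above 15, which do not appear in the levels used above. A small base R sketch (the vector beetles is made-up sample data) shows why the levels need to match the data exactly: any value not listed in levels becomes NA without a warning.

```r
# Made-up sample values using the labels described in the question
beetles <- c("6-10", "0", "Above 15", "1-5", "11-15", "1-5")

# Listing every label fixes the order used for bars and legend entries
wanted <- c("0", "1-5", "6-10", "11-15", "Above 15")
ordered_beetles <- factor(beetles, levels = wanted)
levels(ordered_beetles)

# Leaving labels out of `levels` turns the matching values into NA
partial <- factor(beetles, levels = c("1-5", "6-10", "11-15"))
sum(is.na(partial))  # 2: the "0" and "Above 15" entries
```

Checking sum(is.na(...)) after the conversion is a quick way to catch a label that was misspelled or left out.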
I hope the following helps to get you started. You did not provide a minimal reproducible example, thus, I simulate some data. I also adapted the variable names.
A key strategy to control the order of variables is making them a factor. I do this when plotting.
Note: number of beetles is quasi-sorted given the values used. Here you could also work with a factor, if needed.
library(ggplot2)
set.seed(666) # fix random picks for replicability
# simulate data of 30 buildings
df <- data.frame(
Building = 1:30
, Building.Age = sample(x = c("U5","5-10","A10"), size = 30, replace = TRUE)
, Nbr.Beetle = sample(x = c("1-5","6-10","11-15","15+"), size = 30, replace = TRUE)
, Locality = sample(x = c("A","B","C"), size = 30, replace = TRUE))
# plot my example
ggplot(data = df, aes(x=Locality)) +
geom_bar(aes(fill=Nbr.Beetle),position="dodge") +
# --------------------- control the sequence of panels by forcing level sequence of factor
facet_wrap(. ~ factor( Building.Age, levels = c("U5","5-10","A10") ) )
This yields one panel per building age, in the order U5, 5-10, A10.

Reorder discrete variables in ggplot2

I'm trying to reorder the discrete variables that I have in ggplot2. I would like to display them in the order WTT, KOT, WTD, KOD in the graph; however, I am currently getting KOD, KOT, WTD, WTT. I have tried using match to manually order the data frame, but I don't see a change in the graph itself.
The data looks something like this:
type mean
WTT 100
KOT 110
WTD 1000
KOD 1300
The means will vary and I only care that the correct factors are paired to each other in a graph.
And the code I am primarily using is the following:
graph = ggplot(data = data_subset,aes(y = Mean, x = Type, color = Type))
A straightforward way would be to re-level your type variable:
graph = ggplot(data = data_subset, aes(y = Mean, x = factor(Type, levels = c("WTT", "KOT", "WTD", "KOD")), color = Type))
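The match-based reordering had no effect because ggplot2 orders a discrete axis by factor levels, not by row order. An alternative sketch, assuming the column names Type and Mean from the plotting code and the values from the table, sets the levels in the data frame once so that the axis and the colour legend share the same order. geom_point() is an assumption here, since the question does not show which geom is used.

```r
library(ggplot2)

# Values copied from the table in the question
data_subset <- data.frame(
  Type = c("WTT", "KOT", "WTD", "KOD"),
  Mean = c(100, 110, 1000, 1300)
)

# Set the level order once, in the data, instead of inside aes()
data_subset$Type <- factor(data_subset$Type,
                           levels = c("WTT", "KOT", "WTD", "KOD"))

graph <- ggplot(data_subset, aes(x = Type, y = Mean, color = Type)) +
  geom_point()
```

Because the order now lives in the column itself, any later plot built from data_subset picks it up without repeating the levels.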

geom_line : How to connect only a few points

I have this dataframe and this plot :
df <- data.frame(Groupe = rep(c("A","B"),4),
Period = gl(4,2,8,c("t0","t1","t2","t3","t4")),
rate = c(0.83,0.96,0.75,0.93,0.67,0.82,0.65,0.73))
ggplot(data = df, mapping = aes(y = rate, x = Period ,group = Groupe, colour=Groupe, shape=Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
How could I organize my data so that the points between t1 and t2 are not connected with a line? I'd like t0 and t1 to be connected (blue or red according to the group), and t2 and t3 connected in the same way, but with no line between t1 and t2. I tried several things by looking at similar questions, but they always mess up my grouping colors.
Creating a new grouping variable manually is mostly not the best way. So, a slightly different approach which requires less hardcoding:
# create new grouping variable
df$grp <- c(1,2)[df$Period %in% c("t2","t3","t4") + 1L]
# create the plot and use the interaction between 'Groupe' and 'grp' as group
ggplot(df, aes(x = Period, y = rate,
group = interaction(Groupe,grp),
colour = Groupe,
shape = Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
This gives the same plot as in the other answer.
The best way to handle a problem like this in ggplot is often to create an additional column in your data frame that indicates the grouping you want to work with in your data. For example, here I've added an extra column gp to your data frame:
df$gp <- c(1,2,1,2,3,4,3,4)
ggplot(data = df, aes(y = rate, x = Period, group = gp, colour=Groupe, shape=Groupe)) +
geom_line(size=1.2) +
geom_point(size=5)
The result is, I believe, what you are looking for:
If you make Period a numerical column rather than a character vector or factor, you can more easily generate a column like gp automatically rather than manually specifying it (perhaps using ifelse or cases to create it) - this would be useful if you wanted to do the same thing many times or with a large data frame.
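A rough sketch of that idea, assuming Period is recoded as the numbers 0 to 3 (the column names segment and gp are illustrative): ifelse() marks which side of the gap each row is on, and interaction() combines that with Groupe.

```r
df <- data.frame(
  Groupe = rep(c("A", "B"), 4),
  Period = rep(0:3, each = 2),  # numeric stand-in for t0..t3
  rate = c(0.83, 0.96, 0.75, 0.93, 0.67, 0.82, 0.65, 0.73)
)

# 1 for periods before the gap (t0, t1), 2 for periods after it (t2, t3)
df$segment <- ifelse(df$Period >= 2, 2L, 1L)

# One integer id per Groupe/segment combination
df$gp <- as.integer(interaction(df$Groupe, df$segment))
df$gp  # 1 2 1 2 3 4 3 4, the same values as the hand-written column
```

group = gp can then be used in aes() exactly as with the manually typed column, and moving the gap only means changing the threshold in ifelse().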

Manually added legend not working in ggplot2?

Here's a facsimile of my data:
d1 <- data.frame(
e=rnorm(3000,10,10)
)
d2 <- data.frame(
e=rnorm(2000,30,30)
)
So, I got around the problem of plotting two different density distributions from two very different datasets on the same graph by doing this:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2)
But when I try to manually add a legend, like so:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2) +
scale_fill_manual(name="Data", values = c("XXXXX" = "red","YYYYY" = "blue"))
Nothing happens. Does anybody know what's going wrong? I thought I could actually manually add legends if need be.
Generally ggplot works best when your data is in a single data.frame and in long format. In your case we therefore want to combine the data from both data.frames. For this simple example, we just concatenate the data into a long variable called d and use an additional column id to indicate to which dataset that value belongs.
d.f <- data.frame(id = rep(c("XXXXX", "YYYYY"), c(3000, 2000)),
d = c(d1$e, d2$e))
More complex data manipulations can be done using packages such as reshape2 and tidyr. Then, when we plot, we map fill to id, and ggplot will take care of the legend automatically.
ggplot(d.f, aes(x = d, fill = id)) +
geom_density()
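The scale_fill_manual() call from the question had no effect because fill was set as a constant outside aes(), so there was no fill scale for it to style. Once fill is mapped to id, the same call can be reused to pick the colours and the legend title. A sketch under those assumptions (the seed and alpha = 0.5 are additions, so the example is reproducible and the overlapping curves stay visible):

```r
library(ggplot2)

set.seed(1)
d1 <- data.frame(e = rnorm(3000, 10, 10))
d2 <- data.frame(e = rnorm(2000, 30, 30))

# Long format: one value column plus an id column naming the source
d.f <- data.frame(id = rep(c("XXXXX", "YYYYY"), c(3000, 2000)),
                  d = c(d1$e, d2$e))

# fill is mapped inside aes(), so scale_fill_manual() now has a scale to act on
p <- ggplot(d.f, aes(x = d, fill = id)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(name = "Data",
                    values = c("XXXXX" = "red", "YYYYY" = "blue"))
```

Printing p draws both densities with a legend titled "Data".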

Multiple plots by factor in ggplot (facets)

I have a data frame with two qualitative variables (Q1, Q2) which are both measured on a scale of LOW, MEDIUM, HIGH and a continuous variable CV on a scale 0-100.
s = 5
trial <- data.frame(id = c(1:s),
Q1 = ordered(sample(c("LOW","MED","HIGH"),size=s,replace=T)),
Q2 = ordered(sample(c("LOW","MED","HIGH"),size=s,replace=T)),
CV = runif(s,0,100))
I need to use ggplot to show a faceted plot (preferably a horizontal boxplot/jitter) of the continuous variable for each qualitative variable (x2) for each level (x3). This would result in a 3 x 2 layout.
As I'm very new to ggplot I'm unsure how this should be achieved. I've played with qplot and can't work out how to control the facets to display both Q1 and Q2 boxplots on the same chart!!
Do I need to run multiple qplots to the same window (in base I would use par to control layout) or can it be achieved from a single command. Or should I try to melt the data twice?
trial = rbind(data.frame(Q = "Q1",Level = trial[,2], CV = trial[,4]),
data.frame(Q = "Q2",Level = trial[,3], CV = trial[,4]))
I'll keep trying and hope somebody can provide some hints in the meantime.
I'm not entirely clear on what you want, but maybe this helps:
ggplot(trial, aes(Level, CV)) +
geom_boxplot() +
geom_jitter() +
facet_wrap(~Q) +
coord_flip()
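One detail worth checking in the sample data: ordered() on a character vector sorts the levels alphabetically (HIGH, LOW, MED), so the axis would not run from LOW to HIGH. A sketch of the reshaping step with the level order stated explicitly (the seed and the names lv and long are additions for illustration):

```r
set.seed(42)  # added only so the sample is reproducible
s <- 5
lv <- c("LOW", "MED", "HIGH")

trial <- data.frame(
  id = seq_len(s),
  Q1 = sample(lv, size = s, replace = TRUE),
  Q2 = sample(lv, size = s, replace = TRUE),
  CV = runif(s, 0, 100)
)

# Stack Q1 and Q2 into one long data frame with a column naming the question
long <- rbind(
  data.frame(Q = "Q1", Level = trial$Q1, CV = trial$CV),
  data.frame(Q = "Q2", Level = trial$Q2, CV = trial$CV)
)

# State the order explicitly instead of relying on alphabetical sorting
long$Level <- factor(long$Level, levels = lv, ordered = TRUE)
levels(long$Level)
```

Passing long instead of trial to the ggplot() call above keeps the same columns (Q, Level, CV) and puts the levels in the order LOW, MED, HIGH.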

Resources