Custom shape in ggplot (geom_point) - r

Aim
I am trying to change the shape of the geom_point into a cross (so not a "plus/addition" sign, but a 'death' cross).
Attempt
Let's say I have the following data:
library(tidyverse)
df <- read.table(text="x y
1 3
2 4
3 6
4 7 ", header=TRUE)
I am able to change the shape using the shape parameter in geom_point into different shapes, like this:
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape=2) # change shape
However, there is no option to change the shape into a cross.
Question
How do I change the shape of a value into a cross using ggplot in R?

The shape can be set to a Unicode character. The code below uses the skull and crossbones, but you can look up a more suitable symbol.
Note that the final result will depend on the font used to generate the plot.
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape="\u2620", size = 10)
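If a Latin cross is closer to what you want than a skull, one option is U+271D (LATIN CROSS). That code point is an assumption on my part, not part of the answer, and as noted above it only renders if the device's font has the glyph. For comparison, the built-in numeric shapes closest to a cross are 3 (a plus sign) and 4 (an "x"). A sketch:

```r
library(ggplot2)

df <- read.table(text = "x y
1 3
2 4
3 6
4 7", header = TRUE)

# U+271D is LATIN CROSS; whether it renders depends on the device's font
p_cross <- ggplot(df, aes(x = x, y = y)) +
  geom_point(shape = "\u271D", size = 10)

# Closest built-in shapes: 3 draws a plus sign, 4 draws an "x"
p_x <- ggplot(df, aes(x = x, y = y)) +
  geom_point(shape = 4, size = 4)
```

Print p_cross or p_x to draw them.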

Related

Can someone explain why my first ggplot2 box plot was just one big box and how the solution worked?

My first ggplot2 box plot was just one big, stretched-out box; the second one was correct, but I don't understand what changed or why the second one worked. I'm new to R and ggplot2, so any explanation would help. Thanks.
#----------------------------------------------------------
# This is the original ggplot that didn't work:
#----------------------------------------------------------
zSepalFrame <- data.frame(zSepalLength, zSepalWdth)
zPetalFrame <- data.frame(zPetalLength, zPetalWdth)
p1 <- ggplot(data = zSepalFrame, mapping = aes(x=zSepalWdth, y=zSepalLength, group = 4)) + #fill = zSepalLength
geom_boxplot(notch=TRUE) +
stat_boxplot(geom = 'errorbar', width = 0.2) +
theme_classic() +
labs(title = "Iris Data Box Plot") +
labs(subtitle ="Z Values of Sepals From Iris.R")
p1
#----------------------------------------------------------
# This is the new ggplot box plot line that worked:
#----------------------------------------------------------
bp = ggplot(zSepalFrame, aes(x=factor(zSepalWdth), y=zSepalLength, color = zSepalWdth)) + geom_boxplot() + theme(legend.position = "none")
bp
[Image: the original box plot, drawn as one big box]
I don't have your exact dataset, OP, but the problem seems to stem from mapping a continuous variable to your x axis, whereas a box plot needs a discrete variable to group the data by.
A continuous variable is something like a numeric column in a dataframe. So something like this:
x <- c(4,4,4,8,8,8,8)
Even though the variable x only contains 4s and 8s, R stores it as a numeric variable, which is continuous. This means that if you plot it on the x axis, ggplot has no problem with values falling anywhere between 4 and 8, and positions them accordingly.
The other type of variable is called discrete, which would be something like this:
y <- c("Green", "Green", "Flags", "Flags", "Cars")
The variable y contains only character strings, so it is discrete: there is no such thing as something between "Green" and "Cars". If plotted on the x axis, ggplot will group things as either "Green", "Flags", or "Cars".
The useful thing is that you can convert a continuous variable into a discrete one. One way to do that is to convert it to a factor, i.e. force R to treat the variable as a factor. If you type factor(x), you get this:
[1] 4 4 4 8 8 8 8
Levels: 4 8
The values in x are the same, but now there is no such thing as a number between 4 and 8 when x is a factor - it would just add another level.
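A quick console check of this behaviour, using the same x as above (the extra value 6 is purely illustrative):

```r
x <- c(4, 4, 4, 8, 8, 8, 8)

# Stored as a number, so ggplot treats it as continuous
is.numeric(x)                # TRUE

# As a factor, only the observed values exist, as levels
fx <- factor(x)
levels(fx)                   # "4" "8"

# A value between 4 and 8 does not fall "between" levels;
# it simply becomes a new level of its own
levels(factor(c(x, 6)))      # "4" "6" "8"
```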
In short, that is why your box plot changes. Let's demonstrate with the iris dataset. First, an example like yours. Notice that I'm mapping x=Sepal.Length. In the iris dataset, Sepal.Length is numeric, so it is continuous.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
geom_boxplot()
This is similar to yours. The reason is that the box plot is drawn by grouping the data according to x and then calculating statistics on each group. If a variable is continuous, there are no "groups", even if values are repeated (as with x above). One way to make groups is to force the data to be discrete, as in factor(Sepal.Length). Here's what it looks like when you do that:
ggplot(iris, aes(x=factor(Sepal.Length), y=Sepal.Width)) +
geom_boxplot()
The other way to have this same effect would be to use the group= aesthetic, which does what you might think: it groups according to that column in the dataset.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, group=Sepal.Length)) +
geom_boxplot()
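To check this explanation, here is a sketch (not part of the original answer) that counts the boxes ggplot2 actually computes: layer_data() returns one row per box for geom_boxplot(). It also includes a constant group, like the group = 4 in the question's first plot, which puts every point into one group.

```r
library(ggplot2)

# Continuous x, no group: all points form a single group, so one box
p_cont <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_boxplot()

# Constant group (as with group = 4 in the question): still one group, one box
p_const <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, group = 4)) +
  geom_boxplot()

# Discrete x via factor(): one group, and one box, per distinct value
p_factor <- ggplot(iris, aes(x = factor(Sepal.Length), y = Sepal.Width)) +
  geom_boxplot()

# Continuous x with an explicit group aesthetic: same grouping as factor()
p_group <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, group = Sepal.Length)) +
  geom_boxplot()

plots <- list(cont = p_cont, const = p_const, factor = p_factor, group = p_group)
n_boxes <- sapply(plots, function(p) nrow(layer_data(p)))
n_boxes
```

Expect a warning for p_cont suggesting a group aesthetic; that warning is the hint that only one box will be drawn.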

ggplot2: insert dashed line in legend [duplicate]

I'm trying to create a histogram with two superimposed density plots. The problem is that I want one density to be a dashed line. That works perfectly, but the dashed line does not appear in the legend, as in the following example:
x<-sort(rnorm(1000))
data<-data.frame(x=x,Normal=dnorm(x,mean(x),sd=sd(x)),Student=dt(x,df=3))
ggplot(data, aes(y=x)) +
geom_histogram(aes(x=x, y=..density..), color="black", fill="darkgrey") +
geom_line(aes(x=x, y=Normal, color="Normal"), size=1, linetype=2) +
geom_line(aes(x=x, y=Student, color="Student"), size=1) +
ylab("") + xlab("") + labs(title="Density estimations") +
scale_color_manual(values=c("Student"="black", "Normal"="black"))
Any ideas how I get the dashed line in the legend?
Thank you very much!
Rainer
The "ggplot" way generally likes data to be in "long" format, with separate columns to specify each aesthetic. In this case, linetype should be treated as an aesthetic. The easiest way to deal with this is to prepare your data in the appropriate format with the reshape2 package:
library(reshape2)
data.m <- melt(data, measure.vars = c("Normal", "Student"), id.vars = "x")
And then modify your plotting code to look something like this:
ggplot(data,aes(y=x)) +
geom_histogram(aes(x=x,y=..density..),color="black",fill="darkgrey") +
geom_line(data = data.m, aes(x = x, y = value, linetype = variable), size = 1) +
ylab("") +
xlab("") +
labs(title="Density estimations")
This results in the histogram with both density lines, each identified by its own linetype in the legend.
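reshape2 is retired, so here is a sketch of the same idea using tidyr's pivot_longer() in place of melt() (a substitution, not the answer's code). It also uses after_stat(density) and linewidth, which replace ..density.. and line size in ggplot2 3.4.0 and later; set.seed() is only there to make the example reproducible.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # only for reproducibility
x <- sort(rnorm(1000))
dens <- data.frame(x = x,
                   Normal = dnorm(x, mean(x), sd = sd(x)),
                   Student = dt(x, df = 3))

# Long format: one row per (x, distribution) pair
dens_long <- pivot_longer(dens, cols = c(Normal, Student),
                          names_to = "variable", values_to = "value")

p <- ggplot(dens, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), color = "black", fill = "darkgrey") +
  # Mapping linetype inside aes() is what puts the line styles in the legend
  geom_line(data = dens_long, aes(x = x, y = value, linetype = variable), linewidth = 1) +
  labs(x = NULL, y = NULL, title = "Density estimations")
p
```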
You want to reshape the data to long format; it makes this simpler:
x<-sort(rnorm(1000))
Normal=dnorm(x,mean(x),sd=sd(x))
Student=dt(x,df=3)
y= c(Normal,Student)
DistBn= rep(c('Normal', 'Student'), each=1000)
# avoid calling it 'data': that is the name of an R function
df<-data.frame(x=x,y=y, DistBn=DistBn)
head(df)
x y DistBn
1 -2.986430 0.005170920 Normal
2 -2.957834 0.005621358 Normal
3 -2.680157 0.012126747 Normal
4 -2.601635 0.014864165 Normal
5 -2.544302 0.017179353 Normal
6 -2.484082 0.019930239 Normal
ggplot(df,aes(x=x, y=y))+
geom_histogram(aes(x=x,y=..density..),color="black",fill="darkgrey")+
geom_line(aes(x=x,y=y,linetype=DistBn))+
ylab("")+xlab("")+labs(title="Density estimations")+
scale_color_manual(values=c("Student"="black","Normal"="black"))
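The default linetype scale decides which line ends up dashed. To make the Normal density dashed, as the question asks, one option is to name the linetype for each level with scale_linetype_manual(). A sketch that rebuilds this answer's df, uses after_stat(density) in place of ..density.., and drops scale_color_manual(), since nothing is mapped to colour:

```r
library(ggplot2)

set.seed(1)  # only for reproducibility
x <- sort(rnorm(1000))
df <- data.frame(
  x = rep(x, 2),
  y = c(dnorm(x, mean(x), sd = sd(x)), dt(x, df = 3)),
  DistBn = rep(c("Normal", "Student"), each = 1000)
)

p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), color = "black", fill = "darkgrey") +
  geom_line(aes(y = y, linetype = DistBn)) +
  # Naming each level fixes which density gets which linetype
  scale_linetype_manual(values = c(Normal = "dashed", Student = "solid")) +
  labs(x = NULL, y = NULL, title = "Density estimations")
p
```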

How to create surface plot in R

I'm currently trying to develop a surface plot that examines the results of the below data frame. I want to plot the increasing values of noise on the x-axis and the increasing values of mu on the y-axis, with the point estimate values on the z-axis. After looking at ggplot2 and ggplotly, it's not clear how I would plot each of these columns in surface or 3D plot.
df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646", header = TRUE)
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into more of a tidy format, where the goal is to have these columns:
mu: nothing changes here
noise: combines your "noise0", "noise1", ... columns together, and
z: holds the value from each noise column; we will apply the fill= aesthetic to this column.
To do that, I'm using tidyr's gather() and separate() along with dplyr's select(), but there are other ways (melt() or pivot_longer() gets you there too). I'm also adding some code to pull out just the number portion of the "noise" column names and convert it to an integer, so that both the x and y axes are numeric:
# assumes that df is your data as a data.frame
library(tidyverse)  # tidyr: gather(), separate(); dplyr: select(), %>%
df <- df %>% gather(key="noise", value="z", -mu)
df <- df %>% separate(col = "noise", into=c('x', "noise"), sep=5) %>% select(-x)
df$noise <- as.integer(df$noise)
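As an alternative to gather() plus separate(), tidyr's pivot_longer() can strip the "noise" prefix and convert the result to an integer in a single call, through its names_prefix and names_transform arguments. A sketch using only the first three rows of the question's data, to keep it short:

```r
library(tidyr)

# First three rows of the question's data
df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647", header = TRUE)

df_long <- pivot_longer(
  df,
  cols = starts_with("noise"),
  names_to = "noise",
  names_prefix = "noise",                      # "noise3" becomes "3"
  names_transform = list(noise = as.integer),  # "3" becomes 3L
  values_to = "z"
)
head(df_long)
```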
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
ggplot(df, aes(x=noise, y=mu, fill=z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer the OP's follow-up: yes, you can also show this with plotly. Here's a direct translation:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
A much more informative way to show this particular set of information is to see the variation of df$z as it relates to df$mu by creating df$delta_z and then using that to plot. (you can also plot via ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
ggplot works with data in long format, which means that you need to melt your dataset, for example with melt() from the reshape2 package:
library(reshape2)
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
ggplot(dfLong,
aes(x = noise,
y = mu,
fill = meas)) +
geom_tile() +
scale_fill_gradientn(colours = terrain.colors(10))
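The documentation quoted above also mentions geom_contour_filled(). A sketch of that option, rebuilding dfLong as in this answer from the question's full data. Contours need x and y on a regular grid, which holds here because every mu appears with every noise level:

```r
library(ggplot2)
library(reshape2)

df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646", header = TRUE)

# Long format, then turn "noise0".."noise5" into the numbers 0..5
dfLong <- melt(df, id.vars = "mu", variable.name = "noise", value.name = "meas")
dfLong$noise <- as.numeric(gsub("noise", "", dfLong$noise))

# Filled contour bands of meas over the (noise, mu) grid
p <- ggplot(dfLong, aes(x = noise, y = mu, z = meas)) +
  geom_contour_filled()
p
```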

Simple ggplot2 situation with colors and legend

Trying to make some plots with ggplot2 and cannot figure out how colour works as defined in aes. Struggling with errors of aesthetic length.
I've tried defining the colours both in the aes() of the main ggplot() call, to get a legend, and in the aes() of geom_line().
# Define dataset:
number<-rnorm(8,mean=10,sd=3)
species<-rep(c("rose","daisy","sunflower","iris"),2)
year<-c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers<-cbind(number,species,year)
d.flowers<-as.data.frame(d.flowers)
#Plot with no colours:
ggplot(data=d.flowers,aes(x=year,y=number))+
geom_line(group=species) # Works fine
#Adding colour:
#Defining aes in main ggplot call:
ggplot(data=d.flowers,aes(x=year,y=number,colour=factor(species)))+
geom_line(group=species)
# Doesn't work with data size 8, asks for data of size 4
ggplot(data=d.flowers,aes(x=year,y=number,colour=unique(species)))+
geom_line(group=species)
# doesn't work with data size 4, now asking for data size 8
The first plot gives
Error: Aesthetics must be either length 1 or the same as the data (4): group
The second gives
Error: Aesthetics must be either length 1 or the same as the data (8): x, y, colour
So I'm confused - when given aes of length either 4 or 8 it's not happy!
How could I think about this more clearly?
Here are @kath's comments as a solution. It's subtle at first, but what goes inside or outside aes() is key. For more on this, see the question "When does the aesthetic go inside or outside aes()?", and search for "ggplot aesthetic": there are many pages with examples you can copy and try.
library(ggplot2)
number <- rnorm(8,mean=10,sd=3)
species <- rep(c("rose","daisy","sunflower","iris"),2)
year <- c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers <- data.frame(number, species, year)
head(d.flowers)
#number species year
#1 8.957372 rose 1995
#2 7.145144 daisy 1995
#3 9.864917 sunflower 1995
#4 7.645287 iris 1995
#5 4.996174 rose 1996
#6 8.859320 daisy 1996
ggplot(data = d.flowers, aes(x = year,y = number,
group = species,
colour = species)) + geom_line()
#note geom_point() doesn't need to be grouped - try:
ggplot(data = d.flowers, aes(x = year,y = number, colour = species)) + geom_point()
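Two details seem to explain the errors in the question. First, cbind() on a mix of numeric and character vectors produces a character matrix, so number is no longer numeric after as.data.frame(); the answer's data.frame() keeps each column's own type. Second, group = species outside aes() passes the whole species vector as a fixed value instead of mapping a column of the data, whereas inside aes() it is evaluated per row. A sketch checking both points:

```r
library(ggplot2)

number  <- rnorm(8, mean = 10, sd = 3)
species <- rep(c("rose", "daisy", "sunflower", "iris"), 2)
year    <- rep(c("1995", "1996"), each = 4)

# cbind() coerces all columns to one common type (character here)
via_cbind <- as.data.frame(cbind(number, species, year))
is.numeric(via_cbind$number)      # FALSE

# data.frame() keeps each column's own type
d.flowers <- data.frame(number, species, year)
is.numeric(d.flowers$number)      # TRUE

# group and colour both mapped inside aes(): one line per species
p <- ggplot(d.flowers, aes(x = year, y = number, group = species, colour = species)) +
  geom_line()
length(unique(layer_data(p)$group))   # 4
```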

ggplot: coordinate axes are unordered when using geom_point()

I want to create a scatter plot, but the scale of the axes is messed up. I want it to have an increasing order, but in the plot y = 7 lies between y = 8.8 and y = 11.8.
It is a bit difficult to explain, so I uploaded a picture of the plot to
splot <- ggplot(df, aes(x_val, y_val)) + geom_point() + ggtitle(title) + xlab(label) + ylab(label)
df looks like this:
x_val y_val x_min x_max y_min y_max series
1 8.2640626 7.1605616 7.43370308695577 9.09442211304423 5.62731954407747 8.69380365592253 1IWG
2 10.0321728 8.8790822 8.43774194466477 11.6266036553352 6.97682936735609 10.7813350326439 1J4N
3 13.4994332665331 11.8238683366733 12.4200921869666 14.5787743460995 9.99549351881522 13.6522431545315 1KPL
Thanks for any help.
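Use str(df) to examine your data frame df. If the variables you are trying to plot are factors, ggplot treats them as discrete and orders them by their levels, which are sorted as text. Convert them with as.numeric(as.character(...)) so that they are interpreted as numbers; calling as.numeric() directly on a factor returns the internal level codes, not the values. Or you can make sure they are numeric when you create your data set, depending on how the frame is defined.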
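A minimal sketch of the pitfall, assuming (hypothetically) that y_val ended up as a factor holding the three values from the question. Factor levels are sorted as text, so "11.82..." comes before "7.16...", which would put 7.16 between 11.82 and 8.88 on a discrete axis, consistent with the symptom described:

```r
# Hypothetical: y_val stored as a factor instead of a number
y_val <- factor(c("7.1605616", "8.8790822", "11.8238683366733"))

# Levels are sorted as text, not as numbers
levels(y_val)        # "11.8238683366733" "7.1605616" "8.8790822"

# Wrong: returns the level codes
as.numeric(y_val)    # 2 3 1

# Right: go through character first to recover the values
y_num <- as.numeric(as.character(y_val))
```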
