Add categorical grouping to scatter plot of continuous data in R?

Sorry if image 1 is a little basic - layout sent by my project supervisor! I have created a scatterplot of total grey seal abundance (Total) over observation time (Obsv_time), and fitted a GAM over the top, as seen in image 2:
plot(Total ~ Obsv_time,
data = R_Count,
ylab = "Total",
xlab = "Observation Time (Days)",
pch = 20, cex = 1, bty = "l",col="dark grey")
lines(R_Count$Obsv_time, fitted(gam.tot2))
I would like to show the corresponding Season on the graph (Image 1). Season is a categorical factor variable (4 levels: Pre-breeding, Breeding, Post-breeding, Moulting) that corresponds to Obsv_time.
I am unsure if I need to plot a secondary axis or just add labels to the graph...and how to do each! Thanks!
Image 1: Wanted graph layout - indicate season from factor variable
Image 2: Scatterplot with GAM curve

You can do this with base R graphics. Leave off the x-axis in the original plot, and add an axis with the season labels separately. You can indicate the extent of each season by overlaying semi-transparent rectangles.
## Some bogus data
x = sort(runif(50,0,250))
y = 800*(sin(x/40) + x/100 + rnorm(50,0, 0.2)) + 500
FittedY = 800*(sin(x/40) + x/100)+500
plot(x,y, pch= 20, col='lightgray', ylim=c(300,2700), xaxt='n',
xlab="", ylab='Total')
lines(x, FittedY)
axis(1, at=c(25,95,155,215), tick=FALSE,
labels=c('PreBreed', 'Repro', 'PostBreed', 'Moulting'))
rect(c(-10,65,125,185), 0, c(65,125,185,260), 3000,
col=rainbow(4, alpha=0.05), border=NA)
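If the season is already stored as a factor next to the observation times, the rectangle edges and label positions do not have to be typed in by hand. A minimal sketch, assuming a data frame dat with columns Obsv_time, Season and Total (made-up data and illustrative names, not the asker's R_Count):

```r
## Made-up data standing in for the real counts (all names are illustrative)
dat <- data.frame(Obsv_time = seq(1, 246, by = 5))
dat$Season <- cut(dat$Obsv_time, breaks = c(0, 65, 125, 185, 250),
                  labels = c("Pre-breeding", "Breeding", "Post-breeding", "Moulting"))
dat$Total <- 800 * (sin(dat$Obsv_time / 40) + dat$Obsv_time / 100) + 500

## First and last observation day in each season: one row per season
bounds <- t(sapply(split(dat$Obsv_time, dat$Season), range))
mids <- rowMeans(bounds)   # label position = middle of each band

plot(Total ~ Obsv_time, data = dat, pch = 20, col = "lightgray",
     xaxt = "n", xlab = "", ylab = "Total")
usr <- par("usr")          # plot region limits: x1, x2, y1, y2
rect(bounds[, 1], usr[3], bounds[, 2], usr[4],
     col = rainbow(nrow(bounds), alpha = 0.05), border = NA)
axis(1, at = mids, labels = rownames(bounds), tick = FALSE)
```

Each band here spans only the observed days of its season, so small gaps appear between bands where no observation was made. If the true season boundaries are known, they can be used in place of bounds.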

If you are able to use ggplot2, you could add (or compute from time) another factor variable to your data-frame which would be your season. Then it is just a matter of using color (or any other) aesthetic which would use this season variable.
require(ggplot2)
df <- data.frame(total = c(26, 41, 31, 75, 64, 32, 7, 89),
time = c(1, 2, 3, 4, 5, 6, 7, 8))
df$season <- cut(df$time, breaks=c(0, 2, 4, 6, 8),
labels=c("winter", "spring", "summer", "autumn"))
ggplot(df, aes(x=time, y=total)) +
geom_smooth(color="black") +
geom_point(aes(color=season))
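The cut() step is what turns a continuous time into a season factor. A small sketch of its interval convention, using made-up times: by default each interval is open on the left and closed on the right, so a value equal to the lowest break becomes NA unless include.lowest = TRUE is set.

```r
time <- c(0, 1, 2, 3, 8)          # made-up observation times
breaks <- c(0, 2, 4, 6, 8)
labs <- c("winter", "spring", "summer", "autumn")

## Default intervals are (0,2], (2,4], (4,6], (6,8]; time 0 falls outside them
season <- cut(time, breaks = breaks, labels = labs)
as.character(season)    # NA "winter" "winter" "spring" "autumn"

## include.lowest = TRUE closes the first interval on the left: [0,2]
season2 <- cut(time, breaks = breaks, labels = labs, include.lowest = TRUE)
as.character(season2)   # "winter" "winter" "winter" "spring" "autumn"
```

This matters if the real time variable can start exactly at the first break, since points with an NA season are dropped from the coloured layer.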

Related

Is there a way to only show two labels on the x-axis in a scatterplot?

I'm trying to make a scatterplot that shows the age of people on the y-axis and the way they have been positioned on the x-axis (either 0° or 15° elevated).
My dataset is called raw.
I have used the function plot(raw$position, raw$age). Instead of just showing 0 and 15, the x-axis gives out 0, 5, 10, 15 (with no dots for 5 or 10, since the only two positionings are 0 and 15°).
Is there a way to get it to only show my 0 and 15 on the x-axis?
As you didn't supply the original data, below is a reproducible example with my own. There are two parts to doing this:
Include yaxt = "n" in plot() to suppress the default y-axis.
Use axis(2, labels = c(0, 15), at = c(0, 15)) to draw the y-axis (side = 2) with the labels (labels) 0 and 15 at the positions (at) 0 and 15.
set.seed(1)
df = data.frame(
age = round(runif(10, 20, 30)),
position = rbinom(10, 1, 0.5)*15
)
plot(df$age, df$position, yaxt = "n")
axis(2, labels = c(0, 15), at = c(0, 15))
Edit: Just re-read your question and saw you want to edit the x-axis, which you do with this:
Same as above but now set xaxt = "n" and side = 1
set.seed(1)
df = data.frame(
age = round(runif(10, 20, 30)),
position = rbinom(10, 1, 0.5)*15
)
plot(df$position, df$age, xaxt = "n")
axis(1, labels = c(0, 15), at = c(0, 15))
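An alternative sketch, not part of the answer above: because position only takes two values, it can be treated as a grouping variable, and stripchart() then labels just those groups. The data are the same simulated values as above.

```r
set.seed(1)
df <- data.frame(
  age = round(runif(10, 20, 30)),
  position = rbinom(10, 1, 0.5) * 15
)
## One vertical strip per distinct position; the axis shows only the group labels
stripchart(age ~ position, data = df, vertical = TRUE, pch = 1,
           xlab = "Position (degrees)", ylab = "Age")
```

Note that the groups are spaced evenly rather than to scale, which is usually fine when the x variable is really a category.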

Plotting the area under the curve of various distributions in R

Suppose I'm trying to find the area beyond a certain value for a Student's t distribution. I calculate my t test statistic to be t=1.78 with 23 degrees of freedom, for example. I know how to get the area under the curve above t=1.78 with the pt() function. How can I get a plot of the t distribution with 23 degrees of freedom with the area under the curve above 1.78 shaded in? That is, I want the curve for pt(1.78,23,lower.tail=FALSE) plotted with the appropriate area shaded. Is there a way to do this?
ggplot version:
library(ggplot2)
ggplot(data.frame(x = c(-4, 4)), aes(x)) +
stat_function(fun = dt, args = list(df = 23)) + # density curve
stat_function(fun = dt, args = list(df = 23),
xlim = c(1.78, 4),
geom = "area") # shaded upper tail
This should work:
x_coord <- seq(-5, 5, length.out = 200) # x-coordinates
plot(x_coord, dt(x_coord, 23), type = "l",
xlab = expression(italic(t)), ylab = "Density", bty = "l") # plot PDF
polygon(c(1.78, seq(1.78, 5, by = .3), 5, 5), # polygon for area under curve
c(0, dt(c(seq(1.78, 5, by = .3), 5), 23), 0),
col = "red", border = NA)
Regarding arguments to polygon():
your first and last points should be [1.78, 0] and [5, 0] (5 only in case the plot goes to 5) - these basically define the bottom edge of the red polygon
2nd and penultimate points are [1.78, dt(1.78, 23)] and [5, dt(5, 23)] - these define the end points of the upper edge
the stuff in between is just X and Y coordinates of an arbitrary number of points along the curve [x, dt(x, 23)] - the more points, the smoother the polygon
Hope this helps
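As a quick sanity check on what the shaded region represents, the tail probability from pt() can be compared with a numerical integral of the density over the same range. This is only a sketch; the variable names are made up for illustration.

```r
t_obs <- 1.78   # observed test statistic
nu <- 23        # degrees of freedom

## Upper-tail probability from the distribution function
p_tail <- pt(t_obs, nu, lower.tail = FALSE)

## The same area by numerically integrating the density from t_obs upwards
p_int <- integrate(dt, lower = t_obs, upper = Inf, df = nu)$value

## The two agree up to the integration error
abs(p_tail - p_int) < 1e-3
```

The plotted polygon stops at the right edge of the plot (x = 5) rather than at infinity, but the density beyond that point is negligible at this scale.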

Changing legend labels in ggplotly()

I have a plot of polygons that are colored according to a quantitative variable in the dataset being cut off at certain discrete values (0, 5, 10, 15, 20, 25). I currently have a static ggplot() output that "works" the way I intend. Namely, the legend values are the cut off values (0, 5, 10, 15, 20, 25). The static plot is below -
However, when I simply convert this static plot to an interactive plot, the legend values become hexadecimal values (#54278F, #756BB1, etc.) instead of the cut off values (0, 5, 10, 15, 20, 25). A screenshot of this interactive plot is shown below -
I am trying to determine a way to change the legend labels in the interactive plot to be the cut off values (0, 5, 10, 15, 20, 25). Any suggestions or support would be greatly appreciated!
Below is the code I used to create the static and interactive plot:
library(plotly)
library(ggplot2)
library(RColorBrewer)
set.seed(1)
x = abs(rnorm(30))
y = abs(rnorm(30))
value = runif(30, 1, 30)
myData <- data.frame(x=x, y=y, value=value)
cutList = c(5, 10, 15, 20, 25)
purples <- brewer.pal(length(cutList)+1, "Purples")
myData$valueColor <- cut(myData$value, breaks=c(0, cutList, 30), labels=rev(purples))
# Static plot
sp <- ggplot(myData, aes(x=x, y=y, fill=valueColor)) + geom_polygon(stat="identity") + scale_fill_manual(labels = as.character(c(0, cutList)), values = levels(myData$valueColor), name = "Value")
# Interactive plot
ip <- ggplotly(sp)
Label using the cut points and use scale_fill_manual for the colors.
cutList = c(5, 10, 15, 20, 25)
purples <- brewer.pal(length(cutList)+1, "Purples")
myData$valueLab <- cut(myData$value, breaks=c(0, cutList, 30), labels=as.character(c(0, cutList)))
# Static plot
sp <- ggplot(myData, aes(x=x, y=y, fill=valueLab)) + geom_polygon(stat="identity") + scale_fill_manual(values = rev(purples))
# Interactive plot
ip <- ggplotly(sp)

misplaced label on scatter plot data

I am quite new to R and was wondering if anyone could help with this problem:
I am trying to graph a set of data. I use plot to plot the scatter data and use text to add labels to the values. However the last label is misplaced on the graph and I can't figure out why. Below is the code:
#specify the dataset
x<-c(1:10)
#find p: the percentile of each data in the dataset
y=quantile(x, probs=seq(0,1,0.1), na.rm=FALSE, type=5)
#print the values of p
y
#plot p against x
plot(y, tck=0.02, main="Percentile Graph of Dataset D", xlab="Data of the dataset", ylab="Percentile", xlim=c(0, 11), ylim=c(0, 11), pch=10, seq(1, 11, 1), col="blue", las=1, cex.lab=0.9, cex.axis=0.9, cex.main=0.9)
#change the x-axis scale
axis(1, seq(1, 11, 1), tck=0.02)
#draw disconnected line segments
abline(h = 1:11, v = 1:11, col = "#EDEDED")
#Add data labels to the graph
text(y, x, labels= (y), cex=0.6, pos=1, col="red")
Your probs request returns 11 values, but x has only 10. text() therefore recycles x, which you pass as the vertical coordinate, so the 11th label is drawn at y = 1. How to fix this depends on what you are trying to do. Perhaps in your probs sequence you want seq(0, 1, length.out = 10)?
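The length mismatch is easy to confirm directly. A minimal sketch, with a simplified plot call rather than the asker's full styling:

```r
x <- 1:10
y_11 <- quantile(x, probs = seq(0, 1, 0.1), type = 5)              # 11 probabilities
y_10 <- quantile(x, probs = seq(0, 1, length.out = 10), type = 5)  # 10 probabilities

length(x)      # 10
length(y_11)   # 11: one more than x, so text() has to recycle
length(y_10)   # 10: lengths match

## With matching lengths every label sits next to its own point
plot(x, y_10, pch = 10, col = "blue", xlab = "Data", ylab = "Quantile")
text(x, y_10, labels = round(y_10, 2), cex = 0.6, pos = 1, col = "red")
```

Whether 10 evenly spaced probabilities are the right ones is a separate question that depends on what the percentile graph is meant to show.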

How to make a grouped barchart with two groups on x-axis

I have data that looks like this:
Name, Clusters, incorrectly_classified
PCA, 2, 34.37
PCA, 6, 60.80
ICA2, 2, 37.89
ICA6, 2, 33.20
ICA2, 6, 69.66
ICA6, 6, 60.54
RP2, 2, 32.94
RP4, 2, 33.59
RP6, 2, 31.25
RP2, 6, 68.75
RP4, 6, 61.58
RP6, 6, 56.77
I would like to create a barplot for the above data that is similar to this plot I drew
The x-axis will have two values, 2 or 6. The y-axis will show incorrectly_classified, and there will be one bar per Name within each group (2 or 6). Each Name would be colored consistently across the two groups.
Is this possible to achieve with barchart? If not with barchart, then what is a good way to plot this data?
I think the following is what you are after (mydf is the data frame holding the data above, and ggplot2 needs to be loaded).
ggplot(data = mydf, aes(x = factor(Clusters), y = incorrectly_classified, fill = Name)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Clusters", y = "Incorrectly classified")
This can be done with barplot.
An example:
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
xlab="Number of Gears", col=c("darkblue","red"),
legend = rownames(counts), beside=TRUE)
EDIT
I will also work my answer out to demonstrate the barplot option (although ggplot is much cooler :-) ):
If df is your data frame:
dfwide<-reshape(df,timevar="Clusters",v.names="incorrectly_classified",idvar="Name",direction="wide")
rownames(dfwide) <- dfwide$Name
dfwide$Name<-NULL
names(dfwide)[names(dfwide)=="incorrectly_classified.2"] <- "2"
names(dfwide)[names(dfwide)=="incorrectly_classified.6"] <- "6"
dfwide<-as.matrix(dfwide)
barplot(dfwide, main="Your Graph",
xlab="Clusters",ylab="incorrectly_classified",col=c("darkblue","red","orange","green","purple","grey"),
legend = rownames(dfwide), beside=TRUE,args.legend = list(x = "topleft", bty = "n", inset=c(0.15, -0.15)))
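A shorter base R route to the same Name-by-Clusters matrix is xtabs(), which avoids the reshape and renaming steps. This is a sketch that reads the posted data into a data frame called df; the colours are arbitrary.

```r
## The posted data, read into a data frame
df <- read.csv(text = "Name,Clusters,incorrectly_classified
PCA,2,34.37
PCA,6,60.80
ICA2,2,37.89
ICA6,2,33.20
ICA2,6,69.66
ICA6,6,60.54
RP2,2,32.94
RP4,2,33.59
RP6,2,31.25
RP2,6,68.75
RP4,6,61.58
RP6,6,56.77")

## Rows = Name, columns = Clusters; each cell holds the single matching value
m <- xtabs(incorrectly_classified ~ Name + Clusters, data = df)

## beside = TRUE draws one group of bars per column (cluster count)
barplot(m, beside = TRUE, legend.text = TRUE,
        col = rainbow(nrow(m)),
        xlab = "Clusters", ylab = "Incorrectly classified",
        args.legend = list(x = "topleft", bty = "n"))
```

xtabs() sums values that share a Name and Clusters combination, so this only reproduces the raw numbers when each combination occurs once, as it does here.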
