I have the plot:
plot(Combined$TIMESTAMP, # Draw first time series
Combined$GPP_NT_VUT_REF,
type = "l",
col = 2,
ylim = c(0, 15),
xlab = "",
ylab = expression(paste("GPP [gC m"^"-2 "," day "^"-1]"))) +
lines(Combined$TIMESTAMP, # Draw second time series
Combined$GPP_WRF_mean,
type = "l",
col = 3) +
legend("topright", # Add legend to plot
c("OBS", "WRF"),
lty = 1,
col = 2:4)
which produces the time series graph with the x-axis labels gen, feb, mar, ... and I want to show them as J, F, M, A, M, J, ... instead.
I tried with:
axis(1, at=1:12, labels=month.name, cex.axis=0.5)
but it doesn't work - any help?
Keep only the first letter of month.name or month.abb:
month.1st.letter <- sub("(^[[:upper:]]).*$", "\\1", month.abb)
Then use this vector as the labels argument.
axis(1, at = 1:12, labels = month.1st.letter, cex.axis = 0.5)
Edit
Start by removing the plus signs from the code: base R graphics calls are separate statements and are not chained with + as in ggplot2. Then include xaxt = "n" in the plot() call to suppress the default x axis, and draw the axis right after it.
plot(Combined$TIMESTAMP, # Draw first time series
Combined$GPP_NT_VUT_REF,
type = "l",
col = 2,
ylim = c(0, 15),
xaxt = "n", # here, no x axis
xlab = "",
ylab = expression(paste("GPP [gC m"^"-2", " day"^"-1", "]"))) # closing bracket kept out of the superscript
axis(1, at = 1:12, labels = month.1st.letter, cex.axis = 0.5)
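Note that at = 1:12 only lines up with the data if the x values themselves run from 1 to 12. If Combined$TIMESTAMP is a Date (an assumption, the question does not show its class), the tick positions have to be dates as well. A minimal sketch with made-up data standing in for Combined (dates, gpp and month.starts are hypothetical names):

```r
# Hypothetical stand-in for Combined: one year of daily values
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
gpp <- pmax(0, 7 + 5 * sin(2 * pi * (seq_along(dates) - 100) / 365) +
                 rnorm(length(dates)))

# Tick positions: the first day of each month
month.starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)
# month.abb is always English, so this does not depend on the locale
month.1st.letter <- substr(month.abb, 1, 1)

plot(dates, gpp, type = "l", col = 2, ylim = c(0, 15),
     xaxt = "n", xlab = "", ylab = "GPP")
# Dates are stored as day counts, so pass them to axis() as numbers
axis(1, at = as.numeric(month.starts), labels = month.1st.letter)
```

Because the labels come from month.abb rather than from the formatted dates, they stay J, F, M, ... even in a locale that would otherwise print gen, feb, mar.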
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Male", "Female"), lwd = 2, col = mycolours, bty = "n") # in lung, sex is coded 1 = male, 2 = female
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how the factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can losslessly cast it to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will come at the end.
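A small illustration of that internal coding, using toy vectors (f and g are made-up names, not part of the lung data):

```r
# A factor stores integer codes plus a vector of labels (levels)
f <- factor(c("low", "high", "mid", "low"), levels = c("low", "mid", "high"))
as.integer(f)    # the codes: 1 3 2 1
levels(f)        # the labels the codes point to
as.character(f)  # back to the original strings

# Caveat for factors whose labels look like numbers
g <- factor(c("10", "20", "10"))
as.numeric(g)                # codes 1 2 1, not the values
as.numeric(as.character(g))  # the values 10 20 10
```

For the plot, the integer codes are enough, because the labels are added back as text at the end.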
Remember that the quality and information content of your visualisation depend on the order of the variables in your data set. For factors, labels are absolutely necessary. Help the reader by making, in small steps, some fully custom improvements that are impossible in ggplot2!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are male and most of the censored ones are female (in lung, sex is coded 1 = male, 2 = female). Adding some points with slight jitter may give the reader an idea of the distributions of these variables.
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
# Horizontal grid every 0.5 units (the step belongs to seq(), not to round())
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE)), 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
# mycolours[1] is used for sex == 1 (male), mycolours[2] for sex == 2 (female)
legend("topleft", c("Male", "Female"), lwd = 3, col = mycolours, bty = "n")
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
ind <- which(colnames(data.scaled) == varname)
yvals <- sort(unique(data.scaled[, ind]))
for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
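The jitter suggestion above might be sketched as follows. This is only an illustration on made-up data (toy, x.jit and the jitter amount of 0.05 are assumptions), not part of the original answer's code:

```r
set.seed(42)
# Hypothetical scaled data: one continuous and one two-level variable
toy <- cbind(age = as.numeric(scale(rnorm(50))),
             status = as.numeric(scale(sample(1:2, 50, replace = TRUE))))

plot(NULL, NULL, xlim = c(0.5, 2.5), ylim = range(toy) + c(-0.2, 0.2),
     bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(toy), labels = colnames(toy))

# One cloud of semi-transparent points per variable, shifted sideways a little
for (j in 1:ncol(toy)) {
  x.jit <- jitter(rep(j, nrow(toy)), amount = 0.05)
  points(x.jit, toy[, j], pch = 16, cex = 0.6, col = "#00000055")
}
```

Only the horizontal position is jittered, so the standardised values on the y axis stay exact; for a categorical column the points then pile up visibly at each level.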
I am attempting to create several histograms that display the effects a drug has on the frequency of heart attacks.
Currently, R is organizing my data into the bins [0 - 0.5, 0.5 - 1.0, 1.0 - 1.5, etc.], but I would like for it to only use integer values: [0 - 1, 1 - 2, 2 - 3, etc.].
I have tried using the xaxt="n" argument and the axis() function. They "worked," but they did not solve the problem above. I also tried to use breaks=seq(0,5,l=6), but this converted my y-axis from frequency into density.
Here is the code for my latest two attempts:
hist(fourTrials$red_5, breaks=5, right = FALSE,
xlab = "Number of Heart Attacks",
xlim = c(0, 4), ylim = c(0,4),
main = "Experimental Group 1, n = 400", col = "light blue")
hist(fourTrials$red_5, breaks=seq(0,5,l=6), freq = F, right = FALSE,
xlab = "Number of Heart Attacks",
xlim = c(0, 4), ylim = c(0,4),
main = "Experimental Group 1, n = 400", col = "light blue",yaxs="i",xaxs="i")
Thanks for any help!
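I believe that what you want is integer breaks together with freq = TRUE; the density axis in your second attempt came from freq = F, not from the breaks argument: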
hist(fourTrials$red_5, breaks=0:4, freq = TRUE, right = FALSE,
xlab = "Number of Heart Attacks",
xlim = c(0, 4), ylim = c(0,4),
main = "Experimental Group 1, n = 400",
col = "lightblue", yaxs="i", xaxs="i")
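One caveat: with breaks = 0:4 and right = FALSE the bins are [0,1), [1,2), [2,3) and [3,4], so counts of 3 and 4 share the last bar. If each integer should get its own bar, a possible variation is to centre the bins on the integers. A sketch with made-up counts (attacks is a hypothetical stand-in for fourTrials$red_5):

```r
# Hypothetical data: number of heart attacks per subject
attacks <- c(0, 0, 1, 1, 1, 2, 2, 3, 4)

# Breaks at -0.5, 0.5, ..., 4.5 so that every integer sits mid-bin
breaks <- seq(min(attacks) - 0.5, max(attacks) + 0.5, by = 1)

h <- hist(attacks, breaks = breaks, freq = TRUE,
          xlab = "Number of Heart Attacks", main = "",
          col = "lightblue", xaxt = "n")
# Label the bars with the integers they represent
axis(1, at = min(attacks):max(attacks))
```

The counts returned in h$counts then match table(attacks) one to one.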
Following up on a discussion on Stack Exchange, I tried to implement the following plot from Cumming, G., & Finch, S. (2005). Inference by Eye: Confidence Intervals and How to Read Pictures of Data. American Psychologist, 60(2), 170–180. doi:10.1037/0003-066X.60.2.170
I share some people's dislike of double axes, but I think this is a fair use.
Below is my partial attempt; the second axis is still missing. I am looking for more elegant alternatives, and intelligent variations are welcome.
library(lattice)
library(latticeExtra)
d = data.frame(what=c("A","B","Difference"),
mean=c(75,105,30),
lower=c(50,80,-3),
upper = c(100,130,63))
# Convert Differences to left scale
d1 = d
d1[d1$what=="Difference",-1] = d1[d1$what=="Difference",-1]+d1[d1$what=="A","mean"]
segplot(what~lower+upper,centers=mean,data=d1,horizontal=FALSE,draw.bands=FALSE,
lwd=3,cex=3,ylim=c(0,NA),pch=c(16,16,17),
panel = function (x,y,z,...){
centers = list(...)$centers
panel.segplot(x,y,z,...)
panel.abline(h=centers[1:2],lty=3)
} )
## How to add the right scale, close to the last bar?
par(mar=c(3,5,3,5))
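# ylim must also cover the upper limit of the difference once shifted by mean(A)
plot(NA, xlim=c(.5,3.5), ylim=c(0, max(d$upper[1:2], d$mean[1]+d$upper[3])), bty="l", xaxt="n", xlab="",ylab="Mean")
points(d$mean[1:2], pch=19)
segments(1,d$mean[1],5,d$mean[1],lty=2)
segments(2,d$mean[2],5,d$mean[2],lty=2)
axis(1, 1:3, d$what)
segments(1:2,d$lower[1:2],1:2,d$upper[1:2])
# Right-hand axis: a difference of 0 sits at mean(A) on the left-hand scale
axis(4, seq((d$mean[1]-30),(d$mean[1]+50),by=10), seq(-30,50,by=10), las=1)
points(3,d$mean[1]+d$mean[3],pch=17, cex=1.5)
# Interval of the difference, shifted by mean(A) like the point above
segments(3,d$mean[1]+d$lower[3],3,d$mean[1]+d$upper[3], lwd=2)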
mtext("Difference", side=4, at=d$mean[1], line=3)
As a starting point, here is another base R solution with Hmisc:
library(Hmisc)
with(d1,
errbar(as.integer(factor(what)),mean,upper,lower,xlim=c(0,4),xaxt="n",xlab="",ylim=c(0,150)) # factor() needed since R 4.0, where what is a character column
)
points(3,d1[d1$what=="Difference","mean"],pch=15)
axis(1,at=1:3,labels=d1$what)
atics <- seq(floor(d[d$what=="Difference","lower"]/10)*10,ceiling(d[d$what=="Difference","upper"]/10)*10,by=10)
axis(4,at=atics+d1[d1$what=="A","mean"],labels=atics,pos=3.5)
I would also go with base graphics, as they make it possible to actually have two y-axes.
Here is my solution that uses only d:
xlim <- c(0.5, 3.5)
plot(1:2, d[d$what %in% LETTERS[1:2], "mean"], xlim = xlim, ylim = c(0, 140),
xlab = "", ylab = "", xaxt = "n", bty = "l", yaxs = "i")
lines(c(1,1), d[1, 3:4])
lines(c(2,2), d[2, 3:4])
par(new = TRUE)
plot(3, d[d$what == "Difference", "mean"], ylim = c(-80, 130), xlim = xlim,
yaxt = "n", xaxt = "n", xlab = "", ylab = "", bty = "n")
lines(c(3,3), d[3, 3:4])
Axis(x = c(-20, 60), at = c(-20, 0, 20, 40, 60), side = 4)
axis(1, at = c(1:3), labels = c("A", "B", "Difference"))
To make it clearer that the difference is something different, you can increase the distance from the other two points:
xlim <- c(0.5, 4)
plot(1:2, d[d$what %in% LETTERS[1:2], "mean"], xlim = xlim, ylim = c(0, 140),
xlab = "", ylab = "", xaxt = "n", bty = "l", yaxs = "i")
lines(c(1,1), d[1, 3:4])
lines(c(2,2), d[2, 3:4])
par(new = TRUE)
plot(3.5, d[d$what == "Difference", "mean"], ylim = c(-80, 130), xlim = xlim,
yaxt = "n", xaxt = "n", xlab = "", ylab = "", bty = "n")
lines(c(3.5,3.5), d[3, 3:4])
Axis(x = c(-20, 60), at = c(-20, 0, 20, 40, 60), side = 4)
axis(1, at = c(1,2,3.5), labels = c("A", "B", "Difference"))
I think you can also do that with base R. What about:
d = data.frame(what=c("A","B","Difference"),
mean=c(75,105,30),
lower=c(50,80,-3),
upper = c(100,130,63))
plot(-1,-1,xlim=c(1,3),ylim=c(0,140),xaxt="n")
lines(c(1,1),c(d[1,3],d[1,4]))
points(rep(1,3),d[1,2:4],pch=4)
lines(c(1.5,1.5),c(d[2,3],d[2,4]))
points(rep(1.5,3),d[2,2:4],pch=4)
lines(c(2,2),c(d[3,3],d[3,4]))
points(rep(2,3),d[3,2:4],pch=4)
lines(c(1.5,2.2),c(d[2,2],d[2,2]),lty="dotted")
axis(1, at=c(1,1.5,2), labels=c("A","B","Difference"))
axis(4,at=c(40,80,120),labels=c(-1,0,1),pos=2.2)
I simplified some things and didn't write it as a function, but I think the idea is clear and could easily be extended to a function.
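One way such a function could look is sketched below. It uses the offset idea from the question (a difference of 0 is drawn at the mean of A on the left-hand scale). The name ci_difference_plot and its tick.by argument are hypothetical, and the function assumes the rows of d are A, B and Difference in that order:

```r
# Hypothetical helper: d needs columns what, mean, lower, upper
# with rows A, B, Difference (in that order)
ci_difference_plot <- function(d, tick.by = 10) {
  ref <- d$mean[1]  # position of "difference = 0" on the left-hand scale
  cols <- c("mean", "lower", "upper")
  shifted <- d
  shifted[3, cols] <- d[3, cols] + ref  # move the difference onto that scale

  op <- par(mar = c(3, 5, 3, 5))
  on.exit(par(op))
  plot(NA, xlim = c(0.5, 3.5), ylim = range(0, shifted$upper),
       bty = "l", xaxt = "n", xlab = "", ylab = "Mean")
  segments(1:3, shifted$lower, 1:3, shifted$upper, lwd = 2)
  points(1:3, shifted$mean, pch = c(16, 16, 17))
  # Dotted guides from the two group means towards the right-hand axis
  segments(1:2, d$mean[1:2], 3.5, d$mean[1:2], lty = 3)
  axis(1, at = 1:3, labels = as.character(d$what))

  # Right-hand axis: labels in difference units, positions shifted by ref
  ticks <- seq(floor(d$lower[3] / tick.by) * tick.by,
               ceiling(d$upper[3] / tick.by) * tick.by, by = tick.by)
  axis(4, at = ticks + ref, labels = ticks, las = 1)
  mtext("Difference", side = 4, line = 3)
  invisible(shifted)
}

d <- data.frame(what = c("A", "B", "Difference"),
                mean = c(75, 105, 30),
                lower = c(50, 80, -3),
                upper = c(100, 130, 63))
res <- ci_difference_plot(d)
```

Returning the shifted data invisibly makes it easy to check where the difference was drawn: here its mean lands at 105, level with the mean of B.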