How to plot deviation from mean - r

In R I have created a simple matrix of one column yielding a list of numbers with a set mean and a given standard deviation.
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
r <- rnorm2(100,4,1)
I now would like to plot how these numbers differ from the mean. I can do this in Excel as shown below:
But I would like to use ggplot2 to create a graph in R. in the Excel graph I have cheated by using a line graph but if I could do this as columns it would be better. I have tried using a scatter plot but I cant work out how to turn this into deviations from the mean.

Perhaps you want:
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(100,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
par(las=1,bty="l") ## cosmetic preferences
plot(x, r, col = "green", pch=16) ## draws the points
## if you don't want points at all, use
## plot(x, r, type="n")
## to set up the axes without drawing anything inside them
segments(x0=x, y0=4, x1=x, y1=r, col="green") ## connects them to the mean line
abline(h=4)
If you were plotting around 0 you could do this automatically with type="h":
plot(x,r-4,type="h", col="green")
To do this in ggplot2:
library("ggplot2")
theme_set(theme_bw()) ## my cosmetic preferences
ggplot(data.frame(x,r))+
geom_segment(aes(x=x,xend=x,y=mean(r),yend=r),colour="green")+
geom_hline(yintercept=mean(r))

Ben's answer using ggplot2 works great, but if you don't want to manually adjust the line width, you could do this:
# Half of Ben's data
rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
set.seed(101)
r <- rnorm2(50,4,1)
x <- seq_along(r) ## sets up a vector from 1 to length(r)
# New variable for the difference between each value and the mean
value <- r - mean(r)
ggplot(data.frame(x, value)) +
# geom_bar anchors each bar at zero (which is the mean minus the mean)
geom_bar(aes(x, value), stat = "identity"
, position = "dodge", fill = "green") +
# but you can change the y-axis labels with a function, to add the mean back on
scale_y_continuous(labels = function(x) {x + mean(r)})

in base R it's quite simple, just do
plot(r, col = "green", type = "l")
abline(4, 0)
You also tagged ggplot2, so in that case it will be a bit more complicated, because ggplot requires creating a data frame and then melting it.
library(ggplot2)
library(reshape2)
df <- melt(data.frame(x = 1:100, mean = 4, r = r), 1)
ggplot(df, aes(x, value, color = variable)) +
geom_line()

Related

Change colors of select lines in ggplot2 coefficient plot in R

I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I think, you cannot do this with a plot produced by coefplot.lm. The package coefplot uses ggplot2 as the plotting system, which is good itself, but does not allow to play with colors as easily as you would like. To achieve the desired colors, you need to have a variable in your dataset that would color-code the values; you need to specify color = color-code in aes() function within the layer that draws the dots with CE. Apparently, this is impossible to do with the output of coefplot.lm function. Maybe, you can change the colors using ggplot2 ggplot_build() function. I would say, it's easier to write your own function for this task.
I've done this once to plot odds. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot. First, we extract coefficients from a model object and prepare the data set for plotting; second, actually plot.
The code for extracting coefficients and data set preparation
df_plot_odds <- function(x){
tmp<-data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
odds<-tmp[-1,]
names(odds)<-c('OR', 'lower', 'upper')
odds$vars<-row.names(odds)
odds$col<-odds$OR>1
odds$col[odds$col==TRUE] <-'blue'
odds$col[odds$col==FALSE] <-'red'
odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
return(odds)
}
Plot the output of the extract function
plot_odds <- function(df_plot_odds, xlab="Odds Ratio", ylab="", asp=1){
require(ggplot2)
p <- ggplot(df_plot_odds, aes(x=vars, y=OR, ymin=lower, ymax=upper),asp=asp) +
geom_errorbar(aes(color=col),width=0.1) +
geom_point(aes(color=col),size=3)+
geom_hline(yintercept = 1, linetype=2) +
scale_color_manual('Effect', labels=c('Positive','Negative'),
values=c('blue','red'))+
coord_flip() +
theme_bw() +
theme(legend.position="none",aspect.ratio = asp)+
ylab(xlab) +
xlab(ylab) #switch because of the coord_flip() above
return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
plot <- plot_odds(df)
plot
Which yields
Note that I chose theme_wb() as the default. Output is a ggplot2object. So, you may change it quite a lot.

Automatically determine number of axis ticks for discrete variable

I want to automatically set the number of breaks and the position of the breaks itself for the axis of a discrete variable such that the labels which are plotted are actually readable.
For example in the code below, the resulting plot should only show a portion of the labels/the x-variable.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE)
So far, I tried to use pretty, and pretty_breaks which are, however, not for discrete variables.
Fist we turn the factor into a character and then into a ordered factor. Secondly, we subset ggData$x to create a vector (labels) with the ticks we want. In the example every 10 elements. Finally, we create the plot using scale_x_discrete, using the previous vector (labels), inside the parameter breaks.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggData$x <- as.character(ggData$x)
ggData$x <- factor(ggData$x, levels=unique(ggData$x))
labels <- ggData$x[seq(0, 100, by= 10)]
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE) +
scale_x_discrete(breaks=labels)

Combining two plot in R

I wish to compare the observed values to the fitted ones. To do so, I decided to use a plot in R. What I want to do is to plot X vs Y and X vs Y.fitted on the same plot. I have written some code, but it is incomplete. My plot needs to look like this one below. On the plot, circles and crosses represent the observed and fitted values respectively
set.seed(1)
x <- runif(8,0,1)
y <- runif(8,0,1)
y.fitted <- runif(8,0,1)
plot(x,y,pch=1)
plot(x,y.fitted,pch=5)
In your code, the second plot will not add points to the existing plot but create a new one. You can + use the function points to add points to the existing plot.
plot(x, y, pch = 1)
points(x, y.fitted, pch = 4)
running plot the second time will create a new one. You could use points
set.seed(1)
x <- runif(8,0,1)
y <- runif(8,0,1)
y.fitted <- runif(8,0,1)
plot(x,y,pch=1)
points(x,y.fitted,pch=5)
A solution with ggplot2 giving a better and neat graph outlook:
library(ggplot2)
df = data.frame(x=runif(8,0,1),y=runif(8,0,1),y.fitted=runif(8,0,1))
df = melt(df, id=c('x'))
ggplot() + geom_point(aes(x=x,y=value, shape=variable, colour=variable), df)

plot with overlapping points

I have data in R with overlapping points.
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)
How can I plot these points so that the points that are overlapped are proportionally larger than the points that are not. For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot with only one point.
Here's one way using ggplot2:
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()
By default, stat_sum uses the proportion of instances. You can use raw counts instead by doing something like:
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = ..n..))
Here's a simpler (I think) solution:
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
size <- sapply(1:length(x), function(i) { sum(x==x[i] & y==y[i]) })
plot(x,y, cex=size)
## Tabulate the number of occurrences of each cooordinate
df <- data.frame(x, y)
df2 <- cbind(unique(df), value = with(df, tapply(x, paste(x,y), length)))
## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
## to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
You didn't really ask for this approach but alpha may be another way to address this:
library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)
You need to add the parameter cex to your plot function. First what I would do is use the function as.data.frame and table to reduce your data to unique (x,y) pairs and their frequencies:
new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency
The only downside to this is that it converts numeric data to factors. So convert back to numeric, and plot!
plot(as.numeric(new.data$x), as.numeric(new.data$y), cex = as.numeric(new.data$Freq))
You may also want to try sunflowerplot.
sunflowerplot(x,y)
Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.
So, option 1: plot each point with transparency --- ninja'd by Tyler!
option 2: use jitter to push your data around slightly so the plotted points don't overlap.
A solution using lattice and table ( similar to #R_User but no need to remove 0 since lattice do the job)
dt <- as.data.frame(table(x,y))
xyplot(dt$y~dt$x, cex = dt$Freq^2, col =dt$Freq)

Nonparametric quantile regression curves to scatterplot

I created a scatterplot (multiple groups GRP) with IV=time, DV=concentration. I wanted to add the quantile regression curves (0.025,0.05,0.5,0.95,0.975) to my plot.
And by the way, this is what I did to create the scatter-plot:
attach(E) ## E is the name I gave to my data
## Change Group to factor so that may work with levels in the legend
Group<-as.character(Group)
Group<-as.factor(Group)
## Make the colored scatter-plot
mycolors = c('red','orange','green','cornflowerblue')
plot(Time,Concentration,main="Template",xlab="Time",ylab="Concentration",pch=18,col=mycolors[Group])
## This also works identically
## with(E,plot(Time,Concentration,col=mycolors[Group],main="Template",xlab="Time",ylab="Concentration",pch=18))
## Use identify to identify each point by group number (to check)
## identify(Time,Concentration,col=mycolors[Group],labels=Group)
## Press Esc or press Stop to stop identify function
## Create legend
## Use locator(n=1,type="o") to find the point to align top left of legend box
legend('topright',legend=levels(Group),col=mycolors,pch=18,title='Group')
Because the data that I created here is a small subset of my larger data, it may look like it can be approximated as a rectangular hyperbole. But I don't want to call a mathematical relationship between my independent and dependent variables yet.
I think nlrq from the package quantreg may be the answer, but I don't understand how to use the function when I don't know the relationship between my variables.
I find this graph from a science article, and I want to do precisely the same kind of graph:
Again, thanks for your help!
Update
Test.csv
I was pointed out that my sample data is not reproducible. Here is a sample of my data.
library(evd)
qcbvnonpar(p=c(0.025,0.05,0.5,0.95,0.975),cbind(TAD,DV),epmar=T,plot=F,add=T)
I also tried qcbvnonpar::evd,but the curve doesn't seem very smooth.
Maybe have a look at quantreg:::rqss for smoothing splines and quantile regression.
Sorry for the not so nice example data:
set.seed(1234)
period <- 100
x <- 1:100
y <- sin(2*pi*x/period) + runif(length(x),-1,1)
require(quantreg)
mod <- rqss(y ~ qss(x))
mod2 <- rqss(y ~ qss(x), tau=0.75)
mod3 <- rqss(y ~ qss(x), tau=0.25)
plot(x, y)
lines(x[-1], mod$coef[1] + mod$coef[-1], col = 'red')
lines(x[-1], mod2$coef[1] + mod2$coef[-1], col = 'green')
lines(x[-1], mod3$coef[1] + mod3$coef[-1], col = 'green')
I have in the past frequently struggled with rqss and my issues have almost always been related to the ordering of the points.
You have multiple measurements at various time points, which is why you're getting different lengths. This works for me:
dat <- read.csv("~/Downloads/Test.csv")
library(quantreg)
dat <- plyr::arrange(dat,Time)
fit<-rqss(Concentration~qss(Time,constraint="N"),tau=0.5,data = dat)
with(dat,plot(Time,Concentration))
lines(unique(dat$Time)[-1],fit$coef[1] + fit$coef[-1])
Sorting the data frame prior to fitting the model appears necessary.
In case you want ggplot2 graphic...
I based this example on that of #EDi. I increased the x and y so that the quantile lines would be less wiggly. Because of this increase, I need to use unique(x) in place of x in some of the calls.
Here's the modified set-up:
set.seed(1234)
period <- 100
x <- rep(1:100,each=100)
y <- 1*sin(2*pi*x/period) + runif(length(x),-1,1)
require(quantreg)
mod <- rqss(y ~ qss(x))
mod2 <- rqss(y ~ qss(x), tau=0.75)
mod3 <- rqss(y ~ qss(x), tau=0.25)
Here are the two plots:
# #EDi's base graphics example
plot(x, y)
lines(unique(x)[-1], mod$coef[1] + mod$coef[-1], col = 'red')
lines(unique(x)[-1], mod2$coef[1] + mod2$coef[-1], col = 'green')
lines(unique(x)[-1], mod3$coef[1] + mod3$coef[-1], col = 'green')
# #swihart's ggplot2 example:
## get into dataset so that ggplot2 can have some fun:
qrdf <- data.table(x = unique(x)[-1],
median = mod$coef[1] + mod$coef[-1],
qupp = mod2$coef[1] + mod2$coef[-1],
qlow = mod3$coef[1] + mod3$coef[-1]
)
line_size = 2
ggplot() +
geom_point(aes(x=x, y=y),
color="black", alpha=0.5) +
## quantiles:
geom_line(data=qrdf,aes(x=x, y=median),
color="red", alpha=0.7, size=line_size) +
geom_line(data=qrdf,aes(x=x, y=qupp),
color="blue", alpha=0.7, size=line_size, lty=1) +
geom_line(data=qrdf,aes(x=x, y=qlow),
color="blue", alpha=0.7, size=line_size, lty=1)

Resources