If all my data is one long row of comma-separated values (durations), R can give me a lot of information with
summary(dat)
Question
Can R also make a plot of the data with confidence intervals?
As the first Google match suggests, you can try the following:
n<-50
x<-sample(40:70,n,replace=TRUE)
y<-.7*x+rnorm(n,sd=5)
plot(x,y,xlim=c(20,90),ylim=c(0,80))
mylm<-lm(y~x)
abline(mylm,col="red")
newx<-seq(20,90)
prd<-predict(mylm,newdata=data.frame(x=newx),interval = c("confidence"),
level = 0.90,type="response")
lines(newx,prd[,2],col="red",lty=2)
lines(newx,prd[,3],col="red",lty=2)
Of course, this is only one of several possibilities.
Another example would be:
x <- rnorm(15)
y <- x + rnorm(15)
new <- data.frame(x = seq(-3, 3, 0.5))
pred.w.clim <- predict(lm(y ~ x), new, interval="confidence")
# Just create a blank plot region with axes first. We'll add to this
plot(range(new$x), range(pred.w.clim), type = "n", ann = FALSE)
# For convenience
CI.U <- pred.w.clim[, "upr"]
CI.L <- pred.w.clim[, "lwr"]
# Create a 'loop' around the x values. Add values to 'close' the loop
X.Vec <- c(new$x, tail(new$x, 1), rev(new$x), new$x[1])
# Same for y values
Y.Vec <- c(CI.L, tail(CI.U, 1), rev(CI.U), CI.L[1])
# Use polygon() to create the enclosed shading area
# We are 'tracing' around the perimeter as created above
polygon(X.Vec, Y.Vec, col = "grey", border = NA)
# Use matlines() to plot the fitted line and CI's
# Add after the polygon above so the lines are visible
matlines(new$x, pred.w.clim, lty = c(1, 2, 2), type = "l", col =
c("black", "red", "red"))
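Since the original question is about a single vector of durations rather than x/y pairs, a confidence interval for the mean can also be drawn directly. This is only a minimal sketch: the name dat comes from the question, while the simulated values and the choice of a t-based 95% interval are assumptions made for illustration.

```r
set.seed(1)
dat <- rexp(60, rate = 1 / 30)            # hypothetical durations, simulated

m  <- mean(dat)
ci <- t.test(dat, conf.level = 0.95)$conf.int   # t-based CI for the mean

# one-dimensional scatter of the raw values along the x-axis
stripchart(dat, method = "jitter", pch = 1, xlab = "duration")
abline(v = m,  col = "red")               # sample mean
abline(v = ci, col = "red", lty = 2)      # lower and upper confidence limits
```

The dashed lines mark the interval for the mean, not for individual observations, so most points will fall outside them.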
I need to simulate 400 pairs of observations and draw a scatter plot of X1 against X2, with the pairs shown in different colors, in R. I used the code below, but I do not believe it is correct.
group <- rbinom(200, 1, 0.3) + 0 # Create grouping variable
group_pch <- group # Create variable for symbols
group_pch[group_pch == 1] <- 16
group_pch[group_pch == 0] <- 16
group_col <- group # Create variable for colors
group_col[group_col == 1] <- "red"
group_col[group_col == 0] <- "green"
plot(x, y, # Scatterplot with two groups
pch = group_pch,
col = group_col)
First create a data frame with 400 x- and y-values, here X1 and X2. I generated the values with rnorm(); see help(rnorm) for details. Then I created a third variable with cut(), which assigns the label higher if the respective X1 value is at least 3 (the mean of X1) and lower otherwise; see help(cut) for further information.
Finally, plot(x = df$X1, y = df$X2) draws a simple scatterplot. With the argument col = df$Color, the points are separated by color according to breaks = c(-Inf, 3, +Inf).
Unfortunately, your question does not make clear how many colors (groups) are required, or which rule should be used to differentiate between them.
# if you need to submit your code:
set.seed(1209) # to make results reproducible; see:
# ?set.seed()
# Create dataframe with 400 x and y-values
df <- data.frame(
X1 = rnorm(400, mean=3, sd=1),
X2 = rnorm(400, mean=5, sd=3))
# Create new variable (color) for plotting
df$Color <- cut(df$X1, breaks = c(-Inf, 3, +Inf),
labels = c("lower", "higher"),
right = FALSE)
# Plot #1
# Plot the points and differentiate X1 smaller/greater 3 by color
plot(x = df$X1, y = df$X2,
col = df$Color,
main = "Plot Title",
pch = 19, cex = 0.5)
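To see which side of the break a boundary value lands on, a small check of cut() with the same breaks may help. The three test values are made up for illustration.

```r
# Boundary behaviour of cut(): with right = FALSE the intervals are [a, b)
x <- c(2.9, 3, 3.1)                       # hypothetical values around the break

labs <- cut(x, breaks = c(-Inf, 3, +Inf),
            labels = c("lower", "higher"), right = FALSE)
as.character(labs)                        # "lower" "higher" "higher"

# with the default right = TRUE the intervals are (a, b], so 3 becomes "lower"
labs_default <- cut(x, breaks = c(-Inf, 3, +Inf),
                    labels = c("lower", "higher"))
as.character(labs_default)                # "lower" "lower" "higher"
```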
If you are asked to use specific colors, then do the following:
# class(df$Color) # returns factor - perfect!
preferredColors <- c("red", "green")[df$Color]
# you need as many colors as labels, here 2 (lower, higher)
# Plot #2
plot(x = df$X1, y = df$X2,
col = preferredColors,
main = "Plot Title",
xlab = "Description of x-axes",
ylab = "Description of y-axes",
pch = 19, cex = 0.5)
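If more than two groups are needed, the same factor-indexing idea should extend naturally. The sketch below assumes four groups defined by the quartiles of X1; that rule, the Group column, and the chosen colors are assumptions for illustration, since the question does not specify them.

```r
set.seed(1209)
df <- data.frame(
  X1 = rnorm(400, mean = 3, sd = 1),
  X2 = rnorm(400, mean = 5, sd = 3))

# hypothetical grouping rule: quartiles of X1
breaks <- quantile(df$X1, probs = seq(0, 1, by = 0.25))
df$Group <- cut(df$X1, breaks = breaks, include.lowest = TRUE,
                labels = paste0("Q", 1:4))

# one color per factor level, picked by indexing with the factor
groupColors <- c("red", "orange", "green", "blue")
plot(x = df$X1, y = df$X2,
     col = groupColors[df$Group],
     pch = 19, cex = 0.5)
legend("topright", legend = levels(df$Group),
       col = groupColors, pch = 19)
```

include.lowest = TRUE keeps the smallest X1 value in the first group instead of turning it into NA.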
The following is for illustration only; you only need to copy the code above:
# output first three rows of the df for inspection purpose:
head(df, n=3)
#> X1 X2 Color
#> 1 2.450875 5.6721845 lower
#> 2 5.582115 4.8569917 higher
#> 3 2.324129 -0.3660018 lower
Created on 2021-09-12 by the reprex package (v2.0.1)
Output, Plot #2:
Background
I have a function called TPN. When you run this function, it produces two plots (see picture below). The bottom-row plot samples from the top-row plot.
Question
I'm wondering how I could fix the ylim of the bottom-row plot to be always (i.e., regardless of the input values) the same as ylim of the top-row plot?
R code is provided below the picture (Run the entire block of code).
############## Input Values #################
TPN = function( each.sub.pop.n = 150,
sub.pop.means = 20:10,
predict.range = 10:0,
sub.pop.sd = .75,
n.sample = 2 ) {
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
set.seed(2460986)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
set.seed(NULL)
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y) ## Plot #1
sample <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sample <- data.frame(y = unlist(sample),
x = as.numeric(rep(names(sample), each = n.sample)))
plot(sample$x, sample$y) ## Plot # 2
}
## TEST HERE:
TPN()
You can get the y-axis tick range of the current plot using par("yaxp")[1:2] (the positions of the lowest and highest tick marks). So, you can change the second plot's code to use that range as its ylim:
plot(sample$x, sample$y, ylim = par("yaxp")[1:2]) ## Plot # 2
Or, as mentioned in the comments, you can compute a single ylim from the range of both data sets and pass it to both plots:
ylim = range(c(y, sample$y))
Another option: produce the same plot again, but with type = "n", and then add the points with points(). For example, change your plot 2 to
plot(x, y, type = "n")
points(sample$x, sample$y)
A benefit of this approach is that everything in the plot will be exactly the same, not just the y-axis (which may or may not matter for your function).
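One caveat on the par("yaxp") suggestion: it reports the outermost tick positions, which can be slightly narrower than the plotted region. If an exact match matters, par("usr")[3:4] holds the actual y-limits of the plotting region, and yaxs = "i" stops R from padding them again. A minimal sketch with made-up data (y_all and y_sub are hypothetical stand-ins for the two data sets in TPN):

```r
set.seed(1)
y_all <- rnorm(200, mean = 15, sd = 3)    # hypothetical full data
y_sub <- sample(y_all, 10)                # hypothetical subsample

op <- par(mfrow = c(2, 1), mar = c(2, 4.1, 2.1, 2.1))

plot(seq_along(y_all), y_all)             # Plot #1
usr1 <- par("usr")                        # c(x1, x2, y1, y2) of Plot #1

# reuse the exact y-limits; yaxs = "i" suppresses the default 4% padding
plot(seq_along(y_sub), y_sub, ylim = usr1[3:4], yaxs = "i")   # Plot #2
usr2 <- par("usr")

par(op)                                   # restore graphics settings
```

Storing par("usr") right after the first plot avoids relying on when the ylim argument is evaluated.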
I found many resources on how to draw Venn diagrams in R. Stack Overflow has a lot of them. However, I still can't draw my diagrams the way I want. Take the following code as an example:
library("VennDiagram")
A <- 1:4
B <- 3:6
d <- list(A, B)
vp <- venn.diagram(d, fill = c("white", "white"), alpha = 1, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp)
I want the intersection between the sets to be red. However, if I change any of the white colors to red, I get the following:
vp_red <- venn.diagram(d, fill = c("red", "white"), alpha = 1, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp_red)
That's not quite what I want. I want only the intersection to be red. If I change the alpha, this is what I get:
vp_alpha <- venn.diagram(d, fill = c("red", "white"), alpha = 0.5, filename = NULL,
category.names=c("A", "B"))
grid.draw(vp_alpha)
Now I have pink in my intersection. This is not what I want either. What I want is something like this image from Wikipedia:
How can I do this? Maybe the VennDiagram package can't do it and I need some other package, but I've been testing different ways to do it and haven't been able to find a solution.
I will show two different possibilities. In the first example, polyclip::polyclip is used to get the intersection. In the second example, circles are converted to sp::SpatialPolygons and we get the intersection using rgeos::gIntersection. Then we re-plot the circles and fill the intersecting area.
The resulting object when using venn.diagram is
"of class gList containing the grid objects that make up the diagram"
Thus, in both cases we can grab relevant data from "vp". First, check the structure and list the grobs of the object:
str(vp)
grid.ls()
# GRID.polygon.234
# GRID.polygon.235
# GRID.polygon.236 <~~ these are the empty circles
# GRID.polygon.237 <~~ $ col : chr "black"; $ fill: chr "transparent"
# GRID.text.238 <~~ labels
# GRID.text.239
# GRID.text.240
# GRID.text.241
# GRID.text.242
1. polyclip
Grab x- and y-values, and put them in the format required for polyclip:
A <- list(list(x = as.vector(vp[[3]][[1]]), y = as.vector(vp[[3]][[2]])))
B <- list(list(x = as.vector(vp[[4]][[1]]), y = as.vector(vp[[4]][[2]])))
Find intersection:
library(polyclip)
AintB <- polyclip(A, B)
Grab labels:
ix <- sapply(vp, function(x) grepl("text", x$name, fixed = TRUE))
labs <- do.call(rbind.data.frame, lapply(vp[ix], `[`, c("x", "y", "label")))
Plot it!
plot(c(0, 1), c(0, 1), type = "n", axes = FALSE, xlab = "", ylab = "")
polygon(A[[1]])
polygon(B[[1]])
polygon(AintB[[1]], col = "red")
text(x = labs$x, y = labs$y, labels = labs$label)
2. SpatialPolygons and gIntersection
Grab the coordinates of the circles:
# grab x- and y-values from first circle
x1 <- vp[[3]][["x"]]
y1 <- vp[[3]][["y"]]
# grab x- and y-values from second circle
x2 <- vp[[4]][["x"]]
y2 <- vp[[4]][["y"]]
Convert points to SpatialPolygons and find their intersection:
library(sp)
library(rgeos)
p1 <- SpatialPolygons(list(Polygons(list(Polygon(cbind(x1, y1))), ID = 1)))
p2 <- SpatialPolygons(list(Polygons(list(Polygon(cbind(x2, y2))), ID = 2)))
ip <- gIntersection(p1, p2)
Plot it!
# plot circles
plot(p1, xlim = range(c(x1, x2)), ylim = range(c(y1, y2)))
plot(p2, add = TRUE)
# plot intersection
plot(ip, add = TRUE, col = "red")
# add labels (see above)
text(x = labs$x, y = labs$y, labels = labs$label)
I'm quite sure you could work directly on the grobs using clipping functions in grid or gridSVG package.
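If installing extra packages is not an option (rgeos, as far as I know, has since been retired from CRAN), the lens can also be built from the circle geometry in base graphics. This is a different technique from the two above, not a reconstruction of what VennDiagram draws: it assumes two circles of equal radius with centres on a horizontal line, and r, d, and the circle() helper are hypothetical names introduced here.

```r
r <- 1      # hypothetical radius of both circles
d <- 1.2    # hypothetical distance between centres (must be < 2 * r)
cA <- c(0, 0)
cB <- c(d, 0)

# points on a circle (or an arc of it) as a two-column matrix
circle <- function(centre, r, from = 0, to = 2 * pi, n = 200) {
  t <- seq(from, to, length.out = n)
  cbind(x = centre[1] + r * cos(t), y = centre[2] + r * sin(t))
}

# half-angle subtended by the intersection points, seen from each centre
theta <- acos(d / (2 * r))

# lens = arc of A lying inside B, followed by arc of B lying inside A
lens <- rbind(circle(cA, r, -theta, theta),
              circle(cB, r, pi - theta, pi + theta))

plot(0, 0, type = "n", xlim = c(-r, d + r), ylim = c(-r, r),
     asp = 1, axes = FALSE, xlab = "", ylab = "")
polygon(circle(cA, r))
polygon(circle(cB, r))
polygon(lens, col = "red")
text(x = c(-r / 2, d + r / 2), y = c(0, 0), labels = c("A", "B"))
```

asp = 1 keeps the circles round; without it they are stretched to fit the device.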
It's very easy with the eulerr R package:
library(eulerr)
plot(euler(c("A"=5,"B"=4,"A&B"=2)),quantities = TRUE,fills=c("white","white","red"))
euler set colours
I have data cdecn:
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
I have created a plot which displays the cumulative probabilities.
aprob <- ecdf(a)
plot(aprob)
I am wondering how I can switch the x-axis and y-axis to get a new plot, i.e., the inverse of ECDF.
Also, for the new plot, is there a way to add a vertical line through the point where my curve intersects 0?
We can do the following. The comments in the code explain each step.
## reproducible example
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn)) ## random samples
a <- sort(a) ## sort samples in ascending order
e_cdf <- ecdf(a) ## ecdf function
e_cdf_val <- 1:length(a) / length(a) ## the same as: e_cdf_val <- e_cdf(a)
par(mfrow = c(1,2))
## ordinary ecdf plot
plot(a, e_cdf_val, type = "s", xlab = "ordered samples", ylab = "ECDF",
main = "ECDF")
## switch axes to get the 'inverse' ECDF
plot(e_cdf_val, a, type = "s", xlab = "ECDF", ylab = "ordered sample",
main = "'inverse' ECDF")
## where the curve intersects 0
p <- e_cdf(0)
## [1] 0.01578947
## highlight the intersection point
points(p, 0, pch = 20, col = "red")
## add a dotted red vertical line through intersection
abline(v = p, lty = 3, col = "red")
## display value p to the right of the intersection point
## rounded to 4 digits
text(p, 0, pos = 4, labels = round(p, 4), col = "red")
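The inverse can also be computed rather than drawn from the sorted samples: base R's quantile() with type = 1 is documented as the inverse of the empirical distribution function. A sketch using the same simulated data; the probability grid p is an arbitrary choice, and rnorm() is called with length(cdecn) to make the sample size explicit.

```r
set.seed(0)
cdecn <- sample(1:10, 570, replace = TRUE)
a <- rnorm(length(cdecn), mean(cdecn), sd(cdecn))   # 570 random samples

p <- seq(0, 1, by = 0.01)                 # probabilities to invert (arbitrary grid)
q <- quantile(a, probs = p, type = 1)     # type 1: inverse of the ECDF

plot(p, q, type = "s", xlab = "probability", ylab = "sample quantile",
     main = "inverse ECDF via quantile()")

# probability at which the curve crosses 0
p0 <- ecdf(a)(0)
abline(v = p0, lty = 3, col = "red")
```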
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
aprob <- ecdf(a)
plot(aprob)
# Switch the x and y axes
k <- knots(aprob)                  # sorted unique sample values
x <- seq_along(k) / length(k)      # cumulative probabilities
plot(x = x, y = k, xlab = "Fn(y)", ylab = "y")
# Add a straight line from (0, 0) to the top-right point
segments(0, 0, max(x), max(k))
The "straight line at x==0" bit makes me suspect that you want a QQplot:
qqnorm(a)
qqline(a)
I have a 2 by 2 matrix and I would like to color the numbers based on their values (say I have numbers from 0-20 and I want to color 0-2=blue; 2-4=sky blue... 12-14=yellow, 18-20=red, etc.). In Excel I was only able to get 3 colors with the Conditional Formatting option (see the figure). Does anyone know whether I can have more colors in another program (preferably R)? Thanks!
PS: Please note, I do not need a heatmap or contour plot per se, because I am interested in the exact values of the numbers.
Here is a solution; I hope it helps.
# colorRampPalette() comes from grDevices, which ships with base R,
# so no extra package is needed for the colour ramp
# setup
rows <- 10
columns <- 10
# data matrix
zVals <- round(rnorm(rows * columns), 2)
z <- matrix(zVals, rows, columns)
# pick the number of colours (granularity of colour scale)
nColors <- 100
# create the colour palette
cols <- colorRampPalette(colors = c("blue", "grey", "red"))(nColors)
# get a zScale for the colours
zScale <- seq(min(z), max(z), length.out = nColors)
# function that returns the nearest colour given a value of z
# (which.min() returns a single index, even when two colours are equally near)
findNearestColour <- function(x) {
  cols[which.min(abs(zScale - x))]
}
# empty plot: columns along x, rows along y (row 1 at the top)
plot(1, 1, type = "n", xlim = c(1, columns), ylim = c(rows, 1),
     axes = FALSE, xlab = "", ylab = "")
# populate it with the data
for (i in seq_len(rows)) {
  for (j in seq_len(columns)) {
    text(j, i, z[i, j], col = findNearestColour(z[i, j]))
  }
}
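The question asks for fixed bins (0-2, 2-4, ...) rather than a continuous ramp, which cut() can handle. A minimal sketch, assuming values between 0 and 20 and ten bins of width 2; the simulated matrix and the four anchor colors of the palette are assumptions, not the asker's data.

```r
set.seed(42)
z <- matrix(round(runif(36, 0, 20), 1), nrow = 6)   # hypothetical data in 0-20

breaks  <- seq(0, 20, by = 2)                       # bin edges: 0, 2, ..., 20
binCols <- colorRampPalette(c("blue", "skyblue", "yellow", "red"))(length(breaks) - 1)

# integer bin index per cell; bins are (a, b], with 0 included in the first bin
bin <- cut(z, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# empty plot with row 1 at the top, as in a printed matrix
plot(0, 0, type = "n",
     xlim = c(0.5, ncol(z) + 0.5), ylim = c(nrow(z) + 0.5, 0.5),
     axes = FALSE, xlab = "", ylab = "")

# draw every value at its (column, row) position in its bin color
text(x = as.vector(col(z)), y = as.vector(row(z)),
     labels = as.vector(z), col = binCols[bin])
```

With these settings a value of exactly 2 lands in the first bin; pass right = FALSE to cut() if it should belong to the 2-4 bin instead.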