I have two vectors:
x = c(0, 20, 10000, 50, 30000)
y = c(0, 3, 800, 1000, 7000)
I would like to make a scatterplot of my data in R, which is not complicated with the plot function. It would look best on a log scale, but values equal to 0 are not shown on the graph. I know log(0) is undefined, but I was hoping there was a way to show those points on the scatterplot anyway (for example as a point on the y-axis or the x-axis). Does anyone know how to do that?
In order to plot the data points, add a very small increment to all values:
plot(x + 0.1, y + 0.1, log = 'xy')
However, this hides which values are 0. You can make them visible by using a different symbol for points where either coordinate is zero:
plot(x + 0.1, y + 0.1, log = 'xy', pch = ifelse(x == 0 | y == 0, 17, 16))
Alternatively, you could also choose a different colour.
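As a minimal sketch of the colour variant (the names is_zero and point_col and the choice of red are mine, purely for illustration), the same ifelse() test can feed the col argument instead of pch:

```r
x <- c(0, 20, 10000, 50, 30000)
y <- c(0, 3, 800, 1000, 7000)

# Flag points where either coordinate is zero (they are shifted by 0.1 below)
is_zero <- x == 0 | y == 0

# Illustrative colour choice: red for shifted zeros, black otherwise
point_col <- ifelse(is_zero, "red", "black")

plot(x + 0.1, y + 0.1, log = "xy", pch = 16, col = point_col)
```

Colour and symbol can of course be combined if the plot has to survive greyscale printing.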
In order to plot the actual log values, don't use the log = 'xy' argument but rather apply the log to the numbers directly:
plot(log(x + 0.1), log(y + 0.1), pch = ifelse(x == 0 | y == 0, 17, 16))
I want to present a barplot with baseline y=1. I want to present fold change, therefore starting with 1. How do I change y starting value with the function barplot? Thanks!
a <- c(0.5,1.5)
barplot(a)
Simulate a new y axis baseline by subtracting 1 and then compensating in the axis labels.
a <- c(0.5,1.5)
at <- c(-0.5, 0, 0.5, 1)
barplot(a - 1, yaxt = "n")
axis(2, at = at, labels = at + 1)
abline(h = 0)
Created on 2022-10-17 with reprex v2.0.2
You could subset on values greater than or equal to 1 and use ylim together with xpd = FALSE. This does not show fold changes below 1, and the plot has its baseline at 1.
barplot(a[a >= 1], ylim=c(1, max(a)*1.1), xpd=FALSE)
box()
Data:
set.seed(334322)
a <- runif(10, 0, 6)
The direct solution is
barplot(a - 1, offset = 1)
That said, since these are fold changes, consider whether a log2 scale or transformation would be a better choice.
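A rough sketch of the log2 idea, using the two values from the question (the variable name log2_fc and the axis label are my own): on the log2 scale a fold change of 1 maps to 0, so the default barplot baseline is already the right one and halving and doubling become symmetric.

```r
a <- c(0.5, 1.5)

# log2 fold change: 1 maps to 0, 0.5 maps to -1, 2 would map to +1
log2_fc <- log2(a)

barplot(log2_fc, ylab = "log2 fold change")
abline(h = 0)
```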
I am plotting the density of F(1,49) in R. It seems that the simulated plot does not match the theoretical plot as values approach zero.
set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt="n",ylab="",xlab="Observation",
main=expression(paste("Density plot (",italic(n),"=1000, ",italic(df)[1],"=1, ",italic(df)[2],"=49)")),
lwd=2)
curve(df(x, df1=1, df2=49), from=0, to=10, add=T, col="red",lwd=2,lty=2)
legend("topright",c("Theoretical","Simulated"),
col=c("red","black"),lty=c(2,1),bty="n")
Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.
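For reference, a sketch of that call applied to the data from the question (the object name dens is mine; everything else is taken from the original code):

```r
set.seed(123)
val <- rf(1000, df1 = 1, df2 = 49)

# Start the evaluation grid at the boundary instead of below it
dens <- density(val, from = 0)

plot(dens, lwd = 2, main = "", xlab = "Observation")
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10,
      add = TRUE, col = "red", lwd = 2, lty = 2)
```

Note that from = 0 only changes where the estimate is evaluated. The kernels themselves are unchanged, which is why the fit near zero is still imperfect.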
By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that at each point an observation was found, a normal density curve is placed there with its center at the observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.
This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).
When you have a very high density of points close to zero, but none below zero, the left tail of all the normal kernels will "spill over" into the negative values, giving a Gaussian-type curve which doesn't match the theoretical density.
This means that if you have a sharp boundary at 0, you should remove values of your simulated density that are between zero and about two standard deviations of your smoothing kernel - anything below this will be misleading.
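To make the spill-over concrete, here is a hand-rolled Gaussian kernel estimate. It is only a sketch: the bandwidth of 0.1, the evaluation grid, and the names kde_manual and mass_below_zero are assumptions for illustration, and density() itself uses a faster binned approximation rather than this direct sum.

```r
set.seed(1)
obs <- rf(200, df1 = 1, df2 = 49)  # all observations are >= 0
bw <- 0.1                          # assumed kernel standard deviation

# Evaluate the estimate on a grid that extends below the boundary
grid <- seq(-1, 3, length.out = 401)

# At each grid point, average the normal densities centred on the observations
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = obs, sd = bw)))

# Approximate probability mass the estimate places on impossible negative values
mass_below_zero <- sum(kde_manual[grid < 0]) * diff(grid)[1]

plot(grid, kde_manual, type = "l", xlab = "x", ylab = "Estimated density")
abline(v = 0, lty = 3)
```

Everything to the left of the dotted line is mass that the true F density cannot have, and it has been taken away from the region just to the right of zero.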
Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:
library(ggplot2)
ggplot(with(density(val, bw = 0.1), data.frame(x, y)), aes(x, y)) +
geom_line(aes(col = "Simulated"), na.rm = TRUE) +
geom_function(fun = ~ df(.x, df1 = 1, df2 = 49),
aes(col = "Theoretical"), lty = 2) +
lims(x = c(0.2, 12)) +
theme_classic(base_size = 16) +
labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
italic(df)[1],"=1, ",italic(df)[2],"=49)")),
x = "Observation", y = "") +
scale_color_manual(values = c("black", "red"), name = "")
The kde1d and logspline packages are not bad for such densities.
sims <- rf(1500, 1, 49)
library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
I have a log-normal density with a mean of -0.4 and standard deviation of 2.5.
At x = 0.001 the height is over 5 (I double checked this value with the formula for the log-normal PDF):
dlnorm(0.001, -0.4, 2.5)
5.389517
When I plot it using the curve function over the input range 0-6, the peak appears to have a height of just over 1.5:
curve(dlnorm(x, -.4, 2.5), xlim = c(0, 6), ylim = c(0, 6))
When I adjust the input range to 0-1 the height is nearly 4:
curve(dlnorm(x, -.4, 2.5), xlim = c(0, 1), ylim = c(0, 6))
Similarly with ggplot2 (output not shown, but looks like the curve plots above):
library(ggplot2)
ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
stat_function(fun = function(x) dlnorm(x, -0.4, 2.5)) +
xlim(0, 6) +
ylim(0, 6)
ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
stat_function(fun = function(x) dlnorm(x, -0.4, 2.5)) +
xlim(0, 1) +
ylim(0, 6)
Does someone know why the density height is changing when the x-axis scale is adjusted? And why neither attempt above seems to reach the correct height? I tried this with just the normal density and this doesn't happen.
curve evaluates the function at a set of discrete, equally spaced points in the range you give it. By default it uses n = 101 points, so a narrow peak can fall between two neighbouring points and be missed. If you increase the number of points you will get much closer to the correct value:
curve(dlnorm(x, -.4, 2.5), xlim = c(0, 1), ylim = c(0, 6), n = 1000)
In your first case, curve generates 101 points over the interval from 0 to 6, while in the second case it generates 101 points over the interval from 0 to 1, so the points are more densely spaced and land closer to the peak.
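The effect can be checked without plotting anything. The sketch below (the helper names f and peak_on_grid are mine) evaluates the density on the same kind of equally spaced grid that curve uses and reports the largest value found. For comparison, the mode of a log-normal is at exp(meanlog - sdlog^2), which here is roughly 0.0013, far below the first nonzero grid point of 0.06 on the default grid over 0-6.

```r
f <- function(x) dlnorm(x, meanlog = -0.4, sdlog = 2.5)

# Largest density value seen on an equally spaced grid of n points
peak_on_grid <- function(from, to, n) {
  xs <- seq(from, to, length.out = n)
  max(f(xs))
}

peak_0_6_default <- peak_on_grid(0, 6, 101)    # grid step 0.06
peak_0_1_default <- peak_on_grid(0, 1, 101)    # grid step 0.01
peak_0_1_fine    <- peak_on_grid(0, 1, 10001)  # grid step 0.0001

# True maximum, evaluated at the mode of the log-normal
peak_true <- f(exp(-0.4 - 2.5^2))

print(c(peak_0_6_default, peak_0_1_default, peak_0_1_fine, peak_true))
```

The apparent height grows as the grid gets finer and approaches, but never exceeds, the value at the mode.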
I'm trying to plot multiple circles of different sizes on a plot using ggplot2's geom_point inside of a for loop. Every time I run it though, it plots all the circles, but all in the location of the last circle instead of in their respective locations as given by the data frame. Below is an example of the code I am running. I'm wondering how I would fix this or if there's a better way to get at what I'm trying to do here.
data <- data.frame("x" = c(0, 500, 1000, 1500, 2000),
"y" = c(1500, 500, 2000, 0, 1000),
"size" = c(3, 5, 1.5, 4.2, 2.6)
)
g <- ggplot(data = data, aes(x = x, y = y)) + xlim(0,2000) + ylim(0,2000)
for(i in 1:5) {
g <- g + geom_point(aes(x=data$x[i],y=data$y[i]), size = data$size[i], pch = 1)
}
print(g)
It's pretty rare to need a for-loop for a plot -- ggplot2 will take the whole dataframe and process it all without you needing to manage each row.
ggplot(data = data, aes(x = x, y = y, size = size)) +
geom_point(pch = 1)
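One difference from the loop version is worth noting: when size is mapped inside aes(), ggplot2 rescales the values to its default size range and adds a legend. If the numbers in the size column are meant to be used literally, as they were in the loop, adding scale_size_identity() is one way to do that. This is a sketch that keeps the axis limits from the question:

```r
library(ggplot2)

data <- data.frame(x = c(0, 500, 1000, 1500, 2000),
                   y = c(1500, 500, 2000, 0, 1000),
                   size = c(3, 5, 1.5, 4.2, 2.6))

# One layer draws every row; the identity scale uses the size values as given
g <- ggplot(data, aes(x = x, y = y, size = size)) +
  geom_point(pch = 1) +
  scale_size_identity() +
  xlim(0, 2000) +
  ylim(0, 2000)

print(g)
```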
I have the following kind of data: on a rectangular piece of land (120x50 yards), there are 6 smaller (also rectangular) areas, each with a different kind of plant. The idea is to study the attractiveness of the various kinds of plant to birds. Each time a bird sits down somewhere on the land, I have the exact coordinates of where the bird sits down.
I don't care exactly where the bird sits down, but only care which of the six areas it is. To show the relative preference of birds for the various plants, I want to make a heatmap that makes the areas that are frequented most the darkest.
So, I need to convert the coordinates to code which area the bird visits, and then create a heatmap that shows the differential preference for each land area.
(the research is a bit more involved than this, but this is the general idea.)
How would I do this in R? Is there an R function that takes a vector of coordinates and turns it into such a heatmap? If not, do you have some hints on how to do this?
Not meant to be the answer you are looking for, but might give you some inspiration.
# Simulate some data
birdieLandingSimulator <- data.frame(t(sapply(1:100, function(x) c(runif(1, -10,10), runif(1, -10,10)))))
# Assign some coordinates, which ended up not really being used much at all, except for the point colors
assignCoord <- function(x)
{
# Assign the quadrant, numbered clockwise from the top right: 1, 2, 3, 4
ifelse(all(x>0), 1, ifelse(!sum(x>0), 3, ifelse(x[1]>0, 2, 4)))
}
birdieLandingSimulator <- cbind(birdieLandingSimulator, Q = apply(birdieLandingSimulator, 1, assignCoord))
# Plot
require(ggplot2)
ggplot(birdieLandingSimulator, aes(x = X1, y = X2)) +
stat_density2d(geom="tile", aes(fill = 1/after_stat(density)), contour = FALSE) +
geom_point(aes(color = factor(Q))) + theme_classic() +
theme(axis.title = element_blank(),
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_color_discrete(guide = "none", h=c(180, 270)) +
scale_fill_continuous(name = "Birdie Landing Location")
Use ggplot2. Take a look at the examples for geom_bin2d. It's pretty simple to get 2d bins. Notice that you pass in binwidth for both x and y:
df = data.frame(x=c(1,2,4,6,3,2,4,2,1,7,4,4),y=c(2,1,4,2,4,4,1,4,2,3,1,1))
ggplot(df,aes(x=x, y=y)) + geom_bin2d(binwidth=c(2,2), alpha=0.5)
If you don't want to use ggplot, you can use the cut function to separate your data into bins.
# Test data.
x <- sample(1:120, 100, replace=T)
y <- sample(1:50, 100, replace=T)
# Separate the data into bins.
x <- cut(x, c(0, 40, 80, 120))
y <- cut(y, c(0, 25, 50))
# Now plot it, suppressing reordering.
heatmap(table(y, x), Colv=NA, Rowv=NA)
Alternatively, to actually plot the regions in their true geographic location, you could draw the boxes yourself with rect. You would have to count the number of points in each region.
# Test data.
x <- sample(1:120, 100, replace=T)
y <- sample(1:50, 100, replace=T)
regions <- data.frame(xleft=c(0, 40, 40, 80, 0, 80),
ybottom=c(0, 0, 15, 15, 30, 40),
xright=c(40, 120, 80, 120, 80, 120),
ytop=c(30, 15, 30, 40, 50, 50))
# Color gradient.
col <- colorRampPalette(c("white", "red"))(30)
# Make the plot.
plot(NULL, xlim=c(0, 120), ylim=c(0, 50), xlab="x", ylab="y")
invisible(apply(regions, 1, function (r) {
count <- sum(x >= r["xleft"] & x < r["xright"] & y >= r["ybottom"] & y < r["ytop"])
# Clamp the palette index: col[0] is empty and col has only 30 entries.
rect(r["xleft"], r["ybottom"], r["xright"], r["ytop"], col=col[min(max(count, 1), length(col))])
text( (r["xright"]+r["xleft"])/2, (r["ytop"]+r["ybottom"])/2, count)
}))
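If you also need the area code for each individual landing (the "convert the coordinates to code which area the bird visits" step from the question), something along these lines might work. It is a sketch that reuses the region layout above with simulated continuous coordinates; region_of, area and counts are names I made up.

```r
set.seed(42)
# Simulated landing coordinates on the 120x50 plot
x <- runif(100, 0, 120)
y <- runif(100, 0, 50)

regions <- data.frame(xleft   = c(0, 40, 40, 80, 0, 80),
                      ybottom = c(0, 0, 15, 15, 30, 40),
                      xright  = c(40, 120, 80, 120, 80, 120),
                      ytop    = c(30, 15, 30, 40, 50, 50))

# Return the row index of the region containing the point (px, py),
# or NA if the point is in no region or in more than one
region_of <- function(px, py) {
  hit <- which(px >= regions$xleft & px < regions$xright &
               py >= regions$ybottom & py < regions$ytop)
  if (length(hit) == 1) hit else NA_integer_
}

area <- mapply(region_of, x, y)

# Visits per region, keeping regions with zero visits in the table
counts <- table(factor(area, levels = seq_len(nrow(regions))))
print(counts)
```

The area vector can then be attached to the original data as an ordinary column, and counts can drive the fill colours in the rect() plot above.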