I came across this answer here, and my question is: if I have three variables and want to use x and y to create bins (using cut and table, as in the other answer), how can I then plot z as the average of all the z values that fall into each bin?
This is what I have:
library(plot3D)
x <- data$OPEXMKUP_PT_1d
y <- data$prod_opex
z <- data$ab90_ROIC_wogw3
x_c <- cut(x, 20)
y_c <- cut(y, 20)
cutup <- table(x_c, y_c)
mat <- data.frame(cutup)
hist3D(z = cutup, border="black", bty ="g",
main = "Data", xlab = "Markup",
ylab ="Omega", zlab = "Star")
But it shows z as the frequency, and when I try
hist3D(x, y, z, phi = 0, bty = "g", type = "h", main = 'NEWer',
ticktype = "detailed", pch = 19, cex = 0.5,
xlim=c(0,3),
ylim=c(-10,20),
zlim=c(0,1))
It thinks for a long time and throws an error,
Error: protect(): protection stack overflow
Graphics error: Plot rendering error
It will do the 3D scatter fine, but the result doesn't make sense: the z variable is a ratio that falls mostly between 0 and 1, so you get a bunch of tall lines and a bunch of short lines. I would like z averaged by bin, to show visually how the average ratio changes as x and y change. Please let me know if there is a way to do this.
Not sure exactly what your data looks like, so I made some up. You should be able to adjust this to your needs. It's a bit hacky/brute force-ish, but it should work fine as long as your data isn't so large that the loop becomes slow.
library(plot3D)
# Fake it til you make it
n = 5000
x = runif(n)
y = runif(n)
z = x + 2*y + sin(x*2*pi)
# Divide into bins
x_c = cut(x, 20)
y_c = cut(y, 20)
x_l = levels(x_c)
y_l = levels(y_c)
# Compute the mean of z within each x,y bin
z_p = matrix(0, 20, 20)
for (i in seq_along(x_l)) {
  for (j in seq_along(y_l)) {
    z_p[i, j] = mean(z[x_c %in% x_l[i] & y_c %in% y_l[j]])
  }
}
# Get the middle of each bin
x_p = sapply(strsplit(gsub('\\(|]', '', x_l), ','), function(x) mean(as.numeric(x)))
y_p = sapply(strsplit(gsub('\\(|]', '', y_l), ','), function(x) mean(as.numeric(x)))
# Plot
hist3D(x_p, y_p, z_p, bty = "g", type = "h", main = 'NEWer',
ticktype = "detailed", pch = 19, cex = 0.5)
Basically, we're just manually computing the average bin height z by looping over the bins. There may be a better way to do the computation.
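One possible alternative to the double loop is tapply, which groups z by both factors at once. This is only a sketch on made-up data of the same shape as above (the seed and the name z_mean are my own assumptions), and empty bins come back as NA rather than NaN:

```r
# Assumed stand-in data, same shape as the example above
set.seed(1)
n <- 5000
x <- runif(n)
y <- runif(n)
z <- x + 2 * y + sin(x * 2 * pi)

# Divide into bins
x_c <- cut(x, 20)
y_c <- cut(y, 20)

# Mean of z in every (x bin, y bin) cell, returned as a 20 x 20 matrix
z_mean <- tapply(z, list(x_c, y_c), mean)
dim(z_mean)
```

The resulting matrix should be usable in place of z_p in the hist3D call.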
I have been trying to generate random values so that I can construct a circle.
The values of x and y are expected to satisfy the following equation
x^2 + y^2 = 1
Here is the code that I used.
par(type = "s")
x <- runif(1000, min = -1, max = 1)
y <- sqrt(1 - x^2)
z <- NULL
z$x <- x
z$y <- y
z <- as.data.frame(z)
plot.new()
plot(z$x, z$y, type = "p")
plot.window(xlim = c(-10,10), ylim = c(-10,10), asp = 1)
But the graph I get is not quite what I expected it to be.
The graph resembles the upper half of an ellipse rather than a semicircle.
Why are there no values for y where y < 0?
Please find the plot here.
I am also interested in finding out, how to generate random values for x, y, z, a; where x^2 + y^2 + z^2 + a^2 = 10
Maybe you missed @thelatemail's comment:
png()
plot(z$x, z$y, type = "p", asp=1)
dev.off()
The reason passing asp=1 to plot.window would fail (if it were called first, which is what you might have tried) is that plot itself calls plot.window again and, in the process, resets the window to its default values. You can see that in the code of plot.default:
> plot.default
function (x, y = NULL, type = "p", xlim = NULL, ylim = NULL,
log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
ann = par("ann"), axes = TRUE, frame.plot = axes, panel.first = NULL,
panel.last = NULL, asp = NA, ...)
{
localAxis <- function(..., col, bg, pch, cex, lty, lwd) Axis(...)
localBox <- function(..., col, bg, pch, cex, lty, lwd) box(...)
localWindow <- function(..., col, bg, pch, cex, lty, lwd) plot.window(...)
#.... omitted the rest of the R code.
(Calling plot.window after that plot call should not be expected to have any favorable effect.)
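If you do want to set up the window yourself, a minimal sketch (assuming the same x and y as in the question; the seed is my own addition) is to call plot.window once and then add to it with low-level functions such as points, which do not reset the coordinate system:

```r
set.seed(1)
x <- runif(1000, min = -1, max = 1)
y <- sqrt(1 - x^2)

plot.new()                                              # start an empty figure
plot.window(xlim = c(-1, 1), ylim = c(-1, 1), asp = 1)  # set the coordinates once
points(x, y)                                            # draw into the existing window
axis(1)
axis(2)
box()
```

For everyday use, plot(x, y, asp = 1) is shorter and does the same job.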
The problem is within this part of your code:
x <- runif(1000, min = -1, max = 1)
y <- sqrt(1 - x^2)
The problem arises from treating two distinct mathematical entities as the same thing: functions and equations. A function f takes an input x and returns a single output f(x). An equation has no such restriction. If you encode the circle equation as the function y = sqrt(1 - x^2), you keep only the non-negative root, so you generate only the points on the upper semicircle and lose the lower half.
Since the circle equation has two y solutions for each x value (except at x = -1 and x = 1), you can generate two pairs of coordinates for each x drawn from your uniform distribution, like this:
x1 = runif(1000, min = -1, max = 1)
x2 = x1
y1 = sqrt(1 - x1^2)
y2 = (-1)*y1
x = c(x1,x2)
y = c(y1,y2)
plot(x,y, asp=1)
As John Coleman recommended in his comment, I'd prefer using parametric (polar) coordinates instead. Generate angles in radians between 0 and 2*pi, then calculate the x and y positions from each angle and the radius you want.
radius = 1
theta = runif(1000, min = 0, max = 2*pi)
x = radius * cos(theta)
y = radius * sin(theta)
plot(x,y, asp=1)
For the last part of your question: for each value of one variable you'd have to work out all the tuples of the remaining variables that solve the equation. With z and a as additional variables the solutions live in four-dimensional space, so they cannot be represented completely on a single 2-dimensional graph.
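If the goal is only to generate random tuples that satisfy x^2 + y^2 + z^2 + a^2 = 10, one common approach (not the asker's method, just a sketch) is to draw independent standard normal vectors and rescale each one to length sqrt(10). The names g and pts and the seed are my own assumptions:

```r
set.seed(42)
n <- 1000
r <- sqrt(10)                        # radius, so that the squares sum to 10

g <- matrix(rnorm(n * 4), ncol = 4)  # one Gaussian 4-vector per row
pts <- r * g / sqrt(rowSums(g^2))    # rescale every row to length r
colnames(pts) <- c("x", "y", "z", "a")

range(rowSums(pts^2))                # every row should sum to 10 (up to rounding)
```

Points generated this way are spread uniformly over the sphere, which is not the case when x is drawn uniformly and the other coordinates are solved for.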
I'm quite new to R and ggplot2, so apologies if this is an obvious question, but I've searched around and can't find anything about this exact issue.
I have a ggplot density plot for 6 variables on the same plot, overlapping. What I am trying to do is change the maximum height of each variable's curve to a given value without changing the shape of the distribution, e.g.:
variable_1 - 1, //on Y axis
variable_2 - 0.5 etc.
This way I can get an idea of the distribution (across the x axis) whilst also showing a second independent parameter through the y axis
Is this possible at all?
Yes, this is possible, although I wouldn't recommend it. What you can do is divide each distribution by its maximum and then multiply by the target height.
# some example data:
x = seq(-5, 5, .1)
y1 = dnorm(x)
y2 = dnorm(x, .5, .2)
Y = cbind(y1, y2)
matplot(x, Y, type = 'l', bty = 'n', lty = 1, las = 1)
# now I want the red line to be max 1
# and the black line to be max .5
y1 = .5*y1 / max(y1)
y2 = 1*y2 / max(y2)
Y = cbind(y1, y2)
matplot(x, Y, type = 'l', bty = 'n', lty = 1, las = 1)
The important part here is that I used two different transformations for y1 and y2. The consequence is that in the second figure the distributions cannot be compared anymore. You can avoid this by only applying the same transformation to all distributions.
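The same rescaling can be applied to densities estimated from raw samples, which is closer to the original question. This is a sketch on made-up data; the variable names and target heights are assumptions for illustration:

```r
set.seed(1)
# Hypothetical samples and the desired peak height for each variable
samples <- list(variable_1 = rnorm(500), variable_2 = rnorm(500, 2, 0.5))
target  <- c(variable_1 = 1, variable_2 = 0.5)

# Estimate each density, then rescale it so that its peak equals the target
scaled <- lapply(names(samples), function(nm) {
  d <- density(samples[[nm]])
  data.frame(variable = nm, x = d$x, y = target[[nm]] * d$y / max(d$y))
})
dens <- do.call(rbind, scaled)

tapply(dens$y, dens$variable, max)   # peak height per variable
```

The long-format data frame dens can then be drawn with something like ggplot(dens, aes(x, y, colour = variable)) + geom_line().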
I wrote following R script:
#energy diagram
x <- c(0.1, 0.3, 0.5, 0.7, 0.9 ) #chosen randomly, reaction axis
y <- c(-5.057920, -5.057859, -5.057887,-5.057674, -5.057919 ) #energy of the educt, intermediate, transition states and product
plot(x,y, type="p",
xlim=c(0,1),
ylim=c(-5.058,-5.0575),
xlab="reaction axis",
ylab=expression(paste(E[el] ," / ",10^6," ",kJ/mol)),
xaxt="n" #hide x-axis
)
#h- and v-lines, so i can draw curves by hand
abline(v=seq(0,1,0.1),h=seq(-5.0600,-5.0500,0.00005),col="black",lty=1,lwd=1)
abline(h=c(-5.057920, -5.057859, -5.057887,-5.057674), col="blue", lty=1,lwd=0.7)
Is it possible to draw a curve through the points that would look like an energy diagram? An example of an energy diagram is here:
A lot could be done to streamline / vectorize this code, but for a smallish diagram this works pretty well:
# get that data
x <- c(0.1, 0.3, 0.5, 0.7, 0.9 ) # reaction axis
y <- c(-5.057920, -5.057859, -5.057887,-5.057674, -5.057919 ) # energies
I'm going to make a little Bezier curve to connect each point to the next. This way we can make sure the smooth line passes through the data, not just close to it. I'll give each point a single 'control point' to define the slope. By using the same y-value for a point and its control point, the slope at the point will be 0. I'll call the horizontal offset between the point and the control point delta. We'll start with one point pair:
library(Hmisc)
delta = 0.15
bezx = c(0.1, 0.1 + delta, 0.3 - delta, 0.3)
bezy = rep(y[1:2], each = 2)
plot(bezx, bezy, type = 'b', col = "gray80")
lines(bezier(bezx, bezy), lwd = 2, col = "firebrick4")
Here I plotted the points and control points in gray, and the smooth line in red so we can see what's going on.
It looks promising, let's turn it into a function that we can apply to each pair of points:
bezf = function(x1, x2, y1, y2, delta = 0.15) {
bezier(x = c(x1, x1 + delta, x2 - delta, x2), y = c(y1, y1, y2, y2))
}
You can play with the delta parameter, I think 0.1 looks pretty good.
plot(x, y, xlab = "Reaction coordinate", ylab = "E", axes = FALSE)
box(bty = "L")
axis(side = 2)
for(i in 1:(length(x) - 1)) {
lines(bezf(x1 = x[i], x2 = x[i + 1], y1 = y[i], y2 = y[i + 1], delta = 0.1))
}
You can of course tweak the plot, add labels, and ablines as in your original. (Use my for loop with the lines command to draw only the smoothed lines.) I left the points on to show that we are passing through them, not just getting close.
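If you'd rather not depend on Hmisc, a cubic Bezier segment can be evaluated directly from its Bernstein form, B(u) = (1-u)^3 P0 + 3(1-u)^2 u P1 + 3(1-u) u^2 P2 + u^3 P3. The helper cubic_bezier below is hypothetical, just a sketch of that formula for the first pair of points:

```r
# Evaluate one cubic Bezier segment from four control points (hypothetical helper)
cubic_bezier <- function(px, py, n = 100) {
  u <- seq(0, 1, length.out = n)
  # Bernstein basis: one column per control point
  b <- cbind((1 - u)^3, 3 * (1 - u)^2 * u, 3 * (1 - u) * u^2, u^3)
  list(x = as.vector(b %*% px), y = as.vector(b %*% py))
}

delta <- 0.1
seg <- cubic_bezier(px = c(0.1, 0.1 + delta, 0.3 - delta, 0.3),
                    py = c(-5.057920, -5.057920, -5.057859, -5.057859))

plot(seg$x, seg$y, type = "l", xlab = "Reaction coordinate", ylab = "E")
```

The curve starts at the first control point and ends at the last, and because each end point shares its y-value with the neighbouring control point, the slope there is zero.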
I prefer plotting in ggplot2. If you do too, you'll need to extract the data into a data.frame:
bezlist = list()
for (i in 1:(length(x) - 1)) {
bezlist[[i]] = bezf(x1 = x[i], x2 = x[i + 1], y1 = y[i], y2 = y[i + 1], delta = 0.1)
}
xx = unlist(lapply(bezlist, FUN = '[', 'x'))
yy = unlist(lapply(bezlist, FUN = '[', 'y'))
bezdat = data.frame(react = xx, E = yy)
library(ggplot2)
ggplot(bezdat, aes(x = react, y = E)) +
geom_line() +
labs(x = "Reaction coordinate")
You could use a spline fit. Define some points along the energy diagram, and then fit to them using a spline function. The more points you provide, the better your fit will be. You can check out the smooth.spline function in the stats package for one implementation of a spline fit.
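As a rough sketch of the spline idea with the five energies from the question, splinefun from the stats package builds an interpolating spline that passes through every point (smooth.spline, by contrast, smooths and need not hit the points exactly). The choice of method = "natural" is my own assumption:

```r
x <- c(0.1, 0.3, 0.5, 0.7, 0.9)                                 # reaction axis
y <- c(-5.057920, -5.057859, -5.057887, -5.057674, -5.057919)   # energies

f  <- splinefun(x, y, method = "natural")   # interpolating cubic spline
xs <- seq(min(x), max(x), length.out = 200)

plot(x, y, xlab = "Reaction coordinate", ylab = "E")
lines(xs, f(xs))
```

Note that an interpolating spline can overshoot between points, so the plotted maxima and minima may not sit exactly on the data.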
I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
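For a normal distribution that is not standard, the same axis() approach can place ticks at the mean and at whole standard deviations from it. The values mu = 100 and sigma = 15 below are made up for illustration:

```r
mu    <- 100   # hypothetical mean
sigma <- 15    # hypothetical standard deviation

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

at <- mu + (-3:3) * sigma          # mean and +/- 1, 2, 3 standard deviations
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "")
axis(1, at = at, labels = at)
```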
If you like doing things the hard way, without R's built-in density function, or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
An extremely inefficient and unusual, but beautiful, solution based on the idea of Monte Carlo simulation is this:
Simulate many draws (samples) from a given distribution (say the normal) using rnorm. The rnorm function takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B with standard deviation C.
Plot the estimated density of these draws using density.
Thus to take a sample of size 50,000 from a standard normal (i.e, a normal with mean 0 and standard deviation 1), and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity, the estimated density converges to the normal density. To illustrate this, see the image below, which shows, from left to right and top to bottom, 5,000, 50,000, 500,000 and 5 million samples.
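To get a feel for that convergence without the image, one can compare the estimated density with dnorm for a few sample sizes. This is only a sketch; the seed, the sample sizes and the name max_err are my own choices:

```r
set.seed(123)
sizes <- c(500, 5000, 50000)

# Largest gap between the kernel density estimate and the true normal density
max_err <- sapply(sizes, function(n) {
  d <- density(rnorm(n))
  max(abs(d$y - dnorm(d$x)))
})
max_err

# Overlay the estimate for the largest sample on the exact curve
plot(density(rnorm(50000)), main = "Estimated vs exact density")
curve(dnorm, add = TRUE, lty = 2)
```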
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this purpose. It makes it easy to add graphical information such as a shaded area under the curve, which you usually need when dealing with probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.
Axis labels are automatic.
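The shaded area itself, P(1 < Z < 1.5), can be computed with pnorm, and checked by integrating dnorm numerically. A short sketch (the names p_exact and p_num are mine):

```r
p_exact <- pnorm(1.5) - pnorm(1)                          # difference of two CDF values
p_num   <- integrate(dnorm, lower = 1, upper = 1.5)$value # numerical integration

c(exact = p_exact, numeric = p_num)
```

Both come out at roughly 0.092, i.e. about 9% of the total area lies in the red region.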
This is how to write it in functions:
normalCriticalTest <- function(mu, s) {
  # x spans four standard deviations on either side of the mean
  x <- seq(mu - 4 * s, mu + 4 * s, length = 200)
  # y follows the formula of the normal density f(x)
  y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
  plot(x, y, type="l", lwd=2, xlim = mu + c(-3.5, 3.5) * s)
  # vertical lines at the two-sided 5% critical values (2.5% of the area in each tail)
  abline(v = mu + c(-1.96, 1.96) * s, col="red")
}
normalCriticalTest(0, 1) # draw a standard normal distribution with the critical lines