3D binning with R - r

I have triplets cloud of points (x,y,z).
I would like to bin them and average their values in each bin. So far, I used binning function which works fine as shown in my test code:
n=1000
x <- rnorm(n)
y <- 0.2*x + 0.5*rnorm(n)
z <- y + 0.5*rnorm(n)
x1 <- cbind(x,y,z)
xb<- binning(x1, nbins=12)
mz1<-apply(xb$table.freq, c(1,2), sum)
dim1 <- ncol(mz1)
dim2 <- length(mz1[,1])
image(1:dim1, 1:dim2, mz1, axes = TRUE, xlab="", ylab="")
However, instead of frequencies I wish to get average z values and instead of bins to be able to plot against x, y average values.
Can this be done with/extending binning command ?
Thank you !

Related

How to draw a graph with both x-axis and y-axis are functions in R?

I have a function,
x= (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
where z is the only variable. How to draw a graph where x-axis shows value of function x, and y-axis shows value of function y as z range from 0 to 5?
You can get a very basic plot by simply following your own instructions.
## z ranges from 0 to 5
z = seq(0,5,0.01)
## x and y are functions of z
x = (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
##plot
plot(x,y, pch=20, cex=0.5)
If you want a smooth curve it is a little trickier. There is a discontinuity in the curve at
z = 1 + sqrt(2) ~ 2.414. If you just draw the curve as one piece, you get an unwanted line connecting across the discontinuity. So, in two pieces,
plot(x[1:242],y[1:242], type='l', xlab='x', ylab='y',
xlim=range(x), ylim=range(y))
lines(x[243:501],y[243:501])
But be careful about interpreting this. There is something tricky going on from z=0 to z=1.
Using ggplot2
# z ranges from -1000 to 1000 (The range can be arbitrary)
z = seq(-1000,1000,.25)
# x as a function of z
x = (z-z^2.5) / ((1+2*z)-z^2)
# y as a function of z
y = z-z^2.5
# make a dataframe of x,y and z
df <- data.frame(x=x, y=y, z=z)
# subset the df where z is between 0 and 5
df_5 <- subset(df, (df$z>=0 & df$z<=5))
# plot the graph
library(ggplot2)
ggplot(df_5, aes(x,y))+ geom_point(color="red")
The only addition to #G5W answer is subset() of values between 0 and 5 from your dataset to plot and the use of ggplot2.

How to plot an indexed set of (x,y) pairs such that the index is parallel to the x axis, but on the top of the frame

For example, let say:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
Is it possible to plot y~x with the indexed value of each pair at the top of the plot?
Can it be done with the base graphics, not ggplot?
It may be simple, but I am struggling to find help via Google. My guess is I'm using a poor selection of words.
Any help is much appreciated!
plot(x,y)
text(x = x, y = y, n, pos = 3)
#Adds text 'n' at co-ordinate (x,y)
# "pos = 3" means the text will be just above the co-ordinates
#See ?text for more
If you wanted to plot all the indices on a same line above the plot boundary, you can specify the appropriate value for y when using text. However, you will first have to pass par(xpd=TRUE) to be able to draw outside plot boundary
Yes we can add label. Try this code:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
plot(y~x)
with(data, text(y~x, labels = row.names(data)))

How to create Lines connecting two points in R

Is there any way to create lines in R connecting two points?
I am aware of lines(), function, but it creates line segment what I am looking for is an infinite length line.
Here's an example of Martha's suggestion:
set.seed(1)
x <- runif(2)
y <- runif(2)
# function
segmentInf <- function(xs, ys){
fit <- lm(ys~xs)
abline(fit)
}
plot(x,y)
segmentInf(x,y)
#define x and y values for the two points
x <- rnorm(2)
y <- rnorm(2)
slope <- diff(y)/diff(x)
intercept <- y[1]-slope*x[1]
plot(x, y)
abline(intercept, slope, col="red")
# repeat the above as many times as you like to satisfy yourself
Use segment() function.
#example
x1 <- stats::runif(5)
x2 <- stats::runif(5)+2
y <- stats::rnorm(10)
plot(c(x1,x2), y)
segments(x1, y[1:5], x2, y[6:10], col= 'blue')

RGL surface plot from data frame

I've created a nice plot using scatter3d() and Rcmdr. That plot contains two nice surface smooths. Now I'd like to add to this plot one more surface, the truth (i.e. the surface defined by the function generating my observations minus the noise component).
Here is my code so far:
library(car)
set.seed(1)
n <- 200 # number of observations (x,y,z) to be generated
sd <- 0.3 # standard deviation for error term
x <- runif(n) # generate x component
y <- runif(n) # generate y component
r <- sqrt(x^2+y^2) # used to compute z values
z_t <- sin(x^2+3*y^2)/(0.1+r^2) + (x^2+5*y^2)*exp(1-r^2)/2 # calculate values of true regression function
z <- z_t + rnorm(n, sd = sd) # overlay normally distrbuted 'noise'
dm <- data.frame(x=x, y=y, z=z) # data frame containing (x,y,z) observations
dm_t <- data.frame(x=x,y=y, z=z_t) # data frame containing (x,y) observations and the corresponding value of the *true* regression function
# Create 3D scatterplot of:
# - Observations (this includes 'noise')
# - Surface given by Additive Model fit
# - Surface given by bivariate smoother fit
scatter3d(dm$x, dm$y, dm$z, fit=c("smooth","additive"), bg="white",
axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE, xlab="x", ylab="z", zlab="y")
The solution given in another thread is to then define a function:
my_surface <- function(f, n=10, ...) {
ranges <- rgl:::.getRanges()
x <- seq(ranges$xlim[1], ranges$xlim[2], length=n)
y <- seq(ranges$ylim[1], ranges$ylim[2], length=n)
z <- outer(x,y,f)
surface3d(x, y, z, ...)
}
f <- function(x, y)
sin(x^2+3*y^2)/(0.1+r^2) + (x^2+5*y^2)*exp(1-r^2)/2
my_surface(f, alpha=0.2)
This however yields an error, saying (translated from German since this is my system language, I apologize):
Error in outer(x, y, f) :
Dimension [Product 100] does not match the length of the object [200]
I then tried an alternative approach:
x <- seq(0,1,length=20)
y <- x
z <- outer(x,y,f)
surface3d(x,y,z)
This does add a surface to my plot but it doesn't look right at all (i.e. the observations are not even close to it). Here's what the supposed true surface looks like (this is obviously wrong):
Thanks!
I think the problem may in fact be scaling. Here I created a couple of points that sit on the plane z = x+y. Then I proceeded to try to plot that plane using my method above:
library(car)
n <- 50
x <- runif(n)
y <- runif(n)
z <- x+y
scatter3d(x,y,z, surface = FALSE)
f <- function(x,y)
x + y
x_grid <- seq(0,1, length=20)
y_grid <- x_grid
z_grid <- outer(x_grid, y_grid, f)
surface3d(x_grid, y_grid, z_grid)
This gives me the following plot:
Maybe one of you can help me out with this?
The scatter3d function in car rescales data before plotting it, which makes it incompatible with essentially all rgl plotting functions, including surface3d.
You can get a plot something like what you want by using all rgl functions, e.g. plot3d(x, y, z) in place of scatter3d, but of course it will have rgl-style axes rather than car-style axes.

Plot fitted line within certain range R

Using R, I would like to plot a linear relationship between two variables, but I would like the fitted line to be present only within the range of the data.
For example, if I have the following code, I would like the line to exist only from x and y values of 1:10 (with default parameters this line extends beyond the range of data points).
x <- 1:10
y <- 1:10
plot(x,y)
abline(lm(y~x))
In addition to using predict with lines or segments you can also use the clip function with abline:
x <- 1:10
y <- 1:10
plot(x,y)
clip(1,10, -100, 100)
abline(lm(y~x))
Instead of using abline(), (a) save the fitted model, (b) use predict.lm() to find the fitted y-values corresponding to x=1 and x=10, and then (c) use lines() to add a line between the two points:
f <- lm(y~x)
X <- c(1, 10)
Y <- predict(f, newdata=data.frame(x=X))
plot(x,y)
lines(x=X, y=Y)
You can do this using predict.
You can predict on specific values of x (see ?predict)
x<-1:10
y<-1:10
plot(x,y)
new <- data.frame(x = seq(1, 5, 0.5))
lines(new$x, predict(lm(y~x), new))
The plotrix library has the ablineclip() function for just this:
x <- 1:10
y <- 1:10
plot(x,y)
ablineclip(lm(y~x),x1=1,x2=5)
An alternative is to use the segments function (doc here).
Say you estimated the line, and you got an intercept of a and a slope of b. Thus, your fitted function is y = a + bx.
Now, say you want to show the line for x between x0 and x1. Then, the following code plots your line:
# inputs
a <- 0.5
b <- 2
x0 <- 1
x1 <- 5
# graph
plot(c(0,5), c(0,5), type = "n", xlab = "", ylab = "", bty='l')
segments(x0, a+b*x0, x1, a+b*x1)
Simply replace the values of a, b, x0, x1 with those of your choosing.
For those like me who came to this question wanting to plot a line for an arbitrary pair of numbers (and not those that fit a given regression), the following code is what you need:
plot(c(0,5), c(0,5), type = "n", xlab = "", ylab = "", bty='l')
segments(x0, yo, x1, y1)
Simply replace the values of x0, y0, x1, y1 with those of your choosing.

Resources