R: best fit of a -45 degree line

(I know this must be incredibly easy, but I'm struggling with it in R.)
I have a dataset of x and y values saved in the vectors X and Y. I know that a plot of the data should follow a line at exactly -45 degrees, i.e. with slope -1 (see image below).
How do I find the -45 degree line that best fits the data, together with all the statistics available from summary(lm(...))? I've tried lm, but I can't force it to stop fitting the slope parameter.
Thank you
After trying lm(y~1,offset=-x) and applying abline(coefficient, -1), I obtain the following plot (see below).
The black line is the abline plot and the yellow one is my guess of the fit. What's wrong with lm, or am I missing something entirely?

I believe the solution from @BenBolker is correct; perhaps you are using the wrong coefficient:
lm1 <- lm(y~1,offset=-x,data=df)
plot(df)
abline(coefficients(lm1),-1)
This produces:
This fit looks like the correct fit to me. The intercept is -2.217.

Since you state that:
y = -1*x + b
then
y+x = b
So calculate the mean of (y+x) and you get the average value of b
mean(y+x)
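The two answers above agree with each other: with the slope fixed at -1 through the offset, the least-squares intercept is the mean of y + x. A minimal sketch on simulated data (the sample size, true intercept of -2, and noise level are assumptions made only for illustration):

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- -x - 2 + rnorm(50, sd = 0.5)   # slope -1, assumed true intercept -2

# The offset fixes the slope at -1, so only the intercept is estimated
fit <- lm(y ~ 1, offset = -x)

b_lm   <- unname(coef(fit)[1])      # intercept reported by lm
b_mean <- mean(y + x)               # intercept from the algebra y + x = b
```

summary(fit) then gives the standard error and residual statistics for the intercept, and abline(b_lm, -1) draws the fitted line on an existing plot.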

Related

How to create a random walk in R that goes in different directions than -1 or +1?

Consider this two‐dimensional random walk:
where Z_t, W_t, t = 1, 2, 3, …, are independent and identically distributed standard normal random variables.
I am having problems in finding a way to simulate and plot the sample path of (X,Y) for t = 0,1, … ,100. I was given a sample:
The following code is an example of how I usually plot random walks in R:
set.seed(13579)
r<-sample(c(-1,1),size=100,replace=TRUE,prob=c(0.5,0.5))
r<-c(10,r)
(w<-cumsum(r))
w<-as.ts(w)
plot(w,main="random walk")
I am not very sure of how to achieve this.
The problem is that this kind of code gives a "simpler" result, a line that only goes up or down by -1 or +1 at each step:
while the plot I need to create also moves left and right.
Could you help me correct the code above so that it fits my task, or suggest a smarter way to go about it? It would be greatly appreciated.
Cheers!
Instead of using sample, you need to use rnorm(100) to draw 100 samples from a standard normal distribution. Since the walk starts at [0, 0], we need to append a 0 at the start and do a cumsum on the result, i.e. cumsum(c(0, rnorm(100))).
We want to do this for both the x and y variables, then plot. The whole thing can be done in a single line of code in base R:
plot(x = cumsum(c(0, rnorm(100))), y = cumsum(c(0, rnorm(100))), type = 'l')
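If the path needs to be reproducible or reused, the same idea can be wrapped in a small function. This is only a sketch: simulate_walk and its arguments are hypothetical names, not part of base R.

```r
# Hypothetical helper: 2D Gaussian random walk starting at (0, 0)
simulate_walk <- function(n_steps = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  data.frame(
    t = 0:n_steps,
    x = cumsum(c(0, rnorm(n_steps))),  # X_t: running sum of the Z increments
    y = cumsum(c(0, rnorm(n_steps)))   # Y_t: running sum of the W increments
  )
}

walk <- simulate_walk(100, seed = 13579)
```

plot(walk$x, walk$y, type = 'l') then draws the sample path for t = 0, 1, …, 100.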

Simplifying 3D points. R

I need to work with 3D (spatial) data in very long tables with four columns:
x, y, z, Value
There are too many points to plot with scatterplot3d or similar (rgl, lattice...).
I would like to reduce the number of points.
One idea could be to sample.
But I'd like to know how to reduce the data, getting new points that summarize the nearby points.
Is there any package to do it and work with this kind of data?
Something like creating a predefined 3D grid and averaging the points in each grid.
But I don't know whether it's better to make the new points equidistant or to get their coordinates by averaging the old ones locally, or even to weight each old point's contribution by its distance to the new point.
Other issues:
The "optimal" grid could be tilted, but I don't know it beforehand.
I don't know if the grid should be extended a little bit beyond the data nor how much.
PS: I don't want to create surfaces or wireframes, or fit anything.
PS: I've checked spatial packages, but as far as I can see they are useful for data on a surface, such as the earth, without height.
To reduce the size of the data set, have you thought about using a clustering method such as k-means (kmeans) or hierarchical clustering (hclust)? These methods could reduce your data set to a reasonable size. Be aware that if your data set is large enough, these methods could still take too much computation time.
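A minimal sketch of the k-means idea using base R's kmeans(). The simulated data, the column names, and the choice of k = 50 clusters are assumptions made for illustration.

```r
set.seed(1)
n <- 2000
pts <- data.frame(
  x = runif(n, -10, 10),
  y = runif(n, -10, 10),
  z = runif(n, -10, 10)
)
pts$Value <- with(pts, -0.2 * y^2 - 0.3 * z^2)  # made-up value field

k <- 50  # number of summary points to keep (assumed)
km <- kmeans(pts[, c("x", "y", "z")], centers = k, nstart = 5, iter.max = 50)

# One row per cluster: centroid coordinates, mean Value, and cluster size
reduced <- data.frame(
  km$centers,
  Value = as.numeric(tapply(pts$Value, km$cluster, mean)),
  n     = km$size
)
```

Each row of reduced is the local average of the original points assigned to that cluster, so the new coordinates are not forced onto a regular grid.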
It seems like you might benefit from fitting some sort of model to your data and then displaying the predictions at a resolution of your choice.
Here is an example of fitting with a GAM model:
library(sinkr) # https://github.com/marchtaylor/sinkr
library(mgcv)
library(rgl)
# make data ---------------------------------------------------------------
n <- 1000
x <- runif(n, min=-10, max=10)
y <- runif(n, min=-10, max=10)
z <- runif(n, min=-10, max=10)
value <- (-0.01*x^3 + -0.2*y^2 + -0.3*z^2) * rlnorm(n, 0, 0.1)
# fit model (GAM) ---------------------------------------------------------
fit <- gam(value ~ s(x) + s(y) + s(z))
plot.gam(fit, pages = 1)
This visualization is already helpful in understanding the 3d pattern of value, but you could also predict the values to a new grid. To visualize the prediction in 3d, the rgl package might be useful:
# predict to new grid -----------------------------------------------------
grd <- expand.grid(
x=seq(min(x), max(x), length.out=10),
y=seq(min(y), max(y), length.out=10),
z=seq(min(z), max(z), length.out=10)
)
grd$value <- predict.gam(fit, newdata = grd)
# plot prediction with rgl ------------------------------------------------
# original data
plot3d(x, y, z, col=val2col(value, col=jetPal(100)))
rgl.snapshot("original.png")
# interpolated data
plot3d(grd$x, grd$y, grd$z, col=val2col(grd$value, col=jetPal(100)), alpha=0.5, size=5)
rgl.snapshot("points.png")
spheres3d(grd$x, grd$y, grd$z, col=val2col(grd$value, col=jetPal(100)), alpha=0.3, radius=1)
rgl.snapshot("spheres.png")
I've found the way to do it.
I'll post an example, just in case it's useful for others.
I show only two dimensions (and work only on the coordinates) to keep it clear, but it can be generalized to higher dimensions and to summarizing the value at every coordinate.
set.seed(1)
xx <- runif(30,0,100); yy <- runif(30,0,100)
datos <- data.frame(xx,yy) #sample data
plot(xx,yy,pch=20) # 2D plot to visualize it.
n <- 4 # Same number of splits on every axis. Simple example.
rango <- function(ii){(max(ii)-min(ii))+0.000001}
renorm<- function(jj) {trunc(n*(jj-min(jj))/rango(jj))+1}
result <- aggregate(cbind(xx,yy)~renorm(xx) + renorm(yy),datos, mean)
points(result$xx,result$yy,pch=20, col="red")
abline(v=( min(xx) + (rango(xx)/n)*0:n) )
abline(h=( min(yy) + (rango(yy)/n)*0:n) )
The calls above could be adapted to missing values with na.rm=TRUE.
Maybe there are simpler solutions with split, cut, dplyr, data.table, tapply...
I like this approach more than fixing the new points' coordinates at the center of every subregion, because a subregion with only one point keeps that point's original coordinates.
The +0.000001 keeps the maximum point from falling into an extra subregion beyond the last one.
The full solution would have been:
aggregate(cbind(xx,yy,zz, Value)~renorm(xx)+renorm(yy)+renorm(zz),datos, mean)
And it could be further improved by weighting distances.
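For reference, a runnable 3D version of the same grid-averaging idea. This sketch swaps the renorm helper for base R's cut(), which splits each axis into n equal-width bins; the simulated data and the Value column are assumptions.

```r
set.seed(1)
n_pts <- 1000
datos <- data.frame(
  xx = runif(n_pts, 0, 100),
  yy = runif(n_pts, 0, 100),
  zz = runif(n_pts, 0, 100)
)
datos$Value <- with(datos, xx + yy + zz)  # made-up value field

n <- 4  # same number of splits on every axis

# Integer bin index (1..n) along one axis
bin <- function(v) cut(v, breaks = n, labels = FALSE, include.lowest = TRUE)

# One label per occupied grid cell
datos$cell <- with(datos, interaction(bin(xx), bin(yy), bin(zz), drop = TRUE))

# Average coordinates and Value within each cell
reduced <- aggregate(cbind(xx, yy, zz, Value) ~ cell, data = datos, FUN = mean)
```

At most n^3 rows come back, one per non-empty cell, and a cell containing a single point keeps that point's original coordinates.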

drawing the graph of a function f(x) = x^3 - 6x^2 + 9x - 4 in d3.js

I am back at college learning maths and I want to try to use some of this knowledge to create some SVG with d3.js.
If I have a function f(x) = x^3 - 6x^2 + 9x - 4
I would take the following steps:
Find the x intercepts for when y = 0
Find the y intercept when x = 0
Find the stationary points when dy/dx = 0
I would then have 2 x values from point 3 to plug into the original equation.
I would then draw a nature table to judge the flow of the graph or curve.
Plot the known points from the above and sketch the graph.
What I could really use advice on is translating what I would do with pen and paper into code:
How can I programmatically factorise the function in step 1 to find the x-intercepts where y = 0? I honestly do not know where to even start.
How would I programmatically find dy/dx and the values for the stationary points?
If I actually get this far, what should I use in d3 to join the points on the graph?
Your other "steps" have nothing to do with d3 or plotting.
Find the x intercepts for when y = 0
This is root finding. Look for algorithms to help with this.
Find the y intercept when x = 0
Easy: substitute x = 0 to get y = -4.
Find the stationary points when dy/dx = 0
Take the first derivative to get 3x^2 - 12x + 9 and repeat the root-finding step. This is easy to solve with the quadratic formula.
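Both root-finding steps can be sketched numerically. The snippet below is written in R rather than JavaScript, purely for illustration, and uses base R's polyroot(), which takes coefficients in increasing order of power. It assumes the function from the title, f(x) = x^3 - 6x^2 + 9x - 4.

```r
# Coefficients in increasing order of power: -4 + 9x - 6x^2 + x^3
f_coef  <- c(-4, 9, -6, 1)
# Derivative: 9 - 12x + 3x^2
df_coef <- c(9, -12, 3)

f <- function(x) x^3 - 6 * x^2 + 9 * x - 4

# polyroot() returns complex numbers; these two polynomials only have
# real roots, so keep the real parts
x_intercepts <- sort(Re(polyroot(f_coef)))   # roots of f
stationary_x <- sort(Re(polyroot(df_coef)))  # roots of f'
stationary_y <- f(stationary_x)
y_intercept  <- f(0)
```

For this polynomial the intercepts should come out close to x = 1 (a double root) and x = 4, and the stationary points close to (1, 0) and (3, -4). A repeated root is typically recovered less precisely than a simple one.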
I would then have 2 x values from point 3 to plug into the original equation. I would then draw a nature table to judge the flow of the graph or curve. Plot the known points from the above and sketch the graph.
I would just draw the curve. Pick a range for x and go.
It's great to learn d3. You'll end up with something like this:
https://maurizzzio.github.io/function-plot/
For a cubic polynomial, there are closed formulas available to find all the particular points that you want (https://en.wikipedia.org/wiki/Cubic_function), and it is a sound approach to determine them.
Anyway, you will have to plot the smooth curve, which means that you will need to compute close enough points and draw a polyline that joins them.
Doing this, you are actually performing the first steps of numerical root isolation, with such accuracy that the approximate and exact roots will be practically indistinguishable.
So an easy combined solution is to draw the curve as a polyline and find the intersections with the X axis as well as extrema using this polyline representation, rather than by means of more sophisticated methods.
This approach works for any continuous curve and is very easy to implement. So you actually draw the curve to find particular points rather than conversely as is done by analytical methods.
For best results on complicated curves, you can adapt the point density based on the local curvature, but this is another story.
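The polyline approach can be sketched in a few lines. The example is in R for illustration only; the sampling range 0 to 5 and the 400 sample points are assumptions, and the function is the one from the title.

```r
f <- function(x) x^3 - 6 * x^2 + 9 * x - 4

# Sample the curve densely; these (xs, ys) pairs are also the polyline to draw
xs <- seq(0, 5, length.out = 400)
ys <- f(xs)

# X-axis crossings: segments whose endpoints have opposite signs,
# refined by linear interpolation along the segment
i <- which(ys[-length(ys)] * ys[-1] < 0)
crossings <- xs[i] - ys[i] * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])

# Extrema: interior samples where the slope of the polyline changes sign
slope_sign <- sign(diff(ys))
j <- which(diff(slope_sign) != 0) + 1
extrema <- data.frame(x = xs[j], y = ys[j])
```

A root where the curve only touches the axis (here x = 1) produces no sign change, so this sketch reports it as an extremum with y close to 0 rather than as a crossing.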

R - locate intersection of two curves

There are a number of questions in this forum on locating intersections between a fitted model and some raw data. However, in my case, I am in an early stage project where I am still evaluating data.
To begin with, I have created a data frame that contains a ratio value whose ideal value should be 1.0. I have plotted the data frame and also used abline() function to plot a horizontal line at y=1.0. This horizontal line and the plot of ratios intersect at some point.
plot(a$TIME.STAMP, a$PROCESS.RATIO,
xlab='Time (5s)',
ylab='Process ratio',
col='darkolivegreen',
type='l')
abline(h=1.0,col='red')
My aim is to locate the intersection point, say x, and draw two vertical lines at x±k, as abline(v=x-k) and abline(v=x+k), where k is a certain tolerance band.
Applying a grid on the plot is not really an option because this plot will be a part of a multi-panel plot. And, because ratio data is very tightly laid out, the plot will not be too readable. Finally, the x±k will be quite valuable in my discussions with the domain experts.
Can you please guide me how to achieve this?
Here are two solutions. The first one uses locator() and will be useful if you do not have too many charts to produce:
x <- 1:5
y <- log(1:5)
df1 <-data.frame(x= 1:5,y=log(1:5))
k <-0.5
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
locator()
By clicking on the intersection (and stopping the locator top left of the chart), you will get the intersection:
> locator()
$x
[1] 2.765327
$y
[1] 1.002495
You would then add abline(v=2.765327).
If you need a more programmable way of finding the intersection, we will have to estimate the function of your data. Unfortunately, you haven’t provided us with PROCESS.RATIO, so we can only guess what your data looks like. Hopefully, the data is smooth. Here’s a solution that should work with nonlinear data. As you can see in the previous chart, all R does is draw a line between the dots. So, we have to fit a curve in there. Here I’m fitting the data with a polynomial of order 2. If your data is less linear, you can try increasing the order (2 here). If your data is linear, use a simple lm.
fit <-lm(y~poly(x,2))
newx <-data.frame(x=seq(0,5,0.01))
fitline = predict(fit, newdata=newx)
est <-data.frame(newx,fitline)
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
lines(est, col="blue",lwd=2)
Using this fitted curve, we can then find the closest point to y=1. Once we have that point, we can draw vertical lines at the intersection and at +/-k.
cross <-est[which.min(abs(1-est$fitline)),] #find closest to 1
plot(df1,type="o",lwd=2)
abline(h=1)
abline(v=cross$x, col="green")
abline(v=cross$x-k, col="purple")
abline(v=cross$x+k, col="purple")
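If no model should be fitted at all, the crossing can also be read off the raw data by linear interpolation between the two points that straddle y = 1, which is exactly where the plotted line segments meet the horizontal line. A sketch, where find_crossings is a hypothetical helper name:

```r
# Hypothetical helper: x positions where the polyline through (x, y)
# crosses the horizontal line y = level
find_crossings <- function(x, y, level = 1) {
  d <- y - level
  i <- which(d[-length(d)] * d[-1] < 0)  # segments that straddle the level
  x[i] + (level - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
}

x <- 1:5
y <- log(1:5)
k <- 0.5

x_cross <- find_crossings(x, y, level = 1)
```

abline(v = x_cross) and abline(v = c(x_cross - k, x_cross + k)) then draw the three vertical lines. For the data in the question the call would presumably be find_crossings(a$TIME.STAMP, a$PROCESS.RATIO), assuming TIME.STAMP is numeric. Noisy data may cross the level several times, in which case the function returns every crossing, and a data point lying exactly on the level is not reported because of the strict inequality.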

Create curve 'algorithm' after capturing points from user?

I am capturing some points on an X, Y plane to represent some values. I ask the user to select a few points and I want the system to then generate a curve following the trend that the user creates. How do I calculate this? So say it is this:
Y = dollar amount
X = unit count
user input: (2500, 200), (4500, 500), (9500, 1000)
Is there a way I can calculate some sort of curve to follow those points so I would know based off that selection what Y = 100 would be on the same scale/trend?
EDIT: People keep asking about the nature of the curve: yes, logarithmic. But I'd also like to check out some other options. It's for pricing; the constraint is that as X increases, Y should always be higher. However, the rate of change of the curve should depend on the two adjacent points that the user selected; we could probably require a certain number of points. Does that help?
EDIT: Math is hard.
EDIT: Maybe a parabola then?
The problem is that there are multiple curves that you can fit to the same data. To borrow an example from my old stats book, here is the same data set (1, 1, 1, 10, 1, 1, 1) with four curves:
You need to specify the overall trend to get a meaningful result.
First, you are going to have to have an idea of what your line is or better said, what type of line fits your data the best. Is it linear (straight line) or does it curve (x-squared). Sounds like this is a curve.
If your curve is a parabola, then you will need to solve y = Ax^2 + Bx + C using the three points that the user has chosen. You need at least 3 points to solve for 3 unknowns.
200 = A(2500)^2 + B(2500) + C
500 = A(4500)^2 + B(4500) + C
1000 = A(9500)^2 + B(9500) + C
Given these three equations, you should be able to solve for A, B and C, then use these to plot a new curve.
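The three equations form a 3×3 linear system, which can be solved directly. A sketch in R (the choice of language is an assumption; any linear solver would do):

```r
pts <- data.frame(x = c(2500, 4500, 9500), y = c(200, 500, 1000))

# Each row is (x^2, x, 1), so V %*% c(A, B, C) reproduces y
V <- cbind(pts$x^2, pts$x, 1)
coefs <- solve(V, pts$y)
names(coefs) <- c("A", "B", "C")

parabola <- function(x) coefs[["A"]] * x^2 + coefs[["B"]] * x + coefs[["C"]]
```

For these points A comes out negative (about -7.14e-6, with B = 0.2), so the parabola peaks near x = 14000 and decreases after that. That conflicts with the requirement that Y always increases with X, which is worth checking before extrapolating beyond the last point.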
A least-squares fit would give you a curve that matches the data well.
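As one possible least-squares fit, the sketch below fits the logarithmic form y = a + b*log(x) mentioned in the question using lm(). The model form and the new x values are assumptions for illustration.

```r
pts <- data.frame(x = c(2500, 4500, 9500), y = c(200, 500, 1000))

# Least-squares fit of y = a + b * log(x)
fit_log <- lm(y ~ log(x), data = pts)

# Predict along the fitted trend at some new unit counts (assumed values)
new_x <- data.frame(x = c(1000, 6000, 12000))
pred  <- predict(fit_log, newdata = new_x)
```

With a positive coefficient on log(x) the fitted curve is increasing everywhere, which matches the pricing constraint, but unlike the exact parabola it will not pass through the three points exactly and it can go negative for small x.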
This is a rather general extrapolation problem. In your case, fitting a quadratic (parabola) is probably the most reasonable course of action. Depending on how well your data fits a quadratic, you may want to fit it to more than 3 points (the noisier and weirder the data, the more points you should use).
Depending on the amount and type of data you have, you may want to try LOESS regression.
However, this may not be a good option if you only have 3 points, as in your example (but keep in mind that you will not get good extrapolation from 3 points no matter which algorithm you use).
Another option would be B-splines
