Interpolating to a specific interval in R

I'm plotting 2 sets of data (x,y) and (a,b). The x axis is at an interval of 0.05 for (x,y) and at an interval of 0.02 for (a,b). I'm trying to interpolate (x,y) so that it fills in data every 0.02 units. I've played with approxfun() and splinefun() but can't figure out how to work with the n or xout parameters properly.
require(graphics)
x<-c(1.00,1.05,1.10,1.15,1.20)
y<-c(4.1,6.4,8.4,5.2,0.5)
a<-c(1.00,1.02,1.04,1.06,1.08)
b<-c(5.0,8.3,7.3,4.0,6.0)
par(mfrow = c(2,1))
plot(x,y)
points(approx(x,y,method="linear"),col=2,pch="*")
plot(a,b)
Ultimately I want all of my vectors to have an x interval of 0.02 like (a,b), so that they all have the same number of elements, and I want to save the new vector to a variable. I would also like to be able to switch back from 0.02 to 0.05, which I think would involve the same commands with the intervals swapped. I think the term for what I want to do is resampling my data to a new frequency.
I've looked in various threads for an answer to this, but I don't know enough about R to figure out how to ask this/search for it. Thanks for any help.

Hi. You may want to use seq(), e.g. a <- seq(1, 10, by = 0.5). It increments by whatever step you choose, in this case 0.5.
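To connect seq() back to the question, here is a minimal sketch of one way to resample (x,y) onto a 0.02 grid. It assumes linear interpolation is acceptable; the names new_x, lin, spl and back are made up for illustration.

```r
x <- c(1.00, 1.05, 1.10, 1.15, 1.20)
y <- c(4.1, 6.4, 8.4, 5.2, 0.5)

# Target grid: every 0.02 units across the range of x
new_x <- seq(min(x), max(x), by = 0.02)

# Linear interpolation at exactly those positions; returns a list with $x and $y
lin <- approx(x, y, xout = new_x)

# Spline alternative evaluated on the same grid
spl <- spline(x, y, xout = new_x)

# Going back to the 0.05 grid uses the same call with a different xout
back <- approx(lin$x, lin$y, xout = x)
```

lin$y (or spl$y) is the new vector to save. Note that the round trip is generally not exact: only x values present on both grids (1.00, 1.10, 1.20 here) are guaranteed to come back unchanged, so it is probably safer to keep the original data around.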

Related

How to calculate area under the curve from logarithmic plot with xmgrace?

I have plotted my data on linear scale in xmgrace by using these numbers:
0.001 0
0.00589391 0.10
0.155206 0.20
0.294695 0.30
0.43222 0.40
0.436149 0.50
0.489194 0.60
0.611002 0.70
0.860511 0.80
0.939096 0.90
0.964637 1
1 1
I have used xmgrace on Ubuntu to plot my data and calculate the area under the curve (AUC; Data -> Transformation -> Integration -> SumOnly).
After converting the linear curve to a logarithmic one, I am having a problem calculating the area under the logarithmic curve.
Has anybody else encountered a similar issue?
When you set the axis scale to "logarithmic" you are not actually changing your data, just the way you display it. Therefore, since data transformations such as integration act on the actual data you have, the result is bound to be the same.
In other words, you are integrating f(x) regardless of the scale of the axes. If you want to integrate log(f(x)) you have to first convert f(x) to log(f(x)) by using the Data -> Transformation -> Expression, writing something like y = ln(y) and pressing "apply". Be careful though: the first point (which has y = 0) will get an "inf". You'll need to get rid of it manually (double click on a set, select the first row and use edit -> delete) or don't use exactly 0 in your dataset. If you want to convert also the x axis then open the same "Expression" window and write x = ln(x). Integrate the new dataset and you should get the right number (I got -7.9 I think).
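As a cross-check outside xmgrace, the same transformation can be sketched in R. This assumes a plain trapezoid rule, which may not match xmgrace's integration exactly; trapz is a hypothetical helper, not a base R function.

```r
x <- c(0.001, 0.00589391, 0.155206, 0.294695, 0.43222, 0.436149,
       0.489194, 0.611002, 0.860511, 0.939096, 0.964637, 1)
y <- c(0, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1, 1)

# Hypothetical helper: trapezoid rule over paired x/y vectors
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Area under the data as given; the axis scale of a plot does not affect this
auc_linear <- trapz(x, y)

# Drop the point with y = 0 (its log is -Inf), then integrate ln(y) over ln(x)
keep    <- y > 0
auc_log <- trapz(log(x[keep]), log(y[keep]))
```

auc_log can then be compared with the figure quoted above.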

When plotting a curve in R, a piece of the curve gets cut off, not sure why

I am trying to plot this formula. As x approaches 0 from the right, y should be approaching infinity, and so my curve should be going upwards close to y-axis. Instead it gets cut off at y=23 or so.
my_formula = function(x){7.9*x^(-0.5)-1.3}
curve(my_formula,col="red",from=0 ,to=13, xlim=c(0,13),ylim=c(0,50),axes=T, xlab=NA, ylab=NA)
I tried playing with the from= parameter, and actually got what I needed when I put from=-4.8, but I have no idea why this works. In fact x doesn't get less than 0, and from/to should represent the range of x values, shouldn't they? If someone could explain it to me, that would be amazing! Thank you!
By default, curve only chooses 101 x-values within the (from, to) range, set by the default value of the n argument. In your case this means there aren't many values that are close enough to 0 to show the full behaviour of the function. Increasing the number of values that are plotted with something like n=500 helps:
curve(my_formula,col="red",from=0 ,to=13,
xlim=c(0,13),ylim=c(0,50),axes=T, xlab=NA, ylab=NA,
n=500)
This is mainly due to the fact that my_formula(0) is Inf.
So plotting from=0, to=13 in curve means your first 2 values are by default (with 101 points, as @Marius notes):
# x
seq(0, 13, length.out=101)[1:2]
#[1] 0.00 0.13
# y
my_formula(seq(0, 13, length.out=101)[1:2])
#[1] Inf 20.61066
R will not plot infinite values, so it cannot join a line from the first point to the second one.
If you get as close to 0 on your x axis as is possible on your system, you can make this work a-okay. For instance:
curve(my_formula, col="red", xlim=c(0 + .Machine$double.eps, 13), ylim=c(0,50))
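As for why from=-4.8 appeared to work, a likely explanation (my reading, not something stated in the answers) is that the 101-point grid then happens to include a point much closer to 0, while the negative x values give NaN and are simply not drawn. A small sketch to check this; the grid_* and first_pos_* names are illustrative only:

```r
my_formula <- function(x) 7.9 * x^(-0.5) - 1.3

# The two default 101-point grids that curve() would evaluate
grid_from_0   <- seq(0,    13, length.out = 101)
grid_from_neg <- seq(-4.8, 13, length.out = 101)

# Smallest strictly positive x on each grid
first_pos_0   <- min(grid_from_0[grid_from_0 > 0])
first_pos_neg <- min(grid_from_neg[grid_from_neg > 0])

# Function values at those points; compare them with ylim = c(0, 50)
my_formula(c(first_pos_0, first_pos_neg))
```

If the second value is above the top of ylim, the drawn line runs off the top of the plot, which looks like the curve heading to infinity.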

How to cleanly use interpolation between points to generate a mean in R

I am having trouble writing code that will cleanly produce a mean (specifically a weighted average) based on a simple plot of points using interpolation.
For example:
ex=c(1,2,3,4,5)
why=c(2,5,9,15,24)
This shows the kind of information I am working with.
plot(ex, why, type="o")
At this point, I want each point "binned" so the lines between them are flat steps rather than slopes. To do this, I have been adding points to the x values manually in Excel as (x+0.01).
This is the new output:
why=c(2,2,5,5,9,9,15,15,24,24)
ex=c(1,2,2.01,3,3.01,4,4.01,5,5.01,6)
plot(ex, why, type="o")
So this is where my question comes into play. I have to do this many times and do not want to generate a ton of new vectors and objects. To get a weighted average, I have been interpolating y values at x increments of 0.01 into a new object. I can then index into this new object and get a mean even when an endpoint falls between the actual ex values, i.e.
mean(newy[1:245])
Because I made new y values for 100 increments of x that (basically) follow a straight line, I am getting a weighted average here for x= 1 to 2.45.
Is there an easier and more elegant way to embed the interpolate code into the mean code so I could just say "average of interpolated y for nonreal x to nonreal x?"
It doesn't do exactly what you want, but you should consider the stepfun function -- this creates a step function out of two series.
plot(stepfun(ex[-1], why))
stepfun is handy because it gives you a function defined over that interval, so you can easily interpolate just by evaluating anywhere. The downside to it is that it is not strictly defined on the range given (which is why we have to cut off the first value in ex).
Based on your second plotting example, I think you are probably looking for this:
library(ggplot2)
qplot(ex, why, geom="step")
Or if you want the line to go vertical first, you can use:
qplot(ex, why, geom="step", direction = "vh")
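Coming back to the original goal (an average of the binned values between two arbitrary x positions), a rough sketch that avoids building the 0.01-step vector is to weight each y by how much of its bin overlaps the requested range. step_mean is a hypothetical helper, and it assumes each y value holds from its own x up to the next one, with the last bin given a width of 1 as in the question's second example.

```r
ex  <- c(1, 2, 3, 4, 5)
why <- c(2, 5, 9, 15, 24)

# Hypothetical helper: mean of a step function over [from, to],
# weighting each step by the length of its overlap with that range
step_mean <- function(x, y, from, to) {
  breaks  <- c(x, x[length(x)] + 1)          # assumed: last bin has width 1
  lo      <- pmax(breaks[-length(breaks)], from)
  hi      <- pmin(breaks[-1], to)
  overlap <- pmax(hi - lo, 0)                # 0 where a bin is outside the range
  sum(y * overlap) / sum(overlap)
}

step_mean(ex, why, 1, 2.45)
```

For the range 1 to 2.45 this weights y = 2 by 1 and y = 5 by 0.45, i.e. (2 * 1 + 5 * 0.45) / 1.45.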

Probability distribution values plot

I have
probability values: 0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04 ,summing up to 1
the corresponding intervals where a stochastic variable may take its value with the corresponding probability from the above list:
126,162,233,304,375,446,517,588,659,730,801,839
How can I plot the probability distribution?
I would like the interval boundaries on the x axis and, between each pair of boundaries, a histogram bar whose height is the corresponding probability.
Thanks.
How about
x <- c(126,162,233,304,375,446,517,588,659,730,801,839)
p <- c(0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04)
plot(x,c(p,0),type="s")
lines(x,c(0,p),type="S")
rect(x[-1],0,x[-length(x)],p,col="lightblue")
for a quick answer? (With the rect included you might not need the lines call and might be able to change it to plot(x,p,type="n"). As usual I would recommend par(bty="l",lty=1) for my preferred graphical defaults ...)
(Explanation: "s" and "S" are two different stair-step types (see Details in ?plot): I used them both to get both the left and right boundaries of the distribution.)
edit: In your comments you say "(it) doesn't look like a histogram". It's not quite clear what you want. I added rectangles in the example above -- maybe that does it? Or you could do
b <- barplot(p,width=diff(x),space=0)
but getting the x-axis labels right is a pain.
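One possible way to handle the labels, sketched under the assumption that labelling the bar edges with the interval boundaries is what is wanted: with space=0 and width=diff(x) the bars start at 0, so the edges sit at x - x[1]. The name edges is illustrative.

```r
x <- c(126,162,233,304,375,446,517,588,659,730,801,839)
p <- c(0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04)

# Bars with widths equal to the interval lengths, no gaps between them
b <- barplot(p, width = diff(x), space = 0)

# Bar edges in barplot coordinates: the first bar starts at 0
edges <- x - x[1]

# Label each edge with the original interval boundary
axis(1, at = edges, labels = x, las = 2)
```

The bar heights here are still the raw probabilities, not densities, so wide and narrow intervals are not directly comparable by area.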

R question about plotting probability/density histogram the right way

I have a [500,2] matrix, i.e. 500 rows and 2 columns. The left column gives the X values of the observations, and the right column gives the probability with which each X occurs, so it is a typical probability density relationship.
So my question is how to plot the histogram the right way, so that the x-axis is the x value and the y-axis is the density (0.01-1.00). The bandwidth of the estimator is 0.33.
Thanks in advance!
For orientation, the end of the data looks like this:
[490,] 2.338260830 0.04858685
[491,] 2.347839477 0.04797310
[492,] 2.357418125 0.04736149
[493,] 2.366996772 0.04675206
[494,] 2.376575419 0.04614482
[495,] 2.386154067 0.04553980
[496,] 2.395732714 0.04493702
[497,] 2.405311361 0.04433653
[498,] 2.414890008 0.04373835
[499,] 2.424468656 0.04314252
[500,] 2.434047303 0.04254907
@everyone,
Yes, I have already done the estimation, and the bandwidth is what I mentioned. The data is ordered from low to high values, so the probability is 0.22 at the beginning, about 0.48 at the peak, and 0.15 at the end.
The density line plots like a charm, but what I have to do in addition is plot a histogram! So how can I do this and order the blocks properly (how the data should be split into boxes, etc.)?
Any suggestions?
Here is part of the data AFTER the estimation. All values are discrete, so I assume a histogram can be created, hopefully.
[491,] 4.956164 0.2618131
[492,] 4.963014 0.2608723
[493,] 4.969863 0.2599309
[494,] 4.976712 0.2589889
[495,] 4.983562 0.2580464
[496,] 4.990411 0.2571034
[497,] 4.997260 0.2561599
[498,] 5.004110 0.2552159
[499,] 5.010959 0.2542716
[500,] 5.017808 0.2533268
[501,] 5.024658 0.2523817
Best regards,
I appreciate the fast responses! (bow)
What would do the job is to create a histogram just for the indexes, grouping them in blocks of, for instance, 25 or 50 each, and computing the average probability for each block (25, or 50/100/150/200/250 etc.) as boxes?
Assuming the rows are in order from lowest to highest value of x, as they appear to be, you can use the default plot command, the only change you need is the type:
plot(your.data, type = 'l')
EDIT:
Ok, I'm not sure this is better than the density plot, but it can be done:
x = dnorm(seq(-1, 1, length = 500))
x.bins = rep(1:50, each = 10)
bars = aggregate(x, by = list(x.bins), FUN = sum)[,2]
barplot(bars)
In your case, replace x with the probabilities from the second column of your matrix.
EDIT2:
On second thought, this only makes sense if your 500 rows represent discrete events. If they are instead points along a continuous distribution function, adding them together as I have done is incorrect. Mathematically I don't think you can produce the binned probability for a range using only a few points from within that range.
Assuming M is the matrix, wouldn't this just be:
plot(x=M[ , 1], y = M[ , 2] )
You have already done the density estimation since this is not the original data.
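If bars are still wanted and the rows are points along a continuous density, one rough option is to approximate the probability of each bin as the area under the density within it, rather than summing the density values. The sketch below uses a stand-in matrix built from dnorm because the real data is not shown in full; M, seg_area, bin_id and bin_prob are illustrative names, and the trapezoid rule is only an approximation.

```r
# Stand-in for the asker's matrix: column 1 = x grid, column 2 = density values
xs <- seq(-3, 3, length.out = 500)
M  <- cbind(xs, dnorm(xs))

# Trapezoid area of each small interval between neighbouring grid points
seg_area <- diff(M[, 1]) * (head(M[, 2], -1) + tail(M[, 2], -1)) / 2

# Group the 499 intervals into bins of up to 25 intervals and sum the areas
bin_id   <- ceiling(seq_along(seg_area) / 25)
bin_prob <- tapply(seg_area, bin_id, sum)

# Each bar is the approximate probability of falling in that bin
barplot(bin_prob, space = 0)
```

With a grid as fine as 500 points the bin areas should sum to roughly the total probability covered by the x range, which is a quick sanity check on real data.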
