seq() producing numbers off by minute amounts (R) [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 5 years ago.
I am attempting to use seq() to define my breaks on a plot. Dummy example below.
ggplot(data, aes(x=cat, y=dog)) +
  scale_y_continuous(breaks=seq(-0.3, 0.3, by=0.1))
For some reason, seq() is giving me output numbers that are off by minute amounts. This behavior occurs both within and outside of my plotting device. As shown below, it appears to be tied to sequences that include negative numbers: seq() can produce them, but that is where the issue shows up.
seq(0.3, 0.9, by=0.1) # test with positives
seq(-0.3, 0.3, by = 0.1) # test with negatives
format(seq(-0.3, 0.3, by = 0.1), scientific = FALSE) # show full number
I read the documentation and couldn't find anything talking about negatives so I'm not sure how to fix it. Is there something I'm doing wrong or excluding? Is there a workaround or another function I should be using?
edit
Marked as duplicate, but the duplicate doesn't explicitly provide a solution to this. Here are a few:
# I went with this solution, as given in the comments, to keep it all contained within seq()
seq(-0.3, 0.3, length.out=7)
# from the answers
seq(-3, 3, by=1)/10
# didn't work for my case but should work as a general rule
round(x, digits=n) # x would be seq(-0.3, 0.3, by = 0.1) and n=1 in my case

For a workaround you could try seq(-3,3,1)/10
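A minimal base R sketch of why this workaround helps: the sequence is built from exactly representable integers, and each element then comes from a single division. The names `by_form` and `scaled` are only illustrative.

```r
# Step form: the middle element carries a tiny floating-point residual
by_form <- seq(-0.3, 0.3, by = 0.1)

# Integer form: build exact integers first, then divide once
scaled <- seq(-3, 3, by = 1) / 10

by_form[4] == 0   # FALSE: the value is about 5.55e-17
scaled[4] == 0    # TRUE: 0 / 10 is exactly 0

# The two vectors still agree within numerical tolerance
isTRUE(all.equal(by_form, scaled))   # TRUE
```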

Related

How to smooth a curve in R?

location_difference <- c(0, 0.5, 1, 1.5, 2)
Power <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
# note: the two vectors have different lengths (5 vs 6), so plot() fails as written
plot(location_difference, Power)
The author of the paper said he smoothed the curve using a weighted moving average with weight vector w = (0.25, 0.5, 0.25), but he did not explain how he did this or which function he used. I am really confused.
Up front, as @MartinWettstein cautions, be careful about when you smooth data and what you do with it (infer from it). Having said that, a simple weighted moving average might look like this.
# replacement data
x <- seq(0, 2, len=5)
y <- c(0, 0.02, 0.65, 1, 1)
# smoothed
ysm <-
  zoo::rollapply(c(NA, y, NA), 3,
                 function(a) Hmisc::wtd.mean(a, c(0.25, 0.5, 0.25), na.rm = TRUE),
                 partial = FALSE)
# plot
plot(x, y, type = "b", pch = 16)
lines(x, ysm, col = "red")
Notes:
The zoo:: package provides a rolling window (3-wide here), calling the function once for indices 1-3, then again for indices 2-4, then 3-5, 4-6, etc.
With rolling-window operations, realize that they can be center-aligned (the default of zoo::rollapply) or left/right aligned. There are some good explanations here: How to calculate 7-day moving average in R?
I surround the y data with NAs so that I can mimic a partial window. Normally with rolling-window ops, if k=3, then the resulting vector is length(y) - (k-1) long. I'm inferring that you want to include data on the ends, so the first smoothed data point would be effectively (0.5*0 + 0.25*0.02)/0.75, the second smoothed data point (0.25*0 + 0.5*0.02 + 0.25*0.65)/1, and the last smoothed data point (0.25*1 + 0.5*1)/0.75. That is, omitting the 0.25 times a missing data point. That's a guess and can easily be adjusted based on your real needs.
I'm using Hmisc::wtd.mean, though it is trivial to write this weighted-mean function yourself.
This is suggestive only, and not meant to be authoritative. Just to help you begin exploring your smoothing processes.
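As noted above, the weighted mean is easy to write by hand. Here is a base R sketch of the same smoothing without zoo or Hmisc. The function name `wma` is made up for this example. It assumes an odd-length weight vector and renormalises the weights at the two ends, mirroring the NA-padding guess described in the notes.

```r
# Hypothetical helper: centred weighted moving average with edge renormalisation
wma <- function(y, w = c(0.25, 0.5, 0.25)) {
  k <- length(w)
  half <- (k - 1) %/% 2
  # Pad with NA so every point has a full-width window
  padded <- c(rep(NA_real_, half), y, rep(NA_real_, half))
  vapply(seq_along(y), function(i) {
    window <- padded[i:(i + k - 1)]
    ok <- !is.na(window)
    # Drop the weights of missing points and rescale by the remaining weight
    sum(window[ok] * w[ok]) / sum(w[ok])
  }, numeric(1))
}

y <- c(0, 0.02, 0.65, 1, 1)
ysm <- wma(y)
ysm
```

The first and last values use only two of the three weights, so they are divided by 0.75 instead of 1.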

How to prevent slight errors in axis break locations in ggplot when using non-integer break points in scale_y_continuous?

I've encountered an unusual issue in ggplot2 lately, where axis break points are incorrect, but by super small amounts, causing the axis to display values like 0.29999999992455 where it should instead be displaying 0.3, for example. I've never encountered this problem before during multiple years of using ggplot2 so I'm not sure how reproducible it will be for others, but below is example code that causes the problem for me. Thanks in advance for any help!
Also, here's the output I get from the code: https://i.stack.imgur.com/6EieP.png
#disabling scientific notation, since for some reason y-axis values were being displayed that way otherwise
options(scipen=999)
#load ggplot2
library(ggplot2)
#make dataframe
df <- data.frame(cat=letters[1:5], yvar=seq(-0.3,0.3,0.15))
#make plot
ggplot(df, aes(x=cat,y=yvar)) + geom_point() +
  scale_y_continuous(limits=c(-.32,0.32), breaks=seq(-0.3,0.3,0.1), expand=c(0,0))
This was a very interesting question! The issue is not with scale_y_continuous(), but with your call to seq(). Consider the output of that call:
> seq(-0.3,0.3,0.1)
[1] -3.000000e-01 -2.000000e-01 -1.000000e-01 5.551115e-17 1.000000e-01 2.000000e-01 3.000000e-01
There's your problem: the fourth value is 5.551115e-17 rather than exactly 0. What you want is -0.3, -0.2, ..., 0.3. If you type the values in via an explicit vector, your plot looks fine. So this code for your plot looks okay:
ggplot(df, aes(x=cat,y=yvar)) +
  geom_point() +
  scale_y_continuous(
    limits=c(-.32,0.32),
    breaks=c(-0.3, -0.2, -0.1,0, 0.1, 0.2, 0.3),
    expand=c(0,0))
That's all well and good... but what about long sequences, where you would definitely need to use seq? There's some excellent information in the answer posted here that should help, but the basic idea is that seq is a generic function that accepts several argument forms. Calling seq(-0.3, 0.3, 0.1) specifies the form seq(from=-0.3, to=0.3, by=0.1), which computes each element as from + k*by. This is a floating-point issue rather than a bug in seq: 0.1 has no exact binary representation, so -0.3 + 3*0.1 comes out as 5.551115e-17 instead of exactly 0.
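The floating-point effect can be reproduced without seq() at all. This is a small base R sketch; the name `residual` is only illustrative.

```r
# 0.1 is stored as the nearest binary double, not as exactly 1/10
sprintf("%.20f", 0.1)

# Three steps of 0.1 do not land exactly on the stored value of 0.3
3 * 0.1 == 0.3                   # FALSE

# The same arithmetic as the fourth element of seq(-0.3, 0.3, by = 0.1)
residual <- -0.3 + 3 * 0.1
residual == 0                    # FALSE
abs(residual) < 1e-15            # TRUE

# Compare with a tolerance rather than with ==
isTRUE(all.equal(residual, 0))   # TRUE
```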
What works is the form seq(from, to, length.out), where length.out is the desired length of the resulting sequence. When you use that form here, you get what you expect:
> seq(from=-.3, to=0.3,length.out = 7)
[1] -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3
And when you put that seq call back into your plot code, it looks identical to the "forced" sequence using c() above:
ggplot(df, aes(x=cat,y=yvar)) +
  geom_point() +
  scale_y_continuous(
    limits=c(-.32,0.32),
    breaks=seq(-0.3, 0.3, length.out=7),
    expand=c(0,0))
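If the breaks have to come from a by-style seq() call, another option is to clean them up before they reach the scale. This is a sketch and not part of the original answer: `clean_breaks` is a hypothetical helper, and `digits = 1` is an assumption that fits breaks spaced 0.1 apart.

```r
# Hypothetical helper: round breaks to a fixed number of decimals so that
# residuals such as 5.551115e-17 collapse to exactly 0
clean_breaks <- function(from, to, by, digits = 1) {
  round(seq(from, to, by = by), digits = digits)
}

breaks <- clean_breaks(-0.3, 0.3, 0.1)
breaks[4] == 0   # TRUE

# Usage inside the plot from the question (requires ggplot2):
# scale_y_continuous(limits = c(-0.32, 0.32), breaks = breaks, expand = c(0, 0))
```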

Plotting ECDF results in horizontal lines beyond the expected range. How can I prevent this?

I want to plot the ECDF of a vector. The vector has range [0,1]. However, plotting ecdf(data) results in horizontal lines that extend beyond this range. I want to create a plot that does not have these lines beyond the range [0,1].
Calling plot.stepfun shows that the function chooses a vector of abscissa values that includes the values -0.16 and 1.16, but I don't know why. I have tried manually selecting the abscissa values using the argument xval, but this made no difference.
I have tried using ggplot2, but again this made no difference.
I have also tried removing the first and last values of the vector, which are 0 and 1, but again this made no difference.
I could of course just use MS Paint, but that seems like a poor solution to the problem.
data <- c(0, 0.0267937939860966, 0.0831161599875003, 0.089312646620322,
0.09, 0.162046969424378, 0.214535013990776, 0.216, 0.254227922418882,
0.29770882206774, 0.3, 0.346218858110426, 0.3483, 0.351120057363453,
0.446176768935429, 0.469316812739393, 0.47178, 0.506720537855168,
0.51, 0.53499413030498, 0.577201705567453, 0.579825, 0.61501969832776,
0.653481161056275, 0.657, 0.667975762603373, 0.6705828, 0.685122481157394,
0.742234640167266, 0.74470167, 0.745169566125031, 0.756545373540315,
0.7599, 0.795669365154443, 0.801746023714245, 0.803996766, 0.828933122166261,
0.83193, 0.837497330035643, 0.848695641093207, 0.8506916541,
0.87169919974533, 0.879781895687186, 0.882351, 0.885279431049518,
0.8870099004, 0.899358675688768, 0.913502229556406, 0.914974950051,
0.915505354483016, 0.9176457, 0.921514704291551, 0.935095914758442,
0.9363300788754, 0.939114814765667, 0.940605918657197, 0.94235199,
0.951503562401266, 0.95252438490057, 0.952993345228527, 0.958244748310785,
0.959646393, 0.963897452890123, 0.964732400211852, 0.970641607614244,
0.9717524751, 0.973212104364713, 0.973888411695313, 0.979355426072477,
0.980181739205269, 0.98022673257, 0.980724900269631, 0.985376582975203,
0.985481180229861, 0.98580953864678, 0.986158712799, 0.989235347816543,
0.989578152973373, 0.989788073567854, 0.9903110989593, 0.9923627402258,
0.992816530697457, 0.99321776927151, 0.994414541359167, 0.994946291138756,
0.995252438490057, 0.995922615192192, 0.9964442204999, 0.99667670694304,
0.997028536003077, 0.997497885105047, 0.997673694860128, 0.997837868960133,
0.998239132446338, 0.998371586402089, 0.998429033902679, 0.998860091673285,
0.998860110481463, 0.999173901730287, 0.999202077337024, 0.999402017502492,
0.999441454135917, 0.999567612655648, 0.999609017895142, 0.999687669141686,
0.999726312526599, 0.999774606597093, 0.999808418768619, 0.999837491356504,
0.999865893138033, 0.99988293066653, 0.999906125196623, 0.999915732168455,
0.999934287637636, 0.999939389009237, 0.999954001346345, 0.999956435850389,
0.999967800942442, 0.999968709576019, 0.999977460659709, 0.999984222461796,
0.999988955723257, 0.99999226900628, 0.999994588304396, 0.999996211813077,
0.999997348269154, 1)
plot(ecdf(data), do.points=FALSE)
I would like to be able to plot the ECDF with the x axis matching the range of the vector, that is, [0,1].
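One possible approach, offered as a sketch rather than a definitive fix: compute the ECDF values directly and draw them as a step line, so that nothing is drawn outside the range of the data. The short vector `v` is a stand-in for `data` (its values are taken from the vector above), and `xs` and `ps` are illustrative names.

```r
# Stand-in for the full data vector
v <- c(0, 0.09, 0.3, 0.51, 0.657, 1)

xs <- sort(v)                      # abscissa values
ps <- seq_along(xs) / length(xs)   # ECDF value reached at each sorted point

# Start at height 0 so the first jump is drawn; type = "s" gives a step line
plot(c(xs[1], xs), c(0, ps), type = "s",
     xlim = range(v), ylim = c(0, 1),
     xlab = "x", ylab = "Fn(x)")
```

Because the line is built only from the sorted data points, it starts at min(v) and ends at max(v).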

retrieve x and y value based on graph in r

I'm new to R and would like to ask for some help. I have x (values) and prob (their probabilities) as follows:
x <- c(0.00, 1.08, 2.08, 3.08, 4.08, 4.64, 4.68)
prob <- c(0.000, 0.600, 0.370, 0.010, 0.006, 0.006, 0.006)
My aim is to construct an estimated distribution graph based on those values. So far, I have used qplot(x,prob,geom=c("point", "smooth"),span=0.55) to make it, and the result is shown here:
https://i.stack.imgur.com/aVgNk.png
My questions are:
1) Are there any other ways to construct a nice distribution like that without using qplot?
2) I need to retrieve all the x values (i.e., 0.5, 1, 1.2, etc.) and their corresponding prob values. How can I do that?
I've been searching for a while, but with no luck.
Thank you all
If you're looking to predict the values of prob for given values of x, this is one way to do it. Note I'm using a loess prediction function here (because I believe it's the default for ggplot's smooth geom, which you've used), which may or may not be appropriate for you.
x <- c(0.00, 1.08, 2.08, 3.08, 4.08, 4.64, 4.68)
prob <- c(0.000, 0.600, 0.370, 0.010, 0.006, 0.006, 0.006)
First, make a data frame with one column. I'll put a whole lot of data points into that column, just to make a bunch of predictions.
df <- data.frame( datapoints = seq.int( 0, max(x), 0.1 ) )
Then create a prediction column. I'm using the predict function, passing a loess smoothed function to it. The loess function is given your input data, and predict is asked to use the function from loess to predict for the values of df$datapoints
df$predicted <- predict( loess( prob ~ x, span = 0.55 ), df$datapoints )
Here's what the output looks like.
> head( df )
  datapoints  predicted
1        0.0 0.01971800
2        0.1 0.09229939
3        0.2 0.15914675
4        0.3 0.22037484
5        0.4 0.27609841
6        0.5 0.32643223
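To look up specific x values rather than a regular grid, the same fitted model can be queried directly. This is a sketch under the same loess assumptions as above; `fit`, `query` and `result` are illustrative names, and 0.5, 1 and 1.2 are the example values from the question.

```r
x <- c(0.00, 1.08, 2.08, 3.08, 4.08, 4.64, 4.68)
prob <- c(0.000, 0.600, 0.370, 0.010, 0.006, 0.006, 0.006)

# Fit once, then reuse the fit for any x values of interest
fit <- loess(prob ~ x, span = 0.55)
query <- c(0.5, 1, 1.2)

result <- data.frame(x = query,
                     predicted = predict(fit, newdata = data.frame(x = query)))
result
```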
On the plotting side of things, ggplot2 is a good way to go, so I don't see a reason to shy away from qplot here. If you want more flexibility in what you get from ggplot2, you can code the functions more explicitly (as @Jan Sila has mentioned in another answer). Here's a way with ggplot2's more common (and more flexible) syntax:
library(ggplot2)
plot <- ggplot( data = df,
                mapping = aes( x = datapoints,
                               y = predicted ) ) +
  geom_point() +
  geom_smooth( span = 0.55 )
plot
You can get the observations once you specify the probability distribution. Have a look here. This will help you and walk you through the MASS package.
As for nicer graphs: I think ggplot is the best (I'm also pretty sure that graph is from ggplot2). If you want exactly that, then you want a blue geom_line and, on top of that, a geom_point with the same mapping :) Try having a look at tutorials, or we can help you out with that.

Reinitializing variables in R and having them update globally

I'm not sure how to pose this question with the right lingo, and the related questions weren't about the same thing. I wanted to plot a function and noticed that R wasn't updating the plot with my change in a coefficient.
a <- 2
x <- seq(-1, 1, by=0.1)
y <- 1/(1+exp(-a*x))
plot(x,y)
a <- 4
plot(x,y) # no change
y <- 1/(1+exp(-a*x)) # redefine function
plot(x,y) # now it updates
Just in case I didn't know what I was doing, I followed the syntax on this R basic plotting tutorial. The only difference was the use of = instead of <- for assignment of y = 1/(1+exp(-a*x)). The result was the same.
I've actually never just plotted a function with R, so this was the first time I experienced this. It makes me wonder if I've seen bad results in other areas if re-defined variables aren't propagated to functions or objects initialized with the initial value.
1) Am I doing something wrong and there is a way to have variables sort of dynamically assigned so that functions take into account the current value vs. the value it had when they were created?
2) If not, is there a common way R programmers work around this when tweaking variable assignments and making sure everything else is properly updated?
You are not, in fact, plotting a function. Instead, you are plotting two vectors. Since you haven't updated the values of the vector before calling the next plot, you get two identical plots.
To plot a function directly, you need to use the curve() function:
f <- function(x, a) 1/(1+exp(-a*x))
Plot:
curve(f(x, 1), -1, 1, 100)
curve(f(x, 4), -1, 1, 100)
R is not Excel, or MathCAD, or any other application that might lead you to believe that changing an object's value will update other vectors that used that value at some time in the past. When you did this
a <- 4
plot(x,y) # no change
There was no change in 'x' or 'y'.
Try this:
curve( 1/(1+exp(-a*x)) )
a <- 10
curve( 1/(1+exp(-a*x)) )
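For the second part of the question, a common pattern is to keep the dependency explicit: wrap the formula in a function and recompute y whenever a parameter changes. A minimal sketch; the name `logistic` is illustrative and does not come from the answers above.

```r
logistic <- function(x, a) 1 / (1 + exp(-a * x))

x <- seq(-1, 1, by = 0.1)

a <- 2
y <- logistic(x, a)            # y is computed now and stored as plain numbers

a <- 4                         # changing a does not touch the stored y
identical(y, logistic(x, 2))   # TRUE

y <- logistic(x, a)            # recompute to pick up the new value
plot(x, y)
```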
