I'm new to R and would like to ask you all for some help. I have x (values) and prob (their probabilities) as follows:
x <- c(0.00, 1.08, 2.08, 3.08, 4.08, 4.64, 4.68)
prob <- c(0.000, 0.600, 0.370, 0.010, 0.006, 0.006, 0.006)
My aim is to construct an estimated distribution graph based on those values. So far I have used qplot(x, prob, geom=c("point", "smooth"), span=0.55) to make it, and the result is shown here
https://i.stack.imgur.com/aVgNk.png
My questions are:
Are there any other ways to construct a nice distribution graph like that without using qplot?
I need to retrieve all the x values (e.g., 0.5, 1, 1.2, etc.) and their corresponding prob values. How can I do that?
I've been searching for a while, but with no luck.
Thank you all
If you're looking to predict the values of prob for given values of x, this is one way to do it. Note I'm using a loess prediction function here (because I believe it's the default for ggplot's smooth geom, which you've used), which may or may not be appropriate for you.
x <- c(0.00, 1.08, 2.08, 3.08, 4.08, 4.64, 4.68)
prob <- c(0.000, 0.600, 0.370, 0.010, 0.006, 0.006, 0.006)
First, make a data frame with one column. I'll put a whole lot of data points into that column, just to make a bunch of predictions.
df <- data.frame( datapoints = seq.int( 0, max(x), 0.1 ) )
Then create a prediction column. I'm using the predict function, passing a loess smoother to it: loess is fitted to your input data, and predict uses that fit to predict values for df$datapoints.
df$predicted <- predict( loess( prob ~ x, span = 0.55 ), df$datapoints )
Here's what the output looks like.
> head( df )
datapoints predicted
1 0.0 0.01971800
2 0.1 0.09229939
3 0.2 0.15914675
4 0.3 0.22037484
5 0.4 0.27609841
6 0.5 0.32643223
On the plotting side of things, ggplot2 is a good way to go, so I don't see a reason to shy away from qplot here. If you want more flexibility in what you get from ggplot2, you can code the functions more explicitly (as @Jan Sila has mentioned in another answer). Here's a way with ggplot2's more common (and more flexible) syntax:
plot <- ggplot( data = df,
mapping = aes( x = datapoints,
y = predicted ) ) +
geom_point() +
geom_smooth( span = 0.55 )
plot
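If you only need predictions at a handful of specific x values (0.5, 1, 1.2, and so on, as asked above), you can pass them to predict directly; a minimal sketch reusing the same loess fit:
predict( loess( prob ~ x, span = 0.55 ), c(0.5, 1, 1.2) )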
You can get the observations once you specify the probability distribution. Have a look here; this will help you and walk you through the MASS package.
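For example, a minimal sketch of that idea, assuming you first draw pseudo-observations according to the given probabilities and that a log-normal shape is an acceptable guess (both are assumptions, not something stated in the question; x and prob are the vectors from the question):
library(MASS)
# draw pseudo-observations from the stated x/prob pairs (an assumption)
set.seed(1)
obs <- sample(x, size = 1000, replace = TRUE, prob = prob)
# fit a candidate distribution; "lognormal" is only a guess here
fit <- fitdistr(obs, densfun = "lognormal")
fit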
Nicer graphs? I think ggplot is the best option (I'm also pretty sure that graph is from ggplot2). If you want exactly that, you want a blue geom_line and, on top of that, a geom_point with the same mapping :) Try having a look at some tutorials, or we can help you out with that.
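A minimal sketch of that suggestion (the data frame name d is just a placeholder; x and prob are the vectors from the question):
library(ggplot2)
d <- data.frame(x = x, prob = prob)
ggplot(d, aes(x = x, y = prob)) +
  geom_line(colour = "blue") +
  geom_point()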
location_difference <- c(0, 0.5, 1, 1.5, 2)
Power <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
plot(location_difference, Power)
The author of the paper said he smoothed the curve using a weighted moving average with weight vector w = (0.25, 0.5, 0.25), but he did not explain how he did this or which function he used. I am really confused.
Up front, as @MartinWettstein cautions, be careful about when you smooth data and what you do with it (what you infer from it). Having said that, a simple weighted moving average might look like this.
# replacement data
x <- seq(0, 2, len=5)
y <- c(0, 0.02, 0.65, 1, 1)
# smoothed
ysm <- zoo::rollapply(c(NA, y, NA), 3,
                      function(a) Hmisc::wtd.mean(a, c(0.25, 0.5, 0.25), na.rm = TRUE),
                      partial = FALSE)
# plot
plot(x, y, type = "b", pch = 16)
lines(x, ysm, col = "red")
Notes:
zoo::rollapply provides a rolling window (3-wide here), calling the function once for indices 1-3, then again for indices 2-4, then 3-5, 4-6, etc.
with rolling-window operations, realize that they can be center-aligned (the default of zoo::rollapply) or left/right aligned. There are some good explanations here: How to calculate 7-day moving average in R?
I surround the y data with NAs so that I can mimic a partial window. Normally with rolling-window ops, if k=3, then the resulting vector is length(y) - (k-1) long. I'm inferring that you want to include data on the ends, so the first smoothed data point would be effectively (0.5*0 + 0.25*0.02)/0.75, the second smoothed data point (0.25*0 + 0.5*0.02 + 0.25*0.65)/1, and the last smoothed data point (0.25*1 + 0.5*1)/0.75. That is, omitting the 0.25 times a missing data point. That's a guess and can easily be adjusted based on your real needs.
I'm using Hmisc::wtd.mean, though it is trivial to write this weighted-mean function yourself (see the sketch after these notes).
This is suggestive only, and not meant to be authoritative. Just to help you begin exploring your smoothing processes.
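For reference, a minimal hand-rolled replacement for Hmisc::wtd.mean mentioned above (a sketch; wtd_mean is just an illustrative name):
# drop NAs and their weights, then take the weighted average of what is left
wtd_mean <- function(a, w) {
  keep <- !is.na(a)
  sum(a[keep] * w[keep]) / sum(w[keep])
}
ysm2 <- zoo::rollapply(c(NA, y, NA), 3,
                       function(a) wtd_mean(a, c(0.25, 0.5, 0.25)))
all.equal(ysm, ysm2)   # should agree with the Hmisc-based result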
I want to plot the ECDF of a vector. The vector has range [0,1]. However, plotting ecdf(data) results in horizontal lines that extend beyond this range. I want to create a plot that does not have these lines beyond the range [0,1].
Calling plot.stepfun shows that the function chooses a vector of abscissa values that includes the values -0.16 and 1.16, but I don't know why. I have tried manually selecting the abscissa values using the argument xval, but this made no difference.
I have tried using ggplot2, but again this made no difference.
I have also tried removing the first and last values of the vector, which are 0 and 1, but again this made no difference.
I could of course just use MS Paint, but that seems like a poor solution to the problem.
data <- c(0, 0.0267937939860966, 0.0831161599875003, 0.089312646620322,
0.09, 0.162046969424378, 0.214535013990776, 0.216, 0.254227922418882,
0.29770882206774, 0.3, 0.346218858110426, 0.3483, 0.351120057363453,
0.446176768935429, 0.469316812739393, 0.47178, 0.506720537855168,
0.51, 0.53499413030498, 0.577201705567453, 0.579825, 0.61501969832776,
0.653481161056275, 0.657, 0.667975762603373, 0.6705828, 0.685122481157394,
0.742234640167266, 0.74470167, 0.745169566125031, 0.756545373540315,
0.7599, 0.795669365154443, 0.801746023714245, 0.803996766, 0.828933122166261,
0.83193, 0.837497330035643, 0.848695641093207, 0.8506916541,
0.87169919974533, 0.879781895687186, 0.882351, 0.885279431049518,
0.8870099004, 0.899358675688768, 0.913502229556406, 0.914974950051,
0.915505354483016, 0.9176457, 0.921514704291551, 0.935095914758442,
0.9363300788754, 0.939114814765667, 0.940605918657197, 0.94235199,
0.951503562401266, 0.95252438490057, 0.952993345228527, 0.958244748310785,
0.959646393, 0.963897452890123, 0.964732400211852, 0.970641607614244,
0.9717524751, 0.973212104364713, 0.973888411695313, 0.979355426072477,
0.980181739205269, 0.98022673257, 0.980724900269631, 0.985376582975203,
0.985481180229861, 0.98580953864678, 0.986158712799, 0.989235347816543,
0.989578152973373, 0.989788073567854, 0.9903110989593, 0.9923627402258,
0.992816530697457, 0.99321776927151, 0.994414541359167, 0.994946291138756,
0.995252438490057, 0.995922615192192, 0.9964442204999, 0.99667670694304,
0.997028536003077, 0.997497885105047, 0.997673694860128, 0.997837868960133,
0.998239132446338, 0.998371586402089, 0.998429033902679, 0.998860091673285,
0.998860110481463, 0.999173901730287, 0.999202077337024, 0.999402017502492,
0.999441454135917, 0.999567612655648, 0.999609017895142, 0.999687669141686,
0.999726312526599, 0.999774606597093, 0.999808418768619, 0.999837491356504,
0.999865893138033, 0.99988293066653, 0.999906125196623, 0.999915732168455,
0.999934287637636, 0.999939389009237, 0.999954001346345, 0.999956435850389,
0.999967800942442, 0.999968709576019, 0.999977460659709, 0.999984222461796,
0.999988955723257, 0.99999226900628, 0.999994588304396, 0.999996211813077,
0.999997348269154, 1)
plot(ecdf(data), do.points=FALSE)
I would like to be able to plot the ECDF with the x axis matching the range of the vector, that is, [0,1].
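One thing that may be worth trying (a sketch, not a verified fix): set xlim to [0, 1] and tell R to use those limits exactly with xaxs = "i", so the horizontal segments are clipped at 0 and 1:
plot(ecdf(data), do.points = FALSE, xlim = c(0, 1), xaxs = "i")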
I am trying to fit a nonlinear function to a given set of data (x and y in the code snippet below); the function is defined as
f(x) = a/((sin((x-b)/2))^4)
x <- c(0, 5, -5, 10, -10, 15, -15, 20, -20, 25, -25, 30, -30)
y <- c(4.21, 3.73, 2.08, 1.1, 0.61, 0.42, 0.13, 0.1, 0.04, 0.036667, 0.016667, 0.007778, 0.007778)
plot(x,y, log="y")
This is what the initial plot, to which I should fit the above-mentioned function, looks like.
But when I try to fit using nls and plot the curve, the graph does not look quite right
f <- function(x,a,b) { a/((sin((x-b)/2))^4) }
fitmodel <- nls (y ~ f(x,a,b), start=list(a=1,b=1))
lines(x, predict(fitmodel))
This is what I see:
I am pretty sure I am doing something wrong here and would appreciate any help.
The R interpreter did exactly what you told it to do.
x is an unsorted vector.
Therefore, predict(fitmodel) makes predictions for these unsorted points.
lines(x, predict(fitmodel)) connects the points in the given order. It connects (x[1], predict(fitmodel)[1]) to (x[2], predict(fitmodel)[2]) to (x[3], predict(fitmodel)[3]) etc. Since the points are not sorted by x, you get the zig-zagging lines you see in the graph.
You might do ind <- order(x); x <- x[ind]; y <- y[ind] as per Zheyuan Li's suggestion.
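A minimal sketch of that fix, reusing the objects defined above:
# sort by x so that lines() connects the fitted values from left to right
ind <- order(x)
plot(x, y, log = "y")
lines(x[ind], predict(fitmodel)[ind])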
Besides, your model makes no sense.
f <- function(x,a,b) { a/((sin((x-b)/2))^4) }
fitmodel <- nls (y ~ f(x,a,b), start=list(a=1,b=1))
For any a and b, f will be a periodic function with period 2π, while your x changes from -30 to 30 with step 5. You cannot reasonably approximate your points with such a function.
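To see the periodicity for yourself, a quick sketch (a = 1 and b = 1 are chosen arbitrarily):
# periodic poles repeat every 2*pi across the range of x
curve(1 / (sin((x - 1) / 2))^4, from = -30, to = 30, n = 2001, log = "y")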
data:
varx <- c(1.234, 1.32, 1.54, 2.1 , 2.76, 3.2, 4.56, 5.123, 6.1, 6.9)
hist(varx)
Gives me
What I would like to do is create the same histogram but with spaces in between the bars.
I've tried what is found here: How to separate the two leftmost bins of a histogram in R
But no luck.
When I do it on my actual data I get:
This is my actual data:
a <- c(2.6667, 4.45238, 5.80952, 3.09524, 3.52381, 4.04762, 4.53488, 3.80952,
       5.7619, 3.42857, 4.57143, 6.04762, 4.02381, 5.47619, 4.09524, 6.18182,
       4.85714, 4.52381, 5.61905, 4.90476, 4.42857, 5.31818, 2.47619, 5,
       2.78571, 4.61905, 3.71429, 2.47619, 4.33333, 4.80952, 6.52381, 5.06349,
       4.06977, 5.2381, 5.90476, 4.04762, 3.95238, 2.42857, 4.38333, 4.225,
       3.96667, 3.875, 3.375, 4.18333, 5.45, 4.45, 3.76667, 4.975, 2.2,
       5.53846, 6.1, 5.9, 4.25, 5.7, 3.475, 3.5, 4, 4.38333, 3.81667, 3.9661,
       1.2332, 1.2443, 5.4323, 2.324, 1.342, 1.321, 3.81667, 3.9661, 1.2332,
       1.2443, 5.4323, 2.324, 1.342, 1.321, 4.32, 6.43, 6.98, 4.321, 3.253,
       2.123, 1.234)
Why do I get these skinny bars and how do I remove them?
The code from the linked question works, but it needs smaller offsets around the breaks:
varx <- c(1.234, 1.32, 1.54, 2.1 , 2.76, 3.2, 4.56, 5.123, 6.1, 6.9)
hist(varx, breaks=rep(1:7,each=2)+c(-.04,.04), freq=T)
This produces a warning because, with the breaks changed manually like that (unequal bin widths), hist would rather plot densities than frequencies. Change to freq=F if you prefer.
In general this is a bad idea - histograms show the continuity of data, and gaps ruin that. You can use the previous code with smaller gaps (your values hit the previous gaps):
hist(varx,breaks=rep(1:7,each=2)+c(-.05,.05))
But this is not a general solution - any values closer than 0.05 to the cutoff will end up in the gap region.
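To check which of your values land in those gap regions (within 0.05 of an integer break, assuming breaks at the integers as above):
a[abs(a - round(a)) < 0.05]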
We can make a bar plot of factored data using ggplot2, depending on how you want to round the values. Below I show both options: floor (round down to the nearest integer) and round (round to the nearest integer):
library(ggplot2)
varx <- as.data.frame(varx)
varx$floor <- floor(varx$varx)
varx$round <- round(varx$varx)
ggplot(varx, aes(x = as.factor(floor))) + geom_bar()
ggplot(varx, aes(x = as.factor(round))) + geom_bar()
In case anyone is looking for a more vanilla solution, you can just set the border argument for hist to be the same color as the background of the plot:
par(mfrow=1:2)
# connected bars
hist(y <- rnorm(100))
# seemingly disconnected bars
hist(y, border=par('bg'))
This adds artificial separation between the bars.
I am using the R package segmented to calculate parameters for a model, in which the response variable is linearly correlated with the explanatory variable until a breakpoint, then the response variable becomes independent from the explanatory variable. In other words, a segmented linear model with the second part having a slope = 0.
What I already did is:
linear1 <- lm(Y ~ X)
linear2 <- segmented (linear1, seg.Z = ~ X, psi = 2)
This gives a model that has a very good first line, but the second line is not horizontal (although its slope is not significant). I want to make the second line horizontal. (psi = 2 is where I observed a breakpoint.)
Also, when I use abline to show the broken line on the plot, it only shows the first part of the model, giving a warning: "only using the first two of 4 regression coefficients". How can I display both parts of the model?
To input my data into R:
X <- c(0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
Y <- c(1.31, 1.60, 1.86, 2.16, 2.44, 2.71, 3.00, 3.24, 3.57, 3.81, 3.80, 3.83, 3.78, 3.94, 3.75, 3.89)
This is as easy as using the plot method for segmented-class objects, provided by the segmented package and linked from the help for segmented.
Assuming your data is in the data.frame d
linear2 <- segmented (linear1, seg.Z = ~ X, psi = 2, data = d)
plot(linear2)
points(Y~X, data = d)
An easy way to fudge a horizontal second line would be to replace the relevant coefficient with the value required for that line to be horizontal:
fudgedmodel <- linear2
fudgedmodel$coefficients[3] <- - fudgedmodel$coefficients[2]
plot(fudgedmodel)
points(Y~X, data = d)
I was searching for the same thing and found a neat answer in this post from the R-help mailing list:
https://stat.ethz.ch/pipermail/r-help/2007-July/137625.html
Here's an edited version of that answer that cuts straight to the solution:
library(segmented)
# simulate data - linear slope down until some point, at which slope=0
n<-50
x<-1:n/n
y<- 0-pmin(x-.5,0)+rnorm(50)*.03
plot(x,y) #This should be your scatterplot..
abline(0,0,lty=2)
# a parsimonious modelling: constrain right slope=0
# NB. This is probably what you want...
o<-lm(y~1)
xx<- -x
o2<-segmented(o,seg.Z=~xx,psi=list(xx=-.3))
slope(o2)
points(x,fitted(o2),col=2)
# now constrain \hat{\mu}(x)=0 for x>psi (you can do this if you know what the value of y is when x becomes independent)
o<-lm(y~0)
xx<- -x
o3<-segmented(o,seg.Z=~xx,psi=list(xx=-.3))
slope(o3)
points(x,fitted(o3),col=3)
You should get something like this. Red points are the first method, which sounds like the one for you. Green points are the second method, which only applies if you already know the value of y at which x becomes independent: