I am trying to smooth my data set using kernel or loess smoothing, but the results are either unclear to me or not what I want. I have several questions.
My y variable is "conc" and my x variable is "depth" (in cm, for example).
1) Kernel smooth
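k <- kernel("daniell", 150)   # Daniell kernel with half-width m = 150
plot(k)
K <- kernapply(conc, k)       # smoothed series, 2 * 150 values shorter than conc
plot(conc ~ depth)
# align each smoothed value with the depth at the centre of its window
# (assumes the rows are sorted by depth)
lines(depth[(150 + 1):(length(depth) - 150)], K, col = "red")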
Here, my data is smoothed with m = 150 (what I call "frequency=150"). Does this mean that every data point is averaged with its 150 neighbouring data points on each side (right and left)? And what does "daniell" mean? I could not find an explanation online.
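In R's stats package, kernel("daniell", m) is the Daniell kernel: 2*m + 1 equal weights of 1/(2*m + 1), i.e. a plain centred moving average. So m = 150 averages each point with 150 neighbours on each side (301 points in total), and kernapply() drops the first and last m points because their windows are incomplete. The sketch below checks this against an explicit moving average from stats::filter(); the series conc is made up, and m = 2 is used only to keep it small.

```r
# Daniell kernel with half-width m: 2*m + 1 equal weights
m <- 2
k <- kernel("daniell", m)
print(k$coef)                     # weights for lags 0..m, each 1/(2*m + 1)

set.seed(1)
conc <- cumsum(rnorm(20))         # hypothetical stand-in for the real "conc" vector
smooth_k <- kernapply(conc, k)    # length(conc) - 2*m smoothed values

# The same numbers from an explicit centred moving average over 2*m + 1 points;
# filter() returns NA for the m points at each end, which na.omit() removes
ma <- stats::filter(conc, rep(1 / (2 * m + 1), 2 * m + 1), sides = 2)
print(all.equal(as.numeric(smooth_k), as.numeric(na.omit(ma))))
```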
2) Loess smooth
library(ggplot2)
p <- ggplot(total, aes(depth, conc)) + geom_point()
# linewidth replaces size for lines in ggplot2 >= 3.4.0
p1 <- p + geom_smooth(method = "loess", linewidth = 1, level = 0.95)
Here, what are the default settings of the loess smoother? If I want to smooth my data with frequency=150 as above (a moving average over every 150 data points), how should I modify this code?
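loess() defaults to span = 0.75 and degree = 2: each fitted value comes from a tricube-weighted local quadratic fit to the nearest 75% of the points. geom_smooth(method = "loess") uses the same defaults and exposes span directly. loess has no fixed point count, but setting span to (2*150 + 1)/n gives each local fit a neighbourhood of about 301 points, comparable to the kernel window above; the weighting and the local quadratic still make it differ from a plain moving average. The data frame total below is simulated, since the real data is not shown.

```r
library(ggplot2)

# Hypothetical stand-in for "total": 1000 depths with a noisy concentration
set.seed(42)
total <- data.frame(depth = 1:1000)
total$conc <- exp(sin(total$depth / 100)) + rnorm(1000, sd = 0.2)

# Neighbourhood of about 2*150 + 1 points, expressed as a fraction of the data
win <- 2 * 150 + 1
sp  <- win / nrow(total)

# Direct loess fit with this span (degree = 2 is the default)
fit <- loess(conc ~ depth, data = total, span = sp)

# The same span passed through geom_smooth()
p <- ggplot(total, aes(depth, conc)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", span = sp, level = 0.95)
```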
3) To show the y-axis on a log scale, I used "log10(conc)" instead of "conc", and it worked. But I cannot change the y-axis tick labels. I tried adding "scale_y_log10(limits = c(1,1e3))" to my code to show tick labels like 10^0, 10^1, 10^2, ..., but it did not work.
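If conc is already log10-transformed, adding scale_y_log10() logs it a second time, and limits = c(1, 1e3) no longer matches the values (roughly 0 to 3). A sketch of the usual approach: map the raw conc, let scale_y_log10() do the transformation, and build the 10^k labels with the scales functions trans_format() and math_format(). The data frame here is hypothetical, with values between 1 and 1000.

```r
library(ggplot2)
library(scales)

# Hypothetical data: concentrations spread between 1 and 1000
set.seed(1)
total <- data.frame(depth = 1:200, conc = 10^runif(200, 0, 3))

# Map the raw conc, not log10(conc); the scale applies the log10 transform
p <- ggplot(total, aes(depth, conc)) +
  geom_point() +
  scale_y_log10(
    limits = c(1, 1e3),
    breaks = 10^(0:3),
    labels = trans_format("log10", math_format(10^.x))  # labels 10^0, 10^1, ...
  )
```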
Please answer my questions. Thanks a lot for your help.
I'm using the wireframe function to obtain a 3D plot. Since I'm using some models to forecast, I want to plot observed and forecasted values in the same wireframe plot, and I'd like to know whether it's possible to change the colour from the start of the forecast. This is the result I obtain using this code:
library(lattice)
library(latex2exp)  # provides TeX()
wireframe(mxt ~ ages * years, data = grid,
          xlab = TeX("$x$"), ylab = TeX("$t$"), zlab = TeX("$\\log \\mu_x(t)$"),
          drape = TRUE, col = "black",
          col.regions = colorRampPalette(c("yellow", "red"))(100),
          scales = list(arrows = FALSE, cex = 0.8, col = "black", font = 1),
          aspect = c(1, 0.6))
What I want is to change the surface colour from 2011 onwards, to make the plot easier to understand. I have attached the data. Thank you.
Well, I think I have partially solved it. It seems that wireframe cannot plot surfaces defined on different (x, y) grids. So what you have to do is extend the data so that there are two surfaces on the same (x, y) grid: one contains NA after a certain y, and the other contains NA before it. Here is the code I used and the result (I still have to change the colours, but that is quite easy).
This is an age-year plot, with z equal to the log-mortality rate by age and year. The matrix mxt1 has dimension (n, n1), while the matrix pred has dimension (n, n2); y1 is a vector of length n1 (years of observed values) and y2 is of length n2 (years of predicted values).
library(lattice)
grid <- expand.grid(list(ages = ages, years = c(y1, y2)))
grid <- rbind(grid, grid)
grid$mxt <- c(cbind(mxt1, matrix(nrow = n, ncol = n2)),
              cbind(matrix(nrow = n, ncol = (n1 - 1)), mxt1[, n1], pred))
grid$group <- factor(c(rep("obs", n*n1 + n*n2), rep("for", n*n1 + n*n2)))
wireframe(mxt ~ ages * years, data = grid,
          groups = group, col.groups = c("red", "green"))
The trick in the second part of the grid$mxt vector is repeating the last observed column (mxt1[, n1]), which joins the two surfaces. The result is this.
I hope this helps someone.
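A minimal self-contained version of the NA-padding trick, with small made-up matrices (ages, y1, y2, mxt1 and pred are synthetic here, not the real mortality data). It builds the stacked grid so that each surface is NA where the other takes over, except for the shared last observed year.

```r
library(lattice)

# Synthetic stand-ins for the real objects (hypothetical values)
ages <- 0:10
y1 <- 2000:2010                   # observed years
y2 <- 2011:2015                   # forecast years
n <- length(ages); n1 <- length(y1); n2 <- length(y2)
mxt1 <- outer(ages, y1, function(a, t) -6 + 0.08 * a - 0.01 * (t - 2000))
pred <- outer(ages, y2, function(a, t) -6 + 0.08 * a - 0.01 * (t - 2000))

grid <- expand.grid(ages = ages, years = c(y1, y2))
grid <- rbind(grid, grid)

obs_part <- cbind(mxt1, matrix(NA, n, n2))                   # NA over forecast years
for_part <- cbind(matrix(NA, n, n1 - 1), mxt1[, n1], pred)   # repeats last observed year
grid$mxt <- c(obs_part, for_part)                            # column-major, ages fastest
grid$group <- factor(rep(c("obs", "for"), each = n * (n1 + n2)),
                     levels = c("obs", "for"))

wf <- wireframe(mxt ~ ages * years, data = grid,
                groups = group, col.groups = c("red", "green"))
```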
This is my data:
y <- c(1.8, 2, 2.8, 2.9, 2.46, 1.8, 0.3, 1.1, 0.664, 0.86, 1, 1.9)
x <- 1:12
data <- data.frame(y, x)
plot(data$y ~ data$x)
I want to fit a curve through these points so that I can predict values between them. I need a curve that goes through the points; I don't care what function it fits.
I consulted this question:
Fitting a curve to specific data
install.packages("rgp")
library(rgp)
library(ggplot2)
result <- symbolicRegression(y ~ x, data = data, functionSet = mathFunctionSet,
                             stopCondition = makeStepsStopCondition(2000))
# inspect results, they'll be different every time...
(symbreg <- result$population[[which.min(sapply(result$population,
                                                 result$fitnessFunction))]])
## function (x)
## exp(sin(sqrt(x)))
# inspect visual fit
ggplot() + geom_point(data = data, aes(x, y), size = 3) +
  geom_line(data = data.frame(symbx = data$x, symby = sapply(data$x, symbreg)),
            aes(symbx, symby), colour = "red")
Every time I repeat this analysis, the function above produces a different curve. Does anyone know why this happens, and whether this is the right way to fit a curve through these points? Also, this function does not pass through each point, so I cannot obtain the intermediate points.
A standard approach is to fit a spline, which gives a smooth curve that passes through all the points. See ?spline. Concretely, you would use a call like:
spline(x = myX, y = myY, xout=whereToInterpolate)
or, to compute 100 points for your example:
ss <- spline(x,y, n=100)
plot(x,y)
lines(ss)
Note that there is also a smoothing spline, smooth.spline(), which may help with noisy data (it does not pass exactly through every point).
If the curve does not need to be smooth, there is the simpler approx(), which does linear interpolation.
approx(x = myX, y = myY, xout=whereToInterpolate)
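Applied to the data from the question, a quick check that spline() interpolates (it reproduces every original y) and that approx() returns straight-line values between neighbours. splinefun() is the base R variant that returns a function you can evaluate at any x; both use the default "fmm" method here.

```r
y <- c(1.8, 2, 2.8, 2.9, 2.46, 1.8, 0.3, 1.1, 0.664, 0.86, 1, 1.9)
x <- 1:12

# Cubic interpolating spline evaluated at the original x and at the half-steps
at_knots <- spline(x, y, xout = x)
mid      <- spline(x, y, xout = seq(1.5, 11.5, by = 1))

# splinefun() returns the same interpolant as a reusable function
f <- splinefun(x, y)

# Linear interpolation: midpoints of (1.8, 2) and of (1.8, 0.3)
lin <- approx(x, y, xout = c(1.5, 6.5))
print(lin$y)

plot(x, y)
lines(spline(x, y, n = 100))
points(mid, col = "red", pch = 19)   # interpolated half-step values
```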
Could someone explain why I get different lines when I plot? I thought the lines would be the same.
library(sm)          # the aircraft data set comes from the sm package
library(KernSmooth)  # locpoly()
data(aircraft)
help(aircraft)
attach(aircraft)
lgWeight <- log(Weight)
# a) Fit a nonparametric regression to the data (x_i, y_i) and save the estimated values m-hat(x_i).
# Local polynomial regression of degree 2 of lgWeight on Yr
op <- par(mfrow = c(2, 1))
lpr1 <- locpoly(Yr, lgWeight, bandwidth = 7, degree = 2, gridsize = length(Yr))
plot(Yr, lgWeight, col = "grey", ylab = "Log(Weight)", xlab = "Year")
lines(lpr1, lwd = 2, col = "blue")
lines(lpr1$y, col = "black")
How can I get the values from the model? If I print the model, it gives me values in $x and $y, but if I plot them, they are not the same as the blue line. I need the values of the fitted model (blue) for every x. Could someone help me?
The fitted model (blue curve) is correctly stored in lpr1. As you said, the y-values are in lpr1$y and the x-values are in lpr1$x.
The second (black) line looks like a straight line because you give lines() only one variable, lpr1$y. Since you do not specify the x-coordinates, R plots the values against their index, from 1 to the length of lpr1$y.
The following are explicit equivalents of your two lines() calls:
lines(x = lpr1$x, y = lpr1$y,lwd=2, col="blue") # plots curve
lines(x = 1:length(lpr1$y), y = lpr1$y, col="black") # plot line
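Because lpr1$x is an equally spaced grid over the range of Yr rather than the observed years themselves, one way to get a fitted value for every observed x is to interpolate the curve back onto Yr with approx(). The sketch uses simulated Yr and lgWeight (hypothetical stand-ins, so it runs without the sm package); with gridsize = length(Yr) the grid should be fine enough that linear interpolation adds little error.

```r
library(KernSmooth)

set.seed(1)
Yr <- sort(runif(200, 1914, 1984))                          # hypothetical years
lgWeight <- 8 + 0.05 * (Yr - 1914) + rnorm(200, sd = 0.5)   # hypothetical response

lpr1 <- locpoly(Yr, lgWeight, bandwidth = 7, degree = 2, gridsize = length(Yr))

# lpr1$x is an equally spaced grid over range(Yr), not the observed Yr values,
# so interpolate the fitted curve back onto each observed x
fitted_at_obs <- approx(lpr1$x, lpr1$y, xout = Yr)$y
```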
I'm trying to log-transform the x axis of a density plot and get unexpected results. The code without the transformation works fine:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
If I add scale_x_log10(), I get the following result:
Apart from the y values having been rescaled, something seems to have happened to the x values as well -- the peaks of the density function are not quite where the points are.
Am I using the log transformation incorrectly here?
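The shape of the density curve changes after the transformation because ggplot2 applies scale transformations before computing statistics: with scale_x_log10(), geom_density() estimates the density of log10(x), which is a different distribution, and the default bandwidth is chosen on that transformed scale. Your densy values, however, were computed from the density of the untransformed x, so the points no longer line up with the curve. If you set bw = 1000 before the transformation and bw = 10 after it, you will get two normal-looking densities (with different y-axis values, because the support is much larger in the first case). Here is an example showing how varying the bandwidth changes the shape of the density.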
data = data.frame(x = c(1, 2, 10, 11, 1000), y = 0)
## Examine how changing the bandwidth changes the shape of the curve
par(mfrow = c(2, 1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main = "No transform")
points(data, pch = 19)
plot(density(log10(data$x)), ylim = c(0, 2), main = "Log-transform w/ varying bw")
points(log10(data$x), data$y, pch = 19)
for (i in 1:10)
  lines(density(log10(data$x), bw = 0.02 * i), col = greys[i])
# the curves are solid lines, so the legend uses lty = 1
legend("topright", legend = paste(0.02 * 1:10), title = "bw", col = greys, lty = 1, cex = 0.8)
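In the ggplot2 version, one way to make the points sit on the transformed curve is to compute their heights from a density of log10(x), which is what geom_density() estimates once scale_x_log10() is added. This is a sketch: it interpolates with approx() instead of findInterval(), and it relies on density() and geom_density() sharing the same defaults (bw = "nrd0", Gaussian kernel), so the points should land on or very near the curve.

```r
library(ggplot2)

data <- data.frame(x = c(1, 2, 10, 11, 1000))

# With scale_x_log10(), geom_density() estimates the density of log10(x),
# so the point heights must come from a density on the same scale
dens_log <- density(log10(data$x))
data$densy <- approx(dens_log$x, dens_log$y, xout = log10(data$x))$y

p <- ggplot(data, aes(x = x)) +
  geom_density() +
  geom_point(aes(y = densy)) +
  scale_x_log10()
```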
So I have a barplot in which the y-axis is log(frequency). Just from eyeing it, the bars appear to decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on the same graph. Then, if my bars fall below the exponential, I would know that they decrease exponentially or faster, and if the bars lie above the exponential, I would know that they do not decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit the density of an exponential distribution, you should probably plot a density histogram (not frequencies). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
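For the original bar-graph question, a sketch of overlaying a fitted exponential on binned counts: barplot() returns the bar midpoints, and the expected count in each bin is n times the exponential probability of that bin. It reuses simulated data like the answer above; the bin width of 0.1 is an arbitrary choice, and the rate is the closed-form maximum-likelihood estimate 1/mean(x), which is what MASS::fitdistr() returns for "exponential".

```r
set.seed(3)
x.gen <- rexp(1000, rate = 3)          # simulated data, as in the answer above
rate_hat <- 1 / mean(x.gen)            # closed-form MLE of the exponential rate

# Bin the data (bin width 0.1 is arbitrary) and draw the counts as bars
breaks <- seq(0, max(x.gen) + 0.1, by = 0.1)
counts <- table(cut(x.gen, breaks))
mids <- barplot(counts, axisnames = FALSE)   # barplot() returns the bar midpoints

# Expected count per bin under the fitted exponential, drawn over the bars
expected <- length(x.gen) * diff(pexp(breaks, rate = rate_hat))
lines(mids, expected, col = "red", lwd = 2)
```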
One way to visually check whether two distributions are the same is a quantile-quantile plot, or Q-Q plot for short. It is most often used to check whether data follow a normal distribution.
The basic idea is to plot your data against theoretical quantiles; if the data match that distribution, you will see a straight line. For example:
x <- qnorm(seq(0, 1, length.out = 1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))]                 # Drop the ends because they are -Inf and Inf
y <- rnorm(1000)                         # Actual data: 1000 points drawn from a normal distribution
l.1 <- lm(sort(y) ~ sort(x))
qqplot(x, y, xlab = "Theoretical Quantiles", ylab = "Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under ideal conditions you should see a straight line when plotting the theoretical quantiles against your data. You can do the same for your case by plotting your data against the quantiles of the exponential distribution you think it follows (for example, from qexp()).
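Following that suggestion, a sketch of a Q-Q plot against the exponential: qexp() with its default rate = 1 gives the theoretical quantiles at the standard plotting positions from ppoints(). If the data are exponential with rate r, the points should fall near a straight line through the origin with slope 1/r, so the rate does not need to be known in advance. The sample here is simulated (rate 2) to show what a good match looks like.

```r
set.seed(7)
obs <- rexp(500, rate = 2)                 # simulated data with a known rate

# Theoretical quantiles of a unit-rate exponential at standard plotting positions
theo <- qexp(ppoints(length(obs)))

qqplot(theo, obs, xlab = "Theoretical exponential quantiles",
       ylab = "Sample quantiles")
# Reference line; for exponential data the slope estimates 1/rate (here about 0.5)
fit <- lm(sort(obs) ~ theo)
abline(fit, col = "red")
```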