I am trying to make an exponential plot of a variable. The coefficient of the variable is very high (350 million) in the GLM results. Other variables with lower coefficients plotted easily, with no issues. I have been making the sequence interval smaller and smaller, but R keeps crashing when I try to plot it.
Any suggestions? I have tried breaking up the data already with no luck.
My vectors are very large numerics as well (18Mb).
# grid of autumn chlorophyll values (~2.45 million points)
chlautcnod <- seq.int(0, 2.45259, 0.000001)
# linear predictor over the grid, all other covariates held at their means
chlautcnodline<- glmnodosaALL$coefficients[1] +
glmnodosaALL$coefficients[2]*mean(bornodosaAP$Chl_spring) +
glmnodosaALL$coefficients[3]*chlautcnod + glmnodosaALL$coefficients[4]*mean(bornodosaAP$Dist_coast) +
glmnodosaALL$coefficients[5]*mean(bornodosaAP$Chl_winter)+ glmnodosaALL$coefficients[6]*mean(bornodosaAP$Depth) +
glmnodosaALL$coefficients[7]*mean(bornodosaAP$Chl_yr_avg)+ glmnodosaALL$coefficients[8]*mean(bornodosaAP$Dist_complete_river) +
glmnodosaALL$coefficients[9]*mean(bornodosaAP$Temp_yr_min)+ glmnodosaALL$coefficients[10]*mean(bornodosaAP$Chl_summer)+
glmnodosaALL$coefficients[11]*mean(bornodosaAP$Chl_yr_max)+ glmnodosaALL$coefficients[12]*mean(bornodosaAP$SWH_summer)+
glmnodosaALL$coefficients[13]*mean(bornodosaAP$SWH_yr_min)+ glmnodosaALL$coefficients[14]*mean(bornodosaAP$SWH_spring)
plot(exp(chlautcnodline) ~ chlautcnod,
     xlab = expression(paste("Chlorophyll-α Autumn (mg/m"^"3"~")")),
     ylab = "Probability of C. nodosa occurrence",
     ylim = c(0, 0.05), xlim = c(0.15, 0.17), type = "l", bty = "l")
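One likely cause of the crash is the size of that sequence: seq.int(0, 2.45259, 0.000001) produces roughly 2.45 million points, all of which plot() must draw. Since the plot is restricted to xlim = c(0.15, 0.17) anyway, a much coarser grid over just that window gives a visually identical line. A minimal sketch, assuming the same fitted model objects as above:

# evaluate the curve only over the plotting window, with far fewer points
chlautcnod <- seq(0.15, 0.17, length.out = 2000)
# ...recompute chlautcnodline exactly as above on this shorter grid, then:
plot(exp(chlautcnodline) ~ chlautcnod,
     xlab = expression(paste("Chlorophyll-α Autumn (mg/m"^"3"~")")),
     ylab = "Probability of C. nodosa occurrence",
     ylim = c(0, 0.05), type = "l", bty = "l")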
I am trying to figure out residual distances for a length-mass association, and am running into an issue where the predicted values line does not match the points on the scatterplot at all, although I believe I am using the correct code. I've attached a picture of the plot I'm getting... any ideas as to what's going wrong?
logTL<-log10(bd.1$TL)
logMass<-log10(bd.1$mass)
# linear relation between log TL and log mass
lma<-lm(logMass~logTL)
summary(lma)
#plot and fit line to data
plot(logMass, logTL,xlab="log (base 10) total length", ylab="log (base 10) mass")
abline(coefficients(lma))
I think you should exchange the order of logTL and logMass in your plot, i.e.,
plot(logTL, logMass, xlab="log (base 10) total length", ylab="log (base 10) mass")
since you did regression of logMass with respect to logTL, i.e., lma<-lm(logMass~logTL)
Otherwise, if you want to keep logMass on the x-axis, you have to invert the fitted line rather than just its coefficients, i.e., abline(-coef(lma)[1]/coef(lma)[2], 1/coef(lma)[2]).
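A minimal sketch with simulated data (all numbers made up) showing both options:

set.seed(1)
logTL   <- log10(runif(50, 10, 100))               # hypothetical log lengths
logMass <- 3 * logTL - 5 + rnorm(50, sd = 0.05)    # mass ~ length^3 on the log scale
lma <- lm(logMass ~ logTL)
# response on the y-axis, predictor on the x-axis: abline works directly
plot(logTL, logMass)
abline(coef(lma))
# keeping logMass on the x-axis instead: plot the inverted line
plot(logMass, logTL)
abline(-coef(lma)[1] / coef(lma)[2], 1 / coef(lma)[2])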
In the documentation for the coord_trans function, which is used for coordinate transformations, it says that the difference between this function and scale_x_log10 is that coordinate transformation occurs after statistics, while scale transformation occurs before. I don't get the point (check the documentation here), or how the data is plotted using the two methods.
The quote from the documentation you supplied tells us that scale transformation occurs before any statistical analysis pertaining to the plot.
The example provided in the documentation is especially informative, since it involves regression analysis. In the case of scale transformation, i.e. using
d <- subset(diamonds, carat > 0.5)
qplot(carat, price, data = d, log="xy") + geom_smooth(method="lm"),
the scales are first transformed and then the regression analysis is performed. Minimizing the SS of the errors is done on transformed axes (or transformed data), which you would only want if you thought that there is a linear relationship between the logs of variables. The result is a straight line on a log-log plot, even though the axes are not scaled 1:1 (hard to see in this example).
Meanwhile, when using
qplot(carat, price, data = d) +
geom_smooth(method="lm") +
coord_trans(x = "log10", y = "log10")
the regression analysis is performed first on untransformed data (and axes, that is, independently of the plot) and then everything is plotted with transformed coordinates. This results in the regression line not being straight at all, because its equation (or rather the coordinates of its points) is transformed in the process of coordinate transformation.
This is illustrated further in the documentation by using
library(scales)
qplot(carat, price, data=diamonds, log="xy") +
geom_smooth(method="lm") +
coord_trans(x = exp_trans(10), y = exp_trans(10))
where you can see that (1) applying a scale transformation, (2) fitting a line, and (3) transforming the coordinates back to the original (linear) system does not produce a straight line, and it shouldn't: in the first scenario you actually fitted an exponential curve, which only looked straight on the log-log plot.
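To make the distinction concrete, here is a sketch of the two fits that the two plots correspond to (same subset as above):

library(ggplot2)   # for the diamonds data
d <- subset(diamonds, carat > 0.5)
# what log = "xy" / scale_*_log10 fits: a line in log-log space,
# i.e. a power curve in the original units
fit_log <- lm(log10(price) ~ log10(carat), data = d)
# what coord_trans displays: a fit done in the original units,
# whose straight line becomes curved once the coordinates are transformed
fit_lin <- lm(price ~ carat, data = d)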
I am trying to smooth my data set using a kernel or loess smoothing method, but the results are all either unclear or not what I want. My questions are the following.
My x data is "conc" and my y data is "depth", measured in cm for example.
1) Kernel smooth
k <- kernel("daniell", 150)
plot(k)
K <- kernapply(conc, k)
plot(conc~depth)
lines(K, col = "red")
Here, my data is smoothed with frequency = 150. Does this mean that every data point is averaged with its neighboring 150 data points on each side? And what does "daniell" mean? I could not find an explanation online.
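As a quick check of what "daniell" does, assuming the stats package's kernel(): the Daniell kernel (named after the Daniell window from spectral smoothing) of order m is an equal-weight moving average over 2m + 1 points, so kernel("daniell", 150) averages each point with its 150 neighbors on each side.

k <- kernel("daniell", 2)   # small order so the weights are easy to read
k$coef                      # 0.2 0.2 0.2: every weight is 1/(2*2+1)
# kernapply drops m points at each end, so to overlay the smooth correctly
# the x variable must be trimmed to match (here m = 150, n = length(depth)):
# lines(depth[151:(length(depth) - 150)], K, col = "red")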
2) Loess smooth
p<-qplot(depth, conc, data=total)
p1 <- p + geom_smooth(method = "loess", size = 1, level=0.95)
Here, what are the defaults of the loess smoothing function? And if I want to smooth my data with a 150-point window as in the case above (a moving average over every 150 data points), how can I modify this code?
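For what it's worth, geom_smooth's loess method uses the stats::loess defaults (span = 0.75, i.e. each local fit uses the nearest 75% of the points, a fraction rather than a point count). For a literal moving average over a 301-point window (150 on each side), zoo::rollmean is closer to the description above; a sketch, assuming the data frame total from the qplot call:

library(zoo)
total$conc_smooth <- rollmean(total$conc, k = 301, fill = NA)  # 150 left + center + 150 right
qplot(depth, conc, data = total) +
  geom_line(aes(y = conc_smooth), colour = "red", size = 1)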
3) To show the y-axis on a log scale, I put "log10(conc)" instead of "conc", and it worked. But I cannot change the y-axis tick labels. I tried "scale_y_log10(limits = c(1,1e3))" in my code to show tick labels like 10^0, 10^1, 10^2, ..., but it did not work.
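On the log-scale labels: scale_y_log10 transforms the data itself, so it should be applied to the raw conc rather than to log10(conc) (otherwise the values are logged twice). A sketch of the 10^x-style labels using helpers from the scales package:

library(scales)
qplot(depth, conc, data = total) +
  scale_y_log10(limits = c(1, 1e3),
                breaks = c(1, 10, 100, 1000),
                labels = trans_format("log10", math_format(10^.x)))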
Please answer my questions. Thanks a lot for your help.
So I have a barplot in which the y-axis is the log of the frequencies. From just eyeing it, it appears that the bars decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on the same graph. Then, if my bars fall below the exponential, I would know that they decrease exponentially or faster, and if the bars lie on top of the exponential, I would know that they don't decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit the density of an exponential distribution, you should probably plot a density histogram (not frequencies). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
One way of visually inspecting whether two distributions are the same is with a Quantile-Quantile plot, or Q-Q plot for short. Typically this is done when checking whether a distribution follows the standard normal.
The basic idea is to plot your data against some theoretical quantiles; if your data matches that distribution, you will see a straight line. For example:
x <- qnorm(seq(0,1,l=1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data. 1000 points drawn from a normal distribution
l.1 <- lm(sort(y)~sort(x))
qqplot(x, y, xlab="Theoretical Quantiles", ylab="Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under perfect conditions you should see a straight line when plotting the theoretical quantiles against your data. So you can do the same thing, plotting your data against quantiles of the exponential distribution you think it follows.
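For example, a minimal sketch of the same idea against an exponential reference, with the rate simply estimated from the data:

y <- rexp(1000, rate = 3)            # stand-in for your observed values
p <- ppoints(length(y))              # evenly spaced probabilities
x <- qexp(p, rate = 1/mean(y))       # theoretical exponential quantiles
qqplot(x, y, xlab = "Theoretical Quantiles", ylab = "Actual Quantiles")
abline(0, 1)                         # points near this line suggest an exponential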
I have two density curves plotted using this:
Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100)
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
d <- density + geom_density(alpha = 0.2) + xlim(1, 11) + labs(title = "September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)
I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?
Calculate the density separately and plot that one to start with. Then you can use basic arithmetic to get the estimate: an integral is approximated by adding together the areas of a set of little rectangles. I use the mean method for that: the width is the difference between two consecutive x-values, and the height is the mean of the y-values at the beginning and end of each interval (which makes this effectively the trapezoidal rule). I use the rollmean function in the zoo package, but this can be done using the base package too.
require(zoo)
X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x,Y$y, type="l") #can use ggplot for this too
# set an Avg.position value
Avg.pos <- 1
# construct lengths and heights
xt <- diff(Y$x[Y$x<Avg.pos])
yt <- rollmean(Y$y[Y$x<Avg.pos],2)
# This gives you the area
sum(xt*yt)
This gives you a good approximation, to about three decimal places. If you know the density function itself, take a look at ?integrate.
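As a sanity check for the snippet above: X was drawn from a standard normal and Avg.pos is 1, so the exact answer is known and the approximation can be compared against it directly.

pnorm(Avg.pos)   # ~0.841, the exact area below 1 for a standard normal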
Three possibilities:
The logspline package provides a different method of estimating density curves, but it does include pnorm style functions for the result.
You could also approximate the area by feeding the x and y variables returned by the density function to the approxfun function, and using the result with the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.
Density estimates are just sums of kernels centered at the data points; one such kernel is the normal density. You could average the areas from pnorm (or another kernel's CDF) with the sd set to the bandwidth and each kernel centered at a data point. Both approaches are sketched below.
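A minimal sketch of the last two possibilities, assuming a sample X and an upper limit of 1:

X <- rnorm(100)
d <- density(X)
# possibility 2: interpolate the estimated density, then integrate numerically
f <- approxfun(d$x, d$y)
integrate(f, lower = min(d$x), upper = 1)$value
# possibility 3: a Gaussian-kernel density estimate is an average of normal
# densities centered at the data with sd equal to the bandwidth, so its CDF
# is the corresponding average of pnorm values
mean(pnorm(1, mean = X, sd = d$bw))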