plotting log(10) lengths differ - r

I am having difficulty plotting a log(10) formula on to existing data points. I derived a logarithmic function based on a list of data where "Tout_F_6am" is my independent variable and "clo" is my dependent variable.
When I go to plot it, I am getting the error that lengths x and y are different. Can someone please help me figure out whats going wrong?
logKT=lm(log10(clo)~ Tout_F_6am,data=passive)
summary(logKT) #r2=0.12
coef(logKT)
plot(passive$Tout_F_6am,passive$clo) #plot data points
x=seq(53,84, length=6381)#match length of x variable
y=logKT
lines(x,y,type="l",lwd=2,col="red")
length(passive$Tout_F_6am) #6381
length(passive$clo) #6381
Additionally, can the formula curve(-0.0219-0.005*log10(x),add=TRUE,col=2)be written as eq=(10^-0.022)*(10^-0.005*x)? thanks!

The problem is that you are trying to plot the model object, not the predictions from the model. Try something like this:
Define the explanatory values you want to plot, in a data frame (or tibble). It doesn't have to be as many as there are data points.
library(dplyr)
explanatory_data <- tibble(
Tout_F_6am = seq(53, 84, 0.1)
)
Add a column of predicted values using predict(). This takes a model and your explanatory data. predict() will return the transformed values, so you have to backtransform them.
prediction_data <- explanatory_data %>%
mutate(
log10_clo = predict(logKT, explanatory_data),
clo = 10 ^ log10_clo
)
Finally, draw your plot.
plot(clo ~ Tout_F_6am, data = prediction_data, log="y", type = "l")
The plotting is actually easier using ggplot2. This should give you more or less what you want.
library(ggplot2)
ggplot(passive, aes(Tout_F_6am, clo)) +
geom_point() +
geom_smooth(method = "lm") +
scale_y_log10()

Related

Export results from LOESS plot

I am trying to export the underlying data from a LOESS plot (blue line)
I found this post on the subject and was able to get it to export like the post says:
Can I export the result from a loess regression out of R?
However, as the last comment from the poster in that post says, I am not getting the results for my LOESS line. Does anyone have any insights on how to get it to export properly?
Thanks!
Code for my export is here:
#loess object
CL111_loess <- loess(dur_cleaned~TS_LightOn, data = CL111)
#get SE
CL111_predict <- predict(CL111_loess, se=T)
CL111_ouput <- data.frame("fitted" = CL111_predict$fit, "SE"=CL111_predict$se.fit)
write.csv(CL111_ouput, "CL111_output.csv")
Data for the original plot is here:
Code for my original plot is here:
{r}
#individual plot
ggplot(data = CL111) + geom_smooth(mapping = aes(x = TS_LightOn, y = dur_cleaned), method = "lm", se = FALSE, colour = "Green") +
labs(x = "TS Light On (Seconsd)", y = "TS Response Time (Seconds)", title = "Layout 1, Condition AO, INS High") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_smooth(mapping = aes(x = TS_LightOn, y = dur_cleaned), method = "loess", se = TRUE) + xlim(0, 400) + ylim (0, 1.0)
#find coefficients for best fit line
lm(CL111_LM$dur_cleaned ~ CL111_LM$TS_LightOn)
You can get this information via ggplot_build().
If your plot is saved as gg1, run ggplot_build(gg1); then you have to examine the data object (which is a list of data for different layers) and try to figure out which layer you need (in this case, I looked for which data layer included a colour column that matched the smooth line ...
bb <- ggplot_build(gg1)
## extract the right component, just the x/y coordinates
out <- bb$data[[2]][,c("x","y")]
## check
plot(y~x, data = out)
You can do whatever you want with this output now (write.csv(), save(), saveRDS() ...)
I agree that there is something weird/that I don't understand about the way that ggplot2 is setting up the loess fit. You do have to do predict() with the right newdata (e.g. a data frame with a single column TS_LightOn that ranges from 0 to 400) - otherwise you get predictions of the points in your data set, which may not be properly spaced/in the right order - but that doesn't resolve the difference for me.
To complement #ben-bolker, I have just written a small function that may be useful for retrieving the internal dataset created by ggplot for a geom_smooth call. It takes the resultant ggplot as input and returns the smoothed data. The problem it solves is that, as Ben observed, internally ggplot creates a smoothed fit with predicted data on random intervals, different from the interval used for the input data. This function will get you back the ggplot fit data with an interval based on integer and equally spaced values. That function uses a loess fit on the already smoothed data, using a small value of span (0.1), that is adjusted upward on-the-fly to cope with small numbers of values.
This is useful if you used geom_smooth with a method that is not 'loess' or using 'NULL' and you cannot easily build a model that replicates what geom_smooth is doing internally.
The function separates different series on the same plot as well as series located on different facets. It also returns the 'ymin' and 'ymax' values.
Note that this function uses an interval based on integer values of x. You can modify this if you need an interval based on equally-spaced values of x, but not integral. In that case, pass your x interval of choice in the xInterval parameter, or tweak the line:
outOne <- data.frame(x=c(min(trunc(sub$x)):max(trunc(sub$x)))).
get_geom_smooth_dataFromPlot <- function (a_ggplot, xInterval=NULL) {
#internal ggplot values read in ggTable
ggTable <- ggplot_build(a_ggplot)$data[[1]]
#facet panels
panels <- as.numeric(names(table(ggTable$PANEL)))
nPanel <- length(panels)
onePanel <- (nPanel==1)
#number of series in each plot
groups <- as.numeric(names(table(ggTable$group)))
nGroup <- length(groups)
oneGroup <- (nGroup==1)
out <- data.frame()
#are there 'ymin' and 'ymax' values?
SE_data <- "ymin" %in% colnames(ggTable)
for (pan in (1:nPanel)) {
for (grp in (1:nGroup)) {
sub <- subset(ggTable, (PANEL==panels[pan])&(group==groups[grp]))
#no group series for this facet panel?
if (dim(sub)[1] == 0) next
if (is.null(xInterval)) {
outOne <- data.frame(x=c(min(trunc(sub$x)):max(trunc(sub$x))))
} else {
outOne <- data.frame(x=xInterval)
}
nObs <- dim(outOne)[1]
#hack to avoid problems with a small range for the x interval
# when there are more than 90 x values
# we use a span of 0.1, but
# we adjust on-the-fly up to a span of 0.5
# for 10 values of the x interval
cSpan <- max (0.1, 0.5 * 10 / (nObs-(nObs-10)/2))
if (!onePanel) outOne$panel <- pan
if (!oneGroup) outOne$group <- grp
mod <- loess(y~x, data=sub, span=cSpan)
outOne$y <- predict(mod, outOne$x, se=FALSE)
if (SE_data) {
mod <- loess(ymin~x, data=sub, span=cSpan)
outOne$ymin <- predict(mod, outOne$x, se=FALSE)
mod <- loess(ymax~x, data=sub, span=cSpan)
outOne$ymax <- predict(mod, outOne$x, se=FALSE)
}
out <- rbind(out, outOne)
}
}
return (out)
}

Plotting Chi-square Distribution with ggplot2 in R

I would like to use R to randomly construct chi-square distribution with the degree of freedom of 5 with 100 observations. After doing so, I want to calculate the mean of those observations and use ggplot2 to plot the chi-square distribution with a bar chart. The following is my code:
rm(list = ls())
library(ggplot2)
set.seed(9487)
###Step_1###
x_100 <-data.frame(rchisq(100, 5, ncp = FALSE))
###Step_2###
mean_x <- mean(x_100[,1])
class(x_100)
###Step_3###
plot_x_100 <- ggplot(data = x_100, aes(x = x_100)) +
geom_bar()
plot_x_100
Firstly, I construct a data frame of a random chi-square distribution with df = 5, obs = 100.
Secondly, I calculate the mean value of this chi-square distribution.
At last, I plot the graph with the ggplot2 package.
However, I get the result like the follows:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
I got stuck in this problem for several hours and cannot find any list in my global environment. It would be appreciated if anyone can help me and give me some suggestions.
The problem is that inside the ggplot function you are calling the same dataframe (x_100) as both the data and the x variable inside aes. Remember that in ggplot, inside aes you should indicate the name of the column you wish to map. Additionally, if you want to plot the chi-square distribution I think it might be a better idea to use the geom_histogram instead of geom_bar, as the first one groups the observations into bins.
library(ggplot2)
# Rename the only column of your data frame as "value"
colnames(x_100) <- "value"
plot_x_100 <- ggplot(data = x_100, aes(x = value)) +
geom_histogram(bins = 20)

Why scatter plots in ggpairs function don't have the loess layer on them?

I have a quick question, and can't figure out what the problem is. I wanted to plot a dataset I have, and found one solution here:
How to use loess method in GGally::ggpairs using wrap function
However, I can't seem to figure out what was wrong with my approach. Here is the code chunk below with simple mtcars dataset:
library(ggplot2)
library(GGally)
View(mtcars)
GGally::ggpairs(mtcars,
lower= list(
ggplot(mapping = aes(rownames(mtcars))) +
geom_point()+
geom_smooth(method = "loess"))
)
Here, as you can see, is my output that doesn't put the smooth layer on the scatter plot. I wanted to have it for the regression analysis for my actual dataset. Any direction or explanation would be good. Thank you!
The solution in the post from #Edward's comment works here with mtcars. The snippet below replicates your plot above, with a loess line added:
library(ggplot2)
library(GGally)
View(mtcars)
# make a function to plot generic data with points and a loess line
my_fn <- function(data, mapping, method="loess", ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=method, ...)
p
}
# call ggpairs, using mtcars as data, and plotting continuous variables using my_fn
ggpairs(mtcars, lower = list(continuous = my_fn))
In your snippet, the second argument lower has a ggplot object passed to it, but what it requires is a list with specifically named elements, that specify what to do with specific variable types. The elements in the list can be functions or character vectors (but not ggplot objects). From the ggpairs documentation:
upper and lower are lists that may contain the variables 'continuous',
'combo', 'discrete', and 'na'. Each element of the list may be a
function or a string. If a string is supplied, it must implement one
of the following options:
continuous exactly one of ('points', 'smooth', 'smooth_loess',
'density', 'cor', 'blank'). This option is used for continuous X and Y
data.
combo exactly one of ('box', 'box_no_facet', 'dot', 'dot_no_facet',
'facethist', 'facetdensity', 'denstrip', 'blank'). This option is used
for either continuous X and categorical Y data or categorical X and
continuous Y data.
discrete exactly one of ('facetbar', 'ratio', 'blank'). This option is
used for categorical X and Y data.
na exactly one of ('na', 'blank'). This option is used when all X data
is NA, all Y data is NA, or either all X or Y data is NA.
The reason my snippet works is because I've passed a list to lower, with an element named 'continuous' that is my_fn (which generates a ggplot).

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
To do so requires getting a separate survival curve for each group. The survfit function in the survival package splits the nicely but I don't know how to index the separate plots to work on them.
Here is sample data:
rearrest<-read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep=",", header=T)
This is the curve ungrouped
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example
sCurve$n.event
When I fit the same thing except this time grouped according to the value of the personal variable I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [], [[]] both with number-type indexes and named-, all to no avail.
Can anyone help?
sCurveA$strata provides the grouping variable as a vector. You can pull out the key pieces and throw them into a data.frame for ggplot.
df = data.frame(Time = sCurveA$time,
Survival = sCurveA$surv,
Strata = sCurveA$strata)
ggplot(df, aes(Time, Survival, col = Strata)) +
geom_line()

Plotting GLM models in ggplot2 r

Apologies for the obvious question but just incase there is a simple answer! Here is an example of what my data looks like:
DATA <- data.frame(
TotalAbund = sample(1:10),
TotalHab = sample(0:1),
TotalInv = sample(c("yes", "no"), 20, replace = TRUE)
)
DATA$TotalHab<-as.factor(DATA$TotalHab)
DATA
I've made the following plot:
p <- ggplot(DATA, aes(x=factor(TotalInv), y=TotalAbund,colour=TotalHab))
p + geom_boxplot() + geom_jitter()
I've created a model as follows:
MOD.1<-glm(TotalAbund~TotalInv+TotalHab, data=DATA)
However, I want to present fitted values from glm model rather than raw data. I know I can simply do it in visreg with:
visreg(MOD.1)
Is there a way to do this with ggplot too? Thanks
You could do something like this:
Create a "prediction frame" containing the relevant values for which you want to predict (if you had a continuous predictor, it would probably make more sense to include evenly spaced values, e.g. seq(min(cont_pred),max(cont_pred),length=51))
pframe <- with(DATA,
expand.grid(TotalInv=unique(TotalInv),
TotalHab=unique(TotalHab)))
Use the predict method to fill in the predicted values:
pframe$TotalAbund <- predict(MOD.1,newdata=pframe)
Add a layer to the graph. The only annoying part is using position_dodge with a manually tweaked width to match the widths of the bars ... (I'm assuming here that you've saved your existing plot as gg1 ...)
gg1 + geom_point(data=pframe,size=8,shape=16,alpha=0.7,
position=position_dodge(width=0.75))

Resources