Plotting Chi-square Distribution with ggplot2 in R - r

I would like to use R to draw 100 random observations from a chi-square distribution with 5 degrees of freedom. I then want to calculate the mean of those observations and use ggplot2 to plot the distribution as a bar chart. The following is my code:
rm(list = ls())
library(ggplot2)
set.seed(9487)
###Step_1###
x_100 <-data.frame(rchisq(100, 5, ncp = FALSE))
###Step_2###
mean_x <- mean(x_100[,1])
class(x_100)
###Step_3###
plot_x_100 <- ggplot(data = x_100, aes(x = x_100)) +
geom_bar()
plot_x_100
First, I construct a data frame of 100 random draws from a chi-square distribution with df = 5.
Second, I calculate the mean of those draws.
Finally, I plot the result with the ggplot2 package.
However, I get the following output:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
I have been stuck on this problem for several hours and cannot find any list in my global environment. I would appreciate any help or suggestions.

The problem is that, inside ggplot(), you are passing the same data frame (x_100) as both the data argument and the x variable in aes(). Inside aes() you should give the name of the column you want to map, not the data frame itself. Additionally, to plot a sample from a chi-square distribution, geom_histogram is a better choice than geom_bar: geom_histogram groups continuous observations into bins, whereas geom_bar counts each distinct value separately.
library(ggplot2)
# Rename the only column of your data frame as "value"
colnames(x_100) <- "value"
plot_x_100 <- ggplot(data = x_100, aes(x = value)) +
  geom_histogram(bins = 20)
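On why the error mentions a list when none is visible in the global environment: a data frame is itself a list of columns, so aes(x = x_100) hands ggplot a list. The sketch below checks this and, as one possible extension that is not part of the answer above, overlays the theoretical density from dchisq() and marks the sample mean. It assumes ggplot2 3.3.0 or later for after_stat(); the name plot_density is made up for illustration.

```r
library(ggplot2)

set.seed(9487)
# Name the column up front so aes() can refer to it
x_100 <- data.frame(value = rchisq(100, df = 5))

# A data frame is a list of columns, which is what is.finite() complained about
is.list(x_100)  # TRUE

mean_x <- mean(x_100$value)

plot_density <- ggplot(x_100, aes(x = value)) +
  # Put the histogram on the density scale so it is comparable to dchisq()
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 fill = "grey70", colour = "white") +
  # Theoretical chi-square density with 5 degrees of freedom
  stat_function(fun = dchisq, args = list(df = 5), colour = "red") +
  # Dashed vertical line at the sample mean
  geom_vline(xintercept = mean_x, linetype = "dashed")
plot_density
```

With only 100 draws the histogram will follow the red curve only roughly, and the sample mean should land near the theoretical mean of 5.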

Related

plotting log(10) lengths differ

I am having difficulty plotting a log(10) formula on to existing data points. I derived a logarithmic function based on a list of data where "Tout_F_6am" is my independent variable and "clo" is my dependent variable.
When I go to plot it, I get an error saying that the lengths of x and y differ. Can someone please help me figure out what's going wrong?
logKT=lm(log10(clo)~ Tout_F_6am,data=passive)
summary(logKT) #r2=0.12
coef(logKT)
plot(passive$Tout_F_6am,passive$clo) #plot data points
x=seq(53,84, length=6381)#match length of x variable
y=logKT
lines(x,y,type="l",lwd=2,col="red")
length(passive$Tout_F_6am) #6381
length(passive$clo) #6381
Additionally, can the formula curve(-0.0219-0.005*log10(x), add=TRUE, col=2) be written as eq=(10^-0.022)*(10^-0.005*x)? Thanks!
The problem is that you are trying to plot the model object, not the predictions from the model. Try something like this:
Define the explanatory values you want to plot, in a data frame (or tibble). It doesn't have to be as many as there are data points.
library(dplyr)
explanatory_data <- tibble(
  Tout_F_6am = seq(53, 84, 0.1)
)
Add a column of predicted values using predict(), which takes a model and your explanatory data. Because the model's response is log10(clo), predict() returns values on the log10 scale, so you have to back-transform them with 10 ^.
prediction_data <- explanatory_data %>%
  mutate(
    log10_clo = predict(logKT, explanatory_data),
    clo = 10 ^ log10_clo
  )
Finally, draw your plot.
plot(clo ~ Tout_F_6am, data = prediction_data, log="y", type = "l")
The plotting is actually easier using ggplot2. This should give you more or less what you want.
library(ggplot2)
ggplot(passive, aes(Tout_F_6am, clo)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_log10()
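On the follow-up about rewriting the formula: because the model is log10(clo) ~ Tout_F_6am, the back-transformed curve is clo = 10^(a + b*x), which equals 10^a * 10^(b*x). A rough sketch in base R follows. The passive data frame here is simulated purely for illustration (the real data are not shown in the question), and the names b, x_new, manual and pred are made up.

```r
set.seed(1)
# Simulated stand-in for the asker's `passive` data (an assumption, not the real data)
passive <- data.frame(Tout_F_6am = runif(200, 53, 84))
passive$clo <- 10^(-0.02 - 0.005 * passive$Tout_F_6am + rnorm(200, sd = 0.05))

logKT <- lm(log10(clo) ~ Tout_F_6am, data = passive)
b <- coef(logKT)  # b[1] is the intercept, b[2] is the slope

# Plot the raw points, then add the back-transformed fit on the original scale
plot(passive$Tout_F_6am, passive$clo)
curve(10^(b[1] + b[2] * x), add = TRUE, col = "red", lwd = 2)

# The factored form needs parentheses around b * x:
# in R, 10^-0.005*x parses as (10^-0.005) * x, not 10^(-0.005 * x)
x_new <- data.frame(Tout_F_6am = c(60, 70))
manual <- unname(10^b[1] * 10^(b[2] * x_new$Tout_F_6am))
pred <- unname(10^predict(logKT, x_new))
all.equal(manual, pred)
```

Using curve() or lines() with back-transformed predictions keeps the existing scatter plot, so the x and y lengths only need to match each other, not the number of data points.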

Indexing separate survival curves

I would like to plot Kaplan-Meier survival estimates for each of two groups in ggplot.
To do so requires getting a separate survival curve for each group. The survfit function in the survival package splits the data nicely, but I don't know how to index the separate curves to work on them.
Here is sample data:
rearrest<-read.table("http://stats.idre.ucla.edu/stat/examples/alda/rearrest.csv", sep=",", header=T)
This is the curve ungrouped
(sCurve <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~1, data = rearrest)))
It is easy to index elements within this, for example
sCurve$n.event
When I fit the same thing except this time grouped according to the value of the personal variable I get two nice survival curve objects ready to go.
(sCurveA <- summary(arr1 <- survfit(Surv(months, abs(censor-1))~personal, data = rearrest)))
One object is labelled personal=0 and the other personal=1. I have tried indexing with $, [], [[]] both with number-type indexes and named-, all to no avail.
Can anyone help?
sCurveA$strata provides the grouping variable as a vector. You can pull out the key pieces and throw them into a data.frame for ggplot.
df = data.frame(Time = sCurveA$time,
                Survival = sCurveA$surv,
                Strata = sCurveA$strata)
ggplot(df, aes(Time, Survival, col = Strata)) +
  geom_line()
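A self-contained variant of the same idea is sketched below. It uses the lung dataset that ships with the survival package as a stand-in for the rearrest data (an assumption made only so the example runs offline), and it swaps geom_line() for geom_step(), since Kaplan-Meier estimates are step functions. The names fit, km_df, km_plot and fit_first are made up for illustration.

```r
library(survival)
library(ggplot2)

# `lung` ships with the survival package; it stands in for the rearrest data
fit <- survfit(Surv(time, status) ~ sex, data = lung)
s <- summary(fit)

# s$strata labels each row with its group, so one data frame holds both curves
km_df <- data.frame(Time = s$time,
                    Survival = s$surv,
                    Strata = s$strata)

km_plot <- ggplot(km_df, aes(Time, Survival, colour = Strata)) +
  geom_step()
km_plot

# A single group's curve can also be taken from the survfit object itself
# (rather than from its summary) by numeric subscript
fit_first <- fit[1]
```

Note that summary() only reports event times, so each plotted curve starts at its first event rather than at time 0 with survival 1.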

R: Weighted Joyplot/Ridgeplot/Density Plot?

I am trying to create a joyplot using the ggridges package (based on ggplot2). The general idea is that a joyplot creates nicely scaled stacked density plots. However, I cannot seem to produce one of these using weighted density. Is there some way of incorporating sampling weights (for weighted density) in the calculation of the densities in the creation of a joyplot?
Here's a link to the documentation for the ggridges package: https://cran.r-project.org/web/packages/ggridges/ggridges.pdf I know a lot of packages based on ggplot can accept additional aesthetics, but I don't know how to add weights to this type of geom object.
Additionally, here is an example of an unweighted joyplot in ggplot. I am trying to convert this to a weighted plot with the density weighted according to pweight.
# Load packages, set seed
library(ggplot2)
library(ggridges)
set.seed(1)
# Create an example dataset
dat <- data.frame(group = c(rep("A",100), rep("B",100)),
pweight = runif(200),
val = runif(200))
# Create an example of an unweighted joyplot
ggplot(dat, aes(x = val, y = group)) + geom_density_ridges(scale= 0.95)
It looks like the way to do this is to use stat_density rather than the default stat_density_ridges. Per the docs you linked to:
Note that the default stat_density_ridges makes joint density
estimation across all datasets. This may not generate the desired
result when using faceted plots. As an alternative, you can set
stat = "density" to use stat_density. In this case, it is required
to add the aesthetic mapping height = ..density.. (see examples).
Fortunately, stat_density (unlike stat_density_ridges) understands the aesthetic weight and will pass it to the underlying density call. You end up with something like:
ggplot(dat, aes(x = val, y = group)) +
  geom_density_ridges(aes(height=..density..,  # Notice the additional
                          weight=pweight),     # aes mappings
                      scale= 0.95,
                      stat="density")          # and use of stat_density
The ..density.. variable is automatically generated by stat_density.
Note: It appears that when you use stat_density the x-axis range behaves a little differently: it will trim the density plot to the data range and drop the nice-looking tails. You can easily correct this by manually expanding your x-axis, but I thought it was worth mentioning.
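For newer ggplot2 releases, the same plot can be sketched with after_stat(density), which replaced the ..density.. notation in ggplot2 3.4.0, together with expand_limits() to widen the x-axis as mentioned in the note. This is a sketch under those assumptions rather than a tested recipe for every ggridges version; the limits -0.25 and 1.25 and the name weighted_ridges are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggridges)

set.seed(1)
dat <- data.frame(group = rep(c("A", "B"), each = 100),
                  pweight = runif(200),
                  val = runif(200))

weighted_ridges <- ggplot(dat, aes(x = val, y = group)) +
  geom_density_ridges(aes(height = after_stat(density),  # height from stat_density
                          weight = pweight),             # sampling weights
                      stat = "density",
                      scale = 0.95) +
  # Widen the x scale so the density tails are not cut off at the data range
  expand_limits(x = c(-0.25, 1.25))
weighted_ridges
```

Comparing this against the unweighted version on the same axes is a quick way to confirm that the weights are actually changing the shape of each ridge.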

Plotting GLM models in ggplot2 r

Apologies for the obvious question, but just in case there is a simple answer! Here is an example of what my data looks like:
DATA <- data.frame(
TotalAbund = sample(1:10),
TotalHab = sample(0:1),
TotalInv = sample(c("yes", "no"), 20, replace = TRUE)
)
DATA$TotalHab<-as.factor(DATA$TotalHab)
DATA
I've made the following plot:
p <- ggplot(DATA, aes(x=factor(TotalInv), y=TotalAbund,colour=TotalHab))
p + geom_boxplot() + geom_jitter()
I've created a model as follows:
MOD.1<-glm(TotalAbund~TotalInv+TotalHab, data=DATA)
However, I want to present fitted values from glm model rather than raw data. I know I can simply do it in visreg with:
visreg(MOD.1)
Is there a way to do this with ggplot too? Thanks
You could do something like this:
Create a "prediction frame" containing the relevant values for which you want to predict (if you had a continuous predictor, it would probably make more sense to include evenly spaced values, e.g. seq(min(cont_pred),max(cont_pred),length=51))
pframe <- with(DATA,
               expand.grid(TotalInv=unique(TotalInv),
                           TotalHab=unique(TotalHab)))
Use the predict method to fill in the predicted values:
pframe$TotalAbund <- predict(MOD.1,newdata=pframe)
Add a layer to the graph. The only annoying part is using position_dodge with a manually tweaked width to match the widths of the bars ... (I'm assuming here that you've saved your existing plot as gg1 ...)
gg1 + geom_point(data=pframe,size=8,shape=16,alpha=0.7,
                 position=position_dodge(width=0.75))
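Putting the steps together, a self-contained sketch might look like the following. The data are regenerated with rep() so that every factor combination is guaranteed to occur, which is an assumption for illustration and not the asker's data. The approximate 95% intervals from se.fit are an addition that the answer does not include; the columns lwr and upr are made-up names.

```r
library(ggplot2)

set.seed(42)
# Illustrative data: balanced so all TotalInv x TotalHab combinations exist
DATA <- data.frame(
  TotalAbund = sample(1:10, 20, replace = TRUE),
  TotalHab = factor(rep(0:1, times = 10)),
  TotalInv = rep(c("yes", "no"), each = 10)
)

MOD.1 <- glm(TotalAbund ~ TotalInv + TotalHab, data = DATA)

# Prediction frame: one row per combination of the two predictors
pframe <- expand.grid(TotalInv = unique(DATA$TotalInv),
                      TotalHab = unique(DATA$TotalHab))

# se.fit = TRUE also returns standard errors of the fitted values
pred <- predict(MOD.1, newdata = pframe, se.fit = TRUE)
pframe$TotalAbund <- pred$fit
pframe$lwr <- pred$fit - 1.96 * pred$se.fit  # rough normal-approximation bounds
pframe$upr <- pred$fit + 1.96 * pred$se.fit

gg1 <- ggplot(DATA, aes(x = TotalInv, y = TotalAbund, colour = TotalHab)) +
  geom_boxplot()

# Overlay fitted values with their intervals, dodged to line up with the boxes
gg1 + geom_pointrange(data = pframe, aes(ymin = lwr, ymax = upr),
                      position = position_dodge(width = 0.75))
```

For a non-Gaussian family, the predictions would be on the link scale unless type = "response" is passed to predict().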

Creating barplot with standard errors plotted in R

I am trying to find the best way to create barplots in R with standard errors displayed. I have seen other articles but I cannot figure out the code to use with my own data (having not used ggplot before and this seeming to be the most used way and barplot not cooperating with dataframes). I need to use this in two cases for which I have created two example dataframes:
Plot df1 so that the x-axis has sites a-c, with the y-axis displaying the mean value for V1 and the standard errors highlighted, similar to this example with a grey colour. Here, plant biomass should be the mean V1 value and treatments should be each of my sites.
Plot df2 in the same way, but with before and after placed next to each other, similar to this, so that pre-test and post-test correspond to before and after in my example.
x <- factor(LETTERS[1:3])
site <- rep(x, each = 8)
values <- as.data.frame(matrix(sample(0:10, 3*8, replace=TRUE), ncol=1))
df1 <- cbind(site,values)
z <- factor(c("Before","After"))
when <- rep(z, each = 4)
df2 <- data.frame(when,df1)
Apologies for the simplicity of this question to more experienced R users, particularly those who use ggplot, but I cannot apply the snippets of code that I have found elsewhere to my data. I cannot even get enough code together to produce a start to a graph, so I hope my descriptions are sufficient. Thank you in advance.
Something like this?
library(ggplot2)
get.se <- function(y) {
se <- sd(y)/sqrt(length(y))
mu <- mean(y)
c(ymin=mu-se, ymax=mu+se)
}
# Note: the fun.y argument was renamed to fun in ggplot2 3.3.0
ggplot(df1, aes(x=site, y=V1)) +
  stat_summary(fun=mean, geom="bar", fill="lightgreen", color="grey70")+
  stat_summary(fun.data=get.se, geom="errorbar", width=0.1)
ggplot(df2, aes(x=site, y=V1, fill=when)) +
  stat_summary(fun=mean, geom="bar", position="dodge", color="grey70")+
  stat_summary(fun.data=get.se, geom="errorbar", width=0.1, position=position_dodge(width=0.9))
So this takes advantage of the stat_summary(...) function in ggplot to, first, summarize y for given x using mean(...) (for the bars), and then to summarize y for given x using the get.se(...) function for the error-bars. Another option would be to summarize your data prior to using ggplot, and then use geom_bar(...) and geom_errorbar(...).
Also, plotting +/- 1 se is not a great practice (although it's used often enough). You'd be better served plotting legitimate confidence limits, which you could do, for instance, using the built-in mean_cl_normal function instead of the contrived get.se(...). mean_cl_normal returns the 95% confidence limits based on the assumption that the data is normally distributed (or you can set the CL to something else; read the documentation).
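As a sketch of the confidence-limit suggestion that avoids an extra package dependency (mean_cl_normal needs Hmisc installed), the t-based interval can be computed by hand. The helper get.ci below is hypothetical and is intended to mirror a normal-theory 95% interval; it assumes ggplot2 3.3.0 or later for the fun argument, and df1 is regenerated here with a seed so the example runs on its own.

```r
library(ggplot2)

set.seed(123)
# Same shape as the question's df1: 3 sites, 8 values each
df1 <- data.frame(site = factor(rep(LETTERS[1:3], each = 8)),
                  V1 = sample(0:10, 24, replace = TRUE))

# Hypothetical helper: mean with t-based confidence limits
get.ci <- function(y, level = 0.95) {
  n <- length(y)
  mu <- mean(y)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(y) / sqrt(n)
  data.frame(y = mu, ymin = mu - half, ymax = mu + half)
}

ci_plot <- ggplot(df1, aes(x = site, y = V1)) +
  stat_summary(fun = mean, geom = "bar",
               fill = "lightgreen", colour = "grey70") +
  stat_summary(fun.data = get.ci, geom = "errorbar", width = 0.1)
ci_plot
```

With 8 observations per site, these bars come out a little more than twice as wide as the +/- 1 se bars, because the t quantile with 7 degrees of freedom is about 2.36.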
I used the group_by and summarise_each functions from dplyr for this, along with the std.error function from the plotrix package.
library(plotrix) # for std error function
library(dplyr) # for group_by and summarise_each function
library(ggplot2) # for creating ggplot
For df1 plot
# Group data by site
grouped_df1<-group_by(df1,site)
#summarise grouped data and calculate mean and standard error using function mean and std.error(from plotrix)
summarised_df1<-summarise_each(grouped_df1,funs(mean=mean,std_error=std.error))
# Define the top and bottom of the errorbars
limits <- aes(ymax = mean + std_error, ymin=mean-std_error)
#Begin your ggplot
#Here we are plotting site vs mean (df1 has no second factor to fill by)
g<-ggplot(summarised_df1,aes(site,mean))
#Creating bar to show the factor variable position_dodge
#ensures side by side creation of factor bars
g<-g+geom_bar(stat = "identity",position = position_dodge())
#creation of error bar
g<-g+geom_errorbar(limits,width=0.25,position = position_dodge(width = 0.9))
#print graph
g
For df2 plot
# Group data by when and site
grouped_df2<-group_by(df2,when,site)
#summarise grouped data and calculate mean and standard error using function mean and std.error
summarised_df2<-summarise_each(grouped_df2,funs(mean=mean,std_error=std.error))
# Define the top and bottom of the errorbars
limits <- aes(ymax = mean + std_error, ymin=mean-std_error)
#Begin your ggplot
#Here we are plotting site vs mean and filling by another factor variable when
g<-ggplot(summarised_df2,aes(site,mean,fill=when))
#Creating bar to show the factor variable position_dodge
#ensures side by side creation of factor bars
g<-g+geom_bar(stat = "identity",position = position_dodge())
#creation of error bar
g<-g+geom_errorbar(limits,width=0.25,position = position_dodge(width = 0.9))
#print graph
g
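In recent dplyr versions summarise_each() and funs() are deprecated, so the same summary can be sketched with a plain summarise(). The version below is an assumption-laden rewrite rather than the answerer's code: it computes the standard error directly as sd/sqrt(n) instead of calling plotrix::std.error, uses .groups = "drop" (dplyr 1.0.0 or later), and regenerates df1 and df2 with a seed so it runs on its own. The name se_plot is made up.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
df1 <- data.frame(site = factor(rep(LETTERS[1:3], each = 8)),
                  V1 = sample(0:10, 24, replace = TRUE))
# `when` has length 8 and is recycled across the 24 rows, as in the question
df2 <- data.frame(when = factor(rep(c("Before", "After"), each = 4)), df1)

# One row per when/site combination with its mean and standard error
summarised_df2 <- df2 %>%
  group_by(when, site) %>%
  summarise(mean = mean(V1),
            std_error = sd(V1) / sqrt(n()),
            .groups = "drop")

se_plot <- ggplot(summarised_df2, aes(site, mean, fill = when)) +
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = mean - std_error, ymax = mean + std_error),
                width = 0.25,
                position = position_dodge(width = 0.9))
se_plot
```

Giving the bars and the error bars the same dodge width keeps each error bar centred on its bar.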

Resources