I have used the following code to plot the results of the Tukey test after my ANOVA in R.
TukeyHSD(myANOVA, conf.level=.90)
TUKEY <- TukeyHSD(myANOVA, conf.level=.90)
plot(TUKEY , las=1 , col="black")
However, since the number of lines plotted is too large, I would like to have the significant ones highlighted, e.g. in red. I have seen a similar question here with the comment "overwrite the black lines showing significant differences with red lines", but I don't know how to do it.
Imagine we have the following data:
data <- data.frame(group = rep(c("P1", "P2", "P3"), each = 40), values = c(rnorm(40, 0, 3),rnorm (40, 8, 10),rnorm (40, 0, 3)))
Then we conduct a Tukey test, convert the results to a matrix, and then to a dataframe (I don't know how to do it otherwise):
results_test <- TukeyHSD(aov(data$values~ data$group), conf.level=.95)
results_matrix <- as.matrix (results_test)
df_res <- as.data.frame(results_matrix[1])
Then we plot it using an ifelse function, as a function of the p-values:
plot(results_matrix, col= ifelse(df_res[,4]<0.05, 'red', 'black'))
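The conversion to a matrix and a data frame may not be needed. Below is a minimal sketch, assuming the adjusted p-values can be read from the "p adj" column of the TukeyHSD result; the names `sim`, `tuk`, `p_adj` and `line_cols` are mine and not part of the original code, and the grouping variable is made a factor explicitly.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
sim <- data.frame(
  group  = factor(rep(c("P1", "P2", "P3"), each = 40)),
  values = c(rnorm(40, 0, 3), rnorm(40, 8, 10), rnorm(40, 0, 3))
)
tuk <- TukeyHSD(aov(values ~ group, data = sim), conf.level = 0.95)

# Each element of the result is a matrix with columns diff, lwr, upr, "p adj"
p_adj <- tuk$group[, "p adj"]
line_cols <- ifelse(p_adj < 0.05, "red", "black")

# Extra arguments to plot() are passed on to the drawing calls,
# so one colour per comparison is applied to the interval lines
plot(tuk, las = 1, col = line_cols)
```

The colours follow the row order of the result matrix, so they line up with the comparisons as long as the rows are not reordered.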
I personally prefer the plot generated by the multcomp package, and with this package you can perform the Tukey test for unbalanced designs.
library(multcomp)
### set up a one-way ANOVA
data(warpbreaks)
amod <- aov(breaks ~ tension, data = warpbreaks)
### specify all pair-wise comparisons among levels of variable "tension"
tuk <- glht(amod, linfct = mcp(tension = "Tukey"))
### p-values
pvalues <- adjusted()(tuk)$pvalues
### get confidence intervals
ci.glht <- confint(tuk)
### plot them
plot(ci.glht, col = ifelse(pvalues < 0.05, "red", "black"),
xlab = "Difference in mean levels")
I'd like to do some correlation analysis with plotting. As my actual data is too large, I used the mtcars data frame to set up an example.
Here is the code:
library(ggplot2)
library(ggcorrplot)
mtcars
# Computing correlation matrix
corrmatr_mtcars <- round(cor(subset(mtcars[c(3:7,1)])),1)
head(corrmatr_mtcars[,1:6])
corrmatr_mtcars
# Computing correlation matrix with p-values
corrmatr_mtcars.mat <- cor_pmat(mtcars[c(3:7,1)])
head(corrmatr_mtcars.mat[, 1:6])
corrmatr_mtcars.mat
library(GGally)
ggpairs(mtcars[c(3:7,1)],
title = "Corr Analysis of...",
lower = list(continuous = wrap("cor",
size = 3)),
upper = list(
continuous = wrap("smooth",
alpha = 0.3,
size = 0.1))
)
With this plot result:
However, I am only interested in the correlations of the first two variables against all the others. To avoid unnecessary information and save space, I would like
my plot to show only the first two correlation rows. All other correlations could be dropped.
In the end, I imagine something like the following, needing only 3 rows.
The correlation value labels should then be placed in the scatterplot panels.
I couldn't find any option to do so.
Is that even possible with ggpairs (without complex functions)? If yes, how? If no, what approach would give a comparable result?
It can be done this way:
library(ggplot2)
library(ggcorrplot)
mtcars
# Computing correlation matrix
corrmatr_mtcars <- round(cor(subset(mtcars[c(3:7,1)])),1)
head(corrmatr_mtcars[,1:6])
corrmatr_mtcars
# Computing correlation matrix with p-values
corrmatr_mtcars.mat <- cor_pmat(mtcars[c(3:7,1)])
head(corrmatr_mtcars.mat[, 1:6])
corrmatr_mtcars.mat
library(GGally)
gg1 = ggpairs(mtcars[c(3:7,1)],
title = "Corr Analysis of...",
lower = list(continuous = wrap("cor",
size = 3)),
upper = list(
continuous = wrap("smooth",
alpha = 0.3,
size = 0.1))
)
gg1$plots = gg1$plots[1:12]
gg1$yAxisLabels = gg1$yAxisLabels[1:2]
gg1
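If only the numbers are needed, the correlations of the first two variables against the remaining ones can also be computed directly with base R's `cor()`. This is a small sketch rather than a replacement for the plot; the names `vars`, `first_two`, `others` and `cors` are my own.

```r
vars      <- mtcars[c(3:7, 1)]   # disp, hp, drat, wt, qsec, mpg
first_two <- vars[1:2]           # disp and hp
others    <- vars[-(1:2)]        # drat, wt, qsec, mpg

# cor(x, y) correlates every column of x with every column of y,
# giving one row per variable in first_two
cors <- round(cor(first_two, others), 2)
cors
```

The result has two rows (disp, hp) and four columns, which matches the two correlation rows kept in the trimmed plot.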
By using the following code I am able to plot the results of my quantile regression model:
quant_reg_all <- rq(y_quant ~ X_quant, tau = seq(0.05, 0.95, by = 0.05), data=df_lasso)
quant_plot <- summary(quant_reg_all, se = "boot")
plot(quant_plot)
However, as there are many variables the plots are unreadable as shown in the image below:
Including the label, I have 18 variables.
How could I plot a few of these panels at a time so that they are readable?
Depending on the number of graphs you want, you could do:
library(quantreg)
quant_reg_all <- rq(y_quant ~ X_quant, tau = seq(0.05, 0.95, by = 0.05), data=df_lasso)
quant_plot <- summary(quant_reg_all, se = "boot")
plot(quant_plot, parm = 1:3)  # plot the first 3
plot(quant_plot, parm = c(3, 6, 9, 10))  # plot the 3rd, 6th, 9th and 10th plots
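To page through all panels a few at a time, the coefficient indices can be split into chunks, with one chunk passed to `plot()` per call. A hedged sketch: `n_coef` and `per_page` are hypothetical values chosen for illustration, and the commented-out `plot()` call assumes the `quant_plot` object from the code above.

```r
n_coef   <- 18  # hypothetical: number of coefficient panels
per_page <- 4   # hypothetical: panels per figure

# Group 1..n_coef into consecutive chunks of at most per_page indices
pages <- split(seq_len(n_coef), ceiling(seq_len(n_coef) / per_page))

for (idx in pages) {
  # With the fitted summary available, each chunk gets its own figure:
  # plot(quant_plot, parm = idx)
  print(idx)
}
```

A smaller `per_page` gives larger panels at the cost of more figures.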
I want to create 100 samples from a normal distribution. For the first class, the mean is to be taken as (0,0) and covariance matrix as [(1,0),(0,1)]. For the second class, the mean is to be taken as (5,0) but the covariance matrix is the same as for the first class and finally would like to visualize all 200 instances in a single plot with different colors for each class.
My problem is: when I generate this plot, I am unsure whether it actually contains 200 samples.
My approach:
a1 <- c(1,0)
a2 <- c(0,1)
M <- cbind(a1, a2)
x <- cov(M)
dev <- sd(x, na.rm = FALSE)
C0 <- sample(rnorm(100, mean=0, sd=dev), size=100, replace=T)
C1 <- sample(rnorm(100, mean=5, sd=dev), size=100, replace=T)
plot(C0,C1, col=c("red","blue"), main = '200 samples, with mean 0 and 5 and S.D=0.5')
legend("topright", 95, legend=c("C0", "C1"),
col=c("red", "blue"), lty=1:2, cex=0.8)
I would like to know the corrections in my code.
Aside from the plotting issue mentioned in the other answer, it seems from your description like you want to sample from two 2D multivariate normal distributions with different means.
If so, you can simply use the mvtnorm library, which provides the multivariate normal distribution, to sample from these distributions.
library(mvtnorm)
C0 <- rmvnorm(100, c(0,0), M) # 100 samples, means (0, 0), covariance mtx M
C1 <- rmvnorm(100, c(5,0), M)
Right now, you take the covariance of the covariance matrix you have by typing x <- cov(M). This doesn't make much sense unless I'm misunderstanding what you're trying to accomplish.
EDIT: This is the full code for what I think you're trying to accomplish:
library(mvtnorm)
a1 <- c(1, 0)
a2 <- c(0, 1)
M <- cbind(a1, a2)
C0 <- rmvnorm(100, c(0, 0), M)
C1 <- rmvnorm(100, c(5, 0), M)
plot(C0, col = "red", xlim = c(-5, 10), ylim = c(-5, 5), xlab = "X", ylab = "Y")
points(C1, col = "blue")
legend("topright", inset = .05, c("Class 1", "Class 2"), fill = c("red", "blue"))
which outputs the plot
Your x and y axes demonstrate that you're plotting C1 against C0. That's why your y-axis has its midpoint at 5 and the x-axis has it at 0. What you've done is plot 100 points with their x-coordinate from C0 and y-coordinate from C1.
Short of counting them, proving that you have 100 points on the screen is difficult. I know of no way to access the data that R has used to display your plot. However, one trick is to call text(C0,C1,label=1:150) after your code. This draws the numbers 1 to 150 on your plot, each at the position of one of the points. If you had 200 points, this would be a tidy plot. However, since you have only 100, the coordinates are recycled and many points are labelled twice, making the plot unreadable.
If we make a new plot and use text(C0,C1,label=1:100) instead, things are much clearer, with each point labelled exactly once.
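A more direct check is to count the rows before plotting. Here is a minimal base-R sketch, assuming the identity covariance from the question, in which case the two coordinates are independent standard normals and `rnorm()` is enough; the names `n`, `class0`, `class1` and `all_points` are mine.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 100

# Identity covariance: draw each coordinate independently, shifting the mean
class0 <- cbind(x = rnorm(n, mean = 0), y = rnorm(n, mean = 0))
class1 <- cbind(x = rnorm(n, mean = 5), y = rnorm(n, mean = 0))

all_points <- rbind(class0, class1)
nrow(all_points)  # number of points that will be drawn

# One colour per class, all points in a single plot
plot(all_points, col = rep(c("red", "blue"), each = n),
     xlab = "X", ylab = "Y")
legend("topright", legend = c("Class 1", "Class 2"),
       col = c("red", "blue"), pch = 1)
```

For a covariance matrix other than the identity this shortcut no longer applies, and a multivariate sampler such as the mvtnorm approach above is the safer route.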
I am having difficulty producing a nice (gg)plot of actual vs. predicted values.
Here is some data:
# I use caret in order to split the data into train and test
library(caret)
data(economics) # economics ships with ggplot2, which caret loads
library(forecast)
# that is recommended for time series data
timeSlices <- createTimeSlices(1:nrow(economics),
initialWindow = 36, horizon = 10, fixedWindow = TRUE)
trainSlices <- timeSlices[[1]]
testSlices <- timeSlices[[2]]
# I'm not really sure about the periods
fit <- tbats(economics[trainSlices[[1]],]$unemploy, seasonal.periods=c(4,12), use.trend=TRUE, use.parallel=TRUE)
# Using forecast for prediction
pred <- forecast(fit,h=length(economics[testSlices[[1]],]$unemploy))
# Here I plot the forecast
plot(pred)
Here I am stuck: I don't know how to add the test data (testSlices) to that pred object with the corresponding test/train labels. Perhaps there is also a way to use a different style for the confidence interval.
Thank you!
There is definitely a cleaner way of doing it, but a quick way is this:
lines(x = as.numeric(rownames(economics[testSlices[[1]],])), economics[testSlices[[1]],]$unemploy, col = "red")
Update for comment:
Add labels:
legend(x = "topleft", legend = c("Predicted", "Test Data"), col = c("blue", "red"), lty = c(1, 1))
I'm having trouble plotting random intercepts from a clmm() model with 4 random effects in 31 countries.
I tried following this SO post: In R, plotting random effects from lmer (lme4 package) using qqmath or dotplot: how to make it look fancy? However, I cannot get the confidence intervals to show up. I've managed to use dotchart to plot the intercepts by country.
library(ggplot2)
library(ordinal)
# create data frame with intercepts and variances of all random effects
# the first column is the grouping factor, followed by 5 columns of intercepts;
# columns 7-11 are the variances.
randoms <- as.data.frame(ranef(nodual.logit, condVar = F))
var <- as.data.frame(condVar(nodual.logit))
df <- merge(randoms, var, by ="row.names")
# calculate the CI
df[,7:11] <- (1.96*(sqrt(df[,7:11])/sqrt(length(df[,1]))))
# dot plot of intercepts and CI.
p <- ggplot(df,aes(as.factor(Row.names),df[,2]))
p <- p + geom_hline(yintercept=0) +
geom_errorbar(aes(xmax=df[,2]+df[,7], xmin=df[,2]-df[,7]), width=0, color="black") +
geom_point(aes(size=2))
p <- p + coord_flip()
print(p)
Error: Discrete value supplied to continuous scale
Here is another way I tried to plot them:
D <- dotchart(df[,2], labels = df[,1])
D <- D + geom_errorbarh(aes(xmax=df[,2]+df[,7], xmin=df[,2]-df[,7],))
Error in dotchart(df[, 2], labels = df[, 1]) + geom_errorbarh(aes(xmax = df[, : non-numeric argument to binary operator
Found a solution based on R.H.B Christensen (2013) “A Tutorial on fitting Cumulative Link Mixed Models with clmm2 from the ordinal Package” pg. 5.
First plot the intercept points for all 31 countries, then add labels using axis(), then add the CIs using segments().
plot(1:31,df[,2], ylim=range(df[,2]), axes =F, ylab ="intercept")
abline(h = 0, lty=2)
axis(1, at=1:31, labels = df[,1], las =2)
axis(2, at= seq(-2,2, by=.5))
for(i in 1:31) segments(i, df[i,2]+df[i,7], i, df[i,2]-df[i, 7])
The same code can be put into another loop to plot the betas of the random effects:
for (n in 2:6) {
  # base graphics calls are separate statements; they are not chained with "+"
  plot(1:31, df[, n], ylim = range(df[, n]), axes = FALSE, ylab = colnames(df)[n])
  abline(h = 0, lty = 2)
  axis(1, at = 1:31, labels = df[, 1], las = 2)
  axis(2, at = seq(-2, 2, by = .5))
  for (i in 1:31) segments(i, df[i, n] + df[i, n + 5], i, df[i, n] - df[i, n + 5])
}
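To try the plotting approach without a fitted clmm model, here is a sketch on simulated numbers. Everything in it (`labels`, `est`, `half_width`, `lower`, `upper`) is made up for illustration and only stands in for the columns of `df`.

```r
set.seed(7)  # arbitrary seed
k <- 31
labels     <- paste0("C", seq_len(k))  # placeholder country names
est        <- rnorm(k, sd = 0.5)       # stand-in for the intercepts df[, 2]
half_width <- runif(k, 0.1, 0.4)       # stand-in for the CI half-widths df[, 7]

lower <- est - half_width
upper <- est + half_width

# Use the interval limits for the y range so no segment is cut off
plot(seq_len(k), est, ylim = range(lower, upper), axes = FALSE,
     xlab = "", ylab = "intercept", pch = 19)
abline(h = 0, lty = 2)
axis(1, at = seq_len(k), labels = labels, las = 2)
axis(2)
# segments() is vectorised, so no loop over countries is needed
segments(seq_len(k), lower, seq_len(k), upper)
```

Setting `ylim` from the interval limits rather than from the point estimates keeps the widest intervals fully visible.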