I am trying to make a correlation plot from a correlation matrix using the corrplot function.
But I am getting a squeezed and unreadable plot. Also, the plot is drawn at the extreme right end of the window. The usual ways of enlarging a ggplot plot do not work here.
library(corrplot)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(correlation_matrix, method="color", col=col(200),
         type="upper", order="hclust",
         addCoef.col = "black", # Add coefficient of correlation
         tl.col="black", tl.srt=45, # Text label color and rotation
         # hide correlation coefficient on the principal diagonal
         diag=FALSE
         )
Here is the plot generated
As somebody suggested above, you should either repair the names you have or change the parameters of your plot. I will use ggcorrplot instead because I find it easier to work with (and better looking), but the illustration will show the same problem. If I switch out the names for the airquality data to be hideous like so and plot it:
#### Libraries ####
library(tidyverse)
library(ggcorrplot)
#### Change Data Names ####
bad.names <- airquality %>%
rename(Approximate_Ozone_Measurement_in_Some_Measure = Ozone,
Solar_Radiation_Based_On_Sun_Movements = Solar.R,
Wind_Barometer_Ratings_And_Such = Wind,
Temperature_In_Fahrenheit_To_Nearest_Degree = Temp)
#### Run Correlation ####
bad.corr <- bad.names %>%
  cor(use = "pairwise.complete.obs") # correlation matrix; airquality contains NAs
#### Plot ####
ggcorrplot(bad.corr)
You get something like this:
There are two ways around this: either rename your variables, or rotate the names in some way so the angle makes them readable. With ridiculous names like these, it's much easier to simply fix them than to squeeze them in artificially:
#### Fix Names ####
good.names <- bad.names %>%
rename(Ozone= Approximate_Ozone_Measurement_in_Some_Measure,
Solar.R = Solar_Radiation_Based_On_Sun_Movements,
Wind = Wind_Barometer_Ratings_And_Such,
Temp = Temperature_In_Fahrenheit_To_Nearest_Degree)
#### Run Correlation ####
good.corr <- good.names %>%
  cor(use = "pairwise.complete.obs") # correlation matrix; airquality contains NAs
#### Replot ####
ggcorrplot(good.corr,
           lab = TRUE,
           type = "lower")
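If you would rather stay with corrplot, as in the question, a rough sketch of both options might look like the following. `tl.cex` and `number.cex` are corrplot's arguments for the label and coefficient text size. The values 0.6 and 0.7 and the object names `long.corr` and `short.corr` are only assumptions for illustration, so adjust them to your own matrix.

```r
# Correlation matrix with deliberately long names
# (airquality has NAs, so use pairwise complete observations)
long.corr <- cor(airquality[, 1:4], use = "pairwise.complete.obs")
long.names <- c("Approximate_Ozone_Measurement_in_Some_Measure",
                "Solar_Radiation_Based_On_Sun_Movements",
                "Wind_Barometer_Ratings_And_Such",
                "Temperature_In_Fahrenheit_To_Nearest_Degree")
dimnames(long.corr) <- list(long.names, long.names)

# Option 1: shorten the names programmatically with base R's abbreviate()
short.names <- unname(abbreviate(long.names, minlength = 8))
short.corr <- long.corr
dimnames(short.corr) <- list(short.names, short.names)

# Option 2: keep the long names but shrink the text in corrplot
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(long.corr, method = "color", type = "upper",
                     addCoef.col = "black", tl.col = "black", tl.srt = 45,
                     tl.cex = 0.6,     # size of the variable labels
                     number.cex = 0.7, # size of the printed coefficients
                     diag = FALSE)
}
```

Shortening the names usually helps more than shrinking the text, because very long labels still eat into the plotting area even at a small size.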
Data
I tried looking through the threads here for an answer, but none of them were specific to my issue as far as I could tell. Here is some simulated data for my query.
#### Load Library ####
library(tidyverse)
#### Set Random Seed ####
set.seed(123)
#### Create Data ####
character <- c("哎","不","嗎","都","發",
"你","肉","吧","地","有")
x <- round(rnorm(n=10,
mean=10))
y <- round(rnorm(n=10,
mean=10))
#### Store Into Tibble ####
tib <- tibble(
character,
x,
y
)
Issue
Now if I create this plot:
#### Plot ####
tib.plot <- tib %>%
ggplot(aes(x,y,
label=character))+
geom_point()+
geom_label()+
geom_smooth(se=F)+
labs(x="Some Variable",
y="Some Other Variable",
title = "Simulated Plot")+
theme_classic()
tib.plot
It looks like this in R's plot window:
However, when I try to save the plot with the Export function, it removes the Chinese characters due to some decoding/encoding issue:
How do I fix this?
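If you don't mind the text being converted to vector outlines, you can use pdf() together with showtext:
library(showtext)
pdf('plot.pdf', width=8, height=5)
showtext_begin()
print(tib.plot) # print explicitly so this also works inside scripts and functions
showtext_end()
dev.off()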
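If you need a raster image rather than a PDF, a variant of the same idea might look like this sketch. It assumes that `showtext_auto()` combined with `ggsave()` suits your setup, and it uses a small stand-in plot (`demo.plot`, a name I made up) instead of `tib.plot`. The `wqy-microhei` family is the CJK font that ships with showtext.

```r
library(ggplot2)
library(showtext)

showtext_auto()           # let showtext render text on newly opened devices
showtext_opts(dpi = 300)  # match the dpi passed to ggsave() so text is sized correctly

# Small stand-in for tib.plot
demo <- data.frame(x = 1:3, y = c(2, 1, 3), label = c("哎", "不", "嗎"))
demo.plot <- ggplot(demo, aes(x, y, label = label)) +
  geom_label(family = "wqy-microhei") +
  theme_classic()

ggsave("plot_showtext.png", demo.plot, width = 8, height = 5, dpi = 300)
```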
I am trying to make a correlation plot with a subset of variables versus a different subset.
Using the mtcars data I do the following using ggcorrplot:
data(mtcars)
corrtest <- psych::corr.test(mtcars[,1:7], adjust="none")
all_matrix <- corrtest$r
all_pmat <- corrtest$p
pheno_markers <- names(mtcars)[1:4]
serol_markers <- names(mtcars)[5:7]
sub_matrix <- all_matrix[pheno_markers, serol_markers]
sub_pmat <- all_pmat[pheno_markers, serol_markers]
grDevices::pdf(file="heat_duo.pdf", height=4, width=4)
print(
ggcorrplot::ggcorrplot(sub_matrix, p.mat=sub_pmat, method="circle")
)
grDevices::dev.off()
This produces the following plot, which is good:
Now I want to reproduce the same plot with ggcorrplot2 instead, because it lets me overlay the significance of each comparison as ***. I usually use this package with no problem, but I cannot get this case right; it seems it can only deal with symmetrical matrices where colnames == rownames...
I tried the following:
grDevices::pdf(file="heat_duo2.pdf", height=4, width=4)
print(
ggcorrplot2::ggcorrplot(sub_matrix, p.mat=sub_pmat, method="circle",
insig = "label_sig", sig.lvl = c(0.05, 0.01, 0.001))
)
grDevices::dev.off()
But the result is obviously wrong:
Any idea on how to deal with a case like this in ggcorrplot2 (ggcorrplot makes it so easy)?
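I can't say whether ggcorrplot2 handles rectangular matrices, but one possible workaround is to draw the rectangular plot directly with ggplot2 and add the stars yourself. The sketch below is only that: it uses base `cor()` and `cor.test()` instead of `psych::corr.test()`, and the data frame `pairs` and its column names are my own assumptions.

```r
library(ggplot2)

pheno_markers <- names(mtcars)[1:4]
serol_markers <- names(mtcars)[5:7]

# One row per (phenotype, serology) pair with r and the unadjusted p-value
pairs <- expand.grid(pheno = pheno_markers, serol = serol_markers,
                     stringsAsFactors = FALSE)
pairs$r <- mapply(function(a, b) cor(mtcars[[a]], mtcars[[b]]),
                  pairs$pheno, pairs$serol, USE.NAMES = FALSE)
pairs$p <- mapply(function(a, b) cor.test(mtcars[[a]], mtcars[[b]])$p.value,
                  pairs$pheno, pairs$serol, USE.NAMES = FALSE)

# Convert p-values to significance stars (cutoffs 0.001, 0.01, 0.05)
pairs$stars <- as.character(cut(pairs$p,
                                breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                                labels = c("***", "**", "*", "")))

# Circles sized by |r| and filled by r, with the stars printed on top
p <- ggplot(pairs, aes(serol, pheno)) +
  geom_point(aes(size = abs(r), fill = r), shape = 21) +
  geom_text(aes(label = stars), vjust = 0.75) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       limits = c(-1, 1)) +
  scale_size(range = c(3, 12), guide = "none") +
  theme_minimal() +
  labs(x = NULL, y = NULL)
p
```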
I'm new to R and working on an assignment where I am supposed to replicate the results from a linear regression on time series data (1360 observations and 52 variables, 11 of which are in the regression model). In the original study the researchers identified outliers with the Hadi method. It seems that this is best done in R with the mvBACON function; is this correct? I cannot find a good explanation of how to use it, though. Could anyone please tell me how I can use this function to find the outliers?
(I would very much appreciate an answer that is explained as simply as possible since R is very new to me).
Thank you very much!
Yes, mvBACON identifies outliers based on a distance; the default is the Mahalanobis distance.
The following code walks you through a simple example of identifying outliers with mvBACON, using a subset of the mtcars dataset:
# load packages
library(dplyr)
library(magrittr)
# Use mtcars (sub)dataset and plot it
data <- mtcars %>% select(mpg, disp)
plot(data, main = "mtcars")
# Add some outliers and plot again
data <- rbind(data,
data.frame(mpg = c(1, 80), disp = c(800, 1000)))
plot(data, main = "mtcars")
# Use mvBACON to calculate the distances and get the outliers
# install.packages("robustX") # uncomment line to install package
library(robustX)
# compute distance - default is Mahalanobis
distances <- mvBACON(data)
# Plot it again...
plot(data, main = "mtcars")
# ...with highlighting the outliers
points(data[!distances$subset, ], col = "red", pch = 19)
# Some fine tuning, since many of the outliers seem to be still good for regression
distances <- mvBACON(data, alpha = 0.6)
# update plot
plot(data, main = "mtcars")
points(data[!distances$subset, ], col = "red", pch = 19)
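To get a feel for what a distance-based outlier rule does, here is a minimal base-R sketch of the classical (non-robust) Mahalanobis version. This is not what mvBACON computes internally: BACON builds its estimate iteratively from a clean subset, so treat this only as an illustration. The 0.975 chi-squared quantile is a cutoff I picked as an assumption.

```r
# Same toy data as above: two mtcars columns plus two planted outliers
data <- rbind(mtcars[, c("mpg", "disp")],
              data.frame(mpg = c(1, 80), disp = c(800, 1000)))

# Squared Mahalanobis distance of every row from the sample mean
d2 <- mahalanobis(data, center = colMeans(data), cov = cov(data))

# Under approximate normality d2 is roughly chi-squared with p degrees of freedom
cutoff <- qchisq(0.975, df = ncol(data))
flagged <- d2 > cutoff

# Highlight the flagged rows
plot(data, main = "Classical Mahalanobis distance")
points(data[flagged, ], col = "red", pch = 19)
```

Because the mean and covariance here are computed from all rows, including the outliers themselves, this classical version can miss outliers that a robust method such as mvBACON would catch.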
Okay so I'm sure this has been asked before but I can't find a nice answer anywhere after many hours of searching.
I have some data, I run a classification then I make a dendrogram.
The problem has to do with aesthetics, specifically: (1) how to cut according to the number of groups (in this example I want 3), (2) how to align the group labels with the branches of the tree, and (3) how to re-scale so that there aren't any huge gaps between the groups.
More on (3): I have a dataset which is very species-rich, and there would be ~1000 groups without cutting. If I cut at, say, 3, the tree has some branches on the right and one 'miles' off to the right, which I would want to re-scale so that it's closer. All of this is possible via external programs, but I want to do it all in R!
Bonus points if you can put an average silhouette width plot nested into the top right of this plot
Here is an example using the iris data:
library(ggplot2)
library(vegan) # provides vegdist()
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
plot(cut(hcd_ward10, h = 10)$upper, main = "Upper tree of cut at h=10")
I suspect what you want to look at is the dendextend R package (it also has a paper in the journal Bioinformatics).
I am not fully sure about your question (3), since I am not sure I understand what rescaling means here. What I can tell you is that you can do quite a lot with dendextend. Here is a quick example of coloring the branches and labels for 3 groups.
library(ggplot2)
library(vegan)
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
# install.packages("dendextend") # uncomment to install the package
library(dendextend)
dend <- hcd_ward10
dend <- color_branches(dend, k = 3)
dend <- color_labels(dend, k = 3)
plot(dend)
You can also get an interactive dendrogram by using plotly (ggplot method is available through dendextend):
library(plotly)
library(ggplot2)
p <- ggplot(dend)
ggplotly(p)
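For point (1) of the question, cutting by a number of groups rather than by a height, a minimal base-R sketch could look like this. It uses `dist()` instead of `vegdist()` (both give Euclidean distances here), which is my substitution to keep the example free of extra packages.

```r
df <- iris[, 1:4]

# Same clustering as above, using base R only
hc <- hclust(dist(df, method = "euclidean"), method = "ward.D2")

# Cut the tree into exactly 3 groups and see how many observations fall in each
groups <- cutree(hc, k = 3)
print(table(groups))

# Draw the tree and box the 3 groups
plot(hc, labels = FALSE, hang = -1, main = "Ward.D2 clustering, cut at k = 3")
rect.hclust(hc, k = 3, border = 2:4)
```

`cutree()` returns one group number per observation, so the result can be added as a column to the original data for any follow-up summary.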
I did a linear discriminant analysis using the function lda() from the package MASS. Now I would like to draw a biplot like the one in the ade4 package (forLDA). Do you know how I can do this?
If I try to use the biplot() function, it doesn't work. For example, if I use the iris data and fit an LDA:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
then I can plot it using the function plot(), but if I use the function biplot() it doesn't work:
biplot(dis2)
Error in nrow(y) : argument "y" is missing, with no default
How can I plot the arrows of variables?
I wrote the following function to do this:
lda.arrows <- function(x, myscale = 1, tex = 0.75, choices = c(1,2), ...){
## adds `biplot` arrows to an lda using the discriminant function values
heads <- coef(x)
arrows(x0 = 0, y0 = 0,
x1 = myscale * heads[,choices[1]],
y1 = myscale * heads[,choices[2]], ...)
text(myscale * heads[,choices], labels = row.names(heads),
cex = tex)
}
For your example:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
plot(dis2, asp = 1)
lda.arrows(dis2, col = 2, myscale = 2)
The length of the arrows is arbitrary relative to the lda plot (but not to each other, of course!). If you want longer or shorter arrows, change the value of myscale accordingly. By default, this plots arrows for the first and second axes. If you want to plot other axes, change choices to reflect this.
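If it helps to see where the arrow heads come from, the sketch below prints the coefficients that lda.arrows() uses and checks how they relate to the scores that predict() returns. The names `X`, `center` and `scores` are mine, and the manual projection is my reading of how MASS computes the scores, so treat it as an illustration rather than a reference.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
dis2 <- lda(X, iris$Species)

# coef() on an lda fit returns the scaling matrix: one row per variable,
# one column per discriminant axis (LD1, LD2)
heads <- coef(dis2)
print(heads)

# The discriminant scores are the centered data projected onto these coefficients
center <- colSums(dis2$prior * dis2$means)
scores <- scale(X, center = center, scale = FALSE) %*% heads

# Compare with the scores returned by predict()
print(all.equal(unname(scores), unname(predict(dis2, X)$x),
                check.attributes = FALSE))
```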
My understanding is that biplots of linear discriminant analyses can be done. They are implemented in the R package ggbiplot (see https://github.com/vqv/ggbiplot/tree/experimental) and in the package ggord (see https://github.com/fawda123/ggord). For your example:
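# install.packages("devtools") # uncomment to install devtools
library(devtools)
# install_github("fawda123/ggord") # uncomment to install ggord from GitHub
library(MASS) # provides lda()
library(ggord)
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)
ggord(ord, iris$Species)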
Also the book "Biplots in practice" by M. Greenacre has one chapter (chapter 11) on it and in Figure 11.5 it shows a biplot of a linear discriminant analysis of the iris dataset:
You can achieve this using the ggord package from GitHub. The dataset used is the iris dataset.
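library(MASS) # provides lda()
# Assumption: IR is the iris data (the original snippet does not define it)
IR <- iris
# --- data partition -- #
set.seed(555)
IRSam <- sample.int(n = nrow(IR), size = floor(.60*nrow(IR)), replace = FALSE, prob = NULL)
IRTrain <- IR[IRSam,]
IRTest <- IR[-IRSam,]
# --- Model fit --- #
# Assumption: IR.lda is an LDA fitted on the training set (not shown in the original snippet)
IR.lda <- lda(Species ~ ., data = IRTrain)
# --- Prediction --- #
p <- predict(IR.lda, IRTrain)
# --- plotting a biplot --- #
library(devtools)
# install_github('fawda123/ggord') # uncomment to install ggord from GitHub (requires devtools)
library(ggord)
ggord(IR.lda, IRTrain$Species, ylim=c(-5,5), xlim=c(-10,10))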