Using ggplot2 and viridis, fill histogram based on other variable - r

I am trying to create the top left graph in this figure in ggplot, using viridis to make the colour gradient.
Here is my sample data:
# simulate t-values
data = data.frame(sim =1:10000,
t_0= rt(n = 10000,df =12, ncp=0),
t_1 = rt(n = 10000,df =12, ncp=1.2))
# compute p-values
data = data %>%
# two-sided p-values; lower.tail must be a single logical, so use abs() instead of ifelse()
mutate(p_0 = 2 * pt(abs(t_0), df = 12, lower.tail = FALSE),
p_1 = 2 * pt(abs(t_1), df = 12, lower.tail = FALSE))
# convert from wide to long
data.long = data %>%
gather(condition,measurement, t_0:p_1) %>%
separate(col=condition, into=c("para","hyp"), sep = "_")
# convert to wide repeated measures format
data.wide = data.long %>% spread(key = para, measurement)
To create the graphs on the left, I need to colour the histogram according to the corresponding values in the right graphs. If t = 0 (corresponding to a p close to 1), the fill should be yellow; if |t| > 4 (corresponding to a p close to 0), the fill should be dark blue. This post shows how to create a similar graph using scale_fill_gradientn, which unfortunately does not work with the discrete values I have created using cut().
This is the closest I have come. However, I want the graph to be yellow at x = 0, blending to dark blue at the edges.
# create bins based on t-values
t0bins <- seq(-12, 12, by = 1)
# compute corresponding p-values
pt0bins <- 2*pt(t0bins, df = 12, lower.tail = FALSE)
ggplot(data.wide, aes(x=t, fill=cut(..x.., breaks=get("t0bins", envir=.GlobalEnv)))) +
geom_histogram(binwidth=0.1)+
scale_fill_viridis(discrete=T)
which gives:

You can try
library(tidyverse)
library(viridis)
data.wide %>%
mutate(bins=cut(t, breaks=t0bins)) %>%
{ggplot(.,aes(x=t, fill=bins)) +
geom_histogram(binwidth=0.1)+
scale_x_continuous(limits =c(-12,12)) +
scale_fill_manual(drop=FALSE,values = c(viridis(nlevels(.$bins)/2), viridis(nlevels(.$bins)/2, direction = -1)))}
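As an alternative to discrete bins, the fill can be computed directly from each bar's position. The following is only a sketch, not the method used above: it assumes ggplot2 3.3.0 or later (for after_stat()) and uses a hypothetical data frame `sim` in place of data.wide. Each bar is filled by the two-sided p-value of its bin centre, so bars near t = 0 come out yellow and the tails dark.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the t-values in data.wide
sim <- data.frame(t = rt(10000, df = 12))

p <- ggplot(sim, aes(x = t)) +
  # after_stat(x) is the bin centre computed by stat_bin();
  # the fill is the two-sided p-value of that centre
  geom_histogram(
    aes(fill = after_stat(2 * pt(abs(x), df = 12, lower.tail = FALSE))),
    binwidth = 0.1
  ) +
  # continuous viridis scale: low p-values dark, high p-values yellow
  scale_fill_viridis_c(name = "p-value", limits = c(0, 1))

p
```

Because the scale is continuous, no cut() step or manual palette is needed, and the legend reads directly in p-values.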

Related

How do I show multiple plots in the same graph and demarcate specific values through color? I have taken a specific e.g. to illustrate my Q

I need to plot 100 different confidence interval values of a bootstrap distribution by iterating the process 100 times and plotting the confidence interval lines. Then I need to plot 100 line segments corresponding to the confidence intervals and demarcate those lines which fall outside a given value (boldened by drawing a vertical line corresponding to that value) from those that include that given value by coloring.
As of now, I could just create the bootstrap for a given repetition only once, I don't know how to repeat the process 100 times and then plot those 100 values. Please help!
Code that I have written to get the bootstrap distribution once and get the corresponding confidence intervals:
bootstrap1 <- bowl_sample_1 %>%
specify(response = color, success = "red") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop")
percentile1 <- bootstrap1 %>%
get_confidence_interval(level = .95, type = "percentile")
percentile1
This seems like it should be rather simple. For ggplot make a data.frame with start, end, id and colour indicating whether the value is within or outside the interval. Do the same for a base plot.
Simple reproducible example:
set.seed(1000)
start <- runif(100, 0.3, 1)
end <- runif(100, 0, 0.7)
br <- 0.6
data <- data.frame(id = 1:100,
start = start,
end = end,
within = ifelse(br < pmax(start, end) & br > pmin(start, end), 'gray' , 'black'))
library(ggplot2)
ggplot(data, aes(y = id, yend = id, x = start, xend = end, col = within)) +
geom_segment() +
scale_colour_manual(values = c("black" = 'black', "gray" = 'gray'))
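The question also asks how to repeat the bootstrap 100 times. A minimal base-R sketch follows. It does not use the infer pipeline from the question, and the population proportion `p_true`, the sample size `n` and the helper `boot_ci()` are all assumptions made up for illustration. Each repetition draws a fresh sample, computes a percentile bootstrap interval, and records whether the interval covers the reference value.

```r
set.seed(1000)
p_true <- 0.375  # assumed population proportion of red balls (hypothetical)
n <- 50          # assumed sample size (hypothetical)

# Percentile bootstrap CI for the proportion of 1s in x (hypothetical helper)
boot_ci <- function(x, reps = 1000, level = 0.95) {
  props <- replicate(reps, mean(sample(x, replace = TRUE)))
  alpha <- (1 - level) / 2
  quantile(props, c(alpha, 1 - alpha), names = FALSE)
}

# 100 repetitions: new sample each time, one interval per sample
cis <- t(replicate(100, boot_ci(rbinom(n, size = 1, prob = p_true))))

ci_df <- data.frame(id = 1:100, start = cis[, 1], end = cis[, 2])
# TRUE if the interval contains the reference value
ci_df$covers <- ci_df$start <= p_true & p_true <= ci_df$end

head(ci_df)
```

The resulting data frame has the same start/end/id layout as the example above, so it can be passed to geom_segment() with the colour mapped to `covers`, plus geom_vline(xintercept = p_true) for the reference line.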

How to specify groups with colors in qqplot()?

I have created a qqplot (with quantiles of beta distribution) from a dataset including two groups. To visualize, which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
the result is shown here:
I have seen solutions specifying a "col" vector for qqnorm; however, this does not seem to work with qqplot, as simply half the points are colored in either color, regardless of group. Is there a way to fix this?
I simulated some data just to show how to add color in ggplot
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
tibble(
simulated_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2])
) %>%
#Creating a group variable using quartiles
mutate(group = cut(x = simulated_data,
quantile(simulated_data,seq(0,1,.25)),
include.lowest = T))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()
Output
For those wondering how to work with pre-defined groups, this is the code that worked for me:
library(tidyverse)
library(Rfast)
res <- beta.mle(x)
# make sure groups are not numerical
# (else the color scale might turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
tibble(
my_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2]),
group = g[order(x)]
)
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
result
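The reason col = data$group fails in qqplot() is that qqplot() sorts both inputs internally, so a colour vector given in the original row order no longer lines up with the plotted points. A base-R sketch of the same fix as above is shown here: reorder the groups with order() and plot the sorted values against the theoretical quantiles. The data and the shape parameters (2 and 5) are simulated assumptions, not the original dataset or an estimate from beta.mle.

```r
set.seed(1)
# Simulated stand-ins for data$values and data$group (hypothetical)
values <- rbeta(200, shape1 = 2, shape2 = 5)
group  <- factor(sample(c("Group1", "Group2"), 200, replace = TRUE))

ord  <- order(values)  # permutation that sorts the data
theo <- qbeta(ppoints(length(values)), shape1 = 2, shape2 = 5)

# Sorted data against theoretical quantiles; colours reordered the same way
plot(theo, values[ord], col = group[ord], pch = 19,
     xlab = "Quantiles of Beta Distribution",
     ylab = "Quantiles of data")
legend("topleft", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```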

Boxplots aren't colouring or plotting labels properly in R, why?

My Tukey test significant results LABELS and the colours plotted as box plots do not plot over each sample box plot. Why?
Seems like the labels are plotted at different y-axis along the same s1 (x-axis)?
Reproducible dataset here:
library(multcompView)
df <- data.frame('Sample'=c("s1","s1","s1","s1","s1","s2","s2","s2","s2","s2","s3","s3","s3","s3","s4","s4","s5","s5"), 'value'=c(-0.1098,-0.1435,-0.1046,-0.1308,-0.1523,-0.1219,-0.1114,-0.1328,-0.1589,-0.1567,-0.1395,-0.1181,-0.1448,-0.124,-0.1929,-0.1996,-0.1981,-0.1917))
anova_df <- aov(df$value ~ df$Sample )
tukey_df <- TukeyHSD(anova_df, 'df$Sample', conf.level=0.95)
# I need to group the treatments that are not different each other together.
TUKEY <- tukey_df
generate_label_df <- function(TUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- TUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$Sample=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$Sample) , ]
return(Tukey.labels)
}
# Apply the function on my dataset
LABELS <- generate_label_df(TUKEY , "df$Sample")
# A panel of colors to draw each group with the same color :
my_colors <- c(
rgb(143,199,74,maxColorValue = 255),
rgb(242,104,34,maxColorValue = 255),
rgb(111,145,202,maxColorValue = 255))
# Draw the basic boxplot
a <- boxplot(df$value ~ df$Sample , ylim=c(min(df$value) , 1.1*max(df$value)) , col=my_colors[as.numeric(LABELS[,1])] , ylab="Value" , main="")
# I want to write the letter over each box. Over is how high I want to write it.
over <- 0.1*max(a$stats[nrow(a$stats),] )
#Add the labels
text(c(1:nlevels(df$Sample)), a$stats[nrow(a$stats),]+over, LABELS[,1] , col=my_colors[as.numeric(LABELS[,1])] )
Current output:
Desired plot-like (colours and LABELS):
First, LABELS$Letters is a character vector. You can get as.numeric(LABELS[,1]) to work if you make it a factor first.
Second, your y-limit needs some work for negative values. There is a function you might find useful called extendrange which is used in many a plotting function.
This line, c(1:nlevels(df$Sample)), would also work if df$Sample were a factor, which it is not.
Also, if you are plotting text at a specific location, you can adjust the text using either text(..., pos = ) or text(..., adj = ) to shift the position.
LABELS$Letters <- factor(LABELS$Letters)
a <- boxplot(df$value ~ df$Sample , ylim = extendrange(df$value), col=my_colors[as.numeric(LABELS[,1])] , ylab="Value" , main="")
text(seq_along(a$names), apply(a$stats, 2, max), LABELS[,1], col=my_colors[as.numeric(LABELS[,1])], pos = 3)
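The two points about factors and negative y-limits can be checked in isolation. This is a small sketch with made-up vectors (`letters_chr` and `v` are hypothetical, not taken from the dataset):

```r
# Character labels cannot be converted to numbers directly
letters_chr <- c("a", "a", "ab", "b", "b")
na_idx <- suppressWarnings(as.numeric(letters_chr))  # all NA

# Converting to a factor first gives integer codes usable as colour indices
idx <- as.numeric(factor(letters_chr))               # 1 1 2 3 3

# With negative values, 1.1 * max() moves the upper limit the wrong way
v <- c(-0.20, -0.10)
bad_upper <- 1.1 * max(v)  # -0.11, which is below the maximum of -0.10

# extendrange() widens the range by 5% on both sides by default
lim <- extendrange(v)      # -0.205 -0.095
```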
If you don't mind changing your workflow and using the tidyverse, this is how you could achieve your goal:
# join df and LABELS into one data table
inner_join(df, LABELS, by = "Sample") %>%
# calculate max value for each Sample group (it will be used to place the labels)
group_by(Sample) %>%
mutate(placement = max(value)) %>%
ungroup() %>%
# make a plot
ggplot(aes(Sample, value, fill = Letters))+
geom_boxplot()+
geom_text(aes(y = placement, label = Letters, col = Letters), nudge_y = 0.01, size = 6)+
theme_minimal()+
theme(legend.position = "none")

pie charts in R where slices represent the frequency of values in the columns of the data set

I want to make pie charts for each column of my dataframe, where the slices represent the frequency, in which the values in the columns appear. For instance, the following will produce a data frame with 3 columns, and will round the numbers down to single digits.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
So I would need to have 3 pie charts, where the slices represent the number of times, in which a given value appears in the respective column (with the name of the column on top of the pie chart, if possible)
So far, I have tried pie(frame(testdata$test1)) but it works for creating a single pie chart, and my real data has 25 columns. On top of that, trying to pass a "main=" argument to name it, results in error.
Thank you in advance.
ggplot2 is the go-to library to make nice plots. To have 3 different pie-plots one needs to adjust the data a bit, which is done with some tidyverse-functions.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
library(ggplot2)
library(tidyverse)
plotdata <- testdata %>%
as_tibble() %>%
pivot_longer(names(.),names_to = "data1", values_to = "value") %>%
group_by(data1) %>%
count(value)
ggplot(plotdata, aes( x = "", y = n, fill = factor(value))) +
geom_col(width = 1, show.legend = TRUE) +
coord_polar("y", start = 0) +
facet_wrap(~data1)
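If base graphics are preferred, a similar result can be sketched with pie() and table() in a loop, which also puts the column name on top of each chart through main. This is only an illustrative sketch that assumes the columns are named as in the example; for 25 columns the mfrow layout would need adjusting.

```r
set.seed(42)
# Same structure as the example data, with named columns
testdata <- round(cbind(
  test1 = rnorm(200, mean = 20, sd = 2),
  test2 = rnorm(200, mean = 20, sd = 1),
  test3 = rnorm(200, mean = 20, sd = 3)
))

# Frequency of each rounded value, one table per column
counts <- lapply(colnames(testdata), function(nm) table(testdata[, nm]))
names(counts) <- colnames(testdata)

# One pie chart per column, side by side, titled with the column name
op <- par(mfrow = c(1, ncol(testdata)))
for (nm in names(counts)) {
  pie(counts[[nm]], main = nm)
}
par(op)  # restore the previous graphics settings
```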

R: PCA ggplot Error "arguments imply differing number of rows"

I have a dataset:
https://docs.google.com/spreadsheets/d/1ZgyRQ2uTw-MjjkJgWCIiZ1vpnxKmF3o15a5awndttgo/edit?usp=sharing
to which I'm trying to apply PCA and produce a graph like the one in this post:
https://stats.stackexchange.com/questions/61215/how-to-interpret-this-pca-biplot-coming-from-a-survey-of-what-areas-people-are-i
However, an error doesn't seem to go away:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names =
TRUE, :
arguments imply differing number of rows: 0, 1006
Below is my code; I have trouble finding the source of the error and would appreciate some help. Any hints?
The goal is to produce a PCA graph grouped by levels of Happiness.in.life. I modified the original code to fit my dataset. Originally, the group was determined by Genders, which has 2 levels. What I'm attempting to do is build a graph based on the 5 levels of Happiness.in.life. However, it doesn't seem I can use the old code...
Thanks!
library(magrittr)
library(dplyr)
library(tidyr)
df <- happiness_reduced %>% dplyr::select(Happiness.in.life:Internet.usage, Happiness.in.life)
head(df)
vars_on_hap <- df %>% dplyr::select(-Happiness.in.life)
head(vars_on_hap)
group<-df$Happiness.in.life
fit <- prcomp(vars_on_hap)
pcData <- data.frame(fit$x)
vPCs <- fit$rotation[, c("PC1", "PC2")] %>% as.data.frame()
multiple <- min(
(max(pcData[,"PC1"]) - min(pcData[,"PC1"]))/(max(vPCs[,"PC1"])-
min(vPCs[,"PC1"])),
(max(pcData[,"PC2"]) - min(pcData[,"PC2"]))/(max(vPCs[,"PC2"])-
min(vPCs[,"PC2"]))
)
ggplot(pcData, aes(x=PC1, y=PC2)) +
geom_point(aes(colour=groups)) +
coord_equal() +
geom_text(data=vPCs,
aes(x = fit$rotation[, "PC1"]*multiple*0.82,
y = fit$rotation[,"PC2"]*multiple*0.82,
label=rownames(fit$rotation)),
size = 2, vjust=1, color="black") +
geom_segment(data=vPCs,
aes(x = 0,
y = 0,
xend = fit$rotation[,"PC1"]*multiple*0.8,
yend = fit$rotation[,"PC2"]*multiple*0.8),
arrow = arrow(length = unit(.2, 'cm')),
color = "grey30")
Here is an approach on how to plot the result of PCA in ggplot2:
library(tidyverse)
library(ggrepel)
Scaling the variables prior to PCA is usually a good idea (though not always, for instance when they are all in the same units):
hapiness %>% #this is the data from google drive. In the future try not to post such links on SO because they tend to be unusable after some time has passed
select(-Happiness.in.life) %>%
prcomp(center = TRUE, scale. = TRUE) -> fit
Now we can proceed to plotting the fit:
fit$x %>% #coordinates of the points are in x element
as.data.frame()%>% #convert matrix to data frame
select(PC1, PC2) %>% #select the first two PC
bind_cols(hapiness = as.factor(hapiness$Happiness.in.life)) %>% #add the coloring variable
ggplot() +
geom_point(aes(x = PC1, y = PC2, colour = hapiness)) + #plot points and color
geom_segment(data = fit$rotation %>% #data we want plotted by geom_segment is in rotation element
as.data.frame()%>%
select(PC1, PC2) %>%
rownames_to_column(), #get to row names so you can label after
aes(x = 0, y = 0, xend = PC1 * 7, yend = PC2* 7, group = rowname), #I scaled the rotation by 7 so it fits in the plot nicely
arrow = arrow(angle = 20, type = "closed", ends = "last",length = unit(0.2,"cm")),
color = "grey30") +
geom_text_repel(data = fit$rotation %>%
as.data.frame()%>%
select(PC1, PC2) %>%
rownames_to_column(),
aes(x = PC1*7,
y = PC2*7,
label = rowname)) +
coord_equal(ratio = fit$sdev[2]^2 / fit$sdev[1]^2) + #I like setting the ratio to the ratio of eigen values
xlab(paste("PC1", round(fit$sdev[1]^2/ sum(fit$sdev^2) *100, 2), "%")) +
ylab(paste("PC2", round(fit$sdev[2]^2/ sum(fit$sdev^2) *100, 2), "%")) +
theme_bw()
Look at all the happy people on the left. They are hard to notice with the default colors, so I suggest using the jco palette from the ggpubr library: get_palette('jco', 5), i.e. scale_color_manual(values = get_palette('jco', 5))
Quite a similar plot can be achieved with the ggord library:
library(ggord)
ggord(fit, grp_in = as.factor(hapiness$Happiness.in.life),
size = 1, ellipse = F, ext = 1.2, vec_ext = 5)
The major difference is that ggord uses equal scaling for the axes. Also, I scaled the rotation by 5 instead of 7 as in the first plot.
As you can see, I prefer to avoid intermediate data frames.
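The percentages in the axis labels above come from the squared standard deviations of the components. A short sketch of that calculation follows, using the built-in USArrests dataset as a stand-in because the original spreadsheet is not available here:

```r
# PCA on a built-in dataset (stand-in for the happiness data)
fit <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared sdev; their shares give the variance explained
var_explained <- 100 * fit$sdev^2 / sum(fit$sdev^2)
round(var_explained, 2)

# Scores (one row per observation) and loadings (one row per variable)
scores   <- as.data.frame(fit$x[, c("PC1", "PC2")])
loadings <- as.data.frame(fit$rotation[, c("PC1", "PC2")])
loadings$variable <- rownames(loadings)
```

`scores` has as many rows as the input data, so a grouping column of the same length can be bound to it before plotting, while `loadings` has one row per variable and feeds the arrows and labels.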

Resources