Order heatmap according to one specific row [duplicate] - r

This question already has answers here:
Order data inside a geom_tile
(2 answers)
Closed 4 years ago.
Consider the following dataset:
trends <- c('Virtual Assistant', 'Citizen DS', 'Deep Learning', 'Speech Recognition',
'Handwritten Recognition', 'Machine Translation', 'Chatbots',
'NLP')
impact <- sample(5,8, replace = TRUE)
maturity <- sample(5,8, replace = TRUE)
strategy <- sample(5,8, replace = TRUE)
h <- sample(5,8, replace = TRUE)
df <- data.frame(trends, impact, maturity, strategy, h)
rownames(df) <- df$trends
I am trying to generate a heatmap. So far, so good; that part is relatively easy. For example, I can use
library(tidyverse)
dftemp = df[,c("impact", "maturity", "strategy", "h")]
dt2 <- dftemp %>%
rownames_to_column() %>%
gather(colname, value, -rowname)
and then
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
I know the labels on the x-axis are horizontal, but I know how to fix that. What I would like is to order the x-axis by the values of one specific row. For example, I would like the heatmap's x-axis sorted by the values of the "impact" row, in ascending order. Can anyone point me in the right direction?
Should I convert x into a factor and change its levels?

Yes, you can convert it into a factor and specify the levels. To order by the impact row, we can do
dt2$rowname <- factor(dt2$rowname, levels = df$trends[order(df$impact)])
library(ggplot2)
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
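To see why this works, here is a minimal base-R sketch of the data step only. It assumes fixed, made-up scores in place of sample() so the result is reproducible, keeps just two measures, and builds the long table by hand instead of with gather(). The point is that ggplot draws a discrete x-axis in the order of the factor levels, not in the order of the rows.

```r
trends <- c("Virtual Assistant", "Citizen DS", "Deep Learning", "Speech Recognition",
            "Handwritten Recognition", "Machine Translation", "Chatbots", "NLP")
# Made-up, fixed scores (assumption) so the ordering below is reproducible
impact   <- c(3, 5, 1, 4, 2, 5, 1, 3)
maturity <- c(2, 2, 4, 1, 5, 3, 3, 4)
df <- data.frame(trends, impact, maturity, stringsAsFactors = FALSE)

# Long format: one row per (trend, measure) pair, like gather() produces
dt2 <- data.frame(
  rowname = rep(df$trends, times = 2),
  colname = rep(c("impact", "maturity"), each = nrow(df)),
  value   = c(df$impact, df$maturity),
  stringsAsFactors = FALSE
)

# order() returns the row indices that sort impact ascending (ties keep their
# original order); using the trends in that order as levels fixes the x-axis order
dt2$rowname <- factor(dt2$rowname, levels = df$trends[order(df$impact)])
print(levels(dt2$rowname))
```

For descending order, order(df$impact, decreasing = TRUE) can be used instead; the ggplot call itself does not need to change.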

Related

Adding legend to ggplot curves plotted on the same axis [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 4 months ago.
I have a graph that I'm trying to add a legend to but I can't find any answers.
Here's what the graph looks like
I made a data frame containing my x-axis values as a column and several other columns containing y values, which I graphed against x (fixed) to get these curves. I want a legend to appear on the side showing column 1, ..., column 11, with each entry matching the color of its curve.
How do I do this? I feel like I'm missing something obvious.
Here's what my code looks like (sorry for the picture; I keep getting errors that my code is not formatted correctly even though I'm using the code button):
interval is just 2:100, and aaaa etc. are vectors of the same length as interval.
As Peter says, you will need to convert your data into "long" format. Here is an example using reshape2::melt:
library(reshape2)
library(ggplot2)
n <- 20
df <- data.frame(x = seq(n))
tmp <- as.data.frame(do.call("cbind", lapply(seq(5), FUN = function(x){rnorm(n)})))
names(tmp) <- paste0("aaaa", letters[1:5])
df <- cbind(df, tmp)
head(df)
df2 <- melt(df, id.vars = "x")
head(df2)
ggplot(data = df2) + aes(x = x, y = value, color = variable) +
geom_point() +
geom_line()
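The same long-format idea can be sketched in base R with stack(), assuming the wide data has one x column and every other column is a curve (the two toy columns below are made up). The ind column that stack() creates records the original column name of each value; mapping that column to colour is what makes ggplot draw a legend.

```r
# Toy wide data (assumption): x plus one column per curve
wide <- data.frame(x = 1:5,
                   aaaaa = c(1, 2, 3, 4, 5),
                   aaaab = c(2, 4, 6, 8, 10))
value_cols <- setdiff(names(wide), "x")

# stack() puts the curve columns under each other: 'values' holds the y values
# and 'ind' records which column each value came from
long <- data.frame(x = rep(wide$x, times = length(value_cols)),
                   stack(wide[value_cols]))
names(long) <- c("x", "value", "variable")
print(head(long))
```

long can then go into ggplot(long, aes(x = x, y = value, colour = variable)) + geom_line(), which draws one coloured curve per original column with a matching legend.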

Plot only certain values in entire dataframe

It's quite a basic question, I think, but I can't figure out how to do it in a few elegant steps. I have this dataset:
df <- data.frame(A=c(1,2,2,3,4,5,1,1,2,3),
B=c(4,4,2,3,4,2,1,5,2,2),
C=c(3,3,3,3,4,2,5,1,2,3),
D=c(1,2,5,5,5,4,5,5,2,3),
E=c(1,4,2,3,4,2,5,1,2,3),
dummy1=c("yes","yes","no","no","no","no","yes","no","yes","yes"),
dummy2=c("high","low","low","low","high","high","high","low","low","high"))
df1 <- data.frame(lapply(df, factor))
And I would like to make a grouped barplot in ggplot that only considers the 1s (basically plotting the frequencies of the 1s) by dummy. On the x-axis (the groups) there should be the columns (A, B, C, D, E), on the y-axis the percentage of 1s in the respective column, and the bar colors should reflect the dummy level.
This is basically what I want:
I know how to do this by creating each time a single dataframe for each dummy level and then plot them, but I'm sure there's a better and more efficient way to do this.
Thanks in advance for any suggestion!
Something like the following could work:
library(data.table)
setDT(df1)
# For each dummy1 group, count the 1s in A-E and divide by the total number of rows
graph_data <- melt(df1[, lapply(.SD, function(x) sum(x == 1) / nrow(df1)),
by = "dummy1",
.SDcols = c("A","B","C","D","E")],
id.vars = "dummy1")
library(ggplot2)
ggplot() +
geom_col(data = graph_data,
mapping = aes(x = variable, y = value, fill = dummy1),
position = "dodge")
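If the percentage is meant within each dummy1 group rather than over all rows, the shares can be sketched in base R with aggregate(). This is a swapped-in technique, not the answer's data.table code: mean(x == 1) divides by the group size, whereas sum(x == 1) / nrow(df1) divides by the total number of rows. Only dummy1 is used, as in the answer.

```r
df <- data.frame(A = c(1,2,2,3,4,5,1,1,2,3),
                 B = c(4,4,2,3,4,2,1,5,2,2),
                 C = c(3,3,3,3,4,2,5,1,2,3),
                 D = c(1,2,5,5,5,4,5,5,2,3),
                 E = c(1,4,2,3,4,2,5,1,2,3),
                 dummy1 = c("yes","yes","no","no","no","no","yes","no","yes","yes"),
                 stringsAsFactors = FALSE)
cols <- c("A", "B", "C", "D", "E")

# One row per dummy1 level; each cell is the share of 1s in that column within the group
shares <- aggregate(df[cols], by = list(dummy1 = df$dummy1),
                    FUN = function(x) mean(x == 1))

# Long format for geom_col(): one row per (group, column) pair
long <- data.frame(dummy1 = rep(shares$dummy1, times = length(cols)),
                   column = rep(cols, each = nrow(shares)),
                   share  = unlist(shares[cols], use.names = FALSE),
                   stringsAsFactors = FALSE)
print(long)
```

ggplot(long, aes(x = column, y = share, fill = dummy1)) + geom_col(position = "dodge") would then draw the grouped bars; multiplying share by 100 gives percentages.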

Sort dataset for grouped boxplot

I have a rather untidy dataset and can't wrap my head around how to do this in R. The alternative would be to do it in Excel, but since I have several of these datasets, that would take forever.
So what I need is to create a grouped boxplot.
For this I think I need a dataset that consists of 4 columns: species, group (A or B), variable, value.
But what I have at the moment is only:
variable and species_group (together in one column),
Here is a reproducible example:
variable <- c('precipitation','soil','land use')
species1_A <- c(10000, 500, 1322)
species1_B <- c(11500, 200, 600)
species2_A <- c(10000, 500, 1489)
species2_B <- c(15687, 800, 587)
df <- data.frame(variable, species1_A, species1_B,species2_A, species2_B)
So I guess I have to create a whole new column "group" with A or B and somehow tell R to take that information from the "species1_A" name.
Can anyone help me please? Thank you!
I'd suggest the following:
library(tidyverse)
df %>%
pivot_longer(contains("species"), names_to = "name", values_to = "value") %>%
separate(name, c("species", "group"), "_") %>%
ggplot() +
facet_wrap(~variable) +
aes(x = species, y = value, color = group) +
geom_point()
Sorry, I'm not sure how you'd want things laid out, and your example dataset has only one value per group. You can change geom_point to geom_boxplot once you have more values per group. Spacing between the boxes can be adjusted with position_dodge. HTH.
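For readers without the tidyverse, the same reshaping (wide species_group columns into species, group, variable, value) can be sketched in base R. It assumes every value column is named species_group with a single underscore, as in the example; strsplit() then plays the role of separate().

```r
variable   <- c("precipitation", "soil", "land use")
species1_A <- c(10000, 500, 1322)
species1_B <- c(11500, 200, 600)
species2_A <- c(10000, 500, 1489)
species2_B <- c(15687, 800, 587)
df <- data.frame(variable, species1_A, species1_B, species2_A, species2_B,
                 stringsAsFactors = FALSE)

value_cols <- setdiff(names(df), "variable")
# unlist() concatenates the value columns one after another, so 'variable'
# is repeated once per column and 'name' once per row
long <- data.frame(variable = rep(df$variable, times = length(value_cols)),
                   name     = rep(value_cols, each = nrow(df)),
                   value    = unlist(df[value_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Split e.g. "species1_A" into "species1" and "A"
parts <- do.call(rbind, strsplit(long$name, "_", fixed = TRUE))
long$species <- parts[, 1]
long$group   <- parts[, 2]
long$name    <- NULL
print(long)
```

With more than one value per group, long can go straight into ggplot(long, aes(x = species, y = value, fill = group)) + geom_boxplot().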

How to assign unique title and text labels to ggplots created in lapply loop?

I've tried about every variation of for loops and lapply loops for creating ggplots that I can find on Stack Exchange, and this code has worked well for me. My only problem is that I can't assign unique titles and labels. From what I can tell, inside the function i takes the values of my response variable, so I can't index the title I want as the ith entry in a character vector of titles.
The example I've supplied creates plots with the correct values but the 2nd and 3rd plots in the plot lists don't have the correct titles or labels.
Mock dataset:
library(ggplot2)
nms=c("SampleA","SampleB","SampleC")
measr1=c(0.6,0.6,10)
measr2=c(0.6,10,0.8)
measr3=c(0.7,10,10)
qual1=c("U","U","")
qual2=c("U","","J")
qual3=c("J","","")
df=data.frame(nms,measr1,qual1,measr2,qual2,measr3,qual3,stringsAsFactors = FALSE)
# identify columns in dataset that contain response variable
measrsindex=c(2,4,6)
# Create list of plots that show all samples for each measurement
plotlist=list()
plotlist=lapply(df[,measrsindex], function(i) ggplot(df,aes_string(x="nms",y=i))+
geom_col()+
ggtitle("measr1")+
geom_text(aes(label=df$qual1)))
# Create list of plots that show all measurements for each sample
plotlist2=list()
plotlist2=lapply(df[,measrsindex],function(i)ggplot(df,aes_string(x=measrsindex, y=i))+
geom_col()+
ggtitle("SampleA")+
geom_text(aes(label=df$qual1)))
The problem is that I can't create a unique title for each plot. (All plots in the example have the title "measr1" or "SampleA".)
Additionally, I can't apply unique labels (from the qual columns) to each bar. (E.g., the letter from qual2 should appear on top of the column for measr2 for each sample.)
Also, in the second plot list the x-values aren't "measr1", "measr2", "measr3"; they're the index values for those columns, which isn't ideal.
I'm relatively new to R and have never posted on Stack Overflow before so any feedback about my problem or posting questions is welcomed.
I've found lots of questions and answers about this sort of topic but none that have a data structure or desired plot quite like mine. I apologize if this is a redundant question but I have tried to find the solution in previous answers and have been unable.
This is where I got the original code to make my loops, however this example doesn't include titles or labels:
Looping over ggplot2 with columns
You could loop over the names of the columns instead of the columns themselves, and then use some non-standard evaluation to get the column values from the names. I have also included label in aes; sub("measr", "qual", .x) picks the qual column that matches each measure.
library(ggplot2)
library(rlang)
plotlist3 <- purrr::map(names(df)[measrsindex],
~ggplot(df, aes(nms, !!sym(.x), label = !!sym(sub("measr", "qual", .x)))) +
geom_col() + ggtitle(.x) + geom_text(vjust = -1))
plotlist3[[1]]
plotlist3[[2]]
The same can be achieved with lapply as well:
plotlist4 <- lapply(names(df)[measrsindex], function(x)
ggplot(df, aes(nms, !!sym(x), label = !!sym(sub("measr", "qual", x)))) +
geom_col() + ggtitle(x) + geom_text(vjust = -1))
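The key is to iterate over column names rather than column values, so each iteration knows which measure it is plotting. Below is a minimal sketch of that pairing. It assumes, as in the mock data, that measrN always goes with qualN, and uses Map() to walk both name vectors together. The ggplot part uses the .data[[...]] pronoun instead of !!sym() and only runs if ggplot2 is installed.

```r
nms <- c("SampleA", "SampleB", "SampleC")
df <- data.frame(nms,
                 measr1 = c(0.6, 0.6, 10), qual1 = c("U", "U", ""),
                 measr2 = c(0.6, 10, 0.8), qual2 = c("U", "", "J"),
                 measr3 = c(0.7, 10, 10),  qual3 = c("J", "", ""),
                 stringsAsFactors = FALSE)
measrsindex <- c(2, 4, 6)
meas_cols <- names(df)[measrsindex]           # "measr1" "measr2" "measr3"
qual_cols <- sub("measr", "qual", meas_cols)  # "qual1"  "qual2"  "qual3"

# Map() walks both name vectors in parallel, so each element gets its own
# title (the measure name) and its own label column
specs <- Map(function(m, q) list(title = m, y = df[[m]], label = df[[q]]),
             meas_cols, qual_cols)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  plots <- Map(function(m, q) {
    ggplot2::ggplot(df, ggplot2::aes(x = nms, y = .data[[m]], label = .data[[q]])) +
      ggplot2::geom_col() +
      ggplot2::geom_text(vjust = -0.5) +
      ggplot2::ggtitle(m)
  }, meas_cols, qual_cols)
}
```

Because Map() names its result after meas_cols, a single plot can be pulled out by name, e.g. plots$measr2.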
I would recommend putting your data in long format before using ggplot2; it makes plotting a much simpler task. I also recoded some variables to make the plots easier to construct. Here is the code to construct the plots with lapply.
library(tidyverse)
#Change from wide to long format
df1<-df %>%
pivot_longer(cols = -nms,
names_to = c(".value", "obs"),
names_sep = "r|l") %>% # "measr1" -> "meas" + "1", "qual1" -> "qua" + "1"
#Separate Sample column into letters
separate(col = nms,
sep = "Sample",
into = c("fill","Sample"))
#Change measures index to 1-3
measrsindex=c(1,2,3)
plotlist=list()
plotlist=lapply(measrsindex, function(i){
#Subset by measrsindex (numbers) and plot
df1 %>%
filter(obs == i) %>%
ggplot(aes_string(x="Sample", y="meas", label="qua"))+
geom_col()+
labs(x = "Sample") +
ggtitle(paste("Measure",i, collapse = " "))+
geom_text()})
#Get the letters A : C
samplesvec<-unique(df1$Sample)
plotlist2=list()
plotlist2=lapply(samplesvec, function(i){
#Subset by samplesvec (letters) and plot
df1 %>%
filter(Sample == i) %>%
ggplot(aes_string(x="obs", y = "meas",label="qua"))+
geom_col()+
labs(x = "Measure") +
ggtitle(paste("Sample",i,collapse = ", "))+
geom_text()})
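For comparison, the wide-to-long step (each measrN/qualN pair becoming one row with an obs number) can also be sketched with base R's reshape(). This is a swapped-in alternative to pivot_longer(), and it assumes the measure and qual columns are listed in matching order. split() then yields one data frame per measure or per sample, ready for lapply().

```r
df <- data.frame(nms    = c("SampleA", "SampleB", "SampleC"),
                 measr1 = c(0.6, 0.6, 10), qual1 = c("U", "U", ""),
                 measr2 = c(0.6, 10, 0.8), qual2 = c("U", "", "J"),
                 measr3 = c(0.7, 10, 10),  qual3 = c("J", "", ""),
                 stringsAsFactors = FALSE)

# Each element of 'varying' becomes one long column (named by v.names);
# 'times' supplies the obs value (1, 2, 3) for the three column pairs
long <- reshape(df, direction = "long",
                varying = list(c("measr1", "measr2", "measr3"),
                               c("qual1", "qual2", "qual3")),
                v.names = c("meas", "qua"),
                timevar = "obs", times = 1:3,
                idvar   = "nms")

by_measure <- split(long, long$obs)  # one data frame per measure
by_sample  <- split(long, long$nms)  # one data frame per sample
```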
Looking at the final plots, I think it might be useful to make them with facet_wrap. I added the code to use it with your plots.
#Plot for Measures
ggplot(df1, aes(x = Sample,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ obs) +
ggtitle("Measures")+
labs(x="Samples")+
geom_text()
#Plot for Samples
ggplot(df1, aes(x = obs,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ Sample) +
ggtitle("Samples")+
labs(x="Measures")+
geom_text()
Here is a sample of the plots using facet_wrap.

Plot data for each row within a single command

I'm an R newbie and need help with the following.
I have the following data
# Simulate matrix of integers
set.seed(1)
df <- matrix(sample.int(5, size = 3*5, replace = TRUE), nrow = 3, ncol = 5)
print(df)
df <- as.data.frame(df) # convert to a data frame (columns V1..V5)
df <- rbind(df, c(3,5,4,1,4))
print(df)
In a single command, I need to plot the data for each row, with the y-axis showing the values in each row (in my case, values from 1 to 5) and the x-axis showing the column positions 1, 2, 3, 4, 5. In effect, for each row I am trying to plot how the row's values change across the columns.
I have tried the following, which works but has two problems that I need to resolve. First, it only plots one row at a time, which is inefficient, especially when there are many rows. Second, I could not find a way to refer to the x-axis as the column positions, so I resorted to counting the columns (i.e., 5) and using a c(1:5) vector. I also tried using ncol(df) for the x-axis, but that returns an error saying that the variables have different lengths. Indeed, ncol(df) returns 5, the number of columns, rather than the sequence 1, 2, 3, 4, 5 that I wanted.
plot(c(1:5),df[1,], type = "b", pch=19,
col = "blue", xlab = "number of columns", ylab = "response format")
Thank you, your help is much appreciated
You could do:
library(tidyverse)
df %>%
mutate(row_number = as.factor(row_number())) %>%
gather(columns, responses, V1:V5) %>%
ggplot(aes(x = columns, y = responses, group = row_number, color = row_number)) +
geom_line() + geom_point()
What this does:
Creates an id for each row (row_number);
Transforms the data frame into a long format with 1 column for columns, and another for responses;
Plots everything on 1 chart where each color represents one row.
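The first two steps can also be sketched in base R, which may help to show what gather() produces. The data below is a made-up fixed matrix in place of the seeded sample (an assumption, so the result is reproducible); as.data.frame() gives it the same V1 to V5 column names.

```r
# Made-up 4 x 5 matrix of responses (assumption); the last row matches the question
m <- rbind(matrix(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5), nrow = 3),
           c(3, 5, 4, 1, 4))
df <- as.data.frame(m)  # columns are named V1..V5

# Step 1: an id per row; Step 2: one row per (row, column) pair
long <- data.frame(row_number = factor(rep(seq_len(nrow(df)), times = ncol(df))),
                   columns    = rep(names(df), each = nrow(df)),
                   responses  = unlist(df, use.names = FALSE),
                   stringsAsFactors = FALSE)
print(head(long))
```

The plotting step is unchanged: aes(x = columns, y = responses, group = row_number, color = row_number).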
You could also slightly change the plot so that each line (row) has its own chart by adding facet_wrap, e.g.:
df %>%
mutate(row_number = as.factor(row_number())) %>%
gather(columns, responses, V1:V5) %>%
ggplot(aes(x = columns, y = responses, group = row_number, color = row_number)) +
geom_line() + geom_point() +
facet_wrap(~ row_number)
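Without reshaping at all, base graphics can draw one line per row in a single call with matplot(), which plots each column of its y matrix as a separate series; transposing the data turns rows into columns. This is a swapped-in base-R alternative, not part of the answer above. seq_len(ncol(m)) gives the 1, 2, ..., 5 sequence the question was trying to get from ncol(df).

```r
set.seed(1)
m <- matrix(sample.int(5, size = 3 * 5, replace = TRUE), nrow = 3, ncol = 5)
m <- rbind(m, c(3, 5, 4, 1, 4))
x <- seq_len(ncol(m))  # 1, 2, ..., number of columns

# matplot() draws one series per *column* of its y argument, so transpose:
# after t(), each original row becomes a column and therefore its own line
matplot(x, t(m), type = "b", pch = 19, lty = 1, col = seq_len(nrow(m)),
        xlab = "column number", ylab = "response format")
legend("topright", legend = paste("row", seq_len(nrow(m))),
       col = seq_len(nrow(m)), pch = 19, lty = 1)
```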
