Sort dataset for grouped boxplot - r

I have a rather untidy dataset and can't wrap my head around how to do this in R. The alternative would be to do this in Excel, but since I have several of these datasets, that would take forever.
So what I need is to create a grouped boxplot.
For this I think I need a dataset that consists of 4 columns: species, group (A or B), variable, value.
But what I have at the moment is only:
variable and species_group (species and group together in one column name).
Here is a reproducible example:
variable <- c('precipitation','soil','land use')
species1_A <- c(10000, 500, 1322)
species1_B <- c(11500, 200, 600)
species2_A <- c(10000, 500, 1489)
species2_B <- c(15687, 800, 587)
df <- data.frame(variable, species1_A, species1_B,species2_A, species2_B)
So I guess I have to create a whole new column "group" with A or B and somehow tell R to take that information from the "species1_A" name.
Can anyone help me please? Thank you!

I'd suggest the following:
library(tidyverse)
df %>%
  pivot_longer(contains("species"), names_to = "name", values_to = "value") %>%
  separate(name, c("species", "group"), "_") %>%
  ggplot() +
  facet_wrap(~variable) +
  aes(x = species, y = value, color = group) +
  geom_point()
Sorry, I'm not sure how you want things laid out, and your example dataset has only one value per group. You can change geom_point to geom_boxplot once you have more values per group. Spacing between the boxes can be adjusted with position_dodge. HTH.
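As a rough sketch of what the boxplot version might look like once there are several values per group: the data below are simulated, and the names df_rep, df_long and p_box are made up for illustration. It assumes the real data keep the same speciesN_G column naming.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # simulated values only, so the sketch is reproducible

# Hypothetical wide data: 5 replicate rows per variable instead of one
df_rep <- data.frame(
  variable   = rep(c("precipitation", "soil", "land use"), each = 5),
  species1_A = rnorm(15, mean = 100, sd = 10),
  species1_B = rnorm(15, mean = 120, sd = 10),
  species2_A = rnorm(15, mean = 90,  sd = 10),
  species2_B = rnorm(15, mean = 110, sd = 10)
)

# Same reshaping as above: one row per variable/species/group/value
df_long <- df_rep %>%
  pivot_longer(contains("species"), names_to = "name", values_to = "value") %>%
  separate(name, into = c("species", "group"), sep = "_")

# One box per species and group; position_dodge() controls the spacing
p_box <- ggplot(df_long, aes(x = species, y = value, fill = group)) +
  geom_boxplot(position = position_dodge(width = 0.9)) +
  facet_wrap(~variable, scales = "free_y")
p_box
```

A larger width in position_dodge() moves the A and B boxes of one species further apart.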

Related

geom_bar combine 2 dataset onto one graph

I have two dataframes with paired scores, each scoring patients on a 1-8 scoring system (Where 1= managing well and 8 = terminally ill).
One score is done by the patient and one by the clinician.
sample data
df <- data.frame(Patient = c(1,1,2,4,5,3,2,6,7,6,3,4,2,3,5,6,7,3,8,1), Clinican= c(1,2,2,5,4,5,4,4,4,2,3,5,4,6,5,4,3,7,7,1))
I'd like to create a bar chart similar to the one below using my dataset.
Any help would be much appreciated.
I believe I need tidyr's pivot_longer, similar to this post:
geom_bar two datasets together in R
Here is a solution using pivot_longer and geom_bar as you asked.
Libraries
library(dplyr)
library(tidyr)
library(ggplot2)
Solution
You can change name and value for whatever name you prefer.
Also, the x-axis is categorical, so we have to mutate it to factor.
You can then recode the factor value for the labels you need (e.g., 'well', 'very fit'...)
df %>%
pivot_longer(Patient:Clinican, names_to = "name", values_to = "value") %>%
mutate(value = factor(value)) %>%
ggplot(aes(x = value, fill = name)) +
geom_bar(position = "dodge")
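The recoding step mentioned above could look roughly like this. The label text is an assumption: the question only describes scores 1 and 8, so the other labels are plain numbers, and score_labels and df_long are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  Patient  = c(1,1,2,4,5,3,2,6,7,6,3,4,2,3,5,6,7,3,8,1),
  Clinican = c(1,2,2,5,4,5,4,4,4,2,3,5,4,6,5,4,3,7,7,1)
)

# Placeholder labels: only 1 and 8 are described in the question
score_labels <- c("1 (managing well)", "2", "3", "4", "5", "6", "7",
                  "8 (terminally ill)")

df_long <- df %>%
  pivot_longer(Patient:Clinican, names_to = "name", values_to = "value") %>%
  # Fix the levels to 1:8 so every score keeps its slot on the axis
  mutate(value = factor(value, levels = 1:8, labels = score_labels))

ggplot(df_long, aes(x = value, fill = name)) +
  geom_bar(position = "dodge") +
  scale_x_discrete(drop = FALSE)  # also show scores nobody used
```

Setting the levels explicitly keeps the axis order stable even if a score is missing from one of the two raters.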

Why does my line plot (ggplot2) look vertical?

I am new to coding in R. When I use ggplot2 to make a line graph, I get vertical lines. This is my code:
all_trips_v2 %>%
group_by(Month_Name, member_casual) %>%
summarise(average_duration = mean(length_of_ride))%>%
ggplot(aes(x = Month_Name, y = average_duration)) + geom_line()
And I'm getting something like this:
This is a sample of my data:
(Not all the cells in the Month_Name is August, it's just sorted)
Any help will be greatly appreciated! Thank you.
I added a bit more code just for the sake of the example. The data I chose is probably not the best choice to display a proper time series.
I hope the ggplot features I displayed will be beneficial for you in the future.
library(tidyverse)
library(lubridate)
mydat <- sample_frac(storms, .4)
# Set the month of interest to the current system month
month_of_interest <- month(Sys.Date(), label = TRUE)
mydat %>%
  group_by(year, month) %>%
  summarise(avg_pressure = mean(pressure)) %>%
  # The mutate code is just for my example
  mutate(month = month(month, label = TRUE),
         current_month = month == month_of_interest) %>%
  ggplot(aes(x = year, y = avg_pressure,
             color = current_month,
             group = month,
             size = current_month)) +
  geom_line(show.legend = FALSE) +
  ## From here on it's not really important,
  ## just ideas for your next plots
  scale_color_manual(values = c("grey", "red")) +
  scale_size_manual(values = c(.4, 1)) +
  ggtitle(paste("Average yearly pressure,\nwith special interest in", month_of_interest)) +
  theme_minimal()
## Most important is that you notice the group argument. Also,
# in most cases you will want to color your different lines.
# I added a logical variable so only the month of interest is colored,
# but that is not mandatory
You should add a grouping argument.
See further info here:
https://ggplot2.tidyverse.org/reference/aes_group_order.html
# Multiple groups with one aesthetic
p <- ggplot(nlme::Oxboys, aes(age, height))
# The default is not sufficient here. A single line tries to connect all
# the observations.
p + geom_line()
# To fix this, use the group aesthetic to map a different line for each
# subject.
p + geom_line(aes(group = Subject))
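Applied to the question's own pipeline, the fix might look like the sketch below. With a categorical x-axis, ggplot2 by default treats each month as its own group, so the two rider types within a month get joined by a vertical segment. The trips data frame here is a hypothetical stand-in for all_trips_v2 with the same column names, and avg_trips is an illustrative name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for all_trips_v2 (simulated ride lengths)
set.seed(42)
trips <- data.frame(
  Month_Name     = rep(c("June", "July", "August"), each = 40),
  member_casual  = rep(c("member", "casual"), times = 60),
  length_of_ride = runif(120, min = 5, max = 60)
)

avg_trips <- trips %>%
  # Keep calendar order on the x-axis rather than alphabetical order
  mutate(Month_Name = factor(Month_Name,
                             levels = c("June", "July", "August"))) %>%
  group_by(Month_Name, member_casual) %>%
  summarise(average_duration = mean(length_of_ride), .groups = "drop")

# group = member_casual draws one line per rider type across the months
ggplot(avg_trips, aes(x = Month_Name, y = average_duration,
                      group = member_casual, color = member_casual)) +
  geom_line()
```

Mapping color to the same variable as group is optional, but it makes the two lines distinguishable.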

How to assign unique title and text labels to ggplots created in lapply loop?

I've tried about every iteration I can find on Stack Exchange of for loops and lapply loops to create ggplots and this code has worked well for me. My only problem is that I can't assign unique titles and labels. From what I can tell in the function i takes the values of my response variable so I can't index the title I want as the ith entry in a character string of titles.
The example I've supplied creates plots with the correct values but the 2nd and 3rd plots in the plot lists don't have the correct titles or labels.
Mock dataset:
library(ggplot2)
nms=c("SampleA","SampleB","SampleC")
measr1=c(0.6,0.6,10)
measr2=c(0.6,10,0.8)
measr3=c(0.7,10,10)
qual1=c("U","U","")
qual2=c("U","","J")
qual3=c("J","","")
df=data.frame(nms,measr1,qual1,measr2,qual2,measr3,qual3,stringsAsFactors = FALSE)
Identify the columns in the dataset that contain the response variable
measrsindex=c(2,4,6)
Create list of plots that show all samples for each measurement
plotlist=list()
plotlist=lapply(df[,measrsindex], function(i) ggplot(df,aes_string(x="nms",y=i))+
geom_col()+
ggtitle("measr1")+
geom_text(aes(label=df$qual1)))
Create list of plots that show all measurements for each sample
plotlist2=list()
plotlist2=lapply(df[,measrsindex],function(i)ggplot(df,aes_string(x=measrsindex, y=i))+
geom_col()+
ggtitle("SampleA")+
geom_text(aes(label=df$qual1)))
The problem is that I can't create a unique title for each plot. (All plots in the example have the title "measr1" or "SampleA".)
Additionally, I can't apply unique labels (from the qual columns) to each bar (e.g. the letter for qual2 should appear on top of the column for measr2 for each sample).
Additionally in the second plot list the x-values aren't "measr1","measr2","measr3" they're the index values for those columns which isn't ideal.
I'm relatively new to R and have never posted on Stack Overflow before so any feedback about my problem or posting questions is welcomed.
I've found lots of questions and answers about this sort of topic but none that have a data structure or desired plot quite like mine. I apologize if this is a redundant question but I have tried to find the solution in previous answers and have been unable.
This is where I got the original code to make my loops, however this example doesn't include titles or labels:
Looping over ggplot2 with columns
You could loop over the names of the columns instead of the column itself and then use some non-standard evaluation to get column values from the names. Also, I have included label in aes.
library(ggplot2)
library(rlang)
plotlist3 <- purrr::map(names(df)[measrsindex],
~ggplot(df, aes(nms, !!sym(.x), label = qual1)) +
geom_col() + ggtitle(.x) + geom_text(vjust = -1))
plotlist3[[1]]
plotlist3[[2]]
The same can be achieved with lapply as well
plotlist4 <- lapply(names(df)[measrsindex], function(x)
ggplot(df, aes(nms, !!sym(x), label = qual1)) +
geom_col() + ggtitle(x) + geom_text(vjust = -1))
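To get a different label column for each plot as well as a different title, one option is to iterate over the measurement and qualifier column names in parallel. This is only a sketch under assumptions: it assumes the columns pair up as measr1/qual1, measr2/qual2, measr3/qual3, and the names measr_cols, qual_cols and plotlist5 are made up for illustration.

```r
library(ggplot2)

df <- data.frame(
  nms    = c("SampleA", "SampleB", "SampleC"),
  measr1 = c(0.6, 0.6, 10), qual1 = c("U", "U", ""),
  measr2 = c(0.6, 10, 0.8), qual2 = c("U", "", "J"),
  measr3 = c(0.7, 10, 10),  qual3 = c("J", "", ""),
  stringsAsFactors = FALSE
)

# Assumed pairing: the i-th qualifier column labels the i-th measurement
measr_cols <- c("measr1", "measr2", "measr3")
qual_cols  <- c("qual1", "qual2", "qual3")

# Map() walks both name vectors in step; .data[[...]] looks up a column
# by its name inside aes()
plotlist5 <- Map(function(m, q) {
  ggplot(df, aes(x = nms, y = .data[[m]], label = .data[[q]])) +
    geom_col() +
    geom_text(vjust = -1) +
    ggtitle(m)
}, measr_cols, qual_cols)

plotlist5[["measr2"]]
```

Because the first vector passed to Map() is a character vector, the resulting list is named after the measurement columns, so plots can be pulled out by name.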
I would recommend putting your data in long format prior to using ggplot2, it makes plotting a much simpler task. I also recoded some variables to facilitate constructing the plot. Here is the code to construct the plots with lapply.
library(tidyverse)
# Change from wide to long format
df1 <- df %>%
  pivot_longer(cols = -nms,
               names_to = c(".value", "obs"),
               # Split on "r" (measr1 -> meas, 1) or "l" (qual1 -> qua, 1)
               names_sep = "r|l") %>%
  # Separate the Sample column into letters
  separate(col = nms,
           sep = "Sample",
           into = c("fill", "Sample"))
#Change measures index to 1-3
measrsindex=c(1,2,3)
plotlist=list()
plotlist=lapply(measrsindex, function(i){
#Subset by measrsindex (numbers) and plot
df1 %>%
filter(obs == i) %>%
ggplot(aes_string(x="Sample", y="meas", label="qua"))+
geom_col()+
labs(x = "Sample") +
ggtitle(paste("Measure",i, collapse = " "))+
geom_text()})
#Get the letters A : C
samplesvec<-unique(df1$Sample)
plotlist2=list()
plotlist2=lapply(samplesvec, function(i){
#Subset by samplesvec (letters) and plot
df1 %>%
filter(Sample == i) %>%
ggplot(aes_string(x="obs", y = "meas",label="qua"))+
geom_col()+
labs(x = "Measure") +
ggtitle(paste("Sample",i,collapse = ", "))+
geom_text()})
Looking at the final plots, I think it might be useful to use facet_wrap to make them. I added the code to use it with your plots.
#Plot for Measures
ggplot(df1, aes(x = Sample,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ obs) +
ggtitle("Measures")+
labs(x="Samples")+
geom_text()
#Plot for Samples
ggplot(df1, aes(x = obs,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ Sample) +
ggtitle("Samples")+
labs(x="Measures")+
geom_text()
Here is a sample of the plots using facet_wrap.

Order heatmap according to one specific row [duplicate]

I am trying to do the following. Consider the following dataset
trends <- c('Virtual Assistant', 'Citizen DS', 'Deep Learning', 'Speech Recognition',
'Handwritten Recognition', 'Machine Translation', 'Chatbots',
'NLP')
impact <- sample(5,8, replace = TRUE)
maturity <- sample(5,8, replace = TRUE)
strategy <- sample(5,8, replace = TRUE)
h <- sample(5,8, replace = TRUE)
df <- data.frame(trends, impact, maturity, strategy, h)
rownames(df) <- df$trends
I am trying to generate a heatmap. So far so good; that is relatively easy. For example I can use
library(tidyverse)
dftemp = df[,c("impact", "maturity", "strategy", "h")]
dt2 <- dftemp %>%
  rownames_to_column() %>%
  gather(colname, value, -rowname)
and then
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
I know the labels on the x-axis are horizontal, but I know how to fix that. What I would like is to order the x-axis based on one specific row. For example, I would like to have the heatmap with the values of the row "impact" in ascending order. Can anyone point me in the right direction?
Should I convert x to a factor and change the levels there?
Yes, you could convert it into factors and specify the levels. So to change it based on impact row we can do
dt2$rowname <- factor(dt2$rowname, levels = df$trends[order(df$impact)])
library(ggplot2)
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()

same y axis variable , scatter-plot and long format

Let's say two different raters are evaluating behavioral problems. They use the same scale (from 0 to 50) and the child being evaluated is the same for both raters. In social sciences, this method is common and there are some useful statistics, such as correlation coefficient and Intra-Class Correlation.
In addition, one graph that comes to my mind is the scatter-plot: on the x-axis I'll plot the behavioral problems scores from the first rater, and on the y-axis I'll do the same for the second rater.
ggplot2 creates amazing plots; however, some simple routines and actions become really difficult to do.
Please see the code below and the base R plot. I would like to know if ggplot can create this plot as well.
Thanks much
set.seed(123)
ds <- data.frame(behavior_problems = rnorm(100,30,2), evaluator=sample(1:2))
plot(ds$behavior_problems[ds$evaluator == '1'] ,
y = ds$behavior_problems[ds$evaluator == '2'])
== I had to edit to make clear why a scatter-plot would be informative==
I think the key problem here is the way in which you have set up the data frame. It is not clear that each individual gets a pair of scores, one from each evaluator. So the first thing to do is add an ID for each individual: 50 IDs in your example data.
library(tidyverse)
ds %>%
  mutate(id = rep(1:50, each = 2))
Now we can use tidyr::spread to create a column for each evaluator. But numbers for column names are not a great idea, so we'll rename them to e1 and e2.
ds %>%
mutate(id = rep(1:50, each = 2)) %>%
spread(evaluator, behavior_problems) %>%
rename(e1 = `1`, e2 = `2`)
Now we have column names that can be supplied to ggplot:
ds %>%
mutate(id = rep(1:50, each = 2)) %>%
spread(evaluator, behavior_problems) %>%
rename(e1 = `1`, e2 = `2`) %>%
ggplot(aes(e1, e2)) +
geom_point()
If this seems like a "long way around", it's because ggplot2 works better with "long" data (before the spread) than "wide" (after the spread). To illustrate, here's another way to visualize the difference in scores by individual, which I think works quite well:
ds %>%
mutate(id = rep(1:50, each = 2),
evaluator = factor(evaluator)) %>%
ggplot(aes(id, behavior_problems)) +
geom_point(aes(color = evaluator)) +
geom_line(aes(group = id))
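In current tidyr, spread() is superseded by pivot_wider(), so the reshaping step could also be written as in the sketch below. This is an alternative rather than the original answer's code; ds_wide is an illustrative name, and names_prefix = "e" stands in for the rename() step.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(123)
ds <- data.frame(behavior_problems = rnorm(100, 30, 2),
                 evaluator = sample(1:2))

ds_wide <- ds %>%
  # Each consecutive pair of rows is treated as one child
  mutate(id = rep(1:50, each = 2)) %>%
  # One column per evaluator, named e1 and e2 via names_prefix
  pivot_wider(names_from = evaluator,
              values_from = behavior_problems,
              names_prefix = "e")

ggplot(ds_wide, aes(e1, e2)) +
  geom_point()
```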
