Please help me. I have the following data in R: I have values of three groups of organisms from day 0 to day 7 which represent the mean of populations for these groups for each day.
Here is my data:
https://docs.google.com/spreadsheets/d/15-XXT6jOSKZs0FS14FScnHMm0Qd19N-x/edit#gid=377184551
I was trying to follow the example on this page: https://statisticsglobe.com/plot-all-columns-of-data-frame-in-r, but the graphs I get show the data values on the y axis, and the plotted lines are joined. I would like to have separate lines for each of the three groups, and a scale on the y axis instead of the plotted values. Plotting the values for each group individually gives me the same values on the y-axis instead of a scale. I would like the y-axis to start at the Day 0 values and keep ascending until Day 7, unlike the mixed order I have right now. The code I used is as follows:
Data and code
growth <- data.frame(
stringsAsFactors = FALSE,
day = c("Day 0","Day 1","Day 2",
"Day 3","Day 4","Day 5","Day 6","Day 7"),
wild_type = c(6, 9.8, 69.53, 84.67, 99.33, 145.33, 147.33, 121.8),
t7_cas9 = c(6, 8.57, 68.83, 85.5, 98.25, 144.67, 137.5, 120.5),
ip6k = c(6, 6.5, 49.67, 56, 70.5, 127.5, 123.67, 111.33)
)
data_ggp <- data.frame(x = growth$day,
y = c(growth$wild_type, growth$t7_cas9, growth$ip6k),
group = c(rep("Wild_Type", nrow(growth)),
rep("T7_Cas9", nrow(growth)),
rep("IP6K-+", nrow(growth))))
ggp <- ggplot(data_ggp, aes(x, y, col = group, group = 1)) +
geom_line()
ggp
p1 <- ggp + facet_grid(group ~ .)
p1
However, what I would like to have is:
Are you looking for such a solution:
library(tidyverse)
df %>%
pivot_longer(-Day) %>%
ggplot(aes(x = Day, y = value, group=name, color = name))+
geom_line(size=1)
And with facets:
library(tidyverse)
df %>%
pivot_longer(-Day) %>%
ggplot(aes(x = Day, y = value, group=name, color = name))+
geom_line(size=1)+
facet_grid(name ~ .)
data:
df <- structure(list(Day = c("Day 0", "Day 1", "Day 2", "Day 3", "Day 4",
"Day 5", "Day 6", "Day 7"), Wild_Type = c(6, 9.8, 69.53, 84.67,
99.33, 145.33, 147.33, 121.8), T7_Cas9 = c(6, 8.57, 68.83, 85.5,
98.25, 144.67, 137.5, 120.5), IP6K = c(6, 6.5, 49.67, 56, 70.5,
127.5, 123.67, 111.33)), class = "data.frame", row.names = c(NA,
-8L))
Try:
scale_y_continuous(breaks = seq(1, 7, 1), limits = c(0, 7), labels = c())
I guess you could play around with the labels argument. I am not sure about your data, but a transformation (e.g. log) may help to separate the lines better!
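On the joined lines in the question: `group = 1` places every row in a single group, so ggplot2 draws one continuous line through all three series. Below is a minimal sketch of one way around this. It assumes the day is stored as a number (0 to 7) rather than as the text "Day 0" to "Day 7", and the names `long` and `p` are only illustrative.

```r
library(ggplot2)

# Same values as in the question, with the day stored as a number (an assumption)
growth <- data.frame(
  day       = 0:7,
  wild_type = c(6, 9.8, 69.53, 84.67, 99.33, 145.33, 147.33, 121.8),
  t7_cas9   = c(6, 8.57, 68.83, 85.5, 98.25, 144.67, 137.5, 120.5),
  ip6k      = c(6, 6.5, 49.67, 56, 70.5, 127.5, 123.67, 111.33)
)

# Stack the three columns into one long data frame: one row per day and group
long <- data.frame(
  day   = rep(growth$day, times = 3),
  value = c(growth$wild_type, growth$t7_cas9, growth$ip6k),
  group = rep(c("wild_type", "t7_cas9", "ip6k"), each = nrow(growth))
)

# group = group gives each series its own line; a numeric y gives a continuous scale
p <- ggplot(long, aes(x = day, y = value, colour = group, group = group)) +
  geom_line() +
  scale_x_continuous(breaks = 0:7, labels = paste("Day", 0:7))
p
```

If the y axis still lists the individual values instead of a scale, it may be worth checking with `str(long)` that `value` is numeric and not character.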
I hope I asked my question in the right way this time! If not let me know!
I want to code a grouped bar chart like one I sketched in Paint. I drew it flipped, but it doesn't actually matter whether it is flipped or not, so a plot similar to the following would also be very useful:
Grouped barchart in r with 4 variables
Both the variables, happy and lifesatisfied are scaled values from 0 to 10. Working hours is a grouped value and contains 43+, 37-42, 33-36, 27-32, and <27.
An example that closely resembles my data set (I just changed the values and order; I also have many more observations):
| Working hours | happy | lifesatisfied | country |
|---------------|-------|---------------|---------|
| 37-42         | 7     | 9             | DK      |
| <27           | 8     | 8             | SE      |
| 43+           | 7     | 8             | DK      |
| 33-36         | 6     | 6             | SE      |
| 37-42         | 7     | 5             | NO      |
| <27           | 4     | 7             | NO      |
I tried to find similar examples and, based on those, tried to code the bar chart in the following way, but it doesn't work:
df2 <- datafilteredwomen %>%
pivot_longer(cols = c("happy", "stflife"), names_to = "var", values_to = "Percentage")
ggplot(df2) +
geom_bar(aes(x = Percentage, y = workinghours, fill = var ), stat = "identity", position = "dodge") + theme_minimal()
This gives a plot that is not what I want.
Second try:
forplot = datafilteredwomen %>% group_by(workinghours, happy, stflife) %>% summarise(count = n()) %>% mutate(proportion = count/sum(count))
ggplot(forplot, aes(workinghours, proportion, fill = as.factor(happy))) +
geom_bar(position = "dodge", stat = "identity", color = "black")
This gives a different plot.
Third try, using the ggplot2 builder add-in:
library(dplyr)
library(ggplot2)
datafilteredwomen %>%
filter(!is.na(workinghours)) %>%
ggplot() +
aes(x = workinghours, group = happy, weight = happy) +
geom_bar(position = "dodge",
fill = "#112446") +
theme_classic() + scale_y_continuous(labels = scales::percent)
This gives yet another plot.
None of my attempts produce what I want. I really hope someone can help me, if it's possible!
After speaking to the OP I found his data source and came up with this solution. Apologies if it's a bit messy, I have only been using R for 6 months. For ease of reproducibility I have preselected the variables used from the original dataset.
data <- structure(list(wkhtot = c(40, 8, 50, 40, 40, 50, 39, 48, 45,
16, 45, 45, 52, 45, 50, 37, 50, 7, 37, 36), happy = c(7, 8, 10,
10, 7, 7, 7, 6, 8, 10, 8, 10, 9, 6, 9, 9, 8, 8, 9, 7), stflife = c(8,
8, 10, 10, 7, 7, 8, 6, 8, 10, 9, 10, 9, 5, 9, 9, 8, 8, 7, 7)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
Here are the packages required.
library(tidyverse) # attaches dplyr, tidyr and ggplot2, among others
Here I have manipulated the data and commented my reasoning.
data <- data %>%
select(wkhtot, happy, stflife) %>% #Select the wanted variables
rename(Happy = happy) %>% #Rename for graphical sake
rename("Life Satisfied" = stflife) %>%
na.omit() %>% # remove NA values
group_by(WorkingHours = cut(wkhtot, c(-Inf, 27, 32,36,42,Inf))) %>% #Create the ranges
select(WorkingHours, Happy, "Life Satisfied") %>% #Select the variables again
pivot_longer(cols = c(`Happy`, `Life Satisfied`), names_to = "Criterion", values_to = "score") %>% # pivot the df longer for plotting
group_by(WorkingHours, Criterion)
data$Criterion <- as.factor(data$Criterion) #Make criterion a factor for graphical reasons
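The `cut()` call above labels the bins in interval notation such as `(-Inf,27]`, and because the intervals are closed on the right by default, a value of exactly 27 falls into the lowest bin. A possible variant, assuming whole-number hours and the category names from the question, passes explicit `labels` and moves the first break to 26. The vector `hours` is made-up example input.

```r
# Example weekly hours (illustrative values, in the style of wkhtot)
hours <- c(40, 8, 50, 27, 36, 16)

# Right-closed bins: (-Inf,26], (26,32], (32,36], (36,42], (42,Inf)
bins <- cut(hours,
            breaks = c(-Inf, 26, 32, 36, 42, Inf),
            labels = c("<27", "27-32", "33-36", "37-42", "43+"))

# Count how many observations fall into each category
table(bins)
```

With non-integer hours (for example 26.5) the first break would need rethinking, since 26.5 would land in the "27-32" bin.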
A bit more data prep
# Creating the percentage
data.plot <- data %>%
group_by(WorkingHours, Criterion) %>%
summarise_all(sum) %>% # get the sums for score by working hours and criterion
group_by(WorkingHours) %>%
mutate(tot = sum(score)) %>%
mutate(freq =round(score/tot *100, digits = 2)) # get percentage
Creating the plot.
# Plotting
ggplot(data.plot, aes(x = WorkingHours, y = freq, fill = Criterion)) +
geom_col(position = "dodge") +
geom_text(aes(label = freq),
position = position_dodge(width = 0.9),
vjust = 1) +
xlab("Working Hours") +
ylab("Percentage")
Please let me know if there is a more concise or easier way!!
DataSource: https://www.europeansocialsurvey.org/downloadwizard/?fbclid=IwAR2aVr3kuqOoy4mqa978yEM1sPEzOaghzCrLCHcsc5gmYkdAyYvGPJMdRp4
Taking this example dataframe df:
df <- structure(list(Working.hours = c("37-42", "37-42", "<27", "<27",
"43+", "43+", "33-36", "33-36", "37-42", "37-42", "<27", "<27"
), country = c("DK", "DK", "SE", "SE", "DK", "DK", "SE", "SE",
"NO", "NO", "NO", "NO"), criterion = c("happy", "lifesatisfied",
"happy", "lifesatisfied", "happy", "lifesatisfied", "happy",
"lifesatisfied", "happy", "lifesatisfied", "happy", "lifesatisfied"
), score = c(7L, 9L, 8L, 8L, 7L, 8L, 6L, 6L, 7L, 5L, 4L, 7L)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
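you can proceed like this:
library(dplyr)
library(ggplot2)
# `df` above is already in long format (one row per criterion),
# so no reshaping is needed. If your data is wide instead, with one
# column each for happy and lifesatisfied, pivot it first:
# df <- df %>%
#   tidyr::pivot_longer(cols = c(happy, lifesatisfied),
#                       names_to = 'criterion',
#                       values_to = 'score')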
df %>%
ggplot(aes(x = Working.hours,
y = score,
fill = criterion)) +
geom_col(position = 'dodge') +
coord_flip()
For picking colours see ?scale_fill_manual, for formatting legend etc. numerous existing answers to related questions on stackoverflow.
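One detail that the answers above leave open is the order of the working-hours categories, which are sorted alphabetically when stored as text. The sketch below uses the six example rows from the question, fixes the order with factor levels, and averages the scores per category. Averaging is an assumption here, since the question does not say how rows in the same category should be combined, and the names `scores`, `means` and `long` are illustrative.

```r
library(ggplot2)

# The six example rows from the question (country omitted)
scores <- data.frame(
  Working.hours = c("37-42", "<27", "43+", "33-36", "37-42", "<27"),
  happy         = c(7, 8, 7, 6, 7, 4),
  lifesatisfied = c(9, 8, 8, 6, 5, 7)
)

# Fix the category order instead of relying on alphabetical sorting
scores$Working.hours <- factor(scores$Working.hours,
                               levels = c("<27", "27-32", "33-36", "37-42", "43+"))

# Mean score per category (an assumed way of summarising)
means <- aggregate(cbind(happy, lifesatisfied) ~ Working.hours,
                   data = scores, FUN = mean)

# Long format: one row per category and criterion
long <- data.frame(
  Working.hours = rep(means$Working.hours, times = 2),
  criterion     = rep(c("happy", "lifesatisfied"), each = nrow(means)),
  score         = c(means$happy, means$lifesatisfied)
)

p <- ggplot(long, aes(x = Working.hours, y = score, fill = criterion)) +
  geom_col(position = "dodge") +
  coord_flip()
p
```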
I have the values of the error bars, and I want to specify the values in "ggpubr". It seems like the add and error.plot functions have a lot of possibilities (e.g., "mean_sd"), but I couldn't find anything that will allow me to specify the values myself. I also tried geom_errorbar, but it doesn't work properly. I know next time I will use ggplot2 for flexibility.
example code -
df <- data.frame(stringsAsFactors = FALSE, "pse" = c(40, 42, 41, 40, 60, 61, 62, 60, 39, 38, 40, 39, 59, 58, 60, 59 ))
df[1:4,2]="30 cm"
df[5:8,2]="60 cm"
df[9:12,2]="30 cm"
df[13:16,2]="60 cm"
df[1:8,3] = "3.5 cm"
df[9:16,3] = "6.5 cm"
colnames(df)[2]="Size"
colnames(df)[3]="Distance"
my_comparisons <- list( c("Near", "Far"))
ggbarplot(df, x = "Size", y = "pse", fill ="Distance", color = "Distance", ylim=c(25,75), width = 0.6, add = c("mean_se", "jitter"), palette = c("#000000", "#111111"),
position = position_dodge(0.65))+
theme(legend.position = "top")+ theme_bw() + theme(axis.text=element_text(size=14),axis.title=element_text(size=14))+ scale_fill_grey(start=0.8, end=0.95)+ theme(legend.position = "top")+ ylab ("PSE (mm)")
Resulting plot: https://i.stack.imgur.com/AlrKa.jpg
library(ggpubr)
df <- data.frame(stringsAsFactors = FALSE, "pse" = c(40, 42, 41, 40, 60, 61, 62, 60, 39, 38, 40, 39, 59, 58, 60, 59 ))
df[1:4,2]="30 cm"
df[5:8,2]="60 cm"
df[9:12,2]="30 cm"
df[13:16,2]="60 cm"
df[1:8,3] = "3.5 cm"
df[9:16,3] = "6.5 cm"
colnames(df)[2]="Size"
colnames(df)[3]="Distance"
mean_30_3.5 <- mean(df$pse[df$Size == "30 cm" & df$Distance == "3.5 cm"])
mean_30_6.5 <- mean(df$pse[df$Size == "30 cm" & df$Distance == "6.5 cm"])
mean_60_3.5 <- mean(df$pse[df$Size == "60 cm" & df$Distance == "3.5 cm"])
mean_60_6.5 <- mean(df$pse[df$Size == "60 cm" & df$Distance == "6.5 cm"])
my_comparisons <- list( c("Near", "Far"))
ggbarplot(df, x = "Size", y = "pse", fill ="Distance",color = "Distance", ylim=c(25,75),label = F, width = 0.6, add = c("mean_se", "jitter"),
palette = c("#000000", "#111111"),
position = position_dodge(0.65))+
theme(legend.position = "top")+ theme_bw() + theme(axis.text=element_text(size=14),axis.title=element_text(size=14))+
scale_fill_grey(start=0.8, end=0.95)+
theme(legend.position = "top")+ ylab ("PSE (mm)") +
annotate("text", x = 0.85, y = mean_30_3.5 + 3, label = "your_value1")+
annotate("text", x = 1.15, y = mean_30_6.5 + 3, label = "your_value2")+
annotate("text", x = 1.85, y = mean_60_3.5 + 3, label = "your_value3")+
annotate("text", x = 2.15, y = mean_60_6.5 + 3, label = "your_value4")
Did you mean something like this?
Thank you!
I have also found a different solution. Sharing it here.
data_summary <- function(data, varname, groupnames){
require(plyr)
summary_func <- function(x, col){
c(mean = mean(x[[col]], na.rm=TRUE),
sd = sd(x[[col]], na.rm=TRUE))
}
data_sum<-ddply(data, groupnames, .fun=summary_func,
varname)
data_sum <- rename(data_sum, c("mean" = varname))
return(data_sum)
}
Next:
# `x` is assumed to be the data frame with columns PSE, Size and Distance
df2 <- data_summary(x, varname="PSE",
groupnames=c("Size", "Distance"))
df2$Size=as.factor(df2$Size)
my_comparisons <- list( c("Near", "Far"))
# Each `+` must end a line, not start one, or R stops parsing the plot early
ggbarplot(x, x = "Size", y = "PSE", fill ="Distance", color = "Distance",
ylim=c(25,75), width = 0.6, add = c("mean", "jitter"),
palette = c("#000000", "#111111"),
position = position_dodge(0.65)) +
theme_bw() +
theme(axis.text=element_text(size=14), axis.title=element_text(size=14)) +
scale_fill_grey(start=0.8, end=0.95) +
theme(legend.position = "top") +
ylab("PSE (mm)") +
geom_errorbar(data=df2, mapping=aes(x=Size, y=PSE, color=Distance,
ymin=PSE-0.32, ymax=PSE+0.32), width=.15, position=position_dodge(.6))
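For comparison, the same idea can be sketched in plain ggplot2, where the error-bar size is just another column of the summary data. This is only a sketch under assumptions: the column name `err` is made up, and the value 0.32 is reused from the code above as a placeholder for whatever error values you actually have.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(
  pse      = c(40, 42, 41, 40, 60, 61, 62, 60, 39, 38, 40, 39, 59, 58, 60, 59),
  Size     = rep(c("30 cm", "60 cm", "30 cm", "60 cm"), each = 4),
  Distance = rep(c("3.5 cm", "6.5 cm"), each = 8)
)

# One row per Size/Distance combination, holding the mean PSE
summ <- aggregate(pse ~ Size + Distance, data = df, FUN = mean)

# Your own error values go here (placeholder: one value per row of summ)
summ$err <- 0.32

p <- ggplot(summ, aes(x = Size, y = pse, fill = Distance)) +
  geom_col(position = position_dodge(width = 0.65), width = 0.6) +
  geom_errorbar(aes(ymin = pse - err, ymax = pse + err),
                position = position_dodge(width = 0.65), width = 0.15) +
  coord_cartesian(ylim = c(25, 75)) +
  ylab("PSE (mm)")
p
```

Using the same dodge width for the bars and the error bars keeps each error bar centred on its bar.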
Following is the dataframe for which I want to create a grouped barplot
df <- structure(list(Race = c("Caucasian/White", "African American", "Asian", "Other"), 'Hospital 1' = c(374, 820, 31, 108), 'Hospital 2' = c(291, 311, 5, 15), 'Hospital 3' = c(330, 206, 6, 5), 'Hospital 4' = c(950, 341, 6, 13)), class = "data.frame", row.names = c(NA, -4L))
To be precise, I want to group the bars for each hospital by 'Race'. Each hospital's bars should be shown as percentages, with their corresponding value labels.
Not a programmer basically, but trying to learn.
You probably want something like this:
df %>%
pivot_longer(contains("Hospital"), names_to = "hospital", values_to = "count") %>%
group_by(hospital) %>%
mutate(percent = count/sum(count)) %>%
ggplot() +
aes(x = hospital, y = percent, fill = Race) +
geom_col(position = "stack")
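The pipeline above stacks the bars and does not add labels. Since the question asks for grouped bars with value labels, here is a hedged sketch of that variant. It computes the within-hospital shares with base R's `prop.table()` and uses `scales::percent()` for the labels; the names `counts`, `pct` and `long` are illustrative.

```r
library(ggplot2)

df <- data.frame(
  Race = c("Caucasian/White", "African American", "Asian", "Other"),
  "Hospital 1" = c(374, 820, 31, 108),
  "Hospital 2" = c(291, 311, 5, 15),
  "Hospital 3" = c(330, 206, 6, 5),
  "Hospital 4" = c(950, 341, 6, 13),
  check.names = FALSE  # keep the spaces in the column names
)

# Share of each race within each hospital (every column sums to 1)
counts <- as.matrix(df[, -1])
pct <- prop.table(counts, margin = 2)

# Long format: as.vector() unrolls the matrix column by column
long <- data.frame(
  Race     = rep(df$Race, times = ncol(pct)),
  hospital = rep(colnames(pct), each = nrow(pct)),
  percent  = as.vector(pct)
)

p <- ggplot(long, aes(x = hospital, y = percent, fill = Race)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = scales::percent(percent, accuracy = 0.1)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  scale_y_continuous(labels = scales::percent)
p
```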
I'm having trouble getting my plot to display dates (i.e. 23/01) instead of weekday names (i.e. Thu). My dataset consists of dates and measurements of bat activity. I've converted the 'Date' column of my data with as.Date in the format "%d.%m.%y", but whenever I plot my graph I get weekday names instead of dates.
My code looks like this:
rdate<-as.Date(df,"%d.%m.%Y")
plot(df$Afromontane)
My plot ends up looking like this (below). It's all fine except I'd like the weekday names to be dates in the format (d/m).
df looks like this:
structure(list(Date = c("23.01.20", "24.01.20", "25.01.20", "26.01.20",
"27.01.20", "28.01.20", "29.01.20"), Afromontane = c(13.67, 0,
0, 1.67, 3.67, 22, 3.33), Milkwood = c(8.33, 3.67, 8, 8.33, 4.33,
6.33, 1)), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
A minimal example using ggplot2:
library(ggplot2)
df = data.frame(date = sample(seq(as.Date('2001/01/01'), as.Date('2003/01/01'), by="day"), 10), x = runif(10, 1, 10))
df$shortdate <- format(df$date, format="%m-%d")
ggplot(df, aes(x = shortdate, y = x)) +
geom_point()
Alternatively, using base R:
df = data.frame(date = sample(seq(as.Date('2001/01/01'), as.Date('2003/01/01'), by="day"), 10), x = runif(10, 1, 10))
plot(as.Date(df$date), df$x,xaxt = "n", type = "p")
axis(1, df$date, format(df$date, "%m-%d"))
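Applied to the data in the question, the base R approach might look like the sketch below. Two details differ from the question's code: the conversion is done on the `Date` column rather than on the whole data frame, and the format uses `%y` (lowercase), which matches a two-digit year such as 20, whereas `%Y` expects the full year. The name `bats` is illustrative.

```r
# Data from the question (Afromontane column only)
bats <- data.frame(
  Date = c("23.01.20", "24.01.20", "25.01.20", "26.01.20",
           "27.01.20", "28.01.20", "29.01.20"),
  Afromontane = c(13.67, 0, 0, 1.67, 3.67, 22, 3.33)
)

# Convert the text column to Date; %y reads the two-digit year
bats$Date <- as.Date(bats$Date, format = "%d.%m.%y")

# Suppress the default x axis, then draw one tick per date labelled as day/month
plot(bats$Date, bats$Afromontane, type = "b", xaxt = "n",
     xlab = "Date", ylab = "Afromontane")
axis(1, at = bats$Date, labels = format(bats$Date, "%d/%m"))
```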
I frequently have to produce stacked bar plots with labels. The way I've been coding the labels is very time intensive and I wondered if there was a way to code things more efficiently. I would like the labels to be centered on each section of the bars. I'd prefer base R solutions.
stemdata <- structure(list( #had to round some nums below for 100% bar
A = c(7, 17, 76),
B = c(14, 10, 76),
C = c( 14, 17, 69),
D = c( 4, 10, 86),
E = c( 7, 17, 76),
F = c(4, 10, 86)),
.Names = c("Food, travel, accommodations, and procedures",
"Travel itinerary and dates",
"Location of the STEM Tour stops",
"Interactions with presenters/guides",
"Duration of each STEM Tour stop",
"Overall quality of the STEM Tour"
),
class = "data.frame",
row.names = c(NA, -3L)) #4L=number of numbers in each letter vector#
# attach(stemdata)
print(stemdata)
par(mar=c(0, 19, 1, 2.1)) # this sets margins to allow long labels
barplot(as.matrix(stemdata),
beside = F, ylim = range(0, 10), xlim = range(0, 100),
horiz = T, col=colors, main="N=29",
border=F, las=1, xaxt='n', width = 1.03)
text(7, 2, "14%")
text(19, 2, "10%")
text(62, 2, "76%")
text(7, 3.2, "14%")
text(22.5, 3.2, "17%")
text(65.5, 3.2, "69%")
text(8, 4.4, "10%")
text(55, 4.4, "86%")
text(3.5, 5.6, "7%")
text(15, 5.6, "17%")
text(62, 5.6, "76%")
text(9, 6.9, "10%")
text(55, 6.9, "86%")
Staying base R as OP requested, we can easily automate the inner label positioning (i.e. x coordinates) within a small function.
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])
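To see what `xFun` does, take a single bar with segments 7, 17 and 76. Each label sits at the left edge of its segment (the cumulative sum of the previous segments) plus half the segment's width. A small check, with `widths` as an illustrative name:

```r
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])

# Segment widths of one stacked bar (first column of stemdata)
widths <- c(7, 17, 76)

# Left edges are 0, 7 and 24; adding half of each width gives the midpoints
mids <- xFun(widths)
mids
# 3.5, 15.5 and 62: the centres of the three segments
```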
Now, it's good to know that barplot invisibly returns the y coordinates of the bar midpoints; we can catch them by assignment (here byc <- barplot(.)).
Eventually, just assemble coordinates and labels in data frame labs and "loop" through the text calls in a sapply. (Use col="white" or col=0 for white labels as wished in the other question.)
# barplot
colors <- c("gold", "orange", "red")
par(mar=c(2, 19, 4, 2) + 0.1) # expand margins
byc <- barplot(as.matrix(stemdata), horiz=TRUE, col=colors, main="N=29", # assign `byc`
border=FALSE, las=1, xaxt='n')
# labels
labs <- data.frame(x=as.vector(sapply(stemdata, xFun)), # apply `xFun` here
y=rep(byc, each=nrow(stemdata)), # use `byc` here
labels=as.vector(apply(stemdata, 1:2, paste0, "%")),
stringsAsFactors=FALSE)
invisible(sapply(seq(nrow(labs)), function(x) # `invisible` prevents unneeded console output
text(x=labs[x, 1:2], labels=labs[x, 3], cex=.9, font=2, col=0)))
# legend (set `xpd=TRUE` to plot beyond margins!)
legend(-55, 8.5, legend=c("Medium","High", "Very High"), col=colors, pch=15, xpd=TRUE)
par(mar=c(5, 4, 4, 2) + 0.1) # finally better reset par to default
Result
Data
stemdata <- structure(list(`Food, travel, accommodations, and procedures` = c(7,
17, 76), `Travel itinerary and dates` = c(14, 10, 76), `Location of the STEM Tour stops` = c(14,
17, 69), `Interactions with presenters/guides` = c(4, 10, 86),
`Duration of each STEM Tour stop` = c(7, 17, 76), `Overall quality of the STEM Tour` = c(4,
10, 86)), class = "data.frame", row.names = c(NA, -3L))
Would you consider a tidyverse solution?
library(tidyverse) # for dplyr, tidyr, tibble & ggplot2
stemdata %>%
rownames_to_column(var = "id") %>%
gather(Var, Val, -id) %>%
group_by(Var) %>%
mutate(id = factor(id, levels = 3:1)) %>%
ggplot(aes(Var, Val)) +
geom_col(aes(fill = id)) +
coord_flip() +
geom_text(aes(label = paste0(Val, "%")),
position = position_stack(0.5))
Result: