How to make a lineplot with specific values out of a dataframe - r

I have a df as follow:
Variable Value
G1_temp_0 37.9
G1_temp_5 37.95333333
G1_temp_10 37.98333333
G1_temp_15 38.18666667
G1_temp_20 38.30526316
G1_temp_25 38.33529412
G1_mean_Q1 38.03666667
G1_mean_Q2 38.08666667
G1_mean_Q3 38.01
G1_mean_Q4 38.2
G2_temp_0 37.9
G2_temp_5 37.95333333
G2_temp_10 37.98333333
G2_temp_15 38.18666667
G2_temp_20 38.30526316
G2_temp_25 38.33529412
G2_mean_Q1 38.53666667
G2_mean_Q2 38.68666667
G2_mean_Q3 38.61
G2_mean_Q4 38.71
I like to make a lineplot with two lines which reflects the values "G1_mean_Q1 - G1_mean_Q4" and "G2_mean_Q1 - G2_mean_Q4"
In the end it should more or less look like this, the x axis should represent the different variables:
The main problem I have is, how to get a basic line plot with this df.
I've tried something like this,
ggplot(df, aes(x = c(1:4), y = Value) + geom_line()
but I have always some errors. It would be great if someone could help me. Thanks

Please post your data with dput(data) next time. it makes it easier to read your data into R.
You need to tell ggplot which are the groups. You can do this with aes(group = Sample). For this purpose, you need to restructure your dataframe a bit and separate the Variable into different columns.
library(tidyverse)
dat <- structure(list(Variable = structure(c(5L, 10L, 6L, 7L, 8L, 9L,
1L, 2L, 3L, 4L, 15L, 20L, 16L, 17L, 18L, 19L, 11L, 12L, 13L,
14L), .Label = c("G1_mean_Q1", "G1_mean_Q2", "G1_mean_Q3", "G1_mean_Q4",
"G1_temp_0", "G1_temp_10", "G1_temp_15", "G1_temp_20", "G1_temp_25",
"G1_temp_5", "G2_mean_Q1", "G2_mean_Q2", "G2_mean_Q3", "G2_mean_Q4",
"G2_temp_0", "G2_temp_10", "G2_temp_15", "G2_temp_20", "G2_temp_25",
"G2_temp_5"), class = "factor"), Value = c(37.9, 37.95333333,
37.98333333, 38.18666667, 38.30526316, 38.33529412, 38.03666667,
38.08666667, 38.01, 38.2, 37.9, 37.95333333, 37.98333333, 38.18666667,
38.30526316, 38.33529412, 38.53666667, 38.68666667, 38.61, 38.71
)), class = "data.frame", row.names = c(NA, -20L))
dat <- dat %>%
filter(str_detect(Variable, "mean")) %>%
separate(Variable, into = c("Sample", "mean", "time"), sep = "_")
g <- ggplot(data=dat, aes(x=time, y=Value, group=Sample)) +
geom_line(aes(colour=Sample))
g
Created on 2020-07-20 by the reprex package (v0.3.0)

Related

How to part diverging bar plots in R

Hi I am relatively new in R / ggplot2 and I would like to ask for some advice on how to create a plot that looks like this:
Explanation: A diverging bar plot showing biological functions with genes that have increased expression (yellow) pointing towards the right, as well as genes with reduced expression (purple) pointing towards the left. The length of the bars represent the number of differentially expressed genes, and color intensity vary according to their p-values.
Note that the x-axis must be 'positive' in both directions.
(In published literature on gene expression experimental studies, bars that point towards the left represent genes that have reduced expression, and right to show genes that have increased expression. The purpose of the graph is not to show the "magnitude" of change (which would give rise to positive and negative values). Instead, we are trying to plot the NUMBER of genes that have changes of expression, therefore cannot be negative)
I have tried ggplot2 but fails completely to reproduce the graph that is shown.
Here is the data which I am trying to plot: Click here for link
> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L,
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation",
"Autophagy", "Cell cycle", "Cell division", "Cell polarity",
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation",
"Protein metabolism", "Protein translation", "Proteolysis", "Replication",
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L,
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L,
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06,
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253,
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414,
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA,
-20L))
Thank you very much for your help and suggestions, this is really help with my learning.
My own humble attempt was this:
table <- read.delim("file.txt", header = T, sep = "\t")
library(ggplot2)
ggplot(aes(x=Number, y=Names)) +
geom_bar(stat="identity",position="identity") +
xlab("number of genes") +
ylab("Name"))
Result was error message regarding the aes
Although not exactly what you are looking for, but the following should get you started. #Genoa, as the expression goes, "there are no free lunches". So in this spirit, like #dww has rightly pointed out, show "some effort"!
# create dummy data
df <- data.frame(x = letters,y = runif(26))
# compute normalized occurence for letter
df$normalize_occurence <- round((df$y - mean(df$y))/sd(df$y), 2)
# categorise the occurence
df$category<- ifelse(df$normalize_occurence >0, "high","low")
# check summary statistic
summary(df)
x y normalize_occurence
a : 1 Min. :0.00394 Min. :-1.8000000
b : 1 1st Qu.:0.31010 1st Qu.:-0.6900000
c : 1 Median :0.47881 Median :-0.0800000
d : 1 Mean :0.50126 Mean : 0.0007692
e : 1 3rd Qu.:0.70286 3rd Qu.: 0.7325000
f : 1 Max. :0.93091 Max. : 1.5600000
(Other):20
category
Length:26
Class :character
Mode :character
ggplot(df,aes(x = x,y = normalize_occurence)) +
geom_bar(aes(fill = category),stat = "identity") +
labs(title= "Diverging Bars")+
coord_flip()
#ddw and #Ashish are right - there's a lot in this question. It's also not clear how ggplot "failed" in reproducing the figure, and that would help understand what you're struggling with.
The key to ggplot is that pretty much everything that you want to include in the plotting should be included in the data. Adding a few variables to your table to help with putting bars in the right direction will get you a long way toward what you want. Make the variables that are actually negative ("down" values) negative, and they'll plot that way:
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$Count*-1,r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$PValue*-1,r_sample$PValue)
Then reorder your "Name" so that it plots according to the new PValue2 variable:
r_sample$Name <- factor(r_sample$Name, r_sample$Name[order(r_sample$PValue2)], ordered=T)
Lastly, you'll want to left-justify some labels and right-justify others, so make that a variable now:
r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)
Then some fairly minimal plot code gets you quite close to what you want:
ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
geom_bar(stat="identity") +
scale_y_continuous("Number of Differently Regulated Genes", position="top", limits=c(-100,225), labels=c(100,0,100,200)) +
scale_x_discrete("", labels=NULL) +
scale_fill_gradient2(low="blue", mid="light grey", high="yellow", midpoint=0) +
coord_flip() +
theme_minimal() +
geom_text(aes(x=Name, y=0, label=Name), hjust=r_sample$just)
You can explore the theme commands on the ggplot2 help page to figure out the rest of the formatting.

geom_smooth does not plot a line for my data frame

I have a dataframe with the following data
my2016.regression.dataframe <- structure(list(Economy_Directorate = structure(c(9L, 1L, 18L,
11L, 5L, 7L), .Label = c("20128895", "25392278", "26802176",
"33214069", "34194316", "34863777", "34867843", "36497785", "37280694",
"37411816", "44460126", "45484123", "47463441", "48354697", "57954259",
"60187650", "65135916", "67317188"), class = "factor"), People_Directorate = structure(c(12L,
14L, 17L, 16L, 13L, 15L), .Label = c("20128895", "25392278",
"26802176", "33214069", "34194316", "34863777", "34867843", "36497785",
"37280694", "37411816", "44460126", "45484123", "47463441", "48354697",
"57954259", "60187650", "65135916", "67317188"), class = "factor")), .Names = c("Economy_Directorate",
"People_Directorate"), row.names = c(NA, -6L), class = "data.frame")
I used the following code to plot it. it plotts the points, but it does not plot the lm .
Could you help me why it does not plot the the lm in the geom_smooth
library(ggplot2)
ggplot(data =my2016.regression.dataframe )+
geom_point(aes(y=Economy_Directorate,x=People_Directorate))+
geom_smooth(method = "lm",aes(y=Economy_Directorate,x=People_Directorate),
fill="orange",colour="red")
Regards,
You need to convert your columns to numeric types. They are currently factors:
my2016.regression.dataframe$Economy_Directorate = as.numeric(as.character(my2016.regression.dataframe$Economy_Directorate))
my2016.regression.dataframe$People_Directorate = as.numeric(as.character(my2016.regression.dataframe$People_Directorate))
ggplot(data = my2016.regression.dataframe) +
geom_point(aes(y=Economy_Directorate,x=People_Directorate))+
geom_smooth(method = "lm",aes(y=Economy_Directorate,x=People_Directorate),
fill="orange",colour="red")

Plotting multiple lines in R

I'm pretty new to R so I don't really know what I'm doing. Anyway, I have data in this format in excel (as a csv file):
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
I want to plot a graph of cover against depth, with a different coloured line for each species, and a key for which species is which colour. I don't even know where to start.
Sorry if something similar has been asked before. Any help would be much appreciated!
Don't know if this is in a helpful format but here's some of the actual data, I need to read more about dput I think:
structure(list(species = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L), .Label = c("Agaricia fragilis", "bryozoan", "Dichocoenia stokesi",
"Diploria labyrinthiformis", "Diploria strigosa", "Madracis decactis",
"Manicina", "Montastrea cavernosa", "Orbicella franksi", "Porites asteroides",
"Siderastrea radians"), class = "factor"), cover = c(0.021212121,
0.04047619, 0, 0, 0, 0, 1.266666667, 4.269047619, 3.587878788,
3.25, 0.118181818, 0.152380952, 0, 0.007142857, 3.806060606,
2.983333333, 14.13030303, 15.76190476, 0.415151515, 0.2, 0.26969697,
0.135714286), depth = c(30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L,
30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L,
15L)), .Names = c("species", "cover", "depth"), row.names = c(NA,
22L), class = "data.frame")
Here is a solution using the ggplot2 package.
# Load packages
library(ggplot2)
# Create example data frame based on the original example the OP provided
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
# Plot the data
ggplot(dt, aes(x = depth, y = cover, group = species, colour = species)) +
geom_line()
This should get you going!
df1 <- read.csv("//file_location.csv", headers=T)
library(dplyr)
df1 <- df1 %>% select(species, depth) %>% group_by(species) %>%
summarise(mean(depth)
library(ggplot2)
ggplot(df1, aes(x=depth, y=species, group=species, color=species) +
geom_line()

wrong linking point with lines in ggplot

I don't know what I'm missing but I cannot figure out a very simple task. This is a small piece of my dataframe:
dput(df)
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "SOU55", class = "factor"), Depth = c(2L, 4L,
6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L), Value = c(211.8329815,
278.9603866, 255.6111086, 212.6163368, 193.7281895, 200.9584658,
160.9289157, 192.0664419, 174.5951019, 7.162682425)), .Names = c("ID",
"Depth", "Value"), class = "data.frame", row.names = c(NA, -10L
))
What I'm trying to do is simply plotting Depth versus Value with ggplot, this is the simple code:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_line()
and this the result:
But it is pretty different from what I really want. This is the plot made with Libreoffice:
It seems that ggplot doesn't link correctly the values. What am I doing wrong?
Thanks to all!
You need geom_path() to connect the observations in the original order. geom_line() sorts the data according to the x-aesthetic before plotting:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_path()

ggplot: better presentation of barplot

I have a small data frame DF which consists of two columns X=Type, Y=Cost. want to graph a barplot for each type with its cost. I have managed to do that, however, I'm seeking a better presentation in barplot. I have three issues which I think will satisfy my requirements:
1) Since the X-axis text for each type is long, I made them with 45 degree. I tried abbreviation, it was unreadable !!!
2) Instead of the color, I was trying to use filling patterns/texture in ggplot, which turns out not possible by Hadley : fill patterns
Is there any way to make the plot readable in case of black/white printing ?
3) I'm wondering if there is a way to focus on one of the "Type" categories i.e. make it bold and special color to attract the eye to this special type. For example, I want to make the "other" result looks different from other.
I tried my thoughts, however, I'm totally open to re-design the graph. Any suggestions
Here is the data- I have used dput command:
structure(list(Type = structure(c(6L, 8L, 7L, 9L, 10L, 15L, 11L,
17L, 3L, 16L, 5L, 19L, 4L, 14L, 2L, 18L, 13L, 1L, 12L), .Label = c("Backup Hardware ",
"data or network control", "Email exchange server/policy", "Instant messaging control",
"Login/system administrators/privilage", "Machine A", "Machine A with Software and Camera",
"Machine A without Software", "Machine B", "Machine B without RAM and CD ROM",
"Managment analyses software ", "Other", "Password and security",
"public web application availability", "Software for backup",
"System access by employees ", "Telecom and Harware", "Web site update",
"wireless network access ponits"), class = "factor"), Cost = structure(c(4L,
3L, 15L, 13L, 11L, 7L, 2L, 1L, 19L, 16L, 14L, 12L, 10L, 9L, 8L,
6L, 5L, 17L, 18L), .Label = c("$1,292,312", "$1,888,810", "$11,117,200",
"$14,391,580", "$161,210", "$182,500", "$2,145,900", "$250,000",
"$270,500", "$298,810", "$3,452,010", "$449,001", "$6,034,000",
"$621,710", "$7,642,660", "$700,000", "$85,100", "$885,000",
"$923,700"), class = "factor")), .Names = c("Type", "Cost"), class = "data.frame", row.names = c(NA,
-19L))
here is my code in R:
p<- ggplot(data = DF, aes(x=Type, y=Cost)) +
geom_bar(aes(fill=Type),stat="identity") +
geom_line()+
scale_x_discrete (name="Type")+
scale_y_discrete(name="Cost")+
theme(axis.text.x = element_text(colour="black",size=11,face="bold")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
labs(fill=expression(paste("Type of study\n")))
print(p)
Here are some starting points for your plot:
1) First, converted variable Cost from factor to numeric and named it Cost2
DF$Cost2<-as.numeric(gsub("[^0-9]", "",DF$Cost))
2) Converted your plot to grey scale using scale_fill_manual() - here all bars are grey except bar for Other that is black. With scale_y_continuous() made y axis values again as dollars with labels=dollar (for this you need to add library scales). To make Other label of x axis bold while others are normal you should provide argument face= inside theme() axis.text.x= with vector of the same length as number of levels - 1 for all levels and 2 for level Other.
library(scales)
ggplot(data = DF, aes(x=Type, y=Cost2)) +
geom_bar(aes(fill=Type),stat="identity",show_guide=FALSE) +
theme(axis.text.x = element_text(angle = 45, hjust = 1,
face=(as.numeric(levels(DF$Type)=="Other") + 1)))+
scale_y_continuous(labels=dollar)+
scale_fill_manual(values=c("grey43","black")[as.numeric(levels(DF$Type)=="Other")+1])

Resources