I'm pretty new to R, so I don't really know what I'm doing. I have data in this format in Excel (saved as a CSV file):
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
I want to plot a graph of cover against depth, with a different coloured line for each species, and a key for which species is which colour. I don't even know where to start.
Sorry if something similar has been asked before. Any help would be much appreciated!
I don't know if this is a helpful format, but here is some of the actual data (I think I need to read more about dput):
structure(list(species = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L), .Label = c("Agaricia fragilis", "bryozoan", "Dichocoenia stokesi",
"Diploria labyrinthiformis", "Diploria strigosa", "Madracis decactis",
"Manicina", "Montastrea cavernosa", "Orbicella franksi", "Porites asteroides",
"Siderastrea radians"), class = "factor"), cover = c(0.021212121,
0.04047619, 0, 0, 0, 0, 1.266666667, 4.269047619, 3.587878788,
3.25, 0.118181818, 0.152380952, 0, 0.007142857, 3.806060606,
2.983333333, 14.13030303, 15.76190476, 0.415151515, 0.2, 0.26969697,
0.135714286), depth = c(30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L,
30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L,
15L)), .Names = c("species", "cover", "depth"), row.names = c(NA,
22L), class = "data.frame")
Here is a solution using the ggplot2 package.
# Load packages
library(ggplot2)
# Create example data frame based on the original example the OP provided
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
# Plot the data
ggplot(dt, aes(x = depth, y = cover, group = species, colour = species)) +
geom_line()
This should get you going!
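# Read the CSV exported from Excel (the argument is header, not headers)
df1 <- read.csv("//file_location.csv", header = TRUE)
library(dplyr)
# Average cover for each species at each depth
df1 <- df1 %>% select(species, depth, cover) %>% group_by(species, depth) %>%
summarise(cover = mean(cover), .groups = "drop")
library(ggplot2)
ggplot(df1, aes(x = depth, y = cover, group = species, color = species)) +
geom_line()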
I'm new to R and I'm trying to define a function that calls another function from an R package (pgls and sma). I'm not sure how to do it, or even whether it is possible.
I have tried the following:
For pgls
getpgls <- function(P1, P2, dataf){
PGLSt <- pgls(log(P1)~log(P2), data = dataf, lambda = 'ML')
}
When I call the function:
getpgls(sym('Long'), sym('massAvg'), CompData)
I get:
Error in log(P1) : non-numeric argument to mathematical function
Something similar happens with the sma function:
getsma <- function(P1, P2, dataf){
SMAt <- sma(P1~P2,
log = "xy",
data = dataf,
)
}
when I call the function:
getsma(sym('Long'), sym('massAvg'), Data_Animal_de_pd)
I get the following error:
Error in model.frame.default(formula = P1 ~ P2, data = dataf, drop.unused.levels = TRUE) :
object is not a matrix
When I run both pgls and sma with the same arguments, but outside the function, they run just fine.
i.e.
Long.SMA <- sma(Long~massAvg,
log = "xy",
data = Data_Animal_de_pd,
)
and
Long.PGLS = pgls(log(Long)~log(massAvg), data = CompData, lambda = 'ML')
EDIT:
Here I include small versions of CompData and Data_Animal_de_pd (only with 10 animals and the parameters massAvg and Long).
The class of CompData is "comparative.data" and comes from a function comparative.data which connects a phylogenetic tree with another data frame (Data_Animal_de_pd).
> dput(CompData)
structure(list(phy = structure(list(edge = structure(c(11L, 12L,
13L, 14L, 14L, 15L, 15L, 16L, 17L, 17L, 16L, 13L, 12L, 18L, 18L,
11L, 19L, 19L, 12L, 13L, 14L, 1L, 15L, 2L, 16L, 17L, 3L, 4L,
5L, 6L, 18L, 7L, 8L, 19L, 9L, 10L), dim = c(18L, 2L)), edge.length = c(100.597661,
5.254328, 4.311278, 71.0845800943, 34.327960646, 36.7566030561,
5.779375747, 15.0619109945, 15.9153248095, 15.9153245794, 30.9772360366,
75.39586827, 44.21113726, 36.439042146, 36.4390420969, 108.977279909,
72.27059073, 72.270578302), Nnode = 9L, tip.label = c("Tupaia_minor",
"Hystrix_cristata", "Geocapromys_brownii", "Myocastor_coypus",
"Hydrochoerus_hydrochaeris", "Rhinoceros_sondaicus", "Dasypus_hybridus",
"Tolypeutes_matacus", "Caluromysiops_irrupta", "Acrobates_pygmaeus"
), node.label = 11:19), class = "phylo", order = "cladewise"),
data = structure(list(massAvg = c(0.045, 20, 1.5, 7.5, 50.5,
1350, 5.5, 1.5, 0.45, 0.01), Long = c(21.565, 110.4, 55.52,
68.3266666666667, 175.2, 447.4, 47.02, 44.68, 38.58, 12.67
)), row.names = c("Tupaia_minor", "Hystrix_cristata", "Geocapromys_brownii",
"Myocastor_coypus", "Hydrochoerus_hydrochaeris", "Rhinoceros_sondaicus",
"Dasypus_hybridus", "Tolypeutes_matacus", "Caluromysiops_irrupta",
"Acrobates_pygmaeus"), class = "data.frame"), data.name = "datanm2[, c(\"massAvg\", \"Long\", \"Sci_name2\")]",
phy.name = "newphy", dropped = list(tips = character(0),
unmatched.rows = character(0))), class = "comparative.data")
Data_Animal_de_pd is a data frame that contains the information of the animals such as the length of the bones, etc.
> dput(Data_Animal_de_pd)
structure(list(massAvg = c(20, 50.5, 7.5, 1350, 0.45, 0.045,
1.5, 5.5, 1.5, 0.01), Long = c(110.4, 175.2, 68.3266666666667,
447.4, 38.58, 21.565, 55.52, 47.02, 44.68, 12.67), Sci_name = c("Hystrix cristata",
"Hydrochoerus hydrochaeris", "Myocastor coypus", "Rhinoceros sondaicus",
"Caluromysiops irrupta", "Tupaia minor", "Geocapromys brownii",
"Dasypus hybridus", "Tolypeutes matacus", "Acrobates pygmaeus"
), Sci_name2 = c("Hystrix_cristata", "Hydrochoerus_hydrochaeris",
"Myocastor_coypus", "Rhinoceros_sondaicus", "Caluromysiops_irrupta",
"Tupaia_minor", "Geocapromys_brownii", "Dasypus_hybridus", "Tolypeutes_matacus",
"Acrobates_pygmaeus")), row.names = c("10137", "10149", "10157",
"102233", "126286", "143289", "1543402", "1756220", "183749",
"190720"), class = "data.frame")
To make your function work with symbols (I assume they come from rlang::sym), you must inject them with rlang::inject:
getsma <- function(P1, P2, dataf){
SMAt <- rlang::inject(sma(!!P1 ~ !!P2,
log = "xy",
data = dataf,
))
}
Alternatively, you can capture the arguments with rlang::enexpr and inject those:
getsma <- function(P1, P2, dataf){
P1 <- rlang::enexpr(P1)
P2 <- rlang::enexpr(P2)
SMAt <- rlang::inject(sma(!!P1 ~ !!P2,
log = "xy",
data = dataf,
))
}
Then call the function with bare column names:
getsma(Long, massAvg, Data_Animal_de_pd)
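The same capture-and-inject pattern should carry over to the pgls case, where the columns are wrapped in log(). Here is a minimal sketch of that idea. To keep it runnable without caper, lm stands in for pgls, and getfit and toy are hypothetical names made up for illustration (toy holds the first four rows of Data_Animal_de_pd):

```r
library(rlang)

# Stand-in data: first four rows of massAvg and Long from the question
toy <- data.frame(massAvg = c(20, 50.5, 7.5, 1350),
                  Long = c(110.4, 175.2, 68.3266666666667, 447.4))

# Hypothetical helper: lm() is used in place of pgls() so this runs without caper
getfit <- function(P1, P2, dataf) {
  P1 <- enexpr(P1)  # capture the bare column name as an expression
  P2 <- enexpr(P2)
  # inject() splices the captured names into the formula before lm() sees it
  inject(lm(log(!!P1) ~ log(!!P2), data = dataf))
}

fit <- getfit(Long, massAvg, toy)
names(coef(fit))
```

Assuming pgls handles its formula the same way, swapping lm(...) for pgls(..., lambda = 'ML') and passing CompData should give the getpgls you were after, but I have not tested that.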
I would like to plot an interactive heatmap, where the column widths are different.
Although I managed to get different cell widths, the widths do not correspond to the values and the ordering is not correct.
The order of the x-axis should remain the same as the segments column in the df data.frame.
If the heatmap doesn't work, I would also be fine with a stacked barchart.
df <- structure(list(
segments = c(101493L, 101493L, 101493L, 101492L, 101492L, 101492L, 101494L, 101494L, 101494L, 102018L, 102018L,
102018L, 102018L, 102018L, 102019L, 102019L, 102019L, 102019L, 102019L),
timestamp = structure(c(1579233600, 1579240800, 1579248000,
1579233600, 1579240800, 1579248000, 1579233600, 1579240800, 1579248000,
1579219200, 1579226400, 1579233600, 1579240800, 1579248000, 1579219200,
1579226400, 1579233600, 1579240800, 1579248000), class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin"),
value = c(91.772, 91.923, 96.968, 104.307, 101.435, 105.539, 104.879, 104.197, 103.038,
96.403, 90.926, 111.807, 115.931, 111.729, 100.129, 86.903, 108.22, 117.841, 112.293),
width = c(5L, 5L, 5L, 2L, 2L, 2L, 3L, 3L, 3L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L)),
row.names = c(1L, 2L, 3L, 11L, 12L, 13L, 21L, 22L, 23L, 31L, 32L, 33L, 34L, 35L,43L, 44L, 45L, 46L, 47L),
class = "data.frame")
library(plotly)
plot_ly(data = df) %>%
add_trace(type="heatmap",
x = ~as.character(width),
y = ~timestamp,
z = ~value,
xgap = 0.2, ygap = 0.2) %>%
plotly::layout(xaxis = list(rangemode = "nonnegative",
tickmode = "array",
tickvals=as.character(unique(df$width)),
ticktext=as.character(unique(df$segments)),
zeroline = FALSE))
By giving Plotly a matrix for the z-values it seems to work and the widths are respected.
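# Right edge of each segment's column: cumulative width, repeated once per row of that segment
df$newx <- rep(cumsum(df[!duplicated(df$segments),]$width), rle(df$segments)$lengths)
mappdf <- expand.grid(timestamp=unique(df$timestamp), newx=unique(df$newx))
# Left join keeps every timestamp/newx combination; missing cells become NA
mappdf <- merge(mappdf, df[,c("timestamp","value","newx")], all.x = TRUE, all.y = FALSE, sort = FALSE)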
mappdf <- mappdf[order(mappdf$newx, mappdf$timestamp),]
zvals <- matrix(data = mappdf$value,
nrow = length(unique(df$timestamp)),
ncol = length(unique(df$newx)))
plot_ly() %>%
add_heatmap(y = sort(unique(df$timestamp)),
x = c(0,unique(df$newx)),
z = zvals) %>%
plotly::layout(xaxis = list(
title = "",
tickvals=unique(df$newx),
ticktext=paste(unique(df$segments), "-", unique(df$width))
))
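To see why the widths come out right, it may help to look at the x positions on their own. This is a small base R sketch using the five segments and widths from the question; right_edge and x_breaks are names I made up for illustration:

```r
# One entry per segment, in the order they appear in df
segments <- c(101493L, 101492L, 101494L, 102018L, 102019L)
width <- c(5L, 2L, 3L, 10L, 9L)

# Each column ends where the running total of widths ends
right_edge <- cumsum(width)

# Prepending 0 gives one more x value than there are columns,
# so consecutive pairs of x values mark each column's left and right edge
x_breaks <- c(0, right_edge)

# The spacing between breaks recovers the original widths
diff(x_breaks)
```

As far as I can tell, this is why x is passed as c(0, unique(df$newx)) in the answer: the heatmap gets column boundaries rather than column centres.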
I have a df as follows:
Variable Value
G1_temp_0 37.9
G1_temp_5 37.95333333
G1_temp_10 37.98333333
G1_temp_15 38.18666667
G1_temp_20 38.30526316
G1_temp_25 38.33529412
G1_mean_Q1 38.03666667
G1_mean_Q2 38.08666667
G1_mean_Q3 38.01
G1_mean_Q4 38.2
G2_temp_0 37.9
G2_temp_5 37.95333333
G2_temp_10 37.98333333
G2_temp_15 38.18666667
G2_temp_20 38.30526316
G2_temp_25 38.33529412
G2_mean_Q1 38.53666667
G2_mean_Q2 38.68666667
G2_mean_Q3 38.61
G2_mean_Q4 38.71
I would like to make a line plot with two lines that reflect the values "G1_mean_Q1 - G1_mean_Q4" and "G2_mean_Q1 - G2_mean_Q4".
In the end it should more or less look like this, the x axis should represent the different variables:
The main problem I have is, how to get a basic line plot with this df.
I've tried something like this,
ggplot(df, aes(x = c(1:4), y = Value) + geom_line()
but I always get errors. It would be great if someone could help me. Thanks
Please post your data with dput(data) next time. It makes it easier to read your data into R.
You need to tell ggplot which are the groups. You can do this with aes(group = Sample). For this purpose, you need to restructure your dataframe a bit and separate the Variable into different columns.
library(tidyverse)
dat <- structure(list(Variable = structure(c(5L, 10L, 6L, 7L, 8L, 9L,
1L, 2L, 3L, 4L, 15L, 20L, 16L, 17L, 18L, 19L, 11L, 12L, 13L,
14L), .Label = c("G1_mean_Q1", "G1_mean_Q2", "G1_mean_Q3", "G1_mean_Q4",
"G1_temp_0", "G1_temp_10", "G1_temp_15", "G1_temp_20", "G1_temp_25",
"G1_temp_5", "G2_mean_Q1", "G2_mean_Q2", "G2_mean_Q3", "G2_mean_Q4",
"G2_temp_0", "G2_temp_10", "G2_temp_15", "G2_temp_20", "G2_temp_25",
"G2_temp_5"), class = "factor"), Value = c(37.9, 37.95333333,
37.98333333, 38.18666667, 38.30526316, 38.33529412, 38.03666667,
38.08666667, 38.01, 38.2, 37.9, 37.95333333, 37.98333333, 38.18666667,
38.30526316, 38.33529412, 38.53666667, 38.68666667, 38.61, 38.71
)), class = "data.frame", row.names = c(NA, -20L))
dat <- dat %>%
filter(str_detect(Variable, "mean")) %>%
separate(Variable, into = c("Sample", "mean", "time"), sep = "_")
g <- ggplot(data=dat, aes(x=time, y=Value, group=Sample)) +
geom_line(aes(colour=Sample))
g
Created on 2020-07-20 by the reprex package (v0.3.0)
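If the separate step is unclear, here is roughly what it does, sketched in base R on four of the mean rows from the question. The names vars, parts and long are only for illustration:

```r
# Four of the "mean" variables and their values from the question
vars <- c("G1_mean_Q1", "G1_mean_Q2", "G2_mean_Q1", "G2_mean_Q2")
vals <- c(38.03666667, 38.08666667, 38.53666667, 38.68666667)

# Split each name at the underscores and stack the pieces into a matrix
parts <- do.call(rbind, strsplit(vars, "_", fixed = TRUE))
colnames(parts) <- c("Sample", "mean", "time")

# One row per measurement, with Sample available as the grouping column
long <- data.frame(parts, Value = vals)
long
```

Once Sample is its own column, ggplot can draw one line per sample via group = Sample.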
I am trying to use geom_label_repel to add labels to a couple of data points on a plot. In this case, they happen to be outliers on box plots. I've got most of the code working, I can label the outlier, but for some reason I am getting multiple labels (equal to my sample size for the entire data set) mapped to that point. I'd like just one label for this outlier.
Example:
Here is my data:
dput(sus_dev_data)
structure(list(time_point = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("3", "8", "12"), class = "factor"),
days_to_pupation = c(135L, 142L, 143L, 155L, 149L, 159L,
153L, 171L, 9L, 67L, 53L, 49L, 72L, 67L, 55L, 64L, 60L, 122L,
53L, 51L, 49L, 53L, 50L, 56L, 44L, 47L, 60L)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 20L, 21L, 22L, 23L, 24L, 26L, 27L, 28L, 29L, 30L), class = "data.frame")
and my code...
####################################################################################################
# Time to pupation statistical analysis
####################################################################################################
## linear model
pupation_Model=lm(sus_dev_data$days_to_pupation~sus_dev_data$time_point)
pupationANOVA=aov(pupation_Model)
summary(pupationANOVA)
# Tukey test to study each pair of treatment :
pupationTUKEY <- TukeyHSD(x=pupationANOVA, which = 'sus_dev_data$time_point',
conf.level=0.95)
## Function to generate significance labels on box plot
generate_label_df <- function(pupationTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- pupationTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels, reversed = TRUE)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
#generate labels using function
labels<-generate_label_df(pupationTUKEY , "sus_dev_data$time_point")
#rename columns for merging
names(labels)<-c('Letters','time_point')
# obtain letter position for y axis using group maxima
pupationyvalue<-aggregate(.~time_point, data=sus_dev_data, max)
#merge dataframes
pupationfinal<-merge(labels,pupationyvalue)
####################################################################################################
# Time to pupation plot
####################################################################################################
# Plot of data
(pupation_plot <- ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(aes(x = 1, y = 9),
label = '1')
)
Here's a shorter example to demonstrate what is going on. Essentially, your labels are being recycled to be the same length as the data.
df = data.frame(x=1:5, y=1:5)
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(aes(x = 1, y = 1), label = '1')
You can override this by providing new data for the ggrepel layer:
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(data = data.frame(x=1, y=1), label = '1')
Based on your data, you have 3 outliers (one in each group). You can identify them by applying John Tukey's classic definition of an outlier (upper: above Q3 + 1.5*IQR; lower: below Q1 - 1.5*IQR), but you are free to set your own rules. The functions quantile and IQR give you those cutoffs.
Here, I incorporated them in a sequence of pipes using the dplyr package:
library(tidyverse)
Outliers <- sus_dev_data %>% group_by(time_point) %>%
mutate(Out_up = ifelse(days_to_pupation > quantile(days_to_pupation,0.75)+1.5*IQR(days_to_pupation), "Out","In"))%>%
mutate(Out_Down = ifelse(days_to_pupation < quantile(days_to_pupation,0.25)-1.5*IQR(days_to_pupation), "Out","In")) %>%
filter(Out_up == "Out" | Out_Down == "Out")
# A tibble: 3 x 4
# Groups: time_point [3]
time_point days_to_pupation Out_up Out_Down
<fct> <int> <chr> <chr>
1 3 9 In Out
2 8 122 Out In
3 12 60 Out In
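To make the rule concrete, here is the same check in base R for the first group only (time_point "3"), with the values copied from sus_dev_data. is_outlier is a hypothetical helper name, not something from a package:

```r
# days_to_pupation for time_point "3", copied from the question
days <- c(135, 142, 143, 155, 149, 159, 153, 171, 9)

# Hypothetical helper: TRUE where a value lies outside Tukey's fences
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  fence <- 1.5 * IQR(x)
  x < q[1] - fence | x > q[2] + fence
}

# Only the value 9 falls below the lower fence (Q1 = 142, IQR = 13)
days[is_outlier(days)]
```

This uses R's default quantile type, which I assume is also what the pipeline above relies on.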
As mentioned by @dww, you need to pass a new data frame to geom_label_repel if you want each outlier to be labeled only once. So, here we feed the Outliers data frame to geom_label_repel:
library(ggplot2)
library(ggrepel)
ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
#Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(inherit.aes = FALSE,
data = Outliers,
aes(x = time_point, y = days_to_pupation, label = "Out"))
And you get the following graph:
I hope this helps you figure out how to label all your outliers.
I have a dataframe with the following data
my2016.regression.dataframe <- structure(list(Economy_Directorate = structure(c(9L, 1L, 18L,
11L, 5L, 7L), .Label = c("20128895", "25392278", "26802176",
"33214069", "34194316", "34863777", "34867843", "36497785", "37280694",
"37411816", "44460126", "45484123", "47463441", "48354697", "57954259",
"60187650", "65135916", "67317188"), class = "factor"), People_Directorate = structure(c(12L,
14L, 17L, 16L, 13L, 15L), .Label = c("20128895", "25392278",
"26802176", "33214069", "34194316", "34863777", "34867843", "36497785",
"37280694", "37411816", "44460126", "45484123", "47463441", "48354697",
"57954259", "60187650", "65135916", "67317188"), class = "factor")), .Names = c("Economy_Directorate",
"People_Directorate"), row.names = c(NA, -6L), class = "data.frame")
I used the following code to plot it. It plots the points, but it does not plot the lm.
Could you help me understand why the lm line from geom_smooth is not drawn?
library(ggplot2)
ggplot(data =my2016.regression.dataframe )+
geom_point(aes(y=Economy_Directorate,x=People_Directorate))+
geom_smooth(method = "lm",aes(y=Economy_Directorate,x=People_Directorate),
fill="orange",colour="red")
Regards,
You need to convert your columns to numeric types. They are currently factors:
my2016.regression.dataframe$Economy_Directorate = as.numeric(as.character(my2016.regression.dataframe$Economy_Directorate))
my2016.regression.dataframe$People_Directorate = as.numeric(as.character(my2016.regression.dataframe$People_Directorate))
ggplot(data = my2016.regression.dataframe) +
geom_point(aes(y=Economy_Directorate,x=People_Directorate))+
geom_smooth(method = "lm",aes(y=Economy_Directorate,x=People_Directorate),
fill="orange",colour="red")
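The as.character step matters. A small sketch of why, using three of the values from your data (f is just an illustrative name):

```r
# A factor built from number-like strings, as in the question's columns
f <- factor(c("37280694", "20128895", "67317188"))

# Direct conversion returns the internal level codes, not the values
codes <- as.numeric(f)

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(f))

codes
values
```

So calling as.numeric on the factor directly would have run without error, but the regression would be fitted to level codes instead of your data.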