I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has varying x and y coordinates, I cannot simply have a large data-frame on which to perform usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with for example 5 elements, each element containing a nx2 data frame with column 1 the x-coordinates and column 2 the y-coordinates. To plot each curve, I create a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
# CREATION OF DATA
plevel0.5 = cbind(c(0,1),c(0,1))
colnames(plevel0.5) = c("x","y")
plevel0.8 = cbind(c(0.5,3),c(0.5,1.5))
colnames(plevel0.8) = c("x","y")
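# geom_line() needs data frames, so convert the matrices before storing them
data = list(data1 = as.data.frame(plevel0.5), data2 = as.data.frame(plevel0.8))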
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data=data[[i]],mapping=aes(x=x,y=y))
}
Thank you in advance and let me know what needs to be clarified.
EDIT :
I have now attempted the following :
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?
Your approach is probably a bit too complicated. As far as I understand it, you could work with one dataset and use the group aesthetic to get the same result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
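The error from the question's edit ("geom_text_repel requires the following missing aesthetics: x and y") most likely comes from mapping x and y only inside geom_line(): layers do not share aesthetics with each other, they only inherit what is mapped in ggplot(). Below is a minimal sketch of the same idea as the answer, with the label positions computed in a separate data frame so they can be inspected. The names curves, plevel and label_pts are made up for this illustration, and geom_text() could presumably be swapped for ggrepel's geom_text_repel() if labels overlap.

```r
library(dplyr)
library(ggplot2)

# Toy data from the answer above, bound into one data frame with an id column
curves <- bind_rows(
  list(
    "0.5" = data.frame(x = c(0, 1),   y = c(0, 1)),
    "0.8" = data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
  ),
  .id = "plevel"
)

# One row per curve: the point with the largest x (hypothetical helper name)
label_pts <- curves |>
  group_by(plevel) |>
  slice_max(x, n = 1, with_ties = FALSE) |>
  ungroup()

# x and y are mapped in ggplot(), so every layer inherits them
p <- ggplot(curves, aes(x = x, y = y, group = plevel, colour = plevel)) +
  geom_line() +
  geom_text(data = label_pts, aes(label = plevel),
            vjust = -0.5, show.legend = FALSE)
```

Printing label_pts before plotting is a quick way to check that each curve gets exactly one label.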
I have a list of model-output in R that I want to plot using ggplot. I want to produce a scatter plot within which every column of data is a different colour. In the example here, I have three model outputs which I want to plot against 'measured'. What I want in the end is a scatter with three different 'clouds' of points, each of which is a different colour. Here is a reproducible example of what I have so far:
library(ggplot2)
library(tidyverse)
#data for three different models as well as a column for 'observations' (measured)
output <- list(model1 = 1:10, model2 = 22:31, model3=74:83)
#create the dataframe
df <- data.frame(
predicted = output,
measured = 1:length(output[[1]]),
#year = as.factor(data$year),
#site = data$site
#model = as.factor(names(output)),
#stringsAsFactors=TRUE)
fix.empty.names = TRUE)
#fix the column names
colnames(df)<-names(output)
#plot the data with a different colour for each column of data
p <- ggplot(df) +
geom_point(
aes(
measured,
predicted,
colour =colnames(df)
)
) +
ylim(-5, 90)+
theme_minimal()
p + geom_hline(yintercept=0)
print(p)
I am getting the error: Error in FUN(X[[i]], ...) : object 'measured' not found
Why is 'measured' not being found? I can see it in the df.
Perhaps I need to collapse all the model outputs into one column and create a 'factor' column to assign each data point to a particular model?
The first issue is that your output list only has as many elements as you have models, so it has no name for the last "measured" column and that gets overwritten with NA.
Compare:
colnames(df) <- names(output) # NA in last col
colnames(df) <- c(names(output), "measured") # fixed
Then, to plot your data in ggplot2 it's almost always better to convert to longer, "tidy" format, with one row per observation. pivot_longer from tidyr is great for that.
df %>%
pivot_longer(-measured, # don't pivot "measured" -- keep in every row
names_to = "model",
values_to = "predicted") %>%
ggplot() +
geom_point(
aes(
measured,
predicted,
colour = model
)
) +
ylim(-5, 90)+
theme_minimal() +
geom_hline(yintercept=0)
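To see the renaming problem directly, here is a small base R sketch, assuming the same output list as in the question. The names df_bad and df_ok are made up for this illustration. It also shows one possible way to avoid renaming altogether, by passing the list itself to data.frame() so that its element names become the column names.

```r
output <- list(model1 = 1:10, model2 = 22:31, model3 = 74:83)

# Reproduce the problem: four columns, but only three replacement names
df_bad <- data.frame(predicted = output, measured = seq_along(output[[1]]))
colnames(df_bad) <- names(output)
colnames(df_bad)   # the fourth name is now NA, so "measured" is gone

# Alternative: pass the list itself; its element names become column names
df_ok <- data.frame(output, measured = seq_along(output[[1]]))
colnames(df_ok)
```

With df_ok the pivot_longer(-measured, ...) call from the answer should work without any colnames() step.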
You renamed the columns of your data frame:
colnames(df)<-names(output)
So the columns measured and predicted that your aes() refers to no longer exist.
I reorganized your object into a data frame that can be easily understood by ggplot2. Do not hesitate to look at your objects.
Here is one option :
library(ggplot2)
library(tidyverse)
#data for three different models as well as a column for 'observations' (measured)
output <- list(model1 = 1:10, model2 = 22:31, model3=74:83)
#create the dataframe
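df <- data.frame(
predicted = unlist(output),
# repeat the measured values once per model so they line up with predicted
measured = rep(seq_along(output[[1]]), times = length(output)),
# repeat each model name for all of its values (a bare names(output) would be recycled in the wrong order)
model = rep(names(output), times = lengths(output))
)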
#plot the data with a different colour for each column of data
p <- ggplot(df) +
geom_point(aes(measured, predicted,colour = model)) +
ylim(-5, 90)+
theme_minimal()
p + geom_hline(yintercept=0)
print(p)
[Figure: plot with groups]
If you add this line :
facet_grid(~model) +
You get the following, which sounds like what you were asking for:
[Figure: plot with facets]
I have time series of 4 simulated variables and the 4 corresponding observed variables (the observed variables have fewer data points than the simulated ones), as attached in the following link:
https://www.dropbox.com/s/sumgi6pqmjx70dl/nutrients2.csv?dl=0
I used the following code. The data is stored in the data2 object:
data2 <- read.table("C:/Users/Downloads/nutrients2.csv", header=T, sep=",")
library(lubridate)
data2$Date <- dmy(data2$Date)
library(reshape2)
library(ggplot2)
data2 <- melt(data2, id=c("Date","Type"))
seg2 <- ggplot(data = data2, aes(x = Date, y = value, group = Type, colour = Type)) +
geom_line() +
facet_wrap(~ variable, scales = "free")
seg2
This gives the following plot (all variables drawn as lines):
Plot obtained
I need the observed data in points instead of interrupted lines, like this example
Plot desired
How to get a plot like this in ggplot, (simulated variables in line and observed variables in points or dots)?
One possible solution is to subset your dataset for geom_line and geom_point in order to use only sim and obs data respectively.
Then, if you pass shape = Type in your aes, you can remove dots for sim data in your legend by using scale_shape_manual:
(NB: I used the melt function from the data.table package because I found it more efficient for big datasets than the melt function from reshape2.)
library(lubridate)
df$Date <- dmy(df$Date)
library(data.table)
dt.m <- melt(setDT(df),measure = list(c("Nitrate","Ammonium","DIP","Chla")), value.name = "Values", variable.name = "Element")
library(ggplot2)
ggplot(dt.m, aes(x = Date, y = Values, group = Type, color = Type, shape = Type))+
geom_line(data = subset(dt.m, Type == "sim"))+
geom_point(data = subset(dt.m, Type == "obs"))+
scale_shape_manual(values = c(16,NA))+
facet_wrap(~Element, scales = "free")
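Since the linked CSV is not reproduced here, the following is a minimal sketch of the same layering idea on made-up data: a dense "sim" series drawn as a line and a sparse "obs" series drawn as points. All values, and the names sim, obs and p, are invented for illustration. Each geom receives only the rows it should draw.

```r
library(ggplot2)

# Synthetic stand-in for the real data: a dense simulated series
# and a sparse observed series (all values are made up)
set.seed(1)
sim <- data.frame(Date = as.Date("2020-01-01") + 0:99,
                  value = sin((0:99) / 10),
                  Type = "sim")
obs <- sim[seq(5, 95, by = 15), ]
obs$value <- obs$value + rnorm(nrow(obs), sd = 0.1)
obs$Type <- "obs"

# Each layer gets only the rows it should draw
p <- ggplot(mapping = aes(x = Date, y = value, colour = Type)) +
  geom_line(data = sim) +
  geom_point(data = obs)
```

Keeping the two series in separate data frames is equivalent to the subset() calls in the answer; which one reads better depends on how the data arrives.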
I need your help.
I am trying to make a stacked bar plot in R and have not succeeded so far. I have read several posts, but without success.
As I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the package ggplot2 to create this plot, as it makes positioning text labels easier than the base graphics package does:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
# To add the text labels we calculate the midpoint of each bar and
# add this as a column to our dataframe using the package dplyr:
library(dplyr)
myData <- group_by(myData,Week) %>%
mutate(pos = cumsum(Value) - (0.5 * Value))
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
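One caveat with hand-computed midpoints: depending on the ggplot2 version, the bars may be stacked in a different order than the rows appear in the data, so the cumsum-based positions can end up in the wrong segment. A possible alternative, sketched here on the same numbers, is to let ggplot2 place the labels with position_stack(vjust = 0.5). The names bars and p_stack are made up for this illustration.

```r
library(ggplot2)

# Same numbers as above, written directly in long format
bars <- data.frame(
  Week = rep(c(1, 2), times = 3),
  Condition = rep(c("Q_students", "Students_with_activity",
                    "Average_debt_per_student"), each = 2),
  Value = c(1000, 1100, 950, 10000, 800, 850)
)

# position_stack(vjust = 0.5) centres each label inside its own segment,
# using the same stacking order as the bars
p_stack <- ggplot(bars, aes(x = Week, y = Value, fill = Condition)) +
  geom_col() +
  geom_text(aes(label = Value), position = position_stack(vjust = 0.5), size = 3)
```

Because fill is mapped in ggplot(), the text layer is grouped by Condition in the same way as the bars, which is what keeps labels and segments aligned.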
so <- data.frame ( week1= c(1000,950,800), week2=c(1100,10000,850),row.names = c("Q students","students with Activity","average debt per student"))
barplot(as.matrix(so))
The following code produces three plots. The first plot uses data from df_fault and draws lines with symbols, and that plot is fine. The second plot draws lines from df_maint, and that plot is fine also. The problem is with the third plot, which combines the lines with symbols from df_fault with the lines from df_maint. The legend is incorrect, and there are two legends, one for lines and one for symbols. How can I get one correct legend with four entries?
Create some sample data
library(zoo)
library(ggplot2)
rDates <- function(N, st="2012/01/01", et="2012/12/31") {
st <- as.POSIXct(as.Date(st))
et <- as.POSIXct(as.Date(et))
dt <- as.numeric(difftime(et,st,unit="sec"))
ev <- sort(runif(N, 0, dt))
rt <- st + ev
}
first_maint <- as.POSIXct(strptime("2014/01/01", "%Y/%m/%d"))
last_maint <- as.POSIXct(strptime("2014/12/31", "%Y/%m/%d"))
first_fault <- as.POSIXct(strptime("2014/05/01", "%Y/%m/%d"))
last_fault <- as.POSIXct(strptime("2014/07/31", "%Y/%m/%d"))
set.seed(31)
nMDates=40
nFDates=10
rMaintDates <- rDates(nMDates,first_maint,last_maint)
rFaultDates <- rDates(nFDates,first_fault,last_fault)
df_fault <- data.frame(date = rFaultDates,
type = "Non-Op",
ci = runif(nFDates,.7,1.8),stringsAsFactors=FALSE)
df_fault$type[sample(1:nFDates,3)] = "Advisory"
z_hr <- zoo(c(0,0,9.9,9.9),c(first_maint,first_fault,last_fault,last_maint))
z_maint <- zoo(,rMaintDates[c(-1,-nMDates)])
z_hr_maint_a <- merge(z_hr,z_maint)
z_hr_maint <- na.approx(z_hr_maint_a)
z_repair <- zoo(c(0,3000,5000,8000),c(first_maint,first_fault,last_fault,last_maint))
z_repair_maint_a <- merge(z_repair,z_maint)
z_repair_maint <- na.approx(z_repair_maint_a)
df_maint <- data.frame(date=index(z_hr_maint),
hrs=coredata(z_hr_maint)/9.8,
repairs=coredata(z_repair_maint)/8000)
Plot the sample data, these examples work
rpr_title = "repairs/8000"
flt_title = "hrs/9.8"
(gp2 <- ggplot(data=df_fault,aes(x=date, y=ci, color=type)) +
labs(x="Date (2014)", y="CI Amplitude",title="Sample, this plot is fine, df_fault") +
geom_line(aes(group=type,shape=type))+
geom_point(aes(group=type,shape=type),size=4)+
theme(plot.title=element_text( size=12),
axis.title=element_text( size=8)) )
(gp2a <- ggplot() + geom_line(data=df_maint,aes(x=date,y=repairs,color=rpr_title))+
geom_line(data=df_maint,aes(x=date,y=hrs,color=flt_title))+
labs(x="Date (2014)", y="CI Amplitude",title="Sample, this plot is fine, df_maint ")
)
This plot shows the fault data
This plot shows the maintenance and usage data
I would like to combine the above two plots into one plot, with four legend entries. Here is my current attempt, but the legend isn't correct
(gp2b <- gp2 + geom_line(data=df_maint,aes(x=date,y=repairs,color=rpr_title))+
geom_line(data=df_maint,aes(x=date,y=hrs,color=flt_title))+
labs(x="Date (2014)", y="CI Amplitude",title="Sample, this plot the legend is wrong")
)
This plot, there are two legends, and neither one is correct. The first "type" legend has the wrong symbols on the line, showing a circle symbol for all the lines. The second "type" legend shows two black symbols, so the colors are incorrect. I would like the 2nd legend removed, and the 1st legend correctly showing lines and colors. Also, it would be nice if the lines without symbols could be wider. The legend line/symbol for "Advisory" is correct. The legend entry for "Non-op" should have a triangle instead of a circle. The legend entries for "hrs/9.8" and "repairs/8000" should only have a line, no symbol.
Brandon's suggestion to use melt helps, but the plot below still doesn't have the legend correct...
names(df_fault)[2:3] <- c("variable","value") # for rbind
dat <- melt(df_maint, c("date")) # melted
dat <- rbind(dat, df_fault)
p1 <- ggplot(dat, aes(date,value, group = variable, color = variable)) + geom_line()
p1 + geom_point(data =
dat[dat$variable %in% c("Advisory","Non-Op"),],
aes(date,value, group = variable, color = variable, shape=variable)) +
scale_colour_discrete(name ="Fleet",
breaks=c("hrs", "repairs","Advisory","Non-Op"),
labels=c("usage hrs", "maint repairs","Advisory Faults","Non-Op Faults")) +
scale_shape_discrete(name ="Fleet",
breaks=c("hrs", "repairs","Advisory","Non-Op"),
labels=c("usage hrs", "maint repairs","Advisory Faults","Non-Op Faults"),
guide = "none")
Post script: I want to mention that it took some effort to apply the above procedure to an actual data set. Here's a summary of the process.
1) Identify the x axis variables, and grouping variables.
2) In the two data frames, rename the x axis variable and group variables to the same names
3) Use melt twice (the example only used it once) to generate a melted data frame. Use the x axis and group variables as id.vars. Specify the variable that you want to plot as measure.vars.
3b) Do head on the melted data frames. You need to see the X axis variable names and the grouping variable names, followed by the field variable and values. The field variable has text values corresponding to the different y axis names.
4) Use rbind to combine the two melted dataframes
5) Do head on both steps 3 and 4 so you understand the storage of the data
6) Plot the lines for all the data. Include the modification of the legend title in this step, using + guides(color=guide_legend(title="Fleet")). I don't see this command in the example.
7) Create a subset from the melted data frame of the data that will have symbols. Add the symbols, but don't add the 2nd legend from symbols +scale_shape_discrete(name ="Fleet", guide = "none") in the example.
8) Adjust the legend line symbols using + guides(colour = guide_legend(override.aes = list(shape = c(32,32,16,17))))
9) Once you can see a nominal plot of lines with some symbols and the correct legend, you may need to repeat the above process after sorting the combined melted data frame in order to get the correct lines / symbols in front. You may want to sort on variable, and the x axis fields.
By adding guides, and specifying the shape as no shape (32), and matching the other symbols (16, 17), the plot comes out correct
p1 <- ggplot(dat, aes(date,value, group = variable, color = variable)) + geom_line(size=1)
p1 + geom_point(data =
dat[dat$variable %in% c("Advisory","Non-Op"),],
aes(date,value, group = variable, color = variable, shape=variable),size=3) +
scale_colour_discrete(name ="Fleet",
breaks=c("hrs", "repairs","Advisory","Non-Op"),
labels=c("usage hrs", "maint repairs","Advisory Faults","Non-Op Faults")) +
scale_shape_discrete(name ="Fleet",
guide = "none") +
guides(colour = guide_legend(override.aes = list(shape = c(32,32,16,17))))
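For readers who want to try the legend trick without the zoo-based sample data, here is a self-contained sketch on made-up values. The names lvls, dat_demo, pts and p_legend are invented for this illustration. It assumes, as in the code above, that the order of the override.aes shapes follows the order given in breaks, and that the two symbol series get the default shapes 16 and 17.

```r
library(ggplot2)

# Made-up data: two line-only series and two series that also get symbols
lvls <- c("hrs", "repairs", "Advisory", "Non-Op")
dat_demo <- data.frame(
  date = rep(as.Date("2014-01-01") + seq(0, 300, by = 100), times = 4),
  value = c(0.0, 0.3, 0.7, 1.0,
            0.0, 0.4, 0.6, 1.0,
            0.8, 0.9, 1.2, 1.1,
            1.5, 1.3, 1.6, 1.4),
  variable = rep(lvls, each = 4),
  stringsAsFactors = FALSE
)

# Only these rows get symbols
pts <- dat_demo[dat_demo$variable %in% c("Advisory", "Non-Op"), ]

p_legend <- ggplot(dat_demo, aes(date, value, colour = variable)) +
  geom_line() +
  geom_point(data = pts, aes(shape = variable), size = 3) +
  # fix the legend order so the override below lines up with it
  scale_colour_discrete(name = "Fleet", breaks = lvls) +
  # hide the separate shape legend
  scale_shape_discrete(guide = "none") +
  # 32 draws a blank symbol; 16 and 17 match the default point shapes
  guides(colour = guide_legend(override.aes = list(shape = c(32, 32, 16, 17))))
```

If the legend order changes (for example by editing breaks), the shape vector in override.aes has to be reordered to match.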
When in doubt, melt. See the example below:
library(reshape2)
library(ggplot2)
names(df_fault)[2:3] <- c("variable","value") # for rbind
dat <- melt(df_maint, c("date")) # melted
dat <- rbind(dat, df_fault)
p1 <- ggplot(dat, aes(date,value, group = variable, color = variable)) + geom_line()
p1 + geom_point(data =
dat[dat$variable %in% c("Advisory","Non-Op"),],
aes(date,value, group = variable, color = variable, shape=variable)) +
scale_shape(guide = "none")
Notice that I specified "data" in my geom_point() call. Each scale_ has a method for removing the guide by setting it to "none".