Loops, dataframes and ggplot

I would like to display multiple plots on the same page using ggplot, and the multiplot function described here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/. My data is stored in a large dataframe with the first column corresponding to period. I want to visualize columns 2:26. My issue is reproducible using:
rawdata1 <- data.frame("Period" = 1:34, "Sample" = sample(x = c(1,2),34, replace = TRUE),"Runif" = runif(n = 34))
Intuitively, I would use the following code: (With 2:3 replaced with 2:26)
out <- NULL
for (i in 2:3){
out[[i-1]] <- ggplot(rawdata1, aes("Period", y = value)) + geom_line(aes(x = Period, y = rawdata1[[i]])) + ggtitle(label = colnames(rawdata1)[i])
}
multiplot(plotlist = out, cols = 2)
This succeeds in plotting multiple graphs, however my problem is that each graph that is plotted uses data from the same column (column 3 in the above example, column 26 in my dataset). I've puzzled out that this is because my "out" list stores the ggplot list with the y values stored dynamically.
i's final value is 26, and when I call an item from "out", it uses the current value of i to create the graph, so every graph displays the same column. As I am new to R, my guess is that I am not managing my variables correctly. Any help would be appreciated.

Below you find an alternative: using the melt function from reshape2 and then faceting with facet_wrap.
require(ggplot2)
require(reshape2)
data.melt <- melt(rawdata1, id.var='Period')
ggplot(data.melt, aes(Period, value)) +
geom_line() +
facet_wrap(~variable, scales='free_y')
If you want to use multiplot instead, you could do the following:
out <- lapply(names(rawdata1)[-1],
function(index) ggplot(rawdata1) +
geom_line(aes_string(x = 'Period', y = index)) +
ggtitle(label = index))
multiplot(plotlist = out, cols = 2)
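As a side note, aes_string() is soft-deprecated in recent ggplot2 releases. A minimal sketch of the same idea with the .data pronoun follows, assuming ggplot2 3.0.0 or later; the seed and the names given to the list are illustrative additions. It also shows why lapply avoids the problem in the question: each function call has its own col, so the mapping still points at the right column when the plot is finally drawn.

```r
library(ggplot2)

set.seed(1)  # illustrative seed, not part of the original example
rawdata1 <- data.frame(Period = 1:34,
                       Sample = sample(c(1, 2), 34, replace = TRUE),
                       Runif  = runif(34))

# Each call to the anonymous function gets its own `col`, so the mapping
# stored in each plot still refers to the right column at draw time.
out <- lapply(names(rawdata1)[-1], function(col) {
  ggplot(rawdata1, aes(x = Period, y = .data[[col]])) +
    geom_line() +
    labs(title = col, y = col)
})
names(out) <- names(rawdata1)[-1]
```

The resulting list can be passed to multiplot(plotlist = out, cols = 2) exactly as before.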

Related

How can I manually add labels to multiple ggplot2 mappings created through a for-loop?

I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has varying x and y coordinates, I cannot simply have a large data-frame on which to perform usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with for example 5 elements, each element containing a nx2 data frame with column 1 the x-coordinates and column 2 the y-coordinates. To plot each curve, I create a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
# CREATION OF DATA
plevel0.5 = cbind(c(0,1),c(0,1))
colnames(plevel0.5) = c("x","y")
plevel0.8 = cbind(c(0.5,3),c(0.5,1.5))
colnames(plevel0.8) = c("x","y")
data = list(data1 = as.data.frame(plevel0.5), data2 = as.data.frame(plevel0.8))
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data=data[[i]],mapping=aes(x=x,y=y))
}
Thank you in advance and let me know what needs to be clarified.
EDIT :
I have now attempted the following :
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?

Your approach is probably a bit too complicated. As far as I understand it, you could work with a single dataset and use the group aesthetic to get the same result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
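Regarding the error in the question's edit: x and y were mapped only inside geom_line(), so the text layer had nothing to inherit. Below is a minimal sketch of the fix, assuming a combined data frame with a groups column; the names curves and ends are hypothetical. geom_text is used to keep the example to ggplot2 alone; ggrepel::geom_text_repel should slot into the same place, since it needs the same x, y and label aesthetics.

```r
library(ggplot2)

# Hypothetical combined data, mirroring the "groups" column from the question
curves <- data.frame(
  x = c(0, 1, 0.5, 3),
  y = c(0, 1, 0.5, 1.5),
  groups = rep(c("p = 0.5", "p = 0.8"), each = 2)
)

# One label per curve: the row with the largest x in each group
ends <- do.call(rbind, lapply(split(curves, curves$groups),
                              function(d) d[which.max(d$x), ]))

# x and y are mapped in ggplot(), so every layer inherits them
p <- ggplot(curves, aes(x = x, y = y, group = groups, colour = groups)) +
  geom_line() +
  geom_text(data = ends, aes(label = groups), hjust = -0.1,
            show.legend = FALSE)
```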

R - Reorder a bar plot in a function using ggplot2

I have the following plot function using ggplot2.
Function_Plot <- function(Fun_Data, Fun_Color)
{
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
I need to upgrade my function to reorder the bars according to the frequencies of the words (in descending order). Having seen the answer to another question about reordering, I tried to use the reorder function inside aes_string, but it doesn't work.
A reproducible example :
a <- c("G1","G1","G1","G1","G1","G1","G1","G1","G1","G1","G2","G2","G2","G2","G2","G2","G2","G2")
b <- c("happy","sad","happy","bravery","bravery","God","sad","happy","freedom","happy","freedom",
"God","sad","happy","freedom",NA,"money","sad")
MyData <- data.frame(Cluster = a, Word = b)
MyColor <- c("red","blue")
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)

Well, if reordering doesn't work inside aes_string, let's try it beforehand.
Function_Plot <- function(Fun_Data, Fun_Color)
{
Fun_Data[[2]] <- reorder(Fun_Data[[2]], Fun_Data[[2]], length)
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)
A couple of other notes: I'd recommend a more consistent style. Mixing whether or not you use _ to separate words in variable names is confusing and asking for bugs.
It won't matter much unless your data is really big, but extracting names from a data frame is very efficient, whereas subsetting a data frame is less efficient. Your code subsets a data frame and then extracts the column names remaining, e.g., colnames(Fun_Data[1]). It will be cleaner to extract the names and then subset that vector: colnames(Fun_Data)[1].
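To see what reorder() does with the word counts, here is a small sketch in base R using the words from the question (the NA entry is left out for simplicity). Negating the count is one way to get descending levels; treat that as a suggestion rather than the only option.

```r
# Words from the question's example, without the NA entry
words <- c("happy", "sad", "happy", "bravery", "bravery", "God", "sad",
           "happy", "freedom", "happy", "freedom", "God", "sad", "happy",
           "freedom", "money", "sad")

# reorder() sorts the factor levels by a per-level summary, here the count
ascending  <- reorder(words, words, length)
descending <- reorder(words, words, function(v) -length(v))

levels(ascending)   # rarest word first, most frequent word last
levels(descending)  # most frequent word first

# Both forms return the same name; the second avoids building a
# one-column data frame first
MyData <- data.frame(Cluster = "G1", Word = words)
identical(colnames(MyData[2]), colnames(MyData)[2])
```

Keep in mind that coord_flip() draws the first level at the bottom of the axis, so the ascending order puts the most frequent word at the top of the flipped chart.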

Data driven plot names in data.table

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.

Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
print(
ggplot(dat, aes(x=x, y=y)) +
geom_point() + labs(title=paste0("Group: ", grp.name$grp))
)
NULL
}
DT[, make_plot(.SD, .BY), by=grp]
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
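If the plots are needed as objects rather than printed as a side effect, a sketch along these lines may help. It uses a plain data frame and base split() so that it only depends on ggplot2; the names dat, pieces and plots are illustrative. With a data.table, split(DT, by = "grp") should produce an equivalent list.

```r
library(ggplot2)

set.seed(222)
# Plain data frame with the same shape as DT in the question
dat <- data.frame(grp = rep(c("a", "b", "c"), each = 10),
                  x = rnorm(30, mean = 5, sd = 1),
                  y = rnorm(30, mean = 8, sd = 1))

# One sub-frame per group, then one titled plot per sub-frame
pieces <- split(dat, dat$grp)
plots <- lapply(names(pieces), function(g) {
  ggplot(pieces[[g]], aes(x = x, y = y)) +
    geom_point(size = 3) +
    labs(title = paste0("Group: ", g))
})
names(plots) <- names(pieces)

# print(plots$b) draws the plot titled "Group: b"
```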

Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.

I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)

Plotting multiple time series on the same plot using ggplot()

I am fairly new to R and am attempting to plot two time series lines simultaneously (using different colors, of course) making use of ggplot2.
I have 2 data frames. The first one has 'Percent change for X' and 'Date' columns. The second one has 'Percent change for Y' and 'Date' columns as well, i.e., both have a 'Date' column with the same values whereas the 'Percent Change' columns have different values.
I would like to plot the 'Percent Change' columns against 'Date' (common to both) using ggplot2 on a single plot.
The examples that I found online made use of the same data frame with different variables to achieve this, I have not been able to find anything that makes use of 2 data frames to get to the plot. I do not want to bind the two data frames together, I want to keep them separate. Here is the code that I am using:
ggplot(jobsAFAM, aes(x=jobsAFAM$data_date, y=jobsAFAM$Percent.Change)) + geom_line() +
xlab("") + ylab("")
But this code produces only one line and I would like to add another line on top of it.
Any help would be much appreciated.
TIA.

ggplot allows you to have multiple layers, and that is what you should take advantage of here.
In the plot created below, you can see that there are two geom_line statements hitting each of your datasets and plotting them together on one plot. You can extend that logic if you wish to add any other dataset, plot, or even features of the chart such as the axis labels.
library(ggplot2)
jobsAFAM1 <- data.frame(
data_date = runif(5,1,100),
Percent.Change = runif(5,1,100)
)
jobsAFAM2 <- data.frame(
data_date = runif(5,1,100),
Percent.Change = runif(5,1,100)
)
ggplot() +
geom_line(data = jobsAFAM1, aes(x = data_date, y = Percent.Change), color = "red") +
geom_line(data = jobsAFAM2, aes(x = data_date, y = Percent.Change), color = "blue") +
xlab('data_date') +
ylab('percent.change')
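The same layering idea extends to any number of separate data frames. A minimal sketch, assuming the data frames are kept in a named list (the list series and its names X and Y are hypothetical): a list of layers can be added to a plot in one step, and mapping colour to the list name produces a legend.

```r
library(ggplot2)

set.seed(42)  # illustrative data, standing in for the two real data frames
dates <- seq(as.Date("2017-01-01"), by = "month", length.out = 5)
series <- list(
  X = data.frame(data_date = dates, Percent.Change = runif(5, 1, 100)),
  Y = data.frame(data_date = dates, Percent.Change = runif(5, 1, 100))
)

# One geom_line per data frame; colour is mapped to the list name
layers <- lapply(names(series), function(nm) {
  geom_line(data = series[[nm]],
            aes(x = data_date, y = Percent.Change, colour = nm))
})

p <- ggplot() + layers +
  labs(x = NULL, y = "Percent change", colour = "Series")
```

The data frames stay separate throughout; nothing is bound or merged.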

If both data frames have the same column names, you should pass one data frame to the ggplot() call and map the x and y values inside aes() of that call. Then add a first geom_line() for the first line and a second geom_line() call with data=df2 (where df2 is your second data frame). If you need the lines in different colors, add color= with a name for each line inside the aes() of each geom_line().
df1<-data.frame(x=1:10,y=rnorm(10))
df2<-data.frame(x=1:10,y=rnorm(10))
ggplot(df1,aes(x,y))+geom_line(aes(color="First line"))+
geom_line(data=df2,aes(color="Second line"))+
labs(color="Legend text")

I prefer using the ggfortify library. It is a ggplot2 wrapper that recognizes the type of object inside the autoplot function and chooses the best ggplot methods to plot. At least I don't have to remember the syntax of ggplot2.
library(ggfortify)
ts1 <- 1:100
ts2 <- 1:100*0.8
autoplot(ts( cbind(ts1, ts2) , start = c(2010,5), frequency = 12 ),
facets = FALSE)

I know this is old but it is still relevant. You can take advantage of reshape2::melt to change the dataframe into a more friendly structure for ggplot2.
Advantages:
- allows you to plot any number of lines
- each line has a different color
- adds a legend entry for each line
- needs only one call to ggplot/geom_line
Disadvantages:
- an extra package (reshape2) is required
- melting is not so intuitive at first
For example:
jobsAFAM1 <- data.frame(
data_date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 100),
Percent.Change = runif(100,1,100)
)
jobsAFAM2 <- data.frame(
data_date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 100),
Percent.Change = runif(100,1,100)
)
jobsAFAM <- merge(jobsAFAM1, jobsAFAM2, by="data_date")
jobsAFAMMelted <- reshape2::melt(jobsAFAM, id.var='data_date')
ggplot(jobsAFAMMelted, aes(x=data_date, y=value, col=variable)) + geom_line()

This is old, but here is an update with the newer tidyverse workflow, which is not mentioned above.
library(tidyverse)
jobsAFAM1 <- tibble(
date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 5),
Percent.Change = runif(5, 0,1)
) %>%
mutate(serial='jobsAFAM1')
jobsAFAM2 <- tibble(
date = seq.Date(from = as.Date('2017-01-01'),by = 'day', length.out = 5),
Percent.Change = runif(5, 0,1)
) %>%
mutate(serial='jobsAFAM2')
jobsAFAM <- bind_rows(jobsAFAM1, jobsAFAM2)
ggplot(jobsAFAM, aes(x=date, y=Percent.Change, col=serial)) + geom_line()
@Chris Njuguna: tidyr::gather() is the tidyverse function for turning a wide data frame into a long, tidy layout, so that ggplot can plot multiple series. (In newer tidyr releases it is superseded by tidyr::pivot_longer().)
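For the wide-to-long step itself, a minimal sketch with tidyr::pivot_longer(), assuming tidyr 1.0.0 or later; the data frame wide and its columns X and Y are made up for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # illustrative values
wide <- data.frame(
  date = seq(as.Date("2017-01-01"), by = "day", length.out = 5),
  X = runif(5),
  Y = runif(5)
)

# Every column except date becomes a (series, Percent.Change) pair
long <- pivot_longer(wide, cols = -date,
                     names_to = "series", values_to = "Percent.Change")

p <- ggplot(long, aes(x = date, y = Percent.Change, colour = series)) +
  geom_line()
```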

An alternative is to bind the dataframes, and assign them the type of variable they represent. This will let you use the full dataset in a tidier way
library(ggplot2)
library(dplyr)
df1 <- data.frame(dates = 1:10,Variable = rnorm(mean = 0.5,10))
df2 <- data.frame(dates = 1:10,Variable = rnorm(mean = -0.5,10))
df3 <- df1 %>%
mutate(Type = 'a') %>%
bind_rows(df2 %>%
mutate(Type = 'b'))
ggplot(df3,aes(y = Variable,x = dates,color = Type)) +
geom_line()

How to specify columns in facet_grid OR how to change labels in facet_wrap

I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting in a nice little block of 6 x 6 facets. Here's a simpler version:
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)

I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)

To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
df$factor_variable = factor(df$factor_variable, labels = new_labels)
Now you can use factor_variable in facet_wrap.

Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen

Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
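In current ggplot2 releases facet_wrap() accepts a labeller as well, so the strip text can be changed without touching the data or giving up ncol. A minimal sketch using as_labeller() with a named lookup vector; the vector long_names is an illustrative name, and the data is rebuilt directly in long format to keep the example self-contained.

```r
library(ggplot2)

set.seed(123)
n <- 30
# Long-format data with the short names from the question
mydf <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = n), times = 6),
  variable = rep(c("aa", "bb", "cc", "dd", "ee", "ff"), each = n),
  value = runif(6 * n, min = 1, max = 2)
)

# Lookup table: names are the values in the data, elements are the strip labels
long_names <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.",
                dd = "DD Company", ee = "Eeeeeek!", ff = "Final")

p <- ggplot(mydf, aes(x = date, y = value, group = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2, labeller = as_labeller(long_names))
```

Because the lookup is an ordinary named character vector, it can be generated from a separate table of descriptions rather than typed by hand, which suits the automated setting described in the question.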
