Data driven plot names in data.table - r

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.

Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
print(
ggplot(dat, aes(x=x, y=y)) +
geom_point() + labs(title=paste0("Group: ", grp.name$grp))
)
NULL
}
DT[, make_plot(.SD, .BY), by=grp]
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
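A variation on the data.table approach above, as a minimal sketch rather than a recommendation: instead of printing inside j, the plots can be collected in a list column and printed or saved later. The names plots and p are arbitrary choices for illustration. copy(.SD) is used on the assumption that data.table reuses the memory behind .SD between groups, so storing .SD itself inside a plot object could leave every plot pointing at the last group's data.

```r
library(data.table)
library(ggplot2)

set.seed(222)
DT <- data.table(grp = rep(c("a", "b", "c"), each = 10),
                 x = rnorm(30, mean = 5, sd = 1),
                 y = rnorm(30, mean = 8, sd = 1))

# One row per group; the list column p holds that group's plot.
# .BY$grp is the current group value, .SD holds the rows of that group.
plots <- DT[, .(p = list(
  ggplot(copy(.SD), aes(x = x, y = y)) +
    geom_point(size = 3) +
    labs(title = paste0("Group: ", .BY$grp))
)), by = grp]

# Print a single plot, here the one for group "b"
print(plots[grp == "b", p][[1]])
```

Because the titles are built from .BY, the same call works unchanged if the grouping column gains or loses values.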

Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.

I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)

Related

Represent dataset in column bar in R using ggplot [duplicate]

I have a csv file which looks like the following:
Name,Count1,Count2,Count3
application_name1,x1,x2,x3
application_name2,x4,x5,x6
The x variables represent numbers and the applications_name variables represent names of different applications.
Now I would like to make a barplot for each row by using ggplot2. The barplot should have the application_name as title. The x axis should show Count1, Count2, Count3 and the y axis should show the corresponding values (x1, x2, x3).
I would like to have a single barplot for each row, because I have to store the different plots in different files. So I guess I cannot use "melt".
I would like to have something like:
for each row in rows {
print barplot in file
}
Thanks for your help.
You can use melt to rearrange your data and then use either facet_wrap or facet_grid to get a separate plot for each application name
library(ggplot2)
library(reshape2)
# example data
mydf <- data.frame(name = paste0("name",1:4), replicate(5,rpois(4,30)))
names(mydf)[2:6] <- paste0("count",1:5)
# rearrange data
m <- melt(mydf, id.vars = "name")
# if you are wanting to export each plot separately
# I used facet_wrap as a quick way to add the application name as a plot title
# unique() works whether name is a character or a factor column
for(i in unique(m$name)) {
p <- ggplot(subset(m, name==i), aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE)
ggsave(paste0("figure_",i,".pdf"), p)
}
# or all plots in one window
ggplot(m, aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE)
I didn't see @user20650's nice answer before preparing this. It's almost identical, except that I use plyr::d_ply to save things instead of a loop. I believe dplyr::do() is another good option (you'd group_by(Name) first).
yourData <- data.frame(Name = sample(letters, 10),
Count1 = rpois(10, 20),
Count2 = rpois(10, 10),
Count3 = rpois(10, 8))
library(reshape2)
yourMelt <- melt(yourData, id.vars = "Name")
library(ggplot2)
# Test a function on one piece to develop the graph
# (sample() may not draw "a"; substitute any value present in yourData$Name)
ggplot(subset(yourMelt, Name == "a"), aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = "a")
# Wrap it up, with saving to file
bp <- function(dat) {
# dat holds one Name repeated on every row, so take the single unique value
nm <- as.character(unique(dat$Name))
myPlot <- ggplot(dat, aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = nm)
ggsave(filename = paste0("path/to/save/", nm, "_plot.pdf"),
myPlot)
}
library(plyr)
d_ply(yourMelt, .variables = "Name", .fun = bp)
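If you would rather not reshape the whole table, each row can also be turned into its own small long-format data frame inside a loop. This is only a sketch under assumptions: the data frame apps, its values, and the use of tempdir() as the output directory are made up for illustration, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical data in the shape described in the question
apps <- data.frame(Name = c("app1", "app2"),
                   Count1 = c(3, 7), Count2 = c(5, 2), Count3 = c(4, 9))

out_dir <- tempdir()  # assumption: write to a temporary directory

files <- vapply(seq_len(nrow(apps)), function(i) {
  # One row becomes a small long-format data frame: one bar per count column
  row_long <- data.frame(variable = names(apps)[-1],
                         value = unlist(apps[i, -1], use.names = FALSE))
  p <- ggplot(row_long, aes(x = variable, y = value)) +
    geom_col() +
    labs(title = apps$Name[i])
  path <- file.path(out_dir, paste0(apps$Name[i], "_plot.pdf"))
  ggsave(path, p, width = 5, height = 4)
  path
}, character(1))
```

files then holds the path of every PDF that was written, one per row.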

Create graphs by group using ggplot in R

I'm relatively new to using ggplot2 in R and have been struggling with this for a while. I have figured out how to get everything from one data frame onto one graph (that is pretty easy...), and how to write a loop to put each observation (id in the example below) on its own graph, but not how to create separate graphs with multiple ids per group, when the id and group values can change each time I run the code. Here is some sample data and the output I am trying to produce.
x <- c(1,3,6,12,24,48,72,1,3,6,12,24,48,72,1,3,6,12,24,48,72,1,3,6,12,24,48,72)
y <- c(8,27,67,193,271,294,300,10,30,70,195,280,300,310,5,25,60,185,250,275,300,15,40,80,225,275,325,330)
group <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
id <- c(100,100,100,100,100,100,100,101,101,101,101,101,101,101,102,102,102,102,102,102,102,103,103,103,103,103,103,103)
df <- data.frame(x,y,group,id)
Similar questions were asked here and here but I still can't figure out how to do what I need because I need separate graphs (not facets) by group with multiple id on the same graph.
Edit to add attempt -
l <- unique(df$group)
for(l in df$group){
print(ggplot(df, aes(x = x, y = y, group = group, color = id))+
geom_line())
}
To achieve your desired result:
Split your data frame by group, e.g. using split.
Use lapply to loop over the list of split data frames to create your plots or, if you want to add the group labels to the title, loop over names(df_split).
Note: I converted the id variable to factor. Also, you have to map id on the group aesthetic to get lines per group. However, as your x variable is a numeric there is actually no need for the group aesthetic.
library(ggplot2)
df_split <- split(df, df$group)
lapply(df_split, function(df) {
ggplot(df, aes(x = x, y = y, group = id, color = factor(id))) +
geom_line()
})
lapply(names(df_split), function(i) {
ggplot(df_split[[i]], aes(x = x, y = y, group = id, color = factor(id))) +
geom_line() +
labs(title = paste("group =", i))
})
#> [[1]]
#>
#> [[2]]
Even though I would recommend lapply, the same can be achieved with a for loop like so:
for (i in names(df_split)) {
print(
ggplot(df_split[[i]], aes(x = x, y = y, group = id, color = factor(id))) +
geom_line() +
labs(title = paste("group =", i))
)
}
Use facet_grid() or facet_wrap()
library(ggplot2)
ggplot(df, aes(x= x, y=y, colour= factor(id))) + geom_line() + facet_grid(group ~ .)
Edit: OP clarifies in comments they want separate graphs, not faceting
# 1
ggplot(df[df$group == 1,], aes(x= x, y=y, colour= factor(id))) + geom_line()
# 2
ggplot(df[df$group == 2,], aes(x= x, y=y, colour= factor(id))) + geom_line()
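Since the group values can change from run to run, the two hard-coded subsets can be generalized. A minimal sketch, assuming the same columns as the question's data frame; the names groups and plots are illustrative:

```r
library(ggplot2)

# Same sample data as in the question, built compactly
df <- data.frame(
  x = rep(c(1, 3, 6, 12, 24, 48, 72), times = 4),
  y = c(8, 27, 67, 193, 271, 294, 300, 10, 30, 70, 195, 280, 300, 310,
        5, 25, 60, 185, 250, 275, 300, 15, 40, 80, 225, 275, 325, 330),
  group = rep(c(1, 2), each = 14),
  id = rep(100:103, each = 7)
)

groups <- sort(unique(df$group))

# One plot per group value, whatever values are present in this run
plots <- lapply(groups, function(g) {
  ggplot(df[df$group == g, ], aes(x = x, y = y, colour = factor(id))) +
    geom_line() +
    labs(title = paste("group =", g), colour = "id")
})
names(plots) <- groups

# Draw each plot as its own graph
for (p in plots) print(p)
```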

How to graph "before and after" measures using ggplot with connecting lines and subsets?

I’m totally new to ggplot, relatively fresh with R and want to make a smashing ”before-and-after” scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
show each individual observation separately (now same values are overlapping)
connect the related before and after measures (x=0 and X=1) with lines to more clearly illustrate the direction of variation
subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using geom_point()
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of ”feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I cannot subset using colors and shapes because of the overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as I do in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package
Or maybe it is better to summarize data by x, id, class as mean/median of y, filter out ids producing NAs (e.g. ids 3 and 6), and connect the points by lines? So in case if you don't really need to show variability for some ids (which could be true if the plot only illustrates tendencies) you can do it this way:
library(ggplot2)
library(dplyr)
#library(ggthemes)
df <- df %>%
group_by(x, id, class) %>%
summarize(y = median(y, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
id = factor(id),
x = factor(x, labels = c("before", "after")),
# labels follow the level order 0, 1: 0 = multiple days, 1 = one day
class = factor(class, labels = c("multiple days", "one day"))
) %>%
group_by(id) %>%
mutate(nas = any(is.na(y))) %>%
ungroup() %>%
filter(!nas) %>%
select(-nas)
ggplot(df, aes(x = x, y = y, col = id, group = id)) +
geom_point(aes(shape = class)) +
geom_line(show.legend = F) +
#theme_few() +
#theme(legend.position = "none") +
ylab("Feelings of peace, %") +
xlab("")
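To see which ids the NA filter above removes, the per-id medians can be checked with base R alone. A minimal sketch on the question's data; the names med and dropped are illustrative:

```r
x  <- rep(c(0, 1), each = 10)  # 0 = before, 1 = after
y  <- c(45, 30, 10, 40, 10, NA, 30, 80, 80, NA,
        95, NA, 90, NA, 90, 70, 10, 80, 98, 95)
id <- rep(c(1, 1, 2, 3, 4, 4, 4, 4, 5, 6), times = 2)

# Median of y for every (x, id) cell, ignoring NAs within a cell;
# a cell that contains only NAs still comes out as NA
med <- tapply(y, list(x = x, id = id), median, na.rm = TRUE)

# ids with an NA median either before or after
dropped <- as.numeric(colnames(med)[colSums(is.na(med)) > 0])
dropped
```

This returns ids 3 and 6: id 3 has no usable "after" value and id 6 has no usable "before" value, so neither can be connected with a line.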
Here's one possible solution for you.
First - to get the color and shapes determined by variables, you need to put these into the aes function. I turned several into factors, so the labs function fixes the labels so they don't appear as "factor(x)" but just "x".
To address multiple points, one solution is to use geom_smooth with method = "lm". This plots the regression line, instead of connecting all the dots.
The option se = FALSE prevents confidence intervals from being plotted - I don't think they add a lot to your plot, but play with it.
Connecting the dots is done by geom_line - feel free to try that as well.
Within geom_point, the option position = position_jitter(width = .1) adds random noise to the x-axis so points do not overlap.
ggplot(df, aes(x=factor(x), y=y, color=factor(id), shape=factor(class), group = id)) +
geom_point(position = position_jitter(width = .1)) +
geom_smooth(method = 'lm', se = FALSE) +
labs(
x = "x",
color = "ID",
shape = 'Class'
)

How to make multiple plots in r?

I have a large matrix mdat (1000 rows and 16 columns) whose first column is the x variable and whose other columns are y variables. What I want to do is make scatter plots in R, with 15 figures in the same window. For example:
mdat <- matrix(c(1:50), nrow = 10, ncol=5)
In the above matrix, I have 10 rows and 5 columns. Is it possible to use the first column as the variable on the x axis and the other columns as variables on the y axis, so that I get four different scatterplots in the same window? Keep in mind that I would prefer not to use par(mfrow=, because in that case I have to run each graph and then arrange them in the same window. What I need is a package to which I just give the data and the x and y variables, and which puts the graphs in the same window.
Is there some package available that can do this? I cannot find one.
Perhaps the simplest base R way is mfrow (or mfcol)
par(mfrow = c(2, 2)) ## the window will have 2 rows and 2 columns of plots
for (i in 2:ncol(mdat)) plot(mdat[, 1], mdat[, i])
See ?par for everything you might want to know about further adjustments.
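If the aim is to hand over just the data, the two lines above can be wrapped in a small function. A minimal sketch: plot_all_columns is a hypothetical helper name, and grDevices::n2mfrow() is used to pick a grid that fits the number of panels.

```r
# Hypothetical helper: plot column 1 against every other column,
# choosing the grid of panels automatically
plot_all_columns <- function(mat) {
  n_panels <- ncol(mat) - 1
  grid_dim <- grDevices::n2mfrow(n_panels)  # e.g. 4 panels -> 2 x 2
  old_par <- par(mfrow = grid_dim)
  on.exit(par(old_par))                     # restore the previous layout
  for (i in 2:ncol(mat)) {
    plot(mat[, 1], mat[, i], xlab = "column 1", ylab = paste("column", i))
  }
  invisible(grid_dim)
}

mdat <- matrix(1:50, nrow = 10, ncol = 5)
plot_all_columns(mdat)
```

The function returns the chosen grid dimensions invisibly, which can help when sizing the output device for a wider matrix.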
Another good option in base R is layout (the help has some nice examples). To be fancy and pretty, you could use the ggplot2 package, but you'll need to reshape your data into a long format.
require(ggplot2)
require(reshape2)
molten <- melt(as.data.frame(mdat), id = "V1")
ggplot(molten, aes(x = V1, y = value)) +
facet_wrap(~ variable, nrow = 2) +
geom_point()
Alternatively with colors instead of facets:
ggplot(molten, aes(x = V1, y = value, color = variable)) +
geom_point()
@user4299 You can re-write shujaa's ggplot command in this form, using qplot, which means 'quick plot' and is easier when starting out. Then instead of faceting, use variable to drive the color. So the first command produces the same output as shujaa's answer, and the second command gives you all the points on one plot with different colors and a legend.
qplot(data = molten, x = V1, y = value, facets = . ~ variable, geom = "point")
qplot(data = molten, x = V1, y = value, color = variable, geom = "point")
Maybe
library(lattice)
x = mdat[,1]; y = mdat[,-1]
df = data.frame(X = x, Y = as.vector(y),
Grp = factor(rep(seq_len(ncol(y)), each=length(x))))
xyplot(Y ~ X | Grp, df)

Resources