Modify labels in facet_grid on existing ggplot2 object - r

Suppose we have the following dataset:
d = data.frame(
y = rnorm(100),
x = rnorm(100),
f1 = sample(c("A", "B"), size=100, replace=T)
)
And I want to plot the data using facets:
require(ggplot2)
plot = ggplot(d, aes(x,y)) +
facet_grid(~f1, labeller = labeller(.cols=label_both))
Now let's suppose I want to capitalize all the variable names shown on the plot. It's trivial to do so for the x/y variables:
plot + labs(x="X", y="Y")
But how do I go about capitalizing the facet labels?
The obvious solutions are:
Just change the name of the variable (e.g., d$F1 = d$f1) then rerun the code.
Create a custom labeller that capitalizes the variable names
However, I cannot do either of these in my current application. I cannot change the code that creates the original ggplot object; I can only add layers to it (e.g., as I do with the x/y axis labels) or modify the resulting ggplot object directly.
So, is there a way to change the facet labels by either modifying the ggplot object directly or layering it?

Fortunately, I was able to solve my own problem by creating my MWE. And, rather than keep that knowledge to myself, I figured I'd share it with others (or future me if I forget how to do this).
ggplot objects can be easily dissected using str().
In this case, the ggplot object (plot) can be dissected:
str(plot)
Which lists many objects, including one called facet, which can be further dissected:
str(plot$facet)
After some trial and error, I found an object called plot$facet$params$cols. Now, using the following code:
names(plot$facet$params$cols) = "F1"
I get the desired result.
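Putting the steps together, a minimal end-to-end sketch might look like the following. It relies on the internal layout of the facet object (plot$facet$params$cols), which is not a documented interface and may differ between ggplot2 versions, so treat that path as an assumption to confirm with str() first. The geom_point() layer is not in the original question; it is added here only so that the panels show data.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(
  y  = rnorm(100),
  x  = rnorm(100),
  f1 = sample(c("A", "B"), size = 100, replace = TRUE)
)

# Stand-in for the existing plot object that cannot be rebuilt
plot <- ggplot(d, aes(x, y)) +
  geom_point() +  # added for illustration only
  facet_grid(~f1, labeller = labeller(.cols = label_both))

# Inspect the stored facet variable names before touching them
names(plot$facet$params$cols)

# Rename the faceting variable; label_both uses this name in the strip text
names(plot$facet$params$cols) <- "F1"

# Axis labels can still be layered on as usual
plot + labs(x = "X", y = "Y")
```

If the strips do not change, running str(plot$facet$params) should show where the installed version keeps the column facet variables.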

Related

Labeling points using qplot in R

I'm having trouble labeling points in R. I've created a qplot that uses four numeric variables I'm plotting as the x and y axes, the color of the points and the size of the points. When I try to add the labels by just including label = player (where player is the column name with the labels I want) R says: "Error: object 'Player' not found." Maybe because this is the only text column? This is probably really simple, but my first plot, so...
qplot(cars$dist, cars$speed) + geom_text(label = cars$dist)
You can append normal ggplot syntax to qplot() exactly the same way you would when calling ggplot().
You need to specify the source of the data you are feeding: you can do so by passing the data frame to the data argument of a geom and then referencing a specific column (such as Player) by its bare name, without quotes, in the aes() call within the same geom. A quoted name inside aes() is treated as a constant string rather than a column:
geom_point(data = data, aes(x = col1, y = col2))
or you can attach() the data, and then just specify the column (without quotes or the data= parameter), although attach() is generally discouraged because it can mask other objects:
geom_point(aes(x = col1, y = col2))
Thank you to Marius for pointing out that referencing data through the data parameter may be preferable to $ (data$col) in certain situations, such as faceting.
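As a hedged illustration of the data-argument approach, the sketch below uses the built-in cars data rather than the asker's data (which is not shown), and maps the label inside aes() so that the column is looked up in the data frame. The vjust and size values are arbitrary choices for readability.

```r
library(ggplot2)

# Built-in cars data: columns speed and dist
p <- ggplot(cars, aes(x = dist, y = speed)) +
  geom_point() +
  # label is mapped inside aes(), so dist is looked up in cars
  geom_text(aes(label = dist), vjust = -0.5, size = 3)

p
```

Because column names are case-sensitive, a column called player has to be written as player, not Player, in aes().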

Plot multiple traces in R

I started learning R for data analysis and, most importantly, for data visualisation.
Since I am still in the switching process, I am trying to reproduce the activities I was doing with Graphpad Prism or Origin Pro in R. In most of the cases everything was smooth, but I could not find a smart solution for plotting multiple y columns in a single graph.
What I usually get from the software I use for data visualisation looks like this:
Each single black trace is a measurement, and I would like to obtain the same plot in R. In Prism or Origin, this will take a single copy-paste in a XY graph.
I exported the matrix of data (one X, which indicates the time, and multiple Y values, which are the traces you see in the image).
I imported my data in R with the following commands:
library(ggplot2) #loaded ggplot2
Data <- read.csv("Directory/File.txt", header=F, sep="") #imported data
DF <- data.frame(Data) #transformed data into data frame
If I plot my data now, I obtain a series of columns, where the first one (called V1) is the X axis and all the others (V2 to V140) are the traces I want to put on the same graph.
To plot the data, I tried different solutions:
ggplot(data=DF, aes(x=DF$V1, y=DF[V2:V140]))+geom_line()+theme_bw() #did not work
plot(DF, xy.coords(x=DF$V1, y=DF$V2:V140)) #gives me an error
plot(DF, xy.coords(x=V1, y=c(V2:V10))) #gives me an error
I tried the matplot, without success, following the EZH guide:
The code I used is the following: matplot(x=DF$V1, type="l", lty = 2:100)
The only solution I found would be to write a separate plot command for each single column, but that is a crazy solution. The number of columns varies among my data, and manually entering commands for 140 columns is insane.
What would you suggest?
Thank you in advance.
Some data are also attached here. Data: single X, multiple Y
I tried using matplot() with some simple random data that has no trend at all, so the output of my code will look terrible, but the main focus is on the code. Since you have already tried matplot(), compare your call with the solution below to check that you used it correctly.
set.seed(100)
df = matrix(sample(1:685765,50000,replace = T),ncol = 100)
colnames(df)=c("x",paste0("y", 1:99))
dt=as.data.frame(df)
matplot(dt[["x"]], y = dt[,c(paste0("y",1:99))], type = "l")
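The matplot() call in the question passes only x = DF$V1, so the trace columns never reach the function. Applied to a data frame shaped like the one in the question (V1 as time, all remaining columns as traces), a sketch could look like this. The simulated DF, the trace count, and the axis labels are assumptions made for illustration.

```r
set.seed(42)
n_points <- 200   # hypothetical number of time points
n_traces <- 20    # hypothetical number of traces

# Simulated stand-in for DF: V1 is time, the rest are noisy traces
time   <- seq(0, 10, length.out = n_points)
traces <- sapply(seq_len(n_traces),
                 function(i) sin(time) + rnorm(n_points, sd = 0.2))
DF <- data.frame(time, traces)
names(DF) <- paste0("V", seq_len(ncol(DF)))

# x is the first column; y is every other column, however many there are
matplot(x = DF$V1, y = as.matrix(DF[-1]),
        type = "l", lty = 1, col = "black",
        xlab = "Time", ylab = "Signal")
```

Selecting the traces with DF[-1] avoids hard-coding V2 to V140, so the same call works when the number of columns changes.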
Another way to plot this in base R is to make a plot and add the lines one at a time, which isn't hard to do.
We start by making some sample data. Since the data in the link seemed to all be on the same scale, I will assume your data frame only has y values and the x value is stored separately.
plotData <- as.data.frame(matrix(sort(rnorm(500)),ncol = 5))
xval <- sort(sample(200, 100))
Now we can initialize a plot with the first column.
plot(xval, plotData[[1]], type = "l",
ylim = c(min(plotData), max(plotData)))
type = "l" makes a line plot instead of a scatter plot
ylim = c(min(plotData), max(plotData)) makes sure the y-axis will fit all the data.
Now we can add the rest of the values.
apply(plotData[-1], 2, lines, x = xval)
plotData[-1] removes the column we already plotted,
apply function with 2 as the second parameter means we want to execute a function on every column,
lines defines the function we are applying to the columns. lines adds a new line to the current plot.
x = xval passes an extra parameter (x) to the lines function.
If you want to plot the data using ggplot2, the data should be transformed to long format:
library(ggplot2)
library(reshape2)
dat <- read.delim('AP.txt', header = F)
# plotting only the first 9 traces
# (my RStudio crashes if I plot the full data)
df <- melt(dat[1:10], id.vars = 'V1')
ggplot(df, aes(x = V1, y = value, color = variable)) + geom_line()
# if you want all traces to be in same colour, you can use
ggplot(df, aes(x = V1, y = value, group = variable)) + geom_line()
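For readers without AP.txt or reshape2, the same wide-to-long idea can be sketched with simulated data and base R only. The data frame wide and its three traces are made up for illustration; the long-format column names variable and value are chosen to match what melt() produces above.

```r
library(ggplot2)

set.seed(42)
time <- seq(0, 10, length.out = 100)

# Simulated wide data: V1 is time, V2..V4 are traces
wide <- data.frame(
  V1 = time,
  V2 = sin(time) + rnorm(100, sd = 0.1),
  V3 = cos(time) + rnorm(100, sd = 0.1),
  V4 = sin(2 * time) + rnorm(100, sd = 0.1)
)

# Stack every trace column under a single value column
trace_cols <- setdiff(names(wide), "V1")
long <- data.frame(
  V1       = rep(wide$V1, times = length(trace_cols)),
  variable = rep(trace_cols, each = nrow(wide)),
  value    = unlist(wide[trace_cols], use.names = FALSE)
)

# One line per trace, all in the same colour
ggplot(long, aes(x = V1, y = value, group = variable)) +
  geom_line() +
  theme_bw()
```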

Manually added legend not working in ggplot2?

Here's a facsimile of my data:
d1 <- data.frame(
e=rnorm(3000,10,10)
)
d2 <- data.frame(
e=rnorm(2000,30,30)
)
So, I got around the problem of plotting two different density distributions from two very different datasets on the same graph by doing this:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2)
But when I try to manually add a legend, like so:
ggplot() +
geom_density(aes(x=e),fill="red",data=d1) +
geom_density(aes(x=e),fill="blue",data=d2) +
scale_fill_manual(name="Data", values = c("XXXXX" = "red","YYYYY" = "blue"))
Nothing happens. Does anybody know what's going wrong? I thought I could actually manually add legends if need be.
Generally ggplot works best when your data is in a single data.frame and in long format. In your case we therefore want to combine the data from both data.frames. For this simple example, we just concatenate the data into a long variable called d and use an additional column id to indicate to which dataset that value belongs.
d.f <- data.frame(id = rep(c("XXXXX", "YYYYY"), c(3000, 2000)),
d = c(d1$e, d2$e))
More complex data manipulations can be done using packages such as reshape2 and tidyr. I find this cheat sheet often useful. Then when we plot we map fill to id, and ggplot will take care of the legend automatically.
ggplot(d.f, aes(x = d, fill = id)) +
geom_density()
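If the two data frames have to stay separate, another commonly used approach is to map fill to a constant label inside aes() in each layer. In the question, fill="red" sits outside aes(), so it is a fixed setting rather than a mapping, and the fill scale has nothing to build a legend from. The sketch below keeps the names from the question; the alpha value is an added assumption so that the overlapping densities stay visible.

```r
library(ggplot2)

set.seed(1)
d1 <- data.frame(e = rnorm(3000, 10, 10))
d2 <- data.frame(e = rnorm(2000, 30, 30))

p <- ggplot() +
  # fill is mapped to a label inside aes(), so it gets a legend entry
  geom_density(aes(x = e, fill = "XXXXX"), data = d1, alpha = 0.5) +
  geom_density(aes(x = e, fill = "YYYYY"), data = d2, alpha = 0.5) +
  # the manual scale translates each label into a colour
  scale_fill_manual(name = "Data",
                    values = c("XXXXX" = "red", "YYYYY" = "blue"))

p
```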

What does the ".." refer to in ggplot's "fill=..density.."?

I am working my way through The R Graphics Cookbook and ran into this set of code:
library(gcookbook)
library(ggplot2)
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
geom_point() +
stat_density2d(aes(alpha=..density.., fill=..density..), geom="tile", contour=FALSE)
It runs fine, but I don't understand what the .. before and after density is referring to. I can't seem to find it mentioned in the book either.
Variable names beginning with .. are possible in R, and are treated in the same way as any other variable. Try creating one of your own.
..x.. <- 1:5
ggplot2 often appends extra columns to your data in order to draw the plot: each stat computes new variables from the data it receives. (The ggplot2 documentation calls these "computed variables".) ggplot2 uses the naming convention ..something.. to refer to these computed columns inside aes().
This is partly because using ..something.. is unlikely to clash with existing variables in your dataset. Take that as a hint that you shouldn't name the columns in your dataset using that pattern.
The stat_density* functions use ..density.. to represent the estimated density. Other computed variable names include ..count.. .
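To see which extra columns a stat produces, you can inspect the data a layer ends up with. The sketch below uses a histogram of the faithful data rather than the book's 2D density example, and it uses after_stat(density), the spelling that ggplot2 3.3.0 and later provide for what ..density.. expresses. The bin count is an arbitrary choice.

```r
library(ggplot2)

# Histogram scaled to density instead of raw counts
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# Columns available after the stat has run; count and density are
# computed by the stat and are not columns of faithful
names(layer_data(p))

p
```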

How to make an overall boxplot alongside factors in R?

I am trying to create a boxplot that shows all of the factors of a variable, along with sample size, and at the end of the plot I also want an overall boxplot that combines all of the values into one. I am using the following lines of code to do everything except making the overall plot:
library(ggplot2)
library(plyr)
xlabels <- ddply(extract8, .(Fuel), summarize, xlabels = paste(unique(Fuel), '\n(n = ', length(Fuel),')'))
ggplot(extract8, aes(x = Fuel, y = Exfiltration.Fraction.Percentage))+geom_boxplot()+
stat_boxplot(geom='errorbar', linetype=1) +
geom_boxplot(fill="pink") + geom_hline(yintercept = 0.4) +
scale_x_discrete(labels = xlabels[['xlabels']]) + ggtitle("Exfiltration Fraction (%) by Fuel Type")
Not sure on how to proceed regarding adding a boxplot that combines all of the factors into one.
This is certainly not the most elegant way to solve it, but it works:
Copy your dataset into a new object.
Within the new object, replace the content of the variable containing the factors with the label you would like, for instance, "Total".
Use rbind to stack the old and new objects together and assign the result to the new object.
In ggplot replace the old object by the new object.
I had the same issue, couldn't find an answer and proceeded this way.
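The steps above can be sketched as follows. The extract8 data here is a simulated stand-in with made-up fuel types and values; only the column names come from the question. The factor() call is an extra step, not part of the answer, that keeps the "Total" box at the end of the axis.

```r
library(ggplot2)

set.seed(1)
# Toy stand-in for extract8 (fuel names and values are hypothetical)
extract8 <- data.frame(
  Fuel = sample(c("Gas", "Oil", "Wood"), size = 90, replace = TRUE),
  Exfiltration.Fraction.Percentage = runif(90, 0, 1)
)

# Steps 1-2: copy the data and overwrite the grouping variable
overall <- extract8
overall$Fuel <- "Total"

# Step 3: stack the original rows and the relabelled rows
combined <- rbind(extract8, overall)

# Keep "Total" as the last box on the x axis
combined$Fuel <- factor(combined$Fuel,
                        levels = c(sort(unique(extract8$Fuel)), "Total"))

# Step 4: plot the combined data instead of the original
ggplot(combined, aes(x = Fuel, y = Exfiltration.Fraction.Percentage)) +
  geom_boxplot(fill = "pink")
```

Every observation appears twice in combined, once under its own fuel and once under "Total", so sample-size labels should be computed from this combined object.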
