Plotting ordered factors on x-axis in ggplot2 - r

I have the following data.
pos <- c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
block <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)
set <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)
fsize <- c(4,5,6,1,2,1,2,2,3,4,5,1,7,11,2,1,2,3,5,3,5,6,1,2)
dat <- data.frame(pos,block,set,fsize)
dat <- dat[order(block,set,-fsize),]
dat$pos <- as.factor(dat$pos)
library(ggplot2)
ggplot(dat, aes(x = pos, y = fsize)) + geom_bar(stat="identity") +
  facet_wrap(~block+set)
Each position pos is associated with a size fsize. There are 6 positions within each block/set. I want to order the positions within each block/set by decreasing female size (fsize).
So, for example, the first block/set with rearranged positions would be 3,2,1,5,4,6, and the order would be different for the others. However, when I plot it, the x-axis is automatically reordered to 1-6, even though I convert the pos column to a factor. Any suggestions on how to fix this?

Here is a solution, but in order to plot in the desired order, I needed to create a new variable with unique names. The variable is a combination of the set and pos columns.
dat <- data.frame(pos,block,set,fsize)
dat <- dat[order(block,set,-fsize),]
# make a key variable in the overall desired order
key <- paste(dat$set, dat$pos, sep=",")
# make a new ordered factor variable with levels in that order
dat$order <- factor(key, levels = key, ordered = TRUE)
ggplot(dat, aes(x = order, y = fsize)) + geom_bar(stat="identity") +
  facet_wrap(~block+set, scales="free_x") + labs(x="Set,Pos")
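If the "Set,Pos" tick labels are too busy, one possible variation is to keep the combined key for ordering but show only the position on the axis. The sketch below assumes the same data as in the question; pos_only is a hypothetical helper introduced here for illustration, and geom_col() is used as the equivalent of geom_bar(stat="identity").

```r
library(ggplot2)

pos   <- rep(1:6, times = 4)
block <- rep(1:2, each = 12)
set   <- rep(1:4, each = 6)
fsize <- c(4,5,6,1,2,1,2,2,3,4,5,1,7,11,2,1,2,3,5,3,5,6,1,2)
dat <- data.frame(pos, block, set, fsize)

# sort by block, then set, then decreasing size
dat <- dat[order(dat$block, dat$set, -dat$fsize), ]

# unique key per row; its levels follow the sorted row order
key <- paste(dat$set, dat$pos, sep = ",")
dat$order <- factor(key, levels = key)

# hypothetical helper: keep only the part after the comma (the position)
pos_only <- function(x) sub("^.*,", "", x)

p <- ggplot(dat, aes(x = order, y = fsize)) +
  geom_col() +
  facet_wrap(~ block + set, scales = "free_x") +
  scale_x_discrete(labels = pos_only) +
  labs(x = "Pos")
```

Printing p draws the plot. The key has to stay unique across sets, because a factor can hold each level only once; that is why pos alone cannot carry a different order in every facet.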

Related

How to use + geom_line() with a categorical x-variable and quantitative y-variable [duplicate]

How can I create a line graph with ggplot2 where the x variable is either a character or a factor, the y variable is numeric, and the group variable is categorical? I have tried just + geom_point() with the variables as stated above and it works, but + geom_line() does not.
I have already reviewed posts such as:
Creating line graph using categorical data,
ggplot2 bar plot with two categorical variables, and No line in plot chart despite + geom_line(), but none of them answer my question.
Before I go into code and examples, (1) Yes I absolutely must have the x-variable and group variable as a character or factor, (2) No, I do not want a bar graph or just geom_point().
The example below provides the coefficients of multiple independent variables from three different example regressions run using different variations on the dependent variable. The code below shows a workaround that I figured out (i.e. creating an int variable named 'test' to use in place of the chr variable containing the names of the independent variables from the regression), but I need to preserve the chr names of the independent variables instead.
Here is what I have:
library(dplyr)
library(ggplot2)
library(plotly)
library(tidyr)
var_names <- c("ST1", "ST2", "ST3",
"EFI1", "EFI2", "EFI3", "EFI4",
"EFI5", "EFI6")
####Dataset1####
reg <- c(26441.84, 20516.03, 12936.79, 17793.22, 18837.48, 15704.31, 17611.14, 17360.59, 14836.34)
r_adj <- c(30473.17, 35221.43, 29875.98, 30267.31, 29765.9, 30322.86, 31535.66, 30955.29, 29828.3)
a_adj <- c(19588.63, 31163.79, 22498.53, 27713.72, 25703.89, 28565.34, 29853.22, 29088.25, 25213.02)
df1 <- data.frame(var_names, reg, r_adj, a_adj, stringsAsFactors = FALSE)
df1$test <- c(1:9)
df2 <- gather(df1, key = "series_type", value = "value", c(2:4))
fig7 <- ggplot(df2, aes(x = test, y = value, color = series_type)) + geom_line() + geom_point()
fig7
Ultimately I want something that looks like the plot below, but with the independent variable names in place of the 'test' variable.
Example Plot
You can convert var_names into a factor and set the levels in order of appearance (otherwise the levels are sorted alphabetically and the x-axis will be out of order). Then map series_type to the group aesthetic as well, so that geom_line() connects the points within each series.
df2 <- gather(df1, key = "series_type", value = "value", c(2:4)) %>%
mutate(var_names = factor(var_names, levels = unique(var_names)))
ggplot(df2, aes(x = var_names, y = value, color = series_type, group = series_type)) + geom_line() + geom_point()
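For context on why the group aesthetic matters: with a discrete x, ggplot2 groups by the combination of all discrete variables in the plot, so each group ends up with a single point and geom_line() has nothing to connect. Below is a minimal sketch of the same idea using tidyr's pivot_longer(), the newer replacement for gather(); the data are copied from the question and the choice of pivot_longer() is mine, not the original answer's.

```r
library(ggplot2)
library(tidyr)

var_names <- c("ST1", "ST2", "ST3", "EFI1", "EFI2", "EFI3", "EFI4", "EFI5", "EFI6")
df1 <- data.frame(
  var_names,
  reg   = c(26441.84, 20516.03, 12936.79, 17793.22, 18837.48, 15704.31, 17611.14, 17360.59, 14836.34),
  r_adj = c(30473.17, 35221.43, 29875.98, 30267.31, 29765.9, 30322.86, 31535.66, 30955.29, 29828.3),
  a_adj = c(19588.63, 31163.79, 22498.53, 27713.72, 25703.89, 28565.34, 29853.22, 29088.25, 25213.02),
  stringsAsFactors = FALSE
)

# long format: one row per (variable name, series) pair
df2 <- pivot_longer(df1, cols = c(reg, r_adj, a_adj),
                    names_to = "series_type", values_to = "value")

# keep the x-axis in order of appearance rather than alphabetical order
df2$var_names <- factor(df2$var_names, levels = unique(df2$var_names))

# group = series_type tells geom_line() which points belong to one line
p <- ggplot(df2, aes(x = var_names, y = value,
                     color = series_type, group = series_type)) +
  geom_line() +
  geom_point()
```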

boxplots with missing values in R - ggplot

I am trying to make boxplots for a matrix (athTp) with 6 variables (columns) but with many missing values:
ggplot(athTp)+geom_boxplot()
But maybe I am doing something wrong...
I also tried to make separate box plots and then arrange them in a grid, but the final plot was very small (at the desired dimensions) and lost many details.
q1 <- ggplot(athTp,aes(x="V1", y=athTp[,1]))+ geom_boxplot()
..continue with other 5 columns
qq <- grid.arrange(q1,q2,q3,q4,q5,q6, ncol=6)
ggsave("plot.pdf",plot = qq, width = 8, height = 8, units = "cm")
Do you have any ideas?
Thanks in advance!
# ok so your data has 6 columns like this
set.seed(666)
dat <- data.frame(matrix(runif(60,1,20),ncol=6))
names(dat) <- letters[1:6]
head(dat)
# so let's get in long format like ggplot likes
library(reshape2)
longdat <- melt(dat)
head(longdat)
# and try your plot call again specifying that we want a box plot per column
# which is now indicated by the "variable" column
# [remember you should specify the x and y axes with `aes()`]
library(ggplot2)
ggplot(longdat, aes(x=variable, y=value)) + geom_boxplot(aes(colour = variable))
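The example above has no missing values, so here is a rough sketch of the same approach with NAs present. It assumes a numeric matrix; athTp below is a made-up stand-in for the asker's matrix, and base R's stack() is used for reshaping instead of melt(). geom_boxplot() drops missing values on its own (with a warning), so removing them beforehand mainly keeps the output quiet.

```r
library(ggplot2)

set.seed(1)
# hypothetical stand-in for the asker's matrix: 10 rows, 6 columns
athTp <- matrix(runif(60, 1, 20), ncol = 6)
# knock out 15 cells at random to mimic missing values
athTp[sample(length(athTp), 15)] <- NA

# stack() gives a long data frame with columns `values` and `ind`
long <- stack(as.data.frame(athTp))
long <- long[!is.na(long$values), ]

# one box per original column
p <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot(aes(colour = ind))
```

Because all six boxes live in one plot, there is no need for grid.arrange(), and the panel is not squeezed into six narrow strips when saved.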

Automatically determine number of axis ticks for discrete variable

I want to automatically set the number and positions of the breaks on the axis of a discrete variable so that the plotted labels are actually readable.
For example in the code below, the resulting plot should only show a portion of the labels/the x-variable.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE)
So far, I tried to use pretty, and pretty_breaks which are, however, not for discrete variables.
First, we convert x to character and then to a factor whose levels follow the order of appearance (B1, B2, ..., B100 rather than alphabetical order). Second, we subset ggData$x to create a vector (labels) with the ticks we want, in this example every 10th element. Finally, we create the plot with scale_x_discrete, passing that vector (labels) to the breaks parameter.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggData$x <- as.character(ggData$x)
ggData$x <- factor(ggData$x, levels=unique(ggData$x))
labels <- ggData$x[seq(10, 100, by= 10)]
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE) +
scale_x_discrete(breaks=labels)
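To make the step size follow from the number of levels instead of hard-coding 10, something along these lines might work. every_nth is a hypothetical helper written for this sketch (it is not part of ggplot2), and the target of roughly 10 labels is an arbitrary choice.

```r
library(ggplot2)

# hypothetical helper: keep about n_max evenly spaced levels
every_nth <- function(lv, n_max = 10) {
  step <- max(1, ceiling(length(lv) / n_max))
  lv[seq(1, length(lv), by = step)]
}

set.seed(42)
ggData <- data.frame(x = paste0("B", 1:100), y = rnorm(100))
# levels in order of appearance, not alphabetical
ggData$x <- factor(ggData$x, levels = unique(ggData$x))

p <- ggplot(ggData, aes(x, y)) +
  geom_point(size = 2.5, shape = 19, na.rm = TRUE) +
  scale_x_discrete(breaks = every_nth(levels(ggData$x)))
```

With 100 levels this keeps B1, B11, ..., B91. A factor with fewer than n_max levels keeps all of its labels.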

R plot two series of means with 95% confidence intervals

I am trying to plot the following data
factor <- as.factor(c(1,2,3))
V1_mean <- c(100,200,300)
V2_mean <- c(350,150,60)
V1_stderr <- c(5,9,3)
V2_stderr <- c(12,9,10)
plot <- data.frame(factor,V1_mean,V2_mean,V1_stderr,V2_stderr)
I want to create a plot with factor on the x-axis, value on the y-axis and separate lines for V1 and V2 (so the points are the values of V1_mean on one line and V2_mean on the other). I would also like to add error bars for these means based on V1_stderr and V2_stderr.
Many thanks
I'm not sure regarding your desired output, but here's a possible solution.
First of all, I wouldn't call your data plot, as that is the name of a commonly used base R function.
Second, when you want to plot two lines in ggplot you'll usually have to tidy your data into long format using functions such as melt (from the reshape2 package) or gather (from the tidyr package).
Here's a possible approach:
library(ggplot2)
library(reshape2)
dat <- data.frame(factor, V1_mean, V2_mean, V1_stderr, V2_stderr)
mdat <- cbind(melt(dat[1:3], "factor"), melt(dat[c(1, 4:5)], "factor"))
names(mdat) <- make.names(names(mdat), unique = TRUE)
ggplot(mdat, aes(factor, value, color = variable)) +
  geom_line(aes(group = variable)) + # You can also add `geom_point(aes(group = variable)) + ` if you want to see the actual points
  geom_errorbar(aes(ymin = value - value.1, ymax = value + value.1))
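Note that the error bars above span one standard error on either side of the mean. If 95% confidence intervals are what you're after, a rough sketch is to scale the standard error by the normal quantile (about 1.96). This assumes a normal approximation is acceptable; with small samples a t quantile would be more appropriate, but the sample sizes are not given in the question. The column names lower and upper are my own.

```r
library(ggplot2)

# same numbers as in the question, already in long format
means <- data.frame(
  factor   = factor(rep(1:3, times = 2)),
  variable = rep(c("V1", "V2"), each = 3),
  mean     = c(100, 200, 300, 350, 150, 60),
  stderr   = c(5, 9, 3, 12, 9, 10)
)

# normal-approximation 95% interval: mean +/- z * stderr
z <- qnorm(0.975)
means$lower <- means$mean - z * means$stderr
means$upper <- means$mean + z * means$stderr

p <- ggplot(means, aes(factor, mean, color = variable, group = variable)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1)
```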

Dynamically Set X limits on time plot

I am wondering how to dynamically set the x axis limits of a time series plot containing two time series with different dates. I have developed the following code to provide a reproducible example of my problem.
#Dummy Data
Data1 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1997","9/13/1998"), Area_2D = c(20,11,5,25,50))
Data2 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
Data3 <- data.frame(Date = c("4/24/1995","6/23/1995","2/12/1996","4/14/1996","9/13/1998"), Area_2D = c(20,25,28,30,35))
Data4 <- data.frame(Date = c("6/23/1995","4/14/1996","11/3/1997","11/6/1997","4/15/1998"), Area_2D = c(13,15,18,25,19))
#Convert date column as date
Data1$Date <- as.Date(Data1$Date,"%m/%d/%Y")
Data2$Date <- as.Date(Data2$Date,"%m/%d/%Y")
Data3$Date <- as.Date(Data3$Date,"%m/%d/%Y")
Data4$Date <- as.Date(Data4$Date,"%m/%d/%Y")
#PLOT THE DATA
max_y1 <- max(Data1$Area_2D)
# Define colors to be used for the two series
plot_colors <- c("blue","red")
plot(Data1$Date,Data1$Area_2D, col=plot_colors[1],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
par(new=T)
plot(Data2$Date,Data2$Area_2D, col=plot_colors[2],
ylim=c(0,max_y1), xlim=c(min_x1,max_x1),pch=16, xlab="Date",ylab="Area", type="o")
The main problem I see with the code above is that there are two different x-axes on the plot, one for Data1 and another for Data2. I want to have a single x-axis spanning the date range determined by the dates in Data1 and Data2.
My question is:
How do I dynamically create an x-axis for both series (i.e. select the minimum and maximum date from the data frames 'Data1' and 'Data2')?
The solution is to combine the data into one data.frame, and base the x-axis on that. This approach works very well with the ggplot2 plotting package. First we merge the data and add an ID column, which specifies to which dataset it belongs. I use letters here:
Data1$ID = 'A'
Data2$ID = 'B'
merged_data = rbind(Data1, Data2)
And then create the plot using ggplot2, where the color denotes which dataset it belongs to (can easily be changed to different colors):
library(ggplot2)
ggplot(merged_data, aes(x = Date, y = Area_2D, color = ID)) +
geom_point() + geom_line()
Note that you get one uniform x-axis here. In this case this is fine, but if the time series do not overlap, this might be problematic. In that case we can use multiple sub-plots, known as facets in ggplot2:
ggplot(merged_data, aes(x = Date, y = Area_2D)) +
geom_point() + geom_line() + facet_wrap(~ ID, scales = 'free_x')
Now each facet has its own x-axis, i.e. one for each sub-dataset. Which approach is more appropriate depends on the specific situation.
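If you would rather stay with base graphics, as in the question, a minimal sketch is to compute the limits with range() over both date columns and draw the second series with lines() instead of par(new=T). The names x_lim and y_lim are my own, and only Data1 and Data2 from the question are used.

```r
Data1 <- data.frame(
  Date = as.Date(c("4/24/1995", "6/23/1995", "2/12/1996", "4/14/1997", "9/13/1998"), "%m/%d/%Y"),
  Area_2D = c(20, 11, 5, 25, 50)
)
Data2 <- data.frame(
  Date = as.Date(c("6/23/1995", "4/14/1996", "11/3/1997", "11/6/1997", "4/15/1998"), "%m/%d/%Y"),
  Area_2D = c(13, 15, 18, 25, 19)
)

# limits taken from both data frames, so they adapt to the data
x_lim <- range(c(Data1$Date, Data2$Date))
y_lim <- c(0, max(Data1$Area_2D, Data2$Area_2D))

plot_colors <- c("blue", "red")
plot(Data1$Date, Data1$Area_2D, col = plot_colors[1], pch = 16, type = "o",
     xlim = x_lim, ylim = y_lim, xlab = "Date", ylab = "Area")
# add the second series to the same axes instead of overplotting a new plot
lines(Data2$Date, Data2$Area_2D, col = plot_colors[2], pch = 16, type = "o")
```

Using lines() means only one set of axes is ever drawn, which avoids the two overlapping x-axes described in the question.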
