I often facet according to a numeric variable, but I want the facet label to be more explanatory than the bare number. I usually create a new label variable that pastes the numeric value onto explanatory text. However, when values have more than one digit before the decimal point, the factor is sorted by the first digit rather than by numeric value. Any suggestions to avoid this?
iris[,1:4]<-iris[,1:4]*10
This works fine for the unmodified iris data, where no value has more than one digit before the decimal point.
iris$Petal.Width.label<-paste("Petal.Width=", iris$Petal.Width)
qplot(data=iris,
x=Sepal.Length,
y=Sepal.Width,
colour=Species)+facet_wrap(~Petal.Width.label)
Related to:
ggplot: How to change facet labels?
How to change the order of facet labels in ggplot (custom facet wrap labels)
Just reorder the levels of your label:
data(iris)
iris[ , 1:4] <- iris[ , 1:4] * 10
iris$Petal.Width.label <- paste("Petal.Width=", iris$Petal.Width)
# reorder levels by Petal.Width
iris$Petal.Width.label2 <- factor(iris$Petal.Width.label,
levels = unique(iris$Petal.Width.label[order(iris$Petal.Width)]))
qplot(data = iris,
x = Sepal.Length,
y = Sepal.Width,
colour = Species)+
facet_wrap( ~Petal.Width.label2)
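To see why the pasted labels sort oddly, it can help to look at the factor levels directly. The sketch below uses base R only and a small made-up vector (`vals` is an assumption for illustration, not part of the iris data):

```r
# Hypothetical values, chosen so that some have two digits
vals <- c(2, 10, 25, 3)
labels <- paste("Petal.Width=", vals)

# Default factor(): levels are sorted as text, so "10" comes before "2"
default_levels <- levels(factor(labels))

# Explicit levels: order the labels by the underlying numeric value
ordered_levels <- levels(factor(labels, levels = unique(labels[order(vals)])))

print(default_levels)
print(ordered_levels)
```

facet_wrap() lays out panels in level order, so fixing the levels is enough to fix the panel order.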
My first ggplot2 box plot came out as one big stretched-out box. The second one was correct, but I don't understand what changed or why it worked. I'm new to R and ggplot2, so any explanation is appreciated.
#----------------------------------------------------------
# This is the original ggplot that didn't work:
#----------------------------------------------------------
zSepalFrame <- data.frame(zSepalLength, zSepalWdth)
zPetalFrame <- data.frame(zPetalLength, zPetalWdth)
p1 <- ggplot(data = zSepalFrame, mapping = aes(x=zSepalWdth, y=zSepalLength, group = 4)) + #fill = zSepalLength
geom_boxplot(notch=TRUE) +
stat_boxplot(geom = 'errorbar', width = 0.2) +
theme_classic() +
labs(title = "Iris Data Box Plot") +
labs(subtitle ="Z Values of Sepals From Iris.R")
p1
#----------------------------------------------------------
# This is the new ggplot box plot line that worked:
#----------------------------------------------------------
bp = ggplot(zSepalFrame, aes(x=factor(zSepalWdth), y=zSepalLength, color = zSepalWdth)) + geom_boxplot() + theme(legend.position = "none")
bp
This is what the ggplot box plot looked like
I don't have your precise dataset, OP, but it seems to stem from assigning a continuous variable to your x axis, when boxplots require a discrete variable.
A continuous variable is something like a numeric column in a dataframe. So something like this:
x <- c(4,4,4,8,8,8,8)
Even though the variable x only contains 4's and 8's, R treats it as numeric, which is continuous. This means that if you plot it on the x axis, ggplot has no issue with a value falling anywhere between 4 and 8, and will position it accordingly.
The other type of variable is called discrete, which would be something like this:
y <- c("Green", "Green", "Flags", "Flags", "Cars")
The variable y contains only characters. It must be discrete, since there is no such thing as something between "Green" and "Cars". If plotted on an x axis, ggplot will group things as either being "Green", "Flags", or "Cars".
The cool thing is that you can change a continuous variable into a discrete one. One way to do that is to factorize or force R to consider a variable as a factor. If you typed factor(x), you get this:
[1] 4 4 4 8 8 8 8
Levels: 4 8
The values in x are the same, but now there is no such thing as a number between 4 and 8 when x is a factor - it would just add another level.
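A quick way to check the difference yourself is to inspect both versions of the vector in base R. This is only a small sketch using the same x as above; `fx` and `counts` are names introduced here for illustration:

```r
x <- c(4, 4, 4, 8, 8, 8, 8)
fx <- factor(x)

is.numeric(x)        # TRUE: x is continuous as far as R is concerned
is.numeric(fx)       # FALSE: fx is a discrete factor
levels(fx)           # the two groups, stored as the strings "4" and "8"

counts <- table(fx)  # number of observations in each group
print(counts)
```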
That is in short why your box plot changes. Let's demonstrate with the iris dataset. First, an example like yours. Notice that I'm assigning x=Sepal.Length. In the iris dataset, Sepal.Length is numeric, so continuous.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
geom_boxplot()
This is similar to yours. The reason is that the boxplot is drawn by grouping according to x and then calculating statistics on those groups. If a variable is continuous, there are no "groups", even if values are repeated (as in x above). One way to make groups is to force the data to be discrete, as in factor(Sepal.Length). Here's what it looks like when you do that:
ggplot(iris, aes(x=factor(Sepal.Length), y=Sepal.Width)) +
geom_boxplot()
The other way to have this same effect would be to use the group= aesthetic, which does what you might think: it groups according to that column in the dataset.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, group=Sepal.Length)) +
geom_boxplot()
I am creating box plots within R, however, they are appearing incorrectly. My data is based off of German Credit Dataset on Kaggle.
My code with two different attributes trying to be tested:
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
data %>%
ggplot(aes(x = Creditability, y = Account.Balance, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Account Balance")
I've tried a few of the other attributes as well, but they all result in the same error.
Edited info: Is it because the attributes have too much information? I have split the sample into test (300) vs train (700) and I am currently using train. Would it simply be because there's too much info?
Edit picture:
Factors
Edit for graph error:
Error
As others have explained in the comments, you cannot show boxplots where the y axis is set to be a factor. Factors are by their nature discrete variables, even if the levels are named as numbers. In order to utilize the stat function for the boxplot geom, you need the y axis to be continuous and the x axis to be discrete (or able to be separated into discrete values via the group= aesthetic).
Let me demonstrate with the mtcars dataset built into R:
library(ggplot2)
ggplot(mtcars, aes(x=factor(carb), y=mpg)) + geom_boxplot()
Here we can draw boxplots because the x aesthetic is forced to be discrete (via factor(carb)), while the y axis uses mpg, which is a numeric column in the mtcars dataset.
If you set both carb and mpg to be factors, you get something that should look pretty similar to what you're seeing:
ggplot(mtcars, aes(x=factor(carb), y=factor(mpg))) + geom_boxplot()
In your case, all the columns in your dataset are factors. If they are factors whose levels can be coerced to numbers, you can turn them into continuous vectors using as.numeric(levels(column_name)[column_name]). Alternatively, you can use as.numeric(as.character(column_name)). Here's what it looks like to first convert the mtcars$mpg column to a factor of numeric values, and then back to numeric via this method.
df <- mtcars
# convert to a factor
df$mpg <- factor(df$mpg)
# back to numeric!
df$mpg <- as.numeric(levels(df$mpg)[df$mpg])
# this plot looks like it did before when we did the same with mtcars
ggplot(df, aes(x=factor(carb), y=mpg)) + geom_boxplot()
So, for your case, do this two step process:
data$Purpose <- as.numeric(levels(data$Purpose)[data$Purpose])
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
That should work. You can follow in a similar fashion for your other variables.
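The reason for going through levels() or as.character(), rather than calling as.numeric() on the factor directly, is that a factor is stored as integer codes. A minimal sketch with a made-up factor (`f` and its values are assumptions for illustration):

```r
# Made-up factor whose levels look like numbers
f <- factor(c("21", "22.8", "21", "18.7"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(f)

# Both of these recover the values the labels represent
via_char   <- as.numeric(as.character(f))
via_levels <- as.numeric(levels(f)[f])

print(codes)
print(via_char)
print(via_levels)
```

If a column contains labels that are not numbers, both conversions return NA for those entries with a warning, so it is worth checking the result before plotting.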
How can I create a line graph with ggplot2 where the x variable is either a character or a factor, the y variable is numeric, and the group variable is categorical? I have tried just + geom_point() with the variables as stated above and it works, but + geom_line() does not.
I have already reviewed posts such as:
Creating line graph using categorical data,
ggplot2 bar plot with two categorical variables, and No line in plot chart despite + geom_line(), but none of them answer my question.
Before I go into code and examples, (1) Yes I absolutely must have the x-variable and group variable as a character or factor, (2) No, I do not want a bar graph or just geom_point().
The example below provides the coefficients of multiple independent variables from three example regressions run using different variations of the dependent variable. The code below shows a workaround that I figured out (i.e. creating an int variable named 'test' to use in place of the chr variable containing the names of the independent variables from the regression), but I need to preserve the chr names of the independent variables instead.
Here is what I have:
library(dplyr)
library(ggplot2)
library(plotly)
library(tidyr)
var_names <- c("ST1", "ST2", "ST3",
"EFI1", "EFI2", "EFI3", "EFI4",
"EFI5", "EFI6")
####Dataset1####
reg <- c(26441.84, 20516.03, 12936.79, 17793.22, 18837.48, 15704.31, 17611.14, 17360.59, 14836.34)
r_adj <- c(30473.17, 35221.43, 29875.98, 30267.31, 29765.9, 30322.86, 31535.66, 30955.29, 29828.3)
a_adj <- c(19588.63, 31163.79, 22498.53, 27713.72, 25703.89, 28565.34, 29853.22, 29088.25, 25213.02)
df1 <- data.frame(var_names, reg, r_adj, a_adj, stringsAsFactors = FALSE)
df1$test <- c(1:9)
df2 <- gather(df1, key = "series_type", value = "value", c(2:4))
fig7 <- ggplot(df2, aes(x = test, y = value, color = series_type)) + geom_line() + geom_point()
fig7
Ultimately I want something that looks like the plot below, but with the independent variable names in place of the 'test' variable.
Example Plot
You can convert var_names into a factor and set the levels in the order of appearance (otherwise it will be assigned alphanumerically and the x axis will be out of order). Then just add series_type to the group parameter in the plot.
df2 <- gather(df1, key = "series_type", value = "value", c(2:4)) %>%
mutate(var_names = factor(var_names, levels = unique(var_names)))
ggplot(df2, aes(x = var_names, y = value, color = series_type, group = series_type)) + geom_line() + geom_point()
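The effect of levels = unique(var_names) can be checked without plotting anything. The sketch below uses base R and a shortened copy of var_names; `alphabetical` and `as_given` are names introduced here for illustration:

```r
var_names <- c("ST1", "ST2", "ST3", "EFI1", "EFI2", "EFI3")

# Default: levels are sorted, so the EFI names move in front of the ST names
alphabetical <- levels(factor(var_names))

# Explicit levels: keep the order in which the names first appear
as_given <- levels(factor(var_names, levels = unique(var_names)))

print(alphabetical)
print(as_given)
```

A discrete x axis follows the level order, so the second version keeps the variables in their original sequence.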
I would like to plot(x,y) but associated with it are two other factors z and t. There are three levels in z and two levels in t. How do I do a scatter plot with assigned colours to each different factors and levels? ... which would mean a total of six different colours.
I'm considering creating multiple .csv files and using par, but I think there should be an easier way to do this.
I'm not sure if you want a single plot or multiple plots. Since you mentioned par, I'm guessing multiple plots. Regardless, to make two factors work together to make the correct number of colors, an easy way is to combine them into a new factor by concatenating them together with paste(). Here's an example with ggplot2 and data.table:
library(data.table)
library(ggplot2)
DT <- as.data.table(mtcars)
DT[, combinedFactor := as.factor(paste(cyl, am))]
ggplot(data = DT, aes(x = mpg, y = disp, color = combinedFactor)) +
geom_point() +
facet_wrap(facets = "am")
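As an alternative to paste(), base R's interaction() builds the combined factor in one step. This is a sketch of a different route, not what the answer above uses; `combined` is a name introduced here:

```r
# Combine two grouping columns of mtcars into a single factor
combined <- interaction(mtcars$cyl, mtcars$am, sep = " ")

# Three cyl values times two am values gives six levels.
# interaction() keeps every combination unless drop = TRUE is set;
# in mtcars all six combinations actually occur.
nlevels(combined)
levels(combined)
```

The result can be mapped to the color aesthetic in the same way as combinedFactor.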
bargraph.CI from sciplot lets us plot a bar chart with error bars. It also allows grouping by independent variables (factors). I want to group by dependent variable instead. How can I achieve that?
bargraph.CI(x.factor, response, group=NULL, split=FALSE,
col=NULL, angle=NULL, density=NULL,
lc=TRUE, uc=TRUE, legend=FALSE, ncol=1,
leg.lab=NULL, x.leg=NULL, y.leg=NULL, cex.leg=1,
bty="n", bg="white", space=if(split) c(-1,1),
err.width=if(length(levels(as.factor(x.factor)))>10) 0 else .1,
err.col="black", err.lty=1,
fun = function(x) mean(x, na.rm=TRUE),
ci.fun= function(x) c(fun(x)-se(x), fun(x)+se(x)),
ylim=NULL, xpd=FALSE, data=NULL, subset=NULL, ...)
The specification of bargraph.CI is shown above. The response variable is usually a numeric vector. This time, I want to plot three response variables (A, B, C) against the same independent variables. Let me use the data frame "mpg" to illustrate the problem. I can successfully get a plot with the following code, where the DV is hwy:
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
hwy, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="Highway MPG",
xlab="Class")
I can also successfully get a plot with the only change being the DV (changed from hwy to cty)
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
cty, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="City MPG",
xlab="Class")
However, if I want to use the two DVs at the same time, I mean, for each group, I want to display two bars, one for cty and one for hwy.
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
c(cty,hwy), #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="Highway MPG",
xlab="Class")
It won't work because of mismatched dimensions. How can I achieve this? A similar effect to bargraph.CI can be achieved with ggplot2 using the method from Boxplot schmoxplot: How to plot means and standard errors conditioned by a factor in R? So if you have any idea of how to do it with ggplot2, that's also fine for me.
As often happens when displaying data, you should manipulate the data first and then use bargraph.CI. In your example, the data.frame that you would like to visualize is the following:
df <- data.frame(class=c(mpg$class, mpg$class),
value=c(mpg$cty, mpg$hwy),
grp=rep(c("cty", "hwy"), each=nrow(mpg)))
Then you can use bargraph.CI on this new data.frame.
bargraph.CI(
class, #categorical factor for the x-axis
value, #numerical DV for the y-axis
group=grp, #grouping factor
data=df,
legend=T,
ylab="MPG",
xlab="Class")
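To see what the reshaped data summarizes to, the same long layout can be aggregated by hand in base R. This is a minimal sketch on a tiny made-up data frame (`wide`, its values, and the helper `std_err` are assumptions for illustration, not the mpg data and not sciplot's own code):

```r
# Tiny made-up stand-in for mpg: one row per car, two response columns
wide <- data.frame(class = c("compact", "compact", "suv", "suv"),
                   cty   = c(21, 19, 14, 12),
                   hwy   = c(29, 27, 20, 18))

# Stack the two responses into one column and record which one each row is
long <- data.frame(class = rep(wide$class, times = 2),
                   value = c(wide$cty, wide$hwy),
                   grp   = rep(c("cty", "hwy"), each = nrow(wide)))

# Hypothetical helper: standard error of the mean
std_err <- function(x) sd(x) / sqrt(length(x))

# One mean and one standard error per class/response combination
means <- aggregate(value ~ class + grp, data = long, FUN = mean)
ses   <- aggregate(value ~ class + grp, data = long, FUN = std_err)

print(means)
print(ses)
```

These per-group means and standard errors are the quantities a grouped bar chart with error bars displays, whether it is drawn with bargraph.CI or with ggplot2.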