R: par(cex.lab=2) doesn't work in plot(effect(), …)

I fitted a linear regression on a dataset that includes group (1 = smoke, 2 = control), gender (1 = m, 2 = f) and a dependent variable such as weight. I want to plot the interaction between group and gender. I need to change the size of the axis labels, but par() has no effect. The code looks like this:
lin <- lm(weight ~ group + gender + group:gender, data=data)
par(cex.lab = 2, cex.axis = 2)
library(effects)
plot(effect("group:gender",lin,,list(gender=c(1,2))),multiline=T)
The size doesn't change. And if I want to delete the axis like this:
plot(effect("group:gender",lin,,list(gender=c(1,2))),multiline=T,axes=FALSE)
It gives me this error:
$ operator is invalid for atomic vectors
How can I solve this?

I don't know exactly why this happens. My guess is that the object returned by effect() has class "eff", and the plot method for that class does not honor par() settings. To work around this, convert the object to a data.frame and draw it with base graphics, where par() does apply.
To answer your question: once the plot is drawn with base graphics, changing the par() values changes the font size, as in the graphs described at the end of this answer.
You can do this:
library(effects)
lin <- lm(mpg ~ cyl + am + am:cyl, data=mtcars)
par(cex.lab=1.2, cex.axis=1.2, cex.main=1.2, cex.sub=1.2) #These are the par options; changing them increases or decreases the font size
effect1 <- data.frame(effect("cyl:am",lin,,list(cyl=c(4,6,8))))
effects <- effect1[,c("cyl","am", "fit")] ##Keeping only the required columns
You can plot effects directly using all three columns (cyl, am and fit). However, the lines then get joined into one, and I am not aware of any base R plotting feature like ggplot's group aesthetic. So I split the data by cyl and plot each piece separately.
xvals <- split(effects$am,effects$cyl) #split x-axis basis cyl
yvals <- split(effects$fit,effects$cyl) #split y-axis basis cyl
plot(1:max(unlist(xvals)),xlim = c(0,max(unlist(xvals))),ylim=(c(0,max(unlist(yvals)))),type="n", main="plot b/w mpg, am * cyl",
xlab="am", ylab="mpg") #adding header, labels and xlim and ylim to the graphs
Map(lines,xvals,yvals,col=c("red","blue","black"),pch=1:3,type="o") #plotting one line per cyl level (4, 6, 8) using Map
legend("bottomright", legend=c("4", "6", "8"),
col=c("red", "blue", "black"), lty=1, pch=1:3, cex=0.8) #adding the legend; the order matches the split() order above
Output (plots not shown): one plot with the par options fixed at 1.2 and one with the par options fixed at 1.5.
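As far as I know, plot() for effect objects draws with lattice rather than base graphics, which would explain why par() is ignored. The workaround above can also be sketched without the effects package at all. The following is a minimal sketch, assuming mtcars stands in for the asker's data and that predictions from predict() are an acceptable substitute for the effect() values; the names fit, grid and wide are illustrative only.

```r
# Assumption: mtcars stands in for the asker's data, as in the answer above.
fit <- lm(mpg ~ cyl * am, data = mtcars)

# Predicted mpg for every cyl/am combination (cyl varies fastest in the grid).
grid <- expand.grid(cyl = c(4, 6, 8), am = c(0, 1))
grid$fit <- predict(fit, newdata = grid)

# Reshape to a matrix with one row per am value and one column per cyl value.
wide <- t(matrix(grid$fit, nrow = 3))

# par() applies here because matplot() is base graphics; restore it afterwards.
op <- par(cex.lab = 1.5, cex.axis = 1.5)
matplot(c(0, 1), wide, type = "o", lty = 1, pch = 1:3,
        col = c("red", "blue", "black"),
        xlab = "am", ylab = "predicted mpg", xaxt = "n")
axis(1, at = c(0, 1))
legend("topleft", legend = c("4", "6", "8"), title = "cyl",
       col = c("red", "blue", "black"), lty = 1, pch = 1:3)
par(op)
```

Because matplot() draws one line per matrix column, no split() or Map() step is needed.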

Related

Adjusting facet order and legend labels when using plot_model function of sjplot

I have successfully used the plot_model function of sjplot to plot a multinomial logistic regression model. The regression contains an outcome (Info Sought, with 3 levels) and 2 continuous predictors (DSA, ASA). I have also changed the values of ASA in the plot_model so as to plot predicted effect outcomes based on the ASA mean value and SDs:
plot1 <- plot_model(multinomialmodel, type = "pred", terms = c("DSA", "ASA[meansd]"))
I have two customization questions:
1) Facet Order: The facet order is based on the default alphabetical order of the outcome levels ("Expand" then "First Pic" then "Multiple Pics"). Is there a way to adjust this? I tried re-sorting the levels with factor() (as shown here with ggplot2) before running and plotting the model, but this did not change the resulting facet order. Perhaps something through ggplot2 instead, as in the first solution provided here?
2) Legend Labels: The legend currently labels the plotted lines with the -1 SD, mean, and +1 SD values for ASA; is there a way to adjust these labels to instead simply say "-1 SD", "mean", and "+1 SD" instead of the raw values?
Thanks!
First I replicate your plot using your supplied data:
library(dplyr)
library(readr)
library(nnet)
library(sjPlot)
"ASA,DSA,Info_Sought
-0.108555801,0.659899854,First Pic
0.671946671,1.481880373,First Pic
2.184170211,-0.801398848,First Pic
-0.547588442,1.116555698,First Pic
-1.27930951,-0.299077419,First Pic
0.037788412,1.527545958,First Pic
-0.74271406,-0.755733264,Multiple Pics
1.20854212,-1.166723523,Multiple Pics
0.769509479,-0.390408588,Multiple Pics
-0.450025633,-1.02972677,Multiple Pics
0.769509479,0.614234269,Multiple Pics
0.281695434,0.705565438,Multiple Pics
-0.352462824,-0.299077419,Expand
0.671946671,1.481880373,Expand
2.184170211,-0.801398848,Expand
-0.547588442,1.116555698,Expand
-0.157337206,1.070890114,Expand
-1.27930951,-0.299077419,Expand" %>%
read_csv() -> d
multinomialmodel <- multinom(Info_Sought ~ ASA + DSA, data = d)
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"))
p1
Your attempt to re-factor did not work because sjPlot::plot_model() does not pay heed. One way to tackle reordering the facets is to produce an initial plot as above and replace the faceting variable in the data with a factor version containing your desired order like so:
p2 <- p1
p2$data$response.level <- factor(p2$data$response.level,
levels = c("Multiple Pics", "First Pic", "Expand"))
p2
Finally, to tackle the legend labeling issue, we can just replace the color scale with one containing your desired labels:
p2 +
scale_color_discrete(labels = c("-1 SD", "mean", "+1 SD"))
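The same two ideas (factor level order drives facet order, and the color scale's labels drive the legend text) can be seen in plain ggplot2, independent of sjPlot. This is a minimal sketch on hypothetical toy data; the data frame d and its columns x, y, outcome and asa are made up for illustration and are not the asker's data.

```r
library(ggplot2)

# Hypothetical toy data: three outcome levels crossed with three ASA settings.
d <- expand.grid(x = 1:5,
                 outcome = c("Expand", "First Pic", "Multiple Pics"),
                 asa = c(-1, 0, 1),
                 stringsAsFactors = FALSE)
d$y <- d$x * d$asa + nchar(d$outcome)
d$asa <- factor(d$asa)  # levels are "-1", "0", "1"

# Facets follow the factor level order, so set the order explicitly.
d$outcome <- factor(d$outcome,
                    levels = c("Multiple Pics", "First Pic", "Expand"))

p <- ggplot(d, aes(x, y, color = asa)) +
  geom_line() +
  facet_wrap(~outcome) +
  # Replace the raw values in the legend with descriptive labels.
  scale_color_discrete(labels = c("-1 SD", "mean", "+1 SD"))
print(p)
```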
Just following up on @the-mad-statter's answer, I wanted to add a note on how to change the legend title and labels when you're working with a black-and-white graph where the lines differ by linetype (i.e. using sjPlot's colors = "bw" argument).
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"),
colors = "bw")
As the lines are all black, to change the legend title and labels you need to use the scale_linetype_manual() function instead of scale_color_discrete(), like this:
p1 + scale_linetype_manual(name = "ASA values",
values = c("dashed", "solid", "dotted"),
labels = c("Low (-1 SD)", "Medium (mean)", "High (+1 SD)"))
The resulting graph will look like this:
Note that I also took this opportunity to change how linetypes are assigned to values, making the line corresponding to the mean of ASA solid.

Coloring points on scatterplot by Variable in R

I conducted a logistic regression for the quality of wine (dataset from UCI database). I am attempting to make a scatterplot with the points colored by quality (0=low, 1=high) and have succeeded, but the colors are black and white. White points on a plot are obviously not helpful, so I wanted to be able to specify/change the colors, but I have tried many things with nothing working.
Code:
glm.fit=glm(wine$quality~., data=wine,
family=binomial)
step(glm.fit)
glm.fit2=glm(wine$quality~volatile.acidity
+residual.sugar+free.sulfur.dioxide+
density+pH+sulphates+alcohol,
data=wine, family=binomial)
summary(glm.fit2)
plot(wine$sulphates, wine$alcohol,
xlab="sulphates", ylab="alcohol",
col=wine$quality)
legend("topright", col=1:2, pch=21,
legend=c("low quality","high quality"))
Here's the plot I get:
It's a plot of the two most significant variables from the glm. I don't really care what colors, just not white!!
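As NColl suggested, the ggplot2 package is great for this. Try the code below. It plots the wine data frame itself rather than the fitted model, and converts quality to a factor so that the two classes get two distinct colors:
library(ggplot2)
ggplot(data = wine, aes(x = sulphates, y = alcohol, color = factor(quality))) +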
geom_point()
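If you want to pick the two colors yourself rather than take the ggplot2 defaults, scale_color_manual() is one way to do it. This is a minimal sketch on simulated data; wine_demo is a hypothetical stand-in for the wine data frame, with only the three columns used in the plot.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the wine data: two numeric columns and a 0/1 quality flag.
wine_demo <- data.frame(
  sulphates = runif(200, 0.3, 1.1),
  alcohol   = runif(200, 8, 14),
  quality   = rbinom(200, 1, 0.5)
)

p <- ggplot(wine_demo, aes(x = sulphates, y = alcohol, color = factor(quality))) +
  geom_point() +
  # Map each factor level to an explicit color and give the legend readable labels.
  scale_color_manual(name = "quality",
                     values = c("0" = "red", "1" = "blue"),
                     labels = c("low quality", "high quality"))
print(p)
```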
A simple R base solution is this.
Assuming you have a data frame similar in structure to this one, with one binary variable containing just 0 and 1 and two more variables:
df <- data.frame(
Var1 = sample(500, 100),
Var2 = rnorm(100, 100),
binaryVar = sample(0:1, 100, replace = TRUE)
)
then you can assign colors to the binary variable using an ifelse statement like this:
df$col <- ifelse(df$binaryVar == 0, "red", "blue")
and can finally do your scatter plot using df$col to define the colors of your data points:
plot(df$Var1, df$Var2, frame.plot = FALSE, col = df$col)
legend("topright",legend=c("Low quality", "High quality"),
pch=1, col=c("red","blue"), bg="grey")
The result looks like this:

Colouring a subset of data in Lattice plots R

I have plotted a xyplot in lattice of shellfish catch rates by year grouped by survey area using the below code:
xyplot(catch.rate ~ Year | Area, data, xlab = "Year", ylab = "Catch rate",
col ="black", par.settings = list(strip.background=list(col="white")))
I have one year of data that I would like to highlight on the plot in a different colour (e.g. red). I created a subset of this data with:
subset <- grep("^0214A",data$Haul_ID,ignore.case=TRUE)
I have done something similar with the standard R plots using points() before, but I am new to lattice and not sure how to do this with this package.
For plots without conditioning variables, the col= argument accepts a vector parallel to the points being plotted, so for instance
xyplot(mpg~disp, mtcars, col=mtcars$cyl, pch=20, cex=4)
colors points by the number of cylinders. Maybe you'd do
cols=c("red", "green")[grepl("^0214A", data$Haul_ID, ignore.case=TRUE) + 1L]
For plots with conditioning variables, one can write a panel function that accepts the col vector and subscripts, an index into the data describing the rows currently being plotted. Pass the arguments to the panel function to panel.xyplot(), adjusting the color of each point to reflect the subset of data in the panel. Here's the panel function
panel <- function(..., col, subscripts) {
panel.xyplot(..., col=col[subscripts])
}
and in action
xyplot(mpg ~ disp | factor(cyl), mtcars, col=mtcars$cyl,
panel=panel, pch=20, cex=4)
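Putting the two pieces together for the highlighting case might look like the sketch below. It assumes mtcars as stand-in data and uses car names starting with "Merc" as a hypothetical substitute for the Haul_ID pattern in the question; hl, cols and panel_highlight are illustrative names.

```r
library(lattice)

# Hypothetical stand-in for grepl("^0214A", data$Haul_ID, ignore.case = TRUE):
# flag the rows whose name starts with "Merc".
hl <- grepl("^Merc", rownames(mtcars))

# One color per row of the data: black by default, red for the highlighted subset.
cols <- c("black", "red")[hl + 1L]

# Panel function that picks out the colors of the rows shown in the current panel.
panel_highlight <- function(..., col, subscripts) {
  panel.xyplot(..., col = col[subscripts])
}

p <- xyplot(mpg ~ disp | factor(cyl), mtcars, col = cols,
            panel = panel_highlight, pch = 20, cex = 2,
            par.settings = list(strip.background = list(col = "white")))
print(p)
```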

R: visualizing differences across a large number of groups

I have a dataset having the unique IDs of manufacturing units, the industrial classification of their outputs (CAT) and the number of people each unit employs (EMP). I want to graphically show that EMP varies by CAT, i.e. employment size in general varies by the kind of output a unit produces. I tried boxplots arranged by median EMP:
a = read.csv("/filepath/plot.csv", header=T, stringsAsFactors=F)
bymedian = with(a, reorder(CAT, log(as.numeric(as.character(EMP))), median))
boxplot(log(EMP) ~ bymedian, data=a, horizontal=F, notch=T, pch=1, cex=.25, col="gray95", boxwex=.25, las=2, outline=F)
The problem is that because of the large number of categories (400+), the plot becomes very messy. Is there a cleaner way of showing what I am trying to do?
Using ggplot2, you can keep all the boxes but label only every 20th category on the x axis with scale_x_discrete():
library(ggplot2)
a$bymedian = with(a, reorder(CAT, log(EMP), median))
p <- ggplot(a,aes(y=log(EMP),x=bymedian))+
geom_boxplot()
breaks <- levels(a$bymedian)[seq(1,nlevels(a$bymedian),20)]
p + scale_x_discrete(breaks = breaks, labels = breaks)
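The same label-thinning idea also works with base boxplot(), by suppressing the default x axis and drawing every 20th label by hand. This is a minimal sketch on simulated data; the data frame a below is generated for illustration (about 400 categories, log-normal EMP) and is not the asker's file.

```r
set.seed(1)
# Hypothetical data: roughly 400 categories with log-normally distributed employment.
a <- data.frame(
  CAT = sprintf("C%03d", sample(1:400, 5000, replace = TRUE)),
  EMP = rlnorm(5000, meanlog = 3, sdlog = 1)
)

# Order the categories by their median log(EMP).
a$bymedian <- with(a, reorder(CAT, log(EMP), median))

# Draw the boxes without the default x axis.
boxplot(log(EMP) ~ bymedian, data = a, outline = FALSE, xaxt = "n",
        col = "gray95", boxwex = 0.25,
        xlab = "CAT (ordered by median)", ylab = "log(EMP)")

# Label only every 20th category; the boxes sit at positions 1..nlevels.
idx <- seq(1, nlevels(a$bymedian), by = 20)
axis(1, at = idx, labels = levels(a$bymedian)[idx], las = 2, cex.axis = 0.7)
```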

Plot frequency of a value of 2 factors in the same plot in R

I'd like to plot the frequency of a variable, color coded for 2 factor levels. For example, blue bars should be the histogram of level A and green bars the histogram of level B, both in the same graph. Is this possible with the hist command? The help page for hist does not mention a factor argument. Is there another way around it?
I managed to do this manually with barplots, but I want to ask if there is a more automatic method.
Many thanks
EC
PS. I don't need density plots.
In case the other answers don't cover it, here is an approach that worked for me. I had to deal with stacking histograms recently, and here's what I did:
# keep only the samples that have V1 equal to "Yes"
data_sub <- subset(data, V1 == "Yes")
hist(data$HL)
hist(data_sub$HL, col="red", add=TRUE)
Hopefully, this is what you meant?
It's rather unclear what you have as a data layout. A histogram requires that you have a variable that is ordinal or continuous so that breaks can be created. If you also have a separate grouping factor you can plot histograms conditional on that factor. A nice worked example of such a grouping and overlaying a density curve is offered in the second example on the help page for the histogram function in the lattice package.
A nice resource for learning the relative merits of lattice and ggplot2 plotting is the Learning R blog. This is from the first of a multipart series on side-by-side comparison of the two plotting systems:
library(lattice)
library(ggplot2)
data(Chem97, package = "mlmRev")
#The lattice method:
pl <- histogram(~gcsescore | factor(score), data = Chem97)
print(pl)
# The ggplot method:
pg <- ggplot(Chem97, aes(gcsescore)) + geom_histogram(binwidth = 0.5) +
facet_wrap(~score)
print(pg)
I don't think you can do that easily with a bar histogram, as you would have to "interlace" the bars from both factor levels. It would need some kind of "discretization" of the now continuous x axis (i.e. it would have to be split into "categories", and in each category you would have 2 bars, one for each factor level).
But it is quite easy and without problems if you are just fine with plotting the density line function:
y <- rnorm(1000, 0, 1)
x <- rnorm(1000, 0.5, 2)
dx <- density(x)
dy <- density(y)
plot(dx, xlim = range(dx$x, dy$x), ylim = range(dx$y, dy$y),
type = "l", col = "red")
lines(dy, col = "blue")
It's very possible.
I didn't have data to work with but here's an example of a histogram with different colored bars. From here you'd need to use my code and figure out how to make it work for factors instead of tails.
# BASIC SETUP (template; replace the <...> placeholders before running)
# histogram <- hist(scale(vector), breaks = <number of breaks>, plot = FALSE)
# plot(histogram, col = ifelse(abs(histogram$breaks) < <number of SDs>, <color 1>, <color 2>))
#EXAMPLE
x<-rnorm(1000)
histogram <- hist(scale(x), breaks=20 , plot=FALSE)
plot(histogram, col=ifelse(abs(histogram$breaks) < 2, "red", "green"))
I agree with the others that a density plot is more useful than merging colored bars of a histogram, particularly if the group's values are intermixed. This would be very difficult and wouldn't really tell you much. You've got some great suggestions from others on density plots, here's my 2 cents for density plots that I sometimes use:
y <- rnorm(1000, 0, 1)
x <- rnorm(1000, 0.5, 2)
DF <- data.frame("Group"=factor(rep(c("y","x"), each=1000)), "Value"=c(y,x)) # factor() so that levels(DF$Group) works in the legend call
library(sm)
with(DF, sm.density.compare(Value, Group, xlab="Grouping"))
title(main="Comparative Density Graph")
legend(-9, .4, levels(DF$Group), fill=c("red", "darkgreen"))
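For the histogram-by-factor question itself, one base R option is to compute both histograms on a shared set of breaks and overlay them with semi-transparent fills. This is a minimal sketch on simulated data; the data frame d, its columns value and level, and the choice of 30 bins are assumptions made for illustration.

```r
set.seed(42)
# Hypothetical data: one numeric variable and a two-level factor.
d <- data.frame(
  value = c(rnorm(500, mean = 0), rnorm(500, mean = 2)),
  level = factor(rep(c("A", "B"), each = 500))
)

# Shared breaks so the bars of both levels line up.
brk <- pretty(range(d$value), n = 30)
hA <- hist(d$value[d$level == "A"], breaks = brk, plot = FALSE)
hB <- hist(d$value[d$level == "B"], breaks = brk, plot = FALSE)

# Semi-transparent fills keep overlapping bars visible.
plot(hA, col = adjustcolor("blue", alpha.f = 0.5),
     ylim = c(0, max(hA$counts, hB$counts)),
     main = "Frequency by factor level", xlab = "value")
plot(hB, col = adjustcolor("green", alpha.f = 0.5), add = TRUE)
legend("topright", legend = levels(d$level),
       fill = c(adjustcolor("blue", alpha.f = 0.5),
                adjustcolor("green", alpha.f = 0.5)))
```

Using the same breaks for both calls matters: with the default breaks, each subset could get its own bin width and the overlay would be misleading.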
