How to group by dependent variable?


bargraph.CI from the sciplot package plots a bar chart with error bars. It also allows grouping by independent variables (factors). I want to group by dependent variable instead. How can I achieve that?
bargraph.CI(x.factor, response, group=NULL, split=FALSE,
col=NULL, angle=NULL, density=NULL,
lc=TRUE, uc=TRUE, legend=FALSE, ncol=1,
leg.lab=NULL, x.leg=NULL, y.leg=NULL, cex.leg=1,
bty="n", bg="white", space=if(split) c(-1,1),
err.width=if(length(levels(as.factor(x.factor)))>10) 0 else .1,
err.col="black", err.lty=1,
fun = function(x) mean(x, na.rm=TRUE),
ci.fun= function(x) c(fun(x)-se(x), fun(x)+se(x)),
ylim=NULL, xpd=FALSE, data=NULL, subset=NULL, ...)
The specification of bargraph.CI is shown above. The response variable is usually a numeric vector. This time, I want to plot three response variables (A, B, C) against the same independent variable. Let me use the data frame "mpg" to illustrate the problem. I can successfully get a plot with the following code, where the DV is hwy:
library(sciplot) # provides bargraph.CI
library(ggplot2) # provides the mpg data set
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
hwy, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="Highway MPG",
xlab="Class")
I can also successfully get a plot with the only change being the DV (changed from hwy to cty)
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
cty, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="City MPG",
xlab="Class")
However, I want to use the two DVs at the same time: for each class, I want to display two bars, one for cty and one for hwy. I tried the following:
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
c(cty,hwy), #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="MPG",
xlab="Class")
This fails because the dimensions do not match: c(cty, hwy) is twice as long as class. How can I achieve this? A similar effect can be obtained with ggplot2 using the method from "Boxplot schmoxplot: How to plot means and standard errors conditioned by a factor in R?", so a ggplot2 solution is also fine for me.

As often happens when displaying data, you should reshape the data first and then call bargraph.CI. In your example, the data.frame that you would like to visualize is the following:
df <- data.frame(class=c(mpg$class, mpg$class),
value=c(mpg$cty, mpg$hwy),
grp=rep(c("cty", "hwy"), each=nrow(mpg)))
Then you can use bargraph.CI on this new data.frame.
bargraph.CI(
class, #categorical factor for the x-axis
value, #numerical DV for the y-axis
group=grp, #grouping factor
data=df,
legend=T,
ylab="MPG",
xlab="Class")
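Since the question also allows a ggplot2 solution, here is a minimal sketch of the same idea. It assumes ggplot2 is installed; the helper `se` and the column names `value.mean` and `value.se` are illustrative choices, not part of either package. It reshapes the data as above, computes the mean and standard error per class and measure, and draws dodged bars with error bars.

```r
library(ggplot2)  # provides the mpg data set and the plotting functions

# Long format: one row per (car, measure) pair, as in the answer above
df <- data.frame(class = rep(mpg$class, times = 2),
                 value = c(mpg$cty, mpg$hwy),
                 grp   = rep(c("cty", "hwy"), each = nrow(mpg)))

# Mean and standard error per class and measure
se <- function(x) sd(x) / sqrt(length(x))
summ <- aggregate(value ~ class + grp, data = df,
                  FUN = function(x) c(mean = mean(x), se = se(x)))
summ <- do.call(data.frame, summ)  # flatten into value.mean and value.se

# Dodged bars (one per measure within each class) with +/- 1 SE error bars
p <- ggplot(summ, aes(x = class, y = value.mean, fill = grp)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = value.mean - value.se,
                    ymax = value.mean + value.se),
                position = position_dodge(width = 0.9), width = 0.2) +
  labs(x = "Class", y = "MPG", fill = "Measure")
p
```

The same long-format data frame feeds both approaches, so the two plots should show the same means.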

Related

Specify range of emmeans outputs to range of individual factor levels

I am trying to create a figure from emmeans output, plotting lines for the 5 levels of a factor. I would like each ribbon to span only the range of x values in which that level occurs, not the whole x axis. Some levels only have data over part of the x axis, and I do not want to extrapolate beyond those ranges.
My current code, which extrapolates across the whole range, is:
newdata=emmeans(model, ~x|factor, at=list(factor=levels(data$factor), x=seq(min(data$x), max(data$x), len=100)), type='response') %>% as.data.frame
figure=ggplot(data, aes(y=y, x=x, color=factor, fill=factor))+
geom_ribbon(data=newdata, aes(x=x, y=response,ymin=lower.CL, ymax=upper.CL), alpha=0.3, colour = NA)+
geom_line(data=newdata, aes(x=x, y=response))
figure

I have since found a bulky workaround:
#Build dataframes with max and min for each factor
factorvariable.1 <- c("factorvariable.1")
data.factorvariable.1=filter(data, factor %in% factorvariable.1)
factorvariable.1.range=range(data.factorvariable.1$x)%>% as.data.frame
factorvariable.1.range$factor=factorvariable.1
factorvariable.1.range$min.max=c('min','max')
factorvariable.2 <- c("factorvariable.2")
data.factorvariable.2=filter(data, factor %in% factorvariable.2)
factorvariable.2.range=range(data.factorvariable.2$x)%>% as.data.frame
factorvariable.2.range$factor=factorvariable.2
factorvariable.2.range$min.max=c('min','max')
Range=rbind(factorvariable.1.range,factorvariable.2.range)
Range <- spread(Range, min.max, .)
#filter emmeans data by max and min values
newdata=emmeans(model, ~x|factor, at=list(factor=levels(data$factor), x=seq(min(data$x), max(data$x), len=100)), type='response') %>% as.data.frame
newdata=merge(newdata, Range, by="factor")
newdata= newdata%>%filter(x>min)
newdata= newdata%>%filter(x<max)
newdata
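The workaround above can probably be shortened by computing all per-level ranges in one grouped summary. The following is only a sketch on toy data, assuming dplyr is installed: the data frame `data`, the stand-in grid `newdata` (used here in place of the real emmeans output), and the names `xmin`, `xmax`, and `trimmed` are all hypothetical.

```r
library(dplyr)

# Toy data: level "a" observed on x in [0, 4], level "b" on x in [3, 10]
data <- data.frame(
  factor = rep(c("a", "b"), times = c(5, 8)),
  x      = c(0:4, 3:10),
  stringsAsFactors = FALSE
)

# Stand-in for the emmeans output: every level crossed with the full x grid
newdata <- expand.grid(
  factor = c("a", "b"),
  x      = seq(min(data$x), max(data$x), length.out = 101),
  stringsAsFactors = FALSE
)

# Observed x range per level, computed for all levels at once
ranges <- data %>%
  group_by(factor) %>%
  summarise(xmin = min(x), xmax = max(x))

# Keep only grid points inside each level's own observed range
trimmed <- newdata %>%
  inner_join(ranges, by = "factor") %>%
  filter(x >= xmin, x <= xmax) %>%
  select(-xmin, -xmax)
```

Unlike the strict `>` and `<` filters in the workaround, this sketch keeps grid points that fall exactly on a level's minimum or maximum.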

How to use + geom_line() with a categorical x-variable and quantitative y-variable [duplicate]

This question already has an answer here:
ggplot: line plot for discrete x-axis
(1 answer)
Closed 2 years ago.
How can I create a line graph with ggplot2 where the x variable is either a character or a factor, the y variable is numeric, and the group variable is categorical? I have tried just + geom_point() with the variables as stated above and it works, but + geom_line() does not.
I have already reviewed posts such as:
Creating line graph using categorical data,
ggplot2 bar plot with two categorical variables, and No line in plot chart despite + geom_line(), but none of them answer my question.
Before I go into code and examples, (1) Yes I absolutely must have the x-variable and group variable as a character or factor, (2) No, I do not want a bar graph or just geom_point().
The example below provides the coefficients of multiple independent variables from three different example regressions run using different variations on the dependent variable. The code below shows a workaround that I figured out (creating an int variable named 'test' to use in place of the chr variable containing the names of the independent variables from the regression), but I need to preserve the chr names of the independent variables instead.
Here is what I have:
library(dplyr)
library(ggplot2)
library(plotly)
library(tidyr)
var_names <- c("ST1", "ST2", "ST3",
"EFI1", "EFI2", "EFI3", "EFI4",
"EFI5", "EFI6")
####Dataset1####
reg <- c(26441.84, 20516.03, 12936.79, 17793.22, 18837.48, 15704.31, 17611.14, 17360.59, 14836.34)
r_adj <- c(30473.17, 35221.43, 29875.98, 30267.31, 29765.9, 30322.86, 31535.66, 30955.29, 29828.3)
a_adj <- c(19588.63, 31163.79, 22498.53, 27713.72, 25703.89, 28565.34, 29853.22, 29088.25, 25213.02)
df1 <- data.frame(var_names, reg, r_adj, a_adj, stringsAsFactors = FALSE)
df1$test <- c(1:9)
df2 <- gather(df1, key = "series_type", value = "value", c(2:4))
fig7 <- ggplot(df2, aes(x = test, y = value, color = series_type)) + geom_line() + geom_point()
fig7
Ultimately I want something that looks like the plot below, but with the independent variable names in place of the 'test' variable.
Example Plot

You can convert var_names into a factor and set the levels in the order of appearance (otherwise it will be assigned alphanumerically and the x axis will be out of order). Then just add series_type to the group parameter in the plot.
df2 <- gather(df1, key = "series_type", value = "value", c(2:4)) %>%
mutate(var_names = factor(var_names, levels = unique(var_names)))
ggplot(df2, aes(x = var_names, y = value, color = series_type, group = series_type)) + geom_line() + geom_point()
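To see why the levels argument matters, here is a small base R sketch comparing the default level order with the order of appearance. The shortened `var_names` vector and the names `default_order` and `appearance_order` are illustrative only.

```r
var_names <- c("ST1", "ST2", "ST3", "EFI1", "EFI2", "EFI3")

# Default: levels are sorted, so the EFI names come before the ST names
default_order <- factor(var_names)

# Explicit levels: keep the order in which the names first appear
appearance_order <- factor(var_names, levels = unique(var_names))

levels(default_order)
levels(appearance_order)
```

ggplot2 lays out a discrete x axis in level order, which is why the second form keeps ST1 on the left.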

Adjusting facet order and legend labels when using plot_model function of sjplot

I have successfully used the plot_model function of sjplot to plot a multinomial logistic regression model. The regression contains an outcome (Info Sought, with 3 levels) and 2 continuous predictors (DSA, ASA). I have also changed the values of ASA in the plot_model so as to plot predicted effect outcomes based on the ASA mean value and SDs:
plot1 <- plot_model(multinomialmodel , type = "pred", terms = c("DSA", "ASA[meansd]"))
I have two customization questions:
1) Facet Order: The facet order is based on the default alphabetical order of the outcome levels ("Expand" then "First Pic" then "Multiple Pics"). Is there a way to adjust this? I tried reordering the levels with factor() (as shown here with ggplot2) before fitting and plotting the model, but this did not change the resulting facet order. Perhaps it can be done through ggplot2 instead, as in the first solution provided here?
2) Legend Labels: The legend currently labels the plotted lines with the -1 SD, mean, and +1 SD values for ASA; is there a way to adjust these labels to instead simply say "-1 SD", "mean", and "+1 SD" instead of the raw values?
Thanks!

First I replicate your plot using your supplied data:
library(dplyr)
library(readr)
library(nnet)
library(sjPlot)
"ASA,DSA,Info_Sought
-0.108555801,0.659899854,First Pic
0.671946671,1.481880373,First Pic
2.184170211,-0.801398848,First Pic
-0.547588442,1.116555698,First Pic
-1.27930951,-0.299077419,First Pic
0.037788412,1.527545958,First Pic
-0.74271406,-0.755733264,Multiple Pics
1.20854212,-1.166723523,Multiple Pics
0.769509479,-0.390408588,Multiple Pics
-0.450025633,-1.02972677,Multiple Pics
0.769509479,0.614234269,Multiple Pics
0.281695434,0.705565438,Multiple Pics
-0.352462824,-0.299077419,Expand
0.671946671,1.481880373,Expand
2.184170211,-0.801398848,Expand
-0.547588442,1.116555698,Expand
-0.157337206,1.070890114,Expand
-1.27930951,-0.299077419,Expand" %>%
read_csv() -> d
multinomialmodel <- multinom(Info_Sought ~ ASA + DSA, data = d)
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"))
p1
Your attempt to re-factor did not work because sjPlot::plot_model() does not pay heed. One way to tackle reordering the facets is to produce an initial plot as above and replace the faceting variable in the data with a factor version containing your desired order like so:
p2 <- p1
p2$data$response.level <- factor(p2$data$response.level,
levels = c("Multiple Pics", "First Pic", "Expand"))
p2
Finally, to tackle the legend labeling issue, we can just replace the color scale with one containing your desired labels:
p2 +
scale_color_discrete(labels = c("-1 SD", "mean", "+1 SD"))

Just following up on @the-mad-statter's answer, I wanted to add a note on how to change the legend title and labels when you're working with a black-and-white graph where the lines differ by linetype (i.e. using sjPlot's colors = "bw" argument).
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"),
colors = "bw")
As the lines are all black, if you would like to change the legend title and labels, you need to use the scale_linetype_manual() function instead of scale_color_discrete(), like this:
p1 + scale_linetype_manual(name = "ASA values",
values = c("dashed", "solid", "dotted"),
labels = c("Low (-1 SD)", "Medium (mean)", "High (+1 SD)"))
The resulting graph will look like this:
Note that I also took this opportunity to change how linetypes are assigned to values, making the line corresponding to the mean of ASA solid.

R continuous vs categorical percentage share with geom_line

I'd like to create a ggplot geom_line graph with continuous data on the x-axis and the percentage share of a categorical variable.
E.g. for mtcars I would like to have hp on the x-axis and the percentage of the cars that have 6 cylinders on the y-axis.
ggplot(mtcars, aes(x=hp, y=cyl)) +
geom_line()
I think it needs to be defined in geom_line by fun.y or something similar.

Compute the frequencies beforehand, using reshape for instance:
library(reshape)
M <- melt(mtcars,id.vars="hp",measure.vars="cyl")
C <- cast(M,hp~ variable)
C$f <- C$cyl/sum(C$cyl)
ggplot(C,aes(x=hp,y=f)) +
geom_line()
Note that in this case a line plot doesn't seem to make much sense, because the data points are too far apart. You could use a bar plot instead:
ggplot(C,aes(x=hp,y=f)) +
geom_bar(stat="identity")
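Note that the cast above counts cars per hp value, so f is the share of all cars at each hp. If the quantity wanted is the share of six-cylinder cars at each hp, as the question describes, one possible base R sketch is the following. The names `share`, `C6`, and `f6` are hypothetical.

```r
# Share of six-cylinder cars at each observed hp value (base R only)
share <- tapply(mtcars$cyl == 6, mtcars$hp, mean)

# Turn the named vector into a data frame suitable for plotting
C6 <- data.frame(hp = as.numeric(names(share)),
                 f6 = as.numeric(share))
head(C6)

# Plot as in the answer, e.g. with ggplot2 if it is installed:
# ggplot(C6, aes(x = hp, y = f6)) + geom_line()
```

Because most hp values occur only once or twice in mtcars, the resulting shares are mostly 0 or 1, which supports the remark that a line plot is not very informative here.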

R: Plot interaction between categorial Factor and continuous Variable on DV

What I have is a 3-Levels Repeated Measures Factor and a continuous variable (Scores in psychological questionnaire, measured only once pre-experiment, NEO), which showed significant interaction together in a Linear Mixed Effects Model with a Dependent Variable (DV; State-Scores measured at each time level, IAS).
To see the nature of this interaction, I would like to create a plot with time levels on X-Axis, State-Score on Y-Axis and multiple curves for the continuous variable, similar to this. The continuous variable should be categorized in, say quartiles (so I get 4 different curves), which is exactly what I can't achieve. Until now I get a separate curve for each value in the continuous variable.
My goal is also comparable to this, but I need the categorical (time) variable on the X-Axis rather than as separate curves.
I tried a lot of different plot functions in R but didn't manage to get what I want, maybe because I am not very experienced with R.
For example,
ggplot(Data_long, aes(x = time, y = IAS, colour = NEO, group = NEO)) +
geom_line()
from the first link shows me dozens of curves (one for each value in the measurement NEO), and I can't find how to group continuous variables in a meaningful way in that ggplot call.
Edit:
Original Data:
http://www.pastebin.ca/2598926
(I hope it is not too inconvenient.)
This object (Data_long) was created/converted with the following line:
Data_long <- transform(Data_long0, neo.binned=cut(NEO,c(25,38,46,55,73),labels=c(".25",".50",".75","1.00")))
Every value in the neo.binned col seems to be set correctly with enough cases per quantile.
What I then tried and didn't work:
ggplot(Data_long, aes(x = time, y = ias, color = neo.binned)) + stat_summary(fun.y="median",geom="line")
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
I have 92 subjects, with NEO values between 26 and 73. Any hints on what to pass to cut and its labels argument? The quantiles (0%, 25%, 50%, 75%, 100%) are 26, 38, 46, 55, 73.

Do you mean something like this? Here, your data is binned according to NEO into three classes, and then the median of IAS over these bins is drawn. Check out ?cut.
Data_long <- transform(Data_long, neo.binned=cut(NEO,c(0,3,7,10),labels=c("lo","med","hi")))
Plot everything in one plot.
ggplot(Data_long, aes(x = time, y = IAS, color = neo.binned)) +
stat_summary(aes(group=neo.binned),fun.y="median",geom="line")
And, borrowing from CMichael's answer, you can also split it across multiple panels (you linked to faceted plots in your question):
ggplot(Data_long,aes(x=time,y=IAS)) +
stat_summary(fun.y="median",geom="line") +
facet_grid(neo.binned ~ .)
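For the quartile binning asked about in the question, the breaks can be taken from quantile() rather than typed by hand. This is a small sketch on made-up scores: the `NEO` vector and the labels "Q1" to "Q4" are hypothetical, chosen so that the quartiles match the ones quoted in the question.

```r
# Toy NEO scores (hypothetical), chosen so the quartiles are 26, 38, 46, 55, 73
NEO <- c(26, 30, 38, 40, 46, 50, 55, 60, 73)

# Quartile boundaries computed from the data
breaks <- quantile(NEO, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum inside the first bin instead of NA
neo.binned <- cut(NEO, breaks = breaks,
                  labels = c("Q1", "Q2", "Q3", "Q4"),
                  include.lowest = TRUE)

table(neo.binned)
```

The binned variable can then be mapped to both color and group, as in the stat_summary call above, so that each bin is drawn as one line.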

Do you mean faceting @ziggystar's initial plot?
quantiles = quantile(Data_long$NEO,c(0.25,0.5,0.75))
Data_long$NEOQuantile = ifelse(Data_long$NEO<=quantiles[1],"first NEO Quantile",
ifelse(Data_long$NEO<=quantiles[2],
"second NEO Quantile",
ifelse(Data_long$NEO<=quantiles[3],
"third NEO Quantile","fourth NEO Quantile")))
require(ggplot2)
p = ggplot(Data_long,aes(x=time,y=IAS)) + stat_quantile(quantiles=c(1),formula=y ~ x)
p = p + facet_grid(.~NEOQuantile)
p
