I am trying to create a figure from emmeans output, plotting a line and a confidence ribbon for each of 5 levels of a factor. I would like each ribbon to span only the range of x values in which that level actually occurs, not the whole x axis. Some levels only have data over part of the x axis, and I do not want to extrapolate beyond those ranges.
The current code, which extrapolates across the whole range, is:
newdata=emmeans(model, ~x|factor, at=list(factor=levels(data$factor), x=seq(min(data$x), max(data$x), len=100)), type='response') %>% as.data.frame
figure=ggplot(data, aes(y=y, x=x, color=factor, fill=factor))+
geom_ribbon(data=newdata, aes(x=x, y=response,ymin=lower.CL, ymax=upper.CL), alpha=0.3, colour = NA)+
geom_line(data=newdata, aes(x=x, y=response))
figure
I have since found a bulky workaround:
#Build dataframes with max and min for each factor
factorvariable.1 <- c("factorvariable.1")
data.factorvariable.1=filter(data, factor %in% factorvariable.1)
factorvariable.1.range=range(data.factorvariable.1$x)%>% as.data.frame
factorvariable.1.range$factor=factorvariable.1
factorvariable.1.range$min.max=c('min','max')
factorvariable.2 <- c("factorvariable.2")
data.factorvariable.2=filter(data, factor %in% factorvariable.2)
factorvariable.2.range=range(data.factorvariable.2$x)%>% as.data.frame
factorvariable.2.range$factor=factorvariable.2
factorvariable.2.range$min.max=c('min','max')
Range=rbind(factorvariable.1.range,factorvariable.2.range)
Range <- spread(Range, min.max, .)
#filter emmeans data by max and min values
newdata=emmeans(model, ~x|factor, at=list(factor=levels(data$factor), x=seq(min(data$x), max(data$x), len=100)), type='response') %>% as.data.frame
newdata=merge(newdata, Range, by="factor")
newdata= newdata%>%filter(x>=min) #inclusive, so the observed endpoints are kept
newdata= newdata%>%filter(x<=max)
newdata
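The per-level ranges can probably be built in one step instead of one block per level. This is only a minimal sketch under assumptions: the toy data, the column name grp (standing in for the factor column) and the plain grid newdata (standing in for the emmeans data frame) are all made up for illustration.

```r
library(dplyr)

# Toy data (assumed): level "a" spans x in [0, 4], level "b" spans x in [3, 10]
set.seed(1)
data <- data.frame(
  grp = rep(c("a", "b"), each = 50),
  x   = c(runif(50, 0, 4), runif(50, 3, 10))
)

# Stand-in for the emmeans output: one row per grid point and level
newdata <- expand.grid(
  x   = seq(min(data$x), max(data$x), length.out = 100),
  grp = unique(data$grp),
  stringsAsFactors = FALSE
)

# Observed x range of every level, computed in a single summarise
ranges <- data %>%
  group_by(grp) %>%
  summarise(x_min = min(x), x_max = max(x), .groups = "drop")

# Keep only the grid points inside the observed range of their own level
clipped <- newdata %>%
  inner_join(ranges, by = "grp") %>%
  filter(x >= x_min, x <= x_max) %>%
  select(-x_min, -x_max)
```

With real emmeans output, the same join and filter should apply as long as the grouping column has the same name in both data frames.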
I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, where the rows are one factor (levels x and y, plotted with different colors) and the columns are another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.
ggplot will be more helpful if the data is first put into tidy form. Here df is your data and df_tidy is the same data in tidy form, where the series is identified in a single column that can be mapped in ggplot, in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
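To get the 2x2 layout the question describes, one option is to split the series name into its letter and its number and facet on both. This is a sketch rather than the only way to do it; the column names variable and index are my own choice, and it assumes tidyr's pivot_longer is available.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(t = 1:10)
df$x1 <- df$t^2
df$x2 <- sqrt(df$t)
df$y1 <- sin(df$t)
df$y2 <- cos(df$t)

# Split names such as "x1" into a letter ("x") and a number ("1")
df_long <- pivot_longer(
  df, -t,
  names_to = c("variable", "index"),
  names_pattern = "([a-z])([0-9])",
  values_to = "value"
)

# Rows: x / y (colour); columns: 1 / 2 (linetype); t is shared on the x axis
p <- ggplot(df_long, aes(t, value, colour = variable, linetype = index)) +
  geom_line() +
  facet_grid(variable ~ index, scales = "free_y")
p
```

Note that facet_grid only frees the y scale per row, so x1 and x2 still share one y axis; facet_wrap with free scales may suit better if every panel needs its own.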
I'm an EViews user, and EViews draws scatter plot matrices very easily.
In the following graph, I have 13 different data groups, and EViews plots one group against each of the other 12 groups in 12 panels of a single graph, each with a regression line.
How can I produce the same graph in RStudio?
Here is an example of how to do the requested plot in ggplot:
First some data:
z <- matrix(rnorm(1000), ncol= 10)
The basic idea here is to convert the wide matrix to long format, where the variable that is compared to all the others is repeated once for each of the other variables. Each of these other variables gets its own label in the key column. ggplot likes the data in this format.
library(tidyverse)
z %>%
as.data.frame() %>% #convert matrix to data.frame (columns are named V1 to V10)
gather(key, value, 2:10) %>% #convert to long format specifying variable columns 2:10
mutate(key = factor(key, levels = paste0("V", 1:10))) %>% #specify levels so the facets go in the correct order to avoid V10 being before V2
ggplot() +
geom_point(aes(value, V1))+ #plot points
geom_smooth(aes(value, V1), method = "lm", se = F)+ #plot lm fit without se
facet_wrap(~key) #facet by key
I want to automatically set the number and the positions of the breaks on the axis of a discrete variable so that the plotted labels are actually readable.
For example, in the code below, the resulting plot should show only a subset of the labels of the x variable.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE)
So far, I have tried pretty and pretty_breaks, but neither works for discrete variables.
First, we turn the factor into a character vector and then back into a factor whose levels follow the order of appearance. Second, we subset ggData$x to create a vector (labels) with the ticks we want, in this example every 10th element. Finally, we create the plot with scale_x_discrete, passing that vector (labels) to the breaks argument.
ggData <- data.frame(x=paste0('B',1:100), y=rnorm(100))
ggData$x <- as.character(ggData$x)
ggData$x <- factor(ggData$x, levels=unique(ggData$x))
labels <- as.character(ggData$x[seq(10, 100, by= 10)])
ggplot(ggData, aes_string('x', 'y')) +
geom_point(size=2.5, shape=19, na.rm = TRUE) +
scale_x_discrete(breaks=labels)
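If the number of levels is not known in advance, breaks can also be given as a function of the axis limits. The helper every_nth below is hypothetical (it is not part of ggplot2), and this is just a sketch of the idea:

```r
library(ggplot2)

# Hypothetical helper: returns a function that keeps every n-th axis limit
every_nth <- function(n) {
  function(limits) {
    if (length(limits) < n) return(limits)  # too few levels, keep them all
    limits[seq(n, length(limits), by = n)]
  }
}

ggData <- data.frame(x = paste0("B", 1:100), y = rnorm(100))
# Fix the level order so the axis runs B1, B2, ..., B100 rather than alphabetically
ggData$x <- factor(ggData$x, levels = unique(ggData$x))

p <- ggplot(ggData, aes(x, y)) +
  geom_point(size = 2.5, shape = 19, na.rm = TRUE) +
  scale_x_discrete(breaks = every_nth(10))
p
```

Choosing n from the number of levels (for example ceiling(nlevels(ggData$x) / 10)) would keep the number of labels roughly constant.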
I have a data frame with two variables, x and y, in R. What I want to do is bin each entry by its value of x, and then display the density of y for the entries in each bin. More specifically, for each interval of x, I want to plot (sum of y over entries whose x falls in that interval) / (sum of y over all entries). I know how to do this manually via vector manipulation, but I have to make a lot of these plots and wanted to know if there is a quicker way to do this, maybe via some advanced use of hist.
You could generate the groupings using cut and then use facet_grid to display the multiple histograms:
# Sample data with y depending on x
set.seed(144)
dat <- data.frame(x=rnorm(1000))
dat$y <- dat$x + rnorm(1000)
# Generate bins of x values
dat$grp <- cut(dat$x, breaks=2)
# Plot
library(ggplot2)
ggplot(dat, aes(x=y)) + geom_histogram() + facet_grid(grp~.)
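If what you need is literally the share of the total y that falls in each x bin, a weighted sum per bin may be closer to the question than a histogram of y. A rough sketch in base R, with unit-width bins and a positive toy y as assumptions of mine:

```r
set.seed(144)
dat <- data.frame(x = rnorm(1000))
dat$y <- abs(dat$x) + runif(1000)   # toy y, kept positive so shares are easy to read

# Unit-width bins covering the full range of x
breaks <- seq(floor(min(dat$x)), ceiling(max(dat$x)), by = 1)
dat$bin <- cut(dat$x, breaks = breaks, include.lowest = TRUE)

# Sum of y within each bin divided by the sum of y over all rows
share <- tapply(dat$y, dat$bin, sum) / sum(dat$y)
share[is.na(share)] <- 0            # empty bins contribute nothing

barplot(share, xlab = "x bin", ylab = "Share of total y", las = 2)
```

Wrapping the last three steps in a function that takes a data frame and a bin width would make it easy to repeat for many plots.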
bargraph.CI from sciplot lets us plot a bar chart with error bars. It also allows grouping by independent variables (factors). I want to group by dependent variable instead. How can I achieve that?
bargraph.CI(x.factor, response, group=NULL, split=FALSE,
col=NULL, angle=NULL, density=NULL,
lc=TRUE, uc=TRUE, legend=FALSE, ncol=1,
leg.lab=NULL, x.leg=NULL, y.leg=NULL, cex.leg=1,
bty="n", bg="white", space=if(split) c(-1,1),
err.width=if(length(levels(as.factor(x.factor)))>10) 0 else .1,
err.col="black", err.lty=1,
fun = function(x) mean(x, na.rm=TRUE),
ci.fun= function(x) c(fun(x)-se(x), fun(x)+se(x)),
ylim=NULL, xpd=FALSE, data=NULL, subset=NULL, ...)
The specification of bargraph.CI is shown above. The response variable is usually a numeric vector. This time, I want to plot three response variables (A, B, C) against the same independent variable. Let me use the data frame "mpg" to illustrate the problem. I can successfully get a plot with the following code, where the DV is hwy:
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
hwy, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="Highway MPG",
xlab="Class")
I can also successfully get a plot when the only change is the DV (from hwy to cty):
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
cty, #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="City MPG",
xlab="Class")
However, I want to use the two DVs at the same time. That is, for each class, I want to display two bars, one for cty and one for hwy. I tried the following:
data(mpg)
attach(mpg)
bargraph.CI(
class, #categorical factor for the x-axis
c(cty,hwy), #numerical DV for the y-axis
group=NULL, #grouping factor
legend=T,
ylab="Highway MPG",
xlab="Class")
It won't work because of mismatched dimensions. How can I achieve this? Actually, a similar effect to bargraph.CI can be achieved with ggplot2 using the method from "Boxplot schmoxplot: How to plot means and standard errors conditioned by a factor in R?". So if you have any idea of how to do it with ggplot2, that is also fine for me.
As often happens when displaying data, you should manipulate the data first and then use bargraph.CI. In your example, the data.frame that you would like to visualize is the following:
df <- data.frame(class=c(mpg$class, mpg$class),
value=c(mpg$cty, mpg$hwy),
grp=rep(c("cty", "hwy"), each=nrow(mpg)))
Then you can use bargraph.CI on this new data.frame.
bargraph.CI(
class, #categorical factor for the x-axis
value, #numerical DV for the y-axis
group=grp, #grouping factor
data=df,
legend=T,
ylab="MPG",
xlab="Class")
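Since a ggplot2 version was also welcome, here is a rough equivalent of the plot above. It is only a sketch: it assumes mean plus or minus one standard error as the error bar (via ggplot2's mean_se) and a ggplot2 version recent enough to have the fun argument of stat_summary.

```r
library(ggplot2)

# Same long-format data as above: one row per car and per measure
df <- data.frame(
  class = rep(mpg$class, 2),
  value = c(mpg$cty, mpg$hwy),
  grp   = rep(c("cty", "hwy"), each = nrow(mpg))
)

dodge <- position_dodge(width = 0.9)

p <- ggplot(df, aes(class, value, fill = grp)) +
  # Bar height is the mean of each class/measure combination
  stat_summary(fun = mean, geom = "bar", position = dodge) +
  # Error bars show mean +/- one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = dodge, width = 0.2) +
  labs(x = "Class", y = "MPG", fill = "Measure")
p
```

Both stat_summary layers use the same dodge object so that each error bar sits on top of its own bar.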