I am using several tree plots (ctree, evtree, rpart, chaid) and I rely on categorical data. Levels of data are described with text labels.
In the plot, it is not clear whether the displayed text belongs to the left or the right node.
Is it possible to either wrap the text labels in the plot, or to use a slightly different vertical alignment for the text displayed at the left and right nodes?
As requested, here is code that reproduces the issue in the plot:
Df1 <- data.frame(
  y  = as.factor(sample(1:3, 200, replace = TRUE)),
  x1 = as.factor(sample(1:3, 200, replace = TRUE)),
  x2 = as.factor(sample(1:3, 200, replace = TRUE)),
  x3 = as.factor(sample(1:3, 200, replace = TRUE)),
  x4 = as.factor(sample(1:3, 200, replace = TRUE))
)
Df1[1:5] <- lapply(Df1[1:5], function(x) factor(x, levels = c(1,2,3),labels = c("long long long long long text","text1","lorem ipsum dolor")))
library("partykit")
library("rpart")
library("evtree")
library("CHAID")
rp <- rpart(y ~ .,data=Df1, minbucket=30)
plot(as.party(rp))
ct <- ctree(y~ . , data = Df1, minbucket=50)
plot(ct)
ev <- evtree(y ~ ., data = Df1, maxdepth = 5)
plot(ev)
ctrl <- chaid_control(minsplit=90, minbucket=30, minprob=0.05,alpha2=0.01, alpha3=-1, alpha4=0.01)
chaid1 <- chaid( y ~ ., data= Df1, control=ctrl)
plot(chaid1,cex=0.6)
I can't see a way to do this using the parameters in ?plot.party. You could, however, add \n (a newline) to the factor levels:
levels(Df1$x2)[1] <- "long long long \n long long text"
plot(as.party(rp))
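If there are many long labels, inserting the line breaks by hand gets tedious. Here is a minimal sketch, assuming base R's strwrap() is acceptable for breaking at spaces. wrap_levels is a hypothetical helper name, not part of partykit, and the width of 15 is arbitrary:

```r
# Hypothetical helper: wrap every level of a factor at roughly `width` characters.
wrap_levels <- function(f, width = 15) {
  wrapped <- vapply(
    levels(f),
    function(l) paste(strwrap(l, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
  levels(f) <- wrapped
  f
}

# Toy factor with the same labels as in the question.
x <- factor(
  c(1, 2, 3, 1),
  levels = c(1, 2, 3),
  labels = c("long long long long long text", "text1", "lorem ipsum dolor")
)
x_wrapped <- wrap_levels(x)
cat(levels(x_wrapped), sep = "\n---\n")
```

Applied to the example, running Df1[] <- lapply(Df1, wrap_levels) before fitting the trees should carry the wrapped labels into the plots.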
I would like to rescale the values on the legend of a plot coming from conditional_effects.
By doing something like this
plot(conditional_effects(brm_c_5, effects = "t:w_c_ratio",cond = conditions5), rug = T, points = T)
I'm getting the following
For the time being I'm doing
p_col_1 <- ggplot_build(p_col_1)
and then I'm changing the ranges in p_col_1$plot$scales$scales[[3]]$range$range and p_col_1$plot$scales$scales[[4]]$range$range, but I don't trust this solution.
EXAMPLE:
As an example, please see this code. The default values of kidney$age range from 10 to 69, but let's say that I want to rescale them to run from -1 to 1. I could use the ggplot_build approach, but I'm looking for a smarter and more elegant solution.
library(brms)
library(ggplot2)   # ggplot_build(), ggplot_gtable()
library(magrittr)  # %>%
fit1 <- brm(time | cens(censored) ~ age + sex + disease,
            data = kidney, family = weibull, init = "0")
fit1
p_tr <- plot(conditional_effects(fit1, effects = "disease:age"), rug = TRUE, points = TRUE)[[1]]
p_tr <- ggplot_build(p_tr)
p_tr$plot$scales$scales[[3]]$range$range <- c(1, 0, -1) %>% as.character()
p_tr$plot$scales$scales[[4]]$range$range <- c(1, 0, -1) %>% as.character()
plot(p_tr %>% ggplot_gtable)
How could I rescale the values of w_c_ratio from -0.9:+0.9 to the original scale (which runs from 2 to 10)?
I'm trying to enter the below data into a data frame, to make a ggplot line graph.
#functions for the hh budget and utility functions
pqxf <- function(y)(1*y) # replace p with price of y
pqyf <- function(x)(-1.25*x)+20 # -1.25 is the wage rate
utilityf <- function(x)80*(1/(x)) # 80 is the utility provided
hours <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)
#functions are turned into data frames
pqy <- data.frame("consumption" =
pqxf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
pqx <- data.frame("leisure" =
pqxf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
utility <- data.frame("utility" =
utilityf(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
#each data frame is combined into a single data frame, that will be used for tables and charts
hh <- data.frame(pqx, pqy, utility, hours)
print(hh)
#this shows the utility, and the cost of x and y, one data frame
library(ggplot2)
ggplot(hh, aes(x=pqx, y=hours))+
xlim(0,20)+ylim(0,20)+ # limits set for the assignment
labs(x = "leisure(hours)",y="counsumption(units)")+
geom_line(aes(x = pqx, y = pqy))+
geom_line(aes(x = pqx, y = utility))+
geom_point(aes(x=8,y=10))+ #values of x and y of tangent point
geom_hline(yintercept = 10,linetype="dotted")+ # y of tangent point
geom_vline(xintercept = 8,linetype = "dotted")+ #x of tangent point
geom_text(label="E", x=8,y=10,hjust=-1,size=2)+
geom_text(label="-1.25(units/hour)= -w = MRS", x=9,y=2,hjust=.02,size=2)+
geom_text(label="U=80", x=4,y=19,hjust=1,size=2)
When I run this, I get the following error:
Error in is.finite(x) : default method not implemented for type 'list'
Should I store the data in a different format than a data frame, format my data frame differently, or set up ggplot differently so that it can handle lists?
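Try replacing pqx with leisure and pqy with consumption inside aes(). Once hh is passed as the data argument, aes() must refer to its column names; pqx and pqy are themselves data frames (that is, lists), which is what triggers the error.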
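Here is a minimal sketch of that fix. Two changes are my assumptions and not part of the original code: the consumption column is computed with pqyf (the question's code calls pqxf for both columns), and hours = 0 is dropped so the utility curve stays finite:

```r
library(ggplot2)

pqyf     <- function(x) -1.25 * x + 20   # budget line, wage rate 1.25
utilityf <- function(x) 80 / x           # indifference curve for U = 80

hours <- c(1:16, 20)                     # assumption: 0 dropped to avoid 80/0 = Inf

# One data frame with plain numeric columns.
hh <- data.frame(
  leisure     = hours,
  consumption = pqyf(hours),
  utility     = utilityf(hours)
)

# aes() refers to columns of hh, not to separate data frames.
p <- ggplot(hh, aes(x = leisure)) +
  geom_line(aes(y = consumption)) +
  geom_line(aes(y = utility)) +
  annotate("point", x = 8, y = 10) +                  # tangent point E
  geom_hline(yintercept = 10, linetype = "dotted") +
  geom_vline(xintercept = 8, linetype = "dotted") +
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 20)) + # zoom without dropping rows
  labs(x = "leisure (hours)", y = "consumption (units)")
print(p)
```

coord_cartesian() is used here instead of xlim()/ylim() so that values outside the window are clipped visually rather than removed from the data.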
In the modelr package, the function gather_predictions can be used to add predictions from multiple models to a data frame. However, I'm unsure how to specify these models in the function call. The help documentation gives the following example:
df <- tibble::data_frame(
x = sort(runif(100)),
y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)
m1 <- lm(y ~ x, data = df)
grid <- data.frame(x = seq(0, 1, length = 10))
grid %>% add_predictions(m1)
m2 <- lm(y ~ poly(x, 2), data = df)
grid %>% spread_predictions(m1, m2)
grid %>% gather_predictions(m1, m2)
Here the models are named explicitly in the function call. That works fine if we want predictions for a few models, but what if we have a large or unknown number of models? In that case, specifying the models manually isn't really workable anymore.
The way the help documentation phrases the arguments section seems to suggest that you need to add every model as a separate argument:
gather_predictions and spread_predictions take multiple models. The
name will be taken from either the argument name of the name of the
model.
For example, passing a list of models to gather_predictions doesn't work.
Is there an easy way to pass a list or a large number of models to gather_predictions?
Example with 10 models in a list:
modelslist <- list()
for (N in 1:10) {
modelslist[[N]] <- lm(y ~ poly(x, N), data = df)
}
If storing the models in some way other than a list works better, that's fine as well.
m <- grid %>% gather_predictions(lm(y ~ poly(x, 1), data = df))
for (N in 2:10) {
m <- rbind(m, grid %>% gather_predictions(lm(y ~ poly(x, N), data = df)))
}
There are workarounds to solve this problem. My approach was to:
1. build a list of models with specific names
2. use a tweaked version of modelr::gather_predictions() to apply all models in the list to data
# prerequisites
library(tidyverse)
set.seed(1363)
# I'll use generic name 'data' throughout the code, so you can easily try other datasets.
# for this example I'll use your data df
data=df
# data visualization
ggplot(data, aes(x, y)) +
geom_point(size=3)
[Plot: your sample data]
# build a list of models
models <-vector("list", length = 5)
model_names <- vector("character", length=5)
for (i in 1:5) {
modelformula <- str_c("y ~ poly(x,", i, ")", sep="")
models[[i]] <- lm(as.formula(modelformula), data = data)
model_names[[i]] <- str_c('model', i) # remember we name the models here sequentially
}
# apply names to the models list
names(models) <- model_names
# this is a modified version of modelr::gather_predictions() that accepts a list of models
gather.predictions <- function (data, models, .pred = "pred", .model = "model")
{
df <- map2(models, .pred, modelr::add_predictions, data = data)
names(df) <- names(models)
bind_rows(df, .id = .model)
}
# the rest is the same as modelr's function...
grids <- gather.predictions(data = data, models = models, .pred = "y")
ggplot(data, aes(x, y)) +
geom_point() +
geom_line(data = grids, colour = "red") +
facet_wrap(~ model)
[Plot: polynomial models (degree 1:5) applied to your sample data]
side note: there are good reasons why I chose strings to build the model...to discuss.
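For comparison, the same long-format result can be sketched without modelr at all. This is a base-R illustration, assuming only predict() and a named list of models. It mimics the shape of the gather_predictions output (columns model, x, pred) but is not modelr's implementation:

```r
set.seed(1363)

# Sample data and prediction grid, following the question's setup.
df <- data.frame(x = sort(runif(100)))
df$y <- 5 * df$x + 0.5 * df$x^2 + 3 + rnorm(nrow(df))
grid <- data.frame(x = seq(0, 1, length.out = 10))

# Build a named list of polynomial models; formulas are built from strings
# so that each model stores its own degree explicitly.
degrees <- 1:5
models <- lapply(degrees, function(d) {
  lm(as.formula(paste0("y ~ poly(x, ", d, ")")), data = df)
})
names(models) <- paste0("model", degrees)

# One data frame of predictions per model, then stack them.
preds <- lapply(names(models), function(nm) {
  data.frame(
    model = nm,
    x     = grid$x,
    pred  = unname(predict(models[[nm]], newdata = grid))
  )
})
long <- do.call(rbind, preds)
head(long)
```

Because the loop runs over names(models), the number of models does not need to be known in advance.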
Or even the words in the plot itself? Any hints on that are welcome.
dat <- selectByDate(mydata, year = 2003)
dat <- data.frame(date = mydata$date, obs = mydata$nox, mod = mydata$nox)
dat <- transform(dat, month = as.numeric(format(date, "%m")))
mod1 <- transform(dat, mod = mod + 10 * month + 10 * month * rnorm(nrow(dat)),model = "model 1")
mod1 <- transform(mod1, mod = c(mod[5:length(mod)], mod[(length(mod) - 3) :
length(mod)]))
mod2 <- transform(dat, mod = mod + 7 * month + 7 * month * rnorm(nrow(dat)),
model = "model 2")
mod.dat <- rbind(mod1, mod2)
Much of this appears to have been hard coded, so I don't think modifying this plot will be easy in general. In the specific case of the legend text, you can modify some arguments in the plot object after creating it:
out <- TaylorDiagram(mod.dat, obs = "obs", mod = "mod", group = "model")
out$plot$legend$right$args$key$text$cex <- 1.5
out$plot$legend$right$args$key$cex.title <- 1.5
I don't see anything similar that only applies to the text in the plot itself. To modify that you'd likely have to dig further into the code itself and modify it to get the specific results you want.
Indeed, digging further, most of the details of the plot are handled in the custom panel functions panel.taylor.setup and panel.taylor, in which almost all of the specific sizes are hard coded.
I have made a function that can plot the loadings from many factor analyses at once, also when their variables do not overlap perfectly (or at all). It works fine, but sometimes factor loadings are identical across analyses which means that the points get plotted on top of each other.
library(pacman)
p_load(devtools, psych, stringr, plotflow)
source_url("https://raw.githubusercontent.com/Deleetdk/psych2/master/psych2.R")
loadings.plot2 = function(fa.objects, fa.names=NA) {
fa.num = length(fa.objects) #number of fas
#check names are correct or set automatically
if (length(fa.names)==1 & is.na(fa.names)) {
fa.names = str_c("fa.", 1:fa.num)
}
if (length(fa.names) != fa.num) {
stop("Names vector does not match the number of factor analyses.")
}
#merge into df
d = data.frame() #to merge into
for (fa.idx in 1:fa.num) { #loop over fa objects
loads = fa.objects[[fa.idx]]$loadings
rnames = rownames(loads)
loads = as.data.frame(as.vector(loads))
rownames(loads) = rnames
colnames(loads) = fa.names[fa.idx]
d = merge.datasets(d, loads, 1)
}
#reshape to long form
d2 = reshape(d,
varying = 1:fa.num,
direction="long",
ids = rownames(d))
d2$time = as.factor(d2$time)
d2$id = as.factor(d2$id)
colnames(d2)[2] = "fa"
print(d2)
#plot
g = ggplot(reorder_by(id, ~ fa, d2), aes(x=fa, y=id, color=time, group=time)) +
geom_point(position=position_dodge()) +
xlab("Loading") + ylab("Indicator") +
scale_color_discrete(name="Analysis",
labels=fa.names)
return(g)
}
#Some example plots
fa1 = fa(iris[-5])
fa2 = fa(iris[-c(1:50),-5])
fa3 = fa(ability)
fa4 = fa(ability[1:50,])
loadings.plot2(list(fa1,fa1,fa2))
Here I've plotted the same object twice just to show the effect. The plot has no red points because the green ones from fa.2 are on top. Instead, I want them to be dodged on the y-axis. However, position="dodge" with various settings does not appear to make a difference.
position="jitter" does work, but it is random, so it sometimes fails to separate the points and makes the plot chaotic to look at.
How do I make the points dodged on the y-axis?
Apparently, you can only dodge sideways, but there is a workaround. The trick is to flip your x and y, do the position_dodge, and then do a coord_flip().
g = ggplot(data = reorder_by(id, ~ fa, d2), aes(x=id, y=fa, color=time, group=time)) +
geom_point(position=position_dodge(width = .5)) +
xlab("Indicator") + ylab("Loading") + # x is now id; coord_flip() draws it on the vertical axis
scale_color_discrete(name="Analysis",
labels=fa.names) +
coord_flip()
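Here is a self-contained sketch of the same trick on toy data, assuming only ggplot2. The data frame and its values are made up for illustration and deliberately contain identical loadings, so the points would overlap completely without dodging:

```r
library(ggplot2)

# Toy data: two "analyses" with identical loadings for three indicators.
toy <- data.frame(
  id   = rep(c("a", "b", "c"), times = 2),
  fa   = rep(c(0.9, 0.7, 0.5), times = 2),
  time = factor(rep(c("fa.1", "fa.2"), each = 3))
)

# Put the categorical variable on x so position_dodge() can separate the groups,
# then flip the coordinates so the indicators end up on the vertical axis.
g <- ggplot(toy, aes(x = id, y = fa, color = time, group = time)) +
  geom_point(position = position_dodge(width = 0.5)) +
  xlab("Indicator") + ylab("Loading") +
  coord_flip()
print(g)
```

Unlike jitter, the dodge offsets are deterministic, so the plot looks the same on every run.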
Possible duplicate
In the linked post, the right answer states that one must use position_jitter() instead of position_dodge(). It has worked for me.