Increase the font size of values in a dendrogram plot in R

How can I increase the font size of the labels (x2, x1, x3, x4) in the plot produced by the function varclus?
library(Hmisc) # provides varclus
set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200)
x3 <- x1 + x2 + rnorm(200)
x4 <- x2 + rnorm(200)
x <- cbind(x1,x2,x3,x4)
v <- varclus(x, similarity="spear") # spearman is the default anyway
v # invokes print.varclus
print(round(v$sim,2))
plot(v)
Thanks.

plot.varclus internally calls plot.hclust, as you can see by running:
getS3method("plot", class = "varclus")
It passes along the labels argument and any ... arguments,
which include the font scaling argument cex.
So try:
plot(v,
cex = 1.5)
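Because the extra arguments end up in plot.hclust, the same behaviour can be checked without Hmisc. The following is a minimal sketch using base R's hclust on the same simulated variables; the distance 1 - |Spearman correlation| is only an illustrative assumption and is not claimed to be what varclus computes internally.

```r
set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200)
x3 <- x1 + x2 + rnorm(200)
x4 <- x2 + rnorm(200)
x <- cbind(x1, x2, x3, x4)

# Illustrative distance between variables (an assumption for this sketch):
# 1 - |Spearman correlation|
d <- as.dist(1 - abs(cor(x, method = "spearman")))
hc <- hclust(d)

# cex is passed through ... and scales the leaf labels
plot(hc, cex = 1.5)
```

Changing cex to, say, 0.8 or 2 should shrink or enlarge the x1 to x4 labels accordingly.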

Related

rgl: plot3d with "extended" plotting symbols

I am trying to extend the symbols available to me for plotting in 3D. In 2D, I use:
x1 <- sort(rnorm(10))
y1 <- rnorm(10)
z1 <- rnorm(10) + atan2(x1, y1)
x2 <- sort(rnorm(10))
y2 <- rnorm(10)
z2 <- rnorm(10) + atan2(x2, y2)
x3 <- sort(rnorm(10))
y3 <- rnorm(10)
z3 <- rnorm(10) + atan2(x3, y3)
new.styles <- -1*c(9818:9824, 9829, 9830, 9831)
In 2D, my plot works and gives the appropriate symbol:
plot(x1, y1, col="red", bg="red", pch=new.styles, cex = 2)
and the plot is here:
In 3D, however, the symbols do not get translated correctly.
rgl::plot3d(x1, y1, z1, col="red", bg="red", pch=new.styles, size = 10)
this yields:
The symbols are getting replaced with (one) circle.
I also tried with pch3d and got blank plots. However, pch3d does work with the "standard" plotting symbols.
rgl::pch3d(x1, y1, z1, col="red", bg="red", pch=10:19, size = 10)
I get the plot:
So it appears that these symbols, at least, are not displayed in 3D. How can I display the preferred symbols?
I was only able to get a solution using text3d() -- hopefully a better solution exists.
x1 <- sort(rnorm(12))
y1 <- rnorm(12)
z1 <- rnorm(12) + atan2(x1, y1)
x2 <- sort(rnorm(12))
y2 <- rnorm(12)
z2 <- rnorm(12) + atan2(x2, y2)
x3 <- sort(rnorm(12))
y3 <- rnorm(12)
z3 <- rnorm(12) + atan2(x3, y3)
new.styles <- c(9818:9824, 9829, 9830, 9831, 9832, 9827)
rgl::open3d()
pal.col <- RColorBrewer::brewer.pal(name = "Paired", n = 12)
for (i in 1:12)
rgl::text3d(x1[i], y1[i], z1[i], col=pal.col[i], text = intToUtf8(new.styles[i]), cex = 2, usePlotmath = TRUE)
rgl::box3d()
This yields the figure:
This may well be too complicated, hopefully there are better solutions out there.
This is the best I could do:
Set up file for texture/shape:
crown <- tempfile(pattern = "crown", fileext = ".png")
png(filename = crown)
plot(1,1, ann=FALSE, axes=FALSE, pch=-9818, cex = 40, col = 2)
dev.off()
Load package, define a function to plot the texture at a random point:
library(rgl)
xyz <- cbind(c(0,1,1,0), 0, c(0,0,1,1))
add_quad_point <- function(shape = crown, sd = 3) {
pos <- rnorm(3, sd = sd)
m <- sweep(xyz, MARGIN=2, STATS = pos, FUN = "+")
quads3d(m,
texture = shape,
texcoords = xyz[,c(1,3)],
col = "white",
specular = "black")
}
open3d()
replicate(10, add_quad_point())
axes3d()
## close3d()

Visualize components of data-generating process in R

I am trying to replicate this figure, using the true underlying function given in the same paper (see also the code below).
I was wondering how the author came up with this figure, which at first glance looks easy to replicate. Taking the first component of (11), f(X_1) = 8*sin(X_1), I cannot see how the author obtains the corresponding graph, which has negative function values (as far as I understand the paper, the X's take values in the range 0 to 3). I have the same confusion about the last, linear component.
Link to full article: https://epub.ub.uni-muenchen.de/2057/1/tr002.pdf
This is my code
rm(list = ls())
library(mboost)
set.seed(2)
n_sim <- 50
n <- 100
# generate design matrix
x <- seq(from=0.00001, to=3, length.out=n)
x1 <- sample(x, size= n)
x2 <- sample(x, size= n)
x3 <- sample(x, size= n)
x4 <- sample(x, size= n)
x5 <- sample(x, size= n)
x6 <- sample(x, size= n)
x7 <- sample(x, size= n)
x8 <- sample(x, size= n)
x9 <- sample(x, size= n)
X <- matrix(c(x1, x2, x3, x4, x5, x6, x7, x8, x9), nrow = n, ncol = 9)
# generate true underlying function and observations with errors
f_true_train <- 1+ 8*sin(X[,1]) + 3*log(X[,2]) - 0.8*(X[,7]^4-X[,7]^3-5*X[,7]^2) - 3*X[,8]
y <- f_true_train + rnorm(n, 0, 3)
# plot components of true underlying function as in Fig. 1
# of Boosting Additive Models using Component-wise P-Splines by Schmid & Hothorn (2007)
plot(X[,1], 8*sin(X[,1]))
plot(X[,2], 3*log(X[,2]))
plot(X[,7], - 0.8*(X[,7]^4-X[,7]^3-5*X[,7]^2))
plot(X[,8], - 3*X[,8])
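The thread contains no answer, so what follows is only a guess. Component plots for additive models are often drawn after centering each component to mean zero over the covariate values, which produces negative values even where the raw function is non-negative. Whether Fig. 1 of the paper was drawn this way is an assumption, and the helper `center` below is hypothetical.

```r
n <- 100
x <- seq(from = 0.00001, to = 3, length.out = n)

# Hypothetical helper: shift a component so that it averages to zero
center <- function(f) f - mean(f)

f1 <- 8 * sin(x)  # raw first component, non-negative on (0, 3]
f8 <- -3 * x      # raw linear component, non-positive on (0, 3]

f1_centered <- center(f1)
f8_centered <- center(f8)

# After centering, both curves cross zero
op <- par(mfrow = c(1, 2))
plot(x, f1_centered, type = "l", ylab = "centered 8*sin(x)")
abline(h = 0, lty = 2)
plot(x, f8_centered, type = "l", ylab = "centered -3*x")
abline(h = 0, lty = 2)
par(op)
```

If the centered curves resemble the published panels, centering is a plausible explanation; if not, the paper's exact design for the X's would need to be checked.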

Multiplicative regression

I am trying to estimate a regression model on a data set with one continuous dependent variable (y) and three categorical independent variables (x1,x2,x3). For example imagine y is the price you pay for a smartphone and x are three features (say color, size and storage space).
My assumption is that each feature represents a multiplicative factor relative to an (unknown) baseline price. So if the baseline price for your phone is 100 a red color would increase this by 25%, a large size decrease it by 50% and high storage space increase by 75%. This means the final price of the phone would be 100 x (1+0.25) x (1-0.50) x (1+0.75) = 109.375.
The problem is that I only know the final price (not the baseline price) and the individual features. How can I estimate the multiplicative factors that go along with these features? I have written a brief simulation in R below to illustrate this problem.
Thanks for your help with this,
Michael
x_fun <- function() {
tmp1 <- runif(N)
tmp2 <- cut(tmp1, quantile(tmp1, probs=c(0, 1/3, 2/3, 3/3)))
levels(tmp2) <- seq(1:length(levels(tmp2)))
tmp2[is.na(tmp2)] <- 1
as.factor(tmp2)}
N <- 1000
x1 <- x_fun()
x2 <- x_fun()
x3 <- x_fun()
f1 <- 1+0.25*(as.numeric(x1)-2)
f2 <- 1+0.50*(as.numeric(x2)-2)
f3 <- 1+0.75*(as.numeric(x3)-2)
y_Base <- runif(min=0, max=1000, N)
y <- y_Base*f1*f2*f3
output <- data.frame(y, x1, x2, x3)
rm(y_Base, f1, f2, f3, N, y, x_fun, x1, x2, x3)
I think you can do it like this if you know the base levels of your factors:
# assumes x_fun() from the question is still defined (the question's final rm() call removes it)
N <- 1000
set.seed(42)
x1 <- x_fun()
x2 <- x_fun()
x3 <- x_fun()
f1 <- 1+0.25*(as.numeric(x1)-2)
f2 <- 1+0.50*(as.numeric(x2)-2)
f3 <- 1+0.75*(as.numeric(x3)-2)
y_Base <- runif(min=0, max=1000, N)
y <- y_Base*f1*f2*f3
str(x1)
output <- data.frame(y, x1, x2, x3)
#rm(y_Base, f1, f2, f3, N, y, x_fun, x1, x2, x3)
output[, c("x1", "x2", "x3")] <- lapply(output[, c("x1", "x2", "x3")], relevel, ref = "2")
fit <- glm(y ~ x1 + x2 + x3, data = output, family = gaussian(link = "log"))
summary(fit)
predbase <- exp(log(output$y) - predict(fit, type = "link") + coef(fit)["(Intercept)"])
library(ggplot2)
ggplot(data.frame(x = y_Base, y = predbase, output[, c("x1", "x2", "x3")]),
aes(x = x, y = y)) +
geom_point() +
facet_wrap( ~ x1 + x2 + x3) +
geom_abline(slope = 1, color = "dark red")
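A related option, when the noise is itself roughly multiplicative, is ordinary least squares on log(y), because log(base * f1 * f2) = log(base) + log(f1) + log(f2). The following is a minimal sketch on made-up data: the factor names `color` and `size`, the baseline of 100, and the 5% log-normal noise are assumptions for illustration that echo the smartphone example.

```r
set.seed(1)
n <- 500

# Hypothetical features; the first level of each factor is the reference
color <- factor(sample(c("black", "red"), n, replace = TRUE),
                levels = c("black", "red"))
size <- factor(sample(c("small", "large"), n, replace = TRUE),
               levels = c("small", "large"))

# True multiplicative effects: red = +25%, large = -50%
base <- 100
mult <- ifelse(color == "red", 1.25, 1) * ifelse(size == "large", 0.5, 1)
y <- base * mult * exp(rnorm(n, sd = 0.05))  # multiplicative noise

# Additive model on the log scale
fit_log <- lm(log(y) ~ color + size)

# Back-transform: the intercept estimates the baseline, the other
# coefficients estimate the multiplicative factors
factors <- exp(coef(fit_log))
round(factors, 2)
```

Note the difference from the log-link GLM: lm on log(y) models the mean of log(y), whereas glm with family = gaussian(link = "log") models the log of the mean of y, so the two can give somewhat different estimates when the noise is large.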

How to plot ols with r.c. splines

I'd like to plot the predicted line of a regression that contains a restricted cubic spline (because of non-linearity in the model), together with the standard error bands. I can get the predicted points, but I am not sure how to plot just the lines and error bands. ggplot is preferred, but base graphics is fine too. Thanks.
Here is an example from the documentation:
library(rms)
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
pred <- fitted(f) # or predict(f) or f$linear.predictors
f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1)
fastbw(f2, aics=100000)
options(datadist=NULL)
And a plot of the predicted values of the model:
plot(predict(f2))
The rms package has a number of helpful functions for this purpose. It is worth looking at http://biostat.mc.vanderbilt.edu/wiki/Main/RmS
In this instance, you can simply set datadist (which sets up distribution summaries for the predictor variables) appropriately and then use plot(Predict(f)) or ggplot(Predict(f)):
set.seed(5)
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
ddist <- datadist(x1,x2,x3,x4)
options(datadist='ddist')
plot(Predict(f))
ggplot(Predict(f))
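For readers who want the base-graphics variant the question mentions, here is a rough sketch that does not use rms at all. It swaps ols() with rcs() for lm() with splines::ns(), which fits a natural cubic spline (the same kind of spline as a restricted cubic spline, though the default knot placement differs from rcs). The grid size, the choice of holding x2 at its median, and the grey band are assumptions made for the illustration.

```r
library(splines)  # ships with R; ns() builds a natural cubic spline basis

set.seed(5)
x1 <- runif(200)
x2 <- runif(200)
y <- x1 + x2 + rnorm(200)

# Spline in x1 with 3 degrees of freedom, linear term in x2
fit <- lm(y ~ ns(x1, df = 3) + x2)

# Prediction grid over x1, holding x2 at its median
grid <- data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                   x2 = median(x2))
pr <- predict(fit, newdata = grid, interval = "confidence")

# Fitted line with a shaded 95% confidence band
plot(grid$x1, pr[, "fit"], type = "n", ylim = range(pr),
     xlab = "x1", ylab = "Predicted y")
polygon(c(grid$x1, rev(grid$x1)), c(pr[, "lwr"], rev(pr[, "upr"])),
        col = adjustcolor("grey", alpha.f = 0.4), border = NA)
lines(grid$x1, pr[, "fit"])
```

The band here is a pointwise confidence interval for the mean prediction, not a prediction interval for new observations.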

Plot a function for every row in an R data frame

This is my data frame:
Variables:
$ X1 (dbl) 3.742382, 4.185260, 3.869329, 4.468430, 4.287528, 4.422470, 4.23...
$ X2 (dbl) 7.0613552, 3.1143999, 6.4780125, 0.8486984, 3.4132880, 1.6816965...
$ X3 (dbl) -2.02416823, 9.10853246, -0.56165113, 16.16834346, 8.02026020, 1...
$ X4 (dbl) 15.0497971, 5.0139219, 13.8001589, -2.0927945, 6.5455396, -0.790...
Xn are the parameters of a 4th degree polynomial:
f(x) = X1*x + X2*x^2 + X3*x^3 + X4*x^4
Thus, each row represents a function. Is it possible to plot each function in the same graph?
Something like this?
DF <- data.frame(X1 = rnorm(10), X2 = rnorm(10), X3 = rnorm(10), X4 = rnorm(10))
# fixed plot region:
xmin<-0
xmax<-10
ymin<- -10
ymax<-10
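# open an empty plot first; curve(..., add=TRUE) needs an existing plot
plot(NULL, xlim=c(xmin,xmax), ylim=c(ymin,ymax), xlab="x", ylab="f(x)")
for (i in seq_len(nrow(DF))) {
curve(DF$X1[i]*x+DF$X2[i]*x^2+DF$X3[i]*x^3+DF$X4[i]*x^4, from=xmin, to=xmax, add=TRUE)
}
EDIT: Using ggplot: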
library(ggplot2)
library(reshape)
xmin<-0
xmax<-10
step<-0.01
DF <- data.frame(X1 = rnorm(10), X2 = rnorm(10), X3 = rnorm(10), X4 = rnorm(10))
xx<-seq(xmin,xmax,by=step)
DF2<-data.frame(matrix("", ncol = length(DF$X1), nrow = length(xx)))
DF2$xx<-xx
for(i in 1:length(DF$X1)){
DF2[,i]<-DF$X1[i]*xx+DF$X2[i]*xx^2+DF$X3[i]*xx^3+DF$X4[i]*xx^4
}
DF3 <- melt(DF2 , id.vars = "xx")
ggplot(DF3, aes(xx,value)) + geom_line(aes(colour = variable))
In addition to @T.Des's answer, here is an alternative approach using the dplyr and ggplot2 packages. Run the script step by step to see how it works.
library(dplyr)
library(ggplot2)
# get the parameters as a dataframe
dt = data.frame(x1 = c(2,4,5),
x2 = c(6,7,2),
x3 = c(1,2,4),
x4 = c(2,1,8))
# create plot_id based on the number of plots you want to produce
dt$plot_id = 1:nrow(dt)
# input how many points you want for your plot
Np = 10
data.frame(expand.grid(dt$plot_id, 1:Np)) %>% # create all combinations of plot id and points
select(plot_id = Var1,
point=Var2) %>%
inner_join(dt, by="plot_id") %>% # join back the parameters
mutate(y = x1*point + x2*point^2 + x3*point^3 + x4*point^4, # calculate your output value
plot_id = as.character(plot_id)) %>%
ggplot(., aes(point,y,color=plot_id)) +
geom_line()
Based on my example dataset you should get something like:
I would try something like this (although it is not an optimal solution)
f <- function (x, X0, X1, X2, X3, X4){
return (X0 + X1*x + X2*x^2 + X3*x^3 + X4*x^4)
}
x <- seq(-10,10,length.out = 100)
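for (i in seq_along(DF$X1)) {
# plot against x so the horizontal axis shows x rather than the index
plot(x, f(x, 0, DF$X1[i], DF$X2[i], DF$X3[i], DF$X4[i]), type = 'l')
par(new = TRUE) # overlay the next curve; pass a common ylim to plot() to keep the scales comparable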
}
Please note that I have defined a function of degree 4 (therefore you need 5 coefficients to express it). Your data suggested that X0 = 0.
I also decided to plot the polynomials between -10 and 10; you can change that if you want.
You also could do it like this with pure base R functionality:
set.seed(1)
x <- seq(-10, 10, length.out = 100)
xx <- rbind(x, x ^ 2, x ^ 3, x ^ 4)
m <- matrix(rnorm(40), ncol = 4) # your coefficients
matplot(x, t(m %*% xx), type = "l")
So note that there is no constant term in your polynomial. You just need to adapt x to your data and run this code.
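To see why the matrix product above gives one curve per row of coefficients, the sketch below rebuilds the same objects and compares the first column of the result with an explicit evaluation of the first polynomial. The names `curves` and `first` are introduced here for illustration only.

```r
set.seed(1)
x <- seq(-10, 10, length.out = 100)
xx <- rbind(x, x^2, x^3, x^4)     # 4 x 100: powers of x, one power per row
m <- matrix(rnorm(40), ncol = 4)  # 10 x 4: one row of coefficients per polynomial

curves <- t(m %*% xx)             # 100 x 10: one column per polynomial

# Explicit evaluation of the first polynomial for comparison
first <- m[1, 1] * x + m[1, 2] * x^2 + m[1, 3] * x^3 + m[1, 4] * x^4
all.equal(curves[, 1], first)

matplot(x, curves, type = "l")
```

With real data, `m` would be replaced by `as.matrix(DF[, c("X1", "X2", "X3", "X4")])`, assuming the columns hold the coefficients in that order.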
