Representing the names of variables in a scatterplot - r

I need to write a function that draws a plot for the variables. The problem is that it doesn't print the names of the variables.
library(GGally)
visual <- function(x, y) {
  df <- cbind(x, y)
  df <- scale(df, center = TRUE, scale = TRUE)
  df <- as.data.frame(df)
  ggpairs(df, columns = 1:2, xlab = colnames(df)[1], ylab = colnames(df)[2])
}
If we have these two vectors:
a <- c(128.095014, 71.430997, 88.704595, 48.180638)
b <- c(10.584888, 10.246740, 4.422322, 9.621246)
visual(a,b)
What is wrong with that?

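The variables show up as x and y because cbind(x, y) names its columns after the arguments it receives, and inside the function those are x and y. You can use substitute() to get the names of the objects passed into your function: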
visual <- function(x, y) {
  # capture the caller's expressions (e.g. "a", "b") as strings
  xname <- deparse(substitute(x))
  yname <- deparse(substitute(y))
  df <- cbind(x, y)
  df <- scale(df, center = TRUE, scale = TRUE)
  df <- as.data.frame(df)
  names(df) <- c(xname, yname)
  GGally::ggpairs(df, columns = 1:2, xlab = colnames(df)[1], ylab = colnames(df)[2])
}
b<-c(128.095014, 71.430997, 88.704595, 48.180638)
a<-c(10.584888, 10.246740, 4.422322, 9.621246)
visual(a,b)
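To see the difference between the two approaches without drawing anything, here is a small base-R sketch (the helper names inner and arg_names are hypothetical). It shows that cbind() inside a function names columns after the formal arguments, while deparse(substitute()) recovers whatever the caller typed:

```r
# Inside a function, cbind() names its columns after the formal arguments
inner <- function(x, y) colnames(cbind(x, y))

# Hypothetical helper: the caller's expressions for x and y, as strings
arg_names <- function(x, y) {
  c(deparse(substitute(x)), deparse(substitute(y)))
}

a <- c(128.095014, 71.430997, 88.704595, 48.180638)
b <- c(10.584888, 10.246740, 4.422322, 9.621246)

inner(a, b)          # "x" "y"
arg_names(a, b)      # "a" "b"
arg_names(a * 2, b)  # "a * 2" "b"
```

deparse() can split a very long expression into several strings; deparse1() (R >= 4.0.0) always returns a single string.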


Adding descriptive statistics to a function when making use of the ellipsis as input variable

For an assignment I have created a function in R that calculates the regression coefficients, predicted values and residuals for multiple linear regression. It does that as follows:
MLR <- function(y_var, ...){
y <- y_var
X <- as.matrix(cbind(...))
intercept <- rep(1, length(y))
X <- cbind(intercept, X)
regression_coef <- solve(t(X) %*% X) %*% t(X) %*% y
predicted_val <- X %*% regression_coef
residual_val <- y - predicted_val
scatterplot <- plot(predicted_val, residual_val,
ylab = 'Residuals', xlab = 'Predicted values',
main = 'Predicted values against the residuals')
abline(h = 0) # horizontal reference line at zero
list('y' = y,
'X' = X,
'Regression coefficients' = regression_coef,
'Predicted values' = predicted_val,
'Residuals' = residual_val,
'Scatterplot' = scatterplot
)
}
Now, my struggle is to add descriptive statistics of my input variables. Since I want to allow any number of independent variables, I used the ellipsis (...) as input. Is there a way to calculate useful descriptive statistics (mean, variance, standard deviation) of the independent variables passed through ...?
This
mean(...)
does not work...
Thank you for the replies already!
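mean(...) does not work because mean() summarizes only its first argument: a second positional argument is matched to the trim parameter of mean.default(), which must be a single number. A minimal sketch, assuming every argument in ... is a numeric vector (describe_dots is a hypothetical name), collects the dots with list(...) and applies each statistic element by element:

```r
# Why mean(...) fails: only the first vector is averaged; the second
# positional argument is matched to mean.default()'s `trim` parameter.
try(mean(c(1, 2, 3), c(2, 4, 6)))   # error about `trim`

# Hypothetical helper: mean, variance and SD of every vector passed via ...
describe_dots <- function(...) {
  vars <- list(...)                  # evaluate the dots into a list
  sapply(vars, function(v) c(mean = mean(v), var = var(v), sd = sd(v)))
}

describe_dots(c(1, 2, 3), c(2, 4, 6))  # 3 x 2 matrix, one column per argument
```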
Try these slight changes to your function; I have applied it to some variables of the iris dataset. You can compute the desired statistics over X (a copy, RX, is kept before the intercept column is added) and return them as an additional element of the output list. Here is the code:
#Function
MLR <- function(y_var, ...){
y <- y_var
X <- as.matrix(cbind(...))
RX <- X
intercept <- rep(1, length(y))
X <- cbind(intercept, X)
regression_coef <- solve(t(X) %*% X) %*% t(X) %*% y
predicted_val <- X %*% regression_coef
residual_val <- y - predicted_val
scatterplot <- plot(predicted_val, residual_val,
ylab = 'Residuals', xlab = 'Predicted values',
main = 'Predicted values against the residuals')
abline(h = 0) # horizontal reference line at zero
#Summary
#Stats
DMeans <- apply(RX, 2, mean, na.rm = TRUE)
DSD <- apply(RX, 2, sd, na.rm = TRUE)
DVar <- apply(RX, 2, var, na.rm = TRUE)
DSummary <- rbind(DMeans,DSD,DVar)
#Out
list('y' = y,
'X' = X,
'Regression coefficients' = regression_coef,
'Predicted values' = predicted_val,
'Residuals' = residual_val,
'Scatterplot' = scatterplot,
'Summary' = DSummary
)
}
#Apply
MLR(y_var = iris$Sepal.Length,iris$Sepal.Width,iris$Petal.Length)
The last elements of the output look like this:
$Scatterplot
NULL

$Summary
            [,1]     [,2]
DMeans 3.0573333 3.758000
DSD    0.4358663 1.765298
DVar   0.1899794 3.116278
I think I've got it. Unfortunately, the ellipsis is quite quirky to work with. Check that cbind(...) behaves correctly inside your function: when I checked it in the output, it was only one column wide even though I passed two variables, and that doesn't seem right.
My solution doesn't read the variable names; it uses placeholder names (Var_1, Var_2, ..., Var_n).
MLR <- function(y_var, ...){
# these two packages will come in handy
require(dplyr)
require(tidyr)
y <- y_var
X <- as.matrix(cbind(...))
# firstly, we need to make df/tibble out of ellipsis
X2 <- list(...)
n <- tibble(n = rep(0, times = length(y)))
index <- 0
for(Var in X2){
index <- index + 1
n[, paste0("Var_", index)] <- Var
}
# after the df was created, now it's time for calculating desc
# Using tidyr::gather with dplyr::summarize creates nice summary,
# where each row is another variable
descriptives <- tidyr::gather(n, key = "Variable", value = "Value") %>%
filter(Variable != "n") %>% # drop the "n" placeholder by name; the row order
# after group_by() depends on how the names sort, so removing row 1 is unsafe
group_by(Variable) %>%
summarize(mean = mean(Value), var = var(Value), sd = sd(Value), .groups = "drop")
# everything except the output list is the same
intercept <- rep(1, length(y))
X <- cbind(intercept, X)
regression_coef <- solve(t(X) %*% X) %*% t(X) %*% y
predicted_val <- X %*% regression_coef
residual_val <- y - predicted_val
scatterplot <- plot(predicted_val, residual_val,
ylab = 'Residuals', xlab = 'Predicted values',
main = 'Predicted values against the residuals')
abline(h = 0) # horizontal reference line at zero
list('y' = y,
'X' = X,
'Regression coefficients' = regression_coef,
'Predicted values' = predicted_val,
'Residuals' = residual_val,
'Scatterplot' = scatterplot,
'descriptives' = descriptives
)
}
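If you want real variable names instead of placeholders, one option is the substitute(list(...)) idiom, which returns the unevaluated expressions the caller typed. The sketch below uses base R only (describe_named is a hypothetical name, not part of either answer); explicitly named arguments such as width = ... override the deparsed expression:

```r
# Sketch: descriptive statistics for the ... arguments, labelled with the
# expressions the caller typed (e.g. "iris$Sepal.Width")
describe_named <- function(...) {
  vars <- list(...)
  # substitute(list(...)) gives the unevaluated call; drop the `list` head
  exprs <- as.list(substitute(list(...)))[-1]
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = ""),
                   character(1), USE.NAMES = FALSE)
  # prefer explicit argument names where the caller supplied them
  nms <- names(vars)
  if (!is.null(nms)) labels[nms != ""] <- nms[nms != ""]
  out <- sapply(vars, function(v) c(mean = mean(v, na.rm = TRUE),
                                    var  = var(v, na.rm = TRUE),
                                    sd   = sd(v, na.rm = TRUE)))
  colnames(out) <- labels
  out
}

describe_named(iris$Sepal.Width, iris$Petal.Length)
describe_named(width = iris$Sepal.Width, iris$Petal.Length)
```

Inside MLR() the result could be returned as an extra list element, in the same way as the Summary slot above.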

R programming: other alternatives for plot

I wonder how you can simplify these two:
plot (payroll,wins)
id = identify(payroll, wins,labels = code, n = 5)
plot (payroll,wins)
with(data, text(payroll, wins, labels = code, pos = 1, cex=0.5))
using other alternatives, such as pch and as.numeric()?
Not sure it's easier, but you can change pch during identification as below (taken from the examples in R's help page for identify). Every time you click an empty point, it changes to a filled dot.
# data simulation
data <- data.frame(payroll = rnorm(10), wins = rnorm(10), code = letters[1:10])
identifyPch <- function(x, y = NULL, n = length(x), plot = FALSE, pch = 19, ...)
{
xy <- xy.coords(x, y)
x <- xy$x
y <- xy$y
sel <- rep(FALSE, length(x))
while (sum(sel) < n) {
ans <- identify(x[!sel], y[!sel], labels = which(!sel), n = 1, plot = plot, ...)
if(!length(ans)) {
break
}
ans <- which(!sel)[ans]
points(x[ans], y[ans], pch = pch)
sel[ans] <- TRUE
}
## return indices of selected points
which(sel)
}
if(dev.interactive()) { ## use it
with(data, plot(payroll,wins))
id = with(data, identifyPch(payroll, wins))
}
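If the aim of using pch and as.numeric() is to tell points apart without labelling them one by one, a possible non-interactive alternative is to map each level of code to its own plotting symbol. This is a sketch, assuming code has at most 25 distinct values (as.numeric() of a factor starts at 1, and the built-in symbols run from 0 to 25):

```r
# simulated data, as in the answer above
set.seed(1)
data <- data.frame(payroll = rnorm(10), wins = rnorm(10),
                   code = letters[1:10])

# one plotting symbol per level of code (assumes <= 25 levels)
lev <- levels(factor(data$code))
sym <- as.numeric(factor(data$code))
with(data, plot(payroll, wins, pch = sym))
legend("topright", legend = lev, pch = seq_along(lev), cex = 0.7)
```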

Zipf_plot() : How to compare two objects in one graph?

I'm trying to use the Zipf_plot function from the tm package to compare two different document-term matrices, and I'm not an R expert.
Maybe you could tell me if there's a way to fit both into this function?
Zipf_plot(x, type = "l", ... )
I know there's a way to get both (or more) of them in one window:
par(mfrow=c())
but I'd really appreciate a solution with two or more dtms in one graph.
Thanks in advance! :-)
You could try par(new = TRUE), or adjust the function to your needs, e.g.:
library(tm)
data("acq")
data("crude")
m1 <- DocumentTermMatrix(acq)
m2 <- DocumentTermMatrix(crude)
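# same axis limits in both calls, so the overlaid curves share one scale
Zipf_plot(m1, col = "red", xlim = c(0, 7), ylim = c(0, 6))
par(new = TRUE)
Zipf_plot(m2, col = "blue", xlim = c(0, 7), ylim = c(0, 6))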
Zipf_plot_multi <- function (xx, type = "l", cols = rainbow(length(xx)), ...) {
stopifnot(is.list(xx) && length(xx) == length(cols))
for (idx in seq_along(xx)) {
x <- xx[[idx]]
if (inherits(x, "TermDocumentMatrix"))
x <- t(x)
y <- log(sort(slam::col_sums(x), decreasing = TRUE))
x <- log(seq_along(y))
m <- lm(y ~ x)
dots <- list(...)
if (is.null(dots$xlab))
dots$xlab <- "log(rank)"
if (is.null(dots$ylab))
dots$ylab <- "log(frequency)"
if (idx==1) {
do.call(plot, c(list(x, y, type = type, col = cols[idx]), dots))
} else {
lines(x, y, col = cols[idx])
}
abline(m, col = cols[idx], lty = "dotted")
print(coef(m))
}
}
Zipf_plot_multi(list(m1, m2), xlim=c(0, 7), ylim=c(0,6))
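Both Zipf_plot() and Zipf_plot_multi() plot log(frequency) against log(rank) and fit a straight line with lm(). That core computation can be checked without tm. In this sketch the frequency vectors are made-up stand-ins for slam::col_sums() of a document-term matrix, and zipf_xy is a hypothetical helper name:

```r
# log(rank) / log(frequency) pairs, as computed inside Zipf_plot()
zipf_xy <- function(freq) {
  y <- log(sort(freq, decreasing = TRUE))
  data.frame(x = log(seq_along(y)), y = y)
}

# made-up term frequencies: an exact Zipf law (f = C / rank) and a noisier one
f1 <- 1000 / (1:50)
set.seed(42)
f2 <- round(500 / (1:80)^0.8 + runif(80, 0, 2))

d1 <- zipf_xy(f1)
d2 <- zipf_xy(f2)

# one plot, shared axis limits, one colour per series
plot(d1$x, d1$y, type = "l", col = "red",
     xlim = range(d1$x, d2$x), ylim = range(d1$y, d2$y),
     xlab = "log(rank)", ylab = "log(frequency)")
lines(d2$x, d2$y, col = "blue")
fit1 <- lm(y ~ x, data = d1)
abline(fit1, col = "red", lty = "dotted")
coef(fit1)  # slope -1 and intercept log(1000) for an exact Zipf law
```

Computing the shared xlim and ylim from all series before the first plot() call is what keeps later lines() calls inside the plotting region.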

label ylab in timeSeries::plot, type = 'o'

How do I label the y-axis with Greek letters using timeSeries::plot, i.e. change SB, SP, etc. to \alpha, \beta, etc.? I am aware that I need expression() in some way. However, I can't even get to the labels (I normally use ggplot2). Code below.
# install.packages("timeSeries", dependencies = TRUE)
library("timeSeries")
## Load Swiss Pension Fund Benchmark Data -
LPP <- LPP2005REC[1:12, 1:4]
colnames(LPP) <- abbreviate(colnames(LPP), 2)
finCenter(LPP) <- "GMT"
timeSeries::plot(LPP, type = "o")
It has been pointed out that the object structure, as shown by str(), is quite particular for LPP compared to, say, this object z:
z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
plot(z)
If anyone has an answer for either or both, I would appreciate it. I realize I can convert the data and plot it with ggplot2 (I have seen that here on SO), but I am interested in doing it directly on the timeSeries object LPP and on the stats time-series object z.
[ REVISION & Edited ]
When plot.type is "multiple", we can't set ylab directly. Both plot(ts.obj) (S3 method) and plot(timeSeries.obj) (S4 method) use colnames(obj) as the y labels, and I don't know of any way to use Greek letters as column names. (The difference in structure mainly comes from the difference between S3 and S4; colnames(timeSeries.obj) is equivalent to timeSeries.obj@units. The defaults are "Series i" for ts and "TS.i" for timeSeries.)
We can get at the y labels through the panel argument (it takes a function; the default is lines). The panel function is called inside for(i in 1:ncol(data)). I couldn't find a way to give the panel function a suitable i (I guess it is possible somehow, but I didn't think of it), so I recovered i by checking which column the data matches.
for timeSeries
ylabs <- expression(alpha, beta, gamma, delta)
row1 <- LPP[1,]
timeSeries.panel.f <- function(x, y, ...) {
lines(x, y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(LPP, panel = timeSeries.panel.f, type = "o", ann = F)
title("Title")
mtext("Time", 1, line = 3)
## If you don't mind warnings, here is a more general version.
## (Many functions read `...`, so the extra arguments trigger warnings.)
timeSeries.panel.f2 <- function(x, y, ..., ylabs, row1) {
lines(x, y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(LPP, panel = timeSeries.panel.f2, type = "o", ann = F,
ylabs = expression(alpha, beta, gamma, delta), row1 = LPP[1,])
title("Title")
mtext("Time", 1, line = 3)
for ts
ylabs <- expression(alpha, beta, gamma)
row1 <- z[1,]
ts.panel.f <- function(y, ...) {
lines(y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(z, panel = ts.panel.f, ann = F)
title("Title")
mtext("Time", 1, line = 3)
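Another way to give the panel function a suitable i is a closure that counts its own calls. This is a sketch for the ts case, assuming plot.ts() calls panel exactly once per column in column order (I have not checked whether the timeSeries plot method behaves the same way); make_label_panel is a hypothetical name:

```r
# A panel function that counts its calls, so it knows which series it draws.
# Assumes plot.ts() calls `panel` once per column, in column order.
make_label_panel <- function(labels) {
  i <- 0
  function(x, ...) {
    i <<- i + 1                          # index of the current series
    lines(x, ...)                        # same drawing as the default panel
    mtext(labels[i], side = 2, line = 3) # y label for this series
  }
}

z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
p <- make_label_panel(expression(alpha, beta, gamma))
plot(z, panel = p, ann = FALSE)          # ann = FALSE drops the default labels
```

The counter is never reset, so create a fresh panel with make_label_panel() for every plot.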
Of course, you can also achieve this by writing new functions derived from the originals (mostly identical to them). Below I show only the modified parts.
modified plot(ts.obj) (made from plot.ts)
my.plot.ts <- function(~~~, my.ylab = NULL) {
:
nm <- my.ylab # before: nm <- colnames(x)
:
}
# use
my.plot.ts(z, my.ylab = expression(alpha, beta, gamma), type = "o")
modified plot(timeSeries.obj)
# made from `.plot.timeSeries`
my.plot.timeSeries <- function(~~~, my.ylab = NULL) {
:
my.plotTimeSeries(~~~, my.ylab = my.ylab)
}
# made from `timeSeries:::.plotTimeSeries`
my.plotTimeSeries <- function(~~~, my.ylab) {
:
nm <- my.ylab # before: nm <- colnames(x)
:
}
#use
my.plot.timeSeries(LPP, my.ylab = expression(alpha, beta, gamma, delta), type="o")

Intercepting & using the value of an optional variable captured in the dots (...)

I need to intercept the value of an optional xlim in a function so that I can change the units of it before plotting. The following function confirms that xlim was passed, but I can't access the value.
foo <- function(x, y, ...) {
if ("xlim" %in% names(list(...))) {
print(xlim) # not found/can't use value!
}
# modify xlim and pass to plotting functions
return()
}
But foo(x = 1:5, y = 1:5, xlim = c(2,4)) gives:
Error in print(xlim) : object 'xlim' not found
What trick do I need to use the value? It seems like it should just work, but I see from looking around on SO that the dots can be vexing. I've played a bit with exists, deparse, etc., but I don't really 'get' the proper use of those functions.
EDIT: here is the final snippet, which was the leanest way to access the value:
dots <- list(...)
if (any(names(dots) == "xlim")) {
xlim <- dots$xlim
print(xlim)
}
This is because xlim is actually a list element and is not (yet) an object in the function's environment. You could do:
foo <- function(x, y, ...) {
m <- match.call(expand.dots = FALSE)$...
if(any(names(m) == "xlim")) m[["xlim"]]
else stop("no xlim value")
}
foo(x = 1:5, y = 1:5, xlim = c(2,4))
# c(2, 4)
foo(x = 1:5, y = 1:5, ylim = c(2,4))
# Error in foo(x = 1:5, y = 1:5, ylim = c(2, 4)) : no xlim value
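Because match.call() returns the expression rather than its value, m[["xlim"]] above is the call c(2, 4), not a numeric vector; if xlim were passed as a variable, you would get only the symbol. A sketch that evaluates the captured expression in the caller's environment (foo2 and lims are illustrative names):

```r
foo2 <- function(x, y, ...) {
  m <- match.call(expand.dots = FALSE)$...
  if (is.null(m[["xlim"]])) stop("no xlim value")
  # evaluate the unevaluated argument in the frame it came from
  eval(m[["xlim"]], parent.frame())
}

lims <- c(2, 4)
foo2(x = 1:5, y = 1:5, xlim = lims)     # 2 4
foo2(x = 1:5, y = 1:5, xlim = c(2, 4))  # 2 4
```

If only the value is needed, list(...)$xlim gives the same result more directly; keeping the expression is useful when you also want it as text, for example in a label.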
You can see what match.call is doing by examining this function:
f <- function(x, y, ...) {
match.call(expand.dots = FALSE)$...
}
It returns a list of all the dot arguments with their unevaluated expressions, so there are many ways to get at the values; the above is just one.
f(x = 1:5, y = 1:5, xlim = c(2,4))
# $xlim
# c(2, 4)
Alternatively, you could do
g <- function(x, y, ...) {
dots <- list(...)
any(names(dots) == "xlim")
}
g(x = 1:5, y = 1:5, xlim = c(2,4))
# [1] TRUE
Also keep in mind that match.call keeps the argument as an unevaluated expression, while list(...) evaluates it. This can matter when you pass the argument on to other functions.
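If the function always needs to inspect xlim, another option is to make it a named formal argument; then it is an ordinary variable inside the function, and the remaining dots are still passed on. A minimal sketch, assuming the unit change is a division by 1000 (foo3 and the factor are hypothetical):

```r
foo3 <- function(x, y, xlim = NULL, ...) {
  x <- x / 1000                          # hypothetical unit change, e.g. m -> km
  if (!is.null(xlim)) xlim <- xlim / 1000
  plot(x, y, xlim = xlim, ...)           # other dots (col, main, ...) pass through
  invisible(xlim)
}

foo3(x = (1:5) * 1000, y = 1:5, xlim = c(2000, 4000), col = "blue")
```

plot() treats xlim = NULL as "compute the range from the data", so callers who omit xlim get the usual behaviour.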
