R plotting dataframe and keeping track of factor to number conversion

Here is my data frame and its plot:
my_df <- data.frame(var_1= as.factor(sample(c(0,1), 10, replace = TRUE)),
var_2 = sample(1:20, 10, replace = TRUE),
var_3 = as.factor(sample(c('a','b', 'c'), 10, replace = TRUE)))
plot(my_df)
So factors are converted to numeric values. How can I work out the mapping between each factor level and its numeric representation? For example, for var_3 the conversion appears to be a -> 1, b -> 2, c -> 3.
Also, can I display this mapping on the graph?

plot.data.frame passes its ... arguments to pairs (see ?pairs), which has a labels argument:
my_df <- data.frame(var_1= as.factor(sample(c(0,1), 10, replace = TRUE)),
var_2 = sample(1:20, 10, replace = TRUE),
var_3 = as.factor(sample(c('a','b', 'c'), 10, replace = TRUE)))
plot(my_df, labels = LETTERS[1:3])
So just build a vector of labels and pass it in. The helper below appends each factor's level-to-integer mapping to its label:
f <- function(data, default = names(data), use.varname = TRUE) {
default <- rep_len(default, ncol(data))
sapply(seq_along(data), function(ii) {
x <- data[, ii]
if (is.factor(x)) {
lbl <- paste(levels(x), seq.int(nlevels(x)), sep = ' -> ', collapse = '\n')
if (use.varname)
paste(default[ii], lbl, sep = '\n') else lbl
} else default[ii]
})
}
f(my_df, use.varname = FALSE)
# [1] "0 -> 1\n1 -> 2" "var_2" "a -> 1\nb -> 2\nc -> 3"
f(my_df, use.varname = TRUE)
# [1] "var_1\n0 -> 1\n1 -> 2" "var_2" "var_3\na -> 1\nb -> 2\nc -> 3"
plot(my_df, labels = f(my_df))
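If you only need the mapping itself rather than a labelled plot, it can be read straight off the factor: the integer code of a value is its position in levels(). A minimal sketch, using a small made-up factor x (not the question's random data) so that the result is reproducible:

```r
# Hypothetical example factor with fixed values (not from the question)
x <- factor(c("b", "a", "c", "a"))

# The integer code of each value is its position in levels(x);
# by default the levels are the sorted unique values
code_map <- setNames(seq_len(nlevels(x)), levels(x))
print(code_map)

# The integer codes that plot() uses for each observation
codes <- as.integer(x)
print(codes)
```

The same reasoning applies to var_1: its levels are the strings "0" and "1", so they are coded 1 and 2, which is why as.numeric() on that factor does not return 0 and 1.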

A better option might be ggpairs from the GGally package, which gives more information:
library(GGally)
ggpairs(my_df)

The integers correspond to the levels of your factor. If you want a scatterplot like the one in your example, you can simply set the axis labels using the axis function.
plot(as.numeric(my_df$var_3), as.numeric(as.vector(my_df$var_1)), axes = F)
axis(side = 1, labels = levels(my_df$var_3), at = 1:length(levels(my_df$var_3)))
axis(side = 2)
box()
Now, if you want a multi-plot result, you can do as follows.
par(mfrow=c(3,3))
for (i in 1:ncol(my_df)){
for (j in 1:ncol(my_df)){
if (i == j) {
plot(1, cex = 0, ylim = c(0,2), xlim = c(0,2))
text(1, 1, labels = paste(names(my_df)[j]))
} else {
plot(as.numeric(my_df[,i]), as.numeric(my_df[,j]), axes = F,
xlab = names(my_df)[i], ylab = names(my_df)[j])
if (is.factor(my_df[,i])){
axis(side = 1, labels = levels(my_df[,i]), at = 1:length(levels(my_df[,i])))
} else {
axis(side = 1)
}
if (is.factor(my_df[,j])){
axis(side = 2, labels = levels(my_df[,j]), at = 1:length(levels(my_df[,j])))
} else {
axis(side = 2)
}
box()
}
}
}
par(mfrow=c(1,1))
Definitely verbose and not very pretty, but as you can see the variable levels (class names) are retained and plotted at the axis ticks...

Related

Dot Plot include vertical line and dots of different colors

I need to add a vertical line to the plot produced by the code below,
for example at x = 5, and to draw all points with values smaller than 5 in another colour,
for example blue.
The values of a variable can be read from the x-axis, and the y-axis shows the order of the observations in the variable (from bottom to top). Isolated points at the far ends, or on either side of the plot, suggest potential outliers.
Thanks
library(dplyr)
library(lattice)
n = 1000
df <- tibble(
xx1 = runif(n, min = 3, max = 10),
xx2 = runif(n, min = 3, max = 10),
xx3 = runif(n, min = 3, max = 10)
)
MyVar <- c("xx1","xx2","xx3")
MydotplotBR <- function(DataSelected){
P <- dotplot(as.matrix(DataSelected),
groups=FALSE,
strip = strip.custom(bg = 'white',
par.strip.text = list(cex = 1.2)),
scales = list(x = list(relation = "same",tck = 1,
draw = TRUE, at=seq(0,10,1)),
y = list(relation = "free", draw = FALSE),
auto.key = list(x =1)),
col=10,
axes = FALSE,
cex = 0.4, pch = 5,
xlim=c(0,10),
xlab = list(label = "Variable Value", cex = 1.5),
ylab = list(label = "Order of data in the file", cex = 1.5))
print(P)
}
(tempoi <- Sys.time())
Vertemp <- MydotplotBR(df[,MyVar])
(tempof <- Sys.time()-tempoi)
I find it weird that you want a color dependent only on the x-axis when values are also used on the y-axis of other plots.
Nevertheless, here's a homemade pairs_cutoff() function doing what you want.
pairs_cutoff <- function(data, cutoff, cols = c("red", "blue"),
only.lower = F, ...){
data <- as.data.frame(data)
cns <- colnames(data)
nc <- ncol(data)
layout(matrix(seq_len(nc^2), ncol = nc))
invisible(
sapply(seq_len(nc), function(i){
sapply(seq_len(nc), function(j){
if(i == j){
plot.new()
legend("center", bty = "n", title = cns[i], cex = 1.5, text.font = 2, legend = "")
} else {
if(j < i & only.lower)
plot.new()
else{
if(is.null(cutoff))
cols <- cols[1]
plot(data[,i], data[,j], col = cols[(data[,i] < cutoff) + 1],
xlab = cns[i], ylab = cns[j], ...)
}
}
})
})
)
}
Using your suggested data:
n = 1000
dat <- tibble(
xx1 = runif(n, min = 3, max = 10),
xx2 = runif(n, min = 3, max = 10),
xx3 = runif(n, min = 3, max = 10)
)
pairs_cutoff(dat, cutoff = 5, only.lower = T)
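This draws the lower triangle of the scatterplot grid, with points whose x-value is below the cutoff in blue and the rest in red (plot not shown).
You can pass extra parameters for the plot function (e.g. pch) directly to pairs_cutoff.
Also, if you want the full symmetric grid of plots, set only.lower = F.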
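The function above colours the points but does not draw the vertical line the question asks for. As a rough sketch for a single variable, in base graphics rather than lattice and with made-up names (x, cutoff, pt_col), the line can be added with abline() after the points are drawn:

```r
set.seed(1)
x <- runif(100, min = 3, max = 10)  # same range as the question's data
cutoff <- 5                         # assumed position of the vertical line

# Points below the cutoff are blue, all others red
pt_col <- ifelse(x < cutoff, "blue", "red")

# x-axis: value of the variable; y-axis: order of the observations
plot(x, seq_along(x), col = pt_col, pch = 5, cex = 0.4, xlim = c(0, 10),
     xlab = "Variable Value", ylab = "Order of data in the file")

# Vertical reference line at the cutoff
abline(v = cutoff, lty = 2)
```

Inside pairs_cutoff, the same abline(v = cutoff) call could presumably be placed right after the plot() call so that every panel gets the line.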

How to plot a two-columned grid of time series with custom titles using R?

I have the following code (although without data, sadly):
detrend_plot <- cbind(l_p_lng,l_vol_lng,l_p_oil,l_rgdpe, ldiff_p_lng,ldiff_vol_lng,ldiff_p_oil,ldiff_rgdpe)
plot.ts(detrend_plot, main = "")
which gives a two-column grid of time series panels (plot not shown).
What I want to do is add custom titles, individual y-axis labels, and x-axis labels. I know that this is possible using ggplot2, although my knowledge of it is sparse. Has anyone encountered a similar problem? I don't think this is possible using the regular plot.ts() function.
I don't think you can pass multiple titles and labels to plot.ts directly, but you can just loop over your columns with vectors of labels for each:
set.seed(1)
z <- ts(matrix(rt(200 * 8, df = 3), 200, 8), start = c(1961, 1), frequency = 12)
## vectors of x, y, and main labels
xl <- sprintf('x label %s', 1:8)
yl <- sprintf('y label %s', 1:8)
ml <- sprintf('main label %s', 1:8)
par(mfrow = c(4, 2), mar = c(5, 5, 1, 1), oma = c(0, 0, 1, 2))
lapply(1:8, function(ii) {
x <- z[, ii, drop = FALSE]
plot(x, xlab = xl[ii], ylab = yl[ii], main = ml[ii])
})
You can also pass vectors of arguments (e.g., for x- or y-axis limits) using lists:
ylim <- list(c(-10, 10))
ylim <- rep(ylim, 8)
par(mfrow = c(4, 2), mar = c(5, 5, 1, 1), oma = c(0, 0, 1, 2))
lapply(1:8, function(ii) {
x <- z[, ii, drop = FALSE]
plot(x, xlab = xl[ii], ylab = yl[ii], main = ml[ii], col = ii, ylim = ylim[[ii]])
})
To get a figure closer to the default plot.ts look, you can just set top and bottom margins to 0 and adjust the axes (which is what plot.ts is doing under the hood). This method is a bit more verbose than plot.ts but will allow for more customization:
par(mfrow = c(4, 2), mar = c(0, 5, 0, 1), oma = c(5, 0, 3, 2))
lapply(1:8, function(ii) {
x <- z[, ii, drop = FALSE]
plot(x, xlab = xl[ii], ylab = yl[ii], col = ii, axes = FALSE)
axis(2, las = 1)
box()
if (ii %in% 7:8) {
axis(1)
title(xlab = 'Year', xpd = NA)
}
if (ii %in% 1:2)
title(main = c('Group 1', 'Group 2')[ii], xpd = NA, line = 1)
})

Changing the colour of a calibration plot

I've been generating calibration plots for my cph models of survival data. However, the default setting draws the "ideal" line in grey, which makes it difficult to distinguish. I've tried to specify the colour parameters in plot(), but this only changes the "observed" line. What can I pass to plot() to change the colour of the "ideal" line in a calibration plot generated with rms?
Here is one option:
Let's say you have code to create a cph model of survival data and use calibrate from the rms package:
library(rms)
set.seed(1)
n <- 200
d.time <- rexp(n)
x1 <- runif(n)
x2 <- factor(sample(c('a', 'b', 'c'), n, TRUE))
f <- cph(Surv(d.time) ~ pol(x1,2) * x2, x=TRUE, y=TRUE, surv=TRUE,time.inc=1.5)
cal <- calibrate(f, u=1.5, cmethod='KM', m=50, B=20)
This will generate a calibrate object:
R> class(cal)
[1] "calibrate"
If you are using plot on this object, you can discover the function being called in rms:
R> getAnywhere("plot.calibrate.default")
A single object matching ‘plot.calibrate.default’ was found
It was found in the following places
registered S3 method for plot from namespace rms
namespace:rms
with value
function (x, xlab, ylab, xlim, ylim, legend = TRUE, subtitles = TRUE,
cex.subtitles = 0.75, riskdist = TRUE, scat1d.opts = list(nhistSpike = 200),
...)
You can create your own function based on this function, and alter the color of the ideal line. In this case, we make the ideal line green (and revise the text labels to match):
myplot <- function (x, xlab, ylab, subtitles = TRUE, conf.int = TRUE, cex.subtitles = 0.75,
riskdist = TRUE, add = FALSE, scat1d.opts = list(nhistSpike = 200),
par.corrected = NULL, ...)
{
at <- attributes(x)
u <- at$u
units <- at$units
if (length(par.corrected) && !is.list(par.corrected))
stop("par.corrected must be a list")
z <- list(col = "blue", lty = 1, lwd = 1, pch = 4)
if (!length(par.corrected))
par.corrected <- z
else for (n in setdiff(names(z), names(par.corrected))) par.corrected[[n]] <- z[[n]]
predicted <- at$predicted
if ("KM" %in% colnames(x)) {
type <- "stratified"
pred <- x[, "mean.predicted"]
cal <- x[, "KM"]
cal.corrected <- x[, "KM.corrected"]
se <- x[, "std.err"]
}
else {
type <- "smooth"
pred <- x[, "pred"]
cal <- x[, "calibrated"]
cal.corrected <- x[, "calibrated.corrected"]
se <- NULL
}
un <- if (u == 1)
paste(units, "s", sep = "")
else units
if (missing(xlab))
xlab <- paste("Predicted ", format(u), units, "Survival")
if (missing(ylab))
ylab <- paste("Fraction Surviving ", format(u), " ",
un, sep = "")
if (length(se) && conf.int) {
ciupper <- function(surv, d) ifelse(surv == 0, 0, pmin(1,
surv * exp(d)))
cilower <- function(surv, d) ifelse(surv == 0, 0, surv *
exp(-d))
errbar(pred, cal, cilower(cal, 1.959964 * se), ciupper(cal,
1.959964 * se), xlab = xlab, ylab = ylab, type = "b",
add = add, ...)
}
else if (add)
lines(pred, cal, type = if (type == "smooth")
"l"
else "b")
else plot(pred, cal, xlab = xlab, ylab = ylab, type = if (type ==
"smooth")
"l"
else "b", ...)
err <- NULL
if (riskdist && length(predicted)) {
do.call("scat1d", c(list(x = predicted), scat1d.opts))
if (type == "smooth") {
s <- !is.na(pred + cal.corrected)
err <- predicted - approxExtrap(pred[s], cal.corrected[s],
xout = predicted, ties = mean)$y
}
}
if (subtitles && !add) {
if (type == "smooth") {
Col <- par.corrected$col
substring(Col, 1, 1) <- toupper(substring(Col, 1,
1))
title(sub = sprintf("Black: observed Green: ideal\n%s : optimism corrected",
Col), adj = 0, cex.sub = cex.subtitles)
w <- if (length(err))
paste("B=", at$B, " based on ", at$what, "\nMean |error|=",
round(mean(abs(err)), 3), " 0.9 Quantile=",
round(quantile(abs(err), 0.9, na.rm = TRUE),
3), sep = "")
else paste("B=", at$B, "\nBased on ", at$what, sep = "")
title(sub = w, adj = 1, cex.sub = cex.subtitles)
}
else {
title(sub = paste("n=", at$n, " d=", at$d, " p=",
at$p, ", ", at$m, " subjects per group\nGreen: ideal",
sep = ""), adj = 0, cex.sub = cex.subtitles)
title(sub = paste("X - resampling optimism added, B=",
at$B, "\nBased on ", at$what, sep = ""), adj = 1,
cex.sub = cex.subtitles)
}
}
abline(0, 1, col = "green")
if (type == "stratified")
points(pred, cal.corrected, pch = par.corrected$pch,
col = par.corrected$col)
else lines(pred, cal.corrected, col = par.corrected$col,
lty = par.corrected$lty, lwd = par.corrected$lwd)
invisible()
}
Then you can use your custom function with your calibrate object:
myplot(cal)
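A shorter alternative, if copying the whole method feels heavy, is to leave the plotting function untouched and draw a coloured line over the grey one afterwards, since the ideal line is just the 45-degree line drawn by abline(0, 1). The sketch below uses a hypothetical helper overdraw_ideal() and a stand-in empty plot so that it runs without rms. With a real calibrate object, the stand-in would be replaced by plot(cal, subtitles = FALSE), so that no subtitle describes a colour that no longer matches.

```r
# Hypothetical helper: redraw the 45-degree "ideal" line on the current plot
overdraw_ideal <- function(col = "green", lwd = 2, ...) {
  abline(a = 0, b = 1, col = col, lwd = lwd, ...)
  invisible(col)  # return the colour used, for reference
}

# Stand-in for the calibration plot (assumption: same 0-1 axes);
# with rms you would call plot(cal, subtitles = FALSE) here instead
plot(c(0, 1), c(0, 1), type = "n",
     xlab = "Predicted survival", ylab = "Fraction surviving")

used <- overdraw_ideal()
```

This only paints over the existing line, so it does not change anything else about the plot; the custom function above remains the way to go if the subtitles should be kept and updated too.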

R Programming other alternatives for plot

I wonder how you can simplify these two:
plot (payroll,wins)
id = identify(payroll, wins,labels = code, n = 5)
plot (payroll,wins)
with(data, text(payroll, wins, labels = code, pos = 1, cex=0.5))
using other alternatives, such as pch and as.numeric()?
Not sure it's easier, but you can change pch during identification as below (taken from R-help). Each time you click on an empty point, it changes to a filled-in dot.
# data simulation
data <- data.frame(payroll = rnorm(10), wins = rnorm(10), code = letters[1:10])
identifyPch <- function(x, y = NULL, n = length(x), plot = FALSE, pch = 19, ...)
{
xy <- xy.coords(x, y)
x <- xy$x
y <- xy$y
sel <- rep(FALSE, length(x))
while (sum(sel) < n) {
ans <- identify(x[!sel], y[!sel], labels = which(!sel), n = 1, plot = plot, ...)
if(!length(ans)) {
break
}
ans <- which(!sel)[ans]
points(x[ans], y[ans], pch = pch)
sel[ans] <- TRUE
}
## return indices of selected points
which(sel)
}
if(dev.interactive()) { ## use it
with(data, plot(payroll,wins))
id = with(data, identifyPch(payroll, wins))
}
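If the question is read as asking for a non-interactive way to tell the points apart, one possible sketch is to turn the codes into plotting symbols: as.numeric() on a factor gives integer codes, and those can be passed to pch. The data are simulated as above, and the variable names code_f and sym are made up for the illustration.

```r
set.seed(1)
data <- data.frame(payroll = rnorm(10), wins = rnorm(10), code = letters[1:10])

# Convert the codes to a factor, then to its integer codes (1..10),
# which are all valid pch symbol numbers
code_f <- factor(data$code)
sym <- as.numeric(code_f)

plot(data$payroll, data$wins, pch = sym, xlab = "payroll", ylab = "wins")

# Legend mapping each symbol back to its code
legend("topright", legend = levels(code_f), pch = seq_along(levels(code_f)),
       cex = 0.6, ncol = 2)
```

This works here because there are only ten distinct codes; with many more levels the symbols would stop being distinguishable and text labels would be the better choice.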

Using multiple three dot ellipsis in R [duplicate]

Is there a way to pass arbitrary arguments to more than one command inside a function? The following function clearly does not work but I hope it explains what I am trying to achieve.
test = function(x = rnorm(20), y = rnorm(20), ..., ---){
plot(x, y, type = "p", ...)
lines(x, y, ---)
}
The goal is to write a function that creates a plot with, say, lines, points, and polygons, and that can take arbitrary arguments for each command and pass them on to the respective command without my having to explicitly specify the arguments for each one.
Here is a hackish approach:
.. <- "/////" #or anything which won't be used as a valid parameter
f <- function(...){
arguments <- list(...)
if(.. %in% arguments){
i <- which(arguments == ..)
terms <- unlist(arguments[1:(i-1)])
factors <- unlist(arguments[(i+1):length(arguments)])
c(sum(terms),prod(factors))
}
}
Then, for example,
> f(2,3,4,..,7,8,10)
[1] 9 560
You could obviously extend the idea to multiple ... fields, each delimited with ..
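As a sketch of that extension, the arguments can be cut into any number of groups at each occurrence of the delimiter. The names SEP and split_on_sentinel are made up for this illustration, and the sentinel is assumed never to appear as a genuine argument value.

```r
# Hypothetical sentinel value marking the boundary between argument groups
SEP <- "/////"

split_on_sentinel <- function(...) {
  args <- list(...)
  # TRUE where an argument is exactly the sentinel
  is_sep <- vapply(args, function(a) identical(a, SEP), logical(1))
  # Group index of each real argument = number of sentinels seen before it
  grp <- cumsum(is_sep)[!is_sep]
  vals <- args[!is_sep]
  lapply(0:sum(is_sep), function(g) unlist(vals[grp == g]))
}

groups <- split_on_sentinel(2, 3, 4, SEP, 7, 8, SEP, 10)
str(groups)
```

Each element of groups could then be handed to a different command; the approach stays hackish because it only handles positional values and silently depends on the sentinel being unique.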
OPTION 1
Function
test = function(x = rnorm(20), y = rnorm(20), plot_options = NA, ...){
if (is.na(plot_options) == FALSE){
eval(parse(text = paste0("plot(x, y, ", plot_options, ")")))
} else {
plot(x, y, type = "n")
}
lines(x, y, ...)
}
USAGE
test()
set.seed(42)
m = rnorm(20)
n = rnorm(20)
test(x = m, y = n,
plot_options = "type = 'p', col = 'red', pch = 19, xlab = 'Test Plot', ylab = 'Y-axis'")
OPTION 2 (@Gregor's solution)
Function
test2 = function(x = rnorm(20), y = rnorm(20), ..., line_options){
plot(x, y, ...)
if (missing(line_options)) {
lines(x, y)
} else {
do.call(lines, c(list(x = x, y = y), line_options))
}
}
USAGE
par(mfrow = c(2, 2), mar = c(2, 2, 1, 1))
test2(main = 'default')
test2(line_options = list(lty = 2), main = 'line')
test2(col = 'red', main = 'plot')
test2(col = 'red', line_options = list(lty = 2, col = 'blue'), main = 'line and plot')
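A third possibility, sketched here under the assumption that a naming convention is acceptable, is to keep a single ... and route arguments by prefix: anything named lines.something goes to lines() with the prefix stripped, and everything else goes to plot(). The names split_dots and test3 and the "lines." prefix are made up for this illustration.

```r
# Split named arguments into those with a given prefix and the rest
split_dots <- function(..., prefix = "lines.") {
  args <- list(...)
  nm <- names(args)
  if (is.null(nm)) nm <- rep("", length(args))
  hit <- startsWith(nm, prefix)
  sub_args <- args[hit]
  names(sub_args) <- substring(nm[hit], nchar(prefix) + 1)  # strip the prefix
  list(main = args[!hit], sub = sub_args)
}

test3 <- function(x = rnorm(20), y = rnorm(20), ...) {
  parts <- split_dots(...)
  # Default axis labels, because do.call() would otherwise deparse the data
  plot_args <- modifyList(list(xlab = "x", ylab = "y"), parts$main)
  do.call(plot, c(list(x = x, y = y), plot_args))
  do.call(lines, c(list(x = x, y = y), parts$sub))
  invisible(parts)
}

res <- test3(col = "red", pch = 19, lines.col = "blue", lines.lty = 2)
```

The same splitter could be called again with another prefix (say "polygon.") to serve more commands, at the cost of callers having to know the convention.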
