Potential bug in stargazer omit.labels - r

There appears to be a bug in version 5.2 of the stargazer package, where the omit.label functionality does not work consistently depending on the order of the included models:
library(stargazer)
library(ggplot2)
as.data.frame(data("midwest"))
fit.1 <- lm(poptotal ~ popadults, data = midwest)
fit.2 <- lm(poptotal ~ popadults + state, data = midwest)
# Works, column listed as "Yes":
stargazer(fit.2, omit = c("state"), omit.labels = c("States"))
# Does not work, both columns listed as "No":
stargazer(fit.1, fit.2, omit = c("state"), omit.labels = c("States"))
# Works, first column "Yes", second "No":
stargazer(fit.2, fit.1, omit = c("state"), omit.labels = c("States"))
Does anyone know of a workaround?

I just manually specified dummies for each column using the add.lines property. For your example:
stargazer(fit.1, fit.2, omit = c("state"),
add.lines = list(
c("States", "No", "Yes")
)
)

Here's one approach, using a wrapper function to generate the add.lines values automatically. This also has (to me) a more natural syntax than having separate "omit" and "omit.labels" arguments. Plus, you can omit variables without having an indicator:
gazer<- function(...,indicate=NULL, staroptions=NULL){
dots <- list(...)
if (is.null(indicate)==FALSE) {
indicate.lines<-sapply(names(indicate), function(indic)
ifelse(
sapply(dots,function(x) length(grep(indic,names(coef(x))))>0
) ,"Yes","No"
)
)
indicate.lines<-rbind(unlist(indicate),indicate.lines)
staroptions$omit <- c(staroptions$omit,names(indicate))
staroptions$add.lines <- c(split(indicate.lines,rep(1:ncol(indicate.lines), each=nrow(indicate.lines))),staroptions$add.lines)
}
do.call(stargazer,c(dots,staroptions))
}
You provide a list of names and labels in indicate() and all your other stargazer options in a list in staroptions
For your example:
gazer(fit.1,fit.2,indicate=list(state="State"))

Related

R for loop going wrong when applied to function

I am trying to work on a for loop to make running a function I've developed more efficient.
However, when I put it in a for loop, it is overwriting columns that it should not be and returning incorrect results.
Edit: The error is that in the resulting dataframe MiSeq_Bord_Outliers_table0, the resulting columns containing label Outlier_type is returning incorrect outputs.
As per the Outlier_Hunter function, when Avg_Trim_Cov and S2_Total_Read_Pairs_Processed are below their
respective Q1 Thresholds their respective Outlier_type columns should read "Lower_Outlier", if between Q1 & Q3 Threshold, "Normal" and if above Q3 Threshold then "Upper_outlier". But when the for loop is executed, only "Upper_outlier" is shown in the Outlier_type columns.
Edit: The inputs have been simplified and tested on the different computer with a clean console. If there were any artifacts there before, they should have been eliminated now, and there should be no errors here now. It is important to run the outlier_results_1var part first. If you test run this code and get errors, please let me know which part failed.
Edit: MiSeq_Bord_Outliers_table0_error is the error that is being reproduced. This is the error result, not an input.
Can someone please tell me why is it returning these incorrect results and what I can do to fix it? I will upload the relevant code below. Or is there another way to do this without a for loop?
#libraries used
library(tidyverse)
library(datapasta)
library(data.table)
library(janitor)
library(ggpubr)
library(labeling)
#2.) Outlier_Hunter Function
#Function to Generate the Outlier table
#Outlier Hunter function takes 4 arguments: the dataset, column/variable of interest,
#Q1 and Q3. Q1 and Q3 are stored in the results of Quartile_Hunter.
#Input ex: MiSeq_Bord_final_report0, Avg_Trim_Cov, MiSeq_Bord_Quartiles_ATC$First_Quartile[1], MiSeq_Bord_Quartiles_ATC$Third_Quartile[1]
#Usage ex: Outlier_Hunter(MiSeq_Bord_final_report0, Avg_Trim_Cov,
#MiSeq_Bord_Quartiles_ATC$First_Quartile[1], MiSeq_Bord_Quartiles_ATC$Third_Quartile[1])
#Here is the Function to get the Outlier Table
Outlier_Hunter <- function(Platform_Genus_final_report0, my_col, Q1, Q3) {
#set up and generalize the variable name you want to work with
varname <- enquo(my_col)
#print(varname) #just to see what variable the function is working with
#get the outliers
Platform_Genus_Variable_Outliers <- Platform_Genus_final_report0 %>%
select(ReadID, Platform, Genus, !!varname) %>%
#Tell if it is an outlier, and if so, what kind of outlier
mutate(
Q1_Threshold = Q1,
Q3_Threshold = Q3,
Outlier_type =
case_when(
!!varname < Q1_Threshold ~ "Lower_Outlier",
!!varname >= Q1_Threshold & !!varname <= Q3_Threshold ~ "Normal",
!!varname > Q3_Threshold ~ "Upper_Outlier"
)
)
}
#MiSeq_Bord_Quartiles entries
MiSeq_Bord_Quartiles <- data.frame(
stringsAsFactors = FALSE,
row.names = c("Avg_Trim_Cov", "S2_Total_Read_Pairs_Processed"),
Platform = c("MiSeq", "MiSeq"),
Genus = c("Bord", "Bord"),
Min = c(0.03, 295),
First_Quartile = c(80.08, 687613.25),
Median = c(97.085, 818806.5),
Third_Quartile = c(121.5625, 988173.75),
Max = c(327.76, 2836438)
)
#Remove the hashtag below to test if what you have is correct
#datapasta::df_paste(head(MiSeq_Bord_Quartiles, 5))
#dataset entry
MiSeq_Bord_final_report0 <- data.frame(
stringsAsFactors = FALSE,
ReadID = c("A005_20160223_S11_L001","A050_20210122_S6_L001",
"A073_20210122_S7_L001",
"A076_20210426_S11_L001",
"A080_20210426_S12_L001"),
Platform = c("MiSeq","MiSeq",
"MiSeq","MiSeq","MiSeq"),
Genus = c("Bordetella",
"Bordetella","Bordetella",
"Bordetella","Bordetella"),
Avg_Raw_Read_bp = c(232.85,241.09,
248.54,246.99,248.35),
Avg_Trimmed_Read_bp = c(204.32,232.6,
238.56,242.54,244.91),
Avg_Trim_Cov = c(72.04,101.05,
92.81,41.77,54.83),
Genome_Size_Mb = c(4.1, 4.1, 4.1, 4.1, 4.1),
S1_Input_reads = c(1450010L,
1786206L,1601542L,710792L,925462L),
S1_Contaminant_reads = c(12220L,6974L,
7606L,1076L,1782L),
S1_Total_reads_removed = c(12220L,6974L,
7606L,1076L,1782L),
S1_Result_reads = c(1437790L,
1779232L,1593936L,709716L,923680L),
S2_Read_Pairs_Written = c(712776L,882301L,
790675L,352508L,459215L),
S2_Total_Read_Pairs_Processed = c(718895L,889616L,
796968L,354858L,461840L)
)
MiSeq_Bord_final_report0
#Execution for 1 variable
outlier_results_1var <- Outlier_Hunter(MiSeq_Bord_final_report0, Avg_Trim_Cov,
MiSeq_Bord_Quartiles$First_Quartile[1], MiSeq_Bord_Quartiles$Third_Quartile[1])
#Now do it with a for loop
col_var_outliers <- row.names(MiSeq_Bord_Quartiles)
#col_var_outliers <- c("Avg_Trim_Cov", "S2_Total_Read_Pairs_Processed")
#change line above to change input of variables few into Outlier Hunter Function
outlier_list_MiSeq_Bord <- list()
for (y in col_var_outliers) {
outlier_results0 <- Outlier_Hunter(MiSeq_Bord_final_report0, y, MiSeq_Bord_Quartiles[y, "First_Quartile"], MiSeq_Bord_Quartiles[y, "Third_Quartile"])
outlier_results1 <- outlier_results0
colnames(outlier_results1)[5:7] <- paste0(y, "_", colnames(outlier_results1[, c(5:7)]), sep = "")
outlier_list_MiSeq_Bord[[y]] <- outlier_results1
}
MiSeq_Bord_Outliers_table0 <- reduce(outlier_list_MiSeq_Bord, left_join, by = c("ReadID", "Platform", "Genus"))
#the columns containing label Outlier_type is where the code goes wrong.
#When Avg_Trim_Cov and S2_Total_Read_Pairs_Processed are below their
#respective Q1 Thresholds their respective Outlier_type columns should read
#"Lower_Outlier", if between Q1 & Q3 Threshold, "Normal" and if above Q3
#Threshold then "Upper_outlier". But when the for loop is executed, only
"Upper_outlier" is shown in the Outlier_type columns.
datapasta::df_paste(head(MiSeq_Bord_Outliers_table0, 5))
MiSeq_Bord_Outliers_table0_error <- data.frame(
stringsAsFactors = FALSE,
ReadID = c("A005_20160223_S11_L001",
"A050_20210122_S6_L001",
"A073_20210122_S7_L001","A076_20210426_S11_L001",
"A080_20210426_S12_L001"),
Platform = c("MiSeq",
"MiSeq","MiSeq","MiSeq",
"MiSeq"),
Genus = c("Bordetella","Bordetella","Bordetella",
"Bordetella","Bordetella"),
Avg_Trim_Cov = c(72.04,
101.05,92.81,41.77,54.83),
Avg_Trim_Cov_Q1_Threshold = c(80.08,
80.08,80.08,80.08,80.08),
Avg_Trim_Cov_Q3_Threshold = c(121.5625,
121.5625,121.5625,121.5625,
121.5625),
Avg_Trim_Cov_Outlier_type = c("Upper_Outlier","Upper_Outlier",
"Upper_Outlier","Upper_Outlier",
"Upper_Outlier"),
S2_Total_Read_Pairs_Processed = c(718895L,
889616L,796968L,354858L,
461840L),
S2_Total_Read_Pairs_Processed_Q1_Threshold = c(687613.25,
687613.25,687613.25,
687613.25,687613.25),
S2_Total_Read_Pairs_Processed_Q3_Threshold = c(988173.75,
988173.75,988173.75,
988173.75,988173.75),
S2_Total_Read_Pairs_Processed_Outlier_type = c("Upper_Outlier","Upper_Outlier",
"Upper_Outlier","Upper_Outlier",
"Upper_Outlier")
)
For use in a loop like you do, it would be more useful to write your Outlier_Hunter() function to take the target column as a character string rather than an expression.
To do that, try replacing all instances of !!varname in your function with .data[[my_col]], and remove the enquo() line altogether.
Note that with these changes, you also need to change how you call the function when you don't have the column name in a variable. For example, your single execution would become:
Outlier_Hunter(
MiSeq_Bord_final_report0,
"Avg_Trim_Cov",
MiSeq_Bord_Quartiles$First_Quartile[1],
MiSeq_Bord_Quartiles$Third_Quartile[1]
)
For more info about programming with tidy evaluation functions, you may find this rlang vignette useful.

Problems following a code example. InformationValue::Woe

I'm learning new feature selection methods with this entry of a blog:
https://www.machinelearningplus.com/machine-learning/feature-selection/
Point 9. And I stumbled upon some problems. First is the CV, which I have solved.
library(InformationValue)
adult <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
sep = ',', fill = F, strip.white = T,stringsAsFactors = FALSE)
colnames(adult) <- c('age', 'WORKCLASS', 'fnlwgt', 'EDUCATION',
'educatoin_num', 'MARITALSTATUS', 'OCCUPATION', 'RELATIONSHIP', 'RACE', 'SEX',
'capital_gain', 'capital_loss', 'hours_per_week', 'NATIVECOUNTRY', 'ABOVE50K')
inputData <- adult
print(head(inputData))
But then I can't solve the next chunk
# Choose Categorical Variables to compute Info Value.
cat_vars <- c ("WORKCLASS", "EDUCATION", "MARITALSTATUS", "OCCUPATION", "RELATIONSHIP", "RACE", "SEX", "NATIVECOUNTRY") # get all categorical variables
# Init Output
df_iv <- data.frame(VARS=cat_vars, IV=numeric(length(cat_vars)), STRENGTH=character(length(cat_vars)), stringsAsFactors = F) # init output dataframe
# Get Information Value for each variable
for (factor_var in factor_vars){
df_iv[df_iv$VARS == factor_var, "IV"] <- InformationValue::IV(X=inputData[, factor_var], Y=inputData$ABOVE50K)
df_iv[df_iv$VARS == factor_var, "STRENGTH"] <- attr(InformationValue::IV(X=inputData[, factor_var], Y=inputData$ABOVE50K), "howgood")
}
# Sort
df_iv <- df_iv[order(-df_iv$IV), ]
df_iv
And I keep getting 0 values in IV and, of course, Not predictive in the column of the dataframe.
I've tried to do a
factor_vars=cat_vars
But it doesn't seems to work and quite frankly I can't figure out why this doesn't work.
Just solved it. In first instance the argument of stringsAsFactors = FALSE its unnecesary, since we need factors.
Then, consulting the IV function and looking at the summary of the dataset, i noticed that despise its a factor the function requieres a numeric input, the function cannot extract its "value" (level). So we must work arround it.
as.numeric(inputData$ABOVE50K)
"Solves it" Although maybe i should change the values since it gives 1-2 instead of the classic 0-1 response. Im working on it.
I think theres got to be an easiest solution, but:
levels(inputData$ABOVE50K)
inputData$ABOVE50K2 = as.numeric(inputData$ABOVE50K)
inputData$ABOVE50K3= ifelse(inputData$ABOVE50K2 ==1,0, ifelse(inputData$ABOVE50K2==2,1,NA))
inputData$ABOVE50K3 <- factor(inputData$ABOVE50K3)
And the output is the same. So there is no need to change the levels to 0-1.
# Choose Categorical Variables to compute Info Value.
cat_vars <- c ("WORKCLASS", "EDUCATION", "MARITALSTATUS", "OCCUPATION", "RELATIONSHIP", "RACE", "SEX", "NATIVECOUNTRY") # get all categorical variables
factor_vars= cat_vars
# Init Output
df_iv <- data.frame(VARS=cat_vars, IV=numeric(length(cat_vars)), STRENGTH=character(length(cat_vars)), stringsAsFactors = F) # init output dataframe
# Get Information Value for each variable
for (factor_var in factor_vars){
df_iv[df_iv$VARS == factor_var, "IV"] <- InformationValue::IV(X=inputData[, factor_var], Y=inputData$ABOVE50K3)
df_iv[df_iv$VARS == factor_var, "STRENGTH"] <- attr(InformationValue::IV(X=inputData[, factor_var], Y=inputData$ABOVE50K3), "howgood")
}
# Sort
df_iv <- df_iv[order(-df_iv$IV), ]
df_iv

Properly calling variable name when creating multiple Benford plots

I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law
Running a single variable
#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))
Looks great. And the legend says "Dataset: iris$Sepal.Length", perfect!.
Using apply to run 4 variables,
apply(iris[1:4], 2, function(x) plot(benford(x)))
Creates four plots, however, each plot's legend says "Dataset: x"
I attempted to use a for loop,
for (i in colnames(iris[1:4])){
plot(benford(iris[[i]]))
}
This creates four plots, but now the legends says "Dataset: iris[[i]]". And I would like the name of the variable on each chart.
I tried a different loop, hoping to get titles with an evaluated parsed string like "iris$Sepal.Length":
for (i in colnames(iris[1:4])){
plot(benford(eval(parse(text=paste0("iris$", i)))))
}
But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".
AND, Now I've run into the infamous eval(parse(text=paste0( (eg: How to "eval" results returned by "paste0"? and R: eval(parse(...)) is often suboptimal )
I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningfully variable names in the legend?
This is happening because of the first line within the benford function=:
benford <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3){
data.name <- as.character(deparse(substitute(data)))
Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R
data.name is then used to name your graph. Whatever variable name or expression you pass to the function will unfortunately be caught by the deparse(substitute()) call, and will be used as the name for your graph.
One short-term solution is to copy and rewrite the function:
#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed for function
# load hidden functions into namespace - needed for function
r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = T))
for(name in r) eval(parse(text=paste0(name, '<-benford.analysis:::', name)))
benford_rev <- function{} # see below
for (i in colnames(iris[1:4])){
plot(benford_rev(iris[[i]], data.name = i))
}
This has negative side effects of:
Not being maintainable with package revisions
Fills your GlobalEnv with normally hidden functions in the package
So hopefully someone can propose a better way!
benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3, data.name = as.character(deparse(substitute(data)))){ # changed
# removed line
benford.digits <- generate.benford.digits(number.of.digits)
benford.dist <- generate.benford.distribution(benford.digits)
empirical.distribution <- generate.empirical.distribution(data, number.of.digits,sign, second.order = FALSE, benford.digits)
n <- length(empirical.distribution$data)
second.order <- generate.empirical.distribution(data, number.of.digits,sign, second.order = TRUE, benford.digits, discrete = discrete, round = round)
n.second.order <- length(second.order$data)
benford.dist.freq <- benford.dist*n
## calculating useful summaries and differences
difference <- empirical.distribution$dist.freq - benford.dist.freq
squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq
absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)
### chi-squared test
chisq.bfd <- chisq.test.bfd(squared.diff, data.name)
### MAD
mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))
if (number.of.digits > 3) {
MAD.conformity <- NA
} else {
digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]
MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
}
### Summation
summation <- generate.summation(benford.digits,empirical.distribution$data, empirical.distribution$data.digits)
abs.excess.summation <- abs(summation - mean(summation))
### Mantissa
mantissa <- extract.mantissa(empirical.distribution$data)
mean.mantissa <- mean(mantissa)
var.mantissa <- var(mantissa)
ek.mantissa <- excess.kurtosis(mantissa)
sk.mantissa <- skewness(mantissa)
### Mantissa Arc Test
mat.bfd <- mantissa.arc.test(mantissa, data.name)
### Distortion Factor
distortion.factor <- DF(empirical.distribution$data)
## recovering the lines of the numbers
if (sign == "positive") lines <- which(data > 0 & !is.na(data))
if (sign == "negative") lines <- which(data < 0 & !is.na(data))
if (sign == "both") lines <- which(data != 0 & !is.na(data))
#lines <- which(data %in% empirical.distribution$data)
## output
output <- list(info = list(data.name = data.name,
n = n,
n.second.order = n.second.order,
number.of.digits = number.of.digits),
data = data.table(lines.used = lines,
data.used = empirical.distribution$data,
data.mantissa = mantissa,
data.digits = empirical.distribution$data.digits),
s.o.data = data.table(second.order = second.order$data,
data.second.order.digits = second.order$data.digits),
bfd = data.table(digits = benford.digits,
data.dist = empirical.distribution$dist,
data.second.order.dist = second.order$dist,
benford.dist = benford.dist,
data.second.order.dist.freq = second.order$dist.freq,
data.dist.freq = empirical.distribution$dist.freq,
benford.dist.freq = benford.dist.freq,
benford.so.dist.freq = benford.dist*n.second.order,
data.summation = summation,
abs.excess.summation = abs.excess.summation,
difference = difference,
squared.diff = squared.diff,
absolute.diff = absolute.diff),
mantissa = data.table(statistic = c("Mean Mantissa",
"Var Mantissa",
"Ex. Kurtosis Mantissa",
"Skewness Mantissa"),
values = c(mean.mantissa = mean.mantissa,
var.mantissa = var.mantissa,
ek.mantissa = ek.mantissa,
sk.mantissa = sk.mantissa)),
MAD = mean.abs.dev,
MAD.conformity = MAD.conformity,
distortion.factor = distortion.factor,
stats = list(chisq = chisq.bfd,
mantissa.arc.test = mat.bfd)
)
class(output) <- "Benford"
return(output)
}
I have just updated the package (GitHub version) to allow for a user supplied name.
Now the function has a new parameter called data.name in which you can provide a character vector with the name of the data and override the default. Thus, for your example you can simply run the following code.
First install the GitHub version (I will submit this version to CRAN soon).
devtools::install_github("carloscinelli/benford.analysis") # install new version
Now you can provide the name of the data inside the for loop:
library(benford.analysis)
for (i in colnames(iris[1:4])){
plot(benford(iris[[i]], data.name = i))
}
And all the plots will have the correct naming as you wish (below).
Created on 2019-08-10 by the reprex package (v0.2.1)

Resolving a variable value within a function call R

I have a data frame defined as follows:
model_comp
logLik IC Lack of fit Res var
W2.4 -353.2939 716.5878 1.361885e-01 26.80232
baro5 -353.2936 718.5871 NaN 27.04363
LL.5 -353.2940 718.5880 NaN 27.04384
LL.3 -360.3435 728.6871 3.854799e-04 29.99842
W1.3 -360.3842 728.7684 3.707592e-04 30.01948
W1.4 -360.3129 730.6258 7.850947e-05 30.25028
LL.4 -360.3170 730.6340 7.818416e-05 30.25243
The best model fit is the one with the lowest IC (information criteria). I want to use the best fit to do some plotting etc... So I created:
> bestmodel <- noquote(paste0(as.name(rownames(model_comp[which.min(model_comp$IC),])),"()"))
> bestmodel
[1] W2.4()
I want to use the W2.4() as a function call to a the DRC package.
For example this call works when manually specified:
drm(y~x,logDose = 10, fct=W2.4())
I'm trying to use the value in bestmodel instead to do something like:
drm(y~x,logDose = 10,fct = as.formula(paste(bestmodel)))
I've tried all the options given here with no success. I've messed with as.formula(), noquote(), as.name() with no success.
I also tried as.name(paste0(as.name(bestmodel),"()")) where I didn't add on the "()" in the bestmodel definition above. Still no dice.
model_comp <- structure(list(logLik = c(-353.293902612472, -353.293568997018,
-353.294024776211, -360.343530770823, -360.384220907907, -360.312897918459,
-360.317018443052), IC = c(716.587805224944, 718.587137994035,
718.588049552421, 728.687061541646, 728.768441815814, 730.625795836919,
730.634036886105), `Lack of fit` = c(0.136188459104035, NaN,
NaN, 0.000385479884900107, 0.000370759187117765, 7.85094742623572e-05,
7.81841606352332e-05), `Res var` = c(26.8023196097934, 27.0436263934882,
27.0438389102235, 29.9984226526044, 30.0194755526501, 30.2502847248304,
30.2524338881051)), .Names = c("logLik", "IC", "Lack of fit",
"Res var"), row.names = c("W2.4", "baro5", "LL.5", "LL.3", "W1.3",
"W1.4", "LL.4"), class = "data.frame")
Just using noquote() not to draw the quotes around a string doesn't turn a character value into an executable piece of code. There is a big different in R between a character value an a symbol or function call. You can't really just replace one with the other.
So let's say you have extracted the character value from the rownames
x <- "W2.4"
This is basically the string version of the function you want. You can get the value of a symbol (in this case the function W2.4 from the drc:package) from its string name with get(). So you can call
drm(y~x, logDose = 10, fct = get(x)())
Note the extra parenthesis. The get(x)-call returns the W2.4 function, and the second set of parenthesis calls that function returned by get().
Using the ryegrass dataset that comes with the drc package, we can see that these two lines return the same thing
drm(rootl ~ conc, data = ryegrass, fct = W2.4())
drm(rootl ~ conc, data = ryegrass, fct = get(x)())

r - taking difference of two xyplots?

I have several xyplot objects that I have saved as .RDATA files. I am now interested in being able to look at their differences. I have tried things like
plot1-plot2
but this does not work (I get the "non-numeric argument to binary operator error).
I would also be able to do this if I knew how to extract the timeseries data stored within the lattice xyplot object, but I have looked everywhere and can't figure out how to do this either.
Any suggestions?
EDIT:
just to make it perfectly clear what I mean for MrFlick, by "taking the difference of two plots" I mean plotting the elementwise difference of the timeseries from each plot, assuming it exists (i.e. assuming that the plots have the same domain). Graphically,
I might want to take the following two plots, stored as xyplot objects:
and end up with something that looks like this:
-Paul
Here is a little function I wrote to plot the difference of two xyplots:
getDifferencePlot = function(plot1,plot2){
data1 = plot1$panel.args
data2 = plot2$panel.args
len1 = length(data1)
len2 = length(data2)
if (len1!=len2)
stop("plots do not have the same number of panels -- cannot take difference")
if (len1>1){
plotData = data.table(matrix(0,0,4))
setNames(plotData,c("x","y1","y2","segment"))
for (i in 1:len1){
thing1 = data.table(cbind(data1[[i]]$x,data1[[i]]$y))
thing2 = data.table(cbind(data2[[i]]$x,data2[[i]]$y))
finalThing = merge(thing1, thing2,by = "V1")
segment = rep(i,nrow(finalThing))
finalThing = cbind(finalThing,segment)
setNames(finalThing,c("x","y1","y2","segment"))
plotData = rbind(plotData,finalThing)
}
}
if (len1==1){
plotData = data.table(matrix(0,0,3))
setNames(plotData,c("x","y1","y2"))
thing1 = data.table(cbind(data1[[i]]$x,data1[[i]]$y))
thing2 = data.table(cbind(data2[[i]]$x,data2[[i]]$y))
plotData = merge(thing1, thing2,by = "V1")
}
plotData$difference = plotData$y1-plotData$y2
if (len1==1)
diffPlot = xyplot(difference~x,plotData,type = "l",auto.key = T)
if (len1>1)
diffPlot = xyplot(difference~x|segment,plotData,type = "l",auto.key = T)
return(diffPlot)
}

Resources