Unused arguments in R error

I am new to R. I am trying to run an example given in the "rebmix-help" PDF. It uses the galaxy dataset, and here is the code:
library(rebmix)
devAskNewPage(ask = TRUE)
data("galaxy")
write.table(galaxy, file = "galaxy.txt", sep = "\t",eol = "\n", row.names = FALSE, col.names = FALSE)
REBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
  for (j in 1:3) {
    for (k in 1:3) {
      REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
        Preprocessing = Preprocessing[k], D = 0.0025,
        cmax = 12, InformationCriterion = InformationCriterion[j],
        pdf = pdf[i], K = K[[k]])
      if (is.null(Table))
        Table <- REBMIX[[i, j, k]]$summary
      else Table <- merge(Table, REBMIX[[i, j, k]]$summary, all = TRUE, sort = FALSE)
    }
  }
}
It is giving me this error:
unused argument (InformationCriterion = InformationCriterion[j])
Please help.

I'm running R 3.0.2 (Windows), and there the rebmix package defines a REBMIX function whose argument is named Criterion, not InformationCriterion.
In brief, invoke REBMIX as:
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
  Preprocessing = Preprocessing[k], D = 0.0025,
  cmax = 12, Criterion = InformationCriterion[j],
  pdf = pdf[i], K = K[[k]])
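More generally, when you hit an "unused argument" error it is worth listing the argument names the installed function actually accepts; a quick sketch:
library(rebmix)
# the formal argument names of REBMIX as installed on your machine
names(formals(REBMIX))
# or print the full signature
args(REBMIX)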

It looks as though there have been substantial changes to the rebmix package since the example mentioned in the OP was created. Among the most noticeable changes is the use of S4 classes.
There's also an updated demo in the rebmix package using the galaxy data (see demo("rebmix.galaxy"))
To get the above example to produce results (note: I am not familiar with this package or the REBMIX algorithm!):
Change the argument to Criterion, as mentioned by @Giupo
Use the S4 slot access operator @ instead of $
Don't name the results object REBMIX, because that's already the function name
library(rebmix)
data("galaxy")
## Don't reuse the name REBMIX for the results object!
myREBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
  for (j in 1:3) {
    for (k in 1:3) {
      myREBMIX[[i, j, k]] <- REBMIX(Dataset = list(galaxy),
        Preprocessing = Preprocessing[k], D = 0.0025,
        cmax = 12, Criterion = InformationCriterion[j],
        pdf = pdf[i], K = K[[k]])
      if (is.null(Table)) {
        Table <- myREBMIX[[i, j, k]]@summary
      } else {
        Table <- merge(Table, myREBMIX[[i, j, k]]@summary, all = TRUE, sort = FALSE)
      }
    }
  }
}

I guess this is late, but I encountered a similar problem just a few minutes ago, and I realized the real scenario you may face when you get this kind of error message: it's just a version conflict.
You may be using a different version of the R package than the tutorial, so the argument names can differ between what you are running and what the tutorial's code uses.
So please check the version first before you try to manually edit the file. It can also happen that your old version of the package is still in the path and overrides the new one. This was exactly my case, since I manually installed the old and new versions separately...
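In practice that means checking the installed version and which copy sits first on the search path; a minimal sketch:
packageVersion("rebmix")   # version actually installed
find.package("rebmix")     # path of the copy that gets loaded first
.libPaths()                # library paths searched, in order
# if an old copy shadows the new one, remove it, e.g.:
# remove.packages("rebmix", lib = "path/to/old/library")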

Related

R: Collect All Function Definitions from a Library

I am working with R. I found this previous post on stackoverflow which shows how to get a "list" of all functions that belong to a given library:
How to find all functions in an R package?
For example:
#load desired library
library(ParBayesianOptimization)
#find out all functions from this library
getNamespaceExports("ParBayesianOptimization")
[1] "addIterations" "getLocalOptimums" "bayesOpt" "getBestPars" "changeSaveFile" "updateGP"
The above code tells me the names of all the functions exported by the "ParBayesianOptimization" library. From here, I could manually inspect each one of these functions - for example:
# manually inspect any one of these functions
getAnywhere(bayesOpt)
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
#function starts here
function (FUN, bounds, saveFile = NULL, initGrid, initPoints = 4,
iters.n = 3, iters.k = 1, otherHalting = list(timeLimit = Inf,
minUtility = 0), acq = "ucb", kappa = 2.576, eps = 0,
parallel = FALSE, gsPoints = pmax(100, length(bounds)^3),
convThresh = 1e+08, acqThresh = 1, errorHandling = "stop",
plotProgress = FALSE, verbose = 1, ...)
{
startT <- Sys.time()
optObj <- list()
etc etc etc ...
saveFile = saveFile, verbose = verbose, ...)
return(optObj)
}
#function ends here
<bytecode: 0x000001cbb4145db0>
<environment: namespace:ParBayesianOptimization>
Goal: Is it possible to take each one of these functions and create a notepad (text) file with their full definitions?
My attempt:
I thought I could first make an "object" in R that contained all the functions found in this library:
library(plyr)
a = getNamespaceExports("ParBayesianOptimization")
my_list = do.call("rbind.fill", lapply(a, as.data.frame))
X[[i]]
1 addIterations
2 getLocalOptimums
3 bayesOpt
4 getBestPars
5 changeSaveFile
6 updateGP
Then, I could manually create an "assignment arrow":
header_text <- rep("<-")
Then, "paste" this to each function name:
combined_list <- as.character(paste(my_list, header_text, sep = ""))
But this is not looking correct:
combined_list
[1] "c(\"addIterations\", \"getLocalOptimums\", \"bayesOpt\", \"getBestPars\", \"changeSaveFile\", \"updateGP\")<- "
The goal is to automate the process of manually copying/pasting:
function_1 = getAnywhere("first function ParBayesianOptimization library")
function_2 = getAnywhere("second function ParBayesianOptimization library")
etc
final_list = c(function_1, function_2 ...)
And removing the generic description from each function:
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
In the end, if I were to "call" the final_list object, all the functions from this library should get recreated and reassigned.
Can someone please show me how to do this?
Thanks
You can use the dump function for this
pkg <- "ParBayesianOptimization"
dump(getNamespaceExports(pkg), file="funs.R", envir = asNamespace(pkg))
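dump writes each exported object as an assignment of its deparsed source, so the file can be read straight back into a fresh session:
source("funs.R")    # re-creates the dumped definitions in the global environment
exists("bayesOpt")  # TRUE once the definitions have been re-created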
This code will help you write the function definitions of all the functions in a library to a text file.
fn_list <- getNamespaceExports("ParBayesianOptimization")
for (i in seq_along(fn_list)) {
  header <- paste('\n\n####Function', i, '\n\n\n')
  cat(paste0(header, paste0(getAnywhere(fn_list[i]), collapse = '\n'), '\n\n'),
      file = 'function.txt', append = TRUE)
}

Weird characters appearing in the plot legend when using DoHeatmap

I was using Seurat to analyse single-cell RNA-seq data and managed to draw a heatmap with DoHeatmap() after clustering and marker selection, but I got a bunch of random characters in the legend. They are random in the sense that they change every time you run the code. I was worried it was something related to my own dataset, so I tried the test Seurat object 'ifnb' but still got the same issue (see the red oval in the example plot).
example plot
I also tried importing the Seurat object in R in the terminal (via readRDS) and running the plotting function, but got the same issue there, so it's not an RStudio thing.
Here is the code I ran:
library(Seurat)
library(SeuratData)
library(patchwork)
library(dplyr)  # needed for group_by()/top_n() below
InstallData("ifnb")
LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = ifnb.list)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
DefaultAssay(immune.combined) <- 'RNA'
immune_markers <- FindAllMarkers(immune.combined, latent.vars = "stim", test.use = "MAST", assay = 'RNA')
immune_markers %>%
  group_by(cluster) %>%
  top_n(n = 10, wt = avg_log2FC) -> top10_immune
DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene, group.by = 'stim', assay = 'RNA')
Does anyone have any idea how to solve this issue other than reinstalling everything?
I have been having the same issue myself, and while I solved it by dropping the legend, I think you could take a similar approach:
DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene, group.by = 'stim', assay = 'RNA') +
  scale_color_manual(
    values = my_colors,
    limits = c('CTRL', 'STIM'))
Let me know if this works! It doesn't solve the source of the odd text values but it does the job! If you haven't already, I would recommend creating a forum question on the Seurat forums to see where these characters are coming from!
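Note that my_colors is not defined in the snippet above; it is assumed to be a vector with one colour per group level. A minimal sketch:
library(ggplot2)  # scale_color_manual() comes from ggplot2
# hypothetical palette: one colour per level of the 'stim' grouping
my_colors <- c('#F8766D', '#00BFC4')
DoHeatmap(immune.combined, slot = 'data', features = top10_immune$gene, group.by = 'stim', assay = 'RNA') +
  scale_color_manual(values = my_colors, limits = c('CTRL', 'STIM'))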
When I used Seurat 4.0, I met the same problem.
When I loaded 4.1, it disappeared.
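To check which version you are on before upgrading:
packageVersion("Seurat")
# install the current release if yours is older:
# install.packages("Seurat")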

Exporting Seurat Object Data by Cluster

I'm using Seurat to perform a single cell analysis and am interested in exporting the data for all cells within each of my clusters. I tried to use the below code but have had no success.
My Seurat object is called Patients. I also attached a screenshot of my Seurat object. I am looking to extract all the clusters (i.e. Ductal1, Macrophage1, Macrophage2, etc...)
meta.data.cluster <- unique(x = Patients@meta.data$active.ident)
for (group in meta.data.cluster) {
  group.cells <- WhichCells(object = Patients, subset.name = "active.ident", accept.value = group)
  data_to_write_out <- as.data.frame(x = as.matrix(x = Patients@raw.data[, group.cells]))
  write.csv(x = data_to_write_out, row.names = TRUE, file = paste0(save_dir, "/", group, "_cluster_outfile.csv"))
}
I am new to R and coding so any help is greatly appreciated! :)
It doesn't work because there is no active.ident column in your metadata. For example, if we use an example dataset like yours and set the idents:
library(Seurat)
M = matrix(rnbinom(5000,mu=20,size=1),ncol=50)
colnames(M) = paste0("P",1:50)
rownames(M) = paste0("gene",1:100)
Patients = CreateSeuratObject(M)
Patients$grp = sample(c("Ductal1","Macrophage1","Macrophage2"),50,replace=TRUE)
Idents(Patients) = Patients$grp
You can see this line of code gives you no value:
meta.data.cluster <- unique(x = Patients@meta.data$active.ident)
meta.data.cluster
NULL
You can do:
meta.data.cluster <- unique(Idents(Patients))
for (group in meta.data.cluster) {
  group.cells <- WhichCells(object = Patients, idents = group)
  data_to_write_out <- as.data.frame(GetAssayData(Patients, slot = 'counts')[, group.cells])
  write.csv(data_to_write_out, row.names = TRUE, file = paste0(save_dir, "/", group, "_cluster_outfile.csv"))
}
Note also that you can get the counts out using GetAssayData. You can subset one group and write it out like this:
wh <- which(Idents(Patients) == "Macrophage1")
da = as.data.frame(GetAssayData(Patients, slot = 'counts')[, wh])
write.csv(da,...)
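A compact variant of the loop above (same output, still assuming save_dir is defined) splits the cell names by identity class in one step:
cells_by_ident <- split(colnames(Patients), Idents(Patients))
counts <- GetAssayData(Patients, slot = 'counts')
for (group in names(cells_by_ident)) {
  out <- as.data.frame(as.matrix(counts[, cells_by_ident[[group]], drop = FALSE]))
  write.csv(out, file = paste0(save_dir, "/", group, "_cluster_outfile.csv"))
}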

%dopar% safe way of writing to csv inside a foreach loop

[EDITED]
It is a general question: I have seen posts saying that it is not a good idea to use write.csv inside a foreach loop, because different cores may try to write to the file at the same time, resulting in missing results. Still, I need to write my output (500,000+ rows and 10+ columns) to an external file inside the parallel loop; otherwise the job crashes due to memory issues. So I would like to know if there is a safer way to write a results file within a foreach loop.
I appreciate any help on this
I am adding some more info and much simpler code and data than what I actually have.
Description: I have two different polygon layers (sf, polygon), each with 500,000+ features. I need to calculate the area of different raster classes (one raster layer with 3 classes) within each of the polygons. This is the most time-consuming part of the script, specifically because I need to use sf::st_intersection multiple times. Then I use many different combinations of if-else rules to populate a df with values.
This is the original code, which gives me memory issues with the real data:
require(sf)
require(raster)
require(rgdal)
require(rgeos)
require(dplyr)
require(stars)
## Sample data
set.seed(131)
sample_raster = raster(nrows = 1, ncols = 1, res = 0.5, xmn = 0, xmx = 11, ymn = 0, ymx = 11)
values(sample_raster) = rep(1:3, length.out = ncell(sample_raster))
crs(sample_raster) = CRS('+init=epsg:4326')
plot(sample_raster, axes=T)
sample_raster
##
m = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))
p = st_polygon(list(m))
n = 100
l = vector("list", n)
for (i in 1:n)
  l[[i]] = p + 10 * runif(2)
sample_poly = st_sfc(l)
data = data.frame(PR_ID = 1:100,
                  COND1 = rep(1:10, length.out = 100))
sample_poly = st_sf(cbind(data, sample_poly))
plot(sample_poly, col = sf.colors(categorical = TRUE, alpha = .5), add=T)
sample_poly = sample_poly %>% st_set_crs(4326)
sample_poly
##
## Code
require(parallel)
require(foreach)
require(doParallel)
idall = as.character(sample_poly$PR_ID)
area = as.numeric(st_area(sample_poly))/10000
# i=1
# listID = idall
# mainpoly = sample_poly
# mainras = sample_raster
# mainpolyarea = area
per.imovel.paralallel = function (listID, mainpoly, mainras, mainpolyarea) { # Starting the function
## Setting the parallel work up into your computer
UseCores = detectCores()-1
cl = parallel::makeCluster(UseCores, output="")
doParallel::registerDoParallel(cl)
writeLines(c(""), "log.txt") # Creates a LOG FILE in the folder to follow processing
FOREACH.RESULT = foreach(i = 1:length(listID), .packages=c('raster', 'rgdal', 'rgeos', 'dplyr', 'parallel',
'doParallel', 'sf', 'stars'), .inorder = T , .combine ='rbind') %dopar%
{ # Starting the parallel loop
sink("log.txt", append=TRUE) # LOG FILE in the home folder
cat(paste(i, "of", length(listID), as.character(Sys.time()),"\n")) # Write to LOG FILE
sink() # end diversion of output
########################
### Pick one poly
px = sf::st_buffer(mainpoly[mainpoly$PR_ID == listID[i],], # Conditional to select the geometry PR_ID in position i
dist = 0.1) # buffer = 0 w/ byid, selects the geometry
########################
### Intersect with raster and get area
px2 = sf::st_buffer(px, dist = 0.1) # Buffer because raster::mask() masks out partially covered cells since it calls rasterize() first
desm_prop = raster::crop(mainras, as_Spatial(px2))
desm_prop_shp = if(all(is.na(values(desm_prop)))){NULL
} else {sf::st_intersection(st_cast(sf::st_as_sf(stars::st_as_stars(desm_prop)), "POLYGON"), px)}
names(desm_prop_shp)[1] = if(any(names(desm_prop_shp) == "layer")){"values"
} else {NULL}
desm_prop_bet0108 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 1, ]}
desm_prop_bet0108 = if(is.null(desm_prop_bet0108) | length(desm_prop_bet0108) == 0){NULL
} else if(length(desm_prop_bet0108$values) == 0){NULL
} else {desm_prop_bet0108}
desm_prop_after08 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 2, ]}
desm_prop_after08 = if(is.null(desm_prop_after08) | length(desm_prop_after08) == 0){NULL
} else if(length(desm_prop_after08$values) == 0){NULL
} else {desm_prop_after08}
desm_prop_upto00 = if(is.null(desm_prop_shp)){NULL
} else {desm_prop_shp[desm_prop_shp$values == 3, ]}
desm_prop_upto00 = if(is.null(desm_prop_upto00) | length(desm_prop_upto00) == 0){NULL
} else if(length(desm_prop_upto00$values) == 0){NULL
} else {desm_prop_upto00}
area_desm_prop_bet0108 <- if(is.null(desm_prop_bet0108)){0
} else { sum(as.numeric(sf::st_area(desm_prop_bet0108)/10000))} # Deforestation area in PX 2001 - 2008
area_desm_prop_after08 <- if(is.null(desm_prop_after08)){0
} else { sum(as.numeric(sf::st_area(desm_prop_after08)/10000))} # Deforestation area in PX after 2008
area_desm_prop_upto00 <- if(is.null(desm_prop_upto00)){0
} else { sum(as.numeric(sf::st_area(desm_prop_upto00)/10000))} # Deforestation area in PX upto 2000
########################
# RESULTS
TEMP.RESULTS = data.frame(PR_ID = as.character(listID[i]),
PR_AREA_HA = mainpolyarea[i],
PR_D09 = area_desm_prop_after08,
PR_D0108 = area_desm_prop_bet0108,
PR_D00 = area_desm_prop_upto00)
return (TEMP.RESULTS)
} # Ending the loop
parallel::stopCluster(cl) # stop cluster
stopImplicitCluster() # stop implicit cluster
gc()
return (FOREACH.RESULT) # return last: anything placed after return() would never run
} # Ending the function
#####################################################################################################
results_feach = per.imovel.paralallel (listID = idall, mainpoly = sample_poly, mainras = sample_raster, mainpolyarea = area)
warnings()
I have also tried @mischva11's suggestion (modified) by adding this:
length_of_chunk = round(length(idall)/(length(idall)/10)) # generate chunks of 10 lines
lchunks = split(idall, sort(rep_len(1:length_of_chunk, length(idall))))
for (z in 1:length_of_chunk) {
  # split up the data in chunks
  idall_chunk = as.vector(unlist(lchunks[z]))
  results_chunk = per.imovel.paralallel(listID = idall_chunk, mainpoly = sample_poly, mainras = sample_raster, mainpolyarea = area)
  # save your foreach results for each chunk, appending after the first one
  if (z == 1) {
    write.table(results_chunk, file = "TESTDATAresults1.csv")
  } else {
    write.table(results_chunk, file = "TESTDATAresults1.csv", append = TRUE, col.names = FALSE)
  }
  print(NULL) # print(results_chunk)
}
It works like a charm for this example.
BUT, I hit a setback when running it with the real script/data: it takes ages for the foreach to close. I am watching my machine's performance and the log file; after processing all lines of my sf object, the CPU load goes down as expected, but it still takes more than 30 min (I did not wait for it to completely finish) to close the foreach function.
Because of this, I thought about writing the output on the fly inside the foreach work, but clearly that is not a good idea, as explained here. I have seen some posts about the package flock, which locks the output file while writing the output. I have not tested it, but it sounds promising.
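Something like the pattern below is what I have in mind; untested, and assuming flock's lock()/unlock() API:
# hypothetical sketch: each worker takes an exclusive file lock before appending,
# so two workers never write to results.csv at the same time
results <- foreach(i = 1:100, .packages = 'flock') %dopar% {
  res <- data.frame(id = i, value = sqrt(i))  # stand-in for the real per-polygon result
  lck <- flock::lock("results.csv.lock")      # blocks until the lock is free
  write.table(res, "results.csv", append = TRUE, col.names = FALSE, row.names = FALSE, sep = ",")
  flock::unlock(lck)
  NULL  # keep nothing in memory
}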
The problem here is that you need communication between the cores: one core has to wait for the next until it has finished writing to the csv. That's not easily done, and not possible with foreach as far as I know (foreach does preserve result ordering via its .inorder argument, TRUE by default). You tell us you have memory issues, so one solution is to chunk up your output if possible. I do not have a good dataset for this example, so I use mtcars:
library(foreach)
library(doParallel)
registerDoParallel(4)
# split the row indices of mtcars into 5 chunks
n_chunks <- 5
chunks <- split(seq_len(nrow(mtcars)), cut(seq_len(nrow(mtcars)), n_chunks, labels = FALSE))
for (z in seq_along(chunks)) {
  # here the data gets split up
  data <- mtcars[chunks[[z]], ]
  # foreach over the rows of this chunk
  results <- foreach(i = seq_len(nrow(data)), .combine = rbind) %dopar% {
    # ***your code***
    y <- data[i, ]
    return(y)
  }
  print(results)
  # save your foreach results and then begin again
  if (z == 1) {
    write.table(results, file = "test.csv")
  } else {
    write.table(results, file = "test.csv", append = TRUE, col.names = FALSE)
  }
}

Properly calling variable name when creating multiple Benford plots

I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law
Running a single variable
#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))
Looks great. And the legend says "Dataset: iris$Sepal.Length", perfect!
Using apply to run 4 variables,
apply(iris[1:4], 2, function(x) plot(benford(x)))
This creates four plots; however, each plot's legend says "Dataset: x".
I attempted to use a for loop,
for (i in colnames(iris[1:4])) {
  plot(benford(iris[[i]]))
}
This creates four plots, but now the legends say "Dataset: iris[[i]]", and I would like the name of the variable on each chart.
I tried a different loop, hoping to get titles with an evaluated parsed string like "iris$Sepal.Length":
for (i in colnames(iris[1:4])) {
  plot(benford(eval(parse(text = paste0("iris$", i)))))
}
But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".
AND, now I've run into the infamous eval(parse(text = paste0(...))) pattern (e.g. the questions "How to eval results returned by paste0?" and "R: eval(parse(...)) is often suboptimal").
I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningfully variable names in the legend?
This is happening because of the first line within the benford function:
benford <- function(data, number.of.digits = 2, sign = "positive", discrete = TRUE, round = 3){
  data.name <- as.character(deparse(substitute(data)))
Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R
data.name is then used to name your graph. Whatever variable name or expression you pass to the function will unfortunately be caught by the deparse(substitute()) call, and will be used as the name for your graph.
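You can see the mechanism in isolation:
# deparse(substitute()) captures the expression passed in, not its value
f <- function(data) deparse(substitute(data))
f(iris$Sepal.Length)  # "iris$Sepal.Length"
i <- "Sepal.Length"
f(iris[[i]])          # "iris[[i]]", which is exactly what ends up in the legend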
One short-term solution is to copy and rewrite the function:
#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed for function
# load hidden functions into namespace - needed for function
r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = T))
for(name in r) eval(parse(text=paste0(name, '<-benford.analysis:::', name)))
benford_rev <- function(...) {...} # placeholder; full definition at the bottom of this answer
for (i in colnames(iris[1:4])) {
  plot(benford_rev(iris[[i]], data.name = i))
}
This has the negative side effects of:
Not being maintainable across package revisions
Filling your GlobalEnv with functions that are normally hidden in the package
So hopefully someone can propose a better way!
benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3, data.name = as.character(deparse(substitute(data)))){ # changed
# removed line
benford.digits <- generate.benford.digits(number.of.digits)
benford.dist <- generate.benford.distribution(benford.digits)
empirical.distribution <- generate.empirical.distribution(data, number.of.digits,sign, second.order = FALSE, benford.digits)
n <- length(empirical.distribution$data)
second.order <- generate.empirical.distribution(data, number.of.digits,sign, second.order = TRUE, benford.digits, discrete = discrete, round = round)
n.second.order <- length(second.order$data)
benford.dist.freq <- benford.dist*n
## calculating useful summaries and differences
difference <- empirical.distribution$dist.freq - benford.dist.freq
squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq
absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)
### chi-squared test
chisq.bfd <- chisq.test.bfd(squared.diff, data.name)
### MAD
mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))
if (number.of.digits > 3) {
MAD.conformity <- NA
} else {
digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]
MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
}
### Summation
summation <- generate.summation(benford.digits,empirical.distribution$data, empirical.distribution$data.digits)
abs.excess.summation <- abs(summation - mean(summation))
### Mantissa
mantissa <- extract.mantissa(empirical.distribution$data)
mean.mantissa <- mean(mantissa)
var.mantissa <- var(mantissa)
ek.mantissa <- excess.kurtosis(mantissa)
sk.mantissa <- skewness(mantissa)
### Mantissa Arc Test
mat.bfd <- mantissa.arc.test(mantissa, data.name)
### Distortion Factor
distortion.factor <- DF(empirical.distribution$data)
## recovering the lines of the numbers
if (sign == "positive") lines <- which(data > 0 & !is.na(data))
if (sign == "negative") lines <- which(data < 0 & !is.na(data))
if (sign == "both") lines <- which(data != 0 & !is.na(data))
#lines <- which(data %in% empirical.distribution$data)
## output
output <- list(info = list(data.name = data.name,
n = n,
n.second.order = n.second.order,
number.of.digits = number.of.digits),
data = data.table(lines.used = lines,
data.used = empirical.distribution$data,
data.mantissa = mantissa,
data.digits = empirical.distribution$data.digits),
s.o.data = data.table(second.order = second.order$data,
data.second.order.digits = second.order$data.digits),
bfd = data.table(digits = benford.digits,
data.dist = empirical.distribution$dist,
data.second.order.dist = second.order$dist,
benford.dist = benford.dist,
data.second.order.dist.freq = second.order$dist.freq,
data.dist.freq = empirical.distribution$dist.freq,
benford.dist.freq = benford.dist.freq,
benford.so.dist.freq = benford.dist*n.second.order,
data.summation = summation,
abs.excess.summation = abs.excess.summation,
difference = difference,
squared.diff = squared.diff,
absolute.diff = absolute.diff),
mantissa = data.table(statistic = c("Mean Mantissa",
"Var Mantissa",
"Ex. Kurtosis Mantissa",
"Skewness Mantissa"),
values = c(mean.mantissa = mean.mantissa,
var.mantissa = var.mantissa,
ek.mantissa = ek.mantissa,
sk.mantissa = sk.mantissa)),
MAD = mean.abs.dev,
MAD.conformity = MAD.conformity,
distortion.factor = distortion.factor,
stats = list(chisq = chisq.bfd,
mantissa.arc.test = mat.bfd)
)
class(output) <- "Benford"
return(output)
}
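One possible better way that avoids copying the package internals: do.call() inserts a symbol into the call it constructs, so deparse(substitute()) inside the unmodified benford sees a clean name. A sketch:
library(benford.analysis)
for (i in colnames(iris[1:4])) {
  assign(i, iris[[i]])                      # create a variable named e.g. Sepal.Length
  plot(do.call(benford, list(as.name(i))))  # legend reads "Dataset: Sepal.Length"
}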
I have just updated the package (GitHub version) to allow for a user supplied name.
Now the function has a new parameter called data.name in which you can provide a character vector with the name of the data and override the default. Thus, for your example you can simply run the following code.
First install the GitHub version (I will submit this version to CRAN soon).
devtools::install_github("carloscinelli/benford.analysis") # install new version
Now you can provide the name of the data inside the for loop:
library(benford.analysis)
for (i in colnames(iris[1:4])) {
  plot(benford(iris[[i]], data.name = i))
}
And all the plots will have the correct naming, as you wished.
