I am getting a caught segfault error every time I try to run any plotting functions from the ggplot2 package (1.0.0). I have tried this with qplot, geom_dotplot, geom_histogram, etc. Data from the package (e.g. diamonds or economics) work just fine.
I am operating on Mac OS 10.9.4 (the latest version) and on R 3.1.1 (also the latest version). I get the same error with the standard R GUI, RStudio, and when using R from the command line. The command brings up the default graphic device (Quartz for R GUI and command line), but also the terminal error.
library(ggplot2)
qplot(1:10)
gives me the error:
*** caught segfault ***
address 0x18, cause 'memory not mapped'
Traceback:
1: .Call("plyr_split_indices", PACKAGE = "plyr", group, n)
2: split_indices(scale_id, n)
3: scale_apply(layer_data, x_vars, scale_train, SCALE_X, panel$x_scales)
4: train_position(panel, data, scale_x(), scale_y())
5: ggplot_build(x)
6: print.ggplot(list(data = list(), layers = list(<environment>), scales = <S4 object of class "Scales">, mapping = list(x = 1:3), theme = list(), coordinates = list(limits = list(x = NULL, y = NULL)), facet = list(shrink = TRUE), plot_env = <environment>, labels = list(x = "1:3", y = "count")))
7: print(list(data = list(), layers = list(<environment>), scales = <S4 object of class "Scales">, mapping = list(x = 1:3), theme = list(), coordinates = list( limits = list(x = NULL, y = NULL)), facet = list(shrink = TRUE), plot_env = <environment>, labels = list(x = "1:3", y = "count")))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Here is my session info:
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] ggplot2_1.0.0 marelac_2.1.3 seacarb_3.0 shape_1.4.1 beepr_1.1 birk_1.1
loaded via a namespace (and not attached):
[1] audio_0.1-5 colorspace_1.2-4 digest_0.6.4 grid_3.1.1 gtable_0.1.2
[6] MASS_7.3-34 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2
[11] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.1.1
I've gathered from others that this is a memory issue of some sort, but this error occurs even when I have over 2 GB of free RAM. I know this is a widely used package, so of course this doesn't happen for everyone, but why is it happening for me? Does anyone know what I can do to fix this problem?
In case anyone else has this problem or similar in the future, I sent a bug report to the package maintainer and he recommended uninstalling all installed packages and starting over. I took his advice and it worked!
I followed advice from this posting: http://r.789695.n4.nabble.com/Reset-R-s-library-to-base-packages-only-remove-all-installed-contributed-packages-td3596151.html
ip <- installed.packages()
pkgs.to.remove <- ip[!(ip[,"Priority"] %in% c("base", "recommended")), 1]
sapply(pkgs.to.remove, remove.packages)
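Before wiping the library it may be worth recording what was installed, so the same set can be reinstalled afterwards. This is only a minimal sketch: the helper name contributed_packages and the use of a text file are my own assumptions, not part of the linked posting.

```r
# Hypothetical helper: names of all packages that are neither base nor recommended
contributed_packages <- function(ip = installed.packages()) {
  keep <- !(ip[, "Priority"] %in% c("base", "recommended"))
  unname(ip[keep, "Package"])
}

pkgs <- contributed_packages()

# Save the list; a temporary file is used here for illustration,
# in practice a permanent path is needed so the list survives a restart
list_file <- tempfile(fileext = ".txt")
writeLines(pkgs, list_file)

# After removing the packages, they could be reinstalled with:
# install.packages(readLines(list_file))
```

The removal code above can then be run as before, and the saved list used to restore the packages one by one or all at once.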
This is not an answer to this question but it might be helpful for someone. (Inspired by user1310503. Thanks!)
I am working on a data.frame df with three cols: col1, col2, col3.
Initially,
df <- data.frame(col1 = character(), col2 = numeric(), col3 = numeric(), stringsAsFactors = FALSE)
Along the way, rbind is called many times, like this:
aList<-list(col1="aaa", col2 = "123", col3 = "234")
dfNew <- as.data.frame(aList)
df <- rbind(df, dfNew)
Finally, df is written to a file via data.table::fwrite:
data.table::fwrite(x = df, file = fileDF, append = FALSE, row.names = FALSE, quote = FALSE, showProgress = TRUE)
df has 5973 rows and 3 cols. The "caught segfault" always occurs:
address 0x1, cause 'memory not mapped'.
The solution to this problem is:
aList<-list(col1=as.character("aaa"), col2 = as.numeric("123"), col3 = as.numeric("234"))
dfNew <- as.data.frame(aList)
dfNew$col1 <- as.character(dfNew$col1)
dfNew$col2 <- as.numeric(dfNew$col2)
dfNew$col3 <- as.numeric(dfNew$col3)
df <- rbind(df, dfNew)
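With that change the segfault no longer occurs. A possible reason is that the column classes of dfNew did not match those of df (in the original code, col2 and col3 were character strings rather than numbers).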
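One way to catch such a mismatch early is to compare the column classes before each rbind. This is a sketch under the assumption that a class mismatch is indeed the trigger, which the answer itself only offers as a possible reason:

```r
# Empty target data frame with the intended column types
df <- data.frame(col1 = character(), col2 = numeric(), col3 = numeric(),
                 stringsAsFactors = FALSE)

# New row built with explicit types instead of strings
dfNew <- data.frame(col1 = "aaa",
                    col2 = as.numeric("123"),
                    col3 = as.numeric("234"),
                    stringsAsFactors = FALSE)

# Stop right away if the classes differ, rather than failing later in fwrite
stopifnot(identical(sapply(df, class), sapply(dfNew, class)))

df <- rbind(df, dfNew)
```

The stopifnot() line fails at the offending row, which is usually easier to debug than a crash at write time.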
This is not an answer to this question but it might be useful for someone. I had segfaults when I did pdf to create a PDF graphics device and then used plot. This happened with R 2.15.3, 3.2.4, and one or two other versions, running on Scientific Linux release 6.7. I tried many different things, but the only ways I could get it to work were (a) using png or tiff instead of pdf, or (b) saving large .RData files and then using a completely separate R program to create the graphics.
I am following a tutorial here. A few days ago I was able to run this code without error and run it on my own data set (it was always a little hit and miss with obtaining this error) - however now I try to run the code and I always obtain the same error.
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I get that the solver cannot solve the equations but I am a little confused as to why it worked previously and now it does not... The author of the article has this code working...
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
SessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4
tseries::portfolio.optim disallows short selling by default; see the argument shorts. If shorts = FALSE, asset weights may not go below 0. And since the weights must sum to 1, no individual asset weight can be above 1 either. There's no leverage.
(Possibly, in an earlier version of tseries the default was shorts = TRUE. This would explain why it previously worked for you.)
Without short selling, your target return (pm) therefore cannot exceed the highest return of any of the input assets.
Solution 1: Allow short selling, but remember that that's a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization. There's a mathematical solution to the problem without short-selling restriction.)
op <- portfolio.optim(as.matrix(df), er, shorts = T)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
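To see why the target return is bounded when short selling is disallowed, here is a small base R sketch. The three asset returns are made up for illustration and are not taken from the linked data set.

```r
set.seed(1)

# Made-up expected returns for three hypothetical assets
asset_means <- c(a = 0.02, b = 0.05, c = 0.11)

# 1000 random long-only portfolios: weights are non-negative and sum to 1
w <- matrix(runif(3000), ncol = 3)
w <- w / rowSums(w)

# Expected return of each portfolio is a weighted average of the asset returns
port_ret <- as.vector(w %*% asset_means)

# The weighted average can never leave the [min, max] range of the assets
range(port_ret)
tol <- 1e-12
stopifnot(all(port_ret >= min(asset_means) - tol),
          all(port_ret <= max(asset_means) + tol))
```

Any target outside that range leaves the solver with constraints that cannot all hold, which is what the "constraints are inconsistent" message reports.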
Here's a plot of the obtained efficient frontiers.
Here's the full script that gives both solutions.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals1 <- sapply(er_vals1, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
return(op$ps)
})
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
xlab = "sd", ylab = "er",
main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")
I've been getting WFA to run on the full set of intraday GBPUSD 30min data, and have come across a couple of things that need addressing. The first is I believe the save function needs changing to remove the time from the string (as shown here as a pull request on the R-Finance/quantstrat repo on github). The walk.forward function throws this error:
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file 'wfa.GBPUSD.2002-10-21 00:30:00.2002-10-23 23:30:00.RData', probable reason 'Invalid argument'
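The colons in the timestamps are what Windows rejects in a file name. A minimal sketch of building a name from dates only is shown below; the variable names and the exact naming scheme are assumptions for illustration, not the code used in quantstrat or in the pull request.

```r
# Training window boundaries as they appear in the failing file name
start <- as.POSIXct("2002-10-21 00:30:00", tz = "UTC")
end   <- as.POSIXct("2002-10-23 23:30:00", tz = "UTC")

# Hypothetical helper: keep only the date part, so no ":" or spaces remain
stamp <- function(t) format(t, "%Y-%m-%d", tz = "UTC")

file_name <- paste("wfa", "GBPUSD", stamp(start), stamp(end), "RData", sep = ".")
file_name
```

Dropping the time does mean that two windows starting on the same day would map to the same file, so for intraday periods a format such as "%Y%m%dT%H%M%S" might be a safer choice.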
The second is a rare case where it ends up calling runSum on a data set with fewer rows than the period you are testing (n). This is the traceback():
8: stop("Invalid 'n'")
7: runSum(x, n)
6: runMean(x, n)
5: (function (x, n = 10, ...)
{
ma <- runMean(x, n)
if (!is.null(dim(ma))) {
colnames(ma) <- "SMA"
}
return(ma)
})(x = Cl(mktdata)[, 1], n = 25)
4: do.call(indFun, .formals)
3: applyIndicators(strategy = strategy, mktdata = mktdata, parameters = parameters,
...)
2: applyStrategy(strategy, portfolios = portfolio.st, mktdata = symbol[testing.timespan]) at custom.walk.forward.R#122
1: walk.forward(strategy.st, paramset.label = "WFA", portfolio.st = portfolio.st,
account.st = account.st, period = "days", k.training = 3,
k.testing = 1, obj.func = my.obj.func, obj.args = list(x = quote(result$apply.paramset)),
audit.prefix = "wfa", anchored = FALSE, verbose = TRUE)
The extended GBPUSD data used in the creation of the Luxor Demo includes an erroneous date (2002/10/27) with only 1 observation which causes this problem. I can also foresee this being an issue when testing longer signal periods on instruments like Crude where they have only a few trading hours on Sunday evenings (UTC).
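A quick way to spot such days before running the walk forward is to count the bars per day and compare the count with the longest indicator period. The sketch below uses made-up timestamps in base R rather than the actual GBPUSD xts object.

```r
# Made-up 30-minute timestamps: two full days plus one stray bar
idx <- c(seq(as.POSIXct("2002-10-24 00:00", tz = "UTC"),
             by = "30 min", length.out = 96),
         as.POSIXct("2002-10-27 23:30", tz = "UTC"))

n <- 25  # longest indicator period to be tested

# Number of bars on each calendar day
bars_per_day <- table(as.Date(idx, tz = "UTC"))

# Days that are too short for runSum(x, n)
short_days <- names(bars_per_day)[bars_per_day < n]
short_days
```

With real data, idx would come from index(GBPUSD) or similar, and the flagged days could be dropped or merged into a neighbouring testing period.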
Given that I have purely been following the Luxor demo with the same (extended) intra-day data set, are these genuine issues or have they been caused by package updates etc?
What is the preferred way for these things to be reported to the authors of QS, and find out if/when fixes are likely to be made?
SessionInfo():
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quantstrat_0.9.1739 foreach_1.4.3 blotter_0.9.1741 PerformanceAnalytics_1.4.4000 FinancialInstrument_1.2.0 quantmod_0.4-5 TTR_0.23-1
[8] xts_0.9.874 zoo_1.7-13
loaded via a namespace (and not attached):
[1] compiler_3.3.0 tools_3.3.0 codetools_0.2-14 grid_3.3.0 iterators_1.0.8 lattice_0.20-33
quantstrat is on github here:
https://github.com/braverock/quantstrat
Issues and patches should be reported via github issues.
I know that the title of this question is a duplicate of this Question and this Question but the solutions over there don't work for me and the error message is (slightly) different:
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
(note the missing part about the missing font)
I tried all suggestions that I found (updating / reinstalling all loaded graphic packages, ggplot2, GGally, and scales, reinitialising the Fonts on Mac OSX by starting in safe mode, moving the Fonts from /Fonts/ (Disabled) back into /Fonts...) but none of it resolved the problem.
The error seems to occur when I plot a ggplot graph with
scale_y_continuous(label=scientific_10)
where scientific_10 is defined as
scientific_10 <- function(x) {
parse(text = gsub("e", " %*% 10^", scientific_format()(x)))
}
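For readers unfamiliar with this labelling trick, here is a base R sketch of what the function produces. It swaps scientific_format() for format(x, scientific = TRUE), so the exact digits may differ from the scales version; the name scientific_10_base is made up for this illustration.

```r
# Base R stand-in for scientific_10(): turn "5.0e+05" into the plotmath
# expression 5.0 %*% 10^+05, which grid renders as "5.0 x 10^5"
scientific_10_base <- function(x) {
  parse(text = gsub("e", " %*% 10^", format(x, scientific = TRUE)))
}

labs <- scientific_10_base(c(5e5, 7.5e5, 1e6))
labs
```

The result is an expression vector, so the axis labels are drawn through the plotmath text routines, which is where the L_textBounds call in the traceback comes from.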
Therefore I suspect that the scales library has something to do with it.
The most puzzling part is that the error only occurs every so often, maybe every 3rd or 5th time I try to plot the same graph...
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.0.0 scales_0.3.0 broom_0.4.0 tidyr_0.3.1 ggplot2_1.0.1 GGally_0.5.0 dplyr_0.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.11.5 magrittr_1.5 MASS_7.3-43 mnormt_1.5-1 munsell_0.4.2 colorspace_1.2-6 lattice_0.20-33 R6_2.0.1
[9] stringr_0.6.2 plyr_1.8.1 tools_3.2.2 parallel_3.2.2 grid_3.2.2 gtable_0.1.2 nlme_3.1-121 psych_1.5.8
[17] DBI_0.3.1 htmltools_0.2.6 lazyeval_0.1.10 yaml_2.1.13 assertthat_0.1 digest_0.6.8 reshape2_1.4.1 rmarkdown_0.8.1
[25] labeling_0.3 reshape_0.8.5 proto_0.3-10
traceback()
35: grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y,
resolveHJust(x$just, x$hjust), resolveVJust(x$just, x$vjust),
x$rot, 0)
34: widthDetails.text(x)
33: widthDetails(x)
32: (function (x)
{
widthDetails(x)
})(list(label = expression(5 %*% 10^+5, 7.5 %*% 10^+5, 1 %*%
10^+6, 1.25 %*% 10^+6, 1.5 %*% 10^+6), x = 1, y = c(0.0777214770341215,
0.291044141334423, 0.504366805634725, 0.717689469935027, 0.931012134235329
), just = "centre", hjust = 1, vjust = 0.5, rot = 0, check.overlap = FALSE,
name = "axis.text.y.text.8056", gp = list(fontsize = 9.6,
col = "black", fontfamily = "", lineheight = 0.9, font = 1L),
vp = NULL))
31: grid.Call.graphics(L_setviewport, vp, TRUE)
30: push.vp.viewport(X[[i]], ...)
I solved it by installing the library extrafont, installing a set of specific fonts and forcing ggplot to use only these fonts:
require(extrafont)
# need only do this once!
font_import(pattern="[Aa]rial", prompt=FALSE)
require(ggplot2)
# extending the help file example
df <- data.frame(gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30))
ds <- plyr::ddply(df, "gp", plyr::summarise, mean = mean(y), sd = sd(y))
plotobj <- ggplot(df, aes(gp, y)) +
geom_point() +
geom_point(data = ds, aes(y = mean), colour = 'red', size = 3) +
theme(text=element_text(size=16, family="Arial"))
print(plotobj)
I experienced the same issue when trying to plot ggplot/grid output to the plot window in RStudio. However, plotting to an external graphics device seems to work fine.
The external device of choice depends on your system, but the script below, paraphrased from this blog, works for most systems:
a = switch(tolower(Sys.info()["sysname"]),
"darwin" = "quartz",
"linux" = "x11",
"windows" = "windows")
options("device" = a)
graphics.off()
rm(a)
and to switch back to using the RStudio plot window:
options("device"="RStudioGD")
graphics.off()
Note that by switching, you lose any existing plots.
A lot of solutions for this particular error direct you to look under the hood of your computer but this error can also be caused by a scripting error in which R expects to match elements from two data structures but cannot.
For me the error was caused by calling a fairly complex graphing function (see below) that read an ordered character vector as well as a matrix whose row names were supposed to each match a value in the ordered character vector. The problem was that some of my values contained dashes in them and R's read.table() function translated those dashes to periods (Ex: "HLA-DOA" became "HLA.DOA").
I was using the ComplexHeatmap package with a call like this:
oncoPrint(mat,
get_type = function(x) strsplit(x, ";")[[1]],
alter_fun_list = alter_fun_list,
col = col,
row_order = my_order,
column_title = "OncoPrint",
heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), labels = c("Amplification", "Deep deletion", "Mutation"))
)
In this call:
mat was a matrix that had dashes swapped out for periods
my_order was a character vector containing the same values as the row names of mat, except that the dashes remained
every other argument is essential to the call but irrelevant to this post
To help R find this elusive "polygon edge", I just edited my character vector with:
row_order <- gsub("\\.", "-", row_order)
If you've tried re-installing packages, restarting your computer and re-enabling fonts - maybe check and see if you've got some faulty character matching going on in your call.
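A quick check for this kind of mismatch is to compare the two sets of names directly. The sketch below uses made-up gene names and make.names() to mimic the dash-to-period conversion; the objects are stand-ins, not the real data.

```r
# Made-up ordering vector with dashes in two of the names
my_order <- c("HLA-DOA", "TP53", "HLA-DRB1")

# Stand-in matrix whose row names went through make.names(),
# which replaces characters that are invalid in R names with "."
mat <- matrix(0, nrow = 3, ncol = 2,
              dimnames = list(make.names(my_order), c("s1", "s2")))

# Names requested in the ordering that do not exist in the matrix
missing_in_mat <- setdiff(my_order, rownames(mat))
missing_in_mat
```

An empty result means every name matches; anything listed is a candidate for the kind of silent renaming described above.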
I got this error after I tried to set the font.
The line I had added was:
p <- p + theme(text = element_text(family = "宋体"))
When I removed that setting, the plot worked again.
Actually, I have the same problem on my Mac and couldn't solve it reliably... Since it only happens on about every 5th or 10th execution, I decided to wrap the whole ggplot command in a tryCatch call and execute it until it doesn't fail...
The code looks like this:
repeat{
  # we put everything into a tryCatch block, because sometimes we get an error;
  # tryCatch() returns FALSE on success and TRUE on error (an assignment made
  # inside the handler would stay local to the handler function)
  error_appeared <- tryCatch({
    gscat <-
      ggplot() # my ggplot command which sometimes fails
    ggsave('file.pdf', gscat, width=8, height=8)
    plot(gscat)
    FALSE
  },
  error=function(e) {
    print('redo the ratioscatterplot.')
    TRUE
  })
  if(!error_appeared){
    break
  }
}
Actually, I figured out that only the drawing/plotting of the figure causes problems! Saving always works.
Maybe this helps someone, since I couldn't find a solution that actually solves the whole thing!
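One detail to watch with this pattern: a plain <- inside the error handler only changes a variable local to the handler, so the retry flag has to come back as the return value of tryCatch(). Below is a self-contained sketch with a cap on the number of attempts; the helper retry and the failing function flaky are made up for illustration and stand in for the plotting call.

```r
# Hypothetical helper: call f() until it succeeds or max_tries is reached
retry <- function(f, max_tries = 10) {
  for (attempt in seq_len(max_tries)) {
    # The result of tryCatch() carries the success flag out of the handler
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
    if (result$ok) return(list(value = result$value, attempts = attempt))
  }
  stop("all ", max_tries, " attempts failed; last error: ", result$value)
}

# Made-up flaky function: fails on the first two calls, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("polygon edge not found")
  "plotted"
}

out <- retry(flaky)
out$attempts
```

In the plotting case, f would be a function that builds, saves and draws the plot; the cap avoids an endless loop if the error turns out to be permanent.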
Additional:
If somebody wants to play with the problem on a "reproducible example" the code below throws an average of 2 errors out of 20 within the loop.
library(scales)
library(ggplot2)
df <- data.frame(
log2.Ratio.H.L.normalized.rev = c(2.53861265542646, 0.402176424979483, 0.438931541934545, 0.639695233399582, 0.230203013366421,
2.88223218956399, 1.23051046036618, 2.56554843533357, 0.265436896049098,
1.32866415755805, -0.92108963514092, 0.0976107966264223, -0.43048946484291,
-0.558665259531966, 4.13183638727079, 0.904580434921318, -0.0733780789564803,
-0.621932351219966, 1.48594198341242, -0.365611185917855, 1.21088754922081,
-2.3717583289898, 2.95160644380282, 3.71446534016249),
Intensity = c(5951600000, 2.4433e+10, 1.1659e+10, 2273600000, 6.852e+10, 9.8746e+10, 5701600000,
1758500000, 987180000, 3.4167e+11, 1.5718e+10, 6.8888e+10, 5.5936e+10,
8702900000, 1093500000, 4426200000, 1.3681e+11, 7.773e+09, 5860400000,
1.2861e+12, 2017900000, 2061300000, 240520000, 1382700000),
my_label = c("RPL18",
"hCG_2024613", "NOL7", "PRPF4B", "HIST1H2BC", "XRCC1", "C9orf30",
"CABIN1", "MGC3731", "XRCC6", "RPL23", "RPL27", "RPL17", "RPL32",
"XPC", "RPL15", "GNL3", "RPL29", "JOSD3", "PARP1", "DNAPTP6",
"ORC2L", "NCL", "TARDBP"))
unlink("figures", recursive=TRUE)
if(!dir.exists('figures')) dir.create('figures')
for(i in 1:20) {
  repeat{
    # we put everything into a tryCatch block, because sometimes we get an error;
    # tryCatch() returns FALSE on success and TRUE on error (an assignment made
    # inside the handler would stay local to the handler function)
    error_appeared <- tryCatch({
      gscat <-
        ggplot(df, aes_string("log2.Ratio.H.L.normalized.rev", 'Intensity')) +
        geom_point(data=df[abs(df[["log2.Ratio.H.L.normalized.rev"]]) < 1,],
                   color='black', alpha=.3, na.rm=TRUE) +
        scale_y_log10(labels = scales::trans_format("log10", scales::math_format()))
      ggsave(file.path('figures', paste0('intensity_scatter_', i, '.pdf')),
             gscat, width=8, height=8)
      plot(gscat)
      FALSE
    },
    error=function(e) {
      # print(e)
      print(sprintf('%s redo the ratioscatterplot.', i))
      TRUE
    })
    if(!error_appeared){
      break
    }
  }
}
5 days and still no answer
As can be seen by Simon's comment, this is a reproducible and very strange issue. It seems that the issue only arises when a stepwise regression with very high predictive power is wrapped in a function.
I have been struggling with this for a while and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and outputs all of them to a list. However, R is having trouble reading the dataset that I specify in my function arguments. I found several similar errors on various boards (here, here, and here), however none of them seemed to ever get resolved. It all comes down to some weird issues with calling step() in a user-defined function. I am using the following script to test my code. Run the whole thing several times until an error arises (trust me, it will):
test.df <- data.frame(a = sample(0:1, 100, rep = T),
b = as.factor(sample(0:5, 100, rep = T)),
c = runif(100, 0, 100),
d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100] #making sure that at least one of the variables has some predictive power
stepModel <- function(modeling.formula, dataset, outfile = NULL) {
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
if (!is.null(outfile)) sink()
return(output)
}
blah <- stepModel(a~., dataset = test.df)
This returns the following error message (if the error does not show up right away, keep re-running the test.df script as well as the call for stepModel(), it will show up eventually):
Error in is.data.frame(data) : object 'dataset' not found
I have determined that everything runs fine up until model.stepwise2 starts to get built. Somehow, the temporary object 'dataset' works just fine for the first stepwise regression, but fails to be recognized by the second. I found this by commenting out part of the function as can be seen below. This code will run fine, proving that the object 'dataset' was originally being recognized:
stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
# model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
# sink()
# output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
return(model.stepwise1)
}
blah1 <- stepModel1(a~., dataset = test.df)
EDIT - before anyone asks, all the summary() functions were there because the full function (I edited it so that you could focus on the error) has another piece that defines a file to which you can output the stepwise trace. I just got rid of them.
EDIT 2 - session info
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods base
other attached packages:
[1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 RSQLite_0.11.3 chron_2.3-43
[5] gsubfn_0.6-5 proto_0.3-10 DBI_0.2-6 ggplot2_0.9.3.1
[9] caret_5.15-61 reshape2_1.2.2 lattice_0.20-6 foreach_1.4.0
[13] cluster_1.14.2 plyr_1.8
loaded via a namespace (and not attached):
[1] codetools_0.2-8 colorspace_1.2-1 dichromat_2.0-0 digest_0.6.2 grid_2.15.1
[6] gtable_0.1.2 iterators_1.0.6 labeling_0.1 MASS_7.3-18 munsell_0.4
[11] RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15
EDIT 3 - this performs all the same operations as the function, just without using a function. This will run fine every time, even when the algorithm doesn't converge:
modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (is.null(outfile) == FALSE){
sink(file = outfile,
append = TRUE, type = "output")
print("")
print("Models run at:")
print(Sys.time())
}
model.initial <- glm(modeling.formula,
family = binomial,
data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
Using do.call to refer to the data set in the calling environment works for me. See https://stackoverflow.com/a/7668846/210673 for the original suggestion. Here's a version that works (with sink code removed).
stepModel2 <- function(modeling.formula, dataset) {
model.initial <- do.call("glm", list(modeling.formula,
family = "binomial",
data = as.name(dataset)))
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}
blah <- stepModel2(a~., dataset = "test.df")
It fails for me consistently with set.seed(6) with the original code. The reason it fails is that the dataset variable is not present within the step function, and although it's not needed in making model.stepwise1, it is needed for model.stepwise2 when model.stepwise1 keeps a linear term. So that's the case when your version fails. Calling the dataset from the global environment as I do here fixes this issue.
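The difference between the two versions can be seen in the call that each fitted model stores. step() refits through update(), which re-evaluates that stored call, so the name recorded for data has to be findable at that point. Below is a small sketch with made-up data; the function names fit_plain and fit_docall are illustrative only.

```r
set.seed(1)
test.df <- data.frame(a = rbinom(50, 1, 0.5), c = runif(50))

# Plain version: the call records the local argument name "dataset"
fit_plain <- function(dataset) {
  glm(a ~ ., family = binomial, data = dataset)
}

# do.call version: the call records the name of the global data frame
fit_docall <- function(dataset) {
  do.call("glm", list(a ~ ., family = "binomial", data = as.name(dataset)))
}

m1 <- fit_plain(test.df)
m2 <- fit_docall("test.df")

m1$call$data  # the symbol dataset, which only exists inside fit_plain
m2$call$data  # the symbol test.df, which exists in the calling environment
```

When a later step re-evaluates the first call outside the function, the symbol dataset is no longer in scope, which matches the "object 'dataset' not found" message.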