In mice package for extract complete dataset you can use complete() command as follow :
install.packages("mice")
library ("mice")
imp1=mice(nhanes,10)
fill1=complete(imp,1)
fill2=complete(imp,2)
fillall=complete(imp,"long")
But can some one tell me how to extract complete dataset in Amelia package??
install.packages("Amelia")
library ("Amelia")
imp2= amelia(freetrade, m = 5, ts = "year", cs = "country")
The str() function is always helpful here. You'll see that the complete datasets are stored in the imputations element of the object returned by amelia():
> str(imp2, 1)
List of 12
$ imputations:List of 5
..- attr(*, "class")= chr [1:2] "mi" "list"
$ m : num 5
$ missMatrix : logi [1:171, 1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
$ overvalues : NULL
$ theta : num [1:9, 1:9, 1:5] -1 -0.0161 0.199 -0.0368 -0.0868 ...
$ mu : num [1:8, 1:5] -0.0161 0.199 -0.0368 -0.0868 -0.0658 ...
$ covMatrices: num [1:8, 1:8, 1:5] 0.8997 -0.3077 0.0926 0.2206 -0.1115 ...
$ code : num 1
$ message : chr "Normal EM convergence."
$ iterHist :List of 5
$ arguments :List of 23
..- attr(*, "class")= chr [1:2] "ameliaArgs" "list"
$ orig.vars : chr [1:10] "year" "country" "tariff" "polity" ...
- attr(*, "class")= chr "amelia"
To get each imputation alone, just do imp2$imputations[[1]], etc. up through all imputations that you requested. In your example, there are five:
> str(imp2$imputations, 1)
List of 5
$ imp1:'data.frame': 171 obs. of 10 variables:
$ imp2:'data.frame': 171 obs. of 10 variables:
$ imp3:'data.frame': 171 obs. of 10 variables:
$ imp4:'data.frame': 171 obs. of 10 variables:
$ imp5:'data.frame': 171 obs. of 10 variables:
- attr(*, "class")= chr [1:2] "mi" "list"
Related
I am new to machine learning and not very familiar with lists. I need to raster stack layers (Ras) to be in the same order as the training data using in the model (xgb_train_1). The names are in a component called finalModel called feature names. I have tried to reference this list in a list in several ways:
so: xgb_train_1 -> finalModel -> feature_names
Ras1 <- Ras[xgb_train_1$finalModel['feature_names']]
Ras1 <- Ras[xgb_train_1$finalModel[feature_names]]
Ras1 <- Ras[xgb_train_1[finalModel[feature_names]]]
None work and I receive the error "NULL" when I call Ras1
str(xgb_train_1$finalModel)
List of 13
$ handle :Class 'xgb.Booster.handle' <externalptr>
$ raw : raw [1:300765128] 7b 22 43 6f ...
$ niter : num 10000
$ call : language xgboost::xgb.train(params = list(eta = param$eta, max_depth = param$max_depth, gamma = param$gamma, colsampl| __truncated__ ...
$ params :List of 8
..$ eta : num 0.01
..$ max_depth : num 20
..$ gamma : num 1
..$ colsample_bytree : num 1
..$ min_child_weight : num 3
..$ subsample : num 1
..$ objective : chr "reg:squarederror"
..$ validate_parameters: logi TRUE
$ callbacks :List of 1
..$ cb.print.evaluation:function (env = parent.frame())
.. ..- attr(*, "call")= language cb.print.evaluation(period = print_every_n)
.. ..- attr(*, "name")= chr "cb.print.evaluation"
$ feature_names: chr [1:20] "Age" "Drainage" "HSG" "LU2005" ...
$ nfeatures : int 20
$ xNames : chr [1:20] "Age" "Drainage" "HSG" "LU2005" ...
$ problemType : chr "Regression"
$ tuneValue :'data.frame': 1 obs. of 7 variables:
..$ nrounds : num 10000
..$ max_depth : num 20
..$ eta : num 0.01
..$ gamma : num 1
..$ colsample_bytree: num 1
..$ min_child_weight: num 3
..$ subsample : num 1
$ obsLevels : logi NA
$ param : list()
- attr(*, "class")= chr "xgb.Booster"
I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I really applying a redundant step by centring and scaling first? Example script below
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Note notice that in lines 7 and 10 of the output there are references to centre and scale. I would appreciate an explanation as to the process here.
Your step with scaling is not redundant because in source code there are no scaling, and attributes, that you see in 7 and 10 are attributes from train dataset.
To check this, just run and compare results of this chunk of code:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)
In R, str() is handy for showing the structure of an object, such as the list of lists returned by lm() and other modelling functions, but it gives way too much output. I'm looking for some tool to create a simple tree diagram showing only the names of the list elements and their structure.
e.g., for this example,
data(Prestige, package="car")
out <- lm(prestige ~ income+education+women, data=Prestige)
str(out, max.level=2)
#> List of 12
#> $ coefficients : Named num [1:4] -6.79433 0.00131 4.18664 -0.00891
#> ..- attr(*, "names")= chr [1:4] "(Intercept)" "income" "education" "women"
#> $ residuals : Named num [1:102] 4.58 -9.39 4.69 4.22 8.15 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ effects : Named num [1:102] -472.99 -123.61 -92.61 -2.3 6.83 ...
#> ..- attr(*, "names")= chr [1:102] "(Intercept)" "income" "education" "women" ...
#> $ rank : int 4
#> $ fitted.values: Named num [1:102] 64.2 78.5 58.7 52.6 65.3 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ assign : int [1:4] 0 1 2 3
#> $ qr :List of 5
#> ..$ qr : num [1:102, 1:4] -10.1 0.099 0.099 0.099 0.099 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. ..- attr(*, "assign")= int [1:4] 0 1 2 3
#> ..$ qraux: num [1:4] 1.1 1.44 1.06 1.06
#> ..$ pivot: int [1:4] 1 2 3 4
#> ..$ tol : num 1e-07
#> ..$ rank : int 4
#> ..- attr(*, "class")= chr "qr"
#> $ df.residual : int 98
...
I would like to get something like this:
This is similar to what I get from tree for file folders in my file system:
C:\Dropbox\Documents\images>tree
Folder PATH listing
Volume serial number is 2250-8E6F
C:.
+---cartoons
+---chevaliers
+---icons
+---milestones
+---minard
+---minard-besancon
The result could be either in graphic characters, as in tree or an actual graphic as shown above. Is anything like this available?
A simple approach to getting this from the str output would be something like...
a <- capture.output(str(out, max.level=2))
a <- trimws(gsub("\\:.*", "", a[grepl("\\$", a)]))
cat(a, sep="\n")
$ coefficients
$ residuals
$ effects
$ rank
$ fitted.values
$ assign
$ qr
..$ qr
..$ qraux
..$ pivot
..$ tol
..$ rank
$ df.residual
$ xlevels
$ call
$ terms
$ model
..$ prestige
..$ income
..$ education
..$ women
I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). The test result is a list of 12 containing a 2x3 matrix. The p value is part of the 2x3 matrix and is present in element 7.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (first 2 of a total of 50 shown)
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is how to extract only the p-value (e.g. from version 1) and put these 50 results into a vector
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"
You could try:
unlist(lapply(AD_result, function(x) x$ad[,3]))
I am trying to read in a file in "flexible data format" using R.
I got the number of bytes I should be reading in (counting from EOF, e.g., I should be reading EOF-32 to EOF bytes in as my data).
I am seeking the equivalences to the fseek and fread from MATLAB in R.
I think you would do better with a different approach (if I've got the right "flexible data format" file format here). You can deal with much of these (horrible) files with basic string functions in R:
library(stringr)
# read in fdf file
l <- readLines("http://rud.is/dl/Fe.fdf")
# some basic cleanup
l <- sub("#.*$", "", l) # remove comments
l <- sub("^=.*$", "", l) # remove comments
l <- gsub("\ +", " ", l) # compress spaces
l <- str_trim(l) # beg/end space trim
l <- grep("^$", l, value=TRUE, invert=TRUE) # ignore blank lines
# start of data blocks
blocks <- which(grepl("^%block", l))
# all "easy"/simple lines
simple <- str_split_fixed(grep("^[[:digit:]%]", l, value=TRUE, invert=TRUE),
"[[:space:]]+", 2)
# "simple" name/val [unit] conversions
convert_vals <- function(simple) {
vals <- simple[,2]
names(vals) <- simple[,1]
lapply(vals, function(v) {
# if logical
if (tolower(v) %in% c("t", "true", ".true.", "f", "false", ".false.")) {
return(as.logical(gsub("\\.", "", v)))
}
# if it's just a number
# i may be missing a numeric fmt char in this horrible format
if (grepl("^[[:digit:]\\.\\+\\-]+$", v)) {
return(as.numeric(v))
}
# if value and unit convert to an actual number with a unit attribute
# or convert it here from the table starting on line 927 of fdf.f
if (grepl("^[[:digit:]]", v) & (!any(is.na(str_locate(v, " "))))) {
vu <- str_split_fixed(v, " ", 2)
x <- as.numeric(vu[,1])
attr(x, "unit") <- vu[,2]
return(x)
}
# handle "1.d-3" and other vals with other if's
# anything not handled is returned
return(v)
})
}
# handle begin/end block "complex" data conversion
convert_blocks <- function(lines) {
block_names <- sub("^%block ", "", grep("^%block", lines, value=TRUE))
lapply(blocks, function(blk_start) {
blk <- lines[blk_start]
blk_info <- str_split_fixed(blk, " ", 2)
blk_end <- which(grepl(sprintf("^%%endblock %s", blk_info[,2]), lines))
# this is overly simplistic since you have to do some conversions, but you know the line
# range of the data values now so you can process them however you need to
read.table(text=lines[(blk_start+1):(blk_end-1)],
header=FALSE, stringsAsFactors=FALSE, fill=TRUE)
}) -> blks
names(blks) <- block_names
return(blks)
}
fdf <- c(convert_vals(simple),
convert_blocks(l))
str(fdf)
Output of the str:
List of 32
$ SystemName : chr "bcc Fe ferro GGA"
$ SystemLabel : chr "Fe"
$ WriteCoorStep : chr ""
$ WriteMullikenPop : num 1
$ NumberOfSpecies : num 1
$ NumberOfAtoms : num 1
$ PAO.EnergyShift : atomic [1:1] 50
..- attr(*, "unit")= chr "meV"
$ PAO.BasisSize : chr "DZP"
$ Fe : num 2
$ LatticeConstant : atomic [1:1] 2.87
..- attr(*, "unit")= chr "Ang"
$ KgridCutoff : atomic [1:1] 15
..- attr(*, "unit")= chr "Ang"
$ xc.functional : chr "GGA"
$ xc.authors : chr "PBE"
$ SpinPolarized : logi TRUE
$ MeshCutoff : atomic [1:1] 150
..- attr(*, "unit")= chr "Ry"
$ MaxSCFIterations : num 40
$ DM.MixingWeight : num 0.1
$ DM.Tolerance : chr "1.d-3"
$ DM.UseSaveDM : logi TRUE
$ DM.NumberPulay : num 3
$ SolutionMethod : chr "diagon"
$ ElectronicTemperature : atomic [1:1] 25
..- attr(*, "unit")= chr "meV"
$ MD.TypeOfRun : chr "cg"
$ MD.NumCGsteps : num 0
$ MD.MaxCGDispl : atomic [1:1] 0.1
..- attr(*, "unit")= chr "Ang"
$ MD.MaxForceTol : atomic [1:1] 0.04
..- attr(*, "unit")= chr "eV/Ang"
$ AtomicCoordinatesFormat : chr "Fractional"
$ ChemicalSpeciesLabel :'data.frame': 1 obs. of 3 variables:
..$ V1: int 1
..$ V2: int 26
..$ V3: chr "Fe"
$ PAO.Basis :'data.frame': 5 obs. of 3 variables:
..$ V1: chr [1:5] "Fe" "0" "6." "2" ...
..$ V2: num [1:5] 2 2 0 2 0
..$ V3: chr [1:5] "" "P" "" "" ...
$ LatticeVectors :'data.frame': 3 obs. of 3 variables:
..$ V1: num [1:3] 0.5 0.5 0.5
..$ V2: num [1:3] 0.5 -0.5 0.5
..$ V3: num [1:3] 0.5 0.5 -0.5
$ BandLines :'data.frame': 5 obs. of 5 variables:
..$ V1: int [1:5] 1 40 28 28 34
..$ V2: num [1:5] 0 2 1 0 1
..$ V3: num [1:5] 0 0 1 0 1
..$ V4: num [1:5] 0 0 0 0 1
..$ V5: chr [1:5] "\\Gamma" "H" "N" "\\Gamma" ...
$ AtomicCoordinatesAndAtomicSpecies:'data.frame': 1 obs. of 4 variables:
..$ V1: num 0
..$ V2: num 0
..$ V3: num 0
..$ V4: int 1
You can see the output (and the file and this code) in this gist since it's easier to copy/past/clone a gist.
You still need to:
deal with unit conversion (but with this grid::unit-like structure that shld be far more straightforward)
swap out the naive read.table with a better "block reader"
deal with file includes (pretty simple, tho, if you add a function or two)
With a bit of tweaking/polish this cld be a new R package, not that I'd ever want a data file in this format ever.