Perhaps it is just me, but I have always found str unsatisfactory. It is frequently too verbose, yet in many cases not very informative.
I actually really like the description of the function (?str):
Compactly display the internal structure of an R object
and this bit in particular
Ideally, only one line for each ‘basic’ structure is displayed.
It's just that, in many cases, the default str implementation simply does not live up to that description.
OK, let's say it works reasonably well for data.frames.
library(ggplot2)
str(mpg)
> str(mpg)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 234 obs. of 11 variables:
$ manufacturer: chr "audi" "audi" "audi" "audi" ...
$ model : chr "a4" "a4" "a4" "a4" ...
$ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
$ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
$ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
$ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
$ drv : chr "f" "f" "f" "f" ...
$ cty : int 18 21 20 21 16 18 18 18 16 20 ...
$ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
$ fl : chr "p" "p" "p" "p" ...
$ class : chr "compact" "compact" "compact" "compact" ...
Yet, even for a data.frame it is not as informative as I would like. In addition to the class of each column, it would be very useful if it also showed, for example, the number of NA values and the number of unique values.
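Something along these lines is roughly the kind of one-line-per-column summary I have in mind (a minimal sketch; the function name df_overview and the chosen columns are my own):
# One line per column: class, number of NAs, number of unique values
df_overview <- function(df) {
  data.frame(
    column   = names(df),
    class    = vapply(df, function(x) class(x)[1], character(1)),
    n_na     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_unique = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL
  )
}
df_overview(mpg)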
But for other objects, it quickly becomes unmanageable. For example:
gp <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()
str(gp)
> str(gp)
List of 9
$ data :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 234 obs. of 11 variables:
..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
..$ model : chr [1:234] "a4" "a4" "a4" "a4" ...
..$ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
..$ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
..$ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
..$ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
..$ drv : chr [1:234] "f" "f" "f" "f" ...
..$ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
..$ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
..$ fl : chr [1:234] "p" "p" "p" "p" ...
..$ class : chr [1:234] "compact" "compact" "compact" "compact" ...
$ layers :List of 1
..$ :Classes 'LayerInstance', 'Layer', 'ggproto' <ggproto object: Class LayerInstance, Layer>
aes_params: list
compute_aesthetics: function
compute_geom_1: function
compute_geom_2: function
compute_position: function
compute_statistic: function
data: waiver
draw_geom: function
geom: <ggproto object: Class GeomPoint, Geom>
aesthetics: function
default_aes: uneval
draw_group: function
draw_key: function
draw_layer: function
draw_panel: function
extra_params: na.rm
handle_na: function
non_missing_aes: size shape
parameters: function
required_aes: x y
setup_data: function
use_defaults: function
super: <ggproto object: Class Geom>
geom_params: list
inherit.aes: TRUE
layer_data: function
map_statistic: function
mapping: NULL
position: <ggproto object: Class PositionIdentity, Position>
compute_layer: function
compute_panel: function
required_aes:
setup_data: function
setup_params: function
super: <ggproto object: Class Position>
print: function
show.legend: NA
stat: <ggproto object: Class StatIdentity, Stat>
compute_group: function
compute_layer: function
compute_panel: function
default_aes: uneval
extra_params: na.rm
non_missing_aes:
parameters: function
required_aes:
retransform: TRUE
setup_data: function
setup_params: function
super: <ggproto object: Class Stat>
stat_params: list
subset: NULL
super: <ggproto object: Class Layer>
$ scales :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
add: function
clone: function
find: function
get_scales: function
has_scale: function
input: function
n: function
non_position_scales: function
scales: list
super: <ggproto object: Class ScalesList>
$ mapping :List of 2
..$ x: symbol displ
..$ y: symbol hwy
$ theme : list()
$ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
aspect: function
distance: function
expand: TRUE
is_linear: function
labels: function
limits: list
range: function
render_axis_h: function
render_axis_v: function
render_bg: function
render_fg: function
train: function
transform: function
super: <ggproto object: Class CoordCartesian, Coord>
$ facet :List of 1
..$ shrink: logi TRUE
..- attr(*, "class")= chr [1:2] "null" "facet"
$ plot_env :<environment: R_GlobalEnv>
$ labels :List of 2
..$ x: chr "displ"
..$ y: chr "hwy"
- attr(*, "class")= chr [1:2] "gg" "ggplot"
Whaaattttt??? What happened to "compactly display"? That's not compact!
And it can get worse, crazy scary even, for example with S4 objects. If you want, try this:
library(rworldmap)
newmap <- getMap(resolution = "coarse")
str(newmap)
I won't post the output here because it is far too long. It does not even fit in the console buffer!
How can you possibly understand the internal structure of an object with such a NON-compact display? It is just too much detail and you easily get lost. Or at least I do.
Well, all right. Before someone tells me "hey, check out ?str and tweak the arguments": that is what I did. Of course it gets better, but I am still somewhat disappointed with str.
The best solution I've got is to create a function that does this:
compact_str <- function(obj) {
  if (isS4(obj)) {
    str(obj, max.level = 2, give.attr = FALSE, give.head = FALSE)
  } else {
    str(obj, max.level = 1, give.attr = FALSE, give.head = FALSE)
  }
}
This compactly displays the top-level structures of the object. The output for the sp object above (an S4 object) becomes much more insightful:
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
..# data :'data.frame': 243 obs. of 49 variables:
..# polygons :List of 243
.. .. [list output truncated]
..# plotOrder :7 135 28 167 31 23 9 66 84 5 ...
..# bbox :-180 -90 180 83.6
..# proj4string:Formal class 'CRS' [package "sp"] with 1 slot
So now you can see there are 5 top-level structures, and you can investigate each of them further individually.
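From there you can drill into any individual slot, for example (using the slot names shown above):
str(newmap@data, max.level = 1)   # the attribute table
str(newmap@proj4string)           # the CRS slot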
Similarly, for the ggplot object above, you now see:
List of 9
$ data :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 234 obs. of 11 variables:
$ layers :List of 1
$ scales :Classes 'ScalesList', 'ggproto'
$ mapping :List of 2
$ theme : list()
$ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto'
$ facet :List of 1
$ plot_env :
$ labels :List of 2
Although this is much better, I still feel it could be more insightful. So perhaps someone has felt the same way and created a nice function that is more informative while still displaying the information compactly. Anyone?
In such situations I use glimpse() from the tibble package, which is less verbose and gives a brief description of the data structure.
library(tibble)
glimpse(gp)
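On a plain data frame, for instance, it prints one line per column (output abbreviated; the exact format depends on the tibble version):
glimpse(mpg)
#> Rows: 234
#> Columns: 11
#> $ manufacturer <chr> "audi", "audi", "audi", "audi", ...
#> $ model        <chr> "a4", "a4", "a4", "a4", ...
#> ...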
There is the lobstr package by Hadley. Besides several other more or less helpful functions, it includes lobstr::tree(), which tries to be more predictable, compact, and overall more helpful than str().
An important difference between the two is that str() is an S3 generic whereas lobstr::tree() is not. That means package developers can (and do) provide their own methods for str(), which can substantially improve its usefulness. But it also means that str() output can be very inconsistent.
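To illustrate the S3 point: a package (or you) can provide a method for a class and str() will dispatch to it. A toy sketch (the class name myclass is made up):
x <- structure(list(a = 1:5, b = letters), class = "myclass")
str.myclass <- function(object, ...) {
  cat("<myclass> with", length(object), "fields:", toString(names(object)), "\n")
}
str(x)
#> <myclass> with 2 fields: a, b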
For comparison, here is the structure of a simple lm() fit displayed with both functions. lobstr::tree() also prints colorized output, which further improves legibility, but you obviously can't see the colors here on SO. Note in particular the much more concise and useful display of the formula and the data frame items:
m <- lm(mpg~cyl, mtcars)
lobstr::tree(m)
#> S3<lm>
#> ├─coefficients<dbl [2]>: 37.8845764854614, -2.87579013906448
#> ├─residuals<dbl [32]>: 0.370164348925359, 0.370164348925418, -3.58141592920354, 0.770164348925411, 3.82174462705436, -2.52983565107459, -0.578255372945636, -1.98141592920354, -3.58141592920354, -1.42983565107459, ...
#> ├─effects<dbl [32]>: -113.649737406208, -28.5956806590543, -3.70425398161014, 0.709596949580206, 3.82344788077055, -2.59040305041979, -0.576552119229446, -2.10425398161014, -3.70425398161014, -1.49040305041979, ...
#> ├─rank: 2
#> ├─fitted.values<dbl [32]>: 20.6298356510746, 20.6298356510746, 26.3814159292035, 20.6298356510746, 14.8782553729456, 20.6298356510746, 14.8782553729456, 26.3814159292035, 26.3814159292035, 20.6298356510746, ...
#> ├─assign<int [2]>: 0, 1
#> ├─qr: S3<qr>
#> │ ├─qr<dbl [64]>: -5.65685424949238, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, ...
#> │ ├─qraux<dbl [2]>: 1.17677669529664, 1.01602374277435
#> │ ├─pivot<int [2]>: 1, 2
#> │ ├─tol: 1e-07
#> │ └─rank: 2
#> ├─df.residual: 30
#> ├─xlevels: <list>
#> ├─call: <language> lm(formula = mpg ~ cyl, data = mtcars)
#> ├─terms: S3<terms/formula> mpg ~ cyl
#> └─model: S3<data.frame>
#> ├─mpg<dbl [32]>: 21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, ...
#> └─cyl<dbl [32]>: 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, ...
str(m)
#> List of 12
#> $ coefficients : Named num [1:2] 37.88 -2.88
#> ..- attr(*, "names")= chr [1:2] "(Intercept)" "cyl"
#> $ residuals : Named num [1:32] 0.37 0.37 -3.58 0.77 3.82 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ effects : Named num [1:32] -113.65 -28.6 -3.7 0.71 3.82 ...
#> ..- attr(*, "names")= chr [1:32] "(Intercept)" "cyl" "" "" ...
#> $ rank : int 2
#> $ fitted.values: Named num [1:32] 20.6 20.6 26.4 20.6 14.9 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ assign : int [1:2] 0 1
#> $ qr :List of 5
#> ..$ qr : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> .. .. ..$ : chr [1:2] "(Intercept)" "cyl"
#> .. ..- attr(*, "assign")= int [1:2] 0 1
#> ..$ qraux: num [1:2] 1.18 1.02
#> ..$ pivot: int [1:2] 1 2
#> ..$ tol : num 1e-07
#> ..$ rank : int 2
#> ..- attr(*, "class")= chr "qr"
#> $ df.residual : int 30
#> $ xlevels : Named list()
#> $ call : language lm(formula = mpg ~ cyl, data = mtcars)
#> $ terms :Classes 'terms', 'formula' language mpg ~ cyl
#> .. ..- attr(*, "variables")= language list(mpg, cyl)
#> .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:2] "mpg" "cyl"
#> .. .. .. ..$ : chr "cyl"
#> .. ..- attr(*, "term.labels")= chr "cyl"
#> .. ..- attr(*, "order")= int 1
#> .. ..- attr(*, "intercept")= int 1
#> .. ..- attr(*, "response")= int 1
#> .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> .. ..- attr(*, "predvars")= language list(mpg, cyl)
#> .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#> .. .. ..- attr(*, "names")= chr [1:2] "mpg" "cyl"
#> $ model :'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> ..- attr(*, "terms")=Classes 'terms', 'formula' language mpg ~ cyl
#> .. .. ..- attr(*, "variables")= language list(mpg, cyl)
#> .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : chr [1:2] "mpg" "cyl"
#> .. .. .. .. ..$ : chr "cyl"
#> .. .. ..- attr(*, "term.labels")= chr "cyl"
#> .. .. ..- attr(*, "order")= int 1
#> .. .. ..- attr(*, "intercept")= int 1
#> .. .. ..- attr(*, "response")= int 1
#> .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> .. .. ..- attr(*, "predvars")= language list(mpg, cyl)
#> .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#> .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "cyl"
#> - attr(*, "class")= chr "lm"
Created on 2022-11-23 with reprex v2.0.2
Related
I had a large dataset containing more than 300,000 rows/observations and 22 variables. I used the CLARA method for clustering and plotted the results using fviz_cluster. Using the silhouette method, I got 10 as my number of clusters, and from there I applied it to my CLARA algorithm.
library(cluster)
clara.res <- clara(df, 10, samples = 50, trace = 1, sampsize = 1000, pamLike = TRUE)
str(clara.res)
List of 10
$ sample : chr [1:1000] "100046" "100303" "10052" "100727" ...
$ medoids : num [1:10, 1:22] 0.925 0.125 0.701 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "193751" "137853" "229261" "257462" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
$ i.med : int [1:10] 104171 42062 143627 174961 300065 13836 192832 207079 185241 228575
$ clustering: Named int [1:302251] 1 1 1 2 3 4 5 3 3 3 ...
..- attr(*, "names")= chr [1:302251] "1" "10" "100" "1000" ...
$ objective : num 0.37
$ clusinfo : num [1:10, 1:4] 71811 40181 46271 10155 31309 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "size" "max_diss" "av_diss" "isolation"
$ diss : 'dissimilarity' num [1:499500] 1.392 2.192 0.937 2.157 1.643 ...
..- attr(*, "Size")= int 1000
..- attr(*, "Metric")= chr "euclidean"
..- attr(*, "Labels")= chr [1:1000] "100046" "100303" "10052" "100727" ...
$ call : language clara(x = df, k = 10, samples = 50, sampsize = 1000, trace = 1, pamLike = TRUE)
$ silinfo :List of 3
..$ widths : num [1:1000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:1000] "83395" "181310" "34452" "42991" ...
.. .. ..$ : chr [1:3] "cluster" "neighbor" "sil_width"
..$ clus.avg.widths: num [1:10] 0.645 0.408 0.487 0.513 0.839 ...
..$ avg.width : num 0.612
$ data : num [1:302251, 1:22] 1 1 1 0.366 0.35 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:302251] "1" "10" "100" "1000" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
- attr(*, "class")= chr [1:2] "clara" "partition"
For the plot:
library(factoextra)
fviz_cluster(clara.res,
             palette = c(
               "#004c6d",
               "#00a1c1",
               "#ffc334",
               "#78ab63",
               "#00ffff",
               "#00cfe3",
               "#6efa75",
               "#cc0089",
               "#ff9509",
               "#ffb6de"
             ), # color palette
             ellipse.type = "t", geom = "point", show.clust.cent = TRUE,
             repel = TRUE, pointsize = 0.5,
             ggtheme = theme_classic()
) + xlim(-7, 3) + ylim(-5, 4) + labs(title = "Plot of clusters")
The result:
I reckon that this cluster plot is based on PCA, and I have been trying to figure out which variables in my original data make up Dim1 and Dim2, i.e. what the x and y axes represent. Can somebody help me find out what Dim1 and Dim2 are, and the eigenvalues/variance of all the dimensions, without running PCA separately?
I saw there are some other functions/packages for PCA, such as get_eigenvalue in factoextra, and FactoMineR, but it seemed those would require me to run the PCA algorithm from the beginning. How can I integrate it directly with my CLARA results?
Also, Dim1 only accounts for 12.3% and Dim2 for 8.8%. Does that mean these dimensions are not representative enough? Considering that I would have 22 dimensions in total (from my 22 variables), I think it's all right, no? I am not sure how these percentages for Dim1 and Dim2 affect my cluster results. I was thinking of making a scree plot from my CLARA results, but I can't figure that out either.
I'd appreciate any insights.
I really like the labelled package.
I have an analysis with tons of labels that I need to create. Instead of adding them one by one, is there a way to loop through all the columns and modify the labels in the same way, for example to make them all Title Case? Please note, I'm hoping to change the label, not the actual column name.
library(labelled)
library(ggplot2)
library(dplyr)  # for the %>% pipe

mpg_new <- ggplot2::mpg %>%
  set_variable_labels(manufacturer = "Manufacturer")
labelled::var_label(mpg_new$manufacturer)
If we need to convert all of them to title case, we can also pass a named vector to set_variable_labels:
library(labelled)
library(ggplot2)
library(dplyr)  # for the %>% pipe
data(mpg)

var_labels <- setNames(tools::toTitleCase(names(mpg)), names(mpg))
mpg_new <- mpg %>%
  set_variable_labels(.labels = var_labels, .strict = FALSE)
Checking:
> str(mpg_new)
tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
..- attr(*, "label")= chr "Manufacturer"
$ model : chr [1:234] "a4" "a4" "a4" "a4" ...
..- attr(*, "label")= chr "Model"
$ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
..- attr(*, "label")= chr "Displ"
$ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
..- attr(*, "label")= chr "Year"
$ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
..- attr(*, "label")= chr "Cyl"
$ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
..- attr(*, "label")= chr "Trans"
$ drv : chr [1:234] "f" "f" "f" "f" ...
..- attr(*, "label")= chr "Drv"
$ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
..- attr(*, "label")= chr "Cty"
$ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
..- attr(*, "label")= chr "Hwy"
$ fl : chr [1:234] "p" "p" "p" "p" ...
..- attr(*, "label")= chr "Fl"
$ class : chr [1:234] "compact" "compact" "compact" "compact" ...
..- attr(*, "label")= chr "Class"
Another option to achieve your desired result would be via labelled::var_label like so:
library(labelled)
library(ggplot2)
mpg_new <- ggplot2::mpg
var_label(mpg_new) <- stringr::str_to_title(names(mpg_new))
var_label(mpg_new, unlist = TRUE)
#> manufacturer model displ year cyl
#> "Manufacturer" "Model" "Displ" "Year" "Cyl"
#> trans drv cty hwy fl
#> "Trans" "Drv" "Cty" "Hwy" "Fl"
#> class
#> "Class"
I have been following an online example for Kohonen self-organising maps (SOM) in R, which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I applying a redundant step by centring and scaling first? Example script below:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Notice that in the output there are references to centre and scale (the "scaled:center" and "scaled:scale" attributes). I would appreciate an explanation of what is happening here.
Your scaling step is not redundant: the som() source code does no scaling of its own, and the "scaled:center" / "scaled:scale" attributes that you see are attributes of the training data; they were added by scale() and are simply carried along with it.
To check this, just run and compare the results of this chunk of code:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)
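A quick way to see where those attributes come from (assuming the objects from the chunk above): they exist on dt because scale() added them, and they simply travel along inside som1$data, while som2 has none:
attr(dt, "scaled:center")             # added by scale()
attr(som1$data[[1]], "scaled:center") # same values, carried along with the training data
attr(som2$data[[1]], "scaled:center") # NULL: som() did not scale anything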
In R, str() is handy for showing the structure of an object, such as the list of lists returned by lm() and other modelling functions, but it gives way too much output. I'm looking for some tool to create a simple tree diagram showing only the names of the list elements and their structure.
e.g., for this example,
data(Prestige, package="car")
out <- lm(prestige ~ income+education+women, data=Prestige)
str(out, max.level=2)
#> List of 12
#> $ coefficients : Named num [1:4] -6.79433 0.00131 4.18664 -0.00891
#> ..- attr(*, "names")= chr [1:4] "(Intercept)" "income" "education" "women"
#> $ residuals : Named num [1:102] 4.58 -9.39 4.69 4.22 8.15 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ effects : Named num [1:102] -472.99 -123.61 -92.61 -2.3 6.83 ...
#> ..- attr(*, "names")= chr [1:102] "(Intercept)" "income" "education" "women" ...
#> $ rank : int 4
#> $ fitted.values: Named num [1:102] 64.2 78.5 58.7 52.6 65.3 ...
#> ..- attr(*, "names")= chr [1:102] "gov.administrators" "general.managers" "accountants" "purchasing.officers" ...
#> $ assign : int [1:4] 0 1 2 3
#> $ qr :List of 5
#> ..$ qr : num [1:102, 1:4] -10.1 0.099 0.099 0.099 0.099 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. ..- attr(*, "assign")= int [1:4] 0 1 2 3
#> ..$ qraux: num [1:4] 1.1 1.44 1.06 1.06
#> ..$ pivot: int [1:4] 1 2 3 4
#> ..$ tol : num 1e-07
#> ..$ rank : int 4
#> ..- attr(*, "class")= chr "qr"
#> $ df.residual : int 98
...
I would like to get something like this:
This is similar to what I get from tree for file folders in my file system:
C:\Dropbox\Documents\images>tree
Folder PATH listing
Volume serial number is 2250-8E6F
C:.
+---cartoons
+---chevaliers
+---icons
+---milestones
+---minard
+---minard-besancon
The result could be either in graphic characters, as in tree, or an actual graphic as shown above. Is anything like this available?
A simple approach to getting this from the str output would be something like...
a <- capture.output(str(out, max.level=2))
a <- trimws(gsub("\\:.*", "", a[grepl("\\$", a)]))
cat(a, sep="\n")
$ coefficients
$ residuals
$ effects
$ rank
$ fitted.values
$ assign
$ qr
..$ qr
..$ qraux
..$ pivot
..$ tol
..$ rank
$ df.residual
$ xlevels
$ call
$ terms
$ model
..$ prestige
..$ income
..$ education
..$ women
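Alternatively, you can skip str() altogether and walk the object recursively, printing only the element names as an indented tree. A rough sketch (the helper name name_tree and the depth limit are my own choices):
# Print just the names of (nested) list elements, indented by depth
name_tree <- function(x, depth = 2, indent = "") {
  for (nm in names(x)) {
    cat(indent, nm, "\n", sep = "")
    if (is.list(x[[nm]]) && depth > 1) {
      name_tree(x[[nm]], depth - 1, paste0(indent, "  "))
    }
  }
  invisible(NULL)
}
name_tree(out, depth = 2)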
In the mice package, you can extract the completed datasets with the complete() command, as follows:
install.packages("mice")
library ("mice")
imp1=mice(nhanes,10)
fill1=complete(imp,1)
fill2=complete(imp,2)
fillall=complete(imp,"long")
But can someone tell me how to extract the completed datasets from the Amelia package?
install.packages("Amelia")
library ("Amelia")
imp2= amelia(freetrade, m = 5, ts = "year", cs = "country")
The str() function is always helpful here. You'll see that the completed datasets are stored in the imputations element of the object returned by amelia():
> str(imp2, 1)
List of 12
$ imputations:List of 5
..- attr(*, "class")= chr [1:2] "mi" "list"
$ m : num 5
$ missMatrix : logi [1:171, 1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
$ overvalues : NULL
$ theta : num [1:9, 1:9, 1:5] -1 -0.0161 0.199 -0.0368 -0.0868 ...
$ mu : num [1:8, 1:5] -0.0161 0.199 -0.0368 -0.0868 -0.0658 ...
$ covMatrices: num [1:8, 1:8, 1:5] 0.8997 -0.3077 0.0926 0.2206 -0.1115 ...
$ code : num 1
$ message : chr "Normal EM convergence."
$ iterHist :List of 5
$ arguments :List of 23
..- attr(*, "class")= chr [1:2] "ameliaArgs" "list"
$ orig.vars : chr [1:10] "year" "country" "tariff" "polity" ...
- attr(*, "class")= chr "amelia"
To get each imputation alone, just do imp2$imputations[[1]], etc. up through all imputations that you requested. In your example, there are five:
> str(imp2$imputations, 1)
List of 5
$ imp1:'data.frame': 171 obs. of 10 variables:
$ imp2:'data.frame': 171 obs. of 10 variables:
$ imp3:'data.frame': 171 obs. of 10 variables:
$ imp4:'data.frame': 171 obs. of 10 variables:
$ imp5:'data.frame': 171 obs. of 10 variables:
- attr(*, "class")= chr [1:2] "mi" "list"