export Hmisc::describe output to excel/csv - r

Is there any way I can export this data to a csv file instead of typing things in manually?
Below is the output from the Hmisc describe function:
library(Hmisc) # Hmisc describe
> Hmisc::describe(data)
data
3 Variables 6 Observations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
ID
n missing distinct Info Mean Gmd
6 0 3 0.857 112.2 1.267
Value 110 112 113
Frequency 1 2 3
Proportion 0.167 0.333 0.500
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Date
n missing distinct
6 0 3
Value 23/04/2018 24/04/2018 25/04/2018
Frequency 3 2 1
Proportion 0.500 0.333 0.167
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Revenue
n missing distinct Info Mean Gmd
6 0 6 1 74 17.2
lowest : 51 65 70 85 86, highest: 65 70 85 86 87
Value 51 65 70 85 86 87
Frequency 1 1 1 1 1 1
Proportion 0.167 0.167 0.167 0.167 0.167 0.167
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Dataset:
> data
ID Date Revenue
1 113 23/04/2018 51
2 113 23/04/2018 87
3 113 23/04/2018 70
4 112 24/04/2018 85
5 112 24/04/2018 65
6 110 25/04/2018 86

I doubt writing it to csv would be helpful. Try writing it to a text file instead.
cat(capture.output(Hmisc::describe(data)), file = 'result.txt', sep = '\n')

This is probably not going to be easy. You could use capture.output, but then you would need to parse each section differently depending on its class and counts. You could also assign the result to an object and work with that, but again, there is a diversity of formats:
obj <- describe(iris)
str(obj)
# this is the canonical example of a dataframe but it doesn't even capture all the cases.
List of 5
$ Sepal.Length:List of 6
..$ descript: chr "Sepal.Length"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "35" "0.998" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:35] 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 5.1 5.2 ...
.. ..$ frequency: num [1:35(1d)] 1 3 1 4 2 5 6 10 9 4 ...
..$ extremes: Named num [1:10] 4.3 4.4 4.5 4.6 4.7 7.3 7.4 7.6 7.7 7.9
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Sepal.Width :List of 6
..$ descript: chr "Sepal.Width"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "23" "0.992" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:23] 2 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 ...
.. ..$ frequency: num [1:23(1d)] 1 3 4 3 8 5 9 14 10 26 ...
..$ extremes: Named num [1:10] 2 2.2 2.3 2.4 2.5 3.9 4 4.1 4.2 4.4
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Petal.Length:List of 6
..$ descript: chr "Petal.Length"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "43" "0.998" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:43] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.9 3 ...
.. ..$ frequency: num [1:43(1d)] 1 1 2 7 13 13 7 4 2 1 ...
..$ extremes: Named num [1:10] 1 1.1 1.2 1.3 1.4 6.3 6.4 6.6 6.7 6.9
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Petal.Width :List of 6
..$ descript: chr "Petal.Width"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "22" "0.99" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:22] 0.1 0.2 0.3 0.4 0.5 0.6 1 1.1 1.2 1.3 ...
.. ..$ frequency: num [1:22(1d)] 5 29 7 7 1 1 7 3 5 13 ...
..$ extremes: Named num [1:10] 0.1 0.2 0.3 0.4 0.5 2.1 2.2 2.3 2.4 2.5
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Species :List of 5
..$ descript: chr "Species"
..$ units : NULL
..$ format : NULL
..$ counts : Named num [1:3] 150 0 3
.. ..- attr(*, "names")= chr [1:3] "n" "missing" "distinct"
..$ values :List of 2
.. ..$ value : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ frequency: num [1:3(1d)] 50 50 50
..- attr(*, "class")= chr "describe"
- attr(*, "descript")= chr "iris"
- attr(*, "dimensions")= int [1:2] 150 5
- attr(*, "class")= chr "describe"
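As a rough sketch of the "work with the object" route: every variable in the str() listing above carries a named counts vector, so those can be stacked into one long table and written to csv. The helper name counts_to_long and the stand-in object mock are hypothetical; mock only imitates the structure shown above so the example runs without Hmisc.

```r
# Hypothetical helper: stack the $counts vector of every variable into a
# long data frame with columns variable / statistic / value.
counts_to_long <- function(obj) {
  rows <- lapply(names(obj), function(v) {
    cnt <- obj[[v]]$counts
    data.frame(variable  = v,
               statistic = names(cnt),
               value     = as.character(cnt),  # counts may be chr or num
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Stand-in that imitates the structure printed by str(obj); not real output
mock <- list(
  Sepal.Length = list(counts = c(n = "150", missing = "0", distinct = "35")),
  Species      = list(counts = c(n = 150, missing = 0, distinct = 3))
)

long <- counts_to_long(mock)
out_file <- file.path(tempdir(), "describe_counts.csv")
write.csv(long, out_file, row.names = FALSE)
```

With Hmisc loaded, the same helper should presumably accept describe(iris) directly, since it only relies on names() and the counts element. The values and extremes elements differ in length per variable and would need their own tables.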

Related

Kernel density estimation in R for 6 Dimensional data

This is the first time I have performed KDE in R on data with five or more variables; the goal is anomaly detection.
As far as I know, KDE can be performed on multidimensional data, but I couldn't find examples that use data with five or more dimensions.
I'm using data with five variables ('age', 'trestbps', 'chol', 'thalach', and 'oldpeak'), as shown below.
'data.frame': 176 obs. of 5 variables:
$ age : int 30 50 50 50 50 60 50 40 50 40 ...
$ trestbps: int 130 130 130 130 130 130 130 130 130 130 ...
$ chol : int 198 245 221 288 205 309 240 243 289 250 ...
$ thalach : int 130 166 164 159 184 131 154 152 124 179 ...
$ oldpeak : num 1.6 2.4 0 0.2 0 1.8 0.6 0 1 0 ...
I performed KDE on this data with the approach below, but I'm not sure whether the approach is correct or the result is valid.
library(ks) # assumption: kde() here is ks::kde
evpts <- do.call(expand.grid, lapply(df3, quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9)))
hat2 <- kde(df3, eval.points = evpts)
> str(hat2)
List of 9
$ x : num [1:176, 1:5] 30 50 50 50 50 60 50 40 50 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ age : Named num [1:3125] 40 40 50 60 60 40 40 50 60 60 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ trestbps: Named num [1:3125] 108 108 108 108 108 112 112 112 112 112 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ chol : Named num [1:3125] 194 194 194 194 194 194 194 194 194 194 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ thalach : Named num [1:3125] 114 114 114 114 114 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ oldpeak : Named num [1:3125] 0 0 0 0 0 0 0 0 0 0 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "age" "trestbps" "chol" "thalach" ...
.. ..$ dimnames:List of 5
.. .. ..$ age : chr [1:5] "age=40" "age=40" "age=50" "age=60" ...
.. .. ..$ trestbps: chr [1:5] "trestbps=108" "trestbps=112" "trestbps=120" "trestbps=128" ...
.. .. ..$ chol : chr [1:5] "chol=194.00" "chol=211.00" "chol=244.00" "chol=283.75" ...
.. .. ..$ thalach : chr [1:5] "thalach=113.50" "thalach=128.25" "thalach=150.00" "thalach=164.00" ...
.. .. ..$ oldpeak : chr [1:5] "oldpeak=0.0" "oldpeak=0.0" "oldpeak=0.8" "oldpeak=1.8" ...
$ estimate : Named num [1:3125] 5.64e-12 5.64e-12 2.85e-09 7.76e-10 7.76e-10 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 6.972 0.866 5.065 -6.541 0.189 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ w : num [1:176] 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "kde"
- attr(*, "class")= chr "kde"
If this is not the right approach, could you please help me find the correct one?
Thank you for your support.
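For anomaly detection, one common variant is to evaluate the density at the observations themselves rather than on a quantile grid, and flag the lowest-density rows. The sketch below is not the ks implementation: it swaps in a plain base-R Gaussian product kernel with a diagonal bandwidth from Scott's rule, and it runs on synthetic stand-in data, since the original df3 is not available. The function name kde_at and the 5% cutoff are assumptions made for illustration.

```r
set.seed(42)
n <- 176
# Synthetic stand-in for df3 (NOT the original data)
df3 <- data.frame(
  age      = rnorm(n, 50, 9),
  trestbps = rnorm(n, 130, 15),
  chol     = rnorm(n, 245, 45),
  thalach  = rnorm(n, 150, 22),
  oldpeak  = abs(rnorm(n, 1, 1))
)

X <- as.matrix(df3)
d <- ncol(X)
# Scott's rule, one bandwidth per variable (diagonal bandwidth matrix)
h <- apply(X, 2, sd) * n^(-1 / (d + 4))

# Gaussian product-kernel density estimate at each row of `pts`
kde_at <- function(pts, X, h) {
  d <- ncol(X)
  norm_const <- (2 * pi)^(d / 2) * prod(h)
  apply(pts, 1, function(p) {
    z <- sweep(sweep(X, 2, p, "-"), 2, h, "/")  # scaled differences
    mean(exp(-0.5 * rowSums(z^2))) / norm_const
  })
}

dens    <- kde_at(X, X, h)          # density at every observation
cutoff  <- quantile(dens, 0.05)     # arbitrary 5% threshold
flagged <- which(dens <= cutoff)    # candidate anomalies
```

With ks, the analogous step would presumably be passing the data itself as eval.points and reading the estimate component. Because the variables are on very different scales, the choice of bandwidth matters more than the choice of evaluation points.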

R Unable to plot loaded randomForest object

I'm unable to plot a randomForest object (via plot.randomForest()) after loading it from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
See output of str(rf)
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
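One detail visible in the listing above is that it contains no mse or rsq component. The check below is a small sketch that assumes, without confirmation from the question, that plotting a regression forest reads the per-tree error from the mse component. rf_stub is a hypothetical stand-in holding a few of the fields listed by str(rf), so the example runs without the real object.

```r
# Hypothetical stand-in with some of the fields shown by str(rf)
rf_stub <- list(type = "regression", ntree = 60, mtry = 10)

# Components assumed to be needed for plotting a regression forest
expected <- c("mse", "rsq")
missing_fields <- setdiff(expected, names(rf_stub))

if (length(missing_fields) > 0) {
  message("Components absent from the object: ",
          paste(missing_fields, collapse = ", "))
}
```

Running the same setdiff() against names(rf) on the loaded object would show whether those components were dropped before the object was saved.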

Error 'duplicate subscripts for columns' on using CreateTableOne

I was trying to run CreateTableOne from the tableone package on my dataset, m.dataaaaaa, using the following code:
CreateTableOne(vars = Vars, strata = "ejecfraclesstha40_gps", factorVars = Catvars, data = m.dataaaaaa, test = TRUE)
But I got the following error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
In addition: Warning message:
In ModuleReturnVarsExist(vars, data) : The data frame does not have: ejecfraclesstha40 Dropped
The structure of the data is shown below (truncated, as it is a big database):
str(m.dataaaaaa)
Classes ‘data.table’ and 'data.frame': 194 obs. of 203 variables:
$ ejecfraclesstha40_gps : num 1 0 1 0 0 0 1 1 1 0 ...
$ Serial.ID : num 2 3 4 7 10 14 17 20 23 24 ...
..- attr(*, "format.spss")= chr "F4.0"
$ Serial.ID_matched.EF.cohort.Ivan1.to.2 : num 2 NA 4 NA NA NA 17 20 23 NA ...
..- attr(*, "format.spss")= chr "F8.0"
$ ps..matched.EF.cohort.Ivan1.to.2 : num 0.138 NA 0.19 NA NA NA 0.176 0.286 0.152 NA ...
..- attr(*, "format.spss")= chr "F8.3"
$ psweight1.to.2 : num 1 NA 1 NA NA NA 1 1 1 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_ID1.to.2 : num 483 NA 763 NA NA NA 180 176 239 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_cases_in_control1.to.2 : num 2 NA 2 NA NA NA 2 2 2 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ ejecfrac_4gps : num 1 3 1 3 3 3 1 1 1 3 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<35%" "EF=35 - <40%" "EF=40 - <=50" "EF>50%"
$ ejecfrac_4gps30 : num 1 4 1 3 3 4 1 1 1 4 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<=30%" "EF>30 - 39%" "EF=40 - 49%" "EF>=50%"
$ renisch : num 29 31 23 18 48 19 10 29 17 13 ...
..- attr(*, "label")= chr "renal + visceral ischemic time"
..- attr(*, "format.spss")= chr "F3.0"
..- attr(*, "display_width")= int 12
$ totxct : num 46 31 55 46 48 19 54 29 17 37 ...
..- attr(*, "label")= chr "total cross-clamp time"
..- attr(*, "format.spss")= chr "F4.0"
..- attr(*, "display_width")= int 12
The original database was read from spss into r.
My main problem is with this error :
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
Any advice will be greatly appreciated.
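Base R raises this same message when a data frame is assigned to with a repeated column name, so one thing worth checking is whether the variable vectors contain duplicates or names that are not in the data. The sketch below uses a toy data frame and a hypothetical vars vector; that duplicates are the cause in the original call is an assumption, not something the question confirms.

```r
df   <- data.frame(a = c(1, 0, 1), b = c(2, 3, 4))
vars <- c("a", "b", "a", "zzz")   # hypothetical: one duplicate, one absent name

dup    <- unique(vars[duplicated(vars)])   # names listed more than once
absent <- setdiff(vars, names(df))         # names not present in the data
clean  <- intersect(unique(vars), names(df))

# Reproduce the message: assigning to a repeated column name fails
err <- tryCatch({
  df[c("a", "a")] <- lapply(df[c("a", "a")], factor)
  NA_character_
}, error = function(e) conditionMessage(e))
```

Applying the same duplicated() and setdiff() checks to Vars and Catvars against names(m.dataaaaaa) would show whether either vector needs cleaning before the call.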

Rearrangement of list data in R

I have a list like
a=
:x1
..$y: chr [1:100] "da" "da" "dw" "dw"...........
..$z: num [1:100] 1 3 7 10 14 15 16...........
:x2
..$y chr [1:150] "sdd" "gtr" "fr" "sw"........
..$z num [1:150] 1 2 3 7 10 15 16............
I want to create a new list by splitting the current one so that the z vector is divided into the ranges 1:10, 11:20, 21:30, and so on, with y split the same way.
For example:
a1=
:list1
..$x1
.. ..$y: chr [1:10] "da" "da"...........
.. ..$z: num [1:10] 1 3 7 10 ...........
..$x2
.. ..$y chr [1:10] "sdd" "gtr"........
.. ..$z num [1:10] 1 2 3 7 10............
:list2
..$x1
.. ..$y: chr [1:10] "des" "ded"...........
.. ..$z: num [1:10] 14 15 16...........
..$x2
.. ..$y chr [1:10] "dwd" "ded"........
.. ..$z num [1:10] 15 16............
:list3
..$x1
.. ..$y: chr [1:10] "ded" "sa"...........
.. ..$z: num [1:10] 21 24 27...........
..$x2
.. ..$y chr [1:10] "dww" "dw"........
.. ..$z num [1:10] 24 27 30............
I tried a for loop, but it throws errors.
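A minimal sketch of one way to do this without an explicit for loop, assuming the bins are defined by the value of z (1 to 10, 11 to 20, ...) and that y should follow z. The small list a below is made-up illustration data in the shape described above, and bin_of is a hypothetical helper.

```r
# Made-up data in the shape described in the question
a <- list(
  x1 = list(y = c("da", "da", "dw", "dw", "des", "ded", "sa"),
            z = c(1, 3, 7, 10, 14, 15, 21)),
  x2 = list(y = c("sdd", "gtr", "dwd", "ded", "dww"),
            z = c(1, 2, 15, 16, 24))
)

width  <- 10
bin_of <- function(z) (z - 1) %/% width + 1   # 1:10 -> 1, 11:20 -> 2, ...

# All bins that occur anywhere in the list
bins <- sort(unique(unlist(lapply(a, function(e) bin_of(e$z)))))

# For each bin, keep the matching positions of y and z in every element
a1 <- lapply(bins, function(b) {
  lapply(a, function(e) {
    keep <- bin_of(e$z) == b
    list(y = e$y[keep], z = e$z[keep])
  })
})
names(a1) <- paste0("list", bins)
```

Here a1$list2$x1$z holds the x1 values that fall in 11:20. Elements with no values in a bin come back with zero-length y and z rather than being dropped.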

How Can I Quickly Inspect Built-in Data Sets (PSA)?

One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data(), however, is frustrating because it provides no information about the structure of the data sets.
How can I quickly view the structure of available data sets?
The following function may help:
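dataStr <- function(fun = function(x) TRUE) {
  # Look up every item listed by data(); names that cannot be found yield NULL
  objs <- mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
  # Drop the NULLs, keep the data sets matching `fun`, and print their structure
  str(Filter(fun, Filter(Negate(is.null), objs)))
}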
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...
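When the full str() output is too long, a compact overview can be enough. The sketch below reuses the same mget() lookup but restricts it to the datasets package and returns one row per data frame with its dimensions. The name overview and the restriction to datasets are choices made for this illustration, and the code assumes the datasets package is attached, as it is by default.

```r
# Item names listed for the datasets package; some entries such as
# "beaver1 (beavers)" are not object names and simply come back as NULL
items <- data(package = "datasets")$results[, "Item"]
objs  <- mget(items, envir = as.environment("package:datasets"),
              ifnotfound = list(NULL))
objs  <- Filter(Negate(is.null), objs)

# Keep the data frames and tabulate their dimensions
dfs <- Filter(is.data.frame, objs)
overview <- data.frame(
  name = names(dfs),
  rows = vapply(dfs, nrow, integer(1)),
  cols = vapply(dfs, ncol, integer(1)),
  row.names = NULL
)
head(overview)
```

Sorting overview by rows or cols is a quick way to pick a small data set for a reproducible example.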
