Plot results from describe.by{psych} output [closed] - r

Closed 10 years ago.
I have the results of describe.by{psych} applied to a data frame. The result is a list:
List of 1000
$ 1 :Classes ‘psych’, ‘describe’ and 'data.frame': 20 obs. of 13 variables:
..$ var : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
..$ n : num [1:20] 24 24 24 24 24 24 24 24 24 24 ...
..$ mean : num [1:20] 24 30.8 24 31.6 240 ...
..$ sd : num [1:20] 0.937 3.667 0.937 3.537 9.367 ...
..$ median : num [1:20] 23.9 31 23.9 31.9 238.6 ...
..$ trimmed : num [1:20] 24 30.9 24 31.7 239.7 ...
..$ mad : num [1:20] 1.11 4.12 1.11 3.29 11.09 ...
..$ min : num [1:20] 22.6 24 22.6 25.3 225.9 ...
..$ max : num [1:20] 25.6 36.9 25.6 36.9 256 ...
..$ range : num [1:20] 3 12.9 3 11.6 30 ...
..$ skew : num [1:20] 0.309 -0.258 0.309 -0.411 0.309 ...
..$ kurtosis: num [1:20] -1.163 -0.898 -1.163 -0.819 -1.163 ...
..$ se : num [1:20] 0.191 0.749 0.191 0.722 1.912 ...
$ 2 :Classes ‘psych’, ‘describe’ and 'data.frame': 20 obs. of 13 variables:
..$ var : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
..$ n : num [1:20] 7 7 7 7 7 7 7 7 7 7 ...
..$ mean : num [1:20] 16.3 39.3 16.3 40.7 162.9 ...
..$ sd : num [1:20] 0.609 8.045 0.609 8.394 6.086 ...
..$ median : num [1:20] 16.4 39.1 16.4 39.6 164.2 ...
..$ trimmed : num [1:20] 16.3 39.3 16.3 40.7 162.9 ...
I would like to plot a graph (probably a candlestick chart) or boxplots from this sample for each of the 13 metrics. Is there a package with which I can directly use the summary statistics already computed?
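A minimal sketch of one way to reuse the precomputed statistics: pull a single metric (here the mean) out of every group's describe table into a matrix, then boxplot it, giving one box per variable that summarises that metric across the groups. Because psych may not be installed, the sketch builds a hypothetical stand-in for the describe.by() list with base R on mtcars; `describe_like()` and `metric_matrix()` are illustrative helpers, not psych functions. With the real data, `res` would be the list of 1000 describe tables, and the same call would give a 20 x 1000 matrix.

```r
# Hypothetical stand-in for one psych::describe() table: one row per
# variable, one column per statistic (only a few statistics reproduced)
describe_like <- function(df) {
  data.frame(
    var    = seq_along(df),
    n      = vapply(df, function(v) sum(!is.na(v)), numeric(1)),
    mean   = vapply(df, mean, numeric(1), na.rm = TRUE),
    sd     = vapply(df, sd, numeric(1), na.rm = TRUE),
    median = vapply(df, median, numeric(1), na.rm = TRUE),
    row.names = names(df)
  )
}

# Stand-in for the describe.by() result: one table per group
res <- lapply(split(mtcars[c("mpg", "hp", "wt")], mtcars$cyl), describe_like)

# Pull one statistic out of every table: rows = variables, columns = groups
metric_matrix <- function(res, metric) {
  sapply(res, function(d) setNames(d[[metric]], rownames(d)))
}

means <- metric_matrix(res, "mean")

# One box per variable, showing how that statistic varies across groups
boxplot(t(means), col = "pink", main = "Group means per variable")
```

Swapping "mean" for any other column name ("sd", "median", ...) repeats the plot for each of the other metrics.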

Your question is vague.
describeBy (describe.by is deprecated) reports basic summary statistics by a grouping variable.
So I guess a boxplot is the nearest plot.
For example:
library(psych)
describeBy(sat.act, sat.act$gender)
group: 1
var n mean sd median trimmed mad min max range skew kurtosis se
gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
group: 2
var n mean sd median trimmed mad min max range skew kurtosis se
gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
You can plot the raw data grouped the same way, for example ACT by gender:
boxplot(ACT ~ gender, data = sat.act, col = 'pink')
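If only the summary statistics are available and not the raw data, base R's bxp(), which boxplot() itself uses for drawing, accepts a list of precomputed box statistics. describe() reports min, median, max, mean and sd but no quartiles, so this sketch assumes the box spans the mean minus/plus one sd (clipped to the range): an approximation, not a true Tukey boxplot. The table `d` is a hypothetical describe-shaped stand-in computed from mtcars, and `summary_to_bxp()` is an illustrative helper.

```r
# Hypothetical table shaped like psych::describe() output, built with base R
vars <- mtcars[c("mpg", "qsec")]
d <- data.frame(
  n      = vapply(vars, length, integer(1)),
  mean   = vapply(vars, mean, numeric(1)),
  sd     = vapply(vars, sd, numeric(1)),
  median = vapply(vars, median, numeric(1)),
  min    = vapply(vars, min, numeric(1)),
  max    = vapply(vars, max, numeric(1)),
  row.names = names(vars)
)

# Convert summary rows into the list structure bxp() draws from.
# stats rows: lower whisker, lower hinge, median, upper hinge, upper whisker.
# Assumption: no quartiles available, so the box spans mean -/+ 1 sd,
# clipped to [min, max]; this approximates, not reproduces, a boxplot.
summary_to_bxp <- function(d) {
  stats <- rbind(
    d$min,
    pmax(d$min, d$mean - d$sd),
    d$median,
    pmin(d$max, d$mean + d$sd),
    d$max
  )
  list(stats = stats, n = d$n, conf = stats[c(3, 3), , drop = FALSE],
       out = numeric(0), group = numeric(0), names = rownames(d))
}

z <- summary_to_bxp(d)
bxp(z, boxfill = "pink", main = "Boxes from summary statistics")
```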

Related

"'groups' must be a factor" on Shapiro-Wilk test on Rcmdr

I am trying to run a Shapiro-Wilk normality test in R (in Rcmdr, to be more accurate) by going to "Statistics => Summary => Descriptive statistics", selecting one of my dependent variables and choosing "summary by group".
Rcmdr automatically runs the following code:
normalityTest(Algometre.J0 ~ Modalite, test="shapiro.test",
data=Dataset)
And I get the following error message:
'groups' must be a factor.
I have already converted my independent variable to a factor (I swear, I did!)
Any idea what's wrong?
Thanks in advance.
Here is what str(Dataset) shows :
'data.frame': 76 obs. of 11 variables:
$ Modalite : chr "C" "C" "C" "C" ...
$ Angle.J0 : num 20.1 20.5 21 22.5 19.1 ...
$ Angle.J1 : num 21.7 22.6 22.8 23.3 20.5 ...
$ Angle.J2 : num 22.3 23 23.9 24.2 21 ...
$ Epaisseur.J0: num 1.97 1.54 1.76 1.89 1.53 1.87 1.54 2 1.79 1.41 ...
$ Epaisseur.J1: num 2.07 1.49 1.87 1.91 1.54 1.9 1.51 2.03 1.71 1.48 ...
$ Epaisseur.J2: num 2.08 1.69 1.77 2 1.61 1.99 1.38 2.06 1.86 1.53 ...
$ Algometre.J0: num 45 40 105 165 66.3 ...
$ Algometre.J1: num 32.7 39.7 91.7 124 63.7 ...
$ Algometre.J2: num 51.3 58.7 101 138 60.3 ...
$ ObsNumber : int 1 2 3 4 5 6 7 8 9 10 ...
What does that mean?
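The str() output shows Modalite stored as chr rather than as a factor, which would explain the error. A minimal sketch, assuming the fix is to convert the column with factor() before rerunning the test: since Dataset is not available, iris with Species turned into character stands in for it, and per-group p-values come from base R's shapiro.test() rather than RcmdrMisc's normalityTest().

```r
# Stand-in for Dataset: the grouping column is character, like Modalite
dat <- data.frame(
  value = iris$Sepal.Length,
  group = as.character(iris$Species),
  stringsAsFactors = FALSE
)
class(dat$group)  # "character", the same situation as Modalite

# The fix: convert the grouping variable to a factor
# (for the question's data: Dataset$Modalite <- factor(Dataset$Modalite))
dat$group <- factor(dat$group)

# Shapiro-Wilk test within each group, keeping the p-values
p_values <- tapply(dat$value, dat$group, function(x) shapiro.test(x)$p.value)
print(p_values)
```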

How to perform sensitivity analysis using Lek's profile in R?

I am trying to do a sensitivity analysis in R. My data set has a few continuous explanatory variables and a categorical response variable (7 categories).
I tried to run the code below.
model=train(factor(mode)~Time+Cost+Age+Income,
method="nnet",
preProcess=c("center","scale"),
data=train,
verbose=FALSE,
trControl=trainControl(method='cv', verboseIter = FALSE),
tuneGrid=expand.grid(.size=c(1:20), .decay=c(0,0.001,0.01,0.1)))
After getting the output from this code, I tried to build Lek's profile using the code below.
lekprofile(model)
However, I got the error "Errors in xvars[, x_names]: subscript out of bound"
Please help me to resolve the error.
It doesn't work for a classification model. To see this, first fit a regression model:
library(caret)
library(NeuralNetTools)
library(mlbench)
data(BostonHousing)
str(BostonHousing)
'data.frame': 506 obs. of 14 variables:
$ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num 6.58 6.42 7.18 7 7.15 ...
$ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad : num 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ b : num 397 397 393 395 397 ...
$ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
We train the model, excluding the categorical variable chas:
model = train(medv ~ .,data=BostonHousing[,-4],method="nnet",
trControl=trainControl(method="cv",number=10),
tuneGrid=data.frame(size=c(5,10,20),decay=0.1))
lekprofile(model)
You can see that the y-axis of the resulting plot is meant to be continuous. If we discretize the response variable medv instead, it crashes:
BostonHousing$medv = cut(BostonHousing$medv,4)
model = train(medv ~ .,data=BostonHousing[,-4],method="nnet",
trControl=trainControl(method="cv",number=10),
tuneGrid=data.frame(size=c(5,10,20),decay=0.1))
lekprofile(model)
Error in `[.data.frame`(preds, , ysel, drop = FALSE) :
undefined columns selected
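Since lekprofile() expects a continuous response, one hedged workaround for a categorical response is to compute a Lek-style profile by hand: sweep one predictor across its range while holding the others at a fixed quantile, and record the predicted class probabilities. This sketch swaps in nnet::multinom() on iris for the question's caret model (for a caret classifier, predict() takes type = "prob" instead). `lek_profile_class()` is a hypothetical helper, and it holds the other predictors at a single quantile, whereas Lek's method usually repeats this over several quantiles.

```r
library(nnet)  # recommended package bundled with R; provides multinom()

# Hypothetical helper: Lek-style profile for one predictor of a classifier.
# `var` sweeps its observed range in `n` steps; every other predictor is
# held at the same quantile `q` (median by default).
lek_profile_class <- function(model, data, var, predictors,
                              n = 50, q = 0.5, type = "probs") {
  grid <- data.frame(lapply(data[predictors], function(col) {
    rep(quantile(col, probs = q, names = FALSE), n)
  }))
  grid[[var]] <- seq(min(data[[var]]), max(data[[var]]), length.out = n)
  # multinom uses type = "probs"; a caret model would use type = "prob"
  probs <- predict(model, newdata = grid, type = type)
  cbind(grid[var], probs)
}

fit <- multinom(Species ~ ., data = iris, trace = FALSE)
predictors <- setdiff(names(iris), "Species")
prof <- lek_profile_class(fit, iris, "Petal.Length", predictors)

# One line per class: predicted probability as Petal.Length varies
matplot(prof$Petal.Length, as.matrix(prof[-1]), type = "l", lty = 1,
        xlab = "Petal.Length", ylab = "Predicted probability")
legend("right", legend = names(prof)[-1], col = 1:3, lty = 1)
```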

Error using IRMI imputation

I've been trying to use irmi from the VIM package to replace NAs.
My data looks something like this:
> str(sub_mex)
'data.frame': 21 obs. of 83 variables:
$ pH : num 7.2 7.4 7.4 7.36 7.2 7.82 7.67 7.73 7.79 7.7 ...
$ Cond : num 1152 1078 1076 1076 1018 ...
$ CO3 : num NA NA NA NA NA ...
$ Mg : num 25.8 24.9 24.3 24.8 23.4 ...
$ NO3 : num 49.7 25.6 27.1 39.6 52.8 ...
$ Cd : num 0.0088 0.0104 0.0085 0.0092 0.0086 ...
$ As_H : num 0.006 0.0059 0.0056 0.0068 0.0073 ...
$ As_F : num 0.0056 0.0058 0.0057 0.0066 0.0065 0.004 0.004 0.004 0.0048 0.0078 ...
$ As_FC : num NA NA NA NA NA NA NA NA NA 0.0028 ...
$ Pb : num 0.0097 0.0096 0.0092 0.01 0.0093 0.0275 0.024 0.0255 0.031 0.024 ...
$ Fe : num 0.39 0.26 0.27 0.28 0.32 0.135 0.08 NA 0.13 NA ...
$ No_EPT : int 0 0 0 0 0 0 0 0 0 0 ...
I've subset my sub_mex dataset to analyze observations separately, so I have a sub_t dataset, which looks something like this:
> str(sub_t)
'data.frame': 5 obs. of 83 variables:
$ pH : num 7.82 7.67 7.73 7.79 7.7
$ CO3 : num 45 NA 37.2 41.9 40.3
$ Mg : num 41.3 51.4 47.7 51.8 53
$ NO3 : num 47.1 40.7 39.9 42.1 37.6
$ Cd : num 0.0173 0.0145 0.016 0.016 0.0154
$ As_H : num 0.00949 0.01009 0.00907 0.00972 0.00954
$ As_F : num 0.004 0.004 0.004 0.0048 0.0078
$ As_FC : num NA NA NA NA 0.0028
$ Pb : num 0.0275 0.024 0.0255 0.031 0.024
$ Fe : num 0.135 0.08 NA 0.13 NA
$ No_EPT : int 0 0 0 0 0
I impute the NAs of the sub_mex dataset using:
imp_mexi <- irmi(sub_mex)
which works fine.
However, when I try to impute the subset sub_t, I get the following error message:
> imp_t <- irmi(sub_t)
Error in indexNA2s[, variable[j]] : subscript out of bounds
Does anyone have an idea of how to solve this? I want to impute sub_t itself, not take a subset of the imp_mexi imputed dataset.
Any help will be deeply appreciated.
I had a similar issue and discovered that one of the columns in my data frame was entirely missing (all NA), hence the out-of-bounds error.
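Following that answer, a quick check before calling irmi() is to list the columns that are entirely NA in the subset and drop them. The sketch uses base R only: `demo` reuses a few columns of sub_t from the question plus one hypothetical all-NA column standing in for whichever of the 83 columns might be empty, and `all_na_cols()` is an illustrative helper.

```r
# A few columns of sub_t as shown in the question, plus a hypothetical
# column that is NA for every row of the subset
demo <- data.frame(
  pH    = c(7.82, 7.67, 7.73, 7.79, 7.7),
  CO3   = c(45, NA, 37.2, 41.9, 40.3),
  As_FC = c(NA, NA, NA, NA, 0.0028),
  Fe    = c(0.135, 0.08, NA, 0.13, NA),
  Empty = rep(NA_real_, 5)
)

# Names of columns with no observed values at all
all_na_cols <- function(df) {
  names(df)[vapply(df, function(col) all(is.na(col)), logical(1))]
}

empty <- all_na_cols(demo)
demo_ok <- demo[, setdiff(names(demo), empty), drop = FALSE]
# With VIM installed, the imputation could then be retried on demo_ok:
# imp_t <- VIM::irmi(demo_ok)
```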

Accessing data from output R program

I have the output below, generated using cuts; let's call this output 'data'.
cuts: [20,25)
Time Kilometres
21 20 7.3
22 21 8.4
23 22 9.5
24 23 10.6
25 24 11.7
------------------------------------------------------------
cuts: [25,30)
Time Kilometres
26 25 12.8
27 26 13.9
28 27 15.0
29 28 16.1
30 29 17.2
------------------------------------------------------------
cuts: [30,35)
Time Kilometres
31 30 18.3
32 31 19.4
33 32 20.5
34 33 21.6
35 34 22.7
How can I access the data in each cut, for example the Kilometres data from cuts: [20,25)? I tried data$Kilometres, but that does not work. Basically, I want a new data frame in which I can use the Kilometres data separately for each cut.
The output of by here is a list, so you can use basic list indexing, either by number or name. Using the data from your question from a few hours ago, and the answer from Matthew Lundberg, we can index as follows:
> x[[1]]
Time Velocity
1 0.0 0.00
2 1.5 1.21
3 3.0 1.26
4 4.5 1.31
> x[["[6,12)"]]
Time Velocity
5 6.0 1.36
6 7.5 1.41
7 9.0 1.46
8 10.5 1.51
You can review the structure of objects in R by using str. This is usually useful to help you decide how you can extract certain information. Here's str(x):
> str(x)
List of 7
$ [0,6) :Classes ‘AsIs’ and 'data.frame': 4 obs. of 2 variables:
..$ Time : num [1:4] 0 1.5 3 4.5
..$ Velocity: num [1:4] 0 1.21 1.26 1.31
$ [6,12) :Classes ‘AsIs’ and 'data.frame': 4 obs. of 2 variables:
..$ Time : num [1:4] 6 7.5 9 10.5
..$ Velocity: num [1:4] 1.36 1.41 1.46 1.51
$ [12,18):Classes ‘AsIs’ and 'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 12 13 14 15 16 17
..$ Velocity: num [1:6] 1.56 1.61 1.66 1.71 1.76 1.81
$ [18,24):Classes ‘AsIs’ and 'data.frame': 5 obs. of 2 variables:
..$ Time : num [1:5] 18 19 20 21 22.5
..$ Velocity: num [1:5] 1.86 1.91 1.96 2.01 2.06
$ [24,30):Classes ‘AsIs’ and 'data.frame': 4 obs. of 2 variables:
..$ Time : num [1:4] 24 25.5 27 28.5
..$ Velocity: num [1:4] 2.11 2.16 2.21 2.26
$ [30,36):Classes ‘AsIs’ and 'data.frame': 4 obs. of 2 variables:
..$ Time : num [1:4] 30 31.5 33 34.5
..$ Velocity: num [1:4] 2.31 2.36 2.41 2.42
$ [36,42):Classes ‘AsIs’ and 'data.frame': 1 obs. of 2 variables:
..$ Time : num 36
..$ Velocity: num 2.43
- attr(*, "dim")= int 7
- attr(*, "dimnames")=List of 1
..$ cuts: chr [1:7] "[0,6)" "[6,12)" "[12,18)" "[18,24)" ...
- attr(*, "call")= language by.data.frame(data = mydf, INDICES = cuts, FUN = I)
- attr(*, "class")= chr "by"
From this we can see that we have a named list of seven items, and each item contains a data.frame. Thus, if we wanted a vector of just the "Velocity" variable (second column) for the third interval, we would use something like:
> x[[3]][[2]]
[1] 1.56 1.61 1.66 1.71 1.76 1.81
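Applied to the Kilometres data in the question itself, a minimal sketch of the same indexing, plus two alternatives that give the per-cut data directly. The data frame is reconstructed from the printed values (Time 20 to 34, Kilometres rising by 1.1 per step; row names aside), and the breaks are assumed from the printed interval labels.

```r
# The question's data as printed: Time 20 to 34, Kilometres rising by 1.1
mydf <- data.frame(Time = 20:34,
                   Kilometres = seq(7.3, by = 1.1, length.out = 15))
cuts <- cut(mydf$Time, breaks = seq(20, 35, by = 5), right = FALSE)

# by() returns one data frame per interval, indexable by name or position
x <- by(mydf, cuts, I)
km_first <- x[["[20,25)"]]$Kilometres

# split() gives the same grouping as a plain named list of vectors
km_by_cut <- split(mydf$Kilometres, cuts)

# Or keep everything in one data frame, with the interval as a column
km_long <- data.frame(cut = cuts, Kilometres = mydf$Kilometres)
```

The long form is often the easiest to keep working with, since subset(km_long, cut == "[25,30)") or aggregate(Kilometres ~ cut, km_long, mean) operate on it directly.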

R: get a single column from many columns

I have wavelengths from 350 to 2500, and each one has data:
x350 x351 x352 x353 x354 ...... x2500
0.18 0.17 0.17 0.17 0.16 ...... 0.3
0.16 0.15 0.15 0.15 0.15 ...... 0.47
0.14 0.14 0.13 0.13 0.13 ...... 0.35
I need to combine them into one column, without the wavelength names, and give this new column a name:
Wave
0.18
0.16
0.14
0.17
0.15
0.14
0.17
0.15
0.13
0.16
0.15
0.13
.
.
.
0.3
0.47
0.35
m is my data frame, and the wavelength columns run from column 17 to column 2167. I tried:
a <- list(m[1:16,17:2167])
but I get a list with the column names in between:
list(structure(list(X350 = c(0.15723315, 0.138406682, 0.174909807,
0.143139974, 0.123193808, 0.154449448, 0.163255619, 0.126194713,
0.14327512, 0.066265248, 0.139851395, 0.158271497, 0.158060045,
0.145313933, 0.143890661), X351 = c(0.154324452, 0.135509959,
0.173350322, 0.139867145, 0.121439474, 0.15276091, 0.160391152,
0.125592826, 0.140349489, 0.065316491, 0.137927937, 0.158400317,
0.156211611, 0.142498763, 0.141353986), X352 = c(0.151243533....
How can I get just one column, with one name, from the 2151 wavelength columns?
More info
str(m)
'data.frame': 16 obs. of 2167 variables:
$ pott : int 48 49 50 51 52 53 54 55 56 57 ...
$ b : chr "B1" "B1" "B1" "B1" ...
$ F : int 1 1 1 1 1 1 1 1 1 1 ...
$ G : chr "Sunstar" "Quarrion" "Nacozari" "W130114" ...
$ R : int 3 3 3 3 3 3 3 3 3 3 ...
$ D : int 80 80 81 80 81 80 82 82 82 82 ...
$ W: num 1.8 1.5 1.3 1.9 1.8 1.25 1.85 2.1 1.6 2.4 ...
$ S : num 43.4 35.7 44.7 48.6 45.3 35.5 49.2 49.1 46.8 41.5 ...
$ R : num -0.327 1.149 2.348 1.636 1.952 ...
$ V : num 76.4 49 118.9 108 114.5 ...
$ J : num 158 114 191 169 183 ...
$ P: num 19.9 10.6 24.1 21.1 23.6 ...
$ Ce : num 0.367 0.13 0.466 0.36 0.462 ...
$ Ci : num 273 246 280 263 272 ...
$ S : num 23.5 29 30.9 29.4 24.1 ...
$ L : num 42.5 34.4 32.4 34 41.4 ...
$ X350 : num 0.176 0.157 0.138 0.175 0.143 ...
$ X351 : num 0.172 0.154 0.136 0.173 0.14 ...
$ X352 : num 0.169 0.151 0.133 0.172 0.138 ...
$ X353 : num 0.167 0.147 0.132 0.17 0.137 ...
$ X354 : num 0.165 0.147 0.13 0.167 0.133 ...
$ X355 : num 0.162 0.146 0.127 0.166 0.13 ...
$ X356 : num 0.159 0.144 0.126 0.164 0.128 ...
$ X357 : num 0.158 0.14 0.125 0.161 0.125 ...
$ X358 : num 0.155 0.138 0.123 0.159 0.124 ...
$ X359 : num 0.153 0.137 0.121 0.157 0.123 ...
$ X360 : num 0.15 0.135 0.12 0.154 0.122 ...
....$2500
I guess your data are in a text file
data <- read.table("your_file", header = TRUE, quote = "\"")
so, data will look like
structure(list(x350 = c(0.18, 0.16, 0.14), x351 = c(0.17, 0.15,
0.14), x352 = c(0.17, 0.15, 0.13), x353 = c(0.17, 0.15, 0.13)), .Names = c("x350",
"x351", "x352", "x353"), class = "data.frame", row.names = c(NA,
-3L))
and
result <- data.frame(Wave = unlist(data,use.names=FALSE))
will produce
Wave
1 0.18
2 0.16
3 0.14
4 0.17
5 0.15
6 0.14
7 0.17
8 0.15
9 0.13
10 0.17
11 0.15
12 0.13
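For the question's own m, where metadata columns precede the wavelengths, a minimal sketch is to pick the wavelength columns by name before unlisting, so the result does not depend on their positions. The small m below is a stand-in built from rounded values shown in the question (pott and G from str(m), the first three wavelength columns from the table), and the name pattern assumes wavelength columns are named like X350.

```r
# Small stand-in for m: two metadata columns, then wavelength columns
m <- data.frame(
  pott = c(48L, 49L, 50L),
  G    = c("Sunstar", "Quarrion", "Nacozari"),
  X350 = c(0.18, 0.16, 0.14),
  X351 = c(0.17, 0.15, 0.14),
  X352 = c(0.17, 0.15, 0.13)
)

# Select the wavelength columns by name instead of by position
wave_cols <- grep("^[Xx][0-9]+$", names(m), value = TRUE)

# unlist() stacks the columns one after another (all of X350, then X351, ...)
result <- data.frame(Wave = unlist(m[wave_cols], use.names = FALSE))

# Optional: keep the wavelength next to each value
result_long <- data.frame(
  wavelength = rep(as.numeric(sub("^[Xx]", "", wave_cols)), each = nrow(m)),
  Wave = unlist(m[wave_cols], use.names = FALSE)
)
```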
