r subset(df, condition) different result from df[condition, ] [duplicate]


This question already has answers here:
How to subset data in R without losing NA rows?
(3 answers)
Closed 4 years ago.
Subsetting a data.frame in R gives some weird output.
Here is the file I used:
https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/ccdc87b80d92a9c24de2f04daec5bb58/asset-v1:MITx+15.071x+2T2017+type#asset+block/WHO.csv
After reading the data into R, there are 194 observations of 13 variables:
> str(WHO)
'data.frame': 194 obs. of 13 variables:
$ Country : Factor w/ 194 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Region : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 4 ...
$ Population : int 29825 3162 38482 78 20821 89 41087 2969 23050 8464 ...
$ Under15 : num 47.4 21.3 27.4 15.2 47.6 ...
$ Over60 : num 3.82 14.93 7.17 22.86 3.84 ...
$ FertilityRate : num 5.4 1.75 2.83 NA 6.1 2.12 2.2 1.74 1.89 1.44 ...
$ LifeExpectancy : int 60 74 73 82 51 75 76 71 82 81 ...
$ ChildMortality : num 98.5 16.7 20 3.2 163.5 ...
$ CellularSubscribers : num 54.3 96.4 99 75.5 48.4 ...
$ LiteracyRate : num NA NA NA NA 70.1 99 97.8 99.6 NA NA ...
$ GNI : num 1140 8820 8310 NA 5230 ...
$ PrimarySchoolEnrollmentMale : num NA NA 98.2 78.4 93.1 91.1 NA NA 96.9 NA ...
$ PrimarySchoolEnrollmentFemale: num NA NA 96.4 79.4 78.2 84.5 NA NA 97.5 NA ...
But subsetting with the subset() function gives a different result from df[condition, ], as the example below shows:
> Outliers <- WHO[WHO$GNI > 10000 & WHO$FertilityRate > 2.5,]
> nrow(Outliers)
[1] 27
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers
NA <NA> <NA> NA NA NA NA NA NA NA
23 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82
NA.1 <NA> <NA> NA NA NA NA NA NA NA
NA.2 <NA> <NA> NA NA NA NA NA NA NA
(trimmed ...)
There are a lot of all-NA rows.
Using the subset() function instead yields the correct result:
> Outliers <- subset(WHO, GNI > 10000 & FertilityRate > 2.5)
> nrow(Outliers)
[1] 7
> Outliers
Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers
23 Botswana Africa 2004 33.75 5.63 2.71 66 53.3 142.82
56 Equatorial Guinea Africa 736 38.95 4.53 5.04 54 100.3 59.15
63 Gabon Africa 1633 38.49 7.38 4.18 62 62.0 117.32
83 Israel Europe 7644 27.53 15.15 2.92 82 4.2 121.66
88 Kazakhstan Europe 16271 25.46 10.04 2.52 67 18.7 155.74
131 Panama Americas 3802 28.65 10.13 2.52 77 18.5 188.60
150 Saudi Arabia Eastern Mediterranean 28288 29.69 4.59 2.76 76 8.6 191.24
(trimmed ...)

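What about making sure you get rid of the NAs first? A comparison involving NA yields NA, and indexing with df[cond, ] returns an all-NA row for every NA in the index, whereas subset() treats missing values in the condition as FALSE and drops those rows. Filtering out the NAs explicitly makes [ behave the same way: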
Outliers <- WHO[!is.na(WHO$GNI) & WHO$GNI > 10000 &
!is.na(WHO$FertilityRate) & WHO$FertilityRate > 2.5,]
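A minimal sketch of the difference, using a small hypothetical data frame with the same two columns (the values are made up, not taken from WHO.csv). which() is shown as a shorter alternative to the explicit !is.na() filter, since it returns only the positions where the condition is TRUE.

```r
# Hypothetical toy data with missing values in GNI and FertilityRate
df <- data.frame(
  Country = c("A", "B", "C", "D"),
  GNI = c(12000, NA, 15000, 8000),
  FertilityRate = c(3.1, 2.9, NA, 3.5),
  stringsAsFactors = FALSE
)

# Comparisons with NA give NA, so the logical index contains NAs
cond <- df$GNI > 10000 & df$FertilityRate > 2.5
print(cond)  # TRUE NA NA FALSE

# `[` returns one all-NA row for each NA in the index
bracket_rows <- df[cond, ]
print(nrow(bracket_rows))  # 3

# subset() treats NA in the condition as FALSE
subset_rows <- subset(df, GNI > 10000 & FertilityRate > 2.5)
print(nrow(subset_rows))  # 1

# Two base-R fixes: drop the NAs explicitly, or keep only the TRUE positions
explicit_rows <- df[!is.na(cond) & cond, ]
which_rows <- df[which(cond), ]
print(nrow(explicit_rows))  # 1
print(nrow(which_rows))     # 1
```

Both fixes keep only country "A", the one row where both conditions are known to be TRUE.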

Related

Agricolae, tapply, error: arguments must have same length

I am new to R and I am having trouble moving forward with my data analysis. My Excel data has a lot of NAs, and I have tried troubleshooting this error without success. Here is my code, in case anyone can help, and a link to a sample of my data:
file:///C:/Users/steph/Documents/DLI%20ANOVA%20Sample.htm
Some of my variables have 4 reps instead of all 8 reps, so there are a lot of NAs in the Excel file. I keep getting this error when I run tapply:
Error in tapply(X = data1$gi..m3., INDEX = data1$cultivar, FUN = mean, :
arguments must have same length
library(agricolae)
data1=read.csv("DLI ANOVA Sample.csv", header=T, as.is=T)
#setting factors
block = as.factor(data1$block)
treatmentt = as.factor(data1$trt)
cultivar<-factor(data1$cv,c("CR", "LB","RF","RR","S","SNS","SNY","SSJ","YC"))
str(data1)
#Summary statistics
tapply(X = data1$growth.index, INDEX = data1$cultivar, FUN = mean, na.rm=T)
tapply(X = data1$growth.index, INDEX = data1$treatment, FUN = mean, na.rm=T)
'data.frame': 288 obs. of 24 variables:
$ block : int 1 1 2 2 3 3 4 4 1 1 ...
$ trt : chr "HL-L" "HL-L" "HL-L" "HL-L" ..
$ cv : chr "CR" "CR" "CR" "CR" ...
$ rep : int 1 2 3 4 5 6 7 8 1 2 ...
$ height : int 23 20 25 19 23 19 22 19 19 24
$ growth.index : num 0.0221 0.0258 0.0276 0.0227 0.0209
$ number.of.mature.fruit : int 34 30 35 34 28 25 40 24 12 16 ...
$ mature.fruit.fw : num 163 163 186 152 169 ...
$ number.of.immature.fruit : int 38 28 40 27 35 37 44 48 20 30 ...
$ immature.fruit.fw : num 77.4 66.6 87.6 43.4 81.3 ...
$ Total.number.of.fruit : num 72 58 75 61 63 62 84 72 32 46 ...
$ Total.fruit.fw : num 241 230 273 195 250 ...
$ Fruit.Water.Content..g. : num NA 209 NA 176 NA ...
$ Brix.. : num 4.9 NA 5.6 NA 4.7 NA 5.1 NA 5.6 NA ...
$ pH : num 4.17 NA 4.3 NA 4.1 ...
$ EC.uS.mL : num 4.46 NA 9.19 NA 8.24 ...
$ X..citric.Acid : num 0.704 NA 0.397 NA 0.653 ...
$ Sugar.Acid.Ratio : num 6.96 NA 14.11 NA 7.2 ...
$ oedema.injury.level..1.6. : int 3 3 1 2 1 1 1 2 2 1 ...
$ Stomatal.conductance : num NA 365 NA 422 NA ...
$ spad : num NA NA NA 64.3 NA 65.5 NA 68.7 NA 55.6 ...
$ Irrigation.Events : int NA 14 NA 12 NA 13 NA 16 NA 13 ...
$ WUE : num NA 0.00584 NA 0.00693 NA ...
$ transpiration..g.H2O.lost..g.dry.biomass.: num NA 117 NA 111 NA ...
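One likely cause, sketched on a small hypothetical stand-in for data1: cultivar is created as a standalone object, not as a column, so data1$cultivar is NULL. tapply() then receives an INDEX of length 0 against an X of length 288 and stops with "arguments must have same length". The same applies to data1$treatment, since the column is called trt and the factor was stored in treatmentt. Either store the factors in the data frame, as below, or pass the standalone objects (e.g. INDEX = cultivar) directly.

```r
# Hypothetical toy stand-in for data1 (column names copied from the question)
data1 <- data.frame(
  cv = c("CR", "CR", "LB", "LB"),
  growth.index = c(0.022, NA, 0.027, 0.023),
  stringsAsFactors = FALSE
)

# Creating a standalone object does NOT add a column to data1
cultivar <- factor(data1$cv, levels = c("CR", "LB"))
print(is.null(data1$cultivar))  # TRUE: there is no data1$cultivar column

# INDEX is therefore NULL (length 0) while X has length 4, so tapply() stops
err <- tryCatch(
  tapply(X = data1$growth.index, INDEX = data1$cultivar, FUN = mean, na.rm = TRUE),
  error = function(e) e
)
print(conditionMessage(err))

# Fix: store the factor inside the data frame, then index by that column
data1$cultivar <- factor(data1$cv, levels = c("CR", "LB"))
means <- tapply(X = data1$growth.index, INDEX = data1$cultivar,
                FUN = mean, na.rm = TRUE)
print(means)  # CR 0.022, LB 0.025
```

The NAs themselves are not the problem here: na.rm = TRUE is passed through to mean() and removes them within each group.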

Can we use as.factor to convert categorical variables having multiple levels for decision tree or we need to use model.matrix?

I am trying to build a decision tree model in R with both categorical and numerical variables. Some categorical variables have 3 levels, so can I just use as.factor and then use them in my model? I tried model.matrix, but my concern is that model.matrix converts each variable into numeric 0/1 values, and the splitting then happens on those numeric values. For example, if Color has 3 levels (blue, red, green), a split rule looks like color_green < 0.5, whereas the variable can only ever take the values 0 and 1.

If you are asking whether you can use factors to build an rpart decision tree, then yes. See the example below from the documentation. Note that many other packages also implement decision trees.
library(rpart)
rpart(Reliability ~ ., data=car90)
#> n=76 (35 observations deleted due to missingness)
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 76 53 average (0.2 0.12 0.3 0.11 0.28)
#> 2) Country=Germany,Korea,Mexico,Sweden,USA 49 29 average (0.31 0.18 0.41 0.1 0)
#> 4) Tires=145,155/80,165/80,185/80,195/60,195/65,195/70,205/60,215/65,225/75,275/40 17 9 Much worse (0.47 0.29 0 0.24 0) *
#> 5) Tires=175/70,185/65,185/70,185/75,195/75,205/70,205/75,215/70 32 12 average (0.22 0.12 0.62 0.031 0)
#> 10) HP.revs< 4650 13 7 Much worse (0.46 0.23 0.31 0 0) *
#> 11) HP.revs>=4650 19 3 average (0.053 0.053 0.84 0.053 0) *
#> 3) Country=Japan,Japan/USA 27 6 Much better (0 0 0.11 0.11 0.78) *
str(car90)
#> 'data.frame': 111 obs. of 34 variables:
#> $ Country : Factor w/ 10 levels "Brazil","England",..: 5 5 4 4 4 4 10 10 10 NA ...
#> $ Disp : num 112 163 141 121 152 209 151 231 231 189 ...
#> $ Disp2 : num 1.8 2.7 2.3 2 2.5 3.5 2.5 3.8 3.8 3.1 ...
#> $ Eng.Rev : num 2935 2505 2775 2835 2625 ...
#> $ Front.Hd : num 3.5 2 2.5 4 2 3 4 6 5 5.5 ...
#> $ Frt.Leg.Room: num 41.5 41.5 41.5 42 42 42 42 42 41 41 ...
#> $ Frt.Shld : num 53 55.5 56.5 52.5 52 54.5 56.5 58.5 59 58 ...
#> $ Gear.Ratio : num 3.26 2.95 3.27 3.25 3.02 2.8 NA NA NA NA ...
#> $ Gear2 : num 3.21 3.02 3.25 3.25 2.99 2.85 2.84 1.99 1.99 2.33 ...
#> $ HP : num 130 160 130 108 168 208 110 165 165 101 ...
#> $ HP.revs : num 6000 5900 5500 5300 5800 5700 5200 4800 4800 4400 ...
#> $ Height : num 47.5 50 51.5 50.5 49.5 51 49.5 50.5 51 50.5 ...
#> $ Length : num 177 191 193 176 175 186 189 197 197 192 ...
#> $ Luggage : num 16 14 17 10 12 12 16 16 16 15 ...
#> $ Mileage : num NA 20 NA 27 NA NA 21 NA 23 NA ...
#> $ Model2 : Factor w/ 21 levels ""," Turbo 4 (3)",..: 1 1 1 1 1 1 1 14 13 1 ...
#> $ Price : num 11950 24760 26900 18900 24650 ...
#> $ Rear.Hd : num 1.5 2 3 1 1 2.5 2.5 4.5 3.5 3.5 ...
#> $ Rear.Seating: num 26.5 28.5 31 28 25.5 27 28 30.5 28.5 27.5 ...
#> $ RearShld : num 52 55.5 55 52 51.5 55.5 56 58.5 58.5 56.5 ...
#> $ Reliability : Ord.factor w/ 5 levels "Much worse"<"worse"<..: 5 5 NA NA 4 NA 3 3 3 NA ...
#> $ Rim : Factor w/ 6 levels "R12","R13","R14",..: 3 4 4 3 3 4 3 3 3 3 ...
#> $ Sratio.m : num NA NA NA NA NA NA NA NA NA NA ...
#> $ Sratio.p : num 0.86 0.96 0.97 0.71 0.88 0.78 0.76 0.83 0.87 0.88 ...
#> $ Steering : Factor w/ 3 levels "manual","power",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ Tank : num 13.2 18 21.1 15.9 16.4 21.1 15.7 18 18 16.5 ...
#> $ Tires : Factor w/ 30 levels "145","145/80",..: 16 20 20 8 17 28 13 23 23 22 ...
#> $ Trans1 : Factor w/ 4 levels "","man.4","man.5",..: 3 3 3 3 3 3 1 1 1 1 ...
#> $ Trans2 : Factor w/ 4 levels "","auto.3","auto.4",..: 3 3 2 2 3 3 2 3 3 3 ...
#> $ Turning : num 37 42 39 35 35 39 41 43 42 41 ...
#> $ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 3 1 1 3 3 2 2 NA ...
#> $ Weight : num 2700 3265 2935 2670 2895 ...
#> $ Wheel.base : num 102 109 106 100 101 109 105 111 111 108 ...
#> $ Width : num 67 69 71 67 65 69 69 72 72 71 ...
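A small sketch with a hypothetical 3-level Color factor, to address the model.matrix part directly: passing the factor straight to rpart lets a single split send a whole group of levels down each branch, so no dummy coding is needed. A threshold such as color_green < 0.5 on a 0/1 dummy column is not wrong either; it simply means color_green == 0, but each dummy split can only isolate one level at a time.

```r
library(rpart)  # rpart ships with R as a recommended package

# Hypothetical toy data: a 3-level factor predictor and a numeric response
set.seed(1)  # rpart's cross-validation uses random folds
df <- data.frame(
  Color = factor(rep(c("blue", "red", "green"), each = 20)),
  y = rep(c(0, 0, 1), each = 20)
)

# Pass the factor directly; no model.matrix() dummy coding is needed
fit <- rpart(y ~ Color, data = df)

# The root split is printed as groups of levels (blue,red versus green)
print(fit)
```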

Descriptive stats for MI data in R: Take 3

As an R beginner, I have found it surprisingly difficult to figure out how to compute descriptive statistics on multiply imputed data (more so than running some of the other basic analyses, such as correlations and regressions).
These kinds of questions tend to be prefaced with apologies (Descriptive statistics (Means, StdDevs) using multiply imputed data: R) but have not been answered (https://stats.stackexchange.com/questions/296193/pooling-basic-descriptives-from-several-multiply-imputed-datasets-using-mice) or are quickly down-voted.
Here is the documentation for a miceadds function (https://www.rdocumentation.org/packages/miceadds/versions/2.10-14/topics/stats0), which I find difficult to follow with data stored in the mids format.
I have gotten some output, such as the mean, median, min and max, using summary(complete(imp)), but would love to know how to get additional summary output (e.g., skewness/kurtosis, standard deviation, variance).
Illustration borrowed from a previous poster above:
> imp <- mice(nhanes, seed = 23109)
iter imp variable
1 1 bmi hyp chl
1 2 bmi hyp chl
1 3 bmi hyp chl
1 4 bmi hyp chl
1 5 bmi hyp chl
2 1 bmi hyp chl
2 2 bmi hyp chl
2 3 bmi hyp chl
> summary(complete(imp))
age bmi hyp chl
1:12 Min. :20.40 1:18 Min. :113
2: 7 1st Qu.:24.90 2: 7 1st Qu.:186
3: 6 Median :27.40 Median :199
Mean :27.37 Mean :194
3rd Qu.:30.10 3rd Qu.:218
Max. :35.30 Max. :284
Would someone kindly take the time to illustrate how one might take the mids object to get the basic descriptives?

Below are some steps you can take to better understand what happens to the R objects after each step. I would also recommend looking at this tutorial:
https://gerkovink.github.io/miceVignettes/
library(mice)
# nhanes object is just a simple dataframe:
data(nhanes)
str(nhanes)
#'data.frame': 25 obs. of 4 variables:
# $ age: num 1 2 1 3 1 3 1 1 2 2 ...
#$ bmi: num NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
#$ hyp: num NA 1 1 NA 1 NA 1 1 1 NA ...
#$ chl: num NA 187 187 NA 113 184 118 187 238 NA ...
# you can generate multivariate imputation using mice() function
imp <- mice(nhanes, seed=23109)
#The output variable is an object of class "mids" which you can explore using str() function
str(imp)
# List of 17
# $ call : language mice(data = nhanes)
# $ data :'data.frame': 25 obs. of 4 variables:
# ..$ age: num [1:25] 1 2 1 3 1 3 1 1 2 2 ...
# ..$ bmi: num [1:25] NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
# ..$ hyp: num [1:25] NA 1 1 NA 1 NA 1 1 1 NA ...
# ..$ chl: num [1:25] NA 187 187 NA 113 184 118 187 238 NA ...
# $ m : num 5
# ...
# $ imp :List of 4
#..$ age: NULL
#..$ bmi:'data.frame': 9 obs. of 5 variables:
#.. ..$ 1: num [1:9] 28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
#.. ..$ 2: num [1:9] 27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
#.. ..$ 3: num [1:9] 22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
#.. ..$ 4: num [1:9] 27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
#.. ..$ 5: num [1:9] 28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
#...
#You can extract individual components of this object using $, for example
#To view the actual imputation for bmi column
imp$imp$bmi
# 1 2 3 4 5
# 1 28.7 27.2 22.5 27.2 28.7
# 3 30.1 30.1 30.1 22.0 28.7
# 4 22.7 27.2 20.4 22.7 20.4
# 6 24.9 25.5 22.5 21.7 21.7
# 10 30.1 29.6 27.4 25.5 25.5
# 11 35.3 26.3 22.0 27.2 22.5
# 12 27.5 26.3 26.3 24.9 22.5
# 16 29.6 30.1 27.4 30.1 25.5
# 21 33.2 30.1 35.3 22.0 22.7
# The above output is again just a regular dataframe:
str(imp$imp$bmi)
# 'data.frame': 9 obs. of 5 variables:
# $ 1: num 28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
# $ 2: num 27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
# $ 3: num 22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
# $ 4: num 27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
# $ 5: num 28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
# complete() function returns imputed dataset:
mat <- complete(imp)
# The output of this function is a regular data frame:
str(mat)
# 'data.frame': 25 obs. of 4 variables:
# $ age: num 1 2 1 3 1 3 1 1 2 2 ...
# $ bmi: num 28.7 22.7 30.1 22.7 20.4 24.9 22.5 30.1 22 30.1 ...
# $ hyp: num 1 1 1 2 1 2 1 1 1 1 ...
# $ chl: num 199 187 187 204 113 184 118 187 238 229 ...
# So you can run any descriptive statistics you need with this object
# Just like you would do with a regular dataframe:
summary(mat)
# age bmi hyp chl
# Min. :1.00 Min. :20.40 Min. :1.00 Min. :113.0
# 1st Qu.:1.00 1st Qu.:24.90 1st Qu.:1.00 1st Qu.:187.0
# Median :2.00 Median :27.50 Median :1.00 Median :204.0
# Mean :1.76 Mean :27.48 Mean :1.24 Mean :204.9
# 3rd Qu.:2.00 3rd Qu.:30.10 3rd Qu.:1.00 3rd Qu.:229.0
# Max. :3.00 Max. :35.30 Max. :2.00 Max. :284.0

There are several mistakes in both your code and the answer from Katia, and the link provided by Katia is no longer available.
To compute simple statistics after multiple imputation, you must follow Rubin's rules, the method mice uses when pooling a selected set of model fits.
When using
library(mice)
imp <- mice(nhanes, seed = 23109)
mat <- complete(imp)
mat
age bmi hyp chl
1 1 28.7 1 199
2 2 22.7 1 187
3 1 30.1 1 187
4 3 22.7 2 204
5 1 20.4 1 113
6 3 24.9 2 184
7 1 22.5 1 118
8 1 30.1 1 187
9 2 22.0 1 238
10 2 30.1 1 229
11 1 35.3 1 187
12 2 27.5 1 229
13 3 21.7 1 206
14 2 28.7 2 204
15 1 29.6 1 238
16 1 29.6 1 238
17 3 27.2 2 284
18 2 26.3 2 199
19 1 35.3 1 218
20 3 25.5 2 206
21 1 33.2 1 238
22 1 33.2 1 229
23 1 27.5 1 131
24 3 24.9 1 284
25 2 27.4 1 186
This returns only the first imputed dataset, whereas mice imputed five by default. See ?mice::complete for more information: "The default is action = 1L returns the first imputed data set."
To get all five imputed datasets, you have to specify the action argument of mice::complete:
mat2 <- complete(imp, "long")
mat2
.imp .id age bmi hyp chl
1 1 1 1 28.7 1 199
2 1 2 2 22.7 1 187
3 1 3 1 30.1 1 187
4 1 4 3 22.7 2 204
5 1 5 1 20.4 1 113
6 1 6 3 24.9 2 184
7 1 7 1 22.5 1 118
8 1 8 1 30.1 1 187
9 1 9 2 22.0 1 238
10 1 10 2 30.1 1 229
...
115 5 15 1 29.6 1 187
116 5 16 1 25.5 1 187
117 5 17 3 27.2 2 284
118 5 18 2 26.3 2 199
119 5 19 1 35.3 1 218
120 5 20 3 25.5 2 218
121 5 21 1 22.7 1 238
122 5 22 1 33.2 1 229
123 5 23 1 27.5 1 131
124 5 24 3 24.9 1 186
125 5 25 2 27.4 1 186
Both summary(mat) and summary(mat2) are wrong.
Let's focus on bmi. The first provides the mean bmi over the first imputed dataset only. The second provides the mean of an artificial dataset m times larger than the original. The stacked dataset also has inappropriately low variance.
mean(mat$bmi)
27.484
mean(mat2$bmi)
26.5192
I have not found a better solution than applying Rubin's rules to the mean estimate manually. The correct estimate is simply the mean of the estimates across all imputed datasets:
res <- with(imp, mean(bmi)) #get the mean for each imputed dataset, stored in res$analyses
do.call(sum, res$analyses) / 5 #compute mean over m = 5 mean estimations
26.5192
The variance and standard deviation have to be calculated appropriately as well. You can use Rubin's rules to pool any simple statistic you wish; the procedure is described here: https://bookdown.org/mwheymans/bookmi/rubins-rules.html
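A minimal sketch of Rubin's rules for a mean and its standard error, written as a hypothetical helper pool_mean() (not a mice function) so that it runs without mice. The pooled estimate is the average of the per-imputation means; the within-imputation variance W is the average squared standard error var(x)/n; the between-imputation variance B is the variance of the means; and the total variance is W + (1 + 1/m)B.

```r
# Hypothetical helper (not part of mice): pool the mean of one variable
# across m completed datasets using Rubin's rules
pool_mean <- function(samples) {
  m <- length(samples)
  est <- vapply(samples, mean, numeric(1))                       # per-imputation means
  within <- vapply(samples, function(x) var(x) / length(x), numeric(1))  # squared SE of each mean
  qbar <- mean(est)                  # pooled point estimate
  ubar <- mean(within)               # within-imputation variance W
  b <- var(est)                      # between-imputation variance B
  total <- ubar + (1 + 1 / m) * b    # total variance T = W + (1 + 1/m) B
  c(mean = qbar, se = sqrt(total))
}

# Toy example: three hypothetical "imputed" versions of one column
samples <- list(c(20, 22, 24), c(21, 23, 25), c(22, 24, 26))
res_pooled <- pool_mean(samples)
print(res_pooled)
```

With a mids object, the per-imputation columns could be collected with something like lapply(complete(imp, "all"), function(d) d$bmi) (the "all" action is available in recent mice versions) and passed to pool_mean(). Descriptives such as the SD of the data, rather than the SE of the mean, are often pooled by simply averaging the per-imputation values.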
Hope this helps.

Summary of a Subset in R does not work - Why?

I am doing the Analytics Edge course on edX and ran into this problem. We have a dataset that we are subsetting. Running str() on the subset works as intended; however, trying summary on the same subset throws an error. Can someone explain why?
> str(WHO_Europe)
'data.frame': 53 obs. of 13 variables:
$ Country : Factor w/ 194 levels "Afghanistan",..: 2 4 8 10 11 16 17 22 26 42 ...
$ Region : Factor w/ 6 levels "Africa","Americas",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Population : int 3162 78 2969 8464 9309 9405 11060 3834 7278 4307 ...
$ Under15 : num 21.3 15.2 20.3 14.5 22.2 ...
$ Over60 : num 14.93 22.86 14.06 23.52 8.24 ...
$ FertilityRate : num 1.75 NA 1.74 1.44 1.96 1.47 1.85 1.26 1.51 1.48 ...
$ LifeExpectancy : int 74 82 71 81 71 71 80 76 74 77 ...
$ ChildMortality : num 16.7 3.2 16.4 4 35.2 5.2 4.2 6.7 12.1 4.7 ...
$ CellularSubscribers : num 96.4 75.5 103.6 154.8 108.8 ...
$ LiteracyRate : num NA NA 99.6 NA NA NA NA 97.9 NA 98.8 ...
$ GNI : num 8820 NA 6100 42050 8960 ...
$ PrimarySchoolEnrollmentMale : num NA 78.4 NA NA 85.3 NA 98.9 86.5 99.3 94.8 ...
$ PrimarySchoolEnrollmentFemale: num NA 79.4 NA NA 84.1 NA 99.2 88.4 99.7 97 ...
> Summary(WHO_Europe)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘Summary’ for signature ‘"data.frame"’
> write.csv(WHO_Europe,"WHO_Europe.CSV")
> Summary(WHO_Europe)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘Summary’ for signature ‘"data.frame"’
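The error comes from the capital S: R is case-sensitive, and Summary is a different function from summary. Summary is the group generic behind max(), min(), range(), sum(), prod(), any() and all(), and it has no method for a data.frame. A minimal sketch on a small hypothetical data frame (a few rows resembling the WHO data, not the course file) showing that lowercase summary() works on the subset:

```r
# Hypothetical toy stand-in for the WHO data
who <- data.frame(
  Country = c("Afghanistan", "Albania", "Andorra"),
  Region = factor(c("Eastern Mediterranean", "Europe", "Europe")),
  FertilityRate = c(5.4, 1.75, NA),
  stringsAsFactors = FALSE
)

who_europe <- subset(who, Region == "Europe")
str(who_europe)

# Lowercase summary() dispatches to summary.data.frame()
s <- summary(who_europe)
print(s)

# Capitalised Summary() fails on a data frame, as in the question
err <- tryCatch(Summary(who_europe), error = function(e) e)
print(conditionMessage(err))
```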

Trying to draw a SPC Chart in R

I am trying to create a control chart using the code below, but I am getting the error shown below. The data has a date in the first column, followed by 12 columns with different variables.
library("qcc")
attach(data)
Data_Frame_Data <- as.data.frame.matrix(data)
q <- qcc(Cancer_Activity
, type="xbar"
, nsigmas=3)
Error in sd.xbar(c(1396310400, 1398902400, 1401580800, 1404172800,
1406851200, : group sizes must be larger than one
This is the output when I run str(data):
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 48 obs. of 13 variables:
$ Date : POSIXct, format: "2014-04-01" "2014-05-01" "2014-06-01" "2014-07-01" ...
$ CW_Activity : num 37 29.5 34 46 39.5 41.5 42 40 46 39.5 ...
$ CW_Breach : num 3.5 6 8.5 10 5.5 8 4.5 3 3.5 4 ...
$ ICHT_Activity: num 73.5 89 60 83.5 85 88.5 65.5 80 75.5 74 ...
$ ICHT_Breach : num 8 11.5 11.5 12 11 15 9.5 14 8.5 16.5 ...
$ LNWH_Activity: num 67 76.5 56 79.5 67 83 77.5 67 66 60.5 ...
$ LNWH_Breach : num 10 12.5 13 14 10.5 16 16.5 12 5 13.5 ...
$ THH_Activity : num 30 26 24.5 36 31 25 33 21.5 42 25.5 ...
$ THH_Breach : num 2 3 2 1 5 1.5 3.5 0.5 3.5 3 ...
$ RBH_Activity : num 2.5 5 6.5 7 6.5 7.5 3.5 9 8 6.5 ...
$ RBH_Breach : num 0.5 1 2 2 4 4 1 2 2.5 2 ...
$ NWL_Activity : num 210 226 181 252 229 ...
$ NWL_Breach : num 24 34 37 39 36 44.5 35 31.5 23 39 ...
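With type = "xbar", qcc expects rational subgroups (several observations per sample), so a plain vector of monthly values is treated as groups of size one, which is what the error complains about. The numbers in the error also look like the POSIXct seconds of the Date column (1396310400 is 2014-04-01 UTC), and Cancer_Activity does not appear among the columns in str(data), so the wrong column may be reaching qcc(). For one value per month, an individuals chart is the usual choice; qcc offers type = "xbar.one" for that. The sketch below computes individuals-chart limits in base R with the moving-range method, using the first ten CW_Activity values from str(data) as assumed example input; qcc's limits may differ slightly depending on the sigma estimate it uses.

```r
# First ten CW_Activity values shown in str(data); substitute your own column
x <- c(37, 29.5, 34, 46, 39.5, 41.5, 42, 40, 46, 39.5)

# Moving ranges: absolute differences between consecutive observations
mr <- abs(diff(x))

# Sigma estimate: average moving range / d2, with d2 = 1.128 for n = 2
sigma_hat <- mean(mr) / 1.128

center <- mean(x)
ucl <- center + 3 * sigma_hat  # upper control limit
lcl <- center - 3 * sigma_hat  # lower control limit

# Run chart with the centre line and 3-sigma limits
plot(x, type = "b", ylim = range(c(x, lcl, ucl)),
     xlab = "Month", ylab = "CW_Activity")
abline(h = c(center, lcl, ucl), lty = c(1, 2, 2))

# Indices of any points outside the limits
out_of_control <- which(x > ucl | x < lcl)
print(out_of_control)
```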
