How to split a CHR column, pivot, then combine tables?


So I have two tables:
LST data (24 months in total) (already pivoted_longer)
Buffer Date LST
<chr> <chr> <chr>
1 100 15/01/2010 6.091741043
2 100 16/02/2010 6.405879111
3 100 20/03/2010 8.925945159
4 100 24/04/2011 6.278147269
5 100 07/05/2010 6.133940129
6 100 08/06/2010 7.705591939
7 100 13/07/2011 4.066052173
8 100 11/08/2010 5.962087092
9 100 12/09/2010 5.761892842
10 100 17/10/2011 3.155769317
# ... with 1,550 more rows
Weather data (24 months in total)
Weather variable 15/01/2010 16/02/2010 20/03/2010 24/04/2011 07/05/2010
1 Temperature 12.0 15.0 16.0 23.00 21.50
2 Wind_speed 10.0 9.0 10.5 19.50 9.50
3 Wind_trend 1.0 1.0 1.0 0.00 1.00
4 Wind_direction 22.5 45.0 67.5 191.25 56.25
5 Humidity 40.0 44.5 22.0 24.50 7.00
6 Pressure 1024.0 1018.5 1025.0 1005.50 1015.50
7 Pressure_trend 1.0 1.0 1.0 1.00 1.00
If I pivot the weather data I get:
1 Temperature 15/01/2010 12
2 Temperature 16/02/2010 15
3 Temperature 20/03/2010 16
4 Temperature 24/04/2011 23
5 Temperature 07/05/2010 21.5
6 Temperature 08/06/2010 36.5
7 Temperature 13/07/2011 33
8 Temperature 11/08/2010 34.5
9 Temperature 12/09/2010 33
10 Temperature 17/10/2011 27
# ... with 158 more rows
(each weather variable listed in turn).
I need to combine the LST table with the pivoted weather data, matching on the date and appending the weather columns to each LST row. I think something like data_long <- merge(LST_data, weather_data, by = "Date") should do it.
But I'm stuck.

The solution I found to this was to pivot the weather data (longer):
weather_long <- weather %>% pivot_longer(cols = 2:21, names_to = "Date", values_to = "Value")
which gives a tibble in the format:
# A tibble: 180 x 3
`Weather variable` Date Value
<chr> <chr> <dbl>
1 Temperature 28/10/2016 17
2 Temperature 31/12/2016 22
3 Temperature 16/01/2017 25
4 Temperature 05/03/2017 19
(as described above in the question).
Because the dates come from column names, the resulting 'Date' column is of type character rather than Date:
tibble [180 x 3] (S3: tbl_df/tbl/data.frame)
$ Weather variable: chr [1:180] "Temperature" "Temperature" "Temperature" "Temperature" ...
$ Date : chr [1:180] "28/10/2016" "31/12/2016" "16/01/2017" "05/03/2017" ...
$ Value : num [1:180] 17 22 25 19 20 22 11 10 3 9 ...
I then corrected this:
weather_long$Date <- as.Date(weather_long$Date, format = "%d/%m/%Y")
Next was to convert the weather data to the 'wide' format (in preparation for the next step):
weather_wide <- weather_long %>%
pivot_wider(names_from = "Weather variable", values_from = "Value")
Then join it to the LST data using the Date column as the key:
LST_Weather_dataset <- full_join(data_long, weather_wide, by = "Date")
This produced the desired result:
str(LST_Weather_dataset)
'data.frame': 380 obs. of 16 variables:
$ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ...
$ Date : Date, format: "2016-10-28" "2016-10-28" "2016-10-28" "2016-10-28" ...
$ LST : num 0.918 0.951 0.791 0.748 0.687 ...
$ Month : num 10 10 10 10 10 10 10 10 10 10 ...
$ Year : num 2016 2016 2016 2016 2016 ...
$ JulianDay : num 302 302 302 302 302 302 302 302 302 302 ...
$ TimePeriod : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ Temperature : num 17 17 17 17 17 17 17 17 17 17 ...
$ Humidity : num 59 59 59 59 59 59 59 59 59 59 ...
$ Humidity_trend: num 1 1 1 1 1 1 1 1 1 1 ...
$ Wind_speed : num 19 19 19 19 19 19 19 19 19 19 ...
$ Wind_gust : num 0 0 0 0 0 0 0 0 0 0 ...
$ Wind_trend : num 2 2 2 2 2 2 2 2 2 2 ...
$ Wind_direction: num 338 338 338 338 338 ...
$ Pressure : num 1017 1017 1017 1017 1017 ...
$ Pressure_trend: num 2 2 2 2 2 2 2 2 2 2 ...
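The steps above can be condensed into one pipeline. The following is a minimal sketch on toy data: the two small tibbles are hypothetical stand-ins for the real tables, and it assumes the LST table also needs its character columns converted before the join (otherwise a character Date would not match a Date-class key). It uses left_join so that only LST rows are kept; full_join, as in the answer, would also keep weather dates with no LST rows.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the two tables (values are illustrative only)
lst <- tibble(
  Buffer = c("100", "200", "100", "200"),
  Date   = c("15/01/2010", "15/01/2010", "16/02/2010", "16/02/2010"),
  LST    = c("6.09", "6.41", "8.93", "6.28")
)
weather <- tibble(
  `Weather variable` = c("Temperature", "Humidity"),
  `15/01/2010` = c(12, 40),
  `16/02/2010` = c(15, 44.5)
)

# Long -> parse dates -> wide: one row per date, one column per weather variable
weather_wide <- weather %>%
  pivot_longer(cols = -`Weather variable`, names_to = "Date", values_to = "Value") %>%
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  pivot_wider(names_from = `Weather variable`, values_from = Value)

# Convert the character columns of the LST table to usable types
lst_clean <- lst %>%
  mutate(
    Buffer = as.integer(Buffer),
    Date   = as.Date(Date, format = "%d/%m/%Y"),
    LST    = as.numeric(LST)
  )

# Each LST row picks up the weather values for its date
combined <- left_join(lst_clean, weather_wide, by = "Date")
print(combined)
```

Selecting the date columns with cols = -`Weather variable` avoids hard-coding positions such as 2:21, which would need changing whenever the number of dates changes.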

Related

Error in prune.tree(tree = dt, best = pruned_tree_size, method = "misclass") : misclass only for classification trees

'data.frame': 33510 obs. of 10 variables:
$ model : Factor w/ 92 levels " 1 Series"," 2 Series",..: 3 54 25 72 19 16 37 41 29 29 ...
$ year : int 2009 2019 2014 2016 2016 2017 2019 2017 2019 2015 ...
$ price : int 4675 40950 11472 17998 14399 9980 37990 14000 12299 8484 ...
$ transmission: Factor w/ 3 levels "Automatic","Manual",..: 2 3 3 2 2 2 1 2 2 2 ...
$ mileage : int 70000 19322 83417 30010 45693 70860 1499 20122 4132 25000 ...
$ fuelType : Factor w/ 4 levels "Diesel","Electric",..: 4 4 1 1 1 4 1 1 4 4 ...
$ tax : int 165 150 145 235 20 30 145 30 145 0 ...
$ mpg : num 47.9 34 54.3 44.1 65.7 55.4 40.9 64.2 48.7 65.7 ...
$ engineSize : num 2 3 2.1 2 2.1 1 2 1.5 1.1 1 ...
$ automaker : Factor w/ 4 levels "BMW","Ford","Mercedes",..: 1 1 3 2 3 2 3 2 2 2 ...
mycars_formula = price ~ year + transmission + mileage + fuelType + tax + mpg + engineSize + automaker
dt_mycars <- tree(mycars_formula, data = training_mycars)
cv_mycars <- cv.tree(dt_mycars, FUN=prune.misclass)
pruned_tree_size <- rev(cv_mycars$size)[which.min(rev(cv_mycars$dev))]
p_dt_mycars <- prune.misclass(dt_mycars, best = pruned_tree_size)
Error in prune.tree(tree = dt, best = pruned_tree_size, method = "misclass") :
misclass only for classification trees
Can someone explain to me why I cannot use misclass method?
I know that my factor model has too many levels, so I excluded it from my formula. If you have a suggestion about how I can include it as well, it would be very helpful.
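A likely explanation, offered as a sketch rather than a confirmed diagnosis: price is an integer column, so tree() fits a regression tree, and misclassification-based pruning is only defined for classification trees (a factor response). For a regression tree, cross-validation and pruning can use the default deviance criterion via prune.tree instead. The example below uses MASS::Boston as a stand-in data set, because the car data from the question is not available here.

```r
library(tree)

# Stand-in data: medv is numeric, so tree() builds a regression tree
set.seed(1)
fit <- tree(medv ~ ., data = MASS::Boston)

# Cross-validate with the deviance-based pruner (the default for cv.tree)
cv_fit <- cv.tree(fit, FUN = prune.tree)

# Smallest tree size that achieves the minimum cross-validated deviance
best_size <- rev(cv_fit$size)[which.min(rev(cv_fit$dev))]

# Prune on deviance; prune.misclass would fail here as in the question
pruned <- prune.tree(fit, best = best_size)
print(best_size)
```

If a classification tree is what is actually wanted, the response would have to be a factor, for example price cut into bands, which is a modelling decision rather than a code fix.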

Can we use as.factor to convert categorical variables having multiple levels for decision tree or we need to use model.matrix?

I am trying to build a decision tree model in R with both categorical and numerical variables. Some categorical variables have 3 levels, so can I just use as.factor and then use them in my model? I tried model.matrix, but my concern is that it converts the variable into numeric 0/1 columns and the splits are then made on those numeric values. For example, if Color has 3 levels (blue, red, green), the splitting rule will look like color_green < 0.5, even though the column only ever takes the values 0 and 1.

If you are asking whether you can use factors to build an rpart decision tree, then yes. See the example below from the documentation. Note that there are many possible packages for decision trees.
library(rpart)
rpart(Reliability ~ ., data=car90)
#> n=76 (35 observations deleted due to missingness)
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 76 53 average (0.2 0.12 0.3 0.11 0.28)
#> 2) Country=Germany,Korea,Mexico,Sweden,USA 49 29 average (0.31 0.18 0.41 0.1 0)
#> 4) Tires=145,155/80,165/80,185/80,195/60,195/65,195/70,205/60,215/65,225/75,275/40 17 9 Much worse (0.47 0.29 0 0.24 0) *
#> 5) Tires=175/70,185/65,185/70,185/75,195/75,205/70,205/75,215/70 32 12 average (0.22 0.12 0.62 0.031 0)
#> 10) HP.revs< 4650 13 7 Much worse (0.46 0.23 0.31 0 0) *
#> 11) HP.revs>=4650 19 3 average (0.053 0.053 0.84 0.053 0) *
#> 3) Country=Japan,Japan/USA 27 6 Much better (0 0 0.11 0.11 0.78) *
str(car90)
#> 'data.frame': 111 obs. of 34 variables:
#> $ Country : Factor w/ 10 levels "Brazil","England",..: 5 5 4 4 4 4 10 10 10 NA ...
#> $ Disp : num 112 163 141 121 152 209 151 231 231 189 ...
#> $ Disp2 : num 1.8 2.7 2.3 2 2.5 3.5 2.5 3.8 3.8 3.1 ...
#> $ Eng.Rev : num 2935 2505 2775 2835 2625 ...
#> $ Front.Hd : num 3.5 2 2.5 4 2 3 4 6 5 5.5 ...
#> $ Frt.Leg.Room: num 41.5 41.5 41.5 42 42 42 42 42 41 41 ...
#> $ Frt.Shld : num 53 55.5 56.5 52.5 52 54.5 56.5 58.5 59 58 ...
#> $ Gear.Ratio : num 3.26 2.95 3.27 3.25 3.02 2.8 NA NA NA NA ...
#> $ Gear2 : num 3.21 3.02 3.25 3.25 2.99 2.85 2.84 1.99 1.99 2.33 ...
#> $ HP : num 130 160 130 108 168 208 110 165 165 101 ...
#> $ HP.revs : num 6000 5900 5500 5300 5800 5700 5200 4800 4800 4400 ...
#> $ Height : num 47.5 50 51.5 50.5 49.5 51 49.5 50.5 51 50.5 ...
#> $ Length : num 177 191 193 176 175 186 189 197 197 192 ...
#> $ Luggage : num 16 14 17 10 12 12 16 16 16 15 ...
#> $ Mileage : num NA 20 NA 27 NA NA 21 NA 23 NA ...
#> $ Model2 : Factor w/ 21 levels ""," Turbo 4 (3)",..: 1 1 1 1 1 1 1 14 13 1 ...
#> $ Price : num 11950 24760 26900 18900 24650 ...
#> $ Rear.Hd : num 1.5 2 3 1 1 2.5 2.5 4.5 3.5 3.5 ...
#> $ Rear.Seating: num 26.5 28.5 31 28 25.5 27 28 30.5 28.5 27.5 ...
#> $ RearShld : num 52 55.5 55 52 51.5 55.5 56 58.5 58.5 56.5 ...
#> $ Reliability : Ord.factor w/ 5 levels "Much worse"<"worse"<..: 5 5 NA NA 4 NA 3 3 3 NA ...
#> $ Rim : Factor w/ 6 levels "R12","R13","R14",..: 3 4 4 3 3 4 3 3 3 3 ...
#> $ Sratio.m : num NA NA NA NA NA NA NA NA NA NA ...
#> $ Sratio.p : num 0.86 0.96 0.97 0.71 0.88 0.78 0.76 0.83 0.87 0.88 ...
#> $ Steering : Factor w/ 3 levels "manual","power",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ Tank : num 13.2 18 21.1 15.9 16.4 21.1 15.7 18 18 16.5 ...
#> $ Tires : Factor w/ 30 levels "145","145/80",..: 16 20 20 8 17 28 13 23 23 22 ...
#> $ Trans1 : Factor w/ 4 levels "","man.4","man.5",..: 3 3 3 3 3 3 1 1 1 1 ...
#> $ Trans2 : Factor w/ 4 levels "","auto.3","auto.4",..: 3 3 2 2 3 3 2 3 3 3 ...
#> $ Turning : num 37 42 39 35 35 39 41 43 42 41 ...
#> $ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 3 1 1 3 3 2 2 NA ...
#> $ Weight : num 2700 3265 2935 2670 2895 ...
#> $ Wheel.base : num 102 109 106 100 101 109 105 111 111 108 ...
#> $ Width : num 67 69 71 67 65 69 69 72 72 71 ...
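To connect this back to the three-level Color example in the question, here is a small sketch on made-up data (the toy data frame and its column names are hypothetical). It shows that with a factor predictor rpart splits on sets of levels rather than on a numeric threshold such as color_green < 0.5.

```r
library(rpart)

set.seed(1)
# Hypothetical data: a 3-level factor and one numeric predictor
toy <- data.frame(
  Color = factor(sample(c("blue", "red", "green"), 200, replace = TRUE)),
  Size  = runif(200)
)
# The label depends only on Color, so the tree should split on it
toy$Label <- factor(ifelse(toy$Color == "green", "yes", "no"))

fit <- rpart(Label ~ Color + Size, data = toy, method = "class")

# The printed rule lists factor levels on each side of the split
print(fit)
```

No dummy coding with model.matrix is needed for this; as.factor on the categorical columns is enough for rpart.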

List being added to a dataframe

Why is a list being added to my dataframe here?
Here's my dataframe
df <- data.frame(ch = rep(1:10, each = 12), # care home id
year_id = rep(2018),
month_id = rep(1:12), # month using the system over the course of a year (1 = first month, 2 = second month...etc.)
totaladministrations = rbinom(n=120, size = 1000, prob = 0.6), # administrations that were scheduled to have been given in the month
missed = rbinom(n=120, size = 20, prob = 0.8), # administrations that weren't given in the month (these are bad!)
beds = rep(rbinom(n = 10, size = 60, prob = 0.6), each = 12), # number of beds in the care home
rating = rep(rbinom(n= 10, size = 4, prob = 0.5), each = 12)) # latest inspection rating (1. Inadequate, 2. Requires Improving, 3. Good, 4 Outstanding)
df <- arrange(df, df$ch, df$year_id, df$month_id)
str(df)
'data.frame': 120 obs. of 7 variables:
$ ch : int 1 1 1 1 1 1 1 1 1 1 ...
$ year_id : num 2018 2018 2018 2018 2018 ...
$ month_id : int 1 2 3 4 5 6 7 8 9 10 ...
$ totaladministrations: int 576 598 608 576 608 637 611 613 593 626 ...
$ missed : int 18 18 19 16 16 13 17 16 15 17 ...
$ beds : int 38 38 38 38 38 38 38 38 38 38 ...
$ rating : int 2 2 2 2 2 2 2 2 2 2 ...
All good so far.
I just want to add another column that sequences the month number within the ch group (this equates to the actual month_id in this example but ignore that, my real life data is different), so I'm using:
df <- df %>% group_by(ch) %>%
mutate(sequential_month_counter = 1:n())
This appears to add a bunch of stuff I don't really understand, want, or need, such as a list ...
str(df)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 120 obs. of 8 variables:
$ ch : int 1 1 1 1 1 1 1 1 1 1 ...
$ year_id : num 2018 2018 2018 2018 2018 ...
$ month_id : int 1 2 3 4 5 6 7 8 9 10 ...
$ totaladministrations : int 601 590 593 599 615 611 628 587 604 600 ...
$ missed : int 16 14 17 16 18 16 15 18 15 20 ...
$ beds : int 35 35 35 35 35 35 35 35 35 35 ...
$ rating : int 3 3 3 3 3 3 3 3 3 3 ...
$ sequential_month_counter: int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "groups")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 2 variables:
..$ ch : int 1 2 3 4 5 6 7 8 9 10
..$ .rows:List of 10
.. ..$ : int 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ : int 13 14 15 16 17 18 19 20 21 22 ...
.. ..$ : int 25 26 27 28 29 30 31 32 33 34 ...
.. ..$ : int 37 38 39 40 41 42 43 44 45 46 ...
.. ..$ : int 49 50 51 52 53 54 55 56 57 58 ...
.. ..$ : int 61 62 63 64 65 66 67 68 69 70 ...
.. ..$ : int 73 74 75 76 77 78 79 80 81 82 ...
.. ..$ : int 85 86 87 88 89 90 91 92 93 94 ...
.. ..$ : int 97 98 99 100 101 102 103 104 105 106 ...
.. ..$ : int 109 110 111 112 113 114 115 116 117 118 ...
..- attr(*, ".drop")= logi TRUE
What's going on here? I just want a dataframe. Why is there all that additional output after $ sequential_month_counter: int 1 2 3 4 5 6 7 8 9 10 ... and, more importantly, can I ignore it and just keep treating it as a normal dataframe (I'll be running some generalised linear mixed models on the df)?

The attribute "groups" is where dplyr stores the grouping information added when you did group_by(ch). It doesn't hurt anything, and it will disappear if you ungroup():
df %>% group_by(ch) %>%
mutate(sequential_month_counter = 1:n()) %>%
ungroup %>%
str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 120 obs. of 8 variables:
# $ ch : int 1 1 1 1 1 1 1 1 1 1 ...
# $ year_id : num 2018 2018 2018 2018 2018 ...
# $ month_id : int 1 2 3 4 5 6 7 8 9 10 ...
# $ totaladministrations : int 575 597 579 605 582 599 577 604 630 632 ...
# $ missed : int 18 16 16 18 18 11 10 13 17 16 ...
# $ beds : int 33 33 33 33 33 33 33 33 33 33 ...
# $ rating : int 3 3 3 3 3 3 3 3 3 3 ...
# $ sequential_month_counter: int 1 2 3 4 5 6 7 8 9 10 ...
As a side-note, you should use bare column names inside dplyr verbs, not data$column. With arrange, it doesn't much matter, but in grouped operations it will cause bugs. You should get in the habit of using arrange(df, ch, year_id, month_id) instead of arrange(df, df$ch, df$year_id, df$month_id).
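Putting both points together, a minimal sketch on a cut-down, hypothetical version of the data frame: bare column names in arrange, row_number() as an alternative to 1:n(), then ungroup() and as.data.frame() to get back a plain data frame with no grouping attribute.

```r
library(dplyr)

# Cut-down stand-in for the care home data: 3 homes, 4 months each
df <- data.frame(
  ch       = rep(1:3, each = 4),
  month_id = rep(1:4, times = 3)
)

out <- df %>%
  arrange(ch, month_id) %>%                          # bare column names
  group_by(ch) %>%
  mutate(sequential_month_counter = row_number()) %>%
  ungroup() %>%                                      # drops the "groups" attribute
  as.data.frame()                                    # drops the tibble classes

str(out)
```

The as.data.frame() step is optional: an ungrouped tibble already behaves like a data frame in most modelling functions.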

R: Subsetting returns "0 obs."

I'm trying to subset my dataset 'eggdat' for daytime and nighttime hours. This:
'data.frame': 54847 obs. of 10 variables:
$ year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
$ month : int 7 7 7 7 7 7 7 7 7 7 ...
$ day : int 31 31 31 31 31 31 31 31 31 31 ...
$ hour : int 20 20 20 20 20 20 20 20 20 20 ...
$ minute: int 5 5 5 5 5 5 5 5 5 5 ...
$ second: int 0 1 2 3 4 5 6 7 8 9 ...
$ Roll : num -159 179 -164 -155 -137 ...
$ Pitch : num -31.36 -41.05 -23.85 -6.62 -9.13 ...
$ Yaw : num -71.8 -113.3 -67.2 -140.2 -78.2 ...
$ temp1 : num 25 33.5 34 34 34 34 34 34 34 34 ...
Subsetting for daytime works fine:
daytime <- eggdat[eggdat$hour >= 7 & eggdat$hour <= 20, ]
'data.frame': 18847 obs. of 10 variables:
$ year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
$ month : int 7 7 7 7 7 7 7 7 7 7 ...
$ day : int 31 31 31 31 31 31 31 31 31 31 ...
$ hour : int 20 20 20 20 20 20 20 20 20 20 ...
$ minute: int 5 5 5 5 5 5 5 5 5 5 ...
$ second: int 0 1 2 3 4 5 6 7 8 9 ...
$ Roll : num -159 179 -164 -155 -137 ...
$ Pitch : num -31.36 -41.05 -23.85 -6.62 -9.13 ...
$ Yaw : num -71.8 -113.3 -67.2 -140.2 -78.2 ...
$ temp1 : num 25 33.5 34 34 34 34 34 34 34 34 ...
Doing exactly the same thing for nighttime, however, returns a subset with 0 observations:
nighttime <- eggdat[eggdat$hour <= 7 & eggdat$hour >= 21, ]
'data.frame': 0 obs. of 10 variables:
$ year : int
$ month : int
$ day : int
$ hour : int
$ minute: int
$ second: int
$ Roll : num
$ Pitch : num
$ Yaw : num
$ temp1 : num
I really don't know what to do. I tried using subset, but without success. I also tried eggdat$hour <- as.factor(eggdat$hour), but couldn't get that to work either.
Even more confusingly, adding the quotation marks in the subset function (daytime <- eggdat[eggdat$hour >= '7' & eggdat$hour <= '20', ] and nighttime <- eggdat[eggdat$hour <= '7' & eggdat$hour >= '21', ]) resulted in the daytime subset containing '0 obs.', but the nighttime subset working fine, so it's just the other way around!
Daytime: 'data.frame': 0 obs. of 10 variables:
Nighttime:
'data.frame': 28800 obs. of 10 variables:
$ year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
$ month : int 7 7 7 7 7 7 7 7 7 7 ...
$ day : int 31 31 31 31 31 31 31 31 31 31 ...
$ hour : int 21 21 21 21 21 21 21 21 21 21 ...
$ minute: int 0 0 0 0 0 0 0 0 0 0 ...
$ second: int 0 1 2 3 4 5 6 7 8 9 ...
$ Roll : num 65.8 65.8 66.1 65.6 65.6 ...
$ Pitch : num 6.35 6.34 6.24 6.4 6.27 ...
$ Yaw : num 171 172 174 176 176 ...
$ temp1 : num 41.5 41.5 41.5 41.5 41.5 41.5 41.5 41.5 41.5 41.5 ...
I really don't know what to do; I'm very confused by all of this.

You want eggdat[eggdat$hour <= 7 | eggdat$hour >= 21, ]
x <= 7 & x >= 21 means x is at most 7 AND at least 21. No value can satisfy both conditions, so the subset is empty.
x <= 7 | x >= 21 means x is at most 7 OR at least 21, which is what you want for nighttime.
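A short sketch with a hypothetical vector of hours illustrates the difference, and also why adding quotation marks flipped the results: when one side of a comparison is a character string, R converts the number to a string and compares text, so "21" sorts before "7".

```r
# Hypothetical sample of hour values
hour <- c(0, 6, 7, 12, 20, 21, 23)

# AND: nothing is both <= 7 and >= 21
night_wrong <- hour[hour <= 7 & hour >= 21]

# OR: early morning or late evening (using < 7 so that hour 7
# is not counted as both day and night)
night_right <- hour[hour < 7 | hour >= 21]

# Quoted bounds force a string comparison: "21" >= "7" is FALSE
# because the character "2" sorts before "7"
quoted_cmp <- 21 >= "7"

print(night_wrong)
print(night_right)
print(quoted_cmp)
```

Keeping hour as an integer and the bounds unquoted avoids the string comparison entirely.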

Carc data from rda file to numeric matrix

I am trying to run KDA (kernel discriminant analysis) on the carc data, but when I call X<-data.frame(scale(X)); R shows this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
I tried as.numeric(as.matrix(carc)) and carc<-na.omit(carc), but neither helps.
install.packages("klaR")
install.packages("FSelector")
library(ks);library(MASS);library(klaR);library(FSelector)
attach("carc.rda")
data<-load("carc.rda")
data
carc<-na.omit(carc)
head(carc)
class(carc) # check for its class
class(as.matrix(carc)) # change class, and
as.numeric(as.matrix(carc))
XX<-carc
X<-XX[,1:12];X.class<-XX[,13];
X<-data.frame(scale(X));
fit.pc<-princomp(X,scores=TRUE);
plot(fit.pc,type="line")
X.new<-fit.pc$scores[,1:5]; X.new<-data.frame(X.new);
cfs(X.class~.,cbind(X.new,X.class))
X.new<-fit.pc$scores[,c(1,4)]; X.new<-data.frame(X.new);
fit.kda1<-Hkda(x=X.new,x.group=X.class,pilot="samse",
bw="plugin",pre="sphere")
kda.fit1 <- kda(x=X.new, x.group=X.class, Hs=fit.kda1)
Can you help to resolve this problem and make this analysis?
Added: the car data set (Chambers, Cleveland, Kleiner & Tukey 1983)
> head(carc)
P M R78 R77 H R Tr W L T D G C
AMC_Concord 4099 22 3 2 2.5 27.5 11 2930 186 40 121 3.58 US
AMC_Pacer 4749 17 3 1 3.0 25.5 11 3350 173 40 258 2.53 US
AMC_Spirit 3799 22 . . 3.0 18.5 12 2640 168 35 121 3.08 US
Audi_5000 9690 17 5 2 3.0 27.0 15 2830 189 37 131 3.20 Europe
Audi_Fox 6295 23 3 3 2.5 28.0 11 2070 174 36 97 3.70 Europe

Here is a small dataset with similar characteristics to what you describe
in order to answer this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
carc <- data.frame(type1=rep(c('1','2'), each=5),
type2=rep(c('5','6'), each=5),
x = rnorm(10,1,2)/10, y = rnorm(10),
stringsAsFactors = TRUE) # needed in R >= 4.0 to get factor columns
This should be similar to your data.frame
str(carc)
# 'data.frame': 10 obs. of 4 variables:
# $ type1: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
# $ type2: Factor w/ 2 levels "5","6": 1 1 1 1 1 2 2 2 2 2
# $ x : num -0.1177 0.3443 0.1351 0.0443 0.4702 ...
# $ y : num -0.355 0.149 -0.208 -1.202 -1.495 ...
scale(carc)
# Similar error
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Using set()
require(data.table)
DT <- data.table(carc)
cols_fix <- c("type1", "type2")
for (col in cols_fix) set(DT, j=col, value = as.numeric(as.character(DT[[col]])))
str(DT)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ type1: num 1 1 1 1 1 2 2 2 2 2
# $ type2: num 5 5 5 5 5 6 6 6 6 6
# $ x : num 0.0465 0.1712 0.1582 0.1684 0.1183 ...
# $ y : num 0.155 -0.977 -0.291 -0.766 -1.02 ...
# - attr(*, ".internal.selfref")=<externalptr>

The first column(s) of your data set may be factors. Taking the data from corrgram:
library(corrgram)
carc <- auto
str(carc)
# 'data.frame': 74 obs. of 14 variables:
# $ Model : Factor w/ 74 levels "AMC Concord ",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Origin: Factor w/ 3 levels "A","E","J": 1 1 1 2 2 2 1 1 1 1 ...
# $ Price : int 4099 4749 3799 9690 6295 9735 4816 7827 5788 4453 ...
# $ MPG : int 22 17 22 17 23 25 20 15 18 26 ...
# $ Rep78 : num 3 3 NA 5 3 4 3 4 3 NA ...
# $ Rep77 : num 2 1 NA 2 3 4 3 4 4 NA ...
# $ Hroom : num 2.5 3 3 3 2.5 2.5 4.5 4 4 3 ...
# $ Rseat : num 27.5 25.5 18.5 27 28 26 29 31.5 30.5 24 ...
# $ Trunk : int 11 11 12 15 11 12 16 20 21 10 ...
# $ Weight: int 2930 3350 2640 2830 2070 2650 3250 4080 3670 2230 ...
# $ Length: int 186 173 168 189 174 177 196 222 218 170 ...
# $ Turn : int 40 40 35 37 36 34 40 43 43 34 ...
# $ Displa: int 121 258 121 131 97 121 196 350 231 304 ...
# $ Gratio: num 3.58 2.53 3.08 3.2 3.7 3.64 2.93 2.41 2.73 2.87 ...
So exclude them by trying this:
X<-XX[,3:14]
or this
X<-XX[,-(1:2)]
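As an alternative to selecting columns by position, the numeric columns can be picked by type. The sketch below uses a small hypothetical data frame shaped like the head(carc) output in the question, and it assumes the "." entries in R78 and R77 are missing-value markers that caused those columns to be read as factors; that is an assumption, since the original file is not available here.

```r
# Hypothetical stand-in shaped like head(carc): "." marks a missing value
carc <- data.frame(
  P   = c(4099, 4749, 3799),
  R78 = c("3", ".", "5"),
  C   = c("US", "US", "Europe"),
  stringsAsFactors = TRUE
)

# Turn "." into NA, then convert the factor to numeric via character
carc$R78 <- as.numeric(replace(as.character(carc$R78), carc$R78 == ".", NA))

# Keep only numeric columns before scaling; the class label C stays out
is_num <- vapply(carc, is.numeric, logical(1))
X <- scale(carc[, is_num, drop = FALSE])

print(X)
```

Going through as.character before as.numeric matters: calling as.numeric directly on a factor returns the internal level codes, not the values.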
