I would like to split a data table into several pieces at given cutoff keywords.
I would prefer flexible subsetting (i.e. the cutoffs vary depending on user input). In this case, two cutoff keywords result in three output tables. Thanks!
keyword_cutoff <- c("Merc 240D", "Fiat 128")
data input:
library(data.table)
tmp_mtcars <- setDT(mtcars, keep.rownames =TRUE)[]
colnames(tmp_mtcars)[1] <- "cartype"
desired output table #1:
cartype mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
desired output table #2:
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
desired output table #3:
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
We need a grouping variable to split on. The test cartype %in% keyword_cutoff gives a logical vector that is TRUE at each cutoff row. Its cumulative sum increases by one at every cutoff, so it works as a group index, and splitting on that index returns a list of data.tables:
lst <- split(tmp_mtcars, tmp_mtcars[, cumsum(cartype %in% keyword_cutoff)])
lst[[1]]
# cartype mpg cyl disp hp drat wt qsec vs am gear carb
#1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#4: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#5: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#6: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#7: Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
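To make the grouping step easier to follow, here is a minimal base R sketch on toy data. The data frame `cars`, the keywords `CUT1`/`CUT2` and the other variable names are made up for illustration and are not part of the question.

```r
# Hypothetical toy data, not taken from the question
cars <- data.frame(cartype = c("a", "b", "CUT1", "c", "CUT2", "d"),
                   value = 1:6,
                   stringsAsFactors = FALSE)
cutoffs <- c("CUT1", "CUT2")

is_cut <- cars$cartype %in% cutoffs   # TRUE at each cutoff row
grp <- cumsum(is_cut)                 # 0 0 1 1 2 2

pieces <- split(cars, grp)            # list of 3 data frames named "0", "1", "2"
pieces[["1"]]$cartype                 # "CUT1" "c"
```

Each cutoff row starts a new piece, so a keyword that does not occur in the data simply produces one piece fewer, and there is no piece "0" when the first row is itself a cutoff.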
I want to know how to use ifelse() statements to get different kinds of objects in R.
Specifically, I want to do this with the mtcars data, which is a list type object. I used the code below to create a character object version of the name of the data, called mtcars__string_object.
mtcars__string_object <- c("mtcars")
For the object itself, here is the code I want to use to get this data via an ifelse() statement, which does not work:
test_1 <-
ifelse(
((typeof(mtcars) == "list") == TRUE),
(mtcars),
NA
)
It does not work because it doesn't give me the dataset in its original form. Here are the results:
> test_1
[[1]]
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
When I just use the name of the object, here is the code I want to use to get this data via an ifelse() statement, which does not work:
test_2 <-
ifelse(
((typeof(mtcars) == "character") == TRUE),
(get(mtcars__string_object[1])),
NA
)
It does not work because it doesn't give me the dataset in its original form. Here are the results:
> test_2
[1] NA
I'm not sure how to fix the code to get the desired result. Please advise. Thanks.
Here is the code that I used for the example:
# StackOverflow materials
## sets up data
### name of object
mtcars
### text object with name of dataset
mtcars__string_object <- c("mtcars")
## checks typeof() of objects
typeof(mtcars)
typeof(mtcars__string_object)
## ifelse() statements to call data
### for object itself
# ---- NOTE: creates object
test_1 <-
ifelse(
((typeof(mtcars) == "list") == TRUE),
(mtcars),
NA
)
# ---- NOTE: displays object
test_1
# ---- NOTE: does not work
### for text object with name of dataset
# ---- NOTE: creates object
test_2 <-
ifelse(
((typeof(mtcars) == "character") == TRUE),
(get(mtcars__string_object[1])),
NA
)
# ---- NOTE: displays object
test_2
# ---- NOTE: does not work
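ifelse() is vectorised: it returns a value with the same length as its test argument. Here the test has length 1, so only the first element of mtcars (the mpg column) comes back. For a single condition we need if/else instead: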
if(is.data.frame(mtcars) ) get(mtcars__string_object) else NA
-output
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
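The difference between the two constructs can be seen on a small list. This is only an illustrative sketch; the list `x` is a hypothetical stand-in for a data frame such as mtcars.

```r
x <- list(a = 1:3, b = letters[1:3])   # hypothetical stand-in for a data frame

# ifelse() returns something shaped like its test (length 1 here),
# so only the first element of `yes` survives
res_ifelse <- ifelse(TRUE, x, NA)
length(res_ifelse)        # 1
res_ifelse[[1]]           # 1 2 3

# if/else evaluates one branch and returns it unchanged
res_if <- if (TRUE) x else NA
identical(res_if, x)      # TRUE
```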
If you really want to use ifelse, below is a workaround using list
> ifelse(is.data.frame(mtcars), list(get(mtcars__string_object)), NA)[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
That said, if ... else ... is the more appropriate construct for your case, as @akrun's answer shows.
Use `if` as a function:
test_3 <-
`if`(
((typeof(mtcars) == "list") == TRUE),
(mtcars),
NA
)
test_3
You will get the expected result:
> test_3
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
I was playing around with subsetting rows from a dataframe in R. The following code selects only those rows with a cyl value of 4 or 6 from mtcars:
> mtcars[mtcars$cyl %in% c(4, 6), ]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
So far so good. Then, just for fun and because I wasn't sure what it would do (I thought it didn't make sense), I substituted == for %in%:
> mtcars[mtcars$cyl == c(4, 6), ]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Note how this returns some but not all rows with a cyl value of 4 or 6. I wasn't sure what to expect in the first place, but now I'm left wondering: why does this return a subset of rows with a cyl value of 4 or 6, and what is the logic? (That is, why does it return only these specific rows?)
I'm including the full dataframe for reference.
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
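The rows returned above are consistent with R's recycling rule: == compares element by element, and the shorter vector c(4, 6) is repeated to the length of the column, so odd-numbered rows are compared with 4 and even-numbered rows with 6. A small sketch using the first six cyl values from the table (the variable names are chosen here for illustration):

```r
cyl <- c(6, 6, 4, 6, 8, 6)          # first six cyl values of mtcars
recycled <- rep(c(4, 6), length.out = length(cyl))   # 4 6 4 6 4 6

cyl == c(4, 6)                      # FALSE TRUE TRUE TRUE FALSE TRUE
identical(cyl == c(4, 6), cyl == recycled)   # TRUE

cyl %in% c(4, 6)                    # TRUE TRUE TRUE TRUE FALSE TRUE
```

This is why Mazda RX4 (row 1, cyl 6, compared with 4) is dropped while Mazda RX4 Wag (row 2, cyl 6, compared with 6) is kept. %in% tests membership in the whole set instead, so it does not depend on row position.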
I understand how the apply function should be used, but I am not sure how to combine it with an if statement. Here is my attempt; can someone please push me in the right direction?
data <- mtcars
apply(data, 1, function(x) {
if (data$mpg < 20) {
data$colour <- "blue"
} else {
data$colour <- "red"
}
})
I just want to add a column to data whose value in each row depends on which range data$mpg falls into.
You can do this with a vectorized call, which is preferred in R because it avoids an explicit loop over rows and is faster:
data <- mtcars
data$colour <- ifelse(data$mpg < 20, "blue", "red")
This yields the following data.frame:
mpg cyl disp hp drat wt qsec vs am gear carb colour
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 red
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 red
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 red
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 red
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 blue
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 blue
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 blue
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 red
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 red
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 blue
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 blue
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 blue
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 blue
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 blue
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 blue
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 blue
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 blue
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 red
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 red
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 red
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 red
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 blue
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 blue
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 blue
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 blue
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 red
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 red
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 red
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 blue
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 blue
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 blue
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 red
A base R option using within():
head(within(mtcars,{
my_col <-ifelse(mpg < 20, "blue", "red")
}),3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
my_col
Mazda RX4 red
Mazda RX4 Wag red
Datsun 710 red
Or with sapply, which in my experience is a bit faster than apply() over a margin:
mtcars$colour <- sapply(mtcars[, "mpg"], function(x) ifelse(x < 20, "blue", "red"))
# To restore the original mtcars afterwards:
# rm(mtcars)
# data(mtcars)
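Since the question mentions several ranges of mpg, a possible extension is cut(), which bins a numeric vector in a single vectorized call. The break points and the third colour below are assumptions made for illustration only.

```r
data <- mtcars
# Break points and colour names are illustrative assumptions
data$colour <- cut(data$mpg,
                   breaks = c(-Inf, 20, 25, Inf),
                   labels = c("blue", "green", "red"),
                   right = FALSE)           # intervals are [lower, upper)
data$colour <- as.character(data$colour)    # cut() returns a factor
head(data[, c("mpg", "colour")], 3)
```

With right = FALSE a value of exactly 20 falls into the second bin, which matches the mpg < 20 condition used for "blue" above.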
I have a dataset in which 45% of the values are missing.
I would like to remove the rows that have NA values over a prolonged period. For example, if values are missing continuously for almost an hour, or more than 50 consecutive values are missing, I want to remove only those rows.
I do not want to remove rows that belong to shorter runs of missing values, say fewer than 15 or 25.
In short,
1) I don't want to remove all rows that have NA values.
2) I want to remove rows that belong to a long continuous run of NA values in a column.
Discard columnwise contiguous NAs
Try this, which uses rle(is.na(x)) to find the runs of NAs in each column. Rows that belong to any run longer than num_runs are discarded (data at bottom).
myfun <- function(x, num_runs) {
  # x is one column of df, as a vector
  r <- rle(is.na(x))
  ends <- cumsum(r$lengths)                   # last row of each run
  starts <- ends - r$lengths + 1              # first row of each run
  long <- r$values & r$lengths > num_runs     # NA runs longer than num_runs
  # row indices covered by the long NA runs
  unlist(Map(seq, starts[long], ends[long]))
}
library(purrr)
num_runs <- 4
discard <- unique(unlist(map(seq_len(ncol(df)), ~ myfun(df[, .x], num_runs))))
# guard against an empty index, where df[-discard, ] would return zero rows
if (length(discard) > 0) df[-discard, ] else df
Output
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 NA 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 NA 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 NA 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 NA 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
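To see what rle() contributes here, a small sketch on a single made-up vector may help. The vector `x` and the threshold of 2 are hypothetical and only serve as an illustration.

```r
x <- c(1, NA, NA, NA, 5, NA, 7)     # hypothetical column
r <- rle(is.na(x))
r$lengths                           # 1 3 1 1 1
r$values                            # FALSE TRUE FALSE TRUE FALSE

# Expand each run's length back to one entry per element
run_len <- rep(r$lengths, r$lengths)           # 1 3 3 3 1 1 1
drop <- is.na(x) & run_len > 2                 # only the run of three NAs
x[!drop]                                       # 1 5 NA 7
```

The isolated NA survives because its run has length 1, while the run of three is removed as a block.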
Discard rowwise contiguous NAs
Here rle(is.na(...)) is applied to each row instead. A row is dropped if it contains a run of more than num_runs contiguous NAs (data at bottom).
library(purrr)
num_runs <- 1 # maximum allowed run of contiguous NAs within a row
keep <- map_lgl(seq_len(nrow(df)), function(i) {
  r <- rle(is.na(unlist(df[i, ])))
  !any(r$lengths[r$values] > num_runs)
})
df[keep, ]
Output
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 NA 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 NA 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 NA 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 NA 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 NA 4
Lincoln Continental 10.4 8 460.0 215 NA 5.424 17.82 0 0 NA 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 NA 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 NA 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 NA 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 NA 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 NA 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 NA 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Data
library(dplyr)
df <- mtcars %>% replace(.==3, NA)
I will show what I want to achieve with an example, because my data is too big.
Example:
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
There is a column called carb, which contains numbers from 1 to 8. I would like R to show me the row names of all rows where carb equals 3: just the row names, not the whole rows.
Another way to subset:
rownames(mtcars[mtcars$carb == 3,])
#[1] "Merc 450SE" "Merc 450SL" "Merc 450SLC"
Treat rownames(mtcars) as a vector and subset as usual:
rownames(mtcars)[mtcars[["carb"]] == 3]
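One caveat with logical subsetting is missing values: a comparison with NA yields NA, and indexing with NA returns an NA entry. mtcars has no missing values, so the sketch below uses made-up vectors (`carb` and `nm` are hypothetical) to show how which() avoids this.

```r
carb <- c(4, 3, NA, 3)                      # hypothetical values with a missing entry
nm <- c("car_a", "car_b", "car_c", "car_d") # hypothetical row names

nm[carb == 3]          # "car_b" NA "car_d"
nm[which(carb == 3)]   # "car_b" "car_d"
```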