How can I assign NA? - r

I use the mtcars dataframe and I use the following code to assign NA to all the values of columns drat and wt <=3.
df <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)))
How can I modify the code in a way that let me to assign NA also to the values of the column qsec if the value of drat or wt in the same row is <=3? At the end I want that each row where drat or wt is NA has NA also in the column qsec. Thanks

We may use if_any on the columns that are changed to NA to return a logical vector to replace values in 'qsec'
library(dplyr)
mtcars1 <- mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(if_any(c(drat, wt), is.na), NA, qsec))
-output
> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 NA 3.460 NA 1 0 3 1

1) dplyr Continue the mutate to set qsec to NA if either drat or wt is NA. (If you meant and rather than or then replace | with & .)
mtcars %>%
mutate(across(c(drat, wt), ~ifelse(.x<=3, NA, .x)),
qsec = ifelse(is.na(drat) | is.na(wt), NA, qsec))
giving:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 NA NA 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 NA NA 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 NA 3.460 NA 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 NA 5.250 NA 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 NA 5.424 NA 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 NA NA 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 NA NA 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 NA NA 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 NA NA 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 NA 3.520 NA 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 NA NA 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 NA NA 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 NA NA 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 NA NA 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 NA NA 1 1 4 2
2) Base R With base R we can use within like this giving the same result.
within(mtcars, {
drat <- ifelse(drat <= 3, NA, drat)
wt <- ifelse(wt <= 3, NA, wt)
qsec <- ifelse(is.na(drat) | is.na(wt), NA, qsec)
})
or at the expense of some redundancy we could use transform:
transform(mtcars,
drat = ifelse(drat <= 3, NA, drat),
wt = ifelse(wt <= 3, NA, wt),
qsec = ifelse(drat <= 3 | wt <= 3, NA, qsec))

Since the question is not tagged tidyverse here is a base R way with is.na<-.
is.na(mtcars$drat) <- mtcars$drat < 3
is.na(mtcars$wt) <- mtcars$wt < 3
is.na(mtcars$qsec) <- with(mtcars, is.na(drat) | is.na(wt))

Related

How to randomly sample multiple consecutive rows of a dataframe in R?

I've a dataframe with 100 rows and 20 columns and want to randomly sample 5 times 10 consecutive rows, e.g. 10:19, 25:34, etc. With: sample_n( df, 5 ) I'm able to extract 5 unique, randomly sampled rows, but don't know how to sample consecutive rows. Any help? Thanks!
df <- mtcars
df$row_nm <- seq(nrow(df))
set.seed(7)
sample_seq <- function(n, N) {
i <- sample(seq(N), size = 1)
ifelse(
test = i + (seq(n) - 1) <= N,
yes = i + (seq(n) - 1),
no = i + (seq(n) - 1) - N
)
}
replica <- replicate(n = 5, sample_seq(n = 10, N = nrow(df)))
# result
lapply(seq(ncol(replica)), function(x) df[replica[, x], ])
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb row_nm
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 17
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 18
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19
#>
#> [[2]]
#> mpg cyl disp hp drat wt qsec vs am gear carb row_nm
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 20
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 21
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 22
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 23
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 24
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 25
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 26
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 27
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28
#>
#> [[3]]
#> mpg cyl disp hp drat wt qsec vs am gear carb row_nm
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 32
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8
#>
#> [[4]]
#> mpg cyl disp hp drat wt qsec vs am gear carb row_nm
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 29
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 30
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 32
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5
#>
#> [[5]]
#> mpg cyl disp hp drat wt qsec vs am gear carb row_nm
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 9
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16
Created on 2022-01-24 by the reprex package (v2.0.1)
You could so something like:
#sample data
df <- data.table(value = 1:100000)
#function which sampled consecutive rows (x = dataframe, rows = nr of consecutive rows, nr = amount of times you want to sample consecutive rows)
sample_fun <- function(x, rows, nr){
#maximum number which can be sampled
numbers <- 1:(nrow(x) - rows)
#randomly sample 5 numbers
sampled.numbers <- sample(numbers, nr)
#convert to vector (5 consecutive)
sampled.rows <- lapply(sampled.numbers, function(x){seq(x, (x+rows-1), 1)})
sampled.rows <- do.call(c, sampled.rows)
#sample and return
result <- x[sampled.rows,]
return(result)
}
sample_fun(x = df, rows = 5, nr = 2)
You don't mention if this can include replacement (i.e. if you sample 10:19, can you then also sample 15:24?). You also don't mention if you can sample anything above row 91, which would mean that sample of 10 gets cut off (i.e. 98,99,100 would only be 3 consecutive rows unless you want that to loop back to row 1). Assuming you can sample any value with replacement, the solution can be done in one line:
sapply(sample(1:100,5),function(x){seq(x,x+9)})
This applies the sequence function to each of 5 individually sampled numbers. The output will be a matrix, where each column is a sample of 10 consecutive rows, but as noted, these will potentially overlap, or go above 100.
If you want a solution where the rows will not overlap at all, and avoiding values over 100, without making values above 91 less likely to be sampled, this actually gets kind of trick, but I think the code below should work. You cant just sample from 1:91 without affect probability of your random sample, because then this means a value like 100 actually only has a 1/91 probability of being sampled (the sample value has to be 91), where other values don't involve that same constraint. This solution makes it so all rows are equally likely to be sampled.
Rows=c(1:100,1:100)
SampleRows=matrix(0,nrow=10,ncol=5)
for(i in 1:ncol(SampleRows)){
SampledValue=sample(Rows,1)
RowsIndex=min(which(Rows==SampledValue))
Sequence=Rows[RowsIndex:(RowsIndex+9)]
SampleRows[,i]=Sequence
Rows=Rows[!(Rows %in% Sequence)]
}
This approach creates a vector that sequences from 1:100, repeated twice (variable Rows), you'll see why this is important in a bit. For each of 5 iterations (corresponding to 5 samples), we take a sampled value from Rows, which will be a number 1:100, we then figure out where that number is in Rows, and take all 9 values next to it. In the first sample this will always be 10 consecutive numbers (e.g. 20:29). But then we remove those sampled values from Rows. If we happen to get the next sample as a value that would lead to overlap (like 18), then instead it samples (18,19,30,31,32,33,34...) since 20:29 have been removed. We need to do 1:100 twice in Rows, so that if we sample a value like 99, it resets from 100, back to 1.
If you want your output in a vector,throw in this at the end
sort(as.vector(SampleRows))
Let me know if this works for the needs of your problem.

Make a custom function of an dplyr procedure

I would like to make a custom function of this modified dplyr procedure:
randomly replacing percentage of values per group with NA in R dataframe
library(dplyr)
mtcars %>%
mutate(mpg = replace(mpg, sample(row_number(),
size = ceiling(0.3 * n()), replace = FALSE), NA))
The arguments should be:
df = dataframe
x = column
y = double number (here 0.3)
My approach so far:
my_func <- function(df,x,y){
df %>%
mutate(x = replace({{x}}, sample(row_number(),
size = ceiling(y * n()), replace = FALSE), NA))
}
When applying this function:
my_func(mtcars, mtcars$mpg, 0.3)
#gives:
mpg cyl disp hp drat wt qsec vs am gear carb x
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 21.0
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 22.8
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 21.4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 18.1
#....etc..
My question:
I want to change mpg column by adding NA's there, not adding a new column x
Putting the first x = in {{x}} = throws an error:
Error: unexpected '=' in:
" df %>%
mutate({{x}} ="
> size = ceiling(y * n()), replace = FALSE), NA))
Error: unexpected ',' in " size = ceiling(y * n()),"
> }
Error: unexpected '}' in "}"
>
This works; does it solve your problem?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_func <- function(df,x,y){
df %>%
mutate({{x}} := replace({{x}}, sample(row_number(),
size = ceiling(y * n()), replace = FALSE), NA))
}
my_func(mtcars, mpg, 0.3)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 NA 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 NA 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 NA 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE NA 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood NA 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental NA 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial NA 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona NA 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L NA 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora NA 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Created on 2021-12-18 by the reprex package (v2.0.1)
From https://adv-r.hadley.nz/quasiquotation.html?q=:=#tidy-dots :
:= is like a vestigial organ: it’s recognised by R’s parser, but it doesn’t have any code associated with it. It looks like an = but allows expressions on either side, making it a more flexible alternative to =. It is used in data.table for similar reasons.

PCA analysis: getting error in dim desc(): not convenient data

I have conducted PCA on a set of data using prcomp. As a final step I am trying to use the dimdesc() function from FactoMineR to obtain p-values that identify the most significantly associated variables with my principal components.
The data frame has seven variables all of which are numerical and there are no missing values. The names are standard names such as "RCH_Home" (just in case the names could be problematic).
I write the following function:
res.desc <- dimdesc(df_PCA, axes = c(1:2), proba = 0.05)
And get the following error message:
Error in dimdesc(df_PCA, axes = c(1:2), proba = 0.05) : non convenient data
Any idea what might be going on?
Thanks!!!!
You should use the PCA function in sostitution of the prcomp
Below an example of PCA with FactoMineR.
library(FactoMineR)
library(factoextra)
library(paran)
data(cars)
mtcars_pca<-cars_pca<-PCA(mtcars)
If you want to check the percentage of variance, you can do this:
mtcars_pca$eig
> mtcars_pca$eig
eigenvalue percentage of variance cumulative percentage of variance
comp 1 6.60840025 60.0763659 60.07637
comp 2 2.65046789 24.0951627 84.17153
comp 3 0.62719727 5.7017934 89.87332
comp 4 0.26959744 2.4508858 92.32421
comp 5 0.22345110 2.0313737 94.35558
comp 6 0.21159612 1.9236011 96.27918
comp 7 0.13526199 1.2296544 97.50884
comp 8 0.12290143 1.1172858 98.62612
comp 9 0.07704665 0.7004241 99.32655
comp 10 0.05203544 0.4730495 99.79960
comp 11 0.02204441 0.2004037 100.00000
Cos2 stands for squared cosine and is an index for the quality representation of both variables and individuals. The closer this value is to one, the better the quality.
mtcars_pca$var$cos2
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
mpg 0.8685312 0.0006891117 0.031962249 1.369725e-04 0.0023634487
cyl 0.9239416 0.0050717032 0.019276287 1.811054e-06 0.0007642822
disp 0.8958370 0.0064482423 0.002370993 1.775235e-02 0.0346868281
hp 0.7199031 0.1640467049 0.012295659 1.234773e-03 0.0651697911
drat 0.5717921 0.1999959326 0.016295731 1.970035e-01 0.0013361275
wt 0.7916038 0.0542284172 0.073281663 1.630161e-02 0.0012578888
qsec 0.2655437 0.5690984542 0.101947952 1.249426e-03 0.0060588455
vs 0.6208539 0.1422249798 0.115330572 1.244460e-02 0.0803189801
am 0.3647715 0.4887450097 0.026555457 2.501834e-04 0.0018011675
gear 0.2829342 0.5665806069 0.052667265 1.888829e-02 0.0005219259
carb 0.3026882 0.4533387304 0.175213444 4.333912e-03 0.0291718181
res.desc <- dimdesc(mtcars_pca, axes = c(1:2), proba = 0.05)
> head(res.desc)
$Dim.1
$quanti
correlation p.value
cyl 0.9612188 2.471950e-18
disp 0.9464866 2.804047e-16
wt 0.8897212 9.780198e-12
hp 0.8484710 8.622043e-10
carb 0.5501711 1.105272e-03
qsec -0.5153093 2.542578e-03
gear -0.5319156 1.728737e-03
am -0.6039632 2.520665e-04
drat -0.7561693 5.575736e-07
vs -0.7879428 8.658012e-08
mpg -0.9319502 9.347042e-15
attr(,"class")
[1] "condes" "list "
$Dim.2
$quanti
correlation p.value
gear 0.7527155 6.712704e-07
am 0.6991030 8.541542e-06
carb 0.6733043 2.411011e-05
drat 0.4472090 1.028069e-02
hp 0.4050268 2.147312e-02
vs -0.3771273 3.335771e-02
qsec -0.7543861 6.138696e-07
attr(,"class")
[1] "condes" "list "
$call
$call$num.var
[1] 1
$call$proba
[1] 0.05
$call$weights
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
$call$X
Dim.1 mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 -0.6572132031 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag -0.6293955058 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 -2.7793970426 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive -0.3117707086 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 1.9744889419 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant -0.0561375337 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 3.0026742880 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D -2.0553287289 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 -2.2874083842 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 -0.5263812077 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C -0.5092054932 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 2.2478104359 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 2.0478227622 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 2.1485421615 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 3.8997903717 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 3.9541231097 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 3.5929719882 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 -3.8562837567 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic -4.2540325032 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla -4.2342207436 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona -1.9041678566 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 2.1848507430 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 1.8633834347 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 2.8889945733 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 2.2459189274 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 -3.5739682964 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 -2.6512550541 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa -3.3857059882 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 1.3729574238 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino -0.0009899207 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 2.6691258658 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E -2.4205931001 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
it should be use the function from the same package.
If you want to choose how many dimensions you need, you can do this by the param package
library(paran)
cars_paran<-paran(mtcars, graph = TRUE)

Combine a list of data frames column wise and return a list of combined data frames using R

I would like to combine two list of data frames element wise and return a list of data frames. The following code works for the mtcars dataset
list1=split(mtcars[c(1:16),-11],mtcars[c(1:16),2])
list2=split(data.frame(mtcars[c(1:16),]),mtcars[c(1:16),2])
newList=Map(cbind, list1, list2)
How do I modify the Map function to just bind a specific column(s) from list2? Thanks
Since #thelatemail doesn't want to add an answer here is purrr version of his answer.
library(purrr)
map2(list1, map(list2, `[`, 'carb'), cbind)
#Or
#map2(list1, map(list2, `[`, 'carb'), dplyr::bind_cols)
#$`4`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
#$`6`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#$`8`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4

Automatically print message when data.frame updates

I'd like for R to print the dimensions of my updated dataframes to show how each of my steps influences the number of observations remaining. Typically my dataset size changes because there are NAs. I can always quickly check dimensions:
a = data.frame(a = c(1,2), b = c(3, NA))
na.omit(a) %>% dim
but I would want to do two things in that one line:
na.omit(a) %>% size
a <- na.omit(a)
since code gets messy with many dim() calls.
Is there a way to toggle R's output? Something like
a <- na.omit(a)
message: dimensions are now m x n
Edit: Added code to only display notification when the object changes under the original code.
If you just want the dimensions to print after every line that you execute you could write a taskCallBack
tc <- addTaskCallback(function(...){message("Dimensions are: ", paste(dim(a), collapse = " "));return(TRUE)})
# When you want to remove the callback
removeTaskCallback(tc)
Here is an example session of it in use. You could probably modify the taskCallBack to only print if things change.
> a <- mtcars
>
> tc <- addTaskCallback(function(...){message("Dimensions are: ", paste(dim(a), collapse = " "));return(TRUE)})
Dimensions are: 32 11
>
> a
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Dimensions are: 32 11
> a <- a[1:5,]
Dimensions are: 5 11
>
> # When you want to remove the callback
> removeTaskCallback(tc)
[1] TRUE
If you want to only receive notifications when the object changes we can write a little bit more to actually monitor the object and only display the message when appropriate. Here is the code I'll use:
# #object_name - Character string. The object name you want to montior
# #example tc <- addTaskCallback(monitor_object("mtcars"))
monitor_object <- function(object_name){
object_dim <- NULL
if(exists(object_name)){
object_dim <- dim(get(object_name))
}
f <- function(...){
new_dim <- NULL
if(exists(object_name)){
new_dim <- dim(get(object_name))
}
if(!identical(new_dim, object_dim)){
msg <- paste0(object_name,
" changed.\nOld dimensions: ",
paste(object_dim, collapse = " "),
"\nNew dimensions: ",
paste(new_dim, collapse = " "))
object_dim <<- new_dim
message(msg)
}
return(TRUE)
}
return(f)
}
tc <- addTaskCallback(monitor_object("mtcars"))
# When you want to remove the callback
removeTaskCallback(tc)
And an example session...
> tc <- addTaskCallback(monitor_object("mtcars"))
> mtcars <- mtcars[,-1]
mtcars changed.
Old dimensions: 32 11
New dimensions: 32 10
> head(mtcars)
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars <- mtcars[1:5,]
mtcars changed.
Old dimensions: 32 10
New dimensions: 5 10
I've actually expanded upon this code because it was a fun little exercise and made it so you can specify the function you want to use to monitor the object. I made a gist for the function that can be found here.
One option would be to write a new function like so:
omitter<-function(x){
x<-na.omit(x)
print(paste0("The dimensions are now ",dim(x)[1]," x ", dim(x)[2]))
return(x)
}
This would print the message and store the new object:
y<-omitter(data.frame(a=c(1:5,NA)))
> y
a
1 1
2 2
3 3
4 4
5 5
Edit: based on #DavidKlotz comment above, you could use message in place of print

Resources