Mutate across multiple columns based on condition (length of unique values)


I'm trying to write a function inside mutate() + across() that converts to factor any variable with five or fewer unique values (or any arbitrary number), with the idea of later using those factors for grouping. I think the logic of the function is correct, but I'm getting an "incorrect number of dimensions" error (originally reported in Spanish). For the sake of simplicity, I'm using the mtcars dataset.
mtcars %>%
  mutate(across(1:ncol(.),
                function(x) {
                  if_else(length(unique(x[,i]))<=5,
                          as.factor(x),
                          x)}
  ))
Error: Problem with `mutate()` input `..1`.
i `..1 = across(...)`.
x incorrect number of dimensions
Run `rlang::last_error()` to see where the error occurred.
Any help or advice will be much appreciated.

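Two things go wrong in the original code. Inside across(), x is already a single column (a vector), so x[,i] indexes a vector as if it were a matrix, which is what raises the "incorrect number of dimensions" error (i is also not defined anywhere). Second, we need if/else rather than ifelse()/if_else(): those functions are vectorised and return one element per element of the condition, while length(unique(x)) <= 5 is a single TRUE/FALSE, so they cannot return a whole column (if_else() additionally errors because `true` and `false` do not match the length of the condition). Also, with dplyr we can use tidyselect helpers such as everything() to select all the columns: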
library(dplyr)
out <- mtcars %>%
  mutate(across(everything(), function(x) {
    if (length(unique(x)) <= 5) as.factor(x) else x
  }))
Output:
> str(out)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
$ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
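The length issue described above can be seen in isolation with a small base-R check. This is an added illustration: it uses base ifelse() rather than dplyr::if_else(), and `x * 10` is just an arbitrary stand-in for the "true" branch.

```r
x <- mtcars$cyl

# The test is a single logical value, not one value per row
cond <- length(unique(x)) <= 5
length(cond)            # 1

# ifelse() is vectorised over the test: one result element per test element
vectorised <- ifelse(cond, x * 10, x)
length(vectorised)      # 1, not 32

# Scalar if/else returns the whole branch that was chosen
scalar <- if (cond) as.factor(x) else x
length(scalar)          # 32
class(scalar)           # "factor"
```

So the vectorised form silently collapses the column to one value, while if/else keeps all 32 rows.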
The anonymous function can also be written more concisely with the ~ lambda syntax, using n_distinct() in place of length(unique()):
mtcars %>%
mutate(across(everything(),
~ if(n_distinct(.x) <=5) as.factor(.x) else .x))

Another way is to pass a predicate function to where() inside across().
We can either define a custom function:
library(dplyr)
few_unique_vals <- function(x) {
length(unique(x))<=5
}
mtcars %>%
mutate(across(where(few_unique_vals), as.factor)) %>%
glimpse # for better printing
#> Rows: 32
#> Columns: 11
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
#> $ cyl <fct> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
#> $ vs <fct> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
#> $ am <fct> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
#> $ gear <fct> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~
Or we can use an anonymous purrr-style function:
mtcars %>%
mutate(across(where(~ length(unique(.x))<=5),
as.factor)) %>%
glimpse # for better printing
#> Rows: 32
#> Columns: 11
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
#> $ cyl <fct> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
#> $ vs <fct> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
#> $ am <fct> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
#> $ gear <fct> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~
Created on 2022-03-15 by the reprex package (v2.0.1)
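Since the question asks for "five or less unique values (or any arbitrary number)", here is a minimal base-R sketch that makes the threshold a parameter. The helper name `factor_few_unique` and its `max_unique` argument are hypothetical; the per-column logical vector plays the same role as the where() predicate above.

```r
# Hypothetical helper, not from the answers above: convert every column
# with at most `max_unique` distinct values to a factor (base R only).
factor_few_unique <- function(df, max_unique = 5) {
  # One TRUE/FALSE per column, playing the role of the where() predicate
  keep <- vapply(df, function(x) length(unique(x)) <= max_unique, logical(1))
  # Convert only the selected columns; all other columns are left as they are
  df[keep] <- lapply(df[keep], as.factor)
  df
}

out5 <- factor_few_unique(mtcars)                  # default threshold of 5
out2 <- factor_few_unique(mtcars, max_unique = 2)  # binary columns only

names(Filter(is.factor, out5))  # "cyl" "vs" "am" "gear"
names(Filter(is.factor, out2))  # "vs" "am"
```

With the default threshold this selects the same four columns (cyl, vs, am, gear) as the dplyr versions; carb has six distinct values and stays numeric.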

Related

Why does glimpse() drop some digits for large numbers and how to prevent rounding

I would like glimpse() from the tidyverse to show a single numeric vector of large numbers without rounding, just as it does when inspecting the entire data frame.
Good.
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0, 460.0, 440.0,…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, …
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.93, 4.22…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345,…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98, 17.82, 17.42,…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
Good.
glimpse(mtcars$mpg)
num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
Not good. I don't need the rounding.
glimpse(mtcars$disp)
num [1:32] 160 160 108 258 360 ...

You might be able to do something with ?pillar::pillar_options. (I would have thought that options(pillar.sigfig = 4) would work, but it doesn't.)
If you index with [ rather than $, you get a one-column data frame, which glimpse() formats with the same rules it uses for whole data frames:
glimpse(mtcars["disp"])
Rows: 32
Columns: 1
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
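The difference comes down to what each indexing operator returns. A quick base-R check (the variable names are just for illustration):

```r
col_df   <- mtcars["disp"]    # single brackets: a one-column data frame
col_vec  <- mtcars$disp       # dollar sign: the bare numeric vector
col_vec2 <- mtcars[["disp"]]  # double brackets: also the bare vector

class(col_df)                 # "data.frame"
class(col_vec)                # "numeric"
identical(col_vec, col_vec2)  # TRUE
```

glimpse() therefore takes its data-frame path for `col_df` and its atomic-vector path for the other two.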
Alternatively: it looks like glimpse() falls back to the base-R function str() when printing atomic vectors. From ?str, this seems to do what you want (change both the default number of digits and the trailing-zero behaviour):
options(str = strOptions(digits.d = 4, formatNum = function(x, ...)
format(x, trim = TRUE, drop0trailing = FALSE, ...)))
glimpse(mtcars$disp)
num [1:32] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 ...
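If you would rather not change the global "str" option, digits.d and formatNum are also arguments of str() itself, so the same settings can be passed per call. This sketch calls str() directly rather than glimpse(); `keep_zeros` is a hypothetical name for the same formatting function used above.

```r
# Same formatting rules as the strOptions() call above, passed to str()
# directly so the global "str" option is left unchanged
keep_zeros <- function(x, ...) format(x, trim = TRUE, drop0trailing = FALSE, ...)

str(mtcars$disp, digits.d = 4, formatNum = keep_zeros)
```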

Is it possible to have glimpse() output with column numbers?

I am working with a data frame with 200+ variables and sometimes I have to choose variables based on their indexes as I also have to check their values.
I was expecting something like this:
Rows: 32
Columns: 11
[1] $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4…
[2] $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
[3] $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275…
[4] $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, …
[5] $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00…
[6] $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.7…
[7] $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.…
[8] $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
[9] $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
[10] $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
[11] $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
Any suggestions?

This doesn't put the output in quite the form you have it, but seems pretty close:
library(dplyr, warn.conflicts = FALSE)
glimpse_plus <- function(x, width = NULL, ...) {
x_orig <- x
names(x) <- paste0("[", 1:length(x), "]: ", names(x))
glimpse(x, width = width, ...)
invisible(x_orig)
}
glimpse_plus(mtcars)
#> Rows: 32
#> Columns: 11
#> $ `[1]: mpg` <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.…
#> $ `[2]: cyl` <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, …
#> $ `[3]: disp` <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1…
#> $ `[4]: hp` <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, …
#> $ `[5]: drat` <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9…
#> $ `[6]: wt` <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3…
#> $ `[7]: qsec` <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2…
#> $ `[8]: vs` <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
#> $ `[9]: am` <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, …
#> $ `[10]: gear` <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, …
#> $ `[11]: carb` <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, …
Created on 2021-06-19 by the reprex package (v2.0.0)

You could take a look at the skimr package:
library(skimr)
skim(mtcars)
-- Data Summary ------------------------
                           Values
Name                       mtcars
Number of rows             32
Number of columns          11
_______________________
Column type frequency:
  numeric                  11
________________________
Group variables            None
-- Variable type: numeric ------------------------------------------------------------------------------------
# A tibble: 11 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 mpg 0 1 20.1 6.03 10.4 15.4 19.2 22.8 33.9 ▃▇▅▁▂
2 cyl 0 1 6.19 1.79 4 4 6 8 8 ▆▁▃▁▇
3 disp 0 1 231. 124. 71.1 121. 196. 326 472 ▇▃▃▃▂
4 hp 0 1 147. 68.6 52 96.5 123 180 335 ▇▇▆▃▁
5 drat 0 1 3.60 0.535 2.76 3.08 3.70 3.92 4.93 ▇▃▇▅▁
6 wt 0 1 3.22 0.978 1.51 2.58 3.32 3.61 5.42 ▃▃▇▁▂
7 qsec 0 1 17.8 1.79 14.5 16.9 17.7 18.9 22.9 ▃▇▇▂▁
8 vs 0 1 0.438 0.504 0 0 0 1 1 ▇▁▁▁▆
9 am 0 1 0.406 0.499 0 0 0 1 1 ▇▁▁▁▆
10 gear 0 1 3.69 0.738 3 3 4 4 5 ▇▁▆▁▂
11 carb 0 1 2.81 1.62 1 2 2 4 8 ▇▂▅▁▁

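A possible approach is to change the variable names (updated to match the OP's formatting). Note that this overwrites the column names of mtcars itself, so work on a copy if you need the original names later.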
library(dplyr)
names(mtcars) <- paste0("[", 1:ncol(mtcars), "] ", names(mtcars))
glimpse(mtcars)
#> Rows: 32
#> Columns: 11
#> $ `[1] mpg` <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2~
#> $ `[2] cyl` <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4~
#> $ `[3] disp` <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 14~
#> $ `[4] hp` <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 1~
#> $ `[5] drat` <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92~
#> $ `[6] wt` <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.~
#> $ `[7] qsec` <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22~
#> $ `[8] vs` <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1~
#> $ `[9] am` <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1~
#> $ `[10] gear` <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4~
#> $ `[11] carb` <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1~
Created on 2021-06-20 by the reprex package (v2.0.0)
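If the underlying goal is just to pick variables by position, a plain lookup of column positions may be enough on its own. A minimal base-R sketch; it reads from `datasets::mtcars` so it still sees the original names even if mtcars was renamed as above, and `col_index`/`pos` are illustrative names.

```r
cars_df <- datasets::mtcars   # original names, even if mtcars was renamed

# Named integer vector: each column's position, labelled by its name
col_index <- setNames(seq_along(cars_df), names(cars_df))
print(col_index)

# Positions of specific columns, ready for positional indexing
pos <- match(c("disp", "wt"), names(cars_df))
pos                   # 3 6
head(cars_df[pos], 3)
```

For a data frame with 200+ columns, `col_index` can also be searched with grep() on its names instead of being printed in full.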

Scaling a particular variable into z-scores returns a matrix

mtcars %>% mutate(mpg_scaled = scale(mpg)) %>% glimpse
Rows: 32
Columns: 12
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0, 460.0, 440.0, 78.7, 75.7,…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113,…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.93, 4.22, 3.70, 2.76…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 2.200, 1.61…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98, 17.82, 17.42, 19.47, 18.5…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
$ mpg_scaled <dbl[,1]> <matrix[32 x 1]>
I expected the new field 'mpg_scaled' to be a regular dbl like the rest, so why does it say it's a matrix, <matrix[32 x 1]>?
If I look at head(), it appears to be a regular numeric field:
mtcars %>% mutate(mpg_scaled = scale(mpg)) %>% head
mpg cyl disp hp drat wt qsec vs am gear carb mpg_scaled
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.1508848
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.1508848
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.4495434
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.2172534
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 -0.2307345
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 -0.3302874
What's happening here? What is mpg_scaled? How can I make it a 'regular' field like the rest?

We could also drop the dim attribute (along with the other attributes that scale() adds) by coercing the matrix to a plain vector with as.vector():
library(dplyr)
mtcars %>%
mutate(mpg_scaled = as.vector(scale(mpg)))

You can just index into the resulting matrix to get a vector back. The documentation of base::scale says that its input x is "a numeric matrix(like object)", and that its return value is "the centered, scaled matrix". So the function is built to work on matrix columns, but it accepts vector input and treats it as a one-column matrix.
library(tidyverse)
mtcars %>% mutate(mpg_scaled = scale(mpg)[,1]) %>% glimpse
#> Rows: 32
#> Columns: 12
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2…
#> $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 14…
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 1…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92…
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22…
#> $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1…
#> $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1…
#> $ mpg_scaled <dbl> 0.1508848, 0.1508848, 0.4495434, 0.2172534, -0.2307345, -0…
Created on 2020-08-03 by the reprex package (v0.3.0)
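To see what scale() actually returns, and that both fixes give ordinary z-scores, here is a small base-R check against the formula (x - mean(x)) / sd(x) computed by hand (the variable names are illustrative):

```r
mpg   <- mtcars$mpg
z_mat <- scale(mpg)

dim(z_mat)                # 32 1: a one-column matrix
names(attributes(z_mat))  # includes "scaled:center" and "scaled:scale"

# Two ways to get a plain vector back
z_vec1 <- as.vector(z_mat)  # drops dim and the scaled:* attributes
z_vec2 <- z_mat[, 1]        # column subsetting drops them as well

# Both agree with the z-score formula computed by hand
z_manual <- (mpg - mean(mpg)) / sd(mpg)
all.equal(z_vec1, z_manual) # TRUE
all.equal(z_vec2, z_manual) # TRUE
```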

How can I view rows and columns of the 'Adult' dataset in R? [closed]

How can I view the rows and columns of the 'Adult' dataset in R? I just started learning R.
Any help is appreciated. Please refer to the screenshot.

First, start by running str to see the structure of your dataset.
str(Adult)
# Formal class 'transactions' [package "arules"] with 3 slots
#   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
#   .. .. ..@ i       : int [1:612200] 1 10 25 32 35 50 59 61 63 65 ...
#   .. .. ..@ p       : int [1:48843] 0 13 26 39 52 65 78 91 104 117 ...
#   .. .. ..@ Dim     : int [1:2] 115 48842
#   .. .. ..@ Dimnames:List of 2
#   .. .. .. ..$ : NULL
#   .. .. .. ..$ : NULL
#   .. .. ..@ factors : list()
#   ..@ itemInfo   :'data.frame': 115 obs. of 3 variables:
#   .. ..$ labels   : chr [1:115] "age=Young" "age=Middle-aged" "age=Senior" "age=Old" ...
#   .. ..$ variables: Factor w/ 13 levels "age","capital-gain",..: 1 1 1 1 13 13 13 13 13 13 ...
#   .. ..$ levels   : Factor w/ 112 levels "10th","11th",..: 111 63 92 69 30 54 65 82 90 91 ...
#   ..@ itemsetInfo:'data.frame': 48842 obs. of 1 variable:
#   .. ..$ transactionID: chr [1:48842] "1" "2" "3" "4" ...
This tells you that Adult is an S4 object with three slots, data, itemInfo and itemsetInfo.
The slot data is a sparse matrix created by package Matrix;
The slot itemInfo is a data.frame;
The slot itemsetInfo is also a data.frame.
S4 objects' slots are accessed with the @ operator. To see what is in those slots, run
Adult@data
Adult@itemInfo
Adult@itemsetInfo
In the case of the two data frames, you might prefer to run
head(Adult@itemInfo)
head(Adult@itemsetInfo)
since they have 115 and 48842 rows, respectively, and don't fit on one screen.
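Slot access can be tried without arules using a toy S4 class. This is a minimal sketch: the class `ToyTransactions` and its contents are hypothetical, loosely mirroring the itemInfo/itemsetInfo slots above, and only show how @, slot() and slotNames() behave.

```r
library(methods)

# Hypothetical class with two data.frame slots
setClass("ToyTransactions",
         slots = c(itemInfo = "data.frame", itemsetInfo = "data.frame"))

toy <- new("ToyTransactions",
           itemInfo    = data.frame(labels = c("age=Young", "age=Old")),
           itemsetInfo = data.frame(transactionID = c("1", "2", "3")))

slotNames(toy)            # "itemInfo" "itemsetInfo"
toy@itemInfo              # slot access with @
slot(toy, "itemsetInfo")  # equivalent functional form
head(toy@itemsetInfo, 2)  # first rows of a data.frame slot
```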

To get the text output shown in your example you can use this:
cat(dim(mtcars)[1], "transactions (rows)\n", dim(mtcars)[2], "items (cols)")
#32 transactions (rows)
# 11 items (cols)
Replace mtcars with Adult (or any data frame). cat() prints to the console, and dim() returns the number of rows and columns of the data.
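A variant using sprintf() avoids the extra space that cat() puts between its arguments (visible before "11" in the output above). A sketch for an ordinary data frame; `df` and `msg` are illustrative names.

```r
df <- mtcars   # replace with your own data frame

# nrow() and ncol() return the two elements of dim() separately
msg <- sprintf("%d transactions (rows)\n%d items (cols)\n", nrow(df), ncol(df))
cat(msg)
```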
Similar to str() from base R, you can use glimpse() from the dplyr package:
install.packages("dplyr") # run this the first time to install the package
dplyr::glimpse(mtcars)
# Observations: 32
# Variables: 11
# $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32...
# $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
# $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0,...
# $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 1...
# $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4....
# $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250,...
# $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98,...
# $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
# $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
# $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
# $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
This gives you the number of observations (rows) and variables (columns), and lists each variable with its type and its first values.

How to include position number in str() output?

I was wondering whether anybody has a neat solution for including position numbers of variables when getting str output.
Example:
Instead of getting this:
str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
I would like to get something like this:
str(cars)
'data.frame': 50 obs. of 2 variables:
1 speed: num 4 4 7 7 8 9 10 10 10 11 ...
2 dist : num 2 10 4 22 16 10 18 26 34 17 ...
in order to be able to index data.frames more easily.

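Here's a hastily modified version of glimpse from dplyr which does what you need. It was written against an older dplyr release and relies on unexported internals (dplyr:::format_v and dplyr:::str_trunc), which may no longer be available in current versions: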
library(dplyr)
glimpse_n <- function(tbl, width = getOption("width")) {
  cat("Observations: ", nrow(tbl), "\n", sep = "")
  if (ncol(tbl) == 0)
    return(invisible())
  cat("Variables:\n")
  rows <- as.integer(width/3)
  df <- as.data.frame(head(tbl, rows))
  var_types <- vapply(df, type_sum, character(1))
  var_names <- paste0(sprintf("%3d ", 1:length(names(df))), format(names(df)),
                      " (", var_types, ") ")
  data_width <- width - nchar(var_names) - 2
  length_est <- pmin(ceiling(max(data_width)/3) + 1, nrow(tbl))
  formatted <- vapply(df, function(x) paste0(dplyr:::format_v(x), collapse = ", "),
                      character(1), USE.NAMES = FALSE)
  truncated <- dplyr:::str_trunc(formatted, data_width)
  cat(paste0(var_names, truncated, collapse = "\n"), "\n", sep = "")
}
glimpse_n(mtcars)
## Observations: 32
## Variables:
## 1 mpg (dbl) 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 1...
## 2 cyl (dbl) 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8...
## 3 disp (dbl) 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167....
## 4 hp (dbl) 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, ...
## 5 drat (dbl) 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3...
## 6 wt (dbl) 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.44...
## 7 qsec (dbl) 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.3...
## 8 vs (dbl) 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0...
## 9 am (fctr) 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, ...
## 10 gear (dbl) 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3...
## 11 carb (dbl) 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2...
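A lighter alternative that stays close to the str() layout the question asks for is to post-process str()'s own output. This is a minimal sketch: `str_n` is a hypothetical helper, and it assumes that each top-level column line of str() on a data frame starts with " $ " (nested components and attributes are indented further and are left unnumbered).

```r
# Hypothetical helper: number the top-level columns in str() output.
str_n <- function(df, ...) {
  out <- utils::capture.output(utils::str(df, ...))
  top <- startsWith(out, " $ ")
  # Replace the leading " $" with a right-aligned column number
  out[top] <- paste0(format(seq_len(sum(top))), sub("^ \\$", "", out[top]))
  cat(out, sep = "\n")
  invisible(df)
}

str_n(cars)
```

For cars this prints the usual header followed by lines beginning "1 speed:" and "2 dist :", matching the requested format; any extra arguments are passed on to str().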
