Missing information in tibble [duplicate]

This question already has answers here:
How do I name the "row names" column in r
(2 answers)
Closed 1 year ago.
I took the 'mtcars' data and converted it into a tibble:
library(tibble)
data <- tibble(mtcars)
I noticed that the first column, the vehicle make, has become just a numeric sequence 1, 2, 3, etc. in the tibble.
I'm fairly new to R. Is there a way to tell a tibble to keep the format of the original data?

Tibbles don't support row names, so to keep them you need to move them into a new column:
library(dplyr)
library(tibble)
mtcars %>% rownames_to_column('make') %>% tibble()
# make mpg cyl disp hp drat wt qsec vs am gear carb
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Mazda … 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 Mazda … 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 Datsun… 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
# 4 Hornet… 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
# 5 Hornet… 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
# 6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# 7 Duster… 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
# 8 Merc 2… 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
# 9 Merc 2… 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#10 Merc 2… 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows

Since tibbles don't keep row names, you can use as_tibble() (from tibble, also re-exported by dplyr), whose rownames argument creates a column holding the row names:
as_tibble(mtcars, rownames = "names_car")
Output:
names_car mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 Mazda RX4 Wag 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 Merc 240D 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 Merc 230 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
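If you would rather not depend on tibble at all, here is a minimal base-R sketch of the same idea. It assumes only R's built-in datasets package; the names `mt` and `make` are illustrative. It copies the row names into an ordinary character column and resets the row names to plain 1..n:

```r
# Start from the pristine built-in copy of mtcars.
# Copy its row names (the car makes) into a regular character column;
# row.names = NULL makes data.frame() fall back to plain 1..n row names.
mt <- data.frame(make = rownames(datasets::mtcars), datasets::mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)
head(mt, 3)
```

Converting `mt` with as_tibble() afterwards keeps make as a normal column, so nothing is lost.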

Related

Calculate means including all factor levels but one

Using the mtcars data frame, I would like to add a column qsec_control, calculated as the mean(qsec) of all rows whose cyl differs from the current row's cyl (e.g. if cyl == 6, it would be mean(qsec[cyl != 6])).
The question feels somewhat dumb, but I can't figure out how to do this.
This solution groups by cyl, then uses dplyr::cur_group_rows() to drop the current group's rows from mtcars$qsec before taking the mean:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
mutate(qsec_control = mean(
mtcars$qsec[-cur_group_rows()]
)) %>%
ungroup()
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb qsec_cont…¹
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 17.8
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 17.8
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 17.2
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 17.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 18.7
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 17.8
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 18.7
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 17.2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 17.2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 17.8
# … with 22 more rows, and abbreviated variable name ¹qsec_control
Replicating zephryl's answer in data.table:
library(data.table)
data(mtcars)
setDT(mtcars)
mtcars[, qsec_control := mtcars[-.I, mean(qsec)] , by = .(cyl)]
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb qsec_control
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 17.81280
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 17.81280
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 17.17381
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 17.81280
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 18.68611
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 17.81280
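Both answers above rely on a package. As a base-R alternative (a sketch assuming only the built-in datasets; all variable names are illustrative), the leave-one-group-out mean can be computed arithmetically: subtract the row's own cyl group total and count from the overall total and count.

```r
# Pristine copy, in case the global mtcars was modified (e.g. by setDT()).
mt <- datasets::mtcars
# Overall total and row count of qsec.
total_sum <- sum(mt$qsec)
total_n <- nrow(mt)
# For every row, the total and size of that row's cyl group
# (ave() returns a vector aligned with the original rows).
group_sum <- ave(mt$qsec, mt$cyl, FUN = sum)
group_n <- ave(mt$qsec, mt$cyl, FUN = length)
# Mean of qsec over all rows whose cyl differs from the current row's.
mt$qsec_control <- (total_sum - group_sum) / (total_n - group_n)
head(mt[, c("cyl", "qsec", "qsec_control")])
```

This avoids re-subsetting qsec once per group, but the trick is specific to means; a statistic such as the median would still need the subsetting approach.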

Empty string as column name in tibble

tibble::tibble(`` = 1:10)
Error: attempt to use zero-length variable name
tibble::tibble("" = 1:10)
Error: attempt to use zero-length variable name
How can I get around this? I need to have a column with precisely "" as the name.
My first thought is that this sounds like a presentation/reporting concern, since one rarely needs nameless columns while developing or working with data. In that regard, I suggest you look at changing names in whatever reporting system you are using (knitr, kableExtra, etc.).
Having said that, R won't let you define a zero-length column name, but it will let you rename a column to "" afterwards:
setNames(data.frame(" "=1),"")
#
# 1 1
setNames(tibble(" "=1),"")
# # A tibble: 1 x 1
# ``
# <dbl>
# 1 1
This can be achieved by directly modifying the names attribute of a tibble, though it's not a recommended practice. Do something like this:
attr(df, "names") <- c("", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
Tested with this dataset:
df <- tibble::as_tibble(mtcars)
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ... with 22 more rows
Output after renaming:
# A tibble: 32 x 11
`` cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ... with 22 more rows
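A shorter variant of the same idea, assuming only the first column needs to lose its name, assigns to that single element of names() instead of retyping all eleven names. This sketch uses a base data.frame; whether a particular tibble version accepts the same assignment without complaint is not verified here.

```r
# Work on a plain data.frame copy of the original mtcars.
df <- datasets::mtcars
# Replace only the first name; the other ten names are left untouched.
names(df)[1] <- ""
names(df)
```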

Why does the base R `print()` function require a tibble?

Why does the base R print() function require a tibble when using the n = X argument? The examples below suggest that it does.
This does NOT work
library(tidyverse)
mtcars %>% print(n = 20)
#> Error in print.default(m, ..., quote = quote, right = right, max = max) :
#> invalid 'na.print' specification
This does work
mtcars %>% as_tibble() %>% print(n = 20)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> 11 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
#> 12 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 13 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 14 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> 15 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 16 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 17 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 18 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 19 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 20 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> # ... with 12 more rows
Your first example is equivalent to print(mtcars, n = 20), which also fails.
Because mtcars is a data.frame, your call dispatches to print.data.frame, and as args(print.data.frame) will tell you, it has no n argument. The n = 20 is instead passed through ... down to print.default, where it partially matches the na.print argument; that is where the puzzling "invalid 'na.print' specification" error comes from.
In short, you confused a specific method (I presume print.tbl) with the generic print().
So a better title for the question might be 'Why does only the print method for tibbles have an n argument?' For general use, we commonly just call head, as in
R> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
R>
which of course also works in a pipelined expression.
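A small base-R sketch of the points above (nothing beyond base R and its built-in mtcars is assumed): it lists the arguments of print.data.frame, prints the first 20 rows via head(), and captures the error raised by passing n = 20 straight to print().

```r
mt <- datasets::mtcars
# The data.frame method has no `n` among its arguments.
print(names(formals(print.data.frame)))
# So select the rows first, then print them.
print(head(mt, n = 20))
# Passing n = 20 to print() directly: `n` travels through `...` to
# print.default(), where it partially matches `na.print` and errors.
res <- tryCatch(print(mt, n = 20), error = function(e) e)
print(conditionMessage(res))
```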

group_by variable and sum in dplyr [duplicate]

This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I know this question has answers in multiple places, but I am unable to figure out where I am going wrong. Suppose I want to find the sum of hp for each group in cyl:
mtcars%>%
group_by(cyl) %>%
mutate(
sum_hp = sum(hp)
)
sum_hp gives me 4694 for every row. I want the sum for each value of cyl.
This is likely a case of plyr::mutate masking dplyr::mutate, which happens when plyr is loaded after dplyr. Calling the function as dplyr::<functionname> avoids the masking:
library(dplyr)
mtcars%>%
group_by(cyl) %>%
dplyr::mutate(sum_hp = sum(hp))
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 856
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 856
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 909
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 856
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 2929
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 856
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 2929
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 909
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 909
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 856
# … with 22 more rows
If we use plyr::mutate instead, the OP's output is reproduced:
mtcars%>%
group_by(cyl) %>%
plyr::mutate(
sum_hp = sum(hp)
)
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 4694
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 4694
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 4694
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 4694
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 4694
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 4694
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 4694
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 4694
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 4694
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 4694
# … with 22 more rows
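To see the masking mechanism without installing plyr, here is a hypothetical stand-in sketch in base R: a user-defined mean() in the global environment hides base::mean(), much as plyr's mutate() hides dplyr's when plyr is attached later. The second half computes the per-group hp sum with base ave(), which sidesteps the dplyr/plyr clash entirely.

```r
# 1) Masking in miniature: an object found earlier on the search path
#    hides a same-named function further down.
mean <- function(x, ...) "masked!"
print(find("mean"))       # ".GlobalEnv" comes before "package:base"
print(mean(1:3))          # the masking version is called
print(base::mean(1:3))    # the :: prefix reaches the intended function
rm(mean)                  # remove the mask again

# 2) Sum of hp within each cyl group, one value per row (like mutate()).
mt <- datasets::mtcars
mt$sum_hp <- ave(mt$hp, mt$cyl, FUN = sum)
head(mt[, c("cyl", "hp", "sum_hp")])
```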

How do I selectively change variable data type automatically in the tidyverse?

I would like to change some of the variables from numerical to factor types, leaving other types as they are. I know how to do this one variable at a time, but I would like to automate the process for larger datasets.
I've changed variables in the mtcars dataset one by one, copying and pasting the code. I've used mapply to successfully automate this, but I've only managed to do it on a subset of mtcars. I'm not sure how I would keep the entire dataset intact with the new variable types, though. Reprex below.
#before
as_tibble(mtcars)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
#copy + paste job
mtcars$cyl <- factor(as.character(mtcars$cyl))
mtcars$hp <- factor(as.character(mtcars$hp))
mtcars$vs <- factor(as.character(mtcars$vs))
#after
as_tibble(mtcars)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <fct> <dbl> <fct> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2019-05-17 by the reprex package (v0.2.1)
I managed to change the variable types successfully, but I would hate to repeat this 30-50 times. What are some ways to automate this? Thank you.
library(dplyr)
as_tibble(mtcars) %>%
mutate_at(.vars = vars(cyl, hp, vs),
.funs = ~ factor(as.character(.)))
Hope this helps.
Using base R:
vars_to_make_f <- c("cyl", "hp", "vs")
mtcars[vars_to_make_f] <-
lapply(mtcars[vars_to_make_f], function(x) as.factor(as.character(x)))
as_tibble(mtcars)
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <fct> <dbl> <fct> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ... with 22 more rows
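For 30-50 columns you may not want to list the names by hand at all. A sketch, assuming a hypothetical rule that any numeric column with at most 3 distinct values should become a factor, picks the columns programmatically and then reuses the same lapply() pattern:

```r
mt <- datasets::mtcars
# Hypothetical rule: treat any numeric column with at most 3 distinct
# values as categorical. Adjust the threshold (or the whole test) to taste.
is_categorical <- function(x) is.numeric(x) && length(unique(x)) <= 3
to_factor <- vapply(mt, is_categorical, logical(1))
print(names(mt)[to_factor])
# Convert only the selected columns; all other columns keep their type.
mt[to_factor] <- lapply(mt[to_factor], factor)
str(mt)
```

The rule is only an example: in mtcars it selects cyl, vs, am and gear, not the set chosen in the question (hp has many distinct values), so in practice you would plug in whatever criterion fits your data.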
You can use mutate_at:
mtcars %>%
mutate_at(c("cyl","hp","vs"),function(x) factor(as.character(x)))
Or use purrr's modify_at():
mtcars %>%
modify_at(c("cyl","hp","vs"),function(x) factor(as.character(x)))
An option is mutate_at. Wrapping in as.character() first is not needed; we can convert directly to factor. The reverse route is different, though: going from factor back to numeric does need the intermediate step, factor -> character -> numeric.
library(dplyr)
mtcars %>%
as_tibble %>%
mutate_at(vars(cyl, hp, vs), factor)
# A tibble: 32 x 11
# mpg cyl disp hp drat wt qsec vs am gear carb
# <dbl> <fct> <dbl> <fct> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# … with 22 more rows
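The note about the reverse route matters because a factor stores integer level codes, not the original numbers. A short base-R illustration with a few hp-like values:

```r
f <- factor(c(110, 93, 110, 175))
levels(f)                      # "93" "110" "175" (sorted numerically)
as.numeric(f)                  # 2 1 2 3: the internal level codes
as.numeric(as.character(f))    # 110 93 110 175: the original values
as.numeric(levels(f))[f]       # 110 93 110 175, converting each level once
```

The last form is the variant suggested in the R FAQ (7.10); it gives the same result as going through as.character() but only converts each distinct level once.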
