How to message an R data frame to the console?

If I print a data frame to the console directly, it looks nice:
> head(datasets::mtcars, 4)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
How do I get the same nice output, but through message?
> message(head(datasets::mtcars, 4))
c(21, 21, 22.8, 21.4)c(6, 6, 4, 6)c(160, 160, 108, 258)c(110, 110, 93, 110)c(3.9, 3.9, 3.85, 3.08)c(2.62, 2.875, 2.32, 3.215)c(16.46, 17.02, 18.61, 19.44)c(0, 0, 1, 1)c(1, 1, 1, 0)c(4, 4, 4, 3)c(4, 4, 1, 1)
This question looks similar, but didn't help me.

We can use paste to create a vector of values and pass it on to message
message(do.call(paste, c(head(datasets::mtcars, 4), collapse="\n")))
-output
# 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#21 6 160 110 3.9 2.875 17.02 0 1 4 4
#22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
#21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
In order to keep the header and the row names, we could use capture.output
message(paste(capture.output(head(datasets::mtcars, 4)), collapse="\n"))
-output
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
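If this is needed in several places, the capture.output approach above could be wrapped in a small helper. This is only a sketch: the name message_df is hypothetical and not part of base R or any package.

```r
# Hypothetical helper (not a base R function): format a data frame the way
# print() would, then send the text through message() instead of stdout.
message_df <- function(df, ...) {
  # capture.output() returns one string per printed line
  lines <- utils::capture.output(print(df, ...))
  # message() writes to stderr, so join the lines with newlines first
  message(paste(lines, collapse = "\n"))
  invisible(lines)
}

message_df(head(datasets::mtcars, 4))
```

Because message() writes to stderr, the table can be silenced with suppressMessages() like any other message.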

Related

If column A equals criteria return value of column B in column C

Using the R inbuilt dataset
mtcars
I want to make a column called "want".
mtcars$want<-NA
When column "carb" is equal to 1 (Column A), input value of column "qsec" (Column B) in column "want" (Column C).
If carb is not equal to 1 do nothing.
The first 5 rows of the new dataset should look like this:
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
This should do the job:
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, NA)
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 NA
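The same result can also be reached with logical indexing instead of ifelse. The following is a minimal sketch on a copy of the data; the names cars and is_one_carb are made up for illustration.

```r
# Work on a copy so the built-in dataset is left untouched
cars <- datasets::mtcars

# Start with a numeric NA column, then fill only the rows where carb == 1
cars$want <- NA_real_
is_one_carb <- cars$carb == 1
cars$want[is_one_carb] <- cars$qsec[is_one_carb]

head(cars, 5)
```

Rows where carb is not 1 are never touched, which matches the "do nothing" requirement in the question.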
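If you only want blanks instead of NA in the printout, you could try the following. Note that mixing numbers with "" makes ifelse return character values, so the want column becomes a character column rather than a numeric column containing NA: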
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, "")
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
A loop over the rows should also work. You can modify the loop or add further conditionals to fill in the other values of the column.
#written in R version 4.2.1
data(mtcars)
mtcars$want = 0
for(i in seq_len(nrow(mtcars))){
if(mtcars$carb[i] == 1){
mtcars$want[i] = mtcars$qsec[i]
}}
Result:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb want
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.00
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.00
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.00
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.22
You can first give the new column "want" a default value, for example 2. Then use ifelse to apply your criterion and return the existing "want" value when nothing should change, like this:
mtcars$want <- 2
library(dplyr)
mtcars %>%
mutate(want = ifelse(carb == 1, qsec, want)) %>%
head(5)
#> mpg cyl disp hp drat wt qsec vs am gear carb want
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2.00
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2.00
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.00
Created on 2022-06-30 by the reprex package (v2.0.1)

how to provide expression to `purrr::pmap` for function using tidy evaluation

I am trying to write a function using rlang so that I can subset data based on supplied expression. Although the actual function is complicated, here is a minimal version of it that illustrates the problem.
minimal version of needed function
library(rlang)
# define a function
foo <- function(data, expr = NULL) {
if (!quo_is_null(enquo(expr))) {
dplyr::filter(data, !!enexpr(expr))
} else {
data
}
}
# does the function work? yes
head(foo(mtcars, NULL)) # with NULL
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(foo(mtcars, mpg > 20)) # with expression
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
problems with purrr::pmap
When used with purrr::pmap(), it works as expected when expr is NULL, but not otherwise. Instead of list, I also tried using alist to supply the input.
library(purrr)
# works when expression is `NULL`
pmap(
.l = list(data = list(head(mtcars)), expr = list(NULL)),
.f = foo
)
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# but not otherwise
pmap(
.l = list(data = list(head(mtcars)), expr = list("mpg > 20")),
.f = foo
)
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `"mpg > 20"`.
#> x Input `..1` must be a logical vector, not a character.
Created on 2021-07-20 by the reprex package (v2.0.0)
One way to make this work is by wrapping with quote
purrr::pmap(
.l = list(data = list(head(mtcars)), expr = list(quote(mpg > 20))),
.f = foo
)
-output
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
which also works with NULL
pmap(
.l = list(data = list(head(mtcars)), expr = list(quote(NULL))),
.f = foo
)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Same output with subset
subset(head(mtcars), mpg > 20)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Another option is to modify the function, replacing enexpr with parse_expr, so that it accepts the expression as a string
foo1 <- function(data, expr = NULL) {
if (!quo_is_null(enquo(expr))) {
dplyr::filter(data, !!parse_expr(expr))
} else {
data
}
}
-testing
> pmap(
+ .l = list(data = list(head(mtcars)), expr = list(NULL)),
+ .f = foo1
+ )
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
>
> pmap(
+ .l = list(data = list(head(mtcars)), expr = list("mpg > 20")),
+ .f = foo1
+ )
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
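The difference between the two fixes comes down to what is passed in: quote(mpg > 20) is an unevaluated call, while "mpg > 20" is just a string that must be parsed before it can be evaluated. A base R sketch of that distinction, without rlang or dplyr (the variable names are illustrative only):

```r
df <- head(datasets::mtcars)

expr_quoted <- quote(mpg > 20)  # a call object, ready to evaluate
expr_string <- "mpg > 20"       # plain text, must be parsed first

is.call(expr_quoted)       # TRUE
is.character(expr_string)  # TRUE

# Evaluate both in the context of the data frame
keep_quoted <- eval(expr_quoted, df)
keep_parsed <- eval(parse(text = expr_string)[[1]], df)

identical(keep_quoted, keep_parsed)  # TRUE
df[keep_quoted, ]
```

Here parse(text = ...) plays roughly the role that parse_expr plays in foo1.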

count number of rows needed to have sum greater than a particular value in R

I want to subset a data frame to the first rows needed for the cumulative mpg value to reach at least 100.
library(datasets)
data(mtcars)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The output should be the top 5 rows, because the mpg sum exceeds 100 after Hornet Sportabout:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
I want to check the cumulative sum of the mpg column at each row, and then output the number of rows it took to reach a sum of at least 100.
I would use cumsum in association with lag
library(dplyr)
mtcars %>%
filter(cumsum(lag(mpg, default = 0)) < 100)
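To see why lagging helps, it may be useful to look at the intermediate vectors. This base R sketch mirrors cumsum(lag(mpg, default = 0)) by shifting the running total down one row, so each row is compared against the sum of the rows before it:

```r
mpg <- head(datasets::mtcars$mpg, 6)

running <- cumsum(mpg)              # total including the current row
before  <- c(0, head(running, -1))  # total of all preceding rows

# Keep a row while the total *before* it is still below 100; this also
# keeps the row that pushes the total over the threshold.
keep <- before < 100

data.frame(mpg, running, before, keep)
sum(keep)  # 5
```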
This should solve it
library(tidyverse)
df_answer <- mtcars %>%
rownames_to_column() %>%
tibble() %>%
mutate(cum_sum = cumsum(mpg)) %>%
filter(cum_sum < 100)
df_answer %>%
nrow() + 1
A base R option using subset + cumsum
subset(mtcars, c(TRUE, head(cumsum(mpg) < 100, -1)))
gives
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
You can also use the purrr library:
library(purrr)
which.max(purrr::accumulate(mtcars$mpg, `+`) > 100)
# 5
If you want the whole dataset you can use dplyr::slice:
library(tidyverse)
dplyr::slice(mtcars, 1 : which.max(purrr::accumulate(mtcars$mpg, `+`) > 100))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
You can use a filter condition with dplyr:
library(tidyverse)
mtcars %>%
filter(row_number() %in% 1:(max(which(cumsum(mpg) < 100)) + 1))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
The code can be shortened with slice:
mtcars %>%
slice(1:(max(which(cumsum(mpg) < 100)) + 1))
And packaged as a function:
fnc = function(data, var, cutoff) {
data %>%
slice(1:(max(which(cumsum({{var}}) < cutoff)) + 1))
}
fnc(mtcars, mpg, 100)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
fnc(iris, Sepal.Width, 10)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
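A base R variant of the same idea is sketched below. The function name rows_until_sum is hypothetical, and returning the whole data frame when the cutoff is never reached is an assumption of this sketch, not something the question specifies.

```r
# Hypothetical helper: return the leading rows of `data` needed for the
# running total of `column` (a column name as a string) to reach `cutoff`.
rows_until_sum <- function(data, column, cutoff) {
  running <- cumsum(data[[column]])
  # First row at which the running total is at least the cutoff
  n <- which(running >= cutoff)[1]
  # Assumption: if the cutoff is never reached, keep every row
  if (is.na(n)) n <- nrow(data)
  data[seq_len(n), , drop = FALSE]
}

nrow(rows_until_sum(datasets::mtcars, "mpg", 100))        # 5
nrow(rows_until_sum(datasets::iris, "Sepal.Width", 10))   # 4
```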

Use mutate in purrr workflow

I got the following datasets:
dflist <- list(mtcars, mtcars)
dflist[[1]] %>%
mutate(cyl2 = cyl * 2)
This works!
dflist %>%
map(.x, ~.x$cyl2 = .x$cyl * 2)
Error: unexpected '=' in:
"dflist %>%
map(.x, ~.x$cyl2 ="
This results in an error. I tried other options, but the function does not accept the = sign. What is wrong there?
Try :
library(dplyr)
library(purrr)
dflist %>% map(~.x %>% mutate(cyl2 = cyl * 2))
#[[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#....
#[[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#...
Or keeping it in base R:
lapply(dflist, function(x) transform(x, cyl2 = cyl * 2))
You can also try the following (note that update_list is deprecated as of purrr 1.0.0):
modify(dflist, ~ update_list(., cyl2 = ~ cyl * 2))
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16
We can use transform without anonymous function call in base R
lapply(dflist, transform, cyl2 = cyl * 2)
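If you prefer the assignment style from the question, it can work inside an ordinary function body, as long as the function returns the modified data frame rather than the result of the assignment. A minimal sketch (dflist2 is an illustrative name):

```r
dflist <- list(datasets::mtcars, datasets::mtcars)

dflist2 <- lapply(dflist, function(x) {
  x$cyl2 <- x$cyl * 2  # modify the local copy of the data frame
  x                    # return the data frame, not the assigned vector
})

head(dflist2[[1]], 3)
```

Without the final x, each list element would be just the cyl2 vector, because an assignment evaluates to the assigned value.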

Using select=-c() in subset function gives error: invalid argument to unary operator

I have a matrix and I would like to eliminate two columns by their names.
My code was :
trn_data = subset(trn_data, select = -c("Rye flour","Barley products"))
but R gave me an error message like this:
Error in -c("Rye flour", "Barley products") :
invalid argument to unary operator
I tried this
trn_data = subset(trn_data, select = -c(Rye flour,Barley products))
Also returning an error:
Error: unexpected symbol in "trn_data=subset(trn_data,select =-c(Rye flour"
How can I fix this? Is there any other method that can eliminate two columns by their names?
You should not provide the names as characters to subset. This works:
trn_data_subset <- subset(trn_data, select = -c(`Rye flour`,`Barley products`))
If your column names contain spaces, you should wrap them in backticks.
Here's an example using mtcars dataset:
mtexample <- mtcars[1:4,]
names(mtexample)[1] <- "mpg with space"
mtexample
#> mpg with space cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
subset(mtexample, select = -c(`mpg with space`, cyl))
#> disp hp drat wt qsec vs am gear carb
#> Mazda RX4 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 258 110 3.08 3.215 19.44 1 0 3 1
You can also do it like this:
within(trn_data, rm(`Rye flour`,`Barley products`))
or
trn_data[, !(colnames(trn_data) %in% c("Rye flour","Barley products"))]
With dplyr, we can still use - with a double-quoted column name
library(dplyr)
mtexample %>%
select(-"mpg with space")
# cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
data
mtexample <- mtcars[1:4,]
names(mtexample)[1] <- "mpg with space"
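When the column names are already held in a character vector, setdiff offers another way to drop them without the unary minus. This is a sketch using the example data above; drop_cols and kept are illustrative names.

```r
mtexample <- datasets::mtcars[1:4, ]
names(mtexample)[1] <- "mpg with space"

# Names to remove, given as ordinary strings (spaces are fine here)
drop_cols <- c("mpg with space", "cyl")

# Keep every column whose name is not in drop_cols
kept <- mtexample[, setdiff(names(mtexample), drop_cols), drop = FALSE]
names(kept)
```

For a matrix rather than a data frame, colnames() would take the place of names().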
