I am trying to print the full number 193525.0768, but its decimals get dropped. Can someone explain why?
library(dplyr) # also provides tibble() and %>%
df <- tibble(
  x = "193525.0768"
) %>%
  mutate(x = as.numeric(x))
print(df, digits = 10) # decimals removed; I expected the decimals to be kept
# A tibble: 1 x 1
x
<dbl>
1 193525.
df[1,1][[1]] # decimals removed
# 193525
x <- "193525.0768"
print(as.numeric(x), digits = 10) # decimals not removed
# 193525.0768
You have a printing issue, not a reading-in issue. The tibble print method doesn't take a digits argument - see ?print.tbl for details. You can use print.data.frame explicitly to bypass the tibble print method and use the data.frame print method instead, which does take a digits argument:
tibble(x = "193525.0768") %>%
mutate(x = as.numeric(x)) %>%
print.data.frame(digits = 10)
# x
# 1 193525.0768
Or you can change the default with the pillar.sigfig option (which is mentioned in ?print.tbl). The default is 3, which is confusing at first: taken literally, 193525.0768 would print as 194000. In practice the digits before the decimal point are always shown in full, and the limit only decides how many decimals are displayed, which is why you see 193525. with a trailing dot. The pillar package documentation explains the reasoning.
options(pillar.sigfig = 10)
tibble(x = "193525.0768") %>%
mutate(x = as.numeric(x))
# x
# 1 193525.0768
Alternately, use a data frame instead of a tibble:
data.frame(x = "193525.0768") %>%
mutate(x = as.numeric(x)) %>%
print(digits = 10)
# x
# 1 193525.0768
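To check that this really is a printing issue and not a parsing issue, a minimal base R sketch like the one below can help. It uses no tibble at all; the variable names are only for illustration.

```r
# Parse the string exactly as in the question
x <- as.numeric("193525.0768")

# The stored value still carries its decimals
value_intact <- x == 193525.0768

# Ask for more digits explicitly when formatting
shown_10 <- format(x, digits = 10)
shown_fixed <- sprintf("%.4f", x)

# Rounding to 3 significant digits, by contrast, would change the value
rounded <- signif(x, 3)

print(value_intact)
print(shown_10)
print(shown_fixed)
print(rounded)
```

The point is that the number is never altered by as.numeric(); only the display changes with the print method and its options.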
I have a scenario where a lengthy (12-digit) index value is being read into R as a double. I need to concatenate it with some other identifiers, but mutate(x = as.character(x)) converts it to scientific format:
index <- c(123000789000, 123456000000, 123000000012)
concact_val <- c("C", "A", "B")
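# Build the tibble with explicit column names; as_tibble() on a bare vector
# names the column "value" in current tibble versions
df <- tibble(index = index, concact_val = concact_val)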
df %>%
mutate(index = as.character(index))
This outputs:
index concact_val
1.23e11 C
1.23e11 A
1.23e11 B
Whereas ideally I'd like to be able to do this:
df %>%
mutate(index = as.character(index),
index = paste0(concact_val, index)) %>%
select(-concact_val)
to output:
index
C123000789000
A123456000000
B123000000012
Is there a way around this? In this example, I created a vector for the index, but in the frame I'm reading in it's being read as a double via an API (unfortunately, I can't change the col type prior to reading in, it's being read differently than read_csv).
Use sprintf:
df %>%
mutate(result = sprintf("%s%0.0f", concact_val, index))
# # A tibble: 3 x 3
# index concact_val result
# <dbl> <chr> <chr>
# 1 123000789000 C C123000789000
# 2 123456000000 A A123456000000
# 3 123000000012 B B123000000012
If there is a chance that some index values have fractional components, this will round them silently. If that's a concern (and you don't want to round), you can instead use floor(index) inside the sprintf.
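The difference between silent rounding and truncation can be seen in a small sketch. The fractional values below are made up for illustration and are not from the question's data.

```r
# Hypothetical index values with fractional parts
index <- c(123000789000.7, 123456000000.2)

# "%0.0f" rounds to the nearest whole number
rounded <- sprintf("%0.0f", index)

# floor() drops the fractional part before formatting
truncated <- sprintf("%0.0f", floor(index))

print(rounded)
print(truncated)
```

Here the first value is rounded up to end in ...001 unless floor() is applied first.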
We may use as.bigz from gmp
paste0(concact_val, gmp::as.bigz(index))
[1] "C123000789000" "A123456000000" "B123000000012"
Or another option is to specify the scipen in options to avoid converting to scientific format
options(scipen = 999)
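If changing a global option feels too broad, one alternative (not part of the answer above, just a sketch) is to request fixed notation for this one conversion with format(). It reuses the index and concact_val vectors from the question.

```r
index <- c(123000789000, 123456000000, 123000000012)
concact_val <- c("C", "A", "B")

# scientific = FALSE forces fixed notation for this call only;
# trim = TRUE avoids padding the strings to a common width
index_chr <- format(index, scientific = FALSE, trim = TRUE)
result <- paste0(concact_val, index_chr)

print(result)
```

This leaves options(scipen) untouched, so the rest of the session prints as before.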
In addition to the sprintf and gmp solutions, here is another option, offered mainly as a programming exercise: peel off the digits one at a time and paste them back together.
f <- function(x) {
res <- c()
while (x) {
res <- append(res, x %% 10)
x <- x %/% 10
}
paste0(rev(res), collapse = "")
}
paste0(concact_val, Vectorize(f)(index))
# [1] "C123000789000" "A123456000000" "B123000000012"
I tried reading through R's documentation on the add_column function, but I'm a little confused as to the examples it provides. See below:
# add_column ---------------------------------
df <- tibble(x = 1:3, y = 3:1)
df %>% add_column(z = -1:1, w = 0)
df %>% add_column(z = -1:1, .before = "y")
# You can't overwrite existing columns
try(df %>% add_column(x = 4:6))
# You can't create new observations
try(df %>% add_column(z = 1:5))
What is the purpose of these letters that are being assigned a range? Eg:
z = 1:5
My understanding from the documentation is that add_column() takes in a dataframe and appends it in position based on the .before and .after arguments defaulting to the end of the dataframe.
I'm a little confused here. There is also a "..." argument that takes in Name-value pairs. Is that what I'm seeing with "z = 1:5"? What is the functional purpose of this?
data.frame columns always have a name in R, no exception.
Since add_column adds new columns, you need to specify names for these columns.
… well, technically you don’t need to. The following works:
df %>% add_column(1 : 3)
But add_column auto-generates the column name based on the expression you pass it, and you might not like the result (in this case, it’s literally 1:3, which isn’t a convenient name to work with).
Conversely, the following also works and is perfectly sensible:
z = 1 : 3
df %>% add_column(z)
Result:
# A tibble: 3 x 3
x y z
<int> <int> <int>
1 1 3 1
2 2 2 2
3 3 1 3
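To tie this back to the name-value pairs in the question, here is a small sketch based on the documentation example. Each name on the left of = becomes a column name, and the value on the right becomes that column's contents. It assumes the tibble package is installed.

```r
library(tibble)

df <- tibble(x = 1:3, y = 3:1)

# Two name-value pairs: z gets the values -1, 0, 1 and w gets 0
# (a length-one value is recycled to every row).
# .before = "y" places the new columns in front of y.
out <- add_column(df, z = -1:1, w = 0, .before = "y")

print(names(out))
print(out)
```

So z = 1:5 in the documentation is simply a column named z with five values, which fails there only because df has three rows.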
I am wondering why the following code does not work. Is it because the pipe is not compatible with mutate?
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {. %>% (function(tb) {tb$x + tb$y})})
I know a workaround is
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = map_depth(., .depth = 0, function(tb) {tb$x + tb$y}))
or
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = exec(function(tb) {tb$x + tb$y}, .))
This works as you are expecting:
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {(.) %>% (function(tb) {tb$x + tb$y})})
# # A tibble: 2 x 3
# x y z
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
The problem isn't the pipe itself, but rather that an expression with a bare . on the left-hand side of %>% is interpreted as a function definition instead of being applied to your data.
Edit:
@Aramis7d provided a link to the documentation for magrittr in a comment. The relevant passage is:
Using the dot-place holder as lhs
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input. See the examples.
So in your example, you were trying to assign an entire function to z within the mutate. You can see this based on the error message returned. By using (.), we force evaluation of the . and get results as expected.
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {. %>% (function(tb) {tb$x + tb$y})})
# Error: Column `z` is of unsupported type function
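The functional-sequence behaviour can be reproduced outside of mutate() with a short magrittr-only sketch. The names root_of_sum, inner_fun and inner_val are just illustrative.

```r
library(magrittr)

# A bare dot on the left-hand side builds a function, not a value
root_of_sum <- . %>% sum() %>% sqrt()
is_fun <- is.function(root_of_sum)
applied <- root_of_sum(c(9, 16))

# Inside braces the same rule applies to the inner pipe:
# `. %>% sum()` returns a function, `(.) %>% sum()` returns a value
inner_fun <- c(9, 16) %>% { . %>% sum() }
inner_val <- c(9, 16) %>% { (.) %>% sum() }

print(is_fun)
print(applied)
print(is.function(inner_fun))
print(inner_val)
```

Wrapping the dot in parentheses turns the left-hand side into an ordinary expression, so the pipe evaluates it instead of building a function.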
Interesting scenario indeed.
Without a more specific use case, it seems the %>% operator is not required at all here, even if you want to use an anonymous function within mutate():
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {(function(tb){tb$x + tb$y})(.)})
returns:
# A tibble: 2 x 3
x y z
<dbl> <dbl> <dbl>
1 1 3 4
2 2 4 6
How can I parse and evaluate a column of string expressions in R as part of a pipeline?
In the example below, I produce my desired column, evaluated. But I know this isn't the right approach. I tried taking a tidyverse approach. But I'm just very confused.
library(tidyverse)
df <- tibble(name = LETTERS[1:3],
to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"),
evaluated = NA)
iter = 1
for (i in 1:nrow(df)) {
df[i,"evaluated"] <- eval(parse(text=df$to_evaluate[[i]]))
}
print(df)
# # A tibble: 3 x 3
# name to_evaluate evaluated
# <chr> <chr> <dbl>
# 1 A 1-1+1 1
# 2 B iter+iter 2
# 3 C 4*iter-1 3
As part of a pipeline, I tried:
df %>% mutate(evaluated = eval(parse(text=to_evaluate)))
df %>% mutate(evaluated = !!parse_exprs(to_evaluate))
df %>% mutate(evaluated = parse_exprs(to_evaluate))
df %>% mutate(evaluated = eval(parse_expr(to_evaluate)))
df %>% mutate(evaluated = parse_exprs(to_evaluate))
df %>% mutate(evaluated = eval(parse_exprs(to_evaluate)))
df %>% mutate(evaluated = eval_tidy(parse_exprs(to_evaluate)))
None of these work.
You can try:
df %>%
rowwise() %>%
mutate(iter = 1,
evaluated = eval(parse(text = to_evaluate))) %>%
select(-iter)
name to_evaluate evaluated
<chr> <chr> <dbl>
1 A 1-1+1 1
2 B iter+iter 2
3 C 4*iter-1 3
Following the same logic, other approaches also work. Using rlang::parse_expr():
df %>%
rowwise() %>%
mutate(iter = 1,
evaluated = eval(rlang::parse_expr(to_evaluate))) %>%
select(-iter)
On the other hand, I think it is important to quote @Martin Mächler:
The (possibly) only connection is via parse(text = ....) and all good
R programmers should know that this is rarely an efficient or safe
means to construct expressions (or calls). Rather learn more about
substitute(), quote(), and possibly the power of using
do.call(substitute, ......).
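As a rough illustration of what the quote recommends, expressions can be built as language objects rather than parsed from strings. This is a base R sketch and only applies when you control how the expressions are created; the names e1 and e2 are made up.

```r
# quote() captures an expression without evaluating it
e1 <- quote(4 * iter - 1)
v1 <- eval(e1, list(iter = 1))

# substitute() fills placeholders a and b into a template expression
e2 <- substitute(a * iter - b, list(a = 4, b = 1))
e2_text <- deparse(e2)
v2 <- eval(e2, list(iter = 2))

print(v1)
print(e2_text)
print(v2)
```

When the expressions arrive as text, as in the question, parsing is still needed; the sketch only shows the alternative for code you write yourself.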
Here's a slightly different way that does everything within mutate.
df %>% mutate(
evaluated = pmap_dbl(., function(name, to_evaluate, evaluated)
eval(parse(text=to_evaluate)))
)
# A tibble: 3 x 3
name to_evaluate evaluated
<chr> <chr> <dbl>
1 A 1-1+1 1
2 B iter+iter 2
3 C 4*iter-1 3
Note that values of additional variables (such as iter=1 in your case) can be passed directly to eval():
df %>%
mutate( evaluated = map_dbl(to_evaluate, ~eval(parse(text=.x), list(iter=1))) )
One advantage is that it automatically restricts the scope of the variable, keeping its value right next to where it is used.
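The same idea of passing the variables straight to eval() also works without purrr. Below is a base R sketch using vapply(); evaluate_one is a hypothetical helper name, not something from the answers above.

```r
to_evaluate <- c("1-1+1", "iter+iter", "4*iter-1")

# Evaluate one string, looking up variables in the supplied list first
evaluate_one <- function(txt, vars) {
  eval(parse(text = txt), envir = vars)
}

evaluated <- vapply(
  to_evaluate, evaluate_one, numeric(1),
  vars = list(iter = 1), USE.NAMES = FALSE
)

print(evaluated)
```

The resulting vector can be assigned to a column with mutate() or with df$evaluated <- evaluated.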
I have data like this, below are the 3 rows from my data set:
total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;
Expected output as below,
total free used shared buffers cached
7871MB 5711MB 2159MB 0MB 304MB 1059MB
5751MB 71MB 5MB 3159MB 30MB 1059MB
5751MB 109MB 5MB 3159MB 30MB 1059MB
The problem is that I want to turn this data into separate columns: total value, free value, used value, shared value, and so on.
I can do that by splitting on ; but the order of the values is shuffled in other rows, for example free comes first, then total, followed by the other values.
Is there any way to do this with regex: if we find total, take the value up to ; and put it into one column; if we find free, take the value up to ; and put it into another column?
Here is one possibility using strsplit.
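# x holds a single record, as in the first row of the question
x <- "total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;"
df <- as.data.frame(matrix(unlist(lapply(strsplit(x, ";"), strsplit, "=")), nrow = 2))
colnames(df) <- as.character(unlist(df[1, ])) # first row holds the labels
df <- df[-1, ]
df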
# total free used shared buffers cached
# 2 7871MB 5711MB 2159MB 0MB 304MB 1059MB
Edit
I don't know how your data are structured. But you can do something like the following:
x <- "total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;"
x %>% str_split("\n") %>% unlist() %>% as_tibble() %>%
mutate(total = str_extract(value, "total=(.*?)MB;"),
free = str_extract(value, "free=(.*?)MB;"),
used = str_extract(value, "used=(.*?)MB;"),
shared = str_extract(value, "shared=(.*?)MB;"),
buffers = str_extract(value, "buffers=(.*?)MB;"),
cached = str_extract(value, "cached=(.*?)MB;")) %>%
select(-value) %>%
mutate_all(~as.numeric(str_extract(.,"[[:digit:]]+")))
# # A tibble: 3 x 6
# total free used shared buffers cached
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 7871. 5711. 2159. 0. 304. 1059.
# 2 5751. 71. 5. 3159. 30. 1059.
# 3 5751. 109. 5. 3159. 30. 1059.
We can try using strsplit followed by gsub to separate the data from the labels. Then, create a data frame using this data:
x <- 'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;'
y <- unlist(strsplit(x, ';'))
names <- sapply(y, function(x) gsub("=.*$", "", x))
data <- sapply(y, function(x) gsub(".*=", "", x, perl=TRUE))
df <- data.frame(names=names, data=data)
df
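Since the question is mainly about the keys appearing in a different order in each row, here is a base R sketch that parses every row into a named vector and then picks the columns by name. The helper parse_row and the cols vector are illustrative assumptions, and the three rows are the ones from the question.

```r
rows <- c(
  "total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;",
  "free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;",
  "cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;"
)

# Split one row into key=value pairs and return the values named by key
parse_row <- function(row) {
  pairs <- strsplit(strsplit(row, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, `[`, character(1), 1)
  values <- vapply(pairs, `[`, character(1), 2)
  setNames(values, keys)
}

# Desired column order; indexing by name undoes the shuffling
cols <- c("total", "free", "used", "shared", "buffers", "cached")
parsed <- lapply(rows, function(r) parse_row(r)[cols])
result <- as.data.frame(do.call(rbind, parsed), stringsAsFactors = FALSE)

print(result)
```

Because each value is looked up by its key, the position of a field within a row no longer matters.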