Undesirable behaviour - as.double rounds the decimals in R

I've got the following data:
ex <- structure(list(X1 = c("0", "2912.99", "922.1", "772.9100000000001",
                            "7112.97", "933.09", "1190.03")),
                row.names = c(NA, -7L), .Names = "X1",
                class = c("tbl_df", "tbl", "data.frame"))
To my surprise, when I try to convert them to double, the decimals appear to be rounded to integers.
ex %>% mutate(X1 = as.double(X1))
I have tried to change the digits option with options(digits = 22), but it didn't help. What causes this problem? Is it a matter of using dplyr::mutate? How else can this be converted to double?

The values are not rounded; only the default tibble printing is. You can choose the number of
significant digits printed by setting the pillar.sigfig option of the pillar package, which tibble uses for printing.
> options(pillar.sigfig=16)
> ex %>% mutate(X1 = as.double(X1))
# A tibble: 7 x 1
X1
<dbl>
1 0.
2 2912.990000000000
3 922.1000000000000
4 772.9100000000001
5 7112.970000000000
6 933.0900000000000
7 1190.030000000000

Tibbles print with 3 significant digits by default. If you want more decimals printed, you need to adjust the pillar.sigfig option in addition to digits:
options(pillar.sigfig = 22)
options(digits = 22)
ex %>% mutate(X1 = as.double(X1))
X1
<dbl>
1 0.00000000000000
2 2912.98999999999978
3 922.10000000000002
4 772.91000000000008
5 7112.97000000000025
6 933.09000000000003
7 1190.02999999999997
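
To convince yourself that the conversion itself loses nothing, you can inspect a converted value outside of the tibble print method, for example:
# Only the tibble print method rounds for display; the underlying double keeps the value.
x <- as.double("772.9100000000001")
format(x, digits = 16)
#> [1] "772.9100000000001"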


Convert dates into numbers

I need your support while working with dates.
While importing an .xls file, the date column was mostly converted into numbers by R. Unfortunately some dates are still there as text in the format dd/mm/yyyy, d/mm/yyyy or dd/mm/yy. This probably results from different settings on different operating systems; I don't know. Is there a way to manage this?
Thank you in advance
library(readxl)
my_data <- read_excel("my_file.xls")
born_date
18520
30859
16/04/1972
26612
30291
24435
11/02/1964
26/09/1971
18427
23688
Original_dates
14/9/1950
26/6/1984
16/04/1972
9/11/1972
6/12/1982
24/11/1966
11/02/1964
26/09/1971
13/6/1950
Here is one way we could solve it:
First we keep the numeric values only, by excluding those that contain the string "/".
Then we use the excel_numeric_to_date function from the janitor package.
Finally, with coalesce we combine both:
library(dplyr)
library(stringr)
library(janitor)
library(lubridate)
df %>%
  mutate(x = ifelse(str_detect(born_date, '\\/'), NA_real_, born_date),
         x = excel_numeric_to_date(as.numeric(as.character(x)), date_system = "modern"),
         born_date = dmy(born_date)) %>%
  mutate(born_date = coalesce(born_date, x), .keep = "unused")
born_date
1 1950-09-14
2 1984-06-26
3 1972-04-16
4 1972-11-09
5 1982-12-06
6 1966-11-24
7 1964-02-11
8 1971-09-26
9 1950-06-13
10 1964-11-07
data:
df <- structure(list(born_date = c("18520", "30859", "16/04/1972",
"26612", "30291", "24435", "11/02/1964", "26/09/1971", "18427",
"23688")), class = "data.frame", row.names = c(NA, -10L))
1) This converts the two types of dates separately. Each conversion returns NA for the elements not of that type, and coalesce then combines them. This only needs dplyr, and no warnings are produced.
library(dplyr)
my_data %>%
  mutate(born_date = coalesce(
    as.Date(born_date, "%d/%m/%Y"),
    as.Date(as.numeric(ifelse(grepl("/", born_date), NA, born_date)), "1899-12-30"))
  )
## born_date
## 1 1950-09-14
## 2 1984-06-26
## 3 1972-04-16
## 4 1972-11-09
## 5 1982-12-06
## 6 1966-11-24
## 7 1964-02-11
## 8 1971-09-26
## 9 1950-06-13
## 10 1964-11-07
2) Here is a base R version.
my_data |>
  transform(born_date = pmin(na.rm = TRUE,
    as.Date(born_date, "%d/%m/%Y"),
    as.Date(as.numeric(ifelse(grepl("/", born_date), NA, born_date)), "1899-12-30"))
  )
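pmin(na.rm = TRUE) works here as a base R stand-in for coalesce because, element-wise, at most one of the two parsed dates is non-NA. A minimal illustration:
# With na.rm = TRUE, pmin() returns the single non-NA date in each position.
pmin(as.Date(NA), as.Date("1950-09-14"), na.rm = TRUE)
#> [1] "1950-09-14"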
Note
The input in reproducible form.
my_data <-
structure(list(born_date = c("18520", "30859", "16/04/1972",
"26612", "30291", "24435", "11/02/1964", "26/09/1971", "18427",
"23688")), class = "data.frame", row.names = c(NA, -10L))

How to remove characters that don't match the string pattern from a column of a data frame

I have a column in my data frame as shown below.
I want to keep the entries in the pattern "\\d+Zimmer" and remove the purely numeric entries from the column, such as "9586" and "927".
I tried the following gsub call:
gsub("[^\\d+Zimmer]", "", flat_cl_one$rooms)
But it removes all the digits instead.
What regex can I use to get the correct result? Thank you in advance.
We can coerce the column to numeric: purely numeric entries convert successfully while entries containing letters become NA, and we then replace the entries that converted (i.e. are not NA) with blanks.
library(dplyr)
flat_cl_one %>%
  mutate(rooms = ifelse(!is.na(as.numeric(rooms)), "", rooms))
Or we can use str_detect from stringr:
library(stringr)
flat_cl_one %>%
  mutate(rooms = ifelse(str_detect(rooms, "Zimmer", negate = TRUE), "", rooms))
Output
rooms
1 647Zimmer
2 394Zimmer
3
4
5 38210Zimmer
We could do the same thing with filter if you wanted to actually remove those rows.
flat_cl_one %>%
  filter(is.na(as.numeric(rooms)))
# rooms
#1 647Zimmer
#2 394Zimmer
#3 38210Zimmer
Data
flat_cl_one <- structure(list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389",
"38210Zimmer")), class = "data.frame", row.names = c(NA, -5L))
Just replace strings that don't contain the word "Zimmer"
flat_cl_one$room[!grepl("Zimmer", flat_cl_one$room)] <- ""
flat_cl_one
#> room
#> 1 3Zimmer
#> 2 2Zimmer
#> 3 2Zimmer
#> 4 3Zimmer
#> 5
#> 6
#> 7 3Zimmer
#> 8 6Zimmer
#> 9 2Zimmer
#> 10 4Zimmer
Data
flat_cl_one <- data.frame(room = c("3Zimmer", "2Zimmer", "2Zimmer", "3Zimmer",
"9586", "927", "3Zimmer", "6Zimmer",
"2Zimmer", "4Zimmer"))
Another possible solution, using stringr::str_extract (I am using @AndrewGillreath-Brown's data, with thanks):
library(tidyverse)
df <- structure(
list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")),
class = "data.frame",
row.names = c(NA, -5L))
df %>%
  mutate(rooms = str_extract(rooms, "\\d+Zimmer"))
#> rooms
#> 1 647Zimmer
#> 2 394Zimmer
#> 3 <NA>
#> 4 <NA>
#> 5 38210Zimmer
The pattern [^\\d+Zimmer] is a negated character class: it matches any single character except a digit or the characters +, Z, i, m, e, r, which is why it behaves differently from what you intended.
Using gsub, you can check that the string does not start with the pattern ^\\d+Zimmer using a negative lookahead (?!...) (setting perl = TRUE), and then match one or more digits if the assertion is true.
gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "", flat_cl_one$rooms, perl = TRUE)
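Applied to the example data above, a quick sketch of the full call (the empty strings correspond to the removed digit-only entries):
# Entries that are digits only (and not followed by "Zimmer") match and are replaced by "".
gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "",
     c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer"),
     perl = TRUE)
#> [1] "647Zimmer"   "394Zimmer"   ""            ""            "38210Zimmer"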

Change character string by key in R

Here is an example data frame:
test_df <- structure(list(plant_sp = c("AB1234", "CC0000", "ZX9998", "AA1110", "LO8880"),
                          NewName = c("ZY8765", "XX9999", "AC0001", "ZZ8889", "OL1119")),
                     row.names = c(NA, -5L), class = "data.frame",
                     .Names = c("plant_sp", "NewName"))
As you can see, there is a column called "plant_sp" with a 6-character code.
I'd like to transform this code into a new code (as in the column "NewName") using this mapping:
For letters:
A-Z
B-Y
C-X
D-W
E-V
F-U
G-T
...
For numbers:
0-9
1-8
2-7
3-6
4-5
5-4
...
plant_sp NewName
1 AB1234 ZY8765
2 CC0000 XX9999
3 ZX9998 AC0001
4 AA1110 ZZ8889
5 LO8880 OL1119
So each character should get the opposite one by its value (0=9, 1=8, ..., A=Z, B=Y, ...).
How can I do it? A pipe-friendly solution would be great.
Thanks a lot!
One option to achieve your desired result would be via a lookup table and stringr::str_replace_all:
library(dplyr)
library(stringr)
lt_letters <- setNames(rev(LETTERS), LETTERS)
lt_numbers <- setNames(rev(0:9), 0:9)
test_df %>%
  mutate(NewName1 = str_replace_all(plant_sp, "[A-Z0-9]",
                                    function(x) c(lt_letters, lt_numbers)[x]))
#> plant_sp NewName NewName1
#> 1 AB1234 ZY8765 ZY8765
#> 2 CC0000 XX9999 XX9999
#> 3 ZX9998 AC0001 AC0001
#> 4 AA1110 ZZ8889 ZZ8889
#> 5 LO8880 OL1119 OL1119
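A base R alternative, as a sketch on the same test_df (NewName2 is just an illustrative column name): chartr() translates each character found in old into the character at the same position in new, which is exactly the "opposite value" substitution asked for.
# Reverse the alphabet and the digits, then translate character by character.
old <- paste(c(LETTERS, 0:9), collapse = "")
new <- paste(c(rev(LETTERS), rev(0:9)), collapse = "")
test_df$NewName2 <- chartr(old, new, test_df$plant_sp)
test_df
#>   plant_sp NewName NewName2
#> 1   AB1234  ZY8765   ZY8765
#> 2   CC0000  XX9999   XX9999
#> 3   ZX9998  AC0001   AC0001
#> 4   AA1110  ZZ8889   ZZ8889
#> 5   LO8880  OL1119   OL1119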

How to add values based on a specific character, also fixed to a certain number of digits, in R

The target width is xxxx.xxxxxx (4 digits before "." and 6 digits after ".").
I have to add "0" when either side of "." does not have enough digits.
Using regexpr to find the location of "[.]" in combination with str_pad, I can
fix the first 4 digits, but I
don't know how to add zeros after the specific character to a fixed number of digits.
(I cannot find a function that counts the position starting from a specified location.)
Data like this
> df
Category
1 300.030340
2 3400.040290
3 700.07011
4 1700.0901
5 700.070114
6 700.0791
7 3600.05059
8 4400.0402
Desired data
> df
Category
1 0300.030340
2 3400.040290
3 0700.070110
4 1700.090100
5 0700.070114
6 0700.079100
7 3600.050590
8 4400.040200
I am a beginner at coding and sometimes can't understand regex pieces like "[",
etc. Some explanation of them would be super helpful.
Also, I have a combination like this:
df$Category <- ifelse(regexpr("[.]", df$Category) == 4,
                      paste("0", df$Category, sep = ""), df$Category)
df$Category <- str_pad(df$Category, 11, side = c("right"), pad = "0")
I'd like to know whether there is a better way to do this, especially counting and
returning the location from the END until a specific character appears.
Using formatC:
df$Category <- formatC(as.numeric(df$Category), format = 'f', width = 11, flag = '0', digits = 6)
# > df
# Category
# 1 0300.030340
# 2 3400.040290
# 3 0700.070110
# 4 1700.090100
# 5 0700.070114
# 6 0700.079100
# 7 3600.050590
# 8 4400.040200
format = 'f': formatting doubles;
width = 11: 4 digits before . + 1 . + 6 digits after .;
flag = '0': pads with leading zeros;
digits = 6: the desired number of digits after the decimal point (for format = "f").
The input df seems to be a character data.frame:
structure(list(Category = c("300.030340", "3400.040290", "700.07011",
"1700.0901", "700.070114", "700.0791", "3600.05059", "4400.0402"
)), .Names = "Category", row.names = c(NA, -8L), class = "data.frame")
We can use sprintf
df$Category <- sprintf("%011.6f", df$Category)
df
# Category
#1 0300.030340
#2 3400.040290
#3 0700.070110
#4 1700.090100
#5 0700.070114
#6 0700.079100
#7 3600.050590
#8 4400.040200
data
df <- structure(list(Category = c(300.03034, 3400.04029, 700.07011,
1700.0901, 700.070114, 700.0791, 3600.05059, 4400.0402)),
.Names = "Category", class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
There are plenty of great tricks, functions, and shortcuts to be learned, and I would encourage you to explore them all! For example, if you're trying to win code golf, you will want to use @akrun's sprintf() approach. Since you stated you're a beginner, it might be more helpful to break the problem down into its component parts. One transparent and, in my opinion, easy-to-follow approach is to use the stringr package:
library(stringr)
location_of_dot <- str_locate(df$Category, "\\.")[, 1]
substring_left_of_dot <- str_sub(df$Category, end = location_of_dot - 1)
substring_right_of_dot <- str_sub(df$Category, start = location_of_dot + 1)
pad_left <- str_pad(substring_left_of_dot, 4, side = "left", pad = "0")
pad_right <- str_pad(substring_right_of_dot, 6, side = "right", pad = "0")
result <- paste0(pad_left, ".", pad_right)
result
Use separate from tidyr to split Category on the decimal point. Use str_pad from stringr to add zeros at the front or back, and paste the pieces back together.
library(tidyr)    # to separate columns on the decimal
library(dplyr)    # for mutate and pipes
library(stringr)  # for str_pad
input_data <- read.table(text = " Category
1 300.030340
2 3400.040290
3 700.07011
4 1700.0901
5 700.070114
6 700.0791
7 3600.05059
8 4400.0402", header = TRUE, stringsAsFactors = FALSE) %>%
  separate(Category, into = c("col1", "col2")) %>%
  mutate(col1 = str_pad(col1, width = 4, side = "left", pad = "0"),
         col2 = str_pad(col2, width = 6, side = "right", pad = "0"),
         Category = paste(col1, col2, sep = ".")) %>%
  select(-col1, -col2)
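One detail worth noting: separate() splits on any non-alphanumeric character by default, which is why no sep argument is needed above. If you prefer to be explicit, you can pass the separator yourself; a small self-contained sketch (sep is a regular expression, so the dot is escaped):
library(tidyr)
# Explicitly split on the literal decimal point.
separate(data.frame(Category = "300.030340"),
         Category, into = c("col1", "col2"), sep = "\\.")
#>   col1   col2
#> 1  300 030340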

Reshape to one row (without converting to character)

I have the following data frame:
data <- structure(list(Metric = c("x", "y"), Result_1 = c(9, 18),
                       Result_2 = c(7, 14), Delta = c(-170, -401)),
                  .Names = c("Metric", "Result_1", "Result_2", "Delta"),
                  row.names = c(NA, -2L), class = "data.frame")
I would like to transform it into 1 row and double the number of columns.
I tried as.vector(t(data)). However, when I do that it transforms everything to character and I lose the type information.
Any help?
We can split the data frame and then use bind_cols from the dplyr package, although I am not sure this is your expected output as you did not provide an example, just a description.
dplyr::bind_cols(split(data, data$Metric))
Metric Result_1 Result_2 Delta Metric1 Result_11 Result_21 Delta1
1 x 9 7 -170 y 18 14 -401
In base R:
dd <- stack(data)
A <- dd$values
names(A) <- dd$ind
data.frame(as.list(A))
Metric Metric.1 Result_1 Result_1.1 Result_2 Result_2.1 Delta Delta.1
1 x y 9 18 7 14 -170 -401
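If you are open to tidyr, a pivot_wider() sketch gets the one-row, wide layout directly while keeping the numeric columns numeric:
library(tidyr)
# Spread every value column out by Metric: one row, doubled columns.
pivot_wider(data, names_from = Metric,
            values_from = c(Result_1, Result_2, Delta))
#> # A tibble: 1 x 6
#>   Result_1_x Result_1_y Result_2_x Result_2_y Delta_x Delta_y
#>        <dbl>      <dbl>      <dbl>      <dbl>   <dbl>   <dbl>
#> 1          9         18          7         14    -170    -401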
