I have a scenario where a lengthy (12-digit) index value is being read into R as a double. I need to concatenate it with some other identifiers, but mutate(x = as.character(x)) converts it to scientific notation:
library(dplyr)
library(tibble)

index <- c(123000789000, 123456000000, 123000000012)
concact_val <- c("C", "A", "B")

df <- bind_cols(
  as_tibble(index),
  as_tibble(concact_val)
)

df %>%
  mutate(index = as.character(index))
This outputs:
index concact_val
1.23e11 C
1.23e11 A
1.23e11 B
Whereas ideally I'd like to be able to do this:
df %>%
  mutate(index = as.character(index),
         index = paste0(concact_val, index)) %>%
  select(-concact_val)
to output:
index
C123000789000
A123456000000
B123000000012
Is there a way around this? In this example I created a vector for the index, but in the data frame I'm actually reading in, the column arrives as a double via an API (unfortunately, I can't change the column type prior to reading it in; it's being read in differently than with read_csv).
Use sprintf:
df %>%
mutate(result = sprintf("%s%0.0f", concact_val, index))
# # A tibble: 3 x 3
# index concact_val result
# <dbl> <chr> <chr>
# 1 123000789000 C C123000789000
# 2 123456000000 A A123456000000
# 3 123000000012 B B123000000012
If there is a chance that some index values have fractional components, this will round them silently. If that's a concern (and you don't want to round), you can instead use floor(index) inside the sprintf.
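For instance, a minimal sketch of that variant, using the same example data:
df %>%
  mutate(result = sprintf("%s%0.0f", concact_val, floor(index)))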
We may use as.bigz from gmp
paste0(concact_val, gmp::as.bigz(index))
[1] "C123000789000" "A123456000000" "B123000000012"
Or another option is to specify scipen in options() to avoid converting to scientific notation:
options(scipen = 999)
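A sketch of how that could feed into the original pipeline; format() is used here instead of as.character() because it respects the scipen penalty, and trim = TRUE drops the width padding that format() would otherwise add:
options(scipen = 999)

df %>%
  mutate(index = paste0(concact_val, format(index, trim = TRUE))) %>%
  select(-concact_val)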
In addition to the sprintf and gmp solutions, we may try another option, shown below, as a programming exercise.
f <- function(x) {
  res <- c()
  # Peel off digits one at a time, least significant first
  while (x) {
    res <- append(res, x %% 10)
    x <- x %/% 10
  }
  # Reverse to most-significant-first and collapse into a single string
  paste0(rev(res), collapse = "")
}
paste0(concact_val, Vectorize(f)(index))
# [1] "C123000789000" "A123456000000" "B123000000012"
I have a data.frame with a column that looks like this:
diagnosis
F.31.2,A.43.2,R.45.2,F.43.1
I want to somehow split this column into two columns, one containing all the values starting with F and one for all the other values, resulting in a df with two columns that looks like this:
F other
F.31.2,F.43.1 A.43.2,R.45.2
Thanks in advance
Try the following tidyverse approach. You can separate the rows by , and then create a group according to the pattern, in order to reshape to wide and obtain the expected result:
library(dplyr)
library(tidyr)
#Data
df <- data.frame(diagnosis='F.31.2,A.43.2,R.45.2,F.43.1',stringsAsFactors = F)
#Code
new <- df %>%
  separate_rows(diagnosis, sep = ',') %>%
  mutate(Group = ifelse(grepl('F', diagnosis), 'F', 'Other')) %>%
  pivot_wider(values_fn = toString, names_from = Group, values_from = diagnosis)
Output:
# A tibble: 1 x 2
F Other
<chr> <chr>
1 F.31.2, F.43.1 A.43.2, R.45.2
First, use strsplit to split at the commas. Then use grep to find the indices of the F values, and select/anti-select them by multiplying by 1 or -1 before pasting them back together.
tmp <- el(strsplit(d$diagnosis, ","))
res <- lapply(c(1, -1), function(x) paste(tmp[grep("F", tmp)*x], collapse=","))
res <- setNames(as.data.frame(res), c("F", "other"))
res
# F other
# 1 F.31.2,F.43.1 A.43.2,R.45.2
Data:
d <- setNames(read.table(text="F.31.2,A.43.2,R.45.2,F.43.1"), "diagnosis")
I am trying to get the full number 193525.0768, but its decimals appear to be removed. Please explain this to me.
df <- tibble(
  x = "193525.0768"
) %>%
  mutate(x = as.numeric(x))
print(df, digits = 10) # decimals removed. I expect it to maintain the decimals numbers
# A tibble: 1 x 1
x
<dbl>
1 193525.
df[1,1][[1]] # decimals removed
# 193525
x <- "193525.0768"
print(as.numeric(x), digits = 10) # decimals not removed
# 193525.0768
You have a printing issue, not a reading-in issue. The tibble print method doesn't take a digits argument - see ?print.tbl for details. You can use print.data.frame explicitly to bypass the tibble print method and use the data.frame print method instead, which does take a digits argument:
tibble(x = "193525.0768") %>%
  mutate(x = as.numeric(x)) %>%
  print.data.frame(digits = 10)
# x
# 1 193525.0768
Or you can change the default with the pillar.sigfig option (which is mentioned in ?print.tbl). The default is 3, which may seem confusing: taken literally you might expect 193525.0768 to print as 194000. In practice pillar always shows the full integer part and uses sigfig to decide how many fractional digits (if any) are worth displaying; the trailing dot in 193525. signals that fractional digits have been hidden.
options(pillar.sigfig = 10)
tibble(x = "193525.0768") %>%
mutate(x = as.numeric(x))
# x
# 1 193525.0768
Alternately, use a data frame instead of a tibble:
data.frame(x = "193525.0768") %>%
  mutate(x = as.numeric(x)) %>%
  print(digits = 10)
# x
# 1 193525.0768
I have a messy, highly nested, list:
m <- list('form' = list('elements' = list('name' = 'Bob', 'code' = 12), 'name' = 'Mary', 'code' = 15))
> m
$form
$form$elements
$form$elements$name
[1] "Bob"
$form$elements$code
[1] 12
$form$name
[1] "Mary"
$form$code
[1] 15
How can I extract from the object m the name and code, regardless as to how nested name and code appears within a list?
Expected output:
# A tibble: 2 x 2
name code
<chr> <dbl>
1 Bob 12
2 Mary 15
1) rrapply. Flatten m using rrapply, giving r, and then separate the name and code fields using tapply, remove the dimensions using c, convert to data.frame and set the order of the columns.
Note that this is not hard coded to name and code and would work with other fields and numbers of fields.
library(rrapply)
r <- rrapply(m, f = c, how = "flatten")
nms <- names(r)
as.data.frame(c(tapply(unname(r), nms, unlist)))[unique(nms)]
giving:
name code
1 Bob 12
2 Mary 15
An alternative to the final two lines of code above would be:
out <- unstack(stack(r))
out[] <- lapply(out, type.convert)
If there could be other fields in m in addition to name and code that we want ignored then use this in place of the statement that defines r above:
cond <- function(x, .xname) .xname %in% c("name", "code")
r <- rrapply(m, cond, c, how = "flatten")
2) Base R. A base R solution is the following, which unlists m and then uses tapply as in (1), grouping by the suffixes of names(r). Like (1), this is a general approach that is not hard coded to name and code. Note that tools comes with R, so it is part of base R.
r <- unlist(m)
nms <- tools::file_ext(names(r))
as.data.frame(c(tapply(unname(r), nms, unlist)))[unique(nms)]
This could help: format the list into a data frame and then reshape it:
library(tidyverse)
#Process
y1 <- as.data.frame(lapply(m, unlist), stringsAsFactors = F)
y1$id <- rownames(y1)
rownames(y1) <- NULL
#Dplyr mutation
y1 %>%
  mutate(Var = ifelse(grepl('name', id), 'name',
                      ifelse(grepl('code', id), 'code', NA))) %>%
  select(-id) %>%
  group_by(Var) %>%
  mutate(i = 1:n()) %>%
  pivot_wider(names_from = Var, values_from = form) %>%
  select(-i) %>%
  mutate(code = as.numeric(code))
Output:
# A tibble: 2 x 2
name code
<chr> <dbl>
1 Bob 12
2 Mary 15
I have data like this, below are the 3 rows from my data set:
total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;
Expected output as below,
total free used shared buffers cached
7871MB 5711MB 2159MB 0MB 304MB 1059MB
5751MB 71MB 5MB 3159MB 30MB 1059MB
5751MB 109MB 5MB 3159MB 30MB 1059MB
The problem here is that I want to make separate columns from the data above, such as a total value, a free value, a used value and a shared value.
I can do that by splitting on ;, but in other rows the values come shuffled, e.g. the first value may be free and then total, followed by the other values.
Is there any way using regex in R to say: if we find total, take the value up to ; and put it into one column, and if we find free, take the value up to ; and put it into another column?
Here is one possibility using strsplit.
# x holds one row of the data as a single string
x <- 'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;'
df <- as.data.frame(matrix(unlist(lapply(strsplit(x, ";"), strsplit, "=")), nrow = 2))
colnames(df) <- df[1, ]
df <- df[-1, ]
df
# total free used shared buffers cached
# 2 7871MB 5711MB 2159MB 0MB 304MB 1059MB
Edit
I don't know how your data are structured. But you can do something like the following:
x <- "total=7871MB;free=5711MB;used=2159MB;shared=0MB; buffers=304MB;cached=1059MB;
free=71MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;cached=1059MB;
cached=1059MB;total=5751MB;shared=3159MB;used=5MB;buffers=30MB;free=109MB;"
library(tidyverse)

x %>%
  str_split("\n") %>%
  unlist() %>%
  as_tibble() %>%
  mutate(total   = str_extract(value, "total=(.*?)MB;"),
         free    = str_extract(value, "free=(.*?)MB;"),
         used    = str_extract(value, "used=(.*?)MB;"),
         shared  = str_extract(value, "shared=(.*?)MB;"),
         buffers = str_extract(value, "buffers=(.*?)MB;"),
         cached  = str_extract(value, "cached=(.*?)MB;")) %>%
  select(-value) %>%
  mutate_all(~as.numeric(str_extract(., "[[:digit:]]+")))
# # A tibble: 3 x 6
# total free used shared buffers cached
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 7871. 5711. 2159. 0. 304. 1059.
# 2 5751. 71. 5. 3159. 30. 1059.
# 3 5751. 109. 5. 3159. 30. 1059.
We can try using strsplit followed by gsub to separate the data from the labels. Then, create a data frame using this data:
x <- 'total=7871MB;free=5711MB;used=2159MB;shared=0MB;buffers=304MB;cached=1059MB;'
y <- unlist(strsplit(x, ';'))
names <- sapply(y, function(x) gsub("=.*$", "", x))
data <- sapply(y, function(x) gsub(".*=", "", x, perl=TRUE))
df <- data.frame(names=names, data=data)
df
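If the wide, one-row layout from the question is wanted, one possible follow-up (a sketch; the column order follows the order of the fields in the string):
# Turn the value vector into a named list, then into a one-row data frame
wide <- as.data.frame(as.list(setNames(data, names)), stringsAsFactors = FALSE)
wide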
Assuming I have a dataframe, df with this info
group wk source revenue
1 1 C 100
1 1 D 200
1 1 A 300
1 1 B 400
1 2 C 500
1 2 D 600
I'm trying to programmatically filter df down to the rows for each unique combination of group, wk and source, and then perform some operations on them, before combining them back into another data frame. I want to write a function that can scale to any number of segments (and not just the example scenario here) and filter down the rows. All I would need to pass is the column names by which I want to segment,
e.g.
seg <- c("group", "wk", "source")
One unique combination to filter rows in df would be
df %>% filter(group == 1 & wk == 1 & source == "A")
I wrote a recursive function (get_rows) to do so, but it doesn't seem to do what I want. Could anyone provide input on where I'm going wrong?
library(dplyr)
filter_row <- function(df, x)
{
  df %>% filter(group == x$group & wk == x$wk & source == x$source)
}

seg <- c("group", "wk", "source")

get_rows <- function(df, seg, pos = 1, l = list())
{
  while (pos <= (length(seg) + 1))
  {
    if (pos <= length(seg))
      for (j in 1:length(unique(df[, seg[pos]])))
      {
        k <- unique(df[, seg[pos]])
        l[seg[pos]] <- k[j]
        get_rows(df, seg, pos + 1, l)
        return()
      }
    if (pos > length(seg))
    {
      tmp <- df %>% filter_row(l)
      # <call some function on tmp>
      return()
    }
  }
}

get_rows(df, seg)
EDIT: I understand there are prebuilt methods I can use to get what I need, but I'm curious about where I'm going wrong in the recursive function I wrote.
There might be a data.table/dplyr solution out there, but this one is pretty simple.
# Just paste together the values of the columns you want to aggregate over.
# This creates one grouping key (a string) per row.
f <- function(data, v) { apply(data[, v, drop = FALSE], 1, paste, collapse = ".") }

# aggregate, tapply, ave, and a few more functions can do the same thing
by(data = df,                                      # Your data here
   INDICES = f(df, c("group", "wk", "source")),    # Your data and columns here
   FUN = identity, simplify = FALSE)               # Your function here
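For instance, a minimal sketch (assuming the df from the question) that sums revenue within each group/wk/source combination and binds the pieces back together:
res <- by(data = df,
          INDICES = f(df, c("group", "wk", "source")),
          FUN = function(d) data.frame(d[1, c("group", "wk", "source")], revenue = sum(d$revenue)),
          simplify = FALSE)
do.call(rbind, res)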
You can also use library(dplyr) and library(data.table):
df %>% data.table() %>% group_by(group, wk, source) %>% do(yourfunctionhere(.)) # use . to refer to each group's rows
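As a concrete sketch of that idea (plain dplyr is enough here; your_summary is a hypothetical stand-in for whatever per-group operation you need):
library(dplyr)

# Hypothetical per-group operation: returns one row per group
your_summary <- function(d) summarise(d, revenue = sum(revenue))

df %>%
  group_by(group, wk, source) %>%
  do(your_summary(.)) %>%
  ungroup()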