In R, is there a way to add a comma as the thousands separator to the numbers on a Venn diagram?
venn.plot <- VennDiagram::draw.pairwise.venn(10000, 7000, 3000, c("First", "Second"), scaled = FALSE)
grid::grid.draw(venn.plot)
The resulting chart looks like the one below.
It does not look like this function was designed to do that. If you really want to use this function, you could "hack" it to replace the default formatting code it uses for labels. Note that this method is very fragile, since we are editing particular "lines" of the function body. First, make a copy of the function:
myvenn <- VennDiagram::draw.pairwise.venn
Here's the default formatter
body(myvenn)[[46]]
# wrapLab <- function(num) {
#     stri = ""
#     if (print.mode[1] == "percent") {
#         stri <- paste(signif(num * 100/denom, digits = sigdigs),
#             "%", sep = "")
#         if (isTRUE(print.mode[2] == "raw")) {
#             stri <- paste(stri, "\n(", num, ")", sep = "")
#         }
#     }
#     if (print.mode[1] == "raw") {
#         stri <- num
#         if (isTRUE(print.mode[2] == "percent")) {
#             stri <- paste(stri, "\n(", paste(signif(num * 100/denom,
#                 digits = sigdigs), "%)", sep = ""), sep = "")
#         }
#     }
#     return(stri)
# }
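One caveat: the [[46]] index is specific to the installed VennDiagram version. A minimal sketch (assuming the formatter is still named wrapLab in your version) to locate it programmatically instead of hard-coding the index:
# find which element of the function body is the wrapLab assignment
idx <- which(vapply(as.list(body(myvenn)),
                    function(e) grepl("^wrapLab <-", deparse(e)[1]),
                    logical(1)))
idx
# [1] 46   (for the version used here)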
Let's replace that with a call to prettyNum to add the commas. (The [[3]] element below is the right-hand side of the wrapLab <- ... assignment, i.e. the function definition itself.)
body(myvenn)[[46]][[3]] <- quote(function(x) {
  prettyNum(x, big.mark = ",", scientific = FALSE)
})
Now we can call our version of the function
venn.plot <- myvenn(10000, 7000, 3000, c("First", "Second"), scaled = FALSE)
grid::grid.draw(venn.plot)
You could also edit the items manually.
venn.plot[[5]][["label"]] <- "7,000"
venn.plot[[6]][["label"]] <- "4,000"
venn.plot[[7]][["label"]] <- "3,000"
grid::grid.draw(venn.plot)
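The indices 5, 6 and 7 depend on the diagram; if you are unsure which elements carry text labels, a quick way to inspect them before editing:
# list each grob's label, if it has one (NULL entries are non-text grobs)
sapply(seq_along(venn.plot), function(i) venn.plot[[i]][["label"]])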
Here is another way, with a loop:
venn.plot <- VennDiagram::draw.pairwise.venn(10000, 7000, 3000, c("First", "Second"), scaled = FALSE)
for (i in seq_along(venn.plot)) {
  if (!is.null(venn.plot[[i]][["label"]]) &&
      !is.na(as.numeric(venn.plot[[i]][["label"]]))) {
    venn.plot[[i]][["label"]] <- prettyNum(venn.plot[[i]][["label"]], big.mark = ",")
  }
}
This works, but the as.numeric() test throws harmless coercion warnings on the non-numeric labels:
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
grid::grid.draw(venn.plot)
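If the warnings bother you, here is a variant (a minimal sketch) that tests for digit-only labels with a regular expression instead of coercing, so nothing is ever coerced to NA:
venn.plot <- VennDiagram::draw.pairwise.venn(10000, 7000, 3000, c("First", "Second"), scaled = FALSE)
for (i in seq_along(venn.plot)) {
  lab <- venn.plot[[i]][["label"]]
  # only reformat labels that are purely digits; no coercion, so no warnings
  if (!is.null(lab) && grepl("^[0-9]+$", lab)) {
    venn.plot[[i]][["label"]] <- prettyNum(lab, big.mark = ",")
  }
}
grid::grid.draw(venn.plot)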
I am following the tutorial here to practice rayshader. However, when I run the code below, I get this error:
Error: C stack usage 17812428 is too close to the limit
The error occurs at the tutorial step corresponding to the code posted here. Sample data is provided in the tutorial.
Code:
tbl_census_2018 <- read.px("data/census_2018.px") %>% # Load & format
  as_tibble()
Package pxR's GitHub page has more info about the read.px function, which I am pasting below in case it helps.
How can I fix this?
#################################################################
#
# File:    read.px.R
# Purpose: reads a PC-Axis file into R
#
# Created: 20110618
# Authors: fvf, cjgb, opl
#
# Modifications:
#   20111210, cjgb: in the data string, "-" may represent the value 0
#   20111210, cjgb: fixing the strsplit when the split character is contained in the data part
#   20120329, cjgb: number strings in the DATA part can contain ";" as separators.
#                   Although deprecated, cases still lurk.
#   20130228, cjgb: there can be ";" inside quoted strings that create chaos
#   20130608, fvf:  ability to read files with keys in the data area.
#                   ":" added to default na.strings (EuroStat files)
#   20130624: use str_split (line 91) to read the DATA area
#   20130917, cjgb: changes to prevent errors with EOL characters
#   20131115, cjgb: some files do not have heading (or stub): only one of
#                   them is really required
#   20131118, cjgb: fixed a bug happening when missing (i.e. "..") was the last value in DATA;
#                   fixing it required that the last quote was not eliminated (same for the first quote)
#   20141222, fvf:  fixing some bugs in relation to reading files with KEYS (sparse array)
#   20150211, fvf:  the parameter "encoding" is NULL by default; "encoding" is determined by
#                   the file itself: if CHARSET="ANSI" then "latin1" else "CP437".
#   20150212, fvf:  I had to delete => 20130917, cjgb: tmp[2] <- gsub(";.*", "", tmp[2]);
#                   many px-files have a semicolon at the end of line in the DATA area,
#                   i.e: read.px('http://www.ine.es/pcaxisdl//t20/e245/p05/a2002/l0/00004001.px')
#   20150216, fvf:  minor correction of a bug in the modification of 20150211, fvf
#   20150219, fvf:  solving a bug: a missing "DROP=FALSE" was producing a read error on files with a single key
#################################################################
read.px <- function(filename, encoding = NULL,
                    na.strings = c('"."', '".."', '"..."', '"...."', '"....."', '"......"', '":"')) {

  ## auxiliary functions ##
  clean.spaces <- function(x){
    gsub("^[[:space:]]+|[[:space:]]+$", "", x)  # discards heading|trailing whitespace
  }

  get.attributes <- function(x){
    x <- gsub("([A-Z-]*)\\((.*)\\).*", "\\1;\\2", x)  ## separates label-attribute with ";"
    x <- ldply(strsplit(x, ";"),
               function(y) c(y, "value")[1:2])
  }

  break.clean <- function(x) {
    x <- clean.spaces(strsplit(x, split = '\\"')[[1]])  ## breaks by '"'
    x[! x %in% c(",", "")]                              ## and drops spurious seps
  }
  ## end: auxiliary functions ##

  # modification by fvf (150211): determining the character encoding used in the file => encoding
  if (is.null(encoding)) {
    charset <- readLines(filename, 5)  # read the first five lines
    encoding <- ifelse(any(grepl('CHARSET.*ANSI', charset, ignore.case = T)),
                       "latin1", "CP437")  # checked on Debian and OS X
  }

  a <- scan(filename, what = "character", sep = "\n", quiet = TRUE, fileEncoding = encoding)

  # modification by fvf: 130608
  a <- paste(a, collapse = "\n")     # CR/LF are kept for now and removed selectively later
  tmp <- strsplit(a, "DATA=")[[1]]
  tmp[1] <- gsub("\n", " ", tmp[1])  # fvf[130608]: removes CR from the header
  tmp[2] <- gsub(";", "", tmp[2])    # fvf[150212] (the rev 92-94 modification) causes many problems in INEBase
  # i.e: read.px('http://www.ine.es/pcaxisdl//t20/e245/p05/a2002/l0/00004001.px')
  # in many files every line of the DATA area has ";" before the EOL,
  # which would make only the first line of data get read
  a <- paste(tmp[1], "DATA=", tmp[2], sep = "")

  ## modification by cjgb, 20130228 concerning line separators within quoted strings
  ## ; is the logical line end in px files
  ## so we should do:
  ## a <- unlist(strsplit(a, ";"))
  ## but there might be ; inside quoted strings
  ## so we need the following workaround:
  punto.coma <- str_locate_all(a, ";")[[1]][, 1]  # where the ";" are
  comillas   <- str_locate_all(a, '"')[[1]][, 1]  # where the '"' are
  ## ";" not after an odd number of '"'
  ## these are the proper "cuts"
  cortes <- Filter(function(x) sum(comillas < x) %% 2 == 0, punto.coma)
  a <- str_sub(a, c(1, cortes + 1), c(cortes - 1, str_length(a)))
  a <- a[!is.na(a)]
  a <- a[a != ""]
  ## end of modification by cjgb, 20130228 concerning line separators within quoted strings

  # change strsplit to str_split; in big px-files:
  # "Error: C stack usage is too close to the limit"
  a <- do.call(rbind, str_split(a, "=", n = 2))

  ## fvf.20141222: do not change to factor: ++ stringsAsFactors=F
  a <- data.frame(cbind(get.attributes(a[, 1]), a[, 2], stringsAsFactors = F))
  colnames(a) <- c("label", "attribute", "value")

  ## build a px object: list with px class attribute ##
  a$label     <- make.names(clean.spaces(a$label))
  a$attribute <- make.names(clean.spaces(gsub('\\"', "", a$attribute)))

  # need to avoid that quotes are removed in DATA part because of a bug:
  # a case was reported where the data part ended in ".." and the last quote was erased
  # and this affected the scan function below
  a.data  <- as.character(a[a$label == "DATA", "value"])
  a.value <- gsub('^\\"|\\"$', "", a$value)  # removes " at beginning / end
  a.value[a$label == "DATA"] <- a.data
  names(a.value) <- a$attribute

  px <- tapply(a.value, a$label, as.list)

  ## these metadata keys contain vectors (comma separated)
  ## we need to split them (and clean the mess: extra spaces, etc.)
  px$STUB$value    <- if (!is.null(px$STUB))    make.names(break.clean(px$STUB$value))
  px$HEADING$value <- if (!is.null(px$HEADING)) make.names(break.clean(px$HEADING$value))
  px$VALUES <- lapply(px$VALUES, break.clean)

  # fvf.20141222: if there are no CODES, do not create CODES
  if (!is.null(px$CODES))
    px$CODES <- lapply(px$CODES, break.clean)

  # fvf.20141222: replaces ["~~~~" "~~~~~"] with ["~~~~~"\n"~~~~"] in
  # multiline fields with lost returns (easier human reading)
  px <- lapply(px, function(e){
    if (!is.null(e$value))
      e$value <- gsub('"[[:space:]]+"', '"\n"', e$value)
    e
  })

  #### read the data part into a 'melted' dataframe ###
  ## there are two cases: files with/without KEYS keyword
  ## which need to be processed independently

  # fvf[130608]: added to read files with keys in the data area
  if ("KEYS" %in% a$label) {
    ## read the whole block
    tc <- textConnection(px$DATA$value); on.exit(close(tc))
    raw <- read.table(tc, sep = ",", colClasses = "factor")

    ## extract and process the data part (the numbers)
    data.part <- as.character(raw[, ncol(raw)])  # numbers (last column of the data.frame)
    data.part <- gsub('"-"', 0, data.part)       # 0's might be encoded as "-"
    data.part <- scan(text = data.part, na.strings = na.strings, quiet = T)

    ## extract and process the keys part (it needs to be stacked a number of times,
    ## as many as there are entries in the data vector in each row in the block)
    keys.part <- raw[, -ncol(raw), drop = FALSE]
    keys.part <- keys.part[rep(1:nrow(keys.part), each = length(data.part) / nrow(keys.part)), , drop = FALSE]
    colnames(keys.part) <- names(px$KEYS)

    ## change CODES (if any) in keys part to VALUES (consistency issue)
    # for (col.name in colnames(keys.part)[unlist(px$KEYS) == "CODES"])
    #   keys.part[[col.name]] <- mapvalues(keys.part[[col.name]],
    #                                      from = px$CODES[[col.name]],
    #                                      to   = px$VALUES[[col.name]])
    # fvf.20141222:
    for (col.name in colnames(keys.part)) {
      if (px$KEYS[[col.name]] == 'CODES') {
        keys.part[[col.name]] <- factor(keys.part[[col.name]], levels = px$CODES[[col.name]])
        levels(keys.part[[col.name]]) <- px$VALUES[[col.name]]  ## all levels are VALUES
      } else keys.part[[col.name]] <- factor(keys.part[[col.name]], levels = px$VALUES[[col.name]])
    }

    ## extract and process the variables that are not keys
    no.keys.part <- px$VALUES[setdiff(names(px$VALUES), names(px$KEYS))]
    no.keys.part <- expand.grid(rev(no.keys.part))

    ## put everything together & cleanup
    px$DATA$value <- data.frame(keys.part,
                                no.keys.part,
                                value = data.part,
                                row.names = NULL)
  }
  else {
    tmp <- gsub('"-"', 0, px$DATA$value)  # 0 can be encoded as "-"
    tmp <- gsub("\n", " ", tmp)           # delete CR/LF of DATA area fvf[130608]

    tc <- textConnection(tmp); on.exit(close(tc))
    raw <- scan(tc, na.strings = na.strings, quote = NULL, quiet = TRUE)

    names.vals <- c(rev(px$HEADING$value), rev(px$STUB$value))
    output.grid <- data.frame(do.call(expand.grid, px$VALUES[names.vals]))

    # sanity check: avoids the problem of "recycling" of values if
    # the ratio of lengths of variables and values is an exact integer
    if (nrow(output.grid) != length(raw))
      stop("The input file is malformed: data and varnames length differ")

    px$DATA$value <- data.frame(output.grid, raw)
    colnames(px$DATA$value) <- c(names.vals, "value")
  }

  class(px) <- "px"
  px
}
The issue should have been fixed in the new pxR code release (version 0.42.6).
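If CRAN does not yet carry that version, installing from GitHub should pick it up. A minimal sketch; the cjgb/pxR repository path is an assumption based on the author initials in the source header above, so check the link on the package page first:
install.packages("pxR")                # if 0.42.6 has reached CRAN
# remotes::install_github("cjgb/pxR")  # otherwise, from GitHub (assumed path)
packageVersion("pxR")                  # confirm >= 0.42.6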
Below is reproducible code, which needs to be pasted into a PowerBI R script visual.
I'm making some customizations to the default process_map object.
The visual works on my desktop but throws an error when published to PowerBI Web.
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(Column1)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(bupaR)
library(DiagrammeR)
library(tidyverse)
library(lubridate)
# convert to proper date
processMap <- patients %>%
process_map(sec = performance(median, "hours")
,type = frequency("relative_case")
,type_edges = frequency("absolute_case")
,rankdir = "LR"
,layout = layout_pm(edge_weight = TRUE)
,fixed_edge_width = F
,render = F
)
# customisation(label, color, font)
processMap$nodes_df$label <- stringr::str_replace_all(processMap$nodes_df$label, c('ARTIFICIAL_START' = 'Start', 'ARTIFICIAL_END' = 'End'))
processMap$nodes_df$color <- stringr::str_replace_all(processMap$nodes_df$color, c('chartreuse4' = '#769e00', 'brown4' = '#7a0f2d'))
processMap$nodes_df$fontcolor <- stringr::str_replace_all(processMap$nodes_df$fontcolor, c('chartreuse4' = '#769e00', 'brown4' = '#7a0f2d'))
processMap$nodes_df$fontname <- stringr::str_replace_all(processMap$nodes_df$fontname, c('Arial' = 'Calibri'))
processMap$edges_df$fontname <- stringr::str_replace_all(processMap$edges_df$fontname, c('Arial' = 'Calibri'))
# change edge width
processMap$edges_df$penwidth <- scales::rescale(processMap$edges_df$penwidth, to = c(0.5, 4))
# custom duration edges
edges_label <- processMap$edges_df$label
tmp <- regmatches(edges_label, gregexpr("\\(.*?\\)", edges_label))
tmp <- gsub("[\\(\\)]", "", tmp)
tmp <- stringr::str_replace_all(tmp, c('character0' = '0', ' hours'=''))
tmp <- as.numeric(tmp)*60*60
tmp <- lubridate::as.duration(tmp)
tmp <- regmatches(tmp, gregexpr("\\(.*?\\)", tmp))
tmp <- gsub("[\\(\\)]", "", tmp)
edges_label_duration <- stringr::str_replace_all(tmp, c('character0' = '<1 hour'))
edges_label_count_absolute_case <- gsub("\n.*","",edges_label)
edges_label_clean <- paste0(edges_label_count_absolute_case, '\n',edges_label_duration)
# remove those which should have no hour()
correct_label <- gsub("\n.*","",edges_label_clean[!grepl("hour", edges_label)])
edges_label_clean[!grepl("hour", edges_label)] <- correct_label
# assign to process object
processMap$edges_df$label <- edges_label_clean
DiagrammeR::export_graph(graph = processMap, file_type = 'png', file_name = 'processMap.png')
Below is the visual output in PowerBI desktop
When published to the PowerBI portal, it gives the following error
Error in if (any(ind)) Encoding(x[ind]) <- "bytes" :
missing value where TRUE/FALSE needed
In addition: Warning message:
Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0.
Please use the non-prefixed form or `!!` instead.
# Bad:
rlang::expr(mean(rlang::UQ(var) * 100))
# Ok:
rlang::expr(mean(UQ(var) * 100))
# Good:
rlang::expr(mean(!!var * 100))
rlang on my desktop is rlang_0.4.10, while PowerBI Web has rlang_0.3.0.
I ended up re-writing the customisation code so that there is no function nested within a function of the kind flagged by the deprecation message:
# Bad:
rlang::expr(mean(rlang::UQ(var) * 100))
I suspect the related function is this one
regmatches(edges_label, gregexpr("\\(.*?\\)", edges_label))
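For context, here is a small illustration (with made-up labels) of why the original pipeline needed the odd 'character0' handling:
# hypothetical edge labels: one with a duration in parentheses, one without
edges_label <- c("12\n(3.5 hours)", "7")
m <- regmatches(edges_label, gregexpr("\\(.*?\\)", edges_label))
# m is a list; the second element is character(0) because nothing matched
gsub("[\\(\\)]", "", m)
# [1] "3.5 hours"  "character0"
# gsub() coerces the list with as.character(), deparsing the empty match to
# "character(0)"; stripping the parentheses leaves the literal "character0",
# which the original code then had to map back to a real value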
I can't say definitively that this was the cause, but in any case the updated code below no longer generates the error when published to the PowerBI service/web.
library(bupaR)
library(DiagrammeR)
library(tidyverse)
library(lubridate)
# convert to proper date
processMap <- patients %>%
process_map(sec = performance(median, "hours")
,type = frequency("relative_case")
,type_edges = frequency("absolute_case")
,rankdir = "LR"
,layout = layout_pm(edge_weight = TRUE)
,fixed_edge_width = F
,render = F
)
# customisation(label, color, font)
processMap$nodes_df$label <- stringr::str_replace_all(processMap$nodes_df$label, c('ARTIFICIAL_START' = 'Start', 'ARTIFICIAL_END' = 'End'))
processMap$nodes_df$color <- stringr::str_replace_all(processMap$nodes_df$color, c('chartreuse4' = '#769e00', 'brown4' = '#7a0f2d'))
processMap$nodes_df$fontcolor <- stringr::str_replace_all(processMap$nodes_df$fontcolor, c('chartreuse4' = '#769e00', 'brown4' = '#7a0f2d'))
processMap$nodes_df$fontname <- stringr::str_replace_all(processMap$nodes_df$fontname, c('Arial' = 'Calibri'))
processMap$edges_df$fontname <- stringr::str_replace_all(processMap$edges_df$fontname, c('Arial' = 'Calibri'))
# change edge width
processMap$edges_df$penwidth <- scales::rescale(processMap$edges_df$penwidth, to = c(0.5, 4))
# custom duration edges
edges_label <- processMap$edges_df$label
# extract within bracket
tmp <- stringr::str_extract(string = edges_label,
pattern = "(?<=\\().*(?=\\))")
tmp <- stringr::str_replace_all(tmp, c(' hours'=''))
tmp[is.na(tmp)] <- '0'
tmp <- as.numeric(tmp)*60*60
tmp <- lubridate::as.duration(tmp)
tmp <- stringr::str_extract(string = tmp,
pattern = "\\(.*?\\)")
tmp[is.na(tmp)] <- '(<1 hour)'
edges_label_duration <- tmp
edges_label_count_absolute_case <- gsub("\n.*","",edges_label)
edges_label_clean <- paste0(edges_label_count_absolute_case, '\n',edges_label_duration)
# remove those which should have no hour()
correct_label <- gsub("\n.*","",edges_label_clean[!grepl("hour", edges_label)])
edges_label_clean[!grepl("hour", edges_label)] <- correct_label
# assign to process object
processMap$edges_df$label <- edges_label_clean
DiagrammeR::export_graph(graph = processMap, file_type = 'png', file_name = 'processMap.png')
I was trying to use the scales package to produce formatting for a complex table. The following helpers apply the label_percent and label_number_si functions from the scales package.
For some reason, negative_parens = TRUE is not producing the expected output:
prettify_numbers_as_percent <- function(x){
  lapply(as.list(as.numeric(x)),
         label_percent(accuracy = 1, suffix = "%", negative_parens = TRUE, sep = " ")) %>%
    unlist() %>%
    return()
}
prettify_numbers_as_si <- function(x){
  lapply(as.list(as.numeric(x)),
         label_number_si(accuracy = 1, negative_parens = TRUE, sep = " ")) %>%
    unlist() %>%
    return()
}
When I run
prettify_numbers_as_si(50000)
prettify_numbers_as_percent(0.05)
I get the expected output:
"50K"
"5%"
When I run
prettify_numbers_as_si(-50000)
prettify_numbers_as_percent(-0.05)
I get incorrect output, despite the fact that negative_parens = TRUE is set:
"-50K"
"-5%"
Does anyone know why this parameter setting is failing?
The issue is that while some other functions in the scales package accept negative_parens= as an argument, label_percent and label_number_si do not. Consequently, you have to write that logic into your functions yourself:
new_percent <- function(x){
  perc <- lapply(
    as.list(as.numeric(x)),
    label_percent(
      accuracy = 1, suffix = '%', sep = ' '
    )
  ) %>%
    unlist()
  for (i in seq_along(perc)) {
    if (substring(perc[i], 1, 1) == '-') {
      perc[i] <- paste0('(', substring(perc[i], 2), ')')
    }
  }
  return(perc)
}
new_numbers <- function(x){
  nums <- lapply(
    as.list(as.numeric(x)),
    label_number_si(
      accuracy = 1, sep = " "
    )
  ) %>%
    unlist()
  for (i in seq_along(nums)) {
    if (substring(nums[i], 1, 1) == '-') {
      nums[i] <- paste0('(', substring(nums[i], 2), ')')
    }
  }
  return(nums)
}
Since you know that each value in your output that needs parentheses will start with a "-", I'm using a for loop and substring() to iterate through the items and convert those that start with "-" to start and end with parentheses instead. Works pretty well:
test_perc <- c(-0.5, -0.05, 1.2, .23, -0.13)
test_nums <- c(6000,-60000, 74000, -56000000)
> new_percent(test_perc)
[1] "(50%)" "(5%)" "120%" "23%" "(13%)"
> new_numbers(test_nums)
[1] "6K" "(60K)" "74K" "(56M)"
I have a list of data.frames as the output of a custom function, and I intend to split each data.frame by its last column against a given threshold. I combined the two lists into a single table easily enough, but I get an error when manipulating this new table, and I can't figure out where the issue comes from. How can I fix this error? More generally, how can I easily manipulate a list of data.frames, and is there a better way to debug this kind of error? If the error can be fixed, I want to implement a wrapper.
A minimal example:
savedDF <- list(
bar = data.frame(.start=c(12,21,37), .stop=c(14,29,45), .score=c(5,9,4)),
cat = data.frame(.start=c(18,42,18,42,81), .stop=c(27,46,27,46,114), .score=c(10,5,10,5,34)),
foo = data.frame(.start=c(3,3,33,3,33,91), .stop=c(24,24,10,24,10,17), .score=c(22,22,6,22,6,7))
)
discardedDF <- list(
bar = data.frame(.start=c(16,29), .stop=c(20,37), .score=c(2,11)),
cat = data.frame(.start=c(21,31), .stop=c(23,43), .score=c(1,9)),
foo = data.frame(.start=c(54, 79), .stop=c(71,93), .score=c(3,8))
)
I can manipulate it this way:
both <- do.call("rbind", c(savedDF, discardedDF))
cn <- c("letter", "seq")
# FIXME :
DF <- cbind(
  read.table(text = chartr("_", ".", rownames(both)), header = T, sep = ".", col.names = cn),
  both)
DF <- transform(DF, isPassed = ifelse(.score > 8, "Pass", "Fail"))
by(DF, DF[c("letter", "isPassed")],
   function(x) write.csv(x[-(1:length(savedDF))],
                         sprintf("%s_%s_%s.csv", x$letter[1], x$isPassed[1])))
But I get an error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 15 did not have 2 elements
Why do I get this error, and how can I fix it?
My desired output is a list of CSV files, as follows:
bar.saved.Pass.csv
bar.saved.Fail.csv
bar.discarded.Pass.csv
bar.discarded.Fail.csv
cat.saved.Pass.csv
cat.saved.Fail.csv
cat.discarded.Pass.csv
cat.discarded.Fail.csv
foo.saved.Pass.csv
foo.saved.Fail.csv
foo.discarded.Pass.csv
foo.discarded.Fail.csv
But I think the control over the exported CSV files is still not what I want. How can I improve the functionality of this wrapper? I intend to let the user choose the output directory, and something more dynamic would be nice. Any idea? Thanks a lot.
Is this what you are looking for?
library(tidyverse)
library(magrittr)
both <- do.call("rbind", c(savedDF, discardedDF))
both %<>% rownames_to_column(var = "cn")
both %<>% separate(cn, c("letters", "seq"), sep = "\\.")
both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"),
                 isDiscard = ifelse(is.na(seq), "Saved", "Discarded"))
list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$isDiscard))
csv_names <- paste0("/Users/nathanday/Desktop/", names(list_of_dfs), ".csv") # change this path
mapply(write.csv, list_of_dfs, csv_names)
The %<>% operator is shorthand, so both %<>% rownames_to_column(var = "cn") is identical to both <- rownames_to_column(both, var = "cn").
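A tiny demonstration with a throwaway data frame:
library(magrittr)
df <- data.frame(x = 1:3)
df %<>% transform(x = x * 2)  # same as: df <- transform(df, x = x * 2)
df$x
# [1] 2 4 6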
To make it more "dynamic" by allowing an output path as input, you could wrap this in the function structure you already have, like this:
output_where <- function(output_path, list1, list2) {
  if (!dir.exists(output_path)) {
    dir.create(file.path(output_path))
  }
  both <- do.call(rbind, c(list1, list2))
  both %<>% rownames_to_column(var = "cn")
  both %<>% separate(cn, c("letters", "seq"), sep = "\\.")
  both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"),
                   isDiscard = ifelse(is.na(seq), "Saved", "Discarded"))
  list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$isDiscard))
  csv_names <- paste0(output_path, names(list_of_dfs), ".csv")
  return(mapply(write.csv, list_of_dfs, csv_names))
}
output_where("~/Desktop/", savedDF, discardedDF)
For even more flexibility:
output_where <- function(output_path, list1, list2) {
  if (!dir.exists(output_path)) {
    dir.create(file.path(output_path))
  }
  names(list1) <- paste("list1", names(list1), sep = ".")
  names(list2) <- paste("list2", names(list2), sep = ".")
  both <- do.call(rbind, c(list1, list2))
  both %<>% rownames_to_column(var = "cn")
  both %<>% separate(cn, c("original_list", "letters", "seq"), sep = "\\.")
  both %<>% mutate(isPassed = ifelse(.score > 8, "Passed", "Failed"))
  list_of_dfs <- both %>% split(list(.$letters, .$isPassed, .$original_list))
  csv_names <- paste0(output_path, names(list_of_dfs), ".csv")
  return(mapply(write.csv, list_of_dfs, csv_names))
}
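It is called the same way as before; the file names now also encode which input list each row came from:
output_where("~/Desktop/", savedDF, discardedDF)
# writes files such as bar.Passed.list1.csv, cat.Failed.list2.csv, ...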
I need help determining how I can use the result of the function below as an input for another R file.
Hotel <- function(hotel) {
  require(data.table)
  dat <- read.csv("demo.csv", header = TRUE)
  dat$Date <- as.Date(paste0(format(strptime(as.character(dat$Date),
                                             "%m/%d/%y"),
                                    "%Y/%m"), "/1"))
  table <- setDT(dat)[, list(Revenue = sum(Revenues),
                             Hours = sum(Hours),
                             Index = mean(Index)),
                      by = list(Hotel, Date)]
  answer <- na.omit(table[table$Hotel == hotel, ])
  if (nrow(answer) == 0) {
    stop("invalid hotel")
  }
  return(answer)
}
I would input Hotel("Hotel Name")
Here's the other R file, using the hotel name I input above:
#Reads the dataframe from the Hotel Function
star <- (Hotel("Hotel Name"))
#Calculates the Revpolu and Index
Revpolu <- star$Revenue / star$Hours
Index <- star$Index
png(filename = "~/Desktop/result.png", width = 480, height= 480)
plot(Index, Revpolu, main = "Hotel Name", col = "green", pch = 20)
testing <- cor.test(Index, Revpolu)
write.table(testing[["p.value"]], file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
dev.off()
I would like this part to be automated, instead of having to copy and paste an input from the first file and store it as a variable. Or, if it's easier, make all of this a single function.
Also, instead of inputting one hotel name into the function at a time: is it possible for the first file to read all the hotel names from the .csv file and have that input read in the second file?
Since your example is not reproducible and your code has some bugs (using the column "Rooms" which is not produced by your function), I can't give you a tested answer, but here's how you can structure your code to produce the statistics you want for all hotels without having to copy and paste hotel names:
library(data.table)
# Use fread instead of read.csv, it's faster
dat <- fread("demo.csv", header = TRUE)
dat[, Date := as.Date(paste0(format(strptime(as.character(Date), "%m/%d/%y"), "%Y/%m"), "/1"))]
table <- dat[, list(
  Revenue = sum(Revenues),
  Hours = sum(Hours),
  Index = mean(Index)
), by = list(Hotel, Date)]
# Note: cor.test() has no na.rm argument, but it drops incomplete pairs
# itself (via complete.cases), so the na.omit() here mainly mirrors the
# behaviour of the original function.
table <- na.omit(table)
# Calculate Revpolu inside the data.table
table[, Revpolu := Revenue / Hours]
# You can compute a p-value for all hotels using a group by
testing <- table[, list(p.value = cor.test(Index, Revpolu)[["p.value"]]), by=Hotel]
write.table(testing, file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
# You can get individual plots for each hotel with a for loop
hotels <- unique(table$Hotel)
for (h in hotels) {
  # use a distinct file name per hotel, otherwise each iteration overwrites the last
  png(filename = sprintf("~/Desktop/result_%s.png", h), width = 480, height = 480)
  plot(table[Hotel == h, Index], table[Hotel == h, Revpolu], main = h, col = "green", pch = 20)
  dev.off()
}
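If you prefer the one-function interface of the original post, the same logic wraps up easily. A minimal sketch, assuming demo.csv has the columns Hotel, Date, Revenues, Hours and Index; the function name and arguments are hypothetical:
library(data.table)
analyze_hotels <- function(csv_path = "demo.csv", out_dir = "~/Desktop") {
  dat <- fread(csv_path, header = TRUE)
  dat[, Date := as.Date(paste0(format(strptime(as.character(Date), "%m/%d/%y"), "%Y/%m"), "/1"))]
  tab <- na.omit(dat[, list(Revenue = sum(Revenues),
                            Hours   = sum(Hours),
                            Index   = mean(Index)),
                     by = list(Hotel, Date)])
  tab[, Revpolu := Revenue / Hours]
  # one p-value per hotel, written to a single CSV
  testing <- tab[, list(p.value = cor.test(Index, Revpolu)[["p.value"]]), by = Hotel]
  write.table(testing, file = file.path(out_dir, "output.csv"), sep = ";",
              row.names = FALSE, col.names = FALSE)
  # one scatter plot per hotel
  for (h in unique(tab$Hotel)) {
    png(filename = file.path(out_dir, sprintf("result_%s.png", h)), width = 480, height = 480)
    plot(tab[Hotel == h, Index], tab[Hotel == h, Revpolu], main = h, col = "green", pch = 20)
    dev.off()
  }
  invisible(testing)
}
analyze_hotels()  # processes every hotel in demo.csv in one call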