loading combined data in R

loading combined data in R - r

I have got several Excel files (see link: we.tl/t-qJl3kVcY0j) that I combine together.
Each Excel file have columns from A to AF as below:
t-ph
Load
HR
BF
V'E
V'O2
V'CO2
d O2/dW
RER
EqO2
EqCO2
PETCO2
VES (ml)
VESi (ml/m²)
FC (bpm)
QC (l/min)
IC (l/min/m²)
PAS (mmHg)
PAD (mmHg)
PAM (mmHg)
ICT
TEV (ms)
RPD (%)
WCI (kg.m/m²)
RVSi (dyn.s/cm5.m²)
RVS (dyn.s/cm5)
VTD est (ml)
FE est (%)
O2Hb
HHb
tHb
HbDiff
My previous code was working, but since I have added columns AC (HHb) to AF (HbDiff), I can't load them anymore. I have tried to match the number of columns, the title, but still not. I can't produce a reproducible example since I can't load my data.
Here is the code I used:
pacman::p_load(tidyverse, readxl, ggpubr)
library(dplyr)
library(ggplot2)
library(afex) ##statistic package
# load data and format
load_files <- function(files){
temp <- read_excel(files) %>%
select(-(c(8:11, 14:15, 18:23))) %>%
mutate(id = pull(.[4,1])) %>% ##ID
mutate(body_mass = pull(.[7,3])) %>% ##body mass
mutate(training = pull(.[4,2])) %>% ##training group
set_names(c("time", "power", "hr", "fr", "VE", "absVO2", "VCO2", "PETCO2", "VES", "QC", "IC", "WCI", "RVSi", "RVS", "VTD", "FE", "O2Hb", "HHb", "tHb", "HbDiff", "id", "body_mass", "training")) %>%
slice(86:which(grepl("ration", VE))-1) %>% ##until recovery period
mutate_at(vars(1:16), as.numeric) %>%
mutate_at(vars(18), as.numeric) %>%
mutate(time = format(as.POSIXct(Sys.Date() + time), "%H:%M", tz="UTC"),
absVO2 = absVO2/1000,
VCO2 = VCO2/1000)
}
# apply function to all files
df <- map_df(file_list, load_files)
# remove those with who have less than four similar power
df <- df %>%
mutate(len_seq = rep(rle(power)$lengths, rle(power)$lengths)) %>%
filter(len_seq == 4) %>%
mutate(seq_id = rep(1:(n()/4), each = 4)) %>%
group_by(id) %>%
select(-seq_id)%>%
select(-(20))
# group data
df_sum <- df %>%
type.convert(as.is = TRUE) %>%
group_by(id, power, training) %>%
summarise_if(is.numeric, mean) %>%
group_by(id) %>%
mutate(percent_absVO2 = absVO2/max(absVO2)*100,
percent_power = power/max(power)*100,
percent_QC = QC/max(QC)*100,
percent_SV = VES/max(VES)*100,
percent_VCO2 = VCO2/max(VCO2)*100,
percent_VE = VE/max(VE)*100) %>%
mutate(VE_VO2 = VE/absVO2,
VE_VCO2 = VE/VCO2) %>%
mutate(RER = VCO2/absVO2, VT = VE/fr) %>%
mutate(relVO2 = absVO2/body_mass*1000,
percent_relVO2 = relVO2/max(relVO2)*100) %>%
mutate(BF = VE/VT) %>%
mutate(mech_perf = (power/(((0.003*power+0.1208)*1000*body_mass)/60))*100) %>%
mutate(group = ifelse(grepl(".*-PRD-C", id), "CAD", "Healthy")) %>%
mutate(temps = ifelse(grepl(".*-PRD-C1", id), "1", ifelse(grepl(".*-PRD-S1", id), "1", "2")))
Then the outcome I get:
Error in `set_names()`:
! The size of `nm` (23) must be compatible with the size of `x` (20).
Run `rlang::last_error()` to see where the error occurred.
Thank you very much for your precious help.

Related

Grouping By Multiple Selection Answer for Likert Package

I wanted to create a likert graph that is grouped by Question i. I can create the likert graph for total responses ungrouped, but im uncertain of how to reformat question 6 without losing the column for question i. (aka do the reformatting done below but also have it take into account who selected what in question i.)
What I want is the sufficiency of Q6 grouped by their answer in question i.
Sample Dataframe:
SurveyClean2 <- data.frame(i = c("Mail,Email", "Mail", "Mail,Email,Podcast", "Radio,Podcast", "Radio", "Mail,Radio"), Q6_3 = c("Not Sufficient", "Very Sufficient", "Completely Sufficient", "Moderately Sufficient", "Moderately Sufficient", "Not Sufficient"))
Unnesting Question i:
UnnestQi <- SurveyClean2 %>%
as_tibble() %>%
mutate(i = str_split(Q3, ",")) %>%
unnest(i)
Survey2Q6 <- UnnestQi |> drop_na(Q5) |> drop_na(i)
Reformating Question 6 to Likert-friendly format:
clean_survey <- function(data, column, question) {
data %>%
dplyr::select(all_of({{column}})) %>%
dplyr::mutate(Question = question) %>%
dplyr::group_by(Question, across(1)) %>%
dplyr::count() %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = 2, values_from = n)
}
# table that contains survey questions/columns and the question name
survey_table <- dplyr::tibble(
column = c("Q6_3"),
question = c("Expert advice")
)
# loop through your data and clean it, then bind as dataframe
LikertGroupqi62 <- purrr::map2_df(survey_table$column, survey_table$question, function(x, y){
clean_survey(Survey2Q6, x, y)}) |>
mutate(across(everything(), ~ifelse(is.na(.), 0, .)))
## Likert
LikertGroupqi62 <- LikertGroupqi62 |> dplyr::select(Question, `Not Sufficient`, `Slightly Sufficient`, `Moderately Sufficient`, `Very Sufficient`, `Completely Sufficient`)
Likert WITHOUT grouping:
likert(Question~., LikertGroupqi62, ReferenceZero = 0, auto.key.in = list(columns = 1), main = list("Sufficiency of Cost-share Advice Based on Person or Agency Worked With"), col = c("#db6d00", "#924900", "#000000", "#004949", "#009292"),strip.left = FALSE, ylab = "", xlab = "Total Number of Respondents")

How to remove error while is loading in Shiny

I'm trying to build a Shiny App, everything works ok, but my issue is at the beginning, the first time that my app is launched i get an error in my highcharts due the size of the data (more than 3M of rows),
After 10 seconds the error disapear and everithing looks ok, but i want to remove the error, now i'm using waiter package, loading screeen is displayed 1.5 seconds, then the error appear and later the graph is showed .
I want to use Waiter package to hide this error until every calculation is finished. This is the Error
Below here my code for the graph
# Graph for shortInterest tab By CvsI (bars) --Dynamic
output$graph_bars_shortInterest_hc <- renderHighchart({
waiter_show(
id = "graph_bars_shortInterest_hc",
html = tagList(spin_fading_circles(),
"Loading Model ..."),
color = "#63666a",
logo = "",
hide_on_render = !is.null(id)
)
Client <- subset(Data_russel, Metrics == "marketCap") %>%
filter(Value >= input$MC_bars_[1])%>%
filter(Value <= input$MC_bars_[2])%>%
select(Client_Name) %>% unique()
Client_2 <- subset(Data_russel, Metrics == "Annual_Limit_Adequacy") %>%
filter(Value >= input$AL_filter_[1])%>%
filter(Value <= input$AL_filter_[2])%>%
select(Client_Name) %>% unique()
Data_Metric <- subset(Data_russel, Metrics == "shortInterest" & Industry %in% input$industry_CvsI_bars)
Client_filtered <- inner_join(Client, Client_2, by = "Client_Name")
Data_ <- inner_join(Client_filtered, Data_Metric, by = "Client_Name") # Clients in the range of Selected Market cap
Data_c <- subset(Data_russel, Metrics == "shortInterest" & Industry %in% input$industry_CvsI_bars & Client_Name == input$clientname_CvsI_bars)
Table_ <- seq(input$perc_range_[1], input$perc_range_[2], 1) %>% as.data.frame()
names(Table_) <- "Percentile"
Table_$Value <- round( quantile(Data_c$Value, Table_$Percentile/100), digits = 2)
Table_$Industry <- round( quantile(Data_$Value, Table_$Percentile/100), digits = 2)
hc_1 <- Table_ %>%
hchart(. , type = "line", hcaes(x = Percentile, y = Value), name = "Client", color = "#FFB81C") %>%
hc_add_series(data = Table_ ,type = 'line' , color = "#00a0d2", name = "Industry", hcaes(x = Percentile, y = Industry))%>%
hc_yAxis(opposite = TRUE) %>%
hc_title(text = "shortInterest Benchmark", margin = 30,
align = "center",
style = list(color = "#702080", useHTML = TRUE)) %>%
hc_yAxis(max = max(Table_$Industry)+(sd(Table_$Industry)/5))%>%
hc_yAxis(min = min(Table_$Industry)-(sd(Table_$Industry)/5))%>%
hc_add_theme(hc_theme_google())
hc_1
})
Thanks !!

I fixed using next function, and using each output in the UI into this function
output %>% withSpinner(
type = getOption("spinner.type", default = 3),
color.background = getOption("spinner.color.background", default = "#C8D7DF" ),
color="#00A0D2")
}```

Text transform issue inserting image in a gt table

I had previously asked about inserting images in gt tables here and gotten a lot of help. But I am now encountering a new issue. The difficult part is that I am having trouble creating a minimal example.
Consider this code from the answer to my original post
library(gt)
library(magrittr)
library(purrr)
library(ggplot2)
# Let's make some pngs
mtcars %>%
split(.$cyl) %>%
map(~ ggplot(.x, aes(hp, mpg, color = factor(gear))) + geom_point()) %>%
set_names(c("CA", "UT", "OH")) %>%
iwalk(~ ggsave(paste0(.y, "_test.png"), .x))
column_one <- c("CA", "UT", "OH")
# Put the filenames in the column
column_two <- c("CA", "UT", "OH")
dashboard.data <- data.frame(column_one, column_two, stringsAsFactors = FALSE)
names(dashboard.data)[1] <- "State"
names(dashboard.data)[2] <- "IncidenceGauge"
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = paste0(.x, "_test.png"),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")
print(dboard3)
It runs fine and produces this image
Now, imagine that I want to create a header row, and in that row there is no image in column two. In my case I create an image called NA_test.png that is just a blank white square. For the example below, you'll have to image that's what NA_test.png is and not a plot from mtcars. You'll see that column two now begins with NA...
library(gt)
library(magrittr)
library(purrr)
library(ggplot2)
# Let's make some pngs
mtcars %>%
split(.$cyl) %>%
map(~ ggplot(.x, aes(hp, mpg, color = factor(gear))) + geom_point()) %>%
set_names(c("NA", "UT", "OH")) %>%
iwalk(~ ggsave(paste0(.y, "_test.png"), .x))
column_one <- c("Header", "UT", "OH")
# Put the filenames in the column
column_two <- c(NA, "UT", "OH")
dashboard.data <- data.frame(column_one, column_two, stringsAsFactors = FALSE)
names(dashboard.data)[1] <- "State"
names(dashboard.data)[2] <- "IncidenceGauge"
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = paste0(.x, "_test.png"),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")
print(dboard3)
This runs as well and produces this...again you'll just have to imagine that next to Header NA_test.png is just a blank white image.
My issue is that in my script (much longer, but literally copied in form from this example) when I encounter NA's it seems as if it's treating them like blanks rather then substituting NA. I get error messages that R can't find _test.png. Not NA_test.png as I would expect it to look for, but _test.png.
Here's the dataframe that is passed to gt()
There is no call to text_transform() for the Base column, but there is for the CommunityNewCases column and that's where the error occurs.
Can anyone suggest any reason as to why this is happening?

How to handle NA fields is important to dashboard design. So the real question is - should a dashboard really aggregate a plot of the entries which have "NA" as location? NA is typically a data entry error, so isn't the plot misleading?
As a statistical language, R considers the logical NA as a blank or empty data field. This is by design. And it is powerful, because NA retains its meaning in both character and numeric fields.
To change R's standard behavior, simply reassign an alternative value to the logical NA. For example, your sample code can replace logical NA with the character "NA" to use it in a paste command.
column_two <- c("NA", "UT", "OH")
Or if your data is more complex than this toy example, replace NA with "NA" using tidyr library.
dboard3 <- dashboard.data %>%
tidyr::replace_na("NA") %>%
gt() %>%
...
Alternatively, you could retain the logical NA, but handle with ifelse and if.na() test as shown here...
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = ifelse(is.na(.x),
"NA_test.png",
paste0(.x, "_test.png")),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")

Formatting an ftable in R

I have the following 3 way table I created in R.
with(dataset, ftable(xtabs(count ~ dos + sex + edu)))
The output looks like
edu high low medium unknown
dos sex
five-to-ten-years female 247776 44916 127133 23793
male 225403 37858 147821 20383
five-years-or-less female 304851 58018 182152 33649
male 253977 55720 193621 28972
more-than-ten-years female 709303 452605 539403 165675
male 629162 309193 689299 121336
native-born female 1988476 1456792 2094297 502153
male 1411509 1197395 2790522 395953
unknown female 57974 75480 73204 593141
male 40176 57786 93108 605542
I want to rename the variables and format the table so that I can include it in a report. I know that I can use dnn to rename the variables, but are there any other recommendations to rename the variables? And to format the table (similar to using kable)?

You could convert the output to a text matrix using the following function, after which you can style with kable however you choose:
ftab_to_matrix <- function(ft)
{
row_vars <- attr(ft, "row.vars")
for(i in seq_along(row_vars)){
row_vars[[i]] <- c(names(row_vars[i]), row_vars[[i]])}
rowvar_widths <- sapply(row_vars, function(x) max(nchar(x))) + 1
col_vars <- attr(ft, "col.vars")
rowvar_widths <- c(1, cumsum(c(rowvar_widths, max(nchar(names(col_vars))))))
ft_text <- capture.output(print(ft))
row_cols <- sapply(seq_along(rowvar_widths)[-1], function(x)
substr(ft_text, rowvar_widths[x - 1], rowvar_widths[x]))
ft_text <- substr(ft_text, rowvar_widths[length(rowvar_widths)] + 2, 100)
ft_breaks <- c(1, cumsum(lapply(strsplit(ft_text[length(ft_text)], "\\d "),
function(x) nchar(x) + 2)[[1]]))
col_cols <- sapply(seq_along(ft_breaks)[-1], function(x)
substr(ft_text, ft_breaks[x - 1], ft_breaks[x]))
trimws(cbind(row_cols, col_cols))
}
So, for example, using my example data from your last question, you could do something like:
my_tab <- with(`3waydata`, ftable(xtabs(count ~ duration + sex + education)))
as_image(kable_styling(kable(ftab_to_df(my_tab))), file = "kable.png")

Might have been easier had you given the full picture when you asked your first question... You could use gt to make fancy tables for reports. This is an edited version more fully demonstrating some capabilities.
library(dplyr)
library(gt)
way3data <- data %>%
group_by(duration, education, sex) %>%
summarise(count = sum(number)) %>%
ungroup
# Reorder with select and Titlecase with stringr
longer <- tidyr::pivot_wider(way3data,
values_from = count,
names_from = "education") %>%
select(duration, sex, high, medium, low, unknown) %>%
rename_with(stringr::str_to_title)
# Demonstrating some of the features of gt
# obviously could have done some of this
# to the original dataframe
myresults <- longer %>%
group_by(Duration) %>%
gt(rowname_col = "Sex") %>%
row_group_order(
groups = c("native-born",
"more-than-ten-years",
"five-to-ten-years",
"five-years-or-less",
"unknown")
) %>%
tab_spanner(label = "Education",
columns = matches("High|Low|Medium|Unknown")) %>%
tab_stubhead(label = "Duration or something") %>%
tab_style(
style = cell_text(style = "oblique", weight = "bold"),
locations = cells_row_groups()) %>%
tab_style(
style = cell_text(align = "right", style = "italic", weight = "bold"),
locations = cells_column_labels(
columns = vars(High, Low, Medium, Unknown)
)) %>%
tab_style(
style = cell_text(align = "right", weight = "bold"),
locations = cells_stub()) %>%
tab_header(
title = "Fancy table of counts with Duration, Education and Gender") %>%
tab_source_note(md("More information is available at https://stackoverflow.com/questions/62284264."))
# myresults
# Can save in other formats including .rtf
myresults %>%
gtsave(
"tab_1.png", expand = 10
)
You can read about all the formatting choices here
Data compliments of Allan
set.seed(69)
data <- data.frame(education = sample(c("high","low","medium","unknown"), 600, T),
sex = rep(c("Male", "Female"), 300),
duration = sample(c("unknown", "native-born",
"five-years-or-less", "five-to-ten-years",
"more-than-ten-years"), 600, T),
number = rpois(600, 10))

Is there a way to make my code functions work for multiple variables instead of the one i currently have it working for?

I have a dataset which looks as follows (with around 200 individuals):
NAME AGE2012 SEX SurveyDate12 WAZ12 BAZ12 HAZ12 HB12 SurveyDate14 WAZ14 BAZ14 HAZ14 HB14
1 22 Male 2012-11-26 -1.2 -0.54 -0.01 11.9 2014-11-26 -1.5 -0.52 -0.43 12.2
2 26 Female 2012-11-26 -1.5 -0.36 -0.04 11.2 2014-11-26 -1.7 -0.84 -0.32 11.4
I Am currently using them to make a slopegraph, so I have to do certain things with this dataset such as using pivot longer. I am practising to use factors, and so am trying to keep everything I do within a function. I have the code working to make the graph using just HB (see below). But what I would like to do is make this code work for all 4 of the variables I have in my dataset just using functions. Can anyone help me with this?
slopegraph_prep <- function(health_longer, HB){health %>%
select(NAME:SEX, starts_with("HB")) %>%
pivot_longer(cols = starts_with("HB"),
names_to = "Year",
names_prefix = "HB",
values_to = "HB") %>%
mutate(
HB = case_when(
HB < 0 ~ "NA",
TRUE ~ as.character(HB)
)
) %>%
na_if("NA") %>%
mutate(HB = as.numeric(HB)) %>%
mutate(
Year = case_when(
Year=="12" ~ "2012",
Year=="14" ~ "2014",
Year=="19" ~ "2019")
)
}
slopegraph_by_sex <- function(health, HB, SEX){ Subsetdata <- subset(health, SEX == SEX)
newggslopegraph(Subsetdata , Year , HB, NAME,
Title = "Haemoglobin",
SubTitle = SEX,
Caption = NULL,
RemoveMissing = FALSE)
}
df_healthmeas <- slopegraph_prep(health, "HB")
df_healthmeas_female <- slopegraph_by_sex(df_healthmeas, "Haemoglobin", "female")
df_healthmeas_male <- slopegraph_by_sex(df_healthmeas, "Haemoglobin", "male")
What I really want to do is to just be able to run this for example and for it to run, but I feel I will need to make my variables more generic?:
df_healthmeas <- slopegraph_prep(health, "WAZ")
df_healthmeas_female <- slopegraph_by_sex(df_healthmeas, "Weight to Age WAZ", "female")
df_healthmeas_male <- slopegraph_by_sex(df_healthmeas, "Weight to Age WAZ", "male")
Any help with this would be massively appreciated

Hey this is already doable with your code, you olny have to specify the function argument:
df_healthmeas <- slopegraph_prep(health_longer = health, HB =WAZ)
the variables you specified in the fiunction code is only a "dummy" so you could re-write it for better readabiliy as such:
slopegraph_prep <- function(data, var){data %>%
select(NAME:SEX, starts_with("var")) %>%
pivot_longer(cols = starts_with("var"),
names_to = "Year",
names_prefix = "var",
values_to = "var") %>%
mutate(
var = case_when(
var < 0 ~ "NA",
TRUE ~ as.character(var)
)
) %>%
na_if("NA") %>%
mutate(var = as.numeric(var)) %>%
mutate(
Year = case_when(
Year=="12" ~ "2012",
Year=="14" ~ "2014",
Year=="19" ~ "2019")
)
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

loading combined data in R - r

Related

Grouping By Multiple Selection Answer for Likert Package

How to remove error while is loading in Shiny

Text transform issue inserting image in a gt table

Formatting an ftable in R

Is there a way to make my code functions work for multiple variables instead of the one i currently have it working for?

Categories

Resources