I have the data frame below, which is grouped by the column Category:
> dput(mydata)
structure(list(Category = c("Executive", "Management", "Management",
"Management", "Professional", "Professional", "Professional",
"Para-Professional", "Para-Professional", "Para-Professional"
), Rank = c("Rank 1", "Rank 1", "Rank 2", "Rank 3", "Rank 1",
"Rank 2", "Rank 3", "Rank 1", "Rank 2", "Rank 3"), Jobs = c(" SMP - Sales, Marketing & Product Management",
" SMP - Sales, Marketing & Product Management", " ENS - Engineering & Science",
" FIN - Finance", " SMP - Sales, Marketing & Product Management",
" ENS - Engineering & Science", " FIN - Finance", " PSK - Production & Skilled Trades",
" ENS - Engineering & Science", " EGS - Energy Generation & Supply"
), N = c(3, 10, 3, 2, 54, 25, 5, 7, 2, 1)), row.names = c(NA,
-10L), class = "data.frame")
I wanted to merge the cells of the Category column that share the same category, which I did with the code below using flextable:
mydata%>% flextable()%>% merge_v(j=~Category)
Now I want to highlight the border around the entire section of data corresponding to each category. For example, the rows corresponding to the Executive category should be enclosed in a thicker border, and similarly for the others. So far I could only highlight the Category cells, as below:
cl<-fp_border(color = "#00A8C8",width = 3)
mydata%>% flextable()%>% merge_v(j=~Category)%>%hline(j=~Category,border = cl)
I want a similar border around each sub-table corresponding to a category, to better differentiate between categories in the table. How can I do this with flextable only?
I think this should help.
The key here is to create a logical vector marking the rows where lines should appear; this is what the function break_position does.
library(flextable)
library(officer)
library(magrittr)
mydata <- structure(list(Category = c(
"Executive", "Management", "Management",
"Management", "Professional", "Professional", "Professional",
"Para-Professional", "Para-Professional", "Para-Professional"
), Rank = c(
"Rank 1", "Rank 1", "Rank 2", "Rank 3", "Rank 1",
"Rank 2", "Rank 3", "Rank 1", "Rank 2", "Rank 3"
), Jobs = c(
" SMP - Sales, Marketing & Product Management",
" SMP - Sales, Marketing & Product Management", " ENS - Engineering & Science",
" FIN - Finance", " SMP - Sales, Marketing & Product Management",
" ENS - Engineering & Science", " FIN - Finance", " PSK - Production & Skilled Trades",
" ENS - Engineering & Science", " EGS - Energy Generation & Supply"
), N = c(3, 10, 3, 2, 54, 25, 5, 7, 2, 1)), row.names = c(
NA,
-10L
), class = "data.frame")
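# thick border used to separate the categories
cl <- fp_border(color = "#00A8C8", width = 3)
# TRUE on the last row of each run of identical values (never on the final row)
break_position <- function(x) {
  z <- data.table::rleidv(x)
  c(z[-length(z)] != z[-1], FALSE)
}
mydata %>%
  flextable() %>%
  merge_v(j = ~Category) %>%
  # draw the line across all columns wherever the category changes
  hline(i = ~ break_position(Category), border = cl) %>%
  fix_border_issues()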
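To see what the logical vector looks like, here is a minimal base-R sketch of the same idea, without data.table. The helper name break_position_base is hypothetical (it is not part of flextable or of the answer above), and the sketch assumes the column contains no missing values.

```r
# Hypothetical base-R variant of break_position(): TRUE on the last row of
# each run of identical values, except for the final row of the table.
break_position_base <- function(x) {
  c(x[-length(x)] != x[-1], FALSE)
}

category <- c(
  "Executive", "Management", "Management", "Management",
  "Professional", "Professional", "Professional",
  "Para-Professional", "Para-Professional", "Para-Professional"
)

# Row indices after which hline() would draw the thick border
border_rows <- which(break_position_base(category))
print(border_rows)
```

For the Category column in mydata this selects rows 1, 4 and 7, that is, the last row of every category except the final one.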
I've got an xts df of the following format:
structure(c("May 2022", "Jun 2022", "Jul 2022", "Aug 2022", "Sep 2022",
"Oct 2022", "Nov 2022", "Dec 2022", " 3035.199", " 5500.000",
"11568.750", " 2510.000", " 6999.999", "21792.149", " 9750.000",
" 5624.999", " 2250.000", " 4136.975", " 6525.500", " 2771.875",
" 4637.500", "16273.499", " 6000.000", " 4494.649", " 2500.000",
" 0.000", " 3029.000", " 2803.500", " 0.000", "14481.250",
" 4374.998", " 4062.498", " 0.000", " 3075.000", " 6939.249",
" 1500.000", " 4183.157", " 5769.000", " 3559.500", " 3250.000"
), class = c("xts", "zoo"), index = structure(c(1651363200, 1654041600,
1656633600, 1659312000, 1661990400, 1664582400, 1667260800, 1669852800
), tzone = "UTC", tclass = "yearmon"), .Dim = c(8L, 5L), .Dimnames = list(
NULL, c("Month", "Cat 1", "Cat 2", "Cat 3", "Cat 4")))
I'm trying to create a stacked bar chart using the dygraphs library.
library(dygraphs)
library(lubridate)
today <- as.Date(Sys.time())
last_6 <- today+months(-6)
dygraph(df) %>%
  dyAxis("y", label = "Total") %>%
  dyRangeSelector(dateWindow = c(last_6, today)) %>%
  dyMultiColumnGroup(c("Cat 1", "Cat 2", "Cat 3", "Cat 4"))
This produces a bar chart that looks like this: (image not included)
I was wondering if anyone had any advice on how to make a stacked bar chart? Many of the guides talk about bringing in plotters, but unfortunately they are not detailed enough for me to properly understand what is going on.
Adding this:
dyStackedBarGroup(c("Cat 1", "Cat 2", "Cat 3", "Cat 4"))
instead of the dyMultiColumnGroup line leads to a:
Error in cumulativeYval + points : non-numeric argument to binary operator
While Quinten's got a great working answer, I thought I would add to it.
You don't need separate ts objects; you do need the data to be numeric, though.
Assuming the data from dput is named df:
# drop the months column, change data to numeric, restore ts class
df <- df[, -1] %>%
  sapply(as.numeric) %>%
  ts(start = c(2022, 5), deltat = 1/12)
Now you're ready to graph.
dygraph(df) %>%
  dyAxis("y", label = "Total") %>%
  dyRangeSelector(dateWindow = c(today() - months(6), today())) %>%
  dyStackedBarGroup(dimnames(df)[[2]])
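The "non-numeric argument to binary operator" error is consistent with the values being stored as character strings. As a minimal sketch of the conversion step on its own, using only base R and a small made-up subset of the data (the object names chr, num and ts_obj are hypothetical):

```r
# A character matrix shaped like the value columns of the dput output
chr <- matrix(
  c(" 3035.199", " 5500.000", " 2250.000", " 4136.975"),
  ncol = 2,
  dimnames = list(NULL, c("Cat 1", "Cat 2"))
)

# as.numeric() tolerates the leading spaces; apply() keeps the column names
num <- apply(chr, 2, as.numeric)

# Monthly series starting in May 2022
ts_obj <- ts(num, start = c(2022, 5), frequency = 12)

is.numeric(chr)  # character input, so arithmetic on it fails
is.numeric(num)  # numeric after conversion
```

Once the matrix is numeric, the cumulative sums that a stacked plot needs can be computed.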
It seems that each time series should be created as a ts object, so I converted them separately and then combined them with cbind. Here is a reproducible example:
Cat1 <- ts(c(3035.199, 5500.000, 11568.750, 2510.000, 6999.999, 21792.149, 9750.000, 5624.999), start = c(2022, 5), end = c(2022, 12), frequency = 12)
Cat2 <- ts(c(2250.000, 4136.975, 6525.500, 2771.875, 4637.500, 16273.499, 6000.000, 4494.649), start = c(2022, 5), end = c(2022, 12), frequency = 12)
Cat3 <- ts(c(2500.000, 0.000, 3029.000, 2803.500, 0.000, 14481.250, 4374.998, 4062.498), start = c(2022, 5), end = c(2022, 12), frequency = 12)
Cat4 <- ts(c(0.000, 3075.000, 6939.249, 1500.000, 4183.157, 5769.000, 3559.500, 3250.000), start = c(2022, 5), end = c(2022, 12), frequency = 12)
df <- cbind(Cat1, Cat2, Cat3, Cat4)
library(dygraphs)
library(lubridate)
today <- as.Date(Sys.time())
last_6 <- today+months(-6)
dygraph(df) %>%
  dyAxis("y", label = "Total") %>%
  dyRangeSelector(dateWindow = c(last_6, today)) %>%
  dyStackedBarGroup(name = c("Cat1", "Cat2", "Cat3", "Cat4"))
Created on 2022-12-22 with reprex v2.0.2
I am trying to make a descriptive statistics table in R and my code functions properly (producing a table) but despite the fact that I have no missing values in my dataset, the table outputs all of my values as missing. I am still a novice in R, so I do not have a broad enough knowledge base to troubleshoot.
My code:
data <- read_excel("Data.xlsx")
data$stage <-
factor(data$stage, levels=c(1,2,3,4,5,6,7),
labels =c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage"))
data$primary_language <-factor(data$primary_language, levels=c(1,2), labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode, levels = (1:3), labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock, levels = c(0:2), labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity, levels = c(0:1), labels = c("Hispanic/Latina","White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity, levels = c(0,1,2,4), labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
label (data$stage)<- "Stage at Diagnosis"
label(data$age) <- "Age"
label(data$primary_language) <- "Primary language"
label(data$status_zipcode)<- "Demographic Status in Zipcode Area"
label(data$status_censusblock)<- "Demographic Status in Census Block Group"
label(data$self_identity) <- "Self-Identified Racial/Ethnic Group"
label(data$subjective_identity)<- "Racial/Ethnic Group as Identified by Others"
table1(~ stage +age + primary_language + status_zipcode + status_censusblock + self_identity + subjective_identity| primary_language, data=data)
Table output: (screenshot not included)
Data set: (screenshot not included)
When I run the data set the values are there. It actually worked for me when I re-did the spacing:
data$stage <- factor(data$stage,
                     levels = c(1,2,3,4,5,6,7),
                     labels = c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
When I did it exactly as you typed it came up with NA's, too. Try the first and see if it works for you that way. Then check the spacing for the others. That may be all it is.
I do end up with one NA on the stage column because 0 is not defined in your levels.
Edit: I ran the rest, so here are some other points.
You end up with an NA in stage because one of your values is 0, but 0 is not defined in the levels.
You end up with NAs in primary_language because the data contain 0 and 1 but you define the levels as 1, 2, so you need to change the levels to match the values.
You end up with NAs in other columns where the levels written with : do not match the values in the data (for example, 1:3 for status_zipcode).
Change your code to this and you should have the values you need except that initial 0 in "stage":
data$stage <- factor(data$stage,
                     levels = c(1,2,3,4,5,6,7),
                     labels = c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
data$primary_language <- factor(data$primary_language,
                                levels = c(0,1),
                                labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode,
                              levels = c(0,1,2),
                              labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock,
                                  levels = c(0,1,2),
                                  labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity,
                             levels = c(0,1),
                             labels = c("Hispanic/Latina", "White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity,
                                   levels = c(0,1,2,4),
                                   labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
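The underlying behaviour can be checked in isolation: factor() turns every value that is not listed in levels into NA. Below is a minimal sketch with a made-up vector x standing in for a 0/1 coded column such as primary_language; the object names are hypothetical.

```r
# Made-up 0/1 coded column
x <- c(0, 1, 1, 0)

# Levels 1 and 2 do not cover the value 0, so every 0 becomes NA
# (and every 1 is labelled "Spanish", which is also not what was intended)
wrong <- factor(x, levels = c(1, 2), labels = c("Spanish", "English"))

# Levels that match the values actually present in the data
right <- factor(x, levels = c(0, 1), labels = c("Spanish", "English"))

sum(is.na(wrong))      # number of values lost to NA
sum(is.na(right))

# Quick check before building a factor: which values have no level?
setdiff(unique(x), c(1, 2))
```

Running unique() on each column before calling factor() is a simple way to confirm which codes need a level.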
I am a new R user. I am trying to transform proportions to percentages on a forest plot I have generated using metaprop.
I have looked at the question "Quick question about transforming proportions to percentages - forest function in R" and at the link that post refers to.
mytransf = function(x)
(x) * 100
studies <- c("Study 1", "Study 2", "Study 3")
obs <- c(104, 101,79670)
denom <- c(1146, 2613, 147766)
m1 <- metaprop(obs, denom, studies, comb.random=FALSE,
byseparator=": ")
forest(m1, print.tau2 = FALSE, col.by="black", text.fixed = "Total number of events",
text.fixed.w = "Subtotal", rightcols = c("effect","ci"),
leftlabs=c("Study","Events","Total"),
xlim=c(0,0.7),
transf=mytransf)
The output remains as proportions, not percentages. I tried "atransf" as well. Is anyone able to help me with this? This is what I can generate currently: (picture of output not included)
You can use the pscale option of metaprop:
library(meta)
studies <- c("Study 1", "Study 2", "Study 3")
obs <- c(104, 101,79670)
denom <- c(1146, 2613, 147766)
m1 <- metaprop(obs, denom, studies, comb.random=FALSE,
byseparator=": ",
pscale=100)
forest(m1, print.tau2 = FALSE, col.by="black",
text.fixed = "Total number of events",
text.fixed.w = "Subtotal",
rightlabs = c("Prop. (%)","[95% CI]"),
leftlabs=c("Study","Events","Total"),
xlim=c(0,70))
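As a rough illustration of why xlim changes from c(0, 0.7) to c(0, 70), here is the scaling applied by hand to the raw study proportions. This is a base-R sketch only: it does not call the meta package and does not reproduce the pooled estimate or the confidence intervals. The object names are hypothetical.

```r
obs   <- c(104, 101, 79670)
denom <- c(1146, 2613, 147766)

# Raw proportions per study and the same values on a percentage scale
prop <- obs / denom
pct  <- 100 * prop

# Axis limits have to be scaled by the same factor
xlim_prop <- c(0, 0.7)
xlim_pct  <- 100 * xlim_prop

data.frame(study = c("Study 1", "Study 2", "Study 3"),
           proportion = round(prop, 4),
           percent = round(pct, 2))
```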
I'm just starting to learn tidyverse approaches, and I am currently looking at how to solve the following problem.
In the data frame below, I'm looking to replace all the "sensitivity level" cells with the name of the channel found one row up, in the second column, combined with the word "sensitivity".
After successfully doing so, I want to run the result through the tidyverse solution to the other part of my problem, which is posted in another question.
This is the code to generate an exact replica of my data frame:
df <- structure(list(Parameter = c("Trigger level (mV)", "Smart Triggered!",
"FL Red Maximum > 9", "CytoUSB Block size", "Instrument", "Beam width",
"Core speed", "User Comments", "Measurement date", "Measurement duration",
"Flow rate (µL/sec)", "Channel 1", "Channel 2", "Channel 3",
"Channel 4", " sensitivity level", "Channel 5", " sensitivity level",
"Channel 6", " sensitivity level", "Channel 7", " sensitivity level",
"Total number of particles", "Smart triggered number of particles",
"Concentration (part/µL)", "Volume (µL)"), Value = c(" 10.49",
"", "", " Auto (maxTimeOut 5s )", " EMSO II", " 5", " 2.2", " ",
" 15-Apr-16 5:12:21 AM", " 36", " 2.06", " Trigger1", " FWS L",
" FWS R", " SWS TRIGGER", " 55", " FL Yellow", " 100", " FL Orange",
" 100", " FL Red", " 100", " 1344", " 1344", " 2062.71614038604",
" 46.7626146474752")), class = "data.frame", row.names = c(NA,
-26L))
This should give what you want:
library(dplyr)
df %>%
  mutate(Parameter = ifelse(
    Parameter == " sensitivity level",
    paste(lag(Value), "sensitivity"),
    Parameter
  ))
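The same "look one row up" logic can be sketched in base R, which may help to see what lag() does here. This variant also strips the leading space from the channel name with trimws(), which the answer above does not do; df_small and prev_value are hypothetical names and the data are a four-row excerpt of df.

```r
# Four-row excerpt of the original data frame
df_small <- data.frame(
  Parameter = c("Channel 4", " sensitivity level", "Channel 5", " sensitivity level"),
  Value     = c(" SWS TRIGGER", " 55", " FL Yellow", " 100"),
  stringsAsFactors = FALSE
)

# Shift Value down by one row (the first row has no predecessor, hence NA)
prev_value <- c(NA, head(df_small$Value, -1))

# Rewrite only the rows labelled " sensitivity level"
is_sens <- df_small$Parameter == " sensitivity level"
df_small$Parameter[is_sens] <- paste(trimws(prev_value[is_sens]), "sensitivity")

df_small
```

Note that the comparison string keeps its leading space, because that is how the value is stored in the data.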
I have what I know is going to be an impossibly easy question. I am showing an average number of days by month using a bar chart, using the following example:
dat <- structure(list(Days = c("217.00", "120.00", "180.00", "183.00",
"187.00", "192.00"), Amt = c("1,786.84", "1,996.53",
"1,943.23", "321.30", "2,957.03", "1,124.32"), Month = c(201309L,
201309L, 201309L, 201310L, 201309L, 201309L), Vendor = c("Comp A",
"Comp A", "Comp A", "Comp A", "Comp A",
"Comp A"), Type = c("Full", "Full",
"Self", "Self", "Self", "Self"
), ProjectName = c("Rpt 8",
"Rpt 8", "Rpt 8",
"Rpt 8", "Rpt 8",
"Rpt 8")), .Names = c("Days",
"Amt", "Month", "Vendor", "Type", "ProjectName"
), row.names = c("558", "561", "860", "1157", "1179", "1221"), class =
"data.frame")
ggplot(dat, aes(x=as.character(Month),y=as.numeric(Days),fill=Type))+
stat_summary(fun.y='mean', geom = 'bar')+
ggtitle('Rpt 8')+
xlab('Month')+
ylab('Average Days')+
geom_text(stat='bin',aes(y=100, label=paste('Avg:\n',..count..)))
Right now my labels show counts and appear wherever I designate y.
I want to:
place labels at the top of the bars.
display the average, not the count.
I've pretty thoroughly - and unsuccessfully - tried most of the other solutions on SO & elsewhere.
Just got it:
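library(plyr)     # ddply() and .()
library(ggplot2)
# average Days per Vendor, Type and Month
means <- ddply(dat, .(Vendor, Type, Month), summarise, avg = mean(as.numeric(Days)))
ggplot(dat, aes(x = as.character(Month), y = as.numeric(Days), fill = Type)) +
  # fun replaces the deprecated fun.y in ggplot2 >= 3.3.0
  stat_summary(fun = mean, geom = 'bar') +
  geom_text(data = means, stat = 'identity',
            aes(y = avg + 7, label = round(avg, 0), group = Type))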
I realize there is code nearly identical to this sitting elsewhere. My error came from placing round's 0 outside the correct closing parenthesis, which moved all my labels to 0 on the x axis... DUH!
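For reference, the per-group averages can also be computed without plyr. The following is a base-R sketch using aggregate() on the columns that matter for the plot; dat_small is a hypothetical trimmed copy of dat.

```r
# Trimmed copy of the example data (Days is stored as character in the original)
dat_small <- data.frame(
  Days  = c("217.00", "120.00", "180.00", "183.00", "187.00", "192.00"),
  Month = c(201309L, 201309L, 201309L, 201310L, 201309L, 201309L),
  Type  = c("Full", "Full", "Self", "Self", "Self", "Self"),
  stringsAsFactors = FALSE
)
dat_small$Days <- as.numeric(dat_small$Days)

# One row per Type/Month combination with the mean number of days
means <- aggregate(Days ~ Type + Month, data = dat_small, FUN = mean)
means
```

The resulting data frame has one row per bar, which is the shape geom_text() needs when stat = 'identity' is used.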