Gantt-style diagram (weekdays and hours on x-axis) - r

I'm new to R and would like to create a Gantt-style diagram where I can see how long jobs on a SQL Server run over the week.
So my y-axis would show the job names, and my x-axis would have a zoomable scale (zoom in and out) showing weekdays, hours, minutes and seconds.
The dataset's format is still flexible: I have the start and end times as DateTimes, so I can convert them to any format.
This is what the data looks like:
structure(list(JobName = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("ATLAS_Admin_DeployClientDatabase", "ATLAS_Admin_ParseMasterCubeDatabase"), class = "factor"), RunDateTime = structure(c(1L,3L, 5L, 2L, 4L), .Label = c("2016-11-10T15:39:36.0000000", "2016-11-16T11:30:20.0000000","2016-11-16T11:37:25.0000000", "2016-11-16T15:51:56.0000000","2016-11-16T15:52:59.0000000"), class = "factor"), StartWeekday = structure(c(1L,2L, 2L, 2L, 2L), .Label = c("Thursday", "Wednesday"), class = "factor"), StartTime = structure(c(3L, 2L, 5L, 1L, 4L), .Label = c("1899-12-30T11:30:20.0000000", "1899-12-30T11:37:25.0000000", "1899-12-30T15:39:36.0000000", "1899-12-30T15:51:56.0000000", "1899-12-30T15:52:59.0000000" ), class = "factor"), EndRunDateTime = structure(c(1L, 3L, 5L, 2L, 4L), .Label = c("2016-11-10T16:02:39.0000000", "2016-11-16T11:31:24.0000000", "2016-11-16T12:03:10.0000000", "2016-11-16T15:52:57.0000000", "2016-11-16T16:19:06.0000000"), class = "factor"), EndWeekday = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("Thursday", "Wednesday"), class = "factor"), EndTime = structure(c(4L, 2L, 5L, 1L, 3L), .Label = c("1899-12-30T11:31:24.0000000", "1899-12-30T12:03:10.0000000", "1899-12-30T15:52:57.0000000", "1899-12-30T16:02:39.0000000", "1899-12-30T16:19:06.0000000" ), class = "factor")), .Names = c("JobName", "RunDateTime","StartWeekday", "StartTime", "EndRunDateTime", "EndWeekday","EndTime"), row.names = c(NA, 5L), class = "data.frame")
The job names are linked via the JobID.
In the end it should look like this: [image: Gantt diagram with weekdays/times instead of dates]
I'm not limited to any particular library, but ggplot2 is already installed.
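A minimal ggplot2 sketch, assuming the timestamp strings have the format shown in the dput above (the values are retyped here as plain strings). Each run is mapped onto one arbitrary reference week, so that runs from different weeks overlap and the x-axis shows only weekday and time of day; each run is then drawn as one horizontal segment with geom_segment(). The Monday 2016-01-04 anchor and the helper names parse_ts and to_week_position are illustrative choices, not part of the question.

```r
# Sketch: the question's five runs, retyped as plain strings.
runs <- data.frame(
  JobName = c(rep("ATLAS_Admin_DeployClientDatabase", 3),
              rep("ATLAS_Admin_ParseMasterCubeDatabase", 2)),
  RunDateTime = c("2016-11-10T15:39:36.0000000", "2016-11-16T11:37:25.0000000",
                  "2016-11-16T15:52:59.0000000", "2016-11-16T11:30:20.0000000",
                  "2016-11-16T15:51:56.0000000"),
  EndRunDateTime = c("2016-11-10T16:02:39.0000000", "2016-11-16T12:03:10.0000000",
                     "2016-11-16T16:19:06.0000000", "2016-11-16T11:31:24.0000000",
                     "2016-11-16T15:52:57.0000000"),
  stringsAsFactors = FALSE
)

# Parse the strings (factors work too); strptime ignores the trailing ".0000000".
parse_ts <- function(x) {
  as.POSIXct(as.character(x), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

# Hypothetical reference week starting on Monday 2016-01-04 (UTC, no DST).
ref_monday <- as.POSIXct("2016-01-04 00:00:00", tz = "UTC")

# Keep only weekday and time of day, placed inside the reference week.
to_week_position <- function(t) {
  weekday <- as.integer(format(t, "%u"))  # 1 = Monday ... 7 = Sunday
  secs <- as.integer(format(t, "%H")) * 3600 +
          as.integer(format(t, "%M")) * 60 +
          as.integer(format(t, "%S"))
  ref_monday + (weekday - 1) * 86400 + secs
}

runs$start <- to_week_position(parse_ts(runs$RunDateTime))
runs$end   <- to_week_position(parse_ts(runs$EndRunDateTime))

# One horizontal bar per run: y = job name, x = position within the week.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(runs, aes(x = start, xend = end, y = JobName, yend = JobName)) +
    geom_segment(linewidth = 6) +  # use size = 6 on ggplot2 < 3.4.0
    scale_x_datetime(date_labels = "%a %H:%M") +
    labs(x = NULL, y = NULL)
  print(p)
}
```

ggplot2 plots are static; if the plotly package is installed, plotly::ggplotly(p) should give an interactive version that can be zoomed in and out. A run that crosses midnight from Sunday to Monday would wrap to the start of the reference week and would need to be split into two segments.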

Related

cut.default error in heatmap generation R

I want to generate a heatmap from a 6-row by 8-column data frame. The last row of the data frame holds the information used to annotate the columns. The structure of the data frame is as follows:
heatmap_try <-structure(list(BGC0000041 = structure(c(1L, 2L, 1L, 1L, 1L, 3L
), .Label = c("0", "0.447458977", "a"), class = "factor"), BGC0000128 = structure(c(1L,
1L, 1L, 3L, 2L, 4L), .Label = c("0", "1.785875195", "4.093659107",
"a"), class = "factor"), BGC0000287 = structure(c(1L, 1L, 1L,
3L, 2L, 4L), .Label = c("0", "1.785875195", "4.456229186", "b"
), class = "factor"), BGC0000294 = structure(c(3L, 1L, 2L, 4L,
1L, 5L), .Label = c("0", "2.035046947", "3.230553742", "3.286304185",
"b"), class = "factor"), BGC0000295 = structure(c(1L, 1L, 1L,
2L, 1L, 3L), .Label = c("0", "2.286304185", "c"), class = "factor"),
BGC0000308 = structure(c(4L, 2L, 3L, 5L, 1L, 6L), .Label = c("6.277728291",
"6.313707588", "6.607936616", "6.622871165", "6.64385619",
"c"), class = "factor"), BGC0000323 = structure(c(1L, 2L,
1L, 1L, 1L, 3L), .Label = c("0", "0.447458977", "c"), class = "factor"),
BGC0000328 = structure(c(1L, 2L, 1L, 1L, 1L, 3L), .Label = c("0",
"0.447458977", "c"), class = "factor")), class = "data.frame", row.names = c("Gut",
"Oral", "Anterior_nares", "Retroauricular_crease", "Vagina",
"AL"))
My code for generating the heatmap is as follows (I am using the pheatmap library):
library(pheatmap)
heatmap_data1 <- heatmap_try[ c(1:5), c(1:8) ]
anotation_data <- as.data.frame(t(heatmap_try[6, ]))
row.names(anotation_data) <- colnames(heatmap_data1)
pheatmap(heatmap_data1, annotation_col = anotation_data, color = colorRampPalette(c("white","blue"))(n=100),cellwidth = 40,cellheight = 6,fontsize_row = 5,cluster_rows = F,cluster_cols = F)
However, I am getting the following error:
Error in cut.default(x, breaks = breaks, include.lowest = T) :
'x' must be numeric
What am I doing wrong?
Thanks!
This is because the columns of heatmap_data1 are factors; pheatmap needs them to be numeric. One way to convert them while keeping the row names (as.data.frame(lapply(...)) would reset them to 1, 2, ...) is:
heatmap_data1_num <- heatmap_data1
heatmap_data1_num[] <- lapply(heatmap_data1,
  function(x) as.numeric(as.character(x)))
# then as before
pheatmap(heatmap_data1_num, annotation_col = anotation_data,
  color = colorRampPalette(c("white", "blue"))(n = 100),
  cellwidth = 40, cellheight = 6, fontsize_row = 5,
  cluster_rows = FALSE, cluster_cols = FALSE)
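The as.character() step matters: calling as.numeric() directly on a factor returns its internal level codes, not the values shown when it is printed. A small illustration using values from the first column:

```r
# as.numeric() on a factor gives level indices;
# going through as.character() first recovers the printed values.
f <- factor(c("0", "0.447458977", "0"))
codes  <- as.numeric(f)                # level indices: 1 2 1
values <- as.numeric(as.character(f))  # actual values: 0, 0.447458977, 0
print(codes)
print(values)
```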

How to change box color in transition plot (Gmisc package)

I want to make a transition plot with three columns. I use the Gmisc package, but not the transitionPlot function, since it does not let me include a third column. Therefore, I used the code below. My problem is that the resulting transition plot is dark green and the boxes have a shadow. Could you please help me change the color and get rid of the shadow? Thank you. This is my first question, so sorry if anything is wrong.
Here is a sample data frame (I took this from Stack Overflow, since I do not have the data):
x <- structure(list(Obs = 1:13, Seq.1 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("a", "b", "c" ), class = "factor"), Seq.2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("c", "d"), class = "factor"), Seq.3 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "d", "e"), class = "factor"), Seq.4 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("", "f"), class = "factor")), .Names = c("Obs", "Seq.1", "Seq.2", "Seq.3", "Seq.4"), class = "data.frame", row.names = c(NA, -13L))
library(Gmisc)
library(dplyr)
transitions <- table(x$Seq.1,x$Seq.2) %>%
getRefClass("Transition")$new(label=c("1st Iteration", "2nd Iteration"))
transitions$box_width <- 0.25
transitions$box_label_cex <- 0.7
transitions$arrow_type <- "simple"
transitions$arrow_rez <- 300
table(x$Seq.2,x$Seq.3) %>% transitions$addTransitions(label = '3rd Iteration')
transitions$render()

Plot a nucleotide chain in R

I am interested in plotting this sample figure in R (the sample figure was generated in Illustrator).
Essentially, my data is structured as follows:
> dput(data)
structure(list(FirstPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("pos1",
"pos2"), class = "factor"), SecondPos = structure(c(1L, 1L, 1L,
2L, 2L, 2L), .Label = c("pos2", "pos3"), class = "factor"), FirstPosseq = structure(c(1L,
1L, 1L, 2L, 3L, 3L), .Label = c("A", "C", "T"), class = "factor"),
SecondPosseq = structure(c(2L, 4L, 1L, 1L, 3L, 4L), .Label = c("A",
"C", "G", "T"), class = "factor"), Count = c(10L, 100L, 1L,
100L, 100L, 100L)), .Names = c("FirstPos", "SecondPos", "FirstPosseq",
"SecondPosseq", "Count"), class = "data.frame", row.names = c(NA,
-6L))
This is a list of positions (original position and partner position). For each row, the "Count" column indicates how likely the two nucleotides are to co-occur. I want a way to display that probability and the order of the positions along the x-axis. In the example, I tried varying the line thickness based on 'Count'.
Looking through the ggplot2 library, I couldn't find figures like this, so I was hoping for advice on packages or approaches I could use.
Thank you!
One possible solution is to use the igraph package. Below is a basic example of how to get started with your data set.
# Assign your data to variable 'dat'.
dat = structure(list(FirstPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("pos1", "pos2"), class = "factor"),
SecondPos = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("pos2", "pos3"), class = "factor"),
FirstPosseq = structure(c(1L, 1L, 1L, 2L, 3L, 3L),
.Label = c("A", "C", "T"), class = "factor"),
SecondPosseq = structure(c(2L, 4L, 1L, 1L, 3L, 4L),
.Label = c("A", "C", "G", "T"), class = "factor"),
Count = c(10L, 100L, 1L, 100L, 100L, 100L)),
.Names = c("FirstPos", "SecondPos", "FirstPosseq",
"SecondPosseq", "Count"), class = "data.frame",
row.names = c(NA, -6L))
library(igraph)
# Create unique names/ids for each vertex in the graph.
dat$node1 = paste(dat$FirstPos, dat$FirstPosseq, sep="_")
dat$node2 = paste(dat$SecondPos, dat$SecondPosseq, sep="_")
# Use the two node columns as an edge list matrix and create the graph.
g = graph_from_edgelist(as.matrix(dat[, c("node1", "node2")]))
# Add edge weights to graph.
E(g)$weight = dat$Count
# Plot using 'layout_as_tree' to control layout.
plot(g, layout=layout_as_tree(g, root=1), edge.width=log10(E(g)$weight + 1) * 5,
vertex.size=30, vertex.color="white", edge.color="black",
edge.arrow.mode=0L, vertex.label.color="black")
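Since the question mentions ggplot2, here is a possible alternative sketch without igraph. It assumes the position names follow the pattern pos<number>, so the number can serve as the x coordinate, and it puts each base on its own row (A, C, G, T); segs, nodes and bases are illustrative names. The linewidth aesthetic needs ggplot2 3.4.0 or later (older versions use size).

```r
# Toy copy of the question's data, as plain character columns.
dat <- data.frame(
  FirstPos     = c("pos1", "pos1", "pos1", "pos2", "pos2", "pos2"),
  SecondPos    = c("pos2", "pos2", "pos2", "pos3", "pos3", "pos3"),
  FirstPosseq  = c("A", "A", "A", "C", "T", "T"),
  SecondPosseq = c("C", "T", "A", "A", "G", "T"),
  Count        = c(10L, 100L, 1L, 100L, 100L, 100L),
  stringsAsFactors = FALSE
)

# Assumption: "pos<number>" gives the x order; each base gets a fixed row.
bases <- c("A", "C", "G", "T")
segs <- data.frame(
  x     = as.integer(sub("pos", "", dat$FirstPos)),
  xend  = as.integer(sub("pos", "", dat$SecondPos)),
  y     = match(dat$FirstPosseq, bases),   # base -> row 1..4
  yend  = match(dat$SecondPosseq, bases),
  width = log10(dat$Count + 1)             # same scaling as the igraph answer
)

# One labelled point per (position, base) pair that occurs in the data.
nodes <- unique(rbind(
  data.frame(x = segs$x,    y = segs$y),
  data.frame(x = segs$xend, y = segs$yend)
))
nodes$label <- bases[nodes$y]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  xs <- sort(unique(nodes$x))
  p <- ggplot() +
    geom_segment(data = segs,
                 aes(x = x, y = y, xend = xend, yend = yend, linewidth = width)) +
    geom_label(data = nodes, aes(x = x, y = y, label = label)) +
    scale_x_continuous(breaks = xs, labels = paste0("pos", xs)) +
    scale_y_continuous(breaks = NULL) +
    labs(x = NULL, y = NULL, linewidth = "log10(Count + 1)")
  print(p)
}
```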

Select observations in R based on maximum number listed in a column

I hope I've done this correctly! I have two data frames:
teachers = structure(list(Teacher = c(123L, 123L, 123L, 123L, 124L),
tStudents = c(3L, 3L, 4L, 3L, 4L), Term = c(1801L, 1802L, 1801L, 1803L, 1802L),
Course = structure(c(5L, 6L, 7L, 6L, 8L), .Label = c("ENGG",
"ENGG2", "LITT", "LITT2", "MATH", "MATH2", "PHYS", "SCIE"
), class = "factor")), .Names = c("Teacher", "tStudents", "Term", "Course"), row.names = c(NA, 5L), class = "data.frame")
enrols = structure(list(UniqueStudent = structure(c(3L, 2L, 1L, 5L, 4L),
.Label = c("1801-ENGG-N1-abcd1#abc.edu.au", "1801-MATH-C1-abcd1#abc.edu.au","1801-PHYS-L1-abcd1#abc.edu.au", "1802-MATH2-G1-abcd1#abc.edu.au", "1802-SCIE-K2-abcd1#abc.edu.au"), class = "factor"), Term = c(1801L,1801L, 1801L, 1802L, 1802L), Student.Email.Addresses = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "abcd1#abc.edu.au", class = "factor"), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "s12344", class = "factor"),
Gender.Description = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "M", class = "factor"),
Age = c(12L, 12L, 12L, 12L, 12L), Program.Short.Description = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "LSC1", class = "factor"), Term.CC.CN = structure(c(3L,
2L, 1L, 5L, 4L), .Label = c("1801-ENGG-N1", "1801-MATH-C1",
"1801-PHYS-L1", "1802-MATH2-G1", "1802-SCIE-K2"), class = "factor"),
Course.Code = structure(c(4L, 2L, 1L, 5L, 3L), .Label = c("ENGG",
"MATH", "MATH2", "PHYS", "SCIE"), class = "factor"), Class.Number = structure(c(4L,
1L, 5L, 3L, 2L), .Label = c("C1", "G1", "K2", "L1", "N1"), class = "factor"),
Teacher = c(123L, 123L, 125L, 124L, 123L)), .Names = c("UniqueStudent", "Term", "Student.Email.Addresses", "ID", "Gender.Description", "Age", "Program.Short.Description", "Term.CC.CN", "Course.Code", "Class.Number", "Teacher"), row.names = c(NA, 5L), class = "data.frame")
teachers$tStudents lists the maximum number of students allowed to be allocated to a teacher per Term and Course. I've also pre-merged the Course enrolments in the "enrols" data to list the Teachers for each course.
So, what I need to do is create class lists from the enrols data, matched to the teachers data by teacher, Term and Course, where each class list can contain at most the number of students given in teachers$tStudents. Ideally, I'd also like to select a representative mix of students, so that each new class list has both genders, different ages and students from different Program.Short.Description values.
I've tried merging in different ways in dplyr and can create full lists with all students, but I haven't been able to use the teachers$tStudents column to limit the number of observations selected. Is this possible?
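A minimal base R sketch of capping each class at tStudents, using hypothetical toy data rather than the question's data frames (in the posted sample every Teacher/Term/Course group has only one student, so the cap never applies). The idea: join each enrolment to its class's cap, shuffle the rows, number the students within each Teacher/Term/Course group, and keep those numbered at most tStudents. This draws a simple random sample; it does not try to balance Gender.Description, Age or Program.Short.Description.

```r
# Hypothetical toy data: five students enrolled with teacher 123 in term
# 1801 MATH, whose class is capped at 3; two students in 124's 1802 SCIE.
teachers <- data.frame(Teacher = c(123L, 124L), Term = c(1801L, 1802L),
                       Course = c("MATH", "SCIE"), tStudents = c(3L, 4L),
                       stringsAsFactors = FALSE)
enrols <- data.frame(ID = paste0("s", 1:7),
                     Teacher = c(rep(123L, 5), 124L, 124L),
                     Term = c(rep(1801L, 5), 1802L, 1802L),
                     Course.Code = c(rep("MATH", 5), "SCIE", "SCIE"),
                     stringsAsFactors = FALSE)

set.seed(1)  # make the random selection reproducible

# Inner join: attach each class's cap to its enrolments.
merged <- merge(enrols, teachers,
                by.x = c("Teacher", "Term", "Course.Code"),
                by.y = c("Teacher", "Term", "Course"))

# Shuffle rows so the students kept per class are a random subset.
merged <- merged[sample(nrow(merged)), ]

# Number the students 1, 2, ... within each Teacher/Term/Course group.
merged$rank <- ave(seq_len(nrow(merged)),
                   merged$Teacher, merged$Term, merged$Course.Code,
                   FUN = seq_along)

# Keep at most tStudents students per class.
class_lists <- merged[merged$rank <= merged$tStudents, ]
print(class_lists)
```

Because merge() performs an inner join by default, enrolments whose teacher, term and course have no row in teachers (such as teacher 125 in the question's sample) are dropped.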

Special characters in a column: mess in the table

I have a problem with special characters in a column of a table.
Here is an example of the data:
structure(list(shipType = structure(c(1L, 3L, 1L, 2L, 4L), .Label = c("CARGO",
"FISHING", "TOWING_LONG_WIDE", "UNKNOWN"), class = "factor"),
shipCargo = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "UNDEFINED", class = "factor"),
destination = structure(c(3L, 1L, 2L, 4L, 5L), .Label = c("\\KORSOR ;.,NA,.\\",
"LEHTMA", "RIGA", "TALLIN", "VYBORG"), class = "factor"),
eta = structure(c(1L, 2L, 5L, 3L, 4L), .Label = c("01/01 00:00 UTC",
"01/01 09:00 UTC", "24/12 16:00 UTC", "26/12 07:00 UTC",
"30/12 16:00 UTC"), class = "factor"), imo = structure(c(3L,
5L, 1L, 4L, 2L), .Label = c("7101891", "7406318", "9066045",
"9158185", "Russia"), class = "factor"), callsign = structure(c(5L,
1L, 2L, 3L, 4L), .Label = c("12", "UALB", "UBYK8", "UFPC",
"UICC"), class = "factor"), country = structure(c(2L, 1L,
2L, 2L, 2L), .Label = c("2014-12-29", "Russia"), class = "factor"),
month = c(12L, 1L, 12L, 12L, 12L), date = structure(c(2L,
1L, 2L, 2L, 2L), .Label = c("", "2014-12-29"), class = "factor"),
week = c(1L, NA, 1L, 1L, 1L), X = c(NA, NA, NA, NA, NA)), .Names = c("shipType",
"shipCargo", "destination", "eta", "imo", "callsign", "country",
"month", "date", "week", "X"), class = "data.frame", row.names = c(NA,
-5L))
As you can see in the second row, the "destination" column gets messed up when I read the file with the following code:
data <- read.table(file, header=T, fill=T, sep=",")
I have tried different things, such as exporting with quotes and without headers, then importing with:
data <- read.table(file, sep=",", fill=T, header=F, quote="")
and then removing the first line (which contains the actual headers) and setting the column names again:
data <- data[-1,]
colnames(data)<-c( "shipType", "shipCargo","destination","eta","imo","callsign", "country","month","date","week")
It looks better, but there are many special characters, and editing them by hand would be time-consuming and error-prone (I have a lot of tables).
Is there a way to keep the columns from getting messed up when importing the file?
Thank you!
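One way to find the lines causing the problem, assuming the file is comma-separated, is count.fields(): lines whose number of fields differs from the header's are the ones that get shifted. With fill = TRUE, read.table() pads short lines and works out the number of columns from the first few lines, so a line with extra fields can misalign the rows instead of raising an error (see the fill and col.names notes in ?read.table). The small file below is hypothetical: its last line has unquoted commas inside the destination field, which is one plausible cause of the shift seen in the dput.

```r
# Hypothetical three-line file; the last record has unquoted commas
# inside the destination value.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "shipType,shipCargo,destination,eta",
  "CARGO,UNDEFINED,RIGA,01/01 00:00 UTC",
  "TOWING_LONG_WIDE,UNDEFINED,\\KORSOR ;.,NA,.\\,01/01 09:00 UTC"
), tmp)

# Fields per line, splitting on commas with quoting disabled.
n_fields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")
print(n_fields)

# Lines whose field count differs from the header line need attention.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

If whatever writes the file can put double quotes around text fields, read.table(file, header = TRUE, sep = ",", quote = "\"") (or read.csv(file)) keeps embedded commas inside a single field.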
