Draw a graph in R from a table whose header spans two columns - r

I have a table whose header spans two columns. How can I draw a 3D graph from this table, or, more generally, how can I plot tables that have multi-level headers? Please suggest alternative ways to achieve this, if there are any.
Crime Table:
                                         year
                       2014              2015              2016
                 Reported Detected Reported Detected Reported Detected
Murder                221      208      178      172       26       20
Murder(Gain)           20       16       11        9        1        1
Dacoity                51       45       44       36        5        1
Robbery               538      316      351      201       23       10
Chain Snatching       528      394      342      229       23        0
Code:
library(tables)
#CLASS 1 CRIMES 2014
c14 <- structure(list(`Reported` = c(221, 20, 51,
538, 528), `Detected` = c(208, 16, 45, 316, 394)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching"), class = "data.frame")
c14
#CLASS 1 CRIMES 2015
c15 <- structure(list(`Reported` = c(178, 11, 44,
351, 342), `Detected` = c(172, 9,
36, 201, 229)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c15
#CLASS 1 CRIMES 31-01-2016
c16 <- structure(list(`Reported` = c(26, 1, 5,
23, 23), `Detected` = c(20, 1,
1, 10, 0)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c16
# rbind with rownames as a column
# (year is wrapped in factor() because tabular() expects factors and
# R >= 4.0 no longer converts strings to factors automatically)
st <- rbind(
  data.frame(c14, year = factor('2014'), what = factor(rownames(c14), levels = rownames(c14)),
             row.names = NULL, check.names = FALSE),
  data.frame(c15, year = factor('2015'), what = factor(rownames(c15), levels = rownames(c15)),
             row.names = NULL, check.names = FALSE),
  data.frame(c16, year = factor('2016'), what = factor(rownames(c16), levels = rownames(c16)),
             row.names = NULL, check.names = FALSE)
)
crimetable <- tabular(Heading()*what ~ year*(`Reported` +`Detected`)*Heading()*(identity),data=st)
crimetable

As I hate 3D plots for 3-way tables and I like ggplot2, I suggest this:
Gather your data into "long" format:
library(tidyr)
st_long = gather(st, type, count, -c(year, what))
head(st_long, 3)
# year what type count
# 1 2014 Murder Reported 221
# 2 2014 Murder(Gain) Reported 20
# 3 2014 Dacoity Reported 51
As you can see, the Detected and Reported columns are now combined into a single column called type. This is useful for ggplot2, as it can easily create facets. Facets are separate panels within the plot that share the same aesthetic components but work on different groups of data:
library(ggplot2)
ggplot(st_long, aes(year, count, group = what, color = what)) +
geom_line() +
facet_wrap(~ type)
(I am not saying that a line plot is the only or best plot here, but it is often used when comparing frequencies across different time points.)
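The reshaping step can also be written with pivot_longer(), which the current tidyr documentation recommends over gather(). The following is a minimal sketch, assuming a cut-down stand-in for st (two crimes and two years only, rebuilt here so the snippet runs on its own); the name st_small is made up for this illustration:

```r
library(tidyr)

# Cut-down stand-in for `st` (assumption: same column layout as above)
st_small <- data.frame(
  Reported = c(221, 20, 178, 11),
  Detected = c(208, 16, 172, 9),
  year = c("2014", "2014", "2015", "2015"),
  what = c("Murder", "Murder(Gain)", "Murder", "Murder(Gain)")
)

# One row per crime, year, and type; the counts end up in `count`
st_small_long <- pivot_longer(
  st_small,
  cols = c(Reported, Detected),
  names_to = "type",
  values_to = "count"
)
st_small_long
```

The rows come out in a different order than with gather(), but the columns year, what, type and count are the same, so the ggplot call should work unchanged.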

Related

How to build stacked bar chart

How can I build a stacked bar chart from this data? The years should be on the x axis, with OLD and NEW distinguished by the colours of the bar segments.
I want to avoid manual coding and automate the process.
df <- structure(list(`1998` = c(11, 826), `2000` = c(217, 620), `2007` = c(625,
212), `2012` = c(836, 1)), class = "data.frame", row.names = c("NEW",
"OLD"))
1998 2000 2007 2012
NEW 11 217 625 836
OLD 826 620 212 1
Expected output:
Looking for something like this?
library(tidyverse)
df %>%
# rownames to column
mutate(type = rownames(.)) %>%
# convert to long data
pivot_longer(-"type") %>%
# plot
ggplot() +
geom_col(aes(x = name, y = value, fill = type))
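If you would rather not reshape the data at all, base graphics can draw the same kind of chart directly from the wide table. This is only a sketch of an alternative, not a replacement for the ggplot2 answer; the colours are arbitrary choices:

```r
# Data from the question
df <- data.frame(
  `1998` = c(11, 826), `2000` = c(217, 620),
  `2007` = c(625, 212), `2012` = c(836, 1),
  row.names = c("NEW", "OLD"), check.names = FALSE
)

# barplot() stacks the rows of a matrix within each column by default
counts <- as.matrix(df)
barplot(counts,
        col = c("steelblue", "grey70"),
        legend.text = rownames(counts),
        xlab = "Year", ylab = "Count")
```

Because the column names become the bar labels and the row names become the legend, nothing has to be typed by hand when more years or categories are added.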

Survival function with dropdown menu for each gene

I am trying to make a shiny app where you can select different miRNAs from an input and then plot the survival curve using ggsurvplot. Something is wrong with the functions within fitSurv, but I am not sure where I am going wrong.
library(dplyr)
require(survminer)
library(tidyverse)
require(reshape2)
library(shiny)
library(tidyr)
require(survival)
example data:
df.miRNA.cpm <- structure(list(`86` = c(5.57979757386892, 17.0240095264258, 4.28380151026145,
13.0457611762755, 12.5531123449841), `175` = c(5.21619202802748,
15.2849097474841, 2.46719979911461, 10.879496005461, 9.66416497290915
), `217` = c(5.42796072966512, 17.1413407297933, 5.15230233060323,
12.2646127361351, 12.1031024927547), `394` = c(-1.1390337316217,
15.1021660424984, 4.63168157763046, 11.1299079134792, 9.55572588729967
), `444` = c(5.06134249676025, 14.5442494311861, -0.399445049232868,
7.45775961504073, 9.92629675808998)), row.names = c("hsa_let_7a_3p",
"hsa_let_7a_5p", "hsa_let_7b_3p", "hsa_let_7b_5p", "hsa_let_7c_5p"
), class = "data.frame")
df.miRNA.cpm$miRNA <- rownames(df.miRNA.cpm)
ss.survival.shiny.miRNA.miRNA <- structure(list(ID = c("86", "175", "217", "394", "444"), TimeDiff = c(71.0416666666667,
601.958333333333, 1130, 1393, 117.041666666667), Status = c(1L,
1L, 0L, 0L, 1L)), row.names = c(NA, 5L), class = "data.frame")
Join the two example data frames:
data_prep.miRNA <- df.miRNA.cpm %>%
tidyr::pivot_longer(-miRNA, names_to = "ID") %>%
left_join(ss.survival.shiny.miRNA.miRNA)
Example of the joined data:
> data_prep.miRNA
# A tibble: 153,033 x 5
miRNA ID value TimeDiff Status
<chr> <chr> <dbl> <dbl> <int>
1 hsa_let_7a_3p 86 5.58 71.0 1
2 hsa_let_7a_3p 175 5.22 602. 1
3 hsa_let_7a_3p 217 5.43 1130 0
4 hsa_let_7a_3p 394 -1.14 1393 0
5 hsa_let_7a_3p 444 5.06 117. 1
6 hsa_let_7a_3p 618 4.37 1508 0
7 hsa_let_7a_3p 640 2.46 1409 0
8 hsa_let_7a_3p 829 0.435 919. 0
9 hsa_let_7a_3p 851 -1.36 976. 0
10 hsa_let_7a_3p 998 3.87 1196. 0
# … with 153,023 more rows
For a selected MicroRNA this works:
fitSurv <- survfit(Surv(data$TimeDiff, data$Status) ~ paste(cut(value , quantile(value , probs = c(0, 0.8)), include.lowest=T)), data = data_prep.miRNA[grep("hsa_let_7a_3p",data_prep.miRNA$miRNA),])
Shiny:
ui.miRNA <- fluidPage(
selectInput("MicroRNA", "miRNA", choices = unique(data_prep.miRNA$miRNA)),
plotOutput("myplot"))
server <- function(input, output, session) {
data_selected <- reactive({
filter(data_prep.miRNA, miRNA %in% input$MicroRNA)
})
output$myplot <- renderPlot({
fitSurv <- survfit(Surv("TimeDiff", "Status") ~ paste(cut("value" , quantile("value" , probs = c(0, 0.8)), include.lowest=T)), data = data_selected)
ggsurvplot(fitSurv ,title="", xlab="Time (Yrs)", ylab="Survival probability",
font.main = 8,
font.x = 8,
font.y = 8,
font.tickslab = 8,
font.legend=8,
pval.size = 3,
pval.coord = c(1000,1),
size=0.4,
legend = "right",
censor.size=2,
break.time.by = 365,
pval =T,#"p=0.003",#"p=0.41",
#xscale=365,
#palette = c("#E7B800", "#2E9FDF"),
#ggtheme = theme_bw(),
risk.table = F,
xscale=365.25,
xlim=c(0,7*365))
})
}
shinyApp(ui.miRNA, server)
There are several mistakes in this statement:
fitSurv <-
survfit(Surv("TimeDiff", "Status") ~ paste(cut("value", quantile("value", probs = c(0, 0.8)), include.lowest=T)),
data = data_selected)
First, data_selected is a reactive conductor, not a dataframe. If you want the dataframe returned by this reactive conductor, you have to use parentheses: data_selected().
Next, you must not quote the variables: TimeDiff and not "TimeDiff", etc.
The paste call is unnecessary: cut already returns a factor.
Your cut produces only one category and the NA category. To get two intervals as categories, use probs = c(0, 0.8, 1) in quantile.
Finally, it is not a good idea to use T for TRUE, because T can be reassigned to any R object, while TRUE is a reserved word.
To conclude, here is the corrected code:
fitSurv <-
survfit(Surv(TimeDiff, Status) ~ cut(value, quantile(value, probs = c(0, 0.8, 1)), include.lowest=TRUE),
data = data_selected())
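The point about cut and quantile can be checked outside shiny. Here is a small sketch with made-up values (the vector value below is hypothetical and only stands in for the expression column):

```r
# Hypothetical expression values, used only to illustrate the breaks
value <- 1:10

# Breaks at the 0% and 80% quantiles: one interval, the top values become NA
one_group <- cut(value, quantile(value, probs = c(0, 0.8)), include.lowest = TRUE)

# Adding the 100% quantile closes the upper interval: two groups, no NA
two_groups <- cut(value, quantile(value, probs = c(0, 0.8, 1)), include.lowest = TRUE)

table(one_group, useNA = "ifany")
table(two_groups, useNA = "ifany")
```

With only two break points the observations above the 80% quantile fall outside every interval, which is why the original formula had nothing to compare.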

Plotting subset of subset in R

I have a data.frame like the following:
files
Total 1000
Subset1 587
Subset2 123
I would like to represent the above data frame in such a way that it is clear the 123 files are a subset of the 587, which are themselves a subset of the 1000. When I use pie or bar graphs, the result is misleading.
My apologies if my question is very basic. Please advise how I can represent the above data in R plots.
Perhaps something like this:
df = data.frame(files=c(1000,587,123),row.names = c('total','subset1','subset2'))
library(VennDiagram)
draw.triple.venn(area1 = df$files[1], area2 = df$files[2], area3 = df$files[3],
n12 = 587, n23 = 123, n13 = 123, n123 = 123,
category = c("Total", "Subset1", "Subset2"),
lty = "blank", fill = c("skyblue", "pink1", "mediumorchid"),
cat.pos = 0,cat.dist = c(-0.02,-0.05,-0.02))
Result:
You could do it as follows:
df$files[1] <- df$files[1] - sum(df$files[-1])
pie(df$files, df$sets)
The result:
Data:
df <- read.table(text=" sets files
Total 1000
Subset1 587
Subset2 123 ", header=TRUE)
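If the sets are strictly nested (an assumption taken from the question: Subset2 inside Subset1 inside Total), the non-overlapping pieces can be computed first and then drawn as one stacked bar. This is a sketch of that idea, not part of either answer above; the segment labels are made up:

```r
files <- c(Total = 1000, Subset1 = 587, Subset2 = 123)

# Files that belong to each level but not to the next, narrower one
exclusive <- files - c(files[-1], 0)
names(exclusive) <- c("Total only", "Subset1 only", "Subset2")

# A single stacked bar: the segments add up to the total
barplot(as.matrix(exclusive), horiz = TRUE,
        col = c("skyblue", "pink1", "mediumorchid"),
        legend.text = names(exclusive))
```

Each segment is disjoint from the others, so the full bar length equals the 1000 files and the nesting is read from left to right.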

R: plot multiple curves vs one var but for 4 factors

I have a DF that looks like:
id app vac dac
1: 1 1000802 579 455
2: 1 1000803 1284 918
3: 1 1000807 68 66
4: 1 1000809 1470 903
5: 2 1000802 407 188
6: 2 1000803 365 364
7: 2 1000807 938 116
8: 2 1000809 699 570
I need to plot vac and dac for each app on the same canvas as a function of id. I know how to do it for a single app by using melt and plotting everything at once with ggplot, but I'm stuck on how to do it for an arbitrary number of factors/levels.
In this example there will be 8 curves in total for the 4 apps. Any thoughts?
Here's the data frame for tests. Thank you!!
df = structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), app = c(1000802,
1000803, 1000807, 1000809, 1000802, 1000803, 1000807, 1000809
), vac = c(579, 1284, 68, 1470, 407, 365, 938, 699), dac = c(455,
918, 66, 903, 188, 364, 116, 570)), .Names = c("id", "app", "vac",
"dac"), class = c("data.table", "data.frame"), row.names = c(NA,
-8L))
Edit: some clarification on axes,
x axis = id, y axis = values of vac and dac for each of 4 app factors
It is a bit unclear what you are looking for, but if you are looking for a line connecting the values of vac and dac, here is a solution using dplyr and tidyr.
First, gather the vac and dac columns (this is similar to reshape2::melt but with a syntax I find easier to follow). Then, set the variable (which has "vac" and "dac") as your x-locations, the value (from the old vac and dac columns) as your y and then map app and id to aesthetics (here, color and linetype). Set the group to ensure that it connects the right pairs of points, and add geom_line:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = variable
, y = value
, color = factor(app)
, linetype = factor(id)
, group = paste(app, id))) +
geom_line()
gives
Given the question edit, you can change axes like so:
df %>%
gather(variable, value, vac, dac) %>%
ggplot(aes(x = id
, y = value
, color = factor(app)
, linetype = variable
, group = paste(app, variable))) +
geom_line()
gives
I'm not sure I understood your question, but I would do something like
ggplot(df,aes(vac,app,group=app)) + geom_point(aes(color=factor(app)))
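With many apps, eight or more lines on one panel can get hard to read. One possible variation, sketched here under the assumption that one panel per app is acceptable, is to facet by app and keep only vac and dac as colours. The data are rebuilt as a plain data.frame (named apps here only to keep the snippet self-contained):

```r
library(tidyr)
library(ggplot2)

# Data from the question, as a plain data.frame
apps <- data.frame(
  id  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  app = c(1000802, 1000803, 1000807, 1000809, 1000802, 1000803, 1000807, 1000809),
  vac = c(579, 1284, 68, 1470, 407, 365, 938, 699),
  dac = c(455, 918, 66, 903, 188, 364, 116, 570)
)

# Long format: one row per id, app and measure
apps_long <- pivot_longer(apps, cols = c(vac, dac),
                          names_to = "variable", values_to = "value")

# One panel per app, one line per measure
p <- ggplot(apps_long, aes(x = id, y = value, color = variable)) +
  geom_line() +
  facet_wrap(~ app)
p
```

This scales to any number of apps without changing the code, since facet_wrap creates a panel for every distinct app value.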

Creating a colored scatter plot

I'm taking this data vis class in which the professor has us basically copying and pasting code instead of teaching us anything. I'm trying to figure out how to create a scatter plot which illustrates the strike rate and civilian casualties of drone warfare.
The problem I'm having is how to use a variable from the data to set the color of a data point. At a minimum, I want to use "status" (dead/2, alive/1) to color the points.
It'd be ideal if I could figure out how to color the points based upon the drone target's nationality, too, since I have data for that. Anyway, this is what I have so far. It creates the points, but not the colors. I'd like to know how to create the colors.
symbols(killVStarget$name, killVStarget$strikes,
circles=sqrt(killVStarget$casualties),
col=ifelse(killVStarget$status==2, "red", "black"), cex=0.15)
I imported the data from a .csv file. Here are the first 10 entries copied from excel:
name                   nationality  status  strikes  casualties
baitullah mehsud       pakistani         2        7         164
qari hussain           pakistani         2        6         128
abu ubaidah al masri   pakistani         2        3         120
mullah sangeen zadran  pakistani         2        3         108
ayman al-zawahiri      pakistani         1        2         105
sirajudin haqqani      pakistani         1        5          82
hakimullah mehsud      pakistani         2        5          68
sadiq noor             pakistani         2        4          57
said al-shihri         yemeni            2        4          57
# Rebuild the sample rows from the question, with nationality in its own column
df <- data.frame(
  name = c("baitullah mehsud", "qari hussain", "abu ubaidah al masri", "mullah sangeen zadran",
           "ayman al-zawahiri", "sirajudin haqqani", "hakimullah mehsud", "sadiq noor",
           "said al-shihri"),
  nationality = c(rep("pakistani", 8), "yemeni"),
  status = c(2, 2, 2, 2, 1, 1, 2, 2, 2),
  strikes = c(7, 6, 3, 3, 2, 5, 5, 4, 4),
  casualties = c(164, 128, 120, 108, 105, 82, 68, 57, 57)
)
library(ggplot2)
# Names on the x axis, point size by casualties, color by status
ggplot(aes(x = name, y = strikes, size = casualties, color = factor(status)), data = df) + geom_point()
# Same plot with the axes swapped so the names are easier to read
ggplot(aes(x = strikes, y = name, size = casualties, color = factor(status)), data = df) + geom_point()
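As for why the base-graphics attempt in the question shows no colors: symbols() takes its colors through the fg (outline) and bg (fill) arguments rather than col, and it expects numeric x values. A minimal sketch under those assumptions, using the row index in place of the names (the data frame name kills and the inches value are choices made for this illustration):

```r
# Sample rows from the question
kills <- data.frame(
  name = c("baitullah mehsud", "qari hussain", "abu ubaidah al masri",
           "mullah sangeen zadran", "ayman al-zawahiri", "sirajudin haqqani",
           "hakimullah mehsud", "sadiq noor", "said al-shihri"),
  status = c(2, 2, 2, 2, 1, 1, 2, 2, 2),
  strikes = c(7, 6, 3, 3, 2, 5, 5, 4, 4),
  casualties = c(164, 128, 120, 108, 105, 82, 68, 57, 57)
)

# One color per row, derived from status (2 = dead, 1 = alive)
point_col <- ifelse(kills$status == 2, "red", "black")

# fg colors the circle outlines, bg fills them
symbols(seq_along(kills$name), kills$strikes,
        circles = sqrt(kills$casualties), inches = 0.25,
        fg = point_col, bg = point_col,
        xlab = "Target (row index)", ylab = "Strikes")
```

Coloring by nationality would follow the same pattern: build a color vector from that column and pass it to fg and bg.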
