Visualising the distribution for different subgroups - r

I'm using the "d.pizza" data. It has a variable called "delivery_min", which is the delivery time in minutes, and a variable called "area", which takes one of three values (Camden, Westminster and Brent).
I want to draw a density plot that visualises the distribution of delivery time for each of these three areas.
I tried
plot.ecdf(d.pizza$delivery_min)
This code works, but how can I do it for each area?
head(d.pizza)
index date week weekday area count rabate price operator driver delivery_min
1 1 1 01.03.2014 9 6 Camden 5 TRUE 65.655 Rhonda Taylor 20.0
2 2 2 01.03.2014 9 6 Westminster 2 FALSE 26.980 Rhonda Butcher 19.6
3 3 3 01.03.2014 9 6 Westminster 3 FALSE 40.970 Allanah Butcher 17.8
4 4 4 01.03.2014 9 6 Brent 2 FALSE 25.980 Allanah Taylor 37.3
5 5 5 01.03.2014 9 6 Brent 5 TRUE 57.555 Rhonda Carter 21.8
6 6 6 01.03.2014 9 6 Camden 1 FALSE 13.990 Allanah Taylor 48.7
temperature wine_ordered wine_delivered wrongpizza quality
1 53.0 0 0 FALSE medium
2 56.4 0 0 FALSE high
3 36.5 0 0 FALSE <NA>
4 NA 0 0 FALSE <NA>
5 50.0 0 0 FALSE medium
6 27.0 0 0 FALSE low

You could do:
library(DescTools)
data(d.pizza)
plot.ecdf(subset(d.pizza, area == "Camden")$delivery_min,
col = "red", main = "ECDF for pizza deliveries")
plot.ecdf(subset(d.pizza, area == "Westminster")$delivery_min,
add = TRUE, col = "blue")
plot.ecdf(subset(d.pizza, area == "Brent")$delivery_min,
add = TRUE, col = "green")

library(DescTools)
data(d.pizza)
summary(d.pizza$delivery_min)
plot(NULL, ylab = '', xlab = '', xlim = c(5, 66), ylim = 0:1)
for (A in 1:3) {
  plot.ecdf(d.pizza$delivery_min[d.pizza$area == levels(d.pizza$area)[A]],
            pch = 20, col = A + 1, add = TRUE)
}
legend("bottomright", legend = levels(d.pizza$area),
       bty = 'n', pch = 20, col = 2:4)

I'd recommend the ggplot2 library for data visualization in R. Here's some code using ggplot2 that can create a density plot with the three groups overlaid:
library(ggplot2)
# make example dataframe
d.pizza <- data.frame(delivery_min = rnorm(n=30), area = rep(c("Camden", "Westminster", "Brent"), 10))
# plot data in ggplot2
ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_density(alpha = 0.5)
If you want a histogram, that can be done too:
ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_histogram(alpha = 0.5, position = 'identity')
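If you would rather stay in base graphics, one option is to split the delivery times by area and overlay one density() curve per group. This is only a minimal sketch: the data frame below is simulated as a stand-in for d.pizza (the means, spreads and seed are made up), and with the real data you would probably need to drop missing values in delivery_min and area first.

```r
# Simulated stand-in for d.pizza; the numbers are illustrative only
set.seed(1)
pizza <- data.frame(
  delivery_min = c(rnorm(50, 25, 5), rnorm(50, 30, 8), rnorm(50, 20, 4)),
  area = factor(rep(c("Brent", "Camden", "Westminster"), each = 50))
)

# One kernel density estimate per area
by_area <- split(pizza$delivery_min, pizza$area)
dens <- lapply(by_area, density)

# Axis ranges wide and tall enough for all three curves
xr <- range(sapply(dens, function(d) range(d$x)))
yr <- c(0, max(sapply(dens, function(d) max(d$y))))

# Empty frame, then one curve per area and a legend
plot(xr, yr, type = "n", xlab = "delivery_min", ylab = "Density",
     main = "Delivery time by area")
cols <- c("green", "red", "blue")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
legend("topright", legend = names(dens), col = cols, lwd = 2, bty = "n")
```

The colours are arbitrary; any vector with one entry per area works.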

Related

Labels missing in barplot

I have a dataset that I would like to visualize with barplot(). My question is: why do some labels not show when they are added with text(), and how can this be solved?
For example this is my table
table(test$Freq)
2 3 4 5 6 7 8 9 10 11 12 14 16 44
6338 2544 1072 394 102 29 11 9 5 2 3 1 1 1
And the following barplot will miss the first label:
barplot(table(test$Freq))
text(x = xx, y = test$Freq, label = test$Freq, pos = 3, cex = 0.8, col = "red")
It looks like the text is being plotted outside of your graph.
Try adjusting the ylim value when you call barplot. This should solve your problem.
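A minimal sketch of that suggestion, assuming the labels are meant to be the bar heights. The counts are retyped from the table in the question, the 10% headroom is an arbitrary choice, and xx is taken to be the value returned by barplot() (the bar midpoints):

```r
# Frequency table from the question, rebuilt by hand
counts <- c(6338, 2544, 1072, 394, 102, 29, 11, 9, 5, 2, 3, 1, 1, 1)
names(counts) <- c(2:12, 14, 16, 44)

# barplot() returns the bar midpoints; keep them for text().
# Extending ylim by 10% leaves room above the tallest bar for its label.
xx <- barplot(counts, ylim = c(0, max(counts) * 1.1))
text(x = xx, y = counts, labels = counts, pos = 3, cex = 0.8, col = "red")
```

With the default ylim the axis stops near the tallest bar, so a label placed above that bar (pos = 3) can fall outside the plot region and get clipped.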

Plotting tetrahedron with data points in R

I'm in a little bit of pain at the moment.
I'm looking for a way to plot compositional data (https://en.wikipedia.org/wiki/Compositional_data). I have four categories, so the data must be representable in a 3D simplex (since one category is always 1 minus the sum of the others).
So I have to plot a tetrahedron (its vertices will be my four categories) that contains my data points.
I've found this gist, https://gist.github.com/rmaia/5439815, but its use of the pavo package (tcs, vismodel...) is pretty obscure to me.
I've also found something else in the composition package, the function plot3D. But in this case an RGL device is opened(?!), and I don't really need a rotating plot, just a static one, since I want to save it as an image and insert it into my thesis.
Update: data looks like this. Consider only columns violent_crime (total), rape, murder, robbery, aggravated_assault
   cities   violent_crime murder rape rape(legally revised) robbery aggravated_assault
1  Autauga             68      2    8                    NA       6                 52
2  Baldwin             98      0    4                    NA      18                 76
3  Barbour             17      2    2                    NA       2                 11
4  Bibb                 4      0    1                    NA       0                  3
5  Blount              90      0    6                    NA       1                 83
6  Bullock             15      0    0                    NA       3                 12
7  Butler              44      1    7                    NA       4                 32
8  Calhoun             15      0    3                    NA       1                 11
9  Chambers             4      0    0                    NA       2                  2
10 Cherokee            49      2    8                    NA       2                 37
Update: my final plot with composition package
Here is how you can do this without a dedicated package by using geometry and plot3D. Using the data you provided:
# Load test data
df <- read.csv("test.csv")[, c("murder", "robbery", "rape", "aggravated_assault")]
# Convert absolute data to relative
df <- t(apply(df, 1, function(x) x / sum(x)))
# Compute tetrahedron coordinates according to https://mathoverflow.net/a/184585
simplex <- function(n) {
qr.Q(qr(matrix(1, nrow=n)) ,complete = TRUE)[,-1]
}
tetra <- simplex(4)
# Convert barycentric coordinates (4D) to cartesian coordinates (3D)
library(geometry)
df3D <- bary2cart(tetra, df)
# Plot data
library(plot3D)
scatter3D(df3D[,1], df3D[,2], df3D[,3],
xlim = range(tetra[,1]), ylim = range(tetra[,2]), zlim = range(tetra[,3]),
col = "blue", pch = 16, box = FALSE, theta = 120)
lines3D(tetra[c(1,2,3,4,1,3,1,2,4),1],
tetra[c(1,2,3,4,1,3,1,2,4),2],
tetra[c(1,2,3,4,1,3,1,2,4),3],
col = "grey", add = TRUE)
text3D(tetra[,1], tetra[,2], tetra[,3],
colnames(df), add = TRUE)
You can tweak the orientation with the phi and theta arguments in scatter3D.
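If you want to convince yourself of what the simplex() helper and the barycentric conversion do, here is a small base-R check. It is only a sketch: the three compositions are made up, and the matrix product is my own stand-in for the conversion step (a weighted average of the vertices), which should be equivalent to what bary2cart computes.

```r
# Same construction as above: vertices of a regular tetrahedron
simplex <- function(n) {
  qr.Q(qr(matrix(1, nrow = n)), complete = TRUE)[, -1]
}
tetra <- simplex(4)

# All six pairwise vertex distances should be equal
edge_lengths <- as.vector(dist(tetra))

# Three made-up compositions (each row sums to 1); illustrative values only
comp <- rbind(c(1, 0, 0, 0),
              c(0.25, 0.25, 0.25, 0.25),
              c(0.5, 0.5, 0, 0))

# Barycentric -> Cartesian: weighted average of the vertices
pts <- comp %*% tetra
```

A pure composition (first row) lands on its vertex, the uniform composition lands on the centroid, and a 50/50 mix of two categories lands on the midpoint of the edge joining them.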

Two sided bean plots with connection in R

I am trying to create two sided bean plots in R.
My data looks like:
> t1
Country Women Kids
1 China 2 5
2 China 4 10
3 China 3 10
4 China 1 3
5 China 2 2
6 USA 1 1
7 USA 1 2
8 USA 2 1
9 USA 2 3
10 USA 1 0
11 Swiss 1 3
12 Swiss 2 6
13 Swiss 2 5
14 Swiss 1 2
15 Swiss 3 9
I tried the following using R package "beanplot":
> t2=melt(t1)
Using Country as id variables
> t2$C.M=paste(t2$Country,t2$variable,sep=" ")
> beanplot(value ~ C.M, data = t2, ll = 0.04,
+ main = NA, side = "both",ylab = "Count",
+ border = NA, col = list("blue", c("orange", "white")),what=c(1,1,1,1))
And I get the bean plots:
Bean plots for family structure per country
However, I want a bean plot that tells the relation of pairs of points (i.e. women with kids) with connections per country. It should be something like:
This plot but with two-sided bean plot instead of box plot for each country.
Is there a way to achieve this?
You can do:
library(beanplot)
library(reshape2)
library(beeswarm)
# melt
d1 <- melt(t1)
# draw the beans using the at to specify the positions, boxwex
# to increase the size of the beans and xlim to increase the x-axis limits:
beanplot(d1$value ~ interaction(d1$variable, d1$Country), at=c(1.5,3.5,5.5),
side="both",col = list("blue", c("orange", "white")), what=c(1,1,1,1),
boxwex=2, xlim=c(0,7))
# add the points
n <- beeswarm(d1$value ~ interaction(d1$variable, d1$Country), add=T, cex=2,
pwcol = d1$variable, pch=16)
# and finally the segments
# t1 is the original (unmelted) data frame: column 2 is Women, column 3 is Kids
segments(matrix(n$x,5,)[,1], t1[1:5, 2], matrix(n$x,5,)[,2], t1[1:5, 3], lwd= 2)
segments(matrix(n$x,5,)[,3], t1[11:15, 2], matrix(n$x,5,)[,4], t1[11:15, 3], lwd= 2)
segments(matrix(n$x,5,)[,5], t1[6:10, 2], matrix(n$x,5,)[,6], t1[6:10, 3], lwd= 2)

use ggplot to plot a panel of bar plots

I have a data frame which reads as below:
factor bin ret
1 beta 1 -0.026840807
2 beta 2 -0.051610137
3 beta 3 -0.044658901
4 beta 4 -0.053322048
5 beta 5 -0.060173704
6 size 1 -0.047448288
7 size 2 -0.045603776
8 size 3 -0.051804757
9 size 4 -0.047044614
10 size 5 -0.045720971
11 liquidity 1 -0.057657070
12 liquidity 2 -0.053105474
13 liquidity 3 -0.045501401
14 liquidity 4 -0.048572585
15 liquidity 5 -0.032209038
16 nonlinear 1 -0.045752503
17 nonlinear 2 -0.047673201
18 nonlinear 3 -0.051107792
19 nonlinear 4 -0.045364070
20 nonlinear 5 -0.047722148
21 btop 1 -0.004399745
22 btop 2 -0.035082069
23 btop 3 -0.054526058
24 btop 4 -0.063497535
25 btop 5 -0.077123859
I would like to plot a panel of charts which looks similar to this:
The difference is that the chart I would like to create would have bin on the x-axis and ret on the y-axis, and the charts should be bar plots. Could anyone help me with this?
FYI: The code for the sample plot I've included is:
print(ggplot(df, aes(date,value)) +ylab('return(bps)') + geom_line() + facet_wrap(~ series,ncol=input$numCol)+theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))
I wonder if minor change to the code could solve my problem.
From your description, I'll assume this is what you're after:
print(ggplot(df, aes(bin, ret)) +
ylab('return(bps)') +
geom_bar(stat="identity") +
facet_wrap(~ factor,ncol=2)+
theme(strip.text.x = element_text(size = 20, colour = "red", angle = 0)))

Alternatives to the barplot function

I have a question about the barplot function. I have the following function, which receives a data.frame as a parameter; the number of rows can vary widely. I want to draw a histogram (or something similar) and save it as an image. The problem is that I always run into margin problems with barplot. Is there a way to draw the same histogram with another library that does not have these margin problems?
function:
HIST_EPC_list<-function(DF_TAG_PHASE_EPC_counter){
num<-nrow(DF_TAG_PHASE_EPC_counter)
barplot(DF_TAG_PHASE_EPC_counter$Num_EPC, names.arg = DF_TAG_PHASE_EPC_counter$Tag_PHASE, xlab = "Tag_PHASE", ylab = "Num_EPC", main="Histograma Num tags/PHASE:", width=40)
par(mar=c(10,10,10,10))
}
data.frame example:
DF_TAG_PHASE_EPC_counter
Tag_PHASE Num_EPC
1 123.0 1
2 75.0 1
3 78.0 1
4 81.0 2
5 84.0 1
6 87.0 1
7 90.0 2
8 98.0 1
Error:
Error in plot.new() : figure margins too large
Called from: barplot(DF_TAG_RSSI_EPC_counter$Num_EPC, names.arg = DF_TAG_RSSI_EPC_counter$Tag_RSSI,
xlab = "Tag_RSSI", ylab = "Num_EPC", main = "Histograma Num tags/RSSI:",
width = 10)
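One thing worth trying before switching libraries: in the function above, par(mar = ...) is called after barplot(), so it does not affect that plot and only carries over to the next one, and margins of 10 lines on every side can leave no room for the plot region on a small device. A minimal sketch, assuming the default margins are acceptable (the margin values are my choice, not from the question), sets them before plotting and restores the old settings on exit. The width argument is left out, since a single value has no visible effect unless xlim is also given.

```r
# Example data from the question
DF_TAG_PHASE_EPC_counter <- data.frame(
  Tag_PHASE = c(123, 75, 78, 81, 84, 87, 90, 98),
  Num_EPC   = c(1, 1, 1, 2, 1, 1, 2, 1)
)

HIST_EPC_list <- function(df) {
  # Set margins before plotting and restore the old settings on exit.
  # c(5, 4, 4, 2) + 0.1 is R's default; enlarge a side if labels get clipped.
  old <- par(mar = c(5, 4, 4, 2) + 0.1)
  on.exit(par(old))
  barplot(df$Num_EPC, names.arg = df$Tag_PHASE,
          xlab = "Tag_PHASE", ylab = "Num_EPC",
          main = "Histograma Num tags/PHASE:")
}

# barplot() returns the bar midpoints, so the function does too
mids <- HIST_EPC_list(DF_TAG_PHASE_EPC_counter)
```

If the error persists in an interactive session, the plotting pane or device window may simply be too small for the requested margins; enlarging it or writing to a file device such as png() usually helps.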
