Circular time plots in R with stacked rose

I have a data frame, imported from a semicolon-separated CSV file created in Excel, with the following values:
> dt <- read.csv(file = "teste1.csv", header = TRUE, sep = ";")
> dt
hour occur time tt
1 1 one 00:00:59 59
2 2 one 08:40:02 31202
3 3 one 07:09:59 25799
4 4 one 01:22:16 4936
5 5 one 01:30:28 5428
6 6 one 01:28:57 5337
7 7 one 19:05:34 68734
8 8 one 01:57:47 7067
9 9 one 00:13:17 797
10 10 one 12:14:48 44088
11 11 one 23:24:43 84283
12 12 one 13:23:14 48194
13 13 one 02:28:51 8931
14 14 one 14:21:24 51684
15 15 one 13:26:14 48374
16 16 one 00:27:24 1644
17 17 one 15:56:51 57411
18 18 one 11:07:50 40070
19 19 one 07:18:18 26298
20 20 one 07:33:13 27193
21 21 one 10:02:03 36123
22 22 one 11:30:32 41432
23 23 one 21:21:27 76887
24 24 one 00:49:18 2958
25 1 two 21:01:11 75671
26 2 two 11:00:40 39640
27 3 two 21:40:09 78009
28 4 two 01:05:37 3937
29 5 two 00:44:17 2657
30 6 two 12:43:21 45801
31 7 two 10:53:49 39229
32 8 two 08:29:09 30549
33 9 two 05:07:46 18466
34 10 two 17:32:37 63157
35 11 two 09:35:16 34516
36 12 two 03:04:19 11059
37 13 two 23:09:13 83353
38 14 two 01:15:49 4549
39 15 two 14:24:33 51873
40 16 two 01:12:53 4373
41 17 two 21:20:11 76811
42 18 two 02:25:21 8721
43 19 two 01:17:37 4657
44 20 two 15:07:50 54470
45 21 two 22:27:32 80852
46 22 two 01:41:07 6067
47 23 two 09:40:23 34823
48 24 two 05:31:17 19877
I want to create a circular time plot with stacked roses based on this data frame, i.e., the segments of each stacked rose are grouped by the column occur, and their size is given by the column time.
The column hour gives the x position of each rose.
I tried the following, but the result doesn't match what I want:
ggplot(dt, aes(x = hour, fill = occur)) +
  geom_histogram(breaks = seq(0, 24), width = 2, colour = "grey") +
  coord_polar(start = 0) +
  theme_minimal() +
  scale_fill_brewer() +
  scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0, 24))
What am I doing wrong? I want something like this: http://blog.odotech.com/Portals/57087/images/French%20landfill%20wind%20rose.png
I hope I've explained correctly. Thank you!

Not sure, but hope it helps:
Convert your time values to numeric seconds (I used the chron package, but there are numerous other ways, so you don't need this library; it just makes things more straightforward):
library(chron)
tm <- times(as.character(dt$time))   # chron "times" object built from "HH:MM:SS" strings
dt$tt <- hours(tm) * 3600 + minutes(tm) * 60 + seconds(tm)
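If you would rather not load chron, here is a base-R sketch of the same conversion: split each "HH:MM:SS" string on ":" and weight the three parts by 3600, 60 and 1. The helper name hms_to_seconds is hypothetical, and it assumes every value in the time column has exactly three colon-separated numeric fields.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight.
# Assumes every element has exactly three colon-separated numeric fields.
hms_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)    # list of c(H, M, S)
  m <- matrix(as.numeric(unlist(parts)), ncol = 3, byrow = TRUE)
  drop(m %*% c(3600, 60, 1))                                # H*3600 + M*60 + S
}

hms_to_seconds(c("00:00:59", "08:40:02", "23:24:43"))       # 59 31202 84283
```

With the question's data, all(hms_to_seconds(dt$time) == dt$tt) should be TRUE, since the tt column already holds the same seconds.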
And make a graph:
library(ggplot2)
p <- ggplot(dt, aes(x = hour, y = tt, fill = occur)) +
  geom_col(width = 1, colour = "grey") +    # one bar per hour, stacked by occur
  coord_polar(start = 0) +
  scale_x_continuous("", breaks = 1:24) +
  scale_fill_brewer() +
  theme_minimal()
p
Is that ok?
Some bars appear to have only one color, but that is a scaling issue: some times are close to 24 hours, while others are only a few seconds, so the smaller segment is barely visible.
You can try separate graphs using facets (it's worth tweaking the colors afterwards :)):
p + facet_grid(~occur) + theme(axis.title.y = element_blank(),
                               axis.text.y = element_blank())
The circular graph is good if you're comparing data by hour, but if you also want to compare differences in the occur variable, I think it's better to show them in old-fashioned bar graphs.
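Following that suggestion, a minimal sketch of the old-fashioned bar graph: the same hour/occur/tt columns drawn as side-by-side (dodged) bars, with tt converted to hours so both groups share a readable axis. The small dt_small data frame copies the first three hours of each group from the question purely so the snippet runs on its own; with the real data you would pass dt instead.

```r
library(ggplot2)

# First three hours of each group from the question (tt is in seconds)
dt_small <- data.frame(
  hour  = c(1, 2, 3, 1, 2, 3),
  occur = c("one", "one", "one", "two", "two", "two"),
  tt    = c(59, 31202, 25799, 75671, 39640, 78009)
)

# One bar per occur group within each hour, placed side by side; y in hours
p_bar <- ggplot(dt_small, aes(x = factor(hour), y = tt / 3600, fill = occur)) +
  geom_col(position = "dodge", colour = "grey") +
  labs(x = "Hour", y = "Time (hours)") +
  scale_fill_brewer() +
  theme_minimal()
p_bar
```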

Related

Barplot in R fill with certain values

I have a dataset containing data on pairs playing a game. I have a bar plot showing the total number of games played by each pair ('number'). Now I want each bar to be filled with the number of games the pair successfully completed ('sum'). I can't get it to work. The bar plot is created like this:
barplot(height = game_count$number, xlab = 'Pairs', ylim = c(0,35), ylab='Games played')
The data looks like this:
participants sum number
1 06104873220647518670 30 32
2 06105747340637377404 23 24
3 06113978630633565020 28 32
4 06121794480617858550 25 27
5 06122613960611857952 23 26
6 06123139380653583516 25 28
7 06123650620648276595 28 32
8 06124453210624910109 32 34
9 06127993700610846968 24 26
10 06128440030639764541 19 24
11 06132461300624244572 26 30
12 06137611390651588167 25 28
13 06145014400637290807 16 19
14 06163181050611257617 30 30
15 06172024240651919112 21 23
One option is ggplot2:
library(ggplot2)
# Share of games each pair completed successfully
game_count$Freq <- game_count$sum / game_count$number
# One bar per pair, height = completion rate
ggplot(game_count, aes(x = seq_len(nrow(game_count)), y = Freq)) +
  geom_col(fill = 'cyan3', color = 'black') +
  xlab('')
This worked for me (it draws games played and games completed as side-by-side bars for each pair):
barplot(t(game_count[c('number', 'sum')]), beside = TRUE, ylim = c(0, 35), col = c('black', 'green'), main = 'Games played and successfully completed by the pairs', xlab = 'Pairs', ylab = 'Games')
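If you want the look described in the question (each bar of games played partly filled by the games completed) rather than two bars side by side, one option is to draw the totals first and then overlay the completed counts using barplot's add = TRUE argument. A sketch using the sum and number values of the first five pairs from the question, without the participant IDs; the colors are arbitrary.

```r
# First five pairs from the question (sum = completed, number = played)
game_count <- data.frame(
  sum    = c(30, 23, 28, 25, 23),
  number = c(32, 24, 32, 27, 26)
)

# Full-height bars for games played ...
mids <- barplot(height = game_count$number, ylim = c(0, 35), col = "grey90",
                xlab = "Pairs", ylab = "Games",
                names.arg = seq_len(nrow(game_count)))
# ... then the completed games drawn over them on the same axes
barplot(height = game_count$sum, col = "green", add = TRUE, axes = FALSE)
legend("topright", legend = c("Played", "Completed"),
       fill = c("grey90", "green"), bty = "n")
```

Both calls use the default bar width and spacing, so the overlaid bars line up with the ones underneath.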

Making multi-line plots in R using ggplot2

I would like to compile some data into a ggplot() line plot of different colors.
It's rainfall in various places over 100 days, and the data is quite different between locations which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code thus far is
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything I want, but for some reason the colors are wrong when I plot it, i.e. it plots Location 1 in green even though I told it red in my first geom_line.
library(tidyr)
library(ggplot2)
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))
Data:
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header = TRUE, stringsAsFactors = FALSE)
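Why the colors in the question come out wrong: inside aes(), col = "red" does not set a color; it maps a constant category named "red". scale_color_manual() then pairs its unnamed values with those categories in sorted (alphabetical) order, which is blue, green, red, so the line labelled "red" (Location1) receives "green". The first lines below reproduce that pairing in base R; the plot afterwards sketches the usual fix of mapping a descriptive label inside aes() and giving scale_color_manual() a named vector. The small Rain data frame holds the first three rows of the sample data so the snippet runs on its own.

```r
library(ggplot2)

# What the question's code effectively does: the categories "red", "blue",
# "green" are sorted, then matched in order to the unnamed manual values.
categories <- sort(c("red", "blue", "green"))      # "blue" "green" "red"
pairing <- setNames(c("red", "blue", "green"), categories)
pairing["red"]                                     # Location1's line gets "green"

# Fix: map a descriptive label, then name the colors explicitly.
Rain <- data.frame(Time = 0:2,
                   Location1 = c(48, 51, 58),
                   Location2 = c(99.2966479761526, 98.7287820735946, 98.4803262236528),
                   Location3 = c(2, 4, 4.82842712474619))
p_rain <- ggplot(Rain) +
  geom_line(aes(x = Time, y = Location1, colour = "Location 1")) +
  geom_line(aes(x = Time, y = Location2, colour = "Location 2")) +
  geom_line(aes(x = Time, y = Location3, colour = "Location 3")) +
  scale_color_manual(values = c("Location 1" = "red",
                                "Location 2" = "blue",
                                "Location 3" = "green")) +
  labs(x = "Time (Days)", y = "Rainfall (Inches)", colour = "Locations")
p_rain
```

Naming the values ties each color to its label directly, so the result no longer depends on how the categories happen to sort.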

Function for generating multiple line charts for all variables in a dataframe for different groups

I have 106 weeks of data for 5 different LOBs (lines of business). The variables are Traffic, Spend, Clicks, etc. In total there are 106*5 = 530 rows.
Dataframe looks like:
LOB Week Traffic Spend Clicks
A 1 34 12 5
A 2 37 32 6
A 3 41 57 7
A 4 52 42 12
A 5 27 37 8
... 106 weeks
B...106 weeks
C...106 weeks
D...106 weeks
E 1 43 22 12
E 2 65 16 14
E 3 76 18 9
E 4 25 14 11
E 5 53 15 15
... 106 weeks
I want to generate a line chart for Traffic with all 5 LOBs on the same chart, and similarly for the other metrics. I have written some code for this, but it is not doing what I want.
Code:
for ( i in seq(1,length( data),1) ) plot(data[,i],ylab=names(data[i]),type="l", col = "red", xlab = "Week", main = "")
Please suggest how this can be done.
You can use ggplot2:
ggplot(data, aes(x = Week, y = Traffic, color = LOB)) +
geom_line()
Please try to post a small reproducible example of your data so others can run the code.
Edit: as suggested by @Axeman, you may want to plot all metrics together. Here is his solution for visibility:
library(tidyr)
d <- gather(data, metric, value, -Week, -LOB)
ggplot(d, aes(Week, value, color = LOB)) +
geom_line() +
facet_wrap(~metric, scales = 'free_y')
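The question also asked for a function. If you would rather stay in base graphics, one sketch is to reshape a single metric into a Week-by-LOB matrix with tapply() and draw every LOB as a line with matplot(). The function name plot_metric is hypothetical, and it assumes one row per LOB and week (tapply would sum any duplicates). The toy data uses the first three weeks of LOBs A and E from the question.

```r
# Hypothetical helper: one line chart per metric, one line per LOB.
# Assumes one row per LOB/Week combination (duplicates would be summed).
plot_metric <- function(data, metric) {
  wide <- tapply(data[[metric]], list(data$Week, data$LOB), sum)  # Week x LOB
  cols <- seq_len(ncol(wide))
  matplot(as.numeric(rownames(wide)), wide, type = "l", lty = 1, col = cols,
          xlab = "Week", ylab = metric, main = metric)
  legend("topleft", legend = colnames(wide), col = cols, lty = 1, bty = "n")
  invisible(wide)
}

# Toy data: first three weeks of LOB A and E from the question
data <- data.frame(
  LOB     = rep(c("A", "E"), each = 3),
  Week    = rep(1:3, times = 2),
  Traffic = c(34, 37, 41, 43, 65, 76),
  Spend   = c(12, 32, 57, 22, 16, 18),
  Clicks  = c(5, 6, 7, 12, 14, 9)
)

# One chart per metric, with every LOB on the same chart
for (m in c("Traffic", "Spend", "Clicks")) plot_metric(data, m)
```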

geom_bar labeling for melted data / stacked barplot

I have a problem drawing a stacked bar plot with ggplot. My data looks like this:
timeInterval TotalWilling TotalAccepted SimID
1 16 12 Sim1
1 23 23 Sim2
1 63 60 Sim3
1 69 60 Sim4
1 61 60 Sim5
1 60 54 Sim6
2 16 8 Sim1
2 23 21 Sim2
2 63 52 Sim3
2 69 64 Sim4
2 61 45 Sim5
2 60 32 Sim6
3 16 14 Sim1
3 23 11 Sim2
3 63 59 Sim3
3 69 69 Sim4
3 61 28 Sim5
3 60 36 Sim6
I would like to draw a stacked bar plot for each SimID over timeInterval, with Willing and Accepted stacked. I achieved the bar plot with the following simple code:
library(reshape2)
library(ggplot2)
dat <- read.csv("myDat.csv")
meltedDat <- melt(dat, id.vars = c("SimID", "timeInterval"))
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
  geom_bar(stat = "identity", position = "stack")
I get the following graph:
My problem is that I would like to put percentages on each stack: for the Willing part, Willing/(Willing+Accepted), and for the Accepted part, Accepted/(Accepted+Willing), so that I can see what percentage is willing and what percentage is accepted, e.g. 45 on the red part of a stack and 55 on the blue part. I cannot seem to achieve this kind of labeling.
Any hint is appreciated.
Adapted from "Showing data values on stacked bar chart in ggplot2". Each label is the part's share of its own stack, and position_stack(vjust = 0.5) centres each label in its segment (this needs ggplot2 2.2.0 or later):
library(reshape2)
library(ggplot2)
meltedDat <- melt(dat, id.vars = c("SimID", "timeInterval"))
meltedDat$normvalue <- meltedDat$value / ave(meltedDat$value, meltedDat$timeInterval, meltedDat$SimID, FUN = sum)
meltedDat$valuestr <- sprintf("%.0f%%", meltedDat$normvalue * 100)
ggplot(meltedDat, aes(timeInterval, value, fill = variable)) + facet_wrap(~ SimID) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(label = valuestr), position = position_stack(vjust = 0.5), size = 2)
Also, it looks like you may have some of your variables coded as factors.
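A base-R sketch of the percentage the question asks for, Willing/(Willing+Accepted) and Accepted/(Accepted+Willing), computed per stack with ave(); it is handy for checking the labels before plotting. It uses only the Sim1 rows from the question and assumes the long format that melt() produces (one value per variable, timeInterval and SimID).

```r
# Sim1 rows from the question, already in long format (as melt() returns)
long <- data.frame(
  timeInterval = rep(1:3, times = 2),
  SimID        = "Sim1",
  variable     = rep(c("TotalWilling", "TotalAccepted"), each = 3),
  value        = c(16, 16, 16, 12, 8, 14)
)

# Share of each part within its own stack (same timeInterval and SimID)
stack_total <- ave(long$value, long$timeInterval, long$SimID, FUN = sum)
long$pct <- 100 * long$value / stack_total
long$label <- sprintf("%.0f%%", long$pct)
long
```

For timeInterval 1 this gives 16/28 and 12/28, i.e. the labels "57%" and "43%", and the two parts of every stack add up to 100.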

R hist vs geom_histogram break points

I am using both geom_histogram (ggplot2) and hist (base R) with the same breakpoints, but I get different graphs. I did a quick search; does anyone know how the breaks are defined and why there would be a difference?
These produce two different plots.
library(ggplot2)
set.seed(25)
data <- data.frame(Mos=rnorm(500, mean = 25, sd = 8))
data$Mos<-round(data$Mos)
pAge <- ggplot(data, aes(x=Mos))
pAge + geom_histogram(breaks=seq(0, 50, by = 2))
hist(data$Mos,breaks=seq(0, 50, by = 2))
Thanks
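To get the same histogram in ggplot2, pass the same bin edges to geom_histogram (its breaks argument) and, if you want the axis ticks on the bin edges, give the same breaks to scale_x_continuous; scale_x_continuous(breaks = ...) on its own only moves the tick marks and does not change the bins.
Additionally, hist and ggplot2 histograms can use different rules to close the intervals:
hist: right-closed (left-open) intervals. Default: right = TRUE
stat_bin (ggplot2): left-closed (right-open) intervals in the ggplot2 version this answer was written for (argument right, default FALSE). Newer ggplot2 releases replace right with closed, which defaults to "right", so check which version you have.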
   hist interval  Freq  ggplot2 interval  Freq
1 (0,2] 0 [0,2) 0
2 (2,4] 2 [2,4) 2
3 (4,6] 2 [4,6) 1
4 (6,8] 1 [6,8) 2
5 (8,10] 6 [8,10) 2
6 (10,12] 9 [10,12) 7
7 (12,14] 24 [12,14) 17
8 (14,16] 27 [14,16) 26
9 (16,18] 39 [16,18) 31
10 (18,20] 48 [18,20) 46
11 (20,22] 52 [20,22) 43
12 (22,24] 38 [22,24) 57
13 (24,26] 44 [24,26) 36
14 (26,28] 46 [26,28) 52
15 (28,30] 39 [28,30) 39
16 (30,32] 31 [30,32) 33
17 (32,34] 30 [32,34) 26
18 (34,36] 24 [34,36) 29
19 (36,38] 18 [36,38) 27
20 (38,40] 9 [38,40) 12
21 (40,42] 5 [40,42) 6
22 (42,44] 4 [42,44) 0
23 (44,46] 1 [44,46) 5
24 (46,48] 1 [46,48) 0
25 (48,50] 0 [48,50) 1
I included right = FALSE in hist and closed = "left" in the ggplot2 layers so both use left-closed (right-open) intervals, and I passed the same breaks to both. I added count labels to both plots, so it is easier to check that the intervals are the same (after_stat() needs ggplot2 3.3.0 or later; older versions used ..count..).
ggplot(data, aes(x = Mos)) +
  geom_histogram(breaks = seq(0, 50, by = 2), closed = "left", colour = "black", fill = "white") +
  scale_x_continuous(breaks = seq(0, 50, by = 2)) +
  stat_bin(breaks = seq(0, 50, by = 2), closed = "left", aes(label = after_stat(count)),
           vjust = -0.5, geom = "text")
hist(data$Mos, breaks = seq(0, 50, by = 2), labels = TRUE, right = FALSE)
To check the frequencies in each bin:
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
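Why the closing rule matters so much here: the question rounds Mos to whole numbers, so many values land exactly on the even break points, and whether a value such as 4 is counted in the bin ending at 4 or the bin starting at 4 depends entirely on right. A tiny base-R illustration with made-up values:

```r
x <- c(2, 3, 4, 4, 5)       # made-up integer data, several values sit on breaks
brks <- c(0, 2, 4, 6)

# Right-closed bins (hist default): (0,2] (2,4] (4,6]
right_closed <- hist(x, breaks = brks, plot = FALSE)$counts                 # 1 3 1
# Left-closed bins, as with right = FALSE: [0,2) [2,4) [4,6]
left_closed  <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts  # 0 2 3
```

The same five values give different counts in every bin, which is the kind of shift visible between the two columns of the table above.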
