sunflowerplot legend symbol with the correct number of segments - r

I would like to create a good legend for a sunflowerplot in R, with symbols that show the correct number of "petals" for multiple points.
The "petals" are drawn inside the sunflowerplot function as segments at different angles.
It is possible to recreate such a figure from individual segments, but is it possible to insert a self-made symbol in a legend? Or has anyone found a way to create legend symbols for the different petal counts drawn by the sunflowerplot function?
dat <- structure(c(0, 0, 0, 0.074, 0.074, 0.074, 0.22, 0.22, 0.22, 0.66,
0.66, 0.66, 18, 19, 19, 19, 19, 18, 16, 16, 18, 3, 3, 3), .Dim = c(12L,
2L))
sunflowerplot(dat[, 1], dat[, 2])
legend("right", c("1 rep", "4 rep", "8 rep"), pch = c(16, 3, 8))
I know it is possible to use pch 4 and 8 in the legend, but I am not really happy with this method.

OK, my solution is not to use the sunflowerplot function, but to write my own code inspired by the sunflowerplot source.
Step 1: count the number of points at each coordinate
tt <- xyTable(cbind(dat[,1], dat[,2]))
Step 2: plot the tt object, scaling each point by its count
plot(tt$x, tt$y, cex = tt$number, pch = 16)
Step 3: add the legend only for the counts that actually occur
legend("topright", pt.cex = sort(unique(tt$number)),
pch = rep(16, length(unique(tt$number))),
legend = paste(sort(unique(tt$number)), "replic.", sep = " "), bty = "n")
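On the original question of putting a self-made symbol in a legend: one possible approach is to let legend() lay out the text and then draw the petals by hand with segments(). The sketch below is only an illustration under assumptions: draw_sunflower is a hypothetical helper, not part of base R, and the symbol offset assumes the default x.intersp = 1 and cex = 1 of legend(), so it may need adjusting.

```r
dat <- cbind(c(0, 0, 0, 0.074, 0.074, 0.074, 0.22, 0.22, 0.22, 0.66, 0.66, 0.66),
             c(18, 19, 19, 19, 19, 18, 16, 16, 18, 3, 3, 3))
counts <- sort(unique(xyTable(dat)$number))  # replicate counts present in the data

sunflowerplot(dat[, 1], dat[, 2])

# Hypothetical helper: a central dot plus n petals (for n > 1), sized in inches
draw_sunflower <- function(x, y, n, size = 0.08) {
  points(x, y, pch = 16, cex = 0.8)
  if (n > 1) {
    ang <- 2 * pi * seq_len(n) / n
    segments(x, y, x + xinch(size) * sin(ang), y + yinch(size) * cos(ang),
             col = "red")
  }
}

# pch = NA draws no symbol; legend() invisibly returns the text positions
lg <- legend("right", legend = paste(counts, "rep"), pch = NA, y.intersp = 1.5)

# Assumption: the symbol slot sits one character width to the left of the text
sym_x <- lg$text$x - xinch(par("cin")[1])
for (i in seq_along(counts)) draw_sunflower(sym_x[i], lg$text$y[i], counts[i])
```

The petals are sized in inches through xinch() and yinch() so that they stay round regardless of the axis scales.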

Related

How to plot igraph such that each vertex is at a specified coordinates, that resembles football player positions?

I have a dataframe with 3 columns, example like this (purely hypothetical):
id <- c("Muller", "Muller", "Ter Stegen", "Musiala", "Musiala", "Musiala", "Pavard")
tid <- c("Davies", "De Ligt", "Muller", "Kimmich", "Pavard", "Lewandowski", "De Ligt")
Passes <- c(14, 5, 1, 10, 23, 4, 1)
Passes <- data.frame(id, tid, Passes)
dput(Passes)
I would like to plot this so that the vertices appear at specific coordinates in the output graph.
So far my code looks like this:
g <- graph.data.frame(Passes, directed = TRUE)
set_edge_attr(g, "weight", value= E(g)$Passes)
coords <- data.frame(id = c("Ter Stegen", "Musiala", "Davies", "Kimmich", 'De Ligt', "Lewandowski", "Muller", "Pavard"),
x= c(0.5, 1, 1, 1, 2, 3, 3, 3.5),
y= c(1, 1.8, 1.4, 1, 0.6, 1.8, 1.6, 1.2))
plot(g, vertex.size= 2, edge.arrow.size = 0.3, vertex.label.cex = 0.8,
edge.curved=.2, asp = 0, vertex.label.dist=0.7,
layout=coords, xlim = c(0, 4), ylim = c(0, 2))
But I keep getting an error like 'Error in norm_coords(layout, -1, 1, -1, 1) : `layout' not a matrix'.
Does anyone know what is wrong with my code, or can you propose a better method? Thank you! My actual data frame has 32 unique ids and 252 rows, so I want to find an efficient way to give each unique id a position.
Thanks,
Emmy
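Try this. The layout has to be a numeric matrix whose rows are in the same order as the graph's vertices, so sort coords by the vertex names and keep only x and y:
library(tidyverse)
new.coords <- coords %>% arrange(factor(id, levels = V(g)$name)) %>% select(x,y) %>% as.matrix()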
plot(g, vertex.size= 2, edge.arrow.size = 0.3, vertex.label.cex = 0.8,
edge.curved=.2, asp = 0, vertex.label.dist=0.7,
layout = new.coords)
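The same reordering can be sketched without the tidyverse by using match() on the vertex names. This is a minimal sketch on the example data from the question; layout_mat is just an illustrative name, and graph_from_data_frame is the current igraph name for graph.data.frame.

```r
library(igraph)

Passes <- data.frame(
  id  = c("Muller", "Muller", "Ter Stegen", "Musiala", "Musiala", "Musiala", "Pavard"),
  tid = c("Davies", "De Ligt", "Muller", "Kimmich", "Pavard", "Lewandowski", "De Ligt"),
  Passes = c(14, 5, 1, 10, 23, 4, 1)
)
g <- graph_from_data_frame(Passes, directed = TRUE)

coords <- data.frame(
  id = c("Ter Stegen", "Musiala", "Davies", "Kimmich", "De Ligt",
         "Lewandowski", "Muller", "Pavard"),
  x = c(0.5, 1, 1, 1, 2, 3, 3, 3.5),
  y = c(1, 1.8, 1.4, 1, 0.6, 1.8, 1.6, 1.2)
)

# Reorder the coordinate rows to follow the vertex order, then drop the id column
layout_mat <- as.matrix(coords[match(V(g)$name, coords$id), c("x", "y")])

plot(g, layout = layout_mat, vertex.size = 2, edge.arrow.size = 0.3,
     vertex.label.cex = 0.8, edge.curved = 0.2, vertex.label.dist = 0.7)
```

With 32 ids the only manual work left is the coords table itself; an id missing from it shows up as an NA row in layout_mat, which is easy to check with anyNA().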

Place elements from vector on histogram bins (R ggplot)

I have a ggplot histogram, showing two histograms of a continuous variable, one for each level of a group.
Through use of ggplot_build, I now also have vectors where each element is the proportional count of one group (1) versus the other (0), per bin.
So for the following histogram built with
ggplot(data,aes(x=nonfordist)) + geom_histogram(aes(fill=presence),
position="identity",alpha=0.5,bins=30)+ coord_cartesian(xlim=c(NA,1750))
I have the following list, showing sequential proportions of group1/group0 per bin
list(0.398927744608261, 0.35358629130967, 0.275296034083078,
0.247361252979231, 0.260224274406332, 0.22107969151671, 0.252847380410023,
0.230055658627087, 0.212244897959184, 0.242105263157895,
0.235294117647059, 0.115384615384615, 0.2, 0.421052631578947,
0.4375, 0.230769230769231, 0.222222222222222, 0.5, 0, 0,
0, NaN, 1, 1, 0, 0, NaN, NaN, NaN, Inf)
What I want now is to plot the elements of this list on the corresponding bins, preferably above the bars showing the counts for group1.
I do not want to include the proportions for bins that fall outside of the histogram due to my xlim command.
You could use stat_bin with a text geom, using the same breaks as you do for your histogram. We don't have your actual data, so I've tried to approximate it here (see footnote for reproducible data). You haven't told us what your list of proportions is called, so I have named it props in this example.
ggplot(data,aes(x=nonfordist)) +
geom_histogram(aes(fill = presence),
breaks = seq(-82.5, by = 165, length = 11),
position = "identity", alpha = 0.5, bins = 30) +
stat_bin(data = data[data$presence == 1, ], geom = "text",
breaks = seq(-82.5, by = 165, length = 11),
label = round(unlist(props)[1:10], 2), vjust = -0.5) +
coord_cartesian(xlim = c(NA, 1750))
Approximation of data
data <- data.frame(
nonfordist = rep(165 * c(0:10, 0:10),
c(24800, 20200, 16000, 6000, 2800, 1300, 700, 450, 100,
50, 30, 9950, 7400, 4500, 600, 300, 150, 80, 50, 30, 20,
10)),
presence = factor(rep(c(0, 1), c(72430, 23090))))
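If the proportions ever need to be recomputed outside ggplot_build, the per-bin ratio can also be obtained in base R by cutting the variable at the same breaks as the histogram. A minimal sketch on made-up toy data (the values and the breaks below are hypothetical, not the asker's data):

```r
# Toy data: five observations in group 0, two in group 1
toy <- data.frame(
  nonfordist = c(10, 20, 30, 200, 210, 15, 205),
  presence   = factor(c(0, 0, 0, 0, 0, 1, 1))
)
breaks <- c(0, 165, 330)  # two bins of width 165

# Assign each observation to a bin, then count per bin and group
bin <- cut(toy$nonfordist, breaks = breaks, include.lowest = TRUE)
tab <- table(bin, toy$presence)

# Ratio of group 1 to group 0 in each bin
props <- tab[, "1"] / tab[, "0"]
round(props, 2)
```

A bin with no observations in group 0 yields NaN or Inf, which is consistent with the NaN and Inf entries in the list from the question.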

parameters of histogram with R

First, I would like the x-axis to show decimal numbers (for example 1.5, 2.6, ...). When I draw the histogram with my code, the x-axis automatically shows whole numbers, as you can see in the following picture (I have circled in red what I would like to change):
How can I change the parameters so that these whole numbers are shown as decimals?
Secondly, I would like the numbers that appear on the x-axis to correspond exactly to my breaks vector.
Could someone please help me?
Here is my code:
my_data <- transform(my_data, new = as.numeric(new/1000000))
sal_hist_default = hist(my_data$new, breaks = c(1,6.3,11.6,16.9,22.2,27.5), col = "blue", border = "black", las = 1, include.lowest=TRUE,right=FALSE, main="Salary Of best category", xlab = "salaries", ylab = "num of players",xlim = c(1,27.5), ylim = c(0,600))
You should really provide sample data, but try this:
set.seed(42)
new <- rnorm(1000, 14, 3.5)
my_data <- data.frame(new)
sal_hist_default = hist(my_data$new, breaks = c(1, 6.3, 11.6, 16.9, 22.2, 27.5), col = "blue",
border = "black", las = 1, include.lowest=TRUE,right=FALSE, main="Salary Of best category",
xlab = "salaries", ylab = "num of players",xlim = c(1,27.5), ylim = c(0,600), xaxt="n")
axis(1, c(1, 6.3, 11.6, 16.9, 22.2, 27.5), c(1, 6.3, 11.6, 16.9, 22.2, 27.5))
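A slightly tidier variant stores the breaks once and formats the tick labels explicitly, so that every label shows one decimal place (including 1.0). This is only a sketch on simulated data; salaries, brks and labs are illustrative names.

```r
set.seed(1)
salaries <- runif(500, min = 1, max = 27.5)  # hypothetical salaries in millions
brks <- c(1, 6.3, 11.6, 16.9, 22.2, 27.5)

# Suppress the default x-axis so a custom one can be drawn
h <- hist(salaries, breaks = brks, right = FALSE, include.lowest = TRUE,
          xaxt = "n", col = "blue", border = "black", las = 1,
          main = "Salary of best category",
          xlab = "salaries", ylab = "num of players")

# One tick per break, labelled with exactly one decimal place
labs <- sprintf("%.1f", brks)
axis(1, at = brks, labels = labs)
```

Reusing brks for both breaks and at keeps the ticks and the bin edges in sync if the breaks change later.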

How to automate positioning of inner labels within a stacked barplot?

I frequently have to produce stacked bar plots with labels. The way I've been coding the labels is very time intensive and I wondered if there was a way to code things more efficiently. I would like the labels to be centered on each section of the bars. I'd prefer base R solutions.
stemdata <- structure(list( #had to round some nums below for 100% bar
A = c(7, 17, 76),
B = c(14, 10, 76),
C = c( 14, 17, 69),
D = c( 4, 10, 86),
E = c( 7, 17, 76),
F = c(4, 10, 86)),
.Names = c("Food, travel, accommodations, and procedures",
"Travel itinerary and dates",
"Location of the STEM Tour stops",
"Interactions with presenters/guides",
"Duration of each STEM Tour stop",
"Overall quality of the STEM Tour"
),
class = "data.frame",
row.names = c(NA, -3L)) #4L=number of numbers in each letter vector#
# attach(stemdata)
print(stemdata)
par(mar=c(0, 19, 1, 2.1)) # this sets margins to allow long labels
barplot(as.matrix(stemdata),
beside = F, ylim = range(0, 10), xlim = range(0, 100),
horiz = T, col=colors, main="N=29",
border=F, las=1, xaxt='n', width = 1.03)
text(7, 2, "14%")
text(19, 2, "10%")
text(62, 2, "76%")
text(7, 3.2, "14%")
text(22.5, 3.2, "17%")
text(65.5, 3.2, "69%")
text(8, 4.4, "10%")
text(55, 4.4, "86%")
text(3.5, 5.6, "7%")
text(15, 5.6, "17%")
text(62, 5.6, "76%")
text(9, 6.9, "10%")
text(55, 6.9, "86%")
Staying base R as OP requested, we can easily automate the inner label positioning (i.e. x coordinates) within a small function.
xFun <- function(x) x/2 + c(0, cumsum(x)[-length(x)])
Now, it's good to know that barplot invisibly returns the y coordinates of the bar midpoints; we can catch them by assignment (here byc <- barplot(.)).
Finally, just assemble coordinates and labels in a data frame labs and "loop" through the text calls with sapply. (Use col="white" or col=0 for white labels, as wished in the other question.)
# barplot
colors <- c("gold", "orange", "red")
par(mar=c(2, 19, 4, 2) + 0.1) # expand margins
byc <- barplot(as.matrix(stemdata), horiz=TRUE, col=colors, main="N=29", # assign `byc`
border=FALSE, las=1, xaxt='n')
# labels
labs <- data.frame(x=as.vector(sapply(stemdata, xFun)), # apply `xFun` here
y=rep(byc, each=nrow(stemdata)), # use `byc` here
labels=as.vector(apply(stemdata, 1:2, paste0, "%")),
stringsAsFactors=FALSE)
invisible(sapply(seq(nrow(labs)), function(x) # `invisible` prevents unneeded console output
text(x=labs[x, 1:2], labels=labs[x, 3], cex=.9, font=2, col=0)))
# legend (set `xpd=TRUE` to plot beyond margins!)
legend(-55, 8.5, legend=c("Medium","High", "Very High"), col=colors, pch=15, xpd=TRUE)
par(mar=c(5, 4, 4, 2) + 0.1) # finally better reset par to default
Result
Data
stemdata <- structure(list(`Food, travel, accommodations, and procedures` = c(7,
17, 76), `Travel itinerary and dates` = c(14, 10, 76), `Location of the STEM Tour stops` = c(14,
17, 69), `Interactions with presenters/guides` = c(4, 10, 86),
`Duration of each STEM Tour stop` = c(7, 17, 76), `Overall quality of the STEM Tour` = c(4,
10, 86)), class = "data.frame", row.names = c(NA, -3L))
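To see what xFun from the base R answer computes, it can be applied to a single bar. Each label position is the start of its segment (the cumulative sum of the previous segments) plus half of the segment's own width. The names seg, starts and mids are only for illustration.

```r
xFun <- function(x) x / 2 + c(0, cumsum(x)[-length(x)])

seg <- c(7, 17, 76)                        # one column of stemdata
starts <- c(0, cumsum(seg)[-length(seg)])  # where each segment begins: 0, 7, 24
mids <- xFun(seg)                          # segment centres: 3.5, 15.5, 62
mids
```

These values are close to the hand-tuned positions used in the question for the same bar (3.5, 15 and 62).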
Would you consider a tidyverse solution?
library(tidyverse) # for dplyr, tidyr, tibble & ggplot2
stemdata %>%
rownames_to_column(var = "id") %>%
gather(Var, Val, -id) %>%
group_by(Var) %>%
mutate(id = factor(id, levels = 3:1)) %>%
ggplot(aes(Var, Val)) +
geom_col(aes(fill = id)) +
coord_flip() +
geom_text(aes(label = paste0(Val, "%")),
position = position_stack(0.5))
Result:

Repeating x axis R

I have data with the amount of radiation at a specific time (hours, minutes) for three days in a row. I want to plot this so that the x-axis runs from 0 to 24 three times, i.e. the x-axis repeats itself, with the amount of radiation on the y-axis. I have tried the following script without any success.
plot(gegevens[,1],gegevens[,2],type='l',col='red',xaxt='n',yaxt='n',xlab='',ylab='')
axis(1, at=(0:74),labels = rep.int(0:24,3), las=2)
mtext('Zonnetijd (u)', side=1,line=3)
The dataset was too big, so I've selected the first two hours from 2 days. The first column is the time and the second is the radiation. The data then look as follows:
structure(c(0, 0.083333333333333, 0.166666666666667, 0.25, 0.333333333333333,
0.416666666666667, 0.5, 0.583333333333333, 0.666666666666667,
0.75, 0.833333333333333, 0.916666666666667, 1, 1.08333333333333,
1.16666666666667, 1.25, 1.33333333333333, 1.41666666666667, 1.5,
1.58333333333333, 1.66666666666667, 1.75, 1.83333333333333, 1.91666666666667,
0.0158590638878904, 0.0991923972212234, 0.182525730554557, 0.26585906388789,
0.349192397221223, 0.432525730554557, 0.51585906388789, 0.599192397221223,
0.682525730554557, 0.76585906388789, 0.849192397221223, 0.932525730554557,
1.01585906388789, 1.09919239722122, 1.18252573055456, 1.26585906388789,
1.34919239722122, 1.43252573055456, 1.51585906388789, 1.59919239722122,
1.68252573055456, 1.76585906388789, 1.84919239722122, 1.93252573055456,
0.066, 0.066, 0.068, 0.068, 0.068, 0.066, 0.066, 0.066, 0.066,
0.066, 0.066, 0.066, 0.057, 0, 0, 0, -0.002, 0, 0, -0.002, 0,
-0.002, -0.009, -0.011, 0, -0.002, 0, -0.002, 0, -0.002, 0, 0.002,
0, 0, 0, 0, -0.002, -0.002, -0.007, 0, -0.002, 0, 0, 0, -0.002,
-0.002, -0.002, 0), .Dim = c(48L, 2L), .Dimnames = list(NULL,
c("t", "z")))
I think you would be better off to move towards a date/time class for your axis. Then you can have more control on what to plot etc. Below is an example:
# create example data
df <- data.frame(
T = seq.POSIXt(as.POSIXct("2000-01-01 00:00:00"),
by = "hours", length.out = 24*3)
)
df
df$St <- cumsum(rnorm(24*3))
# plot
png("test.png", width = 8, height = 4, units = "in", res = 200)
op <- par(mar = c(4,4,1,1), ps = 8)
plot(St ~ T, df, type="l",col='red',xaxt='n',yaxt='n',xlab='',ylab='')
axis(1, at=df$T, labels = format(df$T, "%H"), las=2)
mtext('Zonnetijd (u)', side=1,line=3)
par(op)
dev.off()
You can see that you may have some space issues with the labels when you plot every one.
Here is another example with 3-hour increment labels:
# alt plot
AT <- seq(min(df$T), max(df$T), by = "3 hour") # 3 hour increments
LAB <- format(AT, "%H")
png("test2.png", width = 8, height = 4, units = "in", res = 200)
op <- par(mar = c(4,4,1,1), ps = 8)
plot(St ~ T, df, type="l",col='red', xlab='', ylab='', xaxt='n')
axis(1, at = AT, labels = LAB, las=2)
mtext('Zonnetijd (u)', side=2, line=3)
mtext('hour', side=1, line=3)
par(op)
dev.off()
Marc has good advice about using a datetime class. Overall, that is a good way to go. See this question for examples of converting decimal times in hours to POSIX datetime class.
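As a rough sketch of such a conversion, decimal hours can be turned into POSIXct values by adding seconds to an origin. The origin date, the UTC time zone and the day index below are assumptions made for illustration; the real data would need its own start date.

```r
hours <- c(0, 1.5, 0.25)  # decimal hours within a day
day   <- c(0, 0, 1)       # hypothetical day index (0 = first day)

origin <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# POSIXct arithmetic is in seconds: 86400 per day, 3600 per hour
stamp <- origin + day * 86400 + hours * 3600
format(stamp, "%Y-%m-%d %H:%M")
```

The resulting vector can be passed straight to plot() and axis() in the same way as df$T in Marc's example.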
If you want to continue with your numeric data, we need the data itself to indicate which day each observation occurs on. Here we create a new column identical to the first, but adding 24 every time the first column has a negative difference between successive rows:
gegevens = cbind(gegevens, gegevens[, 1] + 24 * c(0, cumsum(diff(gegevens[, 1]) < 0)))
Now when we plot using our new column, the hours are correctly spaced by day:
plot(gegevens[, 3], gegevens[, 2], type = 'l', col = 'red', xaxt = 'n', yaxt = 'n', xlab = '', ylab = '')
You have some axis issues as well. There is no hour 24; we usually call this hour 0. And 24 * 3 = 72, not 74, so our maximum hour (starting at 0) is 71:
axis(1, at= 0:71, labels = rep.int(0:23,3), las = 2)
Here is the resulting plot on your sample data. It should "work" on your full data, but I agree with Marc that it is probably too many labels. Using a POSIXct date-time format is the best way to flexibly make adjustments.
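The day-offset step can be checked on a small hypothetical time vector that wraps past midnight twice. The names t_hours, day_index and t_cont are only for illustration.

```r
t_hours <- c(22, 23, 0, 1, 23, 0)  # hour of day, wrapping around twice

# A negative difference marks the start of a new day
day_index <- c(0, cumsum(diff(t_hours) < 0))  # 0 0 1 1 1 2

# Shift each observation by 24 hours per completed day
t_cont <- t_hours + 24 * day_index            # 22 23 24 25 47 48
t_cont
```

Note that this relies on every day actually wrapping from a later hour to an earlier one; a gap spanning a whole missing day would not be detected.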
