Plotting a dot for every n observations - r

I want to achieve the following plot type using ggplot:
using the following data:
t <- read.table(header=T, row.names=NULL,
colClasses=c(rep("factor",3),"numeric"), text=
"week team level n.persons
1 A 1 50
1 A 2 20
1 A 3 30
1 B 1 50
1 B 2 20
2 A 2 20
2 A 3 40
2 A 4 20
2 B 3 30
2 B 4 20")
so far, by applying this transformation
t0 <- t[ rep(1:nrow(t), t$n.persons %/% 10 ) , ]
and plotting
ggplot(t0) + aes(x=week, y=level, fill=team) +
geom_dotplot(binaxis="y", stackdir="center",
position=position_dodge(width=0.2))
I could generate
A: How can I make the dots of different teams dodge each other vertically so that they do not overlap?
B: Is it possible that the whole pack of dots is always centered, i.e.
no dodging occurs if there are only dots of one team in one place?

The following code stops the overlap:
t0 <- t[ rep(1:nrow(t), t$n.persons %/% 10 ) , ]
t0$level <- as.numeric(as.character(t0$level)) # Convert level (mapped to the y-axis) from factor to numeric
t0$level <- ifelse(t0$team == "B", (t0$level+.1), t0$level) # Add .1 to the y-axis position if the team is 'B'
ggplot(t0) + aes(x=week, y=level, fill=team) + geom_dotplot(binaxis="y", stackdir="center",
position=position_dodge(width=0.2))
Here is the output:
You could also subtract a value to move the dots downwards if you would prefer that.
If you want the line exactly between the dots this code should do it:
t0$level <- ifelse(t0$team == "B", (t0$level+.06), t0$level)
t0$level <- ifelse(t0$team == "A", (t0$level-.06), t0$level)
Output:
I'm not sure off the top of my head how to skip the above ifelse when there is only one team at a given coordinate. I'd imagine you'd need to do a count of unique team labels at each coordinate and only if that count was > 1 then run the code above.
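The counting idea described above could be sketched as follows. This is only one possible approach, not a tested answer to part B: the names coord, n_teams, shift and level_dodged are introduced here for illustration, and the 0.06 offset is carried over from the code above.

```r
# Rebuild the question's data and expand it to one row per 10 persons
t <- read.table(header = TRUE, colClasses = c(rep("factor", 3), "numeric"), text =
"week team level n.persons
1 A 1 50
1 A 2 20
1 A 3 30
1 B 1 50
1 B 2 20
2 A 2 20
2 A 3 40
2 A 4 20
2 B 3 30
2 B 4 20")
t0 <- t[rep(seq_len(nrow(t)), t$n.persons %/% 10), ]
t0$level <- as.numeric(as.character(t0$level))

# Count the distinct teams present at each (week, level) coordinate
coord <- paste(t0$week, t0$level)
n_teams <- ave(as.integer(t0$team), coord, FUN = function(x) length(unique(x)))

# Shift team A down and team B up, but only where both teams share a coordinate
shift <- ifelse(t0$team == "B", 0.06, -0.06)
t0$level_dodged <- ifelse(n_teams > 1, t0$level + shift, t0$level)

# Build the plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(t0) + aes(x = week, y = level_dodged, fill = team) +
    geom_dotplot(binaxis = "y", stackdir = "center",
                 position = position_dodge(width = 0.2))
}
```

Coordinates with a single team (for example week 1, level 3) keep their original level, so only the shared coordinates are pushed apart.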

Related

R: "Animate" Points on a Scatter Plot

I am working with R. Suppose I have the following data frame:
my_data <- data.frame(
"col" = c("red","red","red","red","red","blue","blue","blue","blue","blue","green", "green", "green", "green","green"),
"x_cor" = c(1,2,5,6,7,4,9,1,0,1,4,4,7,8,2),
"y_cor" = c(2,3,4,5,9,5,8,1,3,9,11,5,7,9,1),
"frame_number" = c(1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5)
)
my_data$col = as.factor(my_data$col)
head(my_data)
col x_cor y_cor frame_number
1 red 1 2 1
2 red 2 3 2
3 red 5 4 3
4 red 6 5 4
5 red 7 9 5
6 blue 4 5 1
In R, is it possible to create a (two-dimensional) graph that will "animate" each colored point to a new position based on the "frame number"?
For example:
I started following the instructions from this website here: https://www.datanovia.com/en/blog/gganimate-how-to-create-plots-with-beautiful-animation-in-r/
First, I made a static graph:
library(ggplot2)
library(gganimate)
p <- ggplot(
my_data,
aes(x = x_cor, y=y_cor, colour = col)
) +
geom_point()
Then, I tried to animate it:
p + transition_time(frame_number) +
labs(title = "frame_number: {frame_number}")
Unfortunately, this produced an empty plot and the following warnings:
There were 50 or more warnings (use warnings() to see the first 50)
1: Cannot get dimensions of plot table. Plot region might not be fixed
2: values must be length 1,
but FUN(X[[1]]) result is length 15
Can someone please show me how to fix this problem?
Thanks

no. of geom_point matches the value

I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like :
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this :
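As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr version 1.0.0 or later, which allows summarize to return more than one row per group (from dplyr 1.1.0 onward, reframe() is the recommended function for this and summarize emits a deprecation warning).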
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
library(ggplot2)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = T)
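The one-row-per-point transformation can also be sketched in base R, without relying on summarize returning several rows. This is an alternative to the dplyr code above, not the answerer's method; the intermediate name hit is introduced here for illustration.

```r
# Sample data from the answer above
dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

# Keep only overs in which at least one wicket fell
hit <- dd[dd$wickets > 0, ]

# Repeat each over once per wicket and stack the points above the bar:
# sequence(c(2, 1)) gives 1 2 1, so the points sit at runs + 1, runs + 2, ...
wicket_data <- data.frame(
  over     = rep(hit$over, hit$wickets),
  wicket_y = rep(hit$runs, hit$wickets) + sequence(hit$wickets)
)
print(wicket_data)
```

The resulting wicket_data has the same columns as the dplyr version, so it can be passed to the same geom_point layer.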

R - reshaped data from wide to long format, now want to use created timevar as factor

I am working with longitudinal data and assessing the utilization of a policy over 13 months. In order to get some barplots with the different months on my x-axis, I converted my data from wide format to long format.
So now, my dataset looks like this
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
I thought that after reshaping I could easily use my newly created "month" variable as a factor and plot some graphs. However, it does not work and tells me it's a list or an atomic vector. Transforming it into a factor did not work either, and I really need it as a factor.
Does anybody know how to turn it into a factor?
Thank you very much for your help!
EDIT.
The OP's graph code was posted in a comment. Here it is.
library(ggplot2)
ggplot(data, aes(x = hours, y = month)) + geom_density() + labs(title = 'Distribution of hours')
# Loading ggplot2
library(ggplot2)
# Placing example in dataframe
data <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
# Converting month to factor (the question covers 13 months, so levels run from 1 to 13)
data$month <- factor(data$month, levels = 1:13, labels = 1:13)
# Plotting grouping by id
ggplot(data, aes(x = month, y = hours, group = id, color = factor(id))) + geom_line()
# Plotting hour density by month
ggplot(data, aes(hours, color = month)) + geom_density()
The problem seems to be in the aes. geom_density only needs an x value; a y aesthetic doesn't make sense here. You want the density of the x values, so the vertical axis shows the estimated density, not some other column of the dataset.
First, read in the data.
Indirekte_long <- read.table(text = "
id month hours
1 1 13
1 2 16
1 3 20
2 1 0
2 2 0
2 3 10
", header = TRUE)
Now graph it.
library(ggplot2)
g <- ggplot(Indirekte_long, aes(hours))
g + geom_density() + labs(title = 'Distribution of hours')
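To see why no y column is needed, here is a small base R sketch of the same idea. It uses stats::density() with its default settings, which may not match geom_density exactly; the hours values are taken from the example data.

```r
# The hours column from the example data
hours <- c(13, 16, 20, 0, 0, 10)

# Kernel density estimate: only the x values go in
d <- density(hours)

# d$x is an evaluation grid and d$y the estimated density at each grid point;
# the y values are computed from hours, not read from another column
str(d[c("x", "y")])

# Base graphics equivalent of the density plot
plot(d, main = "Distribution of hours")
```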

Overlaying unique column values as geom_point in ggplot2

Here is an excerpt of the dataset I am working on.
Name Value ID Total
A 10 1 3
A 11 2 3
A 10 3 3
B 10 1 4
B 11 2 4
B 11 3 4
B 11 4 4
What I want to do is plot Name on the x-axis ID on the y-axis for all Values of 11; on top of which I want to overlay Total so that when the graph is interpreted, it is possible to see the count of items per a Name group. This might be achieved using length of a group in the Name variable or using Total. Here is what I did and a sample of the output desired.
mydf <- read.csv("./test1.csv", header = TRUE)
x <- ggplot(mydf, aes(Name, ID)) +
geom_point(data = subset(mydf, Value==11), size=3, colour="tomato3") +
scale_y_continuous(name="Class ID", limits=c(1,4), breaks=seq(1,4, by=1))
y <- x + xlab("Class") + theme_bw()
z <- y + scale_x_discrete(limits = c("A","B", "C"))
The three orange asterisks at (A,3) and (B,4) are manual text annotation that I want to replace with either a short line or a circle to indicate the total number of items.
Thank you for your help.
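A minimal sketch of the "length of a group" idea mentioned in the question, assuming the excerpt above is the whole dataset. The names totals and n_items are introduced here for illustration, and drawing the count as an open circle is only one possible choice of marker.

```r
# The excerpt from the question, read inline instead of from test1.csv
mydf <- read.table(header = TRUE, text = "
Name Value ID Total
A 10 1 3
A 11 2 3
A 10 3 3
B 10 1 4
B 11 2 4
B 11 3 4
B 11 4 4
")

# Count the items in each Name group (matches the Total column here)
totals <- aggregate(ID ~ Name, data = mydf, FUN = length)
names(totals)[2] <- "n_items"
print(totals)

# Overlay one open circle per Name at the height of its count;
# built only if ggplot2 is installed, call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mydf, aes(Name, ID)) +
    geom_point(data = subset(mydf, Value == 11), size = 3, colour = "tomato3") +
    geom_point(data = totals, aes(y = n_items), shape = 1, size = 6) +
    theme_bw()
}
```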

Plot In R with Multiple Lines Based On A Particular Variable?

I have this accelerometer dataset and, let's say that I have some n number of observations for each subject (30 subjects total) for body-acceleration x time.
I want to make a plot so that it plots these body acceleration x time points for each subject in a different color on the y axis and the x axis is just an index. I tried this:
ggplot(data = filtered_data_walk, aes(x = seq_along(filtered_data_walk$'body-acceleration-mean-y-time'), y = filtered_data_walk$'body-acceleration-mean-y-time')) +
geom_line(aes(color = filtered_data_walk$subject))
But the problem is that it doesn't superimpose the 30 lines; instead, they run alongside each other. In other words, I end up with n1 + n2 + n3 + ... + n30 x index points, instead of max{n1, n2, ..., n30}. This is my first time posting, so I hope this makes sense (I know my formatting is bad).
One solution I thought of was to create a new variable which gives a value of 1 to n for all the observations of each subject. So, for example, if I had 6 observations for subject1, 4 observations for subject2, and 9 observations for subject3, this new variable would be sequenced like:
1 2 3 4 5 6 1 2 3 4 1 2 3 4 5 6 7 8 9
Is there an easy way to do this? Please help, ty.
Assuming your data is formatted as a data.frame or matrix, for a toy dataset like
x <- data.frame(replicate(5, rnorm(10)))
x
# X1 X2 X3 X4 X5
# 1 -1.36452272 -1.46446475 2.0444381 0.001585876 -1.1085990
# 2 -1.41303046 -0.14690269 1.6179084 -0.310162018 -1.5528733
# 3 -0.15319554 -0.18779791 -0.3005058 0.351619212 1.6282955
# 4 -0.38712167 -0.14867239 -1.0776359 0.106694311 -0.7065382
# 5 -0.50711166 -0.95992916 1.3522922 1.437085757 -0.7921355
# 6 -0.82377208 0.50423328 -0.5366513 -1.315263679 1.0604499
# 7 -0.01462037 -1.15213287 0.9910678 0.372623508 1.9002438
# 8 1.49721113 -0.84914197 0.2422053 0.337141898 1.2405208
# 9 1.95914245 -1.43041783 0.2190829 -1.797396822 0.4970690
# 10 -1.75726827 -0.04123615 -0.1660454 -1.071688768 -0.3331887
...you might be able to get there with something like
plot(x[,1], type='l', xlim=c(1, nrow(x)), ylim=c(min(x), max(x)))
for(i in 2:ncol(x)) lines(x[,i], col=i)
You could play with formatting some more, of course, do things with lty= and lwd= and maybe a color ramp of your own choosing, etc.
If your data is in the format below...
x <- data.frame(id=c("A","A","A","B","B","B","B","C","C"), acc=rnorm(9))
x
# id acc
# 1 A 0.1796964
# 2 A 0.8770237
# 3 A -2.4413527
# 4 B 0.9379746
# 5 B -0.3416141
# 6 B -0.2921062
# 7 B 0.1440221
# 8 C -0.3248310
# 9 C -0.1058267
...you could get there with
maxn <- max(with(x, tapply(acc, id, length)))
ids <- sort(unique(x$id))
plot(x$acc[x$id==ids[1]], type='l', xlim=c(1,maxn), ylim=c(min(x$acc),max(x$acc)))
for(i in 2:length(ids)) lines(x$acc[x$id==ids[i]], col=i)
Hope this helps, and that I interpreted your problem correctly.
That's pretty quick to do if you are OK with using dplyr. group_by to enforce a separate counter for each subject, mutate to add the actual counter, and your ggplot should work. Example with iris dataset:
library(dplyr)
library(ggplot2)
group_by(iris, Species) %>%
mutate(index = seq_along(Petal.Length)) %>%
ggplot() + geom_line(aes(x=index, y=Petal.Length, color=Species))
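The per-subject counter described in the question can also be built in base R with ave(). This is a small sketch on made-up subject labels (6, 4 and 9 observations, as in the question's example), not on the real accelerometer data.

```r
# Hypothetical subject column: 6, 4 and 9 observations per subject
subject <- rep(c("subject1", "subject2", "subject3"), times = c(6, 4, 9))

# Restart the counter at 1 within each subject
index <- ave(seq_along(subject), subject, FUN = seq_along)
print(index)
# 1 2 3 4 5 6 1 2 3 4 1 2 3 4 5 6 7 8 9
```

Adding index as a column to the data frame and mapping it to x should make the lines overlap instead of running one after another.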
