ggplot each group consists of only one observation - r

I'm trying to make a plot similar to this answer: https://stackoverflow.com/a/4877936/651779
My data frame looks like this:
df2 <- read.table(text='measurements samples value
1 4hours sham1 6
2 1day sham1 175
3 3days sham1 417
4 7days sham1 163
5 14days sham1 37
6 90days sham1 134
7 4hours sham2 8
8 1day sham2 402
9 3days sham2 482
10 7days sham2 67
11 14days sham2 16
12 90days sham2 31
13 4hours sham3 185
14 1day sham3 402
15 3days sham3 482
16 7days sham3 85
17 14days sham3 29
18 90days sham3 10',header=T)
And plot it with
ggplot(df2, aes(measurements, value)) + geom_line(aes(colour = samples))
No lines show in the plot, and I get the message
geom_path: Each group consist of only one observation.
Do you need to adjust the group aesthetic?
I don't see where what I'm doing is different from the answer I linked above. What should I change to make this work?

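Add group = samples to the aes of geom_line. This is necessary because measurements is a categorical variable, so by default ggplot2 puts each combination of measurements and samples (here a single data point) in its own group; you want one line per sample instead.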
ggplot(df2, aes(measurements, value)) +
geom_line(aes(colour = samples, group = samples))
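To see why the explicit group is needed, here is a minimal base-R sketch of the grouping, assuming (as the ggplot2 documentation describes) that the default group is the interaction of all discrete variables in the plot. It uses the data from the question; the names default_groups, default_sizes, sample_sizes and time_order are only illustrative.

```r
# Data from the question, rebuilt without read.table
df2 <- data.frame(
  measurements = rep(c("4hours", "1day", "3days", "7days", "14days", "90days"), times = 3),
  samples = rep(c("sham1", "sham2", "sham3"), each = 6),
  value = c(6, 175, 417, 163, 37, 134,
            8, 402, 482, 67, 16, 31,
            185, 402, 482, 85, 29, 10)
)

# Default grouping: one group per measurements x samples combination
default_groups <- interaction(df2$measurements, df2$samples, drop = TRUE)
default_sizes <- table(default_groups)   # every group holds a single row

# Grouping by samples alone: each group holds six points, enough for a line
sample_sizes <- table(df2$samples)

# Optional: order the x-axis chronologically instead of alphabetically
time_order <- c("4hours", "1day", "3days", "7days", "14days", "90days")
df2$measurements <- factor(df2$measurements, levels = time_order)
```

With the factor levels set this way, the ggplot call above should draw the time points from left to right in the intended order, since a discrete x-axis follows the factor levels.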

Related

time series aesthetics with ggplot2

Hello, I have tried to graph the following time series:
fecha importaciones
1 Ene\n1994 171.0
2 Feb\n1994 170.7
3 Mar\n1994 183.7
4 Abr\n1994 214.6
5 May\n1994 227.2
6 Jun\n1994 221.1
7 Jul\n1994 216.4
8 Ago\n1994 235.3
9 Sep\n1994 227.0
10 Oct\n1994 216.0
11 Nov\n1994 221.5
12 Dic\n1994 270.9
13 Ene\n1995 250.4
14 Feb\n1995 259.6
15 Mar\n1995 258.2
16 Abr\n1995 232.9
17 May\n1995 335.0
18 Jun\n1995 295.2
19 Jul\n1995 302.5
20 Ago\n1995 283.3
21 Sep\n1995 264.4
22 Oct\n1995 277.6
23 Nov\n1995 289.1
24 Dic\n1995 280.5
25 Ene\n1996 252.4
26 Feb\n1996 250.1
.
.
.
320 Ago\n2020 794.6
321 Sep\n2020 938.2
322 Oct\n2020 966.3
323 Nov\n2020 958.9
324 Dic\n2020 1059.2
325 Ene\n2021 1056.2
326 Feb\n2021 982.5
I graphed it with Office Calc, but I am now trying to plot it in R with ggplot:
ggplot(datos, aes(x = fecha, y = importaciones)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
I have tried every approach I could think of, but the plot does not come out correctly. Could someone guide me?
Change the x-axis to date class.
library(ggplot2)
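# dmy() reads month names in the current locale; set its locale argument if yours is not Spanish
datos$fecha <- lubridate::dmy(paste0(1, datos$fecha))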
ggplot(datos, aes(x = fecha, y = importaciones, group = 1)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
You can use scale_x_date to change the breaks and display format of dates on x-axis.
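If relying on the system locale to read the Spanish month abbreviations is a concern, the conversion can also be sketched in base R with an explicit lookup table. This is only an illustration: parse_fecha and meses are hypothetical names, and the sketch assumes that the "\n" shown in fecha is a real newline character separating month and year.

```r
# Spanish month abbreviations as they appear in the fecha column
meses <- c(Ene = 1L, Feb = 2L, Mar = 3L, Abr = 4L, May = 5L, Jun = 6L,
           Jul = 7L, Ago = 8L, Sep = 9L, Oct = 10L, Nov = 11L, Dic = 12L)

# Hypothetical helper: "Ene\n1994" -> Date for the first day of that month
parse_fecha <- function(x) {
  parts <- strsplit(as.character(x), "[[:space:]]+")
  mes   <- vapply(parts, function(p) p[1], character(1))
  anio  <- as.integer(vapply(parts, function(p) p[2], character(1)))
  as.Date(sprintf("%04d-%02d-01", anio, unname(meses[mes])))
}

# A few rows copied from the question
datos <- data.frame(
  fecha = c("Ene\n1994", "Feb\n1994", "Dic\n2020", "Feb\n2021"),
  importaciones = c(171.0, 170.7, 1059.2, 982.5)
)
datos$fecha <- parse_fecha(datos$fecha)
```

Once fecha is a Date, adding something like scale_x_date(date_breaks = "2 years", date_labels = "%Y") to the plot is one way to control the axis breaks and labels.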

fit a normal distribution to grouped data, giving expected frequencies

I have a frequency distribution of observations, grouped into counts within class intervals.
I want to fit a normal (or other continuous) distribution, and find the expected frequencies in each interval according to that distribution.
For example, suppose the following, where I want to calculate another column, expected giving the
expected number of soldiers with chest circumferences in the interval given by chest, where these
are assumed to be centered on the nominal value. E.g., 35 = 34.5 <= y < 35.5. One analysis I've seen gives the expected frequency in this cell as 72.5 vs. the observed 81.
> data(ChestSizes, package="HistData")
>
> ChestSizes
chest count
1 33 3
2 34 18
3 35 81
4 36 185
5 37 420
6 38 749
7 39 1073
8 40 1079
9 41 934
10 42 658
11 43 370
12 44 92
13 45 50
14 46 21
15 47 4
16 48 1
>
> # ungroup to a vector of values
> chests <- vcdExtra::expand.dft(ChestSizes, freq="count")
There are quite a number of variations of this question, most of which relate to plotting the normal density on top of a histogram, scaled to represent counts not density. But none explicitly show the calculation of the expected frequencies. One close question is R: add normal fits to grouped histograms in ggplot2
I can perfectly well do the standard plot (below), but for other things, like a Chi-square test or a vcd::rootogram plot, I need the expected frequencies in the same class intervals.
library(ggplot2)
bw <- 1
n_obs <- nrow(chests)
xbar <- mean(chests$chest)
std <- sd(chests$chest)
plt <-
ggplot(chests, aes(chest)) +
geom_histogram(color="black", fill="lightblue", binwidth = bw) +
stat_function(fun = function(x)
dnorm(x, mean = xbar, sd = std) * bw * n_obs,
color = "darkred", size = 1)
plt
Here is how you could calculate the expected frequencies for each group, assuming normality:
xbar <- with(ChestSizes, weighted.mean(chest, count))
sdx <- with(ChestSizes, sd(rep(chest, count)))
transform(ChestSizes, Expected = diff(pnorm(c(32, chest) + .5, xbar, sdx)) * sum(count))
chest count Expected
1 33 3 4.7600583
2 34 18 20.8822328
3 35 81 72.5129162
4 36 185 199.3338028
5 37 420 433.8292832
6 38 749 747.5926687
7 39 1073 1020.1058521
8 40 1079 1102.2356155
9 41 934 943.0970605
10 42 658 638.9745241
11 43 370 342.7971793
12 44 92 145.6089948
13 45 50 48.9662992
14 46 21 13.0351612
15 47 4 2.7465640
16 48 1 0.4579888
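To make the calculation explicit, the sketch below works out one cell by hand and then all cells from the interval boundaries. It is an illustration under the same normality assumption; the counts are typed in from the table so that HistData is not needed, and the names breaks, expected_35 and uncovered are only illustrative.

```r
# Counts from ChestSizes, typed in so the sketch does not need HistData
chest <- 33:48
count <- c(3, 18, 81, 185, 420, 749, 1073, 1079, 934, 658, 370, 92, 50, 21, 4, 1)

n    <- sum(count)
xbar <- weighted.mean(chest, count)
sdx  <- sd(rep(chest, count))

# One cell by hand: P(34.5 <= y < 35.5) times the total count
expected_35 <- (pnorm(35.5, xbar, sdx) - pnorm(34.5, xbar, sdx)) * n

# All cells at once: differences of the CDF at the interval boundaries
breaks   <- c(chest - 0.5, max(chest) + 0.5)   # 32.5, 33.5, ..., 48.5
expected <- diff(pnorm(breaks, xbar, sdx)) * n

# Probability mass below 32.5 and above 48.5 is not assigned to any cell
uncovered <- n - sum(expected)
```

Because the two tails outside 32.5 to 48.5 are left out, the expected frequencies sum to slightly less than the observed total. For a chi-square test you may want to fold that remainder into the end cells, for example by using -Inf and Inf as the outermost boundaries.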

R find number of rows in a group and plot

I have a table of Tennis matches. I want to group by winner_ids and plot them against height, basically to check if the taller players have won more matches.
The data looks like this.
m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164
Here m_id is the match ID. I want to plot the number of matches a player has won against that player's height.
Example: player 21 has won 3 matches and is 166 cm tall.
How can I achieve this in ggplot?
The following code doesn't work:
matches %>% group_by(winner_id) %>%
ggplot(., aes(x = winner_ht, y = nrow((winner_id)))) + geom_point()
Can anyone help?
Do you mean something like this?
library(tidyverse)
df %>%
group_by(winner_id, winner_height) %>%
summarise(n = n()) %>%
ggplot(aes(winner_height, n, label = winner_id)) +
geom_point() +
geom_text(position = position_nudge(y = -0.1))
Explanation: We count the number of games n per winner_id and winner_height and pass the summarised data to ggplot where we plot winner_height vs. n. We can also add labels to indicate the winner_id.
Sample data
df <- read.table(text =
"m_id winner_id winner_height
1 21 166
2 21 166
3 22 167
4 21 166
5 23 170
6 24 163
7 22 167
8 25 164", header = T)
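For comparison, the counting step can also be sketched in base R with aggregate(), which avoids the tidyverse dependency. This is an alternative illustration, not the approach used in the answer above; the data frame names matches and wins are only illustrative.

```r
# Sample data from the question
matches <- data.frame(
  m_id = 1:8,
  winner_id = c(21, 21, 22, 21, 23, 24, 22, 25),
  winner_height = c(166, 166, 167, 166, 170, 163, 167, 164)
)

# Count rows (matches won) per winner; height is constant within a winner,
# so adding it to the formula only carries the column along
wins <- aggregate(m_id ~ winner_id + winner_height, data = matches, FUN = length)

# aggregate() keeps the name of the counted column; rename it to n
names(wins)[names(wins) == "m_id"] <- "n"
```

The resulting wins data frame has one row per player and can be passed to ggplot with aes(winner_height, n) in the same way as the summarised data.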

Function for generating multiple line charts for all variables in a dataframe for different groups

I have 106 weeks of data for 5 different LOBs (Lines of Business). The variables are Traffic, Spend, Clicks, etc. In total there are 106*5 = 530 rows.
Dataframe looks like:
LOB Week Traffic Spend Clicks
A 1 34 12 5
A 2 37 32 6
A 3 41 57 7
A 4 52 42 12
A 5 27 37 8
... 106 weeks
B...106 weeks
C...106 weeks
D...106 weeks
E 1 43 22 12
E 2 65 16 14
E 3 76 18 9
E 4 25 14 11
E 5 53 15 15
... 106 weeks
I want to generate a line chart of Traffic for all 5 LOBs on the same chart, and likewise for the other metrics. I have written the code below for this, but it is not doing what I want.
Code:
for ( i in seq(1,length( data),1) ) plot(data[,i],ylab=names(data[i]),type="l", col = "red", xlab = "Week", main = "")
Please suggest how this can be done.
You can use ggplot2:
ggplot(data, aes(x = Week, y = Traffic, color = LOB)) +
geom_line()
Please try to submit a toy example of your data so we can reproduce the code. See Here.
Edit: as suggested by @Axeman, you may want to plot all metrics together. Here is his solution, copied here for visibility:
library(tidyr)
# gather() reshapes to long format: one row per Week, LOB and metric
d <- gather(data, metric, value, -Week, -LOB)
ggplot(d, aes(Week, value, color = LOB)) +
geom_line() +
facet_wrap(~metric, scales = 'free_y')
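To show what the reshaping step produces, here is a base-R sketch of the same wide-to-long conversion on a toy subset of the data from the question. It is only an illustration: lob_data, metrics and long are hypothetical names, and only two weeks of LOBs A and E are included.

```r
# Toy subset of the data shown in the question (two LOBs, two weeks each)
lob_data <- data.frame(
  LOB = c("A", "A", "E", "E"),
  Week = c(1, 2, 1, 2),
  Traffic = c(34, 37, 43, 65),
  Spend = c(12, 32, 22, 16),
  Clicks = c(5, 6, 12, 14)
)

metrics <- c("Traffic", "Spend", "Clicks")

# Stack the metric columns: one row per LOB, Week and metric
long <- data.frame(
  LOB = rep(lob_data$LOB, times = length(metrics)),
  Week = rep(lob_data$Week, times = length(metrics)),
  metric = rep(metrics, each = nrow(lob_data)),
  value = unlist(lob_data[metrics], use.names = FALSE)
)
```

Each original row turns into one row per metric, which is the shape facet_wrap(~metric) needs in order to draw one panel per metric.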

cluster analysis with weight

I have a data frame 'heat' demonstrating people's performance across time.
'Var1' represents the code of persons.
'Var2' represents a time line (measured by number of days from the starting point).
'value' is the score they get at a given time point.
Var1 Var2 value
1 1 36 -0.6941826
2 2 36 -0.5585414
3 3 36 0.8032384
4 4 36 0.7973031
5 5 36 0.7536959
6 6 36 -0.5942059
....
54 10 73 0.7063218
55 11 73 -0.6949616
56 12 73 -0.6641516
57 13 73 0.6890433
58 14 73 0.6310124
59 15 73 -0.6305091
60 16 73 0.6809655
61 17 73 0.8957870
....
101 13 110 0.6495796
102 14 110 0.5990869
103 15 110 -0.6210600
104 16 110 0.6441960
105 17 110 0.7838654
....
Now I want to cluster their performance and show it on a heatmap. So I used the functions dist() and hclust() to cluster the data frame and plotted it with ggplot2:
ggplot(data = heat) + geom_tile(aes(x = Var2, y = Var1 %>% as.character(),
fill = value)) +
scale_fill_gradient(low = "yellow",high = "red") +
geom_vline(xintercept = c(746, 2142, 2917))
It looks like this:
However, I am more interested in what happened around day 746, day 2142 and day 2917 (the black lines). I would like the scores around these days bearing more weight in the clustering. I want people demonstrating similar performance around these days to have more priority to be clustered together. Is there a way of doing this?
As long as your weights are integers, you can presumably just replicate those days artificially.
If you want more control, just compute the distance matrix yourself, with whatever weighted distance you want to use.
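Both suggestions can be sketched in base R. The example below is only an illustration under stated assumptions: the data are random numbers in the same long shape as heat, the weighting scheme (days within 50 days of a key day count five times as much) is hypothetical, and the distance is a weighted Euclidean distance, computed by scaling each column by the square root of its weight.

```r
# Hypothetical long-format data in the same shape as 'heat'
set.seed(42)
heat <- expand.grid(Var1 = 1:6, Var2 = c(36, 73, 700, 746, 2142))
heat$value <- runif(nrow(heat), -1, 1)

# Reshape to a person x day matrix (one row per Var1, one column per Var2)
scores <- with(heat, tapply(value, list(Var1, Var2), mean))

# Illustrative weights: days within 50 days of a key day get weight 5
key_days <- c(746, 2142, 2917)
days <- as.numeric(colnames(scores))
near_key <- sapply(days, function(d) min(abs(d - key_days)) <= 50)
w <- ifelse(near_key, 5, 1)

# Weighted Euclidean distance = ordinary Euclidean distance after
# scaling each column by the square root of its weight
weighted_scores <- sweep(scores, 2, sqrt(w), `*`)
d_weighted <- dist(weighted_scores)
hc <- hclust(d_weighted)

# Integer weights are equivalent to repeating each column w times
replicated <- scores[, rep(seq_along(w), times = w), drop = FALSE]
d_replicated <- dist(replicated)
```

The two distance objects agree, so replicating columns is just a special case of the weighted distance. Non-integer weights, or weights that decay smoothly with the distance to a key day, only work with the scaling approach.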
