I produce the following graph:
> ddd
UV.NF TRIS volvol
2 145.1923 31 500 µl / 625 µl
3 116.3462 50 500 µl / 625 µl
4 127.1635 60 500 µl / 625 µl
5 125.9615 69 500 µl / 625 µl
6 162.0192 30 1 ml / 625 µl
7 166.8269 50 1 ml / 625 µl
8 176.4423 60 1 ml / 625 µl
9 171.6346 70 1 ml / 625 µl
19 292.3077 31 500 µl / 2500 µl
20 321.1538 50 500 µl / 2500 µl
21 225.0000 60 500 µl / 2500 µl
22 263.4615 69 500 µl / 2500 µl
23 301.9231 30 1 ml / 2500 µl
24 350.0000 50 1 ml / 2500 µl
25 282.6923 60 1 ml / 2500 µl
26 282.6923 70 1 ml / 2500 µl
35 133.6207 31 500 µl / 625 µl
ggplot() +
geom_point(aes(y = log(UV.NF), x = TRIS, colour=ddd[,"volvol"], shape=ddd[,"volvol"]),
data=ddd) +
labs(colour = "volvol", shape="volvol") + xlab("TRIS (mM)") +
guides(colour = guide_legend(title="Vol. lyo. / Vol. reconst."),
shape=guide_legend(title="Vol. lyo. / Vol. reconst.")) +
scale_shape_manual(values = c(19,19,3,3)) + scale_colour_manual(values = c(2,4,2,4))
I want to add the regression line lm(y~x) for each of the four groups appearing in the legend. I have made many attempts with geom_smooth(), but without success.
I'm not quite sure whether that's what you want, but have you tried the following?
ggplot(ddd,aes(y = log(UV.NF), x = TRIS, colour = volvol, shape = volvol)) +
geom_point() + geom_smooth(method = "lm", fill = NA)
This gives me the following plot with your data:
There's also some documentation for geom_smooth that does pretty much what you'd like, albeit in a more complicated (yet flexible) manner.
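To see what geom_smooth(method = "lm") fits for each group, the per-group regressions can also be computed directly. The following is a minimal base R sketch using only the first eight rows of ddd from the question (two of the four groups); the names by_group and fits are illustrative and not from the original post.

```r
# First eight rows of ddd from the question (two of the four volvol groups).
ddd <- data.frame(
  UV.NF  = c(145.1923, 116.3462, 127.1635, 125.9615,
             162.0192, 166.8269, 176.4423, 171.6346),
  TRIS   = c(31, 50, 60, 69, 30, 50, 60, 70),
  volvol = rep(c("500 µl / 625 µl", "1 ml / 625 µl"), each = 4)
)

# Split the data by group and fit log(UV.NF) ~ TRIS separately in each subset.
by_group <- split(ddd, ddd$volvol)
fits <- lapply(by_group, function(g) coef(lm(log(UV.NF) ~ TRIS, data = g)))

# One row per group: intercept and slope of the fitted line.
print(do.call(rbind, fits))
```

Each row of the printed matrix should correspond to one of the lines that geom_smooth() draws when colour (or group) is mapped to volvol.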
I have a frequency distribution of observations, grouped into counts within class intervals.
I want to fit a normal (or other continuous) distribution, and find the expected frequencies in each interval according to that distribution.
For example, take the following data, to which I want to add another column, expected, giving the
expected number of soldiers with chest circumferences in the interval given by chest, where the intervals
are assumed to be centered on the nominal value, e.g., 35 means 34.5 <= y < 35.5. One analysis I've seen gives the expected frequency in this cell as 72.5 vs. the observed 81.
> data(ChestSizes, package="HistData")
>
> ChestSizes
chest count
1 33 3
2 34 18
3 35 81
4 36 185
5 37 420
6 38 749
7 39 1073
8 40 1079
9 41 934
10 42 658
11 43 370
12 44 92
13 45 50
14 46 21
15 47 4
16 48 1
>
> # ungroup to a vector of values
> chests <- vcdExtra::expand.dft(ChestSizes, freq="count")
There are quite a number of variations of this question, most of which relate to plotting the normal density on top of a histogram, scaled to represent counts not density. But none explicitly show the calculation of the expected frequencies. One close question is R: add normal fits to grouped histograms in ggplot2
I can perfectly well do the standard plot (below), but for other things, like a Chi-square test or a vcd::rootogram plot, I need the expected frequencies in the same class intervals.
library(ggplot2)

bw <- 1
n_obs <- nrow(chests)
xbar <- mean(chests$chest)
std <- sd(chests$chest)

# Histogram of counts with the normal density rescaled to counts (density * binwidth * n)
plt <-
  ggplot(chests, aes(chest)) +
  geom_histogram(color = "black", fill = "lightblue", binwidth = bw) +
  stat_function(fun = function(x)
                  dnorm(x, mean = xbar, sd = std) * bw * n_obs,
                color = "darkred", size = 1)
plt
Here is how you could calculate the expected frequencies for each group, assuming normality.
xbar <- with(ChestSizes, weighted.mean(chest, count))
sdx <- with(ChestSizes, sd(rep(chest, count)))
transform(ChestSizes, Expected = diff(pnorm(c(32, chest) + .5, xbar, sdx)) * sum(count))
chest count Expected
1 33 3 4.7600583
2 34 18 20.8822328
3 35 81 72.5129162
4 36 185 199.3338028
5 37 420 433.8292832
6 38 749 747.5926687
7 39 1073 1020.1058521
8 40 1079 1102.2356155
9 41 934 943.0970605
10 42 658 638.9745241
11 43 370 342.7971793
12 44 92 145.6089948
13 45 50 48.9662992
14 46 21 13.0351612
15 47 4 2.7465640
16 48 1 0.4579888
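The diff(pnorm(...)) expression above is the difference of the normal CDF at the upper and lower bound of each unit-wide interval. The sketch below spells that out, with the ChestSizes counts typed in so that it runs without the HistData package; lower, upper and expected are illustrative names.

```r
# ChestSizes data from the question, typed in directly.
chest <- 33:48
count <- c(3, 18, 81, 185, 420, 749, 1073, 1079,
           934, 658, 370, 92, 50, 21, 4, 1)

# Mean and standard deviation estimated from the grouped data.
xbar <- weighted.mean(chest, count)
sdx  <- sd(rep(chest, count))

# Each nominal value covers [chest - 0.5, chest + 0.5).
lower <- chest - 0.5
upper <- chest + 0.5

# Expected count = P(lower <= Y < upper) under the fitted normal, times N.
expected <- (pnorm(upper, xbar, sdx) - pnorm(lower, xbar, sdx)) * sum(count)

print(data.frame(chest, count, expected = round(expected, 2)))
```

Note that the expected values do not sum exactly to the total count, because the probability mass below 32.5 and above 48.5 is left out; whether to fold the tails into the end cells is a separate choice.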
I have input data from which I would like to create a grouped bar chart, but in the finished chart the order differs from the input: it is arranged alphabetically. I would also like to change the font style to italic, for the species names only.
> data <- read.table(
+ text = "Superfamily Drom Bactria Feru Paos
+ ERV 294 224 206 202
+ ERVL-MaLR 103 108 184 231
+ Gypsy 274 187 413 215
+ Pao 6 2 7 4
+ DIRS/Ngaro 15 14 45 25
+ Unknown 26 23 23 37
+ Undefined 76 77 80 95",
+ header = TRUE
+ )
> data
Superfamily Drom Bactria Feru Paos
1 ERV 294 224 206 202
2 ERVL-MaLR 103 108 184 231
3 Gypsy 274 187 413 215
4 Pao 6 2 7 4
5 DIRS/Ngaro 15 14 45 25
6 Unknown 26 23 23 37
7 Undefined 76 77 80 95
> data_long <- gather(data,
+ key = "Species",
+ value = "Distrubution",
+ -Superfamily)
> ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity")
I would like the chart to follow the input order, with italic font for the species names only (e.g. Drom, Bactria, ...).
I think this is what you're asking for
data_long$Species <- factor(data_long$Species, levels = unique(data_long$Species))
ggplot(data_long, aes(fill=Superfamily, y=Distrubution, x=Species)) + geom_bar(position="dodge2", stat="identity") + theme(axis.text.x = element_text(face = "italic"))
If ggplot receives a factor, it uses the level order as the axis order.
Font styling is changed through theme(), here with axis.text.x = element_text(face = "italic").
--edit--
To get the superfamily in the same order as input, you would have to create a factor as we did with the species-name.
data_long$Superfamily <- factor(data_long$Superfamily, levels = data$Superfamily)
Forgoing the use of the readxl package to read the Excel sheet into R, this should work to change the species names:
colnames(data)[2:5] <- c("Alpha Drom", "Beta Bactria", "Gamma Feru", "Delta Paos")
Add this line before you create data_long.
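The effect of setting the levels explicitly can be checked without plotting. This is a small base R sketch; the species vector simply repeats the column names from the question, and default_levels and input_levels are illustrative names.

```r
# Species labels in the order they appear in the input.
species <- c("Drom", "Bactria", "Feru", "Paos",
             "Drom", "Bactria", "Feru", "Paos")

# factor() without explicit levels sorts the unique values.
default_levels <- levels(factor(species))

# Passing unique(species) as levels keeps first-appearance order.
input_levels <- levels(factor(species, levels = unique(species)))

print(default_levels)
print(input_levels)
```

ggplot2 lays out a discrete axis in level order, which is why the second form keeps the input order on the x axis.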
I'm having trouble plotting fixed effects from an lmer model.
library(ggplot2)
library(lme4)
library(lmerTest)
library(effects)
data(diamonds)
m1 <- lmer(carat ~ price * depth + (1 | cut), diamonds)
summary(m1)
ee <- Effect(c("price", "depth"), m1)
ggplot(data.frame(ee), aes(price, fit, color = cut)) + geom_line()
When I use ggplot I get this error:
Don't know how to automatically pick scale for object of type
function. Defaulting to continuous. Error in (function (..., row.names
= NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 25, 0
but a simple plot(ee) yields 5 tiled plots:
A different model yields a plot:
m3 <- lmer(price ~ depth * clarity + (1 | cut), diamonds)
summary(m3)
eg <- Effect(c("depth", "clarity"), m3)
ggplot(as.data.frame(eg), aes(depth, fit, color = clarity)) + geom_line()
There does not appear to be a mismatch in the number of rows per column:
> as.data.frame(ee)
price depth fit se lower upper
1 330 40 0.4618286 0.04227714 0.3789651 0.5446922
2 5000 40 0.7931920 0.04074246 0.7133365 0.8730476
3 9600 40 1.1195885 0.04366662 1.0340016 1.2051754
4 14000 40 1.4317938 0.04988618 1.3340165 1.5295711
5 19000 40 1.7865726 0.05966749 1.6696239 1.9035214
6 330 50 0.4566107 0.03977778 0.3786459 0.5345754
7 5000 50 0.8690398 0.03930531 0.7920010 0.9460785
8 9600 50 1.2752869 0.04021511 1.1964649 1.3541088
9 14000 50 1.6638710 0.04228263 1.5809968 1.7467453
10 19000 50 2.1054440 0.04584704 2.0155834 2.1953045
11 330 60 0.4513927 0.03870323 0.3755341 0.5272513
12 5000 60 0.9448875 0.03868697 0.8690608 1.0207143
13 9600 60 1.4309852 0.03871997 1.3550938 1.5068767
14 14000 60 1.8959482 0.03879694 1.8199059 1.9719906
15 19000 60 2.4243153 0.03893796 2.3479966 2.5006340
16 330 70 0.4461747 0.03917094 0.3693994 0.5229501
17 5000 70 1.0207353 0.03892648 0.9444391 1.0970315
18 9600 70 1.5866836 0.03940454 1.5094504 1.6639168
19 14000 70 2.1280255 0.04050652 2.0486324 2.2074186
20 19000 70 2.7431867 0.04245996 2.6599648 2.8264085
21 330 80 0.4409568 0.04112833 0.3603449 0.5215686
22 5000 80 1.0965831 0.04000842 1.0181662 1.1749999
23 9600 80 1.7423820 0.04216278 1.6597426 1.8250214
24 14000 80 2.3601027 0.04684598 2.2682842 2.4519212
25 19000 80 3.0620580 0.05442428 2.9553860 3.1687300
What causes this error?
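A likely cause, judging from the as.data.frame(ee) output above: the effect data frame has no cut column (cut is the grouping variable of the random effect, not a predictor passed to Effect()), so color = cut falls through to the base R function cut(). That would match the "object of type function" message. A small base R check, using the first two printed rows; ee_df is an illustrative name.

```r
# First two rows of as.data.frame(ee) as printed in the question.
ee_df <- data.frame(
  price = c(330, 5000),
  depth = c(40, 40),
  fit   = c(0.4618286, 0.7931920)
)

# There is no column called "cut" in the effect data frame ...
has_cut_column <- "cut" %in% names(ee_df)

# ... but there is a base R function with that name, which is what
# an unmatched `cut` inside aes() would resolve to.
cut_is_function <- is.function(cut)

print(c(has_cut_column = has_cut_column, cut_is_function = cut_is_function))
```

If that is indeed the cause, mapping colour to a column that does exist, for example aes(price, fit, color = factor(depth)), should avoid the error.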
I have a data set that records tumor size at four different time points (each row is one patient). I want to perform an analysis on this dataset to show that, overall for all patients, tumor size decreases after each time point.
What kind of analysis can I do? How should I use ggplot to visualize these data and show the trend? Many thanks!
SUBJECTID Baseline 1 2 3
1001 88 78 30 14
1002 29 26 66 16
1003 50 64 54 46
1004 91 90 99 43
1005 98 109 60 42
1007 100 100 54
1008 45 49 47 32
1009 75 66 57 7
1010 60 52 20 3
1011 68 68 56 47
1012 78 84 56 57
1013 71 70 8 5
1015 79 50 11 3
1016 73 60 57 36
1017 54 27 16
1018 50 37 33 26
1019 115 68 33 67
1021 63 55 0 0
1022 98 91 76 75
1024 76 76 0
1025 47 45 42 42
1026 32 25 14 0
1027 40 37 65
1028 60 110 110 0
A box plot might work. Try the following:
library(tidyverse)
df %>%
  gather(key = "time", value = "tumor_size", -SUBJECTID) %>%
  ggplot(aes(time, tumor_size)) +
  geom_boxplot() +
  labs(title = "Tumor Size ~ Time",
       subtitle = "Insert subtitle if you want",
       caption = "Insert caption if you want",
       x = "Time",
       y = "Tumor Size (insert unit)") +
  theme_bw() +
  theme(
    panel.grid.major.x = element_blank(),
    text = element_text(family = "Palatino"),
    plot.title = element_text(face = "bold", size = 20)
  )
You could also add geom_jitter() if you'd like. After the geom_boxplot() + line, add:
geom_jitter(width = 0.1, pch = 21, fill = "grey") +
You'll get something like this:
To show that overall tumor size is decreasing after each time point, you usually want the mean tumor size at each time point. That is much easier to plot than every individual measurement. I've written how to do this using your first four rows, producing a dot graph:
baseline <- c(88, 29, 50, 91)
dAC <- c(78, 26, 64, 90)
InterReg <- c(30, 66, 54, 99)
PreSurg <- c(14, 16, 46, 43)
matrix <- rbind(baseline, dAC, InterReg, PreSurg)
means <- rowMeans(matrix)
plot(means)
Dot graph:
In terms of what analysis to do, I can't really answer that. That depends on what you want it to look like. What I've done is the most basic way of representing the data. You may want to use a column graph, a bar graph, a line graph etc. That's up to your personal preference. In terms of using ggplot, here are many different examples you can use: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
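As a variation on the means above, the same four rows can be arranged with one column per time point, which also gives each patient's change from baseline. This is only a sketch in base R; sizes and change are illustrative names, and na.rm = TRUE is an assumption about how the missing measurements in the full table would be handled.

```r
# First four patients from the question, one vector per time point.
baseline <- c(88, 29, 50, 91)
dAC      <- c(78, 26, 64, 90)
InterReg <- c(30, 66, 54, 99)
PreSurg  <- c(14, 16, 46, 43)

# One column per time point; colMeans() then gives the mean at each point.
sizes <- cbind(baseline, dAC, InterReg, PreSurg)
means <- colMeans(sizes, na.rm = TRUE)

# Per-patient change between the first and last time point.
change <- PreSurg - baseline

print(means)
print(change)

# Connected dots, labelled with the time point names.
plot(means, type = "b", xaxt = "n",
     xlab = "Time point", ylab = "Mean tumor size")
axis(1, at = seq_along(means), labels = names(means))
```

For these four patients the change is negative in every case, but that says nothing yet about the full data set or about statistical significance.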
EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1
My data look like the data above. I want to create a new column emp using the mutate function such that:
emp = average * FIRMTOT if EMPLTOT_N / FIRMTOT < min
and emp = EMPLTOT_N if EMPLTOT_N / FIRMTOT > min
In your sample data EMPLTOT_N / FIRMTOT is never less than min, but this should work:
df <- read.table(text = "EMPLTOT_N FIRMTOT average min
12289593 4511051 5 1
26841282 1074459 55 10
15867437 81243 300 100
6060684 8761 750 500
52366969 8910 1000 1000
137003 47573 5 1
226987 10372 55 10
81011 507 300 100
23379 52 750 500
13698 42 1000 1000
67014 20397 5 1", header = TRUE)
library('dplyr')
mutate(df, emp = ifelse(EMPLTOT_N / FIRMTOT < min, average * FIRMTOT, EMPLTOT_N))
In the above if EMPLTOT_N / FIRMTOT == min, emp will be given the value of EMPLTOT_N since you didn't specify what you want to happen in this case.
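Since no row of the sample data triggers the first branch, here is a small check on made-up rows that cover all three cases (ratio above, below, and equal to min). The data frame toy and its values are hypothetical and only for illustration; the sketch uses base R ifelse() so it runs without dplyr, with the same condition as the mutate() call above.

```r
# Hypothetical rows (not from the question): ratio > min, ratio < min, ratio == min.
toy <- data.frame(
  EMPLTOT_N = c(100, 40, 50),
  FIRMTOT   = c(50, 50, 50),
  average   = c(5, 5, 5),
  min       = c(1, 1, 1)
)

# Employees per firm, compared row by row against min.
ratio <- toy$EMPLTOT_N / toy$FIRMTOT

# Same rule as in the mutate() call: replace only when the ratio is below min.
toy$emp <- ifelse(ratio < toy$min, toy$average * toy$FIRMTOT, toy$EMPLTOT_N)

print(toy)
```

Only the second row is replaced by average * FIRMTOT; the first and third keep EMPLTOT_N.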