I am trying to plot the performance of individuals on a number of tasks. The performance is rated in categories, and I would like to show each individual's overall performance as a stacked bar chart where Y represents the percentage of answers in each performance category, with positive values for good and negative values for bad performance or missing answers. Here's a toy dataset and the current plot I've managed to produce:
df <- data.frame(SRC = rep(LETTERS[1:14],each=6),
CAT = rep(c("Excellent","VeryGood","Good","Poor","Failing","Missing"),times=14),
PERCENT = c(29.3, 23.3, 30, -13.3, -4, 0, 16.7, 15.3, 38.7, -14.7, -4.7,
-10, 12, 9.3, 30.7, -19.3, -19.3, -9.3, 2.7, 6.7, 20, -23.3,
-14, -33.3, 16, 23.3, 20.7, -10.7, -9.3, -20, 24.7, 22, 12.7,
-8, -2, -30.7, 14, 15.3, 23.3, -4, -4.7, -38.7, 4.7, 6, 60, -24,
-4.7, -0.7, 8, 13.3, 57.3, -16, -3.3, -2, 8, 11.3, 62, -12.7,
-5.3, -0.7, 9.3, 14.7, 64.7, -10, -1.3, 0, 20.1, 20.9, 32.5,
-1.5, 0, 0, 14.2, 10.4, 33.2, -6.6, -2.8, 0, 14.7, 18.7, 55.3,
-10.7, 0, -0.7))
df$CAT <- ordered(df$CAT,levels=c("Excellent","VeryGood","Good","Poor","Failing","Missing"))
library(ggplot2)
ggplot(df, aes(x = SRC, y = PERCENT, fill = CAT, group = CAT)) +
  geom_bar(position = "stack", stat = "identity")
This is the resulting figure (image omitted).
It's almost what I want, except that the categories stack in reverse order for negative values. I want the negative bars to stack according to the factor levels and the fill legend as well, i.e. Poor > Failing > Missing. This has surely come up before, but I couldn't find a solution here or elsewhere. Thanks in advance!
Using the forcats package, relevel the factor so the negative categories are listed in reverse order (stacking draws them outward from zero, so this makes them display as Poor > Failing > Missing):
library(dplyr)
library(forcats)
library(ggplot2)
df %>%
  mutate(CAT = fct_relevel(CAT, "Excellent", "VeryGood", "Good",
                           "Missing", "Failing", "Poor")) %>%
  ggplot(aes(x = SRC, y = PERCENT, fill = CAT)) +
  geom_bar(position = "stack", stat = "identity")
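Note that releveling also reorders the fill legend. If you'd rather keep the legend in the original reading order (Excellent first, Missing last) while the stacking order differs, you can override the legend breaks; a sketch using `scale_fill_discrete(breaks = ...)`, assuming the default discrete fill scale:

```r
library(dplyr)
library(forcats)
library(ggplot2)

df %>%
  mutate(CAT = fct_relevel(CAT, "Excellent", "VeryGood", "Good",
                           "Missing", "Failing", "Poor")) %>%
  ggplot(aes(x = SRC, y = PERCENT, fill = CAT)) +
  geom_bar(position = "stack", stat = "identity") +
  # stack order follows the releveled factor; the legend keeps reading order
  scale_fill_discrete(breaks = c("Excellent", "VeryGood", "Good",
                                 "Poor", "Failing", "Missing"))
```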
My imported data set consists of predetermined ranges and their probability density values, which I have plotted as a bar chart in R. So my plot shows a histogram, but to R it's just a bar plot. I now need to overlay a curve on this bar chart for visualization purposes, using the same data as the bar chart.
The code I have used so far produces an odd-looking curve that doesn't fit the bar chart properly. Any help would be hugely appreciated!
Code used so far:
barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight",
        ylab = "Probability Density", ylim = c(0, 0.05), main = "Histogram")
fit1 <- smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df = 12, spar = 0.2)
lines(fit1, col = "blue", lwd = 3)
Data:
Data10 <- structure(list(
ï..Weight.Range = c(0, 0.5, 1, 1.5, 2, 2.5, 3,
3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5,
11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17,
17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21, 21.5, 22, 22.5, 23, 23.5,
24, 24.5, 25, 25.5, 26, 26.5, 27, 27.5, 28, 28.5, 29, 29.5, 30,
30.5, 31, 31.5, 32, 32.5, 33, 33.5, 34, 34.5, 35, 35.5, 36, 36.5,
37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41, 41.5, 42, 42.5, 43,
43.5, 44, 44.5, 45, 45.5, 46, 46.5, 47, 47.5, 48), pdf = c(0.012697609,
0.015237131, 0.017776653, 0.019046414, 0.020694512, 0.022575831,
0.024457151, 0.02633847, 0.028219789, 0.030101109, 0.031982428,
0.033863747, 0.035745066, 0.037626386, 0.039507705, 0.041389024,
0.043270343, 0.045151663, 0.042420729, 0.03688759, 0.033198831,
0.029510072, 0.026374627, 0.023976934, 0.02264407, 0.021614794,
0.020585518, 0.019556242, 0.018526967, 0.017497691, 0.016468415,
0.015439139, 0.014409863, 0.013380587, 0.012351311, 0.011322035,
0.009839476, 0.008433837, 0.007731017, 0.007028197, 0.005622558,
0.004919738, 0.004568328, 0.004498046, 0.004427764, 0.004357482,
0.0042872, 0.004216918, 0.004146636, 0.004076354, 0.004006072,
0.00393579, 0.003865508, 0.003795226, 0.003724944, 0.003654663,
0.003584381, 0.003514099, 0.003443817, 0.003373535, 0.003303253,
0.003232971, 0.003162689, 0.003092407, 0.003022125, 0.002951843,
0.002881561, 0.002811279, 0.002740997, 0.002670715, 0.002600433,
0.002530151, 0.002459869, 0.002389587, 0.002319305, 0.002249023,
0.002178741, 0.002108459, 0.002038177, 0.001967895, 0.001897613,
0.001827331, 0.001757049, 0.001686767, 0.001616485, 0.001546203,
0.001475921, 0.001405639, 0.001335357, 0.001265075, 0.001194794,
0.001124512, 0.00105423, 0.000983948, 0.000913666, 0.000843384,
0.000773102)
), class = "data.frame", row.names = c(NA, -97L))
You need to feed the bar midpoints returned by the initial barplot() call into lines():
my_bar <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight",
                  ylab = "Probability Density", ylim = c(0, 0.05), main = "Histogram")
fit1 <- smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df = 12, spar = 0.2)
lines(my_bar, fit1$y, col = "blue", lwd = 3)
The barplot function is meant to be used with a categorical variable: it treats your x values as categories rather than as continuous numbers. When barplot runs, it calculates an x-coordinate for each category, which it silently (invisibly) returns. You can use those returned coordinates together with your smoothing-spline fit to draw the line. For example:
xx <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight",
              ylab = "Probability Density", ylim = c(0, 0.05), main = "Histogram")
fit1 <- smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df = 12, spar = 0.2)
lines(xx[, 1], fit1$y, col = "blue", lwd = 3)
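Since the bar midpoints are evenly spaced, you can also get a smoother curve by evaluating the spline on a fine grid in data units and then mapping those x values onto the barplot's coordinate system with approx(). A sketch (grid_x, pred, and bar_x are names introduced here, and 200 grid points is arbitrary):

```r
xx <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight",
              ylab = "Probability Density", ylim = c(0, 0.05), main = "Histogram")
fit1 <- smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df = 12, spar = 0.2)

# evaluate the spline on a fine grid in data units
grid_x <- seq(min(Data10$ï..Weight.Range), max(Data10$ï..Weight.Range),
              length.out = 200)
pred <- predict(fit1, grid_x)

# linearly map data-unit x onto the barplot's midpoint coordinates
bar_x <- approx(Data10$ï..Weight.Range, xx[, 1], xout = grid_x)$y
lines(bar_x, pred$y, col = "blue", lwd = 3)
```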
Why does a fixed intercept lead to a huge negative shift? See the red line.
From the docs, ?poly:
Returns or evaluates orthogonal polynomials of degree 1 to degree over
the specified set of points x: these are all orthogonal to the
constant polynomial of degree 0.
Thus, I would expect the polynomial of degree 0 to be the intercept. What am I missing?
plot(df$t, df$y)
# this is working as expected
model1 <- lm(y ~ -1 + poly(t, 10, raw = TRUE), data = df)
model2 <- lm(y ~ -1 + poly(t, 10, raw = FALSE), data = df)
model3 <- lm(y ~ poly(t, 10, raw = TRUE), data = df) # raw = FALSE gives similar results
nsamples <- 1000
new_df <- data.frame(t = seq(0, 96, length.out = nsamples))
new_df$y1 <- predict(model1, newdata = new_df)
new_df$y2 <- predict(model2, newdata = new_df)
new_df$y3 <- predict(model3, newdata = new_df)
plot(new_df$t, new_df$y1, type = "l", ylim = c(-0.5, 1))
lines(new_df$t, new_df$y2, col = "red")
lines(new_df$t, new_df$y3 + 0.05, col = "blue") # offset added for visibility!
lines(c(0, 96), -c(mean(df$y), mean(df$y)), col = "red")
Edit: I think the question is equivalent to asking "which orthogonal polynomials are used (what is the formula)?". The reference in the docs is a really old book that I can't get hold of, and there are a lot of different orthogonal polynomials, see e.g. Wikipedia.
Data:
df <- structure(list(t = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5,
8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5,
19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5,
30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5,
41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 51.5,
52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5,
63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5,
74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5,
85.5, 86.5, 87.5, 88.5, 89.5, 90.5, 91.5, 92.5, 93.5, 94.5, 95.5),
y = c(0.00561299852289513, 0.0117183653372723, 0.0171836533727228,
0.0234367306745446, 0.0280157557853274, 0.0331856228458887, 0.0391432791728213,
0.0438700147710487, 0.048793697685869, 0.0539635647464303, 0.0586903003446578,
0.0630723781388479, 0.0681437715411128, 0.0732151649433777, 0.0780403741999015,
0.0813884785819793, 0.085425898572132, 0.0896110290497292, 0.0934022648941408,
0.0968980797636632, 0.0996061053668144, 0.103495814869522, 0.107631708517971,
0.111176760216642, 0.115017232890202, 0.119350073855244, 0.124766125061546,
0.131216149679961, 0.139586410635155, 0.148153618906942, 0.156080748399803,
0.166814377154111, 0.177006400787789, 0.189118660758247, 0.202412604628262,
0.217577548005908, 0.234318069916297, 0.249089118660758, 0.267355982274741,
0.284539635647464, 0.301477104874446, 0.316100443131462, 0.332151649433776,
0.346873461349089, 0.361792220580995, 0.376366322008863, 0.392220580994584,
0.408173313638602, 0.424224519940916, 0.439192516001969, 0.454849827671098,
0.471196454948301, 0.485622845888725, 0.500443131462334, 0.514869522402757,
0.529148202855736, 0.544559330379124, 0.559773510585918, 0.576218611521418,
0.593303791235844, 0.609010339734121, 0.623929098966027, 0.6397341211226,
0.655489906450025, 0.669768586903003, 0.68493353028065, 0.698867552929591,
0.713244707040867, 0.726095519448548, 0.74027572624323, 0.752584933530281,
0.76903003446578, 0.781486952240276, 0.794091580502216, 0.804726735598227,
0.818217626784835, 0.832742491383555, 0.845691777449532, 0.856179222058099,
0.866075824716888, 0.875923190546529, 0.886952240275726, 0.896898079763663,
0.906203840472674, 0.915755785327425, 0.923879862136878, 0.932693254554407,
0.940768094534712, 0.949187592319055, 0.956523879862137, 0.964204825209257,
0.971344165435746, 0.978532742491384, 0.986558345642541, 0.993205317577548, 1)),
class = "data.frame", row.names = c(NA, -96L))
Just think about a regression line. For (x, y) data, let xx = mean(x) and yy = mean(y). Fitting
y = b * (x - xx)
is different from fitting
y = a + b * (x - xx)
and that a (intercept) measures the vertical shift. Furthermore, it can be shown that a = yy.
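This can be checked numerically. The sketch below (toy data, not the asker's df) shows both halves of the argument: the columns of poly() sum to zero, i.e. they are orthogonal to the constant, so dropping the intercept with ~ -1 removes the mean level entirely; and with centered x, the fitted intercept a equals mean(y) exactly:

```r
# poly() columns are orthogonal to the degree-0 constant: each sums to ~0
x <- seq(0.5, 95.5, by = 1)
P <- poly(x, 3)
stopifnot(all(abs(colSums(P)) < 1e-8))

# with centered x, the OLS intercept is exactly mean(y)
set.seed(1)
y <- 0.01 * x + rnorm(length(x), sd = 0.005)
m <- lm(y ~ I(x - mean(x)))
stopifnot(isTRUE(all.equal(unname(coef(m)[1]), mean(y))))
```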
I am trying to output a stem-and-leaf plot to a graphics device. It outputs fine, but the problem is that only part of the plot shows in the device window. How can I scale the plot to fit the window?
library(aplpack)
plot.new()
flint <- c(44.6, 25.7, 33.2, 48.3, 39.4, 43.5, 39.8, 40.5, 91.7, 29.3,
39.1, 42.5, 49.6, 40.6, 49.1, 41.7, 30.2, 40.0, 31.9, 42.3,
47.2, 50.5, 44.1, 45.8)
chert <- c(25.8, 6.3, 21.3, 20.6, 22.2, 10.5, 18.9, 25.9, 23.8, 22.0,
10.6, 16.8, 21.8, 15.8, 16.3, 21.7, 17.9, 13.7, 19.1, 15.2,
21.2, 20.2, 10.6, 23.1)
dev.list()
dev.set(2)
tmp <- capture.output(stem.leaf.backback(flint, chert, unit = .1, rule.line = "Dixon"))
text(0, 1, paste(tmp, collapse = '\n'), adj = c(0, 1), family = 'mono')
Output to a file instead of to the screen tends to be more reproducible when it comes to controlling the size of the "paper" you draw on. If you are happy with, for example, a PDF file of the stem plot, something like this works fine:
library(aplpack)
flint <- c(44.6, 25.7, 33.2, 48.3, 39.4, 43.5, 39.8, 40.5, 91.7, 29.3, 39.1, 42.5, 49.6,
40.6, 49.1, 41.7, 30.2, 40.0, 31.9, 42.3, 47.2, 50.5, 44.1, 45.8)
chert <- c(25.8, 6.3, 21.3, 20.6, 22.2, 10.5, 18.9, 25.9, 23.8, 22.0, 10.6, 16.8, 21.8,
15.8, 16.3, 21.7, 17.9, 13.7, 19.1, 15.2, 21.2, 20.2, 10.6, 23.1)
You can specify the PDF page width and height (in inches). Say 14" by 7".
pdf(file = "stemplot.pdf", width = 14, height = 7)
plot.new()
tmp <- capture.output(stem.leaf.backback(flint, chert, unit = .1, rule.line = "Dixon"))
text(0, 1, paste(tmp, collapse='\n'), adj = c(0,1), family = 'mono')
dev.off()
Of course this changes the page size instead of scaling the plot, so it is not exactly what you originally asked for...
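If you do need it on screen, one option is to shrink the text until it fits the current device, computing a cex from the rendered string size via strwidth()/strheight(). A sketch (fit_cex is a name introduced here, and the 0.95 margin factor is arbitrary; it assumes the default plot region spanning [0, 1] in both directions):

```r
library(aplpack)

tmp <- capture.output(stem.leaf.backback(flint, chert, unit = .1,
                                         rule.line = "Dixon"))
plot.new()  # a device must be open before strwidth()/strheight() can be used

# widest line and total block height at cex = 1, in user coordinates
w <- max(strwidth(tmp, family = "mono"))
h <- length(tmp) * strheight("X", family = "mono")

# scale so the whole block fits in the [0, 1] x [0, 1] plot region
fit_cex <- 0.95 * min(1 / w, 1 / h)
text(0, 1, paste(tmp, collapse = "\n"), adj = c(0, 1),
     family = "mono", cex = fit_cex)
```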
In case output to file is acceptable, here are
10 tips for making your R graphics look their best.
I am new to contour plots in R and I am trying to create one to show changes in nutrient concentration with depth and salinity.
My dataset currently looks like this:
> head(DF)
salinity depth silicon
1 32.9 0.00 3.872717
2 32.9 0.00 3.906963
3 32.9 0.00 3.872717
4 33.4 3.56 3.119292
5 33.5 3.56 3.076484
6 33.0 0.00 3.675799
What I would like is for depth to be on the y-axis, salinity on the x-axis and the silicon concentration to be displayed based on colour.
From what I have read, in order to create a contour plot I need to turn the data I currently have into a matrix (by creating a function?).
Is this something that can be achieved? I'm not sure if I am going about this completely the wrong way, but essentially what I would like is a filled contour plot like a typical depth-time temperature section, only with salinity instead of time and silicon concentration instead of temperature.
Thanks,
Kez
Copy-pastable data:
DF <- structure(list(salinity = c(32.9, 32.9, 32.9, 33.4, 33.5, 33,
33, 33.2, 33.3, 33.1, 33.1, 33.1, 33.7, 33.7, 34, 34, 34, 33.6,
34.3, 34.3, 34.8, 35.8, 34.7, 34.4, 34.3, 34.5, 34.4, 34.9, 34.9,
34.9, 34.8, 35, 35, 36, 34.9, 35, 35.2, 35.1, 30.2, 33.4, 34.5,
34.9, 33.4, 33.4, 35.1, 35.1, 34.6, 35.1, 34.43, 34.67, 34.67,
34.96, 34.76, 35.11, 34.14, 34.97, 25.13, 35.16, 35.11, 35.11,
35.11, 35.15), depth = c(0, 0, 0, 3.56, 3.56, 0, 0, 4.493, 4.493,
0, 0, 0, 4.362, 4.362, 9.9, 9.9, 0, 0, 5.826, 5.826, 11.725,
11.725, 11.725, 0, 0, 2.766, 2.766, 9.355, 9.355, 0, 0, 12.46,
12.46, 12.46, 0, 0, 12.427, 12.427, 1.2, 3.6, 6.2, 11, 1.1, 1.1,
4.2, 12.8, 6.9, 10.4, 1.16, 4.5, 4.5, 15.35, 1.13, 8.25, 17.92,
1.05, 14.25, 20.54, 0.97, 0.97, 7.67, 19.6), silicon = c(3.872716895,
3.90696347, 3.872716895, 3.119292237, 3.076484018, 3.675799087, 3.855593607,
3.547374429, 3.299086758, 4.591894977, 4.566210046, 4.857305936, 2.759703196,
2.5456621, 2.597031963, 2.126141553, 2.417237443, 2.331621005, 1.989155251,
1.835045662, 1.946347032, 1.937785388, 1.526826484, 1.638127854, 1.929223744,
1.698059361, 1.894977169, 1.312785388, 1.698059361, 1.329908676, 1.484018265,
1.621004566, 1.175799087, 1.167237443, 1.218607306, 1.038812785, 1.552511416,
1.141552511, 5.329861111, 1.684027778, 2.612847222, 1.840277778, 1.588541667,
1.553819444, 2.682291667, 1.692708333, 1.111111111, 1.935763889, 0.815972222,
1.197916667, 1.197916667, 1.796875, 1.258680556, 1.059027778, 1.25, 0.512152778,
1.336805556, 1.284722222, 0.998263889, 0.928819444, 0.399305556, 1.814236111
)), .Names = c("salinity", "depth", "silicon"), class = "data.frame", row.names = c(NA,
-62L))
EDIT: For anyone interested, with the help of Frank's post below I was able to create the following with my full data set:
You can use the interp function from the akima package to interpolate. Otherwise, you have to determine how to deal with areas that have missing data.
library(akima)
s <- interp(DF$salinity, DF$depth, DF$silicon, duplicate="mean",
xo=seq(min(DF$salinity), max(DF$salinity), length=50),
yo=seq(min(DF$depth), max(DF$depth), length=50))
# you can choose values other than length = 50.
# Note that I used duplicate = "mean", but you can pick your own way of handling duplicates
Then, there are a number of options for plotting, each with lots of room for customization. Here are a few choices:
filled.contour(s, color = terrain.colors)
image(s, col=rainbow(60))
library(fields); image.plot(s)
library(ggplot2)
ggs <- data.frame(salinity = rep(s$x, each = length(s$y)),
                  depth = rep(s$y, times = length(s$x)),
                  silicon = as.vector(t(s$z)))
p <- ggplot(ggs, aes(salinity, depth, fill=silicon))
p + geom_raster() + scale_fill_continuous(low="green", high="red") + theme_bw()
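Since depth conventionally increases downward in oceanographic sections, you may also want to flip the y-axis and overlay contour lines on the raster. A sketch building on the ggplot object p above (bins = 8 is an arbitrary choice; cells outside the data's convex hull are NA after interpolation and are simply dropped by geom_contour):

```r
library(ggplot2)

p + geom_raster() +
  geom_contour(aes(z = silicon), colour = "black", bins = 8) +
  scale_fill_continuous(low = "green", high = "red") +
  scale_y_reverse() +  # depth increases downward
  theme_bw()
```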