First of all, I would like to apologise if I do not use the correct jargon.
I have the dataset below, which contains a wide range of categories.
Here is an excerpt from dput (using droplevels):
structure(list(
x = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L), # note: the full data contains more years than 2010
y = c(7.85986, 185.81068, 107.24097, 7094.74649,
1.4982, 185.77319, 5090.79354, 167.58584, 4189.64609, 157.08277,
3927.06932, 2.86732, 71.683, 4.70123, 117.53085, 2.93452, 73.36292,
1.4982, 18.18734, 901.14744, 0.90268, 13.77532, 613.38298, 0.01845,
0.0681, 7.19925, 3.75315, 0.14333, 136.54008, 0.04766, 0.59077,
28.97255, 0.38608, 115.05258, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
x1 = structure(c(4L, 2L, 3L, 1L, 4L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 4L, 2L, 1L, 4L, 2L, 1L, 4L, 2L,
1L, 2L, 4L, 1L, 4L, 2L, 1L, 4L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L), .Label = c("All greenhouse gases - (CO2 equivalent)",
"CH4", "CO2", "N2O"), class = "factor"),
x2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Austria",
class = "factor"),
x4 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L,
10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 14L, 14L,
15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L), .Label = c("3",
"3.1", "3.A", "3.A.1", "3.A.2", "3.A.3", "3.A.4", "3.B",
"3.B.1", "3.B.2", "3.B.3", "3.B.4", "3.B.5", "3.C", "3.C.1",
"3.C.2", "3.C.3", "3.C.4"), class = "factor")), class = "data.frame",
row.names = c(NA,
-44L))
I want to know whether the sum of the subcategories in x4 (e.g. 3.B.1 + 3.B.2 + ... + 3.B.n) equals the figure stated for the parent category (e.g. 3.B), i.e. the total given in the csv, for a given year and country. In other words, I want to verify the sums.
To get the sum of the subcategories I have this:
sum(df$y[df$x4 %in% c("3.A.1", "3.A.2", "3.A.3", "3.A.4") & df$x ==
2010 & df$x2 == "Austria"])
To get the sum of the parent category I have this:
sum(df$y[df$x4 %in% c("3.A") & df$x == 2010 & df$x2 == "Austria"])
Next I need an operation which checks whether the two results are equal (TRUE/FALSE). However, I have more than 20 countries, 20 years and dozens of categories to check. With my newbie approach I would be writing code for ages...
Is there any way to automate this? Basically, I am looking for code which is able to do the following:
1) Run for one category, then go to the next one
2) Once done with the categories, change the year and start again with the categories
3) ... same for countries ...
Any sort of help would be appreciated, as would suggestions on how to use the right jargon in the title. Thanks in any case.
Here's a potential solution using dplyr (might require some tweaking based on the full dataset):
require(dplyr)
# Create two columns - one that shows only the parent category number, and one that tells you if it's a parent or child; note that the regex here makes some assumptions on the format of your data.
mutate(df,parent=gsub("(.?\\..?)\\..*", "\\1", df$x4),
type=ifelse(parent==x4,"Parent","Child")) %>%
# Sum the children y's by category, year and country
group_by(parent, type, x, x2) %>%
summarize(sum(y)) %>%
# See if the sum of the children is equal to the parent y
tidyr::spread(type,`sum(y)`) %>%
mutate(equals=isTRUE(all.equal(Child,Parent)))
Result using your (new) data:
parent x x2 Child Parent equals
<chr> <int> <fct> <dbl> <dbl> <lgl>
1 3 2010 Austria NA 7396. FALSE
2 3.1 2010 Austria NA 5278. FALSE
3 3.A 2010 Austria 4357. 4357. TRUE
4 3.B 2010 Austria 921. 921. TRUE
5 3.C 2010 Austria 0 0 TRUE
I can see from your new data that you have two levels of parents. My solution will only work for the second level (e.g. 3.1 and its children), but can be easily tweaked to also work for the top level.
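For comparison, here is a minimal base R sketch of the same parent/child check. It assumes codes shaped like "3.A" (parent) and "3.A.1" (child); the toy data frame, the tolerance of 1e-6 and the helper column names (is_child, y_parent, y_child) are made up for illustration. It compares the sums with an explicit tolerance, which is vectorised over rows, whereas all.equal() compares whole vectors at once.

```r
# Toy data in the same shape as the question (values invented for illustration)
toy <- data.frame(
  x  = 2010L,
  x2 = "Austria",
  x4 = c("3.A", "3.A.1", "3.A.2", "3.B", "3.B.1", "3.B.2"),
  y  = c(10, 4, 6, 5, 2, 2.5),
  stringsAsFactors = FALSE
)

# Parent code: keep only the first two segments of three-segment codes
toy$parent   <- sub("^([^.]+\\.[^.]+)\\..*$", "\\1", toy$x4)
toy$is_child <- toy$parent != toy$x4

# Sum the children and pick up the stated parent value per category, year, country
child_sums  <- aggregate(y ~ parent + x + x2, data = toy[toy$is_child, ],  FUN = sum)
parent_vals <- aggregate(y ~ parent + x + x2, data = toy[!toy$is_child, ], FUN = sum)

# Put both side by side and compare row by row with a small tolerance
check <- merge(parent_vals, child_sums, by = c("parent", "x", "x2"),
               suffixes = c("_parent", "_child"))
check$equals <- abs(check$y_parent - check$y_child) < 1e-6
check
```

In the toy data the children of 3.A add up to the stated total while those of 3.B do not, so the two rows of check get different values in equals. Parents without any children drop out of the merge; pass all.x = TRUE to keep them.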
I have a dataframe from which I want to produce two US state maps, ideally in plotly, that share the same colorscale, e.g. a value of 1.6 would have the same color on both maps.
library(plotly)
library(tidyverse)
df <- structure(list(stateID = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L,
33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
46L, 47L, 48L, 49L, 50L, 51L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L,
48L, 49L, 50L, 51L), .Label = c("AK", "AL", "AR", "AZ", "CA",
"CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN",
"KS", "KY", "LA", "MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT",
"NC", "ND", "NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR",
"PA", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI",
"WV", "WY"), class = "factor"), category = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("small", "large"), class = "factor"),
values = c(1.49852796690539, 1.149561395403, 1.24566150736064,
1.00189715065062, 0.987562097609043, 0.231859257910401, 1.64033870631829,
0.12613168777898, 1.91596280829981, 1.35835290746763, 0.792609711177647,
1.97339694341645, 0.376533678267151, 1.63516522897407, 0.225585919804871,
0.505839260760695, 0.0116439824923873, 1.51879569422454,
1.35506999026984, 0.907982874661684, 1.08905055327341, 1.43553646048531,
1.88463490083814, 0.908853257540613, 0.902149567846209, 1.5428032441996,
1.62871437706053, 0.623223986010998, 1.38058050535619, 1.57016781903803,
1.8731079204008, 0.297486190218478, 1.27625703299418, 0.519723014440387,
0.848097313195467, 0.471342021599412, 1.98357644258067, 1.82963251601905,
0.626917572226375, 0.140670910011977, 1.1393640646711, 1.99097026558593,
1.70624398859218, 0.0956417261622846, 1.53923089429736, 1.25705669261515,
1.89643088867888, 1.88176721381024, 1.44151636445895, 0.435520166531205,
1.18845809064806, 2.95685278996825, 1.37898187059909, 2.95418810285628,
1.94776474731043, 1.98949646670371, 1.37768995529041, 2.0124197602272,
1.33321205340326, 2.07528439210728, 2.49668326042593, 2.69146995106712,
1.52930808160454, 1.90695044351742, 1.74849874572828, 2.3098370959051,
1.59070889744908, 1.52278333622962, 2.13915946660563, 1.52873482694849,
1.39334590546787, 1.82181124016643, 2.44108150294051, 2.21637984598055,
1.99761381838471, 2.33465715777129, 2.53672145865858, 2.98815744556487,
1.09659902472049, 1.13088199263439, 1.91391275310889, 1.07779059326276,
1.03311925474554, 1.25638853525743, 2.84107198892161, 2.64419505093247,
1.85066655138507, 2.65426727430895, 2.37351916264743, 1.45171522675082,
2.54493401898071, 1.40593391284347, 1.82211668742821, 1.36818132549524,
2.88858095230535, 2.54271147632971, 2.90091867512092, 1.5358378039673,
2.86527143837884, 2.71315307915211, 2.09380666306242, 2.02881665108725
)), .Names = c("stateID", "category", "values"), row.names = c(NA,
102L), class = "data.frame")
small <-df %>%
filter(category=="small") %>%
plot_geo(locationmode = 'USA-states') %>%
add_trace(
z = ~values, color = ~values, colors = "Blues",
locations = ~stateID) %>%
layout(geo=list(scope='usa'))
small
large <-df %>%
filter(category=="large") %>%
plot_geo(locationmode = 'USA-states') %>%
add_trace(
z = ~values, color = ~values, colors = "Blues",
locations = ~stateID) %>%
layout(geo=list(scope='usa'))
large
As you can see, the same color scale covers different values in each map. I would like a single color scale with range 0-3. I am not fussy about the colors, though I would like a sequential palette.
TIA
You can set the same scale for each map by setting zmin and zmax to the same values in each plot. It's documented in the help, but it's a bit cryptic: "zmax (number): Sets the upper bound of color domain" (and similarly for zmin).
small <-df %>%
filter(category=="small") %>%
plot_geo(locationmode = 'USA-states') %>%
add_trace(
z = ~values, color = ~values, colors = "Blues",
zmin=0, zmax=3,
locations = ~stateID) %>%
layout(geo=list(scope='usa'), title="Small")
large <-df %>%
filter(category=="large") %>%
plot_geo(locationmode = 'USA-states') %>%
add_trace(
z = ~values, color = ~values, colors = "Blues",
zmin=0, zmax=3,
locations = ~stateID) %>%
layout(geo=list(scope='usa'), title="Large")
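If you would rather not hard-code 0 and 3, the shared limits can be derived from the data before filtering. This is only a sketch in base R: the toy data frame and the name zlim are made up for illustration, and rounding outwards to whole numbers is just one possible choice.

```r
# Toy stand-in for the question's data frame (values invented for illustration)
toy <- data.frame(
  stateID  = c("AK", "AL", "AK", "AL"),
  category = c("small", "small", "large", "large"),
  values   = c(0.13, 1.97, 1.03, 2.99)
)

# One range over *all* rows (both categories), widened to whole numbers
zlim <- c(floor(min(toy$values)), ceiling(max(toy$values)))
zlim

# Then, in each add_trace() call, use: zmin = zlim[1], zmax = zlim[2]
```

Computing the range on the unfiltered data is the important part: if it were computed after filter(), each map would again get its own scale.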
I have a dataframe like this one:
> dput(df)
structure(list(OBBLIGATORIO = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no",
"yes"), class = "factor"), COUNTRY = structure(c(16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L), .Label = c("Austria", "Belgium", "Bulgaria",
"Croatia", "Cyprus", "Czech Republic", "Denmark", "Estonia",
"Finland", "France", "Germany", "Greece", "Hungary", "Iceland",
"Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
"Norway", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden", "United Kingdom of Great Britain and Northern Ireland"
), class = "factor"), YEAR = c(2003L, 2006L, 2007L, 2008L, 2009L,
2010L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L,
2003L, 2006L, 2007L, 2008L, 2009L, 2010L, 1995L, 1996L, 1997L,
1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2006L, 2007L, 2008L,
2009L, 2010L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L,
2002L, 2003L, 2006L, 2007L, 2008L, 2009L, 2010L, 1995L, 1996L,
1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2006L, 2007L,
2008L, 2009L, 2010L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2006L, 2007L, 2008L, 2009L, 2010L, 1995L,
1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2006L,
2007L, 2008L, 2009L, 2010L, 1995L, 1996L, 1997L, 1998L, 1999L,
2000L, 2001L, 2002L), AGE = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"),
`CAUSE OF DEATH` = c("Acute poliomyelitis", "Acute poliomyelitis",
"Acute poliomyelitis", "Acute poliomyelitis", "Acute poliomyelitis",
"Acute poliomyelitis", "Acute poliomyelitis", "Acute poliomyelitis",
"Acute poliomyelitis", "Acute poliomyelitis", "Acute poliomyelitis",
"Acute poliomyelitis", "Acute poliomyelitis", "Acute poliomyelitis",
"Diphtheria", "Diphtheria", "Diphtheria", "Diphtheria", "Diphtheria",
"Diphtheria", "Diphtheria", "Diphtheria", "Diphtheria", "Diphtheria",
"Diphtheria", "Diphtheria", "Diphtheria", "Diphtheria", "Measles",
"Measles", "Measles", "Measles", "Measles", "Measles", "Measles",
"Measles", "Measles", "Measles", "Measles", "Measles", "Measles",
"Measles", "Tetanus", "Tetanus", "Tetanus", "Tetanus", "Tetanus",
"Tetanus", "Tetanus", "Tetanus", "Tetanus", "Tetanus", "Tetanus",
"Tetanus", "Tetanus", "Tetanus", "Tuberculosis", "Tuberculosis",
"Tuberculosis", "Tuberculosis", "Tuberculosis", "Tuberculosis",
"Tuberculosis", "Tuberculosis", "Tuberculosis", "Tuberculosis",
"Tuberculosis", "Tuberculosis", "Tuberculosis", "Tuberculosis",
"Viral hepatitis", "Viral hepatitis", "Viral hepatitis",
"Viral hepatitis", "Viral hepatitis", "Viral hepatitis",
"Viral hepatitis", "Viral hepatitis", "Viral hepatitis",
"Viral hepatitis", "Viral hepatitis", "Viral hepatitis",
"Viral hepatitis", "Viral hepatitis", "Whooping cough", "Whooping cough",
"Whooping cough", "Whooping cough", "Whooping cough", "Whooping cough",
"Whooping cough", "Whooping cough", "Whooping cough", "Whooping cough",
"Whooping cough", "Whooping cough", "Whooping cough", "Whooping cough"
), VALUE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 1L, 1L, 6L, 7L, 7L, 1L, 2L,
3L, 2L, 5L, 12L, 9L, 13L, 9L, 13L, 8L, 17L, 14L, 16L, 18L,
15L, 19L, 11L, 10L, 25L, 24L, 21L, 22L, 23L, 20L, 34L, 32L,
31L, 30L, 29L, 28L, 27L, 26L, 41L, 42L, 43L, 45L, 46L, 47L,
33L, 35L, 36L, 37L, 38L, 39L, 40L, 44L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 1L), .Label = c("0", "1",
"2", "3", "6", "7", "9", "17", "18", "19", "21", "22", "27",
"28", "30", "31", "37", "41", "42", "301", "329", "333",
"344", "350", "396", "413", "415", "460", "517", "558", "597",
"609", "622", "647", "681", "1087", "1349", "1413", "1448",
"1499", "1576", "1654", "1725", "1948", "2531", "2665", "2757"
), class = "factor"), ID = 1:98), .Names = c("OBBLIGATORIO",
"COUNTRY", "YEAR", "AGE", "CAUSE OF DEATH", "VALUE", "ID"), row.names = c(NA,
-98L), class = "data.frame")
I want to obtain a chart in which:
the x axis shows the values from the YEAR column
the y axis shows the values from the VALUE column
the data are divided by the CAUSE OF DEATH column
So something like:
I tried:
x11()
ggplot(df, aes(x = df$`YEAR`, y = df$`VALUE`, fill = df$`CAUSE OF DEATH`, colour = df$`CAUSE OF DEATH`)) +
geom_density(alpha = 0.1) +
xlim(1995, 2010)
But the result is completely different from the one I want.
Thanks
I'm not sure what your actual question is, but one problem with your dataframe is that the VALUE column is currently defined as a factor, not as a numeric. I think that remedying this will go a long way to solving your problem. I do this after the fact below (i.e. after the dataframe has already been created), but if you are getting the data into R via read.table() or a similar command, you can specify the class of your columns when the data frame is created, which is probably a better approach.
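To see why the conversion has to go through as.character(), here is a small base R illustration with a made-up factor f. Calling as.numeric() directly on a factor returns the internal level codes rather than the numbers shown in the labels.

```r
# A factor whose labels look like numbers (made-up values)
f <- factor(c("0", "17", "301", "17"))

codes  <- as.numeric(f)                # internal level codes: 1 2 3 2
values <- as.numeric(as.character(f))  # the actual numbers: 0 17 301 17

codes
values
```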
In my code below I use the dplyr package for manipulating dataframes. It's quite powerful, but for this particular example it isn't doing anything that base R couldn't do.
require(ggplot2)
require(dplyr)
require(magrittr)
df <- ### YOUR dput output goes here ###
# fix the problem with the `VALUE` column
df %<>% mutate(VALUE = VALUE %>% as.character %>% as.numeric)
# equivalent in base R:
# df$VALUE <- as.numeric(as.character(df$VALUE))
# make a graph (is it the one you want?)
df %>% group_by(YEAR, `CAUSE OF DEATH`) %>%
summarize(value = sum(VALUE)) %>%
ggplot(aes(x = YEAR, y = value, color = `CAUSE OF DEATH`)) +
geom_line() +
theme_bw() +
geom_point()
# save graph for uploading to SO
ggsave('SO37230266.png')
The result is this graph:
I am making a bar plot using lattice in R, where I have data for 4 different years on sources of irrigation for different states. With my code the bar plot comes out fine, but I would like the bar for the year 1996 to be plotted first, followed by the bar for 2001 and so on, so as to show the increasing area irrigated by tube-wells. However, I am unable to change the ordering. Here is my data and the R code. Many thanks for your help.
# sample data
irr_atlas <- structure(list(state = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("ANDHRA PRADESH",
"KARNATAKA", "MADHYA PRADESH", "RAJASTHAN"), class = "factor"),
st_code = c(28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L,
28L, 28L, 28L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 29L,
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L,
23L, 23L, 23L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L), year = c(1996L, 1996L, 1996L, 1996L,
2001L, 2001L, 2001L, 2001L, 2006L, 2006L, 2006L, 2006L, 2011L,
2011L, 2011L, 2011L, 1996L, 1996L, 1996L, 1996L, 2001L, 2001L,
2001L, 2001L, 2006L, 2006L, 2006L, 2006L, 2011L, 2011L, 2011L,
2011L, 1996L, 1996L, 1996L, 1996L, 2001L, 2001L, 2001L, 2001L,
2006L, 2006L, 2006L, 2006L, 2011L, 2011L, 2011L, 2011L, 1996L,
1996L, 1996L, 1996L, 2001L, 2001L, 2001L, 2001L, 2006L, 2006L,
2006L, 2006L, 2011L, 2011L, 2011L, 2011L), irr_area = c(1.84066,
0.942819, 0.82886, 0.853502, 1.54922, 0.825659, 0.542492,
1.53412, 1.72969, 0.70271, 0.637221, 1.53894, 1.99893, 0.678425,
0.819829, 1.70708, 0.921594, 0.231669, 0.316999, 0.358529,
0.91339, 0.207157, 0.426549, 0.481061, 0.921255, 0.18192,
0.426145, 0.547193, 0.930802, 0.148065, 0.377149, 1.51843,
1.59425, 0.112145, 2.67683, 0.540054, 1.48056, 0.030502,
1.63696, 0.563948, 1.12595, 0.058667, 2.46494, 1.15004, 1.10444,
0.157069, 2.64378, 2.14177, 1.55814, 0.106623, 2.71347, 0.644683,
1.35746, 0.030586, 2.41845, 0.935234, 1.76933, 0.054374,
2.46197, 1.76918, 1.62587, 0.050299, 2.14737, 2.82708),irr_source = structure(c(1L,2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L,
1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L,
3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L,
4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L,
2L, 4L, 3L), .Label = c("Canal", "Tank", "Tube", "Well"), class = "factor")), .Names = c("state","st_code", "year", "irr_area", "irr_source"), class = "data.frame", row.names = c(NA, -64L))
Code for plot...
library(lattice)
barchart(~irr_area | factor(state) + factor(irr_source),
group=year, data=irr_atlas, auto.key=list(space="right"))
As mentioned, ordering of groups in R graphics is usually determined by the ordering of the factor variable. So, you can reorder your factors with factor and its levels argument.
library(lattice)
barchart(~irr_area | factor(state) + factor(irr_source),
group=factor(year, levels=sort(unique(year), decreasing=T)), # change the order of years
data=irr_atlas, auto.key=list(space="right"))
You can switch it back the other way by setting decreasing=F.
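The effect of the levels argument can be checked on its own, without plotting. A small base R sketch with a made-up year vector (the names f_desc and f_asc are only for illustration):

```r
# Made-up vector of years, as in the irrigation data
year <- c(1996L, 2001L, 2006L, 2011L, 1996L)

# Levels in descending order, as used in the barchart call above
f_desc <- factor(year, levels = sort(unique(year), decreasing = TRUE))
# Default (ascending) order for comparison
f_asc  <- factor(year)

levels(f_desc)  # "2011" "2006" "2001" "1996"
levels(f_asc)   # "1996" "2001" "2006" "2011"
```

The data values are unchanged in both cases; only the order of the levels differs, and that order is what the grouping in the plot follows.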
JD Long helped me with this: question about manual annotation.
But is it possible to do something similar on a facetted plot, such that the label style corresponds to the line style (aesthetics), and in a way that lets me annotate different facets individually?
Some data:
funny <- structure(list(Institution = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("Q-branch",
"Some-Ville", "Spectre"), class = "factor"), Type = structure(c(5L,
6L, 1L, 3L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L,
6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L,
6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L), .Label = c("Korte videregående uddannelser",
"Mammas beer", "Mellemlange videregående uddannelser", "Tastes good",
"Unknown", "Your"), class = "factor"), År = c(2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L), Mndr = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 27L, 27L, 27L,
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L), Data = c(159L,
NA, NA, 23L, 204L, NA, NA, 12L, 256L, NA, NA, 24L, 166L, 6L,
NA, 43L, 228L, NA, NA, 20L, 196L, 11L, NA, 37L, 99L, 14L, 9L,
96L, 147L, 7L, 5L, 91L, 100L, 10L, 7L, 126L, 60L, 17L, 6L, 106L,
78L, 18L, 13L, 140L, 48L, 23L, 5L, 136L)), .Names = c("Institution",
"Type", "År", "Mndr", "Data"), class = "data.frame", row.names = c(NA,
-48L))
And a facetted plot:
ggplot(funny, aes(Mndr, y=Data, group=Type, col=Type)) +
geom_line() +
facet_grid(.~Institution)
Thanks in advance for your help!
The idea is that for each manual annotation you have to define not only the label, but all the variables that define the panel, color, etc. The following code adds two labels in different panels.
pl <- ggplot(funny, aes(Mndr, y=Data, group=Type, col=Type)) + geom_line() +
facet_grid(.~Institution) #your plot
nd <- data.frame(Institution=c("Q-branch","Some-Ville"), #panel
Type=c("Unknown", "Tastes good"), #color
Mndr=c(7,12), #x-coordinate of label
Data= c(170,50), #y-coordinate of label
Text=c("Label 1", "Label 2")) #label text
# add labels to plot:
pl <- pl + geom_text(aes(label=Text), data=nd, hjust=0, show.legend=FALSE)
pl
The show.legend=FALSE option (called legend=FALSE in old ggplot2 releases) ensures that the small a's denoting the text are not added to the legend. You don't need a data frame for the labels; you could use a separate geom_text for each, but I find this way simpler.