I am attempting to use the 'iml' package in R to create plots of SHAP values from a GBM model created in H2O.
When I try to create the R6 Predictor object using Predictor$new(), I get the error: Error : all(feature.class %in% names(feature.types)) is not TRUE.
From this I am guessing that there is something about one of the feature classes that is incorrect, but this is just an educated guess based upon what the error message is literally saying.
Here is a sample of anonymized data (I can't share the real data because it is confidential):
structure(list(dlr_id_cur = c(1, 2), date_eff = structure(c(16014,
15416), class = "Date"), new_vec_ind = structure(c(1L, 1L), .Label = c("NNA",
"UNA"), class = "factor"), cntrct_term = c(9587879614862828,
19), amt_financed = c(9455359, 65561175), reg_payment = c(885288,
389371), acct_stat_cd = structure(c(3L, 3L), .Label = c("11",
"22", "33"), class = "factor"), base_rental = c(1, 626266), down_pymt = c(2,
6654661), car_count = c(5, 1), dur_lease = c(3974, 6466), returned = structure(1:2, .Label = c("00",
"11"), class = "factor"), state = structure(c(10L, 1L), .Label = c("ANA",
"BNA", "CNA", "DNA", "FNA", "GNA", "HNA", "INA", "KNA", "LNA",
"MNA", "NNA", "ONA", "PNA", "QNA", "RNA", "SNA", "TNA", "UNA",
"VNA", "WNA"), class = "factor"), zip = c(34633, 45222), zip_two_digits = structure(c(71L,
36L), .Label = c("00", "01", "02", "03", "04", "05", "06", "07",
"08", "09", "110", "111", "112", "113", "114", "115", "116",
"117", "118", "119", "220", "221", "222", "223", "224", "225",
"226", "227", "228", "229", "330", "331", "332", "333", "334",
"335", "336", "337", "338", "339", "440", "441", "442", "443",
"444", "445", "446", "447", "448", "449", "550", "551", "552",
"553", "554", "555", "556", "557", "558", "559", "660", "661",
"662", "663", "664", "665", "666", "667", "668", "669", "770",
"771", "772", "773", "774", "775", "776", "777", "778", "779",
"880", "881", "882", "883", "884", "885", "886", "887", "888",
"889", "990", "991", "992", "993", "994", "995", "996", "997",
"998", "999", "ANA", "BNA", "CNA", "ENA", "GNA", "HNA", "JNA",
"KNA", "LNA", "MNA", "NNA", "PNA", "RNA", "SNA", "TNA", "VNA"
), class = "factor")
, mod_year_date = c(8156, 6278), vehic_mod_fam_code = structure(c(2L,
2L), .Label = c("BNA", "CNA", "ENA", "MNA", "SNA", "TNA", "VNA",
"XNA"), class = "factor"), mod_class_code = structure(c(4L, 2L
), .Label = c("BNA", "CNA", "ENA", "GNA", "MNA", "RNA", "SNA"
), class = "factor"), count_dl_DL_CDE_CSPS_A_NP = c(945, 337),
DL_CDE_CSPS_A_NP_avg_dl = c(3355188283749626, 8835582388327814
), count_sv_DL_CDE_CSPS_A_NP = c(6532, 8475), DL_CDE_CSPS_A_NP_avg_sv = c(4471193398278526,
6934672627789796), count_dl_NUM_CSPS_INIT_SCR = c(774, 773
), NUM_CSPS_INIT_SCR_avg_dl = c(9468453388562312, 5847816458727333
), count_sv_NUM_CSPS_INIT_SCR = c(2467, 3882), NUM_CSPS_INIT_SCR_avg_sv = c(5857936629789154,
8963457353776469), count_FFV = c(8563, 2566), average_FFV = c(25697792913881564,
13693335921646120), csps_NUM_SV = c(8, 6), avg_SV_rating = c(9817541424596360,
6218928542331853), csps_FFV_ratio = c(23125612473476952,
2), avg_DL_rating = c(2182256921592387, 7668957586431513),
has_DL_rating = c(1, 8), has_bad_DL_rating = c(2, 4), serv_has_MNT = c(7,
3), serv_has_SCP = c(5, 4), serv_has_ELW = c(9, 4), serv_has_LCP = c(7,
1), ro_count = c(6, 1), ro_tot_cust_pay = c(2, 188759), ro_tot_pay = c(3,
764372), date_eff_weekday = structure(c(4L, 3L), .Label = c("FNA",
"MNA", "SNA", "TNA", "WNA"), class = "factor"), date_eff_month_int = c(83,
7), date_eff_day = c(2, 24)), .Names = c("dlr_id_cur", "date_eff",
"new_vec_ind", "cntrct_term", "amt_financed", "reg_payment",
"acct_stat_cd", "base_rental", "down_pymt", "car_count", "dur_lease",
"returned", "state", "zip", "zip_two_digits", "mod_year_date",
"vehic_mod_fam_code", "mod_class_code", "count_dl_DL_CDE_CSPS_A_NP",
"DL_CDE_CSPS_A_NP_avg_dl", "count_sv_DL_CDE_CSPS_A_NP", "DL_CDE_CSPS_A_NP_avg_sv",
"count_dl_NUM_CSPS_INIT_SCR", "NUM_CSPS_INIT_SCR_avg_dl", "count_sv_NUM_CSPS_INIT_SCR",
"NUM_CSPS_INIT_SCR_avg_sv", "count_FFV", "average_FFV", "csps_NUM_SV",
"avg_SV_rating", "csps_FFV_ratio", "avg_DL_rating", "has_DL_rating",
"has_bad_DL_rating", "serv_has_MNT", "serv_has_SCP", "serv_has_ELW",
"serv_has_LCP", "ro_count", "ro_tot_cust_pay", "ro_tot_pay",
"date_eff_weekday", "date_eff_month_int", "date_eff_day"), row.names = 1:2, class = "data.frame")
# 1. create a data frame with just the features
features_iml <- as.data.frame(df_testR) %>% dplyr::select(-returned)
# 2. Create a vector with the actual responses
response_iml <- as.numeric(as.vector(df_testR$returned))
# 3. Create custom predict function that returns the predicted values as a
# vector (probability of customer churn in my example)
pred <- function(model, newdata) {
results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
return(results[[3L]])
}
# 4. example of prediction output
pred(GBM5, features_iml) %>% head()
# 5. create Predictor object
predictor = Predictor$new(model = GBM5, data = features_iml, y =
response_iml, predict.fun = pred, class = "classification")
Error : all(feature.class %in% names(feature.types)) is not TRUE
Here are also some basic descriptions of the dataset and model object I'm using in the code above:
class(GBM5)
[1] "H2OBinomialModel"
attr(,"package")
[1] "h2o"
class(df_testR)
[1] "tbl_df" "tbl" "data.frame"
dim(df_testR)
[1] 47006 44
If there is anything else I can provide or if I have been unclear please let me know.
The iml package accepts only certain feature classes, namely numeric, integer, character, factor and ordered. If any column has a different class, such as Date, the Predictor object cannot be created. In the sample data above, date_eff has class "Date", so it has to be converted or removed before calling Predictor$new().
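A minimal sketch for finding and converting the offending columns before calling Predictor$new(). It assumes, as the error message suggests, that iml checks the first element of class() for each feature column. The helper names find_unsupported() and dates_to_numeric() and the toy features data frame are hypothetical, modelled on columns from the sample above.

```r
# Feature classes listed above as accepted by iml
allowed_classes <- c("numeric", "integer", "character", "factor", "ordered")

# Names of columns whose primary class (class(x)[1]) is not in the allowed set
find_unsupported <- function(df) {
  primary <- vapply(df, function(x) class(x)[1], character(1))
  names(primary)[!primary %in% allowed_classes]
}

# Replace every Date column with its numeric value (days since 1970-01-01)
dates_to_numeric <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  df[is_date] <- lapply(df[is_date], as.numeric)
  df
}

# Hypothetical toy data shaped like the sample: one Date, one numeric, one factor
features <- data.frame(
  date_eff  = as.Date(c(16014, 15416), origin = "1970-01-01"),
  car_count = c(5, 1),
  returned  = factor(c("00", "11"))
)

find_unsupported(features)        # "date_eff"
features_fixed <- dates_to_numeric(features)
find_unsupported(features_fixed)  # character(0)
```

Turning date_eff into a number changes what the model receives. If GBM5 was trained on the original date column, the custom pred() function may need to convert it back before calling h2o.predict(); I have not checked how H2O treats the numeric version.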
This question already has an answer here:
Set NA and "" Cells in R Dataframe to NULL
I have a dataframe where I want to replace all values in a column that contain the value '2018' with NULL.
I have a dataset where every value in a column is a list. There are NULLs included as well. One of the values is not a list and I want to replace it with a NULL. If I replace it with NA then the datatypes in that column are mixed.
If I have a column like below, how do I replace the value containing 2018 with NULL instead of NA?
spend actions
176.2 2018-02-24
166.66 list(action_type = c("landing_page_view", "link_click", "offsit...
153.89 list(action_type = c("landing_page_view", "like", "link_click",...
156.54 list(action_type = c("landing_page_view", "like", "link_click",...
254.95 list(action_type = c("landing_page_view", "like", "link_click",...
374 list(action_type = c("landing_page_view", "like", "link_click",...
353.29 list(action_type = c("landing_page_view", "like", "link_click",...
0.41 NULL
Reproducible Example:
structure(list(spend = c("176.2", "166.66", "153.89", "156.54",
"254.95", "374", "353.29", "0.41"), actions = list("2018-02-24",
structure(list(action_type = c("landing_page_view", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("179", "275", "212", "18",
"269", "1434", "1", "17", "293", "293", "1933")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("136", "3", "248", "101", "6", "237", "730",
"11", "262", "259", "1074")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("95", "1", "156", "91",
"5", "83", "532", "1", "13", "171", "170", "711")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("178", "4", "243", "56", "4", "138", "437",
"19", "266", "262", "635")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("203", "2", "306", "105",
"7", "186", "954", "23", "331", "329", "1252")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("241", "4", "320", "106", "3", "240", "789",
"1", "17", "342", "338", "1138")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
NULL)), .Names = c("spend", "actions"), row.names = c(NA,
-8L), class = "data.frame")
My ultimate goal is to use this code with this dataset to turn each action_type into its own column. It works when every value in the actions column is either a list or NULL:
fb_insights_all<-df %>%
as.tibble() %>%
filter(!map_lgl(actions, is.null)) %>%
unnest() %>%
right_join(select(df, -actions)) %>%
spread(action_type, value)
Error: Each column must either be a list of vectors or a list of data frames [actions]
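For comparison, a base-R sketch of the same reshape that sidesteps the problem by stacking only the entries that are data frames, so the stray date string and the NULLs are simply skipped. It uses reshape() and merge() in place of the tidyverse verbs in the question, on hypothetical toy data; the new columns come out named value.<action_type>.

```r
# Hypothetical toy version of the question's data
df <- data.frame(spend = c("176.2", "166.66", "153.89", "0.41"),
                 stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = c("link_click", "post"), value = c("275", "1"),
             stringsAsFactors = FALSE),
  data.frame(action_type = c("link_click", "like"), value = c("248", "3"),
             stringsAsFactors = FALSE),
  NULL
)

# Stack only the data-frame entries, tagging each with its row index;
# anything else (NULL, the stray date string) contributes no rows
pieces <- lapply(seq_len(nrow(df)), function(i) {
  a <- df$actions[[i]]
  if (is.data.frame(a)) cbind(row = i, a) else NULL
})
long <- do.call(rbind, pieces)

# One column per action_type, named value.<action_type>
wide <- reshape(long, idvar = "row", timevar = "action_type",
                direction = "wide")

# Left join back onto spend so rows without actions are kept (as NA)
out <- merge(data.frame(row = seq_len(nrow(df)), spend = df$spend,
                        stringsAsFactors = FALSE),
             wide, by = "row", all.x = TRUE)
out
```

Here merge(all.x = TRUE) plays the role of the question's right_join(), keeping rows that had no actions as NA.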
Without data to test this on, I'd try:
df$COL1<-ifelse(grepl("2018", df$COL1),"NULL",df$COL1)
As stated here, NA behaves more like what you seem to be after, while NULL serves a different purpose. If you just want the cell to read "NULL" rather than behave like NULL, treat it as a character value.
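If the goal is an actual NULL inside the list column rather than the string "NULL", single-bracket assignment of list(NULL) does that without shrinking the column. A small sketch on a hypothetical three-row stand-in for the question's data:

```r
# Hypothetical stand-in: one stray date string, one data frame, one NULL
df <- data.frame(spend = c("176.2", "166.66", "0.41"), stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = c("link_click", "post"), value = c("275", "1"),
             stringsAsFactors = FALSE),
  NULL
)

# TRUE for entries that are character values containing "2018"
has_2018 <- vapply(df$actions,
                   function(x) is.character(x) && any(grepl("2018", x)),
                   logical(1))

# x[i] <- list(NULL) sets the selected elements to NULL and keeps the length;
# x[[i]] <- NULL would delete the element instead
df$actions[has_2018] <- list(NULL)

vapply(df$actions, is.null, logical(1))  # TRUE FALSE TRUE
```

With the stray value gone, the question's filter(!map_lgl(actions, is.null)) step should leave only data frames for unnest(), though I have not run that pipeline against this data.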
I made the following plot, consisting of multiple plots. I am very happy with it, but the legend starts in the middle of the bottom line and I would like to place it more to the left.
However, I can't figure out how. This is how I placed the legend under the graph:
# extract legend
leg1 <- g1$grobs[[which(g1$layout$name == "guide-box")]]
leg2 <- g2$grobs[[which(g2$layout$name == "guide-box")]]
g$grobs[[which(g$layout$name == "guide-box")]] <-
gtable:::cbind_gtable(leg1, leg2, "first")
grid.draw(g)
Thank you very much for your help!
DATA
out121<-structure(list(MEt_R = c(-0.0541818151603231, -0.0562844791428272,
-0.0558715941992024, -0.0562399962945622, -0.0560460386125185,
-0.0570608897132082, -0.0569943385875705, -0.0568252787782472,
-0.0569942506473323, -0.0565197621205338, -0.056900534973487,
-0.0571427349989937, -0.0569618449465491, -0.0566601716889117,
-0.0563552308197707, -0.0568648464047371, -0.057047451157018,
-0.0571837090302319, -0.0588902340655496, -0.0592472918164029
), MEp_R = c(-0.247452286142448, -0.250297111391169, -0.249928846077379,
-0.25046029682347, -0.250073673565474, -0.250875645110485, -0.250823269975906,
-0.250803625118812, -0.250975027824198, -0.250800205283021, -0.249498983660567,
-0.249414312295583, -0.248700460230235, -0.247557861440942, -0.246180020784707,
-0.245773209833456, -0.245867722008906, -0.245832189026612, -0.248451853542242,
-0.248819121423065), MEt_Irr = c(-0.0930626749780042, -0.0924059309460578,
-0.0924771937440385, -0.0905386156125412, -0.0914934037180768,
-0.0898948119109486, -0.0898827200499507, -0.090372707751177,
-0.0901901622784647, -0.0914484064620663, -0.0925147845884521,
-0.0927733849042059, -0.0960873954367445, -0.0948131376144847,
-0.0955133693827158, -0.0933133384990093, -0.0927340360155418,
-0.0925138612415783, -0.0896139882242573, -0.0912014136494108
), MEp_Irr = c(-0.134285798811785, -0.130421729939034, -0.130843425161555,
-0.125678194629783, -0.12773193697829, -0.124481076478246, -0.124497401309687,
-0.125610694968169, -0.123946111674758, -0.123370795186237, -0.126287791384532,
-0.126473323542922, -0.132539755897724, -0.132493992548119, -0.136001653508856,
-0.134027790837091, -0.133453827739445, -0.133605798794612, -0.125624822911512,
-0.12651195788011), se_MEt_Rainfed = c(0.124867384884912, 0.124157398945455,
0.124169568385358, 0.124110270348855, 0.12391954965997, 0.123742628011372,
0.123766054757713, 0.123576175335345, 0.12353428904291, 0.123443556846824,
0.122869340273675, 0.122594726299249, 0.122332685310317, 0.12197210341919,
0.121115745201095, 0.120880251090657, 0.120851770150267, 0.120746714650168,
0.120922991632831, 0.120866928018865), se_MEp_Rainfed = c(0.143672836801446,
0.144657376398904, 0.144507363687457, 0.144769378821498, 0.144648550573144,
0.144872373777953, 0.145051397794363, 0.144915196543632, 0.144991517393619,
0.144873626704144, 0.143989720401395, 0.143853417769885, 0.143333599218362,
0.1427407217193, 0.142440215481916, 0.142160801176927, 0.142096501723151,
0.14202065464998, 0.143129678943783, 0.143137620878338), se_MEt_Irrigation = c(0.0790595725119853,
0.0819113174332981, 0.0818328749299557, 0.0834638025854297, 0.0818357384597404,
0.0830466544695816, 0.0830677796154873, 0.0829941906461297, 0.083141965909444,
0.082714324704666, 0.0809987066350066, 0.0810565659915952, 0.0792023249112186,
0.0779277210970589, 0.0797106575341609, 0.0796897823245035, 0.0793238667046254,
0.0794345101645159, 0.0805370559814554, 0.0816802765047257),
se_MEp_Irrigation = c(0.0739612622169091, 0.0737063705751054,
0.07367641793811, 0.0723914728669354, 0.0740776203174818,
0.069800467728211, 0.0696877344237401, 0.069931499405769,
0.0696836713882552, 0.0698379678577116, 0.0715460291976299,
0.0716115167234236, 0.0766235981234074, 0.0784897268021916,
0.083907652449956, 0.0837898365477357, 0.083360196343319,
0.0836313050243269, 0.0820421848807231, 0.0830267028200501
), Irrigationtotal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Irrigation0 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Irrigation10 = c(537,
532, 526, 524, 520, 517, 516, 511, 505, 500, 489, 482, 478,
471, 465, 463, 457, 455, 446, 439), Irrigation20 = c(384,
384, 384, 384, 384, 384, 384, 384, 384, 384, 384, 384, 384,
384, 384, 384, 384, 384, 384, 384), Irrigation30 = c(268,
268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268,
268, 268, 268, 268, 268, 268, 268), Irrigation40 = c(272,
272, 272, 272, 272, 272, 272, 272, 272, 272, 272, 272, 272,
272, 272, 272, 272, 272, 272, 272), Irrigation50 = c(246,
246, 246, 246, 246, 246, 246, 246, 246, 246, 246, 246, 246,
246, 246, 246, 246, 246, 246, 246), Irrigation60 = c(185,
185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185,
185, 185, 185, 185, 185, 185, 185), Irrigation70 = c(194,
194, 194, 194, 194, 194, 194, 194, 194, 194, 194, 194, 194,
194, 194, 194, 194, 194, 194, 194), Irrigation80 = c(184,
184, 184, 184, 184, 184, 184, 184, 184, 184, 184, 184, 184,
184, 184, 184, 184, 184, 184, 184), Irrigation90 = c(172,
172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172,
172, 172, 172, 172, 172, 172, 172), Irrigation100 = c(1168,
1168, 1168, 1168, 1168, 1168, 1168, 1168, 1168, 1168, 1168,
1168, 1168, 1168, 1168, 1168, 1168, 1168, 1168, 1168), perc1 = 1:20,
perc = structure(1:20, .Label = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36",
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56",
"57", "58", "59", "60", "61", "62", "63", "64", "65", "66",
"67", "68", "69", "70", "71", "72", "73", "74", "75", "76",
"77", "78", "79", "80", "81", "82", "83", "84", "85", "86",
"87", "88", "89", "90", "91", "92", "93", "94", "95", "96",
"97", "98", "99", "100", "101", "102", "103", "104", "105",
"106", "107", "108", "109", "110", "111", "112", "113", "114",
"115", "116", "117", "118", "119", "120", "121", "122", "123",
"124", "125", "126", "127", "128", "129", "130", "131", "132",
"133", "134", "135", "136", "137", "138", "139", "140", "141",
"142", "143", "144", "145", "146", "147", "148", "149", "150",
"151", "152", "153", "154", "155", "156", "157", "158", "159",
"160", "161", "162", "163", "164", "165", "166", "167", "168",
"169", "170", "171", "172", "173", "174", "175", "176", "177",
"178", "179", "180", "181", "182", "183", "184", "185", "186",
"187", "188", "189", "190", "191", "192", "193", "194", "195",
"196", "197", "198", "199", "200", "201", "202", "203", "204",
"205", "206", "207", "208", "209", "210", "211", "212", "213",
"214", "215", "216", "217", "218", "219", "220", "221", "222",
"223", "224", "225", "226", "227", "228", "229", "230", "231",
"232", "233", "234", "235", "236", "237", "238", "239", "240",
"241", "242", "243", "244", "245", "246", "247", "248", "249",
"250", "251", "252", "253", "254", "255", "256", "257", "258",
"259", "260", "261", "262", "263", "264", "265", "266", "267",
"268", "269", "270", "271", "272", "273", "274", "275", "276",
"277", "278", "279", "280", "281", "282", "283", "284", "285",
"286", "287", "288", "289", "290", "291", "292", "293", "294",
"295", "296", "297", "298", "299", "300", "301", "302", "303",
"304", "305", "306", "307", "308", "309", "310", "311", "312",
"313", "314", "315", "316", "317", "318", "319", "320", "321",
"322", "323", "324", "325", "326", "327", "328", "329", "330",
"331", "332", "333", "334", "335", "336", "337", "338", "339",
"340", "341", "342", "343", "344", "345", "346", "347", "348",
"349", "350", "351", "352", "353", "354", "355", "356", "357",
"358", "359", "360", "361", "362", "363", "364", "365", "366",
"367", "368", "369", "370", "371", "372", "373", "374", "375",
"376", "377", "378", "379", "380", "381", "382", "383", "384",
"385", "386", "387", "388", "389", "390", "391", "392", "393",
"394", "395", "396", "397", "398", "399", "400", "401", "402",
"403", "404", "405", "406", "407", "408", "409", "410", "411",
"412", "413", "414", "415", "416", "417", "418", "419", "420",
"421", "422", "423", "424", "425", "426", "427", "428", "429",
"430", "431", "432", "433", "434", "435", "436", "437", "438",
"439", "440", "441", "442", "443", "444", "445", "446", "447",
"448", "449", "450", "451", "452", "453", "454", "455", "456",
"457", "458", "459", "460", "461", "462", "463", "464", "465",
"466", "467", "468", "469", "470", "471", "472", "473", "474",
"475", "476", "477", "478", "479", "480", "481", "482", "483",
"484", "485", "486", "487", "488", "489", "490", "491", "492",
"493", "494", "495", "496", "497", "498", "499", "500", "501",
"502", "503", "504", "505", "506", "507", "508", "509", "510",
"511", "512", "513", "514", "515", "516", "517", "518", "519",
"520", "521", "522", "523", "524", "525", "526", "527", "528",
"529", "530", "531", "532", "533", "534", "535", "536", "537",
"538", "539", "540", "541", "542", "543", "544", "545", "546",
"547", "548", "549", "550", "551", "552", "553", "554", "555",
"556", "557", "558", "559", "560", "561", "562", "563", "564",
"565", "566", "567", "568", "569", "570", "571", "572", "573",
"574", "575", "576", "577", "578", "579", "580", "581", "582",
"583", "584", "585", "586", "587", "588", "589", "590", "591",
"592", "593", "594", "595", "596", "597", "598", "599", "600",
"601", "602", "603", "604", "605", "606", "607", "608", "609",
"610", "611", "612", "613", "614", "615", "616", "617", "618",
"619", "620", "621", "622", "623", "624", "625", "626", "627",
"628", "629", "630", "631", "632", "633", "634", "635", "636",
"637", "638", "639", "640", "641", "642", "643", "644", "645",
"646", "647", "648", "649", "650", "651", "652", "653", "654",
"655", "656", "657", "658", "659", "660", "661", "662", "663",
"664", "665", "666", "667", "668", "669", "670", "671", "672",
"673", "674", "675", "676", "677", "678", "679", "680", "681",
"682", "683", "684", "685", "686", "687", "688", "689", "690",
"691", "692", "693", "694", "695", "696", "697", "698", "699",
"700", "701", "702", "703", "704", "705", "706", "707", "708",
"709", "710", "711", "712", "713", "714", "715", "716", "717",
"718", "719", "720", "721", "722", "723", "724", "725", "726",
"727", "728", "729", "730", "731", "732", "733", "734", "735",
"736", "737", "738", "739", "740", "741", "742", "743", "744",
"745", "746", "747", "748", "749", "750", "751", "752", "753",
"754", "755", "756", "757", "758", "759", "760", "761", "762",
"763", "764", "765", "766", "767", "768", "769", "770", "771",
"772", "773", "774", "775", "776", "777", "778", "779", "780",
"781", "782", "783", "784", "785", "786", "787", "788", "789",
"790", "791", "792", "793", "794", "795", "796", "797", "798",
"799", "800", "801", "802", "803", "804", "805", "806", "807",
"808", "809", "810", "811", "812", "813", "814", "815", "816",
"817", "818", "819", "820", "821", "822", "823", "824", "825",
"826", "827", "828", "829", "830", "831", "832", "833", "834",
"835", "836", "837", "838", "839", "840", "841", "842", "843",
"844", "845", "846", "847", "848", "849", "850", "851", "852",
"853", "854", "855", "856", "857", "858", "859", "860", "861",
"862", "863", "864", "865", "866", "867", "868", "869", "870",
"871", "872", "873", "874", "875", "876", "877", "878", "879",
"880", "881", "882", "883", "884", "885", "886", "887", "888",
"889", "890", "891", "892", "893", "894", "895", "896", "897",
"898", "899", "900", "901", "902", "903", "904", "905", "906",
"907", "908", "909", "910", "911", "912", "913", "914", "915",
"916", "917", "918", "919", "920", "921", "922", "923", "924",
"925", "926", "927", "928", "929", "930", "931", "932", "933",
"934", "935", "936", "937", "938", "939", "940", "941", "942",
"943", "944", "945", "946", "947", "948", "949", "950", "951",
"952", "953", "954", "955", "956", "957", "958", "959", "960",
"961", "962", "963", "964", "965", "966", "967", "968", "969",
"970", "971", "972", "973", "974", "975", "976", "977", "978",
"979", "980", "981", "982", "983", "984", "985", "986", "987",
"988", "989", "990", "991", "992", "993", "994", "995", "996",
"997", "998", "999"), class = "factor")), .Names = c("MEt_R",
"MEp_R", "MEt_Irr", "MEp_Irr", "se_MEt_Rainfed", "se_MEp_Rainfed",
"se_MEt_Irrigation", "se_MEp_Irrigation", "Irrigationtotal",
"Irrigation0", "Irrigation10", "Irrigation20", "Irrigation30",
"Irrigation40", "Irrigation50", "Irrigation60", "Irrigation70",
"Irrigation80", "Irrigation90", "Irrigation100", "perc1", "perc"
), row.names = c(NA, 20L), class = "data.frame")
Code to produce the graph
library(ggplot2)
library(gtable)
library(reshape2)
# line plot
out121$perc1 <- seq_len(nrow(out121))  # one index per row (the sample DATA above has 20 rows)
l64<-ggplot(out121,aes(perc1))
l65<-l64+geom_line(aes(y=MEt_R,colour="Rainfed"),size=1.3)+
geom_line(aes(y=MEt_R+se_MEt_Rainfed,colour="Rainfed range"),size=0.7)+
geom_line(aes(y=MEt_R-se_MEt_Rainfed,colour="Rainfed range"),size=0.7)+
geom_line(aes(y=MEt_Irr,colour="Irrigation"),size=1.3)+
geom_line(aes(y=MEt_Irr+se_MEt_Irrigation,colour="Irrigation range"),size=0.7)+
geom_line(aes(y=MEt_Irr-se_MEt_Irrigation,colour="Irrigation range"),size=0.7)+
scale_colour_manual(values=c("blue3","mediumslateblue","green3","green"), name="")+
scale_x_discrete(name="Threshold irrigation (in percentage)",breaks=c(0, 250, 500,750,1000),
labels=c("0", "25", "50","75","100")) +
scale_y_continuous(name="MEt",limits = c(-0.7, 0.5),breaks=c(-0.3,-0.1,0,0.1,0.3,0.5))
l66<-l65+ theme_bw()+ggtitle("subsidies 1 large") +
theme(plot.title = element_text(lineheight=.8, face="bold"),legend.position="bottom")+
guides(col=guide_legend(ncol=2))+
theme(panel.background = element_rect(fill = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
l66
# bar plot
test <- data.frame(out121$perc,out121$Irrigation10,out121$Irrigation20,out121$Irrigation30,out121$Irrigation40,
out121$Irrigation50,out121$Irrigation60,out121$Irrigation70,out121$Irrigation80,
out121$Irrigation90,out121$Irrigation100)
# barplot(as.matrix(test))
library(reshape2)
foo.long<-melt(test)
foo.long$out121.perc<- as.character(foo.long$out121.perc)
foo.long$out121.perc <- factor(foo.long$out121.perc, levels=unique(foo.long$out121.perc))
cbbPalette <- c("yellow","greenyellow","#00FF00", "#00C639","#00AA55", "#00718E", "#0055AA", "#001CE3","blue4","midnightblue")
l2<-ggplot(foo.long, aes(out121.perc,value,fill=variable))+
geom_bar(position="stack",stat="identity")+
scale_fill_manual(values=cbbPalette,name = "% of irrigation",
labels = c("0-10% irrigation", "10-20% irrigation", "20-30% irrigation",
"30-40% irrigation", "40-50% irrigation", "50-60% irrigation",
"60-70% irrigation", "70-80% irrigation", "80-90% irrigation",
"90-100% irrigation"))+
scale_x_discrete(name="Threshold irrigation (in percentage)",breaks=c(0, 250, 500,750,1000),
labels=c("0", "25", "50","75","100")) +
scale_y_continuous(name="number of farms",limits = c(0, 10000), breaks=c(0, 1000,2000,3000, 4000))+
theme(legend.position=c(0.7,0.4),panel.background = element_rect(fill = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
guides(fill=guide_legend(title="Number of irrigated farms",ncol=3))
l2
# ggplotGrob
g1 <- ggplotGrob(l2)
g2 <- ggplotGrob(l66)
# Add plots together
pp <- c(subset(g2$layout, name == "panel", select = t:r))
g <- gtable_add_grob(g2, g1$grobs[[which(g1$layout$name == "panel")]], pp$t,
pp$l, pp$b, pp$l)
# Add second axis for accuracy
ia <- which(g1$layout$name == "axis-l")
ga <- g1$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g1$widths[g1$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
# Add second y-axis title
ia <- which(g1$layout$name == "ylab")
ax <- g1$grobs[[ia]]
# str(ax) # you can change features (size, colour etc for these -
# change rotation below
ax$rot <- 270
g <- gtable_add_cols(g, g1$widths[g1$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
# extract legend
leg1 <- g1$grobs[[which(g1$layout$name == "guide-box")]]
leg2 <- g2$grobs[[which(g2$layout$name == "guide-box")]]
g$grobs[[which(g$layout$name == "guide-box")]] <-
gtable:::cbind_gtable(leg1, leg2, "first")
grid.draw(g)
You were close. I think the problem is that you position the legend inside the second plot (legend.position = c(0.7, 0.4)), and that positioning carries through to the combined legend, which throws it off. Also, the legend title should be positioned on top, and the combined legend then needs a little more space. Your first plot is fine. Picking up from your second plot:
# Making adjustments to legend: adjusting position and title position
l2 <- ggplot(foo.long, aes(out121.perc,value,fill=variable))+
geom_bar(position="stack",stat="identity")+
scale_fill_manual(values=cbbPalette,name = "% of irrigation",
labels = c("0-10% irrigation", "10-20% irrigation", "20-30% irrigation",
"30-40% irrigation", "40-50% irrigation", "50-60% irrigation",
"60-70% irrigation", "70-80% irrigation", "80-90% irrigation",
"90-100% irrigation"))+
scale_x_discrete(name="Threshold irrigation (in percentage)",breaks=c(0, 250, 500,750,1000),
labels=c("0", "25", "50","75","100")) +
scale_y_continuous(name="number of farms",limits = c(0, 10000), breaks=c(0, 1000,2000,3000, 4000))+
theme(legend.position="bottom", panel.background = element_rect(fill = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
guides(fill=guide_legend(title="Number of irrigated farms",ncol=3, title.position = "top"))
l2
# ggplotGrob
g1 <- ggplotGrob(l2)
g2 <- ggplotGrob(l66)
# Add plots together
pp <- c(subset(g2$layout, name == "panel", select = t:r))
g <- gtable_add_grob(g2, g1$grobs[[which(g1$layout$name == "panel")]], pp$t,
pp$l, pp$b, pp$l)
# Add second axis for accuracy
ia <- which(g1$layout$name == "axis-l")
ga <- g1$grobs[[ia]]
ax <- ga$children[[2]]
ax$widths <- rev(ax$widths)
ax$grobs <- rev(ax$grobs)
ax$grobs[[1]]$x <- ax$grobs[[1]]$x - unit(1, "npc") + unit(0.15, "cm")
g <- gtable_add_cols(g, g1$widths[g1$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
# Add second y-axis title
ia <- which(g1$layout$name == "ylab")
ax <- g1$grobs[[ia]]
# str(ax) # you can change features (size, colour etc for these -
# change rotation below
ax$rot <- 270
g <- gtable_add_cols(g, g1$widths[g1$layout[ia, ]$l], length(g$widths) - 1)
g <- gtable_add_grob(g, ax, pp$t, length(g$widths) - 1, pp$b)
# extract legend
leg1 <- g1$grobs[[which(g1$layout$name == "guide-box")]]
leg2 <- g2$grobs[[which(g2$layout$name == "guide-box")]]
legc <- gtable:::cbind_gtable(leg1, leg2, "first")
g$grobs[[which(g$layout$name == "guide-box")]] <- legc
grid.draw(g) # Note: Legend does not fit
# Add more space for the legend (adjust to suit). Row 6 held the legend in this
# layout; check g$layout[g$layout$name == "guide-box", "t"] if yours differs
g$heights[[6]] = unit(5, "cm")
grid.draw(g)
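For a single ggplot, a bottom legend can usually be pushed to the left from the theme alone, via legend.justification. The sketch below uses the built-in mtcars data rather than the irrigation data. Whether this justification survives the guide-box extraction and cbind_gtable() step used above is an assumption I have not verified for this combined layout.

```r
library(ggplot2)

# Built-in mtcars stands in for the irrigation data
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  labs(colour = "Cylinders") +
  theme(
    legend.position = "bottom",     # legend below the panel
    legend.justification = "left"   # aligned to the left of that area
  )

# Convert to a gtable, as in the answer above, and draw it
g <- ggplotGrob(p)
grid::grid.newpage()
grid::grid.draw(g)
```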
I'm running a hurdle-type analysis on species distribution data, which involves two fitting steps. The first step (m1) models presence/absence using all the data with family=quasibinomial. The second step (m2) uses the positive (presence-only) data with family=Gamma. This works well until I try to predict with the second model (m2) on the full dataset, where I receive an error due to new factor levels. I understand why: some factor levels appear in the full dataset but not in the reduced (presence-only) dataset. How do I work around this error so that I can get predictions from the second model on the full dataset?
I am using mgcv.
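One workaround, sketched here with glm() standing in for mgcv::gam() and hypothetical toy data (site plays the role of Grid_ID): predict the positive-only model only for rows whose level occurred in the fitting data, and leave the rest as NA. Whether NA, zero, or some pooled estimate is the right value for unseen levels is a modelling decision the code does not make for you.

```r
# Hypothetical toy data: 'site' stands in for Grid_ID, 'catch' for the response
full <- data.frame(
  site  = factor(c("a", "a", "b", "b", "c", "c", "d", "d")),
  catch = c(2.1, 2.5, 3.4, 3.9, 0, 0, 1.2, 0)
)

# Step 2 of the hurdle: positive observations only (site "c" drops out)
pos <- subset(full, catch > 0)

# glm() stands in for mgcv::gam(); predicting on all of 'full' would fail
# because level "c" never appeared in the fitting data
m2 <- glm(catch ~ site, family = Gamma(link = "log"), data = pos)

# Predict only where the level was seen during fitting; NA elsewhere.
# droplevels() removes the unused "c" level from the subset's factor.
seen <- full$site %in% unique(as.character(pos$site))
pred_m2 <- rep(NA_real_, nrow(full))
pred_m2[seen] <- predict(m2, newdata = droplevels(full[seen, ]),
                         type = "response")
pred_m2
```

The same subsetting idea should carry over to predict() on a gam fit, since it only changes which rows are passed as newdata.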
Edit: Updated with additional code and data.
# Step1 - GAM using full dataset for presence/absence
grays<-structure(list(Grid_ID = structure(c(39L, 51L, 52L, 67L), .Label = c("1",
"1,000", "1,001", "1,008", "1,009", "1,010", "1,011", "1,012",
"1,013", "1,014", "1,015", "1,016", "1,022", "1,023", "1,024",
"1,025", "1,026", "1,027", "1,028", "1,029", "1,034", "1,035",
"1,036", "1,037", "1,039", "1,040", "1,045", "1,046", "1,047",
"1,048", "1,053", "1,054", "1,055", "10", "100", "101", "103",
"104", "105", "106", "107", "108", "109", "11", "110", "118",
"119", "12", "122", "125", "126", "127", "128", "129", "13",
"130", "131", "132", "133", "14", "141", "142", "15", "150",
"151", "152", "153", "154", "155", "156", "157", "158", "159",
"160", "161", "162", "163", "167", "168", "169", "173", "174",
"175", "176", "177", "178", "179", "180", "181", "182", "183",
"184", "185", "188", "189", "190", "196", "197", "198", "199",
"2", "20", "200", "201", "202", "203", "204", "205", "206", "207",
"209", "210", "211", "219", "22", "220", "221", "222", "223",
"224", "225", "226", "227", "228", "229", "23", "230", "231",
"233", "234", "235", "236", "237", "24", "246", "247", "248",
"249", "25", "250", "252", "253", "254", "255", "256", "257",
"258", "259", "26", "260", "261", "267", "268", "269", "27",
"270", "271", "272", "273", "274", "275", "276", "277", "278",
"279", "28", "280", "281", "286", "287", "288", "289", "29",
"290", "291", "292", "293", "294", "295", "296", "297", "298",
"299", "3", "300", "301", "302", "303", "305", "306", "307",
"308", "309", "310", "311", "312", "313", "314", "315", "316",
"317", "318", "319", "320", "321", "326", "327", "328", "329",
"330", "331", "332", "333", "334", "335", "336", "337", "339",
"340", "341", "343", "344", "345", "346", "347", "348", "349",
"350", "351", "352", "355", "356", "357", "36", "360", "361",
"362", "363", "364", "365", "366", "367", "368", "369", "37",
"372", "373", "374", "38", "380", "381", "382", "383", "384",
"385", "386", "39", "391", "392", "397", "398", "399", "4", "40",
"400", "401", "402", "408", "409", "41", "410", "412", "413",
"414", "415", "416", "417", "42", "423", "424", "425", "426",
"43", "430", "431", "432", "433", "434", "44", "441", "442",
"443", "444", "447", "448", "449", "45", "450", "451", "458",
"459", "46", "460", "461", "462", "463", "464", "465", "466",
"470", "471", "472", "473", "474", "475", "476", "484", "485",
"486", "487", "488", "489", "490", "491", "492", "496", "497",
"498", "499", "5", "500", "501", "513", "514", "515", "516",
"517", "518", "523", "524", "525", "526", "527", "528", "529",
"54", "541", "542", "543", "544", "545", "55", "550", "551",
"552", "553", "554", "56", "569", "57", "570", "571", "572",
"573", "574", "578", "579", "580", "581", "582", "599", "60",
"600", "601", "602", "603", "604", "605", "606", "607", "608",
"609", "61", "610", "62", "626", "627", "628", "629", "63", "632",
"633", "634", "635", "636", "637", "638", "639", "64", "653",
"654", "655", "656", "657", "658", "659", "660", "663", "664",
"665", "666", "667", "668", "669", "670", "671", "672", "673",
"687", "688", "689", "690", "691", "692", "693", "696", "697",
"698", "699", "7", "700", "701", "702", "703", "704", "705",
"716", "717", "718", "720", "721", "722", "723", "724", "725",
"726", "727", "728", "739", "74", "740", "741", "746", "747",
"748", "749", "75", "750", "751", "752", "753", "754", "764",
"765", "768", "769", "77", "770", "771", "772", "773", "78",
"782", "783", "784", "788", "789", "79", "790", "798", "799",
"8", "80", "800", "801", "804", "805", "81", "812", "813", "814",
"815", "816", "819", "82", "820", "821", "827", "828", "829",
"83", "830", "831", "833", "834", "835", "836", "84", "842",
"843", "844", "845", "846", "849", "85", "850", "851", "852",
"853", "854", "860", "861", "862", "863", "864", "869", "870",
"871", "872", "873", "874", "88", "881", "882", "883", "884",
"885", "886", "89", "890", "891", "892", "893", "894", "9", "902",
"903", "904", "905", "906", "908", "909", "910", "911", "912",
"922", "923", "924", "925", "926", "927", "928", "929", "930",
"940", "941", "942", "943", "944", "945", "946", "947", "948",
"957", "958", "959", "96", "960", "961", "962", "963", "964",
"965", "966", "97", "976", "977", "978", "979", "980", "981",
"982", "983", "984", "992", "993", "994", "995", "996", "997",
"998", "999"), class = "factor"), Grid_Lat = c(56.85582097, 56.90062505,
56.90024495, 56.94461032), Grid_Long = c(153.4783612, 153.4777153,
153.3954873, 153.3124098), Er_Pres = c(0L, 0L, 0L, 0L), Er_Count = c(0L,
0L, 0L, 0L), Er_Count_Density = c(0, 0, 0, 0), Month = structure(c(8L,
8L, 8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11"), class = "factor"), Year = structure(c(1L, 1L,
1L, 1L), .Label = c("1997", "1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013"), class = "factor"), chl = c(0.53747,
0.53747, 0.53747, 0.581741), SST = c(13.4171, 13.4171, 13.4171,
13.4025002), Bathymetry = c(76.11354065, 92.14147949, 90.60312653,
71.55316162), Grid_Area = c(25, 25, 25, 25), DFS = c(6.807817092,
4.233185446, 9.199096676, 5.153224038), Slope = c(0.13670446,
0.38316911, 0.08646853, 0.20038579), DOY = c(244L, 244L, 244L,
244L)), .Names = c("Grid_ID", "Grid_Lat", "Grid_Long", "Er_Pres",
"Er_Count", "Er_Count_Density", "Month", "Year", "chl", "SST",
"Bathymetry", "Grid_Area", "DFS", "Slope", "DOY"), row.names = c(NA,
4L), class = "data.frame")
# offset() belongs in the formula: passed as an unnamed argument it is matched to 'weights'
m1<-gam(Er_Pres~ s(Grid_Lat,Grid_Long,k=10,bs='tp')+Month+Year+s(SST,k=5,bs='tp')+offset(Grid_Area),family=quasibinomial(link='logit'),data=grays,gamma=1.4)
#step 2 - reduce dataset and run second GAM for positive abundance only.
grays2<-subset(grays,Er_Pres>0)
# the smoothing-penalty argument is lower-case 'gamma'; offset() moved into the formula as in m1
m2<-gam(Er_Count~ Year +s(Grid_Lat,Grid_Long,k=10,bs='tp') + s(SST,k=5,bs='tp') + s(sqrt(DFS),k=5,bs='tp') + Month +log10(chl)+offset(Grid_Area),family=Gamma(link='log'),data=grays2,gamma=1.4)
Predicting with the second model on the full dataset gives me the following error:
Error in predict.gam(m2, newdata = full, type = "response") :
1997, 1998, 2006, 2007 not in original fit
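To see exactly which levels trigger this message, you can compare the factor levels that actually occur in the full data with those in the presence-only subset. Note that subset() keeps every level of a factor, but mgcv's gam() drops unused levels before fitting (its drop.unused.levels argument defaults to TRUE), so the fitted model only knows the years that occur in the presence-only rows. A minimal base R sketch; the data frames `full_demo` and `pres_demo` and their values are made up for illustration (only the `Er_Pres` and `Year` column names come from the question):

```r
# Hypothetical stand-in for the full dataset; Year is a factor, as in the question
full_demo <- data.frame(
  Er_Pres = c(0, 1, 0, 1, 0, 1),
  Year    = factor(c("1997", "2000", "1998", "2001", "2006", "2000"))
)

# Presence-only subset, as in step 2 of the question
pres_demo <- subset(full_demo, Er_Pres > 0)
nlevels(pres_demo$Year)  # 5: subset() keeps the unused levels

# Compare the levels that actually occur, not levels() of the factor
missing_years <- setdiff(unique(as.character(full_demo$Year)),
                         unique(as.character(pres_demo$Year)))
print(missing_years)
# [1] "1997" "1998" "2006"
```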
This is an old post, so I suspect you have found a solution by now, but if not, consider this:
If you only want to account for observations within the same year being more similar to each other than to observations from other years, and you are not specifically interested in the effect of particular years (say, the difference between 2007 and 1998), then you could specify Year as a random effect.
I believe there are several ways to do this, but in mgcv, you can specify:
s(Year, bs="re")
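A minimal sketch of this idea on simulated data; the variables `x`, `y`, `dat`, `newd` and the helper `pop_predict()` are made up for illustration, and only `Year`, `s(Year, bs="re")` and the Gamma/log family come from the thread. `Year` must be a factor for `bs="re"` to give one random intercept per year. To predict for years that were not in the fit, the sketch uses a common workaround that is not part of the answer above: set `Year` to any level the model knows and drop the random-effect term with the `exclude` argument of `predict.gam()`, which yields a population-level (average-year) prediction.

```r
library(mgcv)  # mgcv is a recommended package shipped with standard R

set.seed(1)
n <- 200
# Simulated positive responses with a per-year random intercept (assumed data)
dat <- data.frame(x = runif(n),
                  Year = factor(sample(2000:2005, n, replace = TRUE)))
year_eff <- rnorm(nlevels(dat$Year), sd = 0.3)
mu <- exp(1 + 0.5 * dat$x + year_eff[as.integer(dat$Year)])
dat$y <- rgamma(n, shape = 2, rate = 2 / mu)  # mean of each draw is mu

# Year enters as an i.i.d. Gaussian random effect instead of a fixed factor
m <- gam(y ~ s(x, k = 5) + s(Year, bs = "re"),
         family = Gamma(link = "log"), data = dat, method = "REML")

# New data whose years were never seen in the fit
newd <- data.frame(x = seq(0, 1, length.out = 5),
                   Year = c("1997", "1998", "2006", "2007", "2012"))

# Workaround: substitute a level the model knows, then exclude the
# random-effect term so that the placeholder has no effect
fit_years <- levels(dat$Year)
pop_predict <- function(model, nd, placeholder) {
  nd$Year <- factor(rep(placeholder, nrow(nd)), levels = fit_years)
  predict(model, newdata = nd, type = "response", exclude = "s(Year)")
}
p1 <- pop_predict(m, newd, fit_years[1])
p2 <- pop_predict(m, newd, fit_years[2])
all.equal(p1, p2)  # TRUE: the placeholder level does not matter
```

These predictions set the year effect to zero, so they describe an average year rather than any particular one.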
I'm trying to calculate sums and means on a very large dataset (~22000 records) for several parameters (e.g. Er_Count, Mn_Count) by month, year, Survey ID and Grid ID. I tried this code initially to get overall sums:
dlply(Effort_All,c("Er_Count","Mn_Count","Bp_Count"),sum)
And received the following error:
Error: only defined on a data frame with all numeric variables
Since I cannot even get overall sums, I am unable to get statistics by the specific variables either. Do I need to split the data in some manner?
I have included a sample of the dataset (6 records) below.
structure(list(Grid_ID = structure(c(527L, 92L, 331L, 395L, 934L,
93L), .Label = c("1", "1,000", "1,001", "1,002", "1,003", "1,004",
"1,005", "1,006", "1,007", "1,008", "1,009", "1,010", "1,011",
"1,012", "1,013", "1,014", "1,015", "1,016", "1,017", "1,018",
"1,019", "1,020", "1,021", "1,022", "1,023", "1,024", "1,025",
"1,026", "1,027", "1,028", "1,029", "1,030", "1,031", "1,032",
"1,033", "1,034", "1,035", "1,036", "1,037", "1,038", "1,039",
"1,040", "1,041", "1,042", "1,043", "1,044", "1,045", "1,046",
"1,047", "1,048", "1,049", "1,050", "1,051", "1,052", "1,053",
"1,054", "1,055", "1,056", "1,057", "1,058", "1,059", "1,060",
"1,061", "10", "100", "101", "102", "103", "104", "105", "106",
"107", "108", "109", "11", "110", "111", "112", "113", "114",
"115", "116", "117", "118", "119", "12", "120", "121", "122",
"123", "124", "125", "126", "127", "128", "129", "13", "130",
"131", "132", "133", "134", "135", "136", "137", "138", "139",
"14", "140", "141", "142", "143", "144", "145", "146", "147",
"148", "149", "15", "150", "151", "152", "153", "154", "155",
"156", "157", "158", "159", "16", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "17", "170", "171",
"172", "173", "174", "175", "176", "177", "178", "179", "18",
"180", "181", "182", "183", "184", "185", "186", "187", "188",
"189", "19", "190", "191", "192", "193", "194", "195", "196",
"197", "198", "199", "2", "20", "200", "201", "202", "203", "204",
"205", "206", "207", "208", "209", "21", "210", "211", "212",
"213", "214", "215", "216", "217", "218", "219", "22", "220",
"221", "222", "223", "224", "225", "226", "227", "228", "229",
"23", "230", "231", "232", "233", "234", "235", "236", "237",
"238", "239", "24", "240", "241", "242", "243", "244", "245",
"246", "247", "248", "249", "25", "250", "251", "252", "253",
"254", "255", "256", "257", "258", "259", "26", "260", "261",
"262", "263", "264", "265", "266", "267", "268", "269", "27",
"270", "271", "272", "273", "274", "275", "276", "277", "278",
"279", "28", "280", "281", "282", "283", "284", "285", "286",
"287", "288", "289", "29", "290", "291", "292", "293", "294",
"295", "296", "297", "298", "299", "3", "30", "300", "301", "302",
"303", "304", "305", "306", "307", "308", "309", "31", "310",
"311", "312", "313", "314", "315", "316", "317", "318", "319",
"32", "320", "321", "322", "323", "324", "325", "326", "327",
"328", "329", "33", "330", "331", "332", "333", "334", "335",
"336", "337", "338", "339", "34", "340", "341", "342", "343",
"344", "345", "346", "347", "348", "349", "35", "350", "351",
"352", "353", "354", "355", "356", "357", "358", "359", "36",
"360", "361", "362", "363", "364", "365", "366", "367", "368",
"369", "37", "370", "371", "372", "373", "374", "375", "376",
"377", "378", "379", "38", "380", "381", "382", "383", "384",
"385", "386", "387", "388", "389", "39", "390", "391", "392",
"393", "394", "395", "396", "397", "398", "399", "4", "40", "400",
"401", "402", "403", "404", "405", "406", "407", "408", "409",
"41", "410", "411", "412", "413", "414", "415", "416", "417",
"418", "419", "42", "420", "421", "422", "423", "424", "425",
"426", "427", "428", "429", "43", "430", "431", "432", "433",
"434", "435", "436", "437", "438", "439", "44", "440", "441",
"442", "443", "444", "445", "446", "447", "448", "449", "45",
"450", "451", "452", "453", "454", "455", "456", "457", "458",
"459", "46", "460", "461", "462", "463", "464", "465", "466",
"467", "468", "469", "47", "470", "471", "472", "473", "474",
"475", "476", "477", "478", "479", "48", "480", "481", "482",
"483", "484", "485", "486", "487", "488", "489", "49", "490",
"491", "492", "493", "494", "495", "496", "497", "498", "499",
"5", "50", "500", "501", "502", "503", "504", "505", "506", "507",
"508", "509", "51", "510", "511", "512", "513", "514", "515",
"516", "517", "518", "519", "52", "520", "521", "522", "523",
"524", "525", "526", "527", "528", "529", "53", "530", "531",
"532", "533", "534", "535", "536", "537", "538", "539", "54",
"540", "541", "542", "543", "544", "545", "546", "547", "548",
"549", "55", "550", "551", "552", "553", "554", "555", "556",
"557", "558", "559", "56", "560", "561", "562", "563", "564",
"565", "566", "567", "568", "569", "57", "570", "571", "572",
"573", "574", "575", "576", "577", "578", "579", "58", "580",
"581", "582", "583", "584", "585", "586", "587", "588", "589",
"59", "590", "591", "592", "593", "594", "595", "596", "597",
"598", "599", "6", "60", "600", "601", "602", "603", "604", "605",
"606", "607", "608", "609", "61", "610", "611", "612", "613",
"614", "615", "616", "617", "618", "619", "62", "620", "621",
"622", "623", "624", "625", "626", "627", "628", "629", "63",
"630", "631", "632", "633", "634", "635", "636", "637", "638",
"639", "64", "640", "641", "642", "643", "644", "645", "646",
"647", "648", "649", "65", "650", "651", "652", "653", "654",
"655", "656", "657", "658", "659", "66", "660", "661", "662",
"663", "664", "665", "666", "667", "668", "669", "67", "670",
"671", "672", "673", "674", "675", "676", "677", "678", "679",
"68", "680", "681", "682", "683", "684", "685", "686", "687",
"688", "689", "69", "690", "691", "692", "693", "694", "695",
"696", "697", "698", "699", "7", "70", "700", "701", "702", "703",
"704", "705", "706", "707", "708", "709", "71", "710", "711",
"712", "713", "714", "715", "716", "717", "718", "719", "72",
"720", "721", "722", "723", "724", "725", "726", "727", "728",
"729", "73", "730", "731", "732", "733", "734", "735", "736",
"737", "738", "739", "74", "740", "741", "742", "743", "744",
"745", "746", "747", "748", "749", "75", "750", "751", "752",
"753", "754", "755", "756", "757", "758", "759", "76", "760",
"761", "762", "763", "764", "765", "766", "767", "768", "769",
"77", "770", "771", "772", "773", "774", "775", "776", "777",
"778", "779", "78", "780", "781", "782", "783", "784", "785",
"786", "787", "788", "789", "79", "790", "791", "792", "793",
"794", "795", "796", "797", "798", "799", "8", "80", "800", "801",
"802", "803", "804", "805", "806", "807", "808", "809", "81",
"810", "811", "812", "813", "814", "815", "816", "817", "818",
"819", "82", "820", "821", "822", "823", "824", "825", "826",
"827", "828", "829", "83", "830", "831", "832", "833", "834",
"835", "836", "837", "838", "839", "84", "840", "841", "842",
"843", "844", "845", "846", "847", "848", "849", "85", "850",
"851", "852", "853", "854", "855", "856", "857", "858", "859",
"86", "860", "861", "862", "863", "864", "865", "866", "867",
"868", "869", "87", "870", "871", "872", "873", "874", "875",
"876", "877", "878", "879", "88", "880", "881", "882", "883",
"884", "885", "886", "887", "888", "889", "89", "890", "891",
"892", "893", "894", "895", "896", "897", "898", "899", "9",
"90", "900", "901", "902", "903", "904", "905", "906", "907",
"908", "909", "91", "910", "911", "912", "913", "914", "915",
"916", "917", "918", "919", "92", "920", "921", "922", "923",
"924", "925", "926", "927", "928", "929", "93", "930", "931",
"932", "933", "934", "935", "936", "937", "938", "939", "94",
"940", "941", "942", "943", "944", "945", "946", "947", "948",
"949", "95", "950", "951", "952", "953", "954", "955", "956",
"957", "958", "959", "96", "960", "961", "962", "963", "964",
"965", "966", "967", "968", "969", "97", "970", "971", "972",
"973", "974", "975", "976", "977", "978", "979", "98", "980",
"981", "982", "983", "984", "985", "986", "987", "988", "989",
"99", "990", "991", "992", "993", "994", "995", "996", "997",
"998", "999"), class = "factor"), ER_Groups = c(2, 2, 2, 3, 5,
6), Er_Count = c(60, 75, 14, 12, 8, 26), Mn_Count = c(30, 9, 6, 33,
7, 12), Bp_Groups = c(1, 2, 1, 1, 0, 1), Bp_Count = c(3, 3, 2,
5, 0, 6), Mn_Groups = c(1, 1, 3, 1, 0, 0), Month = c(10L, 6L,
12L, 4L, 2L, 4L), Year = c(2000L, 2001L, 2009L, 2004L, 2002L,
2001L), SurveyID = structure(c(16L, 24L, 93L, 56L, 34L, 22L), .Label = c("199708HS",
"199808HS", "199908HS", "199909SSLQ", "199910SSL", "199911SSL",
"200001SSLQ", "200002SSL", "200003SSLQ", "200004SSLQ", "200005SSL",
"200006SSL", "200007SSL", "200008HS", "200008SSL", "200009SSL",
"200010SSL", "200011SSL", "200101SSL", "200102SSL", "200103SSL",
"200104SSL", "200105SSL", "200106SSL", "200107SSL", "200108HS",
"200108SSL", "200109SSL", "200110SSL", "200111SSL", "200112SSL",
"200201SSL", "200202SSL", "200203SSL", "200204SSL", "200205SSL",
"200206SSL", "200207SSL", "200208HS", "200208SSL", "200210SSL",
"200211SSL", "200212SSL", "200301SSL", "200302SSL", "200303SSL",
"200304SSL", "200305SSL", "200306SSL", "200307SSL", "200309SSL",
"200310SSL", "200311SSL", "200312SSL", "200403SSL", "200404SSL",
"200405SSL", "200406SSL", "200407SSL", "200408HS", "200408SSL",
"200409SSL", "200505SSL", "200506SSL", "200507SSL", "200510SSL",
"200512SSL", "200603SSL", "200609SSL", "200612SSL", "200709GAP07",
"200710GAP07", "200712GAP07", "200802GAP07", "200803GAP07", "200804GAP07",
"200805GAP07", "200806GAP07", "200807GAP07", "200808GAP07", "200809GAP08",
"200810GAP08", "200812GAP08", "200901GAP08", "200903GAP08", "200904GAP08",
"200905GAP08", "200906GAP08", "200907GAP08", "200908GAP08", "200909GAP08",
"200910GAP09", "200912GAP09", "201001GAP09", "201002GAP09", "201003GAP09",
"201004GAP09", "201005GAP09", "201006GAP09", "201007GAP09", "201008GAP09",
"201009GAP09", "201010GAP09", "201011GAP09", "201101GAP09", "201102GAP09",
"201103GAP09", "201104GAP09", "201106GAP09", "201108GAP09", "201109GAP09",
"201111GAP09", "201201GAP09", "201203GAP09", "201205GAP09", "201207GAP09",
"201208GAP09", "201211GAP09", "201301GAP09", "201303GAP09", "201305GAP09",
"201307GAP09", "201309GAP09", "201311GAP09"), class = "factor"),
Er_Group_Density = c(4, 9, 12, 4, 1, 0), Mn_Group_Density = c(3,
1, 1, 1, 0, 2), Bp_Group_Density = c(1, 2, 1, 0, 1, 0), Er_Count_Density = c(50,
14, 12, 9, 6, 4), Mn_Count_Density = c(9, 5, 2, 3, 2, 0), Bp_Count_Density = c(2,
3, 0, 4, 1, 0)), .Names = c("Grid_ID", "ER_Groups", "Er_Count",
"Mn_Count", "Bp_Groups", "Bp_Count", "Mn_Groups", "Month", "Year",
"SurveyID", "Er_Group_Density", "Mn_Group_Density", "Bp_Group_Density",
"Er_Count_Density", "Mn_Count_Density", "Bp_Count_Density"), row.names = c(2770L,
4421L, 17348L, 11263L, 6736L, 3974L), class = "data.frame")
There are a number of ways to get statistics by group. I'll assume you have a preference for plyr, since your example uses it.
Remember that dlply() splits the data into smaller data frames by the grouping variables, then applies the requested function to each of those smaller data frames. Therefore the function you pass must operate on a whole data frame. sum() cannot do that here: it only works on a data frame whose columns are all numeric, and yours also contains factors such as Grid_ID and SurveyID. You can write your own function, though.
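The failure is easy to reproduce without plyr. A minimal base R sketch with a made-up data frame `toy` that, like `Effort_All`, mixes a factor column with count columns (the values are hypothetical):

```r
# Hypothetical toy data: one factor column plus two count columns
toy <- data.frame(Grid_ID  = factor(c("1", "2", "2")),
                  Er_Count = c(60, 75, 14),
                  Mn_Count = c(30, 9, 6))

# sum() on the whole data frame fails because Grid_ID is not numeric
# (the exact wording of the message may vary between R versions)
res <- tryCatch(sum(toy), error = function(e) conditionMessage(e))
print(res)

# Selecting the numeric columns first works
totals <- colSums(toy[, c("Er_Count", "Mn_Count")])
print(totals)
# Er_Count Mn_Count
#      149       45
```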
Based on your description, what you want is something like this:
library(plyr)
# sum the three count columns within each piece that dlply() produces
myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])
dlply(Effort_All, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)
Remember that the second argument to dlply() is the set of variables used for grouping; your original call grouped by the count columns themselves. I'm not sure why you want the output as a list. Would it be easier to read if you used ddply() with the same arguments? It returns a data frame instead.
Other approaches include using sqldf() or something like lapply().
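Base R's aggregate() is one more option, and it also covers the means the question asks about. A minimal sketch on a made-up `toy_effort` data frame (hypothetical values; only the column names follow `Effort_All`):

```r
# Hypothetical toy data with the same count columns as Effort_All
toy_effort <- data.frame(
  Month    = c(4L, 4L, 6L),
  Year     = c(2001L, 2001L, 2001L),
  Er_Count = c(26, 10, 75),
  Mn_Count = c(12, 4, 9),
  Bp_Count = c(6, 2, 3)
)

# Formula interface: cbind() the columns to summarise, group by the right-hand side
sums  <- aggregate(cbind(Er_Count, Mn_Count, Bp_Count) ~ Month + Year,
                   data = toy_effort, FUN = sum)
means <- aggregate(cbind(Er_Count, Mn_Count, Bp_Count) ~ Month + Year,
                   data = toy_effort, FUN = mean)
print(sums)
print(means)
```

With the full data you would add Grid_ID and SurveyID to the right-hand side. The formula method drops rows with an NA in any of the listed columns (its default na.action is na.omit), so check for missing counts first.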
=============== EDIT: Other approaches =============
sqldf is always very easy to read and understand:
library(sqldf)
output <- sqldf('select Month,Year,Grid_ID,SurveyID,
sum(Er_Count) as ercount,
sum(Mn_Count) as mncount,
sum(Bp_Count) as bpcount
from Effort_All
group by Month, Year, Grid_ID, SurveyID')
lapply() works much the same way as dlply(), just with different arguments: you split the data into groups first, then apply the function to each piece.
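In base R, split() plays the role of dlply()'s grouping step and lapply() applies the function. A minimal sketch on a made-up `toy_effort` data frame (hypothetical values; only the column names follow `Effort_All`), with `count_sums()` doing the same job as `myfun`:

```r
# Hypothetical toy data with the same count columns as Effort_All
toy_effort <- data.frame(
  Month    = c(4L, 4L, 6L),
  Year     = c(2001L, 2001L, 2001L),
  Er_Count = c(26, 10, 75),
  Mn_Count = c(12, 4, 9),
  Bp_Count = c(6, 2, 3)
)

# Same per-piece function as myfun: column sums of the three counts
count_sums <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])

# drop = TRUE skips Month/Year combinations that never occur
pieces <- split(toy_effort, list(toy_effort$Month, toy_effort$Year), drop = TRUE)
result <- lapply(pieces, count_sums)
print(result)  # a list named "4.2001", "6.2001", like dlply() output
```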
Also, you could use colwise() from plyr:
dlply(Effort_All, .(Month, Year, Grid_ID, SurveyID), colwise(sum, .(Er_Count, Mn_Count, Bp_Count)))
Or summarise() with across() from dplyr:
library(dplyr)
Effort_All %>%
  group_by(Month, Year, Grid_ID, SurveyID) %>%
  summarise(across(c(Er_Count, Mn_Count, Bp_Count), sum), .groups = "drop")
# Month Year Grid_ID SurveyID Er_Count Mn_Count Bp_Count
# 1 2 2002 884 200203SSL 8 7 0
# 2 4 2001 126 200104SSL 26 12 6
# 3 4 2004 399 200404SSL 12 33 5
# 4 6 2001 125 200106SSL 75 9 3
# 5 10 2000 517 200009SSL 60 30 3
# 6 12 2009 340 200912GAP09 14 6 2