Replace value with NULL in column [duplicate] - r

This question already has an answer here:
Set NA and "" Cells in R Dataframe to NULL
(1 answer)
Closed 4 years ago.
I have a data frame in which I want to replace every value in a column that contains '2018' with NULL.
Every value in that column is a list, and some of them are NULL. One value is not a list, and I want to replace it with NULL. If I replace it with NA, the column ends up with mixed data types.
Given a column like the one below, how do I replace the value containing 2018 with NULL instead of NA?
spend actions
176.2 2018-02-24
166.66 list(action_type = c("landing_page_view", "link_click", "offsit...
153.89 list(action_type = c("landing_page_view", "like", "link_click",...
156.54 list(action_type = c("landing_page_view", "like", "link_click",...
254.95 list(action_type = c("landing_page_view", "like", "link_click",...
374 list(action_type = c("landing_page_view", "like", "link_click",...
353.29 list(action_type = c("landing_page_view", "like", "link_click",...
0.41 NULL
Reproducible Example:
structure(list(spend = c("176.2", "166.66", "153.89", "156.54",
"254.95", "374", "353.29", "0.41"), actions = list("2018-02-24",
structure(list(action_type = c("landing_page_view", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("179", "275", "212", "18",
"269", "1434", "1", "17", "293", "293", "1933")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("136", "3", "248", "101", "6", "237", "730",
"11", "262", "259", "1074")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("95", "1", "156", "91",
"5", "83", "532", "1", "13", "171", "170", "711")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("178", "4", "243", "56", "4", "138", "437",
"19", "266", "262", "635")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("203", "2", "306", "105",
"7", "186", "954", "23", "331", "329", "1252")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("241", "4", "320", "106", "3", "240", "789",
"1", "17", "342", "338", "1138")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
NULL)), .Names = c("spend", "actions"), row.names = c(NA,
-8L), class = "data.frame")
My ultimate goal is to use the following code on this dataset to turn each action_type into its own column. It works when every entry in the actions column is either a list or NULL, but on this data it fails with:
library(dplyr)
library(tidyr)
library(purrr)
fb_insights_all <- df %>%
  as_tibble() %>%
  filter(!map_lgl(actions, is.null)) %>%
  unnest() %>%
  right_join(select(df, -actions)) %>%
  spread(action_type, value)
Error: Each column must either be a list of vectors or a list of data frames [actions]
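For comparison, the reshaping goal can be sketched in base R without unnest(). This sidesteps the error by treating any entry that is not a data frame (the stray "2018-02-24" string or NULL) as having no actions. It is a minimal sketch on a hypothetical three-row cut of the data; the names `types`, `wide` and `result` are illustrative, and the values stay character as in the source.

```r
# Hypothetical three-row version of the question's data: one stray
# character value, one data frame of actions, and one NULL
df <- data.frame(spend = c("176.2", "166.66", "0.41"), stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = c("link_click", "like"),
             value = c("275", "3"),
             stringsAsFactors = FALSE),
  NULL
)

# Every action_type that appears in any data-frame entry
types <- sort(unique(unlist(lapply(df$actions, function(a) {
  if (is.data.frame(a)) a$action_type else NULL
}))))

# One row per original row, one column per action_type; entries that are
# not data frames (the stray string, NULL) simply yield NA
wide <- t(vapply(df$actions, function(a) {
  out <- setNames(rep(NA_character_, length(types)), types)
  if (is.data.frame(a)) out[a$action_type] <- a$value
  out
}, character(length(types))))

result <- cbind(df["spend"], as.data.frame(wide, stringsAsFactors = FALSE))
print(result)
```

Building the full set of action types first means rows with different action lists still line up in the same columns.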

Without data to test this on, I'd try:
df$COL1 <- ifelse(grepl("2018", df$COL1), "NULL", df$COL1)
As stated here, NA behaves more like what you seem to be trying to do, while NULL serves a different purpose. If you just want the value to read "NULL" rather than behave like NULL, treat it as a character value.
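If the goal is a real NULL inside a list column (rather than the string "NULL"), one base-R approach is to assign list(NULL) to the matching positions. This is a minimal sketch on a hypothetical cut of the data; matching on "2018" mirrors the question, and the helper name `is_stray` is illustrative.

```r
# Hypothetical three-row version of the question's list column
df <- data.frame(spend = c("176.2", "166.66", "0.41"), stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = "link_click", value = "275", stringsAsFactors = FALSE),
  NULL
)

# TRUE only for entries that are plain character values containing "2018"
is_stray <- vapply(df$actions, function(x) {
  is.character(x) && any(grepl("2018", x))
}, logical(1))

# Assigning list(NULL) stores NULL in those slots and keeps the column length;
# df$actions[[i]] <- NULL would instead remove the element, shortening the column
df$actions[is_stray] <- list(NULL)

str(df$actions)
```

Because every entry is now either a data frame or NULL, the column no longer mixes a character value in with the lists.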

Related

Remove single characters without changing the numbers from an r dataframe

Many values in my data frame have arrow characters, ">" and "<", alongside the numbers. I want to remove these characters but keep the numbers. I only know how to replace the entire element with NA, using the following code.
df <- apply(df, 1:2, gsub, pattern = "<|>", replacement = "")
Could someone please help me edit this so that it keeps the numbers in each element instead of throwing the entire value out?
Dataframe:
structure(list(`Analyte Sample` = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14), A = c("4190", "6665", "7435", "2052",
"783", "322", "199", "90", "46", "17", "8", "3", "3", "<1↓"
), B = c("11569", "6677", "3852", "983.88", "589", "359", "203",
"68", "33", "12", "6", "<2↓", "4", "<1↓"), C = c("20453",
"7699", "2499", "707.98", "412", "328", "156", "88", "39", "27",
"17", "<1↓", "<3↓", "<1↓"), D = c("7893", ">20000↑",
"1623", "685.64", "321", "644", "112", "65", "35", "29", "9",
"5", "<3↓", "<1↓"), E = c("320", "15444", "2049", "1065",
"389", "365", "145", "77", "38", "16", "9", "6", "<2↓", "<2↓"
), F = c("7438", ">21999↑", "3472", "1057", "563", "401", "167",
"89", "46", "19", "6", "<1↓", "<1↓", "<1↓"), G = c(7345,
9001, 2473, 1138, 516, 403, 134, 81, 37, 17, 8, 6, 4, 3), H = c("9004",
"3998", "2299", "964.88", "499", "341", "112", "88", "39", "32",
"<29↓", "<30↓", "<31↓", "<29↓"), I = c("8434", "8700",
"2217", "1263", "567", "352", "153", "80", "43", "18", "9", "2",
"3", "<1↓"), J = c("7734", "6733", "2092", "1115", "637", "332",
"155", "82", "37", "17", "10", "4", "1", "<1↓"), K = c(">3718↑",
">3000↑", "2118", "862.13", "426", "355", "143", "78", "44",
"22", "11", "<4↓", "<4↓", "<3↓"), L = c(6345, 7688, 2311,
1195, 647, 366, 177, 83, 41, 20, 8, 6, 3, 2), M = c("4222", ">25587↑",
"1846", "814.61", "422", "314", "154", "86", "41", "27", "21",
"<2↓", "<2↓", "<3↓"), N = c("6773", "8934", "2381", "1221",
"677", "356", "146", "89", "40", "17", "10", "5", "2", "<2↓"
), O = c(">2200↑", ">2133↑", ">2000↑", "564.5", "226",
"476", "111", "60", "32", "36", "18", "<10↓", "<1↓", "<2↓"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L), spec = structure(list(cols = list(`Analyte Sample` = structure(list(), class = c("collector_double",
"collector")), A = structure(list(), class = c("collector_character",
"collector")), B = structure(list(), class = c("collector_character",
"collector")), C = structure(list(), class = c("collector_character",
"collector")), D = structure(list(), class = c("collector_character",
"collector")), E = structure(list(), class = c("collector_character",
"collector")), F = structure(list(), class = c("collector_character",
"collector")), G = structure(list(), class = c("collector_double",
"collector")), H = structure(list(), class = c("collector_character",
"collector")), I = structure(list(), class = c("collector_character",
"collector")), J = structure(list(), class = c("collector_character",
"collector")), K = structure(list(), class = c("collector_character",
"collector")), L = structure(list(), class = c("collector_double",
"collector")), M = structure(list(), class = c("collector_character",
"collector")), N = structure(list(), class = c("collector_character",
"collector")), O = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
You can use lapply(), which returns a list, and assign the result back to df[]. The [] keeps the original attributes, i.e. the data.frame class, so df stays a data frame with the characters removed.
df[] <- lapply(df, gsub, pattern = "<|>", replacement = "")
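The lapply() pattern above can also drop the arrow symbols and convert to numbers in one step, by removing every character that is not a digit or a decimal point. This is a sketch on a hypothetical subset of the columns; `clean_num` is an illustrative helper, it assumes there are no negative values, and it discards the censoring information carried by < and >.

```r
# Hypothetical subset of the question's data; \u2191 and \u2193 are the arrows
df <- data.frame(
  `Analyte Sample` = 1:4,
  A = c("4190", ">20000\u2191", "983.88", "<1\u2193"),
  G = c(7345, 9001, 2473, 3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only digits and the decimal point, then convert to numeric
clean_num <- function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

# df[-1] skips the sample-ID column; assigning into it keeps the data.frame class
df[-1] <- lapply(df[-1], clean_num)
str(df)
```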
In your case, a regular expression is probably the best option. Using the tidyverse:
df %>% mutate_at(vars(A:O), ~ as.numeric(gsub("[^0-9.]*([0-9.]*).*", "\\1", .)))
If you only want to change values that start with < or >, do the following:
df %>% mutate_at(vars(A:O), ~ as.numeric(gsub("[<>]*([0-9.]*).*", "\\1", .)))
You can also use apply(), but keep in mind that apply() converts the data frame into a matrix before applying the function (numeric columns get padded with leading spaces, so the pattern must include a space):
apply(df, 2, function(x) gsub("[ <>]*([0-9.]*).*", "\\1", x))
Explanation:
The pattern [0-9.]* matches any run of digits and decimal points (possibly empty), and [^0-9.]* matches any run of characters that are neither. The parentheses capture the number, and "\\1" replaces the whole value with that capture. Including "." keeps decimals such as 983.88, which [0-9]* alone would truncate to 983.
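A quick check of this kind of capture-group pattern on a few values taken from the question shows why the character class matters for decimals. The variable names are illustrative; sub() is used because each pattern already consumes the whole string.

```r
x <- c("<1\u2193", ">20000\u2191", "983.88", "3")

# Capture the first run of digits only; the decimal part is lost
digits_only <- sub("[^0-9]*([0-9]*).*", "\\1", x)

# Allow "." inside the character classes so decimals survive
with_decimal <- sub("[^0-9.]*([0-9.]*).*", "\\1", x)

print(digits_only)
print(with_decimal)
```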
You can try one of these options:
#Code 1
df <- apply(df, 1:2, function(x) gsub(pattern = "<|>", replacement = "", x))
#Code 2
df <- sapply(df, function(x) gsub(pattern = "<|>", replacement = "", x))
Be careful: the output will usually be a matrix, so you will have to convert it back to a data frame with as.data.frame().
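To illustrate the matrix caveat, here is a small sketch on hypothetical data: sapply() simplifies the per-column results into a character matrix, and as.data.frame() turns it back into a data frame, with every column now character.

```r
# Hypothetical two-column data frame: one character column, one numeric column
df <- data.frame(A = c("4190", ">20000"), G = c(7345, 9001), stringsAsFactors = FALSE)

# sapply() returns a character matrix with one column per input column
m <- sapply(df, function(x) gsub(pattern = "<|>", replacement = "", x))
print(is.matrix(m))

# Convert back; note that G is now character too, not numeric
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
str(df2)
```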

Error in Predictor$new() from the iml package in R

I am attempting to use package 'iml' in R to create plots of SHAP values from a GBM model created in H2O.
When I try to create the R6 Predictor object with Predictor$new(), I get this error: Error : all(feature.class %in% names(feature.types)) is not TRUE.
From this I am guessing that there is something about one of the feature classes that is incorrect, but this is just an educated guess based upon what the error message is literally saying.
Here is a sample of anonymized data (I can't share the real data because it is confidential):
structure(list(dlr_id_cur = c(1, 2), date_eff = structure(c(16014,
15416), class = "Date"), new_vec_ind = structure(c(1L, 1L), .Label = c("NNA",
"UNA"), class = "factor"), cntrct_term = c(9587879614862828,
19), amt_financed = c(9455359, 65561175), reg_payment = c(885288,
389371), acct_stat_cd = structure(c(3L, 3L), .Label = c("11",
"22", "33"), class = "factor"), base_rental = c(1, 626266), down_pymt = c(2,
6654661), car_count = c(5, 1), dur_lease = c(3974, 6466), returned = structure(1:2, .Label = c("00",
"11"), class = "factor"), state = structure(c(10L, 1L), .Label = c("ANA",
"BNA", "CNA", "DNA", "FNA", "GNA", "HNA", "INA", "KNA", "LNA",
"MNA", "NNA", "ONA", "PNA", "QNA", "RNA", "SNA", "TNA", "UNA",
"VNA", "WNA"), class = "factor"), zip = c(34633, 45222), zip_two_digits = structure(c(71L,
36L), .Label = c("00", "01", "02", "03", "04", "05", "06", "07",
"08", "09", "110", "111", "112", "113", "114", "115", "116",
"117", "118", "119", "220", "221", "222", "223", "224", "225",
"226", "227", "228", "229", "330", "331", "332", "333", "334",
"335", "336", "337", "338", "339", "440", "441", "442", "443",
"444", "445", "446", "447", "448", "449", "550", "551", "552",
"553", "554", "555", "556", "557", "558", "559", "660", "661",
"662", "663", "664", "665", "666", "667", "668", "669", "770",
"771", "772", "773", "774", "775", "776", "777", "778", "779",
"880", "881", "882", "883", "884", "885", "886", "887", "888",
"889", "990", "991", "992", "993", "994", "995", "996", "997",
"998", "999", "ANA", "BNA", "CNA", "ENA", "GNA", "HNA", "JNA",
"KNA", "LNA", "MNA", "NNA", "PNA", "RNA", "SNA", "TNA", "VNA"
), class = "factor")
, mod_year_date = c(8156, 6278), vehic_mod_fam_code = structure(c(2L,
2L), .Label = c("BNA", "CNA", "ENA", "MNA", "SNA", "TNA", "VNA",
"XNA"), class = "factor"), mod_class_code = structure(c(4L, 2L
), .Label = c("BNA", "CNA", "ENA", "GNA", "MNA", "RNA", "SNA"
), class = "factor"), count_dl_DL_CDE_CSPS_A_NP = c(945, 337),
DL_CDE_CSPS_A_NP_avg_dl = c(3355188283749626, 8835582388327814
), count_sv_DL_CDE_CSPS_A_NP = c(6532, 8475), DL_CDE_CSPS_A_NP_avg_sv = c(4471193398278526,
6934672627789796), count_dl_NUM_CSPS_INIT_SCR = c(774, 773
), NUM_CSPS_INIT_SCR_avg_dl = c(9468453388562312, 5847816458727333
), count_sv_NUM_CSPS_INIT_SCR = c(2467, 3882), NUM_CSPS_INIT_SCR_avg_sv = c(5857936629789154,
8963457353776469), count_FFV = c(8563, 2566), average_FFV = c(25697792913881564,
13693335921646120), csps_NUM_SV = c(8, 6), avg_SV_rating = c(9817541424596360,
6218928542331853), csps_FFV_ratio = c(23125612473476952,
2), avg_DL_rating = c(2182256921592387, 7668957586431513),
has_DL_rating = c(1, 8), has_bad_DL_rating = c(2, 4), serv_has_MNT = c(7,
3), serv_has_SCP = c(5, 4), serv_has_ELW = c(9, 4), serv_has_LCP = c(7,
1), ro_count = c(6, 1), ro_tot_cust_pay = c(2, 188759), ro_tot_pay = c(3,
764372), date_eff_weekday = structure(c(4L, 3L), .Label = c("FNA",
"MNA", "SNA", "TNA", "WNA"), class = "factor"), date_eff_month_int = c(83,
7), date_eff_day = c(2, 24)), .Names = c("dlr_id_cur", "date_eff",
"new_vec_ind", "cntrct_term", "amt_financed", "reg_payment",
"acct_stat_cd", "base_rental", "down_pymt", "car_count", "dur_lease",
"returned", "state", "zip", "zip_two_digits", "mod_year_date",
"vehic_mod_fam_code", "mod_class_code", "count_dl_DL_CDE_CSPS_A_NP",
"DL_CDE_CSPS_A_NP_avg_dl", "count_sv_DL_CDE_CSPS_A_NP", "DL_CDE_CSPS_A_NP_avg_sv",
"count_dl_NUM_CSPS_INIT_SCR", "NUM_CSPS_INIT_SCR_avg_dl", "count_sv_NUM_CSPS_INIT_SCR",
"NUM_CSPS_INIT_SCR_avg_sv", "count_FFV", "average_FFV", "csps_NUM_SV",
"avg_SV_rating", "csps_FFV_ratio", "avg_DL_rating", "has_DL_rating",
"has_bad_DL_rating", "serv_has_MNT", "serv_has_SCP", "serv_has_ELW",
"serv_has_LCP", "ro_count", "ro_tot_cust_pay", "ro_tot_pay",
"date_eff_weekday", "date_eff_month_int", "date_eff_day"), row.names = 1:2, class = "data.frame")
# 1. create a data frame with just the features
features_iml <- as.data.frame(df_testR) %>% dplyr::select(-returned)
# 2. Create a vector with the actual responses
response_iml <- as.numeric(as.vector(df_testR$returned))
# 3. Create custom predict function that returns the predicted values as a
# vector (probability of customer churn in my example)
pred <- function(model, newdata) {
results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
return(results[[3L]])
}
# 4. example of prediction output
pred(GBM5, features_iml) %>% head()
# 5. create Predictor object
predictor = Predictor$new(model = GBM5, data = features_iml, y =
response_iml, predict.fun = pred, class = "classification")
Error : all(feature.class %in% names(feature.types)) is not TRUE
Here are also some basic descriptions of the dataset and model object I'm
using in the code above:
class(GBM5)
[1] "H2OBinomialModel"
attr(,"package")
[1] "h2o"
class(df_testR)
[1] "tbl_df" "tbl" "data.frame"
dim(df_testR)
[1] 47006 44
If there is anything else I can provide or if I have been unclear please let me know.
The iml package accepts only specific feature classes, namely numeric, integer, character, factor and ordered. If any column is a Date object, or has any class other than these five, the Predictor object cannot be created.
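Following that explanation, one option is to find the offending columns and convert Date columns to numbers before building the Predictor. This is a base-R sketch on a hypothetical two-row cut of the question's features; the `allowed` vector mirrors the five classes listed above, and whether the H2O model still predicts correctly on the converted column (it must see the same encoding it was trained on) is an assumption this sketch does not check.

```r
# Hypothetical two-row cut of the question's features, including a Date column
features <- data.frame(
  dlr_id_cur = c(1, 2),
  date_eff = as.Date(c(16014, 15416), origin = "1970-01-01"),
  new_vec_ind = factor(c("NNA", "NNA"), levels = c("NNA", "UNA"))
)

# Classes described above as acceptable to iml
allowed <- c("numeric", "integer", "character", "factor", "ordered")

# First class of each column; shows which columns are not allowed (date_eff)
first_class <- vapply(features, function(col) class(col)[1], character(1))
print(first_class[!first_class %in% allowed])

# Convert Date columns to their numeric representation (days since 1970-01-01)
is_date <- vapply(features, inherits, logical(1), what = "Date")
features[is_date] <- lapply(features[is_date], as.numeric)

str(features)
```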

Convert column types to their read_csv() column type in R

One of my favorite things about library(readr) and its read_csv() function is that it almost always sets the column types of my data to the correct class. However, I am currently working with an API in R that returns data to me as a data frame in which every column is character, even when the data are clearly numbers. Take this data frame, for example, which has some sports data:
dput(mydf)
structure(list(isUnplayed = c("false", "false", "false"), isInProgress =
c("false", "false", "false"), isCompleted = c("true", "true", "true"), awayScore = c("106",
"95", "95"), homeScore = c("94", "97", "111"), game.ID = c("31176",
"31177", "31178"), game.date = c("2015-10-27", "2015-10-27",
"2015-10-27"), game.time = c("8:00PM", "8:00PM", "10:30PM"),
game.location = c("Philips Arena", "United Center", "Oracle Arena"
), game.awayTeam.ID = c("88", "86", "110"), game.awayTeam.City = c("Detroit",
"Cleveland", "New Orleans"), game.awayTeam.Name = c("Pistons",
"Cavaliers", "Pelicans"), game.awayTeam.Abbreviation = c("DET",
"CLE", "NOP"), game.homeTeam.ID = c("91", "89", "101"), game.homeTeam.City = c("Atlanta",
"Chicago", "Golden State"), game.homeTeam.Name = c("Hawks",
"Bulls", "Warriors"), game.homeTeam.Abbreviation = c("ATL",
"CHI", "GSW"), quarterSummary.quarter = list(structure(list(
`#number` = c("1", "2", "3", "4"), awayScore = c("25",
"23", "34", "24"), homeScore = c("25", "18", "23", "28"
)), .Names = c("#number", "awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)), structure(list(`#number` = c("1", "2", "3", "4"), awayScore = c("17",
"23", "28", "27"), homeScore = c("26", "20", "25", "26")), .Names = c("#number",
"awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)), structure(list(`#number` = c("1", "2", "3", "4"), awayScore = c("35",
"14", "26", "20"), homeScore = c("39", "20", "35", "17")), .Names = c("#number",
"awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)))), .Names = c("isUnplayed", "isInProgress", "isCompleted",
"awayScore", "homeScore", "game.ID", "game.date", "game.time",
"game.location", "game.awayTeam.ID", "game.awayTeam.City", "game.awayTeam.Name",
"game.awayTeam.Abbreviation", "game.homeTeam.ID", "game.homeTeam.City",
"game.homeTeam.Name", "game.homeTeam.Abbreviation", "quarterSummary.quarter"
), class = "data.frame", row.names = c(NA, 3L))
It is quite a hassle to deal with this dataframe once it is returned by the API, given the class types. I've come up with a sort of a hack to update the column classes, which is as follows:
write_csv(mydf, 'mydf.csv')
mydf <- read_csv('mydf.csv')
By writing to CSV and then re-reading the CSV using read_csv(), the dataframe columns update. Unfortunately I am left with a CSV file in my directory that I don't want. Is there a way to update the columns of an R dataframe to their 'read_csv()' column classes, without actually having to write the CSV?
Any help is appreciated!
You don't need to write and read the data if you just want readr to guess your column types. You can use readr::type_convert() for that:
iris %>%
dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>%
readr::type_convert() %>%
str()
For comparison:
iris %>%
dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>%
str()
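Try this code: type.convert() converts a character vector to logical, integer, numeric, complex or factor as appropriate; with as.is = TRUE, strings stay character instead of becoming factors.
# convert character columns; df[indx] (not df[, indx]) stays a data frame even if only one column matches
indx <- which(sapply(df, is.character))
df[indx] <- lapply(df[indx], type.convert, as.is = TRUE)
# turn any factor columns back into character
indx <- which(sapply(df, is.factor))
df[indx] <- lapply(df[indx], as.character)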
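A small sketch of the type.convert() approach on a hypothetical four-column subset of the API result shows what it does and does not convert. Base type.convert() turns "true"/"false" into logical and the scores into integer, but leaves the ISO date as character, whereas readr::type_convert() would also guess a Date for that column.

```r
# Hypothetical subset of the API result: every column arrives as character
mydf <- data.frame(
  isCompleted = c("true", "true", "true"),
  awayScore = c("106", "95", "95"),
  game.date = c("2015-10-27", "2015-10-27", "2015-10-27"),
  game.location = c("Philips Arena", "United Center", "Oracle Arena"),
  stringsAsFactors = FALSE
)

# Convert only character columns; as.is = TRUE keeps text as character
chr <- vapply(mydf, is.character, logical(1))
mydf[chr] <- lapply(mydf[chr], type.convert, as.is = TRUE)

str(mydf)
```

Selecting character columns first also leaves list columns such as quarterSummary.quarter untouched.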

Reading list like values from dataframe cell

I have pulled data from the Facebook API with fbRads. Here is a sample of the data frame I have:
mydata <- fread('ID,ACTIONS
02,"list(action_type = c("link_click", "post_reaction", "page_engagement", "post_engagement"), value = c("1", "4", "5", "5"))"
03,"list(action_type = c("app_custom_event.fb_mobile_activate_app", "app_custom_event.fb_mobile_add_to_cart", "app_custom_event.fb_mobile_content_view", "app_custom_event.fb_mobile_purchase", "app_custom_event.fb_mobile_search", "app_custom_event.other", "like", "link_click", "mobile_app_install", "offsite_conversion.fb_pixel_add_to_cart", "offsite_conversion.fb_pixel_add_to_wishlist", "offsite_conversion.fb_pixel_lead", "offsite_conversion.fb_pixel_purchase", "offsite_conversion.fb_pixel_search", "offsite_conversion.fb_pixel_view_content", "post_reaction", "page_engagement", "post_engagement", "offsite_conversion", "app_custom_event"), value = c("994", "219", "1696", "9", "47", "425", "67", "2267", "37", "348", "53", "3", "7", "218", "3286", "145", "2479", "2412", "3915", "3390"))"
04,"NULL"
05,"list(action_type = c("app_custom_event.fb_mobile_activate_app", "app_custom_event.fb_mobile_add_to_cart", "app_custom_event.fb_mobile_content_view", "app_custom_event.fb_mobile_purchase", "app_custom_event.fb_mobile_search", "app_custom_event.other", "like", "link_click", "mobile_app_install", "offsite_conversion.fb_pixel_add_to_cart", "offsite_conversion.fb_pixel_add_to_wishlist", "offsite_conversion.fb_pixel_lead", "offsite_conversion.fb_pixel_purchase", "offsite_conversion.fb_pixel_search", "offsite_conversion.fb_pixel_view_content", "post", "post_reaction", "page_engagement", "post_engagement", "offsite_conversion", "app_custom_event"), value = c("1703", "188", "2233", "13", "155", "731", "229", "2568", "62", "303", "46", "7", "17", "257", "4433", "1", "473", "3271", "3042", "5063", "5023"))"')
I need to find the value of app_custom_event.fb_mobile_purchase for each ID. Each cell of the ACTIONS column contains two vectors, action_type and value.
The output I expect is:
mydata <- fread('ID,app_custom_event.fb_mobile_purchase
02,"NULL"
03,"9"
04,"NULL"
05,"13"')
Do I need to use dictionaries to get the values? Any approach will be highly appreciated.
Here is a start:
lapply(mydata$ACTIONS, function(i){
x <- eval(parse(text = i))
ix <- which(x$action_type == "app_custom_event.fb_mobile_purchase")
x$value[ ix ]
})
I don't know the fbRads package, but it probably has some "read" function that avoids this problem.
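Building on that start, a sketch that returns exactly one value per ID could look like this. It uses a hypothetical two-row version of the data, `get_action_value` is an illustrative helper, and it returns NA rather than the string "NULL" when the action is absent. eval(parse()) runs whatever the text contains, so it is only reasonable for trusted input.

```r
# Hypothetical two-row version of the ACTIONS column stored as text
mydata <- data.frame(
  ID = c("03", "04"),
  ACTIONS = c(
    'list(action_type = c("link_click", "app_custom_event.fb_mobile_purchase"), value = c("2267", "9"))',
    "NULL"
  ),
  stringsAsFactors = FALSE
)

# Parse one cell and return the value for the requested action_type, or NA
get_action_value <- function(txt, type) {
  x <- eval(parse(text = txt))   # evaluates the text: trusted input only
  if (is.null(x)) return(NA_character_)
  v <- x$value[x$action_type == type]
  if (length(v) == 0) NA_character_ else v[1]
}

mydata$app_custom_event.fb_mobile_purchase <- vapply(
  mydata$ACTIONS, get_action_value, character(1),
  type = "app_custom_event.fb_mobile_purchase", USE.NAMES = FALSE
)
print(mydata[c("ID", "app_custom_event.fb_mobile_purchase")])
```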

Count rows in R data.table [duplicate]

This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Closed 7 years ago.
For a sample dataframe:
library(data.table)
df = structure(list(country = c("AT", "AT", "AT", "BE", "BE", "BE",
"DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE",
"DE", "DE", "DE"), level = c("1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"
), region = c("AT2", "AT1", "AT3", "BE2", "BE1", "BE3", "DE4",
"DE3", "DE9", "DE7", "DE1", "DEE", "DEG", "DE2", "DED", "DEB",
"DEA", "DEF", "DE6", "DE8"), N = c("348", "707", "648", "952",
"143", "584", "171", "155", "234", "176", "302", "144", "148",
"386", "257", "126", "463", "74", "44", "119"), result = c("24.43",
"26.59", "20.37", "23.53", "16.78", "25.51", "46.2", "43.23",
"41.03", "37.5", "33.44", "58.33", "47.97", "34.46", "39.69",
"31.75", "36.93", "43.24", "36.36", "43.7")), .Names = c("country",
"level", "region", "N", "result"), class = c("data.table", "data.frame"
), row.names = c(NA, -20L))
I am using the following code to produce a summary table:
variable.country <-setDT(variable.regions)[order(country), list(min_result = min(result),
max_result = max(result), level= level[1L]), by = country]
I simply want to add another variable to this data table that tells me how many regions, i.e. rows, there are in each country (e.g. AT has 3). How would I get length or dim to work here?
Thanks.
We can use .N to get the number of rows for each 'country':
setDT(variable.regions)[order(country),
list(min_result = min(result),
len = .N,
max_result = max(result),
level= level[1L]),
by = country]
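For comparison, a base-R sketch of the same summary (without data.table) splits the rows by country and counts them with nrow(). It also converts result to numeric first: in the sample data it is stored as character, so min() and max() would otherwise compare strings. The data are a hypothetical cut, and the column names mirror the data.table answer.

```r
# Hypothetical cut of the question's data; result is character in the source
df <- data.frame(
  country = c("AT", "AT", "AT", "BE", "BE", "DE"),
  level = "1",
  result = c("24.43", "26.59", "20.37", "23.53", "16.78", "46.2"),
  stringsAsFactors = FALSE
)
df$result <- as.numeric(df$result)

# One summary row per country, including the number of rows (regions)
summary_tbl <- do.call(rbind, lapply(split(df, df$country), function(g) {
  data.frame(country = g$country[1],
             min_result = min(g$result),
             len = nrow(g),
             max_result = max(g$result),
             level = g$level[1],
             stringsAsFactors = FALSE)
}))
print(summary_tbl)
```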
