incorrect number of dimensions R - rowSums


I'm very new to coding, so I'm basically googling everything, but I couldn't figure this one out:
I have a data frame with 32 rows and 19 columns. I want to calculate the sum of each row across three specific columns.
I'm writing it like this:
D10 - my data frame.
Compliance_score - the new column I want to add
Compliance_1-3 - the columns I want to sum
D10$Compliance_score = rowSums(D10[ ,c("Compliance_1", "Compliance_2", "Compliance_3"), drop = FALSE])
I keep getting the error: "incorrect number of dimensions".
Can't figure out what I'm doing wrong, or what this error message even means.
Any thoughts?
Edit: if I understood correctly what a reproducible example is (this is my first time, so I hope I got this right; if not, please let me know):
> dput(head(D10))
structure(list(PP = c("003", "014", "047", "013", "053", "048"
), MAAS_1 = c("4.0", "4.0", "3.0", "5.0", "3.0", "4.0"), MAAS_2 =
c("3.0",
"1.0", "6.0", "4.0", "3.0", "3.0"), MAAS_3 = c("4.0", "5.0",
"4.0", "3.0", "4.0", "4.0"), MAAS_4 = c("2.0", "2.0", "6.0",
"2.0", "3.0", "4.0"), MAAS_5 = c("3.0", "3.0", "4.0", "5.0",
"5.0", "5.0"), MAAS_6 = c("3.0", "3.0", "4.0", "3.0", "2.0",
"4.0"), MAAS_7 = c("3.0", "3.0", "4.0", "3.0", "3.0", "5.0"),
MAAS_8 = c("2.0", "4.0", "4.0", "4.0", "4.0", "4.0"), MAAS_9
= c("3.0",
"4.0", "3.0", "2.0", "4.0", "5.0"), MAAS_10 = c("3.0", "4.0",
"4.0", "2.0", "4.0", "4.0"), MAAS_11 = c("2.0", "5.0", "4.0",
"4.0", "1.0", "5.0"), MAAS_12 = c("2.0", "5.0", "6.0", "3.0",
"3.0", "6.0"), MAAS_13 = c("3.0", "3.0", "5.0", "3.0", "3.0",
"2.0"), MAAS_14 = c("3.0", "4.0", "5.0", "4.0", "4.0", "4.0"
), MAAS_15 = c("3.0", "5.0", "6.0", "3.0", "5.0", "5.0"),
Compliance_1 = c("0.0", "0.0", "0.0", "0.0", "1.0", "0.0"
), Compliance_2 = c("1.0", "0.0", "1.0", "0.0", "1.0", "0.0"
), Compliance_3 = c("0.0", "0.0", "0.0", "0.0", "0.0", "0.0"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"
))
Does that make sense?

You can try this:
library(tidyverse)
New_D <- D10 %>%
rowwise() %>%  # sum per row, not over whole columns (assumes numeric columns)
mutate(Compliance_score = sum(c(Compliance_1, Compliance_2, Compliance_3), na.rm=TRUE)) %>%
ungroup()
But a reproducible example would be great to understand the error.
Claire

Your problem is that your data is stored as character, so you need to convert it to numeric before you can calculate the sum, i.e.
library(dplyr)
New_D <- D10 %>%
mutate(across(starts_with("Compliance"), as.numeric)) %>%
mutate(Compliance_score = Compliance_1 + Compliance_2 + Compliance_3)
Or, following @Claire's suggestion, when you have NA values:
New_D <- D10 %>%
mutate(across(starts_with("Compliance"), as.numeric)) %>%
rowwise() %>%  # so that sum() is computed per row
mutate(Compliance_score = sum(c(Compliance_1, Compliance_2, Compliance_3),
na.rm=TRUE)) %>%
ungroup()
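If you would rather stay in base R, the same idea can be sketched with rowSums() once the columns are numeric. This is only a minimal sketch: the three-row data frame below is a hypothetical stand-in built from a few rows of the dput() output in the question, not the full D10.

```r
# Hypothetical stand-in for D10, with the Compliance columns stored as character
D10 <- data.frame(
  Compliance_1 = c("0.0", "0.0", "1.0"),
  Compliance_2 = c("1.0", "0.0", "1.0"),
  Compliance_3 = c("0.0", "0.0", "0.0"),
  stringsAsFactors = FALSE
)

cols <- c("Compliance_1", "Compliance_2", "Compliance_3")

# Convert the character columns to numeric first
D10[cols] <- lapply(D10[cols], as.numeric)

# rowSums() now adds the three columns row by row
D10$Compliance_score <- rowSums(D10[, cols], na.rm = TRUE)
D10$Compliance_score  # 1 0 2
```

With na.rm = TRUE a missing value counts as 0 in the row total, which may or may not be what you want for a score.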

Related

R - Dataframe is setting an arbitrary max of 10?

I am working on a personal project to visualize some NBA data, and when sorting within the data frame created from reading a CSV file, it seems to set 10 as the max value for all categories (such as points, FGA, etc.). Does anyone know why this happens and how to "uncap" it?
Ex: Steph Curry can be seen here with 3PA greater than 10.
However, max(stats$`3PA`) returns 9.2, and the same thing occurs with other categories such as points, where max(stats$PTS) returns 9.9.
EDIT: dput(head(stats)):
> dput(head(stats))
structure(list(Player = c("Precious Achiuwa", "Steven Adams",
"Bam Adebayo", "LaMarcus Aldridge", "Ty-Shon Alexander", "Nickeil Alexander-Walker"
), Pos = c("PF", "C", "C", "C", "SG", "SG"), Age = c("21", "27",
"23", "35", "22", "22"), Tm = c("MIA", "NOP", "MIA", "TOT", "PHO",
"NOP"), G = c("61", "58", "64", "26", "15", "46"), GS = c("4",
"58", "64", "23", "0", "13"), MP = c("12.1", "27.7", "33.5",
"25.9", "3.1", "21.9"), FG = c("2.0", "3.3", "7.1", "5.4", "0.2",
"4.2"), FGA = c("3.7", "5.3", "12.5", "11.4", "0.8", "10.0"),
`FG%` = c(".544", ".614", ".570", ".473", ".250", ".419"),
`3P` = c("0.0", "0.0", "0.0", "1.2", "0.1", "1.7"), `3PA` = c("0.0",
"0.1", "0.1", "3.1", "0.6", "4.8"), `3P%` = c(".000", ".000",
".250", ".388", ".222", ".347"), `2P` = c("2.0", "3.3", "7.1",
"4.2", "0.1", "2.5"), `2PA` = c("3.7", "5.3", "12.4", "8.3",
"0.2", "5.2"), `2P%` = c(".546", ".620", ".573", ".505",
".333", ".485"), `eFG%` = c(".544", ".614", ".571", ".525",
".333", ".502"), FT = c("0.9", "1.0", "4.4", "1.6", "0.1",
"1.0"), FTA = c("1.8", "2.3", "5.5", "1.8", "0.1", "1.4"),
`FT%` = c(".509", ".444", ".799", ".872", ".500", ".727"),
ORB = c("1.2", "3.7", "2.2", "0.7", "0.1", "0.3"), DRB = c("2.2",
"5.2", "6.7", "3.8", "0.5", "2.8"), TRB = c("3.4", "8.9",
"9.0", "4.5", "0.7", "3.1"), AST = c("0.5", "1.9", "5.4",
"1.9", "0.4", "2.2"), STL = c("0.3", "0.9", "1.2", "0.4",
"0.0", "1.0"), BLK = c("0.5", "0.7", "1.0", "1.1", "0.1",
"0.5"), TOV = c("0.7", "1.3", "2.6", "1.0", "0.2", "1.5"),
PF = c("1.5", "1.9", "2.3", "1.8", "0.1", "1.9"), PTS = c("5.0",
"7.6", "18.7", "13.5", "0.6", "11.0"), c(" ", " ", " ", " ",
" ", " ")), row.names = c("1", "3", "4", "5", "6", "7"), class = "data.frame")
EDIT2: Update
max(as.numeric(stats$`3PA`)) seems to work, thank you all for bearing with my stupidity!

Seems like the problem was the vectors being read as strings rather than doubles, so the simple addition of as.numeric() seemed to fix it!
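To see why the maximum looked "capped", here is a small sketch using the six PTS values from the dput() output in the question. Character vectors are compared as text, so "7.6" sorts above "18.7" because "7" comes after "1". Converting before taking the maximum gives the numeric answer.

```r
# PTS values copied from the dput() sample, still stored as character
pts <- c("5.0", "7.6", "18.7", "13.5", "0.6", "11.0")

# String comparison: the largest *text* value
max_as_text <- max(pts)
max_as_text  # "7.6"

# Numeric comparison: convert first, then take the maximum
max_as_number <- max(as.numeric(pts))
max_as_number  # 18.7
```

Note the order of the calls: as.numeric(max(pts)) would only convert the text maximum and still give 7.6.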

error Predictor.new() function package IML in R

I am attempting to use package 'iml' in R to create plots of SHAP values from a GBM model created in H2O.
When I try to create the R6 Predictor object using the Predictor$new() function I get an error that states Error : all(feature.class %in% names(feature.types)) is not TRUE.
From this I am guessing that there is something about one of the feature classes that is incorrect, but this is just an educated guess based upon what the error message is literally saying.
Here is a sample of anonymized data (I can't share the real data because it is confidential):
structure(list(dlr_id_cur = c(1, 2), date_eff = structure(c(16014,
15416), class = "Date"), new_vec_ind = structure(c(1L, 1L), .Label = c("NNA",
"UNA"), class = "factor"), cntrct_term = c(9587879614862828,
19), amt_financed = c(9455359, 65561175), reg_payment = c(885288,
389371), acct_stat_cd = structure(c(3L, 3L), .Label = c("11",
"22", "33"), class = "factor"), base_rental = c(1, 626266), down_pymt = c(2,
6654661), car_count = c(5, 1), dur_lease = c(3974, 6466), returned = structure(1:2, .Label = c("00",
"11"), class = "factor"), state = structure(c(10L, 1L), .Label = c("ANA",
"BNA", "CNA", "DNA", "FNA", "GNA", "HNA", "INA", "KNA", "LNA",
"MNA", "NNA", "ONA", "PNA", "QNA", "RNA", "SNA", "TNA", "UNA",
"VNA", "WNA"), class = "factor"), zip = c(34633, 45222), zip_two_digits = structure(c(71L,
36L), .Label = c("00", "01", "02", "03", "04", "05", "06", "07",
"08", "09", "110", "111", "112", "113", "114", "115", "116",
"117", "118", "119", "220", "221", "222", "223", "224", "225",
"226", "227", "228", "229", "330", "331", "332", "333", "334",
"335", "336", "337", "338", "339", "440", "441", "442", "443",
"444", "445", "446", "447", "448", "449", "550", "551", "552",
"553", "554", "555", "556", "557", "558", "559", "660", "661",
"662", "663", "664", "665", "666", "667", "668", "669", "770",
"771", "772", "773", "774", "775", "776", "777", "778", "779",
"880", "881", "882", "883", "884", "885", "886", "887", "888",
"889", "990", "991", "992", "993", "994", "995", "996", "997",
"998", "999", "ANA", "BNA", "CNA", "ENA", "GNA", "HNA", "JNA",
"KNA", "LNA", "MNA", "NNA", "PNA", "RNA", "SNA", "TNA", "VNA"
), class = "factor")
, mod_year_date = c(8156, 6278), vehic_mod_fam_code = structure(c(2L,
2L), .Label = c("BNA", "CNA", "ENA", "MNA", "SNA", "TNA", "VNA",
"XNA"), class = "factor"), mod_class_code = structure(c(4L, 2L
), .Label = c("BNA", "CNA", "ENA", "GNA", "MNA", "RNA", "SNA"
), class = "factor"), count_dl_DL_CDE_CSPS_A_NP = c(945, 337),
DL_CDE_CSPS_A_NP_avg_dl = c(3355188283749626, 8835582388327814
), count_sv_DL_CDE_CSPS_A_NP = c(6532, 8475), DL_CDE_CSPS_A_NP_avg_sv = c(4471193398278526,
6934672627789796), count_dl_NUM_CSPS_INIT_SCR = c(774, 773
), NUM_CSPS_INIT_SCR_avg_dl = c(9468453388562312, 5847816458727333
), count_sv_NUM_CSPS_INIT_SCR = c(2467, 3882), NUM_CSPS_INIT_SCR_avg_sv = c(5857936629789154,
8963457353776469), count_FFV = c(8563, 2566), average_FFV = c(25697792913881564,
13693335921646120), csps_NUM_SV = c(8, 6), avg_SV_rating = c(9817541424596360,
6218928542331853), csps_FFV_ratio = c(23125612473476952,
2), avg_DL_rating = c(2182256921592387, 7668957586431513),
has_DL_rating = c(1, 8), has_bad_DL_rating = c(2, 4), serv_has_MNT = c(7,
3), serv_has_SCP = c(5, 4), serv_has_ELW = c(9, 4), serv_has_LCP = c(7,
1), ro_count = c(6, 1), ro_tot_cust_pay = c(2, 188759), ro_tot_pay = c(3,
764372), date_eff_weekday = structure(c(4L, 3L), .Label = c("FNA",
"MNA", "SNA", "TNA", "WNA"), class = "factor"), date_eff_month_int = c(83,
7), date_eff_day = c(2, 24)), .Names = c("dlr_id_cur", "date_eff",
"new_vec_ind", "cntrct_term", "amt_financed", "reg_payment",
"acct_stat_cd", "base_rental", "down_pymt", "car_count", "dur_lease",
"returned", "state", "zip", "zip_two_digits", "mod_year_date",
"vehic_mod_fam_code", "mod_class_code", "count_dl_DL_CDE_CSPS_A_NP",
"DL_CDE_CSPS_A_NP_avg_dl", "count_sv_DL_CDE_CSPS_A_NP", "DL_CDE_CSPS_A_NP_avg_sv",
"count_dl_NUM_CSPS_INIT_SCR", "NUM_CSPS_INIT_SCR_avg_dl", "count_sv_NUM_CSPS_INIT_SCR",
"NUM_CSPS_INIT_SCR_avg_sv", "count_FFV", "average_FFV", "csps_NUM_SV",
"avg_SV_rating", "csps_FFV_ratio", "avg_DL_rating", "has_DL_rating",
"has_bad_DL_rating", "serv_has_MNT", "serv_has_SCP", "serv_has_ELW",
"serv_has_LCP", "ro_count", "ro_tot_cust_pay", "ro_tot_pay",
"date_eff_weekday", "date_eff_month_int", "date_eff_day"), row.names = 1:2, class = "data.frame")
# 1. create a data frame with just the features
features_iml <- as.data.frame(df_testR) %>% dplyr::select(-returned)
# 2. Create a vector with the actual responses
response_iml <- as.numeric(as.vector(df_testR$returned))
# 3. Create custom predict function that returns the predicted values as a
# vector (probability of customer churn in my example)
pred <- function(model, newdata) {
results <- as.data.frame(h2o.predict(model, as.h2o(newdata)))
return(results[[3L]])
}
# 4. example of prediction output
pred(GBM5, features_iml) %>% head()
# 5. create Predictor object
predictor = Predictor$new(model = GBM5, data = features_iml, y =
response_iml, predict.fun = pred, class = "classification")
Error : all(feature.class %in% names(feature.types)) is not TRUE
Here are also some basic descriptions of the dataset and model object I'm using in the code above:
class(GBM5)
[1] "H2OBinomialModel"
attr(,"package")
[1] "h2o"
class(df_testR)
[1] "tbl_df" "tbl" "data.frame"
dim(df_testR)
[1] 47006 44
If there is anything else I can provide or if I have been unclear please let me know.

In the iml package, only specific feature classes are accepted: numeric, integer, character, factor and ordered. If you have any Date columns, or any data type other than the five listed here, then the Predictor object cannot be created.
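A quick way to find the offending columns before calling Predictor$new() is to compare each column's class against that list. The sketch below is base R only and makes two assumptions: the small data frame is a hypothetical cut-down version of the sample data, and the check is assumed to look at the first class of each column.

```r
# Hypothetical two-row feature table, loosely modelled on the sample above
features <- data.frame(
  date_eff = as.Date(c(16014, 15416), origin = "1970-01-01"),
  amt_financed = c(9455359, 65561175),
  state = factor(c("LNA", "ANA"))
)

# The five classes the answer lists as accepted
allowed <- c("numeric", "integer", "character", "factor", "ordered")

# First class of each column (assumption: this is what the check compares)
col_class <- vapply(features, function(x) class(x)[1], character(1))
bad <- names(col_class)[!col_class %in% allowed]
bad  # "date_eff"

# Convert the offending columns; Dates become days since 1970-01-01
features[bad] <- lapply(features[bad], as.numeric)
```

If the model was trained with date_eff as a date, the custom predict function would probably need to convert the column back before calling as.h2o(), so treat the conversion as a starting point rather than a complete fix.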

Replace value with NULL in column [duplicate]

This question already has an answer here:
Set NA and "" Cells in R Dataframe to NULL
(1 answer)
Closed 4 years ago.
I have a dataframe where I want to replace all values in a column that contain the value '2018' with NULL.
I have a dataset where every value in a column is a list. There are NULLs included as well. One of the values is not a list and I want to replace it with a NULL. If I replace it with NA then the datatypes in that column are mixed.
If I have a column like below, how do I replace the value containing 2018 with NULL instead of NA?
spend actions
176.2 2018-02-24
166.66 list(action_type = c("landing_page_view", "link_click", "offsit...
153.89 list(action_type = c("landing_page_view", "like", "link_click",...
156.54 list(action_type = c("landing_page_view", "like", "link_click",...
254.95 list(action_type = c("landing_page_view", "like", "link_click",...
374 list(action_type = c("landing_page_view", "like", "link_click",...
353.29 list(action_type = c("landing_page_view", "like", "link_click",...
0.41 NULL
Reproducible Example:
structure(list(spend = c("176.2", "166.66", "153.89", "156.54",
"254.95", "374", "353.29", "0.41"), actions = list("2018-02-24",
structure(list(action_type = c("landing_page_view", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("179", "275", "212", "18",
"269", "1434", "1", "17", "293", "293", "1933")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("136", "3", "248", "101", "6", "237", "730",
"11", "262", "259", "1074")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post", "post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("95", "1", "156", "91",
"5", "83", "532", "1", "13", "171", "170", "711")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("178", "4", "243", "56", "4", "138", "437",
"19", "266", "262", "635")), .Names = c("action_type", "value"
), class = "data.frame", row.names = c(NA, 11L)), structure(list(
action_type = c("landing_page_view", "like", "link_click",
"offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content",
"post_reaction", "page_engagement", "post_engagement",
"offsite_conversion"), value = c("203", "2", "306", "105",
"7", "186", "954", "23", "331", "329", "1252")), .Names =
c("action_type",
"value"), class = "data.frame", row.names = c(NA, 11L)),
structure(list(action_type = c("landing_page_view", "like",
"link_click", "offsite_conversion.fb_pixel_add_to_cart",
"offsite_conversion.fb_pixel_purchase",
"offsite_conversion.fb_pixel_search",
"offsite_conversion.fb_pixel_view_content", "post", "post_reaction",
"page_engagement", "post_engagement", "offsite_conversion"
), value = c("241", "4", "320", "106", "3", "240", "789",
"1", "17", "342", "338", "1138")), .Names = c("action_type",
"value"), class = "data.frame", row.names = c(NA, 12L)),
NULL)), .Names = c("spend", "actions"), row.names = c(NA,
-8L), class = "data.frame")
My ultimate goal is to use this function with this dataset to make the action_types their own column. This function works when either a list or NULL is in the actions column:
fb_insights_all<-df %>%
as.tibble() %>%
filter(!map_lgl(actions, is.null)) %>%
unnest() %>%
right_join(select(df, -actions)) %>%
spread(action_type, value)
Error: Each column must either be a list of vectors or a list of data frames [actions]

Without data to test this on, I'd try:
df$COL1<-ifelse(grepl("2018", df$COL1),"NULL",df$COL1)
NA behaves more like what you seem to be trying to do, while NULL serves a different purpose. If you just want the value to say "NULL" rather than behave like NULL, treat it as a character value.
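If the column really is a list column, as in the reproducible example, another option is to put an actual NULL into the list slot. This is only a sketch on a hypothetical three-row miniature of the data, not the full data frame from the question. The key detail is assigning list(NULL) rather than NULL.

```r
# Hypothetical miniature of the data: a date string, a data frame, and a NULL
df <- data.frame(spend = c("176.2", "166.66", "0.41"), stringsAsFactors = FALSE)
df$actions <- list(
  "2018-02-24",
  data.frame(action_type = c("like", "post"), value = c("3", "1"),
             stringsAsFactors = FALSE),
  NULL
)

# Flag the entries that are plain character strings containing "2018"
is_stray <- vapply(
  df$actions,
  function(x) is.character(x) && any(grepl("2018", x)),
  logical(1)
)

# Assigning list(NULL) keeps the slot and sets it to NULL;
# assigning a bare NULL would delete the element instead
df$actions[is_stray] <- list(NULL)
```

After this, every entry of actions is either a data frame or NULL, which is the shape the filter(!map_lgl(actions, is.null)) step in the question expects.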

Drawing slope graph in R using ggplot, Error: Aesthetics must be either length 1 or the same as the data

I want to create a slope graph in R using ggplot, like this one:
https://rud.is/b/2013/01/11/slopegraphs-in-r/
After cleaning the data and melting the data frame, I ran into this error:
Error: Aesthetics must be either length 1 or the same as the data (182): x, y, group, colour, label
There are no NAs in my data. Any ideas? Much appreciated!
Here's the code
#Read file as numeric data
betterlife<-read.csv("betterlife.csv",skip=4,stringsAsFactors = F)
num_data <- data.frame(data.matrix(betterlife))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns],
betterlife[,!numeric_columns])
## rescale selected columns data frame
final_data <- data.frame(lapply(final_data[,c(3,4,5,6,7,10,11)], function(x) scale(x, center = FALSE, scale = max(x, na.rm = TRUE)/100)))
## Add country names as indicator
final_data["INDICATOR"] <- NA
final_data$INDICATOR <- betterlife$INDICATOR
employment.data <- final_data[5:30,]
indicator <- employment.data$INDICATOR
## Melt data to draw graph
employment.melt <- melt(employment.data)
#plot
sg = ggplot(employment.melt, aes(factor(variable), value,
group = indicator,
colour = indicator,
label = indicator)) +
theme(legend.position = "none",
axis.text.x = element_text(size=5),
axis.text.y=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.ticks=element_blank(),
axis.line=element_blank(),
panel.grid.major.x = element_line("black", size = 0.1),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.background = element_blank())
sg1
This is the data I'm working with
dput(betterlife)
structure(list(X = c("", "ISO3", "AUS", "AUT", "BEL", "CAN",
"CHL", "CZE", "DNK", "EST", "FIN", "FRA", "DEU", "GRC", "HUN",
"ISL", "IRL", "ISR", "ITA", "JPN", "KOR", "LUX", "MEX", "NLD",
"NZL", "NOR", "POL", "PRT", "SVK", "SVN", "ESP", "SWE", "CHE",
"TUR", "GBR", "USA", "OECD", "", ""),
INDICATOR = c("UNIT", "COUNTRY",
"Australia", "Austria", "Belgium", "Canada", "Chile", "Czech Republic",
"Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
"Hungary", "Iceland", "Ireland", "Israel", "Italy", "Japan",
"Korea", "Luxembourg", "Mexico", "Netherlands", "New Zealand",
"Norway", "Poland", "Portugal", "Slovak Republic", "Slovenia",
"Spain", "Sweden", "Switzerland", "Turkey", "United Kingdom",
"United States", "OECD average", "", "n.a. : not available"),
Rooms.per.person = c("Average number of rooms shared per person in a dwelling",
"", "2.4", "1.7", "2.3", "2.5", "1.3", "1.3", "1.9", "1.2",
"1.9", "1.8", "1.7", "1.2", "1", "1.6", "2.1", "1.1", "1.4",
"1.8", "1.3", "1.9", "1.566666667", "2", "2.3", "1.9", "1",
"1.5", "1.1", "1.1", "1.9", "1.8", "1.7", "0.7", "1.8", "1.605208333",
"1.6", "", ""),
Dwelling.without.basic.facilities = c("% of people without indoor flushing toilets in their home",
"", "3.425714286", "1.3", "0.6", "2.722", "9.36", "0.7",
"0", "12.2", "0.8", "0.8", "1.2", "1.8", "7.1", "0.3", "0.3",
"2.52", "0.2", "6.4", "7.46", "0.8", "6.6", "0", "2.984285714",
"0.1", "4.8", "2.4", "1.1", "0.6", "0", "0", "0.1", "17.1",
"0.5", "0", "2.82", "", ""),
Household.disposable.income = c("USD (PPPs adjusted)",
"", "27,039", "27,670", "26,008", "27,015", "8,712", "16,690",
"22,929", "13,486", "24,246", "27,508", "27,665", "21,499",
"13,858", "19,621", "24,313", "22,539", "24,383", "23,210",
"16,254", "19,621", "12,182", "25,977", "18,819", "29,366",
"13,811", "18,540", "15,490", "19,890", "22,972", "26,543",
"27,542", "21,030", "27,208", "37,685", "22,284", "", ""),
Employment.rate = c("% of the working age population (15-64)",
"", "72.3", "71.73", "62.01", "71.68", "59.32", "65", "73.44",
"61.02", "68.15", "63.99", "71.1", "59.55", "55.4", "78.17",
"59.96", "59.21", "56.89", "70.11", "63.31", "65.21", "60.39",
"74.67", "72.34", "75.31", "59.26", "65.55", "58.76", "66.2",
"58.55", "72.73", "78.59", "46.29", "69.51", "66.71", "64.52",
"", ""),
Long.term.unemployment.rate = c("% of people, aged 15-64, who are not working but have been actively seeking a job for over a year",
"", "1", "1.13", "4.07", "0.97", "2.98375", "3.19", "1.44",
"7.84", "2.01", "3.75", "3.4", "5.73", "5.68", "1.35", "6.74",
"1.85", "4.13", "1.99", "0.01", "1.29", "0.13", "1.24", "0.6",
"0.34", "2.49", "5.97", "8.56", "3.21", "9.1", "1.42", "1.49",
"3.11", "2.59", "2.85", "2.74", "", ""),
Quality.of.support.network = c("% of people who have friends or relatives to rely on in case of need",
"", "95.4", "94.6", "92.6", "95.3", "85.2", "88.9", "96.8",
"84.6", "93.4", "93.9", "93.5", "86.1", "88.6", "97.6", "97.3",
"93", "86", "89.7", "79.8", "95", "87.1", "94.8", "97.1",
"93.1", "92.2", "83.3", "89.6", "90.7", "94.1", "96.2", "93.2",
"78.8", "94.9", "92.3", "91.1", "", ""),
Educational.attainment = c("% of people, aged 15-64, having at least an upper-secondary (high-school) degree",
"", "69.72", "81.04", "69.58", "87.07", "67.97", "90.9",
"74.56", "88.48", "81.07", "69.96", "85.33", "61.07", "79.7",
"64.13", "69.45", "81.23", "53.31", "87", "79.14", "67.94",
"33.55", "73.29", "72.05", "80.7", "87.15", "28.25", "89.93",
"82.04", "51.23", "85.04", "86.81", "30.31", "69.63", "88.7",
"72.95", "", ""),
Students.reading.skills = c("Average reading performance of students aged 15, according to PISA",
"", "515", "470", "506", "524", "449", "478", "495", "501",
"536", "496", "497", "483", "494", "500", "496", "474", "486",
"520", "539", "472", "425", "508", "521", "503", "500", "489",
"477", "483", "481", "497", "501", "464", "494", "500", "493",
"", ""),
Air.pollution = c("Average concentration of particulate matter (PM10) in cities with population larger than 100 000, measured in micrograms per cubic meter",
"", "14.28", "29.03", "21.27", "15", "61.55", "18.5", "16.26",
"12.62", "14.87", "12.94", "16.21", "32", "15.6", "14.47",
"12.54", "27.57", "23.33", "27.14", "30.76", "12.63", "32.69",
"30.76", "11.93", "15.85", "35.07", "21", "13.14", "29.03",
"27.56", "10.52", "22.36", "37.06", "12.67", "19.4", "21.99",
"", ""),
Consultation.on.rule.making = c("Composite index, increasing with the number of key elements of formal consultation processes",
"", "10.5", "7.13", "4.5", "10.5", "2", "6.75", "7", "3.25",
"9", "3.5", "4.5", "6.5", "7.88", "5.13", "9", "2.5", "5",
"7.25", "10.38", "6", "9", "6.13", "10.25", "8.13", "10.75",
"6.5", "6.63", "10.25", "7.25", "10.88", "8.38", "5.5", "11.5",
"8.25", "7.28", "", ""),
Voter.turnout = c("Number of people voting as % of the registered population ",
"", "95", "82", "91", "60", "88", "64", "87", "62", "74",
"84", "78", "74", "64", "84", "67", "65", "81", "67", "63",
"57", "59", "80", "79", "77", "54", "64", "55", "63", "75",
"82", "48", "84", "61", "90", "72", "", ""),
Life.expectancy = c("Average number of years a person can expect to live",
"", "81.5", "80.5", "79.8", "80.7", "77.8", "77.3", "78.8",
"73.9", "79.9", "81", "80.2", "80", "73.8", "81.3", "79.9",
"81.1", "81.5", "82.7", "79.9", "80.6", "75.1", "80.2", "80.4",
"80.6", "75.6", "79.3", "74.8", "78.8", "81.2", "81.2", "82.2",
"73.6", "79.7", "77.9", "79.2", "", ""),
Self.reported.health = c("% of people reporting their health to be \"good or very good\"",
"", "84.9", "69.6", "76.7", "88.1", "56.2", "68.2", "74.3",
"56.3", "67.7", "72.4", "64.7", "76.4", "55.2", "80.6", "84.4",
"79.7", "63.4", "32.7", "43.7", "74", "65.5", "80.6", "89.7",
"80", "57.7", "48.6", "31.1", "58.8", "69.8", "79.1", "80.95",
"66.8", "76", "88", "69", "", ""),
Life.Satisfaction = c("Average self-evaluation of life satisfaction, on a scale from 0 to 10",
"", "7.5", "7.3", "6.9", "7.7", "6.6", "6.2", "7.8", "5.1",
"7.4", "6.8", "6.7", "5.8", "4.7", "6.9", "7.3", "7.4", "6.4",
"6.1", "6.1", "7.1", "6.8", "7.5", "7.2", "7.6", "5.8", "4.9",
"6.1", "6.1", "6.2", "7.5", "7.5", "5.5", "7", "7.2", "6.7",
"", ""),
Homicide.rate = c("Average number of reported homicides per 100 000 people",
"", "1.2", "0.5", "1.8", "1.7", "8.1", "2", "1.4", "6.3",
"2.5", "1.4", "0.8", "1.1", "1.5", "0", "2", "2.4", "1.2",
"0.5", "2.3", "1.5", "11.6", "1", "1.3", "0.6", "1.2", "1.2",
"1.7", "0.5", "0.9", "0.9", "0.7", "2.9", "2.6", "5.2", "2.1",
"", ""),
Assault.rate = c("% of people who report having been assaulted in the previous year",
"", "2.1", "3", "7.3", "1.4", "9.5", "3.5", "3.9", "6.2",
"2.4", "4.9", "3.6", "3.8", "3.8", "2.7", "2.7", "3.1", "4.7",
"1.6", "2.1", "4.3", "14.8", "5", "2.3", "3.3", "2.2", "6.2",
"3.5", "3.9", "4.2", "5.2", "4.2", "6", "1.9", "1.6", "4.1",
"", "")),
.Names = c("X", "INDICATOR", "Rooms.per.person", "Dwelling.without.basic.facilities",
"Household.disposable.income", "Employment.rate",
"Long.term.unemployment.rate", "Quality.of.support.network",
"Educational.attainment", "Students.reading.skills", "Air.pollution",
"Consultation.on.rule.making", "Voter.turnout", "Life.expectancy",
"Self.reported.health", "Life.Satisfaction", "Homicide.rate",
"Assault.rate"), class = "data.frame", row.names = c(NA, -39L))
Did I melt the data frame incorrectly? The row indices are not in the correct order.

Aggregate Time Series

I have a time series object holding daily values (river discharge) over the date range from 2012-01-01 through 2014-02-03. I want to remove seasonal variation by applying the aggregate() function but cannot find the correct syntax for the frequency parameter.
A sample of the data (with 2 variables) was created with dput:
structure(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04",
"2014-01-05", "2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09",
"2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13", "2014-01-14",
"2014-01-15", "2014-01-16", "2014-01-17", "2014-01-18", "2014-01-19",
"2014-01-20", "2014-01-21", "2014-01-22", "2014-01-23", "2014-01-24",
"2014-01-25", "2014-01-26", "2014-01-27", "2014-01-28", "2014-01-29",
"2014-01-30", "2014-01-31", "2014-02-01", "2014-02-02", "2014-02-03",
"2014-02-04", "2014-02-05", "2014-02-06", "2014-02-07", "2014-02-08",
"2014-02-09", "2014-02-10", "2014-02-11", "2014-02-12", "2014-02-13",
"2014-02-14", "2014-02-15", "2014-02-16", "2014-02-17", "2014-02-18",
"2014-02-19", "2014-02-20", "2014-02-21", "2014-02-22", "2014-02-23",
"2014-02-24", "2014-02-25", "2014-02-26", "2014-02-27", "2014-02-28",
"2014-03-01", "2014-03-02", "119000", "125000", "129000", "125000",
"122000", "155000", "157000", "152000", "156000", "156000", "106000",
"147000", "123000", "123000", "128000", "150000", "135000", "135000",
"134000", "144000", "154000", "152000", "139000", "147000", "135000",
"120000", "119000", "124000", "132000", "152000", "138000", "140000",
"137000", "133000", "126000", "102000", " 82900", "133000", "158000",
"116000", "145000", "151000", "125000", "130000", "116000", "137000",
"133000", "129000", "128000", "126000", "135000", "136000", "153000",
"172000", "4.5", "4.6", "4.6", "4.5", "4.4", "4.3", "4.4", "4.4",
"4.4", "4.4", "4.5", "4.5", "4.5", "4.5", "4.5", "4.4", "4.3",
"4.3", "4.4", "4.4", "4.4", "4.5", "4.5", "4.4", "4.3", "4.3",
"4.2", "4.1", "4.0", "4.0", "4.0", "4.0", "3.8", "3.7", "3.5",
"3.3", "3.0", "2.8", "2.6", "2.5", "2.5", "2.5", "2.5", "2.6",
"2.7", "2.5", "2.4", "2.7", "2.9", "2.8", "2.9", "3.1", "3.2",
"3.3", "3.4", "3.4", "3.5", "3.7", "4.0", "4.2", "4.1"), .Dim = c(61L,
3L), .Dimnames = list(NULL, c("date", "disch", "temp")), .Tsp = c(1,
61, 1), class = c("mts", "ts", "matrix"))
When I try to aggregate (on only the disch data for the entire time period), my choice of frequency (which is 365 for the ts object) produces a blank plot. The syntax I use is:
plot(aggregate(dalles.disch.ts, FUN=mean, freq=365))
Reading ?ts, there are examples for monthly data but not daily. Since I have daily data for all of 2012 and 2013 plus the first two months of 2014, what FUN and freq should I specify?

Would stl(), Seasonal Decomposition of Time Series by Loess, work? I've used it for daily data using frequency = 7 as an argument in ts().

First, you can use the aggregate() function in the zoo package to aggregate to monthly, quarterly, or annual data. Second, there are a hundred ways to de-trend data. I suggest doing a more thorough search, but you can use loess models, diff(), moving windows, and countless other signal processing methods. The decompose() and stl() functions are the simplest options among them: decompose() uses moving averages, while stl() uses loess, which stands for LOcal regrESSion.
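As a rough illustration of both suggestions with base R only, here is a sketch on a simulated daily series. The data are made up (three years of a sine wave plus noise), and the name disch is just a placeholder for the real discharge series. Note that the argument of aggregate() for a ts object is nfrequency, and that stl() needs more than two full periods of data.

```r
set.seed(1)

# Hypothetical daily series: three years with a yearly cycle plus noise
n <- 365 * 3
disch <- ts(100 + 20 * sin(2 * pi * seq_len(n) / 365) + rnorm(n),
            frequency = 365)

# Aggregate to one value per year; nfrequency must divide the original frequency
annual <- aggregate(disch, nfrequency = 1, FUN = mean)
length(annual)  # 3

# Seasonal decomposition by loess, then subtract the seasonal component
fit <- stl(disch, s.window = "periodic")
deseason <- disch - fit$time.series[, "seasonal"]
```

Because 365 is not divisible by 12, aggregate() on a frequency-365 ts cannot produce monthly values directly; for calendar months a date-indexed approach such as the zoo one mentioned above is likely easier.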
