Unused argument when doing descriptive stats in R

I keep getting the error "unused argument" for the by argument in the call below. Do I need to install another package? I already have dplyr, plyr, tidyr, data.table and pacman. Any help is appreciated, thanks.
DHB<- TA[, .(mean= mean(sum_tbret, na.rm = TRUE),
sd= sd(sum_tbret, na.rm = TRUE),
var= var(sum_tbret, na.rm=TRUE),
median= as.double(median(sum_tbret, na.rm = TRUE)), ####Median has problems with data.table so need to tell it to convert to double
lq= quantile(sum_tbret, 0.25, na.rm = TRUE),
uq= quantile(sum_tbret, 0.75, na.rm = TRUE)),
by = "dhb2015"]
Error in `[.data.frame`(TA, , .(mean = mean(sum_tbret, na.rm = TRUE), :
unused argument (by = "lb2018")

Based on the error, TA is still a data.frame: the call was dispatched to `[.data.frame`, which has no by argument. Convert it to a data.table with setDT (which converts in place) or with as.data.table, and the data.table method will then work.
library(data.table)
setDT(TA)[, .(mean= mean(sum_tbret, na.rm = TRUE),
sd= sd(sum_tbret, na.rm = TRUE),
var= var(sum_tbret, na.rm=TRUE),
median= as.double(median(sum_tbret, na.rm = TRUE)),
lq= quantile(sum_tbret, 0.25, na.rm = TRUE),
uq= quantile(sum_tbret, 0.75, na.rm = TRUE)),
by = dhb2015]
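
To confirm the diagnosis before converting, it can help to inspect the class of the object. The following is a minimal base R sketch on a small made-up data frame (toy, g and x are hypothetical names, not the asker's data). It reproduces the same kind of error and shows the class check:

```r
# Toy data frame standing in for TA (hypothetical names and values)
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# TRUE for a plain data.frame, whose `[` method has no `by` argument
is_plain_df <- is.data.frame(toy) && !inherits(toy, "data.table")

# Reproduce the error and capture its message instead of stopping
msg <- tryCatch(
  toy[, mean(x), by = "g"],
  error = function(e) conditionMessage(e)
)

print(class(toy))
print(msg)
```

If the class does not include "data.table", the object has to go through setDT() or as.data.table() before by will be accepted.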

Related

Different values for weighted median

I am getting three different weighted median values when I use the following code/functions:
data %>%
summarise(
matrixStats::weightedMedian(income,
w = wgt,
na.rm = TRUE),
spatstat::weighted.median(income,
w = wgt,
na.rm = FALSE),
spatstat::weighted.quantile(income,
w = wgt,
na.rm = TRUE),
DescTools::Median(income_4,
weights = wgt,
na.rm = TRUE),
)
Does anyone know why this might be? I don't see a clear answer in the documentation of each of these functions.

Reshaping a set of variables obtained through summarize into organized table

I have a problem that I have spent a lot of time trying to solve.
Using the code below, I obtained a tibble with 1 row and 22 columns:
median_income = survey_median(VD5008, na.rm = TRUE),
sd_income = survey_sd(VD5008, na.rm = TRUE),
mean_age = survey_mean(V2009, na.rm = TRUE),
median_age = survey_median(V2009, na.rm = TRUE),
sd_age = survey_sd(V2009, na.rm = TRUE),
mean_study = survey_mean(VD3005, na.rm = TRUE),
median_study = survey_median(VD3005, na.rm = TRUE),
sd_study = survey_sd(VD3005, na.rm = TRUE),
mean_hmembers = survey_mean(n_household_members, na.rm = TRUE),
median_hmembers = survey_median(n_household_members, na.rm = TRUE),
sd_hmembers = survey_sd(n_household_members, na.rm = TRUE),
number_observations = survey_total(na.rm = TRUE)
) %>%
mutate_if(is.numeric, round, 2)
[image: output of code]
What I want is to transform that tibble into a table like this:
[image: table that I want]
I tried some tidyr tools, but without success.
Is this possible with tidyr? I would appreciate it if someone could help me with the code to transform this table.

Is there a way to make this function calculate sd as well as mean?

setNames(apply(cats, 1, mean, na.rm = TRUE), df[[1]])
I would like mean and sd to be outputted by one function.
I don't know what cats is supposed to be in your example, so I will follow the title of your post.
With data.table, you can do this kind of calculation with lapply over .SD in a single line:
library(data.table)
df = data.table(iris)
df[,lapply(.SD, function(x) return(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)))), .SDcols = colnames(df)[1]]
# Sepal.Length
# 1: 5.8433333
# 2: 0.8280661
You can do the same for more than one column if needed.
Here is an option with dplyr
library(dplyr)
iris %>%
summarise_at(vars(Sepal.Length), list(mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE)))
# mean sd
#1 5.843333 0.8280661
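
For comparison, here is a minimal base R sketch of the same idea without any packages, assuming all selected columns are numeric. The helper name mean_sd is hypothetical and used only for illustration:

```r
# One function that returns both statistics as a named vector
# (mean_sd is a hypothetical helper name)
mean_sd <- function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Apply it to one or more numeric columns of the built-in iris data;
# sapply() collects the named vectors into a matrix, one column per variable
res <- sapply(iris[c("Sepal.Length", "Sepal.Width")], mean_sd)
print(res)
```

The result is a matrix with rows mean and sd, so a single value can be read with, for example, res["sd", "Sepal.Length"].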

"Error: unexpected symbol" for a function I'm defining, but I can't identify the source of the error

Apologies in advance for the beginner question but I'm a noob with R for the moment.
I'm defining a function to run summary statistics on a dataframe, it reads as follows:
sumstats = function(y) {
sumst = sapply(y, function(x) {
sumstat = c(
mean(x, na.rm = TRUE),
median(x, na.rm = TRUE),
sd(x, na.rm = TRUE),
min(x, na.rm = TRUE),
max(x, na.rm = TRUE)
) names(sumstat) = c("Mean", "Median", "SD", "Min", "Max") sumstat
}) aperm(sumst)
}
However I keep getting the following error, indicating that something is wrong with the way I want to define the names of my different columns:
Error: unexpected symbol in:
" max(x, na.rm = TRUE)
) names"
Could you tell me what about my syntax is throwing the error?
Thanks
You need to add a line break between the closing ) of c(...) and names(sumstat), and likewise between }) and aperm(sumst), like so:
sumstats = function(y) {
sumst = sapply(y, function(x) {
sumstat = c(
mean(x, na.rm = TRUE),
median(x, na.rm = TRUE),
sd(x, na.rm = TRUE),
min(x, na.rm = TRUE),
max(x, na.rm = TRUE)
)
names(sumstat) = c("Mean", "Median", "SD", "Min", "Max")
sumstat
})
aperm(sumst)
}
You could also name the items in the vector directly and skip the names(sumstat) call altogether:
sumstats = function(y) {
sumst = sapply(y, function(x) {
c(
Mean = mean(x, na.rm = TRUE),
Median = median(x, na.rm = TRUE),
SD = sd(x, na.rm = TRUE),
Min = min(x, na.rm = TRUE),
Max = max(x, na.rm = TRUE)
)
})
aperm(sumst)
}
You are starting the names call on the same line where c(...) ends. R needs a line break (or a semicolon) between two statements, so just alter the formatting of your code.
Note that it is safer to use <- for assigning things to objects; = is used to pass values to function arguments.
sumstats <- function(y) {
sumst <- sapply(
y,
function(x) {
sumstat <- c(
mean(x, na.rm = TRUE),
median(x, na.rm = TRUE),
sd(x, na.rm = TRUE),
min(x, na.rm = TRUE),
max(x, na.rm = TRUE)
)
names(sumstat) <- c("Mean", "Median", "SD", "Min", "Max")
return(sumstat)
}
)
aperm(sumst)
}
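
As a quick check, the corrected function can be run on a small data frame. This is a minimal sketch: the data frame d and its values are made up for illustration, and the last two lines only test whether each one-line variant parses.

```r
# Corrected function, one statement per line
sumstats <- function(y) {
  sumst <- sapply(y, function(x) {
    c(
      Mean = mean(x, na.rm = TRUE),
      Median = median(x, na.rm = TRUE),
      SD = sd(x, na.rm = TRUE),
      Min = min(x, na.rm = TRUE),
      Max = max(x, na.rm = TRUE)
    )
  })
  aperm(sumst)  # transpose: one row per column of y
}

# Hypothetical example data with one missing value
d <- data.frame(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))
res <- sumstats(d)
print(res)

# Two statements on one line parse only with a semicolon between them
ok_text <- "s = c(1, 2); names(s) = c('p', 'q')"
bad_text <- "s = c(1, 2) names(s) = c('p', 'q')"
one_line_ok <- !inherits(try(parse(text = ok_text), silent = TRUE), "try-error")
one_line_bad <- inherits(try(parse(text = bad_text), silent = TRUE), "try-error")
```

The result has one row per column of d and one column per statistic, so res["a", "Mean"] gives the mean of column a with the NA removed.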

KnitR removes digits after the decimal point

I am having trouble getting Knit results to show digits after the decimal point.
When I run this code as a chunk:
cereclosure %>%
group_by(Outcome.of.2nd.Closure) %>%
summarise(
min = min(age.at.c2, na.rm = TRUE),
q1 = quantile(age.at.c2, 0.25, na.rm = TRUE),
median = median(age.at.c2, na.rm = TRUE),
q3 = quantile(age.at.c2, 0.75, na.rm = TRUE),
max = max(age.at.c2, na.rm = TRUE),
mean = mean(age.at.c2, na.rm = TRUE),
st.dev = sd(age.at.c2, na.rm = TRUE)
)
As an example, I get:
Outcome.of.2nd.Closure min
Failure 217.3772
Success 177.4907
This is the outcome I want: I want to see the digits after the decimal point. But when I knit the whole document, my output looks like this:
Outcome.of.2nd.Closure st.dev
Failure 217.
Success 177.
So there are no digits after the decimal place.
I'm on R v 3.5.1 and tidyverse 1.2.1
Help is most appreciated.
