Optimization of Holt-Winters smoothing parameters using optim() stops after zero iterations [R]

For a large number of time series, I want to optimize the smoothing parameters of the Holt-Winters forecasting method so that I obtain one set of optimal parameters. There are three parameters: alpha, beta, and gamma. Below I present a simplified procedure for a single time series to illustrate the problems encountered. I create a seasonal time series as follows:
check_vec <- rep(c(7,6,5,4,3,2,1), times = 100)
check_ts <- ts(check_vec, frequency = 7)
The time series looks like the following.
Time Series:
Start = c(1, 1)
End = c(100, 7)
Frequency = 7
[1] 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6
[52] 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4
... (the 7, 6, 5, 4, 3, 2, 1 pattern repeats through all 700 observations)
Then I define the following objective function for the optimization:
library(forecast)  # provides hw() and accuracy()

check_func <- function(param) {
  a <- param[[1]]
  b <- param[[2]]
  c <- param[[3]]
  if (c > 1 - a | a < b) {
    return(100000)
  } else {
    rmse <- accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2]
    return(rmse)
  }
}
The function thus returns the root mean squared error (RMSE), which I want to minimize. It returns different values for different inputs:
> check_func(c(a = 0.18, b = 0.07, c = 0.1))
[1] 3.77942e-16
> check_func(c(a = 0.18, b = 0.07, c = 0.2))
[1] 3.382083e-16
I use the following optim() command to optimize the parameters:
optim(par = c(a = 0.18, b = 0.07, c = 0.1),
      fn = check_func,
      lower = c(0.005, 0.005, 0.005),
      upper = c(0.99, 0.99, 0.99),
      method = "L-BFGS-B",
      control = list(trace = 6,
                     pgtol = 1.490117e-08))
Executing the optim() command returns the initial parameters as the result (that is, zero iterations of the optimization procedure are performed). It prints the following output:
N = 3, M = 5 machine precision = 2.22045e-16
L = 0.005 0.005 0.005
X0 = 0.18 0.07 0.1
U = 0.99 0.99 0.99
At X0, 0 variables are exactly at the bounds
At iterate 0 f= 37.794 |proj g|= 0
iterations 0
function evaluations 1
segments explored during Cauchy searches 0
BFGS updates skipped 0
active bounds at final generalized Cauchy point 0
norm of the final projected gradient 0
final function value 37.7942
X = 0.18 0.07 0.1
F = 37.7942
final value 37.794202
converged
$par
a b c
0.18 0.07 0.10
$value
[1] 37.7942
$counts
function gradient
1 1
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
I have tried increasing the scale of the function's output and decreasing pgtol, without success. Does anybody know what to do?
EDIT: I have added more code and the results of the procedure I found.
EDIT 2: This is the modified check_func I use to test whether Enrico's method works.
check_func <- function(param) {
  a <- param[[1]]
  b <- param[[2]]
  c <- param[[3]]
  rmse <- try(accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2])
  if (inherits(rmse, "try-error"))
    return(200)
  else
    return(rmse)
}

You might get more helpful answers if you provided a reproducible example. Are accuracy and hw from package forecast? (If so, one likely reason optim() stops immediately is that with this artificial series the RMSE sits at machine precision, around 1e-16, so the finite-difference gradient is numerically zero and L-BFGS-B declares convergence right away.) In any case, you could try a grid search:
library("NMOF")
res <- gridSearch(check_func,
lower = rep(0.05, 3),
upper = rep(0.99, 3),
n = 10)
## List of 4
## $ minfun : num 4.11e-17
## $ minlevels: num [1:3] 0.05 0.05 0.886
## $ values : num [1:1000] 3.81e-16 3.81e-16 ...
## $ levels :List of 1000
## ..$ : num [1:3] 0.05 0.05 0.05
## ..$ : num [1:3] 0.154 0.05 0.05
## ..$ : num [1:3] 0.259 0.05 0.05
For many parameter choices, check_func would fail. So I'd safeguard the computation with something like:
rmse <- try(accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2])
if (inherits(rmse, "try-error"))
    return(200)
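Putting the two pieces together, a minimal sketch of the safeguarded grid search might look like this (assuming check_ts from the question and that hw() and accuracy() come from the forecast package; the name check_func_safe and the penalty value 200 are illustrative):
library("forecast")
library("NMOF")

## objective: RMSE of a Holt-Winters fit, with a penalty when the fit fails
check_func_safe <- function(param) {
    rmse <- try(accuracy(hw(check_ts, h = 14,
                            alpha = param[[1]],
                            beta  = param[[2]],
                            gamma = param[[3]]))[2],
                silent = TRUE)
    if (inherits(rmse, "try-error"))
        200  ## penalty for infeasible parameter combinations
    else
        rmse
}

res <- gridSearch(check_func_safe,
                  lower = rep(0.05, 3),
                  upper = rep(0.99, 3),
                  n = 10)
res$minlevels  ## best (alpha, beta, gamma) found on the grid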
(Disclosure: I am the maintainer of package NMOF.)

Related

filtering scores from one variable and placing them in a new variable

## So I have this variable test_scores, coded on a scale from 1-9.
I have to take those who score 1-3 as "low", 4-6 as "good", and 7-9 as "high" in new variables.
Then I have to make a new variable that compares low and high, and a variable that compares low and good.
test_scores <- c(sample(1:10, 122, replace = TRUE))
test_scores <- as.data.frame(test_scores)
low <- filter(test_scores, test_scores1 > 3)
high <- filter(test_scores, test_scores < 7)
good <- filter(test_scores, test_scores == 4:6)
## but the Ns in the new variables do not add up to 122
## I thought of using ifelse():
low <- ifelse(test_scores$test_scores == 1:3, 1:3, 0)
mods <- ifelse(test_scores$test_scores == 4:6, 4:6, 0)
high <- ifelse(test_scores$test_scores == 7:9, 7:9, 0)
## but some scores are not getting filtered; instead they become 0 even though the score matches. Any ideas?
You can use "cut" to generate the new bins:
set.seed(123)
test_scores <- sample(1:9, 122, T)
test_scores
#> [1] 3 3 2 6 5 4 6 9 5 3 9 9 9 3 8 7 9 3 4 1 7 5 7 9 9 7 5 7 5 6 9 2 5 8 2 1 9
#> [38] 9 6 5 9 4 6 8 6 6 7 1 6 2 1 2 4 5 6 3 9 4 6 9 9 7 3 8 9 3 7 3 7 6 5 5 8 3
#> [75] 2 2 6 4 1 6 3 8 3 8 1 7 7 7 6 7 5 6 8 5 7 4 3 9 7 6 9 7 2 3 8 4 7 4 1 8 4
#> [112] 9 8 6 4 8 3 4 4 6 1 4
cuts <- cut(test_scores, c(0,3,6,9), labels = F)
cuts
#> [1] 1 1 1 2 2 2 2 3 2 1 3 3 3 1 3 3 3 1 2 1 3 2 3 3 3 3 2 3 2 2 3 1 2 3 1 1 3
#> [38] 3 2 2 3 2 2 3 2 2 3 1 2 1 1 1 2 2 2 1 3 2 2 3 3 3 1 3 3 1 3 1 3 2 2 2 3 1
#> [75] 1 1 2 2 1 2 1 3 1 3 1 3 3 3 2 3 2 2 3 2 3 2 1 3 3 2 3 3 1 1 3 2 3 2 1 3 2
#> [112] 3 3 2 2 3 1 2 2 2 1 2
If you want a variable for each bin (and zero otherwise), you must use %in%, not ==:
low  <- ifelse(test_scores$test_scores %in% 1:3, test_scores$test_scores, 0)
mods <- ifelse(test_scores$test_scores %in% 4:6, test_scores$test_scores, 0)
high <- ifelse(test_scores$test_scores %in% 7:9, test_scores$test_scores, 0)
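The reason == fails here is R's recycling rule: test_scores == 1:3 does not test membership; it compares element 1 with 1, element 2 with 2, element 3 with 3, element 4 with 1 again, and so on. A minimal illustration:
x <- c(2, 2, 2)
x == 1:3    # FALSE TRUE FALSE -- element-wise comparison with recycling
x %in% 1:3  # TRUE TRUE TRUE   -- membership test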

How to solve error: Error in storage.mode(x) <- "double" : 'list' object cannot be coerced to type 'double'

Hello, I'm trying to run SOM and k-means analyses, but I get the following error:
Error in storage.mode(x) <- "double" : 'list' object cannot be coerced to type 'double'
How can I solve this problem?
library(kohonen)  # provides supersom() and somgrid()

cdata <- read.delim("Cluster.txt", stringsAsFactors = FALSE)
cdata.n <- scale(subset(cdata, select = -c(ID)))
som_model2 <- supersom(data = cdata.n, grid = somgrid(10, 10, "rectangular"))
k <- 6
somClusters <- kmeans(som_model2$codes, centers = k)
I want to cluster the data into 6 clusters.
Please help me.
I use this data.
https://github.com/woosa7/R_DataAnalytics/blob/08ea98289f4def3c4f72d4c10d3767784b42619b/R_DataMining/data/Cluster.txt
Try unlist:
somClusters <- kmeans(unlist(som_model2$codes), centers = 6)
somClusters
Cluster means:
[,1]
1 -0.6702128
2 5.2157179
3 1.2555768
4 -0.2632253
5 2.6067733
6 0.3503127
Clustering vector:
[1] 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 4 4 4 4 4 4 4
[50] 4 6 6 4 6 4 4 4 4 4 4 6 3 3 6 6 4 4 4 4 4 3 3 3 3 6 6 4 4 4 4 5 5 3 3 6 6 4 4 4 4 2 5 3 6 6 6 4 6
[99] 6 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 4 1 1 1 1 1 1 4 1 1 1 1 1 6 6 4 4 4 4 1 4 1 1 3 3 6 6 4 4 4
[148] 1 4 4 3 3 6 6 6 4 4 4 4 4 5 5 3 6 4 6 4 4 4 4 5 5 3 6 6 6 6 6 4 4 2 5 3 3 6 6 6 6 4 4 2 5 3 6 3 6
[197] 6 6 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 4 4 4 4 4 3 3 4 4 4
[246] 4 4 4 4 4 3 3 6 4 6 4 6 6 4 4 3 3 6 6 6 6 6 6 6 6 5 3 3 3 3 6 6 6 6 6 5 5 3 3 3 3 3 6 6 6 5 5 5 5
[295] 3 3 3 3 3 6 2 5 3 3 6 6 4 4 4 1 5 5 3 3 6 6 6 4 4 1 5 3 3 6 6 6 4 4 4 1 3 6 6 6 4 6 6 4 4 1 1 1 4
[344] 4 4 4 6 4 4 1 1 1 1 1 4 4 4 4 4 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1
[393] 1 1 1 1 1 1 1 4
Within cluster sum of squares by cluster:
[1] 1.939971 9.714721 4.939015 2.981251 3.051715 3.374086
(between_SS / total_SS = 93.6 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss"
[7] "size" "iter" "ifault"

Frequency Table of Categorical Variables as a Data Frame in R

I would like to create a frequency table of all categorical variables as a data frame in R. I would like to find the frequency and percentage of each survey response (grouped by condition, as well as the total frequency), and to generate this as a data frame.
An example of the desired frequency count for just ONE variable ("q1") was shown as an image in the original post; I want a similar frequency count for most of the variables in my data.
I have data such as this. The actual data has many more categorical variables.
library(readr)
data_in <- read_table2("treatment_cur q13_3 q14_1 q14_2 q14_3 q14_4 q14_5 q14_6 q14_7 q14_8 q14_9 q14_10 q14_11 q14_12 q14_13 q14_14 q14_15
Control 3 2 3 6 5 6 6 6 4 5 5 5 4 6 6 5
Control 2 4 5 6 5 6 5 5 6 4 5 5 6 5 4 6
Treatment 3 1 2 6 4 6 5 4 6 4 6 1 5 6 4 6
Control 3 2 3 6 4 6 6 6 6 6 6 6 6 5 5 6
Control NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Control 4 6 5 6 5 6 5 6 6 5 1 1 6 5 5 6
Control 3 3 2 2 3 3 6 6 4 6 5 5 3 6 6 2
Treatment 2 3 2 3 1 3 1 1 1 3 3 3 3 3 3 1
Control 3 5 5 6 3 6 3 3 3 2 2 1 4 2 3 4
Control 2 1 1 1 1 1 4 4 1 1 1 1 1 4 4 2
Control 4 3 4 6 6 6 6 6 6 6 6 6 6 6 6 6
Control 4 2 6 6 4 6 5 6 6 5 6 5 6 6 6 6
Control 2 2 3 3 2 3 5 6 5 3 3 3 3 5 3 2
Control 3 2 4 3 4 5 4 4 5 3 3 5 4 5 5 4
Treatment 2 2 2 2 2 3 1 1 2 2 3 2 3 3 2 3
Control 4 3 3 3 5 6 6 6 6 6 6 6 6 6 6 6
Treatment 2 1 3 3 2 1 3 4 2 2 3 3 2 3 3 3
Treatment 4 2 6 4 4 2 3 5 4 5 1 1 5 4 4 5
Control 3 3 3 4 4 4 4 5 3 2 5 4 5 5 4 4
Control 4 6 6 6 6 6 6 6 6 6 6 6 5 6 6 5
Control 2 2 3 6 2 5 1 2 4 4 1 1 6 4 4 6
Treatment 4 3 3 6 6 6 6 6 6 6 6 6 6 6 6 6
Treatment 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Treatment 1 1 2 4 4 4 1 1 1 1 1 1 6 1 1 6
Treatment 3 2 3 3 2 6 6 6 6 3 3 2 4 5 5 6
Control 2 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1
Control 1 3 3 3 1 1 5 5 2 4 5 5 4 1 2 5
Treatment 3 4 4 5 5 4 4 4 3 5 3 4 4 6 6 5
Control NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Control 2 2 4 6 2 4 2 2 3 5 4 4 4 3 3 5
Treatment 1 1 2 1 1 1 1 1 6 1 1 1 6 2 3 6
Treatment 2 6 1 4 4 1 1 2 2 2 1 2 1 2 2 2
Treatment 3 3 4 4 4 6 6 5 4 6 3 5 5 6 6 4
Treatment 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Control 4 3 4 6 4 6 4 5 6 3 4 4 6 6 4 6
Control 4 4 3 6 2 5 2 2 4 3 1 6 5 5 5 5
Control NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Treatment 2 3 3 6 5 6 1 2 6 5 4 4 5 5 5 6
Control 4 6 6 6 6 6 5 5 5 5 5 6 5 5 5 5
Treatment 2 1 1 3 1 3 4 4 4 4 1 4 3 4 4 4
Treatment 2 1 3 3 3 3 4 6 5 4 5 5 4 6 6 5
Control 4 6 6 6 6 6 5 5 5 6 6 5 5 5 6 6
Control NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Control 4 2 2 4 2 4 6 6 6 6 4 6 5 6 6 5
Control 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Treatment 3 4 2 5 5 5 6 5 5 5 5 5 5 6 6 6
Control NA 2 4 4 4 4 4 3 4 6 4 5 4 6 4 4
Control 2 2 2 3 1 3 4 1 1 1 2 1 3 3 3 3
Treatment 2 2 2 3 2 2 3 3 2 2 2 2 2 2 2 2
Control 3 3 3 6 6 6 6 6 6 6 5 6 6 6 6 6
Treatment 2 1 2 2 2 1 2 2 1 1 2 1 2 2 1 3
Treatment 4 5 5 6 6 5 5 6 5 5 4 5 5 4 4 5
Control 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Treatment 3 3 4 4 4 6 3 2 5 3 2 2 5 6 5 6
Control 4 4 3 3 6 3 6 6 3 2 4 4 4 4 4 4
Treatment 4 1 3 4 4 4 5 6 6 6 6 6 6 6 6 6
Control 4 4 5 6 5 5 4 6 6 6 6 5 6 6 6 6
Treatment 3 3 4 6 6 6 6 6 5 6 6 5 4 6 6 4
Control 4 4 6 6 4 6 6 6 6 4 4 3 5 6 6 6
Control 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Treatment 4 5 5 6 6 6 6 6 5 5 6 6 5 5 6 6
Treatment 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Control 2 1 2 1 1 1 1 3 1 4 4 1 1 1 1 1
Treatment 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Treatment 4 6 5 5 5 5 5 6 5 4 5 4 4 5 5 4
Treatment 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Control 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Treatment 4 5 6 6 6 5 6 6 6 5 6 6 6 6 6 6
Control 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Treatment 3 3 2 5 4 4 5 6 6 4 5 5 4 5 4 6
Treatment 4 5 4 4 4 5 5 6 4 5 4 3 6 6 6 6
Control 1 2 3 2 1 4 1 1 3 1 3 3 3 3 4 4
Control 3 6 6 6 6 6 5 1 5 6 5 6 6 6 6 6
Control 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Control 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
")
My current solution is too complicated. If I wanted to know the frequency of variables from q13_3:q14_9, I know that I can do something like this to find it:
library(tables)
varList <- 2:11
data_in[varList] <- lapply(data_in[varList], factor, exclude = NULL)
lapply(varList, function(x, df, byVar) {
  tabular((Factor(df[[x]], paste(colnames(df)[x])) + 1) ~
            ((Factor(df[[byVar]], paste(byVar))) * ((n = 1) + Percent("col"))),
          data = df)
}, data_in, "treatment_cur")
Below is a snippet of what my current output looks like. The problem is that the output is a list of lists, which cannot be exported into a single Excel sheet; I have to manually copy everything from the console into an Excel file.
treatment_cur
Control Treatment
q14_8 n Percent n Percent
1 6 13.953 4 12.50
2 4 9.302 4 12.50
3 5 11.628 2 6.25
4 6 13.953 4 12.50
5 5 11.628 7 21.88
6 13 30.233 11 34.38
NA 4 9.302 0 0.00
All 43 100.000 32 100.00
[[10]]
treatment_cur
Control Treatment
q14_9 n Percent n Percent
1 6 13.953 4 12.50
2 6 13.953 4 12.50
3 4 9.302 4 12.50
4 6 13.953 5 15.62
5 5 11.628 8 25.00
6 12 27.907 7 21.88
NA 4 9.302 0 0.00
All 43 100.000 32 100.00
This works all right, but I want to:
Find the total frequency of each variable value (Treatment + Control) as an additional column (as seen in the image above);
Export the output more easily. I do not like the function I am using to produce this output: the output is actually a list of lists, so it cannot be exported to Excel, and I find it quite cumbersome to copy and paste the values from the console into Excel. I would like an easier way of finding these frequencies! Surely R has a better way of doing this...
Any help is MUCH appreciated!!
One way to do this would be to explore the gtsummary package.
Using your code above, you can produce a table with counts and percentages quite easily:
library(gtsummary)
library(readr)
library(flextable)

tbl_summary(data_in, by = "treatment_cur") %>%
  add_overall() %>%
  as_flex_table() %>%
  flextable::save_as_docx(., path = "G:/test.docx")
If you just run:
tbl_summary(data_in, by = "treatment_cur") %>%
  add_overall()
you will see the table it generates for you. The extra code after that exports the table to a docx file, from which you can copy it into Excel. This generates the counts you requested, and you can decide whether it is a simpler implementation.
Another alternative is to write directly to a CSV file:
tbl_summary(data_in, by = "treatment_cur") %>%
  add_overall() %>%
  as_tibble() %>%
  readr::write_csv(., path = "G:/test.csv")
OR
If you really need everything in separate columns, you can split the counts and percentages into two tables, merge them, and then write to CSV:
# keep counts only
ncount <- tbl_summary(data_in, by = "treatment_cur",
                      statistic = all_categorical() ~ "{n}") %>%
  add_overall()

# keep percentages only
pctdata <- tbl_summary(data_in, by = "treatment_cur",
                       statistic = all_categorical() ~ "{p}%") %>%
  add_overall()

# combine and output
tbl_merge(list(ncount, pctdata)) %>%
  as_tibble() %>%
  readr::write_csv(., "G:/test2.csv")
Edit:
Another way to approach this is with the janitor package. You can adorn counts and percentages pretty easily and merge the resulting datasets together; after that it is easy to export to CSV/Excel. One downside is that you have to loop through your variables to get a table for each and then combine them (see the sketch after the code below), but the code below is a good start:
library(janitor)

datatry <- data_in %>%
  janitor::tabyl(q13_3, treatment_cur) %>%
  adorn_totals("col") %>%
  adorn_totals("row")

datatry2 <- data_in %>%
  janitor::tabyl(q13_3, treatment_cur) %>%
  janitor::adorn_percentages(denominator = "col") %>%
  adorn_totals("row") %>%
  adorn_totals("col") %>%
  mutate(Total = ifelse(is.na(q13_3), Total, ifelse(q13_3 == "Total", 1, Total)))

datatry3 <- inner_join(datatry, datatry2, by = "q13_3") %>%
  mutate(variable = "q13_3")
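A minimal base-R sketch of that loop (the helper freq_for, the variable selection, and the output file name are illustrative, not part of janitor):
## counts for one variable, split by group, with a Total column
freq_for <- function(df, v, by = "treatment_cur") {
  x <- ifelse(is.na(df[[v]]), "NA", as.character(df[[v]]))  # keep NAs as a level
  counts <- as.data.frame.matrix(table(x, df[[by]]))
  counts$Total <- rowSums(counts)
  counts$level <- rownames(counts)
  counts$variable <- v
  counts
}

## loop over the variables of interest and stack the results
vars <- names(data_in)[2:17]
all_tabs <- do.call(rbind, lapply(vars, freq_for, df = data_in))
write.csv(all_tabs, "all_freq_tables.csv", row.names = FALSE)
Column-wise percentages can be added analogously with prop.table(table(...), margin = 2).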
Assuming that you constructed data_in as above:
library(dplyr)
library(purrr)

# reformat
tt <- data_in$treatment_cur
data_in$treatment_cur <- NULL
data_in %>% map(function(a) {
  ret <- data.frame(Treatment.n = rep(0, 6), Control.n = rep(0, 6))
  b <- table(a[tt == "Treatment"])
  ret[names(b), "Treatment.n"] <- b
  b <- table(a[tt == "Control"])
  ret[names(b), "Control.n"] <- b
  ret$Treatment.percent <- ret$Treatment.n / sum(ret$Treatment.n)
  ret$Control.percent <- ret$Control.n / sum(ret$Control.n)
  ret
}) %>% do.call(what = cbind)
It assumes the answers are in 1..6 and that NAs are ignored.

Mean and SD in a table

In R, when you make a table of two variables, you get a frequency table:
> table(data$Var1, data$Var2)
1 2 3 4 5
0 0 1 5 6 12
1 1 10 6 7 0
2 2 6 7 6 3
3 2 9 8 3 2
4 4 9 5 3 3
5 3 4 9 4 4
6 2 7 7 4 4
7 2 7 7 6 2
8 5 7 5 5 2
9 5 4 5 6 4
Is there a way to also include the mean and SD of each row, something like:
1 2 3 4 5 mean SD
0 0 1 5 6 12 4.20833 0.93153
1 1 10 6 7 0 .. ..
2 2 6 7 6 3
3 2 9 8 3 2
4 4 9 5 3 3
5 3 4 9 4 4
6 2 7 7 4 4
7 2 7 7 6 2
8 5 7 5 5 2
9 5 4 5 6 4
Save the table in an object, here called T (note that T is also R's shorthand for TRUE, so this name is best avoided in real code). Then, for the mean and sd:
> cbind(T,
        mean = apply(T, 1, function(x) sum(x * (1:5)) / sum(x)),
        sd = apply(T, 1, function(x) sd(rep(1:5, x))))
1 2 3 4 5 mean sd
0 4 3 1 1 1 2.200000 1.3984118
1 1 2 3 3 3 3.416667 1.3113722
2 2 2 1 2 1 2.750000 1.4880476
3 0 1 2 4 1 3.625000 0.9161254
So 2.2 and 1.3984118 are the mean and sd of c(1,1,1,1,2,2,2,3,4,5).
It's probably inefficient to compute the sd by reconstructing the original vector with rep(), but it's late, and working out all the sums of squares and squares of sums for the sd is not something my brain can do at 1 am.
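For the record, a sketch of the direct computation from the counts (assuming, as above, that the column labels are the values 1 to 5) uses the identity sum((x - m)^2) = sum(w * v^2) - n * m^2:
row_stats <- function(w, v = seq_along(w)) {
  n <- sum(w)                               # number of observations in the row
  m <- sum(w * v) / n                       # weighted mean
  s2 <- (sum(w * v^2) - n * m^2) / (n - 1)  # sample variance from the counts
  c(mean = m, sd = sqrt(s2))
}
cbind(T, t(apply(T, 1, row_stats)))  # same values as the rep() version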

From table to data.frame

I have a table that looks like:
dat <- data.frame(expand.grid(x = 1:10, y = 1:10),
                  z = sample(LETTERS[1:3], size = 100, replace = TRUE))
tabl <- with(dat, table(z, y))
tabl
y
z 1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Now how do I transform it into a data.frame that looks like
1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Here are a couple of options.
The reason as.data.frame(tabl) doesn't work is that it dispatches to the S3 method as.data.frame.table() which does something useful but different from what you want.
as.data.frame.matrix(tabl)
# 1 2 3 4 5 6 7 8 9 10
# A 5 4 3 1 1 3 3 2 6 2
# B 1 4 3 4 5 3 4 4 3 3
# C 4 2 4 5 4 4 3 4 1 5
## This will also work
as.data.frame(unclass(tabl))
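For contrast, a quick way to see what the default method produces is str(): it reshapes the table into long format, one row per (z, y) cell:
str(as.data.frame(tabl))
## 'data.frame': 30 obs. of 3 variables:
##  $ z   : Factor w/ 3 levels "A","B","C": ...
##  $ y   : Factor w/ 10 levels "1","2",..: ...
##  $ Freq: int ...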
