How to get the Standard Deviation based on the sample [closed] - math

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
The situation is I have 96 sets of papers, with 4 questions on every paper, categorized into 5 categories. Every option carries a mark of 0, 1, 2, 3, or 4. What's the formula to calculate the standard deviation?
| | 0 | 1 | 2 | 3 | 4 |
|-----|-----|-----|-----|-----|-----|
| A | 5 | 42 | 71 | 116 | 150 |
| B | 7 | 43 | 94 | 136 | 104 |
| C | 0 | 47 | 118 | 175 | 140 |
| D | 0 | 13 | 40 | 123 | 112 |
| E | 0 | 148 | 183 | 175 | 70 |
Category A consists of 4 questions
Category B consists of 4 questions
Category C consists of 5 questions
Category D consists of 3 questions
Category E consists of 6 questions
For A:
Total: (5*0)+(42*1)+(71*2)+(116*3)+(150*4) = 1132
Avg: (1132/96 sets)/4 questions = 2.94
STDEV: ?

The variance of a distribution is the average of the squared deviations (value - average)^2, and the standard deviation is the square root of the variance.
Here, it would be :
Sum : 5*(0-2.94)^2 + 42*(1-2.94)^2 + 71*(2-2.94)^2 + 116*(3-2.94)^2 + 150*(4-2.94)^2 = 432.9824
Variance : 432.9824 / (96 * 4) = 1.127
STDEV : sqrt(1.127) = 1.06
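The same computation, sketched in Python (an illustration added here; the original thread has no code):

```python
# Frequency table for category A: counts of marks 0..4
counts = [5, 42, 71, 116, 150]
n = sum(counts)  # 384 answers = 96 sets * 4 questions

mean = sum(mark * c for mark, c in enumerate(counts)) / n
variance = sum(c * (mark - mean) ** 2 for mark, c in enumerate(counts)) / n
stdev = variance ** 0.5

print(round(mean, 3), round(variance, 3), round(stdev, 3))  # 2.948 1.127 1.062
```

Note that this is the population standard deviation (dividing by n); for a sample standard deviation you would divide the sum of squared deviations by n - 1 instead.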

Related

Data preparation before running exact logistic (elrm in R)

I started out using Firth's logistic regression (logistf) to deal with my small sample size (n=80), but wanted to try exact logistic regression using the elrm package. However, I'm having trouble figuring out how to create the "collapsed" data required for elrm to run. I have a CSV that I import into R as a data frame with the following variables/columns. Here is some example data (the real data has a few more columns and 80 rows):
+------------+-----------+-----+--------+----------------+
| patien_num | asymmetry | age | female | field_strength |
+------------+-----------+-----+--------+----------------+
| 1 | 1 | 25 | 1 | 1.5 |
| 2 | 0 | 50 | 0 | 3 |
| 3 | 0 | 75 | 1 | 1.5 |
| 4 | 0 | 33 | 1 | 3 |
| 5 | 0 | 66 | 1 | 3 |
| 6 | 0 | 99 | 0 | 3 |
| 7 | 1 | 20 | 0 | 1.5 |
| 8 | 1 | 40 | 1 | 3 |
| 9 | 0 | 60 | 1 | 3 |
| 10 | 0 | 80 | 0 | 1.5 |
+------------+-----------+-----+--------+----------------+
Basically my data is one line per patient (not a frequency table). I'm trying to run a regression with asymmetry as the dependent variable and age (continuous), female (binary), and field_strength (factor) as independent variables. I'm trying to understand how to collapse this into the appropriate format so I can get that "ntrials" part required for the elrm formula.
I've looked at https://stats.idre.ucla.edu/r/dae/exact-logistic-regression/, but they start with data in a different format than mine, and I'm having trouble adapting it. Any help appreciated!
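elrm aside, the "collapsed" format is just a frequency table: one row per covariate pattern, with a success count and a trial count. A rough sketch of the collapsing step in Python/pandas (an assumption for illustration, using toy data with the question's column names; in R, stats::aggregate or dplyr can do the same):

```python
import pandas as pd

# Toy one-row-per-patient data using the question's column names
df = pd.DataFrame({
    "asymmetry":      [1, 0, 0, 1, 0, 1],
    "female":         [1, 0, 1, 1, 0, 1],
    "field_strength": [1.5, 3.0, 1.5, 1.5, 3.0, 3.0],
})

# Collapse to one row per covariate pattern: successes and trials
collapsed = (
    df.groupby(["female", "field_strength"], as_index=False)
      .agg(successes=("asymmetry", "sum"), ntrials=("asymmetry", "size"))
)
print(collapsed)
```

One caveat: a continuous covariate like age makes nearly every patient a unique covariate pattern, so every ntrials is 1 and nothing collapses; it usually has to be binned (or dropped) before exact logistic regression is feasible.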

Subsetting a table in R

In R, I've created a 3-dimensional table from a dataset. The three variables are all factors and are labelled H, O, and S. This is the code I used to simply create the table:
attach(df)
test <- table(H, O, S)
Outputting the flattened table produces this table below. The two values of S were split up, so these are labelled S1 and S2:
ftable(test)
+-----------+-----------+-----+-----+
| H | O | S1 | S2 |
+-----------+-----------+-----+-----+
| Isolation | Dead | 2 | 15 |
| | Sick | 64 | 20 |
| | Recovered | 153 | 379 |
| ICU | Dead | 0 | 15 |
| | Sick | 0 | 2 |
| | Recovered | 1 | 9 |
| Other | Dead | 7 | 133 |
| | Sick | 4 | 20 |
| | Recovered | 17 | 261 |
+-----------+-----------+-----+-----+
The goal is to use this table object, subset it, and produce a second table. Essentially, I want only "Isolation" and "ICU" from H, "Sick" and "Recovered" from O, and only S1, so it basically becomes the 2-dimensional table below:
+-----------+------+-----------+
| | Sick | Recovered |
+-----------+------+-----------+
| Isolation | 64 | 153 |
| ICU | 0 | 1 |
+-----------+------+-----------+
S = S1
I know I could first subset the dataframe and then create the new table, but the goal is to subset the table object itself. I'm not sure how to retrieve certain values from each dimension and produce the reduced table.
Edit: ANSWER
I now found a much simpler method. All I needed to do was index the desired levels in each dimension. A much simpler solution is below:
> test[1:2,2:3,1]
O
H Sick Recovered
Isolation 64 153
ICU 0 1
Alternatively, subset the data before running table(), for example:
ftable(table(mtcars[, c("cyl", "gear", "vs")]))
# vs 0 1
# cyl gear
# 4 3 0 1
# 4 0 8
# 5 1 1
# 6 3 0 2
# 4 2 2
# 5 1 0
# 8 3 12 0
# 4 0 0
# 5 2 0
# subset then run table
ftable(table(mtcars[ mtcars$gear == 4, c("cyl", "gear", "vs")]))
# vs 0 1
# cyl gear
# 4 4 0 8
# 6 4 2 2
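The same positional slicing works on any array-like contingency structure. A Python/numpy analogue of test[1:2, 2:3, 1] (an illustration added here, with 0-based indexing):

```python
import numpy as np

# 3-D count array mirroring the R table: dims are (H, O, S)
t = np.array([
    [[2, 15],  [64, 20], [153, 379]],   # Isolation
    [[0, 15],  [0, 2],   [1, 9]],       # ICU
    [[7, 133], [4, 20],  [17, 261]],    # Other
])

# R's test[1:2, 2:3, 1] becomes: H rows 0-1, O levels 1-2, S slice 0
sub = t[0:2, 1:3, 0]
print(sub.tolist())  # [[64, 153], [0, 1]]
```

The key difference is that the R slice keeps the dimnames, whereas the numpy slice is a bare 2-D array of counts.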

(RIM) weighting samples in R

I have some survey data. As an example, I use the Credit data from the ISLR package.
library(ISLR)
The distribution of Gender in the data looks like this
prop.table(table(Credit$Gender))
Male Female
0.4825 0.5175
and the distribution of Student looks like this.
prop.table(table(Credit$Student))
No Yes
0.9 0.1
Let's say that in the population, the actual distribution of Gender is Male/Female (0.35/0.65) and the distribution of Student is Yes/No (0.2/0.8).
In SPSS it's possible to weight the samples by dividing the "population distribution" by the "distribution of the sample" to simulate the distribution of the population. This process is called "RIM weighting". The data will only be analyzed with crosstables (i.e. no regression, t-test, etc.). What is a good method in R to weight a sample so the data can be analyzed with crosstables later on?
It is possible to calculate the RIM weights in R.
install.packages("devtools")
devtools::install_github("ttrodrigz/iterake")
credit_uni = universe(df = Credit,
category(
name = "Gender",
buckets = c(" Male", "Female"),
targets = c(.35, .65)),
category(
name = "Student",
buckets = c("Yes", "No"),
targets = c(.2, .8)))
credit_weighted = iterake(Credit, credit_uni)
-- iterake summary -------------------------------------------------------------
Convergence: Success
Iterations: 5
Unweighted N: 400.00
Effective N: 339.58
Weighted N: 400.00
Efficiency: 84.9%
Loss: 0.178
Here is the SPSS output (crosstables) of the weighted data
Student
No Yes
Gender Male 117 23 140
Female 203 57 260
320 80 400
and here from the unweighted data (I exported both files and made the calculations in SPSS; I weighted the weighted sample by the calculated weights).
Student
No Yes
Gender Male 177 16 193
Female 183 24 207
360 40 400
In the weighted data set, I have the desired distribution Student: Yes/No(0.2/0.8) and Gender male/female(0.35/0.65).
Here is another example using SPSS of Gender and Married (weighted)
Married
No Yes
Gender Male 57 83 140
Female 102 158 260
159 241 400
and unweighted.
Married
No Yes
Gender Male 76 117 193
Female 79 128 207
155 245 400
This doesn't work in R (i.e. both crosstables look like the unweighted one).
library(expss)
cro(Credit$Gender, Credit$Married)
cro(credit_weighted$Gender, credit_weighted$Married)
| | | Credit$Married | |
| | | No | Yes |
| ------------- | ------------ | -------------- | --- |
| Credit$Gender | Male | 76 | 117 |
| | Female | 79 | 128 |
| | #Total cases | 155 | 245 |
| | | credit_weighted$Married | |
| | | No | Yes |
| ---------------------- | ------------ | ----------------------- | --- |
| credit_weighted$Gender | Male | 76 | 117 |
| | Female | 79 | 128 |
| | #Total cases | 155 | 245 |
With the expss package you need to explicitly provide your weight variable. As far as I understand, iterake adds a special weight variable to the dataset:
library(expss)
cro(Credit$Gender, Credit$Married) # unweighted result
cro(credit_weighted$Gender, credit_weighted$Married, weight = credit_weighted$weight) # weighted result
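For reference, the raking step that iterake performs is plain iterative proportional fitting (IPF), which is easy to sketch by hand. A Python sketch (an illustration added here; the cell counts are taken from the unweighted crosstable above, not recomputed from ISLR):

```python
# Iterative proportional fitting (raking) for RIM weights.
# Cell counts from the unweighted Gender x Student crosstable.
counts = {("Male", "No"): 177,   ("Male", "Yes"): 16,
          ("Female", "No"): 183, ("Female", "Yes"): 24}
n = sum(counts.values())                      # 400
target_gender = {"Male": 0.35, "Female": 0.65}
target_student = {"No": 0.8, "Yes": 0.2}

w = {cell: 1.0 for cell in counts}            # start with unit weights
for _ in range(25):                           # rake until the margins converge
    for g in target_gender:                   # adjust the Gender margin
        tot = sum(w[g, s] * counts[g, s] for s in target_student)
        for s in target_student:
            w[g, s] *= target_gender[g] * n / tot
    for s in target_student:                  # adjust the Student margin
        tot = sum(w[g, s] * counts[g, s] for g in target_gender)
        for g in target_gender:
            w[g, s] *= target_student[s] * n / tot

weighted = {cell: w[cell] * counts[cell] for cell in counts}
print(weighted)  # Gender margin ~140/260, Student margin ~320/80
```

SPSS rounds the weighted cells to integers, so its table will not match the raked cells exactly, but the margins agree.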

Shift planning with Linear Programming

The Modeling and Solving Linear Programming with R book has a nice example on planning shifts in Sec 3.7. I am unable to solve it with R, and I am also not clear about the solution provided in the book.
Problem
A company has an emergency center that operates 24 hours a day. The table below details the minimum number of employees needed for each of the six four-hour shifts into which the day is divided.
Shift Employees
00:00 - 04:00 5
04:00 - 08:00 7
08:00 - 12:00 18
12:00 - 16:00 12
16:00 - 20:00 15
20:00 - 00:00 10
R solution
I used the following to solve the above.
library(lpSolve)
obj.fun <- c(1,1,1,1,1,1)
constr <- c(1,1,0,0,0,0,
0,1,1,0,0,0,
0,0,1,1,0,0,
0,0,0,1,1,0,
0,0,0,0,1,1,
1,0,0,0,0,1)
constr.dir <- rep(">=",6)
constr.val <- c(12,25,30,27,25,15)
day.shift <- lp("min",obj.fun,constr,constr.dir,constr.val,compute.sens = TRUE)
And, I get the following result.
> day.shift$objval
[1] 1.666667
> day.shift$solution
[1] 0.000000 1.666667 0.000000 0.000000 0.000000 0.000000
This is nowhere close to the numerical solution mentioned in the book.
Numerical solution
The total number of employees required as per the numerical solution is 38. However, since the problem states that there is a defined minimum number of employees in every period, how can this solution be valid?
s1 5
s2 6
s3 12
s4 0
s5 15
s6 0
Your mistake is at the point where you initialize the variable constr: you don't define it as a matrix. The second fault is the constraint values themselves. Just look at my example.
I was wondering why you didn't stick to the example in the book because I wanted to check my solution. Mine is based on that.
library(lpSolve)
obj.fun <- c(1,1,1,1,1,1)
constr <- matrix(c(1,0,0,0,0,1,
1,1,0,0,0,0,
0,1,1,0,0,0,
0,0,1,1,0,0,
0,0,0,1,1,0,
0,0,0,0,1,1), ncol = 6, byrow = TRUE)
constr.dir <- rep(">=",6)
constr.val <- c(5,7,18,12,15,10)
day.shift <- lp("min",obj.fun,constr,constr.dir,constr.val,compute.sens = TRUE)
day.shift$objval
# [1] 38
day.shift$solution
# [1] 5 11 7 5 10 0
EDIT based on your question in the comments:
This is the distribution of the shifts over the periods. Each variable x_i is the number of employees starting at period i and working two consecutive four-hour periods, so the solution (5, 11, 7, 5, 10, 0) spreads out like this:
shift | 0-4 | 4-8 | 8-12 | 12-16 | 16-20 | 20-24
---------------------------------------------------
0-8   |  5  |  5  |      |       |       |
4-12  |     | 11  |  11  |       |       |
8-16  |     |     |   7  |   7   |       |
12-20 |     |     |      |   5   |   5   |
16-24 |     |     |      |       |  10   |  10
20-4  |     |     |      |       |       |
----------------------------------------------------
sum   |  5  | 16  |  18  |  12   |  15   |  10
----------------------------------------------------
need  |  5  |  7  |  18  |  12   |  15   |  10
---------------------------------------------------
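As a cross-check of the corrected model, the same LP solved with Python's scipy (an illustration added here; any LP solver gives the same optimum of 38):

```python
import numpy as np
from scipy.optimize import linprog

# x_i = employees starting at period i, each working two consecutive
# 4-hour periods (shift 6 wraps around from 20:00 to 04:00)
A = np.array([
    [1, 0, 0, 0, 0, 1],   # 00:00-04:00 covered by shifts 1 and 6
    [1, 1, 0, 0, 0, 0],   # 04:00-08:00
    [0, 1, 1, 0, 0, 0],   # 08:00-12:00
    [0, 0, 1, 1, 0, 0],   # 12:00-16:00
    [0, 0, 0, 1, 1, 0],   # 16:00-20:00
    [0, 0, 0, 0, 1, 1],   # 20:00-00:00
])
need = np.array([5, 7, 18, 12, 15, 10])

# linprog minimizes subject to A_ub @ x <= b_ub, so negate the >= constraints
res = linprog(c=np.ones(6), A_ub=-A, b_ub=-need, bounds=(0, None))
print(res.fun)  # 38.0
```

The optimum cannot be lower than 38: adding the constraints for periods 00-04, 08-12, and 16-20 counts every variable exactly once, giving a total of at least 5 + 18 + 15 = 38.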

Is there any formula for number of divisors of a*b? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Let a and b be two numbers,
Number of divisors of a = n1;
Number of divisors of b = n2;
How do I find the number of divisors of (a*b) using n1 and n2?
a=15; b=20
n1=4; //no of divisors of a
n2=6; //no of divisors of b
ans=(a*b); //300
num_ans=18; // no of divisors of a*b
Is there any formula to find it?
You can't determine the number of divisors of the product of two numbers if all you know is the number of divisors of each number.
Example:
2 and 4 have two and three divisors, respectively. Their product, 8, has four divisors.
5 and 9 have two and three divisors, respectively. Their product, 45, has six divisors.
Both of these number pairs have the same numbers of individual divisors, but their products have different numbers of divisors.
No, but you can infer some information about the answer.
The bounds on the number of divisors of ans are [max(n1, n2), n1 * n2] (which is [6, 24] for 20 and 21).
It's fairly easy to see how this comes about (at least for smaller numbers), by generating the divisors of 420 from the divisors of 20 and 21.
The divisors of 20 are the column headers, the divisors of 21 are the row headers. The cells contain the result of col_header * row_header for that row and column.
1 2 4 5 10 20
+----+----+----+-----+-----+-----|
1 | 1 | 2 | 4 | 5 | 10 | 20 |
3 | 3 | 6 | 12 | 15 | 30 | 60 |
7 | 7 | 14 | 28 | 35 | 70 | 140 |
21 | 21 | 42 | 84 | 105 | 210 | 420 |
Unfortunately, 20 and 21 are a special case, as they are relatively prime. Other combinations result in duplicate values in the cells.
For example, the table for 15 and 20 looks like this:
1 3 5 15
+----+----+-----+-----|
1 | 1 | 3 | 5 | 15 |
2 | 2 | 6 | 10 | 30 |
4 | 4 | 12 | 20 | 60 |
5 | 5 | 15 | 25 | 75 |
10 | 10 | 30 | 50 | 150 |
20 | 20 | 60 | 100 | 300 |
The numbers 5,10,15,20,30 and 60 appear multiple times, so we can't simply take the number of cells in the table as the number of divisors. The number of unique values in the table does equal the number of divisors (18), which falls within the interval [6,24].
This can give you a ball-park estimate of the complexity of getting the answer; the algorithm essentially generates all of the products and discards the duplicates.
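A brute-force Python sketch of exactly that algorithm (an illustration added here): generate both divisor sets, form all pairwise products, and count the distinct values.

```python
def divisors(n):
    """Set of all positive divisors of n (brute force)."""
    return {d for d in range(1, n + 1) if n % d == 0}

a, b = 15, 20
da, db = divisors(a), divisors(b)

# Duplicate products collapse automatically inside the set
products = {x * y for x in da for y in db}

print(len(da), len(db), len(products))  # 4 6 18
```

For 15 and 20 this reproduces the table above: 24 cells, 6 duplicated values, 18 distinct divisors of 300, inside the bounds [6, 24].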
