Formatting data for clogit model - r

I have a survey with 53 respondents answering 8 questions each.
Currently it's in an Excel document in the format:
Person # | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8
Each question had three possible responses, "1", "2", or "3". For a given person, each question has a single number indicating the response.
I need to transform the answers from each person into one long column vector with responses coded in binary for each of the three choices. So for each person, there should be 24 rows (3 for each question), and for each question, there should be one row with a 1 (indicating the choice that was made) and two rows with 0's.
I've tried doing this in Excel and in R and cannot figure out how to do it without manually entering each value.
Please tell me there's a better way?

See the inline comments for an explanation of the code.
Sub Demo()
    Dim wsSource As Worksheet
    Dim wsDest As Worksheet
    Dim rSource As Range
    Dim rDest As Range
    Dim vSource As Variant
    Dim vDest() As Variant
    Dim i As Long, j As Long
    '--> adjust to suit your needs
    ' set up source and destination references
    Set wsSource = Worksheets("SourceData")
    Set wsDest = Worksheets("DestData")
    '--> adjust to suit your needs
    ' Assumes source data has header in row 1, names in column A and responses in B..I
    With wsSource
        Set rSource = .Range(.Cells(2, 9), .Cells(.Rows.Count, 1).End(xlUp))
    End With
    '--> adjust to suit your needs
    ' Assumes generated data starts at cell A1
    Set rDest = wsDest.Cells(1, 1)
    ' Get Source data
    vSource = rSource.Value
    ' Size Destination data array: 24 rows per person, 2 columns (name, 0/1 flag)
    ReDim vDest(1 To UBound(vSource, 1) * 24, 1 To 2)
    ' Generate reformatted data
    For i = 1 To UBound(vSource, 1) ' For each Person
        For j = 1 To 24 ' Add person name
            vDest((i - 1) * 24 + j, 1) = vSource(i, 1)
        Next
        For j = 1 To 8 ' Code 8 results, 3 rows per question
            vDest((i - 1) * 24 + (j - 1) * 3 + 1, 2) = IIf(vSource(i, j + 1) = 1, 1, 0)
            vDest((i - 1) * 24 + (j - 1) * 3 + 2, 2) = IIf(vSource(i, j + 1) = 2, 1, 0)
            vDest((i - 1) * 24 + (j - 1) * 3 + 3, 2) = IIf(vSource(i, j + 1) = 3, 1, 0)
        Next
    Next
    ' Place result on sheet
    rDest.Resize(UBound(vDest, 1), UBound(vDest, 2)) = vDest
End Sub
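For comparison, the same wide-to-long reshaping can be sketched outside Excel. This is only a minimal Python sketch, not the method used in the macro above: the function name `expand_responses` and the sample rows are made up for illustration, and it assumes every answer is 1, 2, or 3.

```python
def expand_responses(rows, n_choices=3):
    """Expand wide survey rows into long binary-coded rows.

    rows: list of (person, [answer_q1, ..., answer_qN]) tuples.
    Returns a list of (person, question, choice, chosen) tuples, one
    row per person, question, and possible choice.
    """
    long_rows = []
    for person, answers in rows:
        for question, answer in enumerate(answers, start=1):
            for choice in range(1, n_choices + 1):
                chosen = 1 if answer == choice else 0
                long_rows.append((person, question, choice, chosen))
    return long_rows


# Hypothetical sample: two respondents, eight answers each.
sample = [
    ("P1", [1, 2, 3, 1, 2, 3, 1, 2]),
    ("P2", [3, 3, 3, 1, 1, 1, 2, 2]),
]
long_rows = expand_responses(sample)
print(len(long_rows))   # 2 people * 8 questions * 3 choices = 48
print(long_rows[:3])    # the three rows for P1's first question
```

Unlike the macro, this sketch keeps the question and choice numbers next to the 0/1 flag, which may make it easier to build a grouping identifier (for example person plus question) later.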

Related

How to Mimic Curve stretched over longer term

I am trying to find a way to mimic the curve of a product release schedule.
This is the release schedule for the number of units for a product. We have a new product coming out that will follow the same release schedule, a curve that should look like this:
I want it to have the same curve but stretched over a longer period of time:
I tried to just break it down by phase, but the results didn't match the same curve:
I have access to Excel, VBA, and R. Is there a calculation that would let me match the curve structure for a different length of time and possibly a different total number of units? I know I'm supposed to show what I have tried as well, but nothing has gotten me even close.
You can get your NewData with Interpolation:
'Year .... 0 - 25
'x .... =year/25*17
'Units .... =Linterp($A$2:$B$19; x) --- A2:B19 is your input
Function Linterp(ByRef Tbl As Range, ByRef dX As Double) As Variant
    ' copied from
    ' https://www.ozgrid.com/forum/index.php?thread/82496-interpolate-an-array-of-numbers/
    '
    ' shg 06 Jun 1997
    ' Linear interpolator / extrapolator
    ' Tbl is a two-column range containing known x, known y, sorted ascending
    Dim i As Long ' index to Tbl
    Dim nRow As Long ' rows in Tbl
    Dim dXAbv As Double ' Tbl value above dX
    Dim dXBlo As Double ' Tbl values below dX
    Dim dRF As Double ' row fraction
    nRow = Tbl.Rows.Count
    If nRow < 2 Or Tbl.Columns.Count <> 2 Then
        Linterp = "Table must have >= 2 rows, exactly two columns"
        Exit Function '-------------------------------------------------------->
    End If
    If dX < Tbl(1, 1).Value Then ' dX < xmin, extrapolate first two entries
        i = 1
    Else
        i = WorksheetFunction.Match(dX, WorksheetFunction.Index(Tbl, 0, 1), 1)
        If dX = Tbl(i, 1).Value Then ' dX is exact from table
            Linterp = Tbl(i, 2)
            Exit Function '---------------------------------------------------->
        ElseIf i = nRow Then ' dX > xmax, extrapolate last two entries
            i = nRow - 1
        'Else
            ' dX lies between two rows, so interpolate entries i, i+1
            ' which is what happens by default
        End If
    End If
    dXAbv = Tbl(i, 1).Value
    dXBlo = Tbl(i + 1, 1).Value
    dRF = (dX - dXAbv) / (dXBlo - dXAbv) ' row fraction
    Linterp = Tbl(i, 2).Value * (1 - dRF) + Tbl(i + 1, 2).Value * dRF
End Function
Your input data is in A2:B19!
The new data is columns D, E and F!
D1: Year
D2: 0
D3: 1
...
E1: x
E2: =D2/25*17
E3: =D3/25*17
...
F1: Units
F2: =Linterp($A$2:$B$19;E2)
F3: =Linterp($A$2:$B$19;E3)
...
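The worksheet recipe above (map each new year onto the old time axis, then interpolate) can also be sketched in a few lines of Python. This is an illustrative sketch only: the names `linterp` and `stretch` and the five-point sample curve are assumptions, not the question's data.

```python
from bisect import bisect_right


def linterp(xs, ys, x):
    """Linearly interpolate (or extrapolate) y at x; xs must be sorted ascending."""
    # Index of the segment's left point, clamped so that i and i + 1 are valid.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] * (1 - frac) + ys[i + 1] * frac


def stretch(xs, ys, new_len):
    """Resample the curve at new periods 0..new_len mapped onto the old axis."""
    old_len = xs[-1]
    return [linterp(xs, ys, t / new_len * old_len) for t in range(new_len + 1)]


# Made-up input curve over periods 0..4, stretched to periods 0..8.
xs = [0, 1, 2, 3, 4]
ys = [0.0, 10.0, 30.0, 20.0, 5.0]
print(stretch(xs, ys, 8))
# [0.0, 5.0, 10.0, 20.0, 30.0, 25.0, 20.0, 12.5, 5.0]
```

With the question's data, `xs` would run from 0 to 17 and `new_len` would be 25, mirroring `=year/25*17`. Stretching keeps the shape but generally changes the sum of the values, so if a specific total number of units is required, the resampled values would still need to be rescaled by a constant factor.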

How to find the range for a given number, interval and start value?

Provided the below values
start value = 1
End Value = 20
Interval = 5
I have been given the number 6 and need to find the range of numbers in which it falls; here the answer is 6-10.
If the given number is greater than the end value, then return the same number.
Is there a formula I can use to generate the range for the number?
UPDATE
I tried the solution below, but it does not work if the range interval is changed:
$end_value = $start_value + $range_interval;
// we blindly return the last term if value is greater than max value
if ($input_num > $end_value) {
    return '>' . $end_value;
}
// we also find if its a first value
if ($input_num <= $end_value && $value >= $start_value) {
    return $start_value . '-' . $end_value;
}
// logic to find the range for a given integer
$dived_value = $input_num/$end_value;
// round the value to get the exact match
$rounded_value = ceil($dived_value);
$upper_bound_range = $rounded_value*$end_value;
$lower_bound_range = $upper_bound_range - $end_value;
return $lower_bound_range . '-'. $upper_bound_range;
In (c-style) pseudocode:
// Integer division assumed
rangeNumber = (yourNumber - startValue) / rangeLength;
lower_bound_range = startValue + rangeNumber*rangeLength;
upper_bound_range = lower_bound_range + rangeLength-1;
For your input:
rangeNumber = (6-1)/5 = 1
lower_bound_range = 1 + 5*1 = 6
upper_bound_range = 10
and so range is [6, 10]
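The pseudocode above translates almost directly into runnable code. The following Python sketch is one possible reading of it: the function name `find_range` and the choice to return the number unchanged above the end value (as the question requests) are assumptions, and numbers below the start value are rejected rather than guessed at.

```python
def find_range(number, start, end, interval):
    """Return (lower, upper) for the interval that contains number.

    Intervals are start..start+interval-1, start+interval.., and so on.
    Numbers above `end` are returned unchanged, as the question asks.
    """
    if number < start:
        raise ValueError("number is below the start value")
    if number > end:
        return number
    index = (number - start) // interval   # integer division
    lower = start + index * interval
    return (lower, lower + interval - 1)


print(find_range(6, 1, 20, 5))    # (6, 10)
print(find_range(20, 1, 20, 5))   # (16, 20)
print(find_range(25, 1, 20, 5))   # 25
```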
The answer depends on whether you are talking about integers or floats. Since all your example numbers are integers, I assume you mean integers. I further assume that all your intervals contain the same number of integers, in your example 5, namely 1...5, 6...10, 11...15, and 16...20. Note that 0 is not contained in the 1st interval (otherwise the 1st interval would have 6 numbers).
In this case the answer is easy.
Let:
s the start value that is not contained in the 1st interval,
i the interval size, i.e. the number of integers that it contains,
p the provided number to which an interval should be assigned,
b the 1st integer in this interval, and
e the last integer in this interval.
Then:
b = s + (p-s-1)\i * i + 1 (here, "\" means integer division, i.e. without remainder)
e = b + i - 1
In your example:
s = 0, i = 5, p = 6, thus
b = 0 + (6-0-1)\5 * 5 + 1 = 6
e = 6 + 5 - 1 = 10

add duplicate value in multi dimensional array in classic asp

I have a 2D array in classic ASP like
1-5
1-3
2-5
I need this array output in the following format
1-8
2-5
Please help me.
You need a dictionary to sum up the col2 values grouped by the col1 values. As in:
ReDim aIn(2, 1)
aIn(0, 0) = 1 : aIn(0, 1) = 5
aIn(1, 0) = 1 : aIn(1, 1) = 3
aIn(2, 0) = 2 : aIn(2, 1) = 5
Dim dicX : Set dicX = CreateObject("Scripting.Dictionary")
Dim i
For i = LBound(aIn, 1) To UBound(aIn, 1)
    dicX(aIn(i, 0)) = dicX(aIn(i, 0)) + aIn(i, 1)
Next
ReDim aOut(dicX.Count - 1, 1)
For i = LBound(aOut, 1) To UBound(aOut, 1)
    aOut(i, 0) = dicX.Keys()(i)
    aOut(i, 1) = dicX(aOut(i, 0))
Next
For i = LBound(aOut, 1) To UBound(aOut, 1)
    WScript.Echo aOut(i, 0), aOut(i, 1)
Next
output:
======
1 8
2 5
======
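The dictionary approach above is language-independent: accumulate the second column under the first column as key, then read the totals back out. A minimal Python sketch of the same idea follows; the function name `sum_by_key` is made up for illustration.

```python
def sum_by_key(pairs):
    """Sum the second elements grouped by the first, keeping first-seen key order."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return [[key, total] for key, total in totals.items()]


print(sum_by_key([[1, 5], [1, 3], [2, 5]]))   # [[1, 8], [2, 5]]
```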

Random function and calculating percentage

Using a random library with these functions:
randomChance(p) Returns true with the probability indicated by p.
randomInteger(low, high) Returns a random integer in the range low to high, inclusive.
What is the easiest way to implement a "random selector" that takes percentages into consideration, 1/4 or 1/3 etc.? I have an array with key/value pairs. For example, "a" might have the value 2 and "b" the value 2, giving a 1/2 chance for each.
The max value will be the size of the array, because it only contains unique items. The randomChance() function takes a value between 0.0 and 1.0, where 1 = 100%. If my array size is, say, 4, what is the best way of "letting 4 be 1"?
Let's say you have:
a = 2, b = 2, c = 1, d = 3
now make it:
a = 2, b = 4, c = 5, d = 8
Create a random number from 1 to MaxVal (value of the last key, 8 in this example). Select the first Key where Value >= RandomNum
EDIT
I made a small VB.Net program to show the algorithm and how it works. The code is not meant to be good, elegant, performant, or readable.
Module Module1
    Private Class Value
        Public vOrg, vRecalc, HitCount As Integer
        Public Key As String
        Public Sub New(s, v1, v2, c)
            Key = s : vOrg = v1 : vRecalc = v2 : HitCount = c
        End Sub
    End Class
    Sub Main()
        ' set initial values
        Dim KVP() As Value = {New Value("A", 2, 0, 0),
                              New Value("B", 2, 0, 0),
                              New Value("C", 1, 0, 0),
                              New Value("D", 3, 0, 0)}
        ' recalc values: vRecalc becomes the running (cumulative) total
        For i = 0 To KVP.Length - 1
            If i = 0 Then KVP(0).vRecalc = KVP(0).vOrg Else KVP(i).vRecalc = KVP(i).vOrg + KVP(i - 1).vRecalc
        Next
        ' do test
        Dim r As New Random
        Dim runs As Integer = 1000 * 1000, maxval As Integer = KVP(KVP.Length - 1).vRecalc
        For i = 1 To runs
            ' Random.Next excludes the upper bound, so this yields 1..maxval
            Dim RandVal = r.Next(1, maxval + 1)
            Dim chosen As Integer = (From j In Enumerable.Range(0, KVP.Length) Where KVP(j).vRecalc >= RandVal Take 1 Select j)(0)
            KVP(chosen).HitCount += 1
        Next
        ' output results
        For Each kv In KVP
            Console.WriteLine("{0} was chosen with {1:F3} probability, expected was {2:F3}", kv.Key, kv.HitCount / CDbl(runs), kv.vOrg / CDbl(maxval))
        Next
        Console.ReadLine()
    End Sub
End Module
An output sample:
A was chosen with 0.250 probability, expected was 0.250
B was chosen with 0.251 probability, expected was 0.250
C was chosen with 0.124 probability, expected was 0.125
D was chosen with 0.375 probability, expected was 0.375
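The cumulative-sum selection described in this answer can be condensed into a short function. The Python sketch below is an illustration under assumptions: `weighted_pick` is a made-up name, and `random.randint` stands in for the question's `randomInteger(low, high)`, since both are inclusive at both ends.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def weighted_pick(weights, rand_int=random.randint):
    """Pick a key with probability proportional to its weight.

    weights: dict mapping key -> positive integer weight.
    rand_int(low, high) must return an integer in [low, high], inclusive.
    """
    keys = list(weights)
    cumulative = list(accumulate(weights[k] for k in keys))   # 2, 4, 5, 8 below
    r = rand_int(1, cumulative[-1])
    # First position whose cumulative value is >= r.
    return keys[bisect_left(cumulative, r)]


weights = {"a": 2, "b": 2, "c": 1, "d": 3}
print(weighted_pick(weights))   # "d" about 3 times in 8, "c" about 1 in 8
```

Passing the random source as a parameter is a design choice made here so that the selection logic can be checked with fixed values.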
Just multiply the randomChance() outcome and the array length together. It'll give you an index in the range [0, array_length-1], which you can use to access the array:
array_index = (unsigned int)(randomChance(p) * (array_length - 1));
Maybe you mean "letting 3 be 1" (not 4) in your example. The last index of an array of length 4 is 3.

Collapsing a 10 period curve to 4 periods

I have a 10 period cost curve table below. How do I programmatically collapse/condense/shrink this to 4 periods? I'm using VBA, but I should be able to follow other languages. The routine should work for whatever number of periods you pass to it. For example, if I pass it a 7, it should condense the percentages to 7 periods. If I pass it 24, it should expand the percentages to 24 periods, spreading them based on the original curve. Any help or example will be appreciated. Thanks...
ORIGINAL
Period Pct
1 10.60%
2 19.00%
3 18.30%
4 14.50%
5 10.70%
6 8.90%
7 6.50%
8 3.10%
9 3.00%
10 5.40%
COLLAPSED
Period Pct
1 38.75%
2 34.35%
3 16.95%
4 9.95%
EDITED: I've added sample code below as to what I have so far. It only works for periods 1, 2, 3, 5, 9, 10. Maybe someone can help modify it to work for any period. Disclaimer, I'm not a programmer so my coding is bad. Plus, I have no clue as to what I'm doing.
Sub Collapse_Periods()
    Dim aPct As Variant
    Dim aPer As Variant
    aPct = Array(0.106, 0.19, 0.183, 0.145, 0.107, 0.089, 0.065, 0.031, 0.03, 0.054)
    aPer = Array(1, 2, 3, 5, 9, 10)
    For i = 0 To UBound(aPer)
        pm = 10 / aPer(i)
        pct1 = 1
        p = 0
        ttl = 0
        For j = 1 To aPer(i)
            pct = 0
            k = 1
            Do While k <= pm
                pct = pct + aPct(p) * pct1
                pct1 = 1
                p = p + 1
                If k <> pm And k = Int(pm) Then
                    pct1 = (pm - Int(pm)) * j
                    pct = pct + (pct1 * aPct(p))
                    pct1 = 1 - pct1
                End If
                k = k + 1
            Loop
            Debug.Print aPer(i) & " : " & j & " : " & pct
            ttl = ttl + pct
        Next j
        Debug.Print "Total: " & ttl
    Next i
End Sub
I would also like to know how this is done using an integral. This is how I would have done it. It is perhaps a longhand, long-winded method, and I'd like to see some better suggestions.
It's probably easier to see the method in Excel first using the LINEST function and Named ranges. I've assumed the function is logarithmic. I've outlined steps [1.] - [5.]
This VBA code then essentially replicates the Excel method using a function to pass 2 arrays, periods and a return array that can be written to a range
Sub CallingProc()
    Dim Periods As Long, returnArray() As Variant
    Dim X_Values() As Variant, Y_Values() As Variant
    Periods = 4
    ReDim returnArray(1 To Periods, 1 To 2)
    With Sheet1
        X_Values = Application.Transpose(.Range("A2:A11"))
        Y_Values = Application.Transpose(.Range("B2:B11"))
    End With
    FGraph X_Values, Y_Values, Periods, returnArray 'pass 1D array of X, 1D array of Y, Periods, Empty ReturnArray
End Sub
Function FGraph(ByVal x As Variant, ByVal y As Variant, ByVal P As Long, ByRef returnArray As Variant)
    Dim i As Long, mConstant As Double, cConstant As Double
    'calc cumulative Y and take Ln (Assumes Form of Graph is logarithmic!!)
    For i = LBound(y) To UBound(y)
        If i = LBound(y) Then
            y(i) = y(i)
        Else
            y(i) = y(i) + y(i - 1)
        End If
        x(i) = Log(x(i))
    Next i
    'calc line of best fit
    With Application.WorksheetFunction
        mConstant = .LinEst(y, x)(1)
        cConstant = .LinEst(y, x)(2)
    End With
    'redim array to fill for new Periods
    ReDim returnArray(1 To P, 1 To 2)
    'Calc new periods based on line of best fit
    For i = LBound(returnArray, 1) To UBound(returnArray, 1)
        returnArray(i, 1) = UBound(y) / P * i
        If i = LBound(returnArray, 1) Then
            returnArray(i, 2) = (Log(returnArray(i, 1)) * mConstant) + cConstant
        Else
            returnArray(i, 2) = ((Log(returnArray(i, 1)) * mConstant) + cConstant) - _
                                ((Log(returnArray(i - 1, 1)) * mConstant) + cConstant)
        End If
    Next i
    'returnArray can be written to range
End Function
EDIT:
This VBA code now calculates the linear trend between the original points on either side of each new period boundary. The data is returned in a two-dimensional array named returnArray.
Sub CallingProc()
    Dim Periods As Long, returnArray() As Variant
    Dim X_Values() As Variant, Y_Values() As Variant
    Periods = 4
    ReDim returnArray(1 To Periods, 1 To 2)
    With Sheet1
        X_Values = Application.Transpose(.Range("A2:A11"))
        Y_Values = Application.Transpose(.Range("B2:B11"))
    End With
    FGraph X_Values, Y_Values, returnArray 'pass 1D array of X, 1D array of Y, Dimensioned ReturnArray
End Sub
Function FGraph(ByVal x As Variant, ByVal y As Variant, ByRef returnArray As Variant)
    Dim i As Long, j As Long, mConstant As Double, cConstant As Double, Period As Long
    Period = UBound(returnArray, 1)
    'calc cumulative Y
    For i = LBound(y) + 1 To UBound(y)
        y(i) = y(i) + y(i - 1)
    Next i
    'Calc new periods based on line of best fit
    For i = LBound(returnArray, 1) To UBound(returnArray, 1)
        returnArray(i, 1) = UBound(y) / Period * i
        'find position of new period to return adjacent original data points
        For j = LBound(x) To UBound(x)
            If returnArray(i, 1) <= x(j) Then Exit For
        Next j
        'calc linear line of best fit between existing data points
        With Application.WorksheetFunction
            mConstant = .LinEst(Array(y(j), y(j - 1)), Array(x(j), x(j - 1)))(1)
            cConstant = .LinEst(Array(y(j), y(j - 1)), Array(x(j), x(j - 1)))(2)
        End With
        returnArray(i, 2) = (returnArray(i, 1) * mConstant) + cConstant
    Next i
    'returnarray holds cumulative % so calc period only %
    For i = UBound(returnArray, 1) To LBound(returnArray, 1) + 1 Step -1
        returnArray(i, 2) = returnArray(i, 2) - returnArray(i - 1, 2)
    Next i
    'returnArray now holds your data
End Function
Returns:
COLLAPSED
1 38.75%
2 34.35%
3 16.95%
4 9.95%
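The idea behind these numbers (build the cumulative curve, read it off at the new period boundaries by linear interpolation, then difference) can be sketched compactly in Python. This is an illustrative sketch, not a port of the VBA above: the name `rescale_periods` is made up, and it assumes the cumulative curve starts at (0, 0), which also lets it expand to more periods than the original.

```python
def rescale_periods(pcts, new_periods):
    """Redistribute per-period percentages over new_periods periods.

    Works on the cumulative curve, interpolating linearly between the
    original period ends and starting from (0, 0).
    """
    cum = [0.0]
    for p in pcts:
        cum.append(cum[-1] + p)
    n = len(pcts)

    def cum_at(t):
        # Cumulative share at time t, with 0 <= t <= n.
        i = min(int(t), n - 1)
        return cum[i] + (t - i) * (cum[i + 1] - cum[i])

    bounds = [cum_at(n * k / new_periods) for k in range(new_periods + 1)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


original = [10.6, 19.0, 18.3, 14.5, 10.7, 8.9, 6.5, 3.1, 3.0, 5.4]
print([round(p, 2) for p in rescale_periods(original, 4)])
# [38.75, 34.35, 16.95, 9.95]
```

Calling `rescale_periods(original, 7)` or `rescale_periods(original, 24)` handles the other cases mentioned in the question; in each case the results still sum to the original total.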

Resources