Stata tables/collect confidence interval in one cell - collections

I work a lot with the new table and collect commands in Stata 17. Does anybody know how to get the confidence interval into one cell of the table, rather than one column for the lower bound and one column for the upper bound?
Alternatively, a quick fix in Word would help (or in Excel, though my final document is in Word and saving the output to Excel takes very long).
As I see it, there is no option to put it in one column, so maybe there is a layout workaround?

The quick start in the Stata documentation of the collect command mentions
table (colname) (result), command(_r_b _r_ci: regress y x1 x2 x3)
You should be able to use collect with it, but without a minimal reproducible example of your specific case it is hard to verify that this works as intended for you. For the general idea of a minimal reproducible example please see here, and for specific advice on how to create one please see here.
Here is a general example that uses table, collect, and putdocx to create a Word document with the confidence interval in one cell:
use https://www.stata-press.com/data/r17/nlsw88.dta
table (colname) (result), command(_r_b _r_ci: regress wage union occupation married age)
collect layout (colname) (result)
putdocx begin
putdocx collect
putdocx save Table, replace

Related

Grouping and transposing data in R

It is hard to explain this without just showing what I have, where I am, and what I need in terms of data structure:
What structure I had:
Where I have got to with my transformation efforts:
What I need to end up with:
Notes:
I've not given actual names for anything as the data is classed as sensitive, but:
Metrics are things that can be measured, for example the number of permanent or full-time jobs. The number of metrics is larger than presented in the test data (and the example structure above).
Each metric has many years of data. While writing the code I have restricted myself to just 3 years, and the illustration of the structure is based on this test. The number of years captured will change over time; generally it will increase.
The number of policies will fluctuate. I've just labelled them policy 1, 2, etc. for sensitivity reasons, and limited the number whilst testing the code to make it easier to check the outputs.
The source data comes from a workbook of surveys with a tab for each policy. The initial import creates a list of tibbles consisting of a row for each metric, and 4 columns (the metric names, the values for 2024, the values for 2030, and the values for 2035). I converted this to a dataframe, created a vector to be a column header and used cbind() to put this on top to get the "What structure I had" data.
To get to the "Where I have got to with my transformation efforts" version of the table, I removed all the metric columns, created another vector of metrics and used rbind() to put this as the first column.
The idea in my head was to group the data by policy to get a vector for each metric, then transpose this so that the metric became the column and the grouped data became the row, then expand the data to get the metrics repeated for each year. A friend of mine who does coding (but has never used R) has suggested that loops might be a better way forward. I am not sure of the best approach, so I welcome advice. On Reddit someone suggested using pivot_wider/pivot_longer, but this appears to be a summarise tool, and I am not trying to summarise the data but rather to transform its structure.
Any suggestions on approaches or possible tools/functions to use would be gratefully received. I am learning R whilst trying to pull this data together to create a database that can be used for analysis, so, if my approach sounds weird, feel free to suggest alternatives. Thanks

SPSS: correlating two vectors

I have two vectors in my dataset Vs = s1 to s10 and Vt= t1 to t10.
They describe two pictures and I want to know for each case what the correlation is.
However, there is no such function as Cor(Vs, Vt), because vectors are apparently not usable in the standard functions. There is not even a mean(Vs)!
I tried to write syntax but failed, partly because of the problem of missing values (implementing pairwise deletion seems complex).
Any hint is welcome.
Is it possible to ask a question that is only seen by SPSS experts?
Calculating the correlation in the present structure is probably feasible but would be pretty complex. I suggest restructuring the data, after which everything becomes easy.
The code assumes you have some line ID in the data, called lineNum.
If you don't, you'll need to create one using the first line of the code.
compute lineNum=$casenum. /* this is only necessary if you don't have some other line ID.
varstocases /make V_s from s1 to s10 /make V_t from t1 to t10 /index=pairNum(V_s).
sort cases by lineNum.
split file by lineNum.
correlations V_s with V_t. /* you can edit the code here to add features to the analysis.
split file off.
That's it. Now the results will appear in the output window - one correlation for each of the original lines. If you need to import the correlations back to the original data you can do that by using OMS control to capture the results into a new dataset and then matching it back to the original file.
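For readers who want to check what the split-file correlation computes for a single case, here is a minimal sketch in Python rather than SPSS. It calculates Pearson's r over the ten (s, t) pairs of one case and drops any pair where either value is missing, which is the pairwise deletion mentioned in the question. The function name, the use of None for missing values, and the sample data are assumptions made for illustration only.

```python
import math

def pairwise_pearson(s_values, t_values):
    """Pearson r over the positions where both values are present (not None)."""
    # Pairwise deletion: keep a pair only if both members are non-missing.
    pairs = [(s, t) for s, t in zip(s_values, t_values)
             if s is not None and t is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete pairs to correlate
    mean_s = sum(s for s, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_t = sum((t - mean_t) ** 2 for _, t in pairs)
    if var_s == 0 or var_t == 0:
        return None  # correlation is undefined for a constant series
    return cov / math.sqrt(var_s * var_t)

# Hypothetical single case: s1..s10 and t1..t10, each with one missing value.
case_s = [1, 2, 3, None, 5, 6, 7, 8, 9, 10]
case_t = [2, 4, 6, 8, None, 12, 14, 16, 18, 20]
r = pairwise_pearson(case_s, case_t)
```

Applying the function to every row of the original wide data gives one correlation per case, which should match the per-lineNum output of the SPSS run.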

How to calculate average annual salary in libreoffice calc

I have a salary data table covering a 10-year period. Every column has a properly set data type (date for "B", number for "C" and "E").
I'm trying to write a formula to calculate the average salary for every year. In column "E" I've manually entered all possible years, and column "F" should hold the yearly average for the year in "E".
So, my best try is this formula: =AVERAGEIF(YEAR(B2:B133);"="&E2;C2:C133)
It tries to calculate an average from column C over the rows where the year of the date in column B equals the year in column E.
But all I get is the error Err:504. I figured out that the problem is in the YEAR(interval) part, but I can't work out what exactly is wrong.
Can someone point that out?
Thank you!
There are actually many possibilities to solve this.
#JvdV answer;
using an array formula with #JvdV solution;
using an array formula with a combination of AVERAGE() and IF();
using the SUMPRODUCT() function;
and surely many other solutions that I don't know about!
Please beware: I use , instead of ; as formula separator, according to my locale; adapt to your needs.
A side note on "array formulas"
Formulas of this kind must be entered with the Ctrl + Shift + Enter key combination, not just Enter or Tab or a mouse click elsewhere on the sheet.
The resulting formula is shown between braces {}, which are not typed by the user but are displayed automatically by the software to indicate that this is an array formula.
More on array formulas can be found, for example, in the LibreOffice help system.
Usually you cannot drag and drop array formulas, you have to copy-paste them instead.
Array formula with #JvdV solution
JvdV's solution could be slightly modified like this and then inserted as an array formula:
=AVERAGEIFS(C$2:C$133,YEAR($B$2:$B$133),"="&E2)
When you insert this formula with the Ctrl + Shift + Enter key combination, the software puts the formula into brackets, so that you see it like this: {=AVERAGEIFS(C$2:C$133,YEAR($B$2:$B$133),"="&E2)}
You cannot simply drag the formula down, but you can copy-paste it.
Array formula with a combination of AVERAGE() and IF():
For your example, put this formula in cell F2 (for the year 2010):
=AVERAGE(IF(YEAR($B$2:$B$133)=E2,$C$2:$C$133))
When you insert this formula with the Ctrl + Shift + Enter key combination, the software puts the formula into brackets, so that you see it like this {=AVERAGE(IF(YEAR($B$2:$B$133)=E2,$C$2:$C$133))}
You cannot simply drag the formula down, but you can copy-paste it.
SUMPRODUCT() formula:
My loved one...
Plenty of resources on the web to explain this formula.
In your situation, this would give:
=SUMPRODUCT($C$2:$C$133,--(YEAR($B$2:$B$133)=E2))/SUMPRODUCT(--(YEAR($B$2:$B$133)=E2))
This one you can drag down to your needs.
Unfortunately, AVERAGEIF() expects a range reference instead of a calculated array, and therefore it will error out. That's the theory at least for Excel, and I expect this to be the same for LibreOffice Calc.
One way around it is using the AVERAGEIFS() function and check against first and last days of the year, for example:
=AVERAGEIFS(C$2:C$133;B$2:B$133;">="&DATE(E2;1;1);B$2:B$133;"<="&DATE(E2;12;31))
Drag the formula down.
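To cross-check the spreadsheet results outside Calc, the same per-year average can be sketched in a few lines of Python. This is only an illustration of the calculation, not a Calc feature: the function name and the sample rows are made up, and each row is assumed to be a (date, salary) pair like columns B and C.

```python
from collections import defaultdict
from datetime import date

def average_by_year(rows):
    """Return {year: mean salary} for an iterable of (date, amount) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pay_date, amount in rows:
        # Group on the calendar year of the date, like YEAR(B2:B133) = E2.
        totals[pay_date.year] += amount
        counts[pay_date.year] += 1
    return {year: totals[year] / counts[year] for year in totals}

# Hypothetical sample data standing in for columns B (date) and C (salary).
salaries = [
    (date(2010, 1, 31), 1000.0),
    (date(2010, 2, 28), 1200.0),
    (date(2011, 1, 31), 1500.0),
]
yearly = average_by_year(salaries)
```

Years with no rows simply do not appear in the result, whereas an averaging formula in the sheet would return an error for such a year.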

How do I retrieve aggregate measures from R when I need to pass disaggregated data in Tableau?

I have extensively read and re-read the Troubleshooting R Connections and Tableau and R Integration help documents, but as a new Tableau user they just aren't helping me.
I need to be able to calculate Kaplan-Meier survival probabilities across any dimensions that are dragged onto the sheet. Ideally, I would be able to retrieve this in a tabular format at multiple time points, but for now, I would be happy just to get it at a single time point.
My data in Tableau have columns for [event-boolean] and [time to event]. Let's say I also have columns for Gender and District.
Currently, I have a calculated field [surv] as:
SCRIPT_REAL('
library(survival);
fit <- summary(survfit(Surv(.arg2,.arg1) ~ 1), times=365);
fit$surv'
, min([event-boolean])
, min([time to event])
)
I have messed with Computed Using, Addressing, Partitions, Aggregate Measures, and parameters to the R function, but no combination I have tried has worked.
If [District] is in Columns, do I need to change my SCRIPT_REAL call or do I just need to change some other combination of levers?
I used Andrew's solution to solve this problem. Essentially,
- Turn off Aggregate Measures
- In the Measure Values shelf, select Compute Using > Cell
- In the calculated field, wrap the script call as IF FIRST() == 0 THEN SCRIPT_*(...) END
- Ctrl+drag the measure to the Filters shelf and use a Special > Non-null filter.

Locating minimum on table/output

(RStudio)
Basic question about how to go about finding the minimum of a value on a table.
The table is longer than this (nsplit goes to 1532), which is why I'm looking for a search function.
In the picture basically I'd like to find the minimum value of "xerror", and after that I'd like to find "nsplit" at the minimum of "xerror"
I'd definitely appreciate any help.
You can use the following code (assuming the name of your data frame is d):
d[which(d$xerror==min(d$xerror)),]
With this code you can find the values of every other variable (including "nsplit") at the minimum value of "xerror". You can also see which observation it is from the row label in the leftmost column of the output.
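One detail worth knowing is that comparing against the minimum returns every row that ties for it, not just the first. The logic can be sketched in Python as follows; this is an illustration rather than R, and the table values are invented.

```python
# Hypothetical stand-in for the table: one dict per row.
rows = [
    {"nsplit": 0, "xerror": 1.00},
    {"nsplit": 1, "xerror": 0.82},
    {"nsplit": 3, "xerror": 0.75},
    {"nsplit": 5, "xerror": 0.75},
    {"nsplit": 8, "xerror": 0.79},
]

# Step 1: find the smallest xerror in the table.
min_xerror = min(row["xerror"] for row in rows)

# Step 2: keep every row whose xerror equals that minimum (ties included).
best_rows = [row for row in rows if row["xerror"] == min_xerror]

# Step 3: read off nsplit for those rows.
best_nsplit = [row["nsplit"] for row in best_rows]
```

If only a single row is wanted, taking the first element of the result is one simple way to pick among the ties.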
