Locating minimum on table/output - r

(R studio)
Basic question about how to find the minimum of a column in a table.
The table is longer than this (nsplit goes to 1532), which is why I'm looking for a search function.
In the picture, I'd like to find the minimum value of "xerror", and then find the value of "nsplit" at that minimum.
I'd definitely appreciate any help.

You can use the following code (assuming the name of your data frame is d):
d[which(d$xerror==min(d$xerror)),]
This returns the values of every other variable (including "nsplit") at the minimum value of "xerror". The row name in the leftmost column of the output shows which observation it is.
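For readers who want the same lookup outside R, here is a minimal Python sketch of the idea: compute the minimum of "xerror", then keep every row that attains it. The rows below are made-up illustrative values, not the asker's data, and the list-of-dicts layout is an assumption.

```python
# Hypothetical rows standing in for the table in the question
# (the values are invented purely for illustration).
rows = [
    {"nsplit": 0, "xerror": 1.00},
    {"nsplit": 1, "xerror": 0.72},
    {"nsplit": 3, "xerror": 0.65},
    {"nsplit": 5, "xerror": 0.69},
]

# Step 1: the minimum value of the "xerror" column.
min_xerror = min(row["xerror"] for row in rows)

# Step 2: every row that attains that minimum (ties are all kept,
# like the which(... == min(...)) form in the R answer).
best_rows = [row for row in rows if row["xerror"] == min_xerror]

# Step 3: the "nsplit" values at the minimum.
best_nsplit = [row["nsplit"] for row in best_rows]

print(min_xerror, best_nsplit)
```

Keeping all tied rows is a deliberate choice here; if only the first minimum matters, taking the first element of best_rows would be enough.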

Related

Stata tables/collect confidence interval in one cell

I work a lot with the new table/collect commands in Stata 17. Does anybody know how to get the confidence interval in one cell of the table, rather than one column for the lower bound and one column for the upper bound?
Alternatively, a quick fix in Word would do (or Excel, though my final document is Word and saving the output to Excel takes very long).
As I see it, there is no option to put it in one column, so maybe a layout workaround?
From the Stata documentation of the collect command, the quick start mentions
table (colname) (result), command(_r_b _r_ci: regress y x1 x2 x3)
You should be able to use collect with it, but without a minimal reproducible example of your specific case, it is hard to verify that this works as intended for you. For the general idea of a minimal reproducible example please see here, and for specific advice on how to create one please see here.
Here is a general example that uses table, collect and putdocx to create a Word document with the confidence interval in one cell:
use https://www.stata-press.com/data/r17/nlsw88.dta
table (colname) (result), command(_r_b _r_ci: regress wage union occupation married age)
collect layout (colname) (result)
putdocx begin
putdocx collect
putdocx save Table, replace

Find the index of the last occurrence of fulfilled criteria in a matrix in r

I have an array (x) in R of size 30x11x10.
x=array(-2:20, c(30,11,10))
Each 'grid' or matrix represents a day of data for a month (30 days represented here). I want to find the index (i,j,k) of the last occurrence of a number less than 2. Ideally, I would also like the value returned. In Matlab I could just use [i,j,k]=find(x(x<2)), but I don't see an exact equivalent in R.
I have looked at 'match', as suggested in other posts here, but it seems to find elements when they are specified exactly, not when a criterion (x<2) is given.
I tried this:
xxx<-match(x,x<2,0)
but it returns a long vector of integers that don't appear to show what I am looking for.
Then I tried:
xxx<-match(x,x[x<2],0)
which looks a bit more promising, but still isn't what I want (to be honest, I'm not sure what the output is indexing).
I think I'm probably asking a foolish question here because if I want 3 indices and the value returned, then I should be assigning them to something preemptively right (which I'm not doing)? Can anyone offer any advice?
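One way to think about the task: scan the array from the end and convert the first hit's linear position into (i,j,k). In R itself, which(x < 2, arr.ind = TRUE) returns a matrix of such indices, whose last row would be the last match in R's storage order. The Python sketch below only illustrates the index arithmetic; the function name last_match is hypothetical, and it assumes "last" means last in column-major order (first index varying fastest), with 1-based indices as in R.

```python
def last_match(flat, dims, predicate):
    """Return ((i, j, k), value) for the last element satisfying predicate.

    flat is assumed to be stored in column-major order (first index
    varies fastest); the returned indices are 1-based, as in R.
    Returns None when no element matches.
    """
    n_i, n_j, _n_k = dims
    for n in range(len(flat) - 1, -1, -1):  # scan backwards from the end
        if predicate(flat[n]):
            i = n % n_i                  # fastest-varying index
            j = (n // n_i) % n_j         # middle index
            k = n // (n_i * n_j)         # slowest-varying index
            return (i + 1, j + 1, k + 1), flat[n]
    return None


# Rebuild the example data: the values -2..20 recycled to fill 30x11x10,
# which is how array(-2:20, c(30,11,10)) fills the array in R.
dims = (30, 11, 10)
values = list(range(-2, 21))
total = dims[0] * dims[1] * dims[2]
flat = [values[n % len(values)] for n in range(total)]

result = last_match(flat, dims, lambda v: v < 2)
print(result)
```

If "last" should instead mean the latest day, the scan order would need to follow whichever dimension holds the days, which the question leaves open.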

Is there a way to extract a substring from a cell in OpenOffice Calc?

I have tens of thousands of rows of unstructured data in csv format. I need to extract certain product attributes from a long string of text. Given a set of acceptable attributes, if there is a match, I need it to fill in the cell with the match.
Example data:
"[ROOT];Earrings;Brands;Brands>JeweleryExchange;Earrings>Gender;Earrings>Gemstone;Earrings>Metal;Earrings>Occasion;Earrings>Style;Earrings>Gender>Women's;Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Earrings>Occasion>Just to say: I Love You;Earrings>Style>Drop/Dangle;Earrings>Style>Fashion;Not Visible;Gifts;Gifts>Price>$500 - $1000;Gifts>Shop>Earrings;Gifts>Occasion;Gifts>Occasion>Christmas;Gifts>Occasion>Just to say: I Love You;Gifts>For>Her"
Look up table of values:
Zircon, Diamond, Pearl, Ruby
Output:
Zircon
I tried using the VLOOKUP() function, but it needs to match an entire cell and works better for translating acronyms. Haven't really found a built in function that accomplishes what I need. The data is totally unstructured, and changes from row to row with no consistency even within variations of the same product. Does anyone have an idea how to do this?? Or how to write an OpenOffice Calc function to accomplish this? Also open to other better methods of doing this if anyone has any experience or ideas in how to approach this...
OK, so I figured out how to do this on my own. I created many columns, each headed by a keyword I was looking to extract.
Spreadsheet solution for structured data extraction
Then I used this formula to extract the keywords into the correct row beneath each column header:
=IF(ISERROR(SEARCH(CF$1,$D769)),"",CF$1)
The SEARCH function returns the numeric position of the search string, or an error if the string is not found. ISERROR detects that error condition, and the IF statement leaves the cell blank when there is an error and otherwise takes the value of the header. I had over 100 columns of specific information to extract, plus one final column where I join all the previous cells in the row together for the final list. Worked like a charm. I recommend this approach to anyone who has to do a similar task.
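For anyone open to doing this outside the spreadsheet, the same per-keyword test can be sketched in a few lines of Python. This is only an illustration of the approach described above: the function name extract_keywords is hypothetical, the sample text is a shortened piece of the example row, and the match is a case-insensitive substring test, which is assumed to be close enough to what SEARCH does here.

```python
def extract_keywords(text, keywords):
    """Return the keywords that occur somewhere in text.

    Mirrors the one-column-per-keyword formula: each keyword is kept
    if it is found and skipped (left "blank") otherwise.
    """
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


# Shortened fragment of the example row from the question.
text = "Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Gifts>For>Her"
lookup = ["Zircon", "Diamond", "Pearl", "Ruby"]

matches = extract_keywords(text, lookup)

# Equivalent of the final column that joins the non-blank cells.
joined = ", ".join(matches)
print(joined)
```

A plain substring test will also match a keyword embedded in a longer word, so a stricter match on the ";" and ">" delimiters may be worth considering for real data.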

Google Spreadsheet IF and AND

I'm trying to find an easy formula to do the following:
=IF(AND(H6="OK";H7="OK";H8="OK";H9="OK";H10="OK";H11="OK";);"OK";"X")
This actually works. But I want to apply it to a range of cells within a column (H6:H11) instead of having to write a condition for each and every one of them. Trying it as a range:
=IF(AND(H6:H11="OK";);"OK";"X")
Does not work.
Any insights?
Thanks.
=ArrayFormula(IF(AND(H6:H11="OK");"OK";"X"))
also works.
Array formulas work the same way they do in Excel; they just need an ArrayFormula() wrapper (which is added automatically when you press Ctrl+Shift+Enter, like in Excel).
In Google Sheets the formula is:
=ArrayFormula(IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X"))
In Excel:
=IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X")
And confirm with Ctrl-Shift-Enter.
This counts the number of cells in the range that equal the criterion and compares it to the number it should be. If the range grows, increase the number 6 to match.
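To make the logic of the two approaches explicit, here is a small Python sketch of the same check. The function name range_status and the sample values are hypothetical; it simply tests that every cell equals the expected value, which is what both the AND form and the counting form compute.

```python
def range_status(cells, expected="OK"):
    """Return "OK" if every cell equals expected, otherwise "X"."""
    return "OK" if all(cell == expected for cell in cells) else "X"


# Hypothetical contents of H6:H11.
all_ok = ["OK", "OK", "OK", "OK", "OK", "OK"]
one_bad = ["OK", "OK", "FAIL", "OK", "OK", "OK"]

print(range_status(all_ok))
print(range_status(one_bad))
```

Unlike the counting formula, this form has no hard-coded 6, so it needs no adjustment when the range grows. Note that an empty range counts as "OK" here, because all() of an empty sequence is true.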

Extract formula from Excel Data Table (What-If Analysis)

I am faced with rewriting an Excel project in R. I see a table in which a cell {= TABLE (F2, C2)} is shown. I understand how to create a Table like this (What-If Analysis, Data Table...).
As I have to understand this to rewrite in R, how can I find the original formula which stands behind that cell?
EXAMPLE: I have created a Data Table as shown here and the sheet looks like this:
In my case, I don't know how the sheet was created, and I want to know the initial formula. Currently the cell shows {=TABLE(,C4)}.
(In the example I know the answer: it is in cell D10. But where is the reference to this cell in the Data Table?)
I'm using Excel 2007 but have no reason to believe things differ in other versions.
@Stanislav was right to reject my comment suggestion that TABLE was a name; it is an Excel function. But it is a very strange function :-}
There isn't any help on the TABLE function in the local help, it isn't listed in "List of worksheet functions (alphabetical)".
You can't manually enter or edit the TABLE function; error "That function is not valid".
Copy/Pasting cells containing the TABLE function pastes their values, not their formulae, even when you specify Paste Special > Formulas
You can't insert rows/columns immediately above/left of cells containing the TABLE function; error "Cannot change part of a data table".
Pace @pnuts, using Formulas > Formula Auditing on cells containing the TABLE function shows no precedents, and no cells show them as dependents. However, in a VBA sheet auditing tool which I use, the Range.DirectDependents property finds the "formula range" to be dependent on the "margin" cells containing the formulas, but not on those containing the values (see below for an explanation of those terms).
I haven't been able to find anything I regard as decent documentation of TABLE(). I have found lots of illustrations of how to produce and use that function, but nothing clearly specifying the arguments and result. The best I've found is https://support.office.com/en-us/article/Calculate-multiple-results-by-using-a-data-table-e95e2487-6ca6-4413-ad12-77542a5ea50b. I'd be pleased if anyone can point me to better documentation.
I deduce the behaviour to be as described here:
TABLE(Rowinp,Colinp) is an array formula in a contiguous array of cells. I'll refer to that contiguous array as the "formula range" of the data table.
The cells immediately above/left of the formula range are also part of the data table, even though they do not contain a TABLE() function and can be edited; I'll refer to those cells as the "margins" of the data table.
Rowinp and Colinp must be blank or references to single cells.
Rowinp and Colinp must be different (or error "Input cell reference is not valid"), and they must not both be blank.
The values in the formula range are calculated by taking formula(s) from the margin(s) and substituting references to Rowinp and/or Colinp with values from the margin(s).
There are three mutually exclusive possibilities, depending on which of Rowinp and Colinp is blank.
TABLE(Rowinp, ) Colinp blank. The formula is that in the left margin of the same row with instances of Rowinp replaced by values from the upper margin of the same column.
TABLE( , Colinp) Rowinp blank. The formula is that in the top margin of the same column with instances of Colinp replaced by values from the left margin of the same row.
TABLE(Rowinp, Colinp) Neither blank. The formula is that in the cell at the intersection of the left and top margins with instances of Rowinp replaced by values from the upper margin of the same column and instances of Colinp replaced by values from the left margin of the same row.
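As a rough illustration of the three cases above, here is a minimal Python sketch of the substitution rule as I have deduced it. It is not Excel's implementation: the function name data_table is hypothetical, a Python callable stands in for the margin formula, and only a single margin formula is modelled (a real one-input table can have several).

```python
def data_table(formula, row_values=None, col_values=None):
    """Model TABLE(Rowinp, Colinp) for a single margin formula.

    formula(row_input, col_input) plays the role of the margin formula.
    row_values are the top-margin values substituted for Rowinp;
    col_values are the left-margin values substituted for Colinp.
    Passing None for one of them models a blank argument.
    """
    if row_values is None and col_values is None:
        raise ValueError("Rowinp and Colinp must not both be blank")
    if col_values is None:
        # TABLE(Rowinp, ): one row of results, one per top-margin value.
        return [[formula(r, None) for r in row_values]]
    if row_values is None:
        # TABLE( , Colinp): one column of results, one per left-margin value.
        return [[formula(None, c)] for c in col_values]
    # TABLE(Rowinp, Colinp): a full grid, one row per left-margin value.
    return [[formula(r, c) for r in row_values] for c in col_values]


# Hypothetical formula and margin values, chosen only for illustration.
grid = data_table(lambda r, c: r * 10 + c, row_values=[1, 2, 3], col_values=[4, 5])
row_only = data_table(lambda r, c: r * 2, row_values=[1, 2, 3])
col_only = data_table(lambda r, c: c + 1, col_values=[1, 2])
print(grid)
```

When rewriting such a sheet in another language, the useful part is the shape of the loops: each cell of the formula range is just the margin formula re-evaluated with the input cells temporarily set to that cell's margin values.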
I think that should let you work out what the effective formula is in each cell of the formula range.
But I wouldn't be surprised to learn that any of the above is wrong :-0
I welcome pointers to anything more authoritative.
I think in your example F2 and C2 are effectively just the addresses of the input parameters for the TABLE function, which may be located anywhere, with the associated formula in the table's top-left cell.
So I suggest you go to C2, choose FORMULAS > Formula Auditing and click Trace Dependents, then repeat for F2 and see where the arrows converge.
