LibreOffice Calc: Find a cell by value - formula

On a Calc sheet I have a range with the following values:
  a     b     c   d     e
1 3.14  3.11  27  2.12  0.005
2 31    7.21  55  32.12 0.003
3 45    8.31  12  7.77  0.515
Is there a way to determine in which row and in which column a certain value is found?
For example: using the value “55” I need a formula that returns “C2”; using the value “0.005”, the formula should instead return “E1”.

Perhaps this solution will help you:
Compare all the cells in the original range with the value. If a cell does not match, use an empty string; otherwise, use the ADDRESS() function to get the coordinates of the cell. Combine the results with TEXTJOIN(), and don't forget that this is an array formula: complete it with Ctrl+Shift+Enter.
{=TEXTJOIN(";";1;IF($A$1:$E$3=0.005;ADDRESS(ROW($A$1:$E$3);COLUMN($A$1:$E$3);4)))}
Please be careful. I hope that values like 3.14, 3.11 and 0.515 were just randomly generated for the example, and that your real values are more precise. The thing is, if you look up something like =1/1.9415 in your table, you won't get E3 as the result, since 0.515 in the table will not equal the 0.515066 that the calculation produces.

Related

Similar to R's rbinom function on SQL Server

I've recently been using R's rbinom function to generate success events, such as
rbinom(7,7,p=0.81)
The thing is, I now have a table that looks like
num denum prob
4 7 0.57
5 8 0.625
3 4 0.75
2 5 0.4
9 11 0.81
I want to create a new column containing the sum of the success events from the rbinom function, using the num, denum & prob columns of the table for each row, like this:
table %>%
mutate(new_column=sum(rbinom(num,denum,prob)))
but this gives me the same result for each row, even with different num, denum & prob values.
The question is :
Is there any problem with my code, such that it always returns the same result for the whole table?
And if the source table is in my SQL Server database, is it possible to achieve the same thing with a SQL Server function (an rbinom function for SQL Server)?
Thank you.
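One likely culprit: without rowwise(), mutate() hands the entire num, denum and prob columns to rbinom() in a single call, and sum() then collapses all the draws into one scalar that is recycled down every row. A minimal sketch of a fix, assuming dplyr and the table from the question:
library(dplyr)
tbl <- data.frame(num   = c(4, 5, 3, 2, 9),
                  denum = c(7, 8, 4, 5, 11),
                  prob  = c(0.57, 0.625, 0.75, 0.4, 0.81))
# rowwise() makes mutate() evaluate the expression once per row,
# so each row gets its own set of binomial draws
tbl %>%
  rowwise() %>%
  mutate(new_column = sum(rbinom(num, denum, prob))) %>%
  ungroup()
On the SQL Server side there is no built-in binomial sampler, so running the simulation in R (for example through SQL Server's R integration) and writing the result back is probably the simpler route.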

Match each row in a table to a row in another table based on the difference between row timestamps

I have two unevenly-spaced time series that each measure separate attributes of the same system. The two series' data points are not sampled at the same times, and the series are not the same length. I would like to match each row of series A to the row of B that is closest to it in time. What I have in mind is to add a column to A that contains indexes of the closest rows in B. Both series have a time column measured in Unix time (e.g. 1459719755).
for example, given two datasets
a time
2 1459719755
4 1459719772
3 1459719773
b time
45 1459719756
2 1459719763
13 1459719766
22 1459719774
The first dataset should be updated to
a time index
2 1459719755 1
4 1459719772 4
3 1459719773 4
since B[1,]$time has the closest value to A[1,]$time, B[4,]$time has the closest value to A[2,]$time and A[3,]$time.
Is there any convenient way to do this?
Try something like this:
(1+ecdf(bdat$time)(adat$time)*nrow(bdat))
[1] 1 4 4
Why should this work? The ecdf function returns another function, one whose values run from 0 to 1: it gives the "position" in the "probability range" [0,1] of a new value within the distribution of values passed as the first argument to ecdf. The expression is really just rescaling that function's result to the range [1, nrow(bdat)]. (I think it's flipping elegant.)
Another approach would be to use approxfun on the sorted values of bdat$time, which would then get you interpolated values. These might need to be rounded; using them as indices would instead truncate to integer.
apf <- approxfun(x=sort(bdat$time), y=seq(length(bdat$time)), rule=2)
apf(adat$time)
#[1] 1.000 3.750 3.875
round(apf(adat$time))
#[1] 1 4 4
In both cases you are predicting a sorted value from its "order statistic". In the second case you should check that ties are handled in the manner you desire.
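For completeness, here is a brute-force sketch that literally implements "closest row", using the adat/bdat names and the sample data from the question:
adat <- data.frame(a = c(2, 4, 3),
                   time = c(1459719755, 1459719772, 1459719773))
bdat <- data.frame(b = c(45, 2, 13, 22),
                   time = c(1459719756, 1459719763, 1459719766, 1459719774))
# for each time in A, take the index of the B row with the smallest absolute gap
adat$index <- sapply(adat$time, function(t) which.min(abs(bdat$time - t)))
adat$index
#[1] 1 4 4
This is O(nrow(adat) * nrow(bdat)) work, so for long series the ecdf/approxfun tricks above scale better.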

In R, how to take the mean of a varying number of elements for each row in a data frame?

So I have a dataframe, PVALUES, like this:
PVALUES <- read.csv(textConnection("PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01"),sep = " ")
That corresponds to another dataframe, DATA, like this:
DATA <- read.csv(textConnection("COL1 COL2 CO3
10 2 9
11 20 200
2 3 5"),sep=" ")
For every row in DATA, I'd like to take the mean of the numbers whose indices correspond to entries in PVALUES that are <= 0.05.
So, for example, the first row in PVALUES only has two entries <= 0.05, the entries in [1,2] and [1,3]. Therefore, for the first row of DATA, I want to take the mean of 2 and 9.
In the second row of PVALUES, only the entry [2,2] is <=0.05, so instead of taking a mean for the second row of DATA, I would just use DATA[2,2], which is 20.
So, my output would look like:
MEANS
6.5
20
3.33
I thought I might be able to generate indices for every entry in PVALUES <=0.05, and then use that to select entries in DATA to use for the mean. I tried to use this command to generate indices:
exp <- which(PVALUES[,]<=0.05, arr.ind=TRUE)
...but it only picks up indices for entries in the first column that are <=0.05. In my example above, it would only output [3,1].
Can anyone see what I'm doing wrong, or have ideas on how to tackle this problem?
Thank you!
It's a bit funny looking, but this should work
rowMeans(`is.na<-`(DATA,PVALUES>=.05), na.rm=T)
The "ugly" part is calling is.na<- without doing the automatic replacement, but here we just set all data with p-values larger than .05 to missing and then take the row means.
It's unclear to me exactly what you were doing with exp, but that type of method could work as well. Maybe with
expx <- which(PVALUES[,]<=0.05, arr.ind=TRUE)
aggregate(val~row, cbind(expx, val=DATA[expx]), mean)
(renamed so as not to interfere with the built-in exp() function)
Tested with
PVALUES<-read.table(text="PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01", header=T)
DATA<-read.table(text="COL1 COL2 CO3
10 2 9
11 20 200
2 3 5", header=T)
I usually enjoy MrFlick's responses, but the use of is.na<- in that manner seems to violate my expectations of R code, because it destructively modifies the data. I admit that I probably should have been expecting that possibility because of the assignment arrow, but it surprised me nonetheless. (I don't object to data.table code, because it is honest and forthright about modifying its contents with the := function.) I also admit that my efforts to improve on it led me down a rabbit hole where I found this equally "baroque" effort. (You have incorrectly averaged 2 and 9, by the way: their mean is 5.5, not 6.5.)
sapply(split(DATA[which(PVALUES <= 0.05, arr.ind=TRUE)],
             which(PVALUES <= 0.05, arr.ind=TRUE)[,'row']),
       mean)
       1         2         3
5.500000 20.000000  3.333333

Having trouble understanding how "Identical" works

I have a probably very stupid question regarding the identical() function.
I was writing a script to test whether some values appear several times in my data.frame, in order to regroup them. I compare values 2 by 2 over 4 columns.
I identified some in my table, and wanted to test my script. Here is part of the data.frame:
Ret..Time Mass Ret..Time Mass deltaRT deltaMZ
178 3.5700 797.6324 3.4898 797.6018 0.0802 0.0306
179 3.6957 797.6519 3.7502 797.5798 0.0545 0.0721
180 3.3526 797.6655 3.2913 797.5980 0.0613 0.0675
182 3.1561 797.7123 3.1650 797.5620 0.0089 0.1503
182.1 3.1561 797.7123 3.0623 797.6174 0.0938 0.0949
183 3.4495 797.8207 3.3526 797.6655 0.0969 0.1552
So here the elements of column 1 and 2 on row "180" are equal to those in 3 and 4 on row "183".
Here is what I get and what confuses me:
all.equal(result["180",1:2],result["183",3:4])
[1] "Attributes: < Component “row.names”: 1 string mismatch >"
identical(result["180",1:2],result["183",3:4])
[1] FALSE
identical(result["180",1],result["183",3]) & identical(result["180",2],result["183",4])
[1] TRUE
I get that all.equal reacts to the different row names (although I don't really understand why; I'm asking it to compare the values in specific columns, not whole rows).
But why does identical need to compare the values separately? It doesn't work any better if I use result[180,c(1,2)] and result[183,c(3,4)]. Does identical() start to use the row names too if I compare more than one value? How do I prevent that? In my case, I have only 2 values to compare to 2 other values, but what if the values to compare were spread over 10 columns? Would I need to chain identical() with & to compare each of the 10 columns individually?
Thanks in advance!
Keep in mind that not only the values but also all attributes must match for identical to return TRUE. In your case the two one-row subsets carry different row names ("180" vs "183") and different column names, so identical() is FALSE even though the values agree. Consider:
foo<-1
bar<-1
dim(foo)<-c(1,1)
identical(foo,bar)
[1] FALSE
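If you only care about the values, one way to sidestep the attributes (a sketch, assuming the result data frame from the question) is to strip them before comparing:
# unlist() turns each one-row data frame into a numeric vector;
# unname() then drops the names so that only the values are compared
identical(unname(unlist(result["180", 1:2])),
          unname(unlist(result["183", 3:4])))
# [1] TRUE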

Right way to convert data.frame to a numeric matrix, when df also contains strings?

I have a data frame taken from a .csv file which contains numeric and character values. I want to convert this data frame into a matrix. All the information it now contains is numeric (I deleted the non-numeric rows), so it should be possible to convert it into a numeric matrix. However, I get a character matrix.
The only way I found to solve this is to use as.numeric on each and every column, but this is quite time-consuming. I am quite sure there is a way to do this with some kind of for(i in 1:n) loop, but I cannot figure out how it might work. Or is the only way really to start with numeric values from the outset, as proposed here (Making matrix numeric and name orders)?
Probably this is a very easy thing for most of you :P
The matrix is a lot bigger, this is only the first few rows... Here's the code:
cbind(
as.numeric(SFI.Matrix[ ,1]),
as.numeric(SFI.Matrix[ ,2]),
as.numeric(SFI.Matrix[ ,3]),
as.numeric(SFI.Matrix[ ,4]),
as.numeric(SFI.Matrix[ ,5]),
as.numeric(SFI.Matrix[ ,6]))
# to get something like this again:
Social.Assistance Danger.Poverty GINI S80S20 Low.Edu Unemployment
0.147 0.125 0.34 5.5 0.149 0.135 0.18683691
0.258 0.229 0.27 3.8 0.211 0.175 0.22329362
0.207 0.119 0.22 3.1 0.139 0.163 0.07170422
0.219 0.166 0.25 3.6 0.114 0.163 0.03638525
0.278 0.218 0.29 4.1 0.270 0.198 0.27407825
0.288 0.204 0.26 3.6 0.303 0.211 0.22372633
Thank you for any help!
Edit 2: See #flodel's answer. Much better.
Try:
# assuming SFI is your data.frame
as.matrix(sapply(SFI, as.numeric))
Edit:
or, as @CarlWitthoft suggested in the comments:
matrix(as.numeric(unlist(SFI)),nrow=nrow(SFI))
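Either way, a quick sanity check on a hypothetical toy data frame (numbers stored as character) shows that the result is a numeric matrix:
SFI <- data.frame(a = c("1", "2"), b = c("3.5", "4.5"),
                  stringsAsFactors = FALSE)
as.matrix(sapply(SFI, as.numeric))
#      a   b
# [1,] 1 3.5
# [2,] 2 4.5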
data.matrix(SFI)
From ?data.matrix:
Description:
Return the matrix obtained by converting all the variables in a
data frame to numeric mode and then binding them together as the
columns of a matrix. Factors and ordered factors are replaced by
their internal codes.
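That last sentence is the classic gotcha: with factor columns, data.matrix() returns the factors' internal integer codes, not the numbers their labels show. A tiny illustration with hypothetical data:
f <- data.frame(x = factor(c("10", "2")))
data.matrix(f)
#  x
#1 1
#2 2
# "10" became 1 and "2" became 2: the level codes, not the values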
Here is an alternative way if the data frame just contains numbers.
apply(as.matrix.noquote(SFI),2,as.numeric)
but the most reliable way of converting a data frame to a matrix is to use the data.matrix() function.
I had the same problem and I solved it like this, by
taking the original data frame without row names and adding them later
SFIo <- as.matrix(apply(SFI[,-1],2,as.numeric))
row.names(SFIo) <- SFI[,1]
Another way of doing it is to use the read.table() argument colClasses to specify the column types, in the form colClasses=c(*column class types*).
If there are 6 columns whose members you want as numeric, you repeat the character string "numeric" six times, separated by commas, import the data frame, and then as.matrix() the data frame.
P.S. looks like you have headers, so I put header=T.
as.matrix(read.table(SFI.matrix,header=T,
colClasses=c("numeric","numeric","numeric","numeric","numeric","numeric"),
sep=","))
I manually filled the NAs by exporting the CSV, then editing it and reimporting, as below.
Perhaps one of you experts might explain why this procedure worked so well:
the first file had columns of types char, int and num (floating-point numbers), which all became char type after STEP 1; but at the end of STEP 3, R correctly recognized the datatype of each column.
# STEP 1:
library(stringr)  # provides str_locate(), used in STEP 2
MainOptionFile <- read.csv("XLUopt_XLUstk_v3.csv",
                           header=T, stringsAsFactors=FALSE)
# STEP 2:
TestFrame <- subset(MainOptionFile, str_locate(option_symbol, "120616P00034000") > 0)
write.csv(TestFrame, file = "TestFrame2.csv")
# ...
# STEP 3:
# I made various amendments to `TestFrame2.csv`, including replacing all missing data cells with appropriate numbers. I then read that amended data frame back into R as follows:
XLU_34P_16Jun12 <- read.csv("TestFrame2_v2.csv",
header=T,stringsAsFactors=FALSE)
On arrival back in R, all columns had their correct types automatically recognized by R!
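A likely explanation: when colClasses is not supplied, read.csv() guesses each column's class with type.convert(), so a column whose strings all parse as numbers comes back numeric after the round trip:
type.convert(c("1", "2.5", "3"), as.is=TRUE)
#[1] 1.0 2.5 3.0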
