Computing min and max values of a matrix

This question might be closed as vague, but I'm asking because I really have no idea and my math background is not sufficient.
I'm implementing a challenge, and part of it requires computing the min and max values of a matrix. I have no trouble with matrix implementations and their operations, but what are the min and max values of a matrix? For a 3x3 matrix, is the min simply the smallest of the 9 entries and the max the greatest, or is it something else?

It really depends. The maximum could be the maximum entry, the entry with the maximum absolute value, or the row (or column) vector that is largest with respect to some norm.
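For instance, in R the three interpretations give different answers. A sketch using a made-up 3x3 matrix:

```r
# A sample 3x3 matrix to illustrate the three interpretations
A <- matrix(c(-7, 2, 3,
               4, 5, -1,
               0, 6, 2), nrow = 3, byrow = TRUE)

max(A)       # largest entry: 6
max(abs(A))  # largest absolute value: 7 (from the -7)

# Row with the largest Euclidean (L2) norm
norms <- apply(A, 1, function(r) sqrt(sum(r^2)))
A[which.max(norms), ]  # row 1, since sqrt(62) beats the other rows
```

Which one the challenge wants depends on its wording; the plain "smallest/greatest of the 9 numbers" reading corresponds to `min(A)` and `max(A)`.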

Related

R: Data-cleaning dataframe that exceeds 2 standard deviations from mean

I simply don't know how to do it, and my coding experience is very limited. I have 90 reaction times per subject, and some of the trial times are too high (because of lacking motivation or attention). How can I extract subject data that is too high (not in comparison to other subjects, but in comparison to the subject's own mean and SD)?
print(d1$(Reaction.time < mean+2*SD))
as you can see I have no clue what I'm doing, but eh, I'm trying
d1 is the dataframe containing the data of one particular subject.
Reaction.time is one column of the dataframe containing (duh) all the reaction times of that subject.
mean (mean reaction time) is a column that I added via mutate(and so on), and SD is the standard deviation of that subject, also added via mutate(and so on). SD and mean can be seen in the dataframe. But how can I take out (or only print the remaining) rows whose reaction times are more than 2 SD above the mean, and also all rows more than 2 SD below the mean?
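A minimal base-R sketch of the filtering described, assuming `d1` has a numeric `Reaction.time` column (the pre-computed `mean`/`SD` columns aren't strictly needed if we compute them on the fly; the sample data below is made up):

```r
# Hypothetical subject data; replace with the real d1
d1 <- data.frame(Reaction.time = c(300, 310, 320, 290, 305,
                                   315, 295, 300, 310, 3000))

m <- mean(d1$Reaction.time)
s <- sd(d1$Reaction.time)

# Keep only rows within 2 SD of the subject's own mean ...
d1_clean <- d1[d1$Reaction.time > m - 2 * s &
               d1$Reaction.time < m + 2 * s, , drop = FALSE]

# ... or print just the outliers instead
d1[abs(d1$Reaction.time - m) > 2 * s, , drop = FALSE]
```

With the hypothetical data above, the single very slow trial (3000 ms) falls outside mean ± 2 SD and is dropped, leaving the other 9 rows.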

How to find the max value of a specific column in a matrix

I am new to R. Many thanks in advance for your help.
I am trying to find the maximum value of a single (the first) column in a matrix, "bin.matrix". Of course, I have been able to find the max value for all columns using:
apply(bin.matrix, 2, max)
But I can't seem to figure out how to get the value for just the first column. It's a homework question, so just reading the first value won't do unfortunately. The next question asks for the max value in all but the first column.
Thanks again for your help.
We can select the first column by subsetting with a numeric index and then take the max:
max(bin.matrix[,1])
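For the follow-up (the max in all but the first column), dropping the first column with a negative index works the same way. A sketch with a made-up stand-in for `bin.matrix`:

```r
# Hypothetical stand-in for bin.matrix
bin.matrix <- matrix(c(1, 9, 4,
                       7, 2, 8,
                       3, 6, 5), nrow = 3, byrow = TRUE)

max(bin.matrix[, 1])             # max of the first column only: 7
apply(bin.matrix[, -1], 2, max)  # per-column maxima of all but the first: 9 8
max(bin.matrix[, -1])            # single overall max of all but the first: 9
```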

Find samples from numeric vector that have a predefined mean value

I am using historical yearly rainfall data to devise 'what-if' scenarios of altered rainfall in ecological models. To do that, I am trying to sample actual rainfall values to create samples of rainfall years that meet certain criteria (such as a sample of rainfall years that is 10% wetter than the historical average).
I have come up with a relatively simple brute-force method, described below, that works ok if I have a single criterion (such as a target mean value):
rainfall_values <- c(270.8, 150.5, 486.2, 442.3, 397.7,
                     593.4191, 165.608, 116.9841, 265.69, 217.934, 358.138, 238.25,
                     449.842, 507.655, 344.38, 188.216, 210.058, 153.162, 232.26,
                     266.02801, 136.918, 230.634, 474.984, 581.156, 674.618, 359.16)

# brute force
sample_size <- 10     # number of years included in each sample
n_replicates <- 1000  # number of total samples calculated
target <- mean(rainfall_values) * 1.1  # try to find samples 10% wetter than the historical mean
tolerance <- 0.01 * target             # how close do we want to get to the target?

# create a large matrix of samples
sampled_DF <- t(replicate(n_replicates, sample(x = rainfall_values, size = sample_size, replace = TRUE)))
# calculate the mean of each sample
Sampled_mean_vals <- apply(sampled_DF, 1, mean)
# keep only the samples that meet the criteria
Sampled_DF_on_target <- sampled_DF[Sampled_mean_vals > (target - tolerance) &
                                   Sampled_mean_vals < (target + tolerance), ]
The problem is that I will eventually have multiple criteria to match (not only a mean target, but also standard deviation, autocorrelation coefficients, etc.). With more complex multivariate targets, this brute-force method becomes really inefficient at finding matches: I essentially have to look over millions of samples, taking days even when parallelized.
So, my question is: is there any way to implement this search using an optimization algorithm or some other non-brute-force approach?
Some approaches to this kind of question are covered in this link. One respondent calls the "rejection" method what you refer to as the "brute force" method.
This link addresses a related question.
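One non-brute-force direction (an illustration, not taken from the linked answers): treat the problem as minimizing the distance between a sample's statistics and their targets, and improve one sample with a greedy local search that swaps a single year at a time. Extra criteria (sd, autocorrelation, ...) would become extra terms in the objective. A sketch:

```r
rainfall_values <- c(270.8, 150.5, 486.2, 442.3, 397.7,
                     593.4191, 165.608, 116.9841, 265.69, 217.934, 358.138, 238.25,
                     449.842, 507.655, 344.38, 188.216, 210.058, 153.162, 232.26,
                     266.02801, 136.918, 230.634, 474.984, 581.156, 674.618, 359.16)

target_mean <- mean(rainfall_values) * 1.1

# Objective: distance between the sample's statistics and the targets;
# a multi-criteria version would sum weighted terms here.
objective <- function(s) abs(mean(s) - target_mean)

set.seed(1)
s <- sample(rainfall_values, 10, replace = TRUE)  # random starting sample
for (i in 1:5000) {
  cand <- s
  cand[sample(10, 1)] <- sample(rainfall_values, 1)  # swap one year
  if (objective(cand) < objective(s)) s <- cand      # greedy accept
}
```

Each accepted swap moves the sample mean closer to the target, so a good sample is typically found in thousands of cheap steps rather than millions of independent draws. For rugged multi-criteria objectives, a stochastic variant (e.g. simulated annealing via `optim(method = "SANN")`) is less likely to get stuck in local minima.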

Peculiarity with Scale and Z-Score

I was attempting to scale my data in R after doing some research on the function (which appears to compute (x - mean) / std.dev). This was just what I was looking for, so I scaled my dataframe in R. I also want to make sure my assumptions are correct so that I don't draw wrong conclusions.
Assumption
R scales each column independently. Therefore, column 1 will have its own mean and standard deviation. Column 2 will have its own.
Assume I have a dataset of 100,000 rows and I scale 3 columns. If I then remove every row with a Z-score above 3 or below -3 in any of those columns, I could have up to (100,000 * .003 * 3) = 900 rows removed!
However, when I went to truncate my data, my 100,000 rows were left with 94,798. This means 5,202 rows were removed.
Does this mean my assumption about scale was wrong, and that it doesn't scale by column?
Update
So I ran a test and did the Z-score conversion on my own. The same number of rows was removed in the end, so I believe scale does work. Now I'm just curious why more than .3% of the data is removed when everything beyond 3 standard deviations is removed.

Setting a maximum limit for values in a data frame in R

In a data frame (in R), I have two columns - the first is a list of species names (species), the second is the number of occurrence records I have for that species (number). There is a large variation in the number column with most values being <100 but a few being very high values (>100,000), and there are many rows (~4000). Here is a simplified example:
x<-data.frame(species=c("a","b","c","d","e","f","g","h","i","j"),number=c(53,17,67,989,135,67,13,786,100400,28))
Basically what I want to do is reduce the maximum number of records (the value in the number column) until the mean of all the values in this column stabilises.
To do this, I need to set a maximum limit for values in the number column so that any value > this limit is reduced to this maximum limit, and record the mean. I want to repeat this multiple times, each time reducing the maximum limit by 100.
I've not been able to find any similar questions online and am not really sure where to start with this! Any help, even just a point in the right direction, would be much appreciated! Cheers
You should use pmin:
pmin(x$number, 1e3)
# to test multiple limits:
mns <- sapply(c(1e6, 1e4, 1e2), function(u) mean(pmin(x$number, u)))
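Following the question's procedure literally (a cap that decreases by 100 each step, recording the mean at every cap), a sketch using the example data frame from the question:

```r
x <- data.frame(species = c("a","b","c","d","e","f","g","h","i","j"),
                number  = c(53, 17, 67, 989, 135, 67, 13, 786, 100400, 28))

caps <- seq(max(x$number), 100, by = -100)  # reduce the limit by 100 each time
mns  <- sapply(caps, function(cap) mean(pmin(x$number, cap)))

# Inspect where the mean stabilises, e.g.:
head(data.frame(cap = caps, mean = mns))
```

The first cap equals the data's maximum, so the first mean is the untruncated mean; the recorded means can only decrease (or stay flat) as the cap shrinks, which makes the stabilisation point easy to spot by plotting `mns` against `caps`.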
