Does anyone know how to calculate the L_0 norm of a vector in R? You can't use the usual sum(abs(x)^p)^(1/p) when p = 0, so I was unsure whether there is an easy way to do this. Thanks!
It really depends on how you define the L_0 norm; there is no clear consensus.
From Wikipedia: ℓ0 "norm" by David Donoho — whose quotation marks warn that this function is not a proper norm — is the number of non-zero entries of the vector x. Many authors abuse terminology by omitting the quotation marks.
With that definition (equivalently, the sum of |x_i|^0 with 0^0 defined as 0), just use sum(x != 0).
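A minimal sketch, assuming the "number of non-zero entries" definition quoted above; the helper name `l0_norm` is hypothetical. It also shows why the generic power-sum formula cannot simply be reused with p = 0: R evaluates 0^0 as 1, so sum(abs(x)^0) counts every entry, not only the non-zero ones.

```r
# Hypothetical helper: the l0 "norm" as the count of non-zero entries
l0_norm <- function(x) {
  sum(x != 0)
}

x <- c(0, 3, 0, -2.5, 0)

l0_norm(x)        # 2: only 3 and -2.5 are non-zero

# R defines 0^0 as 1, so the naive p = 0 power sum counts all entries
0^0               # 1
sum(abs(x)^0)     # 5, i.e. length(x)
```

If x may contain NA values, sum(x != 0, na.rm = TRUE) skips them instead of returning NA.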
I found this previous thread here:
Range standardization (0 to 1) in R
This leads to my question: when writing a function that performs a calculation on every value in a vector, my understanding was that this is exactly the situation where a for-loop is necessary (because the calculation is applied to each element). Apparently that is not the case. What am I misunderstanding?
All of the basic arithmetic operations in R, and most of the basic numerical/mathematical functions, are natively vectorized; that is, they operate position-by-position on elements of vectors (or pairs of elements from matched vectors). So, for example, if you compute
c(1,3,5) + c(2,4,7)
You don't need an explicit for loop (although there is one in the underlying C code!) to get c(3,7,12) as the answer.
In addition, R has vector recycling rules: whenever an operation is applied to two vectors of different lengths, the shorter one is recycled to match the length of the longer one (with a warning if the longer length is not a multiple of the shorter one). R doesn't have scalar types, so a single number is stored as a numeric vector of length 1. So if we compute
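A quick illustration of the recycling rule with made-up vectors. The last case recycles a length-2 vector against a length-3 one, which R still does but flags with a warning (suppressed here so the example runs quietly):

```r
# A single number is just a numeric vector of length 1
length(5)                     # 1

# The shorter vector is recycled: c(10, 20) acts like c(10, 20, 10, 20)
c(1, 2, 3, 4) + c(10, 20)     # 11 22 13 24

# Lengths 3 and 2: recycling still happens, but R warns about it
res <- suppressWarnings(c(1, 2, 3) + c(10, 20))
res                           # 11 22 13
```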
(x-min(x))/(max(x)-min(x))
max(x) and min(x) both have length 1, so the denominator also has length 1. When min(x) is subtracted from x, min(x) is recycled to a vector the same length as x, and then the pairwise subtraction is done. Finally, the numerator (length == length(x)) is divided by the denominator (length 1) following the same rules.
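Wrapped up as a function, the range standardization needs no explicit loop. This is a minimal sketch; the name `rescale01` is hypothetical. Note that when all elements are equal, max(x) - min(x) is 0 and every result is NaN.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  lo <- min(x)             # length-1 vectors
  hi <- max(x)
  (x - lo) / (hi - lo)     # lo is recycled to length(x); the denominator has length 1
}

v <- c(2, 4, 6, 10)
rescale01(v)               # 0.00 0.25 0.50 1.00
rescale01(c(3, 3))         # NaN NaN: zero range
```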
In general, exploiting this built-in vectorization in R is much faster than writing your own for-loop, so it's definitely worth trying to get the hang of when operations can be vectorized.
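For comparison, here is the same calculation written with an explicit for-loop, the approach the question assumed was necessary. It is only an illustrative sketch (`rescale_loop` is a hypothetical name); it produces the same values as the one-line vectorized expression, which is shorter and typically much faster.

```r
# Hypothetical loop version of the range standardization
rescale_loop <- function(x) {
  lo <- min(x)
  hi <- max(x)
  out <- numeric(length(x))            # preallocate the result
  for (i in seq_along(x)) {
    out[i] <- (x[i] - lo) / (hi - lo)
  }
  out
}

set.seed(1)
x <- runif(1000)

vec <- (x - min(x)) / (max(x) - min(x))
all.equal(rescale_loop(x), vec)        # TRUE
```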
This kind of vectorization is implemented in many higher-level languages and libraries specialized for numerical computation (pandas and NumPy in Python, MATLAB, Julia, ...).
I am new to R and trying to understand the effect of the following code.
> x <- c(1, 2)
> x[0]
numeric(0)
> x[FALSE]
numeric(0)
> x[c(FALSE, TRUE)]
[1] 2
Specifically, having an extensive background in C and C++, I am interested in knowing what R does internally when accessing an element at index 0. I know that R uses 1-based indexing. But in this specific case, does it access the vector and then discard the result (numeric(0)), or does it drop the 0 from the index and then show the result?
What is the definitive way to find this out? What should I type with R's ? or help() command?
Based on comments from Roland and G. Grothendieck, I did a quick read of the R Language Definition. The answer is right there in §3.4.1:
A special case is the zero index, which has null effects: x[0] is an empty vector and otherwise including zeros among positive or negative indices has the same effect as if they were omitted.
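A few quick checks of the quoted rule, using the vector from the question: zeros mixed into positive or negative indices are simply dropped, so the other indices behave as if the zeros were never there.

```r
x <- c(1, 2)

x[0]             # numeric(0): a zero index alone selects nothing
x[c(0, 2)]       # 2, same as x[2]
x[c(0, 0, 1)]    # 1, same as x[1]
x[-c(0, 1)]      # 2, same as x[-1]

# ?Extract is the help page for `[` and describes indexing in general
```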
I have the following mathematical formula that I want to program as efficiently as possible in R.
$\sum_{i=1}^{N}(x_i-\bar x)(y_i-\bar y)$
Let's say we have the following example data:
x = c(1,5,7,10,11)
y = c(2,4,8,9,12)
How can I easily get this sum with this data without making a separate function?
Isn't there a package or a function that can compute these mathematical sums?
Use the sum command and vectorized operations: sum((x-mean(x))*(y-mean(y)))
The key point is that sum() simply adds up every element of its argument (vector, matrix, whatever). Here the argument is a vector expression that looks a little more complicated than sum(z), but (x-mean(x))*(y-mean(y)) just evaluates to a vector z, so the more ornate expression doesn't change how the function works. This is true in many places, not just for sum().
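A runnable version with the example data. As a cross-check, this sum equals (N - 1) times the sample covariance, because cov() divides the same sum by N - 1:

```r
x <- c(1, 5, 7, 10, 11)
y <- c(2, 4, 8, 9, 12)

# Element-wise products of deviations from the means, then a single sum()
s <- sum((x - mean(x)) * (y - mean(y)))
s                              # 62

# cov() computes the same sum divided by (N - 1)
(length(x) - 1) * cov(x, y)    # 62
```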
I have a function f(v,u), and I defined the function
solutionf(u) := fsolve(f(v,u)=v);
I need to plot solutionf(u) as a function of u, but simply calling
plot(solutionf(u), u = 0 .. 0.4e-1)
gives me the error
Error, (in fsolve) number of equations, 1, does not match number of variables, 2
However, I can always evaluate solutionf(x) at any numeric x.
Is there a simple way to plot this, or do I have to write my own for loop over u, evaluate the function at every point, and plot the interpolated values?
This is one of the most frequently asked Maple questions. Your error is caused by what is known as premature evaluation: the expression solutionf(u) is evaluated before u has been given a numeric value.
There are several ways to avoid premature evaluation. The simplest is probably to use forward single quotes:
plot('solutionf(u)', u= 0..0.4e-1);
I often write R code where I test the length of a vector, the number of rows in a data frame, or the dimensions of a matrix, for example if (length(myVector) == 1). While poking around in some base R code, I noticed that in such comparisons values are explicitly stated as integers, usually using the 'L' suffix, for example if (nrow(data.frame) == 5L). Explicit integers are also sometimes used for function arguments, for example these statements from the cor function: x <- matrix(x, ncol = 1L) and apply(u, 2L, rank, na.last = "keep"). When should integers be explicitly specified in R? Are there any potentially negative consequences from not specifying integers?
You asked:
Are there any potentially negative consequences from not specifying integers?
There are situations where it is likely to matter more. From Chambers, Software for Data Analysis, p. 193:
Integer values will be represented exactly as "double" numbers so long as the absolute value of the integer is less than 2^m, the length of the fractional part of the representation (2^54 for 32-bit machines).
It's not hard to see how a calculated value might look like an integer without quite being one:
> (seq(-.45,.45,.15)*100)[3]
[1] -15
> (seq(-.45,.45,.15)*100)[3] == -15L
[1] FALSE
However, it's harder to come up with an example where an integer typed in directly is not exactly an integer in the floating-point representation, until you get into the larger values Chambers describes.
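To make the threshold concrete: R's numeric type is an IEEE 754 double with a 53-bit significand, so every integer up to 2^53 in absolute value is stored exactly, and gaps first appear just above it. R's integer type, by contrast, is 32-bit.

```r
# Below 2^53, consecutive integers are distinct doubles
2^52 + 1 == 2^52       # FALSE

# At 2^53 the spacing between adjacent doubles becomes 2,
# so 2^53 + 1 cannot be represented and rounds back to 2^53
2^53 + 1 == 2^53       # TRUE

# Largest value of R's 32-bit integer type
.Machine$integer.max   # 2147483647
```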
Using 1L etc. is programmatically safe: it states explicitly what is meant and does not rely on any type conversions.
When writing code interactively, it is easy to notice errors and fix them along the way; however, if you are writing a package (including base R itself), it is safer to be explicit.
When you are testing equality, floating-point numbers can cause precision issues. See this FAQ.
Explicitly specifying integers avoids this, because nrow() and length() return integers, and the MARGIN argument of apply() refers to integer dimension positions.
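A short sketch of the types involved, plus the usual way to compare a computed double against a whole number when exact equality is too strict: isTRUE(all.equal(...)) compares within a small numeric tolerance.

```r
typeof(5)                            # "double"
typeof(5L)                           # "integer"
typeof(length(c(1, 2, 3)))           # "integer"
typeof(nrow(data.frame(a = 1:5)))    # "integer"

# An integer compares equal to the same whole number stored as a double
5L == 5                              # TRUE
identical(5L, 5)                     # FALSE: same value, different type

# A computed value that prints as -15 but is not exactly -15
v <- (seq(-.45, .45, .15) * 100)[3]
v == -15L                            # FALSE
isTRUE(all.equal(v, -15))            # TRUE
```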