Suppose I have a random time series that I want to interpolate over another time series. How would I do this in R?
# generate time interval series from exponential distribution
s = sort(rexp(10))
# scale between 0 and 1
scale01 = function(x){(x-min(x))/(max(x)-min(x))}
s = scale01(s)
> s
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
# generate random normal series
x = rnorm(20)
> x
[1] -0.82530658 0.92289557 0.39827984 -0.62416117 -1.69055539 -0.28164232 -1.32717654 -1.36992509
[9] -1.54352202 -1.09826247 -0.68260576 1.07307043 2.35298180 -0.41472811 0.38919315 -0.27325343
[17] -1.52592682 0.05400849 -0.43498544 0.73841106
# interpolate 'x' over 's' ?
> approx(x,xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] NA NA NA NA NA NA NA NA NA
[10] -0.8253066
>
I want to interpolate the series 'x' over the series 's'. Let's assume the time interval series for 'x' has 20 elements distributed uniformly over the interval [0,1]. Now I want the interpolated values of 'x' at the 10 time points described by 's'.
EDIT:
I think this does the job.
> approx(seq(0,1,length.out=20), x, xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] -0.8253066 0.1061033 0.8777987 0.3781018 -0.6221134 -1.5566990 -0.3483466 -0.4703429 -1.4444105
[10] 0.7384111
Thanks for your help, guys. I think I now understand how to use the interpolation functions in R. I should really use a time series data structure here.
This isn't meant as a direct answer to the OP's Q, but rather to illustrate how approx() works so the OP can formulate a better Q.
Your Q makes next to no sense. approx() works by taking a reference set of x and y coordinates and then interpolating to find y at n locations over the range of x, or at the xout locations supplied by the user.
So in your call, you don't provide y, and x doesn't contain a y component, so I don't see how this can work.
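For illustration only (this small example is mine, not part of the original exchange), this is the shape of call that approx() expects, with both coordinates supplied:
> approx(x = c(0, 1), y = c(10, 20), xout = c(0.25, 0.75))
$x
[1] 0.25 0.75
$y
[1] 12.5 17.5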
If you want to interpolate s, so that you can find time intervals for any value over the range of s, then:
> approx(s, seq_along(s), n = 20)
$x
[1] 0.00000000 0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
[7] 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
[13] 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
[19] 0.94736842 1.00000000
$y
[1] 1.00000 26.25815 42.66323 54.79831 64.96162 76.99433 79.67388
[8] 83.78458 86.14656 89.86223 91.98513 93.36233 93.77353 94.19731
[15] 94.63652 95.26239 97.67724 98.74056 99.40548 100.00000
Here $y contains the interpolated values for s at n = 20 equally spaced locations on the range of s (0,1).
Edit: If x represents the series at unstated time intervals uniform on [0, 1] and you want the interpolated values of x at the time points in s, then you need something like this:
> set.seed(1)
> x <- rnorm(20)
> s <- sort(rexp(10))
> scale01 <- function(x) {
+ (x - min(x)) / (max(x) - min(x))
+ }
> s <- scale01(s)
>
> ## interpolate x at points s
> approx(seq(0, 1, length = length(x)), x, xout = s)
$x
[1] 0.00000000 0.04439851 0.11870795 0.14379236 0.20767388 0.21218632
[7] 0.25498856 0.29079300 0.40426335 1.00000000
$y
[1] -0.62645381 0.05692127 -0.21465011 0.94393053 0.39810806 0.29323742
[7] -0.64197207 -0.13373472 0.62763207 0.59390132
Is that closer to what you want?
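Following up on the remark about using a time series data structure: below is a rough sketch using the zoo package (my suggestion, not something the question or the code above relies on). It stores x together with its assumed uniform times and performs the same linear interpolation at the times in s:
library(zoo)

set.seed(1)
x <- rnorm(20)                                  # series observed at uniform times on [0, 1]
s <- sort(rexp(10))
s <- (s - min(s)) / (max(s) - min(s))           # rescale the target times to [0, 1]

z  <- zoo(x, seq(0, 1, length.out = length(x))) # x indexed by its assumed time points
zs <- na.approx(merge(z, zoo(, s)))             # union of time points, gaps filled linearly
zs[index(zs) %in% s]                            # interpolated values of x at the times in s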
Let's say I have a vector of some numbers, which can be < 1, but never <= 0.
> x = abs(rnorm(30))
> x
[1] 0.32590946 0.05018667 1.54354863 0.28925652 0.61712682 0.09444528
[7] 0.87951971 1.46243702 0.87099892 1.28553745 0.70360649 0.58973942
[13] 1.20054389 0.94429737 0.64038139 1.04173338 0.24249771 1.67273503
[19] 0.77546385 0.33547348 1.73480609 0.20757933 1.94491872 1.10547259
[25] 1.28570768 1.37621399 0.99389595 2.14107987 2.31719369 1.24458788
And when I log the entire vector I get negative values:
> log(x)
[1] -1.121135658 -2.992005742 0.434084070 -1.240441366 -0.482680726
[6] -2.359734671 -0.128379302 0.380104238 -0.138114546 0.251176883
[11] -0.351536037 -0.528074505 0.182774695 -0.057314153 -0.445691353
[16] 0.040886038 -1.416763021 0.514460030 -0.254293908 -1.092212359
[21] 0.550895644 -1.572241722 0.665220187 0.100272928 0.251309290
[26] 0.319336240 -0.006122756 0.761310314 0.840356837 0.218804452
Now the min of this vector is:
> min(x)
[1] 0.05018667
My question is this. I want to scale the data by 10^x or by 2^x (depending on which log I use) so that the log of the scaled set produces only positive (or non-negative) numbers. How can I get the lowest exponent that will make it so?
Maybe this fits your needs. The function scales the values such that the minimum value is mapped to 1, and it returns both the exponent and the scaled vector as a list:
set.seed(123)
scale_log <- function(x, base = 2) {
  y <- -log(min(x)) / log(base)
  list(exp = y, scaled = base^y * x)
}
x <- abs(rnorm(5))
scale_log(x, 2)
#> $exp
#> [1] 3.826061
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log2(scale_log(x, 2)$scaled)
#> [1] 2.990785e+00 1.706880e+00 4.466412e+00 3.203427e-16 8.747185e-01
scale_log(x, 10)
#> $exp
#> [1] 1.151759
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log10(scale_log(x, 10)$scaled)
#> [1] 9.003159e-01 5.138220e-01 1.344524e+00 9.643275e-17 2.633165e-01
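If the exponent should be a whole number (the question asks about scaling by 10^x or 2^x), rounding the exponent up is enough, because base^k * min(x) >= 1 exactly when k >= -log(min(x)) / log(base). A small sketch (my addition) using the same seed as above:
set.seed(123)
x <- abs(rnorm(5))
k <- ceiling(-log2(min(x)))   # smallest integer exponent for base 2
k
#> [1] 4
all(log2(2^k * x) >= 0)
#> [1] TRUE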
I have calculated the differences of my data points and received this vector:
> diff(smooth$a)/(diff(smooth$b))
[1] -0.0099976150 0.0011162606 0.0116275973 0.0247594149 0.0213592319 0.0205187495 0.0179274056 0.0207752713
[9] 0.0231903072 -0.0077549224 -0.0401528643 -0.0477294350 -0.0340842051 -0.0148157337 0.0003829642 0.0160912230
[17] 0.0311189830
Now I want to get the positions (indices) where there is a change from negative to positive and the following 3 data points are also positive.
So my output would be like this:
> output
-0.0099976150 -0.0148157337
How could I do this?
One way like this:
series <- paste(ifelse(vec < 0, 0, 1), collapse = '')
vec[gregexpr('0111', series)[[1]]]
#[1] -0.009997615 -0.014815734
The first line creates a string of 0s and 1s depending on the sign of each number. In the second line we use gregexpr() to find the starting positions of the pattern '0111', i.e. a negative value followed by three positive ones. Finally, we use these positions to subset the original vector.
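Just as a comparison (this variant is mine, not part of the original answer), the same positions can be found without regular expressions by shifting a logical sign vector; vec is assumed to hold the differences from the question:
pos <- vec > 0
n <- length(vec)
idx <- which(!pos[seq_len(n - 3)] & pos[2:(n - 2)] & pos[3:(n - 1)] & pos[4:n])
idx           # positions of a negative value followed by three positive ones
#[1]  1 14
vec[idx]
#[1] -0.009997615 -0.014815734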
Imagine a vector z:
z <- seq(-2, 2, length.out = 20)
z
#> [1] -2.0000000 -1.7894737 -1.5789474 -1.3684211 -1.1578947 -0.9473684 -0.7368421 -0.5263158
#> [9] -0.3157895 -0.1052632 0.1052632 0.3157895 0.5263158 0.7368421 0.9473684 1.1578947
#> [17] 1.3684211 1.5789474 1.7894737 2.0000000
then you can do
turn_point <- which(z == max(z[z < 0]))
turn_plus_one <- c(turn_point, turn_point + 1)
z[turn_plus_one]
#> [1] -0.1052632 0.1052632
I have a vector with 100 values, all generated randomly between 0 and 1.
x <- runif(100,min=0,max=1)
x
[1] 0.84620011 0.82525410 0.31622827 0.08040362 0.12894525 0.23997187 0.57177296 0.91691368 0.65751720
[10] 0.39810175 0.60632205 0.26339035 0.93543618 0.09662383 0.35147739 0.51731042 0.29151612 0.54411769
[19] 0.73688309 0.26086586 0.37808273 0.19163366 0.62776847 0.70973345 0.31802726 0.69101574 0.50042561
[28] 0.20768256 0.23555818 0.21015820 0.18221151 0.85593725 0.12916935 0.52222127 0.62269135 0.51267707
[37] 0.60164023 0.30723904 0.81990231 0.61771762 0.02502631 0.47427724 0.21250040 0.88611710 0.88648546
[46] 0.92586513 0.57015942 0.33454379 0.03572245 0.68120369 0.48692522 0.76587764 0.55214917 0.31137200
[55] 0.47170307 0.48639510 0.68922858 0.73506033 0.23541740 0.81793240 0.17184666 0.06670039 0.55664270
[64] 0.10030533 0.94620061 0.58572228 0.53333567 0.80887841 0.55015406 0.82491114 0.81251132 0.06038019
[73] 0.10918904 0.84011824 0.33169617 0.03568364 0.07703029 0.15601158 0.31623253 0.25021777 0.77024833
[82] 0.88588620 0.49044305 0.10165930 0.55494697 0.17455070 0.94458467 0.43135868 0.99313733 0.04482747
[91] 0.53453604 0.52500493 0.35496966 0.06994880 0.11377845 0.71307042 0.35086237 0.04032254 0.23744845
[100] 0.81131033
Out of all the values in the vector, I need to find the most frequently occurring value (or something close to that). I'm new to R and have no idea how to do this. Please help?
One approach I have: divide the values into ranges and look at the frequency distribution. But would that be helpful?
One way to analyze the distribution of the numbers is to plot a histogram and add an approximate probability density curve.
This can be done with the ggplot2 library:
set.seed(123) # used here for reproducibility
x <- runif(100) # pseudo-random numbers between 0 and 1
library(ggplot2)
p <- ggplot(as.data.frame(x), aes(x = x, y = ..density..)) +
  geom_histogram(fill = "lightblue", colour = "grey60", bins = 50) +
  geom_density()
p  # display the plot
The value of bins specified in geom_histogram() is the number of bars in the histogram. You may want to change this value to obtain a different representation of the distribution.
OR
You could use base R and plot a simple histogram:
hist(x)
There you can also change the bin width (see breaks), but the default might be sufficient to show the concept.
You can identify which bin in this histogram has the most entries with
> hist(x)$mids[which.max(hist(x)$counts)]
#[1] 0.45
Which in this case means that most values occur near a value of 0.45 (the middle of the bin describing the range between 0.4 and 0.5).
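As a further note (my addition, not part of the original answer), a kernel density estimate gives a smoother version of the same idea and avoids picking bin widths by hand:
d <- density(x)       # kernel density estimate of the sample above
d$x[which.max(d$y)]   # location of the density peak, i.e. the most "typical" value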
Hope this helps.
You can do this:
set.seed(12)
x <- runif(100, min = 0, max = 1)
n <- length(x)
x_cut <- cut(x, breaks = n/4)
which(table(x_cut) == max(table(x_cut)))
The result depends on the breaks value you set. This is an alternative to using a histogram if you don't need one.
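A small variation on the same idea (my note, not part of the original answer): which.max() picks the first fullest bin directly, whereas which(... == max(...)) keeps all tied bins:
set.seed(12)
x <- runif(100, min = 0, max = 1)
x_cut <- cut(x, breaks = length(x)/4)
which.max(table(x_cut))   # label and position of the fullest bin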
To really get just the most frequent value, or when using discrete data as input, you could simply create a table, sort the results and return the highest value:
values <- c("a", "a", "c", "c", "c")
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
Breaking it down:
# create a table of the values
table(values)
#> a c
#> 2 3
# sort the table descending on number of occurrences
sort(table(values), decreasing = TRUE)
#> c a
#> 3 2
# now only keep the first value
sort(table(values), decreasing = TRUE)[1]
#> c
#> 3
# so the final line:
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
If you want to get fancy, create a function that does this for you:
get_mode <- function(x) {
  names(sort(table(x), decreasing = TRUE)[1])
}
get_mode(values)
#> [1] "c"
Given two points, how can I interpolate and generate 20 points between them?
E.g., points:
x = c(2,8)
y = c(2,19)
I tried to fit a linear model and then use it to generate the points, but when the x values are the same, a line cannot be fitted.
Possibly easier to run approx(x,y, n=20)
This is weird, because interpolating two points means...a straight line?
Anyway, here you go:
> x2<-seq(x[1],x[2],length.out=20)
> x2
[1] 2.000000 2.315789 2.631579 2.947368 3.263158 3.578947 3.894737 4.210526 4.526316 4.842105
[11] 5.157895 5.473684 5.789474 6.105263 6.421053 6.736842 7.052632 7.368421 7.684211 8.000000
> y2<-seq(y[1],y[2],length.out=20)
> y2
[1] 2.000000 2.894737 3.789474 4.684211 5.578947 6.473684 7.368421 8.263158 9.157895
[10] 10.052632 10.947368 11.842105 12.736842 13.631579 14.526316 15.421053 16.315789 17.210526
[19] 18.105263 19.000000
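As a quick check (my addition), the approx() call suggested in the comment above reproduces the same 20 points in one step:
> xy <- approx(x, y, n = 20)
> all.equal(xy$x, x2)
[1] TRUE
> all.equal(xy$y, y2)
[1] TRUE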
How about...
yfrom <- 8
yto <- 19
y <- seq(yfrom, yto, by = ((yto - yfrom)/(20 + 1)))
x <- rep(2, 22)
data.frame(x,y)
I wanted to use a function that would quickly give me the standard deviation of a vector and allow me to include weights for the elements of the vector, i.e.
sd(c(1,2,3)) #weights all equal 1
#[1] 1
sd(c(1,2,3,3,3)) #weights equal 1,1,3 respectively
#[1] 0.8944272
For weighted means I can use wt.mean() from the SDMTools package, e.g.
> mean(c(1,2,3))
[1] 2
> wt.mean(c(1,2,3),c(1,1,1))
[1] 2
>
> mean(c(1,2,3,3,3))
[1] 2.4
> wt.mean(c(1,2,3),c(1,1,3))
[1] 2.4
but the wt.sd function does not seem to provide what I thought I wanted:
> sd(c(1,2,3))
[1] 1
> wt.sd(c(1,2,3),c(1,1,1))
[1] 1
> sd(c(1,2,3,3,3))
[1] 0.8944272
> wt.sd(c(1,2,3),c(1,1,3))
[1] 1.069045
I am expecting a function that returns 0.8944272 for my weighted sd. Preferably I would use this on a data.frame like:
data.frame(x=c(1,2,3),w=c(1,1,3))
library(Hmisc)
sqrt(wtd.var(1:3, c(1, 1, 3)))
#[1] 0.8944272
You can use rep to replicate the values according to their weights. Then, sd can be computed for the resulting vector.
x <- c(1, 2, 3) # values
w <- c(1, 1, 3) # weights
sd(rep(x, w))
[1] 0.8944272
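For completeness, here is the same frequency-weighted standard deviation written out by hand (a sketch that treats the weights as counts, matching the rep() interpretation above); it also works directly on the data.frame from the question:
weighted_sd <- function(x, w) {
  m <- sum(w * x) / sum(w)                  # weighted mean
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))   # sample variance with sum(w) - 1 degrees of freedom
}

df <- data.frame(x = c(1, 2, 3), w = c(1, 1, 3))
with(df, weighted_sd(x, w))
#[1] 0.8944272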