Function to calculate shift between two time series based on data points - r

I am trying to find a function that matches two time series such that the datetimes correspond to reality. So I need a function that minimizes the distance between the two curves shown above and outputs a new data frame that has TAIR time-shifted towards the values of tre200h0.
By eye, it looks like this shift is about 22 h.
[ggplot of the two time series omitted]
Best,
Fabio
I don't know of a function that does this job.

Solved by Ric Villalba in the comments to the original question.
Two base R functions for analysing time-series lags are acf and pacf. For example, given x and y you can use acf(y - x) and look for the zeroes in the plot (provided your series have adequate seasonal behaviour), or call acf(y - x, plot = FALSE) to get the data directly. Try which.min(acf(x - y)$acf^2).
Of course, this is a simplification of an otherwise complex matter.
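As a hedged illustration (the synthetic series, the 22 h shift, and all object names below are invented for this sketch; only TAIR and tre200h0 echo the question), this applies the comment's heuristic and also, as a plainly swapped-in alternative not from the comment, base R's ccf(), whose peak gives the shift directly:
# synthetic hourly series: x is y delayed by 22 h
set.seed(1)
n <- 500
y <- sin(2 * pi * (1:n) / 24) + rnorm(n, sd = 0.1)  # plays the role of tre200h0
x <- c(rep(0, 22), y[1:(n - 22)])                   # plays the role of TAIR
a <- acf(x - y, lag.max = 48, plot = FALSE)
which.min(a$acf^2) - 1  # lag (in h) where the ACF of the difference is nearest zero
# alternative: cross-correlation of the two series
cc <- ccf(x, y, lag.max = 48, plot = FALSE)
cc$lag[which.max(cc$acf)]  # lag of maximum cross-correlation, about 22 here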

Related

How to cleanly use interpolation between points to generate a mean in R

I am having trouble writing code that will cleanly produce a mean (specifically, a weighted average) from a simple plot of points, using interpolation.
For example:
ex=c(1,2,3,4,5)
why=c(2,5,9,15,24)
This shows the kind of information I am working with.
plot(ex, why, type="o")
At this point, I want each point "binned" so the lines between them are straight. To do this, I have been manually adding points to the x values in Excel as (x + 0.01).
This is the new output:
why=c(2,2,5,5,9,9,15,15,24,24)
ex=c(1,2,2.01,3,3.01,4,4.01,5,5.01,6)
plot(ex, why, type="o")
So this is where my question comes into play. I have to do this many times and do not want to generate a ton of new vectors and objects. To get a weighted average, I have been interpolating y values into a new object at x increments of 0.01. I can then index into this new object and take a mean when a point falls between the actual ex values, e.g.
mean(newy[1:245])
Because I made 100 new y values per unit of x that (basically) follow straight lines, this gives a weighted average for x = 1 to 2.45.
Is there an easier, more elegant way to embed the interpolation in the mean computation, so I could just ask for "the average of interpolated y from one non-observed x to another"?
It doesn't do exactly what you want, but you should consider the stepfun function -- this creates a step function out of two series.
plot(stepfun(ex[-1], why))
stepfun is handy because it gives you a function defined over the whole interval, so you can interpolate simply by evaluating it anywhere. The downside is that it is not strictly defined on the range given (which is why we have to cut off the first value of ex: stepfun needs one more y value than x breakpoints).
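Since a stepfun object is an ordinary vectorized R function, the questioner's fine-grid averaging can be done by evaluating it on a sequence. A minimal sketch, reusing the ex/why data and the x = 1 to 2.45 interval from the question:
ex <- c(1, 2, 3, 4, 5)
why <- c(2, 5, 9, 15, 24)
f <- stepfun(ex[-1], why)  # step function through the points
# weighted average of y for x from 1 to 2.45, via a 0.01 grid as in the question
mean(f(seq(1, 2.45, by = 0.01)))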
Based on your second plotting example, I think you are probably looking for this:
library(ggplot2)
qplot(ex, why, geom="step")
this gives: [step plot omitted]
Or if you want the line to go vertical first, you can use:
qplot(ex, why, geom="step", direction = "vh")
which gives: [plot omitted]

Plotting Leibniz series

How do I plot the Leibniz series, pi/4 = 1 - 1/3 + 1/5 - 1/7 + ..., in R? Basically I am looking for the R commands.
Let's see if I can cobble together an exact transliteration using Reduce, which allows cumulative application of a function along a series. The : operator is also quite handy for building the underlying sequence:
plot(pi/4 - Reduce(`+`,
                   (-1)^(0:200) / (1 + 2 * (0:200)),
                   accumulate = TRUE))  # accumulate = TRUE preserves the intermediate partial sums
This is definitely a homework assignment, because I googled the same thing. I will help without giving away the answer, because you'll learn better if you actually work through the assignment.
My class had not yet covered the Reduce function, so as an alternative you can write a function that implements the series 1 - 1/3 + 1/5 - 1/7 + ... for n iterations (here n = 200).
Have the function return the vector of partial sums (these are your y-axis values) and plot them against 0:200 (your x-axis values). Then plot a second line graph with the y-axis as pi/4 minus the values returned by the function.
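A minimal sketch of that recipe (the function name is invented here); cumsum() builds all the partial sums in one step:
leibniz_partials <- function(n) {
  cumsum((-1)^(0:n) / (1 + 2 * (0:n)))  # 1 - 1/3 + 1/5 - 1/7 + ...
}
ps <- leibniz_partials(200)
plot(0:200, ps, type = "l", xlab = "n", ylab = "partial sum")
abline(h = pi/4, lty = 2)  # the limit
plot(0:200, pi/4 - ps, type = "l", xlab = "n", ylab = "pi/4 minus partial sum")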

Difference between two density plots

Is there a simple way to plot the difference between two probability density functions?
I can plot the PDFs of my data sets (both are one-dimensional vectors with roughly 11,000 values) on the same plot to get an idea of the overlap/difference, but it would be more useful to see a plot of the difference itself.
Something along the lines of the following (though this obviously doesn't work):
> plot(density(data1)-density(data2))
I'm relatively new to R and have been unable to find what I'm looking for on any of the forums.
Thanks in advance
This should work:
rng <- range(c(data1, data2))
d1  <- density(data1, from = rng[1], to = rng[2])
d2  <- density(data2, from = rng[1], to = rng[2])
plot(x = d1$x, y = d1$y - d2$y)
The trick is to make sure the densities are estimated over the same limits; then you can subtract them at the same x locations. My understanding of the need for identical limits comes from having made exactly that error when answering a similar question on R-help several years ago. Too bad I couldn't remember the right arguments back then.
It looks like you need to spend a little time learning how to use R (or any other language, for that matter). Help files are your friend.
From the output of ?density :
Value [i.e. the data returned by the function]
If give.Rkern is true, the number R(K), otherwise an object with class "density" whose underlying structure is a list containing the following components:
x    the n coordinates of the points where the density is estimated.
y    the estimated density values. These will be non-negative, but can be zero. [remainder of "Value" deleted for brevity]
So, do:
foo <- density(data1)
bar <- density(data2)
# caveat: as the previous answer notes, foo$x and bar$x only line up if
# both calls are given the same `from` and `to` limits
plot(foo$y - bar$y)

Plotting fluctuation in R

I will try to be as specific as possible. The data set below consists of a device's power measurements, and I have to plot a graph showing the average fluctuation of the power (Watt) over the Time column. I have to accomplish this in R, but I really don't know which function to use or how, as I'm a newbie to R. Any help will be highly appreciated!
Store No.,Date,Time,Watt
33,2011/09/26,09:11:01,0.0599E+03
34,2011/09/26,09:11:02,0.0597E+03
35,2011/09/26,09:11:03,0.0598E+03
36,2011/09/26,09:11:04,0.0596E+03
37,2011/09/26,09:11:05,0.0593E+03
38,2011/09/26,09:11:06,0.0595E+03
39,2011/09/26,09:11:07,0.0595E+03
40,2011/09/26,09:11:08,0.0595E+03
41,2011/09/26,09:11:09,0.0591E+03
rollapply in package:zoo will return a moving average (or a moving any-function). You can plot using points and then add a moving average line:
# assuming `dat` holds the table above
dat$D.time <- as.POSIXct(paste(dat$Date, dat$Time))
require(zoo)
?rollapply   # worth reading; rollapply is very flexible
length(rollapply(dat$Watt, 3, mean))   # 7 smoothed values from 9 observations
plot(dat$D.time, dat$Watt)
# with the default align = "center", the smoothed values correspond to the
# 2nd through 8th time stamps, so index accordingly:
lines(dat$D.time[2:8], rollapply(dat$Watt, 3, mean))
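For completeness, a hedged way to build dat from the sample rows in the question (the first column is renamed from "Store No." for convenience; in practice you would read.csv() your actual file):
dat <- read.csv(text = "Store,Date,Time,Watt
33,2011/09/26,09:11:01,0.0599E+03
34,2011/09/26,09:11:02,0.0597E+03
35,2011/09/26,09:11:03,0.0598E+03
36,2011/09/26,09:11:04,0.0596E+03
37,2011/09/26,09:11:05,0.0593E+03
38,2011/09/26,09:11:06,0.0595E+03
39,2011/09/26,09:11:07,0.0595E+03
40,2011/09/26,09:11:08,0.0595E+03
41,2011/09/26,09:11:09,0.0591E+03")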

Lorentz curve plot

I need to plot a Lorenz curve of a cumulative variable as a function of the number of observations, with both axes displayed on a percentage basis. For example, say the observations are buyers (already ranked in descending order) and the y variable is the amount they bought; I want a plot that says "the top 10% of buyers purchased 90% of the total". My dataset is a couple million observations.
What is the best way to do this? Sub-questions:
If I need to add two variables for the quantiles of total observations and total $ bought (so as to use them in the plot), what is the object that returns the row number? I tried:
user_quantile <- row(df)/nrow(df)
but I get a matrix of identical columns (user_quantile.1, user_quantile.2), of which I only need one column.
Is there instead any way to skip adding percentages as variables and only have them for axes values?
The plot has far more points than I need to draw the line. What is the best approach to minimize the computational effort and still get a nice graph?
Thanks.
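On the row-number sub-question (a note on base R, not from either answer below): row(df) returns a matrix with one index per cell, which is why every column repeats; a plain sequence gives the single vector needed:
user_quantile <- seq_len(nrow(df)) / nrow(df)  # one entry per row of df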
You may want to acquaint yourself with the excellent RSeek search engine for R content. One quick query for Lorentz curve (and Lorenz curve, the usual spelling) led to these packages:
ineq: Measuring inequality, concentration, and poverty
reldist: Relative Distribution Methods
GeoXp: Interactive exploratory spatial data analysis
lawstat: An R package for biostatistics, public policy and law
all of which seem to supply a Lorenz curve function.
To get the plot done, you first need to arrange the raw data.
1) You can use the cut2() function from the Hmisc package to cut the data into quantiles. Check the documentation; it's not hard. It is similar to cut() from the base package.
2) After applying cut2() to the income data, compute the frequency of each decile. Use table() for that. Then calculate the percentage of income in each decile.
3) Now you should have a very small table with the following columns: decile, and cumulative % of total income. Add another column for the 45-degree equality line; it is just a constant cumulative % of income:
finaltable$cumulative_equality_line <- seq(0.1, 1, by = 0.1)
4) You can use base graphics or ggplot2 for the plotting. I guess you can do it with the info from step 3, or check out specific plotting questions (a sketch follows below).
I'll have to do this soon myself, but I already have the final table. I'll post the code for plotting once I do.
Good luck!
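A minimal sketch of steps 1 to 4, hedged: it uses base R's cut() with quantile() breaks in place of Hmisc::cut2(), and the income vector is synthetic stand-in data:
set.seed(42)
income <- rexp(1e6)  # synthetic; substitute the real amounts bought
decile <- cut(income,
              breaks = quantile(income, probs = seq(0, 1, by = 0.1)),
              include.lowest = TRUE, labels = 1:10)
share <- tapply(income, decile, sum) / sum(income)  # % of total per decile
finaltable <- data.frame(
  decile = 1:10,
  cumulative_income = cumsum(share),
  cumulative_equality_line = seq(0.1, 1, by = 0.1)
)
plot(finaltable$decile / 10, finaltable$cumulative_income, type = "b",
     xlab = "cumulative share of buyers", ylab = "cumulative share bought")
lines(finaltable$decile / 10, finaltable$cumulative_equality_line, lty = 2)  # equality line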
