I will try to be as specific as possible. The data set below consists of a device's power measurements, and I have to plot a graph showing the average fluctuation of the power (Watt) over the Time column. I have to accomplish this in R, but I really don't know which function to use or how to go about it, as I'm a newbie to R. Any help will be highly appreciated!
Store No.,Date,Time,Watt
33,2011/09/26,09:11:01,0.0599E+03
34,2011/09/26,09:11:02,0.0597E+03
35,2011/09/26,09:11:03,0.0598E+03
36,2011/09/26,09:11:04,0.0596E+03
37,2011/09/26,09:11:05,0.0593E+03
38,2011/09/26,09:11:06,0.0595E+03
39,2011/09/26,09:11:07,0.0595E+03
40,2011/09/26,09:11:08,0.0595E+03
41,2011/09/26,09:11:09,0.0591E+03
rollapply in package zoo returns a moving average (or a moving version of any function). You can plot the points and then add a moving-average line (assuming the data above have been saved to a file, here hypothetically "power.csv"):
dat <- read.csv("power.csv")  # hypothetical file name for the data shown above
dat$D.time <- as.POSIXct(paste(dat$Date, dat$Time))  # combine date and time
library(zoo)
# a width-3 window over the 9 points yields 7 values (see ?rollapply):
length(rollapply(dat$Watt, 3, mean))  # 7
plot(dat$D.time, dat$Watt)
# pair each window mean with the last time in its window (a trailing average)
lines(dat$D.time[3:9], rollapply(dat$Watt, 3, mean))
I am trying to find a function that aligns two time series so that their timestamps correspond to reality.
So I need a function that minimizes the distance between the two curves shown above and outputs a new data frame with TAIR time-shifted towards the values of tre200h0.
By eye, this shift looks to be about 22 hours.
Best,
Fabio
I don't know a function that does this job for me.
Solved by Ric Villalba in the comments on the original question.
Two base R functions for analyzing time-series lags are acf and pacf. That is, given x and y, you can use acf(y - x) and look for the zeroes in the plot (if your series have adequately seasonal behaviour) or, if you prefer, acf(y - x, plot = FALSE) and work with the data directly. Try which.min(acf(x - y)$acf^2).
Of course, this is a simplification of an otherwise complex matter.
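A related approach, sketched here on synthetic data (the real series would take the place of x and y): stats::ccf() computes the cross-correlation of two series over a range of lags, and the lag with the strongest correlation estimates the time shift.

# simulated example: y is x delayed by 22 steps
set.seed(42)
x <- sin(seq(0, 20 * pi, length.out = 500)) + rnorm(500, sd = 0.1)
y <- c(rnorm(22, sd = 0.1), x[1:478])         # y[t] = x[t - 22]
cc <- ccf(x, y, lag.max = 48, plot = FALSE)   # correlations for lags -48..48
best_lag <- cc$lag[which.max(abs(cc$acf))]    # lag with the strongest correlation
best_lag                                      # -22: x[t - 22] lines up with y[t]

The estimated lag then tells you how far to shift TAIR to align it with tre200h0.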
I'm trying to find the peak frequencies hidden in my data using the fft() method in R. While preparing the data, a more experienced user recommended creating a "mask" (more on that after the details), which does give me the exact diagram I'm looking for. The problem is, I don't understand what it does or why it's needed.
To give some context, I'm working with .txt files of around 12000 entries each. It's voltage vs. time information, and the expected result is a sinusoidal wave with a clear peak frequency that should be close to 1-2 Hz. This is an example of what one of those files looks like:
I've been trying to use the Fast Fourier Transform method fft() implemented in R to find the peak frequencies and get a diagram that reflects them clearly. First, I calculate some things I understand will be useful, like the Nyquist frequency and the range of frequencies I'll show in the final graph:
n <- length(variable)
dt <- time[5] - time[4]     # sampling interval (assumes uniform sampling)
df <- 1 / max(time)         # find out the "unit" (fundamental) frequency
fnyquist <- 1 / (2 * dt)    # the Nyquist frequency
f <- seq(-fnyquist, fnyquist - df, by = df)  # these are the frequencies I'll plot
But when I plot the absolute value of what fft(data) calculates vs. the range of frequencies, I get this:
The peak frequency seems to be close to 50 Hz, but I know that's not the case; it should be close to 1 Hz. I'm a complete newbie in R and in Fourier analysis, so after researching a little, I found on a Swiss page that this can be solved by creating a "mask", which is actually just a vector with a repeating pattern (1, -1, 1, -1, ...) of the same length as my data vector:
mask <- rep(c(1, -1), length.out = n)
Then if I multiply my data vector by this mask and plot the results:
results <- mask * data
plot(f, abs(fft(results)), type = "h")
I get what I was looking for. (This is the graph after limiting the x-axis to a reasonable scale).
So, what is the mask actually doing? I understand it's flipping the sign of every other data point, but I don't get why it would take the inferred peak frequency from ~50 Hz to the correct result of ~1 Hz.
Thanks in advance!
Your "mask" is one of two methods of performing an fftshift, which is commonly done to center the 0 Hz output of an FFT in the middle of a graph or plot (instead of at the left edge, with the negative frequencies wrapping around to the right edge).
To perform an fftshift, you can heterodyne or modulate your data (by Fs/2) before the FFT, or simply do a circular shift by 50% after the FFT. Both produce the same result; they are equivalent due to the shift property of the DFT.
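A quick sketch demonstrating that equivalence on a synthetic signal (all names here are illustrative):

n <- 1000
t <- seq(0, 1, length.out = n)
x <- sin(2 * pi * 2 * t)                      # a 2 Hz sine wave
mask <- rep(c(1, -1), length.out = n)         # heterodyne by Fs/2
a <- abs(fft(mask * x))                       # method 1: modulate, then FFT
b <- abs(fft(x))
b_shifted <- c(b[(n/2 + 1):n], b[1:(n/2)])    # method 2: circular shift by n/2
all.equal(a, b_shifted)                       # TRUE: both perform an fftshift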
Let's say I have these coordinates:
Long.GPS Lat.GPS Long.TEST Lat.TEST
22.355951 44.699745 24.00092 44.37806
22.355951 44.699745 24.08816 44.36839
22.355951 44.699745 23.73256 44.42112
22.355951 44.699745 22.35929 44.6953
Now, what I want is to calculate the magnitude of the longitudinal and latitudinal error as follows: Long.error = |Long.GPS - Long.TEST|, and the same for latitude.
Then, I would like to calculate the magnitude of the total horizontal error as the Euclidean distance from the true location (GPS) to the tested location (TEST): Total.error = sqrt(Long.error^2 + Lat.error^2)
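A minimal sketch of both computations, assuming the table above has been read into a data frame df:

df$Long.error  <- abs(df$Long.GPS - df$Long.TEST)         # longitudinal error
df$Lat.error   <- abs(df$Lat.GPS - df$Lat.TEST)           # latitudinal error
df$Total.error <- sqrt(df$Long.error^2 + df$Lat.error^2)  # Euclidean distance

Note that, as in the formulas above, these errors are in degrees rather than metres.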
After all this, I would like to plot them using ggplot2 geom_point, but I think I can handle that on my own.
I am quite new to this field and need help in order to finish my project.
Thanks in advance!
Is there a simple way to plot the difference between two probability density functions?
I can plot the pdfs of my data sets (both are one-dimensional vectors with roughly 11000 values) on the same plot together to get an idea of the overlap/difference but it would be more useful to me if I could see a plot of the difference.
Something along the lines of the following (though this obviously doesn't work):
> plot(density(data1)-density(data2))
I'm relatively new to R and have been unable to find what I'm looking for on any of the forums.
Thanks in advance
This should work:
rng <- range(c(data1, data2))   # common evaluation limits for both densities
d1 <- density(data1, from = rng[1], to = rng[2])
d2 <- density(data2, from = rng[1], to = rng[2])
plot(x = d1$x, y = d1$y - d2$y)
The trick is to make sure the densities have the same limits; then you can plot their differences at the same locations. My understanding of the need for identical limits comes from having made the error of skipping that step when answering a similar question on R-help several years ago. Too bad I couldn't remember the right arguments.
It looks like you need to spend a little time learning how to use R (or any other language, for that matter). Help files are your friend.
From the output of ?density:
Value [i.e. the data returned by the function]
If give.Rkern is true, the number R(K); otherwise an object with class "density" whose underlying structure is a list containing the following components:
x: the n coordinates of the points where the density is estimated.
y: the estimated density values. These will be non-negative, but can be zero. [remainder of "Value" deleted for brevity]
So, making sure both densities are evaluated over the same x grid (the point made above), do:
rng <- range(c(data1, data2))
foo <- density(data1, from = rng[1], to = rng[2])
bar <- density(data2, from = rng[1], to = rng[2])
plot(foo$x, foo$y - bar$y)
I need to get a plot of a Lorentz curve of a cumulative variable as a function of the number of observations. I want both axes displayed on a percentage basis (e.g. say the observations are buyers and the y variable is the amount they bought, with buyers already ranked in descending order; I want the plot that says "the top 10% of buyers purchased 90% of the total"). My dataset is a couple million observations.
What is the best way to do this? Sub-questions:
If I need to add two variables for the quantiles of total observations and total amount bought (so as to use them to plot), what is the object that returns the row number? I tried:
user_quantile <- row(df)/nrow(df)
but I get a matrix of identical columns (user_quantile.1, user_quantile.2), of which I only need one.
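For what it's worth, base R's seq_len() returns a plain vector of row positions, which sidesteps the matrix that row() produces:

user_quantile <- seq_len(nrow(df)) / nrow(df)  # one value per row, no matrix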
Is there instead any way to skip adding the percentages as variables and only have them as axis values?
The plot has far more points than I need to draw the line. What is the best approach to minimize the computational effort and still get a nice graph?
Thanks.
You may want to acquaint yourself with the excellent RSeek search engine for R content. One quick query for Lorentz curve (and Lorenz curve) led to these packages:
ineq: Measuring inequality, concentration, and poverty
reldist: Relative Distribution Methods
GeoXp: Interactive exploratory spatial data analysis
lawstat: An R package for biostatistics, public policy and law
all of which seem to supply a Lorenz curve function.
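For example, a minimal sketch with ineq (inc stands in for the real data):

library(ineq)
inc <- rlnorm(1e4)   # hypothetical purchase amounts
plot(Lc(inc))        # compute and plot the Lorenz curve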
In order to get the plot done, you first need to arrange the raw data.
1) You can use the cut2() function from the Hmisc package to cut the data into quantiles. Check the documentation; it's not hard. It's similar to cut() from the base package.
2) After using cut2() on the income data, you need to compute the frequency of each decile. Use table() for that. Then calculate the percentage of income for each decile.
3) Now you should have a very small table with the following columns: decile and cumulative % of total income. Add another column for the 45-degree equality line; it is just a constant increment of cumulative % of income:
finaltable$cumulative_equality_line <- seq(0.1, 1, by = 0.1)
4) You can use base graphics or ggplot2 for plotting. I guess you can do it with the info from step 3, or check out specific plotting questions.
I'll have to do it soon, but I already have the final table. I'll post the code for plotting once I do it.
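In the meantime, a rough sketch of steps 1-4 might look like this (income is a hypothetical vector standing in for the real data; cut2() comes from Hmisc):

library(Hmisc)                              # for cut2()
set.seed(1)
income <- rlnorm(1e4)                       # hypothetical income data
# 1) cut into deciles
decile <- cut2(income, g = 10)
# 2) share of total income in each decile
share <- tapply(income, decile, sum) / sum(income)
# 3) small summary table with the 45-degree equality line
finaltable <- data.frame(
  decile                   = 1:10,
  cumulative_income        = cumsum(share),
  cumulative_equality_line = seq(0.1, 1, by = 0.1)
)
# 4) base-graphics plot
plot(finaltable$decile / 10, finaltable$cumulative_income, type = "b",
     xlab = "Cumulative share of observations",
     ylab = "Cumulative share of income")
lines(finaltable$decile / 10, finaltable$cumulative_equality_line, lty = 2)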
Good luck!