I have a CSV in which each column represents a series of measurements taken over a period of time (in this case, the opening area of a larynx during a breath).
However, the time series may have different numbers of measurements, e.g.:
23,34,44
25,35,39
23,33,,
23,,,
Using ts.plot(data) I've been able to plot these on the same graph. However, I need each series to be "stretched" to the same length, such that each column in the CSV spans the same distance on the x-axis, but with varying resolution. How might this best be achieved?
Additionally, I had been using lines(rowMeans(data, na.rm = TRUE)) to produce an average, which I also need to do with the "stretched" series.
I had been considering performing the interpolation (up to some arbitrary resolution such as 1000) in Ruby, and then producing a new CSV file to run the original R code against. I would expect, however, that there is a more elegant solution in R.
Maybe you just need approx? E.g., approx(some.series, n=length.max.series). This function offers constant or linear interpolation.
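To flesh that out, here is a rough sketch of the whole pipeline, with made-up names (measurements.csv, n.out) and assuming each column of the CSV holds one series padded with trailing NAs:
data <- read.csv("measurements.csv", header = FALSE)  # hypothetical file name
n.out <- 1000                            # arbitrary common resolution, as in the question
stretched <- sapply(data, function(col) {
  col <- col[!is.na(col)]                # drop the trailing NAs of the shorter series
  approx(col, n = n.out)$y               # linear interpolation onto n.out points
})
ts.plot(ts(stretched))                   # every series now spans the same x-range
lines(rowMeans(stretched))               # average across series, as before
With only a vector given, approx() takes it as the y values against an index of 1, 2, ..., so each series is simply resampled onto n.out evenly spaced points.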
I am working with frame-by-frame video data in tabular format. When an event happens, it is only coded for that single frame. I would like to order events in pseudotime, and there is no distinct periodicity to them. Regular one-hot encoding does not encode the 'eventness' of timestamps in close proximity to the frame labeled "hey, stuff happens here". I imagine that this could be modeled as a sinusoidal function, convolved with the initial array that represents a column. This way, even when frames are not ordered, I can still see the 'eventness' of each one.
I was thinking something along these lines:
x <- c(rep(0, 10), 1, rep(0, 10))  # one-hot vector: a single event at the middle frame
y <- c(0:10, 9:0)                  # triangular kernel peaking at the event
convolve(x, y, type = "open")      # "open" = ordinary convolution; R's default is circular
I am struggling with designing a for loop that might do this for each column of dummy variables.
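As a starting sketch, apply() can stand in for the explicit for loop; events here is a made-up name for a data frame with one 0/1 dummy column per event type, and kernel is the triangular window from above:
kernel <- c(0:10, 9:0)
eventness <- apply(events, 2, function(col) {
  convolve(col, kernel, type = "open")   # non-circular; the kernel is symmetric, so no rev() needed
})
# note: "open" output is longer than the input by length(kernel) - 1; trim to taste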
Also, I am using "bit depth" as an analogy, since this is like increasing the bit depth of an audio sample. Basically, I want to trade in my NES for a Sega Genesis.
Thanks!
To the bevy of people on the edge of their seats, I ended up using a moving average to solve this problem.
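For anyone curious, a minimal sketch of that moving-average version, assuming the same hypothetical events data frame of 0/1 columns and an arbitrary window width:
w <- 9                                    # arbitrary window width
smoothed <- apply(events, 2, function(col) {
  as.numeric(stats::filter(col, rep(1/w, w), sides = 2))  # centered moving average; NA at the edges
})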
I am using gnuplot and the function fitting facilities to perform least squares fitting to some of my data.
I have many data points (sometimes tens of millions) and hence fitting to all data points is impossible. (Or at least too slow to be practical.)
It is possible to plot only a subsample of the data with the keyword every followed by an integer N, which plots every Nth point.
e.g. plot 'data.csv' every 1000 using 1:2 plots every thousandth data point. Useful when plotting tens of millions of points - you can't see anything useful otherwise.
Is there a similar way of doing this with fitting, i.e., fit only every 1000th point?
I tried fit 'data.csv' f(x) using 1:2 pointinterval 1000 via a,b, where a and b are parameters of my f(x), but I just get an error: ';' expected.
I also tried googling this and reading the documentation for gnuplot plotting but didn't find anything.
Alternatively, I could change my program to write only every 1000th point to the data file, but then I would have two sets of data files - one with all the points and one with 1 in every 1000 - which seems kind of wasteful.
Edit: I briefly convinced myself the keyword was pointinterval (pi for short), but that only controls how often point symbols are drawn along a plotted line; it does not subsample the data at all, and fit does not accept it (hence the ';' expected error above). The every modifier is what actually restricts which points are read. The plotting side is solved, but I have not yet confirmed whether fit takes every too, so the question is still open.
Note for future: use the every syntax.
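If fit accepts the same datafile modifiers as plot (the gnuplot manual lists every among them), then something along these lines should fit only every 1000th point; the linear f(x) here is just a placeholder for the real model:
f(x) = a*x + b                                    # placeholder; substitute the real model
fit f(x) 'data.csv' every 1000 using 1:2 via a,b  # every 1000 -> use only every 1000th point
plot 'data.csv' every 1000 using 1:2, f(x)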
Thank you kindly for your time.
I'm merely trying to plot a simple time series data set, but am running into a number of basic issues (one of which I'll ask about here). For example, I have a text file that starts with:
"x"
"1",2.731
"2",2.562
"3",2.632
"4",2.495
"5",1.978
...and so on...
So R reads it just fine, e.g. myfile = read.table("F:/Documents/myfile.txt", sep=""). However, the values seem to change under conversion with R's ts function, i.e.
myfile = ts(myfile,start=1,end=120,frequency=1)
plot(myfile, type="o",pch=22,lty=1,pty=2,xlab="Month",ylab="Values",main="My File")
So when plotted, the first value starts at 20+ for some reason, as opposed to 2+. Furthermore, R assumes that the y-axis goes from 1 to 120 (mirroring the x-axis), which is not the right scale (it should be roughly 0 through 10). In another data set that I did (using integers), everything was shifted upward by 1. In any event, I believe the issue is probably about how to properly identify the y-axis.
Any ideas on how to tackle this? Thanks!
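For what it's worth, a hedged rework of the reading and plotting code, assuming the file really is one numeric column named x with quoted row numbers. Note that ts() recycles or truncates the data to the length implied by start, end, and frequency, so if the file does not have exactly 120 rows, forcing end=120 will silently distort the series; passing the single column and letting ts infer the end avoids that. I have also dropped pty=2, since pty is a par() setting (taking "s" or "m"), not a plot() argument:
myfile <- read.table("F:/Documents/myfile.txt", sep = "")  # header auto-detected: first row has one fewer field
vals <- ts(myfile$x, start = 1, frequency = 1)             # no end=: length comes from the data, nothing recycled
plot(vals, type = "o", pch = 22, lty = 1, xlab = "Month", ylab = "Values", main = "My File")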
I'm using TraMineR to analyze sets of sequences. Each coherent set of sequences may contain 100 work processes from a single project for a single period of time. Using TraMineR I can easily calculate descriptive statistics for each sequence, however I'm more interested in descriptive statistics of the sequence object itself - subsuming all the smaller sequences within.
For example, to get state frequencies, I run:
seqstatd(sequences.sts)
However, this gives me the state frequencies for each sequence within my sequence object. I want to access the frequencies of states across all sequences inside of my sequence object. How can I accomplish this?
I am not sure I understand your question, since seqstatd() returns the cross-sectional state frequencies at each successive position, NOT the state frequencies for each sequence. The latter are returned by seqistatd().
Assuming you refer to the outcome of seqistatd(), you can get the mean time spent in each state with seqmeant(sequences.sts).
For other summaries you can use the apply function. For instance, you get the variance of the time spent in each state with
tab <- seqistatd(mvad.seq)   # time spent in each state, one row per sequence
vart <- apply(tab, 2, var)   # column-wise variance across sequences
head(vart)
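And if what you are really after is the state distribution pooled over all sequences and positions, I believe TraMineR's seqstatf() returns exactly that overall table:
seqstatf(mvad.seq)   # overall state frequencies (counts and percentages) for the whole set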
Hope this helps.
I want to analyse angles in movement of animals. I have tracking data that has 10 recordings per second. The data per recording consists of the position (x,y) of the animal, the angle and distance relative to the previous recording and furthermore includes speed and acceleration.
I want to analyse the speed an animal has while making a particular angle, however since the temporal resolution of my data is so high, each turn consists of a number of minute angles.
I figured there are two possible ways to work around this problem, but for neither do I know how to achieve it in R; help would be greatly appreciated.
The first: reducing my temporal resolution by a certain factor. This brings the disadvantage of losing possibly important parts of the data, but how would I automatically subsample, for example, every 3rd or 10th recording of my data set? (A short sketch follows after the second option.)
The second: converting straight movement into so-called 'flights': rule-based aggregation of steps in approximately the same direction, separated by acute turns (see the figure). A flight between two points ends when the perpendicular distance from the main direction of that flight is larger than x, a value that can be set arbitrarily. Does anyone have an idea how to do that with the xy-coordinate positional data that I have?
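For the first option, plain indexing is enough. A one-line sketch, assuming the recordings are rows of a data frame called track (a made-up name):
track_sub <- track[seq(1, nrow(track), by = 10), ]  # keep every 10th recording
# by = 3 would keep every 3rd recording instead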
It sounds like there are three potential things you might want help with: the algorithm, the math, or R syntax.
The algorithm you need may depend on the specifics of your data. For example, how much data do you have? What format is it in? Is it in 2D or 3D? One possibility is to iterate through your data set. With each new point, you need to check all the previous points to see if they fall within your desired column. If the data set is large, however, this might be really slow. Worst case scenario, all the data points are in a single flight segment, meaning you would check the first point the same number of times as you have data points, the second point one less, etc. That means n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 operations. That's O(n^2); the running time could grow quadratically with the size of your data set. Hence, you may need something more sophisticated.
The math to check whether a point is within your desired column of width x is pretty straightforward, although maybe more sophisticated math could help inform a better algorithm. One approach would be to use vector arithmetic. To take an example, suppose you have points A, B, and C. Your goal is to see if B falls in a column of width x around the vector from A to C. To do this, find a vector v orthogonal to the vector from A to C, then look at whether the magnitude of the scalar projection of the vector from A to B onto v is less than x. There is plenty of literature available for help with this sort of thing.
I think this is where I might start (with a boolean function for an individual point), since an R function to determine this seems convenient. Then write another function that takes a set of points, calculates the vector v, and calls the first function for each point in the set. Then run some data and see how long it takes.
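To make that concrete, here is a hedged R sketch of the boolean test for a single point, with 2D points as length-two c(x, y) vectors and all names made up:
# Is point B within perpendicular distance x of the line through A in the direction of C?
within_column <- function(A, B, C, x) {
  d  <- C - A                               # main direction of the candidate flight
  v  <- c(-d[2], d[1])                      # a vector orthogonal to d in 2D
  ab <- B - A
  abs(sum(ab * v)) / sqrt(sum(v * v)) <= x  # |scalar projection of AB onto unit v|
}
within_column(c(0, 0), c(5, 1), c(10, 0), x = 2)  # TRUE: B sits 1 unit off the A -> C axis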
I'm afraid I won't be of much help with R syntax, although it is on my list of things I'd like to learn. I checked out the manual for R last night and it had plenty of useful examples. I believe this is very doable, even for an R novice like myself. It might be kind of slow if you have a big data set. However, with something that works, it might also be easier to acquire help from people with more knowledge and experience to optimize it.
Two quick clarifying points in case they are helpful:
The above suggestion is just to start with the data for a single animal, so when I talk about growth of data I'm talking about the average data sample size for a single animal. If that is slow, you'll probably need to fix that first. Then you'll need to potentially analyze/optimize an algorithm for processing multiple animals afterwards.
I'm implicitly assuming that a flight segment is defined as the largest set of contiguous data points in which no "sub" flight segment violates the column rule. That is, I could probably construct an example where a set of points satisfies your rule of falling within a column of width x around the vector to the last point, but where, looking at the column of width x around the vector to the second-to-last point, one point no longer meets the criteria. Depending on how you define the flight segment (e.g. if you want the largest possible set of points that meets your condition and don't care what happens inside it), you may need something different (e.g. working backwards instead of forwards).