I am working with a data frame containing diferent time series. I have 157 days or time series and I have done clustering with it. To do so, I have used the pam command.
Therefore, now I know which day corresponds to each cluster. What I want is to separate my data frame depending on their clusters. So, create a data frame just with the time series from the first cluster. I add below some pictures of my source.
That's the traindata just with two different days
Clustering using pam command
It will be great if anybody can help me out.
I think what you are looking for is split()
data = data.frame(days=rep(1:30,3),cluster=sample(c(1:3),90,replace = T))
days = split(data,data$cluster)
days[1] #Days that were assigned to the first cluster
Related
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
I have two data sets, one of which shows seasonality while the other shows a trend.
I have removed seasonality from the first data set but I am not able to remove trend from the other data set.
Also, if I remove trend from the other data set and then try to make a data frame of both the altered data sets, then the number of rows will be different for both the data sets (because I have removed seasonality from the first data set using lag, so there is a difference of 52 values in the two data sets).
How do I go about it?
For de-trending a time series, you have several options, but the most commonly used one is HP filter from the "mFilter" package:
a <- hpfilter(x,freq=270400,type="lambda",drift=FALSE)
The frequency is for the weekly nature of the data, and drift=FALSE sets no intercept. The function calculates the cyclical and trend components and gives them to you separately.
If the time indices for both your series are the same (i.e weekly), you could use the following, where x and y are your dataframes:
final <- merge(x,y,by=index(a),all=FALSE)
You can always set all.x=TRUE (all.y=TRUE) to see which rows of x (y) have no matching output in y (x). Look at the documentation for merge here.
Hope this helps.
I'm making a project connected with identifying dynamic of sales. That's how the piece of my database looks like http://imagizer.imageshack.us/a/img854/1958/zlco.jpg . There are three columns:
Product - present the group of product
Week - time since launch the product (week), first 26 weeks
Sales_gain - how the sales of product change by week
In the database there is 3302 observations = 127 time series
My aim is to cluster time series in groups which are going to show me different dynamic of sales. I used k-medoids algorithm (after transforming data with FFT/DWT) and I do not know how to present each cluster = grouped time series on different plots.
Can somebody tell me how should I do that?
Here is the code of clustering:
clustersalesGain = pam(t(salesGain), 8)
nazwy = as.character(nazwy)
cbind(nazwy,clustersalesGain$clustering)
I would like to present the output on different plots.
k-medoids returns actual data points as cluster centers.
Just visualize them the same way you visualize your data!
(And if you havn't been visualizing your data, you better work on that now.)
I am working with NDVI3g data sets. My problem is that i am trying to create monthly composite data sets from the bi-monthly original data sets using maximum value composite method in R. Please i need your help, because i tried my possible best, but couldn't figure it out. The problem with data is that the first composite in a month is named as for example below;
AF99sep15a.n14-VI3g: first 15 days
AF99sep15b.n14-VI3g : Last 15 days;
I have 31 years data sets (i.e 1982-2012).
Kindly need your help on how to combine the whole data sets into a monthly composite.
given RasterStack gimms and that you want to average sequential pairs, I think you can do
i <- rep(1:(nlayers(gimms)/2), each =2)
x <- stackApply(gimms, i, mean)
Make sure to also check out the gimms package which includes the function monthlyComposite (including optional parallel support) to create monthly maximum value composites from the initial half-monthly layers. Needless to say, the function is heavily based on stackApply from the raster package.
I've spent a lot of time searching for a solution but not successfully. For that reason I decided to post my problem or question here hoping somebody of you can help me.
I want to find out which variables are influencing the travel distance of two animals (same species).
The response variable is distance moved (in meters). In total I have 66 tracking sessions for both animals.
The independent variables are: temperature, rainfall, offspring (yes = 1, no = 0), observation period (in minutes) and activity.
I looked at the animals (one day - one animal) every 15 minutes and noted the state of activity (active = 1 or inactive = 0). For that reason my data table consists around 1800 points and the same amount of activity records.
Then I created a table with following columns:
Animal, Tracking-Session, rainfall, offspring, observation period, active, inactive, distance
The two columns active and inactive contain the sum of active (inactive) records per tracking session.
For example in tracking-session 1 the animal A was 30 times active and 11 inactive and moved 6000 meters during that tracking session.
I thought I could do my analysis with this table using the command cbind() to make one column for activity out of the two columns with "inactive" and "active". But this does not work, I get:
Error in lme4::lFormula(formula = distance~ (1 | animal) + activity + offspring + ...
rank of X = 12 < ncol(X) = 13
I want to include the second animal as a random factor to get an output valid for the whole "population" (which only consits of two animals in that case).
How can I fit a linear mixed model to this data or the first question is: how my data table has to look like to do such analysis?
I started running a linear mixed model with my original data table consisting of 1800 rows but the outcome was not convincing. And I don't know if this table was built up correctly for this task. Because I have only 60 tracking sessions and for that reason only 60 resulting travel distances, but 1800 records of activity (each 15 minutes - active or inactive). I don't know how to handle this situation the only possibility for me to overcome this problem was to copy the travel distace (which is the result of all points watched per day) and assign it to each single point of that tracking session.
The same is for rainfall and temperature because these conditions were only measured once a day I had to copy the value for each single point taken on the same day.
Is this correct or better can R handle such tables (like in the picture)? Or is it better to create a table with one row for each day (as I describe above)?
If the the second table (the one with one row per tracking session) is the better choice, how has it be transformed that R can use it?
Hopefully you can follow my explanations (I tried to explain it as detailed as possible) and anyone can help me!
Thanks in advance!
Iris