I'm making a project connected with identifying dynamic of sales. That's how the piece of my database looks like http://imagizer.imageshack.us/a/img854/1958/zlco.jpg . There are three columns:
Product - present the group of product
Week - time since launch the product (week), first 26 weeks
Sales_gain - how the sales of product change by week
In the database there is 3302 observations = 127 time series
My aim is to cluster time series in groups which are going to show me different dynamic of sales. I used k-medoids algorithm (after transforming data with FFT/DWT) and I do not know how to present each cluster = grouped time series on different plots.
Can somebody tell me how should I do that?
Here is the code of clustering:
clustersalesGain = pam(t(salesGain), 8)
nazwy = as.character(nazwy)
cbind(nazwy,clustersalesGain$clustering)
I would like to present the output on different plots.
k-medoids returns actual data points as cluster centers.
Just visualize them the same way you visualize your data!
(And if you havn't been visualizing your data, you better work on that now.)
Related
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I created matrices for each race, with odd columns including runner score and even columns including time, where each team has a team_score and team_time column. I want to analyze growth of teams in a time series, but I have two questions about this:
(1): can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2): Am I closing myself off to insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
I have a panel data set with return, ESG score and market value for a number of companies over 11 years. I need to extract data for all variables for one year at a time, to make yearly portfolios.
The data frame looks like this:
How can I extract one year at a time and then construct portfolios of high and low ESG score for each year?
Thanks in advance
Have you considered processing the data with Python and Pandas instead of R? The following solution should help to slice your data into different time intervals:
Slice JSON File into Different Time Intercepts with Python
In terms of sorting ESG scores, you can use the following command: df.sort_values('ESG')
Hope that helps and good luck with your dataset.
I have a table which contains a list of products scores by date:
From this table, have to make a plot of the porcentage of each quality by date.
I know how to do it in python but I´m having a hard time figuring out how to make it using power BI.
This is what I'm trying to make:
1) Get the percentages by class:
I'm python this is easily done by grouping by date and score and divede it be a group by date:
df_grouped = (df.groupby(["Date","Score"]).sum()/df.groupby(["Date"]).sum())*100
And then just make a plot of the percentage of each score by day
Like this:
How Can I get a similar result from powerbi?
Here is a google drive link to download a csv with the sample data: https://drive.google.com/file/d/1dEdUwwofv1OQ9rOGQMuyfYKO9_YJDTcl/view?usp=sharing
EDIT:
I'm getting this result from M D code:
Create a new measure and change the data type of the measure to a percentage in the data modelling tab. The measure is to have the following DAX formula:
Measure = CALCULATE(sum(Table1[Percentage_By_Class]),filter(Table1,Table1[Date]=max(Table1[Date])),ALLEXCEPT(Table1,Table1[Score]))/ CALCULATE(sum(Table1[Percentage_By_Class]),filter(all(Table1),Table1[Date]=max(Table1[Date])))
This will calculate the sum of a group (the score) by date and divide it by the total for all groups on the day. Then add it to a line chart with date as the axis, score as legend and the new measure as the values.
Review below images as well as the code.
Graph Created With Your Sample Data
I am working with a data frame containing diferent time series. I have 157 days or time series and I have done clustering with it. To do so, I have used the pam command.
Therefore, now I know which day corresponds to each cluster. What I want is to separate my data frame depending on their clusters. So, create a data frame just with the time series from the first cluster. I add below some pictures of my source.
That's the traindata just with two different days
Clustering using pam command
It will be great if anybody can help me out.
I think what you are looking for is split()
data = data.frame(days=rep(1:30,3),cluster=sample(c(1:3),90,replace = T))
days = split(data,data$cluster)
days[1] #Days that were assigned to the first cluster
I've spent a lot of time searching for a solution but not successfully. For that reason I decided to post my problem or question here hoping somebody of you can help me.
I want to find out which variables are influencing the travel distance of two animals (same species).
The response variable is distance moved (in meters). In total I have 66 tracking sessions for both animals.
The independent variables are: temperature, rainfall, offspring (yes = 1, no = 0), observation period (in minutes) and activity.
I looked at the animals (one day - one animal) every 15 minutes and noted the state of activity (active = 1 or inactive = 0). For that reason my data table consists around 1800 points and the same amount of activity records.
Then I created a table with following columns:
Animal, Tracking-Session, rainfall, offspring, observation period, active, inactive, distance
The two columns active and inactive contain the sum of active (inactive) records per tracking session.
For example in tracking-session 1 the animal A was 30 times active and 11 inactive and moved 6000 meters during that tracking session.
I thought I could do my analysis with this table using the command cbind() to make one column for activity out of the two columns with "inactive" and "active". But this does not work, I get:
Error in lme4::lFormula(formula = distance~ (1 | animal) + activity + offspring + ...
rank of X = 12 < ncol(X) = 13
I want to include the second animal as a random factor to get an output valid for the whole "population" (which only consits of two animals in that case).
How can I fit a linear mixed model to this data or the first question is: how my data table has to look like to do such analysis?
I started running a linear mixed model with my original data table consisting of 1800 rows but the outcome was not convincing. And I don't know if this table was built up correctly for this task. Because I have only 60 tracking sessions and for that reason only 60 resulting travel distances, but 1800 records of activity (each 15 minutes - active or inactive). I don't know how to handle this situation the only possibility for me to overcome this problem was to copy the travel distace (which is the result of all points watched per day) and assign it to each single point of that tracking session.
The same is for rainfall and temperature because these conditions were only measured once a day I had to copy the value for each single point taken on the same day.
Is this correct or better can R handle such tables (like in the picture)? Or is it better to create a table with one row for each day (as I describe above)?
If the the second table (the one with one row per tracking session) is the better choice, how has it be transformed that R can use it?
Hopefully you can follow my explanations (I tried to explain it as detailed as possible) and anyone can help me!
Thanks in advance!
Iris