I am using the set.seed and kmeans functions. Although I use set.seed, my cluster centers keep changing even though my data doesn't. And it only changes from day to day: within the same day there aren't any changes, but the next day my clusters will change. I'm assuming the set.seed function is behind this. If so, does anyone know how to control the randomness within kmeans or a similar function? Can someone give me some insight? Sample code below:
set.seed(1234)
ITsegment2 <- kmeans(iTeller_z, 4)
There is probably something more clever, but here is an easy solution:
set.seed(as.numeric(Sys.Date()))
Sys.Date() returns today's date and as.numeric transforms it into a number, so the seed will change every day.
Cheers
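Put together with the original call, that looks like the minimal sketch below (the nstart argument is an addition here, not from the question; it restarts kmeans from several random initializations, which makes the centers less sensitive to the particular seed):
# Seed derived from today's date: identical results within a day,
# different results across days
set.seed(as.numeric(Sys.Date()))
ITsegment2 <- kmeans(iTeller_z, 4, nstart = 25)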
I received readings of multiple filters for a time series x. Those filters are differences of EMAs and of EMAs on EMAs. My filters have been calculated as
F1 = EMA(x, period.short) - EMA(x, period.long) and
F2 = ZMA(x, period.short) - ZMA(x, period.long).
ZMA is a de-lagged EMA which is calculated as EMA(EMA(x, period), period). Unfortunately, the original values of the time series x have been lost, so I wonder whether it is possible to somehow reconstruct the original time series. I suppose that an exact reconstruction won't be possible; however, to get a glimpse of an idea, I'd already be happy with one possible time series history that matches the filter values at identical time points.
Therefore I wonder how such a reconstruction could be done in an efficient way and how it could be implemented (if possible in R). Maybe there are already some packages available for such a purpose?
I'd be happy about any advice. Many thanks in advance!
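For what it's worth, here is a minimal sketch of one possible line of attack, under explicit assumptions: the usual alpha = 2/(period + 1) smoothing convention and known (or guessed) initial EMA states. It inverts the EMA recursion E_t = alpha * x_t + (1 - alpha) * E_{t-1} step by step using F1 alone, so it returns one candidate history rather than the true series; F2 could then serve as a consistency check:
# Reconstruct one candidate series x from F1 = EMA(x, short) - EMA(x, long).
# Assumptions: alpha = 2/(period + 1); initial EMA states E.s0, E.l0 are guesses.
reconstruct_from_F1 <- function(F1, period.short, period.long, E.s0 = 0, E.l0 = 0) {
  a.s <- 2 / (period.short + 1)
  a.l <- 2 / (period.long + 1)
  x <- numeric(length(F1))
  E.s <- E.s0
  E.l <- E.l0
  for (t in seq_along(F1)) {
    # F1[t] = a.s*x[t] + (1-a.s)*E.s - a.l*x[t] - (1-a.l)*E.l, solved for x[t]
    x[t] <- (F1[t] - (1 - a.s) * E.s + (1 - a.l) * E.l) / (a.s - a.l)
    E.s <- a.s * x[t] + (1 - a.s) * E.s
    E.l <- a.l * x[t] + (1 - a.l) * E.l
  }
  x
}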
PREFACE: This is a question about using linear modelling to understand an electricity generation system, but you actually don't need to know very much about either to understand it. I'm pretty sure this is really a question about R.
I am building a linear model to optimise the hourly dispatch of electric generators in a country (called "Lebanon", but the data I am using makes it a little fictitious). I have a model which optimises the hourly generation satisfactorily; the code looks like this:
lp.newobjfun.norelax <- lpSolve::lp(dir = "min", objfun.lebanon.postwalk1, constraintmatrix.lebanon.postwalk.allgenerators, directions.lebanon.postwalk3, rhs.lebanon.postwalk4)
The above works fine. Of course, doing it for a single day is a bit useless, so instead I want to run it iteratively for every day of a year. The code below is supposed to do that, but the returned value (the objective function's value) is always 0. Any ideas what I am doing wrong?
for (i in 1:365) {
  # Replace the first 24 right-hand-side entries with day i's hourly supply
  rhs.lebanon.postwalk4[1:24] <- as.numeric(supplylebanon2010wholeyear[i, ])
  lp.newobjfun.norelax <- lpSolve::lp(dir = "min", objfun.lebanon.postwalk1, constraintmatrix.lebanon.postwalk.allgenerators, directions.lebanon.postwalk3, rhs.lebanon.postwalk4)
  print(lp.newobjfun.norelax$solution)
}
Just to be clear: in the second version, the right-hand sides of the first 24 constraints are modified to reflect how the hourly supply of electricity changes on each day of the year.
Thanks in advance!
Okay, never mind, I've figured it out: there's a unit conversion from kWh to MWh which I hadn't taken care of.
Sorry for any bother!
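For the record, the fix under that diagnosis would be a one-liner along these lines (hypothetical: it assumes the supply data is in kWh while the model expects MWh, so the direction of the conversion depends on the actual data):
# Hypothetical: convert day i's hourly supply from kWh to MWh (1 MWh = 1000 kWh)
rhs.lebanon.postwalk4[1:24] <- as.numeric(supplylebanon2010wholeyear[i, ]) / 1000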
I would like to calculate time differences between points in POSIXct format. I am able to calculate the differences between consecutive points using
diff(data$time)
but not all against all. So I guess my data is at least imported correctly.
I actually want to calculate all pairwise differences between the points of one individual, so my data looks like: Posix, individual, otherinfo. If there is a simple way, I would love to calculate the differences between all points per individual automatically. If it's not so straightforward, I will make data subsets per individual, that's fine.
I would be happy if someone could help me! I tried
dist(data$time)
because I know it's a distance matrix calculation tool, but unfortunately it just gives me a list of rising numbers (1, 2, 3, ...), so I guess it is not familiar with the time format.
Thanks a lot!
We can use sapply
sapply(data$time, `-`, data$time)
or with outer
outer(data$time, data$time, FUN = `-`)
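And if you want one all-vs-all matrix per individual, a minimal sketch (assuming columns named time and individual, as described in the question):
# Split the times by individual, then take pairwise differences within each group
by_individual <- split(data$time, data$individual)
lapply(by_individual, function(tt) outer(tt, tt, FUN = `-`))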
I'm making some calculations based on quantmod for some stocks.
Below I prepared a very simple example to reflect what I'd like to do, which at this point is to select some cells based on a date, for example yesterday.
library(quantmod)
getSymbols("BAC", from="2018-06-18", src="yahoo")
As a result I get an xts object with daily prices and volumes for BAC.
Now I'd like to make some calculations with yesterday's volume, so I wonder if something like this could work:
# I would like to multiply yesterday's volume by 1.05.
Vol_k <- (BAC$BAC.Volume Sys.Date()-1) * 1.05
How do I use Sys.Date() here to indicate today minus 1 and select the volume cell for yesterday's date?
Thank you very much for any comment.
V.
When I pulled this data I didn't get anything for yesterday, that being the 4th of July (a timezone issue I assume, or maybe the public holiday in the US?), so I did it for two days ago.
BAC[Sys.Date() - 2, "BAC.Volume"]
should give you the desired volume. I did a bit of research (https://s3.amazonaws.com/assets.datacamp.com/blog_assets/xts_Cheat_Sheet_R.pdf), and
last(BAC, '1 day')$BAC.Volume
should give you the last day, regardless of weekends/holidays.
You can always get the last value by accessing the index, i.e.
xts.object[max(index(xts.object)), column]
In your case:
BAC[max(index(BAC)),"BAC.Volume"]
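Tying that back to the original goal, a small sketch that multiplies the most recent available volume by 1.05:
# Most recent available volume, scaled by 1.05
Vol_k <- as.numeric(BAC[max(index(BAC)), "BAC.Volume"]) * 1.05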
I have a problem moving from basic programming towards something more sophisticated. Could you help me adjust this code?
There are two vectors with dates and times: one is when activities happen, and the other is when triggers appear. The aim is to find, for each trigger, the nearest activity date/time after that trigger happens. The final result is the average of all the differences.
I have this code, and it works, but it's very slow when working with a large dataset.
time_activities <- as.POSIXct(c("2008-09-14 22:15:14", "2008-09-15 09:05:14", "2008-09-16 14:05:14", "2008-09-17 12:05:14"), format = "%Y-%m-%d %H:%M:%S")
time_triggers <- as.POSIXct(c("2008-09-15 06:05:14", "2008-09-17 12:05:13"), format = "%Y-%m-%d %H:%M:%S")
result <- numeric(length(time_triggers))
for (j in 1:length(time_triggers)) {
  for (i in 1:length(time_activities)) {
    if (time_triggers[j] < time_activities[i]) {
      # First activity after trigger j: record the delay in minutes
      result[j] <- ceiling(difftime(time_activities[i], time_triggers[j], units = "mins"))
      break
    }
  }
}
print(mean(as.numeric(result)))
Can I somehow get rid of the loops and do everything with vectors? Maybe you can give me a hint about which function I could use to compare the dates all at once?
delay <- sapply(time_triggers, function(x) {
  d <- difftime(x, time_activities, units = "mins")
  max(d[d < 0])  # closest activity after the trigger (values are negative)
})
mean(delay[is.finite(delay)])
This should do the trick. As always, the apply family of functions is a good replacement for a for loop.
This gives the average number of minutes by which an activity followed a trigger (note that the values come out negative, since they are computed as trigger time minus activity time).
If you want to see the activity delay after each trigger (rather than just the mean over all the triggers), you can just drop the mean() call. The values will then correspond to the values in time_triggers.
UPDATE:
I updated the code to ignore Inf values as requested. Sadly, this means the code needs two lines rather than one. If you really want, you can make it all one line, but then you will be doing the majority of the computation twice (not very efficient).
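For large datasets, a fully vectorized alternative is sketched below using findInterval (assumptions: time_activities is sorted in ascending order, as in the example data, and triggers with no later activity are simply dropped; unlike the original loop, the delays are not rounded up with ceiling()):
# Index of the first activity strictly after each trigger
idx <- findInterval(time_triggers, time_activities) + 1
ok <- idx <= length(time_activities)  # drop triggers with no later activity
delays <- difftime(time_activities[idx[ok]], time_triggers[ok], units = "mins")
mean(as.numeric(delays))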