How to represent the efficiency of a function in a plot - r

In school, we are learning to use R, and we had to find an algorithm to calculate the order of a permutation in different ways. I came up with 4 different algorithms that can be compared. Now I'd like to display the time each function takes as a function of the size of the data we give it.
So first, I wanted to display the time taken by at least one of the functions (I called it calculOrdrePermutation) without changing the size of the data.
So that's what I did:
createProcessTest <- function(func, variables, numberOfTests) {
  outputProgress <- TRUE
  ptm <- proc.time()
  times <- c()
  for (i in 1:numberOfTests) {
    func(variables)
    times <- append(times, (proc.time() - ptm)[3])
    if (outputProgress & i %% 5 == 0) {
      print(paste((i / numberOfTests) * 100, "%"))
    }
  }
  return(times)
}
sampleSize <- 100
nbOperations <- 100
extrait <- sample(1:sampleSize, sampleSize)
matriceDePermutation <- trouverMatriceDePermutation(extrait)
tempsRapideMatrice <- createProcessTest(calculOrdrePermutation, matriceDePermutation, nbOperations)
plot(y=tempsRapideMatrice, x=1:nbOperations, cex=0.1, type="l", main="Using matrix", sub="sans boucle", ylab="Time (s)", xlab="Number of iterations")
It looks approximately like this: [plot omitted; the curve is roughly a straight line]
So that's not bad: I'm able to display a plot that represents the time for this function. But it is linear, of course, so there's not much of interest in it...
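For reference, here is a minimal sketch of a per-call variant (illustrative only; the version above records cumulative elapsed time, because ptm is set once before the loop):
# Sketch: time each call separately instead of cumulatively
createProcessTestPerCall <- function(func, variables, numberOfTests) {
  times <- numeric(numberOfTests)
  for (i in 1:numberOfTests) {
    times[i] <- system.time(func(variables))[3]  # elapsed seconds for this call
  }
  times
}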
So I started to create a function that repeats the process while gradually increasing sampleSize:
doFullTest <- function(func, useMatrix, numberOfTestsPerN, maxN) {
  temps <- c()
  for (sampleSize in seq(from=1, to=maxN, by=1)) {
    permut <- sample(1:sampleSize, sampleSize)
    if (useMatrix) {
      permut <- trouverMatriceDePermutation(extrait)
    }
    temps <- append(temps, mean(createProcessTest(func, permut, numberOfTestsPerN)))
  }
  return(temps)
}
And so I can use it this way:
plot(x=1:100, y=doFullTest(calculOrdrePermutation, T, 5, 100), type="h")
[Plot omitted: time used depending on the size of the data, from N=1 to N=100]
So what I asked for is to run the function 5 times per data size to take the mean, and then repeat with an increased size. But as you can see, it isn't possible to study the result; I hoped the heights would grow linearly (because my algorithm has a complexity of O(n)).
Is there a problem in my code? Am I doing it totally wrong?
I'm pretty sure I'm not that far from my goal, but the result is quite upsetting...
Thank you for your help!

Related

Way to progressively overlap line plots in R

I have a for loop from which I call a function grapher() that extracts certain columns from a dataframe (position and w, both continuous variables) and plots them. My code changes the Y variable (called w here) each time it runs, so I'd like to plot it as a progressive overlay. If I run the grapher() function 4 times, for example, I'd like to have 4 plots where the first plot has only 1 line and the 4th has all 4 overlaid on each other (as different colours).
I've already tried points() as suggested in other posts, but for some reason it only generates a new graph.
grapher <- function(){
  position.2L <- data[data$V1=='2L', 'V2']
  w.2L <- data[data$V1=='2L', 'w']
  plot(position.2L, w.2L)
  points(position.2L, w.2L, col='green')
}
# example of my for loop
for (t in 1:200){
  # code here changes the 'w' variable each iteration of 't'
  if (t %% 50 == 0){
    grapher()
  }
}
Not knowing any details about your situation, I can only assume something like this might be applicable.
# Example data set
d <- data.frame(V1=rep(1:2, each=6), V2=rep(1:6, 2), w=rep(1:6, each=2))
# Prepare the matrix we will write to.
n <- 200
m <- matrix(d$w, nrow(d), n)
# Loop progressively adding more noise to the data
set.seed(1)
for (i in 2:n) {
  m[,i] <- m[,i-1] + rnorm(nrow(d), 0, 0.05)
}
# We can now plot the matrix, selecting the relevant rows and columns
matplot(m[d$V1 == 1, seq(1, n, by=50)], type="o", pch=16, lty=1)
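If you want the overlay to build up incrementally instead, here is a minimal sketch reusing d, m, and n from above: call plot() once to set up the axes, then add each later column with lines(). (In the original grapher(), the plot() call inside the function is what starts a fresh graph each time, which is why points() never appeared to overlay.)
# Set up axes once, then overlay one curve per selected column
plot(d$V2[d$V1 == 1], m[d$V1 == 1, 1], type = "l",
     ylim = range(m), xlab = "position", ylab = "w")
for (j in seq(51, n, by = 50)) {
  lines(d$V2[d$V1 == 1], m[d$V1 == 1, j], col = j %/% 50 + 1)
}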

Using R and Sensor Accelerometer Data to Detect a Jump

I'm fascinated by sensor data. I used my iPhone and an app called SensorLog to capture accelerometer data while I stand and push my legs to jump.
My goal is to use R to create a model which can identify jumps and how long I'm in the air.
I'm unsure how to proceed in such a challenge. I have a timeseries with accelerometer data.
https://drive.google.com/file/d/0ByWxsCBUWbqRcGlLVTVnTnZIVVk/view?usp=sharing
Some questions:
How can a jump be detected in timeseries data?
How to identify the air time part?
How to train such a model?
Below is the R code used to create the graphs (not shown here), which cover me standing and doing a simple jump.
Thanks!
# Training set
sample <- read.csv("sample-data.csv")
# Sum gravity
sample$total_gravity <- sqrt(sample$accelerometerAccelerationX^2+sample$accelerometerAccelerationY^2+sample$accelerometerAccelerationZ^2)
# Smooth our total gravity to remove noise
f <- rep(1/4,4)
sample$total_gravity_smooth <- filter(sample$total_gravity, f, sides=2)
# Removes rows with NA from smoothing
sample<-sample[!is.na(sample$total_gravity_smooth),]
#sample$test<-rollmaxr(sample$total_gravity_smooth, 10, fill = NA, align = "right")
# Plot gravity
plot(sample$total_gravity, type="l", col=grey(.2), xlab="Series", ylab="Gravity", main="Accelerometer Gravitational Force")
lines(sample$total_gravity_smooth, col="red")
stdevs <- mean(sample$total_gravity_smooth)+c(-2,-1,+1,+2)*sd(sample$total_gravity_smooth)
abline(h=stdevs)
This is probably a less-than-perfect solution, but it might be enough to get you started. The first part relies on a small modification of the find_peaks function from the gazetools package.
find_maxima <- function(x, threshold) {
  ranges <- find_peak_ranges(x, threshold)
  peaks <- NULL
  if (!is.null(ranges)) {
    for (i in 1:nrow(ranges)) {
      rnge <- ranges[i, 1]:ranges[i, 2]
      r <- x[rnge]
      peaks <- c(peaks, rnge[which(r == max(r))])
    }
  }
  peaks
}
find_minima <- function(x, threshold) {
  ranges <- find_peak_ranges(x, threshold)
  peaks <- NULL
  if (!is.null(ranges)) {
    for (i in 1:nrow(ranges)) {
      rnge <- ranges[i, 1]:ranges[i, 2]
      r <- x[rnge]
      peaks <- c(peaks, rnge[which(r == min(r))])
    }
  }
  peaks
}
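Both helpers call find_peak_ranges(), which comes from the modified gazetools code and isn't shown here. A hypothetical stand-in (an assumption about its behaviour, not the package's actual implementation) that returns one start/end row per contiguous run of samples beyond the threshold:
find_peak_ranges <- function(x, threshold) {
  # Runs above a positive threshold, or below a negative one (assumed semantics)
  beyond <- if (threshold >= 0) x > threshold else x < threshold
  r <- rle(beyond)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  if (!any(r$values)) return(NULL)
  cbind(starts[r$values], ends[r$values])
}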
In order to get the find_maxima and find_minima functions to give us what we're looking for, we are going to need to smooth the total_gravity data even further:
spline <- smooth.spline(sample$loggingSample, y = sample$total_gravity, df = 30)
Note: I 'zeroed out' total gravity (sample$total_gravity <- sample$total_gravity - 1)
Next, pull out the smoothed x and y values:
out <- data.frame(x = spline$x, y = spline$y)  # name the columns so out$x and out$y work below
Then find our local maxima and minima
max <- find_maxima(out$y, threshold = 0.4)
min <- find_minima(out$y, threshold = -0.4)
And then plot the data to make sure everything looks legit:
plot(out$y, type="l", col=grey(.2), xlab="Series", ylab="Gravity", main="Accelerometer Gravitational Force")
lines(out$y, col="red")
stdevs <- mean(out$y)+c(-2,-1,+1,+2)*sd(out$y)
abline(h=stdevs)
abline(v=max[1], col = 'green')
abline(v=max[2], col = 'green')
abline(v=min[1], col = 'blue')
And finally, we can see how long you were off the ground.
print(hangtime <- min[1] - max[1])
[1] 20
You can reduce your thresholds to get additional datapoints (changes in acceleration).
Hope this helps!
I would consider a few things:
Smooth the data by collecting median values every 100ms - accelerometer data on iPhones is not perfectly accurate, so this approach will help.
Identify turning points, as @scribbles suggests.
There is code available in my github repository that could be modified to help with both of these issues. A PDF with some explanation is here: https://github.com/MonteShaffer/mPowerEI/blob/master/mPowerEI/example/challenge-1a.pdf
Specifically, take a look at:
library(devtools)
install_github("MonteShaffer/mPowerEI", subdir="mPowerEI")
library(mPowerEI)
# data smoothing
?scaleToTimeIncrement
# turning points
?pastecs::turnpoints
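For the turning-points route, here is a minimal sketch on synthetic data (turnpoints() returns, among other things, tppos, the indices of the turning points):
library(pastecs)
x <- sin(seq(0, 10, by = 0.1)) + rnorm(101, 0, 0.05)  # noisy wave
tp <- turnpoints(x)
plot(x, type = "l")
points(tp$tppos, x[tp$tppos], col = "red", pch = 16)  # mark peaks and pits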

Levy Walk simulation in R

I am trying to generate a series of numbers to simulate a Levy Walk in R. Currently I am using the following code:
alpha <- 2
n <- 1000
x <- rep(0, n)
y <- rep(0, n)
for (i in 2:n){
  theta <- runif(1)*2*pi
  f <- runif(1)^(-1/alpha)
  x[i] <- x[i-1] + f*cos(theta)
  y[i] <- y[i-1] + f*sin(theta)
}
The code is working as expected and I am able to generate the numbers according to my requirements. The figure below (not shown here) is one such Lévy walk.
A histogram (also not shown) confirms that the numbers generated (i.e. f) actually follow a power law.
My question is as follows:
The step lengths generated (i.e. f) are quite large. How can I modify the code so that the step lengths only fall within some bound [fmin, fmax]?
P.S. I have intentionally not vectorized the code.
Try using this:
f=runif(1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
Note that you need 0 < fmin < fmax.
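This is inverse-transform sampling: if u is uniform on (fmax^(-alpha), fmin^(-alpha)), then u^(-1/alpha) has a truncated power-law density proportional to f^(-alpha-1) on [fmin, fmax]. A quick sanity check (illustrative values):
alpha <- 2; fmin <- 1; fmax <- 10
f <- runif(1e5, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
range(f)               # stays within [fmin, fmax]
hist(f, breaks = 100)  # heavy-tailed, concentrated near fmin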
BTW, you can vectorize your code like this:
theta <- runif(n-1)*2*pi
f <- runif(n-1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
x <- c(0, cumsum(f*cos(theta)))
y <- c(0, cumsum(f*sin(theta)))
Just for precision, what you're simulating here is a Lévy flight. For it to be a Lévy walk, you should allow the particle to "walk" from the beginning to the end of each flight (with a for, for example). If you plot your resulting simulation with plot(x, y, type = "o") you will see that there are no positions within flights (no walking) using your code.
library(ggplot2)
alpha <- 5
n <- 1000
fmin <- 1
fmax <- n
# No loop needed: draw all bounded step lengths at once and accumulate
theta <- runif(n-1)*2*pi
f <- runif(n-1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
x <- c(0, cumsum(f*cos(theta)))
y <- c(0, cumsum(f*sin(theta)))
ggplot(data.frame(x=x, y=y), aes(x, y))+geom_point()+geom_path()
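To turn the flight into a walk in that sense, here is a minimal sketch (an illustration, not the answerer's code) that interpolates intermediate positions along each flight at roughly unit speed:
# Interpolate positions along each flight (assumes x and y from above)
walk_x <- 0
walk_y <- 0
for (i in 2:length(x)) {
  len <- sqrt((x[i] - x[i-1])^2 + (y[i] - y[i-1])^2)
  steps <- max(1, ceiling(len))
  walk_x <- c(walk_x, seq(x[i-1], x[i], length.out = steps + 1)[-1])
  walk_y <- c(walk_y, seq(y[i-1], y[i], length.out = steps + 1)[-1])
}
plot(walk_x, walk_y, type = "o", cex = 0.3)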

Plotting during a loop in RStudio

I am implementing a solution to the Traveling Salesman Problem (TSP) in R (simulated annealing) and I want to output the current best path periodically. I have searched quite a bit for how to output plots during a for loop and have so far failed.
I use RStudio, and want to see the graphs as they are generated. If you have ever watched TSP solvers do their thing, you will understand how cool it is to watch. Here is a sample of the graphics output I want to see http://www.staff.science.uu.nl/~beuke106/anneal/anneal.html
I don't think that the memory usage will be a problem (during about 500,000 iterations, I am only expecting 50-100 plots). Here is a sample function, where we would expect to see 10 different plots during the time the function runs:
Plotz <- function(iter = 1000000, interval = 100000) {
  x <- 1:10
  for (i in 1:iter){
    y <- runif(10)
    if (i %% interval == 0) {
      plot(x, y)
    }
  }
  return(c(x, y))
}
Plotz()
When I run this, all I see is the final plot produced (in RStudio). How can I see the plots as they're generated?
Also: I am on Ubuntu (whatever the newest stable release is). Don't know if that is relevant.
Thank you everyone in advance.
EDIT: Per Captain Murphy's suggestion, I tried running this in the Linux terminal, and the graphics appeared. I still think the question of how to do this in RStudio is relevant, however. It's such a good program, so maybe someone has an idea of what could be done to get this to work?
EDIT2: As Thilo stated, this is a known bug in RStudio. If anyone has any other ideas to solve this without the software itself being fixed, then there is still something to discuss. Otherwise, consider this question solved.
Calling Sys.sleep(0) should cause the plot to draw. Unlike the X11 solution, this will work on server versions of RStudio as well.
(I was surprised that dev.flush() did not give the result you were hoping for, that might be a bug.)
One thing you can do is open a x11 window and plot in there:
x11()
Plotz()
That should work the same as running it in terminal.
Following up on @JoeCheng's answer and @RGuy's comment on that answer: as I worked out with the RStudio folks, the problem seems to arise primarily when there is too much plotting going on in too short a timespan. The solution is twofold:
Sys.sleep(0) helps force an update to the plotting window.
Plotting updates every Nth iteration rather than every iteration.
For instance, on my computer (i7, RStudio Server), the following code does not update until the loop completes:
N <- 1000
x <- rep(NA, N)
plot(c(0,1)~c(0,N), col=NA)
for (i in seq(N)) {
  Sys.sleep(.01)
  x[i] <- runif(1)
  iseq <- seq(i-99, i)  # unused here; kept for parity with the version below
  points( x[i]~i )
  Sys.sleep(0)
}
The following code updates in real-time, despite having the same number of points to be plotted:
N <- 1000
x <- rep(NA, N)
plot(c(0,1)~c(0,N), col=NA)
for (i in seq(N)) {
  Sys.sleep(.01)
  x[i] <- runif(1)
  iseq <- seq(i-99, i)
  if (i %% 100 == 0) {
    points( x[iseq]~iseq )
    Sys.sleep(0)
  }
}
In other words, it's the number of calls to plot that seems to matter, not the amount of data to be plotted.
If you want to save the plots as well, you could just open a new device in the loop and close it afterwards.
Plotz <- function(iter = 1000, interval = 100) {
  x <- 1:10
  p <- 0  # plot number
  for (i in 1:iter){
    y <- runif(10)
    if (i %% interval == 0) {
      png(file=paste(i, "png", sep="."))
      p <- p + 1; plot(x, y)
      dev.off()
    }
  }
  return(c(x, y))
}
Plotz <- function(iter = 1000, interval = 100) {
  x <- 1:10
  p <- 0  # plot number
  for (i in 1:iter){
    y <- runif(10)
    if (i %% interval == 0) {
      p <- p + 1; plot(x, y)
      readline("Please press the Enter key to see the next plot if there is one.")
    }
  }
  return(c(x, y))
}
Plotz()
You can also use the back arrows on the Plots tab (in the lower-right pane of the RStudio interface by default) in order to view earlier plots.
You can use the animation package to combine your plots into a GIF.
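A minimal sketch with the animation package (saveGIF() shells out to ImageMagick, which must be installed separately; the file name here is arbitrary):
library(animation)
saveGIF({
  x <- 1:10
  for (i in 1:10) plot(x, runif(10))  # one frame per plot call
}, movie.name = "plots.gif", interval = 0.5)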

Utilise Surv object in ggplot or lattice

Does anyone know how to take advantage of ggplot or lattice for survival analysis? It would be nice to do trellis or facet-like survival graphs.
So in the end I played around and sort of found a solution for a Kaplan-Meier plot. I apologize for the messy code in taking the list elements into a dataframe, but I couldn't figure out another way.
Note: it only works with two levels of strata. If anyone knows how I can use x <- length(stratum) to generalize this, please let me know (in Stata I could append to a macro; I'm unsure how this works in R).
ggkm <- function(time, event, stratum) {
  m2s <- Surv(time, as.numeric(event))
  fit <- survfit(m2s ~ stratum)
  # Build a data frame from the survfit list elements
  # (f must be created before columns can be assigned to it)
  f <- data.frame(time = fit$time,
                  surv = fit$surv,
                  strata = c(rep(names(fit$strata[1]), fit$strata[1]),
                             rep(names(fit$strata[2]), fit$strata[2])),
                  upper = fit$upper,
                  lower = fit$lower)
  r <- ggplot(f, aes(x=time, y=surv, fill=strata, group=strata)) +
    geom_line() + geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.3)
  return(r)
}
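A hypothetical usage with the survival package's built-in lung data (sex has exactly two levels, which this function requires):
library(survival)
library(ggplot2)
# lung ships with the survival package; status is 1 = censored, 2 = dead
ggkm(lung$time, lung$status == 2, factor(lung$sex))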
I have been using the following code in lattice. The first function draws KM-curves for one group and would typically be used as the panel.groups function, while the second adds the log-rank test p-value for the entire panel:
km.panel <- function(x, y, type, mark.time=TRUE, ...){
  na.part <- is.na(x) | is.na(y)
  x <- x[!na.part]
  y <- y[!na.part]
  if (length(x) == 0) return()
  fit <- survfit(Surv(x, y) ~ 1)
  if (mark.time){
    cens <- which(fit$time %in% x[y == 0])
    panel.xyplot(fit$time[cens], fit$surv[cens], type="p", ...)
  }
  panel.xyplot(c(0, fit$time), c(1, fit$surv), type="s", ...)
}
logrank.panel <- function(x, y, subscripts, groups, ...){
  lr <- survdiff(Surv(x, y) ~ groups[subscripts])
  otmp <- lr$obs
  etmp <- lr$exp
  df <- (sum(1 * (etmp > 0))) - 1
  p <- 1 - pchisq(lr$chisq, df)
  p.text <- paste("p=", signif(p, 2))
  grid.text(p.text, 0.95, 0.05, just=c("right", "bottom"))
  panel.superpose(x=x, y=y, subscripts=subscripts, groups=groups, ...)
}
The censoring indicator has to be 0-1 for this code to work. The usage would be along the following lines:
library(survival)
library(lattice)
library(grid)
data(colon) #built-in example data set
xyplot(status~time, data=colon, groups=rx, panel.groups=km.panel, panel=logrank.panel)
If you just use 'panel=panel.superpose' then you won't get the p-value.
I started out following almost exactly the approach you use in your updated answer. But the irritating thing about survfit is that it only marks the changes, not each tick - e.g., it will give you 0 - 100%, 3 - 88% instead of 0 - 100%, 1 - 100%, 2 - 100%, 3 - 88%. If you feed that into ggplot, your lines will slope from 0 to 3 rather than remaining flat and dropping straight down at 3. That might be fine depending on your application and assumptions, but it's not the classic KM plot. This is how I handled the varying numbers of strata:
groupvec <- c()
for (i in seq_along(x$strata)){
  groupvec <- append(groupvec, rep(x = names(x$strata[i]), times = x$strata[i]))
}
f$strata <- groupvec
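Incidentally, one way to keep the classic flat-then-drop shape without padding out every tick is ggplot2's geom_step; a sketch, reusing a data frame f shaped like the one built in ggkm above:
# geom_step() draws right-continuous steps instead of sloping segments
ggplot(f, aes(x = time, y = surv, colour = strata, group = strata)) +
  geom_step()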
For what it's worth, this is how I ended up doing it - but this isn't really a KM plot either, because I'm not calculating the KM estimate per se (although I have no censoring, so this is equivalent... I believe).
survcurv <- function(surv.time, group = NA) {
  # Must be able to coerce surv.time and group to vectors
  if (!is.vector(as.vector(surv.time)) | !is.vector(as.vector(group))) {
    stop("surv.time and group must be coercible to vectors.")
  }
  # Make sure that surv.time is numeric
  if (!is.numeric(surv.time)) stop("Survival times must be numeric.")
  # group can be just about anything, but must be the same length as surv.time
  if (length(surv.time) != length(group)) {
    stop("The vectors passed to the surv.time and group arguments must be of equal length.")
  }
  # What is the maximum number of ticks recorded?
  max.time <- max(surv.time)
  # What is the number of groups in the data?
  n.groups <- length(unique(group))
  # Use the number of ticks (plus one for t = 0) times the number of groups to
  # create an empty skeleton of the results.
  curves <- data.frame(tick = rep(0:max.time, n.groups), group = NA, surv.prop = NA)
  # Add the group names in blocks, so each group gets one row per tick
  # (plain recycling would interleave the groups and mislabel rows).
  curves$group <- rep(unique(group), each = max.time + 1)
  # For each row, calculate the proportion of group[i] surviving past tick[i]
  for (i in seq_len(nrow(curves))) {
    curves$surv.prop[i] <- sum(surv.time[group %in% curves$group[i]] > curves$tick[i]) /
      length(surv.time[group %in% curves$group[i]])
  }
  # Return the results, ordered by group and tick - easier for humans to read.
  return(curves[order(curves$group, curves$tick), ])
}
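A hypothetical quick check with made-up integer survival times:
times  <- c(2, 3, 5, 7, 8, 8, 9, 10)
groups <- rep(c("a", "b"), each = 4)
head(survcurv(times, groups))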
