I tried to run a stochastic simulation of an epidemiological SEIR model using the code below.
library(GillespieSSA)
parms <- c(beta=0.591, sigma=1/8, gamma=1/7)   # transmission, incubation and recovery rates
x0 <- c(S=50, E=0, I=1, R=0)                   # initial state
a <- c("beta*S*I", "sigma*E", "gamma*I")       # propensity functions
nu <- matrix(c(-1, 0, 0,                       # state-change matrix: rows S, E, I, R;
                1,-1, 0,                       # columns infection, progression, recovery
                0, 1,-1,
                0, 0, 1), nrow=4, byrow=TRUE)
set.seed(12345)
out <- lapply(X=1:10, FUN=function(x) ssa(x0, a, nu, parms, tf=50)$data)
out
I managed to obtain the 10 simulation runs that I wanted. The time column is continuous. Now I need to extract the rows at discrete times such as 1, 2, 3, ..., 50 from each simulation. What kind of code should I use?
I tried converting to a data.frame and extracting, but I still could not do it.
Thanks in advance for any help.
Let's say the data looks like this:
df <- data.frame(t=seq(0.4,4.5,0.03), x=1:137)
## t x
## 1 0.40 1
## 2 0.43 2
## 3 0.46 3
## 4 0.49 4
## 5 0.52 5
To get the discrete time index values:
idx <- diff(ceiling(df$t)) == 1  # TRUE where t crosses an integer between consecutive rows
The discrete time series will be:
df[idx,]
## t x
## 21 1.00 21
## 54 1.99 54
## 87 2.98 87
## 121 4.00 121
Having run the simulation myself, one problem is that many of the time stamps are quite far from an integer value.
To see these remainders, check: out[[1]][,1] %% 1
The good news is that you can use this output, with a tuning parameter, to select what you want. For this purpose, find the distance of each remainder from one and then decide what counts as an acceptable gap.
Do this as follows and save the result (a vector of TRUE and FALSE values):
selection <- abs((out[[1]][,1] %% 1) - 1) < 0.1  # TRUE where the time is within 0.1 below an integer
You can then subset the output matrix using the selection index we just saved:
out[[1]][selection,]
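To apply the same idea to all 10 runs at once, here is a minimal sketch (my own addition, not part of the original answer) that loops over the list out produced above, keeping rows whose time stamp lies within the chosen tolerance of an integer on either side:
tol <- 0.1  # acceptable distance from an integer
discrete_runs <- lapply(out, function(m) {
  frac <- m[, 1] %% 1                            # fractional part of each time stamp
  m[pmin(frac, 1 - frac) < tol, , drop = FALSE]  # keep rows near an integer time
})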
I'm very new to R so excuse any incorrect language. I'm not sure if I even asked this question correctly, but here is the problem I'm dealing with.
Suppose I have a data frame that contains lengths and weights for 10 different species of fish, with 100 samples for each species (1,000 rows of data). Is it possible to return the output of describe() for a column for each unique species of fish without having to create an object for each species?
For example if I write:
Catfish <- filter(dataframe, species == "Catfish")
describe(Catfish$lengths)
Do I have to manually create an object (Catfish, for example) for each species and then run describe()? Or is there a simpler way to return describe() for the lengths of each unique species directly from my original dataframe? Hopefully I asked this clearly enough. Thanks for any help!
I think what you might want to look into is a split-apply-combine technique (example below)
df
value ID
1 1 ID
2 2 ID
3 3 PD
4 4 PD
5 5 ID
#split by grouping variable (in your case, fish species)
df_split <- split(df, df$ID)
#apply a function (in your case describe)
df_split <- lapply(df_split, function(x) { x["ID"] <- NULL; x }) #removed ID for easier merging
library(psych) #describe() comes from the psych package
df_split <- lapply(df_split, describe)
#combine
Result <- Reduce(rbind, df_split)
Result
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 3 2.67 2.08 2.0 2.67 1.48 1 5 4 0.29 -2.33 1.2
X11 1 2 3.50 0.71 3.5 3.50 0.74 3 4 1 0.00 -2.75 0.5
What would improve this script is to add the grouping variable back to each row of the result ("ID" in this example); one possible way is sketched below. But I think this provides a starting point for you.
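For instance, a minimal sketch of that addition (my own, assuming psych::describe and the same df as above):
groups <- split(df, df$ID)
described <- lapply(names(groups), function(g) {
  res <- describe(groups[[g]]$value)  # describe the numeric column for one group
  res$ID <- g                         # attach the group label to the result row
  res
})
Result <- do.call(rbind, described)
Result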
**EDITED: I have made progress, but I didn't think my original question was as well constructed as it could be.
I am new to R and computer programming in general and I am attempting to write my first for loop.
I want to be able to do some tidal analysis using harmonic constituents from NOAA.
I have my initial data, data, which looks like this:
Constituent # Name Amplitude Phase Speed
1 M2 3.264 29.0 28.98
2 S2 0.781 51.9 30.0
3 N2 0.63 12.3 28.43
4 K1 1.263 136.8 15.04
5 M4 0.043 286.0 57.96
The equation for wave height is h(t) = Amplitude*cos(Speed*t - Phase), where t is time.
Therefore I need to perform this calculation for each constituent (row) and sum the results of each constituent by time.
So my intermediate result will be a table with ncol = number of time stamps and nrow = number of constituents.
                     T1                         T2                         T3 ...
A[1]*cos(S[1]*T1 - P[1])   A[1]*cos(S[1]*T2 - P[1])   ...
A[2]*cos(S[2]*T1 - P[2])   A[2]*cos(S[2]*T2 - P[2])   ...
.
.
.
A[n]*cos(S[n]*T1 - P[n])   A[n]*cos(S[n]*T2 - P[n])   ...
(where A[i], P[i] and S[i] are the Amplitude, Phase and Speed of constituent i)
With this table I can sum the columns to get my final answer of what the tide height is at each time stamp.
To do this I have attempted to create a for loop.
DF <- NULL
for (i in 1:nrow(data)) {
  DF <- matrix(c(DF, data[i,2]*cos(pi/180*(data[i,4]*Time[,] - data[i,3]))))
}
This returns all the results as a single vector. I can't figure out how to separate it into columns by timestamp. It just runs through all the timestamps for the first constituent, then the second, and so on. So for my current station I have 37 constituents and 100 time stamps, so my matrix DF is 1 column with 3700 rows.
I have tried setting up the matrix DF with the appropriate number of columns and rows, but this returns a single result for all rows and columns. I have also tried a nested if statement with time, and many other things that I can't remember.
***I used Rusan's approach and finished what I was doing with the script below. Any other approaches are appreciated.
Time <- matrix(seq(1,100,1))  #my time series
n <- hh3(Time)  #function outlined by Rusan below
b <- matrix(c(rep(Time[1,1]:Time[nrow(Time),1], nrow(wave_table))))  #a repeating time vector to bind with n
library(reshape2)  #for dcast() (also available via recent data.table)
height <- matrix(colSums(dcast(data.frame(cbind(b,n)), Constituent~V1, value.var="V1.1")[,-1]))  #sums of all constituents at each time stamp: the final wave height at each time
This allows me to sum all the constituents at each time stamp. Height = sum of all constituents at time t, so for my example above height(t1) = M2(t1) + S2(t1) + N2(t1) + K1(t1) + M4(t1).
My final output is a single-column matrix, height. I want to use this to create an inundation duration curve.
Perhaps this is not an answer, but I would suggest a different approach. I will use the data.table package in R.
library(data.table)
#use your own location for the data
wave_table <- fread(input="F:\\wave.csv")
wave_table
# Constituent Name Amplitude Phase Speed
# 1: 1 M2 3.264 29.0 28.98
# 2: 2 S2 0.781 51.9 30.00
# 3: 3 N2 0.630 12.3 28.43
# 4: 4 K1 1.263 136.8 15.04
# 5: 5 M4 0.043 286.0 57.96
#create a function which does your calculation on the named columns of your data,
#taking time 't' as a parameter
hh<-function(t){ wave_table[,{Amplitude*cos(Speed*t-Phase)}] }
hh2<-function(t) wave_table[,{Amplitude*cos(Speed*t-Phase)}, by=Name]
hh3<-function(t) wave_table[,{Amplitude*cos(Speed*t-Phase)}, by=Constituent]
hh4<-function(t) wave_table[,{sum(Amplitude*cos(Speed*t-Phase))}, by=Constituent]
#Now the function `hh` can be used like this, giving you a bit
#more flexibility with what you want to do, perhaps?
hh(1)
#3.26334722 -0.77775795 -0.57472163 -0.91362687 -0.01165717
or
hh2(1)
# Name V1
#1: M2 3.26334722
#2: S2 -0.77775795
#3: N2 -0.57472163
#4: K1 -0.91362687
#5: M4 -0.01165717
or
hh4(1) #after adding an extra row to your data: Constituent=1, Name=M3,
       #Amp=1.263, Phase=51.9, Speed=15.04
# Constituent V1
#1: 1 4.10718774
#2: 2 -0.77775795
#3: 3 -0.57472163
#4: 4 -0.91362687
#5: 5 -0.01165717
In general, loops in R should be avoided for this type of problem: they are slow, and there are much better tools available. Loops are typically a last resort.
If the functions hh to hh4 do not do exactly what you want, there are other variations that could be used. Check out http://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.pdf
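As an illustration of that vectorized route, here is a minimal base-R sketch (my own addition, assuming a wave_table with Amplitude, Phase and Speed columns and angles in degrees, as in the asker's original loop) that builds the whole constituent-by-time matrix with outer() and then sums the columns:
Time <- 1:100  #time stamps
#one row per constituent, one column per time stamp
wave_matrix <- with(wave_table,
  Amplitude * cos((outer(Speed, Time) - Phase) * pi / 180))
height <- colSums(wave_matrix)  #tide height at each time stamp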
I don't think this question has been asked yet (most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!
I have a dataset of multiple runs of an experiment in one file, and the data looks like this, where I have all the time steps for each run in rows:
time [info] id (unique per run)
I am attempting to calculate when the system reaches equilibrium, which I am defining as stable values in 3 interdependent parameters. I would like the contents of the rows compared and, if they are within 5% of each other over 20 time steps, to return the time step at which the stability begins, together with the id.
So far, I'm thinking it will be something like the following (or maybe a while loop; sorry for the bad formatting):
y <- 1
z <- 0   # variables to control the loop
x <- 0
for (ID) {
  if (CC at time x is within 0.05 of CC at time y) {
    if (z <= 20) {   # counts the number of periods that match
      y <- y + 1
      z <- z + 1
    } else {
      # save value in column
    }
  } else {           # no match for a sustained period, so start over
    x <- x + 1
    y <- x + 1
    z <- 0
  }
}
ETA: CC is one of my parameters of interest; it ranges between 0 and 1, although the endpoints are unlikely.
Here's a simple example that might help; this is something like how my data looks:
zz <- textConnection("time CC ID
1 0.99 1
2 0.80 1
3 0.90 1
4 0.91 1
5 0.92 1
6 0.91 1
1 0.99 2
2 0.90 2
3 0.90 2
4 0.91 2
5 0.92 2
6 0.91 2")
Data <- read.table(zz, header = TRUE)
close(zz)
My question is: how can I run through the rows to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X (here, 3) time steps), so that it would create the following results:
ID timeToEQ
1 1 3
2 2 2
Does this help? The only way I can think of to do this is with a for loop, and I think there must be an easier way!
Here is my code. I will post the explanation shortly.
require(plyr)
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05 ))
ID timeToEQ
1 1 3
2 2 2
EDIT: Here is how it works.
ddply breaks Data into subsets based on ID.
diff(CC) computes the difference between the CC of successive rows.
abs(diff(CC)) < 0.05 returns TRUE if the difference has stabilized.
Position locates the first element which satisfies isTRUE.
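For comparison, a base-R sketch of the same logic (my own addition, assuming the Data frame built above), using split() and sapply() instead of plyr:
# first time step at which successive CC values differ by less than 0.05, per ID
timeToEQ <- sapply(split(Data$CC, Data$ID),
                   function(cc) Position(isTRUE, abs(diff(cc)) < 0.05))
data.frame(ID = names(timeToEQ), timeToEQ = timeToEQ, row.names = NULL)
##   ID timeToEQ
## 1  1        3
## 2  2        2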
I have a set of user recommendations:
review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))
and wanted to use summary(review) to show the basic properties: mean, median, quartiles and min/max.
But it gives back the summary of both columns separately. I refrained from using a data.frame because the 'Star' factor is ordered.
How can I tell R that Star is an ordered set of levels (a numeric score) and that Votes gives their frequencies?
I'm not exactly sure what you mean by taking the mean in general if Star is supposed to be an ordered factor. However, in the example you give, where Star is actually a set of numeric values, you can use the following:
library(Hmisc)
R> review=matrix(c(5:1,10,2,1,1,2), nrow=5, ncol=2, dimnames=list(NULL,c("Star","Votes")))
R> wtd.mean(review[, 1], weights = review[, 2])
[1] 4.0625
R> wtd.quantile(review[, 1], weights = review[, 2])
0% 25% 50% 75% 100%
1.00 3.75 5.00 5.00 5.00
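As a quick sanity check on the weighted mean, the same value follows directly from the definition: (5*10 + 4*2 + 3*1 + 2*1 + 1*2) / (10 + 2 + 1 + 1 + 2) = 65/16 = 4.0625, or in base R:
R> sum(review[, "Star"] * review[, "Votes"]) / sum(review[, "Votes"])
[1] 4.0625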
I don't understand what the problem is. Why shouldn't you use a data.frame?
rv <- data.frame(star = ordered(review[, 1]), votes = review[, 2])
You should expand your data.frame into a vector:
( vts <- with(rv, rep(star, votes)) )
[1] 5 5 5 5 5 5 5 5 5 5 4 4 3 2 1 1
Levels: 1 < 2 < 3 < 4 < 5
Then do the summary... I just don't know what kind of summary, since summary will bring you back to the start. O_o
summary(vts)
1 2 3 4 5
2 1 1 2 10
EDIT (following @Prasad's suggestion)
Since vts is an ordered factor, you should convert it to numeric and then calculate the summary (for the moment I will disregard the underlying statistical issues):
nvts <- as.numeric(levels(vts)[vts]) ## numeric conversion
summary(nvts) ## "ordinary" summary
fivenum(nvts) ## Tukey's five number summary
Just to clarify: when you say you would like "mean, median, quartiles and min/max", you're talking in terms of number of stars? E.g. mean = 4.062 stars?
Then using aL3xa's code, would something like summary(as.numeric(as.character(vts))) be what you want?
It is easy to do an exact binomial test on two values, but what happens if one wants to do the test on a whole bunch of numbers of successes and numbers of trials? I created a data frame of test sensitivities and potential numbers of enrollees in a study, and then for each row I calculate how many successes that would be. Here is the code.
sens <-seq(from=.1, to=.5, by=0.05)
enroll <-seq(from=20, to=200, by=20)
df <-expand.grid(sens=sens,enroll=enroll)
df <-transform(df,succes=sens*enroll)
But now how do I use each row's combination of successes and number of trials to do the binomial test?
I am only interested in the upper limit of the 95% confidence interval of the binomial test. I want that single number to be added to the data frame as a column called "upper.limit"
I thought of something along the lines of
binom.test(succes,enroll)$conf.int
alas, conf.int gives something such as
[1] 0.1266556 0.2918427
attr(,"conf.level")
[1] 0.95
All I want is just 0.2918427
Furthermore, I have a feeling that there has to be a do.call in there somewhere, and maybe even an lapply, but I do not know how that will go through the whole data frame. Or should I perhaps be using plyr?
Clearly my head is spinning. Please make it stop.
If this gives you (almost) what you want, then try this:
binom.test(succes,enroll)$conf.int[2]
And apply across the board or across the rows as it were:
> df$UCL <- apply(df, 1, function(x) binom.test(x[3],x[2])$conf.int[2] )
> head(df)
sens enroll succes UCL
1 0.10 20 2 0.3169827
2 0.15 20 3 0.3789268
3 0.20 20 4 0.4366140
4 0.25 20 5 0.4910459
5 0.30 20 6 0.5427892
6 0.35 20 7 0.5921885
Here you go:
R> newres <- do.call(rbind, apply(df, 1, function(x) {
+ bt <- binom.test(x[3], x[2])$conf.int;
+ newdf <- data.frame(t(x), UCL=bt[2]) }))
R>
R> head(newres)
sens enroll succes UCL
1 0.10 20 2 0.31698
2 0.15 20 3 0.37893
3 0.20 20 4 0.43661
4 0.25 20 5 0.49105
5 0.30 20 6 0.54279
6 0.35 20 7 0.59219
R>
This uses apply to loop over your existing data, compute the test and return the value you want by sticking it into a new (one-row) data.frame. We then glue all those 90 data.frame objects into a single new one with do.call(rbind, ...) over the list we got from apply.
Ah yes, if you just want to directly insert a single column, the other answer rocks because it is simple. My longer answer shows how to grow or construct a data.frame during the sweep of apply.
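Since the question also mentions lapply and plyr, one more sketch (my own addition, not from either answer): mapply iterates over the two columns in parallel and avoids the matrix coercion that apply(df, 1, ...) performs on a data frame:
# upper limit of the 95% CI for each (successes, trials) pair
df$upper.limit <- mapply(function(s, n) binom.test(s, n)$conf.int[2],
                         df$succes, df$enroll)
head(df)  # upper.limit matches the UCL values computed above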