While trying to calculate each component of the time series for every combination of X (50 levels) and Y (80 levels) in my dataset (df), I realised I probably need a list of lists holding the decomposition for each of those combinations.
Unfortunately, the code below brings me no closer to a solution. On their own, the first lines work, up to the decomposition step. That step creates a list of lists, but how do I then extract the components for each X*Y combination?
P <- df$X
for(y in 1:length(P)) {
  OneP <- P[y]
  AllS <- unique(df$Y[df$X == OneP])
  for(i in 1:length(AllS)) {
    OneS <- AllS[i]
    df$TS[df$Y == OneS & df$X == OneP] <- ts(df$Mean[df$Y == OneS & df$X == OneP], start = c(1999, 1), end = c(2015, 12), frequency = 12)
    ListOfDec <- list()
    for(d in 1:length(OneS)) {
      # Run for-loop over lists
      ListOfDec[df$Y == OneS & df$X == OneP] <- list(decompose(ts(df$TS[df$Y == OneS & df$X == OneP], frequency = 12), type = c("additive")))
    }
    df$Decomposition_seasonal[df$Y == OneS & df$X == OneP] <- what goes here?
  }
}
Any advice is deeply appreciated.
Sample data (similar in structure to mine):
X Y Mean Date(mm.yyyy)
Tru A 35.6 02.2015
Fle A 15 05.2010
Srl C 67.1 05.1999
Tru A 13.2 08.2006
Srl B 89 08.2006
Tru B 14.8 12.2001
Fle A 21.5 11.2001
Lub D 34.8 03.2000
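One way to avoid the nested loops is to split the data by X*Y combination, run decompose() once per group with lapply(), and use unsplit() to write a chosen component back into df. This is a sketch under assumptions. Each X*Y group is taken to hold a complete monthly series from 1999-01 to 2015-12. The example data and its Date column are hypothetical: the real dates look like "02.2015" and would first need parsing so that each series can be put in chronological order.

```r
# Hypothetical example: 2 X levels x 2 Y levels, monthly means 1999-01 to 2015-12
set.seed(1)
dates <- seq(as.Date("1999-01-01"), as.Date("2015-12-01"), by = "month")
df <- expand.grid(Date = dates, X = c("Tru", "Fle"), Y = c("A", "B"),
                  stringsAsFactors = FALSE)
month <- as.integer(format(df$Date, "%m"))
df$Mean <- 50 + 10 * sin(2 * pi * month / 12) + rnorm(nrow(df))

# Make sure each X*Y series is in chronological order
df <- df[order(df$X, df$Y, df$Date), ]

# One additive decomposition per X*Y combination, in a list named "Fle.A", "Tru.A", ...
groups <- split(df$Mean, list(df$X, df$Y), drop = TRUE)
ListOfDec <- lapply(groups, function(m) {
  decompose(ts(m, start = c(1999, 1), frequency = 12), type = "additive")
})

# Pull out the seasonal component of every group and put it back in the rows it came from
seasonalList <- lapply(ListOfDec, function(d) as.numeric(d$seasonal))
df$Decomposition_seasonal <- unsplit(seasonalList, list(df$X, df$Y), drop = TRUE)

# A single combination can also be read directly, e.g. its trend component
trendTruA <- ListOfDec[["Tru.A"]]$trend
```

The same pattern works for the other components. Replace `d$seasonal` with `d$trend` or `d$random` in the second lapply() call.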
I am building a program for simulating sequences of wind vectors (in base R).
I have a data set ('pars') of parameters for six wind-generation mechanisms (I'll call them ellipses). There are 5 parameters for each ellipse, giving 30 columns of parameters. There are also further columns (f.0, f.1, ...) giving the proportion of time (frequency) each ellipse is in operation. 'pars' has 24 rows, each identified by an 'hour' variable. The following code generates a simulated 'pars' data frame:
pars <- as.data.frame(matrix(rnorm(24*42), 24, 42, dimnames=list(NULL, c(
'f.0', 'f.1', 'f.2', 'f.3', 'f.4', 'f.5', 'f.6',
'W.0', 'W.1', 'W.2', 'W.3', 'W.4', 'W.5', 'W.6',
'S.0', 'S.1', 'S.2', 'S.3', 'S.4', 'S.5', 'S.6',
'w.0', 'w.1', 'w.2', 'w.3', 'w.4', 'w.5', 'w.6',
's.0', 's.1', 's.2', 's.3', 's.4', 's.5', 's.6',
'r.0', 'r.1', 'r.2', 'r.3', 'r.4', 'r.5', 'r.6')
)))
jobFun <- function(n) {
m <- matrix(runif(7*n), ncol=7)
m <- sweep(m, 1, rowSums(m), FUN="/")
m
}
pars[1:24,c('f.0', 'f.1', 'f.2', 'f.3', 'f.4', 'f.5', 'f.6')] <- jobFun(24) # generate ellipse frequencies, summing to 1
pars$hour <- 0:23 # Add an 'hour' variable
pars$p0 <- with(pars, f.0) # change to make it zero if < zero!
pars$p1 <- with(pars, f.1 + p0)
pars$p2 <- with(pars, f.2 + p1)
pars$p3 <- with(pars, f.3 + p2)
pars$p4 <- with(pars, f.4 + p3)
pars$p5 <- with(pars, f.5 + p4)
pars$p6 <- with(pars, f.6 + p5)
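The seven cumulative-probability lines above can be collapsed into one row-wise cumsum(). The sketch below assumes only the f.0 to f.6 columns matter for this step, so it leaves out the other parameter columns. The frequencies are re-simulated the same way jobFun() does it, and the names `fCols` and `pCols` are illustrative.

```r
set.seed(1)
fCols <- paste0("f.", 0:6)
pCols <- paste0("p", 0:6)

# Simulated ellipse frequencies: 24 hours x 7 ellipses, each row summing to 1
m <- matrix(runif(24 * 7), ncol = 7)
m <- m / rowSums(m)
pars <- as.data.frame(m)
names(pars) <- fCols
pars$hour <- 0:23

# Running total along each row: p0 = f.0, p1 = f.0 + f.1, ..., p6 = 1
pars[pCols] <- as.data.frame(t(apply(pars[fCols], 1, cumsum)))
```

apply() with MARGIN = 1 returns one column per input row, so the result is transposed before it is stored.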
I start by generating a sequence of POSIXct date-times for a single day, e.g., at 5-minute intervals ('sim'). For each date-time in 'sim', I need to select an ellipse and assign its parameters to the 'sim' data set. I have made additional columns in 'pars' with the cumulative probability of each ellipse, e.g., p0 = f.0, p1 = p0 + f.1, p2 = p1 + f.2, etc. I am going to select a different ellipse for each 5-minute time increment (and then select the parameters for that ellipse). My difficulty is picking out the value of p that belongs to the hour of each time increment.
START <- ISOdate(2022, MONTH, 1, hour=0, min=0)
END <- START + (24*3600) - 1
tseq <- seq(from=START,to=END,by=300)
sim = data.frame(tseq)
sim$Ep <- runif(nrow(sim)) # Generate random vector Ep for ellipse picking
sim$Enum <- with(sim, ifelse( # number identifying ellipse to be used
Ep < pars$p0[which(pars$hour == hour(tseq))], 0, ifelse(
Ep < pars$p1[which(pars$hour == hour(tseq))], 1, ifelse(
Ep < pars$p2[which(pars$hour == hour(tseq))], 2, ifelse(
Ep < pars$p3[which(pars$hour == hour(tseq))], 3, ifelse(
Ep < pars$p4[which(pars$hour == hour(tseq))], 4, ifelse(
Ep < pars$p5[which(pars$hour == hour(tseq))], 5, 6)))))))
...
The result should be a vector (Enum) of integers between 0 and 6 identifying the ellipse to be used at each 5-minute time increment. My program only gives a correct answer at the 0th minute of each hour; there is something wrong with the statement
pars$p[which(pars$hour == hour(tseq))]
which ends up generating NAs for all the other 5-minute increments in the hour. There are 12 increments of 5 minutes in an hour, and the statement
which(pars$hour == hour(tseq))
brings up all 12 at once, instead of one at a time, which is what I need here. Maybe I need a 'for' loop? Any suggestions for fixing this, and for making the above code more compact, will be appreciated.
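The problem is that the logical subscripting is too complicated. pars$hour has 24 values while hour(tseq) has 288 (one per time step), so the == comparison recycles pars$hour, and which() returns positions in a 288-long vector; indexing pars$p0 with positions beyond 24 gives NA. All that is necessary is to change, e.g.,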
pars$p0[which(pars$hour == hour(tseq))]
to
pars$p0[hour(tseq)+1]
and the value of p0 for the hour being simulated will be selected. This works because the rows of pars are ordered by hour 0 to 23, so the row for hour h is row h + 1.
Spector (2008) "Data Manipulation with R" is helpful as usual.
Note that for the question above, the 'lubridate' package is needed for the hour() function, and MONTH must be set (e.g., MONTH = 4) to run the code.
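Here is a more compact version that needs neither the nested ifelse() calls nor lubridate. It looks up each time step's row of cumulative probabilities with hour + 1, then counts how many of p0 to p5 the random draw has reached. This is a sketch under assumptions. The probabilities sit in a plain 24 x 7 matrix `p` (hypothetical, standing in for the p0 to p6 columns of pars), and the hours come from base R's as.POSIXlt() instead of lubridate's hour().

```r
set.seed(42)
# Hypothetical cumulative probabilities: row h + 1 holds p0..p6 for hour h
f <- matrix(runif(24 * 7), nrow = 24)
f <- f / rowSums(f)
p <- t(apply(f, 1, cumsum))

MONTH <- 4  # must be set, as noted above
START <- ISOdate(2022, MONTH, 1, hour = 0, min = 0)
tseq <- seq(from = START, by = 300, length.out = 288)  # one day in 5-minute steps
hr <- as.POSIXlt(tseq)$hour  # base-R replacement for lubridate::hour()

Ep <- runif(length(tseq))  # one uniform draw per time step
P <- p[hr + 1, 1:6]        # p0..p5 for the hour of each time step (288 x 6)
# Ellipse number = how many of p0..p5 the draw has reached (0 to 6)
Enum <- rowSums(Ep >= P)
table(Enum)
```

`Ep >= P` compares Ep[i] with every entry of row i, because Ep has exactly one element per row of P and R recycles it down each column. Counting the thresholds reached gives the same number as applying findInterval() to each row, and it reproduces the ifelse() chain: a draw below p0 gives 0, and a draw at or above p5 gives 6.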
I am trying to do something that I am sure is really simple in R, but I cannot figure it out. I want to run the same equation 6 times, changing the variables within the equation each time.
My data is something like this:
#Rename my data
mydata <- BSC_OnlineSurvey_Salient.Beliefs
summary (mydata)
View(mydata)
##Descriptive stats
sapply(mydata, mean, na.rm = TRUE)
sumstats <- sapply(mydata, mean, na.rm = TRUE)
sumstats
#1st: Rename columns
colnames (mydata)
colnames(mydata)=c("ID", "Understands restocking", "Restocking will increase the No. of crabs", "Increasing the No. of crabs is...", "Restocking will result in more crabs to catch", "More crabs to catch is...", "Restocking will result in more fishers fishing for crabs", "More fishers fishing for crabs is...", "Resocking will result in no change in abundance of crabs", "No change in the abundance of crabs is...","Restocking will increase the fishing pressure on crabs", "Increasing the fishing pressure on crabs is", "Restocking will have an impact on the environment and other species", "Having an impact on the environment and other species is...", "Overall views on restocking")
View(mydata)
#Recode belief evaluation (very unlikely to very likely) from -3..3 to 0..6
Eval1 <- mydata$`Restocking will increase the No. of crabs`
# ... (repeated for all 6 "Eval" variables)
Eval1
Eval1[Eval1 == 3] <- 6
Eval1[Eval1 == 2] <- 5
Eval1[Eval1 == 1] <- 4
Eval1[Eval1 == 0] <- 3
Eval1[Eval1 == -3] <- 0
Eval1[Eval1 == -2] <- 1
Eval1[Eval1 == -1] <- 2
...
Strength1 <- mydata$`Increasing the No. of crabs is...`
Strength2 <- mydata$`More crabs to catch is...`
Strength3 <- mydata$`More fishers fishing for crabs is...`
# ... (repeated for all 6 "Strength" variables)
I do not want to write the same simple equation 6 times. I cannot figure out how to avoid it; I just have a vague idea that it probably involves one of the apply functions or a loop...
My data is a set of variables: Eval1, Eval2, Eval3, ... are on a scale from -3 to 3; Strength1, Strength2, ... are on a scale from 0 to 6.
I want to compute the product of each pair for each row, and then get the mean of each of those products:
Eval1*Strength1
Eval2*Strength2
Ideally without writing, for every pair,
crossprod1 <- mean(Eval1*Strength1, na.rm=TRUE)
crossprod1
If anyone could help with this I would really appreciate it!
Cheers!
Hopefully this gives you some ideas. Cheers!
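# Assumes the columns of `dataset` alternate Eval1, Strength1, Eval2, Strength2, ...
nPairs <- ncol(dataset) / 2
meanTotals <- numeric(nPairs)
for (k in 1:nPairs) {
  # row-wise product of the k-th Eval/Strength pair, averaged over all rows
  meanTotals[k] <- mean(dataset[, 2*k - 1] * dataset[, 2*k], na.rm = TRUE)
}
meanTotals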
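A loop-free alternative keeps the Eval and Strength columns in the data frame, builds their names with paste0(), and lets mapply() walk through the pairs. The sketch below uses a hypothetical three-pair data frame with short column names (Eval1, Strength1, ...). With the survey's long column names, you would write out the two vectors of names by hand instead. Adding 3 performs the same -3..3 to 0..6 recoding as the seven replacement lines in the question.

```r
# Hypothetical survey data: three Eval/Strength pairs (the real survey has six)
set.seed(7)
n <- 10
mydata <- data.frame(
  Eval1 = sample(-3:3, n, replace = TRUE), Strength1 = sample(0:6, n, replace = TRUE),
  Eval2 = sample(-3:3, n, replace = TRUE), Strength2 = sample(0:6, n, replace = TRUE),
  Eval3 = sample(-3:3, n, replace = TRUE), Strength3 = sample(0:6, n, replace = TRUE)
)
mydata$Eval2[3] <- NA  # one missing answer, to show na.rm at work

evalCols <- paste0("Eval", 1:3)
strengthCols <- paste0("Strength", 1:3)

# Same recoding as the seven replacement lines: -3..3 becomes 0..6
mydata[evalCols] <- mydata[evalCols] + 3

# For each pair: row-wise product, then its mean, ignoring missing values
crossprodMeans <- mapply(function(e, s) mean(mydata[[e]] * mydata[[s]], na.rm = TRUE),
                         evalCols, strengthCols)
crossprodMeans
```

mapply() names the result after `evalCols`, so each mean stays labelled with the Eval column it came from.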
How do I extract only the random numbers (CD) for 'Trt' at time point 1?
ns <- 20
ans <- matrix(rep(0,200),nrow=100)
for(k in 1:100)
{
x1=rnorm(ns,0,1)
x2=rnorm(ns,5,5)
x3=rnorm(ns,10,5)
U=c(x1,x2,x3)
simdata=data.frame(CD=U,
Time=factor(rep(c(1,2,3),each=ns)),
treatment=sample(rep(c('Trt','placebo'),ns/2)))
ans[k,]=table(simdata$treatment)
}
simdata
You can do that in multiple ways:
simdata$CD[simdata$Time == 1]
or use subset:
subset(simdata, Time == 1, select = "CD")
The former is recommended for use in scripts, the latter works well in interactive mode (R prompt).
You can subset for both conditions (treatment = "Trt" and Time = "1") like this:
smpl <- simdata[simdata$Time=="1" & simdata$treatment=="Trt",]
If you only want the CD column:
smpl <- simdata$CD[simdata$Time=="1" & simdata$treatment=="Trt"]
I think you want CD for time point "1" and treatment "Trt":
subset(simdata, Time == 1 & treatment == "Trt", select = "CD")
Alternatively, to keep all columns of the data frame:
subset(simdata, Time == 1 & treatment == "Trt")
I am trying to write a nested for loop in R, but am running into problems. I have researched as much as possible but can't find (or understand) the help I need. I am fairly new to R, so any advice on this looping would be appreciated, or if there is a simpler, more elegant way!
I have generated a file of daily temperatures for many many locations (I'll call them sites), and the file columns are set up like this:
year month day unix_time site_a site_b site_c site_d ... on and on
For each site (within each column), I want to run through the temperature values and create new columns (or a new data frame) holding a number (a physiological rate) that corresponds to a range of those temperatures (for example, temperatures below 6.25 degrees have a rate of -1.33, temperatures from 6.25 up to 8.75 have a rate of 0.99, etc.). I have created a loop that does this for a single column of data. For example:
for(i in 1:dim(data)[1]){
  if (data$point_a[i] < 6.25) data$rate_point_a[i] <- -1.33 else
  if (data$point_a[i] >= 6.25 && data$point_a[i] < 8.75) data$rate_point_a[i] <- 0.99 else
  if (data$point_a[i] >= 8.75 && data$point_a[i] < 11.25) data$rate_point_a[i] <- 3.31 else
  if (data$point_a[i] >= 11.25 && data$point_a[i] < 13.75) data$rate_point_a[i] <- 2.56 else
  if (data$point_a[i] >= 13.75 && data$point_a[i] < 16.25) data$rate_point_a[i] <- 1.81 else
  if (data$point_a[i] >= 16.25 && data$point_a[i] < 18.75) data$rate_point_a[i] <- 2.78 else
  if (data$point_a[i] >= 18.75 && data$point_a[i] < 21.25) data$rate_point_a[i] <- 3.75 else
  if (data$point_a[i] >= 21.25 && data$point_a[i] < 23.75) data$rate_point_a[i] <- 1.98 else
  if (data$point_a[i] >= 23.75 && data$point_a[i] < 26.25) data$rate_point_a[i] <- 0.21
}
The above code gives me a new column called "rate_point_a" that holds my physiological rates. What I am having trouble with is nesting this loop inside another loop that runs through all of the columns. I have tried things such as:
for (i in 1:ncol(data)){
#for each row in that column
for (s in 1:length(data)){
if ([i]<6.25) rate1[s]<--1.33 else ...
I guess I don't know how to make the "if else" statement refer to the correct places. I know that I can't add the "rate" columns onto the existing data frame, as this would increase my ncol as I go through the loop, so I need to put them into another data frame (though I don't think this is my main issue). I am going to have many many many points to work through and would rather not do them one at a time, hence my attempt at a nested loop.
Any help would be much appreciated. Here is a link to some sample data if that is helpful. http://dl.dropbox.com/u/17903768/AVHRR_output.txt Thanks in advance!
Use ifelse, which is vectorized:
ifelse(data$point_a < 6.25, -1.33, ifelse(data$point_a < 8.75, 0.99, ifelse(data$point_a < 11.25, 3.31, ..... until finished.
For instance:
datap <- read.table('http://dl.dropbox.com/u/17903768/AVHRR_output.txt', header = TRUE)
# Recode each temperature column (5 to 9); apply() returns a matrix of rates
rateMatrix <- apply(datap[, 5:9], 2, function(x) {
  ifelse(x < 6.25, -1.33,
  ifelse(x < 8.75, 0.99,
  ifelse(x < 11.25, 3.31,
  ifelse(x < 13.75, 2.56,
  ifelse(x < 16.25, 1.81,
  ifelse(x < 18.75, 2.78,
  ifelse(x < 21.25, 3.75,
  ifelse(x < 23.75, 1.98, 0.21))))))))
})
Andres's answer is great for the apply part, to get you through all the "temperature" columns. I'm stuck here without a copy of R (at work) to experiment with, but I suspect that if you create a vector of your cutoff values
xcut <- c(-Inf,6.25,8.75,11.25,...
plus a matching vector of rates (hypothetically named rates, one per interval, starting with -1.33), and for a single temperature x just do
rate <- rates[max(which(x >= xcut))]
you'll have a much simpler bit of code that is also easier to edit. (Note: I added -Inf as the first cutoff to avoid problems with small, even negative, x values :-) )
Here's another way, using just logical indexing:
DAT <- read.table("http://dl.dropbox.com/u/17903768/AVHRR_output.txt",header=TRUE,as.is=TRUE)
recodecolumn <- function(x){
  out <- rep(NA_real_, length(x))  # NA for temperatures outside all ranges
  out[x < 6.25] <- -1.33
  out[x >= 6.25 & x < 8.75] <- 0.99
  out[x >= 8.75 & x < 11.25] <- 3.31
  out[x >= 11.25 & x < 13.75] <- 2.56
  out[x >= 13.75 & x < 16.25] <- 1.81
  out[x >= 16.25 & x < 18.75] <- 2.78
  out[x >= 18.75 & x < 21.25] <- 3.75
  out[x >= 21.25 & x < 23.75] <- 1.98
  out[x >= 23.75 & x < 26.25] <- 0.21
  out
}
NewCols <- apply(DAT[,5:9],2,recodecolumn)
colnames(NewCols) <- paste("rate",1928:1932,sep="_")
DAT <- cbind(DAT,NewCols)
I find findInterval useful in situations like this instead of nested ifelse statements, because it is already vectorized and returns, for each value, the index of the interval it falls into within a vector of cutoff points.
DAT <- read.table("http://dl.dropbox.com/u/17903768/AVHRR_output.txt",header=TRUE,as.is=TRUE)
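recode.fn <- function(x){
  # -Inf as the lowest cut point keeps temperatures below 0 in the first interval
  cut.vec <- c(-Inf, seq(6.25, 26.25, by = 2.5), Inf)
  recode.val <- c(-1.33, 0.99, 3.31, 2.56, 1.81, 2.78, 3.75, 1.98, 0.21)
  cut.interval <- findInterval(x, cut.vec, rightmost.closed = FALSE)
  return(recode.val[cut.interval])  # interval 10 (>= 26.25) has no rate and gives NA
}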
# Add on recoded data to existing data frame
DAT[,10:14] <- sapply(DAT[,5:9],FUN=recode.fn)
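The same findInterval() recode can run over every site column without hard-coded column positions: pick the columns by name and add the results under matching names. This sketch assumes the site columns are named site_a, site_b, ... as in the question's header, and the data frame below is simulated. -Inf as the first break keeps negative temperatures in the first bin. Values of 26.25 or more get NA, because the question defines no rate for them. `temp2rate` is an illustrative name.

```r
# Simulated stand-in for the temperature file: date columns plus three sites
set.seed(3)
data <- data.frame(year = 2000, month = 1, day = 1:10, unix_time = 0,
                   site_a = runif(10, -2, 28), site_b = runif(10, -2, 28),
                   site_c = runif(10, -2, 28))

# Lower edge of each temperature bin, and the rate for each bin (from the question)
breaks <- c(-Inf, seq(6.25, 26.25, by = 2.5))
rates <- c(-1.33, 0.99, 3.31, 2.56, 1.81, 2.78, 3.75, 1.98, 0.21, NA)  # NA: >= 26.25

# findInterval() returns the bin index of every value at once
temp2rate <- function(x) rates[findInterval(x, breaks)]

# Recode every site column and store it as rate_<site name>
siteCols <- grep("^site_", names(data), value = TRUE)
data[paste0("rate_", siteCols)] <- lapply(data[siteCols], temp2rate)
```

Because the new columns are chosen by name, adding more sites only means adding more site_ columns to the file.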