Ratio between two datasets with conditions in R

I have two datasets. One, dye, has two columns, time and dye; the other, sed, has the columns time and sed. I want to compute a new variable, new, as the ratio of sed to dye, with one condition: if either dye or sed is zero, new should be zero; otherwise new should be sed / dye.
The code I have so far is as follows:
dyei94j66 <- read.table("i94 j66 dye time series.Dat",header=T,sep="\t")
names(dyei94j66) <- c("Time","Dye")
head(dyei94j66)
sedi94j66 <- read.table("i94 j66 sediment time series.Dat", header=T,sep="\t")
names(sedi94j66) <- c("Time","Sed")
head(sedi94j66)
newi94j66 <- data.frame(sedi94j66$Sed / dyei94j66$Dye)
head(newi94j66)
# Merge two datasets
totaldata <- merge(dyei94j66,sedi94j66, all=TRUE)
I don't know how to apply the condition when calculating the ratio. Maybe I need to create a function; I am not sure. Right now, when either dye or sed is zero, the value calculated by R is NaN. I want that NaN to become zero when the ratio is calculated.
I hope I explained what I am looking for. Thanks.
The data for i94 j66 sediment time series can be found on https://www.dropbox.com/s/1cbo8wul9nb8zeh/i94%20j66%20sediment%20time%20series.Dat
The data for i94 j66 dye time series can be found on https://www.dropbox.com/s/sqm8rip2xjh3tiu/i94%20j66%20dye%20time%20series.Dat
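A minimal sketch of one way to apply the condition, using ifelse() on the merged data. The two small data frames below are made-up stand-ins for the files (their values are hypothetical), and the column name New is only for illustration.

```r
# Hypothetical stand-ins for the two files; the real data come from read.table()
dyei94j66 <- data.frame(Time = 1:5, Dye = c(0, 2, 4, 0, 5))
sedi94j66 <- data.frame(Time = 1:5, Sed = c(0, 1, 0, 3, 10))

# Merge on the shared Time column so Sed and Dye are aligned row by row
totaldata <- merge(dyei94j66, sedi94j66, by = "Time", all = TRUE)

# New is 0 when either value is zero, otherwise Sed / Dye
totaldata$New <- with(totaldata, ifelse(Dye == 0 | Sed == 0, 0, Sed / Dye))
totaldata
```

Note that in R, 0/0 gives NaN while a nonzero value divided by 0 gives Inf, so testing the inputs for zero covers both cases, whereas replacing only NaN afterwards would leave the Inf values in place.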

Related

How can I use the for function to stack new rasters?

I am trying to create new rasters by calculating the difference in values between existing rasters. I want to find the difference between each of the existing rasters and one specific raster, and then stack all of the resulting rasters. I typed out all of the calculations for 60 rasters, but I want to know a faster way using a for loop.
Change<- stack(
AMTs$X1950-AMTs$X1950,
AMTs$X1951-AMTs$X1950,
AMTs$X1952-AMTs$X1950,
AMTs$X1953-AMTs$X1950,
AMTs$X1954-AMTs$X1950,
AMTs$X1955-AMTs$X1950,
AMTs$X1956-AMTs$X1950,
AMTs$X1957-AMTs$X1950,
AMTs$X1958-AMTs$X1950,
AMTs$X1959-AMTs$X1950,
AMTs$X1960-AMTs$X1950,
AMTs$X1961-AMTs$X1950,
AMTs$X1962-AMTs$X1950,
AMTs$X1963-AMTs$X1950,
AMTs$X1964-AMTs$X1950,
AMTs$X1965-AMTs$X1950,
AMTs$X1966-AMTs$X1950,
AMTs$X1967-AMTs$X1950,
AMTs$X1968-AMTs$X1950,
AMTs$X1969-AMTs$X1950,
AMTs$X1970-AMTs$X1950,
AMTs$X1971-AMTs$X1950,
AMTs$X1972-AMTs$X1950,
AMTs$X1973-AMTs$X1950,
AMTs$X1974-AMTs$X1950,
AMTs$X1975-AMTs$X1950,
AMTs$X1976-AMTs$X1950,
AMTs$X1977-AMTs$X1950,
AMTs$X1978-AMTs$X1950,
AMTs$X1979-AMTs$X1950,
AMTs$X1980-AMTs$X1950,
AMTs$X1981-AMTs$X1950,
AMTs$X1982-AMTs$X1950,
AMTs$X1983-AMTs$X1950,
AMTs$X1984-AMTs$X1950,
AMTs$X1985-AMTs$X1950,
AMTs$X1986-AMTs$X1950,
AMTs$X1987-AMTs$X1950,
AMTs$X1988-AMTs$X1950,
AMTs$X1989-AMTs$X1950,
AMTs$X1990-AMTs$X1950,
AMTs$X1991-AMTs$X1950,
AMTs$X1992-AMTs$X1950,
AMTs$X1993-AMTs$X1950,
AMTs$X1994-AMTs$X1950,
AMTs$X1995-AMTs$X1950,
AMTs$X1996-AMTs$X1950,
AMTs$X1997-AMTs$X1950,
AMTs$X1998-AMTs$X1950,
AMTs$X1999-AMTs$X1950,
AMTs$X2000-AMTs$X1950,
AMTs$X2001-AMTs$X1950,
AMTs$X2002-AMTs$X1950,
AMTs$X2003-AMTs$X1950,
AMTs$X2004-AMTs$X1950,
AMTs$X2005-AMTs$X1950,
AMTs$X2006-AMTs$X1950,
AMTs$X2007-AMTs$X1950,
AMTs$X2008-AMTs$X1950,
AMTs$X2009-AMTs$X1950
)
Is this what you're looking for?
Create a function that subtracts the 1950 layer from every layer from 1950 onward.
minus <- function(dd, cc) {
  return(dd - cc)
}
# Now use overlay() from raster to create the new raster object
# (assuming layers 1 to 60 of AMTs are X1950 through X2009)
change <- overlay(AMTs[[1:60]], AMTs$X1950, fun = minus)
Breakdown:
The x argument of overlay(), AMTs[[1:60]], is passed to minus as dd, and the y argument, AMTs$X1950, is passed as cc.
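Since the question asks specifically about for, here is a minimal sketch of the loop version. It assumes the raster package and uses a tiny three-layer stack as a hypothetical stand-in for AMTs; the name diffs is only for illustration.

```r
library(raster)

# Hypothetical three-layer stack standing in for AMTs (the real one has 60 layers)
r <- raster(nrows = 2, ncols = 2)
AMTs <- stack(setValues(r, 1:4), setValues(r, 2:5), setValues(r, c(4, 4, 4, 4)))
names(AMTs) <- c("X1950", "X1951", "X1952")

# Subtract the X1950 layer from each layer in turn, collecting results in a list
diffs <- vector("list", nlayers(AMTs))
for (i in seq_len(nlayers(AMTs))) {
  diffs[[i]] <- AMTs[[i]] - AMTs[["X1950"]]
}

# stack() accepts a list of RasterLayer objects
Change <- stack(diffs)
names(Change) <- names(AMTs)
```

Preallocating the list and stacking once at the end avoids growing the stack inside the loop.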

Averaging different length vectors with same domain range in R

I have a dataset that looks like the one shown in the code.
What I am guaranteed is that the "(var)x" (domain) of each variable is always between 0 and 1. The "(var)y" (co-domain) can vary; it is also bounded, but within a larger range.
I am trying to average the "(var)y" values across the different variables at common "(var)x" positions.
I would like some kind of selective averaging, but I am not sure how to do this in R.
ax=c(0.11,0.22,0.33,0.44,0.55,0.68,0.89)
ay=c(0.2,0.4,0.5,0.42,0.5,0.43,0.6)
bx=c(0.14,0.23,0.46,0.51,0.78,0.91)
by=c(0.1,0.2,0.52,0.46,0.4,0.41)
qx=c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
qy=c(0.03,0.2,0.52,0.4,0.45,0.48,0.61,0.9)
a<-list(ax,ay)
b<-list(bx,by)
q<-list(qx,qy)
What I would like to have is something like
avgd_x = c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
and an avgd_y built as follows: find the values of ay and by at 0.12 and take the mean of those two values and the corresponding qy value.
The same goes for every other value in the vector with the largest number of elements.
How can I do this in R?
P.S.: This is a toy dataset. My real dataset is spread over several files that I read with a custom function, but the raw data has the form shown in the code above.
Edit:
Some clarification:
avgd_y would have the length of the longest vector. In the case above, avgd_y would be (ay' + by' + qy)/3, where ay' and by' are the vectors c(ay(qx(i))) and c(by(qx(i))) for i from 1 to the length of qx; that is, ay' and by' hold the values of ay and by interpolated at the data points of qx.
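A minimal sketch of the interpolate-then-average idea from the clarification, using base R's approx() for linear interpolation. The names ay_q and by_q are hypothetical, and the choice of rule = 2 is an assumption about what should happen outside each vector's own x range.

```r
ax <- c(0.11, 0.22, 0.33, 0.44, 0.55, 0.68, 0.89)
ay <- c(0.2, 0.4, 0.5, 0.42, 0.5, 0.43, 0.6)
bx <- c(0.14, 0.23, 0.46, 0.51, 0.78, 0.91)
by <- c(0.1, 0.2, 0.52, 0.46, 0.4, 0.41)
qx <- c(0.12, 0.27, 0.36, 0.48, 0.51, 0.76, 0.79, 0.97)
qy <- c(0.03, 0.2, 0.52, 0.4, 0.45, 0.48, 0.61, 0.9)

# Linearly interpolate ay and by at the x positions of the longest vector (qx).
# rule = 2 holds the end values constant outside each vector's own x range
# (an assumption); the default rule = 1 would return NA there instead.
ay_q <- approx(ax, ay, xout = qx, rule = 2)$y
by_q <- approx(bx, by, xout = qx, rule = 2)$y

avgd_x <- qx
avgd_y <- (ay_q + by_q + qy) / 3
```

With more than three variables, the same approx() call could be applied to each one in turn and the results averaged with rowMeans() on the combined columns.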

Bourdet Derivative in R with Smoothing Window

I am calculating pressure derivatives using algorithms from this PDF:
Derivative Algorithms
I have been able to implement the "two-points" and "three-consecutive-points" methods relatively easily using dplyr's lag/lead functions to offset the original columns forward and back one row.
The issue with those two methods is that there can be a ton of noise in the high-resolution data we use. This is why there is the third method, "three-smoothed-points", which is significantly more difficult to implement. There is a user-defined "window width", W, that is typically between 0 and 0.5. The algorithm chooses point_L and point_R as being the first ones such that ln(deltaP/deltaP_L) > W and ln(deltaP/deltaP_R) > W. Here is what I have so far:
#If necessary install DPLYR
#install.packages("dplyr")
library(dplyr)
#Create initial Data Frame
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)
df <- data.frame(elapsedTime,deltaP)
#Shift the elapsedTime and deltaP columns forward and back one row
df$lagTime <- lag(df$elapsedTime,1)
df$leadTime <- lead(df$elapsedTime,1)
df$lagP <- lag(df$deltaP,1)
df$leadP <- lead(df$deltaP,1)
#Calculate the 2 and 3 point derivatives using nearest neighbors
df$TwoPtDer <- (df$leadP - df$lagP) / log(df$leadTime/df$lagTime)
df$ThreeConsDer <- ((df$deltaP-df$lagP)/(log(df$elapsedTime/df$lagTime)))*
((log(df$leadTime/df$elapsedTime))/(log(df$leadTime/df$lagTime))) +
((df$leadP-df$deltaP)/(log(df$leadTime/df$elapsedTime)))*
((log(df$elapsedTime/df$lagTime))/(log(df$leadTime/df$lagTime)))
#Calculate the window value for the current 1 row shift
df$lnDeltaT_left <- abs(log(df$elapsedTime/df$lagTime))
df$lnDeltaT_right <- abs(log(df$elapsedTime/df$leadTime))
Resulting Data Table
If you look at the picture linked above, you will see that, based on a W of 0.1, only row 2 meets this criterion for both the left and the right point. Just FYI, this data set is an extension of the data used in example 2.5 in the referenced PDF.
So, my ultimate question is this:
How can I choose the correct point_L and point_R such that they meet the above criteria? My initial thoughts are some kind of while loop, but being an inexperienced programmer, I am having trouble writing a loop that gets anywhere close to what I am shooting for.
Thank you for any suggestions you may have!
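One way to pick point_L and point_R without an explicit while loop is sketched below. It assumes the window criterion is applied to elapsed time, as in the lnDeltaT_left and lnDeltaT_right columns above, and that elapsedTime is strictly increasing. The names idxL, idxR and smoothDer are hypothetical, and the derivative reuses the weighting of ThreeConsDer with the selected points in place of the nearest neighbors.

```r
elapsedTime <- c(0.09583, 0.10833, 0.12083, 0.13333, 0.14583, 0.1680,
                 0.18383, 0.25583)
deltaP <- c(71.95, 80.68, 88.39, 97.12, 104.24, 108.34, 110.67, 122.29)
W <- 0.1  # window width

lnT <- log(elapsedTime)
n <- length(lnT)

# Nearest earlier point whose log-time distance from row i exceeds W (NA if none)
idxL <- sapply(seq_len(n), function(i) {
  j <- which(lnT[i] - lnT > W)
  if (length(j)) max(j) else NA_integer_
})

# Nearest later point whose log-time distance from row i exceeds W (NA if none)
idxR <- sapply(seq_len(n), function(i) {
  j <- which(lnT - lnT[i] > W)
  if (length(j)) min(j) else NA_integer_
})

# Weighted three-point derivative using the selected left and right points
dL <- lnT - lnT[idxL]   # ln(t / t_L)
dR <- lnT[idxR] - lnT   # ln(t_R / t)
slopeL <- (deltaP - deltaP[idxL]) / dL
slopeR <- (deltaP[idxR] - deltaP) / dR
smoothDer <- (slopeL * dR + slopeR * dL) / (dL + dR)

df <- data.frame(elapsedTime, deltaP, idxL, idxR, smoothDer)
```

Rows with no qualifying point on one side (the first and last rows here) come out as NA, so how to treat the ends of the series is left open.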

Why does regtol.int() resort my X variable in ascending order?

I'm pretty new at R, so I guess I must be doing something wrong. I have a dataset named "series" with two columns, V1=CP and V2=CU, and I want to perform a linear regression with CU as the independent variable, and then calculate tolerance intervals using regtol.int (in the "tolerance" package).
So what I do is this:
fit=lm(V1~V2,series)
inttol=regtol.int(fit,new.x=NULL,side=2,alpha=0.5,P=0.95)
And when I open or view inttol, it appears sorted by the CU value. It's important that the order doesn't change because both CU and CP are time series.
Any help?

R Accumulate equity data - add time and price

I have some data formatted as below. I have done some analysis on this and would like to be able to plot the price development in the same graph as the analyzed data.
This requires me to have the same x-axes for the data.
So I would like to aggregate the "shares" column in increments of, say, 150, and add the "finalprice" and "time" to this.
The aggregation should include the latest time and price, so if the aggregation needs to occur over two or more rows of data then the last row should provide the price and time data.
My question is how to create a new vector with 150 shares per row.
The length of the vector will equal sum(shares)/150.
Is there an easy way to do this? Thanks in advance.
Edit:
I thought about expanding the observations using rep(finalprice, shares) and then getting each 150th value of the expanded vector.
Data sample:
"date","ord","shares","finalprice","time","stock"
20120702,E,2000,99.35,540.84753333,500
20120702,E,28000,99.35,540.84753333,500
20120702,E,50,99.5,542.03073333,500
20120702,E,13874,99.5,542.29411667,500
20120702,E,292,99.5,542.30191667,500
20120702,E,784,99.5,542.30193333,500
20120702,E,13300,99.35,543.04805,500
20120702,E,16658,99.35,543.04805,500
20120702,E,42,99.5,543.04805,500
20120702,E,400,99.4,546.17173333,500
20120702,E,100,99.4,547.07,500
20120702,E,2219,99.3,549.47988333,500
20120702,E,781,99.3,549.5238,500
20120702,E,50,99.3,553.4052,500
20120702,E,1500,99.35,559.86275,500
20120702,E,103,99.5,567.56726667,500
20120702,E,1105,99.7,573.93326667,500
20120702,E,4100,99.5,582.2657,500
20120702,E,900,99.5,582.2657,500
20120702,E,1024,99.45,582.43891667,500
20120702,E,8214,99.45,582.43891667,500
20120702,E,10762,99.45,582.43895,500
20120702,E,1250,99.6,586.86446667,500
20120702,E,5000,99.45,594.39061667,500
20120702,E,20000,99.45,594.39061667,500
20120702,E,15000,99.45,594.39061667,500
20120702,E,4000,99.45,601.34491667,500
20120702,E,8700,99.45,603.53608333,500
20120702,E,3290,99.6,609.23213333,500
I think I got it solved.
expand <- rep(finalprice, shares)
Increment <- expand[seq(from = 1, to = length(expand), by = 150)]
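Note that starting the sequence at 1 picks the first share of each block, whereas the question asks for the latest price and time in each block. A minimal sketch of a variant that picks the share closing each block, without expanding the data, is shown below. It uses the first five rows of the data sample; the names trades, ends and tradeRow are hypothetical, and dropping a trailing partial block is an assumption.

```r
# First five rows of the data sample above
trades <- data.frame(
  shares     = c(2000, 28000, 50, 13874, 292),
  finalprice = c(99.35, 99.35, 99.5, 99.5, 99.5),
  time       = c(540.84753333, 540.84753333, 542.03073333,
                 542.29411667, 542.30191667)
)
block <- 150

# Cumulative share count at the end of each trade
cumShares <- cumsum(trades$shares)

# Share positions that close each full block: 150, 300, 450, ...
# (a trailing partial block is dropped, which is an assumption)
ends <- seq(from = block, to = sum(trades$shares), by = block)

# Row of the trade that contains each block-closing share
tradeRow <- findInterval(ends, cumShares, left.open = TRUE) + 1

aggregated <- data.frame(
  cumshares  = ends,
  finalprice = trades$finalprice[tradeRow],
  time       = trades$time[tradeRow]
)
```

This gives the same prices as rep(finalprice, shares)[ends] but avoids building a vector with one element per share.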
