R code for an EM algorithm with fictitious data:
I am working to translate this to MATLAB, but I'm struggling with the calculation running inside the loop. Any help would be appreciated.
data <- c(-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72,
4.28, 4.92, 5.53, 0.06, 0.48, 1.01, 1.68, 1.80,
3.25, 4.12, 4.60, 5.28, 6.22)
pi <- 0.546   # note: this masks R's built-in constant pi
sigmas1 <- 0.87
sigmas2 <- 0.77
mu1 <- numeric(0)
mu2 <- numeric(0)
r <- numeric(0)
R1 <- matrix(0, 20, 100)
mu1[1] <- 4.62
mu2[1] <- 1.06
for (j in 1:100) {
  for (i in 1:20) {
    r[i] <- pi * dnorm(data[i], mu2[j], sigmas2^(1/2)) /
      ((1 - pi) * dnorm(data[i], mu1[j], sigmas1^(1/2)) +
         pi * dnorm(data[i], mu2[j], sigmas2^(1/2)))
    R1[i, j] <- r[i]
  }
  r
  mu1[j+1] <- sum((1 - r) * data) / sum(1 - r)
  mu2[j+1] <- sum(r * data) / sum(r)
  Muu1 <- mu1[j+1]
  Muu2 <- mu2[j+1]
}
Muu1
Muu2
x11()
layout(matrix(c(1, 2)))
plot(mu1, type="l", main="", xlab="EM Iteration for the Fictitious Data")
plot(mu2, type="l", main="", xlab='EM Iteration for the Fictitious Data')
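For reference, the loop is one iteration of EM for a two-component Gaussian mixture with fixed mixing weight pi and fixed variances sigmas1 and sigmas2 (hence the ^(1/2) passed to dnorm as standard deviations). In the code's notation, each iteration computes the responsibilities and mean updates

```latex
r_i \;=\; \frac{\pi\,\varphi\!\left(x_i;\ \mu_2, \sqrt{\sigma_2^2}\right)}
{(1-\pi)\,\varphi\!\left(x_i;\ \mu_1, \sqrt{\sigma_1^2}\right) \;+\; \pi\,\varphi\!\left(x_i;\ \mu_2, \sqrt{\sigma_2^2}\right)},
\qquad
\mu_1^{(j+1)} = \frac{\sum_i (1-r_i)\,x_i}{\sum_i (1-r_i)},
\qquad
\mu_2^{(j+1)} = \frac{\sum_i r_i\,x_i}{\sum_i r_i}
```

where φ(x; μ, σ) is the normal density with mean μ and standard deviation σ.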
The MATLAB equivalent of R's dnorm function is normpdf. The arguments are the same as in R:
normpdf(X, mu, sigma)
With that, the for loop can easily be adapted. Since normpdf accepts vectors as inputs, you can drop the inner for loop and use a vectorized approach instead. Keep in mind that * and / are matrix multiplication and division in MATLAB; for element-wise operations, use .* and ./ instead.
Note that in MATLAB it is better to preallocate all variables. Since j runs from 1 to 100 but each step assigns index j+1, mu1 and mu2 end up with size 1x101. For r and R1 the sizes are clear, I think.
All together, this gives the following code:
data = [-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72,...
4.28, 4.92, 5.53, 0.06, 0.48, 1.01, 1.68, 1.80,...
3.25, 4.12, 4.60, 5.28, 6.22];
pi = 0.546;  % note: this assignment shadows MATLAB's built-in pi
sigmas1 = 0.87;
sigmas2 = 0.77;
mu1 = zeros(1,101);
mu2 = zeros(1,101);
r = zeros(1,20);
R1 = zeros(20,100);
mu1(1) = 4.62;
mu2(1) = 1.06;
for j=1:100
r= pi*normpdf(data,mu2(j),sigmas2^(1/2)) ./ ...
((1-pi)*normpdf(data,mu1(j),sigmas1^(1/2)) + ...
pi*normpdf(data,mu2(j),sigmas2^(1/2)));
R1(:,j) = r;
mu1(j+1) = sum((1-r).*data)/sum(1-r);
mu2(j+1) = sum(r.*data)/sum(r);
end
figure;
subplot(1,2,1);
plot(mu1);
subplot(1,2,2);
plot(mu2);
If this doesn't work correctly for you, or you have any questions on the code, feel free to comment.
I would like to run a recursive regression using my variables residential_ddiff and interest_diff, thus testing the stability of the coefficients of my variable interest_diff.
The issue occurs when the recursive regression is run on the first lag over the window of [i:(i+10)] observations; I keep getting the same error:
Error in merge.zoo(residential_ddiff[i], L(interest_diff, 1)[i:(i + 10)], :
all(sapply(args, function(x) is.zoo(x) || !is.plain(x) || (is.plain(x) && .... is not TRUE
Both time series are n=91 and stored as ts objects. I've tried using both ts and numeric objects in my loop.
I've attached a screenshot of my code.
Grateful for any help, thank you.
I've tried lots of different options, both coercing with the as.zoo() function and defining the current observation of residential_ddiff[i], but the same error keeps occurring.
Reproducible example:
library(dynlm)
library(dplyr)  # needed for mutate_all() below
# Creating two datasets
data_1 <- c(3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832, 3.2705, -1.9710, 1.3821, 1.3194, -0.8008, 0.2832)
data_2 <- c(1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73, 1.41, 0.33, 0.32, 1.53, -1.55, 0.73)
# Storing data in a dataframe
df <- data.frame(data_1, data_2)
# Making sure the dataframe are numeric
df1 <- mutate_all(df, function(x) as.numeric(as.character(x)))
# Creating variable to store coefficients
estimate.store <- matrix(ncol = 6, nrow = nrow(df1)-3)
# Loop begins
for (i in 1:(nrow(df1-3))) {
# Creating af dynamic linear regression with data 1 and the rekursive values of i:(i+3) for the first lag of data_2. Here the main issue arises, I think.
estimation.store <- dynlm(df1$data_1[i] ~ L(df1$data_2,1)[i:(i+3)], as.zoo(df1))
estimate.store[i,] <- c(estimation.store$coef[1], confint(estimation.store)[1,], estimation.store$coef[2], confint(estimation.store)[1,])
}
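One way to sidestep the merge.zoo error is to avoid indexing inside the dynlm formula altogether: build the lag once as a plain vector and run the windowed regressions with base lm(). A minimal sketch against the reproducible data above (the window length of 4 mirrors i:(i+3); variable names are assumptions):

```r
# first lag of data_2, built once as a plain vector
x_lag <- c(NA, data_2[-length(data_2)])
win <- 4
est <- matrix(NA, nrow = length(data_1) - win + 1, ncol = 2)
for (i in 1:(length(data_1) - win + 1)) {
  idx <- i:(i + win - 1)
  fit <- lm(data_1[idx] ~ x_lag[idx])  # lm() drops the leading NA row automatically
  est[i, ] <- coef(fit)
}
```

Each row of est then holds the intercept and lag coefficient for one window, which is what the stability check needs.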
I am a Python user, and am fairly new to R. I am trying to pass two subsets of a df to a pearson_def function in order to calculate the correlation between the two. Please find the first 100 rows of the tibble below:
texas_biz <- tibble::tibble(
countyfips = c(48001, 48003, 48005, 48007, 48013, 48015, 48019, 48021, 48025,
48027, 48029, 48031, 48035, 48037, 48039, 48041, 48043, 48049,
48051, 48053, 48055, 48057, 48061, 48063, 48067, 48071, 48073,
48085, 48089, 48091, 48093, 48097, 48099, 48113, 48121, 48123,
48133, 48135, 48139, 48141, 48143, 48145, 48147, 48149, 48157,
48161, 48163, 48165, 48167, 48171, 48181, 48183, 48185, 48187,
48189, 48193, 48199, 48201, 48203, 48209, 48213, 48215, 48217,
48219, 48221, 48223, 48225, 48227, 48231, 48233, 48237, 48239,
48241, 48245, 48249, 48251, 48255, 48257, 48259, 48265, 48273,
48277, 48281, 48285, 48287, 48289, 48291, 48293, 48299, 48303,
48307, 48309, 48321, 48323, 48325, 48329, 48331, 48337, 48339,
48341),
merchants_all = c(-0.214, -0.125, -0.324, -0.614, -0.361, -0.3, -0.0265, -0.289,
-0.398, -0.293, -0.375, -0.52, -0.414, -0.275, -0.368, -0.282,
-0.31, -0.368, -0.261, -0.228, -0.412, -0.217, -0.273, -0.49,
-0.0355, -0.064, -0.245, -0.214, -0.2, -0.16, -0.431, -0.394,
-0.351, -0.22, -0.313, -0.194, -0.566, -0.376, -0.241, -0.258,
-0.255, 0.251, -0.295, -0.276, -0.348, -0.318, -0.26, -0.379,
-0.37, -0.318, -0.278, -0.234, -0.352, -0.34, -0.195, -0.162,
-0.396, -0.362, -0.241, -0.325, -0.172, -0.246, -0.109, -0.312,
-0.26, -0.163, -0.125, -0.405, -0.237, -0.0109, -0.358, -0.533,
-0.34, -0.292, -0.171, -0.227, -0.273, -0.278, -0.344, -0.188,
-0.457, -0.21, -0.374, -0.39, -0.162, -0.371, -0.231, -0.019,
0.00933, -0.291, -0.307, -0.226, -0.408, -0.343, -0.343, -0.306,
0.223, -0.45, -0.242, -0.251)
)
I am trying to subset the df by county, then take the rows from the current county iteration of merchants_all and the rows from the next county iteration of merchants_all and test for correlation between the two times series using pearson's r. Please find what I've tried below:
lst <- list(unique(texas_biz$countyfips))
for(i in 1:length(lst)) {
obj <- tryCatch({
x <- lst[[i]]
a <- texas_biz[which(texas_biz$countyfips %in% x),]
y <- lst[[i+1]]
y <- ifelse(is.null(y), NA, y)
b <- texas_biz[which(texas_biz$countyfips %in% y),]
if (length(a$merchants_all == b$merchants_all)) {
print(pearson_def(a$merchants_all, b$merchants_all)) }
else {
print(length(a$merchants_all), length(b$merchants_all))
}
}, error = function(e) NA)
print(obj)
}
I'm just getting a printout of NA, which I assume occurs because the loop attempts to index outside the list(unique(texas_biz$countyfips)) list. I've tried to implement a tryCatch, but I must be doing something wrong... Any help would be greatly appreciated!
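A likely culprit: list(unique(texas_biz$countyfips)) is a list of length 1 whose only element is the whole vector of county codes, so the loop body runs once and lst[[i+1]] errors immediately (which the tryCatch then converts to NA). A sketch of a fix, using the helper defined below and keeping a plain vector instead:

```r
lst <- unique(texas_biz$countyfips)        # plain vector of county codes
for (i in 1:(length(lst) - 1)) {           # stop before the last county
  a <- texas_biz[texas_biz$countyfips == lst[i], ]
  b <- texas_biz[texas_biz$countyfips == lst[i + 1], ]
  # compare the lengths themselves, not length(a == b)
  if (length(a$merchants_all) == length(b$merchants_all)) {
    print(pearson_def(a$merchants_all, b$merchants_all))
  }
}
```

This also fixes the original condition length(a$merchants_all == b$merchants_all), which tests the length of a comparison vector rather than comparing the two lengths.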
Please find the helper functions being used in the loop below:
average <- function(x) { # x is a numeric vector
mean(x)
}
pearson_def <- function(x, y) {
n = length(x)
# insert a check for equivalence of length between x and y in R
avg_x <- average(x)
avg_y <- average(y)
diffprod <- 0
xdiff2 <- 0
ydiff2 <- 0
for(i in 1:n) {
xdiff <- x[i] - avg_x
ydiff <- y[i] - avg_y
diffprod <- diffprod + (xdiff * ydiff)
xdiff2 = xdiff2 + (xdiff * xdiff)
ydiff2 = ydiff2 + (ydiff * ydiff)
}
diffprod / sqrt(xdiff2 * ydiff2)
}
Thank you!
In R, I'm trying to interactively identify the bin value in a histogram using the mouse. I think I need something equivalent to the identify() function for scatterplots, but identify() doesn't seem to work for histograms.
Use locator() to find the points, then look up which interval each value sits in, make sure it is less than the y-value for the bar, then return the count:
set.seed(100)
h <- hist(rnorm(1:100))
# use locator() when doing this for real; I'm going to use a saved set of points
#l <- locator()
l <- list(x = c(-2.22, -1.82, -1.26, -0.79,-0.57, -0.25, 0.18, 0.75,
0.72, 1.26), y = c(1.46, 7.81, 3.79, 9.08, 17.11, 11.61, 15,
17.96, 5.9, 3.37))
# for debugging purposes - the nth value of the output should match where
# the n value is shown on the histogram
text(l, labels=1:10, cex=0.7, font=2)
fi <- findInterval(l$x, h$breaks)
sel <- (l$y > 0) & (l$y < h$counts[fi])
replace(h$counts[fi], !sel, NA)
#[1] 3 NA 9 14 NA 22 20 NA 13 7
I have a vector as follows
vec <- c(0, -0.072, -0.092, -0.092, -0.222, -0.445, -0.345, -0.031,
0.016, 0.158, 0.349, 0.749, 1.182, 1.289, 1.578, 1.767, 1.621,
1.666, 1.892, 1.866, 1.821, 1.702, 1.69, 1.53, 1.38, 1.494, 1.833,
2.392, 2.502, 2.921, 3.363, 3.698, 3.645, 3.89, 3.987, 4.066,
3.963, 3.749, 3.512, 3.259, 3.153, 2.972, 2.918, 2.93, 2.719,
2.458, 2.275, 2.346, 2.588, 2.774, 2.607, 2.336, 1.799, 1.365,
1.025, 0.379, -0.087, -0.765, -1.19, -1.423, -1.751, -1.965,
-1.907, -1.919, -1.848, -1.772, -1.49, -1.19, -1.104, -1.138,
-1.054, -1.139, -1.269, -1.429, -1.56, -1.543, -1.364, -1.318,
-1.094, -1.061, -0.918, -0.861, -0.913, -0.767, -0.615, -0.532,
-0.615, -0.688, -0.75, -0.724, -0.755, -0.685, -0.752, -0.863,
-0.944, -1.004, -1.02, -1.041, -1.073, -1.392)
The following code scales this vector between 1 and -1 perfectly fine.
scale <- function(input)
{
min_val = min(input, na.rm = T)
max_val = max(input, na.rm = T)
average = (min_val + max_val) / 2
range = (max_val - min_val) / 2
normalized_x = (input - average) / range
return(normalized_x)
}
However, I want to scale this vector from -1 to 1 while keeping the midpoint at 0.
Can someone please improve the above function to center this scaling around 0?
Thanks!
Calling this operation "normalization" is rather confusing; the correct term is scaling. Normalization would mean you have transformed the values to something resembling a Normal distribution. (There is also a base R function named scale.)
This will scale the values below 0 to the range [-1, 0) and the values above 0 to the range (0, 1], which is what I understand to be the goal:
c( -vec[vec<0]/min(vec), vec[vec>=0]/max(vec) )
They are not in the original order, however. If the order matters, apply ifelse to the full vector; subsetting inside ifelse recycles the shorter yes/no vectors and scrambles the result:
newvec <- ifelse(vec < 0, -vec/min(vec), vec/max(vec))
#-------------
> sum(newvec < 0)
[1] 51
> sum(newvec > 0)
[1] 48
> sum(newvec == 0)
[1] 1
I have a data frame with two columns: (1) datetimes and (2) streamflow values. I would like to create a 3rd column with indicator values to find sudden increases (usually a 0 but it is a 1 when the streamflow shows a big increase).
datetime <- as.POSIXct(c(1557439200, 1557440100, 1557441000, 1557441900,1557442800,
1557443700, 1557444600, 1557445500, 1557446400, 1557447300, 1557448200, 1557449100, 1557450000, 1557450900,
1557451800, 1557452700, 1557453600, 1557454500, 1557455400, 1557456300, 1557457200, 1557458100, 1557459000), origin = "1970-01-01")
streamflow <- c(0.35, 0.35, 0.36, 0.54, 1.0, 2.7, 8.4, 9.3, 6.2, 3.8, 4.7,
2.91, 2.01, 1.65, 1.41, 1.12, 0.95, 0.62, 0.52, 0.53, 0.53, 0.44, 0.35)
library(data.table)
data <- data.table(as.POSIXct(datetime), as.numeric(streamflow))
I am trying to create a function that would identify the datetime of where it jumps from 0.5 to 1 because that is when the event starts. It would then stop indicating it is an event when the streamflow goes below a certain threshold.
My current idea is a function that compares the local slope between two consecutive streamflow points to the slope of all streamflow values within some window, but I don't really know how to write that. Or maybe there is a better way to do what I am trying to do.
data = data[, delta := (V2 - shift(V2))/shift(V2)][  # shift() is data.table's lag
  , ind_jump := delta > 0.5
]
indices <- data[ind_jump==TRUE, V1]
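To also mark when an event ends, one hedged extension of the data.table above flags rows from the first detected jump until the flow drops back below a threshold (the 0.5 cutoff here is an assumption):

```r
thresh <- 0.5
# event = 1 from the first detected jump onward, while flow stays at or above thresh
data[, event := as.integer(cummax(!is.na(ind_jump) & ind_jump) & V2 >= thresh)]
```

With the sample streamflow this turns on at the 0.36 -> 0.54 jump and turns off once V2 falls below 0.5 in the recession limb.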
Not related to this, but note that because of floating-point rounding R gives
(0.54 - 0.36)/0.36 > 0.5
[1] TRUE
while
0.18/0.36 > 0.5
[1] FALSE
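This is standard floating-point behaviour rather than an R quirk: 0.54 - 0.36 is not exactly 0.18 in binary floating point. Printing more digits makes it visible, and comparing with a tolerance avoids the surprise (base R only):

```r
# the subtraction result is slightly above 0.18
sprintf("%.20f", 0.54 - 0.36)
sprintf("%.20f", 0.18)
# tolerance-based comparison instead of exact >:
isTRUE(all.equal((0.54 - 0.36)/0.36, 0.5))  # TRUE
```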