My question is pretty simple, but I'm a new R user...
So I have a function that takes an argument. I want to put the result into a vector with a specific name for each call to the function.
My function
Window_length <- function(x, y) {
  first_interval   <- length(which(x <= -1.5))
  second_interval  <- length(which(x <= -1   & x > -1.5))
  third_interval   <- length(which(x <= -0.5 & x > -1))
  fourth_interval  <- length(which(x <= 0    & x > -0.5))
  fifth_interval   <- length(which(x <= 0.5  & x > 0))
  sixth_interval   <- length(which(x <= 1    & x > 0.5))
  seventh_interval <- length(which(x <= 1.5  & x > 1))
  eighth_interval  <- length(which(x <= 2    & x > 1.5))
  ninth_interval   <- length(which(x > 2))
  y <<- c(
    rep("1", first_interval),
    rep("2", second_interval),
    rep("3", third_interval),
    rep("4", fourth_interval),
    rep("5", fifth_interval),
    rep("6", sixth_interval),
    rep("7", seventh_interval),
    rep("8", eighth_interval),
    rep("9", ninth_interval))
}
So when I call Window_length, I want to put the results into a given variable, for example:
`Window_length(data,output_result)`
In output_result I expect to have the "y" values.
Also, I'm sure that my code is not perfect at all. If someone could help me optimize it a little bit, that would be nice.
I'm trying to do all of this because I need to make a plot of the data with ggplot. My values are between -4 and +3, and I want to create a plot with specific windows ( <-1.5 / -1.5:-1 / -1:-0.5 / -0.5:0 / 0:1 / 1:1.5 / 1.5:2 / >2 ).
My data:
data <- c(-3.7865964, -3.7865964, -3.1975372, -3.1975372, -3.169925, -3.1292830, -3.1292830, -2.6629650, -2.4739312, -2.4739312, -2.3536370, -2.3536370, -2.2446224, -2.2446224, -2.0000000, -1.8744691, -1.8744691, -1.7705182, -1.7655347, -1.7655347, -1.7472339, -1.7472339, -1.7062688, -1.7036070, ..........., 1.8744691, 1.8744691, 2.0000000, 2.2446224, 2.2446224, 2.3536370)
length(data)=21685
To_Be_Plot = data.frame(data,y)
fig1<-ggplot(To_Be_Plot, aes(x=y, y=data))+geom_boxplot()
Expected results: (boxplot image in the original post)
Thanks everyone
One solution, if I understood the issue correctly, would be to use the function cut:
x <- seq(-2.9, 3, l=5000)
FC <- sin(x*pi) + x^2/10 + 0.1*rnorm(5000)
dat <- data.frame(x, FC)
dat$windows <- cut(dat$x, breaks = seq(-3, 3, by=1))
ggplot(data=dat, aes(x, FC, color=windows)) +
geom_boxplot() + theme_bw()
The resulting plot shows one boxplot per window.
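For the exact breakpoints used in the question's Window_length function, a minimal sketch (assuming the data vector is called data, as in the post) could look like this; cut returns the interval label for every value directly, so no counting, no <<-, and no manual rep calls are needed:
library(ggplot2)

# breakpoints from the question: below -1.5, then steps of 0.5 up to 2, then above 2
brks <- c(-Inf, seq(-1.5, 2, by = 0.5), Inf)

To_Be_Plot <- data.frame(data = data,
                         window = cut(data, breaks = brks))

ggplot(To_Be_Plot, aes(x = window, y = data)) +
  geom_boxplot()
The default right = TRUE gives intervals of the form (a, b], which matches the x <= b & x > a conditions in the original function.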
I'm using the stats::filter function in R in order to understand ARIMA simulation (as in the function stats::arima.sim) and estimation. I know that stats::filter applies a linear filter to a vector or time series, but I'm not sure how to "unfilter" my series.
Consider the following example: I want to apply a recursive filter with coefficient 0.7 to my series x = 1:5 (which essentially generates an AR(1) with phi = 0.7). I can do so by:
x <- 1:5
ar <-0.7
filt <- filter(x, ar, method="recursive")
filt
Time Series:
Start = 1
End = 5
Frequency = 1
[1] 1.0000 2.7000 4.8900 7.4230 10.1961
This essentially returns c(y1, y2, y3, y4, y5), where:
y1 <- x[1]
y2 <- x[2] + ar*y1
y3 <- x[3] + ar*y2
y4 <- x[4] + ar*y3
y5 <- x[5] + ar*y4
Now imagine I have the y = c(y1,y2,y3,y4,y5) series. How can I use the filter function to return me the original series x = 1:5?
I can write a code to do it like:
unfilt <- rep(NA, 5)
unfilt[1] <- filt[1]
for(i in 2:5){
unfilt[i] <- filt[i] - ar*filt[i-1]
}
unfilt
[1] 1 2 3 4 5
But I want to use the filter function to do this, instead of writing my own loop. How can I do so? I tried stats::filter(filt, -ar, method="recursive"), which returns [1] 1.0000 2.0000 3.4900 4.9800 6.7101, not what I want.
stats::filter used with the recursive option is a particular case of an ARMA filter.
a[1]*y[n] + a[2]*y[n-1] + … + a[na]*y[n-na+1] = b[1]*x[n] + b[2]*x[n-1] + … + b[nb]*x[n-nb+1]
You could implement this filter with the signal package, which allows more options than stats::filter:
a = c(1,-ar)
b = 1
filt_Arma <- signal::filter(signal::Arma(b = b, a = a),x)
filt_Arma
# Time Series:
# Start = 1
# End = 5
# Frequency = 1
# [1] 1.0000 2.7000 4.8900 7.4230 10.1961
identical(filt,filt_Arma)
# [1] TRUE
Reverting an ARMA filter can be done by switching b and a, provided that the inverse filter stays stable (which is the case here):
signal::filter(signal::Arma(b = a, a = b),filt)
# Time Series:
# Start = 2
# End = 6
# Frequency = 1
# [1] 1 2 3 4 5
This corresponds to switching numerator and denominator in the z-transform:
Y(z) = b(z)/a(z) X(z)
X(z) = a(z)/b(z) Y(z)
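As a side note, the inverse can also be obtained with stats::filter alone by applying the inverse filter in its convolution (moving-average) form; a small sketch, where only the first value has to be filled in by hand because there is no y[0]:
# x[n] = y[n] - ar*y[n-1], i.e. a convolution filter with coefficients c(1, -ar)
unfilt <- stats::filter(filt, filter = c(1, -ar), method = "convolution", sides = 1)
unfilt[1] <- filt[1]   # y[1] = x[1], since the recursion starts without a y[0]
unfilt
# [1] 1 2 3 4 5   (printed as a time series)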
I have the following model I want to calculate (the model equation and the definitions of its terms were given as images in the original post).
My data is the following:
D0= 100
variance = 3
T = 20
uf = 0.3
ue = 0.7
CRRA = 0.1
theta = 0.7
Q = 1
X1=1
# DIVIDENDS:
epsilon <- c(0,0, 6, rep(0,18))
DD_t <- D0
for (t in 2:(T+1)) {
DD_t[t] <- epsilon[t]+DD_t[t-1]
}
# PRICE in t0 t1 t2
PP_t<- c(rep(0,21))
PP_t[1] <- DD_t[1] - 0.1*(3^2)*(18 + (1/0.3))
PP_t[2] <- DD_t[2] + (ue/uf)*theta^(1-1) - CRRA*(variance^2)*Q*(T-1-1 + (1/uf))
PP_t[3] <- DD_t[3] + (ue/uf)*theta^(2-1) - CRRA*(variance^2)*Q*(T-2-1 + (1/uf))
Up until here, everything is correct. Now I want to calculate P_3 to P_20, and this is where the following code unfortunately yields the wrong solution.
# PRICE in t3 t4 to t20
for (t in 3:(T-1)) {
t1 <- (t + 1)
XX_t <- 0
for (k in 1:(t-1)) {
XX_t <- XX_t + (theta^(k-1))*(PP_t[t1-k]-PP_t[t1-k-1])
}
XX_t <- (1-theta)*XX_t + theta^(t-1)*1
PP_t[t1] <- DD_t[t1] + XX_t*(ue/uf) - CRRA*(variance^2)*Q*(T - t - 1 + (1/uf))
}
What this gives me is:
PP_t
[1] 80.80000 83.13333 89.33333 95.22667 98.26400 98.66093 97.36051 95.53206 94.15252 93.77109 94.47277 95.99211 97.89801 99.77274 101.33323 102.47560 103.25258
[18] 103.81069 104.31816 104.90794 0.00000
However, the correct solution looks like this:
[1] 80.80000 83.13333 89.33333 94.08333 96.66333 97.22033 96.46413 95.28555 94.43488 94.33895 95.07011 96.43089 98.09417 99.74021 101.15038 102.24238 103.05407
[18] 103.69603 104.29659 104.95801 105.73360
Unfortunately, I cannot find the mistake in my code. It must be a slight one, since the values are not that far off.
I'm new to R and programming in general, and I'm struggling with a for-loop for building the lx function in a life table.
I have the age function x, the death function qx (the probability that someone aged exactly x will die before reaching age x+1), and the surviving function px = 1 - qx.
I want to write a function that returns a vector with all the lx values from first to last age in my table. The function is simple...
I've defined cohort = 1000000. The first age in my table is x = 5, so, considering x = 5...
l_(x) = cohort
And, from now on, l_(x+n) = l_(x+n-1)*p_(x+n-1)
I've searched about for-loops, and I can only get my code working for lx[1] and lx[2], and I get nothing for lx[n] if n > 2.
I wrote that function:
living_x <- function(px, cohort){
result <- vector("double", length(px))
l_x <- vector("double", length(px))
for (i in 1:length(px)){
if (i == 1){
l_x[i] = cohort
}
else l_x[i] = l_x[i-1]*px[i-1]
result[i] = l_x
print(result)
}
}
When I run it, I get several outputs (more than length(px)) and "There were 50 or more warnings (use warnings() to see the first 50)".
When I run warnings(), I get "In result[i] <- l_x : number of items to replace is not a multiple of replacement length" for every number.
Also, everything else I try either gives me different errors or only calculates lx[1] and lx[2]. I know there's something really wrong with my code, but I still couldn't identify it. I'd be glad if someone could give me a hint about what to change.
Thank you!
Here's an approach using dplyr from the tidyverse packages to calculate lx from px. The same thing can be done in base R with excerpt$lx <- 100000 * cumprod(c(1, head(1 - excerpt$qx, -1))).
lx is provided in the babynames package, so we can check our work:
library(tidyverse)
library(babynames)
# Get excerpt with age, qx, and lx.
excerpt <- lifetables %>%
filter(year == 2010, sex == "F") %>%
select(x, qx_given = qx, lx_given = lx)
excerpt
# A tibble: 120 x 3
x qx_given lx_given
<dbl> <dbl> <dbl>
1 0 0.00495 100000
2 1 0.00035 99505
3 2 0.00022 99471
4 3 0.00016 99449
5 4 0.00012 99433
6 5 0.00011 99421
7 6 0.00011 99410
8 7 0.0001 99399
9 8 0.0001 99389
10 9 0.00009 99379
# ... with 110 more rows
Using that data to estimate lx_calc:
est_lx <- excerpt %>%
mutate(px = 1 - qx_given,
cuml_px = cumprod(lag(px, default = 1)),
lx_calc = cuml_px * 100000)
And finally, comparing the given lx visually with the one calculated from px: they match exactly.
est_lx %>%
gather(version, val, c(lx_given, lx_calc)) %>%
ggplot(aes(x, val, color = version)) + geom_line()
I managed to do it in a very simple way after thinking about it for a few more minutes.
lx <- c()
lx[1] <- 10**6
for (i in 2:length(px)) {
  lx[i] <- lx[i-1] * px[i-1]
}
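For completeness, a loop-free sketch of the same recursion (my own variant, not from the original posts): since lx is just the cohort multiplied by the cumulative product of the survival probabilities up to the previous age, cumprod can replace the loop.
living_x <- function(px, cohort = 10**6) {
  # lx[1] = cohort, lx[i] = cohort * px[1] * ... * px[i-1]
  cohort * cumprod(c(1, head(px, -1)))
}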
I have a data frame of 222 observations and 2 variables: landslide_z_predicted and ytest (the observed occurrences).
landslide_z_predicted takes values from 0 to 1, while ytest takes on the integers 0 or 1.
My task is to find out how many of the predicted positives/negatives were predicted correctly.
if z < 0.5 --> predicted negative
if z > 0.5 --> predicted positive
if y = 0 --> observed negative
if y = 1 --> observed positive
The scenarios are:
a) if z < 0.5 and y = 0 --> prediction is correct
b) if z <0.5 and y = 1 --> prediction is wrong
c) if z > 0.5 and y = 1 --> prediction is correct
d) if z > 0.5 and y = 0 --> prediction is wrong
I have placed my 222 observations in a data.frame format
combined_predicitons <- data.frame(landslide_z_predicted, ytest)
How can I extract the number of occurrences of each scenario?
Hi,
I found one solution for your problem with the dplyr package. Here is the code:
library(dplyr)
# generate sample data
df <- data.frame(landslide_z_predicted=runif(75), y=sample(c(0, 1), 75, replace=T))
# add is_correct and case variables to the data frame
df <- df %>%
mutate(is_correct = case_when((landslide_z_predicted < 0.5) & !y ~ TRUE,
(landslide_z_predicted >= 0.5) & y ~ TRUE,
TRUE ~ FALSE)) %>%
mutate(case = case_when((landslide_z_predicted < 0.5) & !y & is_correct ~ "case_01",
(landslide_z_predicted < 0.5) & y & !is_correct ~ "case_02",
(landslide_z_predicted >= 0.5) & y & is_correct ~ "case_03",
(landslide_z_predicted >= 0.5) & !y & !is_correct ~ "case_04"))
# count the occurrences of the cases
df %>% select(case) %>% group_by(case) %>% summarize(count=n())
First I generate a variable called is_correct, which is TRUE/FALSE depending on whether the prediction was correct. In the second mutate I list all the cases from your question and name them "case_01", "case_02", ... With that in hand I can group the data frame by case and count the occurrences.
If I understand correctly, you want to make a confusion matrix.
In order to make it, I can suggest this:
1 - Change the z values into 1 or 0 according to your threshold:
Since your threshold is 0.5, you could use round.
combined_predicitons$landslide_z_predicted_dicotomy = round(combined_predicitons$landslide_z_predicted)
Alternatively, use ifelse: if the predicted value is at or above the threshold it outputs 1, and 0 otherwise.
threshold = 0.5
combined_predicitons$landslide_z_predicted_dicotomy = ifelse(combined_predicitons$landslide_z_predicted >= threshold, 1, 0)
2 - Create the table
table(combined_predicitons$ytest, combined_predicitons$landslide_z_predicted_dicotomy)
With this, you'll have the number of occurrences of each scenario.
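A small sketch with labelled dimensions (assuming the column names from the question) makes the four scenarios easier to read off the table; correct predictions end up on the diagonal:
threshold <- 0.5

# label every observation by predicted and observed class
pred_class <- ifelse(combined_predicitons$landslide_z_predicted >= threshold,
                     "pred. positive", "pred. negative")
obs_class  <- ifelse(combined_predicitons$ytest == 1,
                     "obs. positive", "obs. negative")

# 2x2 confusion matrix: the diagonal counts the correct predictions
table(observed = obs_class, predicted = pred_class)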
I have the following data, which shows the values for 5 different cohorts of patients (3 patients in each cohort):
dat <- data.frame(Cohort=c(1,1,1, 2,2,2, 3,3,3, 4,4,4, 5,5,5),
LEN_Dose=c(15,15,15, 25,25,25, 15,15,15, 10,10,10, 10,10,10),
DLT=c("N","N","N", "Y","Y","N", "Y","N","Y", "N","N","Y", "N","N","Y"))
I would like to modify the cohort levels to be +/- 0.2 of the main cohort number so they don't sit on top of one another in a graph. I can achieve what I want like this:
dat$Cohort <- dat$Cohort-0.2
dat$Cohort <- ifelse(duplicated(dat$Cohort), dat$Cohort+0.2, dat$Cohort)
dat$Cohort <- ifelse(duplicated(dat$Cohort), dat$Cohort+0.2, dat$Cohort) # have to run this twice as there are 3 patients
So the result is:
head(dat)
# Cohort LEN_Dose DLT
# 0.8 15 N
# 1.0 15 N
# 1.2 15 N
# 1.8 25 Y
# 2.0 25 Y
# 2.2 25 N
But I'm wondering if there's a better way to do this? E.g. somehow inputting the base cohort level so that some function automatically works out the three values I need?
The point is to eventually graph the data with this ggplot call:
ggplot(aes(x=Cohort, y=as.numeric(LEN_Dose)), data = dat) +
ylab("Dose Level\n") +
xlab("\nCohort") +
ggtitle("\n") +
scale_y_continuous(breaks = c(5, 10, 15, 25),
label = c("1.2mg/kg\n5mg", "1.2mg/kg\n10mg", "1.8mg/kg\n15mg", "1.8mg/kg\n25mg")) +
scale_fill_manual(values = c("white", "darkred"),
name="Had DLT") +
geom_line(colour="grey20", size=1) +
geom_point(shape=23, size=6, aes(fill=DLT), stroke=1.1, colour="grey20") + # 21 for circles
theme_classic() +
theme(legend.box.margin=margin(c(0,0,0,-10))) +
expand_limits(y=c(5,25))
EDIT: I have tried position = position_jitter, position = position_dodge and all the other types of positions within ggplot itself, but they don't space the points equally or in any particular order, which is why I'm trying to modify the dataframe itself
How about writing your own jitter function? Something like:
jitterit <- function(xTojitter = dat$Cohort, howMuchjitter = 0.2) {
  x <- xTojitter
  uni <- unique(x)
  for (i in seq_along(uni)) {
    if (is.na(uni[i])) next                    # leave NAs untouched
    idx <- which(x == uni[i] & !is.na(x))      # positions holding this value
    n <- length(idx)
    if (n %% 2 == 1) {
      if (n == 1) {
        x[idx] <- uni[i]                       # a single point stays where it is
      } else {
        middle <- ceiling(n / 2)
        x[idx[1:(middle - 1)]] <- uni[i] - howMuchjitter
        x[idx[(middle + 1):n]] <- uni[i] + howMuchjitter
        x[idx[middle]]         <- uni[i]
      }
    } else {
      # even number of duplicates: first half shifted down, second half shifted up
      x[idx] <- rep(c(uni[i] - howMuchjitter, uni[i] + howMuchjitter), each = n / 2)
    }
  }
  return(x)
}
It will work for all kinds of duplicated data (even or odd numbers of duplicates).
jitterit(xTojitter = c(1,1,2,1,2,NA), howMuchjitter=0.2)
[1] 0.8 1.0 1.8 1.2 2.2 NA
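A shorter alternative sketch (my own suggestion, assuming three patients per cohort as in the example data): ave can generate the offsets -0.2, 0, +0.2 within each cohort without an explicit loop.
# spread each cohort's three points evenly around the cohort number
dat$Cohort <- dat$Cohort +
  ave(dat$Cohort, dat$Cohort,
      FUN = function(v) seq(-0.2, 0.2, length.out = length(v)))
Note that for a cohort with a single patient this particular seq call would return -0.2 rather than 0, so it assumes cohorts of at least two patients.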