Related
I would like to know if there is an R way (one liner) to extract the coordinates of all subsets of a vector that are above a given threshold.
Suppose I have the following data:
v = c(3.48, 2.59, 1.73, 0.91, 0.13, -0.63, -1.34, -2.03, -2.67, -3.28, -3.04, -2.15, -1.20, -0.19, 0.84, 1.86, 2.84, 3.77, 4.60, 5.31, 4.16, 2.87, 1.89, 0.51, 0.23, 0.78, 1.34, 2.63, 1.72, 0.62, 0.98, 1.45)
and let's say I have threshold = 0.7. The desired output would be:
left right
1 4
15 23
26 29
31 32
I can in principle write a while loop or some sort, subsetting v and juggling with left and right coordinates of these regions, something like:
left = which(subset >= threshold)[1] + right
right = which(subset[left:length(subset)] < threshold)[1] - 1 # -1 to get the last element above the threshold
subset = v[(right + 1):length(v)]
(not tested), but I am sure there is an R way that i can't seem to remember.
I had a look here but it's not really what I am after. Any help is appreciated.
You can use rle() to find the runs of values that exceed your threshhold. When you can turn that into your desired format
rle(v>.7) |>
with(
data.frame(start=1, end=cumsum(lengths)) |>
transform(start=c(1, head(end, -1) + 1)) |>
subset(values)
)
And that returns
start end
1 1 4
3 15 23
5 26 29
7 31 32
This is nearly identical to this existing question with the main difference of using rle() on your Boolean condition and then subsetting to only the TRUE values.
Same solution but using data.table
v = c(3.48, 2.59, 1.73, 0.91, 0.13, -0.63, -1.34, -2.03, -2.67, -3.28, -3.04, -2.15, -1.20, -0.19, 0.84, 1.86, 2.84, 3.77, 4.60, 5.31, 4.16, 2.87, 1.89, 0.51, 0.23, 0.78, 1.34, 2.63, 1.72, 0.62, 0.98, 1.45)
data.table(v)[, .(start = .I[1], end = .I[.N], keep = unique(v > 0.7)), by = rleid(v > 0.7)][keep == T, .(start, end)]
# start end
# 1: 1 4
# 2: 15 23
# 3: 26 29
# 4: 31 32
I'm trying to use prophet library to predict y using Group and Regressors. My code and the errors received are below.
In Model1:
I've received this error: Error in setup_dataframe(object, df) :
Regressor "x1" missing from dataframe
In Model2:
Model2 runs. But I'm unable to figure it out how to add regressors x1
and x2.
library(prophet)
library(dplyr)
df <- data.frame(ds = rep(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05",
"2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10", "2020-01-11", "2020-01-12",
"2020-01-13", "2020-01-14", "2020-01-15"), 2),
group = rep(c("A", "B"), each = 15),
y = c(8.15, 1.74, 2.97, 2.36, 0.94, 1.84, 3.17, 12.51, 0.63, 6.92, 5.51,
7.50, -2.47, 4.38, 6.28, 7.69, 2.89, 3.77, 7.27, -1.19, 4.64, 9.49, 5.43, 0.36, 14.12,
8.77, -3.05, -0.72, 10.99, 10.33),
x1 = c(3.11, 2.16, 0.91, 2.78, 0.06, 1.12, 1.73, 3.95, 1.43, 3.40, 2.37, 1.80, 0.95,
1.66, 3.06, -0.23, 3.11, 3.07, -0.39, 0.13, 4.38, 2.15, 1.61, 1.54, 5.50, 2.21,
0.89, 3.24, 4.27, 2.55),
x2 = c(2.52, -0.21, 1.03, -0.21, 0.44, 0.36 , 0.72, 4.28, -0.40, 1.76, 1.57,
2.85, -1.71, 1.36, 1.61, 3.96, -0.11 , 0.35, 3.83, -0.66, 0.13, 3.67, 1.91, -0.59, 4.31,
3.28, -1.97, -1.98, 3.36, 3.89))
df$ds <- as.Date(df$ds)
# Model 1
Model1 <- function(df) {
m <- prophet(seasonality.mode = 'multiplicative')
m <- add_regressor(m, 'x1')
m <- add_regressor(m, 'x2')
m <- fit.prophet(m, df)
future <- make_future_dataframe(m, periods = 5, freq = 'day')
mod1 <- predict(m, future)
return(mod1)
}
mod1 <-df %>%
group_by(group) %>%
do(Model1(.)) %>%
dplyr::select(ds, group, yhat)
# Model 2
library(prophet)
library(dplyr)
library(purrr)
library(tidyr)
Model2 <- df %>%
nest(-group) %>%
mutate(m = map(data, prophet)) %>%
mutate(future = map(m, make_future_dataframe, period = 5)) %>%
mutate(forecast = map2(m, future, predict))
I cannot really understand the real usage of those types of standard deviation and the information that i get in the qcc manual is sometimes confused. I used the example provided in the qcc manual but i don't know for which one i choose and what is the reason for that choice.
I thank a lot for any support.
`# Water content of antifreeze data (Wetherill and Brown, 1991, p. 120)
x <- c(2.23, 2.53, 2.62, 2.63, 2.58, 2.44, 2.49, 2.34, 2.95, 2.54, 2.60, 2.45,
2.17, 2.58, 2.57, 2.44, 2.38, 2.23, 2.23, 2.54, 2.66, 2.84, 2.81, 2.39,
2.56, 2.70, 3.00, 2.81, 2.77, 2.89, 2.54, 2.98, 2.35, 2.53)
# the Shewhart control chart for one-at-time data
# 1) using MR (default)
qcc(x, type="xbar.one", data.name="Water content (in ppm) of batches of antifreeze")
# 2) using SD
qcc(x, type="xbar.one", std.dev = "SD", data.name="Water content (in ppm) of batches of antifreeze")`
I have some data as below:
# A tibble: 158 x 2
X Y
<dbl> <dbl>
1 -0.71 -2.39
2 0.92 0.573
3 -2.52 -1.61
4 3.88 5.43
5 0.15 0.106
6 3.49 6.66
7 -0.54 0.613
8 1.4 4.21
9 1.16 0.107
10 -3.37 -3.62
# ... with 148 more rows
I plot the observations and draw a line horizontally and vertically at 0 using:
df %>%
ggplot(aes(x = X,
y = Y)) +
geom_point() +
#geom_smooth(method = "lm") +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0))
What I would like to do now is count the number of points which are in each quadrant of the plot and just display the number on each of the quadrant.
Data:
df <- structure(list(X = c(-0.71, 0.92, -2.52, 3.88, 0.15, 3.49, -0.54,
1.4, 1.16, -3.37, -0.55, -0.74, 2.13, 1.33, 3.27, 1.74, 0.65,
1.23, -1.75, 0.9, 3.86, 3.69, -1.74, -3.43, 0.67, 3.83, 2.32,
-5.46, -0.55, -6.39, -2.23, -1.3, 4.72, 2.42, -7.9, -1.54, 0.99,
-9.97, -18.41, -7.73, 1.5, -7.5, -9.88, 8.82, 10.48, 6.7, -0.23,
8.15, 3.02, 4.54, -2.76, 5.77, 3.03, -3.63, 3.71, 6.27, 1.92,
-7.86, -5.5, -4.44, 9.47, 3.89, 0.81, 6.83, 1.98, 4.01, 0.43,
2.79, -1.48, -1.87, -5.93, -8.58, 11.56, -0.46, 0.33, 5.27, 4.32,
2.4, -0.64, -6.7, 3.74, 1.01, 2.76, 2.8, -1.63, 0.65, 1.3, 5.33,
0.96, 3.71, 1.27, 2.53, -1.52, 5.69, -2.53, 3.82, 4.09, 2.79,
2.64, -3.42, 4.72, 0.62, 0.25, 1.98, 2.82, -2.06, 4.06, -2.45,
2.03, 2.22, -0.2, -3.47, 6.15, -1.2, 1.11, 1, -1.71, 1.05, -5.93,
-3.35, 7.53, 0.45, -2.45, -5.73, 0.26, 7, 1.12, 1.39, -0.11,
0.43, 0.34, -2.05, 4.54, 1.76, 2.15, 3.26, 0.2, 0.84, 0.93, 0.98,
1.97, 0.07, 2.48, 1.98, 2.88, 1.18, 5.23, -3.95, -2.17, 0.35,
2.51, 0.39, 3.11, 3.09, 0.06, -7.81, 1.62, -9.53), Y = c(-2.38916419707325,
0.572675136581781, -1.61130358515631, 5.42706994951004, 0.105533424368025,
6.65697289481407, 0.613486039256266, 4.21013704773222, 0.106990463992386,
-3.62352710962904, -0.203607589793183, -4.24563967581072, 2.97070300267885,
2.92544516479698, 5.02538739147422, 2.25461465260415, 1.66492554339803,
3.5690423154001, 0.108411247307002, 0.961008630173696, 3.79172784045593,
1.94108347244724, -2.12992072359958, -5.87473482253699, -1.45100684091412,
1.47842234462587, 1.43196010231586, -7.74290369146724, -2.79056547363334,
-5.03532133668577, -1.99400739381075, -2.92320856826413, 3.93394610595585,
3.29451174347621, -10.0410470556235, 3.34517672842812, 2.41625183369762,
-10.3476519710384, -21.791966984666, -11.1142687331988, 3.32761656369176,
-3.96223311815655, -11.093184503697, 11.6694167237026, 22.2461574652919,
9.28255170483023, 4.63817899423635, 11.8553670456421, 8.27889381692159,
8.19911670446593, -6.470817611772, 3.09218109975165, 7.5825172514382,
0.0284717847140023, 4.90864483240255, 10.0311544305095, 8.55401150272708,
-8.84107625063785, -8.04105369987643, -6.65872061590883, 10.8577722872979,
4.03706922467202, 3.04148092466194, 8.90634921641063, 1.56555573277521,
4.42535372370123, 0.841035482771217, 1.75578768128183, -2.67241757153407,
-2.25418139889371, -8.7723458397205, -11.2420616969584, 11.4836809985778,
-1.8649021388476, 0.832085873992507, 11.6062841497052, 2.59039949751966,
2.28509371230735, -1.97715071813135, -7.3280081242774, 3.97121830333205,
-0.569284938256821, 2.31082313266322, 3.02490478503254, -1.38512132143018,
-0.866847983058995, 2.97552563660034, 5.95976111047322, -0.102502393594657,
4.58003409048615, 0.842834319309465, 3.06786040532266, 0.250639945095402,
6.78696057469418, -1.62606880448011, 5.46367912370997, 2.53357559730344,
4.73895950607308, 2.50934817572881, -0.312149263565189, 4.82621271905962,
-0.79009628184665, -3.12115495501355, -0.461711220579862, 4.27359516836912,
-4.60871127364226, 3.84488020178729, -5.26245849925393, 3.54222359765326,
1.04191534953213, 1.4982293818719, -3.56618092951384, 4.95478586278666,
-0.270584959088251, -0.900452947549406, 0.901254072925249, -0.254483190258712,
-2.63217404877559, -4.71624328721887, -7.1747474980974, 4.86036342835152,
3.24549729559669, -4.19219918146311, -10.128570960197, 0.803895306904637,
9.33865112323734, 2.85517888612945, 0.316844258915139, -0.151669189522978,
1.00839469793829, 1.57398998124214, -5.0607247073979, 8.91704977465508,
2.59984205825244, 1.31737969318745, 2.70804837397023, 1.80193676584248,
1.48362026996833, -2.11380109244311, 3.54300752215851, 1.6501194298151,
-1.01504840432201, 6.74326962933175, 0.1866931051541, 2.9825290286452,
1.42593783576641, 2.71110274944611, -4.09572797775837, 1.50144422897237,
-0.552818435076999, 5.23843746771127, 1.33321908169899, 1.28745947800351,
2.60490918566195, -1.54038908822145, -9.6363012621261, -0.190177144865133,
-13.0653210889016)), row.names = c(NA, -158L), class = c("tbl_df",
"tbl", "data.frame"))
library(dplyr)
quad_count <- df %>%
# Count how many with each combination of X and Y being positive
count(right = X > 0, top = Y > 0) %>%
# TRUE = 1, FALSE = 0, so these map the TRUE to +1 and FALSE to -1
mutate(X = 2 * (right - 0.5), Y = 2 * (top - 0.5))
df %>%
ggplot(aes(x = X, y = Y)) +
geom_point() +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
# This layer should use the other dataset, but keep using X and Y for location
geom_text(data = quad_count, aes(label = n), size = 10)
df %>%
ggplot(aes(x = X,
y = Y)) +
geom_point() +
#geom_smooth(method = "lm") +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
geom_text(data = df %>%
mutate(X = X >= 0, Y = Y >= 0) %>%
count(X, Y) %>%
mutate(X = if_else(X, 10, -10),
Y = if_else(Y, 10, -10)),
mapping = aes(X, Y, label = n), size = 10)
I have following file :
file 1
structure(list(Total_Gene_Symbol = c("5S_rRNA", "7SK", "A1BG-AS1"
), Test = c("1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02",
"1.97, 2.27, 2.14, 1.15", "1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34,
1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22"
)), .Names = c("Total_Gene_Symbol", "Test"), row.names = c(NA,
3L), class = "data.frame")
file 1 column test is number separated by ",".
I tried
mat <- stri_split_fixed(Down_FC, ',', simplify=T)
mat <- `dim<-`(as.numeric(mat), dim(mat)) # convert to numeric and save dims
rowMeans(mat, na.rm=T)->M
View(M)
but the above code is averaging entire data.
I want output same like below file 2
file 2
structure(list(Total_Gene_Symbol = c("5S_rRNA", "7SK", "A1BG-AS1"
), Test = c("1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02",
"1.97, 2.27, 2.14, 1.15", "1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34,
1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22"
), Average = c(11.49, 7.53, 19.44)), .Names = c("Total_Gene_Symbol",
"Test", "Average"), row.names = c(NA, 3L), class = "data.frame")
What you want is the sum not average! The average is something like the mode, median, mean.
library(magrittr)
df1$total_sum<-
df1$Test %>% str_split(.,",\\s+") %>% sapply(function(x) as.numeric(x) %>% sum(na.rm=T))
Using apply
d1$sum <- apply(d1,1,
function(x)(sum(as.numeric(unlist(strsplit(x['Test'],','))),na.rm = TRUE)))
You can use scan :
df$sum <- sapply(df$Test, function(x) sum(scan(text = x, what=numeric(),sep=","), na.rm=TRUE))
df$average <- sapply(df$Test, function(x) mean(scan(text = x, what=numeric(),sep=","), na.rm=TRUE))
# Total_Gene_Symbol Test sum average
# 1 5S_rRNA 1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02 11.49 1.1490
# 2 7SK 1.97, 2.27, 2.14, 1.15 7.53 1.8825
# 3 A1BG-AS1 1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34, \n 1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22 19.44 1.2960