How can I take log of particular columns?

How can I take log of particular columns? - r

I have this data
structure(list(DATE = 19620101:19620106, PRECIP = c(10.54, 6.39,
0.01, 0, 0.02, 20.94), OBS_Q = c(2.39, 2.38, 2.22, 2.24, 2.26,
5.13), swb = c(4.11, 3.92, 3.8, 3.8, 3.77, 7.16), gr4j = c(3.7,
4.24, 3.73, 3.24, 2.98, 4.93), isba = c(4.82, 3.44, 4.18, 3.01,
2.88, 6.35), noah = c(3.11, 2.84, 2.57, 2.59, 2.84, 4.83), sac = c(2.83,
2.84, 2.73, 2.87, 2.94, 6), swap = c(2.16, 2.56, 2.08, 2.19,
3.65, 4.43), vic.mm.day. = c(3.44, 3.14, 3.37, 3.15, 2.88, 4.52
)), .Names = c("DATE", "PRECIP", "OBS_Q", "swb", "gr4j", "isba",
"noah", "sac", "swap", "vic.mm.day."), row.names = c(NA, 6L), class = "data.frame")
And a function that does three things
Subsets based on what 'model' is mentioned(one of the column names of df)
Outputs dfcal with 3 columns
Outputs dfcallog with the log of those values
The function
dataprep.allcal<-function(df,model){
dfcal<<-subset(df, select=c("DATE",model, "OBS_Q"))
dfcallog<-subset(df, select=c("DATE",model, "OBS_Q"))
cols<-colnames(dfcallog)
dfcallog[cols] <<- log(dfcallog[cols])
}
dataprep.allcal(df=df,model='sac')
The problem with this, it converts the date to log as well.
Can you tell me how I can apply the log only on model and OBS_Q column?

This is a way to do it by excluding DATE from the cols variable:
dataprep.allcal<-function(df,model){
dfcal<<-subset(df, select=c("DATE",model, "OBS_Q"))
#need to use the <<- operator below
dfcallog<<-subset(df, select=c("DATE",model, "OBS_Q"))
cols<-colnames(dfcallog)
cols<-cols[!cols %in% 'DATE']
dfcallog[cols] <<- log(dfcallog[cols])
}
dataprep.allcal(df=df,model='sac')
Output:
> dfcal
DATE sac OBS_Q
1 19620101 2.83 2.39
2 19620102 2.84 2.38
3 19620103 2.73 2.22
4 19620104 2.87 2.24
5 19620105 2.94 2.26
6 19620106 6.00 5.13
> dfcallog
DATE sac OBS_Q
1 19620101 1.040277 0.8712934
2 19620102 1.043804 0.8671005
3 19620103 1.004302 0.7975072
4 19620104 1.054312 0.8064759
5 19620105 1.078410 0.8153648
6 19620106 1.791759 1.6351057
Note:
I don't know if this is a typo but you should have dfcallog<-subset(df, select=c("DATE",model, "OBS_Q")) with a <<- operator because using a <- operator at the beginning and then a <<- operator at the last command in the function will produce an error because <<- looks at the parent.environment to find a variable and thus you would get an object not found error.

Related

Extract all subsets in vector where elements are above a given threshold

I would like to know if there is an R way (one liner) to extract the coordinates of all subsets of a vector that are above a given threshold.
Suppose I have the following data:
v = c(3.48, 2.59, 1.73, 0.91, 0.13, -0.63, -1.34, -2.03, -2.67, -3.28, -3.04, -2.15, -1.20, -0.19, 0.84, 1.86, 2.84, 3.77, 4.60, 5.31, 4.16, 2.87, 1.89, 0.51, 0.23, 0.78, 1.34, 2.63, 1.72, 0.62, 0.98, 1.45)
and let's say I have threshold = 0.7. The desired output would be:
left right
1 4
15 23
26 29
31 32
I can in principle write a while loop or some sort, subsetting v and juggling with left and right coordinates of these regions, something like:
left = which(subset >= threshold)[1] + right
right = which(subset[left:length(subset)] < threshold)[1] - 1 # -1 to get the last element above the threshold
subset = v[(right + 1):length(v)]
(not tested), but I am sure there is an R way that i can't seem to remember.
I had a look here but it's not really what I am after. Any help is appreciated.

You can use rle() to find the runs of values that exceed your threshhold. When you can turn that into your desired format
rle(v>.7) |>
with(
data.frame(start=1, end=cumsum(lengths)) |>
transform(start=c(1, head(end, -1) + 1)) |>
subset(values)
)
And that returns
start end
1 1 4
3 15 23
5 26 29
7 31 32
This is nearly identical to this existing question with the main difference of using rle() on your Boolean condition and then subsetting to only the TRUE values.

Same solution but using data.table
v = c(3.48, 2.59, 1.73, 0.91, 0.13, -0.63, -1.34, -2.03, -2.67, -3.28, -3.04, -2.15, -1.20, -0.19, 0.84, 1.86, 2.84, 3.77, 4.60, 5.31, 4.16, 2.87, 1.89, 0.51, 0.23, 0.78, 1.34, 2.63, 1.72, 0.62, 0.98, 1.45)
data.table(v)[, .(start = .I[1], end = .I[.N], keep = unique(v > 0.7)), by = rleid(v > 0.7)][keep == T, .(start, end)]
# start end
# 1: 1 4
# 2: 15 23
# 3: 26 29
# 4: 31 32

Forecasting Using Group and Regressors in Prophet

I'm trying to use prophet library to predict y using Group and Regressors. My code and the errors received are below.
In Model1:
I've received this error: Error in setup_dataframe(object, df) :
Regressor "x1" missing from dataframe
In Model2:
Model2 runs. But I'm unable to figure it out how to add regressors x1
and x2.
library(prophet)
library(dplyr)
df <- data.frame(ds = rep(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05",
"2020-01-06", "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10", "2020-01-11", "2020-01-12",
"2020-01-13", "2020-01-14", "2020-01-15"), 2),
group = rep(c("A", "B"), each = 15),
y = c(8.15, 1.74, 2.97, 2.36, 0.94, 1.84, 3.17, 12.51, 0.63, 6.92, 5.51,
7.50, -2.47, 4.38, 6.28, 7.69, 2.89, 3.77, 7.27, -1.19, 4.64, 9.49, 5.43, 0.36, 14.12,
8.77, -3.05, -0.72, 10.99, 10.33),
x1 = c(3.11, 2.16, 0.91, 2.78, 0.06, 1.12, 1.73, 3.95, 1.43, 3.40, 2.37, 1.80, 0.95,
1.66, 3.06, -0.23, 3.11, 3.07, -0.39, 0.13, 4.38, 2.15, 1.61, 1.54, 5.50, 2.21,
0.89, 3.24, 4.27, 2.55),
x2 = c(2.52, -0.21, 1.03, -0.21, 0.44, 0.36 , 0.72, 4.28, -0.40, 1.76, 1.57,
2.85, -1.71, 1.36, 1.61, 3.96, -0.11 , 0.35, 3.83, -0.66, 0.13, 3.67, 1.91, -0.59, 4.31,
3.28, -1.97, -1.98, 3.36, 3.89))
df$ds <- as.Date(df$ds)
# Model 1
Model1 <- function(df) {
m <- prophet(seasonality.mode = 'multiplicative')
m <- add_regressor(m, 'x1')
m <- add_regressor(m, 'x2')
m <- fit.prophet(m, df)
future <- make_future_dataframe(m, periods = 5, freq = 'day')
mod1 <- predict(m, future)
return(mod1)
}
mod1 <-df %>%
group_by(group) %>%
do(Model1(.)) %>%
dplyr::select(ds, group, yhat)
# Model 2
library(prophet)
library(dplyr)
library(purrr)
library(tidyr)
Model2 <- df %>%
nest(-group) %>%
mutate(m = map(data, prophet)) %>%
mutate(future = map(m, make_future_dataframe, period = 5)) %>%
mutate(forecast = map2(m, future, predict))

In which cases should i use the standard deviatiation "MR" and "SD" in a x.one.bar chart

I cannot really understand the real usage of those types of standard deviation and the information that i get in the qcc manual is sometimes confused. I used the example provided in the qcc manual but i don't know for which one i choose and what is the reason for that choice.
I thank a lot for any support.
`# Water content of antifreeze data (Wetherill and Brown, 1991, p. 120)
x <- c(2.23, 2.53, 2.62, 2.63, 2.58, 2.44, 2.49, 2.34, 2.95, 2.54, 2.60, 2.45,
2.17, 2.58, 2.57, 2.44, 2.38, 2.23, 2.23, 2.54, 2.66, 2.84, 2.81, 2.39,
2.56, 2.70, 3.00, 2.81, 2.77, 2.89, 2.54, 2.98, 2.35, 2.53)
# the Shewhart control chart for one-at-time data
# 1) using MR (default)
qcc(x, type="xbar.one", data.name="Water content (in ppm) of batches of antifreeze")
# 2) using SD
qcc(x, type="xbar.one", std.dev = "SD", data.name="Water content (in ppm) of batches of antifreeze")`

ggplot counting observations in each quadrant

I have some data as below:
# A tibble: 158 x 2
X Y
<dbl> <dbl>
1 -0.71 -2.39
2 0.92 0.573
3 -2.52 -1.61
4 3.88 5.43
5 0.15 0.106
6 3.49 6.66
7 -0.54 0.613
8 1.4 4.21
9 1.16 0.107
10 -3.37 -3.62
# ... with 148 more rows
I plot the observations and draw a line horizontally and vertically at 0 using:
df %>%
ggplot(aes(x = X,
y = Y)) +
geom_point() +
#geom_smooth(method = "lm") +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0))
What I would like to do now is count the number of points which are in each quadrant of the plot and just display the number on each of the quadrant.
Data:
df <- structure(list(X = c(-0.71, 0.92, -2.52, 3.88, 0.15, 3.49, -0.54,
1.4, 1.16, -3.37, -0.55, -0.74, 2.13, 1.33, 3.27, 1.74, 0.65,
1.23, -1.75, 0.9, 3.86, 3.69, -1.74, -3.43, 0.67, 3.83, 2.32,
-5.46, -0.55, -6.39, -2.23, -1.3, 4.72, 2.42, -7.9, -1.54, 0.99,
-9.97, -18.41, -7.73, 1.5, -7.5, -9.88, 8.82, 10.48, 6.7, -0.23,
8.15, 3.02, 4.54, -2.76, 5.77, 3.03, -3.63, 3.71, 6.27, 1.92,
-7.86, -5.5, -4.44, 9.47, 3.89, 0.81, 6.83, 1.98, 4.01, 0.43,
2.79, -1.48, -1.87, -5.93, -8.58, 11.56, -0.46, 0.33, 5.27, 4.32,
2.4, -0.64, -6.7, 3.74, 1.01, 2.76, 2.8, -1.63, 0.65, 1.3, 5.33,
0.96, 3.71, 1.27, 2.53, -1.52, 5.69, -2.53, 3.82, 4.09, 2.79,
2.64, -3.42, 4.72, 0.62, 0.25, 1.98, 2.82, -2.06, 4.06, -2.45,
2.03, 2.22, -0.2, -3.47, 6.15, -1.2, 1.11, 1, -1.71, 1.05, -5.93,
-3.35, 7.53, 0.45, -2.45, -5.73, 0.26, 7, 1.12, 1.39, -0.11,
0.43, 0.34, -2.05, 4.54, 1.76, 2.15, 3.26, 0.2, 0.84, 0.93, 0.98,
1.97, 0.07, 2.48, 1.98, 2.88, 1.18, 5.23, -3.95, -2.17, 0.35,
2.51, 0.39, 3.11, 3.09, 0.06, -7.81, 1.62, -9.53), Y = c(-2.38916419707325,
0.572675136581781, -1.61130358515631, 5.42706994951004, 0.105533424368025,
6.65697289481407, 0.613486039256266, 4.21013704773222, 0.106990463992386,
-3.62352710962904, -0.203607589793183, -4.24563967581072, 2.97070300267885,
2.92544516479698, 5.02538739147422, 2.25461465260415, 1.66492554339803,
3.5690423154001, 0.108411247307002, 0.961008630173696, 3.79172784045593,
1.94108347244724, -2.12992072359958, -5.87473482253699, -1.45100684091412,
1.47842234462587, 1.43196010231586, -7.74290369146724, -2.79056547363334,
-5.03532133668577, -1.99400739381075, -2.92320856826413, 3.93394610595585,
3.29451174347621, -10.0410470556235, 3.34517672842812, 2.41625183369762,
-10.3476519710384, -21.791966984666, -11.1142687331988, 3.32761656369176,
-3.96223311815655, -11.093184503697, 11.6694167237026, 22.2461574652919,
9.28255170483023, 4.63817899423635, 11.8553670456421, 8.27889381692159,
8.19911670446593, -6.470817611772, 3.09218109975165, 7.5825172514382,
0.0284717847140023, 4.90864483240255, 10.0311544305095, 8.55401150272708,
-8.84107625063785, -8.04105369987643, -6.65872061590883, 10.8577722872979,
4.03706922467202, 3.04148092466194, 8.90634921641063, 1.56555573277521,
4.42535372370123, 0.841035482771217, 1.75578768128183, -2.67241757153407,
-2.25418139889371, -8.7723458397205, -11.2420616969584, 11.4836809985778,
-1.8649021388476, 0.832085873992507, 11.6062841497052, 2.59039949751966,
2.28509371230735, -1.97715071813135, -7.3280081242774, 3.97121830333205,
-0.569284938256821, 2.31082313266322, 3.02490478503254, -1.38512132143018,
-0.866847983058995, 2.97552563660034, 5.95976111047322, -0.102502393594657,
4.58003409048615, 0.842834319309465, 3.06786040532266, 0.250639945095402,
6.78696057469418, -1.62606880448011, 5.46367912370997, 2.53357559730344,
4.73895950607308, 2.50934817572881, -0.312149263565189, 4.82621271905962,
-0.79009628184665, -3.12115495501355, -0.461711220579862, 4.27359516836912,
-4.60871127364226, 3.84488020178729, -5.26245849925393, 3.54222359765326,
1.04191534953213, 1.4982293818719, -3.56618092951384, 4.95478586278666,
-0.270584959088251, -0.900452947549406, 0.901254072925249, -0.254483190258712,
-2.63217404877559, -4.71624328721887, -7.1747474980974, 4.86036342835152,
3.24549729559669, -4.19219918146311, -10.128570960197, 0.803895306904637,
9.33865112323734, 2.85517888612945, 0.316844258915139, -0.151669189522978,
1.00839469793829, 1.57398998124214, -5.0607247073979, 8.91704977465508,
2.59984205825244, 1.31737969318745, 2.70804837397023, 1.80193676584248,
1.48362026996833, -2.11380109244311, 3.54300752215851, 1.6501194298151,
-1.01504840432201, 6.74326962933175, 0.1866931051541, 2.9825290286452,
1.42593783576641, 2.71110274944611, -4.09572797775837, 1.50144422897237,
-0.552818435076999, 5.23843746771127, 1.33321908169899, 1.28745947800351,
2.60490918566195, -1.54038908822145, -9.6363012621261, -0.190177144865133,
-13.0653210889016)), row.names = c(NA, -158L), class = c("tbl_df",
"tbl", "data.frame"))

library(dplyr)
quad_count <- df %>%
# Count how many with each combination of X and Y being positive
count(right = X > 0, top = Y > 0) %>%
# TRUE = 1, FALSE = 0, so these map the TRUE to +1 and FALSE to -1
mutate(X = 2 * (right - 0.5), Y = 2 * (top - 0.5))
df %>%
ggplot(aes(x = X, y = Y)) +
geom_point() +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
# This layer should use the other dataset, but keep using X and Y for location
geom_text(data = quad_count, aes(label = n), size = 10)

df %>%
ggplot(aes(x = X,
y = Y)) +
geom_point() +
#geom_smooth(method = "lm") +
geom_hline(aes(yintercept = 0)) +
geom_vline(aes(xintercept = 0)) +
geom_text(data = df %>%
mutate(X = X >= 0, Y = Y >= 0) %>%
count(X, Y) %>%
mutate(X = if_else(X, 10, -10),
Y = if_else(Y, 10, -10)),
mapping = aes(X, Y, label = n), size = 10)

How to calculate the average of a comma separated string of numbers in R

I have following file :
file 1
structure(list(Total_Gene_Symbol = c("5S_rRNA", "7SK", "A1BG-AS1"
), Test = c("1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02",
"1.97, 2.27, 2.14, 1.15", "1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34,
1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22"
)), .Names = c("Total_Gene_Symbol", "Test"), row.names = c(NA,
3L), class = "data.frame")
file 1 column test is number separated by ",".
I tried
mat <- stri_split_fixed(Down_FC, ',', simplify=T)
mat <- `dim<-`(as.numeric(mat), dim(mat)) # convert to numeric and save dims
rowMeans(mat, na.rm=T)->M
View(M)
but the above code is averaging entire data.
I want output same like below file 2
file 2
structure(list(Total_Gene_Symbol = c("5S_rRNA", "7SK", "A1BG-AS1"
), Test = c("1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02",
"1.97, 2.27, 2.14, 1.15", "1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34,
1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22"
), Average = c(11.49, 7.53, 19.44)), .Names = c("Total_Gene_Symbol",
"Test", "Average"), row.names = c(NA, 3L), class = "data.frame")

What you want is the sum not average! The average is something like the mode, median, mean.
library(magrittr)
df1$total_sum<-
df1$Test %>% str_split(.,",\\s+") %>% sapply(function(x) as.numeric(x) %>% sum(na.rm=T))

Using apply
d1$sum <- apply(d1,1,
function(x)(sum(as.numeric(unlist(strsplit(x['Test'],','))),na.rm = TRUE)))

You can use scan :
df$sum <- sapply(df$Test, function(x) sum(scan(text = x, what=numeric(),sep=","), na.rm=TRUE))
df$average <- sapply(df$Test, function(x) mean(scan(text = x, what=numeric(),sep=","), na.rm=TRUE))
# Total_Gene_Symbol Test sum average
# 1 5S_rRNA 1.02, 1.12, 1.11, 1.18, 1.12, 1.19, 1.25, 1.24, 1.24, 1.02 11.49 1.1490
# 2 7SK 1.97, 2.27, 2.14, 1.15 7.53 1.8825
# 3 A1BG-AS1 1.3, 1.01, 1.36, 1.42, 1.38, 1.01, 1.31, 1.34, \n 1.29, 1.34, 2.02, 1.12, 1.01, 1.31, 1.22 19.44 1.2960

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How can I take log of particular columns? - r

Related

Extract all subsets in vector where elements are above a given threshold

Forecasting Using Group and Regressors in Prophet

In which cases should i use the standard deviatiation "MR" and "SD" in a x.one.bar chart

ggplot counting observations in each quadrant

How to calculate the average of a comma separated string of numbers in R

Categories

Resources