Interrelated columns in R data.table

Interrelated columns in R data.table - r

To keep track of cash flow I have a number of interrelated columns in a data.table:
"Amount_spent" is always 5% of the "Balance".
"Revenue" is "Amount_spent" * "Price"
"Balance" is the cumulative sum of "Revenue" (starting at 100.00).
Transactions only happen on "Day" "a"
I am struggling to calculate these interrelated columns concurrently.
Example as I would like:
library(data.table)
Day <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
Balance <- c( 100.00, 103.00, 103.00, 103.00, 101.46, 101.46, 102.47, 102.47, 102.47, 107.08 )
Amount_spent <- c( 5.00, 0.00, 0.00, 5.15, 0.00, 5.07, 0.00, 0.00, 5.12, 5.35 )
Revenue <- c( 3.00, 0.00, 0.00, -1.55, 0.00, 1.01, 0.00, 0.00, 4.61, -3.75 )
DT <- data.table( Day, Price, Balance, Amount_spent, Revenue )
DT
Here is my attempt so far:
# set initial balance
Balance2 <- 100.00
Day2 <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price2 <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
my.try <- data.table( Day2, Price2 )
my.try[, Balance2 := cumsum( Revenue2 )]
my.try[ Day2 == "a", Amount_spent2 := Balance2 * 0.05 ]
my.try[is.na(Amount_spent2), Amount_spent2 := 0]
my.try[, Revenue2 := Price2 * Amount_spent2 ]
my.try
As you will see it fails with this error message Error in eval(expr, envir, enclos) : object 'Revenue2' not found as the "Revenue2" column is yet to be created.
Thank you

You get mentioned error after the line my.try[, Balance2 := cumsum( Revenue2 )] which try to use the column Revenue2 which not exist in DT at that point in the code.
library(data.table)
Day <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
Balance <- c( 100.00, 103.00, 103.00, 103.00, 101.46, 101.46, 102.47, 102.47, 102.47, 107.08 )
Amount_spent <- c( 5.00, 0.00, 0.00, 5.15, 0.00, 5.07, 0.00, 0.00, 5.12, 5.35 )
Revenue <- c( 3.00, 0.00, 0.00, -1.55, 0.00, 1.01, 0.00, 0.00, 4.61, -3.75 )
DT <- data.table( Day, Price, Balance, Amount_spent, Revenue )
Balance2 <- 100.00
Day2 <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price2 <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
my.try <- data.table( Day2, Price2 )
my.try[, Balance2 := cumsum( Revenue2 )]
#Error in eval(expr, envir, enclos) : object 'Revenue2' not found
"Revenue2" %in% names(DT)
#[1] FALSE
You did not produce expected results. I'm not sure what you mean by calculate the columns concurrently. If you want to assign/update multiple columns by reference in a single step you can use `:=()` function the same way as you would use .() or list() in data.table's j argument. For example: `:=`(col1=1+2, col2=2+3).
You can read more about update by reference in Reference semantics vignette.

Related

Is it possible to create bubble chart using Echarts4R

I'm trying to create a bubble chart that look like this using e_scatter.
This is what the data looks like and the state I am able to recreate now
data <- data.frame(
group = c("Upper", "Upper", "Upper", "Upper", "Upper", "Upper", "Upper", "Upper", "Lower", "Lower"),
category = c("A", "B", "C", "D", "E", "F", "G", "H", "a", "b"),
size = c(0.74, 0.72, 0.68, 0.67, 0.63, 0.54, 0.53, 0.49, 0.02, 0.02),
sample_x = c(0, 0.2, 0.4, 0.6, 0.8, 0.8, 0.6, 0.4, 0.2, 0),
sample_y = c(1, 2, 3, 2, 1, -1, -2, -3, -2, -1)
)
data |>
group_by(group) |>
e_charts(sample_x) |>
e_scatter(sample_y, size)
Originally the data only have group, category and value columns
Because I don't have x and y in the data, is it possible to recreate the bubble chart using e_scatter?
Or am I using the wrong chart type for this?
(I did try by adding e_polar but I couldn't scale the size of the dot using size column)

R: Finance - Compute Beta via CAPM for Panel Data

I have the following data containing three Funds (A, B and C) and their the respective data for (Return minus Risk Free Rate) and (Market Return minus Risk Free Rate):
structure(list(`Fund Name` = c("A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"), Date = c("2018-08-01",
"2018-08-02", "2018-08-03", "2018-10-22", "2018-10-23", "2018-10-24",
"2018-12-18", "2019-01-08", "2019-01-09", "2019-01-10", "2019-01-11",
"2019-01-14", "2019-01-15", "2019-01-16", "2019-02-07", "2019-02-08",
"2019-02-11", "2019-02-12", "2019-02-13", "2019-02-14", "2019-02-15",
"2019-02-18", "2019-02-19", "2019-02-20", "2019-03-15", "2019-03-18",
"2019-03-19", "2019-04-01", "2019-04-02", "2019-04-03", "2019-04-04",
"2019-04-10", "2019-04-11", "2019-04-12", "2019-04-15", "2018-08-01",
"2018-08-02", "2018-08-10", "2018-08-13", "2018-08-14", "2018-08-16",
"2018-08-17", "2018-10-23", "2018-10-24", "2018-10-25", "2018-10-26",
"2018-10-29", "2018-10-30", "2018-10-31", "2018-11-13", "2018-11-14",
"2018-11-22", "2018-11-23", "2018-12-06", "2018-12-07", "2018-12-10",
"2018-12-11", "2018-12-12", "2018-12-13", "2018-12-14", "2018-12-17",
"2018-12-18", "2019-02-06", "2019-02-07", "2019-02-08", "2019-02-11",
"2019-02-12", "2019-02-13", "2019-02-14", "2019-02-15", "2019-03-04",
"2019-03-05", "2019-03-06", "2019-03-07", "2019-03-08", "2019-03-11",
"2019-03-26", "2019-03-27", "2019-04-05", "2019-04-08", "2019-04-12",
"2019-04-15", "2018-08-01", "2018-08-02", "2018-08-03", "2018-08-06",
"2018-08-07", "2018-08-08", "2018-08-09", "2018-08-10", "2018-08-13",
"2018-08-14", "2018-08-23", "2019-01-29", "2019-03-01", "2019-03-04",
"2019-03-05", "2019-03-06", "2019-03-07", "2019-03-27", "2019-03-28",
"2019-03-29", "2019-04-01", "2019-04-02", "2019-04-03", "2019-04-04",
"2019-04-12", "2019-04-15"), `Return-RF` = c(NA, -0.031053409,
-0.004149784, -0.019431914, -0.025985785, -0.022325086, -0.013000177,
-0.005969802, 0.003743827, -0.005973689, -0.012279585, -0.012621233,
-0.014248868, -0.000850313, -0.038296552, -0.020249538, -0.002319941,
-0.003117846, -0.006643616, -0.012684205, 0.00480718, -0.000708029,
-0.007510481, -0.001464912, -0.008793153, -0.003356718, -0.005595538,
0.00592619, -0.006444843, 0.007778815, -0.01019018, -0.008793842,
-0.003549589, 0.000596707, -0.005270976, NA, -0.024337163, -0.030609843,
-0.012780354, -0.011857873, NA, -0.00906015, -0.035681946, -0.007920997,
-0.020963305, -0.013154577, 0.002038879, -0.019934722, 0.007708796,
-0.019404458, 0.000443959, -0.008925886, -0.017543139, -0.033810649,
-0.002362211, -0.02975915, -0.002819632, -0.000687416, -0.006733802,
-0.02423122, -0.017747687, -0.009444599, -0.006353213, -0.020454878,
-0.028563249, -0.005726489, -0.003094262, -0.001040783, -0.012626742,
-0.001097087, -0.009497361, -0.015542972, 5.53889e-05, -0.020560822,
-0.023744172, -0.00744049, -0.00193544, -0.013016594, -0.008529772,
-0.005602241, -0.004651093, -0.005644803, NA, -0.02207606, -0.006369491,
-0.012551725, -0.003201358, -0.01153393, -0.010203346, -0.033352688,
-0.01224557, -0.011346633, -0.012929118, -0.006728953, -0.004243723,
-0.012659234, -0.009103863, -0.011760838, -0.023812576, -0.013908016,
-0.013459074, -0.004005417, 0.004751808, -0.007972052, 0.006040872,
-0.011324789, -0.000427748, -0.007779257), `Mkt-RF` = c(-0.64,
-1.36, 0.36, -0.85, -1.53, -1.26, -0.41, 0.61, 1.51, -0.13, -0.21,
-0.6, -0.01, 0.19, -1.63, -0.75, 0.33, 0.94, 0.07, 0.01, 1.22,
0.46, 0.12, 0.55, 0.93, 0.39, 0.62, 1.09, 0.45, 1.01, -0.28,
0.25, 0.11, 0.63, 0.3, -0.64, -1.36, -2.01, -0.28, -0.54, 0.71,
0.41, -1.53, -1.26, 0.5, -0.61, 0.65, -0.07, 1.37, 1.01, -0.28,
-0.44, -0.29, -2.49, 0.45, -1.98, 0.8, 1.98, -0.13, -1.23, -0.93,
-0.41, -0.28, -1.63, -0.75, 0.33, 0.94, 0.07, 0.01, 1.22, 0.03,
-0.03, -0.19, -1.44, -0.47, 0.85, 0.31, -0.14, 0.15, 0.24, 0.63,
0.3, -0.64, -1.36, 0.36, -0.18, 0.73, -0.08, -0.42, -2.01, -0.28,
-0.54, -0.54, 0.43, 0.52, 0.03, -0.03, -0.19, -1.44, -0.14, -0.34,
0.67, 1.09, 0.45, 1.01, -0.28, 0.63, 0.3)), class = "data.frame", row.names = c(NA,
-108L))
Now I would like to compute the beta via the CAPM for the three different funds.
I tried with the lm function but I it gives only one beta for all three funds together.
I tried with the following code:
Panel <- Panel %>%
group_by(`Fund Name`)
Regression <- lm(Panel$`Return-RF`~ Panel$`Mkt-RF`)
Could someone help me here with the code?

You can split() your dataframe by fund, then run the regression on each subset using lapply():
Panel_Funds <- split(Panel, Panel$`Fund Name`)
Regressions <- lapply(
Panel_Funds,
\(x) lm(`Return-RF` ~ `Mkt-RF`, data = x)
)
Regressions
Output:
$A
Call:
lm(formula = `Return-RF` ~ `Mkt-RF`, data = x)
Coefficients:
(Intercept) `Mkt-RF`
-0.00964 0.01205
$B
Call:
lm(formula = `Return-RF` ~ `Mkt-RF`, data = x)
Coefficients:
(Intercept) `Mkt-RF`
-0.010538 0.008266
$C
Call:
lm(formula = `Return-RF` ~ `Mkt-RF`, data = x)
Coefficients:
(Intercept) `Mkt-RF`
-0.009401 0.010676
If you want to save the coefficients to a table, you can use broom::tidy(); see my answer here for an example.

Are you trying to calculate the variance and covariance to compute the beta?
I would turn your data into a tibble then drop the NA values,
(data %>% as_tibble() %>% drop_na())
then you can easily extract variance for each company,
fundA <- data %>% filter(`Fund Name` == A)
then get variance,
var(fundA$`Return-RF`)

How to accurately estimate the start of an increasing value of a variable in time?

Goal
I have brake force (kg) data for many drivers, and I want to find when the brake application started in time. Particularly, I need the time frame of brake start. Following are three examples of brake pedal force and the desired location of the brake start of time frames:
Estimating Brake start
I estimated the brake start by assuming that it is a changepoint. So, I used the changepoint package in R. But I get some of them right and others wrong (the vertical red line below represents the estimated changepoint):
You can see the changepoints for participants B and C are (almost) correct, but incorrect for participant A. In my full dataset, there are many incorrect values so manually estimating them is going to be very time consuming.
Do you have any suggestions to accurately estimate the brake start? Thank you for your time.
The data and code for the above figure are provided below.
Data and Code
Data
foo <- structure(list(participant = c("A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C"), frames = c(39614, 39644, 39674, 39704, 39734,
39764, 39794, 39824, 39854, 39884, 39914, 39944, 39974, 40004,
40034, 40064, 40094, 40124, 40154, 40184, 40214, 40244, 40274,
40304, 40334, 40364, 40394, 40424, 40454, 40484, 40514, 40544,
40574, 40604, 40634, 40664, 40694, 40724, 40754, 40784, 40814,
40844, 40874, 40904, 40934, 40964, 40994, 41024, 41054, 41084,
41114, 41144, 41174, 45296, 45326, 45356, 45386, 45416, 45446,
45476, 45506, 45536, 45566, 45596, 45626, 45656, 45686, 45716,
45746, 45776, 45806, 45836, 45866, 45896, 45926, 63792, 63822,
63852, 63882, 63912, 63942, 63972, 64002, 64032, 64062, 64092,
64122, 64152, 64182, 64212, 64242, 64272, 64302, 64332, 64362,
64392, 64422, 64452, 64482, 64512, 64542, 64572, 64602, 64632,
64662, 64692, 64722, 64752, 64782, 64812, 64842, 64872, 64902,
64932, 64962, 64992, 65022, 65052, 65082, 65112, 65142, 65172,
65202, 65232, 65262, 65292, 65322), ED_brake_pedal_force_kg = c(0.34,
0.34, 0.34, 0.33, 0.33, 0.34, 0.32, 0.34, 0.34, 0.34, 0.34, 0.32,
0.34, 0.34, 0.37, 0.32, 0.32, 0.33, 0.34, 0.32, 0.33, 0.34, 0.34,
0.72, 2.01, 2.91, 4.57, 5.73, 5.84, 5.82, 5.21, 5.23, 5.23, 4.41,
4, 3.57, 3.09, 2.28, 1.37, 0.33, 0.33, 0.65, 1.21, 3.36, 4.91,
5.2, 5.96, 6.24, 7.6, 14.13, 25.8, 32.37, 37.71, 0.32, 0.34,
0.33, 0.32, 1.72, 8.93, 18.83, 22.78, 39.5, 66.63, 9.46, 2.24,
0.33, 0.34, 1.9, 5.5, 8.55, 10.66, 12.24, 12.24, 12.24, 12.27,
0.29, 0.29, 0.31, 0.31, 0.3, 0.29, 0.3, 0.3, 0.3, 0.29, 0.3,
0.31, 0.3, 0.29, 0.29, 0.91, 2.79, 3.67, 4.24, 5.61, 5.91, 6.08,
5.4, 4.46, 3.74, 3.85, 4, 4.43, 2.08, 0.7, 0.3, 0.29, 0.31, 0.32,
0.34, 0.69, 0.83, 0.83, 0.84, 1.36, 1.68, 2.04, 3.87, 5.21, 7.28,
9.84, 13.49, 14.83, 14.79, 14.79, 14.79, 14.71)), row.names = c(NA,
-127L), class = c("tbl_df", "tbl", "data.frame"))
Code
Estimation of changepoint and plotting:
library(changepoint)
library(tidyverse)
foo %>%
group_by(participant) %>%
mutate(brake_start_frame = frames[cpts(cpt.meanvar(ED_brake_pedal_force_kg,
Q = 8,
method = "BinSeg"))][1]) %>%
ungroup() %>%
ggplot() +
geom_line(aes(x = frames, y = ED_brake_pedal_force_kg)) +
geom_vline(aes(xintercept = brake_start_frame), color="red") +
facet_wrap(~ participant, scales = "free_x")

Since this is a time-series problem, you can explore TTR::momentum function to solve this problem. Whenever momentum will go above a particular threshold in upward direction, it will trigger the event.
library(TTR)
library(data.table)
setDT(foo)
foo[, momentum := TTR::momentum(ED_brake_pedal_force_kg, 5), by = participant]
ggplot(foo) +
geom_line(aes(x = momentum, y = ED_brake_pedal_force_kg)) +
facet_wrap(~ participant, scales = "free_x")

Error with using unlist, lapply and grepl in data.tables R

This question is an extension of this particular question. I have this particular data.table. I'm using data.table, mc2d, and e1071 libraries
library("data.table")
library("mc2d")
library("e1071")
col <- c("COST","TIME")
dt <- structure(
list(
ID = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),COST_PR_L = c(NA, 0.4, 0.31, 0.4, 0.5, 0.17, 1, 0.5, 0.5, 0.5),COST_PR_U = c(7.5, 2, 2.67, 1.67, 2.4,2, 1.5, 2, 2, 1.67),COST_PO_L = c(NA, 0.33, 0.25, 0.44,0.5, 0.25, 1, 0.5, 0.5, 0.5),COST_PO_U = c(3, 1.43, 3.33,1.8, 2.4, 3.6, 1.45, 2, 1.5, 1.67), TIME_PR_L = c(NA, 0.5,0.4, 0.5, 0.5, NA, 0.67, 0.5, 0.5, 0.5), TIME_PR_U = c(2,2.5, 3, 1.5, 2, NA, 1.5, 2, 1.67, 2), TIME_PO_L = c(NA,0.4, 0.25, 0.56, 0.5, NA, 0.6, 0.5, 0.5, 0.5), TIME_PO_U = c(2,2, 5, 1.67, 2.5, NA, 1.5, 2, 1.67, 2)
),.Names = c("ID","COST_PR_L", "COST_PR_U","COST_PO_L","COST_PO_U","TIME_PR_L","TIME_PR_U","TIME_PO_L","TIME_PO_U"),class = c("data.table","data.frame"),row.names = c(NA,-10L))
When I run this particular operation on it,
dt[, unlist(lapply(col, function(xx) {
y = colnames(dt)[grepl(pattern = xx, x = colnames(dt))]
vars1 = y[grepl(pattern = "PR", x = y)]
vars2 = y[grepl(pattern = "PO", x = y)]
mn = get(vars1[1])
mx = get(vars1[2])
sk1 = ifelse(mn !=0 && mx !=0,skewness(rpert(1000, min = mn , mode = 1, max= mx )),-1)
mn = get(vars2[1])
mx = get(vars2[2])
sk2 = ifelse(mn !=0 && mx !=0,skewness(rpert(1000, min = mn , mode = 1, max= mx )),-1)
return(list(sk1, sk2))
}), recursive = FALSE)
, by = "ID"]
I get the following error
Error in [.data.table(dt, , unlist(lapply(col, function(xx) { :
Column 1 of result for group 2 is type 'double' but expecting type
'logical'. Column types must be consistent for each group.
However, If I remove the unlist in the code, It seems to calculate the answer. What is unlist doing that is messing it up?

Show only a certain part of the x-axis when using plot(density(mydf))

There's an extreme value in my data. How can I only show the density plot for the "important" part of my data. I'd like to show the x-axis only from let's say -5 to +5 percent.
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
RET <- c(-80,1.1,3,1.4,-0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1,0.3,-0.12)
mydf <- data.frame(COMP, RET, stringsAsFactors=F)
plot(density(mydf$RET))
and the same with boxplot on the y-axis
boxplot(mydf$RET)
I know
boxplot(mydf$RET, outline=FALSE)
but here I want the range of the y-axis even smaller. How is that possible?
Thank you!

Use the arguments xlim and ylim to adjust the axis' scales in R basic graphics.
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
RET <- c(-80,1.1,3,1.4,-0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1,0.3,-0.12)
mydf <- data.frame(COMP, RET, stringsAsFactors=F)
par(mfrow = c(1,2)) #stack plots in 1 row and 2 columns
plot(density(mydf$RET),xlim=c(-5,5), main="")
boxplot(mydf$RET, ylim = c(-2,2), ylab="RET")

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Interrelated columns in R data.table - r

Related

Is it possible to create bubble chart using Echarts4R

R: Finance - Compute Beta via CAPM for Panel Data

How to accurately estimate the start of an increasing value of a variable in time?

Error with using unlist, lapply and grepl in data.tables R

Show only a certain part of the x-axis when using plot(density(mydf))

Categories

Resources