pa package in R - performance attribution

When I try to load my own data set into the pa package, I am not able to call the Brinson method. I have attached a screenshot. The package itself comes bundled with the data sets "jan", "quarter", and "year", and there is no problem running the analysis using data(jan) or data(year), but my own data "Sample.xlsx" does not work the same way. When I try to run it I get the following error: Error in is.data.frame(x) : object 'Sample' not found.
Can anyone help me clean up my data?
Here is the dput output for the first 4 rows:
data <- structure(
  list(
    date = structure(
      c(1420070400, 1420156800, 1420243200, 1420329600),
      tzone = "UTC",
      class = c("POSIXct", "POSIXt")
    ),
    portfolio_return = c(0.0212, 0.019, 0.021, 0.03),
    australian_equities_3 = c(0.01, 0.02, 0.01, 0.01),
    domestic_fixed_interest_4 = c(0.02, -0.01, 0.02, 0.05),
    cash_equivalents_5 = c(0.03, 0.04, 0.03, 0.03),
    australian_equities_6 = c(0.28, 0.3, 0.3, 0.3),
    domestic_fixed_interest_7 = c(0.32, 0.3, 0.3, 0.3),
    cash_equivalents_8 = c(0.4, 0.4, 0.4, 0.4),
    total_weight_9 = c(1, 1, 1, 1),
    benchmark_return = c(0.0175, 0.0215, 0.032, 0.0381666666666666),
    australian_equities_11 = c(0.02, 0.01, 0.01, 0.00333333333333333),
    domestic_fixed_interest_12 = c(-0.01, 0.02, 0.05, 0.08),
    cash_equivalents_13 = c(0.04, 0.03, 0.03, 0.0233333333333333),
    australian_equities_14 = c(0.25, 0.25, 0.25, 0.25),
    domestic_fixed_interest_15 = c(0.35, 0.35, 0.35, 0.35),
    cash_equivalents_16 = c(0.4, 0.4, 0.4, 0.4),
    total_weight_17 = c(1, 1, 1, 1),
    allocation_australian_equities = c(7.5e-05, -0.000575, -0.0011, -0.00174166666666667),
    allocation_domestic_fixed_interest = c(0.000824999999999999, 7.49999999999999e-05,
                                           -9e-04, -0.00209166666666667),
    allocation_cash_equivalents = c(0, 0, 0, 0),
    selection_australian_equities = c(-0.0025, 0.0025, 0, 0.00166666666666667),
    selection_domestic_fixed_interest = c(0.0105, -0.0105, -0.0105, -0.0105),
    selection_cash_equivalents = c(-0.004, 0.004, 0, 0.00266666666666668),
    interaction_australian_equities = c(-3e-04, 5e-04, 0, 0.000333333333333333),
    interaction_domestic_fixed_interest = c(-0.000899999999999999, 0.0015, 0.0015, 0.0015),
    interaction_cash_equivalents = c(0, 0, 0, 0)
  ),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame")
)
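For what it's worth, the error itself is not about the data's contents: object 'Sample' not found means no R object named Sample exists in the session, because an .xlsx file has to be imported into a named object before it can be passed to anything. A minimal sketch (assuming the readxl package; the brinson() argument names follow the pa vignette, and the sector/weight/return column names below are hypothetical, since the wide dput above would first need reshaping to the long, one-row-per-category-per-date layout that data(jan) uses):

library(pa)      # brinson() lives here
library(readxl)  # reads .xlsx into a data frame

# Import the spreadsheet; the name on the left is the object you pass on.
# Referring to Sample without this step is what triggers the error.
sample_data <- as.data.frame(read_excel("Sample.xlsx"))

# Hypothetical call, assuming sample_data has been reshaped to long format
# (compare str(jan) after data(jan) to see the expected layout):
br <- brinson(x = sample_data,
              date.var = "date",
              cat.var = "sector",
              bench.weight = "benchmark",
              portfolio.weight = "portfolio",
              ret.var = "return")
summary(br)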

Related

Has anyone successfully generated a sankey chart with R plotly in Power BI Desktop and Power BI Service?

I tried my best to create a sankey chart with the R plotly package in Power BI, but failed. I can create the sankey in RStudio with the same code. I checked the official documentation, and the 'plotly' package I used is supported by the Power BI service, so there should be no reason for the chart not to display: https://learn.microsoft.com/en-us/power-bi/connect-data/service-r-packages-support
library(plotly)

a <- read.csv('cleanSankey.csv', header = TRUE, sep = ',')

# Build the node list from the unique source/target labels
node_names <- unique(c(as.character(a$source), as.character(a$target)))
node_names <- node_names[order(sub('.*_', '', node_names))]
nodes <- data.frame(name = node_names)

# plotly's sankey uses zero-based node indices
links <- data.frame(source = match(a$source, node_names) - 1,
                    target = match(a$target, node_names) - 1,
                    value  = a$value)

# Manual node coordinates, one entry per node
node_x <- c(0, 0, 0, 0,
            0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125,
            0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
            0.375, 0.375, 0.375, 0.375, 0.375, 0.375,
            0.5, 0.5, 0.5, 0.5,
            0.625, 0.625, 0.625,   # second value was 0.6255, presumably a typo
            0.8, 0.8, 0.8,
            0.999, 0.999)
node_y <- c(0.01, 0.02, 0.03, 0.04,
            0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12,
            0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08,
            0.01, 0.02, 0.03, 0.04, 0.05, 0.06,
            0.01, 0.02, 0.03, 0.04,
            0.01, 0.02, 0.03,
            0.01, 0.02, 0.03,
            0.01, 0.02)

# Plot
plot_ly(type = 'sankey',
        orientation = "h",
        arrangement = "snap",
        node = list(
          label = node_names,
          x = node_x,
          y = node_y,
          color = "grey",
          pad = 15,
          thickness = 15,          # was misspelled 'thinkness'
          line = list(color = "grey", width = 0.5)),
        link = list(
          source = links$source,
          target = links$target,
          value = links$value))
Then I run the above code in Power BI Desktop, but Power BI says 'Can't display the visual'.
Does anyone with experience have any advice?
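One local sanity check that may be worth running before embedding the script (a sketch; a mismatch here would also break the chart outside Power BI): with arrangement = "snap" and manual coordinates, node_x and node_y must each supply exactly one value per node label:

# A length mismatch between labels and coordinates is easy to introduce
# when editing the vectors by hand, and the trace can then fail to render
stopifnot(length(node_x) == length(node_names),
          length(node_y) == length(node_names))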

geom_tile() in R: Sort Order

I have a dataframe that I want to plot a heatmap of:
dput(df)
structure(list(
  `0` = c(6.08, 7.91, 5.14, 2.23, 0.72, 0.19, 0.04, 0.01, 0, 0, 0),
  `1` = c(9.12, 11.86, 7.71, 3.34, 1.09, 0.28, 0.06, 0.01, 0, 0, 0),
  `2` = c(6.84, 8.89, 5.78, 2.5, 0.81, 0.21, 0.05, 0.01, 0, 0, 0),
  `3` = c(3.42, 4.45, 2.89, 1.25, 0.41, 0.11, 0.02, 0, 0, 0, 0),
  `4` = c(1.28, 1.67, 1.08, 0.47, 0.15, 0.04, 0.01, 0, 0, 0, 0),
  `5` = c(0.38, 0.5, 0.33, 0.14, 0.05, 0.01, 0, 0, 0, 0, 0),
  `6` = c(0.1, 0.13, 0.08, 0.04, 0.01, 0, 0, 0, 0, 0, 0),
  `7` = c(0.02, 0.03, 0.02, 0.01, 0, 0, 0, 0, 0, 0, 0),
  `8` = c(0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  `9` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  `10 or more` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)),
  row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10 or more"),
  class = "data.frame")
Now to plot the heatmap using ggplot2 this is how I approach the solution:
df %>%
  as_tibble(rownames = "homeScore") %>%
  pivot_longer(cols = -homeScore, names_to = "awayScore", values_to = "level") %>%
  ggplot() +
  geom_tile(aes(x = homeScore, y = awayScore, fill = level))
The problem I face is that the columns and rows are sorted as (0, 1, 10+, 2, ...) instead of (0, 1, 2, ..., 10+).
How do I sort the values so that 10 or more comes last for both row and column, instead of third?
As #Nate already mentioned, you have to convert your variables to factors and put the levels in the right order. Instead of converting via factor(as.numeric(.)) (which converts "10 or more" to NA), I would recommend making use of forcats::fct_relevel, which allows you to change the order of the levels; e.g. forcats::fct_relevel(homeScore, "10 or more", after = 10) makes "10 or more" the last level. Try this:
library(ggplot2)
library(tidyr)
library(dplyr)
library(forcats)

df %>%
  as_tibble(rownames = "homeScore") %>%
  pivot_longer(cols = -homeScore, names_to = "awayScore", values_to = "level") %>%
  mutate_at(vars(homeScore, awayScore), ~ forcats::fct_relevel(.x, "10 or more", after = 10)) %>%
  ggplot() +
  geom_tile(aes(x = homeScore, y = awayScore, fill = level))
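An equivalent route, if you prefer to spell the order out explicitly (a sketch using the same column names and the libraries loaded above; across() needs dplyr 1.0 or later):

# Listing every level explicitly removes any dependence on where
# "10 or more" has to be slotted in
score_levels <- c(as.character(0:9), "10 or more")

df %>%
  as_tibble(rownames = "homeScore") %>%
  pivot_longer(cols = -homeScore, names_to = "awayScore", values_to = "level") %>%
  mutate(across(c(homeScore, awayScore), ~ factor(.x, levels = score_levels))) %>%
  ggplot() +
  geom_tile(aes(x = homeScore, y = awayScore, fill = level))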

R: categorize all numeric variables (1/0) according to a cut-off

I have the following data frame:
structure(list(
  test1 = c(0.12, 0.2, 0.55, 0.22, 0.19, 0.17, 0.15, 0.12, 0.32, 0.23, 0.32, 0.23),
  test2 = c(0.15, 0.12, 0.32, 0.23, 0.12, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22),
  test3 = c(0.07, 0.01, 0, 0.13, 0.16, 0.78, 0.98, 0.1, 0.5, 0.3, 0.4, 0.5),
  test4 = c(0.23, 0.12, 0.2, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22, 0.55, 0.42)),
  row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
And I am trying to write a script which, for each variable (test1, test2, test3, ...), creates (and adds to the data frame) a dichotomous variable (named out_testX) indicating whether the variable's value is greater than or equal to 0.20.
The result should be something like this:
structure(list(
  test1 = c(0.12, 0.2, 0.55, 0.22, 0.19, 0.17, 0.15, 0.12, 0.32, 0.23, 0.32, 0.23),
  test2 = c(0.15, 0.12, 0.32, 0.23, 0.12, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22),
  test3 = c(0.07, 0.01, 0, 0.13, 0.16, 0.78, 0.98, 0.1, 0.5, 0.3, 0.4, 0.5),
  test4 = c(0.23, 0.12, 0.2, 0.2, 0.55, 0.22, 0.12, 0.2, 0.55, 0.22, 0.55, 0.42),
  out_test1 = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1),
  out_test2 = c(0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1),
  out_test3 = c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1),
  out_test4 = c(1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)),
  row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
Can anyone help me?
Thank you
With mutate_all, we pass the function in a list and specify a suffix to be added to the column names; since a prefix is wanted instead, we swap it around in rename_at:
library(dplyr)
library(stringr)
df1 %>%
  mutate_all(list(out = ~ +(. >= .2))) %>%
  rename_at(vars(ends_with('out')), ~ str_replace(., '(.*)_(out)', '\\2_\\1'))
Or using base R
df1[paste0("out_", names(df1))] <- +(df1 >= .2)
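On dplyr 1.0 or later, the same idea can be written with across(), which also lets the prefix be set directly via the .names argument (a sketch with the same 0.2 cut-off):

library(dplyr)

df1 %>%
  mutate(across(everything(),
                ~ +(.x >= 0.2),             # +() coerces TRUE/FALSE to 1/0
                .names = "out_{.col}"))     # prefix, so no rename step needed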

Obtain all possible correlations between two sets of data

I am trying to extract several pieces of information for each correlation using corr.test.
I have two data sets, df1 and df2:
df1 <- structure(list(
  col1A = c(1.64, 0.03, 0, 4.202, 2.981, 0.055, 0, 0.002, 0.005, 0, 0.002, 0.649, 2.55, 2.762, 6.402),
  col2A = c(2.635, 0.019, 0, 5.638, 3.542, 0.793, 0.259, 0, 0.046, 0.004, 0.017, 0.971, 3.81, 3.104, 5.849),
  col3A = c(0.91, 0.037, 0, 5.757, 3.916, 0.022, 0, 0, 0.003, 0, 0.262, 0.136, 2.874, 3.466, 5.003),
  col4A = c(1.027, 0.021, 0, 4.697, 2.832, 0.038, 0.032, 0.001, 0.003, 0, 0, 0.317, 2.743, 3.187, 6.455)),
  class = "data.frame", row.names = c(NA, -15L))
The second data set looks like this:
df2 <- structure(list(
  col1 = c(2.172, 0, 0, 4.353, 4.581, 0.001, 0.027, 0, 0.002, 0, 0, 0.087, 2.129, 4.317, 5.849),
  col2 = c(2.093, 0, 0, 4.235, 3.166, 0, 0, 0.006, 0.01, 0, 0, 0.475, 0, 2.62, 5.364),
  col3 = c(3.322, 0, 0, 4.332, 4.018, 0.049, 0.169, 0.004, 0.02, 0, 0.032, 1.354, 2.944, 4.323, 5.44),
  col4 = c(0.928, 0.018, 0, 3.943, 3.723, 0.02, 0, 0, 0, 0, 0.075, 0.136, 3.982, 3.875, 5.83)),
  row.names = c("A", "AA", "AAA", "Aab", "buy", "yuyn", "gff", "fgd", "kil",
                "lilk", "hhk", "lolo", "fdd", "vgfh", "nghg"),
  class = "data.frame")
I want to obtain all possible correlations between the two and extract all p-values and adjusted p-values.
I use:
library(psych)
corr.test(df1, df2, use = "pairwise", method = "pearson", adjust = "holm",
          alpha = .05, ci = TRUE, minlength = 5)
It does not give me any p-values, and I cannot control any sort of permutation to calculate the adjusted p-values.
I was thinking of using the following:
x <- df1[, 1]
y <- df2[, 2]
corr_init <- cor(x, y)  # original correlation
N <- 1000               # number of permutations
count <- 0              # counts permuted correlations greater than corr_init
for (i in 1:N) {
  y_perm <- sample(y)   # random permutation (base R; permute() is not a base function)
  if (cor(y_perm, x) > corr_init) count <- count + 1
}
p <- count / N          # final p-value
but then I would have to do it one by one, and I would still need to extract each column and test it...
I am wondering if there is a better way to calculate all the correlations between the two data sets and get the R values, p-values, and adjusted p-values with a specific number of randomizations?
It could be done using the Hmisc package:
library(Hmisc)
df1_cor_matrix <- rcorr(as.matrix(df1), type = "pearson")
df2_cor_matrix <- rcorr(as.matrix(df2), type = "pearson")
You can then extract out the coefficients using the following:
df1_coef <- df1_cor_matrix$r
df2_coef <- df2_cor_matrix$r
You can extract the p-values using the following:
df1_p_values <- df1_cor_matrix$P
df2_p_values <- df2_cor_matrix$P
You could get the adjusted p-values using the rcorr.adjust function (from the RcmdrMisc package); note that it takes the raw matrix rather than the rcorr result:
library(RcmdrMisc)
rcorr.adjust(as.matrix(df1), type = "pearson")
rcorr.adjust(as.matrix(df2), type = "pearson")
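Two hedged additions, since the question asks for the correlations between the two sets rather than within each one: rcorr() also accepts a second matrix whose columns get appended, and psych's corr.test() stores its p-values in the returned object even when the printout omits them. A sketch with the df1/df2 from the question:

library(Hmisc)
library(psych)

# rcorr(x, y) appends the columns of y to x, so the df1-vs-df2 block
# sits off the diagonal of $r and $P
both <- rcorr(as.matrix(df1), as.matrix(df2), type = "pearson")
both$r[colnames(df1), colnames(df2)]   # cross-correlation coefficients
both$P[colnames(df1), colnames(df2)]   # corresponding p-values

# corr.test() keeps its p-values in the result object
ct <- corr.test(df1, df2, use = "pairwise", method = "pearson", adjust = "holm")
ct$r   # correlations
ct$p   # p-values (see ?corr.test for how the Holm adjustment is reported)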

What does MICE LoggedEvents mean?

I am trying to run a multiple imputation using the mice function (from the package of the same name) in R. I get a Warning that events have been logged. Here is the output from mice(.)$loggedEvents from my MWE (see below):
  it im dep meth out
1  1  1   X  pmm   H
I'm not sure what is causing this warning or what its implications are. From what I understand, it can be caused by collinearity among variables, but that should be prevented by using remove_collinear=FALSE, and it isn't fixing the warning.
MWE:
library(mice)

Pop <- data.frame(X = c(   NA, 0.02, -1.15,  0.54, -0.61, -2.07),
                  Z = c( 0.83, 1.40, -3.07, -0.07, -0.20, -1.90),
                  D = c(    0,    0,     0,     1,     0,     0),
                  H = c( 0.01, 0.01,  0.01,  0.01,  0.02,  0.02))

Pop.Imp <- mice(Pop, m = 1, maxit = 1, print = T)
Obviously my original issue involved many more rows and columns of data and a higher number of imputations and iterations, but I've managed to trim it down to this MWE.
Any help in figuring out what's causing this problem would be great. Is there some sort of cut-off that mice uses when deciding if/when a covariate is collinear? If it is very high, would that override the remove_collinear=FALSE parameter?
This isn't a full answer, but I couldn't fit the reply in a comment.
The logged events warning can arise from a variety of issues. The issue raised can be identified from the "meth" column in the mice()$loggedEvents output.
The two issues I know of are collinearity and a predictor that is constant across all values (or perhaps constant across the missing/non-missing groups also satisfies this criterion). I added some variables to highlight these:
Pop <- data.frame(X  = c(   NA, 0.02, -1.15,  0.54, -0.61, -2.07,   NA),
                  Z1 = c( 0.83, 1.40, -3.07, -0.07, -0.20, -1.90, 2.00),
                  Z2 = c( 0.83, 1.40, -3.07, -0.07, -0.20, -1.90, 2.00),
                  D  = c(    0,    0,     0,     1,     0,     0,    1),
                  H  = c( 0.01, 0.01,  0.01,  0.01,  0.02,  0.02, 0.02))

Pop.Imp <- mice(Pop, m = 1, maxit = 1, print = T)
Pop.Imp$loggedEvents
  it im dep      meth out
1  0  0     collinear  Z2
2  1  1   X       pmm   H
Pop <- data.frame(X       = c(   NA, 0.02, -1.15,  0.54, -0.61, -2.07,   NA),
                  Z1      = c( 0.83, 1.40, -3.07, -0.07, -0.20, -1.90, 2.00),
                  consvar = c( 0.83, 0.83,  0.83,  0.83,  0.83,  0.83, 0.83),
                  D       = c(    0,    0,     0,     1,     0,     0,    1),
                  H       = c( 0.01, 0.01,  0.01,  0.01,  0.02,  0.02, 0.02))

Pop.Imp <- mice(Pop, m = 1, maxit = 1, print = T)
Pop.Imp$loggedEvents
  it im dep     meth     out
1  0  0     constant consvar
2  1  1   X      pmm       H
Unfortunately I don't know what the "pmm" issue is. Maybe something to do with predictive mean matching (the chosen imputation method) not being able to work in such a small dataset?
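A quick diagnostic that may help before imputing (a sketch, not mice-specific, using the second Pop above): constant and perfectly correlated columns are exactly what the constant and collinear events flag, and both are easy to spot directly:

# Zero variance marks a constant predictor (consvar above)
sapply(Pop, var, na.rm = TRUE)

# Off-diagonal correlations of exactly 1 mark duplicated information;
# a constant column yields NA here (its standard deviation is zero)
round(cor(Pop[, c("Z1", "consvar", "D", "H")], use = "pairwise.complete.obs"), 3)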
