Ordering Table A based on Rank of Table B in R

Pretty newb question here, but I have not been able to track down a solution for some time.
I have an xts object of trading indicators (indicate) for stock data that looks like:
               A   XOM  MSFT
2000-11-30 -0.59  0.22  0.10
2000-12-29  0.55 -0.23  0.05
2001-01-30 -0.52  0.09 -0.10
And a table with an identical index for the corresponding period returns (return) that looks like
               A   XOM  MSFT
2000-11-30 -0.15  0.10  0.03
2000-12-29  0.03 -0.05  0.02
2001-01-30 -0.04  0.02 -0.05
I have sorted the indicator table and had it return the column name with the following code:
indicate.label <- colnames(indicate)
indicate.rank <- t(apply(indicate, 1, function(x) indicate.label[order(-x)]))
indicate.rank <- xts(indicate.rank, order.by = index(returns))
Which gives the table (indicate.rank) of the symbol names ranked by their trading indicator:
           1    2    3
2000-11-30 XOM  MSFT A
2000-12-29 A    MSFT XOM
2001-01-30 XOM  MSFT A
I would like to also have a table that gives the period returns based on the indicator rank:
2000-11-30 0.10  0.03 -0.15
2000-12-29 0.03  0.02 -0.05
2001-01-30 0.02 -0.05 -0.04
I cannot figure out how to call the correct symbol for all rows or just sort the table return based on the order of indicate.
Thank you for any suggestions.
Trevor J

I'm not particularly satisfied with this solution, but it works.
# column positions of each row's indicators, largest first
row.rank <- t(apply(indicate, 1, order, decreasing=TRUE))
indicate.rank <- return.rank <- indicate # pre-allocate
for(i in seq_len(NROW(indicate.rank))) {
  indicate.rank[i,] <- colnames(indicate)[row.rank[i,]] # symbols in rank order
  return.rank[i,] <- return[i,row.rank[i,]] # returns in the same order
}
It would probably be easier to handle this if the returns and the indicators for each symbol were in the same object, but I don't know how that would fit with the rest of your workflow.
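For what it's worth, the loop can also be avoided with matrix indexing. This is only a minimal sketch, assuming the indicators and returns can be handled as plain numeric matrices with identical row and column layout (for xts objects, coredata() would give such matrices). The object names are illustrative and the values are the ones from the question:

```r
# Illustrative matrices built from the values shown in the question
syms  <- c("A", "XOM", "MSFT")
dates <- c("2000-11-30", "2000-12-29", "2001-01-30")
indicate <- matrix(c(-0.59,  0.22,  0.10,
                      0.55, -0.23,  0.05,
                     -0.52,  0.09, -0.10),
                   nrow = 3, byrow = TRUE, dimnames = list(dates, syms))
returns <- matrix(c(-0.15,  0.10,  0.03,
                     0.03, -0.05,  0.02,
                    -0.04,  0.02, -0.05),
                  nrow = 3, byrow = TRUE, dimnames = list(dates, syms))

# Column positions of each row's indicators, largest first
row.rank <- t(apply(indicate, 1, order, decreasing = TRUE))

# Two-column (row, column) index matrix: picks returns[i, row.rank[i, j]]
idx <- cbind(as.vector(row(row.rank)), as.vector(row.rank))

return.rank <- matrix(returns[idx], nrow = nrow(returns),
                      dimnames = list(dates, seq_len(ncol(returns))))
indicate.rank <- matrix(syms[row.rank], nrow = nrow(indicate),
                        dimnames = dimnames(return.rank))
```

Indexing a matrix with a two-column matrix selects one element per (row, column) pair, which is what the loop does one row at a time.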

Related

Exclude factor loadings from ID variable in order to create latent concept

I conducted a factor analysis and want to create the latent concepts (postmaterialism and materialism) from the correlated variables (see the fa output below). Later I want to merge the data set used for the fa with another data set, so I kept the ID variable to use as the key. My problem is that I need to exclude the ID variable from the factor loadings, because otherwise it will distort each individual's score on the latent concepts. I tried different commands like:
!("ID"), with = FALSE, - ("ID"), with = FALSE, setdiff(names(expl_fa2),("ID")), with = FALSE
but nothing worked.
This is my code for the latent variables:
data_fa_1 <- data_fa_1 %>% mutate(postmat = expl_fa2$scores[,1], mat = expl_fa2$scores[,2])
And this is the output from the factor analysis:
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
import_of_new_ideas 0.48 0.06 0.233 0.77 1.0
import_of_safety 0.06 0.61 0.375 0.63 1.0
import_of_trying_things 0.66 0.03 0.435 0.57 1.0
import_of_obedience 0.01 0.49 0.240 0.76 1.0
import_of_modesty 0.01 0.44 0.197 0.80 1.0
import_of_good_time 0.62 0.01 0.382 0.62 1.0
import_of_freedom 0.43 0.16 0.208 0.79 1.3
import_of_strong_gov 0.15 0.57 0.350 0.65 1.1
import_of_adventures 0.64 -0.15 0.427 0.57 1.1
import_of_well_behav 0.03 0.64 0.412 0.59 1.0
import_of_traditions 0.03 0.50 0.253 0.75 1.0
import_of_fun 0.67 0.03 0.449 0.55 1.0
ID 0.07 0.04 0.007 0.99 1.7
Can anyone help me with the command I need to use in order to exclude the factor loadings from the ID variable (see output fa) from the creation of the latent variables "postmat" and "mat"?
Not sure if this is really your question, but assuming you just want to remove the ID column from a data.table, here is an example data.table and three ways to exclude the ID column:
library(data.table)
DT <- data.table(
  ID=LETTERS[1:10],
  matrix(rnorm(50), nrow=10, dimnames = list(NULL, paste0("col", 1:5)))
)
DT[, -1]                                       # drop by position
DT[, -"ID"]                                    # drop by name
DT[, setdiff(colnames(DT), "ID"), with=FALSE]  # keep everything except ID
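If the goal is to keep ID out of the factor analysis itself, one possible approach is to drop the column before fitting and attach the scores to the full data afterwards. The loadings table in the question looks like output from psych::fa. The sketch below swaps in base R's factanal instead, on simulated data, so the variable names (v1 to v6, postmat, mat) and the data are assumptions for illustration only:

```r
# Simulated data (hypothetical): two latent factors, three items each
set.seed(42)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  ID = seq_len(n),
  v1 = f1 + rnorm(n, sd = 0.5),
  v2 = f1 + rnorm(n, sd = 0.5),
  v3 = f1 + rnorm(n, sd = 0.5),
  v4 = f2 + rnorm(n, sd = 0.5),
  v5 = f2 + rnorm(n, sd = 0.5),
  v6 = f2 + rnorm(n, sd = 0.5)
)

# Fit on every column except ID, so ID cannot contribute to the factors
items <- dat[, setdiff(names(dat), "ID")]
fit   <- factanal(items, factors = 2, scores = "regression")

# Scores come back in the original row order, so they line up with ID
dat$postmat <- fit$scores[, 1]
dat$mat     <- fit$scores[, 2]
```

Because the rows are never reordered, ID stays available as the merge key without ever entering the model.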

How to use the `map` family of functions in the **purrr** package to swap columns across rows of a data frame?

Imagine there are 4 cards on the desk and there are several rows of them (e.g., 5 rows in the demo). The value of each card is already listed in the demo data frame. However, the exact position of the card is indexed by the pos columns, see the demo data I generated below.
To achieve this, I swap the cards with the [] function across the rows to switch the cards' values back to their original position. The following code already fulfills such a purpose. To avoid explicit usage of the loop, I wonder whether I can achieve a similar effect if I use the vectorization function with packages from tidyverse family, e.g. pmap or related function within the package purrr?
# 1. data generation ------------------------------------------------------
rm(list=ls())
vect<-matrix(round(runif(20),2),nrow=5)
colnames(vect)<-paste0('card',1:4)
order<-rbind(c(2,3,4,1),c(3,4,1,2),c(1,2,3,4),c(4,3,2,1),c(3,4,2,1))
colnames(order)=paste0('pos',1:4)
dat<-data.frame(vect,order,stringsAsFactors = F)
# 2. data swap ------------------------------------------------------------
for (i in seq_len(nrow(dat))){
  orders=dat[i,paste0('pos',1:4)]
  card=dat[i,paste0('card',1:4)]
  vec<-card[order(unlist(orders))]
  names(vec)=paste0('deck',1:4)
  dat[i,paste0('deck',1:4)]<-vec
}
dat
You could use pmap_dfr :
card_cols <- grep('card', names(dat))
pos_cols <- grep('pos', names(dat))
dat[paste0('deck', seq_along(card_cols))] <- purrr::pmap_dfr(dat, ~{
  x <- c(...)
  as.data.frame(t(unname(x[card_cols][order(x[pos_cols])])))
})
dat
# card1 card2 card3 card4 pos1 pos2 pos3 pos4 deck1 deck2 deck3 deck4
#1 0.05 0.07 0.16 0.86 2 3 4 1 0.86 0.05 0.07 0.16
#2 0.20 0.98 0.79 0.72 3 4 1 2 0.79 0.72 0.20 0.98
#3 0.50 0.79 0.72 0.10 1 2 3 4 0.50 0.79 0.72 0.10
#4 0.03 0.98 0.48 0.06 4 3 2 1 0.06 0.48 0.98 0.03
#5 0.41 0.72 0.91 0.84 3 4 2 1 0.84 0.91 0.41 0.72
One thing to note: make sure the output of the function passed to pmap_dfr does not keep the original column names. If it does, the rows are combined by name, the columns get reshuffled, and the output is no longer in the correct order. I use unname here to remove the names.
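As a point of comparison, the same swap can be done without purrr and without a loop by using matrix indexing in base R. This is a sketch that assumes every row of the pos columns is a permutation of 1 to 4; the card and pos values are copied from the output above, and the names cards, pos and deck are illustrative:

```r
# Card values and positions taken from the printed output above
cards <- matrix(c(0.05, 0.07, 0.16, 0.86,
                  0.20, 0.98, 0.79, 0.72,
                  0.50, 0.79, 0.72, 0.10,
                  0.03, 0.98, 0.48, 0.06,
                  0.41, 0.72, 0.91, 0.84),
                nrow = 5, byrow = TRUE)
pos <- rbind(c(2, 3, 4, 1), c(3, 4, 1, 2), c(1, 2, 3, 4),
             c(4, 3, 2, 1), c(3, 4, 2, 1))

# Card j of row i belongs at deck position pos[i, j], so write each card
# into cell (i, pos[i, j]) using a two-column (row, column) index matrix
deck <- matrix(NA_real_, nrow(cards), ncol(cards),
               dimnames = list(NULL, paste0("deck", seq_len(ncol(cards)))))
deck[cbind(as.vector(row(pos)), as.vector(pos))] <- as.vector(cards)
deck
```

Writing each card to its target position is equivalent to card[order(pos)] in the loop, because order() of a permutation returns its inverse.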

Generate subsequences in R

I have a df which is 67200 obs long, with 5 vars. I would like to create a list of subsequences from one var, each of equal length (600 obs). I would like the sequence to be iterative so that I can identify rolling sequences i.e. seq1 = 0:600, seq2 = 1:601, seq3 = 2:602, and so on. I will then sum the data from each subsequence to identify the sequence with the highest total.
I understand how to make a basic sequence using seq, however after reading around SO and other sites, I can only find info on how to identify specific sequences. Any help with ideas on ways to create said subsequences would be great.
Sample Data:
Var1 Var2 Var3 Var4 Var5
0.00 0.31 0.32 0.00 0.01
0.10 0.46 0.46 0.13 0.01
0.20 0.46 0.47 0.14 0.02
0.30 0.40 0.21 0.14 0.02
0.40 0.38 0.11 0.20 0.03
0.50 0.38 0.07 0.25 0.04
Expected Output:
A list containing each of the x subsequences
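To answer your question, I think you can achieve your expected output with lapply and seq (R indexes rows from 1, so the windows here start at 1 and each covers exactly x rows):
x <- 600
n <- seq_len(nrow(df) - x + 1)
lapply(n, function(i) seq(i, i + x - 1))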
However, from the description it seems you are trying to perform a rolling calculation, and the above is not the best approach for that. Look into the zoo package: it has functions like rollsum, rollmean, and the general rollapply, which handle this better.
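If adding a package is not an option, a rolling sum can also be sketched in base R with cumsum, which avoids building the list of index sequences at all. The vector x below is simulated and stands in for the column of interest; the names k, roll and best_start are illustrative:

```r
# Hypothetical stand-in for one column of the data frame
set.seed(1)
x <- runif(2000)
k <- 600                      # window length

# Prefix sums: cs[j + 1] holds sum(x[1:j]), with cs[1] = 0
cs <- c(0, cumsum(x))

# roll[j] is the sum of x[j:(j + k - 1)]
roll <- cs[(k + 1):length(cs)] - cs[1:(length(x) - k + 1)]

# Window with the highest total, and the rows it covers
best_start <- which.max(roll)
best_rows  <- best_start:(best_start + k - 1)
```

Each window total is the difference of two prefix sums, so the whole calculation is a single pass over the data.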

Mean of quartile for multiple columns and multiple dates

I'm trying to find the mean forward return (column fwd_rtn) of each quartile for each column (ie for quartiles for PB, PE, PS) for each date group (1/1/2016...1/4/2016)
head(df)
Date Stock Price PB PE PS fwd_rtn
1 1/1/2016 A 11.90 0.4 0.10 0.57 -0.015
2 1/1/2016 B 3.56 0.8 0.09 0.26 -0.036
3 1/1/2016 C 1.29 1.2 0.18 1.60 0.10
......
4 1/4/2016 A 12.80 0.39 0.13 0.53 -0.01
5 1/4/2016 B 4.03 0.76 0.08 0.23 0.02
6 1/4/2016 C 1.83 0.87 0.14 1.16 0.03
So far I have been able to find the mean return for one column for one date using this code:
df$qPB <- cut(df$PB, breaks = quantile(df$PB, c(0,.25,.5,.75,1)),include.lowest = TRUE)
aggregate(df$fwd_rtn,list(qPB = df$qPB),FUN=mean)
which gave me the right answers. But I'm struggling to do it for multiple columns. I think I'm supposed to use dplyr and the gather() function, but I don't know how.
To get quartiles of a single variable by date you can use the ave function:
# quartile number (1-4) of PB within each date; breaks come from that date's values
df$qPB <- ave(df$PB, df$Date, FUN = function(i) cut(i, breaks = quantile(i,
  c(0,.25,.5,.75,1)), include.lowest = TRUE, labels = FALSE))
# a minor addition to aggregate
aggregate(df$fwd_rtn, list("qPB"=df$qPB, "date"=df$Date), FUN=mean)
You should take a look at using lapply or sapply to move through multiple columns.
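One way the lapply suggestion could look is sketched below. The data frame is simulated (eight hypothetical stocks on two dates) because the question only shows a fragment, and the helper quartile_by_date is a made-up name for illustration:

```r
# Simulated stand-in for the question's data: 8 stocks on 2 dates
set.seed(1)
df <- data.frame(
  Date    = rep(c("1/1/2016", "1/4/2016"), each = 8),
  Stock   = rep(LETTERS[1:8], times = 2),
  PB      = runif(16),
  PE      = runif(16),
  PS      = runif(16),
  fwd_rtn = rnorm(16, sd = 0.05)
)

# Hypothetical helper: quartile number (1-4) of x within each group g
quartile_by_date <- function(x, g) {
  ave(x, g, FUN = function(v)
    cut(v, breaks = quantile(v, c(0, .25, .5, .75, 1)),
        include.lowest = TRUE, labels = FALSE))
}

# Repeat the single-column recipe for every valuation column
value_cols <- c("PB", "PE", "PS")
res <- lapply(value_cols, function(col) {
  q   <- quartile_by_date(df[[col]], df$Date)
  out <- aggregate(df$fwd_rtn, list(quartile = q, date = df$Date), FUN = mean)
  names(out)[3] <- "mean_fwd_rtn"
  cbind(variable = col, out)
})
result <- do.call(rbind, res)
```

The result is one long table with a row per variable, date and quartile. With real data, tied values can make the quantile breaks non-unique, in which case cut would need extra handling.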

reshape unique strings in rows into columns in R

I would like to reshape my data based on the unique strings in the "Bulls" column (my data frame is called all):
EBV Bulls
0.13 NE001362
0.17 NE001361
0.05 NE001378
-0.12 NE001359
-0.14 NE001379
0.13 NE001380
-0.46 NE001379
-0.46 NE001359
-0.68 NE001394
0.28 NE001391
0.84 NE001394
-0.43 NE001393
-0.18 NE001707
My expected output:
NE001362 NE001361 NE001378 NE001359 NE001379 NE001380 NE001394 NE001391 NE001393 NE001707
0.13 0.17 0.05 -0.12 -0.14 0.13 -0.68 0.28 -0.43 -0.18
-0.46 -0.46 0.84
I tried dat2 <- dcast(all, EBV~variable, value.var = "Bulls") but it does not work.
You have two options: indexing the multiple occurrences for each level of Bulls, or using a list to hold the different values of EBV.
Option 1: Indexing multiple occurrences
You can use data.table to generate an index that numbers multiple occurrences of EBV:
require(data.table)
setDT(all) ## convert to data.table
all[, index:=1:.N, by=Bulls] ## generate index
dcast(all, formula=index ~ Bulls, value.var='EBV') ## data.table's dcast method
Option 2: Using a list to store multiple values
You could use a list as a value with data.table (I'm not sure if plain data.frame supports it).
require(data.table)
setDT(all) ## convert to data.table
all[, list(list(EBV)), by=Bulls] ## multiple values stored as list
Just to make sure that base R gets some acknowledgement:
## Add an ID, like ilir did, but with base R functions
mydf$ID <- with(mydf, ave(rep(1, nrow(mydf)), Bulls, FUN = seq_along))
Here's reshape:
reshape(mydf, direction = "wide", idvar="ID", timevar="Bulls")
# ID EBV.NE001362 EBV.NE001361 EBV.NE001378 EBV.NE001359 EBV.NE001379
# 1 1 0.13 0.17 0.05 -0.12 -0.14
# 7 2 NA NA NA -0.46 -0.46
# EBV.NE001380 EBV.NE001394 EBV.NE001391 EBV.NE001393 EBV.NE001707
# 1 0.13 -0.68 0.28 -0.43 -0.18
# 7 NA 0.84 NA NA NA
And xtabs. Note: this returns a table-like matrix, so if you want a data.frame, you'll have to use as.data.frame.matrix on the output. Also, combinations that do not occur show up as 0 rather than NA.
xtabs(EBV ~ ID + Bulls, mydf)
# Bulls
# ID NE001359 NE001361 NE001362 NE001378 NE001379 NE001380 NE001391
# 1 -0.12 0.17 0.13 0.05 -0.14 0.13 0.28
# 2 -0.46 0.00 0.00 0.00 -0.46 0.00 0.00
# Bulls
# ID NE001393 NE001394 NE001707
# 1 -0.43 -0.68 -0.18
# 2 0.00 0.84 0.00
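If the zeros are a problem, tapply is another base R option that leaves empty cells as NA. A small sketch using the data from the question, with mydf as in the answer above and wide as an illustrative name:

```r
# Data as posted in the question
mydf <- data.frame(
  EBV   = c(0.13, 0.17, 0.05, -0.12, -0.14, 0.13, -0.46,
            -0.46, -0.68, 0.28, 0.84, -0.43, -0.18),
  Bulls = c("NE001362", "NE001361", "NE001378", "NE001359", "NE001379",
            "NE001380", "NE001379", "NE001359", "NE001394", "NE001391",
            "NE001394", "NE001393", "NE001707"),
  stringsAsFactors = FALSE
)

# Occurrence counter within each bull (1 for the first record, 2 for the second)
mydf$ID <- with(mydf, ave(seq_along(Bulls), Bulls, FUN = seq_along))

# One row per occurrence, one column per bull; missing combinations stay NA
wide <- with(mydf, tapply(EBV, list(ID = ID, Bulls = Bulls), FUN = sum))
wide
```

Each (ID, Bulls) cell holds at most one value here, so sum simply passes it through. As with xtabs, the columns come out sorted by bull name rather than in order of first appearance.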
