This is a follow-up to a related question. I want to know how I can dynamically intercalate (interleave) the columns in the bigger data set.
Rationale: I used a for-loop to import 16 dataframes. After that, I merged them all like this:
### Merge all dataframes (PS: I got this code here on SO :)
mergefun <- function(x, y) merge(x, y, by = "ID", all = TRUE)
merged_DF <- Reduce(mergefun, dataList)
Each dataframe has an "ID" column (identical across all of them), but the other column names differ (I created them based on the answer to the other post). Hence,
I have, in total (the head() of each dataframe):
ID NARR_G1_50_AAA NARR_G1_50_AAC NARR_G1_50_AC NARR_G1_50_AB
ID NARR_G1_100_AAA NARR_G1_100_AAC NARR_G1_100_AC NARR_G1_100_AB
ID NARR_G1_150_AAA NARR_G1_150_AAC NARR_G1_150_AC NARR_G1_150_AB
ID NARR_G1_200_AAA NARR_G1_200_AAC NARR_G1_200_AC NARR_G1_200_AB
ID NARR_G2_50_AAA NARR_G2_50_AAC NARR_G2_50_AC NARR_G2_50_AB
ID NARR_G2_100_AAA NARR_G2_100_AAC NARR_G2_100_AC NARR_G2_100_AB
ID NARR_G2_150_AAA NARR_G2_150_AAC NARR_G2_150_AC NARR_G2_150_AB
ID NARR_G2_200_AAA NARR_G2_200_AAC NARR_G2_200_AC NARR_G2_200_AB
ID ARG_G1_50_AAA ARG_G1_50_AAC ARG_G1_50_AC ARG_G1_50_AB
ID ARG_G1_100_AAA ARG_G1_100_AAC ARG_G1_100_AC ARG_G1_100_AB
ID ARG_G1_150_AAA ARG_G1_150_AAC ARG_G1_150_AC ARG_G1_150_AB
ID ARG_G1_200_AAA ARG_G1_200_AAC ARG_G1_200_AC ARG_G1_200_AB
ID ARG_G2_50_AAA ARG_G2_50_AAC ARG_G2_50_AC ARG_G2_50_AB
ID ARG_G2_100_AAA ARG_G2_100_AAC ARG_G2_100_AC ARG_G2_100_AB
ID ARG_G2_150_AAA ARG_G2_150_AAC ARG_G2_150_AC ARG_G2_150_AB
ID ARG_G2_200_AAA ARG_G2_200_AAC ARG_G2_200_AC ARG_G2_200_AB
I need to arrange the columns of the joined dataframe in these two orders:
SET 1 :
###Desired output 1:
NARR_G1_50_AAA, NARR_G2_50_AAA,
NARR_G1_50_AAC, NARR_G2_50_AAC,
NARR_G1_50_AC, NARR_G2_50_AC,
NARR_G1_50_AB, NARR_G2_50_AB,
ARG_G1_50_AAA, ARG_G2_50_AAA,
ARG_G1_50_AAC, ARG_G2_50_AAC,
ARG_G1_50_AC, ARG_G2_50_AC,
ARG_G1_50_AB, ARG_G2_50_AB........then with 100,150 and 200
SET 2 :
###Desired output 2:
NARR_G1_50_AAA, ARG_G1_50_AAA,
NARR_G2_50_AAA, ARG_G2_50_AAA,
NARR_G1_50_AAC, ARG_G1_50_AAC,
NARR_G2_50_AAC, ARG_G2_50_AAC,
NARR_G1_50_AC, ARG_G1_50_AC,
NARR_G2_50_AC, ARG_G2_50_AC,
NARR_G1_50_AB, ARG_G1_50_AB,
NARR_G2_50_AB, ARG_G2_50_AB,........then with 100,150 and 200
I've tried many things, but I can't get the desired orders. The closest I got was this:
dfPaired <- merged_DF %>% ### still doesn't produce the desired output
# dplyr::select(sort(names(.))) %>%
dplyr::select(order(gsub("G1", "G2", names(.))))
Question:
How can I get the desired orders (set 1 and set 2) without manually intercalating the columns in select() ?
Further notes:
SET 1:
I need to intercalate "G1" and "G2" within each variable, in increasing order (50, then 100, then 150, then 200). Ex: NARR_G1_50_AAA, NARR_G2_50_AAA... There are 4 per number (AAA, AAC, AC and AB).
SET 2:
I need to intercalate "NARR" and "ARG" within each of G1 and G2, in increasing order (50, then 100, then 150, then 200). Such as: NARR_G1_50_AAA, ARG_G1_50_AAA... Thanks in advance :)
If a custom order is needed, one option is to split the column names at _, convert the non-numeric parts to factors with levels given in the desired order, and then order the names by those parts.
lvls1 <- c("NARR", "ARG")
lvls2 <- c("G1", "G2")
lvls3 <- c("AAA", "AAC", "AC", "AB")
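#v1 <- names(merged_DF)[-1] # assuming 'ID' is the first column
# split each name at "_" into four columns: type, group, number, suffix
d1 <- read.table(text = v1, header = FALSE, sep = "_")
# the non-numeric columns become factors with the custom level order
i1 <- !sapply(d1, is.numeric)
d1[i1] <- Map(factor, d1[i1], levels = list(lvls1, lvls2, lvls3))
# order by number, then type, then suffix, then group (Set 1)
v2 <- v1[do.call(order, d1[c(3, 1, 4, 2)])]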
library(dplyr)
merged_DF %>%
select(ID, all_of(v2))
where v2 is
> v2
[1] "NARR_G1_50_AAA" "NARR_G2_50_AAA" "NARR_G1_50_AAC" "NARR_G2_50_AAC" "NARR_G1_50_AC" "NARR_G2_50_AC" "NARR_G1_50_AB" "NARR_G2_50_AB"
[9] "ARG_G1_50_AAA" "ARG_G2_50_AAA" "ARG_G1_50_AAC" "ARG_G2_50_AAC" "ARG_G1_50_AC" "ARG_G2_50_AC" "ARG_G1_50_AB" "ARG_G2_50_AB"
[17] "NARR_G1_100_AAA" "NARR_G2_100_AAA" "NARR_G1_100_AAC" "NARR_G2_100_AAC" "NARR_G1_100_AC" "NARR_G2_100_AC" "NARR_G1_100_AB" "NARR_G2_100_AB"
[25] "ARG_G1_100_AAA" "ARG_G2_100_AAA" "ARG_G1_100_AAC" "ARG_G2_100_AAC" "ARG_G1_100_AC" "ARG_G2_100_AC" "ARG_G1_100_AB" "ARG_G2_100_AB"
[33] "NARR_G1_150_AAA" "NARR_G2_150_AAA" "NARR_G1_150_AAC" "NARR_G2_150_AAC" "NARR_G1_150_AC" "NARR_G2_150_AC" "NARR_G1_150_AB" "NARR_G2_150_AB"
[41] "ARG_G1_150_AAA" "ARG_G2_150_AAA" "ARG_G1_150_AAC" "ARG_G2_150_AAC" "ARG_G1_150_AC" "ARG_G2_150_AC" "ARG_G1_150_AB" "ARG_G2_150_AB"
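The answer above covers Set 1 only. As a minimal sketch, assuming the column names always have the four parts type_group_number_suffix, the same idea gives Set 2 by changing the priority of the sort keys. The names `grid`, `parts`, `set1` and `set2` are illustrative, and `v1` here is a hypothetical stand-in for names(merged_DF)[-1]; strsplit is used instead of read.table only to keep the part types explicit.

```r
# Hypothetical stand-in for names(merged_DF)[-1]: all 64 column names,
# deliberately not in the desired order
grid <- expand.grid(type = c("NARR", "ARG"), grp = c("G1", "G2"),
                    num = c(50, 100, 150, 200),
                    suf = c("AAA", "AAC", "AC", "AB"),
                    stringsAsFactors = FALSE)
v1 <- rev(paste(grid$type, grid$grp, grid$num, grid$suf, sep = "_"))

# Split each name into its four parts and give each part an explicit order
parts <- do.call(rbind, strsplit(v1, "_", fixed = TRUE))
type <- factor(parts[, 1], levels = c("NARR", "ARG"))
grp  <- factor(parts[, 2], levels = c("G1", "G2"))
num  <- as.integer(parts[, 3])
suf  <- factor(parts[, 4], levels = c("AAA", "AAC", "AC", "AB"))

# Set 1: number, then type, then suffix, with G1/G2 alternating fastest
set1 <- v1[order(num, type, suf, grp)]
# Set 2: number, then suffix, then group, with NARR/ARG alternating fastest
set2 <- v1[order(num, suf, grp, type)]
```

Either vector can then be used to reorder the merged data, for example merged_DF[c("ID", set2)].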
data
# the column names in random order; the code above sorts them
v1 <- c("NARR_G1_100_AB", "NARR_G1_150_AAC", "NARR_G2_50_AB", "NARR_G1_150_AB",
"NARR_G2_100_AAA", "NARR_G1_100_AAC", "ARG_G1_150_AC", "ARG_G2_50_AAA",
"ARG_G2_150_AAA", "ARG_G1_50_AAA", "ARG_G2_100_AC", "NARR_G1_150_AAA",
"NARR_G2_100_AC", "ARG_G1_50_AC", "NARR_G1_100_AAA", "ARG_G2_50_AB",
"NARR_G1_150_AC", "ARG_G2_50_AAC", "ARG_G2_150_AB", "NARR_G2_100_AAC",
"NARR_G2_150_AAA", "NARR_G1_100_AC", "ARG_G1_150_AB", "ARG_G1_50_AAC",
"NARR_G1_50_AC", "ARG_G2_150_AAC", "NARR_G1_50_AAA", "NARR_G2_150_AB",
"NARR_G2_150_AAC", "ARG_G1_150_AAA", "ARG_G2_50_AC", "NARR_G2_50_AC",
"ARG_G1_150_AAC", "ARG_G1_100_AC", "ARG_G1_100_AAA", "NARR_G1_50_AAC",
"NARR_G2_150_AC", "ARG_G1_100_AAC", "ARG_G2_100_AAA", "ARG_G2_100_AAC",
"NARR_G1_50_AB", "NARR_G2_100_AB", "ARG_G2_100_AB", "ARG_G1_50_AB",
"NARR_G2_50_AAA", "ARG_G1_100_AB", "ARG_G2_150_AC", "NARR_G2_50_AAC"
)
I have a dataframe that I have to sort in decreasing order of absolute row value without changing the actual values (some of which are negative).
To give you an example, e.g. for the 1st row, I would like to go from
-0.01189179 0.03687456 -0.12202753 to
-0.12202753 0.03687456 -0.01189179.
For the 2nd row from
-0.04220260 0.04129326 -0.07178175 to
-0.07178175 -0.04220260 0.04129326 etc.
How can I do this in R?
Many thanks!
Try this
lst <- lapply(df , \(x) order(-abs(x)))
ans <- data.frame(Map(\(x,y) x[y] , df ,lst))
output
a b
1 -0.12202753 -0.07178175
2 0.03687456 -0.04220260
3 -0.01189179 0.04129326
data
df <- structure(list(a = c(-0.12202753, 0.03687456, -0.01189179), b = c(-0.0422026,
0.04129326, -0.07178175)), row.names = c(NA, -3L), class = "data.frame")
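The code above sorts each column. If the values are laid out one observation per row, as in the question, a row-wise variant can be sketched with apply. This is only an illustration on a small matrix built from the two example rows; the names `m`, `sort_abs_desc` and `sorted` are made up for the example.

```r
# The two example rows from the question, one observation per row
m <- rbind(c(-0.01189179, 0.03687456, -0.12202753),
           c(-0.04220260, 0.04129326, -0.07178175))

# Reorder one row by decreasing absolute value, keeping the signs
sort_abs_desc <- function(r) r[order(abs(r), decreasing = TRUE)]

# apply() returns the per-row results as columns, so transpose back
sorted <- t(apply(m, 1, sort_abs_desc))
sorted
```

For a data frame, the same call works on its numeric columns, and the result can be wrapped in as.data.frame() if a data frame is needed afterwards.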
Here is a simple approach (using @Mohamed Desouky's data) that reverses the row order:
df <- df[nrow(df):1,]
> df
a b
3 -0.01189179 -0.07178175
2 0.03687456 0.04129326
1 -0.12202753 -0.04220260
I have data with coordinates, without missing values. I would like to define them as coordinates using sp, but for a subset of the data. When I use
subset_of_data <- data[data$variable == x, ]
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")
I get:
Error in `coordinates<-`(`*tmp*`, value = c("S_X", "S_Y")) :
coordinates are not allowed to contain missing values
But when I use subset, there is no problem:
subset_of_data <- subset(data, data$variable == x)
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")
I don't get the error.
Any idea why it is so?
It has nothing to do with sp; it is just how subsetting works in R. Let's take an example:
df <- data.frame(city = c("Paris", "Berlin", NA),
x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
y_coordinate = c(2.27702, 13.1445501, 4.3053501))
df
city x_coordinate y_coordinate
1 Paris 48.85895 2.27702
2 Berlin 52.50693 13.14455
3 <NA> 50.85506 4.30535
If we turn this dataframe into coordinates, it works, since the coordinate columns contain no NA:
coordinates(df) <- c("x_coordinate", "y_coordinate")
Now imagine that we want to convert only a subset of df to coordinates, e.g., only Paris. If we do:
sub_df = df[df$city == "Paris", ]
We get:
city x_coordinate y_coordinate
1 Paris 48.85895 2.27702
NA <NA> NA NA
In this case, converting to coordinates no longer works: df$city == "Paris" is NA for the row where city is NA, and indexing with [ by an NA logical returns a row filled with NA, including in the coordinate columns.
coordinates(sub_df) <- c("x_coordinate", "y_coordinate")
Error in `coordinates<-`(`*tmp*`, value = c("x_coordinate", "y_coordinate")) :
coordinates are not allowed to contain missing values
subset works differently: it treats NA in the condition as FALSE, so those rows are dropped:
sub_df_2 = subset(df, df$city == "Paris")
sub_df_2
coordinates city
1 (48.85895, 2.27702) Paris
Another option is to be more specific when using [:
sub_df_3 = df[df$city == "Paris" & !is.na(df$city), ]
sub_df_3
coordinates city
1 (48.85895, 2.27702) Paris
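The three behaviours can be compared side by side on a plain data frame, without sp. This is a small sketch; the object names `keep`, `by_bracket`, `by_subset` and `by_which` are made up for the illustration, and which() is shown as one more way to drop the NA positions.

```r
df <- data.frame(city = c("Paris", "Berlin", NA),
                 x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                 y_coordinate = c(2.27702, 13.1445501, 4.3053501))

# The comparison is NA wherever city is NA
keep <- df$city == "Paris"

# [ with an NA logical index returns an extra row filled with NA
by_bracket <- df[keep, ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(df, city == "Paris")

# which() keeps only the positions that are TRUE, dropping NA
by_which <- df[which(keep), ]

nrow(by_bracket)
nrow(by_subset)
nrow(by_which)
```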
For Python users
The pandas [ operator behaves differently: comparing NaN with == gives False, so the row is dropped instead of becoming an all-NaN row:
import pandas as pd
import numpy as np
df = pd.DataFrame({'city': ['Paris', 'Berlin', np.nan],
'x_coordinate': [48.8589507, 52.5069312, 50.8550625],
'y_coordinate': [2.27702, 13.1445501, 4.3053501]})
print(df[df["city"] == 'Paris'])
city x_coordinate y_coordinate
0 Paris 48.858951 2.27702
I want to replace the NA values in my xts object using the formula Beta * Exposure * Index return.
My xts object, Position_SimPnl, is created below:
library(xts)
df1 <- data.frame(Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
Apple = c(10, 20,30,40,50,60,70,80),
Audi = c(1,2,3,4,5,6,7,8),
BMW = c(NA, NA, NA, NA, NA, 6000,7000,8000),
AENA = c(50,51,52,53,54,55,56,57))
Position_SimPnl <- xts(df1, order.by = Sys.Date() - 1:8)
For Beta there is a specific dataframe:
Beta_table <- data.frame (AENA = c(0.3,0.5,0.6), Apple = c(0.2,0.5,0.8), Google = c(0.1,0.3,0.5), Audi = c(0.4,0.6,0.7), AXP = c(0.5,0.7, 0.9), BMW = c(0.3,0.4, 0.5))
rownames(Beta_table) <- c(".SPX", ".FTSE", ".STOXX")
For exposure there is another dataframe:
Base <- data.frame (RIC = c("AENA","BMW","Apple","Audi","Google"), Exposure = c(100,200,300,400,500))
For Index return there is a xts object (Index_FX_Returns):
df2 <- data.frame(.SPX = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
.FTSE = c(0.5, 0.4,0.3,0.2,0.3,0.4,0.3,0.4),
.STOXX = c(0.15,0.25,0.35,0.3,0.45,0.55,0.65,0.5))
Index_FX_Returns <- xts(df2,order.by = Sys.Date() - 1:8)
Also there is a dataframe which links RIC with Index:
RIC_Curr_Ind <- data.frame(RIC = c("AENA", "Apple", "Google", "Audi", "BMW"), Currency = c("EUR.","USD.","USD.","EUR.","EUR."), Index = c(".STOXX",".SPX",".SPX",".FTSE",".FTSE"))
For each NA in Position_SimPnl, I want to take its column name, look up the corresponding index name in the RIC_Curr_Ind dataframe, and then get the beta value from Beta_table by matching the column name (the column of the NA) and the row name (the index name derived from that column).
Then, by matching the column name from Position_SimPnl with the RIC column of the Base dataframe, I would extract the corresponding exposure value.
Finally, using the same index name from RIC_Curr_Ind, I would pick the matching column of the xts object Index_FX_Returns and take the return value for the same date as the NA.
After getting the Beta, Exposure and Index return values, I want the NA to be replaced by Beta * Exposure * Index return. Only the NA values in Position_SimPnl should be replaced; the other values should stay as they were. I used the following code to replace the NA values:
do.call(merge, lapply(Position_SimPnl, function(y) {if(is.na(y)){y = (Beta_table[match(RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)],rownames(Beta_table)), match(colnames(y),colnames(Beta_table))]) * (Base$Exposure[match(colnames(y), Base$RIC)]) * (Index_FX_Returns[,RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)]])} else{y}}))
However, in the output, if a column contains any NA, all the values in that column are replaced (including those that were not NA). I also get multiple warning messages like
"In if (is.na(y)) { ... :
the condition has length > 1 and only the first element will be used".
I think this is why all values of the column are transformed, including the non-NA ones. Can anyone suggest how to replace these NA values with the formula above while keeping the other values unchanged? Any help would be appreciated.
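The warning comes from if() itself: it expects a single TRUE or FALSE, so with a whole column only the first element decided the branch in older R versions, and from R 4.2.0 onwards a condition of length greater than one is an error. A minimal sketch of the element-wise alternatives, using made-up vectors `y` and `fill` rather than the xts data:

```r
# Made-up values: y has gaps, fill holds the replacement for each position
y    <- c(NA, 20, NA, 40)
fill <- c(1, 2, 3, 4)

# Element-wise choice: take fill where y is NA, otherwise keep y
filled <- ifelse(is.na(y), fill, y)

# Equivalent assignment that touches only the NA positions
y2 <- y
na_pos <- is.na(y2)
y2[na_pos] <- fill[na_pos]
```

Both forms leave the non-NA values untouched, which is the behaviour the question asks for.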
Because you need to combine all data sets to achieve your formula Beta * Exposure * Index, consider building a master data frame comprised of all needed components. However, you face two challenges:
different data types (xts objects and data frame)
different data formats (wide and long formats)
For proper merging and calculating, consider converting all data components into data frames and reshaping the wide ones to long format (i.e., all but Base and RIC_Curr_Ind). Then merge and calculate with ifelse to fill the NA values. At the end, reshape back to wide and convert back to xts.
Reshape
# USER-DEFINED METHOD GIVEN THE MULTIPLE CALLS
proc_transpose <- function(df, col_pick, val_col, time_col) {
reshape(df,
varying = names(df)[col_pick],
times = names(df)[col_pick], ids = NULL,
v.names = val_col, timevar = time_col,
new.row.names = 1:1E4, direction = "long")
}
# POSITIONS
Position_SimPnl_wide_df <- data.frame(date = index(Position_SimPnl),
coredata(Position_SimPnl))
Position_SimPnl_long_df <- proc_transpose(Position_SimPnl_wide_df, col_pick = -1,
val_col = "Position", time_col = "RIC")
# BETA
Beta_table_long_df <- proc_transpose(transform(Beta_table, Index = row.names(Beta_table)),
col_pick = 1:ncol(Beta_table),
val_col = "Beta", time_col = "RIC")
# INDEX
Index_FX_Returns_wide_df <- data.frame(date = index(Index_FX_Returns),
coredata(Index_FX_Returns))
Index_FX_Returns_long_df <- proc_transpose(Index_FX_Returns_wide_df, col_pick = -1,
val_col = "Index_value", time_col = "Index")
Merge
# CHAIN MERGE
master_df <- Reduce(function(...) merge(..., by="RIC"),
list(Position_SimPnl_long_df,
Beta_table_long_df,
Base)
)
# ADDITIONAL MERGES (NOT INCLUDED IN ABOVE CHAIN DUE TO DIFFERENT by)
master_df <- merge(master_df,
Index_FX_Returns_long_df, by=c("Index", "date"))
master_df <- merge(master_df,
RIC_Curr_Ind, by=c("Index", "RIC"))
Calculation
# FORMULA: Beta * Exposure * Index
master_df$Position <- with(master_df, ifelse(is.na(Position),
Beta * Exposure * Index_value,
Position))
Final Preparation
# RE-ORDER ROWS AND SUBSET COLS
master_df <- data.frame(with(master_df, master_df[order(RIC, date),
c("date", "RIC", "Position")]),
row.names = NULL)
# RESHAPE WIDE (REVERSE OF ABOVE)
Position_SimPnl_new <- setNames(reshape(master_df, idvar = "date",
v.names = "Position", timevar = "RIC",
direction = "wide"),
c("date", unique(master_df$RIC)))
# CONVERT TO XTS
Position_SimPnl_new <- xts(transform(Position_SimPnl_new, date = NULL),
order.by = Position_SimPnl_new$date)
Position_SimPnl_new
# AENA Apple Audi BMW Google
# 2019-11-27 58 80 8 8000 800.0
# 2019-11-28 57 70 7 7000 700.0
# 2019-11-29 56 60 6 6000 600.0
# 2019-11-30 55 50 5 24 500.0
# 2019-12-01 54 40 4 16 2.0
# 2019-12-02 53 30 3 24 1.5
# 2019-12-03 52 20 2 32 1.0
# 2019-12-04 51 10 1 40 0.5
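As a cross-check on the reshape-and-merge route, the same lookup can be sketched as a loop over columns on plain matrices. This is an illustration, not the answer's method: `pos`, `ret` and `filled` are made-up names, and the matrices stand in for coredata(Position_SimPnl) and coredata(Index_FX_Returns), assuming both share the same dates in the same row order.

```r
# Plain-matrix stand-ins for the two xts objects (same row order assumed)
pos <- as.matrix(data.frame(
  Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
  Apple  = c(10, 20, 30, 40, 50, 60, 70, 80),
  Audi   = c(1, 2, 3, 4, 5, 6, 7, 8),
  BMW    = c(NA, NA, NA, NA, NA, 6000, 7000, 8000),
  AENA   = c(50, 51, 52, 53, 54, 55, 56, 57)))
ret <- as.matrix(data.frame(
  .SPX   = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  .FTSE  = c(0.5, 0.4, 0.3, 0.2, 0.3, 0.4, 0.3, 0.4),
  .STOXX = c(0.15, 0.25, 0.35, 0.3, 0.45, 0.55, 0.65, 0.5)))

Beta_table <- data.frame(AENA = c(0.3, 0.5, 0.6), Apple = c(0.2, 0.5, 0.8),
                         Google = c(0.1, 0.3, 0.5), Audi = c(0.4, 0.6, 0.7),
                         AXP = c(0.5, 0.7, 0.9), BMW = c(0.3, 0.4, 0.5))
rownames(Beta_table) <- c(".SPX", ".FTSE", ".STOXX")
Base <- data.frame(RIC = c("AENA", "BMW", "Apple", "Audi", "Google"),
                   Exposure = c(100, 200, 300, 400, 500),
                   stringsAsFactors = FALSE)
RIC_Curr_Ind <- data.frame(RIC = c("AENA", "Apple", "Google", "Audi", "BMW"),
                           Index = c(".STOXX", ".SPX", ".SPX", ".FTSE", ".FTSE"),
                           stringsAsFactors = FALSE)

filled <- pos
for (ric in colnames(pos)) {
  # index linked to this RIC, then its beta and exposure
  index_name <- RIC_Curr_Ind$Index[match(ric, RIC_Curr_Ind$RIC)]
  beta       <- Beta_table[index_name, ric]
  exposure   <- Base$Exposure[match(ric, Base$RIC)]
  # overwrite only the NA rows, using the index return of the same row
  na_rows <- is.na(pos[, ric])
  filled[na_rows, ric] <- beta * exposure * ret[na_rows, index_name]
}
filled
```

On this data the filled Google and BMW values agree with the merged result above (0.5 to 2.0 for Google, 40, 32, 24, 16, 24 for BMW), while columns without NA are left unchanged.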
The following data frame contains a "Campaign" column whose values hold information about season, name, and position; however, the order of these pieces differs from row to row. Luckily, each piece comes from a fixed list, so we could create vectors to match the strings inside the "Campaign" column.
Date Campaign
1 Jan-15 Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15 Up|Peter|Spring
Here is what I want to do: create 3 columns, Name, Season and Position, each of which searches the string in the Campaign column and returns the matched value from the lists below.
Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
So my desired result would be the following:
Temp
Date Campaign Name Season Position
1 15-Jan Summer|Peter|Up Peter Summer Up
2 15-Feb David|Winter|Down David Winter Down
3 15-Mar Up|Peter|Spring Peter Spring Up
Another way:
L <- strsplit(df$Campaign,split = '\\|')
df$Name <- sapply(L,intersect,Name)
df$Season <- sapply(L,intersect,Season)
df$Position <- sapply(L,intersect,Position)
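One caveat with sapply and intersect is that a row with no match, or with two matches, makes the result a list rather than a character vector. A hedged variant, assuming at most one value per list is expected in each row, returns NA in those cases. The helper `match_one` and the `lookups` list are names invented for this sketch.

```r
df <- data.frame(Date = c("Jan-15", "Feb-15", "Mar-15"),
                 Campaign = c("Summer|Peter|Up", "David|Winter|Down",
                              "Up|Peter|Spring"),
                 stringsAsFactors = FALSE)

# One allowed-value vector per new column
lookups <- list(Name     = c("Peter", "David"),
                Season   = c("Summer", "Spring", "Autumn", "Winter"),
                Position = c("Up", "Down"))

# Split each campaign string once at the literal "|"
parts <- strsplit(df$Campaign, "|", fixed = TRUE)

# Return the single matching token, or NA if there is none or more than one
match_one <- function(tokens, allowed) {
  hit <- intersect(tokens, allowed)
  if (length(hit) == 1) hit else NA_character_
}

for (col in names(lookups)) {
  df[[col]] <- vapply(parts, match_one, character(1), allowed = lookups[[col]])
}
df
```

Using vapply with character(1) guarantees each new column is a plain character vector, whatever the input rows contain.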
Do the following:
Date = c("Jan-15","Feb-15","Mar-15")
Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
df = data.frame(Date,Campaign)
Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
for(k in Name){
df$Name[grepl(pattern = k, x = df$Campaign)] <- k
}
for(k in Season){
df$Season[grepl(pattern = k, x = df$Campaign)] <- k
}
for(k in Position){
df$Position[grepl(pattern = k, x = df$Campaign)] <- k
}
This gives:
> df
Date Campaign Name Season Position
1 Jan-15 Summer|Peter|Up Peter Summer Up
2 Feb-15 David|Winter|Down David Winter Down
3 Mar-15 Up|Peter|Spring Peter Spring Up
I had the same idea as Marat Talipov; here's a data.table option:
library(data.table)
Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))
Gives
> dat
Date Campaign
1: Jan-15 Summer|Peter|Up
2: Feb-15 David|Winter|Down
3: Mar-15 Up|Peter|Spring
Processing is then
dat[ , `:=`(Name = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
Season = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
]
Result:
> dat
Date Campaign Name Season Position
1: Jan-15 Summer|Peter|Up Peter Summer Up
2: Feb-15 David|Winter|Down David Winter Down
3: Mar-15 Up|Peter|Spring Peter Spring Up
Maybe there's some benefit if you're doing this to a lot of columns or need to modify in place (by reference).
I'm interested if anyone can show me how to update all three columns at once.
EDIT: Never mind, figured it out;
for (icol in c("Name", "Season", "Position"))
dat[, (icol):=sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, get(icol))]