Replacing the entries in a data frame in R

I have a data frame of dimension 100 by 54, where each row holds stock values at the end of a week and each column represents a stock. I want to replace each entry with the stock's return: divide the current value of a cell by the previous week's value in the same column, and store the result in place of the current value. Example: say I have this data frame with these values,
table 1
I want to manipulate my data frame to be:
table 2
So that it can eventually look like this:
table 3
I have written the code below, but it does not do the job. Can someone help?
Returns99 <- NULL
for(i in 2:100){
Returns99 <- rbind(Returns99, rep(NA, 54))
Returns <- rbind(Returns99, (df100[i, ]/df100[i-1,]))
}
Where df100 is the data frame with price entries.

You don't need a loop. With base R, divide every row except the first by the row above it and put a row of NA on top:
rbind(NA, df100[-1,] / df100[-nrow(df100),])
gives,
AGG DBC DFE
1 NA NA NA
2 1.0000000 1.0000000 1.0000000
3 1.0021019 0.9739496 0.9990862
4 0.9993008 1.0008628 0.9911585
Data:
df100 <- structure(list(AGG = c(99.91, 99.91, 100.12, 100.05),
                        DBC = c(23.8, 23.8, 23.18, 23.2),
                        DFE = c(65.66, 65.66, 65.6, 65.02)),
                   class = "data.frame", row.names = c(NA, -4L))
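If the goal is to overwrite the prices rather than build a new object, the same vectorised division can be assigned back into a copy of the frame. This is a minimal sketch on the four-row sample above; the name returns is only illustrative and not part of the original answer.

```r
# Sample prices from the answer above
df100 <- data.frame(
  AGG = c(99.91, 99.91, 100.12, 100.05),
  DBC = c(23.8, 23.8, 23.18, 23.2),
  DFE = c(65.66, 65.66, 65.6, 65.02)
)

n <- nrow(df100)
returns <- df100                             # same shape and column names
returns[-1, ] <- df100[-1, ] / df100[-n, ]   # row i divided by row i - 1
returns[1, ] <- NA                           # no previous week for the first row
returns
```

Assigning returns back to df100 afterwards would replace the prices in place, as the question asks.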

Related

Sort dataframe by row index value without changing values

I have a dataframe whose rows I need to sort in decreasing order of absolute value, without changing the values themselves (some of which are negative).
To give you an example, e.g. for the 1st row, I would like to go from
-0.01189179 0.03687456 -0.12202753 to
-0.12202753 0.03687456 -0.01189179.
For the 2nd row from
-0.04220260 0.04129326 -0.07178175 to
-0.07178175 -0.04220260 0.04129326 etc.
How can I do this in R?
Many thanks!

Try this (each column of df is reordered by decreasing absolute value)
lst <- lapply(df, \(x) order(-abs(x)))
ans <- data.frame(Map(\(x, y) x[y], df, lst))
output
a b
1 -0.12202753 -0.07178175
2 0.03687456 -0.04220260
3 -0.01189179 0.04129326
data
df <- structure(list(a = c(-0.12202753, 0.03687456, -0.01189179), b = c(-0.0422026,
0.04129326, -0.07178175)), row.names = c(NA, -3L), class = "data.frame")
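The question describes sorting within rows, while the code above works column by column. A row-wise variant could look like the sketch below. It assumes the two rows quoted in the question; the column names V1 to V3 are placeholders.

```r
# Rows taken from the question; column names V1..V3 are placeholders
df_rows <- data.frame(
  V1 = c(-0.01189179, -0.04220260),
  V2 = c(0.03687456, 0.04129326),
  V3 = c(-0.12202753, -0.07178175)
)

# apply() hands each row to the function and returns the results as columns,
# so t() turns them back into rows. unname() drops the original column labels,
# which no longer describe the reordered values.
sorted <- t(apply(as.matrix(df_rows), 1, function(x) unname(x[order(-abs(x))])))
sorted <- as.data.frame(sorted)
sorted
```

Each row of sorted holds the original values, signs included, ordered from largest to smallest absolute value.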

Here is a simple approach that reverses the row order (using @Mohamed Desouky's data)
df <- df[nrow(df):1,]
> df
a b
3 -0.01189179 -0.07178175
2 0.03687456 0.04129326
1 -0.12202753 -0.04220260

Continuous multiplication same column previous value

I have a problem.
I have the following data frame.
1 2
NA 100
1.00499 NA
1.00813 NA
0.99203 NA
There are two columns. Apart from the starting value, the second column contains only NAs. I want to fill the first NA of column 2 by multiplying the 1st value of column 2 by the 2nd value of column 1 (100 * 1.00499). The 3rd value of column 2 should be the product of the newly created 2nd value of column 2 and the 3rd value of column 1, and so on, so that in the end all NAs are replaced by values.
These two sources helped me understand how to refer to different rows, but in both cases a new column is created. I don't want that; I want to fill the already existing column 2.
Use a value from the previous row in an R data.table calculation
https://statisticsglobe.com/use-previous-row-of-data-table-in-r
Can anyone help me?
Thanks so much in advance.
Sample code
library(quantmod)
data.N225<-getSymbols("^N225",from="1965-01-01", to="2022-03-30", auto.assign=FALSE, src='yahoo')
data.N225[c(1:3, nrow(data.N225)),]
data.N225<- na.omit(data.N225)
N225 <- data.N225[,6]
N225$DiskreteRendite= Delt(N225$N225.Adjusted)
N225[c(1:3,nrow(N225)),]
options(digits=5)
N225.diskret <- N225[,3]
N225.diskret[c(1:3,nrow(N225.diskret)),]
N225$diskretplus1 <- N225$DiskreteRendite+1
N225[c(1:3,nrow(N225)),]
library(dplyr)
N225$normiert <-"Value"
N225$normiert[1,] <-100
N225[c(1:3,nrow(N225)),]
N225.new <- N225[,4:5]
N225.new[c(1:3,nrow(N225.new)),]
Here is the code to create the data frame in RStudio.
a <- c(NA, 1.0050,1.0081, 1.0095, 1.0016,0.9947)
b <- c(100, NA, NA, NA, NA, NA)
c<- data.frame(ONE = a, TWO=b)

You could use cumprod for the cumulative product
transform(
df,
TWO = cumprod(c(na.omit(TWO),na.omit(ONE)))
)
which yields
ONE TWO
1 NA 100.0000
2 1.0050 100.5000
3 1.0081 101.3140
4 1.0095 102.2765
5 1.0016 102.4402
6 0.9947 101.8972
data
> dput(df)
structure(list(ONE = c(NA, 1.005, 1.0081, 1.0095, 1.0016, 0.9947
), TWO = c(100, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
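The cumprod call above relies on TWO holding exactly one non-missing value and ONE being missing only in its first row. A variant that states the recursion more directly, TWO[k] = TWO[1] * ONE[2] * ... * ONE[k], treats a missing factor as 1. This is a sketch under that assumption; dat and factors are illustrative names.

```r
dat <- data.frame(
  ONE = c(NA, 1.0050, 1.0081, 1.0095, 1.0016, 0.9947),
  TWO = c(100, NA, NA, NA, NA, NA)
)

# Treat missing growth factors as 1 so they leave the running product unchanged
factors <- replace(dat$ONE, is.na(dat$ONE), 1)

# Start value times the running product of all factors up to each row
dat$TWO <- dat$TWO[1] * cumprod(factors)
dat
```

This fills the existing column TWO without creating a new one, and each row keeps its position even if ONE has a gap further down.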

What about (gasp) a for loop?
(I'll use dat instead of c for your dataframe to avoid confusion with function c()).
for (row in 2:nrow(dat)) {
if (!is.na(dat$TWO[row-1])) {
dat$TWO[row] <- dat$ONE[row] * dat$TWO[row-1]
}
}
This means:
For each row from the second to the end, if the TWO in the previous row is not a missing value, calculate the TWO in this row by multiplying ONE in the current row and TWO from the previous row.
Output:
#> ONE TWO
#> 1 NA 100.0000
#> 2 1.0050 100.5000
#> 3 1.0081 101.3140
#> 4 1.0095 102.2765
#> 5 1.0016 102.4402
#> 6 0.9947 101.8972
Created on 2022-04-28 by the reprex package (v2.0.1)
I'd love to read a dplyr solution!

Replacing only NA values in xts object column wise using specific formula

I want to replace the NA values in my xts object using the formula Beta * Exposure * Index return.
Suppose my xts object is Position_SimPnl, created below:
library(xts)
df1 <- data.frame(Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
Apple = c(10, 20,30,40,50,60,70,80),
Audi = c(1,2,3,4,5,6,7,8),
BMW = c(NA, NA, NA, NA, NA, 6000,7000,8000),
AENA = c(50,51,52,53,54,55,56,57))
Position_SimPnl <- xts(df1, order.by = Sys.Date() - 1:8)
For Beta there is a specific dataframe:
Beta_table <- data.frame (AENA = c(0.3,0.5,0.6), Apple = c(0.2,0.5,0.8), Google = c(0.1,0.3,0.5), Audi = c(0.4,0.6,0.7), AXP = c(0.5,0.7, 0.9), BMW = c(0.3,0.4, 0.5))
rownames(Beta_table) <- c(".SPX", ".FTSE", ".STOXX")
For exposure there is another dataframe:
Base <- data.frame (RIC = c("AENA","BMW","Apple","Audi","Google"), Exposure = c(100,200,300,400,500))
For Index return there is a xts object (Index_FX_Returns):
df2 <- data.frame(.SPX = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
.FTSE = c(0.5, 0.4,0.3,0.2,0.3,0.4,0.3,0.4),
.STOXX = c(0.15,0.25,0.35,0.3,0.45,0.55,0.65,0.5))
Index_FX_Returns <- xts(df2,order.by = Sys.Date() - 1:8)
Also there is a dataframe which links RIC with Index:
RIC_Curr_Ind <- data.frame(RIC = c("AENA", "Apple", "Google", "Audi", "BMW"), Currency = c("EUR.","USD.","USD.","EUR.","EUR."), Index = c(".STOXX",".SPX",".SPX",".FTSE",".FTSE"))
What I want is for a particular position of NA value in Position_SimPnl it should look into the column name and get the corresponding index name from RIC_Curr_Ind dataframe and then look for the beta value from Beta_table by matching column name (column name of NA) and row name (index name derived from column name of NA).
Then again by matching the column name from Position_SimPnl with the RIC column from 'Base' dataframe it would extract the corresponding exposure value.
Then by matching column name from Position_SimPnl with RIC column from RIC_Curr_Ind dataframe, it would get the corresponding index name and from that index name it would look into the column name for xts object Index_FX_Returns and get the corresponding return value for the same date as of the NA value.
After getting the Beta, Exposure and Index return values, I want the NA value to be replaced by the formula Beta * Exposure * Index return. Only the NA values in Position_SimPnl should be replaced; the other values should remain as they were. I used the following code to replace the NA values:
do.call(merge, lapply(Position_SimPnl, function(y) {if(is.na(y)){y = (Beta_table[match(RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)],rownames(Beta_table)), match(colnames(y),colnames(Beta_table))]) * (Base$Exposure[match(colnames(y), Base$RIC)]) * (Index_FX_Returns[,RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)]])} else{y}}))
However in the output, if a particular column contains NA it is replacing all the values in the column (including which were not NA previously). Also I am getting multiple warning messages like
"In if (is.na(y)) { ... :
the condition has length > 1 and only the first element will be used".
I think because of this all values of column are getting transformed including non-NA ones. Can anyone suggest how to effectively replace these NA values by the formula mentioned above, keeping the other values same. Any help would be appreciated

Because you need to combine all data sets to achieve your formula Beta * Exposure * Index, consider building a master data frame comprised of all needed components. However, you face two challenges:
different data types (xts objects and data frame)
different data formats (wide and long formats)
For proper merging and calculating, consider converting all data components into data frames and reshaping the wide ones to long format (i.e., all but Base and RIC_Curr_Ind). Then merge, and calculate with ifelse to fill the NA values. At the end, you will have to reshape back to wide and convert back to xts.
Reshape
# USER-DEFINED METHOD GIVEN THE MULTIPLE CALLS
proc_transpose <- function(df, col_pick, val_col, time_col) {
reshape(df,
varying = names(df)[col_pick],
times = names(df)[col_pick], ids = NULL,
v.names = val_col, timevar = time_col,
new.row.names = 1:1E4, direction = "long")
}
# POSITIONS
Position_SimPnl_wide_df <- data.frame(date = index(Position_SimPnl),
coredata(Position_SimPnl))
Position_SimPnl_long_df <- proc_transpose(Position_SimPnl_wide_df, col_pick = -1,
val_col = "Position", time_col = "RIC")
# BETA
Beta_table_long_df <- proc_transpose(transform(Beta_table, Index = row.names(Beta_table)),
col_pick = 1:ncol(Beta_table),
val_col = "Beta", time_col = "RIC")
# INDEX
Index_FX_Returns_wide_df <- data.frame(date = index(Index_FX_Returns),
coredata(Index_FX_Returns))
Index_FX_Returns_long_df <- proc_transpose(Index_FX_Returns_wide_df, col_pick = -1,
val_col = "Index_value", time_col = "Index")
Merge
# CHAIN MERGE
master_df <- Reduce(function(...) merge(..., by="RIC"),
list(Position_SimPnl_long_df,
Beta_table_long_df,
Base)
)
# ADDITIONAL MERGES (NOT INCLUDED IN ABOVE CHAIN DUE TO DIFFERENT by)
master_df <- merge(master_df,
Index_FX_Returns_long_df, by=c("Index", "date"))
master_df <- merge(master_df,
RIC_Curr_Ind, by=c("Index", "RIC"))
Calculation
# FORMULA: Beta * Exposure * Index
master_df$Position <- with(master_df, ifelse(is.na(Position),
Beta * Exposure * Index_value,
Position))
Final Preparation
# RE-ORDER ROWS AND SUBSET COLS
master_df <- data.frame(with(master_df, master_df[order(RIC, date),
c("date", "RIC", "Position")]),
row.names = NULL)
# RESHAPE WIDE (REVERSE OF ABOVE)
Position_SimPnl_new <- setNames(reshape(master_df, idvar = "date",
v.names = "Position", timevar = "RIC",
direction = "wide"),
c("date", unique(master_df$RIC)))
# CONVERT TO XTS
Position_SimPnl_new <- xts(transform(Position_SimPnl_new, date = NULL),
order.by = Position_SimPnl_new$date)
Position_SimPnl_new
# AENA Apple Audi BMW Google
# 2019-11-27 58 80 8 8000 800.0
# 2019-11-28 57 70 7 7000 700.0
# 2019-11-29 56 60 6 6000 600.0
# 2019-11-30 55 50 5 24 500.0
# 2019-12-01 54 40 4 16 2.0
# 2019-12-02 53 30 3 24 1.5
# 2019-12-03 52 20 2 32 1.0
# 2019-12-04 51 10 1 40 0.5
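For comparison, the same lookup can be sketched without reshaping, by filling one column at a time with a logical NA mask. The sketch below is not the method above. It uses plain data frames as stand-ins for the xts objects so it runs without extra packages, it keeps only three of the instruments, and fill_na_positions and the lower-case table names are hypothetical.

```r
# Plain data frames stand in for the xts objects (an assumption);
# rows are assumed to be aligned by date across pos and idx_ret.
pos <- data.frame(
  Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
  BMW    = c(NA, NA, NA, NA, NA, 6000, 7000, 8000),
  AENA   = c(50, 51, 52, 53, 54, 55, 56, 57)
)
idx_ret <- data.frame(
  .SPX   = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  .FTSE  = c(0.5, 0.4, 0.3, 0.2, 0.3, 0.4, 0.3, 0.4),
  .STOXX = c(0.15, 0.25, 0.35, 0.3, 0.45, 0.55, 0.65, 0.5)
)
beta_tbl <- data.frame(
  AENA = c(0.3, 0.5, 0.6), Google = c(0.1, 0.3, 0.5), BMW = c(0.3, 0.4, 0.5),
  row.names = c(".SPX", ".FTSE", ".STOXX")
)
exposure_tbl <- data.frame(RIC = c("AENA", "BMW", "Google"),
                           Exposure = c(100, 200, 500))
ric_index <- data.frame(RIC = c("AENA", "Google", "BMW"),
                        Index = c(".STOXX", ".SPX", ".FTSE"))

# Hypothetical helper: replace only the NA cells of each column
fill_na_positions <- function(pos, beta_tbl, exposure_tbl, ric_index, idx_ret) {
  for (ric in colnames(pos)) {
    index_name <- as.character(ric_index$Index[match(ric, ric_index$RIC)])
    beta       <- beta_tbl[index_name, ric]
    exposure   <- exposure_tbl$Exposure[match(ric, exposure_tbl$RIC)]
    missing    <- is.na(pos[[ric]])   # one TRUE/FALSE per row, not a single value
    if (any(missing)) {
      pos[missing, ric] <- beta * exposure * idx_ret[missing, index_name]
    }
  }
  pos
}

filled <- fill_na_positions(pos, beta_tbl, exposure_tbl, ric_index, idx_ret)
filled
```

The mask is what the original if (is.na(y)) lacked: if() looks at a single condition, whereas indexing with the full logical vector touches only the missing rows.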

Creating a new unique dataset from dates and categories in R

I have a dataframe that has OrderDate and MajorCategory as the two variables. OrderDates range from 2005-01-01 to 2007-12-31, and MajorCategory runs from 1 to 73 with around 35.5 million entries. Each OrderDate references a specific order, which has an ID number and also is attributed to a specific MajorCategory. I am trying to create a dataframe to show each unique OrderDate and the count of each MajorCategory that was ordered on that date.
The dataset currently looks something like:
OrderDate MajorCategory
2005-12-12 66
2005-12-12 66
2006-03-28 43
2006-05-16 66
I have separated the unique OrderDate (after changing the class to Date) into its own dataframe by using:
OD <- as.data.frame(unique(DMEFLines3Dataset2$OrderDate))
OD <- as.data.frame(sort(OD$`unique(DMEFLines3Dataset2$OrderDate)`))
I'm not sure how to get the MajorCategory to show me a count for each date. So the desired output would be something like:
OD MC_1 MC_2
2005-01-01 4 6
2005-01-02 7 45
2005-01-03 3 23
where OD is the Order Date and MC_X is the MajorCategory's order count per date (MC_1 to MC_73).
I tried using for loops, frequency, and count, but I can't seem to figure it out.

I am not an R expert, and if given the option I would try to aggregate the data as needed in a different language and then load the aggregated data into an R data frame for any further analysis.
I have done something close to what you are asking by calculating ROC graphs from the output of a third party naive bayes model which consisted of appointment detail grouped by departments. Tweaking my code a bit, I was able to get a dataframe with counts of an identifier grouped by date, which seems to be structured the way you are asking for.
library(RODBC)
dbConnection <- 'Driver={SQL Server};Server=SERVERNAME;Database=DBName;Trusted_Connection=yes'
channel <- odbcDriverConnect(dbConnection)
InputDataSet <- sqlQuery(channel, "
SELECT OrderID, OrderDate, MajorCategory from [dbo].[myDataSet];"
)
# Start with an empty result frame
results <- data.frame(date = character(), ordCount = integer())
for (dt in as.character(unique(InputDataSet$OrderDate))) {
  # Rows ordered on this date
  filteredSet <- InputDataSet[as.character(InputDataSet$OrderDate) == dt, ]
  # Number of distinct categories ordered on this date
  ordCount <- length(unique(filteredSet$MajorCategory))
  results <- rbind(data.frame(date = dt, ordCount = ordCount), results)
}
results

library(tidyverse)
df1 <- df %>%
group_by(OrderDate, MajorCategory) %>%
tally() %>%
mutate(MajorCategory = paste("MC", MajorCategory, sep="_")) %>%
spread(MajorCategory, n)
df1
Output is:
OrderDate MC_43 MC_66 MC_67
1 2005-12-12 NA 2 1
2 2006-03-28 1 NA NA
3 2006-05-16 NA 1 NA
Sample data:
df <- structure(list(OrderDate = c("2005-12-12", "2005-12-12", "2005-12-12",
"2006-03-28", "2006-05-16"), MajorCategory = c(66L, 66L, 67L,
43L, 66L)), .Names = c("OrderDate", "MajorCategory"), class = "data.frame", row.names = c(NA,
-5L))

OrderDate<- as.Date(c('2005-12-12','2005-12-12','2006-03-28','2006-05-16','2005-03-04','2005-12-12'))
MajorCategory<- as.numeric(c(66, 66, 43, 66, 43, 1))
OD=data.frame(OrderDate,MajorCategory)
out <- split(OD, OD$MajorCategory)
count=lapply(out, function(x) aggregate(x$MajorCategory, FUN = length, by = list(x$OrderDate)))
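A base R cross-tabulation gets close to the wide layout requested in the question (one row per date, one MC_ column per category). The sketch below uses a five-row sample in the shape of the question's data; orders and counts are illustrative names, and dates with no orders for a category show 0 rather than NA.

```r
orders <- data.frame(
  OrderDate = as.Date(c("2005-12-12", "2005-12-12", "2005-12-12",
                        "2006-03-28", "2006-05-16")),
  MajorCategory = c(66L, 66L, 67L, 43L, 66L)
)

# Count every (date, category) pair; rows are dates, columns are categories
tab <- table(orders$OrderDate, orders$MajorCategory)

# Turn the contingency table into a regular wide data frame
counts <- as.data.frame.matrix(tab)
names(counts) <- paste0("MC_", names(counts))
counts <- data.frame(OD = as.Date(rownames(counts)), counts, row.names = NULL)
counts
```

How this behaves on tens of millions of rows has not been measured here, so treat it as a starting point rather than a tuned solution.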

R - Check if element from vector exist in data.frame, and if not, add dummy values

Having a vector of campaigns:
campaignsTypes <- c("Social Media","Distribution","Nurture","Newsletter","Push")
and a data.frame with information about them:
out <- structure(list(Type = c("Distribution", "Newsletter", "Nurture",
"Social Media"), Pageviews = c(42, 880, 17, 84)), .Names = c("Type",
"Pageviews"), row.names = c(NA, -4L), class = "data.frame")
I want to check whether all elements of the vector campaignsTypes are included in the data.frame out and, if not, create a new row with dummy values for each missing campaign. So far, I can check whether a campaign type is not present. However, I'm having trouble assigning the missing element from the vector as the value for the first column of a manually inserted new row:
> ifelse(campaignsTypes %in% out$Type == FALSE,rbind(out, c(????,0)),"")
How to put the value of the missing campaign here?----------⤴

You can create a new data frame with the missing rows, and then stack the
two data frames.
rbind(out, data.frame(Type=setdiff(campaignsTypes, out$Type),
Pageviews=0L))
Result:
Type Pageviews
1 Distribution 42
2 Newsletter 880
3 Nurture 17
4 Social Media 84
5 Push 0

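One way to do it (this works when exactly one campaign type is missing),
output <- rbind(out, campaignsTypes[sapply(campaignsTypes, function(i) !(i %in% out$Type))])
# rbind() recycles the missing name into both columns, so reset Pageviews to 0
output$Pageviews[output$Pageviews == output$Type] <- 0
# Pageviews became character during rbind(); convert it back to numeric
output$Pageviews <- as.numeric(output$Pageviews)
output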
# Type Pageviews
#1 Distribution 42
#2 Newsletter 880
#3 Nurture 17
#4 Social Media 84
#5 Push 0
