R: modify a column only if it exists

Suppose I have the following code snippet:
mainResult$Time <- formatTime(mainResult$Time, "DateAndTime")
mainResult$SettleDate <- formatTime(mainResult$SettleDate, "DateAndTime")
mainResult$IssueDate <- formatTime(mainResult$IssueDate, "DateAndTime")
mainResult$Maturity <- formatTime(mainResult$Maturity, "DateAndTime")
mainResult$Bid <- formatNumber(mainResult$Bid, "withDecimals")
mainResult$Ask <- formatNumber(mainResult$Ask, "withDecimals")
mainResult$AvgBid <- formatNumber(mainResult$AvgBid, "withDecimals")
mainResult$AvgAsk <- formatNumber(mainResult$AvgAsk, "withDecimals")
mainResult$BidYield <- formatNumber(mainResult$BidYield, "withDecimals")
mainResult$AskYield <- formatNumber(mainResult$AskYield, "withDecimals")
mainResult$BidSize <- formatNumber(mainResult$BidSize, "noDecimals")
mainResult$AskSize <- formatNumber(mainResult$AskSize, "noDecimals")
mainResult$Coupon <- formatNumber(mainResult$Coupon, "withDecimals")
formatTime and formatNumber only work if the column exists. Is there a clean way to run this without wrapping every statement in an if block that checks whether the column exists?

This is a terrible way to code. Try to use something more like this:
# start with a vector of column names and loop
dt_columns = c("Time", "SettleDate", "IssueDate", "Maturity")
for (col in dt_columns) {
if (col %in% names(mainResult)) mainResult[[col]] <- formatTime(mainResult[[col]], "DateAndTime")
}
# you can repeat for your other cases
Or this:
# intersect and lapply
dt_columns = c("Time", "SettleDate", "IssueDate", "Maturity")
dt_columns = intersect(names(mainResult), dt_columns)
mainResult[dt_columns] = lapply(mainResult[dt_columns], formatTime, "DateAndTime")

I'd probably start by separating the information about which transformation applies to which columns from the code that performs the transformation. Something like this:
numberWithDecimals <- c("Bid","Ask","AvgBid","AvgAsk", "BidYield", "AskYield", "Coupon")
numberNoDecimals <- c("BidSize", "AskSize")
timeDateAndTime <- c("Time", "SettleDate", "IssueDate", "Maturity")
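fmtColumns <- function(data, txlist, fun, fmt) {
cols <- intersect(txlist, names(data)) # keep only the columns that actually exist
if (length(cols) > 0) {
data[cols] <- lapply(data[cols], fun, fmt)
}
data # return the (possibly modified) data frame so the caller can reassign it
}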
mainResult <- fmtColumns(mainResult, numberWithDecimals, formatNumber, "withDecimals")
mainResult <- fmtColumns(mainResult, numberNoDecimals, formatNumber, "noDecimals")
mainResult <- fmtColumns(mainResult, timeDateAndTime , formatTime, "DateAndTime")
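Taking the same idea one step further, all of the column/formatter information can live in a single spec list that one loop walks through. The sketch below assumes that; the formatNumber and formatTime bodies are hypothetical stand-ins (the real ones were not posted), and the names spec and applySpec are illustrative only.

```r
# Hypothetical stand-ins for the asker's formatters, only so the sketch runs
formatNumber <- function(x, fmt) {
  digits <- if (fmt == "withDecimals") 2 else 0
  formatC(x, format = "f", digits = digits, big.mark = ",")
}
# Assumes character or POSIXct input; ignores fmt in this stand-in
formatTime <- function(x, fmt) format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%d %H:%M:%S")

# One table describing which columns get which formatter and format string
spec <- list(
  list(cols = c("Bid", "Ask", "AvgBid", "AvgAsk", "BidYield", "AskYield", "Coupon"),
       fun = formatNumber, fmt = "withDecimals"),
  list(cols = c("BidSize", "AskSize"), fun = formatNumber, fmt = "noDecimals"),
  list(cols = c("Time", "SettleDate", "IssueDate", "Maturity"),
       fun = formatTime, fmt = "DateAndTime")
)

applySpec <- function(data, spec) {
  for (s in spec) {
    cols <- intersect(s$cols, names(data))  # silently skip absent columns
    if (length(cols) > 0) data[cols] <- lapply(data[cols], s$fun, s$fmt)
  }
  data
}

# Toy data with most columns missing: only Bid and BidSize are formatted
mainResult <- data.frame(Bid = c(99.5, 100.25), BidSize = c(1500, 2000000))
mainResult <- applySpec(mainResult, spec)
```

Adding a new column or format then only means editing spec, not the code that applies it.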

Related

How to use a for loop with multiple results

I have to automate this sequence of functions:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
dist_ao_i <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
As a result, I want a "dist_ao" for each i. The indexed values are to be found in the isic columns of the WBES_sf_angola and the FDI_angola datasets.
How can I embed the index in the various items' names?
EDIT:
I tried the following modification:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
result_list <- list()
result_list[[paste0("dist_ao_", i)]] <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
The output is a list of length 1 that contains only dist_ao_62. How do I avoid overwriting the earlier results?
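Untested (no MRE was provided), but this should work. The key change is creating result_list once, before the loop, so each iteration adds an element instead of starting from an empty list: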
result_list <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
result_list[[paste0("dist_ao_", i)]] <- distm(as_Spatial(subset(WBES_sf_angola, isic == i)) , FDI_angola[FDI_angola$isic==i,], fun = distGeo)/1000
}
You could approach it this way. All intermediate results are stored in lists, and the last line of the code below combines the subsets into a single data frame. NOTE: since your example is not reproducible, I have mostly reused the code from your question inside the loop.
WBES_sf_angola_result <- list() # new name, so the input dataset WBES_sf_angola is not overwritten
WBES_angola_result <- list()
FDI_angola_result <- list() # likewise keeps the input FDI_angola intact
dist_ao <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
key <- paste0("i_", i)
WBES_sf_angola_result[[key]] <- subset(WBES_sf_angola, isic == i)
WBES_angola_result[[key]] <- as_Spatial(WBES_sf_angola_result[[key]])
FDI_angola_result[[key]] <- FDI_angola[FDI_angola$isic==i,]
dist_ao[[key]] <- distm(WBES_angola_result[[key]], FDI_angola_result[[key]], fun = distGeo)/1000
}
WBES_sf_angola_all <- do.call(rbind, WBES_sf_angola_result) # to get a single data frame
Each subset can also be accessed by its name in the list, e.g.:
WBES_sf_angola_result[["i_15"]] # the first item
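The same named list can also be built without pre-allocating anything, by running lapply over the isic codes and naming the result afterwards. This is a sketch on hypothetical toy data: wbes, fdi and pair_dist are made-up stand-ins, with pair_dist playing the role of distm(..., fun = distGeo)/1000 so the example runs without sf or geosphere.

```r
# Hypothetical toy data standing in for WBES_sf_angola / FDI_angola
wbes <- data.frame(isic = c(15, 15, 17), x = c(0, 3, 1), y = c(0, 4, 1))
fdi  <- data.frame(isic = c(15, 17, 17), x = c(0, 2, 1), y = c(0, 1, 1))

# Stand-in for distm(): plain Euclidean distance matrix (rows = a, cols = b)
pair_dist <- function(a, b) {
  sqrt(outer(a$x, b$x, "-")^2 + outer(a$y, b$y, "-")^2)
}

isics <- c(15, 17)
# One list element per isic code; nothing is overwritten between iterations
dist_ao <- lapply(isics, function(i) {
  pair_dist(wbes[wbes$isic == i, ], fdi[fdi$isic == i, ])
})
names(dist_ao) <- paste0("dist_ao_", isics)
```

With the real data, the body of the anonymous function would be the distm(as_Spatial(...), ...) call from the question.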

R: Rewriting the behaviour of "[" for matrices?

I am trying to extend the "matrix" class (by appending "annMatrix" to its class) and in turn override the default behaviour of "[". Code examples below:
annMatrix <- function(mat=NULL, rowAnn=NULL, colAnn=NULL) {
if(is.null(mat)) mat <- matrix(nrow=0, ncol=0)
mat <- as.matrix(mat)
if(is.null(rowAnn)) rowAnn <- data.frame(row.names=seq_len(nrow(mat)))
if(is.null(colAnn)) colAnn <- data.frame(row.names=seq_len(ncol(mat)))
rowAnn <- data.frame(rowAnn, stringsAsFactors=FALSE)
colAnn <- data.frame(colAnn, stringsAsFactors=FALSE)
stopifnot(nrow(mat)==nrow(rowAnn) & ncol(mat)==nrow(colAnn))
attr(mat, "colAnn") <- colAnn
attr(mat, "rowAnn") <- rowAnn
class(mat) <- append(class(mat), "annMatrix")
mat
}
`[.annMatrix` <- function(annMat, rowExpr=NULL, colExpr=NULL) {
stopifnot(is.valid.annMatrix(annMat))
rowExpr <- eval(substitute(list(rowExpr)), attr(annMat, "rowAnn"), parent.frame())
colExpr <- eval(substitute(list(colExpr)), attr(annMat, "colAnn"), parent.frame())
indsR <- unlist(rowExpr)
indsC <- unlist(colExpr)
if(is.null(indsR)) indsR <- seq_len(nrow(annMat))
if(is.null(indsC)) indsC <- seq_len(ncol(annMat))
attr(annMat, "rowAnn") <- attr(annMat, "rowAnn")[indsR,,drop=FALSE]
attr(annMat, "colAnn") <- attr(annMat, "colAnn")[indsC,,drop=FALSE]
annMat <- unclass(annMat)
annMat <- annMat[indsR,indsC,drop=FALSE]
class(annMat) <- append(class(annMat), "annMatrix")
annMat
}
The basic idea is to make the matrix keep its annotation attributes after subsetting.
However, I am running into a problem:
how do I write the "[" method so that it behaves differently when called with and without a comma,
annMat[i]
annMat[i,]
as the default "[" for matrices does?
I was thinking of giving the second argument a default value, but that default is used in both cases, so it cannot tell whether a comma was added.
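One common way to tell the two calls apart is nargs(), which counts empty arguments: inside a "[" method, x[i] gives 2 and x[i, ] gives 3 (base R's `[.data.frame` relies on the same behaviour). The sketch below assumes a simplified class with only rowAnn/colAnn attributes; the constructor name new_annMatrix is hypothetical, and drop defaults to FALSE here, unlike base "[".

```r
# Hypothetical simplified constructor: a matrix plus row/column annotation data frames
new_annMatrix <- function(mat, rowAnn, colAnn) {
  stopifnot(nrow(mat) == nrow(rowAnn), ncol(mat) == nrow(colAnn))
  attr(mat, "rowAnn") <- rowAnn
  attr(mat, "colAnn") <- colAnn
  class(mat) <- "annMatrix"
  mat
}

`[.annMatrix` <- function(x, i, j, ..., drop = FALSE) {
  rowAnn <- attr(x, "rowAnn")
  colAnn <- attr(x, "colAnn")
  m <- unclass(x)  # plain matrix, so the "[" calls below use the default method
  attr(m, "rowAnn") <- NULL
  attr(m, "colAnn") <- NULL
  # nargs() counts the empty argument in x[i, ]; an explicit drop = adds one more
  if (nargs() - !missing(drop) < 3L) {
    return(m[i])  # vector-style x[i]: annotations no longer apply
  }
  if (missing(i)) i <- seq_len(nrow(m))
  if (missing(j)) j <- seq_len(ncol(m))
  out <- m[i, j, drop = drop]
  attr(out, "rowAnn") <- rowAnn[i, , drop = FALSE]
  attr(out, "colAnn") <- colAnn[j, , drop = FALSE]
  class(out) <- "annMatrix"
  out
}

am <- new_annMatrix(matrix(1:6, nrow = 2),
                    rowAnn = data.frame(group = c("a", "b"), stringsAsFactors = FALSE),
                    colAnn = data.frame(kind = c("x", "y", "z"), stringsAsFactors = FALSE))
am[2]    # vector-style: returns the plain value 2
am[2, ]  # matrix-style: a 1 x 3 annMatrix with the matching annotations
```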

NA/NAN Argument Error in R loop

I am new to R and would like some help. The code below calls a user-defined function, Plot_linear_fit, and runs without any errors.
ReportDateRange <- c("2010-11-01", "2017-01-31")
trendDateRange1 <- c("2015-01-01", "2015-08-31")
trendDateRange2 <- c("2015-01-01", "2016-10-31")
trendDateRange3 <- c("2015-01-01", "2016-06-30")
plotDate1 <- c("2011-01-01")
plotDate2 <- c("2015-01-01")
plotDate3 <- c("2013-01-01")
numoftrends <- 3
TRx.Plot1 <- Plot_linear_fit(Sum.TRx,
trendDateRange1,
plotDate1,
ReportDateRange)
TRx.Plot2 <- Plot_linear_fit(Sum.TRx,
trendDateRange2,
plotDate2,
ReportDateRange)
TRx.Plot3 <- Plot_linear_fit(Sum.TRx,
trendDateRange3,
plotDate3,
ReportDateRange)
I would like to put this code in a loop using lapply, but when I run it I get an NA/NaN argument error:
ReportDateRange <- c("2010-11-01", "2017-01-31")
trendDateRange1 <- c("2015-01-01", "2015-08-31")
trendDateRange2 <- c("2015-01-01", "2016-10-31")
trendDateRange3 <- c("2015-01-01", "2016-06-30")
plotDate1 <- c("2011-01-01")
plotDate2 <- c("2015-01-01")
plotDate3 <- c("2013-01-01")
numoftrends <- 3
lapply(1:numoftrends, function(j) {
paste0('TRx.Plot', j) <- Plot_linear_fit(Sum.TRx,
paste0("trendDateRange", j),
paste0("plotDate", j),
ReportDateRange)
})
I am not sure what goes wrong when I put this in lapply. The output of Plot_linear_fit is a data frame. Thank you for your help.
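The problem is that paste0("trendDateRange", j) is only the character string "trendDateRange1" (and so on), not the object with that name, so Plot_linear_fit receives strings instead of your date vectors, which is the likely source of the NA/NaN error. A paste0() call also cannot be the target of <-. The minimal fix (untested, because you didn't provide an MRE) looks the objects up with get() and creates the results with assign():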
lapply(1:numoftrends, function(j) {
assign(paste0('TRx.Plot', j),
Plot_linear_fit(Sum.TRx,
get(paste0("trendDateRange", j)),
get(paste0("plotDate", j)),
ReportDateRange),
pos = .GlobalEnv)
})
But it would be more idiomatic to drop assignments in the global workspace altogether, and work directly with lists:
list_of_plots <- lapply(1:numoftrends, function(j) {
Plot_linear_fit(Sum.TRx,
get(paste0("trendDateRange", j)),
get(paste0("plotDate", j)),
ReportDateRange)
})
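Going one step further, the get() calls disappear if the parameters are stored in lists from the start instead of numbered variables, and Map() walks through them in parallel. This is a sketch under assumptions: trendDateRanges and plotDates are hypothetical names, and the Plot_linear_fit body below is a made-up stand-in (the real function was not posted) that just returns a small data frame.

```r
# Hypothetical stand-in for Plot_linear_fit, only so the sketch runs
Plot_linear_fit <- function(data, trendRange, plotDate, reportRange) {
  data.frame(trend_start = as.Date(trendRange[1]),
             trend_end = as.Date(trendRange[2]),
             plot_from = as.Date(plotDate))
}
Sum.TRx <- NULL  # placeholder for the real data

ReportDateRange <- c("2010-11-01", "2017-01-31")
# Parameters kept together in lists instead of trendDateRange1, 2, 3 ...
trendDateRanges <- list(c("2015-01-01", "2015-08-31"),
                        c("2015-01-01", "2016-10-31"),
                        c("2015-01-01", "2016-06-30"))
plotDates <- c("2011-01-01", "2015-01-01", "2013-01-01")

# Map() calls the function with the k-th element of each argument
list_of_plots <- Map(function(tr, pd) Plot_linear_fit(Sum.TRx, tr, pd, ReportDateRange),
                     trendDateRanges, plotDates)
names(list_of_plots) <- paste0("TRx.Plot", seq_along(list_of_plots))
```

Each result is then available as list_of_plots$TRx.Plot1 and so on.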

How to check if two data frames have the same column names?

I have two data frames like this:
quest1 <- c(5,5,5)
quest2 <- c(5,5,5)
quest3<- c("a","b","c")
quest4 <- c(7,7,7)
quest5 <- c(8,8,8)
myquest1 <- data.frame(quest1,quest2,quest3)
myquest2 <- data.frame(quest4,quest5)
How can I check whether they have the same column names, using ifelse or an if statement together with a warning or stop function?
Or is there another way? I would prefer the former.
I think what you need is a function along the following lines.
Using your example:
quest1 <- c(5,5,5)
quest2 <- c(5,5,5)
quest3<- c("a","b","c")
quest4 <- c(7,7,7)
quest5 <- c(8,8,8)
myquest1 <- data.frame(quest1,quest2,quest3)
myquest2 <- data.frame(quest4,quest5)
myquest3 <- data.frame(quest1,quest2,quest3)
my_func <- function(x,y) {
# setequal() ignores column order; use identical(names(x), names(y)) if order matters
if (setequal(names(x), names(y))) {
print('Names are identical')
} else {
print('Warning: Names are not the same')
}
}
> my_func(myquest1,myquest2)
[1] "Warning: Names are not the same"
> my_func(myquest1,myquest3)
[1] "Names are identical"
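Since the question asks for warning() or stop() rather than print(), here is a sketch of that variant. The function name check_same_names and its strict argument are hypothetical; the message lists the column names that appear in only one of the two data frames.

```r
check_same_names <- function(x, y, strict = FALSE) {
  same <- setequal(names(x), names(y))
  if (!same) {
    # names found in only one of the two data frames
    diff_names <- union(setdiff(names(x), names(y)), setdiff(names(y), names(x)))
    msg <- paste("Column names differ:", paste(diff_names, collapse = ", "))
    if (strict) stop(msg, call. = FALSE) else warning(msg, call. = FALSE)
  }
  invisible(same)  # TRUE/FALSE, usable in further if() checks
}

myquest1 <- data.frame(quest1 = c(5, 5, 5), quest2 = c(5, 5, 5), quest3 = c("a", "b", "c"))
myquest2 <- data.frame(quest4 = c(7, 7, 7), quest5 = c(8, 8, 8))
```

check_same_names(myquest1, myquest2) then raises a warning, and check_same_names(myquest1, myquest2, strict = TRUE) stops with an error instead.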

How to avoid writing the same line several times in R?

I'm writing a program in R and need to select rows based on a particular value of one of the variables. The program is as follows:
a1961 <- base[base[,5]==1961,]
a1962 <- base[base[,5]==1962,]
a1963 <- base[base[,5]==1963,]
a1964 <- base[base[,5]==1964,]
a1965 <- base[base[,5]==1965,]
a1966 <- base[base[,5]==1966,]
a1967 <- base[base[,5]==1967,]
a1968 <- base[base[,5]==1968,]
a1969 <- base[base[,5]==1969,]
a1970 <- base[base[,5]==1970,]
a1971 <- base[base[,5]==1971,]
a1972 <- base[base[,5]==1972,]
a1973 <- base[base[,5]==1973,]
a1974 <- base[base[,5]==1974,]
a1975 <- base[base[,5]==1975,]
a1976 <- base[base[,5]==1976,]
a1977 <- base[base[,5]==1977,]
a1978 <- base[base[,5]==1978,]
a1979 <- base[base[,5]==1979,]
a1980 <- base[base[,5]==1980,]
a1981 <- base[base[,5]==1981,]
a1982 <- base[base[,5]==1982,]
a1983 <- base[base[,5]==1983,]
a1984 <- base[base[,5]==1984,]
a1985 <- base[base[,5]==1985,]
a1986 <- base[base[,5]==1986,]
a1987 <- base[base[,5]==1987,]
a1988 <- base[base[,5]==1988,]
a1989 <- base[base[,5]==1989,]
...
a2012 <- base[base[,5]==2012,]
Is there a way (like modules in SAS) in which I can avoid writing the same thing over and over again?
In general, coding/implementation questions really belong on StackOverflow. That said, my recommendation is instead of naming individual variables for each result, just throw them all into a list:
a = lapply(1961:2012, function(x) base[base[,5]==x,])
names(a) = paste0("a", 1961:2012) # so a[["a1961"]] replaces the old a1961
You can also use the assign function.
years <- 1961:2012
for(i in 1:length(years)) {
assign(x = paste0("a", years[i]), value = base[base[,5]==years[i],])
}
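Base R's split() does the same grouping in one call, returning a named list with one data frame per distinct value of column 5. A sketch on a small hypothetical stand-in for base (the column names are made up; only column 5 matters). Note that split() only creates entries for years that actually occur in the data, whereas the lapply version also returns empty data frames for missing years.

```r
# Hypothetical stand-in for `base`; column 5 holds the year
base <- data.frame(id = 1:6, v2 = 0, v3 = 0, v4 = 0,
                   year = c(1961, 1961, 1962, 1963, 1963, 1963))

by_year <- split(base, base[, 5])              # names: "1961", "1962", "1963"
names(by_year) <- paste0("a", names(by_year))  # "a1961", "a1962", "a1963"
by_year$a1963                                  # the rows for 1963
```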
