I am new to R and would like some help. I have R code with a user-defined function called Plot_linear_fit, shown below, that runs without any errors.
ReportDateRange <- c("2010-11-01", "2017-01-31")
trendDateRange1 <- c("2015-01-01", "2015-08-31")
trendDateRange2 <- c("2015-01-01", "2016-10-31")
trendDateRange3 <- c("2015-01-01", "2016-06-30")
plotDate1 <- c("2011-01-01")
plotDate2 <- c("2015-01-01")
plotDate3 <- c("2013-01-01")
numoftrends <- 3
TRx.Plot1 <- Plot_linear_fit(Sum.TRx,
                             trendDateRange1,
                             plotDate1,
                             ReportDateRange)
TRx.Plot2 <- Plot_linear_fit(Sum.TRx,
                             trendDateRange2,
                             plotDate2,
                             ReportDateRange)
TRx.Plot3 <- Plot_linear_fit(Sum.TRx,
                             trendDateRange3,
                             plotDate3,
                             ReportDateRange)
I would like to put this code in a loop using lapply; however, when I try to run it, I get a "NA/NaN argument" error:
ReportDateRange <- c("2010-11-01", "2017-01-31")
trendDateRange1 <- c("2015-01-01", "2015-08-31")
trendDateRange2 <- c("2015-01-01", "2016-10-31")
trendDateRange3 <- c("2015-01-01", "2016-06-30")
plotDate1 <- c("2011-01-01")
plotDate2 <- c("2015-01-01")
plotDate3 <- c("2013-01-01")
numoftrends <- 3
lapply(1:numoftrends, function(j) {
  paste0('TRx.Plot', j) <- Plot_linear_fit(Sum.TRx,
                                           paste0("trendDateRange", j),
                                           paste0("plotDate", j),
                                           ReportDateRange)
})
I am not sure what goes wrong when this is put inside lapply. The output of the function Plot_linear_fit is a data frame. Thank you for your help.
The minimal fix would be the following (a guess, untested, because you didn't provide a minimal reproducible example). The problem is that paste0() only builds a character string containing a variable's name; you need get() to fetch the object with that name, and assign() to create a variable from a name built at run time:
lapply(1:numoftrends, function(j) {
  assign(paste0('TRx.Plot', j),
         Plot_linear_fit(Sum.TRx,
                         get(paste0("trendDateRange", j)),
                         get(paste0("plotDate", j)),
                         ReportDateRange),
         pos = .GlobalEnv)
})
But it would be more idiomatic to drop assignments in the global workspace altogether, and work directly with lists:
list_of_plots <- lapply(1:numoftrends, function(j) {
  Plot_linear_fit(Sum.TRx,
                  get(paste0("trendDateRange", j)),
                  get(paste0("plotDate", j)),
                  ReportDateRange)
})
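Going one step further, you could keep the varying inputs themselves in lists, which removes the need for get() entirely. A sketch, untested for the same reason (it assumes Plot_linear_fit and Sum.TRx exist in your session as shown above):
# collect the inputs that vary per plot in lists
trend_ranges <- list(trendDateRange1, trendDateRange2, trendDateRange3)
plot_dates   <- list(plotDate1, plotDate2, plotDate3)
# Map() walks over both lists in parallel
list_of_plots <- Map(function(trend, pdate) {
  Plot_linear_fit(Sum.TRx, trend, pdate, ReportDateRange)
}, trend_ranges, plot_dates)
# name the results so they can be used like the old variables
names(list_of_plots) <- paste0("TRx.Plot", seq_along(list_of_plots))
list_of_plots[["TRx.Plot2"]]  # the data frame for trendDateRange2 / plotDate2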
I have to automate this sequence of functions:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
  WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
  FDI_angola_i <- FDI_angola[FDI_angola$isic == i, ]
  dist_ao_i <- distm(WBES_angola_i, FDI_angola_i, fun = distGeo) / 1000
  rm(WBES_sf_angola_i, WBES_angola_i, FDI_angola_i)
}
As a result, I want a "dist_ao" for each i. The indexed values are to be found in the isic columns of the WBES_sf_angola and the FDI_angola datasets.
How can I embed the index in the various items' names?
EDIT:
I tried the following modification:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
  WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
  FDI_angola_i <- FDI_angola[FDI_angola$isic == i, ]
  result_list <- list()
  result_list[[paste0("dist_ao_", i)]] <- distm(WBES_angola_i, FDI_angola_i, fun = distGeo) / 1000
  rm(WBES_sf_angola_i, WBES_angola_i, FDI_angola_i)
}
but the output is just a list of length 1 that contains only dist_ao_62. How do I avoid overwriting it?
Untested (due to the missing minimal reproducible example), but it should work: move result_list <- list() outside the loop, otherwise it is reset to an empty list on every iteration:
result_list <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  result_list[[paste0("dist_ao_", i)]] <- distm(as_Spatial(subset(WBES_sf_angola, isic == i)),
                                                FDI_angola[FDI_angola$isic == i, ],
                                                fun = distGeo) / 1000
}
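Each distance matrix then sits in result_list under the name built with paste0(), so you can pull one out directly:
names(result_list)           # "dist_ao_15" "dist_ao_17" ... "dist_ao_62"
result_list[["dist_ao_15"]]  # the distance matrix (in km) for isic == 15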
You could also approach it this way. All resulting data frames are collected in lists, and the last line of the code below shows how to convert one of them into a data frame. NOTE: since your example is not reproducible, I have mostly reused the code from your question inside the loop.
WBES_sf_angola_result <- list() # renamed, since WBES_sf_angola is already the name of your dataset
WBES_angola_result <- list()
FDI_angola_result <- list()     # likewise renamed so the FDI_angola dataset is not overwritten
dist_ao <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  key <- paste0("i_", i)
  WBES_sf_angola_result[[key]] <- subset(WBES_sf_angola, isic == i)
  WBES_angola_result[[key]] <- as_Spatial(WBES_sf_angola_result[[key]])
  FDI_angola_result[[key]] <- FDI_angola[FDI_angola$isic == i, ]
  dist_ao[[key]] <- distm(WBES_angola_result[[key]], FDI_angola_result[[key]], fun = distGeo) / 1000
}
WBES_sf_angola_result <- do.call(rbind, WBES_sf_angola_result) # to get a data frame
Each subset can also be accessed through its list index (before the final do.call(rbind, ...) line), e.g.:
WBES_sf_angola_result[["i_15"]] # the first item, i.e. the subset for isic == 15
I have 1000 JSON files and I would like to read them in parallel. I have 4 CPU cores.
I have a character vector with the names of all the files, as follows:
cik_files <- list.files("./data/", pattern = ".json")
Using this vector, I load each file, extract the data, and add it to the following list:
data <- list()
Below is the code for extracting the data:
for(i in 1:1000){
  data1 <- fromJSON(paste0("./data/", cik_files[i]), flatten = TRUE)
  if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
    data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
    data1 <- data1[grep("CY20[0-9]{2}$", data1$frame), c(3, 9)]
    try({if(nrow(data1) > 0){
      data1$cik <- strtrim(cik_files[i], 13)
      data[[length(data) + 1]] <- data1
    }}, silent = TRUE)
  }
}
This, however, takes quite a lot of time, so I was wondering how I can run the code within the for loop in parallel.
Thanks in advance.
Here is an attempt to solve the problem in the question. Untested, since there is no data.
Step 1
First of all, rewrite the loop in the question as a function.
f <- function(i, path = "./data", cik_files){
  filename <- file.path(path, cik_files[i])
  data1 <- fromJSON(filename, flatten = TRUE)
  if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
    data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
    found <- grep("CY20[0-9]{2}$", data1$frame)
    if(length(found) > 0){
      tryCatch({
        out <- data1[found, c(3, 9)]
        out$cik <- strtrim(cik_files[i], 13)
        out
      },
      error = function(e) e,
      warning = function(w) w)
    } else NULL
  } else NULL
}
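It can be worth checking that the function works serially on a single file before parallelising, for example:
library(jsonlite)
# quick sanity check on the first file before going parallel
f(1, cik_files = cik_files)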
Step 2
Now load the package parallel and run one of the following, depending on OS.
library(parallel)
# Not on Windows: forked workers inherit loaded packages and objects
library(jsonlite)
json_list <- mclapply(seq_along(cik_files), f, cik_files = cik_files)
# Windows: set up a PSOCK cluster and load jsonlite on each worker
ncores <- detectCores()
cl <- makeCluster(ncores - 1L)
clusterExport(cl, "cik_files")  # not strictly needed here, cik_files is also passed as an argument
clusterEvalQ(cl, library(jsonlite))
json_list <- parLapply(cl, seq_along(cik_files), f, cik_files = cik_files)
stopCluster(cl)
Step 3
Extract the data from the returned list json_list.
err <- sapply(json_list, inherits, "error")
warn <- sapply(json_list, inherits, "warning")
ok <- !(err | warn)
json_list[ok] # correctly read in
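If you then want one combined data frame, as in your original loop, the clean elements can be row-bound; NULL entries (skipped files) are dropped by rbind automatically:
# combine all successfully extracted data frames into one
data_all <- do.call(rbind, json_list[ok])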
I tried to combine two lists into one dataframe:
all_stas <- list()
for(i in vid_id){
  stas <- get_stats(video_id = i)
  all_stas <- rbind(all_stas, stas)
}
View(all_stas)
all_detail <- list()
for(i in vid_id){
  detail1 <- get_video_details(video_id = i)
  all_detail <- rbind(all_detail, detail1)
}
View(all_detail)
df <- data.frame(all_stas, all_detail)
write.csv(df, file = "new_file.csv")
Afterwards I would like to store it in a CSV file.
When I run it, I get the following warning message:
Warning message:
In rbind(all_stas, stas) :
number of columns of result is not a multiple of vector length (arg 2)
Does anyone know how I can make the code work?
The block below is what triggers the warning:
all_stas <- list()
for(i in vid_id){
  stas <- get_stats(video_id = i)
  all_stas <- rbind(all_stas, stas)
}
If I understand your code correctly, you can get around that warning with:
all_stas <- list()
for(i in vid_id){
  all_stas[[i]] <- get_stats(video_id = i)
}
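With the same change applied to the all_detail loop, you can then collapse the lists and write the CSV, assuming get_stats() and get_video_details() each return a one-row data frame per video in the same order:
# collapse each list of per-video data frames into one data frame
all_stas_df   <- do.call(rbind, all_stas)
all_detail_df <- do.call(rbind, all_detail)
df <- data.frame(all_stas_df, all_detail_df)
write.csv(df, file = "new_file.csv")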
Suppose I have the following code snippet:
mainResult$Time <- formatTime(mainResult$Time, "DateAndTime")
mainResult$SettleDate <- formatTime(mainResult$SettleDate, "DateAndTime")
mainResult$IssueDate <- formatTime(mainResult$IssueDate, "DateAndTime")
mainResult$Maturity <- formatTime(mainResult$Maturity, "DateAndTime")
mainResult$Bid <- formatNumber(mainResult$Bid, "withDecimals")
mainResult$Ask <- formatNumber(mainResult$Ask, "withDecimals")
mainResult$AvgBid <- formatNumber(mainResult$AvgBid, "withDecimals")
mainResult$AvgAsk <- formatNumber(mainResult$AvgAsk, "withDecimals")
mainResult$BidYield <- formatNumber(mainResult$BidYield, "withDecimals")
mainResult$AskYield <- formatNumber(mainResult$AskYield, "withDecimals")
mainResult$BidSize <- formatNumber(mainResult$BidSize, "noDecimals")
mainResult$AskSize <- formatNumber(mainResult$AskSize, "noDecimals")
mainResult$Coupon <- formatNumber(mainResult$Coupon, "withDecimals")
Each formatTime and formatNumber call works only if the column exists. Is there a clean way to run this without wrapping every statement in an if block that checks whether the column exists?
This is a terrible way to code. Try to use something more like this:
# start with a vector of column names and loop
dt_columns = c("Time", "SettleDate", "IssueDate", "Maturity")
for (col in dt_columns) {
  if (col %in% names(mainResult)) mainResult[[col]] = formatTime(mainResult[[col]], "DateAndTime")
}
# you can repeat for your other cases
# you can repeat for your other cases
Or this:
# intersect and lapply
dt_columns = c("Time", "SettleDate", "IssueDate", "Maturity")
dt_columns = intersect(names(mainResult), dt_columns)
mainResult[dt_columns] = lapply(mainResult[dt_columns], formatTime, "DateAndTime")
I'd probably start by separating the information about the transformations you want to perform on the columns from the code that does the transformation. Something more like this:
numberWithDecimals <- c("Bid","Ask","AvgBid","AvgAsk", "BidYield", "AskYield", "Coupon")
numberNoDecimals <- c("BidSize", "AskSize")
timeDateAndTime <- c("Time", "SettleDate", "IssueDate", "Maturity")
fmtColumns <- function(data, txlist, fun, fmt) {
  cols <- intersect(txlist, names(data))
  if(length(cols) > 0) {
    data[cols] <- lapply(data[cols], fun, fmt)
  }
  data  # return the (possibly modified) data frame
}
mainResult <- fmtColumns(mainResult, numberWithDecimals, formatNumber, "withDecimals")
mainResult <- fmtColumns(mainResult, numberNoDecimals, formatNumber, "noDecimals")
mainResult <- fmtColumns(mainResult, timeDateAndTime , formatTime, "DateAndTime")
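A quick way to see that absent columns are handled gracefully: with a stub formatter (standing in for your real formatNumber, purely for illustration) and a toy data frame that lacks most of the listed columns, the helper simply skips whatever intersect() drops:
# stub formatter, for illustration only
formatNumberStub <- function(x, fmt) if (fmt == "noDecimals") round(x) else round(x, 2)
toy <- data.frame(Bid = c(1.234, 5.678), BidSize = c(10.4, 20.6))  # no Ask, AvgBid, ...
toy <- fmtColumns(toy, numberWithDecimals, formatNumberStub, "withDecimals")
toy <- fmtColumns(toy, numberNoDecimals, formatNumberStub, "noDecimals")
toy$Bid      # 1.23 5.68  (two decimals)
toy$BidSize  # 10 21      (no decimals)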
I am trying to extend the "matrix" class and, in turn, override the default behaviour of "[". Code examples below:
annMatrix <- function(mat=NULL, rowAnn=NULL, colAnn=NULL) {
  if(is.null(mat)) mat <- matrix(nrow=0, ncol=0)
  mat <- as.matrix(mat)
  if(is.null(rowAnn)) rowAnn <- data.frame(row.names=seq_len(nrow(mat)))
  if(is.null(colAnn)) colAnn <- data.frame(row.names=seq_len(ncol(mat)))
  rowAnn <- data.frame(rowAnn, stringsAsFactors=FALSE)
  colAnn <- data.frame(colAnn, stringsAsFactors=FALSE)
  stopifnot(nrow(mat)==nrow(rowAnn) & ncol(mat)==nrow(colAnn))
  attr(mat, "colAnn") <- colAnn
  attr(mat, "rowAnn") <- rowAnn
  class(mat) <- append(class(mat), "annMatrix")
  mat
}
`[.annMatrix` <- function(annMat, rowExpr=NULL, colExpr=NULL) {
  stopifnot(is.valid.annMatrix(annMat))
  rowExpr <- eval(substitute(list(rowExpr)), attr(annMat, "rowAnn"), parent.frame())
  colExpr <- eval(substitute(list(colExpr)), attr(annMat, "colAnn"), parent.frame())
  indsR <- unlist(rowExpr)
  indsC <- unlist(colExpr)
  if(is.null(indsR)) indsR <- seq_len(nrow(annMat))
  if(is.null(indsC)) indsC <- seq_len(ncol(annMat))
  attr(annMat, "rowAnn") <- attr(annMat, "rowAnn")[indsR,,drop=FALSE]
  attr(annMat, "colAnn") <- attr(annMat, "colAnn")[indsC,,drop=FALSE]
  annMat <- unclass(annMat)
  annMat <- annMat[indsR,indsC,drop=FALSE]
  class(annMat) <- append(class(annMat), "annMatrix")
  annMat
}
The basic idea is to make the matrix preserve its specific attributes after subsetting.
However, I am running into a problem:
how do I write the "[" function in such a way that it behaves differently when called with and without a comma:
annMat[i]
annMat[i,]
as the default "[" for matrices seems to do.
I was thinking of giving the second argument a default value, but that default is the same whether or not a comma is added.
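A common way to distinguish the two call forms (it is essentially what `[.data.frame` does) is to count the arguments of the call with nargs(): the empty slot after the comma still counts as an argument. A minimal sketch with a toy class, not the full annMatrix method:
`[.nargsDemo` <- function(x, i, j) {
  if (nargs() < 3) {
    # x[i]: no comma, so only two arguments reach the method
    unclass(x)[i]
  } else {
    # x[i, j] or x[i, ]: the (possibly empty) slot after the comma still counts
    if (missing(i)) i <- seq_len(nrow(unclass(x)))
    if (missing(j)) j <- seq_len(ncol(unclass(x)))
    unclass(x)[i, j, drop = FALSE]
  }
}
m <- structure(matrix(1:6, nrow = 2), class = "nargsDemo")
m[2]    # single-index form: returns 2
m[1, ]  # two-index form: returns the first row as a 1 x 3 matrix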