How to treat i as a variable for a loop - r

I have a bunch of values I want to read in to an if statement in R. Namely:
year_1 , year_2
And so on.
I would like to use a for loop or a vectorisation method to test each one but I am not familiar with this in R as opposed to C++.
So I'd like to achieve something like:
for(i in 1:15) {
if(year_[i] !=NULL) {
count = count + 1
}
}
Not sure whether I am not searching for the right thing or whether R just doesn't do this sort of thing easily. I have used paste and a for loop successfully in the past to automatically name new variables but this I haven't got a hold of.
Update
Ok your answer seems to be on the right track. I should have been a bit more specific and say that I am reading the data from an excel file of parameters and using the data to produce plots. Different data sets will have different years active. The core of this problem is telling R how many years are active, starting from param$year_1 to param$year_15. So I am trying to actually read param$year_1 and so on for example, and basically check whether it is empty or not which will allow me to know how many years this particular data set is working with. When I tried mget(paste0("param$year_", 1:5))
it said the value was not found.
Update 2
I am sure the difficulty with this comes down to my description. Here is exactly what I want to produce, but automated in a few lines, as I know I will want to do similar operations in the future. What the actual data is does not matter. This non-automated version produces exactly what I want.
Non Automated Code
if(is.na(param$year_1[1]) == TRUE || param$year_1[1] == '') {
print("empty")
}
if(is.na(param$year_2[1]) == TRUE || param$year_2[1] == '') {
print("empty")
}
if(is.na(param$year_3[1]) == TRUE || param$year_3[1] == '') {
print("empty")
}
if(is.na(param$year_4[1]) == TRUE || param$year_4[1] == '') {
print("empty")
}
so on and so on until the final
if(is.na(param$year_15[1]) == TRUE || param$year_15[1] == '') {
print("empty")
}
It is such a simple thing to do in C++ but I have to learn it in R for the future.

It sounds like you have a list-like structure that contains the names year_1, year_2, ..., year_15, and you want to find which of these are NULL or have a missing or empty first element. You could build the names with paste0, use sapply to test each element, and use which to list the empty ones (wrap the sapply call in sum instead of which if you only need the count):
which(sapply(paste0("year_", 1:15), function(x) {
  v <- param[[x]]
  # NULL, NA, or an empty string in the first position all count as "empty"
  is.null(v) || is.na(v[1]) || v[1] == ''
}))
# year_2 year_5 year_9 year_10 year_11 year_12 year_14 year_15
# 2 5 9 10 11 12 14 15
Data:
param <- list(ID = 1:10, year_1 = 1:5, year_2 = NULL, year_3 = 1:7, year_4 = 1:2, year_5 = NULL, year_6 = 14, year_7 = 1:3, year_8 = 1:9, year_9 = NULL, year_10 = NULL, year_11 = NULL, year_12 = NULL, year_13 = 1:7, year_14 = NULL, year_15 = NULL)
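As a hedged sketch of the same idea applied to the check in Update 2, the example below treats NULL, NA, and the empty string as "empty" and counts the active years. The `param` values are made up for illustration, and `is_empty` is a hypothetical helper name. Note that `mget(paste0("param$year_", 1:5))` fails because `mget` looks for variables literally named `param$year_1`, whereas `[[` looks up an element of `param` by name.

```r
# Made-up parameters: year_2 is NA, year_3 is an empty string, year_5 is absent
param <- list(year_1 = c("2001", "x"), year_2 = NA, year_3 = "", year_4 = "2004")

# Hypothetical helper: TRUE when the element is missing or its first value is NA/empty
is_empty <- function(v) is.null(v) || is.na(v[1]) || v[1] == ""

year_names <- paste0("year_", 1:5)
empty <- vapply(year_names, function(nm) is_empty(param[[nm]]), logical(1))

n_active <- sum(!empty)  # number of years that actually hold data
print(empty)
print(n_active)
```

`vapply` is used instead of `sapply` here so that the result is guaranteed to be a logical vector even if the list of names is empty.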

Related

Simple deep pagination example using SOLR and R

I need to perform deep pagination using R and the solr package (SOLR 7.2.1 server, R 3.4.3).
I can't figure out how to get the nextCursorMark from the resultant dataframe. I usually do this in Python but this is stumping me.
res <- solr_all(base = myBase, rows = 100, verbose=TRUE,
sort = "unique_id asc",
fq="*:*",
cursorMark="*"
)
I cannot get the nextCursorMark from the result. Any help would be appreciated.
I have noticed that if I add the nextCursorMark to pageDoc it will return the value if parsetype is set to json, but not dataframe. So I guess another part is - where is that value if you return a dataframe?
I finally found a way to make this work. It is not optimal (the final solution is in the GitHub issue referenced in the comment), but it works:
dat <-"http://yadda.com"
cM = "*"
done = FALSE
rowCount = 0
a <- data.frame()
while (!done)
{
Data <- solr_search(base = dat, rows = 100, verbose=FALSE,
sort = "unique_id asc",
fq="*:*",
parsetype="json",
cursorMark=cM,
pageDoc = "nextCursorMark"
)
if (cM == Data$nextCursorMark) {
done = TRUE
} else {
cM = Data$nextCursorMark
}
a <- append(x = a, Data$response$docs)
rowCount = rowCount + length(Data$response$docs)
print(rowCount)
}
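The control flow of that loop (keep requesting pages until the returned cursor equals the one you sent) can be sketched without a Solr server. In the example below, `fetch_page` is a hypothetical stand-in for the `solr_search` call and fakes five documents; only the loop structure is the point. Collecting pages in a list and binding them once at the end also avoids growing a data frame on every iteration.

```r
# Hypothetical stand-in for a Solr request: returns one page of docs and the next cursor mark
fetch_page <- function(cursor, page_size = 2) {
  ids <- 1:5
  start <- if (cursor == "*") 1L else as.integer(cursor)
  idx <- ids[ids >= start][seq_len(page_size)]
  idx <- idx[!is.na(idx)]
  # When no documents remain, echo the cursor back unchanged (the end-of-results signal)
  nxt <- if (length(idx) == 0) cursor else as.character(max(idx) + 1L)
  list(docs = data.frame(unique_id = idx), nextCursorMark = nxt)
}

cursor <- "*"
pages <- list()
repeat {
  page <- fetch_page(cursor)
  pages[[length(pages) + 1]] <- page$docs
  if (identical(page$nextCursorMark, cursor)) break  # cursor did not advance: done
  cursor <- page$nextCursorMark
}
all_docs <- do.call(rbind, pages)
print(nrow(all_docs))
```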

Format number into K(thousand), M(million) in Shiny DataTables

I'm looking for a straightforward way to format numbers as K and M in Shiny DataTables, preferably with something like formatCurrency. I don't want to write K/M functions that convert the numbers into strings, because that makes it difficult to sort rows by value.
There's no built-in way to do this, but it's not too bad to write your own format function in JavaScript that doesn't break row sorting.
See Column Rendering in the DT docs for how to do this: https://rstudio.github.io/DT/options.html
And this will also help:
https://datatables.net/reference/option/columns.render
Here's an example of a custom thousands formatter that rounds to 1 decimal place:
library(DT)
formatThousands <- JS(
"function(data) {",
"return (data / 1000).toFixed(1) + 'K'",
"}")
datatable(datasets::rock, rownames = FALSE, options = list(
columnDefs = list(list(
targets = 0:1, render = formatThousands
))
))
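If you need both K and M, one possible extension of the formatter above is to branch on magnitude and to format only when DataTables asks for the display value, so that sorting and filtering still see the raw number. This is a sketch rather than something taken from the DT docs: `formatKM` is a made-up name, and it relies on the second `type` argument that DataTables passes to `columns.render`.

```r
library(DT)

# Hypothetical formatter: abbreviates to K or M for display only
formatKM <- JS(
  "function(data, type) {",
  "  if (type !== 'display' || typeof data !== 'number') return data;",
  "  if (Math.abs(data) >= 1e6) return (data / 1e6).toFixed(1) + 'M';",
  "  if (Math.abs(data) >= 1e3) return (data / 1e3).toFixed(1) + 'K';",
  "  return data;",
  "}")

tbl <- datatable(datasets::rock, rownames = FALSE, options = list(
  columnDefs = list(list(targets = 0:1, render = formatKM))
))
```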
Alternatively, if you want a non-JavaScript method, you could use the colFormat function from the reactable package. Unfortunately, there is no automatic millions option, but it's easy to replicate: divide the original data by one million and add the labels with colFormat.
library(reactable)
Product <- c('Apples', 'Oranges', 'Pears')
Revenue <- c(212384903, 23438872, 26443879)
df <- data.frame(Product, Revenue)
# Keep a numeric column in millions so that sorting stays numeric
df$Revenue_millions <- df$Revenue / 1000000
reactable(df,
  showSortable = TRUE,
  columns = list(
    Revenue_millions = colDef(format = colFormat(prefix = "£", separators = TRUE, digits = 1, suffix = "m"))))
The data should now sort correctly.
If you are using DataTables directly and want the data shown in unit format, e.g.
10000 -> 10K
you can use a render function:
"render": function (data) {
  // Check the larger unit first so that exactly 1000000 is shown as 1M
  if (data >= 1000000) {
    return data / 1000000 + 'M';
  } else if (data >= 1000) {
    return data / 1000 + 'K';
  } else {
    return data;
  }
}

Combining a for loop with meta characters

I'm writing a for loop that checks whether values in a particular column match a predefined list.
So it's kind of like this:
money <- read.csv2("money.csv", header = T)
#set counter
count_financial = 1
#set list
financial_items <- c("bank", "ABN Amro")
for (i in 1:nrow(money)) {
if(money$Description[i] in financial_items ) {
count_financial = count_financial + 1
}
}
It's working for now, but I actually want to tweak it a little and use metacharacters, so that I can find not only items that say "Bank" or "ABN Amro" but also lines that say "bank cost" or "ABN Amro transaction".
Any thoughts on how I can do this?
Try:
sum(money$Description %in% financial_items)
But if you are intent on using the loop:
for (i in 1:nrow(money)) {
if(money$Description[i] %in% financial_items ) {
count_financial = count_financial + 1
}
}
Update:
Upon rereading the question, I think you are looking for:
length(grep("bank|abn amro", money$Description, ignore.case=TRUE))
You could try
filtered <- money[grep("bank|ABN Amro",money$Description,ignore.case=TRUE),]
count_financial <- nrow(filtered)
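To make the pattern-matching approach concrete, here is a small self-contained sketch that builds the regular expression from `financial_items` instead of typing it by hand. The `money` data frame is made up for illustration. If an item could contain regex metacharacters (such as `.` or `(`), it would need escaping first, which this sketch does not do.

```r
# Made-up stand-in for the contents of money.csv
money <- data.frame(
  Description = c("Bank cost", "ABN Amro transaction", "Groceries", "bank", NA),
  stringsAsFactors = FALSE
)
financial_items <- c("bank", "ABN Amro")

# "bank|ABN Amro": matches a description containing any of the items
pattern <- paste(financial_items, collapse = "|")

# grepl returns one TRUE/FALSE per row (FALSE for NA), so sum() counts the matches
is_financial <- grepl(pattern, money$Description, ignore.case = TRUE)
count_financial <- sum(is_financial)
print(count_financial)
```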

R -- screening Excel rows according to characteristics of multiple cells

I am trying to eliminate all rows in Excel that have the following features:
First column is an integer
Second column begins with an integer
Third column is empty
The code I have written appears to run indefinitely. CAS.MULT is the name of my dataframe.
for (i in 1:nrow(CAS.MULT)) {
testInteger <- function(x) {
test <- all.equal(x, as.integer(x), check.attributes = FALSE)
if (test == TRUE) {
return (TRUE)
}
else {
return (FALSE)
}
}
if (testInteger(as.integer(CAS.MULT[i,1])) == TRUE) {
if (testInteger(as.integer(substring(CAS.MULT[i,2],1,1))) == TRUE) {
if (CAS.MULT[i,3] == '') {
CAS.MULT <- data.frame(CAS.MULT[-i,])
}
}
}
}
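You should be very wary of deleting rows within a for loop; it often leads to undesired behavior, because the remaining rows shift up after each deletion while the loop keeps counting to the original number of rows. There are a number of ways you could handle this. For instance, you can flag the rows for deletion and then delete them afterwards.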
Another thing I noticed is that you are converting your columns to integers before passing them to your function to test if they are integers, so you will be incorrectly returning true for all values passed to the function.
Maybe something like this would work (without a reproducible example it's hard to say if it will work or not):
# Define the helper once, outside the loop
testInteger <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  # Non-numeric input becomes NA and is reported as "not an integer"
  !is.na(num) && num == round(num)
}
toDelete <- integer(0)
for (i in seq_len(nrow(CAS.MULT))) {
  if (testInteger(CAS.MULT[i, 1]) &&
      testInteger(substring(CAS.MULT[i, 2], 1, 1)) &&
      CAS.MULT[i, 3] == '') {
    toDelete <- c(toDelete, i)
  }
}
# Negative indexing with an empty vector would drop every row, so guard against it
if (length(toDelete) > 0) {
  CAS.MULT <- CAS.MULT[-toDelete, ]
}
Hard to be sure without testing my code on your data, but this might work. Instead of a loop, the code below uses logical indexing based on the conditions you specified in your question, joined with & because a row should be dropped only when all three hold. This is vectorized (meaning it operates on the entire data frame at once, rather than by row) and is much faster than looping row by row:
CAS.MULT.screened = CAS.MULT[!(CAS.MULT[,1] %% 1 == 0 &
  grepl("^[0-9]", CAS.MULT[,2]) &
  CAS.MULT[,3] == ""), ]
For more on checking whether a value is an integer, see this SO question.
One other thing: Just for future reference, for efficiency you should define your function outside the loop, rather than recreating the function every time through the loop.
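A small worked example may help check the logic. The data frame below is invented (only the name `CAS.MULT` comes from the question), and it assumes the first column is numeric. Each condition is computed as its own logical vector, with NA handled explicitly, and a row is dropped only when all three conditions hold.

```r
# Invented data: only the first row meets all three conditions
CAS.MULT <- data.frame(
  id   = c(1, 2.5, 3, 4),
  code = c("7abc", "9xyz", "abc", "1def"),
  note = c("", "", "", "keep"),
  stringsAsFactors = FALSE
)

first_is_int        <- !is.na(CAS.MULT[, 1]) & CAS.MULT[, 1] %% 1 == 0
second_starts_digit <- grepl("^[0-9]", CAS.MULT[, 2])
third_empty         <- is.na(CAS.MULT[, 3]) | CAS.MULT[, 3] == ""

# Drop a row only if every condition is TRUE for it
drop_row <- first_is_int & second_starts_digit & third_empty
screened <- CAS.MULT[!drop_row, ]
print(screened)
```

Keeping the conditions in named vectors makes it easy to inspect each one (for example with `table(first_is_int)`) when the result is not what you expect.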

Storing the output of a function into a variable in R

I'm having trouble with storing the output of a function into a variable. I think it's best that I give some context to the problem I'm trying to work out.
Suppose that players "a" and "r" play a game of tennis, the runningScoreFn sums the pointHistory vector and puts everything together in a nice data.frame
runningScoreFn = function(pointHistory){
playerUni = c("a", "r")
cols = sapply(playerUni, function(thisPlayer){
cumsum(pointHistory == thisPlayer)
})
names(cols) = playerUni
cbind(pointHistory, as.data.frame(cols))
}
The oneExperimentGameFn function plays out a game of "a" vs. "r". The first player to win 4 points wins the game, but they must be ahead by at least 2 points. "r" has a 60% chance of winning a point.
pRogerPoint = 0.6
oneExperimentGameFn = function(pRogerPoint){
game = c(rep("r",pRogerPoint * 100), rep("a", 100-pRogerPoint*100))
i = 4
keepGoing = TRUE
while(keepGoing){
whosePoint = sample(game, size=i, replace=TRUE)
if(sum(whosePoint=="r")-sum(whosePoint=="a")>=2){
success = TRUE
print(cbind(runningScoreFn(whosePoint),success=success))
keepGoing = FALSE
}else if(sum(whosePoint=="a")-sum(whosePoint=="r")>=2){
success = FALSE
print(cbind(runningScoreFn(whosePoint),success=success))
keepGoing = FALSE
}
i=i+1
}
}
pRogerGameFn shows the probability that Roger wins the game.
pRogerGameFn = function(pRogerPoint, NExperiments){
RogerGameFn = lapply(1:NExperiments,function(dummy){
ok=oneExperimentGameFn(pRogerPoint)
})}
Here I wish to store the output into the variable ok, but ok returns NULL. I think this has something to do with my oneExperimentGameFn.
I also tried ok = RogerGameFn, but ok also returns NULL.
Nothing is returned from the function oneExperimentGameFn: it only prints its results, and the last expression it evaluates is the while loop, whose value is NULL.
If there is a specific value you want returned, insert a return(.) command at the end of the function (or wherever else appropriate).
If you simply want to catch the print statements, you can use capture.output(.):
ok <- capture.output(oneExperimentGameFn(pRogerPoint))
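To illustrate the first option, here is a deliberately simplified sketch in which the game function returns a value instead of printing it. `play_game` is a hypothetical stand-in, not the asker's oneExperimentGameFn: it plays point by point and returns TRUE when "r" wins, so the results can be collected and averaged.

```r
set.seed(1)

# Hypothetical simplified game: first to 4 points with a lead of at least 2 wins
play_game <- function(p_r) {
  r <- 0
  a <- 0
  while (max(r, a) < 4 || abs(r - a) < 2) {
    if (runif(1) < p_r) r <- r + 1 else a <- a + 1
  }
  return(r > a)  # explicit return value: TRUE if "r" won the game
}

# Each call now yields a value, so it can be stored and summarised
results <- vapply(1:200, function(dummy) play_game(0.6), logical(1))
p_win <- mean(results)  # estimated probability that "r" wins a game
print(p_win)
```

The same change applies to the original code: have oneExperimentGameFn end with the data frame (or the success flag) as its returned value, and lapply will collect those values instead of NULL.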

Resources