I'm making a function and before it does any of the hard stuff I need it to check that all the column names listed in the 'samples' dataset are also present in the 'grids' dataset (the function maps one onto the other).
all(names(samples[expvar]) %in% names(grids))
This does that: the code within all() asks if all the names in the list ('expvar') of columns in 'samples' are also names in 'grids'. The output for a correct length=3, expvar would be TRUE TRUE TRUE. 'all' asks if all are TRUE, so the output here is TRUE. I want to make an IF statement along the lines of:
if(all(names(samples[expvar]) %in% names(grids)) = FALSE) {stop("Not all expvar column names found as column names in grids")}
No else needed, it'll just carry on. The problem is that the '= FALSE' is redundant because all() is a logically evaluable statement... is there a "carry on" function, e.g.
if(all(etc)) CARRYON else {stop("warning")}
Or, can anyone think of a way I can restructure this to make it work?
You're looking for the function stopifnot.
However you don't need to implement it as
if (okay) {
# do stuff
} else {
stop()
}
which is what you have. Instead you can do
if (!okay) {
stop()
}
# do stuff
since the lines will execute in sequential order. But, again, it might be more readable to use stopifnot, as in:
stopifnot(okay)
# do stuff
I would code it:
if(!all(...))
stop(...)
... rest of program ...
Related
I am trying to change a variable in a function but even tho the function is producing the right values, when I go to use them in the next sections, R is still using the initial values.
I created a function to update my variables NetN and NetC:
Reproduction=function(NetN,NetC,cnrep=20){
if(NetC/NetN<=cnrep) {
DeltaC=NetC*p;
DeltaN=DeltaC/cnrep;
Crep=Crep+DeltaC;
Nrep=Nrep+DeltaN;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN-DeltaN; #/* Update N, C values */
NetC=NetC*(1-p)
print ("'Using C to allocate'")
}
else {
print("Using N to allocate");
DeltaN=NetN*p;
DeltaC=DeltaN*cnrep;
Nrep=Nrep+DeltaN;
Crep=Crep+DeltaC;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN*(1-p);
NetC=NetC-DeltaC;
} } return(c(NetC=NetC,NetN=NetN,NewB=NewB,Crep=Crep,Nrep=Nrep,Brep=Brep))}
When I use my function by say doing:
Reproduction(NetN=1.07149,NetC=0.0922349,cnrep=20)
I get the desired result printed out which includes:
NetC=7.378792e-02
However, when I go to use NetC in the next section of my code, R is still using NetC=0.0922349.
Can I make R update NetC without having to define a new variable?
In R, in general, functions shouldn't change things outside of the function. It's possible to do so using <<- or assign(), but this generally makes your function inflexible and very surprising.
Instead, functions should return values (which yours does nicely), and if you want to keep those values, you explicitly use <- or = to assign them to objects outside of the function. They way your function is built now, you can do that like this:
updates = Reproduction(NetN = 1.07149, NetC = 0.0922349, cnrep = 20)
NetC = updates["NetC"]
This way, you (a) still have all the other results of the function stored in updates, (b) if you wanted to run Reproduction() with a different set of inputs and compare the results, you can do that. (If NetC updated automatically, you could never see two different values), (c) You can potentially change variable names and still use the same function, (d) You can run the function to experiment/see what happens without saving/updating the values.
If you generally want to keep NetN, NetC, and cnrep in sync, I would recommend keeping them together in a named vector or list, and rewriting your function to take that list as input and return that list as output. Something like this:
params = list(NetN = 1.07149, NetC = 0.0922349, cnrep = 20)
Reproduction=function(param_list){
NetN = param_list$NetN
NetC = param_list$NetC
cnrep = param_list$cnrep
if(NetC/NetN <= cnrep) {
DeltaC=NetC*p;
DeltaN=DeltaC/cnrep;
Crep=Crep+DeltaC;
Nrep=Nrep+DeltaN;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN-DeltaN; #/* Update N, C values */
NetC=NetC*(1-p)
print ("'Using C to allocate'")
}
else {
print("Using N to allocate");
DeltaN=NetN*p;
DeltaC=DeltaN*cnrep;
Nrep=Nrep+DeltaN;
Crep=Crep+DeltaC;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN*(1-p);
NetC=NetC-DeltaC;
}
## Removed extra } and ) ??
return(list(NetC=NetC, NetN=NetN, NewB=NewB, Crep=Crep, Nrep=Nrep, Brep=Brep))
}
This way, you can use the single line params <- Reproduction(params) to update everything in your list. You can access individual items in the list with either params$Netc or params[["NetC"]].
i tried updating data in dataframe but its unable to get updating
//Initialize data and dataframe here
user_data=read.csv("train_5.csv")
baskets.df=data.frame(Sequence=character(),
Challenge=character(),
countno=integer(),
stringsAsFactors=FALSE)
/Updating data in dataframe here
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i]==user_data$challenge_sequence[j]&&user_data$challenge[i]==user_data$challenge[j])
{
writedata(user_data$challenge_sequence[i],user_data$challenge[i])
}
}
}
writedata=function( seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence=seqnn,Challenge=challng,countno=1)
baskets.df=rbind(baskets.df,newRow)
}
//view data here
View(baskets.df)
I've modified your code to what I believe will work. You haven't provided sample data, so I can't verify that it works the way you want. I'm basing my attempt here on a couple of common novice mistakes that I'll do my best to explain.
Your writedata function was written to be a little loose with it's scope. When you create a new function, what happens in the function technically happens in its own environment. That is, it tries to look for things defined within the function, and then any new objects it creates are created only within that environment. R also has this neat (and sometimes tricky) feature where, if it can't find an object in an environment, it will try to look up to the parent environment.
The impact this has on your writedata function is that when R looks for baskets.df in the function and can't find it, R then turns to the Global Environment, finds baskets.df there, and then uses it in rbind. However, the result of rbind gets saved to a baskets.df in the function environment, and does not update the object of the same name in the global environment.
To address this, I added an argument to writedata that is simply named data. We can then use this argument to pass a data frame to the function's environment and do everything locally. By not making any assignment at the end, we implicitly tell the function to return it's result.
Then, in your loop, instead of simply calling writedata, we assign it's result back to baskets.df to replace the previous result.
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i] == user_data$challenge_sequence[j] &&
user_data$challenge[i] == user_data$challenge[j])
{
baskets.df <- writedata(baskets.df,
user_data$challenge_sequence[i],
user_data$challenge[i])
}
}
}
writedata=function(data, seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence = seqnn,
Challenge = challng,
countno = 1)
rbind(data, newRow)
}
I'm not sure what you're programming background is, but your loops will be very slow in R because it's an interpreted language. To get around this, many functions are vectorized (which simply means that you give them more than one data point, and they do the looping inside compiled code where the loops are fast).
With that in mind, here's what I believe will be a much faster implementation of your code
user_data=read.csv("train_5.csv")
# challenge_indices will be a matrix with TRUE at every place "challenge" and "challenge_sequence" is the same
challenge_indices <- outer(user_data$challenge_sequence, user_data$challenge_sequence, "==") &
outer(user_data$challenge, user_data$challenge, "==")
# since you don't want duplicates, get rid of them
challenge_indices[upper.tri(challenge_indices, diag = TRUE)] <- FALSE
# now let's get the indices of interest
index_list <- which(challenge_indices,arr.ind = TRUE)
# now we make the resulting data set all at once
# this is much faster, because it does not require copying the data frame many times - which would be required if you created a new row every time.
baskets.df <- with(user_data, data.frame(
Sequence = challenge_sequence[index_list[,"row"]],
challenge = challenge[index_list[,"row"]]
)
Just to give some background first:
I currently have 2 data frames (giraffe, leaf) and both of them share the column 'key', where the elements in the leaf data frame are a subset of giraffe. What I needed to do is compare the two data frames and when there are matching elements in both data frames in the 'key' column, the string 'leaf' will be input into another column (project) in the giraffe data frame inside the same row as the matching 'key' element. I've taken the following approach however it seems I have made a small error somewhere and after searching online, I still don't know what it is:
Truth_vector <- is.element((giraffe[,1]),(leaf[,1])) #returns a vector with 3000 elements, most are FALSE except for where the element inside 'key' is present in both data frames
i=1
for (i in 1:length(giraffe[,1])) {
if Truth_vector[i] == TRUE {
giraffe[i,5] <- 'leaf'
}
i = i+1
}
Error: unexpected '}' in "}"
Edit:
I tried implementing the solution as a function however nothing ends up happening, no error messages get returned either. What I've done is:
Project_assign <- function(prjct) {
Truth_vector <- is.element((giraffe[,1]),(prjct[,1]))
giraffe[which(Truth_vector),5] <- 'prjct'
}
Project_assign(leaf)
Edit: This was because everything was getting assigned in the function sub environment, not the global environment. Using assign('giraffe',giraffe,envir=.GlobalEnv) solves this however you should try and avoid the assign function and Instead I used a for loop going over a list of all the dataframes
You have a couple issues. First, the if criteria needs to be in parentheses, and secondly you don't need to increment i yourself. This should suffice:
for (i in 1:length(giraffe[,1])) {
if (Truth_vector[i] == TRUE) {
giraffe[i,5] <- 'leaf'
}
}
Of course, this would do it too:
giraffe[which(Truth_vector),5] <- 'leaf'
(assuming Truth_vector is not longer than the number of rows in giraffe)
I am trying to create a function which will look at two vectors of character labels, and print the appropriate label based on an If statement. I am running into an issue when one of the vectors is populated by NA.
I'll truncate my function:
eventTypepriority=function(a,b) {
if(is.na(a)) {print(b)}
if(is.na(b)) {print(a)}
if(a=="BW"& b=="BW",) {print("BW")}
if(a=="?BW"& b=="BW") {print("?BW")}
...#and so on
}
Some data:
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = TRUE)
The function works fine for the first two, selecting the label I've designated in my if statements. However, when it gets to the third pair I receive this error:
Error in if (a == "?BW" & b == "BW") { :
missing value where TRUE/FALSE needed
I'm guessing this is because at that place, b=NA, and this is the first if statement, outside of the 'is.na' statements, that need it to ignore missing values.
Is there a way to handle this? I'd really rather not add conditional statements for every label and NA. I've also tried:
-is.null (same error message)
-Regular Expressions:
if(a==grepl([:print:]) & b==NA) {print(a)}
In various formats, including if(a==grepl(:print:)... No avail. I receive an 'Error: unexpected '[' or whatever character R didn't like first to tell me this is wrong.
All comments and thoughts would be appreciated. ^_^
if all your if conditions are exclusives, just call return() to avoid checking other conditions when one is met:
eventTypepriority=function(a,b) {
if(is.na(a)) {print(b);return()}
if(is.na(b)) {print(a);return()}
if(a=="BW"& b=="BW",) {print("BW");return()}
if(a=="?BW"& b=="BW") {print("?BW");return()}
...#and so on
}
You need to use if .. else statements instead of simply if; otherwise, your function will evaluate the 3rd and 4th lines even when one of the values is n/a.
Given you mapply statement, I also assume you want the function to output the corresponding label, not just print it?
In that case
eventTypepriority<-function(a,b) {
if(is.na(a)) b
else if(is.na(b)) a
else if(a=="BW"& b=="BW") "BW"
else if(a=="?BW"& b=="BW") "?BW"
else "..."
}
a=c("Pm", "BW", "?BW")
b=c("PmDP","?BW",NA)
c=mapply(eventTypepriority, a,b, USE.NAMES = T)
c
returns
Pm BW ?BW
"..." "..." "?BW"
If you actually want to just print the label and have your function return something else, you should be able to figure it out from here.
I'd like to get the unique elements from a column. That seems straight forward. Both of these work, but I'm not getting the object type I'd like:
userlist <- as.list(somebigdf$username)
userlist <- unique(userlist)
or
userlist <- unique(somebigdf$username)
When I iterate through, I'm not getting the names:
for(i in 1:length(userlist)){
cat(names(userlist[i]), '\n')
}
Returns blank spaces.
for(i in userlist){
cat(i, '\n')
}
Returns integers.
The above function is just an example. I'll be using that but also matching the returned name in an if-else function.
The object types seem to be integers or an extended data.frame with lots of values for each name - which isn't what I want. I would really just like a list of strings something along the lines of userlist = c( the results from unique).
Edit -
This code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}
I'm accepting my own answer. Namely, a working solution - this code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}
If someone at a later date has a better answer that seems more in keeping with the question, I will be happy to accept that as the answer.