Using 'if' and 'for' to distinguish between numbers - r

There is a list called "G", and I am trying to replace any number above 5 with a smile ":)" and any number below 5 with ":(". I wrote this code expecting to get only five smiles; however, the result is totally different and a smile appears for all numbers. Thanks to anyone who can tell me what's wrong with this very simple and basic code.
`G <- list(1,2,3,4,5,6,7,8,9,10)
B <- for (i in 1:length(G))
if (any(G> 5)) {
print(":)")
} else {
print(":(")
} `

The following points will help with the question and understanding of R for the future.
Lists vs. Vectors
For loops vs. Vectorization
print with assignment
Lists
In R, a list is a special object that can hold other objects. It can hold almost anything, even other lists! Because lists accept such a mix of things, many of the convenient functions that work on vectors do not work on them.
The vector is a simple R object with a common type. "Common type" means that the entire vector will either be numbers, or character values, or whatever other type we are using. If we create c(1,"a"), we mix a number with a letter, and both elements are coerced to a single type, character. The quantitative functions will not work with it anymore. But the list list(1,"a") will hold each element as its own type.
In your case, you have a series of whole numbers. When the numbers are in a vector, we can apply many built-in functions to work with them that will not work with the generic list. Functions like sum, mean, and sd, are built to accept vectors. But they do not work with lists. We should change G to a vector:
G <- c(1,2,3,4,5,6,7,8,9,10)
#Or a bit shorter
G <- 1:10
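As a small illustration of the difference (a sketch with made-up variable names, not part of the original code), the snippet below shows the coercion inside a vector, the types kept by a list, and one way to turn a list of numbers into a vector with unlist:

```r
# A vector has one common type: mixing a number and a letter gives character
v <- c(1, "a")
class(v)            # "character"

# A list keeps each element's own type
l <- list(1, "a")
sapply(l, class)    # "numeric" "character"

# A list of numbers can be flattened to a vector before calling sum(), mean(), ...
G_list <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
G_vec <- unlist(G_list)
sum(G_vec)          # 55
```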
For loops and Vectorization
In R, because vectors can only be of one type, many cool things are possible. I can do G + 1 and R will add one to each element of G. It completed the whole thing without a loop. For conditional statements like yours we can vectorize also:
ifelse(G > 5, ":)", ":(")
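This also hints at why the original loop printed a smile every time. A short sketch, assuming G is the vector 1:10: G > 5 is evaluated element by element, but any() collapses that result into a single TRUE, so the if condition never changes between iterations.

```r
G <- 1:10

# Elementwise comparison: one TRUE/FALSE per element
G > 5

# any() collapses the comparison to a single value, TRUE as soon as one element passes
any(G > 5)    # TRUE

# ifelse() keeps the elementwise structure and picks a label per element
faces <- ifelse(G > 5, ":)", ":(")
sum(faces == ":)")   # 5
```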
Print with assignment
The return value of print can be assigned, but it is better to simply assign the value itself:
#Don't do
x <- print("hello")
#Do
x <- "hello"
Both work, but the second is more in line with R programming. The reason B did not save the output in your case is that you assigned the for loop itself, and a for loop returns NULL. If you would like to save the output of a loop, assign it inside the loop, since anything that is only printed is not kept once the loop completes.
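If a loop is still preferred, a minimal sketch of saving its results could look like this. The names res and B are only illustrative; the idea is to create the result vector first and fill it inside the loop.

```r
G <- 1:10

# Assigning the loop itself only captures NULL
res <- for (i in G) i
is.null(res)    # TRUE

# Pre-allocate a character vector, then fill one slot per iteration
B <- character(length(G))
for (i in seq_along(G)) {
  B[i] <- if (G[i] > 5) ":)" else ":("
}
B
```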
To summarize, we can simplify to:
G <- 1:10
B <- ifelse(G > 5, ":)", ":(")
#[1] ":(" ":(" ":(" ":(" ":(" ":)" ":)" ":)" ":)" ":)"

To follow up on what Pierre mentioned: lists in R are different from lists in Python. What you're looking for is a vector. You can initialize a vector with c() like:
test <- c(1,2,3,4,5,6,7,8,9,10)
You can access an element of a vector by putting its index in brackets []. For example, with the vector nouns <- c("cat", "dog", "tree"), nouns[2] will return "dog".
A functioning version of your code is:
G <- c(1,2,3,4,5,6,7,8,9,10)
for (i in 1:length(G))
{ #missed loop curly bracket on this line
if (G[i] > 5) {
print(":)") #if you want to assign the output of a loop do it here
} else {
print(":(")
}
}
Edit: Pierre beat me to it! Leaving up but Pierre has a more thorough answer.

Related

Dplyr::filter works on data frame but not on vector, why is this?

I have used dplyr::filter on a data frame to get all the numbers divisible by neither 3 nor 2, but this function did not work on a vector. I am curious as to why this is so.
Here is my code:
vec<-vector()
for (i in 1:1260){
if (i %% 2 !=0){
vec<-c(vec,i)
}
}
vec<-data.frame(vec)
vec%>%filter(vec%%3!=0)
This should work:
vec<-vector()
for (i in 1:1260){
if (i %% 2 !=0){
vec<-c(vec,i)
}
}
vec<-data.frame(vec)
answer <- vec%>%filter(vec%%3!=0)
real_answer <- answer$vec
The problem is that filter is meant to work with data frames.
Looking at the documentation, filter is documented for a tibble/tbl, and a data.frame works as well, but a plain vector does not: https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter
This seems to be part of a general understanding that tidy data is data in data.frames.
In R, data are often kept in vectors, and if a and b are vectors of data you can just plot them using
plot(a, b)
where there is an implicit connection of the value of a[n] and the value of b[n] being connected via a common n. However, there is always a risk of that implicit connection being disturbed when changing a or b alone. There is less risk if the connection is made explicit within a data.frame or a tibble where values in the same row belong together.
If, e.g., you do na.omit(a), there is no telling which a value belongs to which b value, whereas na.omit(data.frame(a, b)) is safe in this regard.
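To make the risk concrete, here is a small sketch with made-up values for a and b. Dropping the NA from a alone leaves the two vectors with different lengths, while dropping it from the data frame removes the whole row.

```r
a <- c(1, NA, 3)
b <- c(10, 20, 30)

# Cleaning a on its own breaks the pairing with b
a_clean <- na.omit(a)
length(a_clean)   # 2
length(b)         # 3

# Cleaning the data frame drops the matching b value together with the NA
df_clean <- na.omit(data.frame(a, b))
df_clean$b        # 10 30
```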
Without using data.frame, you can try
vec[vec%%3!=0]
or
subset(vec,vec%%3!=0)
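A self-contained sketch of the vector-only route, assuming the odd numbers are built with seq rather than the loop from the question (the values are the same):

```r
# Odd numbers from 1 to 1259, the values the question's loop collects
vec <- seq(1, 1260, by = 2)

# Logical index: TRUE where the number is not divisible by 3
keep <- vec %% 3 != 0

res1 <- vec[keep]          # bracket subsetting
res2 <- subset(vec, keep)  # subset() on a plain vector

identical(res1, res2)      # TRUE
length(res1)               # 420
```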

R add to a list in a loop, using conditions

I have a data.frame with dim = (200, 500).
I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However this is failing. Some pointers? (background is mainly python, not much of an R user)
Consider lapply(): when a data frame is passed to it, the function is applied to each column, and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector which contains the minimal and maximal values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. Not surprising that it fails!
The problem is that R's function range and Python's function with the same name do completely different things. The equivalent of Python's range is called seq. For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
Generally, if something does not work in R, you should print the subexpressions from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! If there is a function with the same name, it usually does a different thing.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
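Putting these points together, a corrected loop might look like the sketch below. The data frame df is made up for illustration (three random columns standing in for I.df.nocov), and the p-value is extracted with $p.value rather than [2]; both choices are assumptions, not the original poster's code.

```r
# range() returns min and max, not a sequence
range(5)        # 5 5
seq_len(5)      # 1 2 3 4 5
1:2 * 3         # 3 6, because ':' binds tighter than '*'

# Hypothetical stand-in for I.df.nocov
set.seed(1)
df <- data.frame(a = rnorm(50), b = runif(50), c = rexp(50))

# Loop over column indices with seq_len(ncol(...)) instead of range(dim(...)[2])
pvals <- vector("list", ncol(df))
for (i in seq_len(ncol(df))) {
  pvals[[i]] <- shapiro.test(df[[i]])$p.value
}

# The same result with lapply over the columns
pvals2 <- lapply(df, function(col) shapiro.test(col)$p.value)
```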

R rstats How to replace a single character in one place in a string

I'm trying to write a function that will take a string and replace one character with another, but I want it to return every permutation of replacing that character. I'd like to replace every i with an l but I don't want to do it globally like in gsub and I don't want to do just the first one like in sub. I think an example illustrates it best. If I pass in the name keviin (with two i's):
thisFunction("keviin")
[1] kevlin keviln kevlln
So I get back replacing the first i, the second i and then both i's. This sounds like a job for recursion, but first I need to figure out how to replace just the first i. Then I could pass the resulting string to the function to get the next permutation.
Anybody got an idea to give me a push? I've tried doing this but it didn't work for me:
> substr("keviin",4,4) <- "l"
Error in substr("keviin", 4, 4) <- "l" :
target of assignment expands to non-language object
From @CarlWitthoft's idea, how about this:
thisFunction<-function(x) {
xsplit<-strsplit(x,"")[[1]]
ipos<-as.vector(gregexpr("i",x)[[1]])
if (length(ipos)==1) {
if (ipos<0) return(x) else {
substring(x,ipos,ipos)<-"l"
return(x)
}
}
combos<-unlist(lapply(seq_along(ipos),combn,x=ipos,simplify=FALSE),recursive=FALSE)
ret<-t(vapply(combos,function(x) {xsplit[x]<-"l";xsplit},character(length(xsplit))))
do.call(function(...) paste(...,sep=""),as.data.frame(ret))
}
thisFunction("keviin")
#[1] "kevlin" "keviln" "kevlln"
How about a combination of regex and sampling from a vector?
kevsplit<-unlist(strsplit('keviin',''))
the_eyes <-which( grepl('i',kevsplit))
kevsplit[sample(the_eyes,1)] <-"L"
newkev<-paste(kevsplit,collapse='')
That will randomly swap out one of the "i"s. To swap out all possible permutations,
do something like
for(j in 1:length(the_eyes) ) {
calculate all permutations of the_eyes taken j at a time
swap those selected values to kevsplit and save in some list
}
I'm too lazy to write out that last bit :-)
EDIT: to clarify, aside from pasting things back together again, your problem is basically:
For a vector of type c(0,0,0,0,0,....) (replacing your "i" with 0 or logical FALSE), how many ways can you replace 1 or more values with a "TRUE" (or 1) ? That's a standard problem in introductory combinatorics -- and happily enough for us computer weenies, turns out to be counting in binary!
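The counting-in-binary idea can be sketched as follows. The function name replace_each and its arguments are hypothetical; each number from 1 to 2^n - 1 is read as a bit mask that says which of the n matching positions get replaced.

```r
replace_each <- function(x, from = "i", to = "l") {
  chars <- strsplit(x, "")[[1]]
  pos <- which(chars == from)      # positions of the character to replace
  n <- length(pos)
  if (n == 0) return(x)            # nothing to replace

  # Each mask from 1 to 2^n - 1 selects a non-empty subset of positions
  vapply(seq_len(2^n - 1), function(mask) {
    # Bit k of mask is set when (mask %/% 2^(k-1)) is odd
    pick <- (mask %/% 2^(seq_len(n) - 1)) %% 2 == 1
    out <- chars
    out[pos[pick]] <- to
    paste(out, collapse = "")
  }, character(1))
}

replace_each("keviin")
# [1] "kevlin" "keviln" "kevlln"
```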
This works with objects but not pure strings in quotes for some reason
thisFunction <- function(x) {
  substr(x, 4, 4) <- 'l'
  return(x)
}
thisFunction('keviin')
#[1] "kevlin"
works.
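A likely explanation, sketched below: substr(x, 4, 4) <- "l" is a replacement call, which R evaluates roughly as x <- `substr<-`(x, 4, 4, value = "l"). That needs a variable name to assign back into, which a string literal does not provide. The variable names s and s2 are only illustrative.

```r
# With a variable as the target, the replacement form works
s <- "keviin"
substr(s, 4, 4) <- "l"
s     # "kevlin"

# Calling the replacement function directly returns the modified string,
# so a literal is fine here because nothing is assigned back into it
s2 <- `substr<-`("keviin", 4, 4, value = "l")
s2    # "kevlin"
```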

Loop over a string variable in R

I would like to loop over a string variable. For example:
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
data_FK_i<-subset(data_FK, subset= !is.na(FK) & (!is.na(i)))
}
The "i" should receive a different name from the list.
What am I doing wrong? It's not working? Adding "" doesn't seem to help.
Thanks,
Einat
Thanks, the "assign" answer did the work!!!!!!!!!!
I agree with @Thomas. You should use a list. However, let me demonstrate how to modify your code to create multiple objects. You can use the function assign to create objects based on strings.
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
# keep only the rows of data_FK where both FK and column i are non-missing
assign(paste0("data_FK_", i), data_FK[complete.cases(data_FK[c("FK", i)]), ])
}
Try something like this instead, which will give you a list containing the three subsetted dataframes:
lapply(clist, function(x) data_FK[ !is.na(data_FK$FK) & !is.na(data_FK[,x]) ,])
The problem in your code is that i is a character string, specifically one of the values from clist in each iteration of the for-loop. So, when R reads !is.na(i) you're saying !is.na("BMI"), etc.
Various places on Stack Overflow advise against using subset at all in favor of extraction indices (i.e., [) like in the example code above because subset relies on non-standard evaluation that is confusing and sometimes leads you down bad rabbit holes.
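For a runnable version of the list approach, here is a sketch with a small made-up data_FK (the values are invented for illustration). Naming the input with setNames means the resulting list can be indexed by column name.

```r
# Hypothetical data with some missing values
data_FK <- data.frame(
  FK   = c(1, NA, 3, 4),
  BMI  = c(22, 25, NA, 30),
  trig = c(NA, 1.2, 1.5, 1.1),
  hdl  = c(50, 55, 60, NA)
)
clist <- c("BMI", "trig", "hdl")

# is.na() on the string itself is always FALSE, which is why the original subset did nothing
is.na("BMI")    # FALSE

# One subset per column name, selecting the column with [[ ]]
subsets <- lapply(setNames(clist, clist), function(col) {
  data_FK[!is.na(data_FK$FK) & !is.na(data_FK[[col]]), ]
})

nrow(subsets$BMI)   # 2
```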
Is this what you want?
You need to give the loop something to store the data into.
Also you need to tell the loop how long you want it to run.
clist <- c("BMI", "trig", "hdl")
#empty vector
data_FK<-c()
#I want a loop and it will 'loop' 3 times (1 to 3), which is the length of my list
for (i in 1:length(clist)) {
#each loop stores the corresponding item from the list into the vector
data_FK<-c(data_FK,clist[i])
}
## or if you want to store the values in a data frame
## there are other ways to create this, but here is a simple solution
data_FK<-data.frame(placer=1:length(clist))
for(i in 1:length(clist)){
data_FK$items[i]<-clist[i]
}
## or maybe you just want to print the names
for (i in 1:length(clist)){
print(clist[i])
}

Assigning output of a function to two variables in R [duplicate]

This seems like an easy question, but I can't figure it out and I haven't had luck in the R manuals I've looked at. I want to find dim(x), but I want to assign dim(x)[1] to a and dim(x)[2] to b in a single line.
I've tried [a b] <- dim(x) and c(a, b) <- dim(x), but neither has worked. Is there a one-line way to do this? It seems like a very basic thing that should be easy to handle.
This may not be as simple of a solution as you had wanted, but this gets the job done. It's also a very handy tool in the future, should you need to assign multiple variables at once (and you don't know how many values you have).
Output <- SomeFunction(x)
VariablesList <- letters[1:length(Output)]
for (i in seq(1, length(Output), by = 1)) {
assign(VariablesList[i], Output[i])
}
Loops aren't the most efficient things in R, but I've used this multiple times. I personally find it especially useful when gathering information from a folder with an unknown number of entries.
EDIT: And in this case, Output could be any length (as long as VariablesList is longer).
EDIT #2: Changed up the VariablesList vector to allow for more values, as Liz suggested.
You can also write your own function that will always make a global a and b. But this isn't advisable:
mydim <- function(x) {
out <- dim(x)
a <<- out[1]
b <<- out[2]
}
The "R" way to do this is to output the results as a list or vector just like the built in function does and access them as needed:
out <- dim(x)
out[1]
out[2]
R has excellent list and vector comprehension that many other languages lack and thus doesn't have this multiple assignment feature. Instead it has a rich set of functions to reach into complex data structures without looping constructs.
Doesn't look like there is a way to do this. Really the only way to deal with it is to add a couple of extra lines:
temp <- dim(x)
a <- temp[1]
b <- temp[2]
It depends what is in a and b. If they are just numbers try to return a vector like this:
dim <- function(x,y)
return(c(x,y))
dim(1,2)[1]
# [1] 1
dim(1,2)[2]
# [1] 2
If a and b are something else, you might want to return a list
dim <- function(x,y)
return(list(item1=x:y,item2=(2*x):(2*y)))
dim(1,2)[[1]]
# [1] 1 2
dim(1,2)[[2]]
# [1] 2 3 4
EDIT:
try this: x <- c(1,2); names(x) <- c("a","b")
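Building on the named-vector suggestion, a short sketch for the original dim(x) case. The matrix x is made up, and list2env is one base R option for turning named elements into variables; whether that is preferable to simply keeping the named vector is a judgment call.

```r
x <- matrix(0, nrow = 3, ncol = 4)

# Name the two dimensions and access them by name
d <- setNames(dim(x), c("a", "b"))
d[["a"]]   # 3
d[["b"]]   # 4

# If separate variables are really wanted, list2env creates a and b
# in the current environment from the named elements
invisible(list2env(as.list(d), envir = environment()))
a          # 3
b          # 4
```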

Resources