How can I assign a value using if-else conditions in R - r

I have this dataframe with a column a. I would like to add a different column 'b' based on column 'a'.
For: if a>=10, b='double'. Otherwise b='single'.
How can I do it?
Sample output:
a b
2 single
2 single
4 single
11 double
12 double
12 double
45 double
4 single

You can use ifelse to act on vectors with if statements.
ifelse(a>=10, "double", "single")
So your code could look like this
mydata <- cbind(a, ifelse(a>10, "double", "single"))
(Specified in comments below that if a=10, then "double")

Strictly speaking, if-else is assignable in r, that is
x1 <- if (TRUE) 1 else 2
is legit. For details see https://adv-r.hadley.nz/control-flow.html#choices
However, as this vectorizes over neither the test condition nor the value branches, it's not applicable to the particular case described in the question details, which is about adding a column in a conditional manner. In such a situation ifelse or the more typesafe if_else (from dplyr) can be used.

Related

number some patterns in the string using R

I have a strings and it has some patterns like this
my_string = "`d#k`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`n$l`0.4`0.1`0.25`0.28`0.18`0.3`0.17`0.2`0.03`!lk`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`vnabgjd`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`pogk(`1.01`0.71`0.86`0.89`0.79`0.91`0.78`0.81`0.64`r!#^##niw`0.0014`0.0020`9.9999`9.9999`0.0020`0.0022`0.0032`9.9999`0.0000`
As you can see there is patterns [`nonnumber] then [`number.num~] repeated.
So I want to identify how many [`number.num~] are between [`nonnumber].
I tried to use regex
index <- gregexpr("`(\\w{2,20})`\\d\\.\\d(.*?)`\\D",cle)
regmatches(cle,index)
but using this code, the [`\D] is overlapped. so just It can't number how many the pattern are.
So if you know any method about it, please leave some reply
Using strsplit. We split at the backtick and count the position difference which of the values coerced to "numeric" yield NA. Note, that we need to exclude the first element after strsplit and add an NA at the end in the numerics. Resulting in a vector named with the non-numerical element using setNames (not very good names actually, but it's demonstrating what's going on).
s <- el(strsplit(my_string, "\\`"))[-1]
s.num <- suppressWarnings(as.numeric(s))
setNames(diff(which(is.na(c(s.num, NA)))) - 1,
s[is.na(s.num)])
# d#k n$l !lk vnabgjd pogk( r!#^##niw
# 9 9 9 9 9 9

R commands for finding mode in R seem to be wrong

I watched video on YouTube re finding mode in R from list of numerics. When I enter commands they do not work. R does not even give an error message. The vector is
X <- c(1,2,2,2,3,4,5,6,7,8,9)
Then instructor says use
temp <- table(as.vector(x))
to basically sort all unique values in list. R should give me from this command 1,2,3,4,5,6,7,8,9 but nothing happens except when the instructor does it this list is given. Then he says to use command,
names(temp)[temp--max(temp)]
which basically should give me this: 1,3,1,1,1,1,1,1,1 where 3 shows that the mode is 2 because it is repeated 3 times in list. I would like to stay with these commands as far as is possible as the instructor explains them in detail. Am I doing a typo or something?
You're kind of confused.
X <- c(1,2,2,2,3,4,5,6,7,8,9) ## define vector
temp <- table(as.vector(X))
to basically sort all unique values in list.
That's not exactly what this command does (sort(unique(X)) would give a sorted vector of the unique values; note that in R, lists and vectors are different kinds of objects, it's best not to use the words interchangeably). What table() does is to count the number of instances of each unique value (in sorted order); also, as.vector() is redundant.
R should give me from this command 1,2,3,4,5,6,7,8,9 but nothing happens except when the instructor does it this list is given.
If you assign results to a variable, R doesn't print anything. If you want to see the value of a variable, type the variable's name by itself:
temp
you should see
1 2 3 4 5 6 7 8 9
1 3 1 1 1 1 1 1 1
the first row is the labels (unique values), the second is the counts.
Then he says to use command, names(temp)[temp--max(temp)] which basically should give me this: 1,3,1,1,1,1,1,1,1 where 3 shows that the mode is 2 because it is repeated 3 times in list.
No. You already have the sequence of counts stored in temp. You should have typed
names(temp)[temp==max(temp)]
(note =, not -) which should print
[1] "2"
i.e., this is the mode. The logic here is that temp==max(temp) gives you a logical vector (a vector of TRUE and FALSE values) that's only TRUE for the elements of temp that are equal to the maximum value; names(temp)[temp==max(temp)] selects the elements of the names vector (the first row shown in the printout of temp above) that correspond to TRUE values ...

Matlab or R: replace elements in matrix by values from another matrix in order

I have a problem to solve in either Matlab or R (preferably in R).
Imagine I have a vector A with 10 elements.
I have also a vector B with 30 elements, of which 10 have value 'x'.
Now, I want to replace all the 'x' in B by the corresponding values taken from A, in the order that is established in A. Once a value in A is taken, the next one is ready to be used when the next 'x' in B is found.
Note that the sizes of A and B are different, it's the number of 'x' cells that coincides with the size of A.
I have tried different ways to do it. Any suggestion on how to program this?
As long as the number of x entries in B matches the length of A, this will do what you want:
B[B=='x'] <- A
(It should be clear that this is the R solution.)
MATLAB Solution
In MATLAB it's quite simple, use logical indexing:
B(B == 'x') = A;

How to delete rows in multiple columns by unique number?

Given data like this
C1<-c(3,-999.000,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,-999.000)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)
How do I go about removing the -999.000 data in all of the columns
I know this works per column
DF2<-DF[!(DF$C1==-999.000 | DF$C2==-999.000 | DF$C3==-999.000),]
But I'd like to avoid referencing each column. I am thinking there is an easy way to reference all of the columns in a particular data frame aka:
DF3<-DF[!(DF[,]==-999.000),]
or
DF3<-DF[!(DF[,(2:4)]==-999.000),]
but obviously these do not work
And out of curiosity, bonus points if you can me why I need that last comma before the ending square bracket as in:
==-999.000),]
The following may work
DF[!apply(DF==-999,1,sum),]
or if you can have multiple -999 on a row
DF[!(apply(DF==-999,1,sum)>0),]
or
DF[!apply(DF==-999,1,any),]
Based on your code, I'll assume that you want to remove all rows that contain -999.
DF2 <- DF[rowSums(DF == -999) == 0, ]
As for your bonus question: A data frame is a list of vectors, all of which have the same length. If we think of the vectors as columns, then a data frame can be thought of as a matrix where the columns might have different types (numeric, character, etc). R allows you to refer to elements of a data frame much the same way you refer to elements of a matrix; by using row and column indices. So DF[i, j] refers to the ith element in the jth vector of DF, which you can think of as the ith row and jth column. So if you want to retain only some of the rows of the data frame and all columns, you can use a matrix-like notation: DF[row.indices, ].
To address your "bonus" question, if we go to the documentation for ?Extract.data.frame we will find:
Data frames can be indexed in several modes. When [ and [[ are used
with a single index (x[i] or x[[i]]), they index the data frame as if
it were a list. In this usage a drop argument is ignored, with a
warning.
and also:
When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they
act like indexing a matrix: [[ can only be used to select one element.
Note that for each selected column, xj say, typically (if it is not
matrix-like), the resulting column will be xj[i], and hence rely on
the corresponding [ method, see the examples section.
So you need the comma to ensure that R knows you are referring to a row, not a column.
I don't understand if your target is to remove all the rows that contain at least one NA, if this is what you are looking for, then this could be a possible answer:
DF[DF==-999] <- NA
na.omit(DF)
ID C1 C2 C3
1 A 3 3 5
3 C 4 3 3
4 D 4 4 6

Why does R need the name of the dataframe?

If you have a dataframe like this
mydf <- data.frame(firstcol = c(1,2,1), secondcol = c(3,4,5))
Why would
mydf[mydf$firstcol,]
work but
mydf[firstcol,]
wouldn't?
You can do this:
mydf[,"firstcol"]
Remember that the column goes second, not first.
In your example, to see what mydf[mydf$firstcol,] gives you, let's break it down:
> mydf$firstcol
[1] 1 2 1
So really mydf[mydf$firstcol,] is the same as
> mydf[c(1,2,1),]
firstcol secondcol
1 1 3
2 2 4
1.1 1 3
So you are asking for rows 1, 2, and 1. That is, you are asking for your row one to be the same as row 1 of mydf, your row 2 to be the same as row 2 of mydf and your row 3 to be the same as row 1 of mydf; and you are asking for both columns.
Another question is why the following doesn't work:
> mydf[,firstcol]
Error in `[.data.frame`(mydf, , firstcol) : object 'firstcol' not found
That is, why do you have to put quotes around the column name when you ask for it like that but not when you do mydf$firstcol. The answer is just that the operators you are using require different types of arguments. You can look at '$' to see the form x$name and thus the second argument can be a name, which is not quoted. You can then look up ?'[', which will actually lead you to the same help page. And there you will find the following, which explains it. Note that a "character" vector needs to have quoted entries (that is how you enter a character vector in R (and many other languages).
i, j, ...: indices specifying elements to extract or replace. Indices
are ‘numeric’ or ‘character’ vectors or empty (missing) or
‘NULL’. Numeric values are coerced to integer as by
‘as.integer’ (and hence truncated towards zero). Character
vectors will be matched to the ‘names’ of the object (or for
matrices/arrays, the ‘dimnames’): see ‘Character indices’
below for further details.
Nothing to add to the very clear explanation of Xu Wang. You might want to note in addition that the package data.table allows you to use notation such as mydf[firstcol==1,] or mydf[,firstcol], that many find more natural.

Resources