Referencing Row Number in R - r

How do I reference the row number of an observation? For example, if you have a data.frame called "data" and want to create a variable data$rownumber equal to each observation's row number, how would you do it without using a loop?

These are present by default as rownames when you create a data.frame.
R> df = data.frame('a' = rnorm(10), 'b' = runif(10), 'c' = letters[1:10])
R> df
a b c
1 0.3336944 0.39746731 a
2 -0.2334404 0.12242856 b
3 1.4886706 0.07984085 c
4 -1.4853724 0.83163342 d
5 0.7291344 0.10981827 e
6 0.1786753 0.47401690 f
7 -0.9173701 0.73992239 g
8 0.7805941 0.91925413 h
9 0.2469860 0.87979229 i
10 1.2810961 0.53289335 j
You can access them via the rownames function.
R> rownames(df)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
If you need them as numbers, coerce them with as.numeric, as in as.numeric(rownames(df)).
You don't need to add them as a column, though. If you know what you are looking for (say, the item where df$c == 'i'), you can use the which function:
R> which(df$c =='i')
[1] 9
Or, if you don't know the column:
R> which(df == 'i', arr.ind=TRUE)
row col
[1,] 9 3
You can then access the element using df[9, 'c'] or df$c[9].
If you do want to add them, you could use df$rownumber <- as.numeric(rownames(df)). This is less robust than df$rownumber <- 1:nrow(df), because the row names stop being the default index numbers once you assign to rownames (or subset the data frame). The which function keeps returning index numbers even if you do assign to rownames.
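A minimal sketch of why the two approaches can disagree, assuming a small made-up data frame (the names df and sub are illustrative only): after subsetting, the row names keep their original values, while a sequence built from nrow restarts at 1.

```r
# Illustrative data, not taken from the question
df <- data.frame(x = c(10, 20, 30, 40))

# Subsetting keeps the original row names
sub <- df[df$x > 15, , drop = FALSE]
rownames(sub)                      # "2" "3" "4"

# A fresh position index, independent of the row names
sub$rownumber <- seq_len(nrow(sub))
sub$rownumber                      # 1 2 3
```

seq_len(nrow(sub)) is used here instead of 1:nrow(sub) because the latter yields c(1, 0) for a data frame with zero rows.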

Simply:
data$rownumber = 1:nrow(data)

Perhaps with dataframes, one of the easiest and most practical solutions is:
data = dplyr::mutate(data, rownum = dplyr::row_number())

This is probably the simplest way:
data$rownumber = 1:dim(data)[1]
It's probably worth noting that if you want to select a row by its row index, you can do this with simple bracket notation:
data[3,]
vs.
data[data$rownumber==3,]
So I'm not really sure what this new column accomplishes.

Related

Is there a way to retrieve the vectors selected by fcoalesce?

When using fcoalesce, is there any way I can retrieve the indices or names of the selected vectors?
Here is a simplified two-vector example, for the following coalesce of vectors a and b:
library(data.table)
a = c(NA,2,3,4,NA)
b = c(1,3,3,4,5)
fcoalesce(a,b)
[1] 1 2 3 4 5
I'd like to see something like this:
b,a,a,a,b
A real life example could have any number of vectors.
We can use ifelse. With two vectors, coalesce simply takes the first non-NA value for each element. So create a logical condition for the NA elements and pass the object names as the 'yes' and 'no' values:
ifelse(is.na(a), 'b', 'a')
[1] "b" "a" "a" "a" "b"
I managed to solve it by merging all vectors into a data.table (dt_combined) and finding, row by row, the index of the first non-NA column:
apply(dt_combined, 1, function(i){
(1:length(dt_combined))[ which(!is.na(i))[1] ]
})
One could also get the column names instead of the column index:
apply(dt_combined, 1, function(i){
colnames(dt_combined)[ which(!is.na(i))[1] ]
})
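For any number of vectors, the same idea can be vectorised. The sketch below is one possible approach and not part of the answers above: it binds the vectors into a matrix and uses base R's max.col to find the first non-NA column in each row. The container name vecs is hypothetical.

```r
a <- c(NA, 2, 3, 4, NA)
b <- c(1, 3, 3, 4, 5)
vecs <- list(a = a, b = b)   # hypothetical container; add more vectors as needed

m <- do.call(cbind, vecs)    # one column per vector, column names from the list
present <- !is.na(m)         # TRUE where a value is available

# Convert to 0/1 so max.col returns the first column holding a value
first_idx <- max.col(present + 0, ties.method = "first")
first_idx[rowSums(present) == 0] <- NA   # rows where every vector is NA

picked <- colnames(m)[first_idx]
picked
# [1] "b" "a" "a" "a" "b"
```

The order of the list matters, just as the argument order does in fcoalesce: earlier vectors win when several have a value.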

Split dataframe columns into vectors in R

I have a dataframe as such:
Number <- c(1,2,3)
Number2 <- c(10,12,14)
Letter <- c("A","B","C")
df <- data.frame(Number,Number2,Letter)
I would like to split the df into its respective three columns, each one becoming a vector with the respective column name. In essence, the output should look exactly like the original three input vectors in the above example.
I have tried the split function and also using for loop, but without success.
Any ideas? Thank you.
We may use unclass, since a data.frame is a list with additional attributes. Unclassing removes the data.frame class attribute:
unclass(df)
Another option is asplit with MARGIN specified as 2:
asplit(df, 2)
NOTE: Both of them return a named list. asplit first converts the data.frame to a matrix, so with mixed column types (as here) every column comes back as character. If we intend to create new objects in the global env, use list2env (not recommended though).
We can use c or as.list:
> c(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
> as.list(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
Assuming you are trying to create these as vectors in the global environment, use list2env:
df <- data.frame(Number = c(1, 2, 3),
Number2 = c(10, 12, 14),
Letter = c("A", "B", "C"))
list2env(df, .GlobalEnv)
## <environment: R_GlobalEnv>
ls()
## [1] "df" "Letter" "Number" "Number2"
list2env is clearly the easiest way, but if you want to do it with a for loop it can also be achieved.
The "tricky" part is to make a new vector based on the column names inside the for loop. If you just write
names(df[i]) <- input
a vector will not be created.
A workaround is to use paste to build a string containing the new vector name and its contents, then use eval(parse(text = ...)) to evaluate that expression.
Maybe not the most elegant solution, but seems to work.
for (i in colnames(df)){
  vector_name <- names(df[i])
  expression_to_be_evaluated <- paste(vector_name, "<- df[[i]]")
  eval(parse(text=expression_to_be_evaluated))
}
> Letter
[1] A B C
Levels: A B C
> Number
[1] 1 2 3
> Number2
[1] 10 12 14
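If a loop is preferred, assign() can create a variable from a name held in a string, which avoids building and parsing code as text. This is a sketch rather than a recommendation: the environment name target is hypothetical, and a separate environment is used only to keep the example from cluttering the workspace (pass envir = .GlobalEnv to get the behaviour asked for in the question).

```r
df <- data.frame(Number  = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter  = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

target <- new.env()   # hypothetical destination environment

# One variable per column, named after the column
for (nm in names(df)) {
  assign(nm, df[[nm]], envir = target)
}

ls(target)
# [1] "Letter"  "Number"  "Number2"
get("Letter", envir = target)
# [1] "A" "B" "C"
```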

Creating names for a list of doubles to a "List of named Vectors"

What I tried to do:
In aphid package there is a function deriveHMM() which needs to be fed with a list like:
x <- list(c("c"="10.0", "b"="5.0","c"="10.0", "a"="1.0", "a"="2.0",...))
which needs to be created of a very large input vector like
iv <- c(10, 5, 10, 1, 2,...)
It is important, that the order of my original input vector remains unchanged.
I need to automatically create this list by a large input of doubles from a .csv file (import of doubles to R worked fine). Each double has to get a name depending on its closest distance to a predefined value, for example:
all doubles ranging from 0 to 2.5 should be named "a"
all doubles ranging from 2.5 to 7.5 should be named "b"
all doubles greater than 7.5 should be named "c"
and after that all doubles be converted to a character (or string (?)) so the method deriveHMM() accepts the input.
I would be very happy to have suggestions. I am new to R and this is my first post on Stackoverflow.com. I am not an experienced programmer, but I try my best to understand your help.
EDIT:
Updated the question, because what I need is a "List of named vectors of characters", exactly like in my example above without changing the order.
This solution uses findInterval to get an index into a tags vector, the vector of names.
set.seed(1234) # Make the results reproducible
x <- runif(10, 0, 20)
tags <- letters[1:3]
breaks <- c(0, 2.5, 7.5, Inf)
names(x) <- tags[findInterval(x, breaks)]
x
# a c c c c
# 2.2740682 12.4459881 12.1854947 12.4675888 17.2183077
# c a b c c
#12.8062121 0.1899151 4.6510101 13.3216752 10.2850228
Edit.
If you need x to be of class "character", get the index into tags first, then coerce x to character and only then assign the names attribute.
i <- findInterval(x, breaks)
x <- as.character(x)
names(x) <- tags[i]
x
# a c c
# "2.27406822610646" "12.4459880962968" "12.1854946576059"
# c c c
# "12.4675888335332" "17.2183076711372" "12.8062121057883"
# a b c
#"0.189915127120912" "4.65101012028754" "13.321675164625"
# c
# "10.2850228268653"
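Putting the pieces together for the shape described in the question (a list holding one named character vector, in the original order) might look like the sketch below. It uses the first five values of iv from the question. Whether deriveHMM() accepts this exact structure is an assumption based on the question's own example, not something verified here.

```r
iv     <- c(10, 5, 10, 1, 2)        # first values of the question's input vector
breaks <- c(0, 2.5, 7.5, Inf)
tags   <- c("a", "b", "c")

# Look up the interval of each value before converting to character
idx   <- findInterval(iv, breaks)
named <- setNames(as.character(iv), tags[idx])

x <- list(named)                    # list of one named character vector
x
# [[1]]
#    c    b    c    a    a 
# "10"  "5" "10"  "1"  "2" 
```

Note that as.character(10) gives "10", not "10.0". Values that fall exactly on a break go to the upper interval, because findInterval uses intervals that are closed on the left by default.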
Here is an example, where x represents your input vector.
x <- seq(1, 10, 0.5)
The first step is to give your elements names depending on their values.
names(x) <- ifelse(x <= 2.5, "a", ifelse(x > 2.5 & x <= 7.5, "b", "c"))
Next, split your vector and apply as.character. We can use by here.
lst <- by(x, names(x), as.character, simplify = TRUE)
is.list(lst)
# [1] TRUE
Result
lst
#names(x): a
#[1] "1" "1.5" "2" "2.5"
#-----------------------------------------------------------------------------------------------------------------------
#names(x): b
# [1] "3" "3.5" "4" "4.5" "5" "5.5" "6" "6.5" "7" "7.5"
#-----------------------------------------------------------------------------------------------------------------------
#names(x): c
#[1] "8" "8.5" "9" "9.5" "10"
You could also use split and lapply as shown below; by is shorthand for such an approach.
lapply(split(x, names(x)), as.character)

Convert letters to numbers

I have a bunch of letters, and cannot for the life of me figure out how to convert them to their number equivalent.
letters[1:4]
Is there a function
numbers['e']
which returns
5
or something user defined (ie 1994)?
I want to convert all 26 letters to a specific value.
I don't know of a "pre-built" function, but such a mapping is pretty easy to set up using match. For the specific example you give, matching a letter to its position in the alphabet, we can use the following code:
myLetters <- letters[1:26]
match("a", myLetters)
[1] 1
It is almost as easy to associate other values to the letters. The following is an example using a random selection of integers.
# assign values for each letter, here a sample from 1 to 2000
set.seed(1234)
myValues <- sample(1:2000, size=26)
names(myValues) <- myLetters
myValues[match("a", names(myValues))]
a
228
Note also that this method can be extended to ordered collections of letters (strings) as well.
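As a small illustration of that extension, match is vectorised, so a whole string can be converted once it is split into single characters. The variable names word, chars and positions are made up for this sketch.

```r
word  <- "cab"
chars <- strsplit(word, "")[[1]]   # "c" "a" "b"

# Position of each character in the alphabet
positions <- match(chars, letters)
positions
# [1] 3 1 2
```

Characters that are not lowercase letters come back as NA, since match finds no entry for them in letters.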
You could try this function:
letter2number <- function(x) {utf8ToInt(x) - utf8ToInt("a") + 1L}
Here's a short test:
letter2number("e")
#[1] 5
set.seed(123)
myletters <- letters[sample(26,8)]
#[1] "h" "t" "j" "u" "w" "a" "k" "q"
unname(sapply(myletters, letter2number))
#[1] 8 20 10 21 23 1 11 17
The function takes the UTF-8 code of the letter passed to it, subtracts the UTF-8 code of the letter "a", and adds one so that the numbering of the letters starts at 1 rather than 0, in line with R's indexing convention.
The code works because the numeric sequence of the utf8 codes representing letters respects the alphabetic order.
For capital letters you could use, accordingly,
LETTER2num <- function(x) {utf8ToInt(x) - utf8ToInt("A") + 1L}
The which function seems appropriate here.
which(letters == 'e')
#[1] 5
Create a lookup vector and use simple subsetting:
x <- letters[1:4]
lookup <- setNames(seq_along(letters), letters)
lookup[x]
#a b c d
#1 2 3 4
Use unname if you want to remove the names.
thanks for all the ideas, but I am a dumdum.
Here's what I did. Made a mapping from each letter to a specific number, then called each letter
df=data.frame(L=letters[1:26],N=rnorm(26))
df[df$L=='e',2]

Create header of a dataframe from the first row in the data frame

Suppose I have a dataframe:
a <- data.frame(a=c("f", 2, 3), b=c("g", 3, 7), c=c("h", 2, 4))
and I would like to create column names from the first row. My guess was:
names(a) <- a[1,]
which gives:
names(a)
[1] "3" "3" "3"
I did not fully get what is happening. Can anyone explain and help me on how to do it the right way?
The columns of a are factors (the default for strings in data.frame before R 4.0.0), each with its own set of levels. When a[1,] is used as the names, R casts the factors to their integer codes. Since in each column the letter sorts last among the levels, it gets the code 3.
Try
names(a) = as.character(unlist(a[1,]))
names(a)
Try this:
> colnames(a) <- unlist(a[1,])
> a
f g h
1 f g h
2 2 3 2
3 3 7 4
janitor::row_to_names(a,1)
The janitor package gives the cleanest way to turn any data row into column names.
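For completeness, a base R sketch that also drops the promoted row and restores numeric types. It assumes the columns are read as character (stringsAsFactors = FALSE, the default since R 4.0.0). The last two steps are optional extras, not something the question asked for.

```r
a <- data.frame(a = c("f", "2", "3"),
                b = c("g", "3", "7"),
                c = c("h", "2", "4"),
                stringsAsFactors = FALSE)

# Use the first row as column names
names(a) <- as.character(unlist(a[1, ]))

# Drop that row and reset the row names
a <- a[-1, , drop = FALSE]
rownames(a) <- NULL

# Optional: convert the remaining character columns to a suitable type
a[] <- lapply(a, type.convert, as.is = TRUE)
a
#   f g h
# 1 2 3 2
# 2 3 7 4
```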

Resources