How can I order a column of a matrix? - r

I have created a matrix out of two vectors
x<-c(1,118,3,220)
y<-c("A","B","C","D")
z<-c(x,y)
m<-matrix(z,ncol=2)
Now I want to order the rows by the first column, but it doesn't work properly.
I tried:
m[order(m[,1]),]
The order should be 1,3,118,220, but it shows 1,118,220,3

A matrix can hold only one class, which here is character because the data contain "A","B","C","D". The numbers are therefore stored as strings and sorted alphabetically.
If you still want to order the rows of the matrix, extract the first column, convert it to numeric, pass it to order, and use the result to reorder the rows.
m[order(as.numeric(m[, 1])), ]
#      [,1]  [,2]
# [1,] "1"   "A"
# [2,] "3"   "C"
# [3,] "118" "B"
# [4,] "220" "D"
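To see why the original attempt misorders the rows, here is a minimal sketch comparing the ordering of the same values as strings and as numbers. The variable name chr is only for illustration. Strings are compared character by character, so "118" sorts before "3".

```r
# The numeric column as it is stored inside the character matrix
chr <- c("1", "118", "3", "220")

# Ordering the strings compares them character by character
by_string <- chr[order(chr)]

# Converting to numeric first gives the intended ordering
by_number <- chr[order(as.numeric(chr))]

print(by_string)
print(by_number)
```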
Since your data have mixed types, why not store them in a data frame instead? Each column of a data frame keeps its own type, so the numbers stay numeric.
x<-c(1,118,3,220)
y<-c("A","B","C","D")
df <- data.frame(x,y)
df[order(df[,1]),]
#     x y
# 1   1 A
# 3   3 C
# 2 118 B
# 4 220 D

Display identical columns in R dataframe

Suppose I have the following dataframe :
df <- data.frame(A=c(1,2,3),B=c("a","b","c"),C=c(2,1,3),D=c(1,2,3),E=c("a","b","c"),F=c(1,2,3))
> df
A B C D E F
1 1 a 2 1 a 1
2 2 b 1 2 b 2
3 3 c 3 3 c 3
I want to filter out the columns that are identical. I know that I can do it with
DuplCols <- df[duplicated(as.list(df))]
UniqueCols <- df[ ! duplicated(as.list(df))]
My real data frame has more than 500 columns. I do not know how many identical columns of the same kind it contains, and I do not know their names. However, each column name is unique (as in df). My desired result is (ideally) a data frame in which each row stores the column names of one group of identical columns. The number of columns in the DesiredResult data frame is the size of the largest group of identical columns in the original data frame; smaller groups should be padded with NA:
> DesiredResult
X1 X2 X3
1 A D F
2 B E NA
3 C NA NA
(With "identical column of the same kind" I mean the following: in df the columns A, D, F are identical columns of the same kind and B, E are identical columns of the same kind.)
You can use unique to get the distinct columns and then test with %in% which columns match each of them, to extract the column names.
tt <- lapply(unique(as.list(df)), function(x) {colnames(df)[as.list(df) %in% list(x)]})
tt
#[[1]]
#[1] "A" "D" "F"
#
#[[2]]
#[1] "B" "E"
#
#[[3]]
#[1] "C"
t(sapply(tt, "length<-", max(lengths(tt)))) # As a matrix; wrap in as.data.frame() for a data frame
#      [,1] [,2] [,3]
# [1,] "A"  "D"  "F"
# [2,] "B"  "E"  NA
# [3,] "C"  NA   NA
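Putting the steps above together, a self-contained sketch that ends with a data frame shaped like DesiredResult could look like this. The names cols, width and padded are introduced here for illustration, and the X1, X2, ... column names are set by hand to match the question.

```r
df <- data.frame(A = c(1, 2, 3), B = c("a", "b", "c"), C = c(2, 1, 3),
                 D = c(1, 2, 3), E = c("a", "b", "c"), F = c(1, 2, 3))

cols <- as.list(df)

# One entry per distinct column, holding the names of all columns equal to it
tt <- unname(lapply(unique(cols), function(x) names(df)[cols %in% list(x)]))

# Pad every group with NA up to the size of the largest group
width <- max(lengths(tt))
padded <- t(sapply(tt, `length<-`, width))

# Convert the character matrix to a data frame with names X1, X2, ...
DesiredResult <- as.data.frame(padded, stringsAsFactors = FALSE)
names(DesiredResult) <- paste0("X", seq_len(width))
print(DesiredResult)
```

Note that sapply only returns a matrix here because every padded group has the same length, which the padding step guarantees.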

Sort dataframe with multiple columns for multiple years

I have a data.frame with multiple columns and first column being Year. I want to sort my data frame in descending order for each year. I have fifteen years of data and then over 3000 columns.
I illustrate as follows:
Year A B C D
2000 2 3 4 NA
2001 3 4 NA 1
This is the desired output. My data frame contains NAs as well, and I cannot remove those beforehand.
Year C B A
2000 4 3 2
Year B A D
2001 4 3 1
And this version as well:
Year
2000 C B A
2001 B A D
I have written this code:
Asc <- order(df[-1], decreasing=TRUE)
But I'm unable to obtain my desired output. I have looked at "In R, sort row data in ascending order", but it is still different from what I'm looking for.
I would appreciate your help in this regard.
We can use apply with MARGIN=1. We loop through the rows of the dataset (excluding the first column), get the index of the non-NA elements ('i1'), order the non-NA values in descending order ('i2'), and use that to rearrange the column names of the dataset. Here df1 is the data frame from the question.
m1 <- t(apply(df1[-1], 1, function(x) {
  i1 <- !is.na(x)
  i2 <- order(-x[i1])
  names(df1)[-1][i1][i2]
}))
m1
# [,1] [,2] [,3]
#[1,] "C" "B" "A"
#[2,] "B" "A" "D"
If we need the values as well as the names, a list is more suitable, because it keeps the values numeric instead of coercing everything to a single class.
lst <- apply(df1[-1], 1, function(x) {
  i1 <- !is.na(x)
  list(sort(x[i1], decreasing=TRUE))
})
lst
#[[1]]
#[[1]][[1]]
#C B A
#4 3 2
#[[2]]
#[[2]][[1]]
#B A D
#4 3 1
We can extract the names or the elements from the 'lst'
do.call(rbind, do.call(`c`,rapply(lst, names,
how='list')))
# [,1] [,2] [,3]
#[1,] "C" "B" "A"
#[2,] "B" "A" "D"
Or
t(sapply(do.call(c, lst), names))
and the values as
t(simplify2array(do.call(c, lst)))
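The matrix results above rely on every year having the same number of non-NA values. As a hedged sketch for the more general case, the ranked names can be padded with NA before binding them together. The data frame below reconstructs the question's example, and the names ranked, width and the Rank1, Rank2, ... columns are assumptions made for illustration.

```r
# Example data reconstructed from the question
df1 <- data.frame(Year = c(2000, 2001),
                  A = c(2, 3), B = c(3, 4), C = c(4, NA), D = c(NA, 1))

# For each row, return the column names ordered by descending value
ranked <- lapply(seq_len(nrow(df1)), function(i) {
  x <- unlist(df1[i, -1])              # named numeric vector for one year
  names(sort(x, decreasing = TRUE))    # sort() drops NA by default
})

# Pad shorter rankings with NA so that all rows have equal length
width <- max(lengths(ranked))
m <- do.call(rbind, lapply(ranked, `length<-`, width))
colnames(m) <- paste0("Rank", seq_len(width))

result <- data.frame(Year = df1$Year, m, stringsAsFactors = FALSE)
print(result)
```

This gives the second layout asked for in the question, with one row per year and the column names in rank order.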

From list of characters to matrix/data-frame of numeric (R)

I have a long list, whose elements are lists of length one containing a character vector. These vectors can have different lengths.
The elements of the vectors are of type character, but I would like to convert them to numeric, as they actually represent numbers.
I would like to create a matrix, or a data frame, whose rows are the vectors above, converted into numeric. Since they have different lengths, the "right ends" of each row could be filled with NA.
I am trying to use the function rbind.fill.matrix from the library {plyr}, but the only thing I could get is a long numeric 1-d array with all the numbers inside, instead of a matrix.
This is the best I could do to get a list of numeric (dat here is my original list):
dat<-sapply(sapply(dat,unlist),as.numeric)
How can I create the matrix now?
Thank you!
I would do something like:
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
The basic idea is that stri_list2matrix will convert the list to a matrix, but it would still be a character matrix. as.numeric would remove the dimensional attributes of the matrix, so we add those back in with:
`dim<-` ## Yes, the backticks are required -- or at least quotes
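A small sketch of the point about dimensions, using a made-up 2 x 2 character matrix named m. It also shows storage.mode<- as one alternative that converts in place while preserving attributes; that alternative is not part of the answer above.

```r
m <- matrix(c("1", "2", "3", "4"), nrow = 2)

# as.numeric() returns a plain vector: the dim attribute is gone
flat <- as.numeric(m)
print(dim(flat))

# Re-attaching the dimensions, as in the answer
restored <- `dim<-`(as.numeric(m), dim(m))

# Alternative: change the storage mode in place, which keeps dim
n <- m
storage.mode(n) <- "double"

print(restored)
print(n)
```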
POC:
dat <- list(1:2, 1:3, 1:2, 1:5, 1:6)
dat <- lapply(dat, as.character)
dat
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "1" "2" "3"
#
# [[3]]
# [1] "1" "2"
#
# [[4]]
# [1] "1" "2" "3" "4" "5"
#
# [[5]]
# [1] "1" "2" "3" "4" "5" "6"
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
final
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    2   NA   NA   NA   NA
# [2,]    1    2    3   NA   NA   NA
# [3,]    1    2   NA   NA   NA   NA
# [4,]    1    2    3    4    5   NA
# [5,]    1    2    3    4    5    6
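If installing stringi is not an option, a base R sketch along the same lines might look like the following. It assumes the structure described in the question (a list whose elements are lists of length one holding a character vector); the sample data and the names vecs and width are made up for illustration.

```r
# Sample input shaped like the question describes
dat <- list(list(c("1", "2")), list(c("3", "4", "5")), list("6"))

# Flatten each element and convert it to numeric
vecs <- lapply(dat, function(el) as.numeric(unlist(el)))

# Pad each vector with NA on the right, then stack the vectors as rows
width <- max(lengths(vecs))
final <- do.call(rbind, lapply(vecs, `length<-`, width))
print(final)
```

Because the conversion happens before the matrix is built, the result is numeric from the start and no dimensions need to be restored.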

Using r to read elements of a 2 column vector and return matches with corresponding 1st col name

I need to take a two column vector in r. In the first column I have dates, 12/21/2011 format and I have a number in the second column, 255 format.
I need to take a number of my choosing say 255 and see if it matches any numbers in my second column. If it does match I need to return the date it matched on.
I know about match, count, in etc. I just cannot seem to put it together. I am a newbie perhaps this is a bit beyond my ability but I figure that if I learn something I'll be that much better for it.
There are some partial matches in the questions but nothing as detailed as what I want.
If anyone has any examples that will teach me I'd be more than happy. If you have a reference in a book I will do that myself if you tell me the reference.
Thank you very much. I'm using R 2.13.1 in a Windows XP SP3 environment.
Getting acquainted with indexing in R will help you with this task (and many others), without the need for additional functions. To select only certain rows and columns in a matrix or dataframe, the format is x[rows,columns], where leaving either rows or columns blank displays all.
In your case, this is what we could do. First, let's create an example matrix (note that a '2 column vector' is actually a matrix):
x <- cbind(c("12/11/11", "12/10/11", "10/16/11", "11/07/11"),
           c(1, 255, 3, 255))
# [,1] [,2]
#[1,] "12/11/11" "1"
#[2,] "12/10/11" "255"
#[3,] "10/16/11" "3"
#[4,] "11/07/11" "255"
Using a logical vector in your row index, you can return only the rows that contain a certain value. For instance, here's a logical vector for any row where column 2 equals 255 (x is a character matrix, so 255 is coerced to "255" for the comparison):
x[,2] == 255
#[1] FALSE TRUE FALSE TRUE
Inserting this logical vector into your row index will return only rows labeled TRUE.
x[x[,2]==255,]
# [,1] [,2]
#[1,] "12/10/11" "255"
#[2,] "11/07/11" "255"
To show only the dates, specify column 1 in your index:
x[x[,2]==255,1]
#[1] "12/10/11" "11/07/11"
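The same indexing works on a data frame, which lets the dates be real Date values and the numbers stay numeric. This is only a sketch: the data frame name readings and its column names date and value are assumptions, and the dates use the four-digit-year format mentioned in the question.

```r
readings <- data.frame(
  date  = as.Date(c("12/11/2011", "12/10/2011", "10/16/2011", "11/07/2011"),
                  format = "%m/%d/%Y"),
  value = c(1, 255, 3, 255)
)

# Logical index on the value column, returning only the matching dates
hits <- readings$date[readings$value == 255]

# Print the dates in the original month/day/year layout
print(format(hits, "%m/%d/%Y"))
```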
You might want to look at subset().
> x1 <- rnorm(20)*10
> y1 <- rnorm(20)*5
> z1 <- cbind(round(abs(x1),0), round(abs(y1),0)) ## just creates 2 columns of data.
> z1
[,1] [,2]
[1,] 9 1
[2,] 6 6
[3,] 3 7
[4,] 10 0
[5,] 9 2
[6,] 7 7
[7,] 7 10
[8,] 3 1
[9,] 6 10
[10,] 6 5
[11,] 0 11
[12,] 5 0
[13,] 0 8
[14,] 2 4
[15,] 1 2
[16,] 3 3
[17,] 9 7
[18,] 12 4
[19,] 1 1
[20,] 6 3
> ss1 <- subset(z1, z1[,2]==2) ## creates subset of 'z1' where column 2 equals 2.
> ss1 ## shows contents of ss1
[,1] [,2]
[1,] 9 2
[2,] 1 2
Also consider using merge. Put your column of values and your column of dates in a data frame, then merge it with another data frame that holds the value (or values) you want to look up. By default, merge returns only the rows whose values appear in both data frames. You can set the all, all.x or all.y arguments to keep the rows that do not match; they will contain missing values to show that no match was found.
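A minimal sketch of the merge approach, assuming a data frame named readings with columns date and value, and a lookup data frame named lookup (all names are hypothetical). The lookup value 7 is deliberately absent from the data to show what a non-match looks like.

```r
readings <- data.frame(
  date  = c("12/11/2011", "12/10/2011", "10/16/2011", "11/07/2011"),
  value = c(1, 255, 3, 255),
  stringsAsFactors = FALSE
)
lookup <- data.frame(value = c(255, 7))

# Default: keep only the rows whose value occurs in both data frames
matched <- merge(readings, lookup, by = "value")

# all.y = TRUE keeps lookup values without a match; their date is NA
with_missing <- merge(readings, lookup, by = "value", all.y = TRUE)

print(matched)
print(with_missing)
```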

Getting a row from a data frame as a vector in R

I know that to get a row from a data frame in R, we can do this:
data[row,]
where row is an integer. But that returns an awkward-looking data structure in which every value is labeled with its column name. How can I get a row as a plain vector of values?
In R versions before 4.0.0, data frames created by importing data from an external source have their character columns converted to factors by default. If you do not want this, set stringsAsFactors=FALSE (this is the default from R 4.0.0 onward).
In this case to extract a row or a column as a vector you need to do something like this:
as.numeric(as.vector(DF[1,]))
or like this
as.character(as.vector(DF[1,]))
You can't necessarily get it as a vector because each column might have a different mode. You might have numerics in one column and characters in the next.
If you know the mode of the whole row, or can convert to the same type, you can use the mode's conversion function (for example, as.numeric()) to convert to a vector. For example:
> state.x77[1,]
Population Income Illiteracy Life Exp Murder HS Grad Frost
3615.00 3624.00 2.10 69.05 15.10 41.30 20.00
Area
50708.00
> as.numeric(state.x77[1,])
[1] 3615.00 3624.00 2.10 69.05 15.10 41.30 20.00 50708.00
This would work even if some of the columns were integers, although they would be converted to numeric floating-point numbers.
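state.x77 is a matrix, so the same idea on an actual data frame needs the list structure removed as well. One way to sketch it is with unlist; the data frame scores below is made up for illustration.

```r
# A data frame with one double column and one integer column
scores <- data.frame(a = c(1.5, 2), b = c(3L, 4L))

# unlist() flattens the one-row data frame into an atomic vector;
# the integer is promoted to double so that all values share one type
row1 <- unlist(scores[1, ], use.names = FALSE)
print(row1)
```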
There is a problem with what you propose; namely that the components of data frames (what you call columns) can be of different data types. If you want a single row as a vector, that must contain only a single data type - they are atomic vectors!
Here is an example:
> set.seed(2)
> dat <- data.frame(A = 1:10, B = sample(LETTERS[1:4], 10, replace = TRUE))
> dat
A B
1 1 A
2 2 C
3 3 C
4 4 A
5 5 D
6 6 D
7 7 A
8 8 D
9 9 B
10 10 C
> dat[1, ]
A B
1 1 A
If we force it to drop the empty dimension (drop = TRUE), the only recourse for R is to convert the row to a list to maintain the disparate data types.
> dat[1, , drop = TRUE]
$A
[1] 1
$B
[1] A
Levels: A B C D
The only logical solution to this is to get the data frame into a common type by coercing it to a matrix. This is done via data.matrix(), for example:
> mat <- data.matrix(dat)
> mat[1,]
A B
1 1
data.matrix() converts factors to their internal numeric codes. The above allows the first row to be extracted as a vector.
However, if you have character data in the data frame, the only recourse is to create a character matrix, which may or may not be useful. data.matrix() can no longer be used in that case; we need as.matrix() instead:
> dat$String <- LETTERS[1:10]
> str(dat)
'data.frame': 10 obs. of 3 variables:
$ A : int 1 2 3 4 5 6 7 8 9 10
$ B : Factor w/ 4 levels "A","B","C","D": 1 3 3 1 4 4 1 4 2 3
$ String: chr "A" "B" "C" "D" ...
> mat <- data.matrix(dat)
Warning message:
NAs introduced by coercion
> mat
A B String
[1,] 1 1 NA
[2,] 2 3 NA
[3,] 3 3 NA
[4,] 4 1 NA
[5,] 5 4 NA
[6,] 6 4 NA
[7,] 7 1 NA
[8,] 8 4 NA
[9,] 9 2 NA
[10,] 10 3 NA
> mat <- as.matrix(dat)
> mat
A B String
[1,] " 1" "A" "A"
[2,] " 2" "C" "B"
[3,] " 3" "C" "C"
[4,] " 4" "A" "D"
[5,] " 5" "D" "E"
[6,] " 6" "D" "F"
[7,] " 7" "A" "G"
[8,] " 8" "D" "H"
[9,] " 9" "B" "I"
[10,] "10" "C" "J"
> mat[1, ]
A B String
" 1" "A" "A"
> class(mat[1, ])
[1] "character"
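To contrast the two coercions on a single row, here is a small sketch with a made-up data frame. It also shows one way to get a character vector that keeps the factor labels rather than their codes, by converting each column of the row with as.character.

```r
dat <- data.frame(A = 1:3, B = factor(c("x", "y", "x")))

# data.matrix() replaces the factor by its internal integer codes
codes <- data.matrix(dat)[1, ]
print(codes)

# Converting column by column keeps the factor labels
row_chr <- vapply(dat[1, ], as.character, character(1))
print(row_chr)
```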
How about this?
library(tidyverse)
dat <- as_tibble(iris)
pulled_row <- dat %>% slice(3) %>% flatten_chr()
If you know that all the values have the same type, use the matching flatten_xxx function (for example flatten_dbl or flatten_int).
Otherwise, I think flatten_chr() is the safer choice.
As user "Reinstate Monica" notes, this problem has two parts:
A data frame will often have different data types in each column, which need to be coerced to a common type such as character strings.
Even after coercing the columns to character format, the data.frame "shell" needs to be stripped off to create a vector, via a command like unlist.
With a combination of dplyr and base R this can be done in two lines. First, mutate_all converts all columns to character format. Second, the unlist command extracts the vector from the data.frame structure.
My particular issue was that the second line of a csv included the actual column names. So, I wanted to extract the second row to a vector and use that to assign column names. The following worked to extract the row as a character vector:
library(dplyr)
data_col_names <- data[2, ] %>%
mutate_all(as.character) %>%
unlist(., use.names=FALSE)
# example of using extracted row to rename cols
names(data) <- data_col_names
# only for this example, you'd want to remove row 2
# data <- data[-2, ]
(Note: Using as.character() in place of unlist will work too but it's less intuitive to apply as.character twice.)
The shortest variant I can see is
c(t(data[row,]))
However, if at least one column in data holds strings, it will return a character vector.
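A quick sketch of both cases, using two made-up data frames named nums and mixed. t() first converts the one-row data frame to a matrix, and c() then drops the dimensions.

```r
nums  <- data.frame(a = c(1, 2), b = c(10, 20))
mixed <- data.frame(a = c(1, 2), s = c("u", "v"), stringsAsFactors = FALSE)

# All columns numeric: the result is a numeric vector
num_row <- c(t(nums[2, ]))
print(num_row)

# One character column: every value is converted to a string
chr_row <- c(t(mixed[2, ]))
print(chr_row)
```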
