Sorting changes list to integer in R

I have a list, and when I apply sort() it changes the type to 'integer', which I don't understand. Any help is appreciated.
myfile.csv is a single column with values {"a","a","c","b","c","a"}
The code is as follows:
temp <- read.csv("myfile.csv",header=TRUE)
typeof(temp) ## prints: "list"
temp2 <- sort(temp[,1])
typeof(temp2) ## prints: "integer"
Now I can't refer to elements in temp2 using temp2[1,] or temp2[2,]; I get the error:
Error in `[.default`(temp3, 1, ) : incorrect number of dimensions

Use this command and temp2 will be a data frame with sorted values:
temp2 <- temp[order(temp[ , 1]), , drop = FALSE]

temp2 <- sort(temp[,1]) takes the first column of the data.frame temp, sorts it, and assigns the result to temp2. The result is an atomic vector (possibly with additional attributes) because data.frame columns are atomic vectors (possibly with additional attributes). A vector has no dimensions, so two-index forms such as temp2[1,] fail. If you want the first element of temp2, use temp2[1]. You should study help("[").
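To make the two answers above concrete, here is a minimal sketch. It assumes the column was read as a factor, which was the read.csv default for character columns before R 4.0; the data frame below is a stand-in for myfile.csv, and the names V1, sorted_vec and sorted_df are made up for illustration.

```r
# Stand-in for read.csv("myfile.csv") where the column is read as a factor
# (an assumption; the original file is not available here).
temp <- data.frame(V1 = factor(c("a", "a", "c", "b", "c", "a")))

# Sorting the column gives a factor: integer codes plus a levels attribute.
sorted_vec <- sort(temp[, 1])
typeof(sorted_vec)   # the storage type of a factor is "integer"
class(sorted_vec)    # "factor"
sorted_vec[1]        # one-index subsetting works on a vector

# Reordering the rows keeps the data frame, so two-index subsetting works.
sorted_df <- temp[order(temp[, 1]), , drop = FALSE]
class(sorted_df)     # "data.frame"
sorted_df[1, ]
```

So the "integer" reported by typeof() is just how a factor is stored; the values are still the labels a, b and c.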

Related

How do I replace a value in a matrix/dataframe if that value is less than the corresponding value in a list?

I have a (20x12)matrix with numerical values and a list of 12 numbers. If a value in the matrix is less than the value in the list in the corresponding column index, I would like to replace it. How can I do it?
mat <- matrix(rpois(240,10),ncol=12)
list_to_replace <- rpois(12,10)
I think this is what is desired. Use the same logical index to pick out the positions of the possible replacements and the re-assignments:
t( apply(mat, 1, function(r) {
r[ r < list_to_replace] <- list_to_replace[ r < list_to_replace]; r}) )
The t() is needed to transpose back because apply() returns its per-row results as columns, even when the function is applied row-wise.
BTW, you would be well advised to use the term "list" only when referring to an R object with class "list". What you have is a "vector".
You could use the code below:
index <- t(t(mat) < list_to_replace)
mat[index] <- list_to_replace[which(index, TRUE)[, 2]]
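If the replacement value is the threshold itself, as in both answers above, the same result can probably be had with an elementwise maximum. This is only a sketch of an alternative, not what either answer proposed; thresholds is a made-up name standing in for list_to_replace, and the seed is there only to make the run repeatable.

```r
set.seed(1)  # only so the sketch is repeatable
mat <- matrix(rpois(240, 10), ncol = 12)
thresholds <- rpois(12, 10)  # hypothetical name for list_to_replace

# col(mat) gives the column index of every cell, so thresholds[col(mat)]
# lines each cell up with the threshold of its own column.
raised <- pmax(mat, thresholds[col(mat)])

# Same computation with the apply() approach, for comparison.
via_apply <- t(apply(mat, 1, function(r) {
  idx <- r < thresholds
  r[idx] <- thresholds[idx]
  r
}))

all(raised == via_apply)
```

pmax() copies the dim attribute from its first argument, so raised is still a 20x12 matrix.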

Subset data.table based on value in column of type list

So I have this case currently of a data.table with one column of type list.
This list can contain different values, NULL among other possible values.
I tried to subset the data.table to keep only rows for which this column has the value NULL.
Behold... my attempts below (for the example I named the column "ColofTypeList"):
DT[is.null(ColofTypeList)]
It returns me an Empty data.table.
Then I tried:
DT[ColofTypeList == NULL]
It returns the following error (I expected an error):
Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), :
RHS of == is length 0 which is not 1 or nrow (96). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.
(A note: my original data.table contains 96 rows, which is why the error message says "which is not 1 or nrow (96)". The number of rows is not the point.)
Then I tried this:
DT[ColofTypeList == list(NULL)]
It returns the following error:
Error: comparison of these types is not implemented
I also tried to give a list of the same length as the column, and got this same last error.
So my question is simple: What is the correct data.table way to subset the rows for which elements of this "ColofTypeList" are NULL ?
EDIT: here is a reproducible example
DT<-data.table(Random_stuff=c(1:9),ColofTypeList=rep(list(NULL,"hello",NULL),3))
Have fun!
If it is a list, we can loop through the list and apply the is.null to return a logical vector
DT[unlist(lapply(ColofTypeList, is.null))]
# ColofTypeList anotherCol
#1: 3
Or another option is lengths
DT[lengths(ColofTypeList)==0]
data
DT <- data.table(ColofTypeList = list(0, 1:5, NULL, NA), anotherCol = 1:4)
I have found another way that is also quite nice:
DT[lapply(ColofTypeList, is.null)==TRUE]
It is also important to mention that using isTRUE() doesn't work.
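A guess at why isTRUE() fails here: it asks whether its whole argument is a single TRUE, so it returns one FALSE for a longer vector instead of one value per row. The sketch below uses a plain list with the same values as ColofTypeList in the question's example, so it runs without data.table; col_values and is_null are names made up for illustration.

```r
# Same values as ColofTypeList in the reproducible example above.
col_values <- rep(list(NULL, "hello", NULL), 3)

# One TRUE/FALSE per element; vapply() also checks that each result
# is a single logical value.
is_null <- vapply(col_values, is.null, logical(1))
which(is_null)

# isTRUE() collapses the whole vector into a single FALSE.
isTRUE(is_null)

# With data.table loaded, the same logical vector can be used as the row
# filter (not run here):
# DT[vapply(ColofTypeList, is.null, logical(1))]
```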

How to extract second value from a list of lists and give NA to lists with no second value?

I'd like to extract the first and second values from a list of lists. I was able to extract the first value with no issue. However, I get an error when I try to extract the second value, because not all lists in the suggestion column have more than one value. How can I extract the second value from the suggestion column in mydf_1 and generate NA for those with no second value?
Below is the code I wrote to get the first suggestion, but when I do
mydf_1$second_suggestion <- lapply(mydf_1$suggestion, `[[`, 2)
it gives this error:
Error in FUN(X[[i]], ...) : subscript out of bounds
Thanks.
# create a data frame contains words
mydf <- data.frame("words"=c("banna", "pocorn and drnk", "trael", "rabbitt",
"emptey", "ebay", "templete", "interne", "bing",
"methog", "tullius"), stringsAsFactors=FALSE)
# add a custom word to the dictionary
library(hunspell)
mydict_hunspell <- dictionary(lang="en_US", affix=NULL, add_words="bing",
cache=TRUE)
# use hunspell to identify misspelled words and create a row number column
# for later uses
mydf$words_checking <- hunspell(mydf$words, dict=mydict_hunspell)
mydf$row_num <- rownames(mydf)
# unlist the words_checking column and get suggestions for those misspelled
# words in another data frame
library(tidyr)
mydf_1 <- unnest(mydf, words_checking)
mydf_1$suggestion <- hunspell_suggest(mydf_1$words_checking)
# extract first suggestion from suggestion column
mydf_1$first_suggestion <- lapply(mydf_1$suggestion, `[[`, 1)
You can check the length of each list first before trying to extract the element of interest. Also, I recommend using sapply so that you have a character vector returned, as opposed to another list.
For the first suggestion:
index <- 1
sapply(mydf_1$suggestion, function(x) {if(length(x) < index) {NA} else {x[[index]]}})
And for the second suggestion and so on:
index <- 2
sapply(mydf_1$suggestion, function(x) {if(length(x) < index) {NA} else {x[[index]]}})
This could be wrapped into a larger function with a bit more code if you need to automate...
In theory, you could test with is.null (see "How to test if list element exists?"), but I still got the same error trying that approach.
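The length check above can be wrapped into a small helper. This is a minimal sketch: nth_or_na is a made-up name, and the suggestions list is invented sample data, not real hunspell output. It also shows that single-bracket indexing returns NA past the end of a character vector, which is likely why is.null() never gets a chance with `[[`: the out-of-bounds error is raised first.

```r
# Hypothetical helper: n-th element of x, or NA if x is too short.
nth_or_na <- function(x, n) {
  if (length(x) < n) NA_character_ else x[[n]]
}

# Invented stand-in for mydf_1$suggestion (a list of character vectors).
suggestions <- list(c("banana", "bandana"), "travel", character(0))

# vapply() guarantees a character vector with one entry per list element.
first_suggestion  <- vapply(suggestions, nth_or_na, character(1), n = 1)
second_suggestion <- vapply(suggestions, nth_or_na, character(1), n = 2)

# Single-bracket indexing gives NA instead of an error when n is too large.
second_alt <- vapply(suggestions, function(x) x[2], character(1))
identical(second_suggestion, second_alt)
```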

Comparison of character type and factor type in R

OK, so I am having this issue right now. I have a matrix A whose rownames are the values of a field in another matrix B, and I want to find the indices of my rownames in B. I am trying to do this with which(A$field == rowname_A). Unfortunately, two things get in the way. First, the rowname_A variable is of class character and has the format "X12345". Second, the values of A$field are of type factor. Is there a way to remove the leading X from the character value, convert it to a factor, and do the comparison? Or to convert the factor values of A$field to character and then do the comparison?
Help will be appreciated.
Thanks.
This is fairly straightforward. The example below should help you out.
A <- matrix(1:3)
rownames(A) <- paste0("X", 1:3)
B <- data.frame(field = factor(1:3))
# Remove "X" from rownames(A) and check equality
B$field %in% substr(rownames(A), 2, nchar(rownames(A)))
# Add "X" to B$field and check equality
paste0("X", B$field) %in% rownames(A)
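The %in% checks above say whether a match exists; if the row indices are what is needed, match() on character values may be closer to the question. A minimal sketch, with B$field deliberately shuffled so the positions are not trivial; keys and positions are names made up for illustration.

```r
A <- matrix(1:3)
rownames(A) <- paste0("X", 1:3)
B <- data.frame(field = factor(c(3, 1, 2)))  # shuffled on purpose

# Strip the leading "X" and compare as character on both sides.
keys <- sub("^X", "", rownames(A))
positions <- match(keys, as.character(B$field))
positions  # row of B that holds each rowname of A, NA if absent

# Comparing a factor directly with a character string uses the labels.
which(B$field == keys[1])
```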

Losing Class information when I use apply in R

When I pass a row of a data frame to a function using apply, I lose the class information of the elements of that row. They all turn into 'character'. The following is a simple example. I want to add a couple of years to the 3 stooges' ages. When I try to add 2 to a value that had been numeric, R says "non-numeric argument to binary operator." How do I avoid this?
age = c(20, 30, 50)
who = c("Larry", "Curly", "Mo")
df = data.frame(who, age)
colnames(df) <- c( '_who_', '_age_')
dfunc <- function (er) {
  print(er['_age_'])
  print(er[2])
  print(is.numeric(er[2]))
  print(class(er[2]))
  return (er[2] + 2)
}
a <- apply(df,1, dfunc)
Output follows:
_age_
"20"
_age_
"20"
[1] FALSE
[1] "character"
Error in er[2] + 2 : non-numeric argument to binary operator
apply only really works on matrices (which have the same type for all elements). When you run it on a data.frame, it simply calls as.matrix first.
The easiest way around this is to work on the numeric columns only:
# skips the first column
a <- apply(df[, -1, drop=FALSE],1, dfunc)
# Or in two steps:
m <- as.matrix(df[, -1, drop=FALSE])
a <- apply(m,1, dfunc)
The drop=FALSE is needed so that a single remaining column stays a data frame instead of being dropped to a plain vector.
-1 means all but the first column. You could instead explicitly specify the columns you want, for example df[, c('foo', 'bar')].
UPDATE
If you want your function to access one full data.frame row at a time, there are (at least) two options:
# "loop" over the index and extract a row at a time
sapply(seq_len(nrow(df)), function(i) dfunc(df[i,]))
# Use split to produce a list where each element is a row
sapply(split(df, seq_len(nrow(df))), dfunc)
The first option is probably better for large data frames since it doesn't have to create a huge list structure upfront.
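To see the coercion described above, and one way to sidestep it, here is a minimal sketch. It uses plain column names (who, age) rather than the question's '_who_' and '_age_', and the variable names are made up for illustration.

```r
df <- data.frame(who = c("Larry", "Curly", "Mo"), age = c(20, 30, 50))

# apply() starts from as.matrix(df); a matrix holds one type only, so the
# numeric column is converted to character along with the names.
m <- as.matrix(df)
typeof(m)

# If only the numeric column matters, work on it directly.
older <- df$age + 2

# If the function needs several columns per row, mapply() passes each
# column separately and keeps its type.
older_by_row <- mapply(function(who, age) age + 2, df$who, df$age)
```

mapply() names its result after the first character argument, so older_by_row carries the stooges' names.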
