How to get a matrix element without the column name in R?

This seems to be simple but I can't find the answer.
I combine two vectors using cbind().
> first = c(1:5)
> second = c(6:10)
> values = cbind(first,second)
When I want to retrieve a single element using values[1,2] I always get the column name in addition to the actual element.
> values[1,2]
second
6
How can I get the value without the column name?
I know I can remove the column names in the matrix like in this post: How to remove column names from a matrix in R? But how can I leave the matrix as is and only get the value I want?

We can use unname
unname(values[1,2])
#[1] 6
Or as.vector
as.vector(values[1,2])
You can use the [[ operator to extract a single element:
values[[1,2]]
# [1] 6
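As a quick check of the three suggestions above, here is a minimal sketch that rebuilds the matrix from the question and compares the results. The variable names `named` and `plain` are only illustrative.

```r
first <- 1:5
second <- 6:10
values <- cbind(first, second)

named <- values[1, 2]     # single-bracket indexing keeps the column name
plain <- values[[1, 2]]   # double-bracket indexing returns the bare element

names(named)                         # "second"
is.null(names(plain))                # TRUE
identical(unname(named), plain)      # TRUE
identical(as.vector(named), plain)   # TRUE
```

All three approaches leave the matrix itself untouched; only the extracted value loses its name.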

Related

R: Get column size using index

I'd like to get the size of a column using the index. I tried using the length() function with the column index inside, but it doesn't work:
length(bd[7])
I'm sorry if this is too basic, I'm new to R. Thank you!
bd[7] is still a data.frame with a single column, and length() of a data.frame is the number of columns. We need to extract the column as a vector and then use length. How to extract the column depends on the class: for a data.frame or matrix, bd[,7] drops the dimensions and returns a vector, but that is not the case for a data.table or tibble. However, the list-like classes (data.frame, data.table, tibble) all work with either $ or [[
length(bd[[7]])
Or, if it is a data.frame or a vector, NROW would still work:
NROW(bd[7])
i.e.
> NROW(1:7)
[1] 7
> NROW(data.frame(col1 = 1:7))
[1] 7
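To make the difference concrete, here is a small sketch with a made-up data frame (the original bd is not shown, so the column names `id` and `label` are assumptions for illustration only).

```r
# Hypothetical stand-in for bd
bd <- data.frame(id = 1:3, label = c("a", "b", "c"))

length(bd[2])      # 1: a one-column data.frame has length 1
length(bd[[2]])    # 3: the column extracted as a vector
length(bd$label)   # 3: same column, extracted by name
NROW(bd[2])        # 3: NROW counts rows for data frames and elements for vectors
```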

R shifting values of dataframe to the left while preserving headers

I have a csv file with headers in the form :
a,b,c,d
1,6,5,6,8
df <- read_csv("test.csv")
The value 1 in the example is incorrect. To correct the file, I'd like to drop the 1 and shift all the other values to the left while preserving the column names, ending with:
a,b,c,d
6,5,6,8
How can I achieve that ?
What about this:
headers <- names(df)
new_df <- df[, 2:length(df)]
names(new_df) <- headers[-length(headers)] # new_df has one column fewer, so drop the last name
In one line of code, the structure command creates an object and assigns attributes:
structure(df[,2:length(df)], names = names(df)[1:(length(df)-1)])
Recognizing that a data.frame is a list of equal-length vectors, where each vector represents a column, the following will also work:
structure(df[2:length(df)], names = names(df)[1:(length(df)-1)])
Note that there is no comma in df[2:length(df)].
Also, I like the trick of removing items from a vector or list using a negative index. So I think an even cleaner bit of code is:
structure(df[-1], names = names(df)[-length(df)])
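A small sketch of the negative-index approach, assuming the misaligned file ended up in a data frame with one extra trailing column. The column name `extra` is made up for illustration, and how your reader actually names or handles the surplus field may differ. The sketch uses `setNames()` rather than `structure()`, which performs the same renaming here.

```r
# Hypothetical result of reading the misaligned file
df <- data.frame(a = 1, b = 6, c = 5, d = 6, extra = 8)

# Drop the first column, then reuse all names except the last one
shifted <- setNames(df[-1], names(df)[-length(df)])

names(shifted)    # "a" "b" "c" "d"
unlist(shifted)   # the values 6 5 6 8 under the original headers
```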

Vector from tibble has length 0

I have a tibble ('df') with
> dim(df)
[1] 55 144
of which I extract a vector test <- c(df[,39]). I would expect the following result:
> length(test)
[1] 55
as I basically took column 39 from my tibble. Instead, I get
> length(test)
[1] 1
Now, class(test) yielded list, so I thought the class might be the reason; however, with class set to char, I get the same result.
I'm especially confused since length(df[39,]) yields [1] 155.
Background is I am searching in the vector using grep, which doesn't work with a vector taken from a column. Of course, as I am trying to recode all lines in my tibble, I can recode them by row instead of by column, so I think there is a workaround. However, what causes R to assume that test has length 1? What is the difference in the treatment of rows and columns?
Whenever you apply the [ operator to a tibble, it always returns another tibble. This is one of the differences between a tibble and a base R data.frame.
For example:
library(tibble)
a <- 1:5
df = tibble(a,b=a*2,c=a^2)
df2 = as.data.frame(df) # convert to base data.frame
df[,2] # gives a tibble, its dim is 5 1
df2[,2] # gives a vector, its dim is NULL, its length is 5.
As you can see, subsetting the data.frame changes the return type from a data.frame to a vector, whereas the tibble is designed to keep the input and output types consistent.
There are two ways to process a column of a tibble as a vector:
pull()
[[ ]]
Personally, I use pull() (from dplyr), which is also very intuitive.
Why does length(df[39,]) yield 155?
My understanding is that df[39,] gives you a tibble whose dim is 1 155, and its length is equal to the number of columns. Why? Because length() also gives the length of lists, and under the hood both tibble and data.frame are lists in which each element is one column. That is also why you can have different types in one tibble or data.frame.
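The list behaviour can be reproduced without tibble. The sketch below uses only base R and relies on `drop = FALSE` to make a data.frame keep its structure the way a tibble does; this is an analogy for illustration, not tibble itself, and the names `one_col` and `test` are arbitrary.

```r
df2 <- data.frame(a = 1:5, b = (1:5) * 2, c = (1:5)^2)

one_col <- df2[, 2, drop = FALSE]  # stays a data.frame, as a tibble would
test <- c(one_col)                 # a list holding a single column

is.list(df2)       # TRUE: a data frame is a list of columns
class(test)        # "list"
length(test)       # 1: one column, so length 1
length(df2[3, ])   # 3: one row, but still three columns
length(df2[[2]])   # 5: the column itself as a vector
```

This mirrors the question: `c()` applied to a one-column table gives a list of length 1, while `[[` gives the vector that grep can search.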

Clarification in colnames function in R

I am new to R and I wanted to ask experts about the colnames function in R. Using the function, I realized that it returns NULL when used on a single column of a matrix object, but works perfectly fine for more than one column. To illustrate, say I have a matrix test
>test<-matrix(0,ncol=4,nrow=5)
>colnames(test)<-c("A","B","C","D")
colnames(test[,1]) or colnames(test[,c(1)]) gives NULL as output:
NULL
whereas the following works fine,
colnames(test[,c(1:2)])
[1] "A" "B"
I understand that an alternative is to use colnames(test)[c(1:2)]. Am I missing something in the case where I get NULL?
If you look at the description in ?colnames, you'll see that it takes an argument x, which is a matrix-like R object with at least two dimensions.
When you call colnames(test[,1]), you are giving colnames a plain vector, because selecting a single column drops the matrix dimensions. Compare class(test[,1]) vs. class(test[,c(1:2)]). Vectors don't have columns or rows and therefore no column or row names. You can have named elements within a vector, but that is definitely not equivalent to the column names of a matrix.
The best way to extract one or more column names is to index into the full vector of column names:
colnames(test) # gives you all column names
colnames(test)[1] # gives you the column name 1
colnames(test)[c(1,2)] # gives you column names 1 and 2
Does this clarify this issue for you?
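If you do want to keep working with a one-column matrix, base R's `drop = FALSE` argument to `[` prevents the dimensions from being dropped. A minimal sketch using the matrix from the question:

```r
test <- matrix(0, ncol = 4, nrow = 5)
colnames(test) <- c("A", "B", "C", "D")

is.null(dim(test[, 1]))             # TRUE: a single column collapses to a vector
colnames(test[, 1])                 # NULL
colnames(test[, 1, drop = FALSE])   # "A": still a 5 x 1 matrix
colnames(test)[1]                   # "A": index the names directly
```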

Count of Comma separated values in r

I have a column named subcat_id in which the values are stored as comma separated lists. I need to count the number of values and store the counts in a new column. The lists also have Null values that I want to get rid of.
I would like to store the counts in the n column.
We can try
nchar(gsub('[^,]+', '', gsub(',(?=,)|(^,|,$)', '',
gsub('(Null){1,}', '', df1$subcat_id), perl=TRUE)))+1L
#[1] 6 4
Or
library(stringr)
str_count(df1$subcat_id, '[0-9.]+')
#[1] 6 4
data
df1 <- data.frame(subcat_id = c('1,2,3,15,16,78',
'1,2,3,15,Null,Null'), stringsAsFactors=FALSE)
You can do
sapply(strsplit(subcat_id,","),FUN=function(x){length(x[x!="Null"])})
strsplit(subcat_id,",") will return a list of each item in subcat_id split on commas. sapply will apply the specified function to each item in this list and return us a vector of the results.
Finally, the function that we apply will take just the non-null entries in each list item and count the resulting sublist.
For example, if we have
subcat_id <- c("1,2,3","23,Null,4")
Then running the above code returns c(3,2), because the Null entry in the second string is not counted. You can assign this result to your column.
If running this from a dataframe, it is possible that the character column has been interpreted as a factor, in which case the error non-character argument will be thrown. To fix this, we need to force interpretation as a character vector with the as.character function, changing the command to
sapply(strsplit(as.character(frame$subcat_id),","),FUN=function(x){length(x[x!="Null"])})
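Putting the pieces together, here is one possible sketch that stores the counts in the n column, using the df1 sample data from the other answer. It swaps `sapply` for `vapply` so the result is guaranteed to be an integer vector; that substitution is a choice made for this example, not part of the original answer.

```r
df1 <- data.frame(subcat_id = c("1,2,3,15,16,78",
                                "1,2,3,15,Null,Null"),
                  stringsAsFactors = FALSE)

# Split each string on commas; as.character() guards against factor columns
parts <- strsplit(as.character(df1$subcat_id), ",", fixed = TRUE)

# Count the entries that are not the literal string "Null"
df1$n <- vapply(parts, function(x) sum(x != "Null"), integer(1))

df1$n   # 6 4
```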
