I have a tibble ('df') with
> dim(df)
[1] 55 144
of which I extract a vector test <- c(df[,39]). I would expect the following result:
> length(test)
[1] 55
as I basically took column 39 from my tibble. Instead, I get
> length(test)
[1] 1
Now, class(test) yielded list, so I thought the class might be the reason; however, with class set to char, I get the same result.
I'm especially confused since length(df[39,]) yields [1] 155.
Background is I am searching in the vector using grep, which doesn't work with a vector taken from a column. Of course, as I am trying to recode all lines in my tibble, I can recode them by row instead of by column, so I think there is a workaround. However, what causes R to assume that test has length 1? What is the difference in the treatment of rows and columns?
Whenever you apply the [ ] operator to a tibble, it returns another tibble. This is one of the differences between a tibble and a base R data.frame.
For example:
library(tibble)
a <- 1:5
df <- tibble(a, b = a * 2, c = a^2)
df2 <- as.data.frame(df) # convert to a base data.frame
df[, 2] # gives a tibble; its dim is 5 1
df2[, 2] # gives a vector; its dim is NULL and its length is 5
As you can see, subsetting a single column of the data.frame changes the return type from a data.frame to a vector. A tibble, by design, keeps the output type consistent with the input type.
There are two ways to extract a column of a tibble as a vector:
pull() (from dplyr)
[[ ]]
Personally, I use pull(), which is also very intuitive.
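To make the difference concrete, here is a minimal sketch that reuses the small tibble from above. It assumes the tibble and dplyr packages are installed; the variable names (col_tbl, col_vec1, col_vec2, as_list) are only illustrative.

```r
library(tibble)
library(dplyr)

a <- 1:5
df <- tibble(a, b = a * 2, c = a^2)

col_tbl  <- df[, 2]      # single-bracket subsetting: still a tibble with one column
col_vec1 <- df[[2]]      # double brackets: a plain numeric vector
col_vec2 <- pull(df, 2)  # dplyr::pull(): the same vector, selected by position
as_list  <- c(df[, 2])   # c() on a tibble gives a list of its columns

length(col_tbl)   # 1, the number of columns
length(col_vec1)  # 5, the number of elements
length(as_list)   # 1, a list holding one column
```

This is why c(df[,39]) in the question has length 1: it is a list containing one column, not the column itself.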
Why does length(df[39,]) yield 155?
df[39,] gives you a tibble whose dim is 1 155, and its length equals the number of columns. Why? Because length() also works on lists, and both tibble and data.frame are built as lists in which each column is one element (a vector). That is why a single tibble or data.frame can hold columns of different types.
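The list nature of a data frame can be checked directly in base R. The following is only a small illustration with a made-up two-column data.frame, not the data from the question:

```r
# A made-up data.frame with two columns of different types
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

is.list(df)   # TRUE: a data.frame is a list of columns
length(df)    # 2: one list element per column

second_row <- df[2, ]
length(second_row)  # 2: a one-row data.frame still has two columns
nrow(second_row)    # 1

sapply(df, class)   # each column keeps its own type
```

So length() on a row counts columns, while nrow() or NROW() is what counts rows.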
Related
I'd like to get the size of a column using the index. I tried using the length() function with the column index inside, but it doesn't work:
length(bd[7])
I'm sorry if this is too basic, I'm new to R. Thank you!
bd[7] is still a data.frame with a single column, and the length of a data.frame is by default its number of columns. We need to extract the column as a vector and then use length. How a column is extracted depends on the class: for a data.frame or matrix, bd[,7] drops the dimensions and returns a vector, but this is not the case for a data.table or tibble. However, all of them work with either $ or [[
length(bd[[7]])
Alternatively, NROW works on both a data.frame and a vector:
NROW(bd[7])
i.e.
> NROW(1:7)
[1] 7
> NROW(data.frame(col1 = 1:7))
[1] 7
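Since bd is not shown in the question, here is a sketch of the same comparison with the built-in mtcars dataset standing in for it (32 rows, 11 columns):

```r
# mtcars is a stand-in for the unseen bd from the question
as_df  <- mtcars[7]     # one-column data.frame
as_vec <- mtcars[[7]]   # the column itself, as a numeric vector

length(as_df)   # 1: counts columns
length(as_vec)  # 32: counts elements
NROW(as_df)     # 32: counts rows, works for data.frames and vectors alike
```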
Given a dataframe df and a function f which is applied to df:
df[] <- lapply(df, f)
What is the magic R is performing to replace columns in df with collection of vectors in the list from lapply? I see that the result from lapply is a list of vectors having the same names as the dataframe df. I assume some magic mapping is being done to map the vectors to df[], which is the collection of columns in df (methinks). Just works? Trying to better understand so that I remember what to use the next time.
A data.frame is merely a list of vectors that all have the same length. You can check this with is.list(a_data_frame), which returns TRUE.
[] can have a different meaning or action depending on the object it is applied to. It can even be redefined, since it is in fact a function.
For a data.frame, [] lets you subset or replace columns:
df[1] gets the first column (as a one-column data.frame)
df[1] <- 2 replaces the first column with 2 (recycled to the same length as the other columns)
df[] returns the whole data.frame
df[] <- list(c1,c2,c3) replaces the contents of the data.frame column by column, while df itself remains a data.frame with its original column names
There are also many other ways to access or set data in a data.frame (by column name, by subset of rows or columns, ...)
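A minimal sketch of why the empty brackets matter, using a hypothetical f that doubles each column. Assigning into df[] keeps the data.frame, whereas assigning the lapply result to the name directly replaces it with a plain list:

```r
df <- data.frame(x = 1:3, y = c(10, 20, 30))
f <- function(col) col * 2   # hypothetical function, for illustration only

kept <- df
kept[] <- lapply(kept, f)    # fills the existing columns; structure is preserved

lost <- lapply(df, f)        # no [] on the left: the result is just a list

class(kept)   # "data.frame"
class(lost)   # "list"
names(kept)   # "x" "y"
```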
I am new to R and I wanted to ask experts about the colnames function in R. Using the function I realized that it provides a NULL if used for single column of a matrix object, however it works perfectly fine for more than 1 columns of a matrix object. To illustrate, say I have matrix test
> test <- matrix(0, ncol = 4, nrow = 5)
> colnames(test) <- c("A","B","C","D")
> colnames(test[,1]) # colnames(test[,c(1)]) gives the same output
NULL
whereas the following works fine,
> colnames(test[,c(1:2)])
[1] "A" "B"
I understand that an alternative is to use colnames(test)[c(1:2)]. Am I missing something in the case where I get NULL?
If you look at the description in ?colnames, you'll see that it takes an argument x, which is a matrix-like R object with at least two dimensions for colnames.
When you call colnames(test[,1]), you are giving colnames a plain vector, because selecting a single column drops the dimensions. Compare class(test[,1]) vs. class(test[,c(1:2)]). Vectors don't have columns or rows and therefore have no column or row names. You can have named elements within a vector, but that is not equivalent to the column names of a matrix.
The best way to extract one or more column names is to index the full vector of column names:
colnames(test) # gives you all column names
colnames(test)[1] # gives you the column name 1
colnames(test)[c(1,2)] # gives you column names 1 and 2
Does this clarify this issue for you?
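If you do need a single column to stay a matrix (and keep its column name), base R's drop = FALSE argument prevents the dimensions from being dropped. A small sketch with the matrix from the question; as_vector and as_matrix are just illustrative names:

```r
test <- matrix(0, ncol = 4, nrow = 5)
colnames(test) <- c("A", "B", "C", "D")

as_vector <- test[, 1]                # dimensions dropped: a plain vector
as_matrix <- test[, 1, drop = FALSE]  # still a 5 x 1 matrix

dim(as_vector)       # NULL
colnames(as_vector)  # NULL
dim(as_matrix)       # 5 1
colnames(as_matrix)  # "A"
```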
This seems to be simple but I can't find the answer.
I combine two vectors using cbind().
> first = c(1:5)
> second = c(6:10)
> values = cbind(first,second)
When I want to retrieve a single element using values[1,2] I always get the column name in addition to the actual element.
> values[1,2]
second
6
How can I get the value without the column name?
I know I can remove the column names in the matrix like in this post: How to remove column names from a matrix in R? But how can I leave the matrix as is and only get the value I want?
We can use unname
unname(values[1,2])
#[1] 6
Or as.vector
as.vector(values[1,2])
You can use the [[ operator to extract a single element,
values[[1,2]]
# [1] 6
Suppose I have a data.frame that's completely numeric. If I make one entry of the first column a character (for example), then the entire first column will become character.
Question: How do I reverse this. That is, how do I make it such that any character objects inside the data.frame that are "obviously" numeric objects are forced to be numeric?
MWE:
test <- data.frame(matrix(rnorm(50),10))
is(test[3,1])
test[1,1] <- "TEST"
is(test[3,1])
print(test)
So my goal here would be to go FROM the way that test is now, TO a state of affairs where test[2:10, 1] is numeric. So I guess I'm asking for a function that does this over an entire data.frame.
The short answer is that you cannot.
As was mentioned in the comments, all elements of a data frame column must have the same mode.
If you would like to find the values that are "number-like", you can use the following (where vec would be, say, a data frame column):
vec[!is.na(as.numeric(vec))]
You can then convert these, but unfortunately you cannot put the converted values back into the same column. As soon as you do, they will be coerced back to character.
As for a function that converts the whole data frame to numeric (realizing that isolating specific entries as exceptions is not possible), you can use sapply. Note that it returns a numeric matrix rather than a data frame, and entries that cannot be parsed become NA:
sapply(dataFrameName, as.numeric)
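Putting this together with the MWE from the question, here is one possible sketch. It assumes you are happy for the non-numeric entry to become NA, and it uses lapply with df[] instead of sapply so the result stays a data frame; suppressWarnings only hides the expected coercion warning.

```r
test <- data.frame(matrix(rnorm(50), 10))
test[1, 1] <- "TEST"          # the whole first column is now character
is.character(test[[1]])       # TRUE

# Which entries of the first column look like numbers?
vec <- test[[1]]
num_like <- !is.na(suppressWarnings(as.numeric(vec)))
sum(num_like)                 # 9: every entry except "TEST"

# Convert every column back to numeric; "TEST" becomes NA
test[] <- lapply(test, function(col) suppressWarnings(as.numeric(col)))
sapply(test, class)
```

Note that values which passed through character may differ from the originals in the last few digits, because the character representation is rounded.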
You are allowed to have vectors of type list in a data.frame, and the list can contain any type of object except functions as long as it is of the same length as the other columns in the data.frame, e.g.:
mydataframe <- data.frame(numbers=1:3)
mydataframe$mylist <- list(1, 'plum', 5)
mydataframe
#  numbers mylist
#1       1      1
#2       2   plum
#3       3      5
sapply(mydataframe, typeof)
#  numbers    mylist
#"integer"    "list"
sapply(mydataframe$mylist, typeof)
#[1] "double" "character" "double"