Why can I not rename columns of a tbl? - r

I came across some weird behavior in dplyr's tbl:
df <- as.tibble(iris)
i <- colnames(df)[5]
df$new <- df[,i]
For some reason the newly created column new is named new.Species (at least when I View(df)), but it should be named just new.
I do not understand why this happens. An obvious fix is to simply save df as a data.frame, but I would still like to understand what happens here.

Because df[,i] is still a tibble with one column, the new column holds a whole one-column data frame rather than a vector. We need df[[i]]:
df$new <- df[[i]]
With a data.frame, [ uses drop = TRUE by default (see ?Extract), so selecting a single column returns a vector. A tibble's [ never drops dimensions to create a vector, so we need [[ to extract the column.
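A minimal sketch of the difference, using base R only. It assumes that passing drop = FALSE to a plain data.frame is a fair stand-in for what a tibble's [ does; the names one_col and as_vec are made up for illustration.

```r
df <- iris
i <- colnames(df)[5]               # "Species"

# Assumption: drop = FALSE mimics a tibble's `[`, which keeps the data frame shape
one_col <- df[, i, drop = FALSE]   # a one-column data frame
as_vec  <- df[[i]]                 # the column itself, a factor

print(is.data.frame(one_col))
print(is.factor(as_vec))

# Assigning the vector gives a plain column called "new"
df$new <- df[[i]]
print(names(df))
```

Assigning one_col instead of as_vec would store a data frame inside the column, which is where the compound name in the viewer comes from.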

Related

How can I get the column/variable names of a dataframe that fit certain parameters?

I came across a problem in my DataCamp exercise that basically asked "Remove the column names in this vector that are not factors." I know what they -wanted- me to do, and that was to simply do glimpse(df) and manually delete elements of the vector containing the column names, but that wasn't satisfying for me. I figured there was a simple way to store the column names of the dataframe that are factors into a vector. So, I tried two things that ended up working, but I worry they might be inefficient.
Example data frame:
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))
My first solution was this:
vector1 <- names(select_if(df1, is.factor))
This worked, but select_if returns an entire tibble of a filtered dataframe and then gets the column names. Surely there's an easier way...
Next, I tried this:
vector2 <- colnames(df1)[sapply(df1,is.factor)]
This also worked, but I wanted to know if there's a quicker, more efficient way of filtering column names based on their type and then storing the results as a vector.
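One possible variation on the second attempt, sketched in base R. vapply() works like sapply() but declares the expected result type, so it fails loudly if the predicate ever returns something other than a single TRUE/FALSE. The names is_fac and vector3 are hypothetical.

```r
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))

# One logical value per column: TRUE where the column is a factor
is_fac <- vapply(df1, is.factor, logical(1))

# Keep only the names of the factor columns; no intermediate data frame is built
vector3 <- names(df1)[is_fac]
print(vector3)
```

Whether this is measurably faster than the sapply() version is not something this sketch establishes; the gain is mainly in type safety.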

Trying to get rid of a data frame row

I am trying to get rid of a data frame row. I read the data with
temp_data <- read.table(blablabla)
and then when I try to get rid of the first row with
temp_data <- temp_data[-1,]
it turns temp_data into a vector. Why is this happening?
As others have commented, [ uses drop=TRUE by default. From ?"[":
drop: For matrices and arrays. If TRUE the result is coerced to the
lowest possible dimension (see the examples). This only works for
extracting elements, not for the replacement. See drop for further
details.
So, we need
temp_data[-1, , drop=FALSE]
If we convert to a data.table, drop=FALSE is not needed when subsetting rows:
library(data.table)
setDT(temp_data)  # convert to a data.table by reference
temp_data[-1]
data
temp_data <- data.frame(Col1 = 1:5)
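A short base R sketch of the behavior described above, using the same one-column example data. The names dropped and kept are made up for illustration.

```r
temp_data <- data.frame(Col1 = 1:5)

# Default drop = TRUE: a single remaining column collapses to a vector
dropped <- temp_data[-1, ]
print(class(dropped))

# drop = FALSE keeps the data frame structure
kept <- temp_data[-1, , drop = FALSE]
print(class(kept))
print(dim(kept))
```

The collapse only happens because the data frame has a single column; with two or more columns, temp_data[-1, ] stays a data frame.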

How to indicate row.names=1 using fread() in data.table?

I want to consider the first column in my .csv file as a sequence of rownames. Usually I used to do the following:
read.csv("example_file.csv", row.names=1)
But I want to do this with the fread() function in the data.table R package, as it runs very quickly.
X <- as.matrix(fread("bigmatrix.csv"),rownames=1)
Why not save the row names in a column?
df <- data.frame(x=rnorm(1000))
df$row_name = row.names(df)
fwrite(df,file="example_file.csv")
Then you can load the saved CSV.
df <- fread(file="example_file.csv")
From a small search I've done, a data.table never uses row names. Since data.table inherits from data.frame, it still has the row names attribute, but it never uses it.
However, you can probably read the row names in as an ordinary column and later make that column into your actual row names, though it might not be efficient.
Just one more function: convert to a data frame, then set the row names.
library(data.table)
a <- as.data.frame(fread(file="example_file.csv"))
row.names(a) <- a$V1  # assumes the row-name column was read in as V1
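A small sketch of the as.matrix(..., rownames = ) route mentioned above, assuming a reasonably recent data.table. The inline CSV text and the column name id are made up so that the example does not need a file on disk.

```r
library(data.table)

# Hypothetical data: the first column holds the row labels
csv_text <- "id,a,b\nr1,1,2\nr2,3,4\n"
dt <- fread(text = csv_text)

# rownames = names the column to use as row names; it is dropped from the matrix body
m <- as.matrix(dt, rownames = "id")
print(rownames(m))
print(colnames(m))
```

This only makes sense when the remaining columns share a type, since a matrix cannot mix numeric and character columns.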

renaming subset of columns in r with paste0

I have a data frame (my_df) with columns named after individual county numbers. I melted/cast the data from a much larger set to get to this point. The first column is named year and holds the years 1970-2011. The next 3010 columns are counties. However, I'd like to rename the county columns to be "county_" + county number.
This code executes in R but for whatever reason doesn't update the column names. They remain solely the numbers. Any help?
new_col_names = paste0("county_",colnames(my_df[,2:ncol(my_df)]))
colnames(my_df[,2:ncol(my_df)]) = new_col_names
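The problem is the subsetting within the colnames call: R renames a temporary copy of the subset and then assigns its values back into my_df, which keeps the original column names.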
Try names(my_df) <- c(names(my_df)[1], new_col_names) instead.
Note: names and colnames are interchangeable for data.frame objects.
EDIT: alternate approach suggested by flodel, subsetting outside the function call:
names(my_df)[-1] <- new_col_names
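To illustrate, here is a small sketch with a made-up three-column data frame standing in for the real 3011-column one (the county numbers 1001 and 1003 are hypothetical). It runs the original assignment first, then the subsetting-outside form.

```r
# check.names = FALSE keeps the purely numeric column names as they are
my_df <- data.frame(year = 1970:1972, `1001` = 1:3, `1003` = 4:6,
                    check.names = FALSE)
new_col_names <- paste0("county_", names(my_df)[-1])

# Original attempt: runs without error, but the names stay the same
colnames(my_df[, 2:ncol(my_df)]) <- new_col_names
names_after_attempt <- names(my_df)
print(names_after_attempt)

# Subsetting outside the call: the names vector itself is modified
names(my_df)[-1] <- new_col_names
print(names(my_df))
```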
colnames() is for a matrix (or matrix-like object); for a data.frame, simply try names().
Example:
my_df <- data.frame(a=c(1,2,3,4,5), b=rnorm(5), c=rnorm(5), d=rnorm(5))
new_col_names <- paste0("county_", names(my_df)[-1])
names(my_df) <- c(names(my_df)[1], new_col_names)

Rename multiple dataframe columns, referenced by current names

I want to rename some random columns of a large data frame and I want to use the current column names, not the indexes. Column indexes might change if I add or remove columns to the data, so I figure using the existing column names is a more stable solution.
This is what I have now:
mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"
Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.
The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied (at least in older versions of R), even when it's in .GlobalEnv and no other variable points to it.
The data.table package has a setnames() function which changes column names by reference without copying the whole dataset. data.table is different in that it doesn't copy-on-write, which can be very important for large datasets. (You did say your data set was large.) Simply provide the old and the new names:
require(data.table)
setnames(DT,"MyName.1", "MyNewName")
# or more explicit:
setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames
names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" # 13 characters shorter.
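You may eventually want to rename several columns at once, though. In that case, use %in% instead of == and replace "MyName.1" and "MyNewName" with vectors of equal length. Note that the new names are then assigned in column order, not in the order of the old-name vector.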
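For the several-columns case, one option is match(), which pairs each old name with its position so the order of the columns in the data frame does not matter. This is a sketch with made-up data; old, new and idx are hypothetical names, and it assumes every old name is actually present.

```r
mydf <- data.frame(a = 1:2, MyName.1 = 3:4, MyName.2 = 5:6)

# Old and new names, deliberately listed in a different order than the columns
old <- c("MyName.2", "MyName.1")
new <- c("SecondNew", "FirstNew")

# match() returns the column position of each old name, in the order of `old`
idx <- match(old, names(mydf))
stopifnot(!anyNA(idx))   # guard: every old name must exist

names(mydf)[idx] <- new
print(names(mydf))
```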
plyr has a rename function for just this purpose:
library(plyr)
mydf <- rename(mydf, c("MyName.1" = "MyNewName"))
names(mydf) <- sub("MyName\\.1", "MyNewName", names(mydf))
This would generalize better to a multiple-name-change strategy if you put a stem as a pattern to be replaced using gsub instead of sub.
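You can use the str_replace function of the stringr package. Wrap the pattern in fixed() so the dot is matched literally rather than as a regex wildcard:
library(stringr)
names(mydf) <- str_replace(names(mydf), fixed("MyName.1"), "MyNewName")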
