Why does the index column tend to disappear when I use the mutate command? - r

With the data.frame mtcars, when I apply the mutate command, the index column holding the car names disappears.
mtcars
mutate(mtcars, displ_l = disp / 61.0237)
I want to view the whole data.frame with the new modifications. Is that possible?
Thanks

One way to keep them is to turn the row names of the data.frame into a column, using setDT from the data.table package:
setDT(df, keep.rownames = TRUE)[]
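If you would rather stay within the tidyverse, a minimal sketch along the same lines is to copy the row names into a regular column before calling mutate. This assumes the tibble package is installed; the column name car and the object name cars are illustrative choices, not something from the question.

```r
library(dplyr)
library(tibble)

# Copy the row names (car models) into an ordinary column.
# The column name "car" is an arbitrary choice for this example.
cars <- rownames_to_column(mtcars, var = "car")

# The car names are now regular data, so they stay in place after mutate.
cars <- mutate(cars, displ_l = disp / 61.0237)

head(cars[, c("car", "disp", "displ_l")])
```

Because the names live in a column rather than in the row names, they also survive later steps such as filter or arrange.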

Related

Specific column selection from data.table in R

I have a data.table in R. The table has column names.
I have a vector named colnames with a few of the column names in the table.
colnames<-c("cost1", "cost2", "cost3")
I want to select the columns whose names are in the vector colnames from the table.
The name of the table is dt.
I have tried doing the following:
selected_columns <- dt[,colnames]
But this does not work, and I get an error.
However, when I try the following, it works:
selected_columns <- dt[,c("cost1", "cost2", "cost3")]
I want to use the vector variable (colnames) to access the columns and not the c("..") method.
How can I do so?
You can try this:
dt[, .SD, .SDcols = colnames]
Recent versions of data.table also offer an alternative:
dt[, ..colnames]
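To see that these forms agree, here is a small self-contained sketch. The table contents and the name cols are made up for illustration (the question's vector is called colnames, which also shadows a base R function), and with = FALSE is a third spelling that data.table supports.

```r
library(data.table)

# Hypothetical table standing in for the asker's dt.
dt <- data.table(cost1 = 1:3, cost2 = 4:6, cost3 = 7:9, other = letters[1:3])
cols <- c("cost1", "cost2", "cost3")

by_sdcols <- dt[, .SD, .SDcols = cols]   # .SD restricted to the named columns
by_prefix <- dt[, ..cols]                # ".." looks cols up outside the table
by_with   <- dt[, cols, with = FALSE]    # treat j as a vector of column names

# All three return the same three-column data.table.
all.equal(by_sdcols, by_prefix)
all.equal(by_prefix, by_with)
```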
Another nice alternative is select from the tidyverse/dplyr universe. select gives you a lot of flexibility when selecting columns from a data frame, tibble, or data.table.
library("data.table")
library("tidyverse")
df <- data.table(mtcars)
columns_to_select <- c("cyl", "mpg")
# all_of() marks columns_to_select as an external character vector of names
select(df, all_of(columns_to_select))
You can also skip quoting the column names if you wish:
select(df, c(cyl, mpg))
or use the ellipsis and pass multiple quoted or unquoted names.
Here is a comparison of the resulting objects:
objs <- list(select(df, c(cyl, mpg)),
             select(df, cyl, mpg),
             select(df, "cyl", "mpg"))
outer(objs, objs, Vectorize(all.equal))
If you want to go this route, you may want to have a further look at dtplyr, which provides a bridge between data.table and dplyr.

How to split a dataframe into a list of dataframes (while dropping the grouped column)

Similar questions have been asked, but I cannot figure out one last step.
How can I split a large data frame into a list of the data.frames and drop the column that grouped the rows into a specific dataframe?
Example:
#Load large dataframe
data <- mtcars
# split into a list based on "cyl" column
data_list <- split(data, f=data$cyl, drop = TRUE)
Then from here I want to remove "cyl" column from all of the dataframes in the list. Without going through each dataframe in the list, is there a way to remove this column?
Thanks!
We can subset the columns while we do the split:
data_list <- split(data[setdiff(names(data), 'cyl')], f=data$cyl, drop = TRUE)
Or if it is already created, then use
data_list <- lapply(data_list, subset, select = -cyl)
Another option is group_split from dplyr with .keep = FALSE (.keep is TRUE by default). Note that group_split returns an unnamed list:
library(dplyr)
data_list <- data %>%
  group_split(cyl, .keep = FALSE)
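Whichever variant you pick, a quick check along these lines can confirm that the grouping column is gone from every piece. This is only a sketch in base R; the helper name has_cyl is made up for the example.

```r
data <- mtcars

# Drop "cyl" before splitting, but still split by its values.
data_list <- split(data[setdiff(names(data), "cyl")], f = data$cyl, drop = TRUE)

# TRUE for any list element that still carries a "cyl" column.
has_cyl <- vapply(data_list, function(d) "cyl" %in% names(d), logical(1))

names(data_list)                           # one element per cyl value
any(has_cyl)                               # should be FALSE
sum(vapply(data_list, nrow, integer(1)))   # should equal nrow(mtcars)
```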

How to export column names and labels as 1st and 2nd rows in spreadsheet [using R]?

Is there a way to export both the column name and label from R so that they appear as the 1st and 2nd rows of a spreadsheet? I'm able to do the reverse (import), where I read in each row and then use names() and label() to assign the name/label. But I'm stuck on how to do the export without manually adding the labels as a row of data in R first.
Column name/label in Viewer
Here is a simple solution:
library(dplyr) #for pipes and mutate_if
library(purrr) #for map_chr
library(expss) #for apply_labels and labels management
iris2 = iris %>%
  mutate_if(is.factor, as.character) %>%
  apply_labels(Sepal.Length="length", Sepal.Width="width", Petal.Length="length2", Petal.Width="width2", Species="spec")
library(Hmisc)
rtn=rbind(names(iris2), label(iris2), iris2)
rtn %>% head
You have to use mutate_if to change all factors to character vectors, as I did in my dummy dataset; otherwise the factor columns would show NA instead of the names and labels.
Still, please note that this leads to untidy data as the first non-heading row is not an observation. It may be OK for outputting though.
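For the export step itself, a minimal base R sketch could look like the following. It assumes the labels are stored in each column's "label" attribute and that a CSV file is an acceptable spreadsheet format; the toy data frame and the names labels, out and csv_path are invented for the example.

```r
# Toy data with a "label" attribute on each column (assumed storage format).
df <- data.frame(len = c(5.1, 4.9), sp = c("setosa", "virginica"),
                 stringsAsFactors = FALSE)
attr(df$len, "label") <- "Sepal length"
attr(df$sp, "label") <- "Species"

# Collect one label per column, falling back to "" when none is set.
labels <- vapply(df, function(col) {
  lab <- attr(col, "label")
  if (is.null(lab)) "" else as.character(lab)
}, character(1))

# Stack names, labels, and the data (as text) into one character matrix.
body <- do.call(cbind, lapply(df, as.character))
out <- rbind(names(df), unname(labels), body)

# Write without R's own header so the names and labels are rows 1 and 2.
csv_path <- tempfile(fileext = ".csv")
write.table(out, file = csv_path, sep = ",",
            row.names = FALSE, col.names = FALSE)
readLines(csv_path)
```

Converting every column to character first sidesteps the factor problem mentioned above, at the cost of writing all values as quoted text.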

Add a column based on the dynamically named columns

A new column must be added to an existing dataframe so that it is the mean of some other columns, which are selected dynamically.
I prefer using dplyr, and thus the solution might look like something as follows:
selected_columns <- c("am", "mpg")
dplyr::mutate_at(mtcars, vars(selected_columns), funs(new_col = rowMeans(.)))
Is there a way to modify this chunk or is another approach required?
Here, we just need to subset the columns of the data (.) with the string vector and take the rowMeans:
library(dplyr)
mtcars %>%
  mutate(new_col = rowMeans(.[selected_columns]))
mutate itself has no funs argument; funs() belongs to mutate_if/mutate_at/mutate_all, and it is deprecated in favor of list().
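With newer dplyr, the same idea can be written without the . placeholder. The sketch below assumes dplyr 1.0.0 or later, where across() and all_of() are available; result is just an illustrative name.

```r
library(dplyr)

selected_columns <- c("am", "mpg")

# across() returns the selected columns as a data frame,
# which rowMeans() then averages row by row.
result <- mtcars %>%
  mutate(new_col = rowMeans(across(all_of(selected_columns))))

head(result[, c(selected_columns, "new_col")])
```

Wrapping the vector in all_of() makes it explicit that selected_columns is a character vector of names rather than a column of the data.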

How to Rename Column Headers in R

I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
where I want to make the 2nd column as the column headers of the next dataset:
How can I do this? Thank you.
In general you can use colnames, which returns a character vector with the column names of your dataframe or matrix. You can then rename the columns of your dataframe with:
colnames(df) <- listofnames
It is also possible to rename just one column by using the [] brackets.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example, we are going to take the values of your column. Try this:
colnames(df2) <- as.character(df1[,2])
Make sure that the number of columns and the number of header names are identical.
The equivalent for rows is rownames().
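Since the original screenshots are not available, here is a small sketch with made-up stand-ins for the two datasets. The contents of df1 and df2 are assumptions; only the last two statements reflect the approach described above.

```r
# Hypothetical header table: the 2nd column holds the desired names.
df1 <- data.frame(id = 1:3,
                  header = c("name", "age", "city"),
                  stringsAsFactors = FALSE)

# Hypothetical data table that arrived with placeholder names.
df2 <- data.frame(V1 = c("Ann", "Bob"),
                  V2 = c(31, 45),
                  V3 = c("Oslo", "Rome"))

# Guard against a length mismatch before assigning the names.
stopifnot(nrow(df1) == ncol(df2))
colnames(df2) <- as.character(df1[, 2])

df2
```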
dplyr way w/ reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. Also I used names over colnames function. When working with a dataframe, colnames simply calls the names function, so better to cut out the middleman and be more explicit that we are working with a dataframe and not a matrix. Also shorter to type.
You can simply do this:
names(data)[3]<- 'Newlabel'
Where names(data)[3] is the column you want to rename.
