Selecting columns from dataframe programmatically when column names have spaces - r

I have a dataframe that I would like to query. Note that the columns of that dataframe can change, and the column names have spaces. I have a function that I want to apply to the dataframe columns. I figured I could programmatically find out which columns exist and then use that list of columns to apply the function to the columns that exist.
I was able to figure out how to do that when the column names don't have spaces; see the code below:
library(tidyverse)
library(rlang)
col_names <- c("cyl","mpg","New_Var")
cc <- rlang::quos(col_names)
mtcars%>%mutate(New_Var=1)%>%select(!!!cc)
But when the column names have spaces, this method does not work. Below is the code I used:
col_names <- c("cyl","mpg","`New Var`")
cc <- rlang::quos(col_names)
mtcars%>%mutate(`New Var`=1)%>%select(!!!cc)
Is there a way to select columns that have spaces in their names without changing their names?

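You don't have to do anything differently for names with spaces; just leave the backticks out of the string. Backticks are only needed when a name with spaces is written as bare code, as in mutate(`New Var`=1). Inside a quoted string like "`New Var`" they become part of the name, so select() looks for a column whose name literally includes the backticks. For example,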
library(dplyr)
library(rlang)
col_names <- c("cyl","mpg","New Var")
cc <- quos(col_names)
mtcars %>% mutate(`New Var`=1) %>% select(!!!cc)
Also note that select() accepts a character vector of names directly, so this works too:
mtcars %>% mutate(`New Var`=1) %>% select(col_names)
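A sketch using tidyselect's helpers, assuming dplyr 1.0 or later (which re-exports all_of() and any_of()). Recent tidyselect versions emit a message or warning when a plain external vector is passed to select() as above, and suggest all_of() instead. any_of() may suit the original use case better, since it skips names that are not present when the dataframe's columns change. "Not There" is a made-up placeholder for a missing column.

```r
library(dplyr)

col_names <- c("cyl", "mpg", "New Var")  # plain strings, no backticks

cars2 <- mtcars %>% mutate(`New Var` = 1)

# all_of(): every name must exist, otherwise select() throws an error
out_all <- cars2 %>% select(all_of(col_names))

# any_of(): names that are missing (here the hypothetical "Not There") are skipped
out_any <- cars2 %>% select(any_of(c(col_names, "Not There")))

# Converting the strings to symbols and splicing them with !!! also works
out_syms <- cars2 %>% select(!!!rlang::syms(col_names))

names(out_all)
```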

Related

In R list, how to set sub-list names

How do I set list names? Here is the code below.
Currently, split_data contains two sub-lists, [[1]] and [[2]]. How do I set a name for each of them?
I want to name [[1]] 'A' and [[2]] 'B', so that I can retrieve data with split_data['A']...
Can anyone help with this? Thanks.
For instance, with ma <- list(a=c('a1','a2'),b=c('b1','b2')) I can use ma["a"] to get a sub-list.
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
Others have shown you in the comments how to get what you want using split() instead of group_split(). That seems like the easiest solution.
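For comparison, a minimal base R sketch of the split() approach (the comments themselves are not reproduced here). split() names each piece after the corresponding value of category, so the lookups from the question work directly.

```r
test_data <- data.frame(category = c('A','B','A','B','A','B','A','B'),
                        sales = c(1,2,4,5,8,1,4,6))

# split() returns a named list: one data frame per category value
split_data <- split(test_data, test_data$category)

names(split_data)        # "A" "B"
split_data[["A"]]$sales  # 1 4 8 4
split_data["A"]          # a one-element sub-list, like ma["a"]
```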
However, if you're stuck with the existing code, here's an alternative that keeps your current code, and adds the names.
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
names(split_data) <- test_data %>% group_by(category) %>% group_keys() %>% apply(1, paste, collapse = ".")
The idea is to use group_by() to split the data the same way group_split() does, then extract the keys as a tibble with group_keys(). This tibble has one row per group but keeps the grouping variables in separate columns, so I join them by pasting the columns together with a dot as separator. The last expression in the pipe is equivalent to apply(keys, 1, f), where keys is the tibble returned by group_keys() and f is function(row) paste(row, collapse = "."). It applies f to each row of the tibble, producing a single name per group.
This should work even if the split happens on multiple variables, and produces names similar to those produced by split().
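To illustrate the multiple-variable case, here is a small sketch with two made-up grouping columns, g1 and g2 (hypothetical data, not from the question). group_keys() should return the keys in the same sorted order that group_split() uses for its pieces, so the pasted keys line up with the list elements.

```r
library(dplyr)

df <- data.frame(g1 = c("a", "a", "b"), g2 = c("x", "y", "x"), v = 1:3)

parts <- df %>% group_split(g1, g2)
names(parts) <- df %>%
  group_by(g1, g2) %>%
  group_keys() %>%
  apply(1, paste, collapse = ".")  # one "g1.g2" name per group

names(parts)  # "a.x" "a.y" "b.x"
```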

dplyr unnest() not working for large comma separated data

I'm trying to use tidyr's unnest() function to split apart a large character data set separated by commas. The data set has the form:
id keywords
835a24fe-c276-9824-0f4d-35fc81319cca Analytics,Artificial Intelligence,Big Data,Health Care
I want to create a table that has the "id" in column one and each of the "keywords" in a separate column with the same "id"
I'm using the code:
CB_keyword <- tibble(id=organizations$uuid[organizations$uuid %in% org_uuid ] ,
keyword=organizations$category_list[organizations$uuid %in% org_uuid]) %>% unnest(keyword, names_sep = ",")
The %in% code is selecting "id" and "keyword" info from another table ... and it is doing this correctly. The piping to unnest seems to do nothing. The tibble remains unchanged except that the column name is now "keyword,keyword" instead of "keyword", but the data is the same as if the unnest command is not used.
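If keywords is a character (string) column, use separate_rows() instead of unnest(). unnest() expands list-columns; a plain character column has nothing to expand, and names_sep only controls how the output column names are built (hence "keyword,keyword"), not how the strings are split.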
library(dplyr)
library(tidyr)
df1 %>%
separate_rows(keywords, sep=",\\s*")
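A reproducible sketch using the sample row from the question (df1 is built by hand here). It also shows what unnest() needs: once strsplit() has turned each string into a list-column, unnest() does expand it into one row per keyword.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(
  id = "835a24fe-c276-9824-0f4d-35fc81319cca",
  keywords = "Analytics,Artificial Intelligence,Big Data,Health Care"
)

# Option 1: split the string column directly into rows
out_rows <- df1 %>% separate_rows(keywords, sep = ",\\s*")

# Option 2: build a list-column first, then unnest() it
out_unnest <- df1 %>%
  mutate(keywords = strsplit(keywords, ",\\s*")) %>%
  unnest(keywords)

out_rows
```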

How to fill a section of a column with already existing values corresponding to another column in R?

I'm cleaning some flight trajectory data, and 'callsign' is a required field that I need to have filled in.
(Image: section of the CSV I am working with)
The data I'm working with has almost 300,000 rows, and blank callsigns occur frequently. Is there any way I can fill these callsigns in based on their corresponding icao24 identification numbers?
I've tried using tapply() to section off the data by icao24 number and apply a function to each chunk, i.e.
tapply(myDF$callsign, myDF$icao24, ...)
But I can't work out which function I would apply to each section, because they are named differently. Would I need some sort of loop that iterates over each section, with tapply() applied to each one?
If the missing values are blank strings (""), group by 'icao24' and replace the "" elements with the first non-blank 'callsign' in each group:
library(dplyr)
df2 <- df1%>%
group_by(icao24) %>%
mutate(callsign = replace(callsign, callsign == "",
first(callsign[callsign != ""])))
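Since the question tried tapply(), here is a base R sketch of the same group-wise replacement using ave(), which applies a function to each icao24 group and puts the results back in the original row order. The flights data frame and the fill_blank() helper are hypothetical names for illustration.

```r
flights <- data.frame(
  icao24   = c("a1", "a1", "a1", "b2", "b2"),
  callsign = c("ABC123", "", "", "", "XYZ9"),
  stringsAsFactors = FALSE
)

# Replace "" with the first non-blank value in the group (if there is one)
fill_blank <- function(x) {
  non_blank <- x[x != ""]
  if (length(non_blank) > 0) x[x == ""] <- non_blank[1]
  x
}

# ave() splits callsign by icao24, applies fill_blank() to each piece,
# and reassembles the results in the original row order
flights$callsign <- ave(flights$callsign, flights$icao24, FUN = fill_blank)

flights$callsign  # "ABC123" "ABC123" "ABC123" "XYZ9" "XYZ9"
```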
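Another option is fill() after converting the blanks to NA. Setting .direction = "downup" also fills blank rows that come before the first non-blank callsign in a group; the default, "down", leaves those as NA:
library(tidyr)
df2 <- df1 %>%
mutate(callsign = na_if(callsign, "")) %>%
group_by(icao24) %>%
fill(callsign, .direction = "downup")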

How to export column names and labels as 1st and 2nd rows in spreadsheet [using R]?

Is there a way to export both the column name and label from R so that they appear as the 1st and 2nd row of a spreadsheet. I'm able to do the reverse (import) where I read in each row and then use names() and label() to assign the name/label. But I'm stuck on how to do the export without manually adding the labels as a row of data in R first.
(Image: column name/label in the Viewer)
Here is a simple solution:
library(dplyr) #for pipes and mutate_if
library(purrr) #for map_chr
library(expss) #for apply_labels and labels management
iris2=iris %>%
mutate_if(is.factor, as.character) %>%
apply_labels(Sepal.Length="length", Sepal.Width="width", Petal.Length="length2", Petal.Width="width2", Species="spec")
library(Hmisc)
rtn=rbind(names(iris2), label(iris2), iris2)
rtn %>% head
You have to use mutate_if() to convert all factors to character vectors, as in the dummy dataset above; otherwise rbind() produces NA in the factor columns, because the names and labels are not valid levels of those factors.
Still, note that this leads to untidy data, since the name and label rows are not observations. That may be acceptable if the result is only used for output.
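The answer above builds the combined table but does not write the file. A possible base R sketch writes the names and labels as the first two lines of a CSV and then appends the data, so the data frame in R stays tidy. It assumes the labels are stored in each column's "label" attribute (where expss's apply_labels() and Hmisc's label() keep them); write_csv_with_labels() is a hypothetical helper name.

```r
# Hypothetical helper: row 1 = column names, row 2 = labels, then the data
write_csv_with_labels <- function(df, path) {
  # Read each column's "label" attribute; unlabelled columns get ""
  labels <- vapply(df, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) "" else as.character(lab)
  }, character(1))
  header <- rbind(names(df), labels)  # 2-row character matrix
  write.table(header, path, sep = ",", row.names = FALSE, col.names = FALSE)
  # Append the data without repeating the column names
  write.table(df, path, sep = ",", row.names = FALSE, col.names = FALSE,
              append = TRUE)
}

df <- head(iris[, c("Sepal.Length", "Species")], 2)
attr(df$Sepal.Length, "label") <- "length"
attr(df$Species, "label") <- "spec"

path <- tempfile(fileext = ".csv")
write_csv_with_labels(df, path)
lines <- readLines(path)
lines
```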

How to Rename Column Headers in R

I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
and I want to use its 2nd column as the column headers of the second dataset:
How can I do this? Thank you.
In general you can use colnames(), which returns the column names of a dataframe or matrix as a character vector. You can then rename the columns with:
colnames(df) <- new_names  # new_names: a character vector with one name per column
You can also rename a single column by indexing with [ ].
This renames the first column:
colnames(df2)[1] <- "name"
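For your example, take the values from the 2nd column of the first dataset. Try this:
colnames(df2) <- as.character(df1[[2]])
Using df1[[2]] extracts the column as a vector whether df1 is a data.frame or a tibble (for a tibble, df1[,2] returns a one-column tibble instead). Make sure the number of names matches the number of columns in df2.
The equivalent for rows is rownames().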
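A small reproducible sketch of this approach, with made-up df1 and df2, including a check that the number of names matches the number of columns before assigning them:

```r
# df1 holds the header names in its 2nd column; df2 holds the data
df1 <- data.frame(id = 1:3, header = c("alpha", "beta", "gamma"))
df2 <- data.frame(matrix(1:6, nrow = 2))  # 3 columns named X1, X2, X3

# Fail early with a clear message if the lengths differ
stopifnot(length(df1[[2]]) == ncol(df2))

colnames(df2) <- as.character(df1[[2]])
names(df2)  # "alpha" "beta" "gamma"
```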
A dplyr approach, with reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. I also used names() rather than colnames(): for a dataframe, colnames() simply calls names(), so it is better to cut out the middleman and make it explicit that we are working with a dataframe and not a matrix. It is also shorter to type.
You can simply do this:
names(data)[3] <- 'Newlabel'
where 3 is the position of the column you want to rename.
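Indexing by position breaks if the column order changes. A sketch of renaming by the current name instead, in base R and with dplyr's rename() (new_name = old_name); the data frame and its column names are made up for illustration.

```r
library(dplyr)

data <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# Base R: find the column by its current name rather than its position
data_base <- data
names(data_base)[names(data_base) == "z"] <- "Newlabel"

# dplyr: rename() takes new_name = old_name
data_dplyr <- rename(data, Newlabel = z)

names(data_base)   # "x" "y" "Newlabel"
names(data_dplyr)  # "x" "y" "Newlabel"
```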
