How to sort a data frame by a column name stored in a variable - r

I want to sort a data frame by a column whose name is held in an object.
What I want to do is
data <- dplyr::arrange(data, desc(`column_name`))
but with column_name replaced by an expression such as str_c("column_", "name"), because the sort column depends on a condition.
Neither of these attempts works:
data <- dplyr::arrange(data, desc(str_c("column_", "name")))
data <- dplyr::arrange(data, desc(colnames(data[str_c("column_", "name")])))
My code returns
"Error: incorrect size (1) at position 1, expecting : columnlength"

One option is to convert the string to a symbol with rlang::sym() and then unquote it with !!:
library(stringr)
dplyr::arrange(data, desc(!! rlang::sym(str_c("column_", "name"))))
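The attempts in the question fail because desc() receives a single string rather than a column, which is why the error complains about size 1. As a dependency-free alternative, here is a minimal base R sketch; the data frame contents and the name sort_col are made up for illustration:

```r
# Hypothetical sample data; only the column name "column_name" comes from the question
data <- data.frame(
  id = c("a", "b", "c"),
  column_name = c(2, 9, 5),
  stringsAsFactors = FALSE
)

# Build the column name as a string (paste0 plays the role of stringr::str_c here)
sort_col <- paste0("column_", "name")

# [[ ]] looks the column up by the string's value; order() gives the row permutation
sorted <- data[order(data[[sort_col]], decreasing = TRUE), ]

print(sorted)

# With dplyr attached, the equivalent is:
# dplyr::arrange(data, dplyr::desc(.data[[sort_col]]))
```

The .data[[sort_col]] form in the final comment avoids the explicit symbol conversion, at the cost of being specific to dplyr verbs.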

Related

Using read_csv specifying data types for groups of columns in R

I would like to use read_csv because I am working with a large dataset. Many column types are guessed incorrectly because the data has many missing values. The type of each column can be identified from its name: the name includes "DATE" if it is a date, "Name" if it is a character column, and the remaining columns can keep the default 'col_guess' type. I do not want to type out all 55 columns, so I tried this code first:
df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)=col_date()), cols((grepl("Name$", colnames(df))==T)=col_character()))
I received this message:
Error: unexpected '=' in "df <- read_csv('df.csv', col_types = cols((grepl("DATE$", colnames(df))==T)="
So I tried to write a loop instead, since the df data is already loaded in R (although the values in the wrongly typed columns have been lost):
for (colname in colnames(df)){
if (grepl("DATE$", colname)==T){
ct1 <- cols(colname=col_date("%d/%m/%Y"))
}else if (grepl("Name$", colname)==T){
ct2 <- cols(colname=col_character())
}else{
ct3 <- cols(colname=col_guess())
tx <- c(ct1, ct2, ct3)
print(tx)
}
}
It does not produce the output I want, and even if I got the loop right I do not know how I would continue.
The data is public; you can download it here (BasicCompanyDataAsOneFile): http://download.companieshouse.gov.uk/en_output.html
Any suggestions would be appreciated, thank you.
Since the data is already read in R, you can identify the columns by their names and apply the function to their respective columns.
df <- readr::read_csv('df.csv')
date_cols <- grep('DATE$', names(df))
char_cols <- grep('Name$', names(df))
df[date_cols] <- lapply(df[date_cols], as.Date)
df[char_cols] <- lapply(df[char_cols], as.character)
You can also try type.convert, which automatically converts columns to logical, integer, numeric, or character, but it does not convert date columns. In R 4.0.0 and later, pass as.is explicitly to avoid a warning.
df <- type.convert(df, as.is = TRUE)
I read the data in using read_csv
df <- read_csv('DF.csv', col_types = cols(.default="c"))
Then I used the following code to change the columns' data types:
date_cols <- grep('DATE$', names(df))
df[date_cols] <- lapply(df[date_cols], as.Date)
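One caveat: as.Date() on character input tries only the formats %Y-%m-%d and %Y/%m/%d unless told otherwise, so other layouts may fail or be misparsed. A minimal sketch, assuming the date columns hold day/month/year strings (as the "%d/%m/%Y" in the question suggests) and using made-up column names and values:

```r
# Hypothetical sample standing in for the real file; every column is character,
# as it would be after reading with cols(.default = "c")
df <- data.frame(
  CompanyName = c("ACME LTD", "EXAMPLE PLC"),
  IncorporationDATE = c("31/01/2005", NA),
  stringsAsFactors = FALSE
)

# Pick the date columns by name pattern
date_cols <- grep("DATE$", names(df))

# Pass the format explicitly; missing values stay NA
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")

str(df)
```

Extra arguments given to lapply() after the function are forwarded to it, so format reaches as.Date() for every selected column.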

Iterate through a data frame in R to change column types

I come from Python and am not sure how to accomplish this in R. I want to write a function that takes two arguments: a data frame and a list of column names. I want to iterate through the data frame and convert the columns whose names match the ones in the list.
This is the list of column names I want to convert; its type is character:
col.names<-c('Ri','Na','Mg')
I wrote this function, but it does not return the desired output:
function.convert<- function(df,col.names){
for (i in colnames(df)) {
if (i %in% col.names){
as.factor(i)}
}}
My desired output is the same data frame, but with the specified columns converted to factors.
You can do
df[col.names] <- do.call(cbind.data.frame, lapply(df[col.names], as.factor))
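or using dplyr (in dplyr 1.0.0 and later, mutate(across(all_of(col.names), as.factor)) is the preferred form)
df <- df %>% mutate_at(col.names, as.factor)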
Or as a function
f <- function(df, col.names) {
df[col.names] <- do.call(cbind.data.frame, lapply(df[col.names], as.factor))
df
}
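The loop in the question does nothing visible for two reasons: as.factor(i) converts the column name (a string), not the column itself, and the result is never stored or returned. A minimal sketch of a corrected function follows; the name convert_to_factor and the sample values are assumptions for illustration:

```r
# Convert the named columns of df to factors and return the modified data frame
convert_to_factor <- function(df, col.names) {
  # Keep only the names that actually exist in df, so unknown names are ignored
  cols <- intersect(col.names, names(df))
  # Assigning a list to df[cols] replaces those columns one by one
  df[cols] <- lapply(df[cols], as.factor)
  df
}

# Hypothetical sample data
glass <- data.frame(
  Ri = c(1.52, 1.51, 1.52),
  Na = c(13.6, 13.9, 13.6),
  Mg = c(4.5, 3.6, 4.5),
  Al = c(1.1, 1.4, 1.1)
)

glass <- convert_to_factor(glass, c("Ri", "Na", "Mg"))
str(glass)
```

Unlike Python, R passes data frames by value, so the function has to return the modified copy and the caller has to assign it.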

For Loop to convert string to list

I have a column in a data frame that contains string values. I want to convert these values to lists of characters. When I try to execute the following code:
library(tidyverse)
col <- c("a,b,c,d","e,f,h")
df <- data_frame(col)
for (i in 1:length(df$col)) {
df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]],",")),mode ="list")
}
I get this error message:
Error in df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]], ",")), : more elements supplied than there are to replace
Is there a way to convert all the values in the column to lists?
Thanks
If I understand your question correctly, then this will do the trick:
rapply(df, list)
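The loop in the question fails because df$col is a character vector, and each of its elements can hold only a single string. A minimal sketch of an alternative, using base R's strsplit() to replace the whole column with a list column in one step (a plain data.frame is used here instead of a tibble to keep the example dependency-free):

```r
# Sample data from the question
df <- data.frame(col = c("a,b,c,d", "e,f,h"), stringsAsFactors = FALSE)

# strsplit() is vectorised: it returns a list with one character vector per row
df$col <- strsplit(df$col, ",", fixed = TRUE)

# Each cell now holds the split pieces
print(df$col[[1]])

# If every cell must itself be a list rather than a character vector:
# df$col <- lapply(df$col, as.list)
```

Replacing the column in a single assignment avoids the type mismatch that arises when list values are written into a character vector one element at a time.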

In R, how do I treat a parameter as a variable, where that variable is the name of its contents? [duplicate]

I have this sample code to create a new data frame 'new_data' from the existing data frame 'my_data'.
new_data = NULL
n = 10 # this number corresponds to the number of rows in my_data
conditions = c("Bas_A", "Bas_T", "Oper_A", "Oper_T") # these strings are the target column names in my_data
for (cond in conditions){
for (i in 1:n){
new_data <- rbind(new_data, c(cond, my_data$cond[i]))
}
}
The problem is that my_data$cond (where cond is a variable, and not the column name) is not accepted.
How can I call a column of a data frame by using, after the dollar sign, a variable value?
To access a column, use:
my_data[ , cond]
or
my_data[[cond]]
The ith row can be accessed with:
my_data[i, ]
Combine both to obtain the desired value:
my_data[i, cond]
or
my_data[[cond]][i]
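To see the difference in practice, here is a minimal sketch that builds the long-format table from the question without the inner loop. The contents of my_data and the output column names condition and value are made up for illustration:

```r
# Hypothetical sample data with the four target columns
my_data <- data.frame(
  Bas_A = 1:3,
  Bas_T = 4:6,
  Oper_A = 7:9,
  Oper_T = 10:12
)
conditions <- c("Bas_A", "Bas_T", "Oper_A", "Oper_T")

cond <- "Bas_T"
# $ takes the name literally, so this looks for a column called "cond"
print(is.null(my_data$cond))
# [[ ]] uses the value stored in cond
print(my_data[[cond]])

# One small data frame per condition, stacked with rbind
new_data <- do.call(rbind, lapply(conditions, function(cond) {
  data.frame(condition = cond, value = my_data[[cond]])
}))
print(new_data)
```

Building each piece as a data frame keeps the value column numeric, whereas rbind on c(cond, value) vectors coerces everything to character.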
I think you need get().
For example, get(x, list), where list is the list and x is a variable holding the element name as a string, returns the same element that $ would return for that name.
The difference is that in get(x, list) x can be a variable, whereas the name after $ is always taken literally.
$ works on columns, not individual column objects. It's a form of vectorization. The code
corrections$BookDate = as.Date(corrections$BookDate, format = "%m/%d/%Y")
converts the contents of the BookDate column of the corrections table from strings to Date objects. It performs it in one operation, assignment.
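Because cond holds the column name as a string, look the column up with [[ ]] rather than $ (my_data$cond looks for a column literally named cond):
new_data <- rbind(new_data, c(cond, my_data[[cond]]))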

Eliminate dataframe rows that match a character string

I have a dataframe rawdata with columns that contain ecological information. I am trying to eliminate all of the rows for which the column LatinName matches a vector of species for which I already have some data, and create a new dataframe with only the species that are missing data. So, what I'd like to do is something like:
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")
# obviously these are a random subset; the real vector has ~16,000 values
rawdata_missing <- rawdata %>% filter(LatinName != "matches")
This doesn't work because the boolean operator can't be applied to a character string. Alternatively I could do something like this:
rawdata_missing <- filter(rawdata, !grepl(matches, LatinName)
This doesn't work either because !grepl also can't use the character string.
I know there are a lot of ways I could subset rawdata using the rows where LatinName IS in matches, but I can't figure out a neat way to subset rawdata such that LatinName is NOT in matches.
Thanks in advance for the help!
Use %in% and negate the result:
filteredData <- rawdata[!(rawdata$LatinName %in% matches), ]
Another way, using subset, paste, mapply and grepl, is:
filteredData <- subset(rawdata, mapply(grepl, rawdata$LatinName, paste(matches, collapse = "|")) == FALSE)
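A minimal runnable sketch of the %in% approach follows. The species in matches come from the question; the other rows and the Depth column are made up for illustration:

```r
# Hypothetical sample of rawdata
rawdata <- data.frame(
  LatinName = c("Thunnus thynnus", "Gadus morhua",
                "Homarus americanus", "Salmo salar"),
  Depth = c(100, 200, 50, 10),
  stringsAsFactors = FALSE
)
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")

# %in% tests each LatinName for exact membership in matches; ! inverts the result
rawdata_missing <- rawdata[!(rawdata$LatinName %in% matches), ]
print(rawdata_missing)

# With dplyr attached, the equivalent is:
# rawdata_missing <- dplyr::filter(rawdata, !LatinName %in% matches)
```

%in% compares whole strings, so it avoids the partial matches and regular-expression special characters that a grepl-based filter would have to deal with.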
