Write a function in R to change a group of datasets layout

I have many datasets in tibble format, with variables as rows. I want to change the layout and wrangle each dataset. To save myself from repetitive work and the risk of making mistakes, I wrote this function in R:
library(tidyverse)
change_data_layout <- function(data_df) {
  data_df_2 <- data_df %>% mutate(samples = colnames()) %>% t()
  colnames(data_df_2) <- data_df_2[1, ]
  rownames <- rownames(data_df_2)[2:nrow(data_df_2)]
  data_df_3 <- data_df_2[1:nrow(data_df_2), ] %>% as_tibble() %>% mutate(samples = rownames)
  colnames(data_df_3) <- data_df_3[1, ]
  data_df_4 <- data_df_3[2:nrow(data_df_3), ]
  data_final <- data_df_4 %>%
    mutate_each(funs(type.convert)) %>% mutate_if(is.factor, as.character)
  return(data_final)
}
However, when I run this function as:
dataset1_final <- change_data_layout(dataset1)
I get this error message:
Error: argument "x" is missing, with no default
Called from: mutate_impl(.data, dots)
Any help and suggestions?
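
One likely cause of the error is that colnames() is called with no argument inside mutate(), and colnames() requires its x argument. As a possible alternative to transposing with t(), here is a minimal sketch that reshapes with tidyr. It assumes (this is not stated in the question) that the first column holds the variable names and every other column is one sample; the example data and the column name "samples" are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sketch only: assumes column 1 holds variable names, all other columns are samples.
change_data_layout <- function(data_df) {
  var_col <- names(data_df)[1]
  data_df %>%
    # Make every column character so mixed-type rows can be stacked together
    mutate(across(everything(), as.character)) %>%
    # One row per (variable, sample) pair
    pivot_longer(cols = -all_of(var_col), names_to = "samples", values_to = "value") %>%
    # One row per sample, one column per variable
    pivot_wider(names_from = all_of(var_col), values_from = "value") %>%
    # Let R guess the column types again; as.is = TRUE keeps strings as character
    type.convert(as.is = TRUE)
}

# Hypothetical input: variables as rows, samples as columns
dataset1 <- tibble(
  variable = c("age", "name"),
  s1 = c("1", "a"),
  s2 = c("2", "b")
)
dataset1_final <- change_data_layout(dataset1)
```

The same function can then be applied to a list of datasets with lapply() or purrr::map().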

Related

Problem with mutate when trying to create a line_id column

I need to create a line ID column within a data frame for further pre-processing steps. The code worked fine up until yesterday. Today, however, I am facing this error message:
"Error in mutate():
ℹ In argument: line_id = (function (x, y) ....
Caused by error:
! Can't convert y to match type of x ."
Here is my code - the dataframe consists of two character columns:
split_text <- raw_text %>%
  mutate(text = enframe(strsplit(text, split = "\n", ))) %>%
  unnest(cols = c(text)) %>%
  unnest(cols = c(value)) %>%
  rename(text_raw = value) %>%
  select(-name) %>%
  mutate(doc_id = str_remove(doc_id, ".txt")) %>%
  # removing empty rows + add line_id
  mutate(line_id = row_number())
Besides row_number(), I also tried rowid_to_column, and even c(1:1000) - the length of the dataframe. The error message stays the same.

Try explicitly specifying the data type of the "line_id" column as an integer using the as.integer() function, like this:
mutate(line_id = as.integer(row_number()))

This code works but is not fully satisfying, since I have to break the pipe:
split_text$line_id <- as.integer(c(1:nrow(split_text)))

How to change variable to factor based on its name in some list by using across?

(I am new to R.)
I am trying to convert variables of the data frame members to factors if their names appear in the list to_factors_list.
I have tried some code using mutate(across()), but it gives errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
  map_df(~(data.frame(n_distinct = n_distinct(.x))),
         .id = "var_name") %>%
  filter(n_distinct < 15) %>%
  select(var_name) %>% pull()
to_factors_list
############### output ###############
'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Both of the attempts below give errors:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what's wrong or how I can make this work.

In base R, this can be done with lapply():
members[to_factors_list] <- lapply(members[to_factors_list], factor)

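The correct syntax selects the columns by name; wrapping the character vector in all_of() avoids the deprecation warning that recent tidyselect versions give for bare external vectors:
members %>% mutate(across(all_of(to_factors_list), factor))
Or, if you prefer the older dplyr syntax:
members %>% mutate_at(vars(all_of(to_factors_list)), factor)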
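
To see the difference between selecting by name and selecting by a predicate, here is a small self-contained sketch. The climbers tibble is made-up stand-in data, not the real members file. The original attempts most likely fail because the first argument of across() is a column selection, and a bare formula such as ~.x %in% to_factors_list is not one; even wrapped in where(), it would test each column's values rather than its name.

```r
library(dplyr)

# Made-up stand-in for the members data
climbers <- tibble(
  season = c("Spring", "Autumn", "Spring"),
  sex    = c("M", "F", "M"),
  age    = c(34, 28, 41)
)
to_factors_list <- c("season", "sex")

# Select by name: all_of() takes a character vector of column names
by_name <- climbers %>%
  mutate(across(all_of(to_factors_list), factor))

# Select by a property of the column contents: where() takes a predicate
# that is applied to each column and must return a single TRUE or FALSE
by_rule <- climbers %>%
  mutate(across(where(~ n_distinct(.x) < 3), factor))
```

The where() version skips the intermediate list of names entirely, which may be simpler if the list is only ever used for this conversion.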

How to append a Date filtering condition to my existing filtering codes in R?

I need to create a subset of my main data frame (mydata1) in R.
The Date column in mydata1 has already been converted to Date using the following code:
mydata1$Date = as.Date(mydata1$Date)
I currently use the following code to create the subset of my data:
mydata3 <- mydata1 %>%
  filter(Total.Extras.Per.GN >= 100) %>%
  filter(Original.Meal.Plan.Code %in% target) %>%
  filter(Date, between ("2017-01-01"), ("2017-06-01")) %>%
  select(PropertyCode, Date, Market, Original.Meal.Plan.Code, GADR, Total.Extras.Per.GN)
However, the line filter(Date, between ("2017-01-01"), ("2017-06-01")) %>% is giving me an error. How do I write it properly so that it filters my Date column with the dates specified therein?
Error message:
Error in filter_impl(.data, dots) :
argument "left" is missing, with no default

Simply pass Date as the first argument of between() and wrap the date strings in as.Date() for the comparison:
mydata3 <- mydata1 %>%
  filter(Total.Extras.Per.GN >= 100) %>%
  filter(Original.Meal.Plan.Code %in% target) %>%
  filter(between(Date, as.Date("2017-01-01"), as.Date("2017-06-01"))) %>%
  select(PropertyCode, Date, Market, Original.Meal.Plan.Code, GADR, Total.Extras.Per.GN)
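
As a quick check of how between() behaves, here is a small sketch on made-up data (the bookings data frame is hypothetical and only borrows two column names from the question). Note that between() is inclusive on both ends, so rows dated exactly 2017-01-01 or 2017-06-01 are kept.

```r
library(dplyr)

# Made-up data with the same column names as in the question
bookings <- data.frame(
  Date = as.Date(c("2016-12-31", "2017-01-01", "2017-03-15",
                   "2017-06-01", "2017-06-02")),
  Total.Extras.Per.GN = c(150, 120, 90, 200, 300)
)

in_range <- bookings %>%
  filter(Total.Extras.Per.GN >= 100) %>%
  # between(x, left, right) is equivalent to x >= left & x <= right
  filter(between(Date, as.Date("2017-01-01"), as.Date("2017-06-01")))
```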

purrring with NULL listcolumns in R

library(tidyverse)
data(mtcars)
mtcars <- rownames_to_column(mtcars,var = "car")
mtcars$make <- map_chr(mtcars$car,~strsplit(.x," ")[[1]][1])
mt2 <- mtcars %>% select(1:8,make) %>% nest(-make,.key = "l")
mt4<-mt2[1:5,]
mt4[c(1,5),"l"] <- list(list(NULL))
Now, I'd like to run the following function for each make of car:
fun_mt <- function(df){
  a <- df %>%
    filter(cyl<8) %>%
    arrange(mpg) %>%
    slice(1) %>%
    select(mpg,disp)
  return(a)
}
mt4 %>% mutate(newdf=map(l,~possibly(fun_mt(.x),otherwise = "NA"))) %>% unnest(newdf)
However, the NULL entries fail to evaluate with:
Error: no applicable method for 'filter_' applied to an object of class "NULL"
I also tried the safely() and possibly() approach, but I still get an error message:
Error: Don't know how to convert NULL into a function
Any good solutions to this?

The problem is that NULL gets passed into the function fun_mt(). You wanted to catch this with possibly(). But possibly() is a function operator, i.e. you pass it a function and it returns a function. So, your call should have been
~ possibly(fun_mt, otherwise = "NA")(.x)
But this doesn't yet work with unnest(). Instead of a character "NA" (a bad idea anyway, rather use a proper NA) you would have to default to a data frame:
~ possibly(fun_mt, otherwise = data.frame(mpg = NA, disp = NA))(.x)
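
Put together, the approach could look like the sketch below. It uses a small hand-made nested tibble instead of the mtcars setup from the question, and the name safe_fun_mt is just an illustrative choice. Wrapping the function once, outside map(), avoids rebuilding the wrapper for every element.

```r
library(dplyr)
library(purrr)
library(tidyr)

fun_mt <- function(df) {
  df %>%
    filter(cyl < 8) %>%
    arrange(mpg) %>%
    slice(1) %>%
    select(mpg, disp)
}

# possibly() takes a function and returns a new function that yields
# `otherwise` instead of raising an error
safe_fun_mt <- possibly(
  fun_mt,
  otherwise = tibble(mpg = NA_real_, disp = NA_real_)
)

# Hand-made stand-in for the nested data; the second entry is NULL
nested <- tibble(
  make = c("Mazda", "Datsun", "Hornet"),
  l = list(
    tibble(mpg = c(21, 21), cyl = c(6, 6), disp = c(160, 160)),
    NULL,
    tibble(mpg = c(21.4, 18.7), cyl = c(6, 8), disp = c(258, 360))
  )
)

result <- nested %>%
  mutate(newdf = map(l, safe_fun_mt)) %>%
  select(-l) %>%
  unnest(newdf)
```

Because the fallback has the same columns and types as a normal result, unnest() can combine all rows, and the NULL entry shows up as a row of NA values.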

How to use rowwise with the do function with if/else or ifelse?

I have a data frame that contains a list column.
The data frame holds a JSON response in one column, and a second column is a list converted from that JSON using the following code:
vectorize_fromJSON <- Vectorize(fromJSON, USE.NAMES=FALSE)
z <- vectorize_fromJSON(data_df$json_response)
I am using rowwise with the do function to extract information from the list.
However, I am not able to use if with it.
Working code:
t <- data_df %>% rowwise %>% do(
  test = class(.$json_list$cbas$dslscc)
)
I want something like the following:
t <- data_df %>% rowwise %>% do(
  test = ifelse(class(.$json_list$cbas$dslscc)=="list", TRUE,
                .$json_list$cbas$dslscc)
)
This gives the following error:
Error in
.$json_list$clear_bank_attributes$days_since_last_successful_check_cashed$nil
: $ operator is invalid for atomic vectors
