I've seen a few articles on how to use if statements or conditionals inside a pipe, but I'm not sure how to apply them to my situation. Along with a specific answer to my problem, I was also hoping for a more general explanation of how to add an if statement to a pipe, so that I can handle most situations.
I tried to learn from the answer below (use if() to use select() within a dplyr pipe chain), but I don't understand why we supply "." as an argument on the third line below, or when I should do so.
mtcars %>%
group_by(cyl) %>%
{ if (cond) filter(., am == 1) else . } %>%
summarise(m = mean(wt))
Here's a sample of my data:
df_parse<-
structure(list(value = c("HURESPLI\t2\tLINE NUMBER OF THE RESPONDENT\tCURRENT\t22 - 23",
"FILLER\t2\t\t27 - 28", "HUBUSL1\t2\tENTER LINE NUMBER\t81 - 82",
"GEDIV\t1\tDIVISION\t91 - 91", "GESTFIPS\t2\tFEDERAL INFORMATION\t93 - 94"
), starts_with_position = c(TRUE, TRUE, TRUE, TRUE, TRUE), missing_vars = c("HUFINAL\t FINAL OUTCOME CODE\t 24 - 26",
"HETENURE\t ARE YOUR LIVING QUARTERS... (READ ANSWER CATEGORIES)\t 29 - 30",
"FOR HUBUS = 1 VALID ENTRIES 83 - 84", " 92 - 92", " 95 - 95"
)), row.names = c(NA, 5L), class = "data.frame")
I'm trying to separate out the missing_vars column using extract (tidyr) and gsub as shown below:
df_parse<-
df_parse %>%
mutate(dup_value2 = missing_vars) %>%
extract(col = dup_value2, into = "position2", regex = "(\\d+\\s*-\\s*\\d+)$") %>%
mutate(id2 = gsub(pattern = "\\t.*", replacement = "", x = missing_vars)) %>%
mutate(desc2 = gsub(".*\\\t\\d+\\\t", replacement = "", x = missing_vars)) %>%
mutate(desc2 = gsub("(\\d+\\s*-\\s*\\d+)$", replacement = "", x = missing_vars))
This works fine, but I wanted to add a conditional on the start of this pipe, where df_parse$starts_with_position == TRUE
Something like this? (I know it doesn't work)
df_parse %>% if(starts_with_position==TRUE){
mutate(dup_value2 = missing_vars) %>%
extract(col = dup_value2, into = "position2", regex = "(\\d+\\s*-\\s*\\d+)$") %>%
mutate(id2 = gsub(pattern = "\\t.*", replacement = "", x = missing_vars)) %>%
mutate(desc2 = gsub(".*\\\t\\d+\\\t", replacement = "", x = missing_vars)) %>%
mutate(desc2 = gsub("(\\d+\\s*-\\s*\\d+)$", replacement = "", x = missing_vars))
}else ""
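A minimal sketch of the two cases, assuming dplyr is installed. The toy data frame and its columns (flag, raw, id) are hypothetical and not part of the question's data. Inside braces, %>% no longer inserts the left-hand side as the first argument, which is why "." has to be written out wherever the data is needed. A condition that differs per row cannot go in if(), which expects a single TRUE or FALSE; a vectorised if_else() inside mutate() is one option there.

```r
library(dplyr)

cond <- TRUE

# Whole-table condition: the braces stop %>% from placing the data as the
# first argument, so `.` is written explicitly in both branches.
res <- mtcars %>%
  group_by(cyl) %>%
  { if (cond) filter(., am == 1) else . } %>%
  summarise(m = mean(wt))

# Row-wise condition: if_else() decides separately for every row.
# `toy`, `flag`, `raw` and `id` are made-up names for illustration.
toy <- data.frame(
  flag = c(TRUE, FALSE),
  raw  = c("A\t1 - 2", "B\t3 - 4"),
  stringsAsFactors = FALSE
)
toy <- toy %>%
  mutate(id = if_else(flag, gsub("\\t.*", "", raw), NA_character_))
```

Rows where flag is FALSE keep an NA in id instead of being parsed, which is probably closer to the intent than wrapping the whole pipeline in a single if.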
I have a list something like this:
my_data<- list(c(dummy= 300), structure(123.7, .Names = ""),
structure(143, .Names = ""), structure(113.675, .Names = ""),
structure(163.75, .Names = ""), structure(656, .Names = ""),
structure(5642, .Names = ""), structure(1232, .Names = ""))
I want the minimum and maximum values from this list.
I have tried using
min(my_data)
max(my_data)
But I get an error: Error in min(weighted_mae) : invalid 'type' (list) of argument
typeof(my_data) #[1] "list"
class(my_data) #[1] "list"
What is the right way for getting the minimum and maximum from my_data?
You could do:
my_data |>
unlist(use.names = FALSE) |>
range()
The following is the same, without piping:
range(unlist(my_data, use.names = FALSE))
If you want to get minimum and maximum values separately, then you could do:
min(unlist(my_data, use.names = FALSE))
max(unlist(my_data, use.names = FALSE))
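As a small illustration of why the unlist() step matters, here is a sketch on a shortened, made-up version of the list. min() and max() work on atomic vectors, so the list is flattened to a plain numeric vector first.

```r
# Shortened stand-in for my_data (values chosen for illustration)
my_data <- list(c(dummy = 300), 123.7, 143, 113.675)

# Flatten the list into an unnamed numeric vector
flat <- unlist(my_data, use.names = FALSE)

is_atomic <- is.numeric(flat)  # TRUE, so min(), max() and range() now apply
rng <- range(flat)             # smallest value first, largest second
```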
This is the whole function. I need to convert its for loops to apply calls to make it faster:
plans_achievements <- function(pa_m,pa_q){
if(nrow(pa_m)==0 & nrow(pa_q==0)){
df = data.frame(a = c(""), b = c("No Data Available"))
colnames(df)=""
}else{
pa_m= pa_m%>% select(inc,month_year,Plans,Achievements,quarter_year)
colnames(pa_mon)[2] = "Period"
pa_q= pa_q%>% select(inc,quarter_year,Plans,Achievements)
colnames(pa_qtr)[2] = "Period"
df = data.frame(inc=c(""),Period=c(""),Plans=c(""),Achievements=c(""))
for (q in unique(pa_q$Period)){
df1 = pa_q[pa_q$Period==q,]
df1$Period = paste0("<span style=\"color:#288D55\">",df1$Period,"</span>")
df1$Plans = paste0("<span style=\"color:#288D55\">",df1$Plans,"</span>")
df1$Achievements = paste0("<span style=\"color:#288D55\">",df1$Achievements,"</span>")
df = rbind(df,df1)
for (m in unique(pa_m$quarter_year)){
if(m==q){
df2 = pa_m[pa_m$quarter_year==q,][-5]
df = rbind(df,df2)
}
}
}
df = df[-1,]
}
return(df)
}
The apply version I tried:
my_fun <- function(q){
df1 = pa_qtr[pa_qtr$Period==q,]
df1$Period = paste0("<span style=\"color:#288D55\">",df1$Period,"</span>")
df1$Plans = paste0("<span style=\"color:#288D55\">",df1$Plans,"</span>")
df1$Achievements = paste0("<span style=\"color:#288D55\">",df1$Achievements,"</span>")
df = rbind(df,df1)
}
df = do.call(rbind,lapply(unique(pa_qtr$Period), my_fun))
my_fun2 <- function(m,my_fun){
if (m == q) {
df2 = pa_mon[pa_mon$qtr_yr == q, ][-5]
df = rbind(df,df2)
}
}
df = do.call(cbind,lapply(unique(pa_mon$qtr_yr), my_fun2))
DT::datatable(plans_achievements(pa_m[pa_m$inc=="vate",],pa_q[pa_q$inc=="vate",]), rownames = F,escape = FALSE,selection=list(mode="single",target="row"),options = list(pageLength = 50,scrollX = TRUE,dom = 'tp',ordering=F,columnDefs = list(list(visible=FALSE, targets=c(0)),list(className = 'dt-left', targets = '_all'))))
Why you get the error comparison is possible only for atomic and list types
I will answer your original question first:
You get the error because q is not defined as a variable inside the function my_fun2. When R cannot find a name inside a function, it searches the enclosing environments and the attached packages, and there it finds the base function q() (used to quit R). So you get the error message comparison (1) is possible only for atomic and list types because R thinks you are trying to compare the value m with the function q.
Here is a small example to make it easy to see:
# Run this in a clean environment
m <- 1
m == b # Understandable error message - "b" is not found
m == q # Your error - because R thinks you are comparing m to a function
You fix this error by making sure that q is defined inside your function, either by creating it there or by supplying it as an input argument.
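A minimal sketch of the second option, using made-up data in place of pa_mon. The helper name rows_for_quarter and the toy values are assumptions for illustration only.

```r
# Hypothetical toy data standing in for pa_mon
pa_mon <- data.frame(
  qtr_yr = c("Q1", "Q1", "Q2"),
  Plans  = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# q is an explicit argument here, so R never falls back to base::q()
rows_for_quarter <- function(q, data) {
  data[data$qtr_yr == q, ]
}

# lapply() passes each quarter as q; data is forwarded as a named argument
pieces <- lapply(unique(pa_mon$qtr_yr), rows_for_quarter, data = pa_mon)
result <- do.call(rbind, pieces)
```

Passing the data frame as an argument as well keeps the function independent of whatever happens to exist in the global environment.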
A possible solution for your problem
As I understand your code, you want to format, merge and sort the values in pa_q and pa_m in order to display them in an HTML table.
Below is a possible solution that uses tidyverse and vectorized operations rather than a loop or apply functions. Since you want to optimize your code, note that vectorized functions are typically the fastest option in R.
library(dplyr)
plans_achievements <- function(pa_m, pa_q) {
# I've modified the logic a bit: there is no need to wrap the full function in
# an else statement, since we can return early if the data has no rows
if (nrow(pa_m) == 0 && nrow(pa_q) == 0) {
df = data.frame(a = c(""), b = c("No Data Available"))
colnames(df) = ""
return(df)
}
pa_q <-
pa_q %>%
# Select and rename the columns we need
select(inc, Period = quarter_year, Plans, Achievements, date) %>%
# Format the values
mutate(
Period = paste0("<span style=\"color:#288D55\">", Period,"</span>"),
Plans = paste0("<span style=\"color:#288D55\">", Plans,"</span>"),
Achievements = paste0("<span style=\"color:#288D55\">", Achievements,"</span>")
)
pa_m <-
pa_m %>%
# Select and rename the columns we need
select(inc, Period = month_year, Plans, Achievements, date)
# Combine the datasets
bind_rows(
pa_q,
pa_m
) %>%
# Make sure that R understands date as a date value
mutate(
date = lubridate::dmy(date)
) %>%
# Sort by date
arrange(desc(date)) %>%
# Remove columns we do not need
select(-date, -inc)
}
DT::datatable(
plans_achievements(
pa_m[pa_m$inc=="vate",],
pa_q[pa_q$inc=="vate",]
),
rownames = FALSE,
escape = FALSE,
selection = list(mode = "single", target = "row"),
options = list(
pageLength = 50,
scrollX = TRUE,
dom = 'tp',
ordering = FALSE,
columnDefs = list(
list(className = 'dt-left', targets = '_all')
)
)
)
Hopefully this solves your problem.
I have a data frame as follows.
The actual data runs into hundreds of rows and columns.
The objective is to spread "Attribute Value" against each of the columns V1, V2, ..., VN.
That is, the dates that appear in columns V1 to VN should become column names,
and the corresponding "Attribute Value" should appear under each of them.
df1 <- data.frame(ROW_ID = c("23416","23416","23416"),
Process_ID = c("SLT","SLT","SLT"),
Operation_Code = c("SLT","SLT","SLT"),
Resource_Group_Code = c("BD","BD","BT"),
Location_Code = c("JS","JS","JS"),
Resource_Code = c("B-T234","B-T234","B-T234"),
Resource_Desc = c("699","699","699"),
iDeleteFlag = c("N","N","N"),
Attribute_Code = c("RA002","RA002","RA002"),
Attribute_Value = c("266","269","298"),
Capacity_Type = c("s","s","s"),
Planning_Version = c("PDMT","PDMT","PDMT"),
"V1"= c("2021-10-10", "2021-10-31", "2021-11-07"),
"V2"= c("2021-10-17", "", "2021-11-14"),
"V3" = c("2021-10-24", "", "2021-11-21"),
"V4" = c("", "2021-11-07", ""),
"V5" = c("", "2021-11-21", ""))
The required output is as follows:
df2 <- data.frame(ROW_ID = c("23416","23416","23416"),
Process_ID = c("SLT","SLT","SLT"),
Operation_Code = c("SLT","SLT","SLT"),
Resource_Group_Code = c("BD","BD","BT"),
Location_Code = c("JS","JS","JS"),
Resource_Code = c("B-T234","B-T234","B-T234"),
Resource_Desc = c("699","699","699"),
iDeleteFlag = c("N","N","N"),
Attribute_Code = c("RA002","RA002","RA002"),
Capacity_Type = c("s","s","s"),
Planning_Version = c("PDMT","PDMT","PDMT"),
"2021-10-10"= c("266", "", ""),
"2021-10-17"= c("266", "", ""),
"2021-10-24" = c("266", "", ""),
"2021-10-31" = c("", "269", ""),
"2021-11-07" = c("", "269", "298"),
"2021-11-14" = c("", "", "298"),
"2021-11-21" = c("", "269", "298"))
My code is as follows, but it does not give the required output:
RA002variable_2021ANeg <- gather(RA002variable_2021ANeg, key, value, -
ROW_ID, - Process_ID, - Operation_Code, - Resource_Group_Code, -
Location_Code, - Resource_Code, - Resource_Desc, -
iDeleteFlag, - Attribute_Code1, - Capacity_Type, -
Planning_Version, -Attribute_Value) %>%
mutate(key =( Attribute_Value)) %>%
select(- Attribute_Value) %>%
spread(key, value)
gather and spread have been superseded by pivot_longer and pivot_wider. While gather and spread still work, it's best we all get used to the new functions.
Since your ROW_ID is not unique for each row, I create an additional index column (simply the row number), gather columns V1:V5, and spread the dates into column names:
df1 %>%
mutate(index = row_number()) %>%
pivot_longer(V1:V5, names_to = "name", values_to = "value") %>%
select(-name) %>%
filter(value != "") %>%
pivot_wider(names_from = "value", values_from = "Attribute_Value")
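The required output uses empty strings rather than NA and has the date columns in chronological order. A sketch of one way to get both, assuming a tidyr version in which pivot_wider() accepts a scalar values_fill and the names_sort argument (1.1.0 or later, to my knowledge). The toy data frame is a cut-down, made-up version of df1.

```r
library(dplyr)
library(tidyr)

# Cut-down stand-in for df1 (two rows, two date columns)
toy <- data.frame(
  Attribute_Value = c("266", "269"),
  V1 = c("2021-10-10", "2021-10-31"),
  V2 = c("2021-10-17", ""),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  mutate(index = row_number()) %>%
  pivot_longer(V1:V2, names_to = "name", values_to = "value") %>%
  select(-name) %>%
  filter(value != "") %>%
  pivot_wider(
    names_from  = value,
    values_from = Attribute_Value,
    values_fill = "",    # empty string instead of NA for missing cells
    names_sort  = TRUE   # order the new date columns alphabetically
  )
```

Alphabetical order matches chronological order here only because the dates are in ISO format (YYYY-MM-DD).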
I see this is a common issue but I can't understand what to do from reading other posts or trying to understand functional programming which is new to me. Functions are closures in R, encapsulating the environment they were created in? The code I have is:
# Remove numbers from text
minus_TextNum <- function(df, new.df){
new.df <- mutate(df, text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>% # and/or whatever's in brackets
unnest_tokens(input = text, output = word) %>%
filter(!word %in% c(stop_words$word, "patient")) %>%
group_by(id) %>%
summarise(text = paste(word, collapse = " "))
return(new.df)
}
minus_TextNum(TidySymptoms)
Error is as follows:
Error: Problem with mutate() column text. ℹ text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = ""). x cannot coerce
type 'closure' to vector of type 'character'
I don't understand what type closure is, and this is a simple function that works on a simple dataset I created to test. Problem arises when I use the real-world dataset.
Any feedback appreciated. Reproducible sample below:
# Remove numbers and/or anything in brackets
# Test Data
mydata <- data.frame(id = 1:8,
text = c("112773 Nissan Micra, Car, (10 pcs)",
"112774 Nissan Micra, Car, (10 pcs)",
"112775 Nissan Micra, Car, (10 pcs)",
"112776 Volkswagon Beetle, Car, (3 pcs)",
"112777 Toyota Corolla, Car, (12 pcs)",
"112778 Nissan Micra, Car, (10 pcs)",
"112779 Toyota Prius, Car, (9 pcs)",
"112780 Toyota Corolla, Car, (12 pcs)"),
stringsAsFactors = F)
library(dplyr)
library(tidytext)
# remove numbers from text data
data(stop_words)
minus_TextNum <- function(df, new.df){
new.df <- mutate(df, text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>% # and/or whatevers in brackets
unnest_tokens(input = text, output = word) %>%
filter(!word %in% c(stop_words$word, "car")) %>%
group_by(id) %>%
summarise(text = paste(word, collapse = " "))
return(new.df)
}
minus_TextNum(mydata)
dput(head(TidySymptoms, n = 10))
structure(list(word = c("epiglottis", "swelled", "hinder", "swallowing",
"pictures", "benadryl", "tylenol", "approximately", "30", "min"
)), row.names = c(NA, 10L), class = "data.frame")
The TidySymptoms data has no id column. Assuming that is an oversight and the column exists in your real data, you can make the following changes to the function.
There is no need to pass new.df to the function.
The column in TidySymptoms is called word, but the function uses text. Because no text column exists, R finds the function text() instead, and a function is the "closure" that the error message refers to.
Try this code.
minus_TextNum <- function(df){
df.new <- mutate(df, text = gsub(x = word, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>%
unnest_tokens(input = text, output = word) %>%
filter(!word %in% c(stop_words$word, "patient")) %>%
group_by(id) %>%
summarise(text = paste(word, collapse = " "))
return(df.new)
}
minus_TextNum(TidySymptoms)
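To see where the word "closure" comes from, here is a small base R sketch. It calls graphics::text explicitly, which is the function the bare name text usually resolves to when no column of that name exists. The guard function check_has_column is a hypothetical helper, not part of any package.

```r
# A function defined in R code has type "closure"
fun_type <- typeof(graphics::text)

# Passing a function where gsub() expects character data triggers the error
msg <- tryCatch(
  gsub("[0-9]+", "", graphics::text),
  error = function(e) conditionMessage(e)
)

# Hypothetical guard: fail early with a clearer message
check_has_column <- function(df, col) {
  if (!col %in% names(df)) {
    stop("column '", col, "' not found in data", call. = FALSE)
  }
  invisible(TRUE)
}

TidySymptoms <- data.frame(word = c("epiglottis", "swelled"))
guard_msg <- tryCatch(
  check_has_column(TidySymptoms, "text"),
  error = function(e) conditionMessage(e)
)
```

A check like this at the top of minus_TextNum() would report the missing column directly instead of the closure message.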
get_bike_data <- function(url) {
html_bike_category <- read_html(url)
# Get the names
bike_name_tbl <- html_bike_category %>%
html_nodes(css = ".catalog-category-bikes__title-text") %>%
html_text() %>%
str_remove_all(pattern = "\n") %>%
enframe(name = "position", value = "name")
# Get the prices
bike_price_tbl <- html_bike_category %>%
html_nodes(css = ".catalog-category-bikes__price-title") %>%
html_text() %>%
str_remove_all(pattern = "\\.")%>%
extract_numeric()%>%
enframe(name = "position", value = "price_euro") %>%
left_join(bike_name_tbl)
}
# 2.3.1b Alternative with a for loop
# Create an empty tibble, that we can populate
# Loop through all urls
bike_data_tbl <- bike_data_tbl %>%
rename("model" = "name")%>%
subset(nchar(price_euro)!=0)
bike_data_tbl
This is price and model data scraped from a website. I want to change 1.699 to 1,699. I tried several methods that I found by searching (format(decimal.mark=","), parse.number(), sub(), etc.), but none of them work.
What is the problem?
Below is a possible solution:
library(stringr)
text<-c('1231.1','4343.5','312312.0')
str_replace(string = text,pattern = "[.]",replacement = ",")
[1] "1231,1" "4343,5" "312312,0"
Another possible solution is:
num_text<-c(1231.1,4343.5,312312.0)
gsub("\\.", ",", num_text)
[1] "1231,1" "4343,5" "312312"
Both gsub and format should work:
#format
format(bike_data_tbl$price_euro, decimal.mark = ",")
#gsub
gsub(pattern = ".", x = bike_data_tbl$price_euro, replacement = ",", fixed = TRUE)
However, it seems that the prices are in thousands (e.g. 1.699 for the ground control model means 1699 euros). You could try this:
as.numeric(gsub(pattern = ".", x = bike_data_tbl$price_euro, replacement = "", fixed = TRUE))
The last function replaces all dots with nothing.
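If the goal is to display the cleaned number with a comma as the thousands separator, a sketch along these lines may help. It assumes the scraped prices are character strings that use a dot as the thousands separator; the sample values are made up.

```r
# Made-up prices in the scraped format (dot = thousands separator)
price_chr <- c("1.699", "2.499", "899")

# Drop the dots to get the real numeric value
price_num <- as.numeric(gsub(".", "", price_chr, fixed = TRUE))

# Format for display; trim = TRUE avoids padding to a common width
price_lbl <- format(price_num, big.mark = ",", trim = TRUE)
```

Keeping price_num numeric for calculations and using price_lbl only for display avoids mixing the two representations.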