I'm trying to subset columns in my dataframe to use them solely in a mutate function as part of conditional formatting for an HTML table with knitr::kable and kableExtra.
#Conditional formatting function
highlights <- function(x) { cell_spec(x, background = ifelse(!is.na(x), "#C9FFE5", "white")) } # x != NA is always NA, so test with is.na()
#build table
ds.tab <- ds%>%
mutate_if("column contains ANY NA values", funs(highlights(.)))%>% ...
I need to write the bit between brackets ("column contains ANY NA values") in R.
Thanks!
It should work if you use any(is.na(.)) such as the following:
ds.tab <- ds %>%
mutate_if(function(x) any(is.na(x)), funs(highlights(.))) %>% ...
Or if you prefer, the following syntax works the same way
ds.tab <- ds %>%
mutate_if(~any(is.na(.)), funs(highlights(.))) %>% ...
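Note that funs() is deprecated in newer dplyr releases. As a minimal sketch, assuming dplyr 1.0.0 or later, the same column selection can be written with across() and where(). The data frame and the mark_present() function below are made up for illustration; mark_present() only stands in for highlights() so the snippet runs without kableExtra.

```r
library(dplyr)

# Made-up example data: columns a and c contain NA values, b does not
ds <- data.frame(a = c(1, NA, 3),
                 b = c(4, 5, 6),
                 c = c(NA, "x", "y"),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for highlights(): labels each cell instead of styling it
mark_present <- function(x) ifelse(is.na(x), "missing", "present")

# Apply the function only to columns that contain at least one NA
ds.tab <- ds %>%
  mutate(across(where(~ any(is.na(.x))), mark_present))

print(ds.tab)
```

Column b is left untouched because it has no missing values.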
Related
I have a dataset with columns that contain information of a code + name, which I would like to separate into 2 columns. So, just an example:
Column E5000_A contains values like '0080002. ALB - Democratic Party' in one cell. I would like two columns: one containing the code 0080002, and the other containing the rest of the information.
I have 8 columns with very similar values (E5000_A through E5000_H). This is the code that I am writing.
cols2 <- c("E5000_A" , "E5000_B" , "E5000_C" , "E5000_D" ,
"E5000_E" , "E5000_F" , "E5000_G" , "E5000_H" )
for(i in cols2){
cses_imd_m <- cses_imd_m %>% mutate(substr(i, 1L, 7L))
}
But for some reason it is only generating a new column for the E5000_A and the loop does not go to the other variables. What am I doing wrong? Let me know if you need more details about the code or data frame.
data.frame approach
# to extract codes
library(dplyr)
library(stringr)
df %>%
mutate_at(.vars = vars(c("E5000_A", "E5000_B", "E5000_C", "E5000_D", "E5000_E",
"E5000_F", "E5000_G", "E5000_H")),
.funs = function(x) str_extract(x, "^\\d+")) # str_extract() takes the string first, then the pattern
You can also use across() inside of mutate().
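A rough sketch of the across() version, assuming dplyr 1.0.0 or later and stringr. The two-column data frame is made up for illustration (only the first value comes from the question), and the code_ and party_ prefixes are just one possible naming choice.

```r
library(dplyr)
library(stringr)

# Made-up example data in the "code. name" format described in the question
df <- data.frame(
  E5000_A = c("0080002. ALB - Democratic Party", "0080001. ALB - Example Party"),
  E5000_B = c("0080003. ALB - Another Party", NA),
  stringsAsFactors = FALSE
)
cols <- c("E5000_A", "E5000_B")

out <- df %>%
  mutate(
    # leading digits become the code column
    across(all_of(cols), ~ str_extract(.x, "^\\d+"), .names = "code_{.col}"),
    # everything after "digits." becomes the party column
    across(all_of(cols), ~ str_trim(str_remove(.x, "^\\d+\\.")), .names = "party_{.col}")
  )

print(out)
```

The original columns are kept, and the new columns are added next to them.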
If you want to use a for loop:
col_names <- c("E5000_A", "E5000_B", "E5000_C", "E5000_D", "E5000_E", "E5000_F", "E5000_G", "E5000_H")
for (i in col_names) {
# df[[i]] returns a plain vector for both data.frames and tibbles
df[[sprintf("code_%s", i)]] <- str_extract(df[[i]], "^\\d+") # string first, then pattern
df[[sprintf("party_%s", i)]] <- gsub(".*\\.", "", df[[i]]) %>% str_trim() # remove everything up to the dot (.)
}
I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting.
Essentially I have a df with these column names: count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
Ideally, I would like a function where I can select both "count_A.x" and "count_A.y" by simply specifying count_A as a function argument.
I tried the following:
pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the above code, i.e. pull_columns2(df, count_A), R rightfully says that column count_char.x does not exist; it does not "convert" count_char.x to count_A.x.
pull_columns2(df, count_A)
We can use
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
# then use it as follows
df %>% pull_columns2("count_A")
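One caveat: contains() matches any column whose name includes the string, so "count_A" would also pick up a column such as count_AB.x. If only the exact .x/.y pair is wanted, a possible variant is to build the two names and pass them to all_of(). This is only a sketch; pull_columns_exact and the example data are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: select exactly <prefix>.x and <prefix>.y
pull_columns_exact <- function(df, prefix) {
  df %>% select(all_of(paste0(prefix, c(".x", ".y"))))
}

# Made-up data with a near-miss column name (count_AB.x)
df <- data.frame(count_A.x = 1:2, count_AB.x = 3:4,
                 count_A.y = 5:6, count_B.y = 7:8)

exact <- pull_columns_exact(df, "count_A")
loose <- df %>% select(contains("count_A"))

names(exact)
names(loose)
```

all_of() also raises an error if one of the two columns is missing, which can be useful for catching typos in the prefix.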
Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")
I am performing data analysis and cleaning in R using the tidyverse.
I have a data frame with 23 columns containing the values 'NO', 'STEADY', 'UP' and 'down'.
I want to change all the values in these 23 columns to 0 in the case of 'NO' or 'STEADY', and to 1 otherwise.
I created a list named keys in which I keep all my column names. After that I use a for loop, ifelse statements and mutate.
Please have a look at the code below:
# Column names are kept in the list by name keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
After that, I used the following code to get the desired result:
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I was expecting it to make the changes I require, but nothing happens. (NO ERROR MESSAGE AND NO DESIRED RESULT)
After that, I researched further and executed the following code:
for (col in keys){
print(col)}
It gives me the elements of the list as characters, like "metformin".
So I thought maybe this is the issue. Hence, I used the code below to cast the keys as symbols:
keys_new = sym(keys)
After that I ran the same code again:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me the following error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
After all this, I also tried to create a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't perform any action. (NO ERROR MESSAGE AND NO DESIRED RESULT)
When keys_new is placed in this code:
for (col in keys_new){
change(col)}
I got the same error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
PLEASE GUIDE
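There's no need to loop or keep track of column names if every column in Dataset should be recoded. In that case you can use mutate_all (if only some columns should change, use mutate_at with keys instead) -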
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas -
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
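mutate_at will only mutate the columns specified in the keys variable. case_when then lets you replace one value with another based on a condition. Make sure the strings match the case used in your data (the question shows both 'NO'/'STEADY' and 'No'/'Steady').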
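In dplyr 1.0.0 and later, the same idea can be sketched with across() and all_of(). The small data frame here is made up for illustration, and the 'No'/'Steady' spelling is an assumption taken from the code in the question.

```r
library(dplyr)

# Made-up example data; id is a column that should not be recoded
Dataset <- data.frame(
  id = 1:4,
  metformin = c("No", "Steady", "Up", "Down"),
  insulin = c("Steady", "Up", "No", "Down"),
  stringsAsFactors = FALSE
)
keys <- c("metformin", "insulin")

# 0 for 'No'/'Steady', 1 for anything else, only in the columns listed in keys
recoded <- Dataset %>%
  mutate(across(all_of(keys), ~ as.integer(!.x %in% c("No", "Steady"))))

print(recoded)
```

Because keys is passed through all_of(), a misspelled column name raises an error instead of being silently skipped.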
This answer is for using mutate inside a for loop.
I don't have your data, so I made my own: I turned the keys into a tibble using enframe, then spread it into columns using the row number as the value for each column, and then checked whether each value is higher than 10.
To use the column name in mutate you have to use !! and := in the mutate function.
library(tidyverse) # provides enframe(), spread() and mutate()
df <- enframe(c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
) %>% spread(key = value,value = name)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
for (col in keys){
df = df %>%
mutate(!!col := ifelse(.data[[col]] > 10, 0, 100)) # .data[[col]] looks the column up by its name as a vector
}
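Applied to the recoding from the question, the loop could look roughly like the sketch below. The two-column data frame is made up for illustration, and the 'No'/'Steady' spelling is assumed from the question's code. In the original attempt, mutate(col = ...) creates a column literally named col and compares the string stored in col (for example "metformin") with the lookup values, which is why the real columns never changed.

```r
library(dplyr)

# Made-up example data, including a hyphenated column name
Dataset <- data.frame(
  metformin = c("No", "Steady", "Up", "Down"),
  `glyburide-metformin` = c("Up", "No", "No", "Steady"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
keys <- c("metformin", "glyburide-metformin")

for (col in keys) {
  Dataset <- Dataset %>%
    # !!col := sets the name of the output column from the string in col;
    # .data[[col]] reads the existing column with that name
    mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))
}

print(Dataset)
```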
I am using R version 3.5.2.
I would like to evaluate a string in a kable function, but I am having some issues. Normally, I can pass a string through a for loop using the get function but in the kableExtra::add_header_above function I get the following error:
Error: unexpected '=' in:"print(kable(df4,"html", col.names = c("zero","one")) %>% add_header_above(c(get("string") ="
I have tried a handful of techniques, like creating a string outside of the kable function and calling it, using page breaks and print statements in the knit loop, and trying the eval function as well. I have also added results="asis" as suggested here
Here is a reproducible example:
```{r results="asis"}
library("knitr")
library("kableExtra")
library("dplyr") # needed for mutate() and filter()
df1 <- mtcars %>% dplyr::select(am,vs)
df1a <- df1 %>% mutate(type = "A")
df1b <- df1 %>% mutate(type = "B")
df1c <- df1 %>% mutate(type = "C")
df2 <- rbind(df1a,df1b,df1c)
vector <- as.vector(unique(df2$type))
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c(get("string") = 3)))
}
```
Ideally, I would like the header of the table to be the string from the column type. Below is an example of what I want it to look like:
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c("A" = 3)))
I understand that knitr needs to be treated differently than regular R when using loops, as found in this solution, but I am still struggling to get the string to be evaluated correctly. Perhaps because the function requires a vector input, it is not evaluating it as a string?
You have to define your header as a vector. The name of the header should be the names of the vector and the value of the vector would be the number of columns the header will use.
The loop in the code should look like this:
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
header_temp <- 3 # number of columns the header spans
names(header_temp) <- variable # header text; get("variable") is the same as variable
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(header_temp))
}
So first I define the number of columns the header spans in the variable header_temp, and then I assign a name to it.
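The reason c(get("string") = 3) fails is that the left-hand side of = inside c() has to be a literal name, not an expression, so R stops with a parse error. As a small base R sketch, setNames() builds the same named vector in one step; the value "A" below is just the first type from the example.

```r
variable <- "A"

# Value = number of columns spanned, name = header text
header_temp <- setNames(3, variable)

print(header_temp)
names(header_temp)
```

The resulting header_temp can then be passed to add_header_above() in the same way as in the loop above.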
I'm new to R and have a problem.
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
Here is how I have tried to do this:
First the .csv file is read in, let's say case1_case2.csv.
Then the name of the .csv file is broken up into two parts.
Each part is assigned to a variable,
so it ends up like this:
xName=case1
yName=case2
After I have put my data into new columns, I would like to rename the columns to case1 and case2.
To do this I tried using the rename function in R, but instead of being renamed to case1 and case2 the columns get renamed to xName and yName.
Here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
How can I make sure that the variables xName and yName are expanded inside the rename() function?
Thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !! and you'll need to use the special := assignment operator to make unquoting on the left hand side allowed.
library(tidyverse)
df <- tibble(x = 1:3) # data_frame() is deprecated in favour of tibble()
y <- "Foo"
df %>% rename(y = x) # Not what you want: the column is literally named "y"
# df %>% rename(!!y = x) # Syntax error - need to use :=
df %>% rename(!!y := x) # Correct: the column is named "Foo"
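Applied to the stack() step from the question, this could look roughly like the following. The data frame is made up for illustration, and the column name is hard-coded here rather than parsed from a file name.

```r
library(dplyr)

# Made-up stand-in for the data read from case1_case2.csv
inFile <- data.frame(case1_a = 1:2, case1_b = 3:4)
xName <- "case1"

xValues <- inFile %>%
  stack() %>%                   # gives columns "values" and "ind"
  rename(!!xName := values) %>% # new name comes from the string stored in xName
  select(all_of(xName))

names(xValues)
```

The same pattern works for yName in the second pipeline.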