Using dplyr functions on variables named "." - r

Sometimes when generating a data frame from a list, the variable is named "." by default. How can I refer to this variable within dplyr functions, if only to change the variable name to something more appropriate?
# Code that produces my data frame with "." as column name
library(tidyverse)
d <- data.frame(`.` = 1, row.names = "a")
# Now my code fails because `.` is a poor column name for dplyr functions:
d %>% select(model = rownames(.), outlier = `.`)

This isn't actually a problem with the column named "."; it's a problem with referencing the row names inside select(). For example,
d <- data.frame(test = 1, row.names = "a")
d %>% select(model = rownames(.), outlier = test)
still returns Error: Strings must match column names. Unknown columns: a
Instead, use
d <- data.frame(`.` = 1, row.names = "a")
d %>% select(outlier = '.')
which renames the column to outlier.

Given
d <- data.frame(`.` = 1, row.names = "a")
Base R Solution
colnames(d) <- 'newname'
Dplyr Solution
d %>% rename(newname = '.')
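The original goal was a data frame with both the row names and the "." column under better names. A minimal sketch of one way to get there, assuming the tibble package is available for rownames_to_column(); the names model and outlier come from the question, and check.names = FALSE is only there to make sure the column really is called ".":

```r
library(dplyr)
library(tibble)

# Same toy data as in the question
d <- data.frame(`.` = 1, row.names = "a", check.names = FALSE)

res <- d %>%
  rownames_to_column("model") %>%   # move the row names into a real column
  rename(outlier = all_of("."))     # refer to "." as a string, not a symbol

res
```

Passing the name as a string avoids any clash with the "." placeholder that %>% uses for the left-hand side.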

Related

Recode monetary string values into new variable as numeric

First off - newbie with R so bear with me. I'm trying to recode string values as numeric. My problem is I have two different string patterns present in my values: "M" and "B" for 'million' and 'billion', respectively.
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))
I've successfully knocked off the dollar sign and now have:
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
fundsR = c("1.76M", "2B", "57M", "9.87B")
)
How can I recode these as numeric while retaining their respective monetary values? I've tried using various if statements, for loops, with or without str_detect, pipe operators, case_when, mutate, etc. to isolate values with "M" and values with "B", convert them to numeric and multiply to come up with the corresponding numeric value, all in a new column. This seemingly simple task turned out to be less simple than I imagined, which I'd attribute to being a novice. At this point I'd like to start from scratch and see if anyone has any fresh ideas. My RStudio is a MESS.
Something like this would be nice:
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
fundsR = c("1.76M", "2B", "57M", "9.87B"),
fundsFinal = c(1760000, 2000000000, 57000000, 9870000000)
)
I'd really appreciate your input.
You could create a helper function f, and then apply it to the funds column:
library(dplyr)
library(stringr)
f <- function(x) {
curr = c("M"=1e6, "B" = 1e9)
val = str_remove(x,"\\$")
as.numeric(str_remove_all(val,"B|M"))*curr[str_extract(val, "B|M")]
}
df %>% mutate(fundsFinal = f(funds))
Output:
funds fundsFinal
1 $1.76M 1.76e+06
2 $2B 2.00e+09
3 $57M 5.70e+07
4 $9.87B 9.87e+09
Input:
df = structure(list(funds = c("$1.76M", "$2B", "$57M", "$9.87B")), class = "data.frame", row.names = c(NA,
-4L))
This works but I'm sure better solutions exist. Assuming funds is a character vector:
library(tidyverse)
options(scipen = 999)
df <- data.frame(funds = c('$1.76M', '$2B', '$57M', '$9.87B'))
df = df %>%
mutate( fundsFinal = ifelse(str_sub(funds,nchar(funds),-1) =='M',
as.numeric(substr(funds, 2, nchar(funds) - 1))*10^6,
as.numeric(substr(funds, 2, nchar(funds) - 1))*10^9))
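For comparison, the same idea can be sketched in base R only: look up a multiplier by the trailing letter and multiply it by the numeric part. The function name parse_funds is made up for this illustration, and it assumes every value ends in "M" or "B" (anything else comes back as NA):

```r
# Hypothetical helper: convert strings like "$1.76M" or "$2B" to numbers
parse_funds <- function(x) {
  multipliers <- c(M = 1e6, B = 1e9)
  # Last character of each value, used as the lookup key
  suffix <- sub("^.*([MB])$", "\\1", x)
  # Strip the dollar sign, thousands separators and the suffix letter
  number <- as.numeric(gsub("[$MB,]", "", x))
  unname(number * multipliers[suffix])
}

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))
df$fundsFinal <- parse_funds(df$funds)
df
```

Adding another unit (say "K" for thousands) would mean extending both the multipliers vector and the two patterns.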

How to use the R pipe operator (%>%) in the following cases

1) I have a data frame named df, how can I include an if statement within the mutate function used within the pipe operator? The following does not work:
df %>%
mutate_if(myvar == "A", newColumn = oldColumn*3, newColumn = oldColumn)
The variable myvar is not included in the data frame and is a "flag" variable with values either "A" or "B". When "A", would like to create a new column named "newColumn" in the data frame that is three times the old column (named "oldColumn"), otherwise it is identical to the old column.
2) Would like to divide the column named "numbers" with the entry of numbers which has the minimum value in another column named "seconds", as follows:
df$newCol <- df$numbers / df[df$seconds== min(df$seconds),]$numbers
How can I do that with mutate command and "%>%", so that it looks more handy? Nothing that I tried works unfortunately.
Thanks for any answers,
J.
If myvar is just a variable floating around in the environment, you can use an if/else statement within mutate (similar question here)
library(dplyr)
# Flag variable that lives outside the data frame (example value)
myvar <- "A"
# Generate dataset
df <- tibble(oldColumn = rnorm(100))
# Mutate with if-else: triple the column when myvar is "A", otherwise copy it
df <- df %>% mutate(newColumn = if (myvar == "A") oldColumn * 3 else oldColumn)
If myvar is included as a column in the data frame, then you can use case_when.
# Generate dataset
df <- tibble(myvar = sample(c("A", "B"), 100, replace = TRUE),
oldColumn = rnorm(100))
# Create a new column which depends on the value of myvar
df <- df %>%
mutate(newColumn = case_when(myvar == "A" ~ oldColumn*3,
myvar == "B" ~ oldColumn))
As for question 2, you can use mutate with the "." placeholder, which refers to the left-hand side of the pipe (i.e. df) on the right-hand side. You can then filter down to the row with the minimum value of seconds (the top_n call with -1 as its argument) and pull out the value of the numbers variable.
# Generate data
df <- tibble(numbers = sample(1:60),
seconds = sample(1:60))
# Do computation
df <- df %>% mutate(newCol = numbers / top_n(.,-1,seconds) %>% pull(numbers))
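A shorter variant of the same computation, sketched on a tiny made-up data set: inside mutate the columns are plain vectors, so which.min(seconds) can index numbers directly. Note that which.min returns only the first minimum, so ties in seconds are resolved by taking the earliest row.

```r
library(dplyr)

# Small example data (values invented for illustration)
df <- tibble(numbers = c(10, 20, 30),
             seconds = c(5, 2, 9))

# Divide every entry of numbers by the entry at the minimum of seconds
res <- df %>%
  mutate(newCol = numbers / numbers[which.min(seconds)])

res
```

Here the minimum of seconds is in the second row, so every value is divided by 20.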

How to RBind First 4 Column one above Other with Tag

Below I have tried to reproduce my data in a reproducible form:
v <- data.frame(C1TEMP = c(3,6,1,8,9,2,2,9,1,23),
C1VIB = c(5,6,1,8,9,2,2,9,1,23),
C1DE = c(9,6,1,8,9,2,2,9,1,23),
C1NDE = c(8,6,1,8,9,2,2,9,1,23),
C2TEMP = c(5,6,1,8,9,2,2,9,1,23),
C2VIB = c(378,6,1,8,9,2,2,9,1,23),
C2DE = c(3,78,1,8,9,2,2,9,1,23),
C2NDE = c(3,6,1,8,9,2,2,9,1,23),
C3TEMP = c(3,6,89,8,9,2,2,9,1,23),
C3VIB = c(3,6,1,98,9,2,2,9,1,23),
C3DE = c(33,56,91,82,99,12,22,19,81,23),
C3NDE = c(13,76,91,88,59,42,22,39,21,23))
Here I want to rbind every 4 columns one above the other, along with a tag number for each block. The number of columns will always be divisible by 4. I am also attaching an image to give a clear picture of the expected result.
EXPECTED OUTPUT:
I agree with YCR's comment. Still, this is a way to tackle your problem. Use the following code:
# data frames need column headers, so convert to matrix
v01 <- as.matrix(v[, 1:4])
v02 <- as.matrix(v[, 5:8])
v03 <- as.matrix(v[, 9:12])
# remove columnnames
colnames(v01) <- NULL
colnames(v02) <- NULL
colnames(v03) <- NULL
# now you can use rbind and give the columnnames back
v2 <- rbind( v01, v02, v03)
colnames(v2) <- c("C1TEMP", "C1VIB", "C1DE", "C1NDE")
v2
try this
It is a bit more convoluted than previous answers but it should be more adaptable to other data frames
# how many blocks have you got?
howMany <-table(gsub(names(v),pattern = "[0-9]",replacement = ""))[1]
# make a common name string
NAMES <- unique(gsub(names(v),pattern = "[0-9]",replacement = ""))
# create a list
list() -> V
for(i in 1:howMany){
# get the column with matching index number
v[,grep(names(v),pattern = i)] -> vi
names(vi) <- NAMES# change name
data.frame(Tag=i,vi) -> V[[i]]# put it in the list
}
# combine the tables in the list into one data frame
do.call(rbind,V)
Nils
The melt and reshape way:
This requires an identifier per row:
v<- data.frame(C1TEMP = c(3,6,1,8,9,2,2,9,1,23),
C1VIB = c(5,6,1,8,9,2,2,9,1,23),
C1DE = c(9,6,1,8,9,2,2,9,1,23),
C1NDE = c(8,6,1,8,9,2,2,9,1,23),
C2TEMP = c(5,6,1,8,9,2,2,9,1,23),
C2VIB = c(378,6,1,8,9,2,2,9,1,23),
C2DE = c(3,78,1,8,9,2,2,9,1,23),
C2NDE = c(3,6,1,8,9,2,2,9,1,23),
C3TEMP= c(3,6,89,8,9,2,2,9,1,23),
C3VIB = c(3,6,1,98,9,2,2,9,1,23),
C3DE = c(33,56,91,82,99,12,22,19,81,23),
C3NDE = c(13,76,91,88,59,42,22,39,21,23),
id = 1:10
, stringsAsFactors = F)
library(tidyverse)
# melt the dataframe(reshape from wide to long format):
v_melt <- reshape2::melt(v, id.vars = "id")
# modify the aggregation variables
v_melt <- v_melt %>%
mutate(var = substr(as.character(variable), 3, 8),
group_id = paste0(substr(as.character(variable), 1, 2), "_", id))
# reshape the data frame in a wide format:
v_cast <- reshape2::dcast(v_melt, group_id ~ var, value.var = "value")
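Since the number of columns is always a multiple of 4, the stacking can also be sketched in base R without hard-coding the column ranges. The function name stack_blocks and its width argument are made up for this illustration, and it assumes the columns are ordered block by block with a "C<number>" prefix, as in the question:

```r
# Hypothetical helper: stack consecutive blocks of `width` columns, tagging each block
stack_blocks <- function(df, width = 4) {
  stopifnot(ncol(df) %% width == 0)
  starts <- seq(1, ncol(df), by = width)
  # Common column names: drop the "C1", "C2", ... prefix from the first block
  base_names <- sub("^C[0-9]+", "", names(df)[seq_len(width)])
  blocks <- lapply(seq_along(starts), function(k) {
    block <- df[, starts[k]:(starts[k] + width - 1), drop = FALSE]
    names(block) <- base_names
    cbind(Tag = k, block)   # Tag records which block the rows came from
  })
  do.call(rbind, blocks)
}

# Small example with two blocks of four columns
small <- data.frame(C1TEMP = 1:2, C1VIB = 3:4, C1DE = 5:6, C1NDE = 7:8,
                    C2TEMP = 11:12, C2VIB = 13:14, C2DE = 15:16, C2NDE = 17:18)
stacked <- stack_blocks(small)
stacked
```

Calling stack_blocks(v) on the data from the question should give 30 rows, with Tag running from 1 to 3.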

R renaming passed columns in functions

I have been searching for this and have found this link to be helpful with renaming passed columns from a function (the [,column_name] code actually made my_function1 work after I had been searching for a while). Is there a way to use the pipe operator to rename columns in a data frame within a function?
My attempt is shown in my_function2 but it gives me an Error: All arguments to rename must be named or Error: Unknown variables: col2. I am guessing because I have not specified what col2 belongs to.
Also, is there a way to pass associated arguments into the function, like col1 and new_col1, so that you can associate the column name to be replaced with the column name that is replacing it? Thanks in advance!
library(dplyr)
my_df = data.frame(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
my_function1 = function(input_df, col1, new_col1) {
df_new = input_df
df_new[,new_col1] = df_new[,col1]
return(df_new)
}
temp1 = my_function1(my_df, "a", "new_a")
my_function2 = function(input_df, col2, new_col2) {
df_new = input_df %>%
rename(new_col2 = col2)
return(df_new)
}
temp2 = my_function2(my_df, "b", "new_b")
rename_ (alongside other dplyr verbs suffixed with an underscore) has been deprecated.
Instead, try:
my_function3 = function(input_df, cols, new_cols) {
input_df %>%
rename({{ new_cols }} := {{ cols }})
}
See this vignette for more information about embracing arguments with double braces and programming with dplyr.
Following @MatthewPlourde's answer to a similar question, we can do:
my_function3 = function(input_df, cols, new_cols) {
rename_(input_df, .dots = setNames(cols, new_cols))
}
# example
my_function3(my_df, "b", "new_b")
# a new_b c
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9
Many dplyr functions have lesser-known variants with names ending in _ that allow you to work with the package more programmatically. One pattern is...
DF %>% dplyr_fun(arg1 = val1, arg2 = val2, ...)
# becomes
DF %>% dplyr_fun_(.dots = list(arg1 = "val1", arg2 = "val2", ...))
This has worked for me in a few cases, where the val* are just column names. There are more complicated patterns and techniques, covered in the document that pops up when you type vignette("nse"), but I do not know them well.
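For the second part of the question (passing old and new names together), here is a minimal sketch that avoids the underscore verbs, assuming a dplyr version recent enough to provide all_of(). The function name rename_cols is made up for this illustration; old and new are character vectors of equal length:

```r
library(dplyr)

# Hypothetical helper: rename columns given as strings, pairing old[i] with new[i]
rename_cols <- function(input_df, old, new) {
  stopifnot(length(old) == length(new))
  # setNames() builds a named vector of the form c(new_name = "old_name")
  input_df %>% rename(all_of(setNames(old, new)))
}

my_df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
renamed <- rename_cols(my_df, c("a", "b"), c("new_a", "new_b"))
renamed
```

Because all_of() is strict, a name in old that is not a column of the data frame raises an error rather than being silently ignored.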

How to add columns to a blank data frame column by column in R?

I am trying to create a data.frame and then add some columns to it.
I tried the following code, but it does not work:
test.dim <- as.data.frame(matrix(nrow=0, ncol=4))
names <- c("A", "B", "C", "D")
colnames(test.dim) <- names
for (i in 1:4) {
name = names[i]
# do some calculations, and at last get another data.frame named x.data
mean.data <- apply(x.data, 1, mean)
test.dim[, name] <- mean.data
}
Usually one would already have a data.frame (call it df) and simply add columns by calling df$newColName = values or df[,newColNames] = frame_of_values.
Your question indicates that you are separating the creation of your values from putting them in the data frame (which I do not recommend). But if you really want to start from a data frame with zero rows, here are some options:
colnamesToAdd = LETTERS[1:4]
test.dim = data.frame(matrix(NA, nrow = 1, ncol = length(colnamesToAdd)))
colnames(test.dim) = colnamesToAdd
test.dim = test.dim[-1,]
Another option:
colnamesToAdd = LETTERS[1:4]
test.dim = data.frame("USELESS" = NA)
test.dim[,colnamesToAdd] = NA
test.dim = test.dim[-1,-1]
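The loop in the question fails because test.dim has zero rows, so assigning a vector with several values to one of its columns does not fit. One way around it, sketched here with made-up data, is to collect the columns in a named list and build the data frame only at the end. The x.data matrix below is just a placeholder for whatever the real calculation produces:

```r
col_names <- c("A", "B", "C", "D")

set.seed(1)
cols <- lapply(col_names, function(nm) {
  # Placeholder for the real calculation that yields x.data
  x.data <- matrix(rnorm(15), nrow = 5)
  rowMeans(x.data)   # same result as apply(x.data, 1, mean)
})
names(cols) <- col_names

# All columns have the same length, so they combine cleanly
test.dim <- as.data.frame(cols)
test.dim
```

This keeps the loop structure of the question but never needs an empty data frame as a starting point.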
If you are looking to add a mean to your table and repeat it for every factor:
library(data.table);
test.dim = data.table("FACTOR" = sample(letters[1:4],100,replace=TRUE), "VALUE" = runif(100), "MEAN" = NA)
means = test.dim[,list(AVG=mean(VALUE)),by="FACTOR"]
# without data.table: by(test.dim$VALUE, test.dim$FACTOR, mean)
for(x in 1:nrow(means)) { test.dim$MEAN[test.dim$FACTOR==means$FACTOR[x]] = means$AVG[x] } # normally I would use the foreach package instead of this last for loop
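The per-group mean can also be filled in without an explicit loop. A minimal base R sketch using ave(), which by default returns the group mean repeated for every row of the group (the small data set is invented for illustration):

```r
# Tiny example: two groups with two values each
test.dim <- data.frame(FACTOR = c("a", "b", "a", "b"),
                       VALUE  = c(1, 2, 3, 4))

# ave() computes mean(VALUE) within each level of FACTOR and keeps row order
test.dim$MEAN <- ave(test.dim$VALUE, test.dim$FACTOR)
test.dim
```

Group "a" has values 1 and 3 and group "b" has 2 and 4, so the MEAN column alternates between 2 and 3.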
