R dplyr: mutate a column for specific row range

I am trying to modify the values of a column for rows in a specific range. This is my data:
df = data.frame(names = c("george","michael","lena","tony"))
and I want to do the following using dplyr:
df[2:3,] = "elsa"
My attempt at it is the following, but it doesn't seem to work:
df = cbind(df, rows = as.integer(rownames(df)))
dplyr::mutate(df, ifelse(rows %in% c(2,3), names = "elsa" , names = names))
which gives the result:
Error: unused arguments (names = "elsa", names = c(1, 3, 2, 4))
Thanks for any advice.

This question is a little vague, but I think the OP just wants to replace certain values in a data frame by index. As a comment noted, the example data frame's column is a factor, so replacing a value behaves differently than you might expect. There are two ways around this.
The first (more verbose) way is to force df$names to be a character variable instead of a factor, then use indexing to select the values you'd like to change and replace them:
df$names = as.character(df$names)
df$names[c(2,3)] = "elsa"
Alternatively, you can set stringsAsFactors = FALSE when creating the data frame and proceed as above.
df = data.frame(names = c("george","michael","lena","tony"), stringsAsFactors = FALSE)
df$names[c(2:3)] = "elsa"
names
1 george
2 elsa
3 elsa
4 tony
Definitely check out ?data.frame to get a fuller explanation.
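To make the factor issue concrete, here is a minimal base R sketch. It assumes the column was created as a factor (forced here with stringsAsFactors = TRUE, which was the default before R 4.0); the names bad and good are only illustrative.

```r
# Toy data with a factor column, as older R versions created by default.
df <- data.frame(names = c("george", "michael", "lena", "tony"),
                 stringsAsFactors = TRUE)

# "elsa" is not an existing level, so R warns and stores NA instead.
bad <- df
suppressWarnings(bad$names[2:3] <- "elsa")
is.na(bad$names)

# Converting to character first makes the same assignment work.
good <- df
good$names <- as.character(good$names)
good$names[2:3] <- "elsa"
good$names
```

If the column has to stay a factor, the new value would first need to be added as a level.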

The factor answers are faster, but you can do it with dplyr like this (notice that the column must be of type character and not factor):
library(dplyr)
df <- data.frame(names = c("george","michael","lena","tony"), stringsAsFactors = FALSE)
oldnames <- c("michael", "lena")
df <- mutate(df, names=ifelse(names %in% oldnames, "elsa", names))
Another way is to do something like
oldnames <- c("michael", "lena")
df$names[df$names %in% oldnames] <- "elsa"

Convert names to a character vector explicitly and use replace:
df %>% mutate(names = replace(as.character(names), 2:3, "elsa"))
Note: If names were already a character vector we could have done just:
df %>% mutate(names = replace(names, 2:3, "elsa"))
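Since the question asks about a row range rather than particular values, the same idea can also be written with row_number(). This is a sketch assuming a reasonably recent dplyr and a character column; result is just an illustrative name.

```r
library(dplyr)

df <- data.frame(names = c("george", "michael", "lena", "tony"),
                 stringsAsFactors = FALSE)

# row_number() gives each row's position, so the condition selects rows 2 and 3.
result <- df %>%
  mutate(names = if_else(row_number() %in% 2:3, "elsa", names))

result$names
```

Compared with matching on the old values, this keeps working when the same name also appears in rows outside the range.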

We can do this using data.table. Convert the 'data.frame' to a 'data.table' by reference (setDT(df)), specify the row index in i, and assign (:=) 'elsa' to 'names'. Because := updates the column by reference instead of copying the table, this stays fast on the large dataset the OP mentioned.
library(data.table)
setDT(df)[2:3, names := 'elsa']
df
# names
#1: george
#2: elsa
#3: elsa
#4: tony

Related

Rename all other levels to "Other"

I have a dataframe containing all the calls that I have made in the last year. The column "Name" holds the names of the people in my contact list. In R this column is a factor with 30 levels. I want to keep only three of them (Mom, Dad, BestFriend) and group everything else as Others.
I'm using this snippet:
library(plyr)
call$Name <- mapvalues(call$Name, from = 'Mikey Mouse', to = 'BFF')
call$Name <- mapvalues(call$Name, from = c('Rocky Balboa','Uma Thurman'), to = c('Dad','Mom'))
How can I rename all other levels aside those 3 to Other?
We can first add a level 'Other' (assuming the column is a factor), then assign 'Other' to every level that is not %in% the vector of levels to keep ('nm1'):
levels(call$Name) <- c(levels(call$Name), 'Other')
levels(call$Name)[!levels(call$Name) %in% nm1] <- 'Other'
Or another option is recode from dplyr which also have the .default option to specify other levels that are not in the vector to a given value
library(dplyr)
recode(call$Name, `Mikey Mouse` = 'BFF', `Rocky Balboa` = 'Dad',
`Uma Thurman` = 'Mom', .default = 'Other')
data
set.seed(24)
# stringsAsFactors = TRUE keeps Name a factor on R >= 4.0 as well
call <- data.frame(Name = sample(c('Mikey Mouse', 'Rocky Balboa',
'Uma Thurman', 'Richard Gere', 'Rick Perry'), 25, replace = TRUE),
stringsAsFactors = TRUE)
nm1 <- c('Mikey Mouse', 'Rocky Balboa', 'Uma Thurman')
There is also the fct_other() function in the forcats package for doing exactly this. Using the data akrun provided we could simply do:
library(forcats)
call$Name <- fct_other(call$Name, keep = nm1)

removing columns from a data frame which feature in a list, but don't feature in another list

Say, my variable are as follows.
df = read.csv('somedataset.csv') #contains 'col1','col2','col3','col4','col5' say
colsSomeRemoveSomeDontRemove = c('col1','col2','col3')
colsDontRemove = 'col2'
I would like to remove all those columns from df which feature in colsSomeRemoveSomeDontRemove, but are not part of colsDontRemove.
So basically, at the end my df should contain only columns 'col2','col4','col5'
How can I do that?
I have tried doing the following, but could not get it to work
df1 = cbind(df[,which(!(names(df) %in% colsSomeRemoveSomeDontRemove))],as.data.frame(df[,colsDontRemove]))
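setdiff() returns the columns that are in the first vector but not in the second, so drop exactly those:
df[, !(colnames(df) %in% setdiff(colsSomeRemoveSomeDontRemove, colsDontRemove))]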
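A self-contained way to check the setdiff() approach, using a small made-up data frame in place of somedataset.csv (the values, to_drop and kept are assumptions for illustration only):

```r
# Stand-in for the CSV: five columns named col1 to col5.
df <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9,
                 col4 = 10:12, col5 = 13:15)
colsSomeRemoveSomeDontRemove <- c("col1", "col2", "col3")
colsDontRemove <- "col2"

# Columns to drop: listed for removal but not protected.
to_drop <- setdiff(colsSomeRemoveSomeDontRemove, colsDontRemove)

# drop = FALSE keeps a data frame even if only one column survives.
kept <- df[, !(colnames(df) %in% to_drop), drop = FALSE]
names(kept)
```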

Using dplyr, Remove all strings from a data frame

I have a data frame with 300 columns, with a string variable somewhere in it that I am trying to remove. I found this solution on Stack Overflow using lapply (see below), which does what I want, but I would like to do it with the dplyr package. I have tried using the mutate_each function but can't seem to make it work.
"If your data frame (df) is really all integers except for NAs and garbage then the following converts it.
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
You'll get a warning about NAs introduced by coercion, but that is just the non-numeric character strings turning into NAs."
dplyr 0.5 now includes a select_if() function.
For example:
person <- c("jim", "john", "harry")
df <- data.frame(matrix(c(1:9,NA,11,12), nrow=3), person)
library(dplyr)
df %>% select_if(is.numeric)
# X1 X2 X3 X4
#1 1 4 7 NA
#2 2 5 8 11
#3 3 6 9 12
Of course you could add further conditions if necessary.
If you want to use this line of code:
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
with dplyr (by which I assume you mean "using pipes") the easiest would be
df2 = df %>% lapply(function(x) as.numeric(as.character(x))) %>%
as.data.frame
To "translate" this into the mutate_each idiom:
mutate_each(df, funs(as.numeric(as.character(.))))
This function will, of course, convert all columns to character, then to numeric. To improve efficiency, don't bother doing two conversions on columns that are already numeric:
mutate_each(df, funs({
if (is.numeric(.)) return(.)
as.numeric(as.character(.))
}))
Data for testing:
df = data.frame(v1 = 1:10, v2 = factor(11:20))
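mutate_each() and funs() have since been deprecated in dplyr. In versions that provide across() and where() (dplyr 1.0 and later, as far as I know), the "skip columns that are already numeric" idea might be sketched like this, using the test data above; converted is an illustrative name.

```r
library(dplyr)

df <- data.frame(v1 = 1:10, v2 = factor(11:20))

# Only non-numeric columns are converted; numeric ones are left untouched.
converted <- df %>%
  mutate(across(!where(is.numeric), ~ as.numeric(as.character(.x))))

sapply(converted, class)
```

Going through as.character() matters for factors: as.numeric() applied directly to a factor returns the internal level codes rather than the labels.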
mutate_all works here: simply wrap the gsub in a function. (I also assume you aren't necessarily string hunting so much as trawling for non-integers.)
StrScrub <- function(x) {
as.integer(gsub("^\\D+$",NA, x))
}
ScrubbedDF <- mutate_all(data, funs(StrScrub))
Example dataframe:
library(dplyr)
options(stringsAsFactors = F)
data = data.frame("A" = c(2:5),"B" = c(5,"gr",3:2), "C" = c("h", 9, "j", "1"))
with reference/help from Tony Ladson

adding values in certain columns of a data frame in R

I'm new to R. In a data frame, I wanted to create a new column #21 that is equal to the sum of column #1 to #20,row by row.
I knew I could do
df$Col21<-df$Col1+df$Col2+.....+df$Col20
But is there a more concise expression?
Also, can I achieve this if using column names not numbers? Thanks!
There is rowSums:
df$Col21 = rowSums(df[,1:20])
should do the trick, and with names:
df$Col21 = rowSums(df[,paste("Col", 1:20, sep="")])
With leading zeros and 3 digits, try:
df$Col21 = rowSums(df[,sprintf("Col%03d", 1:20)])
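As a quick check that the name-based rowSums() form matches the long-hand sum, here is a small sketch on random data (the data and the cols variable are made up for illustration):

```r
set.seed(1)

# Ten rows, twenty columns named Col1 to Col20.
df <- as.data.frame(matrix(runif(200), nrow = 10))
names(df) <- paste0("Col", 1:20)

# Select the columns by name and sum across each row.
cols <- paste0("Col", 1:20)
df$Col21 <- unname(rowSums(df[, cols]))

# Same result as adding the columns one by one.
all.equal(df$Col21, Reduce(`+`, df[cols]))
```

If some cells may be missing, rowSums() also accepts na.rm = TRUE.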
I find the dplyr functions for column selection very intuitive, like starts_with(), ends_with(), contains(), matches() and num_range():
df <- as.data.frame(replicate(20, runif(10)))
names(df) <- paste0("Col", 1:20)
library(dplyr)
# e.g.
summarise_each(df, funs(sum), starts_with("Col"))
# or
rowSums(select(df, contains("8")))

R wildcards, sapply and as.factor

I want to change the type to factor of all variables in a data frame whose names match a certain pattern.
So here I am trying to change the type to factor of all variables whose name begins with namestub in the dataframe df.
attach(df)
sapply(grep(glob2rx("namestub*"), names(df)), as.factor)
But this doesn't work since
> levels(df$namestub1)
NULL
## Make a reproducible example
df <- data.frame(namestubA = letters[1:5], B = letters[5:1],
namestubC = LETTERS[1:5], stringsAsFactors=FALSE)
## Get indices of columns to convert
ii <- grep(glob2rx("namestub*"), names(df))
## Convert and replace the indicated columns
df[ii] <- lapply(df[ii], as.factor)
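For readers already using dplyr, the same conversion could probably be expressed with across() and starts_with(), assuming a dplyr version that provides across(). This is an alternative sketch on the example data above, not part of the original answer.

```r
library(dplyr)

df <- data.frame(namestubA = letters[1:5], B = letters[5:1],
                 namestubC = LETTERS[1:5], stringsAsFactors = FALSE)

# Convert every column whose name begins with "namestub" to a factor.
converted <- df %>%
  mutate(across(starts_with("namestub"), as.factor))

sapply(converted, class)
```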
