R wildcards, sapply and as.factor

I want to change the type to factor of all variables in a data frame whose names match a certain pattern.
So here I am trying to change the type to factor of all variables whose name begins with namestub in the dataframe df.
attach(df)
sapply(grep(glob2rx("namestub*"), names(df)), as.factor)
But this doesn't work, since the columns are still not factors:
> levels(df$namestub1)
NULL

## Make a reproducible example
df <- data.frame(namestubA = letters[1:5], B = letters[5:1],
                 namestubC = LETTERS[1:5], stringsAsFactors = FALSE)
## Get indices of columns to convert
ii <- grep(glob2rx("namestub*"), names(df))
## Convert and replace the indicated columns
df[ii] <- lapply(df[ii], as.factor)
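A likely reason the original attempt fails: sapply() there runs over the integer positions returned by grep(), not over the columns, and its result is never assigned back to df. The sketch below repeats the fix and checks the column classes afterwards; the variable name classes is only for illustration.

```r
df <- data.frame(namestubA = letters[1:5], B = letters[5:1],
                 namestubC = LETTERS[1:5], stringsAsFactors = FALSE)

## Positions of the columns whose names start with "namestub"
ii <- grep(glob2rx("namestub*"), names(df))

## Convert the columns themselves and assign the result back into df
df[ii] <- lapply(df[ii], as.factor)

## Inspect the class of every column to confirm the conversion
classes <- sapply(df, class)
print(classes)
```

Only the matching columns should report "factor"; column B is left as character.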

Related

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?

Here are two base R ways of doing it. The data set is the example from help("DGEList") plus a mock-up data.frame sampleTable.
Define a vector common_vars holding the names in control_for that are also column names of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
## lapply keeps each column as a factor; sapply would simplify the result to a character matrix
tmp <- lapply(sampleTable[common_vars], factor)
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
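A minimal sketch of what such a one-liner might look like. To keep it runnable without edgeR, dge_obj here is a plain list with a samples data frame standing in for the real DGEList object; that stand-in is an assumption, not the answer's actual data.

```r
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))

## Hypothetical stand-in for a DGEList: a list with a 'samples' data frame
dge_obj <- list(samples = data.frame(group = factor(rep(1:2, each = 2))))

## One-liner: add every shared column to 'samples' as a factor
dge_obj$samples[common_vars] <- lapply(sampleTable[common_vars], factor)

str(dge_obj$samples)
```

Indexing with the character vector common_vars avoids the dge_obj$samples$x problem from the question, where $x looks for a column literally named "x".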
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))

How do you replace an entire column in one dataframe with another column in another dataframe?

I have two dataframes. I want to replace the ids in dataframe1 with generic ids. In dataframe2 I have mapped the ids from dataframe1 with the generic ids.
Do I have to merge the two dataframes and after it is merged do I delete the column I don't want?
Thanks.

With dplyr
library(dplyr)
left_join(df1, df2, by = 'ids')

We can use merge and then delete the ids.
dataframe1 <- data.frame(ids = 1001:1010, variable = runif(min=100,max = 500,n=10))
dataframe2 <- data.frame(ids = 1001:1010, generics = 1:10)
result <- merge(dataframe1,dataframe2,by="ids")[,-1]
Alternatively we can use match and replace by assignment.
dataframe1$ids <- dataframe2$generics[match(dataframe1$ids,dataframe2$ids)]
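One thing worth checking with the match approach is what happens to ids that have no mapping. The sketch below uses small made-up data frames (the id 1999 is deliberately missing from dataframe2) to show that such rows get NA.

```r
## Made-up data: id 1999 has no entry in the mapping table
dataframe1 <- data.frame(ids = c(1001, 1002, 1999), variable = c(10, 20, 30))
dataframe2 <- data.frame(ids = 1001:1003, generics = 1:3)

## Position of each dataframe1 id within dataframe2 (NA when absent)
idx <- match(dataframe1$ids, dataframe2$ids)

## Replace the ids with their generic counterparts
dataframe1$ids <- dataframe2$generics[idx]

print(dataframe1)
```

If unmatched ids are possible in the real data, anyNA(idx) is a quick way to detect them before overwriting the column.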

Replacing a column by subsetting is straightforward in R. You didn't provide much code, so here is a small self-contained example:
#create 4 random columns (vectors) of data, and merge them into data frames:
a <- rnorm(n=100,mean = 0,sd=1)
b <- rnorm(n=100,mean = 0,sd=1)
c <- rnorm(n=100,mean = 0,sd=1)
d <- rnorm(n=100,mean = 0,sd=1)
df_ab <- as.data.frame(cbind(a,b))
df_cd <- as.data.frame(cbind(c,d))
#if you want column d in df_cd to equal column a in df_ab simply use the assignment operator
df_cd$d <- df_ab$a
#you can also use the subsetting with square brackets:
df_cd[,"d"] <- df_ab[,"a"]

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to the column names of my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes, but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)

Your query is not quite clear. Therefore two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls and describing a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"

You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(1:7, function(i) {
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
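As a variation on the index-based loop, Map() can walk the list and the suffix vector in parallel. This is only a sketch: the two small data frames below are invented stand-ins for data2016, and the underscore separator is a choice, not a requirement.

```r
## Invented stand-ins for the real list of monthly data frames
data2016 <- list(data.frame(A = 1:2, B = 3:4),
                 data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

## Pair each data frame with its suffix and rename its columns
data2016v2 <- Map(function(d, suffix) {
  names(d) <- paste0(names(d), "_", suffix)
  d
}, data2016, new_names)

names(data2016v2[[1]])
names(data2016v2[[2]])
```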
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
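The "long" layout can also be sketched in base R, with do.call(rbind, ...) standing in for bind_rows(). The data frames and the column name month are made up for illustration.

```r
## Invented stand-ins for the real list of monthly data frames
data2016 <- list(data.frame(A = 1:2, B = 3:4),
                 data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

## Add a 'month' column to each data frame, then stack them row-wise
tagged <- Map(function(d, m) cbind(d, month = m), data2016, new_names)
long <- do.call(rbind, tagged)

print(long)
```

Stacking this way assumes every data frame has the same column names.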

Rename all other levels to "Other"

I have a dataframe containing all the calls that I have made in the last year. Under the column "Name" there are the names of the people in my contact list. In R this column is a factor with 30 levels; I want to have only four levels: Mom, Dad, BestFriend and Others.
I'm using this snippet:
library(plyr)
call$Name <- mapvalues(call$Name, from = 'Mikey Mouse', to = 'BFF')
call$Name <- mapvalues(call$Name, from = c('Rocky Balboa','Uma Thurman'), to = c('Dad','Mom'))
How can I rename all other levels aside those 3 to Other?

We can first create a level 'Other' (assuming the column is a factor), then assign 'Other' to the levels that are not %in% the vector of levels to keep ('nm1')
levels(call$Name) <- c(levels(call$Name), 'Other')
levels(call$Name)[!levels(call$Name) %in% nm1] <- 'Other'
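A small worked example of the level replacement, using a hand-made factor instead of the call data. Assigning the same label to several levels merges them, so in this sketch the replacement step alone is enough.

```r
## Hand-made factor with five levels
Name <- factor(c("Mikey Mouse", "Rocky Balboa", "Uma Thurman",
                 "Richard Gere", "Rick Perry"))
nm1 <- c("Mikey Mouse", "Rocky Balboa", "Uma Thurman")

## Every level outside nm1 is renamed to "Other"; duplicate levels are merged
levels(Name)[!levels(Name) %in% nm1] <- "Other"

levels(Name)
as.character(Name)
```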
Or another option is recode from dplyr which also have the .default option to specify other levels that are not in the vector to a given value
library(dplyr)
recode(call$Name, `Mikey Mouse` = 'BFF', `Rocky Balboa` = 'Dad',
`Uma Thurman` = 'Mom', .default = 'Other')
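data
set.seed(24)
## stringsAsFactors = TRUE so that Name is a factor in R >= 4.0 as well
call <- data.frame(Name = sample(c('Mikey Mouse', 'Rocky Balboa',
'Uma Thurman', 'Richard Gere', 'Rick Perry'), 25, replace = TRUE),
stringsAsFactors = TRUE)
## The names to keep must match the spelling used in the data
nm1 <- c('Mikey Mouse', 'Rocky Balboa', 'Uma Thurman')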

There is also the fct_other() function in the forcats package for doing exactly this. Using the data akrun provided we could simply do:
library(forcats)
call$Name <- fct_other(call$Name, keep = nm1)

Store output of sapply into a data frame?

How can I store the output of sapply() in a data frame where the index value is stored in the first column and its value in the corresponding second column? For illustration, I have shown only 2 elements here, but there are 110 columns in my data. "loan" is the data frame.
cols <- sapply(loan,function(x) sum(is.na(x)))
cols
       id member_id
        0         7
I want output as:
var        value
id         0
member_id  7
I know that sapply() returns a vector, but when I print the vector, each value is printed along with an "index", e.g. the column name if applied on a data frame. So, when I want to store it as a data frame with two columns, where the first column contains the index part and the second column contains the value, how can I do it?

I found an answer to my question. For those who actually did understand my problem, this answer might make sense:
cols <- data.frame(sapply(loan ,function(x) sum(is.na(x))))
cols <- cbind(variable = row.names(cols), cols)
I wanted the row.names to be in a column of the same data frame corresponding to the values obtained from sapply.
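The same result can also be built directly from the names of the vector, which gives the var and value column names asked for in the question. This is a sketch with a tiny made-up loan data frame, not the real 110-column data.

```r
## Made-up stand-in for the real 'loan' data frame
loan <- data.frame(id = 1:3, member_id = c(NA, 5, NA))

## Named vector: one NA count per column
cols <- sapply(loan, function(x) sum(is.na(x)))

## Names go in the first column, counts in the second
out <- data.frame(var = names(cols), value = unname(cols))

print(out)
```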

We can use stack
stack(mylist)[2:1]
data
mylist <- list(df = 1, rf = 2)

Is this what you want?
Your original list:
L <- c("df",1,"rf",2)
L
[1] "df" "1" "rf" "2"
As a data frame:
N <- length(L)
df <- data.frame( var = L[seq(1,N,2)], value = L[seq(2,N,2)] )
df
  var value
1  df     1
2  rf     2
