I need to recode variable (column) values in a data frame. The following snippet replaces my values with what look like array indexes instead of the categorical values:
CMlist <- c("CMdysphagiascreen","CMStrokeUnit","CMVTE","CMantithromd2")
for (i in CMlist) {
RHSSP[[i]] <- ifelse(RHSSP[[i]] == "NDOC", "Y", RHSSP[[i]])
RHSSP[[i]] <- ifelse(RHSSP[[i]] == "U", "N", RHSSP[[i]])
RHSSP[[i]] <- ifelse(is.NULL(RHSSP[[i]]), "N", RHSSP[[i]])
}
No doubt there's a better method for doing this. Can someone explain what's wrong with my attempt and maybe a better way of going about it?
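A likely explanation, assuming the columns are factors (the question does not show the structure of `RHSSP`): `ifelse()` does not keep the factor attributes of its `no` argument, so the values that are not replaced come back as the underlying integer codes. Separately, R has no `is.NULL()` function. The real function is `is.null()`, and it tests a whole object rather than each element, so `is.na()` is probably what was intended. Below is a minimal sketch on made-up data that converts each column to character before recoding; the toy values are assumptions, not the asker's data.

```r
# Toy stand-in for RHSSP; the real data is not shown in the question.
RHSSP <- data.frame(
  CMdysphagiascreen = factor(c("Y", "NDOC", "U", NA)),
  CMStrokeUnit      = factor(c("U", "N", "NDOC", "Y"))
)
CMlist <- c("CMdysphagiascreen", "CMStrokeUnit")

for (i in CMlist) {
  x <- as.character(RHSSP[[i]])  # work on the labels, not the integer codes
  x[x %in% "NDOC"] <- "Y"        # %in% returns FALSE (not NA) for missing values
  x[x %in% "U"]    <- "N"
  x[is.na(x)]      <- "N"        # treat missing values as "N"
  RHSSP[[i]] <- factor(x)        # convert back to a factor if one is needed
}
```

Indexing with a logical vector and assigning in place avoids `ifelse()` altogether, which sidesteps the attribute-dropping behaviour.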
I want to rescale multiple columns of a data frame to a specific range (1 to 20) in R/RStudio. While I can get it to work for a single column, I can't seem to get it to work for multiple columns. The real data contains many columns, so some sort of indexing would be ideal if possible. I'm sure it's probably something simple, but I can't figure out what I'm missing. Any help would be appreciated. Thanks
# This works on a single column
library(scales)
single = c(100,90,80,70,60,50,40,30,20,10)
rescale(single , to=c(1,20))
# This does not work
library(scales)
multiple = data.frame(V0 = c("A","B","C","D","E","F", "G", "H", "I", "J"),
V1= c(1,2,3,4,5,6,7,8,9,10),
V2= c(100,90,80,70,60,50,40,30,20,10)
)
rescale(multiple[,c(2,3)], to=c(1,20))
You are looking for:
multiple[,c(2,3)] <-lapply(multiple[,c(2,3)], rescale, to=c(1,20))
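The question reports that `rescale()` works on a single vector but not on a data frame, so the idea here is to apply the function column by column and assign the results back. A dependency-free sketch of the same pattern follows. `rescale_to` is a hypothetical helper written for this illustration (plain min-max scaling), not the `scales` implementation, and the numeric columns are picked by type rather than by position.

```r
# Hypothetical helper: linear min-max rescaling of a numeric vector onto [lo, hi].
rescale_to <- function(x, lo, hi) {
  rng <- range(x, na.rm = TRUE)
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

multiple <- data.frame(
  V0 = LETTERS[1:10],
  V1 = 1:10,
  V2 = seq(100, 10, by = -10)
)

# Select every numeric column, so the code scales to many columns.
num_cols <- sapply(multiple, is.numeric)

# lapply() handles one column at a time; extra arguments are passed through.
multiple[num_cols] <- lapply(multiple[num_cols], rescale_to, lo = 1, hi = 20)
```

Selecting columns with `is.numeric` leaves the character column `V0` untouched without hard-coding column numbers.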
I am trying to subset my data frame, but when I do, some of the rows for certain factor levels are left behind.
When I try this code it gives me a dataframe that has 2048 obs, but then when I try the next set of code I still have COW, Negative Control, and Positive Control in the subset.
Controls_data <- subset(data_all, SampleID == c('COW', 'Negative Control', 'Positive Control'))
Sample_data <- subset(data_all, SampleID != c("COW", "Negative Control", "Positive Control"))
I should have 6,144 in the Controls_data. I double checked this in excel because I thought that maybe they were spelled differently or had spaces.
As @arg0naut and @Gregor both suggest, your problem is that == applies R's standard recycling rules and then compares element by element. That is not what you want here.
Compare the outputs of the following lines of code:
letters == c("c", "e")
letters %in% c("c", "e")
letters == c("c", "e", "d")
Notice the warning in the last case. In your case, the length of the left-hand side happens to be a multiple of the length of the right-hand side, so you are not warned.
You could also use the match function in your case:
match(c("c", "e", "d"), letters)
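Applied to the subsetting problem, the membership test could look like the sketch below. The toy `data_all` and its `value` column are made up for illustration; only the `SampleID` values come from the question.

```r
# Toy stand-in for data_all; the real data is not shown in the question.
data_all <- data.frame(
  SampleID = c("COW", "S1", "Negative Control", "S2", "Positive Control", "S3"),
  value    = 1:6
)
controls <- c("COW", "Negative Control", "Positive Control")

# %in% tests each SampleID for membership, with no recycling of `controls`.
Controls_data <- subset(data_all, SampleID %in% controls)
Sample_data   <- subset(data_all, !(SampleID %in% controls))
```

Every row lands in exactly one of the two subsets, which is the behaviour the original `==` and `!=` calls were meant to produce.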
I'm a beginner to "R", so this might embarrass me, but nonetheless:
How do I add a column to a dataframe? Here is my attempt at adding a normally distributed dataset as a column to an empty dataframe:
e = rnorm(1000)
mydf <- data.frame()
mydf[["e"]] <- e
and it gives the error:
Error in [[<-.data.frame(*tmp*, "e", value = c(-1.09398526454771,
: replacement has 1000 rows, data has 0
This is the resource I used for this: here. I even tried converting to a vector using mydf[["e"]] <- as.vector(e). But this still fails. Help? thank you.
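The error message itself points at the cause: `data.frame()` with no arguments creates a data frame with zero rows, and a 1000-element column cannot be assigned into it. One simple way around this, sketched below, is to create the data frame from the first column and add further columns afterwards. The second column `e2` is hypothetical and only there to show that adding works once the row count matches.

```r
e <- rnorm(1000)

# Build the data frame from the first column instead of starting empty.
mydf <- data.frame(e = e)

# Adding another column now works, because it has the same number of rows.
mydf[["e2"]] <- e * 2
```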
I have a rather large data frame with a factor that has a lot of levels (more than 4,000). I have another column in the same data frame that I'm using as a reference, and what I'd like to find is a subset of the levels whenever this reference column is NA.
The first step I'm using is subsetrows <- which(is.na(mydata$reference)) but after that I'm stuck. I want something like levels(mydata[subsetrows,mydata$factor]) but unfortunately, this command shows me all the levels and not just the ones existing in subsetrows. I suppose I could create a new vector outside of my data frame of only my subset rows and then drop any unused levels, but is there any easier/cleaner way to do this, possibly without copying my data outside the data frame?
As an example of what I want returned, if my data frame has factor levels from A to Z, but in my subset only P, R and Y appear, I want something that returns the levels P, R and Y.
You can certainly accomplish this with base functions. But my personal preference is to use dplyr with chained operations such as this:
library(dplyr)
d %>%
filter(is.na(ref)) %>%
select(field) %>%
distinct()
data
d <- data.frame(
field = c("A", "B", "C", "A", "B", "C"),
ref = c(NA, "a", "b", NA, "c", NA)
)
I modified a suggestion in the comments by Marat to use the function unique that seems to return the correct levels.
Solution:
subsetrows <- which(is.na(mydata$reference))
unique(as.character(mydata$factor[subsetrows]))
While I like learning new packages and functions, this solution seems better at this point since it's more compact and easier for me to understand if I need to revisit this code at some distant point in the future.
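If the result should stay a factor, base R's `droplevels()` is another option: it removes unused levels from the subsetted vector without modifying the data frame. A small sketch on made-up data that follows the A to Z example from the question:

```r
# Toy data: a factor with levels A..Z, of which only a few are used.
mydata <- data.frame(
  factor    = factor(c("P", "A", "R", "B", "Y", "P"), levels = LETTERS),
  reference = c(NA, 1, NA, 2, NA, NA)
)

subsetrows <- which(is.na(mydata$reference))

# Keep only the levels that actually occur in the subset.
present <- levels(droplevels(mydata$factor[subsetrows]))
present
```

Unlike `unique()`, this returns the levels in their defined level order rather than in order of first appearance.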
I am trying to redefine the levels that are assigned when I am using cbind to create a dataframe from select columns of other dataframes. The dataframes contain integers, and the rownames are strings:
outTable<-data.frame(cbind(contRes$wt, bRes$log2FoldChange, cRes$log2FoldChange, dRes$log2FoldChange, aRes$log2FoldChange), row.names=row.names(aRes))
Using the following, I get the levels of the columns:
levels(as.factor(colnames(outTable)))
[1] "F" "N" "RH" "RK" "W"
I would like to change that order by passing something like:
levels(as.factor(colnames(outTable)))<-c("W", "RK", "RH", "F", "N")
but I get the error:
could not find function "as.factor<-"
The end purpose is to set the x-axis order of a boxplot in ggplot2. Am I approaching this the right way? If so, what am I missing, and if not, what would be the best way to do it?
Use
factor(colnames(outTable), levels=c("W", "RK", "RH", "F", "N"))
If you use levels()<- you will simply rename/replace the level names; you don't re-order them. This is certainly not the behavior you want. The best way to re-order them all is to just use factor().
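The difference between the two is easy to see on a small example. In this sketch, `x` is a made-up stand-in for the column names in the question:

```r
x <- factor(c("F", "N", "RH", "RK", "W"))
new_order <- c("W", "RK", "RH", "F", "N")

# factor(..., levels = ) changes the level order; the values stay the same.
reordered <- factor(x, levels = new_order)

# levels<- relabels by position, so the values themselves change.
renamed <- x
levels(renamed) <- new_order

as.character(reordered)  # same values as x
as.character(renamed)    # "F" has become "W", "N" has become "RK", and so on
```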
You can specify levels as an argument to the factor function (as.factor does not take one):
factor(colnames(outTable), levels = c("W", "RK", "RH", "F", "N"), ordered = TRUE)