Renaming values in R after binning with cut()

I had a list of numerical values that I wanted to bin using cut(). Now each row has been replaced with the range it fell into, written in bracket notation, e.g. [0,140] meaning between 0 and 140 inclusive.
The problem is that these labels are long, eventually require exponent notation (which makes them even longer), and make the graph illegible. typeof() says the column is still an integer, but I can't figure out how to rename the values the way I would with factors. When I tried factor() with the labels parameter, I got an error saying that sort.list only works on atomic vectors.
As an example, here's essentially what I tried on my dataset, reproduced with the built-in iris dataset:
data(iris)
iris[1] <- cut(iris[[1]], 10, include.lowest=TRUE)
iris[1] <- factor(iris[1], labels = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
It returns the error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
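The error comes from iris[1]: single-bracket indexing on a data frame returns a one-column data frame (a list), and factor() then tries to sort that list. cut() already returns a factor, which is why typeof() reports "integer" (a factor stores integer codes plus a levels attribute), so there is no need to call factor() again. A minimal sketch of two ways to shorten the labels, assuming single letters are acceptable names for the 10 bins (bins_a and bins_b are illustrative names):

```r
data(iris)

# Option 1: pass the short labels straight to cut().
# breaks = 10 requests 10 equal-width bins, so labels must have length 10.
bins_a <- cut(iris[[1]], breaks = 10, labels = letters[1:10],
              include.lowest = TRUE)

# Option 2: bin first, then rename the levels of the resulting factor.
# iris[[1]] extracts the column as a vector; iris[1] would be a data frame.
bins_b <- cut(iris[[1]], breaks = 10, include.lowest = TRUE)
levels(bins_b) <- letters[1:10]

# Count of observations per (renamed) bin
print(table(bins_a))
```

Option 2 is convenient if you want to inspect the interval boundaries with levels() before choosing names. Either result can be stored back with iris[[1]] <- bins_a.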

Related

VegaLite.jl ignoring order of rows

I am trying to plot a sorted bar chart in Julia with VegaLite.jl from a DataFrame sorted by the chosen key, but the bar chart ignores the row order of the DataFrame. I've also tried specifying the order on y in several ways, but that had no effect (neither an error nor any change to the sort order).
Toy data and plotting:
using DataFrames
using VegaLite
df2 = DataFrame(weight=rand(10), col_name=["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])
sort!(df2, [:weight], rev=true)
df2 |>
    @vlplot() +
    @vlplot(:bar, x=:col_name, y=:weight) +
    @vlplot(mark={:rule, strokeDash=[2,2], size=2}, y={datum=0.5})
How can I specify the order of the bars so that it actually takes effect?

How to fix a subsetting issue in R

I am trying to subset my data frame, but some of the factor values are not being picked up and are left behind.
The first line below gives me a data frame with 2,048 observations, but after running the second line, COW, Negative Control, and Positive Control are still in the subset.
Controls_data <- subset(data_all, SampleID == c('COW', 'Negative Control', 'Positive Control'))
Sample_data <- subset(data_all, SampleID != c("COW", "Negative Control", "Positive Control"))
I should have 6,144 rows in Controls_data. I double-checked this in Excel in case the names were spelled differently or had extra spaces.
As @arg0naut and @Gregor both suggest, the problem is that == applies R's standard recycling rules and then compares element by element, which is not what you want here.
Compare the output of the following lines:
letters == c("c", "e")
letters %in% c("c", "e")
letters == c("c", "e", "d")
Notice the warning in the last case. In your case, the length of the left-hand side happens to be a multiple of the length of the right-hand side, so you are not warned.
You could also use the match function, which %in% is built on:
match(c("c", "e", "d"), letters)
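A minimal reproduction with a small hypothetical data_all (six rows, so its length is a multiple of three as in the question, meaning no warning) shows why == silently drops rows while %in% keeps them. The vector name controls is an illustrative assumption:

```r
# Hypothetical stand-in for data_all: four control rows, two sample rows
data_all <- data.frame(
  SampleID = c("Negative Control", "COW", "S1",
               "COW", "Positive Control", "S2"),
  value = 1:6
)
controls <- c("COW", "Negative Control", "Positive Control")

# == recycles `controls`: row 1 is compared with "COW", row 2 with
# "Negative Control", row 3 with "Positive Control", row 4 with "COW", ...
# so a row only matches if it lines up with the right name.
wrong <- subset(data_all, SampleID == controls)

# %in% asks, for every row, whether SampleID is any of the controls
Controls_data <- subset(data_all, SampleID %in% controls)
Sample_data   <- subset(data_all, !(SampleID %in% controls))

print(nrow(wrong))          # only row 4 happens to line up
print(nrow(Controls_data))  # all four control rows
```

Because each row is checked against just one of the three names, == keeps only the control rows that happen to line up, which would plausibly explain getting 2,048 rows (exactly one third of 6,144).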

How to select columns from R dataframe?

I know we can extract specific columns from an R data frame with df[,c("A","B","E")]. However, there are so many columns I want to pick out that I cannot type them one by one. I have a data frame B that contains all the column headers I want to extract from data frame A. How can I extract columns from A based on the headers in B?
I tried A[,B[, 1]] but got the error "incorrect number of dimensions". I got the same error when I tried to print B[, 1].
The output of dput(B) is:
c("A", "B", "C", "D", "E", "F", "G")
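The dput() output shows that B is a plain character vector rather than a data frame, which is why B[, 1] fails with "incorrect number of dimensions": a vector has no second dimension to index. A minimal sketch, using a hypothetical ten-column A, that indexes A with the vector directly:

```r
# Hypothetical data frame A with ten columns named A..J
A <- as.data.frame(matrix(1:30, nrow = 3,
                          dimnames = list(NULL, LETTERS[1:10])))

# B as shown by dput(B): a character vector of column names
B <- c("A", "B", "C", "D", "E", "F", "G")

# Matrix-style indexing; drop = FALSE keeps a data frame even for one name
subset_cols <- A[, B, drop = FALSE]

# List-style indexing always returns a data frame
subset_cols2 <- A[B]

print(names(subset_cols))
```

If B really were a one-column data frame of character strings, B[[1]] would extract the names as a vector, so A[, B[[1]], drop = FALSE] would work the same way.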

R - show only levels used in a subset of data frame

I have a rather large data frame with a factor that has a lot of levels (more than 4,000). I have another column in the same data frame that I'm using as a reference, and I'd like to find the levels that occur in the rows where this reference column is NA.
The first step I'm using is subsetrows <- which(is.na(mydata$reference)), but after that I'm stuck. I want something like levels(mydata[subsetrows,mydata$factor]), but unfortunately this command shows me all the levels and not just the ones present in subsetrows. I suppose I could create a new vector outside my data frame containing only the subset rows and then drop any unused levels, but is there an easier or cleaner way to do this, possibly without copying my data outside the data frame?
As an example of what I want returned, if my data frame has factor levels from A to Z, but in my subset only P, R and Y appear, I want something that returns the levels P, R and Y.
You can certainly do this with base functions, but my personal preference is to use dplyr with chained operations like this:
library(dplyr)
# Example data
d <- data.frame(
  field = c("A", "B", "C", "A", "B", "C"),
  ref = c(NA, "a", "b", NA, "c", NA)
)
# Distinct values of `field` in the rows where `ref` is NA
d %>%
  filter(is.na(ref)) %>%
  select(field) %>%
  distinct()
I adapted a suggestion from Marat in the comments to use unique(), which seems to return the correct levels.
Solution:
subsetrows <- which(is.na(mydata$reference))
unique(as.character(mydata$factor[subsetrows]))
While I like learning new packages and functions, this solution seems better at this point since it's more compact and easier for me to understand if I need to revisit this code at some distant point in the future.
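Another base-R option, if you'd rather keep the result as factor levels, is droplevels(), which removes levels that have no observations in the subset. A minimal sketch with a hypothetical mydata whose factor has levels A to Z, mirroring the P, R and Y example from the question. One difference from unique(): unique() returns values in order of appearance, while levels(droplevels(...)) returns them in the factor's level order.

```r
# Hypothetical stand-in for mydata: a factor declared with all 26 levels
# A..Z, of which only a few actually occur
mydata <- data.frame(
  factor    = factor(c("P", "B", "R", "Y", "P", "C"), levels = LETTERS),
  reference = c(NA, 1, NA, NA, NA, 2)
)

# Logical index of the rows where the reference column is NA
in_subset <- is.na(mydata$reference)

# Subsetting a factor keeps every declared level...
print(nlevels(mydata$factor[in_subset]))

# ...whereas droplevels() keeps only the levels present in the subset
used_levels <- levels(droplevels(mydata$factor[in_subset]))
print(used_levels)
```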

Defining the levels of dataframe columns in R

I am trying to redefine the levels that are assigned when I use cbind to create a data frame from selected columns of other data frames. The data frames contain integers, and the row names are strings:
outTable<-data.frame(cbind(contRes$wt, bRes$log2FoldChange, cRes$log2FoldChange, dRes$log2FoldChange, aRes$log2FoldChange), row.names=row.names(aRes))
Using the following, I get the levels of the columns:
levels(as.factor(colnames(outTable)))
[1] "F" "N" "RH" "RK" "W"
I would like to change that order by passing something like:
levels(as.factor(colnames(outTable)))<-c("W", "RK", "RH", "F", "N")
but I get the error:
could not find function "as.factor<-"
The end goal is to set the x-axis order of a boxplot in ggplot2. Am I approaching this the right way? If so, what am I missing? If not, what would be the best way to do it?
Use
factor(colnames(outTable), levels=c("W", "RK", "RH", "F", "N"))
If you use levels()<- you will simply rename/replace the level names; you don't reorder them. This is certainly not the behavior you want. The best way to reorder them is to just use factor().
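You can also specify the levels as an argument to factor() (as.factor() itself does not take a levels argument); ordered = TRUE additionally makes the result an ordered factor:
factor(colnames(outTable), levels = c("W", "RK", "RH", "F", "N"), ordered = TRUE)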
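A small sketch of the difference between setting levels in factor() and assigning with levels()<-, using the five column names from the question as a hypothetical vector (cols, reordered and renamed are illustrative names). ggplot2 orders a discrete x axis by the factor's level order, so the first form is the one that controls the boxplot order; in practice you would apply it to the column of your plotting data that holds these names.

```r
# Hypothetical vector holding the question's column names
cols <- c("F", "N", "RH", "RK", "W")

# factor(..., levels = ) changes the level ORDER; the values stay the same
reordered <- factor(cols, levels = c("W", "RK", "RH", "F", "N"))
print(levels(reordered))
print(as.character(reordered))

# levels()<- RENAMES levels position by position: the values change
renamed <- factor(cols)                       # levels sorted: F N RH RK W
levels(renamed) <- c("W", "RK", "RH", "F", "N")
print(as.character(renamed))                  # "F" has become "W", etc.
```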
