How to select columns from R dataframe? - r

I know we can extract specific columns from an R data frame with df[,c("A","B","E")]. However, there are too many columns I want to pick out to type them one by one. I have a data frame B that contains all the column headers I want to extract from data frame A. How can I extract columns from A based on the headers stored in B?
I tried A[,B[, 1]] but got "incorrect number of dimensions". I got the same error when I tried to print B[, 1].
With dput(B) I got:
> dput(B)
c("A", "B", "C",
"D", "E", "F",
"G")

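The dput() output above suggests that B is a plain character vector rather than a data frame, which would explain why B[, 1] fails with "incorrect number of dimensions". A minimal sketch under that assumption, using a made-up data frame A (its values, the extra column H, and the names selected and selected_safe are hypothetical):

```r
# Toy stand-in for data frame A (hypothetical values)
A <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12,
                E = 13:15, F = 16:18, G = 19:21, H = 22:24)

# B as reported by dput(): a character vector, not a data frame
B <- c("A", "B", "C", "D", "E", "F", "G")

# A character vector can be used directly as the column index
selected <- A[, B]

# If some names in B might be missing from A, keep only those that exist
selected_safe <- A[, intersect(B, names(A)), drop = FALSE]
```

If B really were a data frame with the headers in its first column, A[, B[[1]]] would be the equivalent call.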
Related

Combining multiple lists into data frame w/ two columns: one for elements of all the lists, and one that has the name of the origin list

Sorry for the very basic question, but I imagine there's a very easy way to do what I want to do and I'm drawing a blank.
I have three basic lists in R, for example:
list_1 = c("A", "B", "C")
list_2 = c("D", "E", "F")
list_3 = c("G", "H", "I")
I want to combine these into a data frame with two columns, the first that has the elements of all of the lists -- "A", "B", "C", "D", "E", "F", "G", "H", "I" -- and the second that has the name of the list where the element was originally located -- "list_1", "list_1", "list_1", "list_2", "list_2", "list_2", "list_3", "list_3", "list_3."
I've tried various of the classic merging functions (rbind, bind_rows, append, etc.) but none seem to do specifically what I'm looking for. Hoping someone has the magic solution!
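One possible approach, sketched here with base R's stack(), is to put the vectors in a named list first so that the names travel with the values. The names all_lists, combined, element and origin are chosen for illustration only.

```r
list_1 <- c("A", "B", "C")
list_2 <- c("D", "E", "F")
list_3 <- c("G", "H", "I")

# Collect the vectors in a named list; the names become the origin labels
all_lists <- list(list_1 = list_1, list_2 = list_2, list_3 = list_3)

# stack() returns a data frame with columns `values` and `ind`
combined <- stack(all_lists)
names(combined) <- c("element", "origin")

# `ind` comes back as a factor; convert it if plain strings are preferred
combined$origin <- as.character(combined$origin)
```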

VegaLite.jl ignoring order of rows

I am trying to plot a sorted bar chart in Julia using VegaLite.jl on the DataFrame sorted by the chosen key, but the bar chart completely ignores the order of the DataFrame. I've also tried to specify order on y in different ways, but that resulted in nothing (neither an error nor any effect on the sorting order).
Toy data and plotting:
using DataFrames
using VegaLite
df2 = DataFrame(weight=rand(10), col_name=["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])
sort!(df2, [:weight], rev=true)
df2 |> @vlplot() + @vlplot(:bar, x=:col_name, y=:weight) + @vlplot(mark={:rule, strokeDash=[2,2], size=2}, y={datum=0.5})
How to specify the order of the bar chart so that it actually works?

pheatmap column cluster annotation

I have the following code for generating a heat map with the "pheatmap" package. I can create the heat map, but I am having a hard time annotating the columns. I have 9 cell types, and each cell type has 3 replicates.
Here is the data. I have labeled the samples 01_01, 01_02, 01_03 through 09_01, 09_02, 09_03.
I would like the column annotation to be "A", "B", "C", "D", "E", "F", "G", "H", "I". (Image: pheatmap with clustering row and column)
I tried my best to annotate the clustered columns but wasn't successful. I would really appreciate it if someone could explain how to customize the column annotation.
Data:
my data file
library(readxl)
library(pheatmap)
library(tibble)
library(dplyr)
# read file
test<-read_excel("~test.xlsx",1)
test%>%
column_to_rownames("Metabolite") %>%
as.data.frame()
test<-as.data.frame(test)
rownames(test)<-test$Metabolite
test$Metabolite<-NULL
pheatmap(test,cluster_rows=TRUE,cluster_cols=TRUE,breaks=NA,scale="none",legend=TRUE,color=colorRampPalette(c("navy","white","firebrick3"))(50),margins=c(5,10),fontsize_row=4,fontsize_col=4,cellwidth=5,cellheight=5)
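A sketch of one way to build a column annotation, assuming the sample names follow the 01_01 ... 09_03 pattern described above and that the two-digit prefix identifies the cell type. The matrix here is random stand-in data, and the column name CellType is made up for illustration. pheatmap's annotation_col argument expects a data frame whose row names match the column names of the plotted matrix.

```r
# Hypothetical stand-in for the real data: 27 samples named 01_01 ... 09_03
sample_ids <- sprintf("%02d_%02d", rep(1:9, each = 3), rep(1:3, times = 9))
set.seed(1)
test <- matrix(rnorm(10 * 27), nrow = 10,
               dimnames = list(paste0("met_", 1:10), sample_ids))

# Map the two-digit prefix of each sample name to a cell-type letter
prefix <- as.integer(substr(colnames(test), 1, 2))
annotation_col <- data.frame(CellType = LETTERS[prefix],
                             row.names = colnames(test))

# Draw the heat map only if the pheatmap package is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(test, cluster_rows = TRUE, cluster_cols = TRUE,
                     annotation_col = annotation_col,
                     fontsize_row = 4, fontsize_col = 4)
}
```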

Renaming values in R after binning with cut()

I had a list of numerical values that I wanted to bin using cut(). Now each row has been replaced with the range that it fell into, written with brackets, e.g. [0,140], meaning between 0 and 140 inclusive.
The problem is these names are lengthy, and eventually require exponent notation, making them even longer, and it makes the graph illegible. Using typeof() it appears it's still in integer form, but I can't figure out how to rename them the way I would with factors. When I tried with factor() and the labels parameter, I was told that sort only worked on atomic lists.
As an example, here's essentially what I tried on my dataset, except with the built-in iris dataset:
data(iris)
iris[1] <- cut(iris[[1]], 10, include.lowest=TRUE)
iris[1] <- factor(iris[1], labels = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
It returns the error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
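The error most likely arises because iris[1] is a one-column data frame (a list), whereas iris[[1]] is the underlying vector that factor() and cut() expect. A minimal sketch of two ways to get short labels, using the same iris example (the name binned is introduced here for illustration):

```r
data(iris)

# Option 1: supply the short labels directly to cut()
binned <- cut(iris[[1]], 10, include.lowest = TRUE, labels = letters[1:10])

# Option 2: bin first, then rename the levels of the resulting factor
iris[[1]] <- cut(iris[[1]], 10, include.lowest = TRUE)
levels(iris[[1]]) <- letters[1:10]
```

Note that typeof() reports "integer" for any factor, because factors are stored as integer codes with a levels attribute; is.factor() is the more informative check.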

SparkR: Assign values of a column with condition

I want to replace values of a column with a certain condition.
Example of R data frame:
df <- data.frame(id=c(1:7),value=c("a", "b", "c", "d", "e", "c", "c"))
I want to replace values "c" and "d", in column value by "e".
In R, it can be done this way
df[df$value %in% c("c","d"),]$value <- "e"
I tried to do the same thing in sparkR. Tried ifelse, when functions but couldn't give me the desired result.
Has anyone run into the same issue?
The first comment by mtoto works well (with Spark 3.0.1) and should be converted into an answer and accepted.
df$value <- ifelse(df$value %in% c("c","d"), "e", df$value)
Another valid slightly different method to replace strings in a column could be the following:
df$value <- regexp_replace(df$value, "c", "e")
