This question is more advanced than the similar one here. The expected answer is about 20 characters long.
When I plot columns of a data.frame, I do it like this:
# t1 is a df
> plot((q1*s1+q2*s2)/(s1+s2),data=t1)
Can I reuse this form for a matrix?
[Finally working MVO, thanks!]
> M<-matrix(data=rnorm(30),ncol=2,dimnames=list(NULL,c('q1','q2')))
> plot(M)
> x=1:dim(M)[1]
> plot(x~q1/q2,data=data.frame(M),type='l')
You can use with() for this:
with(data.frame(mymatrix), plot((q1*s1+q2*s2)/(s1+s2)))
Hope this helps.
That sort of plotting (where you type formulas involving the dataframe's columns) is only available for data frames.
If colnames(mymatrix) are q1, s1, etc., then you can achieve the effect by doing:
plot( myformula, data=data.frame(mymatrix))
i.e., coerce the matrix to a dataframe and then use the formula.
Update
An example demonstrating this works:
# construct a matrix
> mymatrix <- array(runif(10*2),dim=c(10,2))
# give it column names X and Y
> colnames(mymatrix)<-c('X','Y')
> mymatrix
X Y
[1,] 0.07346608 0.81321578
[2,] 0.09525474 0.17852467
[3,] 0.81246522 0.45747972
[4,] 0.01286714 0.82517127
[5,] 0.77554012 0.87725725
[6,] 0.71908435 0.71628493
[7,] 0.13212848 0.67827601
[8,] 0.65993809 0.01650703
[9,] 0.11385161 0.99433644
[10,] 0.22750439 0.45611635
# plot Y vs X -- note you need to convert the matrix to a data frame first.
> plot(Y~X,data.frame(mymatrix))
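The same coercion should also cover the weighted expression from the question. Below is a minimal sketch, assuming the matrix has columns named q1, q2, s1 and s2 (the matrix M and its random contents are made up for illustration). It evaluates the expression with with() and checks it against direct matrix indexing; the plot call is left commented out so the snippet runs without a graphics device.

```r
# Hypothetical matrix with the four column names used in the question
set.seed(1)
M <- matrix(runif(40), ncol = 4,
            dimnames = list(NULL, c("q1", "q2", "s1", "s2")))

# Coerce once, then evaluate the expression against the columns
df <- as.data.frame(M)
wavg <- with(df, (q1 * s1 + q2 * s2) / (s1 + s2))

# Same numbers, computed by indexing the matrix directly
direct <- (M[, "q1"] * M[, "s1"] + M[, "q2"] * M[, "s2"]) /
          (M[, "s1"] + M[, "s2"])

# plot(wavg, type = "l")   # uncomment to draw the series
```

If the matrix is large and the conversion is repeated often, indexing the matrix directly avoids building the intermediate data frame.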
I am trying to subset a large data matrix, an example of which is below:
row 1/col 1 row 1/col 2 row 1/col 3
[1,] 855.815 749.574 754.950
[2,] 855.718 749.496 755.004
[3,] 855.846 749.359 754.910
[4,] 855.746 749.299 754.795
[5,] 855.805 749.421 754.883
I am trying to remove columns where the value of the first row is above or below one standard deviation away from the mean of the whole first row, using this code:
library(matrixStats)
x = data[,-1] > (rowMeans(data[,-1]) + rowSds(data[,-1]))
y = data[,-1] < (rowMeans(data[,-1]) - rowSds(data[,-1]))
subset(df2, !(x | y))
But this returns the following error when applied to my dataset:
Error in x[subset & !is.na(subset), vars, drop = drop] :
(subscript) logical subscript too long
As I understand it, R has expanded this to read:
subset(df2, !(data[,-1] > (rowMeans(data[,-1]) + rowSds(data[,-1]))|data[,-1] < (rowMeans(data[,-1]) - rowSds(data[,-1]))))
and that the logical argument is simply too long. Is there something I am missing? I am inexperienced with R and sure there are neater ways to do this, but from what I have read I thought subset would be most useful.
Thank you in advance.
You can try this:
df <- as.matrix(read.table(text='C1 C2 C3
[1,] 855.815 749.574 754.950
[2,] 855.718 749.496 755.004
[3,] 855.846 749.359 754.910
[4,] 855.746 749.299 754.795
[5,] 855.805 749.421 754.883', header=TRUE))
library(matrixStats)
df[,which(abs(df[1,] - rowMeans(df)[1]) < rowSds(df)[1])]
# C2 C3
#[1,] 749.574 754.950
#[2,] 749.496 755.004
#[3,] 749.359 754.910
#[4,] 749.299 754.795
#[5,] 749.421 754.883
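As far as I can tell, the original error comes from handing subset() a whole logical matrix where it expects one logical value per row. Selecting columns needs one logical value per column instead. Here is a base-R sketch of that idea, assuming mean() and sd() of the first row are acceptable stand-ins for rowMeans() and rowSds(); the matrix m is a shortened copy of the example data.

```r
# First three rows of the example data (column names are assumed)
m <- matrix(c(855.815, 749.574, 754.950,
              855.718, 749.496, 755.004,
              855.846, 749.359, 754.910),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("C1", "C2", "C3")))

first <- m[1, ]

# One TRUE/FALSE per column: is the first-row value within one SD of the mean?
keep <- abs(first - mean(first)) < sd(first)

# drop = FALSE keeps the result a matrix even if only one column survives
kept <- m[, keep, drop = FALSE]
```

The logical vector keep has the same length as the number of columns, which is what the column position of the index expects.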
In R, when I select only one column from a data frame/matrix, the result becomes a vector and loses the column names. How can I keep the column names?
For example, if I run the following code,
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x[,1]
I will get
[1] 1 1 1
Actually, I want to get
test1
[1,] 1
[2,] 1
[3,] 1
The following code gives me exactly what I want; however, is there an easier way to do this?
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
y <- as.matrix(x[,1])
colnames(y) <- colnames(x)[1]
y
Use the drop argument:
> x <- matrix(1,3,3)
> colnames(x) <- c("test1","test2","test3")
> x[,1, drop = FALSE]
test1
[1,] 1
[2,] 1
[3,] 1
Another possibility is to use subset:
> subset(x, select = 1)
test1
[1,] 1
[2,] 1
[3,] 1
The question mentions 'matrix or dataframe' as an input. If x is a data frame, use list subsetting notation, which keeps the column name and does not simplify by default:
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x <- as.data.frame(x)
x[,1]   # matrix-style subsetting: simplifies to a vector
x[1]    # list-style subsetting: stays a one-column data frame
Data frames possess the characteristics of both lists and matrices: if you subset with a single vector, they behave like lists; if you subset with two vectors, they behave like matrices.
There's an important difference if you select a single
column: matrix subsetting simplifies by default, list
subsetting does not.
source: See http://adv-r.had.co.nz/Subsetting.html#subsetting-operators for details
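To make the difference concrete, here is a small sketch comparing the subsetting forms on a data frame. The object names (as_vec, as_df and so on) are only for illustration.

```r
x <- as.data.frame(matrix(1, 3, 3))
names(x) <- c("test1", "test2", "test3")

as_vec  <- x[, 1]                # matrix-style: simplified to a plain vector
as_df   <- x[1]                  # list-style: one-column data frame
as_df2  <- x[, 1, drop = FALSE]  # matrix-style with simplification turned off
as_elem <- x[[1]]                # extracts the column itself, like x[, 1]
```

Both as_df and as_df2 keep the name test1, so either form works when the column name has to survive.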
I have an algorithm that takes data, sorts it, analyzes it, and then returns scores for the sorted data. However, the scores correspond to the sorted data, and I'd really like to return scores that correspond to the unsorted data. I figured there had to be some default R function that does this, but I've had no luck finding anything. Here's a MWE, and code I wrote that works but is really slow:
orig = rnorm(10)
ord = order(orig)
new = orig[ord]
reprod = sapply(1:length(orig), function(x)new[which(ord==x)] )
all(reprod==orig)
Are there any ways to "un-sort" data more efficiently?
What about just:
orig = rnorm(100000)
ord = order(orig)
new = orig[ord]
reprod = rep(0,length(new))
reprod[ord] = new
One nice thing about the ordering vector (the result of the order function) is that if you run order on it, you get the inverse permutation; in other words, it tells you how to unsort. Here is a quick example of what I think you are trying to do:
> orig <- rnorm(10)
> ord <- order(orig)
> score <- seq_along(orig)
> cbind(orig, score[ order(ord) ])
orig
[1,] -0.2429266384 4
[2,] 0.6346488818 8
[3,] 1.2956779160 9
[4,] -0.5563531517 3
[5,] 1.3299626650 10
[6,] -1.6062497717 1
[7,] -1.1444093167 2
[8,] -0.0004719915 5
[9,] 0.2734227278 7
[10,] 0.0357991850 6
You could also do something like:
new[ order(ord) ]
to see that it returns the sorted data to the original order.
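Putting both answers side by side, here is a sketch of mapping scores computed on sorted data back to the original positions. The scoring step (a running sum) is purely a placeholder for whatever the real algorithm returns.

```r
set.seed(42)
orig <- rnorm(10)
ord <- order(orig)
sorted <- orig[ord]

# Placeholder scoring on the sorted data (assumption: one score per element)
sorted_scores <- cumsum(sorted)

# Option 1: index with the inverse permutation
scores_a <- sorted_scores[order(ord)]

# Option 2: assign into the original positions
scores_b <- numeric(length(orig))
scores_b[ord] <- sorted_scores
```

Both options avoid the per-element which() lookup of the original code, so they should scale much better on long vectors.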
I have a very big data set and need to do some preprocessing on it. I run the following steps, but I get numbers in the second column instead of names. When I run the same code on a simple data set, it works well. Does anybody know what the problem is? And how can I remove the "" from the output?
Part of my data set:
> tars.hsa.miRBase[1:4,]
miRBaseid
1 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
2 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
3 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
4 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500
Gene.Symbol Transcript.ID
1 SCARA3 NM_016240
2 IGLON5 NM_001101372
3 IRF5 NM_001098630
4 ELL2 NM_012081
My code :
ind.mirs <- strsplit(tars.hsa.miRBase[, "miRBaseid"], split="/")
lclus <- (sapply(ind.mirs, length))
new.tars <- matrix(NA,sum(lclus),2)
new.tars[,1] <- do.call(c,ind.mirs)
new.tars[,2] <- rep(tars.hsa.miRBase$Gene.Symbol, time=lclus )
Some part of output :
[,1] [,2]
[1,] "hsa-let-7a" "13883"
[2,] "hsa-let-7b" "13883"
[3,] "hsa-let-7c" "13883"
[4,] "hsa-let-7d" "13883"
What I expected :
miRBaseid Gene.Symbol
[1,] hsa-let-7a SCARA3
[2,] hsa-let-7b SCARA3
[3,] hsa-let-7c SCARA3
[4,] hsa-let-7d SCARA3
.
.
.
.
How it works on simple data:
tars.hsa <- data.frame(miR.Family=c("a","b/b","c/c","d/d/d"), Gene.Symbol=paste0("A",1:4,"BG"),stringsAsFactors=FALSE)
ind.mirs <- strsplit(tars.hsa[, "miR.Family"], split="/")
lclus <- sapply(ind.mirs, length)
new.tars <- matrix(NA,sum(lclus),2)
new.tars[,1] <- do.call(c,ind.mirs)
new.tars[,2] <- rep(tars.hsa$Gene.Symbol, time=lclus )
Output:
[,1] [,2]
[1,] "a" "A1BG"
[2,] "b" "A2BG"
[3,] "b" "A2BG"
[4,] "c" "A3BG"
[5,] "c" "A3BG"
[6,] "d" "A4BG"
[7,] "d" "A4BG"
[8,] "d" "A4BG"
>
What is happening is that you are getting the numeric index of the factor level that corresponds to "SCARA3" in your dataset (in this case, 13883). This is being caused by two main issues: first, the matrix has to be all one data type in R, and second, the code is treating the text as factor levels.
If you use a data frame instead of a matrix, each column can have its own data type, so you can have a column that is text and another that is numeric. Alternatively, you might try the options(stringsAsFactors=FALSE) option to change how R processes strings.
Getting rid of the "" signs that you are worried about would also be accomplished by handling the data as a data frame and not a matrix; they are appearing because you are creating a character matrix. They aren't being stored in the data itself, but are there for display (IIRC).
EDITED TO ADD:
Okay, a longer explanation. In R versions before 4.0.0, data.frame() and read.table() by default assume that character columns represent categorical variables (since 4.0.0 the default is stringsAsFactors = FALSE). For instance, if you have a variable race in your dataset with different character strings ("White", "Black", "Asian", and so on), it is automatically converted to a factor. A factor in R stores integer codes together with a set of character levels, and it follows different rules in modeling and such.
If I create example data from your question, like this:
tars.hsa.miRBase <- data.frame(miRBaseid=c("hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500",
"hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500",
"hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500",
"hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98/hsa-let-7g/hsa-let-7i/hsa-miR-4458/hsa-miR-4500"),
Gene.Symbol=c("SCARA3","IGLON5","IRF5","ELL2"),
Transcript.ID=c("NM_016240","NM_001101372","NM_001098630","NM_012081"))
The resulting data is made into factors:
[1] SCARA3 IGLON5 IRF5 ELL2
Levels: ELL2 IGLON5 IRF5 SCARA3
You can tell that the data is a factor because of the "Levels:" statement below the results. To get around this, you can tell R not to treat strings as factors with options(stringsAsFactors=FALSE), or you can pass the data through as.character to get the labels instead of the integer codes.
> as.character(tars.hsa.miRBase$Gene.Symbol)
[1] "SCARA3" "IGLON5" "IRF5" "ELL2"
See how it changes the output?
ind.mirs <- strsplit(as.character(tars.hsa.miRBase[,"miRBaseid"]), split="/")
lclus <- sapply(ind.mirs, length)
new.tars <- matrix(NA,sum(lclus),2)
new.tars[,1] <- do.call(c,ind.mirs)
new.tars[,2] <- rep(as.character(tars.hsa.miRBase$Gene.Symbol), time=lclus)
> new.tars
[,1] [,2]
[1,] "hsa-let-7a" "SCARA3"
[2,] "hsa-let-7b" "SCARA3"
[3,] "hsa-let-7c" "SCARA3"
[4,] "hsa-let-7d" "SCARA3"
[5,] "hsa-let-7e" "SCARA3"
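For completeness, here is a sketch of the data frame route mentioned earlier, which should also get rid of the quotes in the printed output. The two-row input (tars) is a made-up miniature of the real data, with Gene.Symbol deliberately stored as a factor to reproduce the problem.

```r
# Hypothetical miniature of the real data; Gene.Symbol is a factor on purpose
tars <- data.frame(miRBaseid = c("hsa-let-7a/hsa-let-7b", "hsa-miR-98"),
                   Gene.Symbol = factor(c("SCARA3", "ELL2")),
                   stringsAsFactors = FALSE)

ind <- strsplit(tars$miRBaseid, "/", fixed = TRUE)
n <- lengths(ind)

# Reproduces the problem: the factor's integer codes end up in the matrix
bad <- matrix(NA, sum(n), 2)
bad[, 1] <- unlist(ind)
bad[, 2] <- rep(tars$Gene.Symbol, times = n)

# Data frame version: convert the factor to character before repeating it
good <- data.frame(miRBaseid = unlist(ind),
                   Gene.Symbol = rep(as.character(tars$Gene.Symbol), times = n),
                   stringsAsFactors = FALSE)
```

Printing good shows the gene symbols without quotes, because data frames print character columns unquoted by default.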
I cannot seem to convert a list to a matrix. I load a .csv file using:
dat = read.csv("games.csv", header = TRUE)
> typeof(dat)
[1] "list"
But when I try to convert it into a numeric matrix using:
games = data.matrix(dat)
The entries' values are all changed for some reason. What is the problem?
While Nathaniel's solution worked for you, I think it's important to point out that you might need to adjust your perception of what is going on.
The typeof(dat) might be a list but the class is a data.frame.
This might help illustrate the difference:
# typeof() & class() of `pts` is `list`
# whereas typeof() `dat` in your example is `list` but
# class() of `dat` in your example is a `data.frame`
pts <- list(x = cars[,1], y = cars[,2])
as.matrix(pts)
## [,1]
## x Numeric,50
## y Numeric,50
head(as.matrix(data.frame(pts)))
## x y
## [1,] 4 2
## [2,] 4 10
## [3,] 7 4
## [4,] 7 22
## [5,] 8 16
## [6,] 9 10
Those are two substantially different outcomes from the as.matrix() function.
Just making sure you don't get disappointed by the outcome if you try this in a different context outside of read.csv.
Without any other information being provided, perhaps you might try:
games <- as.matrix(dat)
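One possible cause of the changed values, though this is only a guess without seeing games.csv, is that some columns were read in as factors: data.matrix() replaces a factor with its internal integer codes. A small sketch with a made-up data frame:

```r
# Hypothetical data: numbers that were read in as a factor
dat <- data.frame(score = factor(c("10", "3", "25")),
                  wins = c(1L, 0L, 2L))

codes <- data.matrix(dat)   # score becomes the factor's integer codes
vals  <- as.matrix(dat)     # character matrix, original labels preserved

# Convert the factor to numbers explicitly before building a numeric matrix
fixed <- dat
fixed$score <- as.numeric(as.character(fixed$score))
num <- data.matrix(fixed)
```

Running str(dat) on the real data would show whether any column is a factor or character vector rather than numeric.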