How can I start manipulating an scRNA seq .txt matrix in R? - r

I'm really new to this, so please forgive my general lack of understanding.
I've been trying to load the dataset that is provided here (I downloaded the .txt file at the bottom). It's scRNA-seq data and I'm trying to use Seurat to process it and make some graphs. However, every time I try loading it, it fails and says I need a barcode file. How can I convert this file into something that will run through Seurat, and how do I make a barcode file? I really appreciate anyone's help with this. Thank you!

Reading it is straightforward; use a standard function such as:
d <- data.table::fread("GSM4203181_data.raw.matrix.txt")
The "barcodes" are the column names, and the "features" (the genes) are in the first column.
> d[1:5,1:5]
              V1 AAACCTGAGAGATGAG-1 AAACCTGCACCAGGCT-1 AAACCTGGTTAAGACA-1 AAACCTGGTTGAGTTC-1
1:  RP11-34P13.7                  0                  0                  0                  0
2:  RP11-34P13.8                  0                  0                  0                  0
3:    FO538757.2                  1                  0                  0                  0
4:    AP006222.2                  2                  1                  0                  0
5: RP4-669L17.10                  0                  0                  0                  0
From here you can construct your Seurat object manually, e.g. via https://www.rdocumentation.org/packages/Seurat/versions/3.0.1/topics/CreateSeuratObject
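As a minimal sketch of that last step, assuming the table has the shape shown above (gene names in the first column, one column per cell barcode): move the gene names into the row names of a numeric matrix and pass that matrix as `counts`. The small data frame below is a made-up stand-in for the result of `fread()`, and the project name is only a placeholder.

```r
# Mock of the fread() result: first column holds gene names, the rest are cells.
# (Values are illustrative only; use your real `d` instead.)
d <- data.frame(
  V1 = c("RP11-34P13.7", "FO538757.2", "AP006222.2"),
  `AAACCTGAGAGATGAG-1` = c(0, 1, 2),
  `AAACCTGCACCAGGCT-1` = c(0, 0, 1),
  check.names = FALSE  # keep the "-" in the barcodes
)

# Genes become row names, barcodes stay as column names.
counts <- as.matrix(d[, -1])
rownames(counts) <- d$V1

# Build the Seurat object only if Seurat is installed.
if (requireNamespace("Seurat", quietly = TRUE)) {
  seu <- Seurat::CreateSeuratObject(counts = counts, project = "GSM4203181")
}
```

No separate barcode file is needed this way, because the barcodes travel along as the column names of the matrix.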

Related

How to create undirected (i)graph in R for protein-interaction data with tissue information?

I am new to bioinformatics in general and would really appreciate some help and tips with the project I'm working on.
My data of protein-protein interactions is stored in a table (in MySQL) with binary information about tissue-specificity. Now I am trying to create an undirected graph with igraph in R, but could not understand what type of data structure I should use without losing the tissue-information (Adjacency matrix, edge list..?).
Thank you in advance!
The data itself is about 200k rows, but here is an example of the structure:
symbol1  symbol2  adipose_tissue  adrenal_gland  amygdala  bone
POT1     PRMT7    0               0              0         0
CNBP     HNRNPAB  1               1              1         1
TRIAP1   BAG3     1               1              1         1
NR5A1    RALY     0               1              0         0
TPI1     CCDC8    1               1              1         1
MRPS22   BARD1    0               0              0         1
TOP2A    CCDC8    0               0              0         1
MYH9     TRIM72   0               0              0         0
ATXN7    TAF12    1               0              0         1
PSEN1    STT3B    1               1              1         1
ATP5F1   TSG101   1               1              1         1
BRCA1    UTP4     0               0              1         1
Bioinformatics apart, this is a question of data wrangling in igraph. igraph can build graph objects from both matrices and lists in many formats, so you should avoid too much pre-conversion. I suggest you build your graph using graph_from_data_frame().
I assume that the data structure described above is relational and therefore basically already an edge list of relations between the proteins uniprot1 and uniprot2. This mock-up sample data mimics your data structure.
data <- data.frame(uniprot1 = c('Q94X','Q95X','Q435','QUUU','0982'),
uniprot2 = c('QUUU','Q94X','Q95X','Q95X','Q94X'),
symbol = c('Symbol A', 'Symbol B', 'Symbol C','Symbol D',' Symbol E'),
adipose_tissue = c(1,0,0,1,1),
bone=c(0,0,0,1,1))
To keep variables other than just the relational edges between vertices, you can either create them alongside your graph object, or add and manipulate them manually later.
Attributes naturally belong either to vertices or to edges. A vertex attribute in your data would be a protein name, size or other characteristic. An edge attribute would be the relational strength, type, or any other characteristic of the link between two proteins. If your graph had a vertex attribute called understandable_name_of_protein, you would access it like so:
V(g)$understandable_name_of_protein
Edge attributes follow the same principle through E(g)$attribute. When you load the example data above, every column after the first two is imported into your graph as an edge attribute, like this:
# Load igraph, then build an undirected graph using the edges described in `data`
library(igraph)
g <- graph_from_data_frame(data, directed=FALSE)
# Check that the data was correctly imported as edge attributes
E(g)$bone
# Add the edge attribute `color`, which will be displayed when plotting the graph
E(g)$color <- ifelse(E(g)$bone == 0, 'green','black')
# plot to see the graph with the bone-attribute visible as edge-color
plot(g)
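Because the tissue columns end up as edge attributes, a tissue-specific network can be derived from the full graph afterwards. The following is only a sketch under the same assumptions as the mock-up above (made-up identifiers, binary tissue columns); it keeps the edges flagged for bone and then optionally drops proteins that are left without any interaction.

```r
library(igraph)

# Same kind of mock-up edge list as above (hypothetical identifiers, not real data)
data <- data.frame(uniprot1 = c('Q94X', 'Q95X', 'Q435', 'QUUU', '0982'),
                   uniprot2 = c('QUUU', 'Q94X', 'Q95X', 'Q95X', 'Q94X'),
                   adipose_tissue = c(1, 0, 0, 1, 1),
                   bone = c(0, 0, 0, 1, 1))
g <- graph_from_data_frame(data, directed = FALSE)

# Remove every edge that is not present in bone; all vertices are kept
g_bone <- delete_edges(g, which(E(g)$bone == 0))

# Optionally remove proteins that no longer have any interaction
g_bone_connected <- delete_vertices(g_bone, which(degree(g_bone) == 0))
```

Filtering the data frame first (for example `data[data$bone == 1, ]`) and building a graph from the subset would be an alternative; deleting edges from the full graph has the advantage that all tissues share one vertex set.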

Matches in binary columns-R

I am building some prediction models. I have two binary columns, one with the predicted values and the other with the actual values.
Since the columns contain few ones (a one marks a person with cancer), I want to see how many cases the model detected (how many real ones it predicted) and the percentage of sick people that were correctly predicted.
Brief description of the data: the first column shows the real values and the second one shows the predicted values:
> predictedvsreal
real prediction
39240 0 0
39241 0 0
39242 0 0
39243 1 0
39244 0 1
39245 0 0
39246 0 0
39247 0 0
39248 1 1
39249 0 0
39250 0 0
39251 0 0
39252 0 0
Thanks!
Next time please include a reproducible example as it makes the question much better - both for letting people who answer have a concrete example to work with and to catch edge-cases, and for future readers to see a real example.
There are lots of good recommendations for how to create nice, minimal, reproducible examples at this link.
From what you describe, you want the table function, probably like this:
with(your_data, table(your_first_column_name, your_second_column_name))
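As a sketch of how that could look, here are the thirteen rows shown in the question typed in by hand (the row names are dropped), followed by the cross-tabulation and the share of real positives that the model caught. The variable names `confusion`, `true_pos` and `sensitivity` are just illustrative choices.

```r
# The rows from the question, re-entered manually
predictedvsreal <- data.frame(
  real       = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  prediction = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0)
)

# Cross-tabulate real against predicted values
confusion <- with(predictedvsreal, table(real, prediction))
print(confusion)

# Real positives that were also predicted as positive
true_pos <- sum(predictedvsreal$real == 1 & predictedvsreal$prediction == 1)

# Fraction of real positives that the model detected
sensitivity <- true_pos / sum(predictedvsreal$real == 1)
sensitivity
# [1] 0.5
```

For this small sample the model finds one of the two real cases, so half of the sick persons are correctly predicted.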

what is Boolean x'.y+x.y' equal to

I am stuck on a Boolean expression. Please help me work out what x.y'+x'.y equals.
I have an exam today and I don't know how to solve this type of problem. In addition, can someone write out the Boolean laws that involve two elements instead of one for me? Thank you
There are only two inputs to the expression, so write out a truth table with the values of the inputs and for each term until you get the result.
x  y  x'  y'  x'.y  x.y'  x'.y+x.y'
0  0  1   1   0     ...
0  1  1   0   1     ...
1  1  0   0   0     ...
1  0  0   1   0     ...
When you have done that, look for patterns in the last column. You should then recognise the pattern as being the same as a single operator.
The pattern for the inputs is usually a Gray code so that the output column reflects changes due to only one input changing, which usually can help show up the pattern.
Alternatively, when you have your result, plot it in a grid and spot the pattern that way, e.g. for x+y you'd get
x\y  0  1
0    0  1
1    1  1
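If you want to check a hand-written truth table, the same enumeration can be done mechanically. This is just a sketch in R; the column names are made up for illustration, and note that `expand.grid()` lists the input rows in plain binary order rather than Gray-code order.

```r
# All four combinations of the two inputs
tt <- expand.grid(x = c(FALSE, TRUE), y = c(FALSE, TRUE))

# The two product terms: x'.y and x.y'
tt$not_x_and_y <- !tt$x & tt$y
tt$x_and_not_y <- tt$x & !tt$y

# The full expression: x'.y + x.y'
tt$result <- tt$not_x_and_y | tt$x_and_not_y

print(tt)
```

Compare the `result` column with the last column of your own table, then look for the single operator that produces the same pattern.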

transpose row to column in R using qdap

I have been using the wfm function in the "qdap" package to transpose text row values into columns, and ran into a problem when the data contains numbers along with text. For example, if the row value is "abcdef" the transpose works fine, but if the value is "ab1000" the numbers get truncated. Can anyone suggest how to work around this?
Approach tried so far:
input <- read.table(header=F, text="101 ab0003
101 pp6500
102 sm2456")
colnames(input) <- c("id","channel")
library(qdap)
output <- t(with(input, wfm(channel, id)))
output <- as.data.frame(output)
expected_output<- read.table(header=F,text="1 1 0
0 0 1")
colnames(expected_output) <- c("ab0003","pp6500", "sm2456")
I think maybe wfm isn't the right tool for this job. It seems you don't really have sentences that you want to split into words, so you're using a function with a lot of overhead unnecessarily. What you really want is to tabulate the values you have by another grouping variable.
Here are two approaches. One using qdapTools's mtabulate, another using base R's table:
library(qdapTools)
mtabulate(with(input, split(channel, id)))
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
t(with(input, table(channel, id)))
## channel
## id ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1
It is possible that your MWE does not reflect the complexity of your data; if that is the case, it brings us back to the original problem. wfm uses the tm package as a backend for some of the manipulations, so we need to supply something through the dots (...). I re-read the documentation and this is a bit confusing (I have added this info in the dev version), but we want to pass removeNumbers=FALSE to TermDocumentMatrix, as seen here:
output <- t(with(input, wfm(channel, id, removeNumbers=FALSE)))
as.data.frame(output)
## ab0003 pp6500 sm2456
## 101 1 1 0
## 102 0 0 1

do until loop in R

at the end of my wits, so sorry if this is the wrong place, or done incorrectly. first time asking here. i am new to R, with very little programming experience (a pascal class in college, and was very good at macromedia lingo way back - so, not that afraid of code).
to keep things short and simple, i think best to just show you what i have, and what i would like. i have spent hours upon hours searching and trying for a solution.
an example of what i have (it is an xts object called "signals", and indexed by days, left out here to make the example simple):
open close position
0 0 0
1 0 0
0 0 0
0 0 0
0 1 0
0 0 0
and what i would like to happen:
open close position
0 0 0
1 0 1
0 0 1
0 0 1
0 1 1
0 0 0
basically, when "open" is true, repeat 1s in "position" until "close" is true. amazingly simple, i think, but somehow i can't make it work. here one example of where i got that i thought was maybe close, but it gets stuck in an endless loop:
for (i in 1:nrow(signals)) {
if (signals[i,"open"]==1) next
while (signals[i,"close"] == 0) {
signals[i,"position"] <- 1 }
}
thank you!
EDIT - i left out an important qualifier. there are times where the first true statement in "close" will come before the first true statement in "open." however, now that i wrote that out here, i suppose it is easier to just 'clean' the close column somehow, so there are no 1s in it prior to the point of the first 1 in the open column.
however, if someone has an idea how to do it all, feel free to add additional information. thanks!
You don't have to use loops for this:
open <- c(0,1,0,0,0,0)
close <- c(0,0,0,0,1,0)
position <- cumsum(open-close)
position
[1] 0 1 1 1 0 0
Note that this closes the position immediately. If you want it to close on the line after you get a close signal, use:
cumsum(open-c(0,close[-length(close)]))
[1] 0 1 1 1 1 0
The reason your while statement never ends is that nothing inside it modifies what is being tested; that is, i doesn't get incremented.
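If you do prefer an explicit loop, here is a sketch of one that terminates, because the single for loop is the only thing that advances i. It uses plain vectors instead of the xts object, and `fill_position` and `in_trade` are made-up names for illustration. It also ignores any close signal that arrives before the first open, which is the case mentioned in the EDIT.

```r
# Hypothetical helper: 1 from an open signal up to and including the close
fill_position <- function(open, close) {
  position <- numeric(length(open))
  in_trade <- FALSE
  for (i in seq_along(open)) {
    if (open[i] == 1) in_trade <- TRUE    # an open signal starts the run
    position[i] <- as.numeric(in_trade)   # record the current state
    if (close[i] == 1) in_trade <- FALSE  # a close ends the run after this row
  }
  position
}

open  <- c(0, 1, 0, 0, 0, 0)
close <- c(0, 0, 0, 0, 1, 0)
fill_position(open, close)
# [1] 0 1 1 1 1 0
```

Unlike the cumsum version, a stray 1 in close before the first open cannot push position below zero here, since the state is a flag rather than a running sum.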
