Data frame header in R - r

I am trying to make some calculations with data from oracle db using R. I connected to the DB and extracted the data correctly.
> y=dbGetQuery(con, "select distinct(fk_parametro) from t_datos")
> y
FK_PARAMETRO
1 30
2 42
3 43
4 83
5 87
6 1
7 6
8 44
9 20
10 14
11 86
12 88
13 85
14 81
15 35
16 8
17 80
18 89
19 7
20 12
21 82
22 9
23 10
The following command.. works:
> sum(y)
[1] 1042
But this one.. fails:
> mean(y)
[1] NA
Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
I think it happens because R is considering the header "FK_PARAMETRO" as an element. can someone help me to figure out?

As commented by #akrun, this works
mean(y[,1])
Or as suggested by #PierreLafortune, could also do
colMeans(y)

Related

Script out of bounds in R

I am using a code based on Deseq2. One of my goals is to plot a heatmap of data.
heatmap.data <- counts(dds)[topGenes,]
The error I am getting is
Error in counts(dds)[topGenes, ]: subscript out of bounds
the first few line sof my counts(dds) function looks like this.
99h1 99h2 99h3 99h4 wth1 wth2
ENSDARG00000000002 243 196 187 117 91 96
ENSDARG00000000018 42 55 53 32 48 48
ENSDARG00000000019 91 91 108 64 95 94
ENSDARG00000000068 3 10 10 10 30 21
ENSDARG00000000069 55 47 43 53 51 30
ENSDARG00000000086 46 26 36 18 37 29
ENSDARG00000000103 301 289 289 199 347 386
ENSDARG00000000151 18 19 17 14 22 19
ENSDARG00000000161 16 17 9 19 10 20
ENSDARG00000000175 10 9 10 6 16 12
ENSDARG00000000183 12 8 15 11 8 9
ENSDARG00000000189 16 17 13 10 13 21
ENSDARG00000000212 227 208 259 234 78 69
ENSDARG00000000229 68 72 95 44 71 64
ENSDARG00000000241 71 92 67 76 88 74
ENSDARG00000000324 11 9 6 2 8 9
ENSDARG00000000370 12 5 7 8 0 5
ENSDARG00000000394 390 356 339 283 313 286
ENSDARG00000000423 0 0 2 2 7 1
ENSDARG00000000442 1 1 0 0 1 1
ENSDARG00000000472 16 8 3 5 7 8
ENSDARG00000000476 2 1 2 4 6 3
ENSDARG00000000489 221 203 169 144 84 114
ENSDARG00000000503 133 118 139 89 91 112
ENSDARG00000000529 31 25 17 26 15 24
ENSDARG00000000540 25 17 17 10 28 19
ENSDARG00000000542 15 9 9 6 15 12
How do I ensure all the elements of the top genes are present in it?
When I try to see 20 top genes in the dataset. it looks like a list of genes
6339" "12416" "1241" "3025" "12791" "846" "15090"
[8] "6529" "14564" "4863" "12777" "1122" "7454" "13716"
[15] "5790" "3328" "1231" "13734" "2797" "9072" with the column head V1.
I have used both
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = TRUE)
and
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = FALSE)
to see if the out of bounds error is removed. However it was of no use. I guess the V1 head is causing the issue.
The top genes function has been generated using the above code snippet.
resordered <- res[order(res$padj),]
#Reorder gene list by increasing pAdj
resordered <- as.data.frame(res[order(res$padj),])
#Filter for genes that are differentially expressed with an FDR < 0.01
ii <- which(res$padj < 0.01)
length(ii)
# Use the rownames() function to get the top 20 differentially expressed genes from our results table
topGenes <- rownames(resordered[1:20,])
topGenes
# Get the counts from the DESeqDataSet using the counts() function
heatmap.data <- counts(dds)[topGenes,]
Perhaps this will do what you want?
counts_dds <- counts(dds)
topgenes <- c("ENSDARG00000000002", "ENSDARG00000000489", "ENSDARG00000000503",
"ENSDARG00000000540", "ENSDARG00000000529", "ENSDARG00000000542")
heatmap.data <- counts_dds[rownames(counts_dds) %in% topgenes,]
If you provide more information it will be easier to advise you on how to fix your problem.

merging two data frames failed in R

1st data frame:
> head(rawCountTable)
X1916.MJO.0001_1 X1916.MJO.0002_1 X1916.MJO.0003_1 X1916.MJO.0004_1 X1916.MJO.0005_1 X1916.MJO.0006_1
sp0055008 9 5 10 2 13 18
sp0052637 32 16 27 18 25 28
sp0025247 50 73 67 63 94 73
sp0025250 9 5 5 17 15 22
sp0025268 141 117 107 104 129 135
sp0025270 0 0 0 0 0 0
2nd data frame
> head(gene.description)
V1 V6
1: sp0082943 Exonuclease
2: sp0086614 NB-ARC domain
3: sp0013196 Ulp1 protease family, C-terminal catalytic domain
4: sp0088665 Ankyrin repeats (many copies)
5: sp0071721 Peptidase inhibitor I9
6: sp0012855 Ubiquitin fusion degradation protein UFD1
During merging I got the following error:
> merge(rownames(rawCountTable),gene.description, by.y="V1")
Error in merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
I needed to merger the two data frames together because
> dim(rawCountTable)
[1] 94003 24
> dim(gene.description)
[1] 61125 2
I need all 94003 ids. What did I miss?
Thank you in advance.

how to fix "undefined columns selected" for network meta-analysis in R?

I am conducting a network meta-analysis on R with two packages, gemtc and rjags. However, when I type
Model <- mtc.model (network, linearmodel=’fixed’).
R always returns “
Error in [.data.frame(data, sel1 | sel2, columns, drop = FALSE) :
undefined columns selected In addition: Warning messages: 1: In
mtc.model(network, linearModel = "fixed") : Likelihood can not be
inferred. Defaulting to normal. 2: In mtc.model(network, linearModel =
"fixed") : Link can not be inferred. Defaulting to identity “
How to fix this problem? Thanks!
I am attaching my codes and data here:
SAE <- read.csv(file.choose(),head=T, sep=",")
head(SAE)
network <- mtc.network(data.ab=SAE)
summary(network)
plot(network)
model.fe <- mtc.model (network, linearModel="fixed")
plot(model.fe)
summary(model.fe)
cat(model.fe$code)
model.fe$data
# run this model
result.fe <- mtc.run(model.fe, n.adapt=0, n.iter=50)
plot(result.fe)
gelman.diag(result.fe)
result.fe <- mtc.run(model.fe, n.adapt=1000, n.iter=5000)
plot(result.fe)
gelman.diag(result.fe)
following is my data: SAE
study treatment responder sample.size
1 1 3 0 76
2 1 30 2 72
3 2 3 99 1389
4 2 23 132 1383
5 3 1 6 352
6 3 30 2 178
7 4 2 6 106
8 4 30 3 95
9 5 3 49 393
10 5 25 18 198
11 6 1 20 65
12 6 22 10 26
13 7 1 1 76
14 7 30 3 76
15 8 3 7 441
16 8 26 1 220
17 9 2 1 47
18 9 30 0 41
19 10 3 10 156
20 10 30 9 150
21 11 1 4 85
22 11 25 5 85
23 11 30 4 84
24 12 3 6 152
25 12 30 5 160
26 13 18 4 158
27 13 21 8 158
28 14 1 3 110
29 14 30 2 111
30 15 3 3 83
31 15 30 1 92
32 16 1 3 124
33 16 22 6 123
34 16 30 4 125
35 17 3 236 1553
36 17 23 254 1546
37 18 6 5 398
38 18 7 6 403
39 19 1 64 588
40 19 22 73 584
How about reading the manual ?mtc.model. It clearly states the following:
Required columns [responders, sampleSize]
So your responder variable should be responders and your sample.size variable should be sampleSize.
Next, your plot(network) should help you determine that some comparisons can not be made. In your data, there are 2 subgroups of trials that were compared. Treatment 18 and 21 were not compared with any of the others. Therefore you can only do a meta-analysis of 21 and 18 or a network meta-analysis of the rest.
network <- mtc.network(data.ab=SAE[!SAE$treatment %in% c(21, 18), ])
model.fe <- mtc.model(network, linearModel="fixed")

Mean and SD in R

maybe it is a very easy question. This is my data.frame:
> read.table("text.txt")
V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860
It means that I have 22516 sequences with length 26, 17129 sequences with length 28, etc. I would like to know the sequence length mean and its standard deviation. I know how to do it, but I know to do it creating a list full of 26 repeated 22516 times and so on... and then compute the mean and SD. However, I thing there is a easier method. Any idea?
Thanks.
For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
I do not find mean(rep(V1,V2)) # 61.902 and sd(rep(V1,V2)) # 14.23891 that complex, but alternatively you might try:
weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
Step1: Set up data:
dat.df <- read.table(text="id V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860",header=T)
Step2: Convert to data.table (only for simplicity and laziness in typing)
library(data.table)
dat <- data.table(dat.df)
Step3: Set up new columns with products, and use them to find mean
dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]
dat.Mean <- sum(dat$pr)/sum(dat$V2)
dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
Hope this helps!!
MEAN = (V1*V2)/sum(V2)
SD = sqrt((V1*V1*V2)/sum(V2) - MEAN^2)

Changing factors to Integers without changing the order of the data

I have following data and trying change CCG and Pract to numbers so I can use stan or Winbugs...when I try to change it seems its changing the order of the data..
I want to change CCG and Pract to numbers without changing the order of the data...I tried hard but I couldn't do it.
I am struggling with this basic issue than writing Bugs codes....please help..
I have the following data
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
I tried to change to integer using as.integer() and I am getting I am getting..
CCG pract Deno Numer Points Excep
1 20 6621 160 144 41 36
2 20 6594 130 117 41 18
3 20 6698 179 164 41 36
4 20 6640 57 46 41 25
5 20 6633 214 191 41 62
6 20 6605 137 119 41 62
By checking Deno and Numer it is clear the order of the data has been changed...Why CCG is not starting from 1?
I want
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
change to something like this
CCG pract Deno Numer Points Excep
1 1 1 49 46 4 4
2 1 1 28 26 4 23
3 1 1 66 64 4 4
4 1 1 15 14 4 3
5 1 1 98 89 4 9
6 1 1 31 28 4 9
Please help me..
In R, factors are internally represented as integers, linking to a table of the factor levels. AFAIK, these internal integers are assigned based on a lexicographic order of the factor levels, so 57 gets a higher code than 238.
as.integer() will extract this internal integer coding. As you found out, this is not very useful. (I honestly don't understand why R does this when applying as.integer() to factors that have integers as factor levels.)
Solution: first convert to character, then to integer. as.integer(as.character(Deno))

Resources