I have a dataset which is a function of time and frequency. When I melt it I want to retain actual time (date) and frequency values as I want a 2d plot with y axis as frequency and x axis as time.
I tried to retain a column of desired axis values but melting makes it factor and stat_contour throws error.
My data is some thing like the following
a = data.frame(date=time,power=power)
names(a) = c('date',period)
where period is
[1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[23] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[45] 8 8 8 8 8 8 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[67] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[89] 16 16 16 16 16 16 16 16 16 16 16 16 32 32 32 32 32 32 32 32 32 32
[111] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
[133] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 64 64 64 64
[155] 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64
[177] 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64
[199] 64 64 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128
[221] 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128
[243] 128 128 128 128 128 128 128 128 256
power = melt(a,id.vars = 'date')
date period power
1 850-01-01 8 0.05106766
2 851-01-01 8 0.05926821
3 852-01-01 8 0.06783015
4 853-01-01 8 0.07681627
5 854-01-01 8 0.08636516
6 855-01-01 8 0.09667054
ggplot(power, aes(x = date, y = period, z = power)) +
stat_contour()
this gives an error as period column is a factor; if I make it numeric I loose the exact Y axis labels. Is there any workaround?
thanks
You did not provide a reproducible example, but in principle you have to change the axis labels manually:
library(ggplot2)
d <- expand.grid(x = 1:100, y = paste("P", 1:100))
d$z <- rnorm(10000)
ggplot(d, aes(x, as.numeric(y), z = z)) +
geom_contour() +
scale_y_continuous(breaks = seq_along(levels(d$y)), breaks = levels(d$y))
However, I find it a bit difficult to really understand what a contour plot with one axis being a qualitative should represent. My guess is that your y-axis is a factor because of the melt operation and that there is some column wrongly specified as factor and hence the overall column is a factor when in fact it should be a a numeric (that is it shows 8, should represent 8 but is actually a factor with level 8).
If this is the case, a normal as.numeric does not work as expected b/c it maps simply the first level to 1 the second level to 2 and so on. But you want it to be mapped to the numeric of the string representation. In this case something like this should work:
library(ggplot2)
d <- expand.grid(x = 1:100, y = sample(1000, 100)) # y should be a numeric but is a factor
d$z <- rnorm(10000)
ggplot(d, aes(x, as.numeric(as.character(y)), z = z)) +
geom_contour()
This all is, however, guesswork, because you did not provide us with a full reproducible example. Please read and follow these gudieline in the future when posting questions here how to ask a good question and how to give a reproducible example
Related
I am using a code based on Deseq2. One of my goals is to plot a heatmap of data.
heatmap.data <- counts(dds)[topGenes,]
The error I am getting is
Error in counts(dds)[topGenes, ]: subscript out of bounds
the first few line sof my counts(dds) function looks like this.
99h1 99h2 99h3 99h4 wth1 wth2
ENSDARG00000000002 243 196 187 117 91 96
ENSDARG00000000018 42 55 53 32 48 48
ENSDARG00000000019 91 91 108 64 95 94
ENSDARG00000000068 3 10 10 10 30 21
ENSDARG00000000069 55 47 43 53 51 30
ENSDARG00000000086 46 26 36 18 37 29
ENSDARG00000000103 301 289 289 199 347 386
ENSDARG00000000151 18 19 17 14 22 19
ENSDARG00000000161 16 17 9 19 10 20
ENSDARG00000000175 10 9 10 6 16 12
ENSDARG00000000183 12 8 15 11 8 9
ENSDARG00000000189 16 17 13 10 13 21
ENSDARG00000000212 227 208 259 234 78 69
ENSDARG00000000229 68 72 95 44 71 64
ENSDARG00000000241 71 92 67 76 88 74
ENSDARG00000000324 11 9 6 2 8 9
ENSDARG00000000370 12 5 7 8 0 5
ENSDARG00000000394 390 356 339 283 313 286
ENSDARG00000000423 0 0 2 2 7 1
ENSDARG00000000442 1 1 0 0 1 1
ENSDARG00000000472 16 8 3 5 7 8
ENSDARG00000000476 2 1 2 4 6 3
ENSDARG00000000489 221 203 169 144 84 114
ENSDARG00000000503 133 118 139 89 91 112
ENSDARG00000000529 31 25 17 26 15 24
ENSDARG00000000540 25 17 17 10 28 19
ENSDARG00000000542 15 9 9 6 15 12
How do I ensure all the elements of the top genes are present in it?
When I try to see 20 top genes in the dataset. it looks like a list of genes
6339" "12416" "1241" "3025" "12791" "846" "15090"
[8] "6529" "14564" "4863" "12777" "1122" "7454" "13716"
[15] "5790" "3328" "1231" "13734" "2797" "9072" with the column head V1.
I have used both
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = TRUE)
and
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = FALSE)
to see if the out of bounds error is removed. However it was of no use. I guess the V1 head is causing the issue.
The top genes function has been generated using the above code snippet.
resordered <- res[order(res$padj),]
#Reorder gene list by increasing pAdj
resordered <- as.data.frame(res[order(res$padj),])
#Filter for genes that are differentially expressed with an FDR < 0.01
ii <- which(res$padj < 0.01)
length(ii)
# Use the rownames() function to get the top 20 differentially expressed genes from our results table
topGenes <- rownames(resordered[1:20,])
topGenes
# Get the counts from the DESeqDataSet using the counts() function
heatmap.data <- counts(dds)[topGenes,]
Perhaps this will do what you want?
counts_dds <- counts(dds)
topgenes <- c("ENSDARG00000000002", "ENSDARG00000000489", "ENSDARG00000000503",
"ENSDARG00000000540", "ENSDARG00000000529", "ENSDARG00000000542")
heatmap.data <- counts_dds[rownames(counts_dds) %in% topgenes,]
If you provide more information it will be easier to advise you on how to fix your problem.
This shouldn't be too hard, but I always have issues when tying to run calculations on a column in a dataframe that relies on the value of a another column in the data frame. Here is my data.frame
stream reach length.km length.m total.sa pools.sa
1 Stream Reach_Code 109 109 1 1
2 Brooks BRK_001 17 14 108 13
3 Brooks BRK_002 15 12 99 9
4 Brooks BRK_003 24 21 94 95
5 Brooks BRK_004 32 29 97 33
6 Brooks BRK_005 27 24 92 79
7 Brooks BRK_006 26 23 95 6
8 Brooks BRK_007 16 13 77 15
9 Brooks BRK_008 29 26 84 26
10 Brooks BRK_009 18 15 87 46
11 Brooks BRK_010 23 20 88 47
12 Brooks BRK_011 22 19 91 40
13 Brooks BRK_012 30 27 98 37
14 Brooks BRK_013 25 22 93 29
19 Buncombe_Hollow BNH_0001 7 4 75 65
20 Buncombe_Hollow BNH_0002 8 5 66 21
21 Buncombe_Hollow BNH_0003 9 6 68 53
22 Buncombe_Hollow BNH_0004 19 16 81 11
23 Buncombe_Hollow BNH_0005 6 3 65 27
24 Buncombe_Hollow BNH_0006 13 10 63 23
25 Buncombe_Hollow BNH_0007 12 9 71 57
I would like to calculate the mean of a column (lets say length.m) where stream = Brooks and then do the same thing for stream = Buncombe_Hollow. I actually have 17 different stream names, and plan on calculating the mean of some column for each stream. I will then store these means as a vector, and bind them to another vector of the stream names, so the end result is something like this
stream truevalue
1 Brooks 0.9440620
2 Siouxon 0.5858527
3 Speelyai 0.5839844
Thanks!
try using aggregate:
# Generate some data to use
someDf <- data.frame(stream = rep(c("Brooks", "Buncombe_Hollow"), each = 10),
length.m = rpois(20, 4))
# Calculate the means with aggregate
with(someDf, aggregate(list(truevalue = length.m), list(stream = stream), mean))
The reason for the "list" bits is to specifically name the columns in the (data frame) output
Start using the dplyr package. It makes such calculations quick as well as very easy to write
library(dplyr)
result <- data %>% group_by(stream) %>% summarize(truevalue = mean(length.m))
I have a dataset in a given format:
USER.ID avgfrequency
1 3 3.7821782
2 7 14.7500000
3 9 13.4761905
4 13 5.1967213
5 16 6.7812500
6 26 41.7500000
7 49 13.6666667
8 50 7.0000000
9 51 1.0000000
10 52 17.7500000
11 69 4.5000000
12 75 9.9500000
13 91 84.2000000
14 98 8.0185185
15 138 14.2000000
16 139 34.7500000
17 149 7.6666667
18 155 35.3333333
19 167 24.0000000
20 170 7.3529412
21 171 4.4210526
22 175 6.5781250
23 176 19.2857143
24 177 10.4864865
25 178 28.0000000
26 180 4.8461538
27 183 25.5000000
28 184 13.0000000
29 210 32.0000000
30 215 13.4615385
31 220 11.3611111
32 223 26.2500000
I want to first sort the dataset by avgfrequency and then I want to plot count of USER.ID's that fall under different bin categories.
I want to divide avgfrequency into different bin categories of width 10.
I am trying to sort data using:
user_avgfrequency <- user_avgfrequency[order(user_avgfrequency[,1]), ]
but getting an error.
df <- data.frame(USER.ID=c(3,7,9,13,16,26,49,50,51,52,69,75,91,98,138,139,149,155,167,170,171,175,176,177,178,180,183,184,210,215,220,223), avgfrequency=c(3.7821782,14.7500000,13.4761905,5.1967213,6.7812500,41.7500000,13.6666667,7.0000000,1.0000000,17.7500000,4.5000000,9.9500000,84.2000000,8.0185185,14.2000000,34.7500000,7.6666667,35.3333333,24.0000000,7.3529412,4.4210526,6.5781250,19.2857143,10.4864865,28.0000000,4.8461538,25.5000000,13.0000000,32.0000000,13.4615385,11.3611111,26.2500000) );
breaks <- seq(0,ceiling(max(df$avgfrequency)/10)*10,10);
cols <- colorRampPalette(c('blue','green','red'))(length(breaks)-1);
hist(df$avgfrequency,breaks,col=cols,axes=F,xlab='Average Frequency',ylab='Count');
axis(1,breaks);
axis(2,0:max(tabulate(cut(df$avgfrequency,breaks))));
maybe it is a very easy question. This is my data.frame:
> read.table("text.txt")
V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860
It means that I have 22516 sequences with length 26, 17129 sequences with length 28, etc. I would like to know the sequence length mean and its standard deviation. I know how to do it, but I know to do it creating a list full of 26 repeated 22516 times and so on... and then compute the mean and SD. However, I thing there is a easier method. Any idea?
Thanks.
For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
I do not find mean(rep(V1,V2)) # 61.902 and sd(rep(V1,V2)) # 14.23891 that complex, but alternatively you might try:
weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
Step1: Set up data:
dat.df <- read.table(text="id V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860",header=T)
Step2: Convert to data.table (only for simplicity and laziness in typing)
library(data.table)
dat <- data.table(dat.df)
Step3: Set up new columns with products, and use them to find mean
dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]
dat.Mean <- sum(dat$pr)/sum(dat$V2)
dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
Hope this helps!!
MEAN = (V1*V2)/sum(V2)
SD = sqrt((V1*V1*V2)/sum(V2) - MEAN^2)
Does anyone knows formula for calculating GSM network coverage into percents (0 .. 100) from rssi? It should be safe for 8bit AVR microcontroller CPU, without hardcore math operations like log or division by something that's not 2^n (bitshift is preferred). Creating array with 32 possible percentage values is poor solution.
possible rssi values (0..31 is valid values):
0 -113 dBm or less
1 -111 dBm
2...30 -109... -53 dBm
31 -51 dBm or greater
99 not known or not detectable
Approximate values i want:
RSSI %
0 0
1 3
2 6
3 10
4 13
5 16
6 19
7 23
8 26
9 29
10 32
11 36
12 39
13 42
14 45
15 48
16 52
17 55
18 58
19 61
20 65
21 68
22 71
23 74
24 78
25 81
26 84
27 87
28 90
29 94
30 97
31 100
99 ?
I'm out of ideas, so please advise me! Thanks for Your time!
(rssi * 827 + 127) >> 8
Multiply by 827, add 127 to simulate rounding to nearest, then drop the 8 low order bits, all in integer arithmetic.
This is unfortunately not a simple task to do if you want it to be accurate.
This article best explains the complexities of the task:
https://www.adriangranados.com/blog/dbm-to-percent-conversion
Without involving floats: RSSI*3+3 will miss high and low values, but will be OK mid-range. If accuracy at high values is more important add more than 3 and vice versa.