Calculating group median from frequency table in R

I am trying to implement the data.table method described here: Calculating grouped variance from a frequency table in R.
I can successfully replicate their example, but when I apply it to my own data, nothing seems to happen. In particular, the output is this:
table <- data.frame(districts,proportions,populations)
districts proportions populations
1: 24 0.8270270 1269
2: 26 0.8867925 1679
3: 12 0.9136691 510
4: 27 0.4220532 3274
5: 20 0.5457650 3644
8937: 1 0.7798072 3444
8938: 1 0.6080247 6128
8939: 1 0.4655172 4335
8940: 1 0.4813200 4297
8941: 1 0.7690167 3906
setDT(table)[, list(GroupMedian=as.double(median(rep(proportions, populations))),
TotalCount=sum(populations)) , by = districts]
##Same output as above###
I have no idea what's going on, even after spending a lot of time on it.
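For comparison, here is a minimal sketch of what the grouped call is expected to return on a small made-up table. The table name toy and its values are hypothetical and are not taken from the data above; the point is only that the result should have one row per district.

```r
library(data.table)

# Hypothetical toy frequency table: each row is a proportion observed
# `populations` times within a district.
toy <- data.table(
  districts   = c(1, 1, 2, 2, 2),
  proportions = c(0.2, 0.8, 0.1, 0.5, 0.9),
  populations = c(1L, 3L, 2L, 2L, 1L)
)

# Expand each proportion by its count, then take the median per district.
res <- toy[, list(GroupMedian = as.double(median(rep(proportions, populations))),
                  TotalCount  = sum(populations)),
           by = districts]
print(res)
```

If the real data returns as many rows as it started with, one thing worth checking is how many distinct values the districts column actually holds.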


Grouping and building intervals of data in R and useful visualization

I have some data extracted via Hive. In the end we are talking about a CSV file with around 500,000 rows. I want to plot the data after grouping it into intervals.
Besides the grouping, it's not clear to me how to visualize the data. Since the spends are low and the frequencies are sometimes high, I'm not sure how to handle this.
Here is an overview via head(data):
userid64 spend freq
575033023245123 0.00924205 489
12588968125440467 0.00037 2
13830962861053825 0.00168 1
18983461971805285 0.001500366 333
25159368164208149 0.00215 1
32284253673482883 0.001721303 222
33221593608613197 0.00298 709
39590145306822865 0.001785281 11
45831636009567401 0.00397 654
71526649454205197 0.000949978 1
78782620614743930 0.00552 5
I want to group the data into intervals, so I want an extra column indicating the group. The first group should contain all rows with a frequency (called freq) between 1 and 100. The second group should contain all rows with a frequency between 101 and 200, and so on.
The result should look like this:
userid64 spend freq group
575033023245123 0.00924205 489 5
12588968125440467 0.00037 2 1
13830962861053825 0.00168 1 1
18983461971805285 0.001500366 333 3
25159368164208149 0.00215 1 1
32284253673482883 0.001721303 222 2
33221593608613197 0.00298 709 8
39590145306822865 0.001785281 11 1
45831636009567401 0.00397 654 7
71526649454205197 0.000949978 1 1
78782620614743930 0.00552 5 1
Is there a neat way to get this? I need this grouping for upcoming plots. I want to visualize every interval to get an overview of the spend. If you have any ideas for the visualization, please let me know. I thought I should work with boxplots.
If you want to group freq into bins of 100 units, you can use the ceiling function in base R:
ceiling(df$freq / 100)
#[1] 5 1 1 4 1 3 8 1 7 1 1
where df is your dataframe.
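As a minimal sketch, the grouping can be stored as a new column and then used for the boxplots mentioned in the question. The data frame below simply retypes the sample rows shown above; the log scale on the y axis is an assumption, chosen because the spend values are very small.

```r
# Sample rows from the question; userid64 is kept as character because
# some ids are too large to be represented exactly as doubles.
df <- data.frame(
  userid64 = c("575033023245123", "12588968125440467", "13830962861053825",
               "18983461971805285", "25159368164208149", "32284253673482883",
               "33221593608613197", "39590145306822865", "45831636009567401",
               "71526649454205197", "78782620614743930"),
  spend = c(0.00924205, 0.00037, 0.00168, 0.001500366, 0.00215, 0.001721303,
            0.00298, 0.001785281, 0.00397, 0.000949978, 0.00552),
  freq = c(489, 2, 1, 333, 1, 222, 709, 11, 654, 1, 5),
  stringsAsFactors = FALSE
)

# Group 1 covers freq 1-100, group 2 covers 101-200, and so on.
df$group <- ceiling(df$freq / 100)

# One box per frequency group.
boxplot(spend ~ group, data = df, log = "y",
        xlab = "freq group (bins of 100)", ylab = "spend (log scale)")
```

With the full 500,000 rows the same two statements apply unchanged; only the construction of df would be replaced by reading the CSV file.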

Why can't I see my full rbindlist result?

I used the rbindlist() function to try to merge two melted data frames (means_melt and means_melt_50). Why is there a break in the printed data, and can I still use the whole list? I ultimately intend to create two graphs, each with 5 sets of data (grouped by variable), using facet_grid(). I want the two graphs separated based on "Accuracy".
> compiled_means <- list(means_melt, means_melt_50)
> rbindlist(compiled_means, use.names = TRUE, fill=FALSE, idcol = NULL)
Divisions Accuracy variable value
1: 1 0 mean20 16
2: 2 0 mean20 20
3: 3 0 mean20 21
4: 4 0 mean20 17
5: 5 0 mean20 20
196: 16 50 mean_2 2
197: 17 50 mean_2 2
198: 18 50 mean_2 2
199: 19 50 mean_2 4
200: 20 50 mean_2 3
If anyone has a more efficient way for me to format the data so that it can be put in the graphs I want, I'm happy to hear suggestions. I'm not sure if the route I'm taking is effective or long-winded...
This is simply a matter of print options: by default, a data.table with more than 100 rows is printed as a summary showing only the first and last five rows. All of the rows are still there. Printing directly with nrows = Inf shows the full data.table (x below stands for your data.table):
print(x, nrows = Inf)
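A minimal sketch of this print behaviour, using a hypothetical 200-row table called dt (not the data from the question). The abbreviated print and the full print are captured so their lengths can be compared.

```r
library(data.table)

# Hypothetical table with more than 100 rows.
dt <- data.table(id = 1:200, value = rep(c(0, 50), each = 100))

print(dt)               # abbreviated: first and last rows only

# Capture both forms to compare how many lines each one produces.
short <- capture.output(print(dt))
full  <- capture.output(print(dt, nrows = Inf))
length(short)
length(full)
```

The object itself is the same in both cases; only the printed representation differs, so the combined table can be passed to plotting functions as it is.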

common dispersion in R

I am trying to use the edgeR package to analyse my data, as follows.
This is my data frame, called subdata:
1 13707 13866 12193 12671 10178
2 0 0 0 0 1
3 7165 5002 1256 1341 2087
6 8537 16679 9042 9620 19168
10 19438 25234 15563 16419 16582
16 3 3 11 3 5
y=estimateGLMCommonDisp(y,design, verbose=TRUE)
I am trying to calculate the common dispersion in order to estimate up- and down-regulated genes, but I get the following warning message:
Warning message:
In estimateGLMCommonDisp.default(y = y$counts, design = design, :
No residual df: setting dispersion to NA
Could anyone please help me solve this problem? I would really appreciate it.

Replace values in one data frame from values in another data frame

I need to change individual identifiers that are currently alphabetical to numerical. I have created a data frame where each alphabetical identifier is associated with a number:
individuals num.individuals (g4)
1 ZYO 64
2 KAO 24
3 MKU 32
4 SAG 42
What I need is to replace ZYO with the number 64 in my main data frame (g3), and likewise for all the other codes.
My main data frame (g3) looks like this:
1 2
2 2 EVA
4 2
5 SAG 2
6 2
On a small scale, I can write code to change it, as I did with ATR:
g3$ATR <- as.character(g3$ATR)
g3[g3$target == "ATR" | g3$ATR == "ATR","ATR"] <- 2
But this is time-consuming and increases the chance of human error.
I know there are ways to do this on a broad scale with NAs.
I think maybe a for loop could do this, but I am not good enough to write one myself.
I have also been trying to use the function below, which I feel may work, but I am not sure how to build the argument logically. It was posted on the questions board here:
Fast replacing values in dataframe in R
df <-, function(x){replace(x, x <0,0)})
I have tried to work my data into this by
df <-, function(g3){replace(x, x <0,0)})
Here is one approach using the data.table package:
First, create a reproducible example similar to your data:
library(data.table)
ref <- data.table(individuals=1:4,num.individuals=c("ZYO","KAO","MKU","SAG"),g4=c(64,24,32,42))
g3 <- data.table(SAG=c("","SAG","","SAG"),KAO=c("KAO","KAO","",""))
Here is the ref table:
individuals num.individuals g4
1: 1 ZYO 64
2: 2 KAO 24
3: 3 MKU 32
4: 4 SAG 42
And here is your g3 table:
   SAG KAO
1:     KAO
2: SAG KAO
3:
4: SAG
And now we do our find and replacing:
g3[ , lapply(.SD,function(x) ref$g4[chmatch(x,ref$num.individuals)])]
And the final result:
   SAG KAO
1:  NA  24
2:  42  24
3:  NA  NA
4:  42  NA
And if you need more speed, the fmatch function from the fastmatch package might help:
library(fastmatch)
g3[ , lapply(.SD,function(x) ref$g4[fmatch(x,ref$num.individuals)])]
   SAG KAO
1:  NA  24
2:  42  24
3:  NA  NA
4:  42  NA
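The same lookup can also be sketched in base R with match(), without any extra packages. The column names code and num in the lookup table, and the result name g3_num, are hypothetical names chosen for this illustration; the values mirror the example above.

```r
# Lookup table: alphabetical code -> numeric identifier.
ref <- data.frame(code = c("ZYO", "KAO", "MKU", "SAG"),
                  num  = c(64, 24, 32, 42),
                  stringsAsFactors = FALSE)

# Main data frame with codes scattered across columns ("" means empty).
g3 <- data.frame(SAG = c("", "SAG", "", "SAG"),
                 KAO = c("KAO", "KAO", "", ""),
                 stringsAsFactors = FALSE)

# For every column, find each value's position in ref$code and pull the
# corresponding number; values with no match (including "") become NA.
g3_num <- g3
g3_num[] <- lapply(g3, function(x) ref$num[match(x, ref$code)])
print(g3_num)
```

Assigning into g3_num[] keeps the data frame structure and column names while replacing the contents column by column.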

R script using consecutive rows with allowance frame

I am trying to create an R script that says, "make a new variable and, based on a previous variable 'scores,' put a 1 for ten consecutive 'scores' in which at least 8 of those 10 'scores' are at or above 1952"
How about this with zoo::rollapply()
#make dataframe with scores
require(zoo) # for rollapply() function
score newvar
25 2695 1
26 2750 1
30 2468 1
140 2525 1
141 2515 1
275 1989 1
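A minimal sketch of how such a flag could be computed with zoo::rollapply(). The scores below are made up, and the sketch assumes that every row lying inside at least one window of ten consecutive scores with eight or more values at or above 1952 should receive a 1; if only the first or last row of such a window should be flagged, the final loop would need to change.

```r
library(zoo)  # for rollapply()

# Hypothetical scores: eight high values followed by twelve low ones.
df <- data.frame(score = c(rep(2000, 8), rep(1000, 12)))

# 1 where the score is at or above the threshold, 0 otherwise.
hit <- as.integer(df$score >= 1952)

# Number of hits in the window of 10 rows starting at each row;
# NA where fewer than 10 rows remain.
window_hits <- rollapply(hit, width = 10, FUN = sum,
                         align = "left", fill = NA)

# Window starts that satisfy the "at least 8 of 10" rule.
starts <- which(!is.na(window_hits) & window_hits >= 8)

# Flag every row that falls inside a qualifying window.
df$newvar <- 0L
for (s in starts) {
  df$newvar[s:(s + 9)] <- 1L
}
print(df)
```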
