How to convert overlapping ranges to contiguous ranges? - r

I have overlapping ranges like this
data
G S E
o 10 15
o 13 20
r 20 28
r 25 33
I am trying to convert this into table shown below.
G S E
o 10 12
o 13 19
r 20 24
r 25 33
I have been trying to use an ifelse condition and data.table's shift to access values from the next row for comparison, but have not succeeded yet. Any suggestions would be greatly appreciated.
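One possible approach, sketched in base R rather than data.table. It assumes the rows are already sorted by S and that each range should end just before the next row's start regardless of G, as the desired output suggests. The name next_start is only illustrative.

```r
# Input ranges from the question, assumed sorted by start (S)
d <- data.frame(G = c("o", "o", "r", "r"),
                S = c(10, 13, 20, 25),
                E = c(15, 20, 28, 33))

# Start of the following row; NA for the last row, which has no successor
next_start <- c(d$S[-1], NA)

# Cap each end at (next start - 1); na.rm = TRUE leaves the last end unchanged
d$E <- pmin(d$E, next_start - 1, na.rm = TRUE)
d
```

With data.table, shift(S, type = "lead") should be able to play the same role as next_start.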

Related

Show only even numbers from a data set

I am trying to extract only the even numbers from the "cars" data set.
I know I need to create a new function.
I have come this far:
Is.even = function(x) x %% 2 == 0
When I enter in:
Is.even(cars[1])
It gives me back a logical response. I want to only display the actual even numbers in integer form and hide the odd numbers.
What am I doing wrong?
Apart from @neilfws's suggestion, if you pass your values as a vector you can also use Filter:
Filter(Is.even, cars[, 1])
#[1] 4 4 8 10 10 10 12 12 12 12 14 14 14 14 16 16 18 18 18 18 20 20 20 20 20 22 24 24 24 24
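A logical vector can also be used directly as an index, which keeps only the elements where it is TRUE. A minimal sketch, assuming the values of interest are in the first column of cars (speed); the names speed and even_speed are illustrative:

```r
Is.even <- function(x) x %% 2 == 0

# Extract a plain numeric vector, not a one-column data frame
speed <- cars[, 1]

# Keep only the elements for which the test is TRUE
even_speed <- speed[Is.even(speed)]
even_speed
```

The original call Is.even(cars[1]) returns logicals because it only runs the test; the subsetting step is what turns the test into the values themselves.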

Multiple boxplots in R while grouping a matrix both by columns and rows

I am having trouble figuring out how to make a single figure containing multiple boxplots in R, while grouping my data frame/matrix both by columns and by rows.
I have a data frame in R with 10 rows and 500 columns. The columns are separated into 2 groups (factors: 1's and 2's), and I want a single figure containing two boxplots for each row of my data frame, split by the column groups.
Ex.
M1 N2 O1 P2 Q1 R2 # [The 1's and 2's refer to my two column groups]
A 10 11 12 13 14 15
B 15 14 13 12 11 10
C 20 21 22 23 24 25
D 25 24 23 22 21 20
So for the above example I would like a single figure with 4 boxplot pairs, one pair per row, such that each pair represents the values corresponding to the 1's and 2's factors of my columns.
Thanks in Advance !!!
Here is one idea using reshape2. Since you have more columns than rows, it is natural to work on the transpose.
library(ggplot2)
library(reshape2)
dt <- read.table(text='
M1 N2 O1 P2 Q1 R2
A 10 11 12 13 14 15
B 15 14 13 12 11 10
C 20 21 22 23 24 25
D 25 24 23 22 21 20',header=TRUE)
dt.m <- melt(t(dt))
dt.m$Var1 <- gsub('[A-Z]','',dt.m$Var1)
Here are 2 options for plotting:
library(ggplot2)
library(gridExtra)
p1 <- ggplot(dt.m) +
  geom_boxplot(aes(x=Var2,y=value,fill=Var1))
p2 <- ggplot(dt.m) +
  geom_boxplot(aes(x=Var2,y=value,fill=Var2))+
  facet_grid(~Var1)
grid.arrange(p1,p2)
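If extra packages are not an option, roughly the same grouping can be sketched in base R. This is an alternative under assumptions, not the method above: the data frame long and its column names are made up for illustration, and the group is taken from the digit in each column name.

```r
dt <- read.table(text = "
M1 N2 O1 P2 Q1 R2
A 10 11 12 13 14 15
B 15 14 13 12 11 10
C 20 21 22 23 24 25
D 25 24 23 22 21 20", header = TRUE)

# Long format: one line per cell, with its row label and column group.
# unlist() walks the data frame column by column, so the row names repeat
# once per column and each column's group repeats once per row.
long <- data.frame(
  row   = rep(rownames(dt), times = ncol(dt)),
  group = rep(sub("[A-Z]", "", colnames(dt)), each = nrow(dt)),
  value = unlist(dt, use.names = FALSE)
)

# One box per group/row combination; boxes for the same row sit side by side
boxplot(value ~ group + row, data = long, col = c("grey80", "grey40"))
```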

R: Doing calculations on multiple factors/levels (Dummy variables)

I have two matching vectors of time series data of equal length: price (x) and hour (h). Hour runs from 0 to 23. Hour is my dummy variable (or factor/level variable, as I think it is called in R).
Right now I have defined 24 separate dummy variables, one per hour, and I type out each one by hand. So, for example, to generate 24 plots or calculate 24 means, I would type:
plot.ts(hour1) # and so on for all 24.
I would like to do this for all 24 variables as easily as possible, so that I can run many different calculations. For example, how could I compute the mean for all 24 dummy variables without writing 24 lines of code and changing the variable name in each?
EDIT: Sorry, thought it was clear with the two vectors. Example:
Price Hour
8 0
12 1
14 2
16 3
18 4
20 5
22 6
24 7
26 8
28 9
24 10
26 11
23 12
23 13
23 14
14 15
19 16
25 17
26 18
28 19
30 20
33 21
24 22
10 23
14 0
12 1
13 2
x etc.
It is not clear how your data are stored, since you don't give a reproducible example. I assume you have a separate variable for each hour (hour1 and so on).
Generally, it is better to put your hourxx variables in a list to perform calculations.
For example, this will compute mean for all hours:
lapply(lapply(ls(pattern='hour.*'),get),mean)
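Keeping the vectors in a named list from the start avoids looking them up by name afterwards. A minimal sketch with hypothetical values; hours and hour_means are illustrative names:

```r
# Hypothetical per-hour price vectors standing in for hour1, hour2, ...
hours <- list(hour1 = c(8, 14),
              hour2 = c(12, 12),
              hour3 = c(14, 13))

# Apply the same function to every element; sapply returns a named vector
hour_means <- sapply(hours, mean)
hour_means
```

Any other summary (median, sd, a plotting function) can be swapped in for mean in the same call.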
EDIT after OP clarification:
You should create a new variable to distinguish between successive runs of hours (each 0 to 23 cycle). Something like:
dat <- data.frame(Price=rnorm(24*5),Hour=rep(0:23,5))
dat$id <- cumsum(c(0,diff(dat$Hour)==-23))
Then, using the plyr package for example, you can compute the mean by id:
library(plyr)
ddply(dat,.(id),summarise,mPrice=mean(Price))
id mPrice
1 0 0.2999602
2 1 -0.2201148
3 2 0.2400192
4 3 -0.2087594
5 4 0.1666915
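The same grouping can be sketched in base R without plyr. Deterministic prices are used here instead of rnorm so the results are easy to check by hand; by_id and by_hour are illustrative names. Grouping by Hour instead of id gives one mean per hour of the day, which may be closer to what the question asks for.

```r
# Two full 0-23 cycles with made-up, easy-to-verify prices
dat <- data.frame(Price = 1:48, Hour = rep(0:23, 2))

# id increases by one each time Hour wraps from 23 back to 0
dat$id <- cumsum(c(0, diff(dat$Hour) == -23))

# Mean price per cycle (same grouping as the ddply call)
by_id <- aggregate(Price ~ id, data = dat, FUN = mean)

# Mean price per hour of the day, across cycles
by_hour <- tapply(dat$Price, dat$Hour, mean)
```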

Is there a way to "auto-name" expression in J

I have a few questions/suggestions concerning data.table.
R) X = data.table(x=c("q","q","q","w","w","e"),y=1:6,z=10:15)
R) X[,list(sum(y)),by=list(x)]
x V1
1: q 6
2: w 9
3: e 6
I think it is too bad that one has to write
R) X[,list(y=sum(y)),by=list(x)]
x y
1: q 6
2: w 9
3: e 6
It should default to keeping the same column name (i.e. y) when the expression uses only one column. This would be a massive gain in most cases, typically in finance, where we usually look at weighted sums, the last time, and so on.
=> Is there any variable I can set to default to this behaviour ?
When doing a select, I might want to do a calculation on a few columns and apply another operation to all the other columns.
I mean, it is too bad that when I want this:
R) X = data.table(x=c("q","q","q","w","w","e"),y=1:6,z=10:15,t=20:25,u=30:35)
R) X
x y z t u
1: q 1 10 20 30
2: q 2 11 21 31
3: q 3 12 22 32
4: w 4 13 23 33
5: w 5 14 24 34
6: e 6 15 25 35
R) X[,list(y=sum(y),z=last(z),t=last(t),u=last(u)),by=list(x)] #LOOOOOOOOOOONGGGG EXPR
x y z t u
1: q 6 12 22 32
2: w 9 14 24 34
3: e 6 15 25 35
I cannot write it like...
R) X[,list(sum(y)),by=list(x),defaultFn=last] #defaultFn would be applied to all remaining columns
=> Can I do this somehow (maybe by setting an option)?
Thanks
On part 1, that's not a bad idea. We already do that for expressions in by, and something close is already on the list for j :
FR#2286 Inferred naming could apply to j=colname[...]
Find max per group and return another column
But if we did do that it would probably need to be turned on via an option, to maintain backwards compatibility. I've added a link in that FR back to this question.
On the second part, how about:
X[,c(y=sum(y),lapply(.SD,last)[-1]),by=x]
x y z t u
1: q 6 12 22 32
2: w 9 14 24 34
3: e 6 15 25 35
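The [-1] works because .SD holds the non-grouping columns in order (y, z, t, u), so dropping the first element removes y. A variant that names the "default" columns explicitly through .SDcols, as a sketch; the variable rest is illustrative and this assumes the data.table package is installed:

```r
library(data.table)

X <- data.table(x = c("q", "q", "q", "w", "w", "e"),
                y = 1:6, z = 10:15, t = 20:25, u = 30:35)

# Columns that should get the "default" function: everything except x and y
rest <- setdiff(names(X), c("x", "y"))

# y is summed explicitly; last() is applied to every column listed in .SDcols
res <- X[, c(list(y = sum(y)), lapply(.SD, last)), by = x, .SDcols = rest]
res
```

This does not depend on y being the first column, at the cost of one extra line.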
Please ask multiple questions separately, though. Each question on S.O. is supposed to be a single question.

How to reorder a column in a data frame to be the last column

I have a data frame where columns are constantly being added to it. I also have a total column that I would like to stay at the end. I think I must have skipped over some really basic command somewhere but cannot seem to find the answer anywhere. Anyway, here is some sample data:
x=1:10
y=21:30
z=data.frame(x,y)
z$total=z$x+z$y
z$w=11:20
z$total=z$x+z$y+z$w
When I type z I get this:
x y total w
1 1 21 33 11
2 2 22 36 12
3 3 23 39 13
4 4 24 42 14
5 5 25 45 15
6 6 26 48 16
7 7 27 51 17
8 8 28 54 18
9 9 29 57 19
10 10 30 60 20
Note how the total column comes before the w, and obviously any subsequent columns. Is there a way I can force it to be the last column? I am guessing that I would have to use ncol(z) somehow. Or maybe not.
You can reorder your columns as follows:
z <- z[,c('x','y','w','total')]
To do this programmatically, after you're done adding your columns, you can retrieve their names like so:
nms <- colnames(z)
Then you can grab the ones that aren't 'total' like so:
nms[nms!='total']
Combined with the above:
z <- z[, c(nms[nms!='total'],'total')]
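The same reordering can be written with setdiff, which returns the names that are not 'total' in their existing order. A short sketch using the sample data from the question:

```r
z <- data.frame(x = 1:10, y = 21:30)
z$total <- z$x + z$y
z$w <- 11:20
z$total <- z$x + z$y + z$w

# All columns except 'total', in their current order, followed by 'total'
z <- z[, c(setdiff(names(z), "total"), "total")]
names(z)
```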
You have a logic issue here. Whenever you add to a data.frame, it grows to the right.
Easiest fix: keep total a vector until you are done, and only then append it. It will then be the rightmost column.
(For critical applications, you would of course determine your width k beforehand, allocate k+1 columns and just index the last one for totals.)
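Appending total only once, after all other columns are in place, might look like this. rowSums is used here as one way to build the total, assuming every other column is numeric and should be included:

```r
z <- data.frame(x = 1:10, y = 21:30)
z$w <- 11:20          # add any further columns first

# Compute and append the total last, so it ends up as the rightmost column
z$total <- rowSums(z)
names(z)
```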
