Consider the following data, where the left column represents a bit (1 or 0), and the right column represents the number of microseconds that we observe the bit.
0 664
1 63
0 404
1 544
0 651
1 686
0 507
1 1155
0 664
1 271
0 456
1 2763
0 664
1 115
0 456
1 4010
0 664
1 63
0 351
1 3855
I would like to plot this data such that there is a horizontal line at 0 with a width of 664, followed by a rise to a horizontal line at 1 with a width of 63, followed by a fall to a horizontal line at 0 with a width of 404, and so on.
Is there an efficient and direct way to plot this in R that does not involve manual comparison against bounds?
Here is my current code for doing this which is extremely inefficient and naive, so I hope there is a better way.
args <- commandArgs(trailingOnly = TRUE)
data = read.table(args[1])
current = 1
sumA = 0
pf = function(x) {
if (x < sumA) {
return(data[current,1])
}
for (i in current: length(data[,1])) {
sumA <<- sumA + data[i,2]
if (x < sumA) {
current <<- i + 1
return(data[i,1])
}
}
return("OUT OF BOUNDS")
}
cumSum = colSums(data)[[2]]
print(cumSum - 1);
h = Vectorize(pf)
plot(h, 1, cumSum-1, n=cumSum-1, lwd=0.001, xlim=c(0,cumSum-1))
As mentioned in my comment, the plot command with the type argument set to "s" (a step plot) should do the trick.
E.g., for your first 10 samples:
x <- c(0,664,63,404,544,651,686,507,1155,664,271)
xC <- cumsum(x)
y <- c(0,1,0,1,0,1,0,1,0,1,0)
plot(xC,y,type='s')
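The same idea can be applied directly to the two-column data instead of typing the vectors by hand. The following is a minimal sketch, not the answerer's code: `step_coords` is a hypothetical helper name, and only the first four rows of the question's data are used to keep it short.

```r
# Hypothetical helper: turn (bit, duration) rows into step-plot coordinates.
step_coords <- function(bits, durations) {
  # x: the time at which each segment starts, plus the final end time
  x <- c(0, cumsum(durations))
  # y: the bit held during each segment; the last bit is repeated so that
  # type = "s" draws the final segment at its full width
  y <- c(bits, bits[length(bits)])
  data.frame(x = x, y = y)
}

# First four rows of the data from the question
d <- read.table(text = "0 664
1 63
0 404
1 544", col.names = c("bit", "us"))

coords <- step_coords(d$bit, d$us)
plot(coords$x, coords$y, type = "s",
     xlab = "time (microseconds)", ylab = "bit")
```

With `type = "s"` the line first runs horizontally and then jumps vertically, so each bit is held for exactly its own duration.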
I would like to produce nested tables for a multilevel factorial experiment. I have 10 paints examined for time to reach an end point under 4 levels of humidity, 3 temperatures and 2 wind speeds. I have searched online, but without success.
Some sample code can be generated using:
## Made Up Data # NB the data is continuous whereas observations were made 40/168 so data is censored.
time3 <- 4*seq(1:24) # Dependent: times in hrs, runif is not really representative but will do
wind <- c(1,2) # Independent: factor draught on or off
RH <- c(0,35,75,95) # Independent: value for RH but can be processed as a factor
temp <- c(5,11,20) # Independent: value for temperature but can be processed as a factor
paint <- c("paintA", "paintB", "paintC") # Independent: Experimental material
# Combine into dataframe
dfa <- data.frame(rep(temp,8))
dfa$RH <- rep(RH,6)
dfa$wind <- rep(wind,12)
dfa$time3 <- time3
dfa$paint <- rep(paint[1],24)
# Replicate for different paints
dfb <- dfa
dfb$paint <- paint[2]
dfc <- dfa
dfc$paint <- paint[3]
dfx <- do.call("rbind", list(dfa,dfb,dfc))
# Rename first col
colnames(dfx)[1] <- "temp"
# Prepare xtab tables
tx <- xtabs(dfx$time3 ~ dfx$wind + dfx$RH + dfx$temp + dfx$paint)
tx
And the target I hope to obtain would be like this xtab example
This
tx <- xtabs(dfx$time3 ~ dfx$wind + dfx$RH + dfx$temp)
does not work well enough. I would also like to write to C:\file.csv for printing and reporting etc. Please advise on how to achieve the desired output.
You can paste the two variables you want to nest together. Since the items will be ordered lexicographically, you will need to zero-pad the temp variable, to get numerical ordering.
xtabs(time3~wind+paste(sprintf("%02d",temp),RH,sep=":")+paint,dfx)
, , paint = paintA

    paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
   1   56     0   104     0   88     0   136     0  120     0    72     0
   2    0   128     0    80    0    64     0   112    0    96     0   144

, , paint = paintB

    paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
   1   56     0   104     0   88     0   136     0  120     0    72     0
   2    0   128     0    80    0    64     0   112    0    96     0   144

, , paint = paintC

    paste(sprintf("%02d", temp), RH, sep = ":")
wind 05:0 05:35 05:75 05:95 11:0 11:35 11:75 11:95 20:0 20:35 20:75 20:95
   1   56     0   104     0   88     0   136     0  120     0    72     0
   2    0   128     0    80    0    64     0   112    0    96     0   144
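An alternative worth trying is `ftable()`, which flattens a multi-way table into a single nested layout, and whose formatted form can be written to a CSV file. This is only a sketch under assumptions: the data below is a made-up, fully crossed design built with `expand.grid()` (not the data from the question), and a temporary file stands in for the real output path.

```r
# Assumed toy design, fully crossed; the values in time3 are placeholders
dfx <- expand.grid(wind = c(1, 2), RH = c(0, 35, 75, 95),
                   temp = c(5, 11, 20), paint = c("paintA", "paintB"))
dfx$time3 <- seq_len(nrow(dfx))

tx <- xtabs(time3 ~ wind + RH + temp + paint, data = dfx)

# Nest RH within temp across the columns and wind within paint down the rows
ft <- ftable(tx, row.vars = c("paint", "wind"), col.vars = c("temp", "RH"))
ft

# format() on an ftable returns a character matrix that includes the labels
m <- format(ft, quote = FALSE)
m[] <- trimws(m)                    # drop the padding used for screen display
out <- tempfile(fileext = ".csv")   # stand-in for the real output path
write.table(m, out, sep = ",", row.names = FALSE, col.names = FALSE)
```

Because the factors are kept as separate table dimensions, no zero-padding or pasting of labels is needed to get the ordering right.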
I have a data frame as shown below which has around 130k data values.
Eng_RPM Veh_Spd
340 56
450 65
670 0
800 0
890 0
870 0
... ..
800 0
790 0
940 0
... ...
1490 67
1540 78
1880 81
I need another variable called Idling_Count that increments whenever Eng_RPM >= 400 and Veh_Spd == 0. The counter must start only 960 data points after the data point that first satisfied the condition, and the condition should not be applied to the first 960 data points at all, as shown below.
Expected Output
Eng_RPM Veh_Spd Idling_Count
340 56 0
450 65 0
670 0 0
... ... 0 (Upto first 960 values)
600 0 0(The Idling time starts but counter should wait for another 960 values to increment the counter value)
... ... 0
800 0 1(This is the 961st Values after start of Idling time i.e Eng_RPM>400 and Veh_Spd==0)
890 0 2
870 0 3
... .. ..
800 1 0
790 2 0
940 3 0
450 0 0(Data point which satisfies the condition but counter should not increment for another 960 values)
1490 0 4(961st Value from the above data point)
1540 0 5
1880 81 0
.... ... ... (This cycle should continue for rest of the data points)
Here is how to do it with data.table, avoiding an explicit for loop (which tends to be slow in R).
library(data.table)
setDT(df)
# create a serial number for observation
df[, serial := seq_len(nrow(df))]
# find series of consecutive observations matching the condition
# then create an internal serial id within each series
df[Eng_RPM > 400 & Veh_Spd == 0, group_serial:= seq_len(.N),
by = cumsum((serial - shift(serial, type = "lag", fill = 1)) != 1) ]
df[is.na(group_serial), group_serial := 0]
# identify observations with group_serial larger than 960, add id
df[group_serial > 960, Idling_Count := seq_len(.N)]
df[is.na(Idling_Count), Idling_Count := 0]
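For comparison, the same run-based logic can be sketched in base R with `rle()`. This is an illustration under assumptions rather than a drop-in replacement: `idling_count` and its `wait` parameter are hypothetical names, the demo uses `wait = 2` instead of 960 so the result is easy to check by eye, and the extra rule about skipping the first 960 rows of the data is not handled.

```r
# Hypothetical helper; wait would be 960 for the data in the question
idling_count <- function(rpm, spd, wait = 960) {
  idle <- rpm >= 400 & spd == 0
  r <- rle(idle)
  # position of each row inside its run of identical idle / non-idle values
  pos <- sequence(r$lengths)
  # rows that are idle and lie more than `wait` rows into their run
  counted <- idle & pos > wait
  # running count on counted rows, 0 everywhere else
  ifelse(counted, cumsum(counted), 0)
}

rpm <- c(340, 450, 670, 800, 890, 870, 800, 790, 940, 950, 1490)
spd <- c( 56,  65,   0,   0,   0,   0,   1,   0,   0,   0,   81)
idling_count(rpm, spd, wait = 2)
# 0 0 0 0 1 2 0 0 0 3 0
```

As in the expected output of the question, the counter carries on across separate idling periods instead of restarting at 1.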
You can do this with a for loop like this.
Create sample data and an empty column Indling_Cnt:
End_RMP <- round(runif(1800,340,1880),0)
Veh_Spd <- round(runif(1800,0,2),0)
dta <- data.frame(End_RMP,Veh_Spd)
dta$Indling_Cnt <- rep(0,1800)
To fill Indling_Cnt you can use a for loop with a few if conditions. This is probably not the most efficient way to do it, but it should work. There are better, though more complex, solutions, for example using packages such as data.table, as mentioned in other answers.
for(i in 2:dim(dta)[1]){
n <- which(dta$End_RMP[-(1:960)]>=400&dta$Veh_Spd[-(1:960)]==0)[1]+960+960
if(i>=n){
if(dta$End_RMP[i]>=400&dta$Veh_Spd[i]==0){
dta$Indling_Cnt[i] <- dta$Indling_Cnt[i-1]+1
}else{
dta$Indling_Cnt[i] <- dta$Indling_Cnt[i-1]
}
}
}
I have a Df like this:
x y z
<dbl> <dbl> <dbl>
1 408001.9 343 0
2 407919.2 343 0
3 407839.6 343 0
4 407761.2 343 0
5 407681.7 343 0
6 407599.0 343 0
7 407511.0 343 0
8 407420.5 343 0
9 407331.0 343 0
10 407242.0 343 0
11 407152.7 343 0
12 407062.5 343 0
13 406970.7 343 0
14 406876.6 342 0
15 406777.1 342 0
16 406671.0 342 0
17 406560.9 342 0
18 406449.4 342 0
19 406339.0 342 0
20 406232.5 342 0
... ... ... ...
with x decreasing.
And a vector like
vec=(a1, a2, a3, a4, a5, a6, ...)
with a1< a2< a3< a4...
Now I want to divide df$x by vec[1], which gives the same result (rounded) as df$y.
But when the value in df$z drops by one, to 342, I want to divide df$x by vec[2] from then on to get the new df$z values.
From here on the result will differ from df$y, because for df$y the divisor is always vec[1] and never changes.
Every time the value I get for df$z drops by one, the following values of df$z shall be calculated with the corresponding vec[i], where i is the number of drops so far plus 1.
In the end I want a vector df$z whose values are df$x / vec[i], where vec[i] depends on the last value of df$z.
reproducible example:
test <- data.frame(x = sort((seq(500, 600, 2)), decreasing = T)
)
vec <- seq(10, 10.9, 0.03)
for(i in 1:31){
test[i+1] <- round(test$x/vec[i])
}
This will give you a data frame with one column for every value of vec that test$x was divided by.
In the end, my vector shall contain the values of col2 until the value in col2 drops from 60 to 59. Afterwards I want the values from col3 until the value in col3 drops from 59 to 58. Then I want the values from col4, and so on.
How can I achieve this with any data (like mine above, which is not linearly distributed like this example)?
I tried some for and while loops, but none worked. I didn't even get close to what I want.
I think my problem is that I don't know how to make the condition dependent on a value (the value of df$z at point i) that I want to calculate in the same operation. I want to calculate the value of df$z[i] with the value of vec[t] that has been used so far. But if the value of df$z drops by one at a certain observation i, the value of vec[t+1] shall be used for the division from then on.
Thanks for your help.
I hope I've understood what you are asking. This might be it...
test <- data.frame(x = sort((seq(500, 600, 2)), decreasing = T))
vec <- seq(10, 10.9, 0.03)
#this function determines the index of `vec` to use
xcol<-function(v){
x<-rep(NA,length(v))
x[1] <- 1
for(i in 2:length(v)){
x[i] <- x[i-1]
if(round(v[i]/vec[x[i]])<round(v[i-1]/vec[x[i]])){
x[i] <- x[i]+1
}
}
return(x)
}
test$xcol <- xcol(test$x)
test$z <- round(test$x/vec[test$xcol])
test
x xcol z
1 600 1 60
2 598 1 60
3 596 1 60
4 594 2 59
5 592 2 59
6 590 2 59
7 588 2 59
8 586 3 58
9 584 3 58
10 582 3 58
11 580 3 58
12 578 4 57
...
I'm trying to use an if else statement to create a new column of binary data in my data frame, but what I get is all zeros...
command:
for(i in 1:nrow(asort)){
if(asort$recip==0 && asort$dist<.74){
asort$temp[i]<-0
} else{
asort$temp[i]<-1
}
}
#temp ends up being all 0's
In addition, I would actually like to ask something along the lines of this:
# if the data in the recip column = 0 and the distance is < 0.74, OR if the distance is greater than 1.85, give me a zero, else 1
for(i in 1:nrow(asort)){
if(asort$recip==0 && asort$dist<.74 || asort$dist>1.85){
asort$temp[i]<-0
} else{
asort$temp[i]<-1
}
}
> head(asort)
coordinates CLASS_ID Flight UFID dist nnid nnid2 observed recip temp
157 (285293.3, 4426017) 0 F4_ F4_156 0.3857936 158 F4_157 0 0 0
158 (285293.2, 4426017) 0 F4_ F4_157 0.3857936 157 F4_156 0 0 0
259 (285255, 4426014) 0 F4_ F4_258 0.5226039 261 F4_260 1 0 0
261 (285255, 4426014) 0 F4_ F4_260 0.5226039 259 F4_258 1 0 0
402 (285275.3, 4426004) 0 F4_ F4_401 0.5598427 403 F4_402 1 0 0
403 (285275.6, 4426004) 0 F4_ F4_402 0.5598427 402 F4_401 1 0 0
Using df data.frame
dist <- runif(10, 0.3, 2)
recip<- c(0,1,1,0,1,0,1,0,0,1)
df <- data.frame(dist, recip)
and ifelse
df$temp<-ifelse(df$dist < 0.74 & df$recip == 0 , 0,
ifelse(df$dist > 1.85 & df$recip == 0, 0, 1))
> head(df)
# dist recip temp
#1 1.1878002 0 1
#2 0.4147835 1 1
#3 1.3707311 1 1
#4 0.9008034 0 1
#5 1.0220149 1 1
#6 1.9384069 0 0
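The loops in the question fail because the `if` condition tests the whole columns rather than row `i`, and `&&` / `||` are meant for single values; older R versions silently looked only at the first element, while recent versions raise an error for longer vectors. The rule as worded in the question (zero when recip is 0 and dist < 0.74, or when dist > 1.85 regardless of recip) can be sketched as one vectorised expression. The data frame below is made up for illustration.

```r
# Made-up example data
df <- data.frame(dist  = c(0.39, 0.52, 1.20, 1.90, 0.60, 2.00),
                 recip = c(0,    0,    0,    1,    1,    0))

# & and | work element-wise on whole columns; && and || inspect a single value
zero <- (df$recip == 0 & df$dist < 0.74) | df$dist > 1.85
df$temp <- as.integer(!zero)
df$temp
# 0 0 1 0 1 0
```

Note that this differs from the nested `ifelse` above in one case: a row with dist > 1.85 gets 0 here even when recip is 1.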
I am trying to check whether a price, price(k), in a given row k is equal to the one above it, price(k-1). If it is, I want to sum the volumes of the prior row and the row in question, volume(k-1)+volume(k), and then remove the row with the duplicate price, row k.
I have the following repeat loop which I am applying to a large dataset looking to delete repeated values.
k <- 1
repeat{
if( Prices$Price[ k + 1 ] == Prices$Price[ k ] ){
Prices$CumVolume[ k + 1 ] <- Prices$CumVolume[ k + 1 ] + Prices$CumVolume[ k ]
Prices <- Prices[ -k , ]
k <- k + 1
if( k > nrow( Prices ) ) break
}
}
The loop is very slow and I was wondering if there are ways to speed it up. Unfortunately I am relatively new to R and am having difficulty working out the best way to go about this.
Also, is there a way in R to see which iteration the loop is currently up to, i.e. have it displayed in the workspace on each iteration?
Example data:
Date Time Price CumVolume Ret MeanRet VolRet
26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0
31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0
33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0
35 01-JAN-2009 21:34:10.879 96.640 40 0 0 0
37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0
It appears you want something like this:
DF <- read.table(text=" Date Time Price CumVolume Ret MeanRet VolRet
26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0
31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0
33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0
35 01-JAN-2009 21:34:10.879 96.640 40 0 0 0
37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0", header=TRUE)
#create a run id
DF$runs <- cumsum(c(TRUE, diff(DF$Price) != 0))
#sum per each price run
DF$CCVolume <- with(DF, ave(CumVolume, runs, FUN=sum))
#keep only the first row of each price run (a price that recurs later starts a new run)
DF[!duplicated(DF$runs), ]
# Date Time Price CumVolume Ret MeanRet VolRet runs CCVolume
#26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0 1 537
#31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0 2 78
#33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0 3 100
#37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0 4 50
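To see why the run id matters, here is a small sketch on made-up vectors in which a price comes back after an intervening change. Only adjacent equal prices are merged; the later recurrence stays on its own row. The values are invented for illustration.

```r
# Made-up data: 96.65 occurs twice, but not in adjacent rows
price  <- c(96.66, 96.65, 96.64, 96.64, 96.65)
volume <- c(537, 78, 60, 40, 50)

# A new run starts wherever the price differs from the previous row
runs <- cumsum(c(TRUE, diff(price) != 0))

out <- data.frame(Price     = price[!duplicated(runs)],
                  CumVolume = as.vector(tapply(volume, runs, sum)))
out
```

Here the two adjacent 96.64 rows collapse into one row with volume 100, while both 96.65 rows survive because they belong to different runs.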
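I think your code goes into an infinite loop because the index increment k <- k + 1 and the break both sit inside the if condition, so neither runs when two neighbouring prices differ. I hope this is what you want:
z=unique(Prices$Price)
for(i in seq_along(z))
{
dupindex=which(z[i]==Prices$Price)
# note: this merges every row that shares a price, not only adjacent rows
if(length(dupindex)>1){
# put the total volume on the last row for this price, then drop the earlier rows
Prices$CumVolume[tail(dupindex,n=1)]=sum(Prices$CumVolume[dupindex])
Prices=Prices[-head(dupindex,-1),]
}
}
I hope it helps, thanks.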