R Searching for elements and their index in an array - r

I have a matrix with 2 columns as described below:
TIME PRICE
10 45
11 89
13 89
15 12
16 09
17 34
19 89
20 90
23 21
26 09
In the above matrix, I need to iterate through the TIME column, adding 5 seconds at each step and accessing the PRICE in the matching row.
For example, I start with 10 and need to access 15 (10+5). I would have been able to get to 15 easily if the numbers in the column were continuous, but they are not. So at 15 seconds I need to get hold of the corresponding price, and this goes on until the end of the data set. The next element that needs to be accessed is 20, with its corresponding price; then I add 5 seconds again, and so on. In case the element is not present, the one immediately greater than it must be accessed to obtain the corresponding price.

If the rows you want to extract are m[1,1]+5, m[1,1]+10, m[1,1]+15 etc then:
m <- cbind(TIME=c(10,11,13,15,16,17,19,20,23,26),
PRICE=c(45,89,89,12,9,34,89,90,21,9))
r <- range(m[,1]) # 10,26
r <- seq(r[1]+5, r[2], 5) # 15,20,25
r <- findInterval(r-1, m[,1])+1 # 4,8,10 (values 15,20,26)
m[r,2] # 12,90,9
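findInterval returns the index of the last element that is less than or equal to the given value, so I give it a slightly smaller value (r-1) and then add 1 to the index. Note that subtracting 1 only works because the times here are whole numbers.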
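If the times are not whole numbers, subtracting 1 is no longer safe. A minimal sketch of a variant, assuming R 3.3.0 or later (where findInterval has a left.open argument); the helper name first_at_or_after is made up for illustration and is not part of the answer above:

```r
# Hypothetical helper: for each target, return the index of the first
# element of `times` (sorted ascending) that is >= target, or NA if none is.
first_at_or_after <- function(times, targets) {
  # With left.open = TRUE, findInterval counts the elements strictly below
  # each target, so adding 1 gives the first element at or above it.
  idx <- findInterval(targets, times, left.open = TRUE) + 1L
  idx[idx > length(times)] <- NA_integer_
  idx
}

m <- cbind(TIME  = c(10, 11, 13, 15, 16, 17, 19, 20, 23, 26),
           PRICE = c(45, 89, 89, 12, 9, 34, 89, 90, 21, 9))

targets <- seq(m[1, "TIME"] + 5, max(m[, "TIME"]), by = 5)  # 15, 20, 25
idx     <- first_at_or_after(m[, "TIME"], targets)          # 4, 8, 10
prices  <- m[idx, "PRICE"]                                  # 12, 90, 9
```

As in the answer, this assumes a fixed grid of targets (start + 5, start + 10, ...) rather than adding 5 to the last time actually found.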

Breaking the question apart into sub-pieces...
Getting the row with value 15:
Call your Matrix, say, DATA, and
[1] extract the row of interest:
DATA[DATA[,1] == 15, ]
Then snag the second column.
[2] Add 5 to the first column (I'm pretty sure you can just do this):
DATA[,1] = DATA[,1] + 5
This should get you started. The rest is just iteration: increment by 5, use [1] to get the price you want each time, and swap 15 for a variable.
I leave the rest of the solution as an exercise for the reader. For tips on looping in R, and more, see the tutorial below (I don't expect it to be taken down any time soon, but you may want to keep a local copy. Good luck :) )
http://www.stat.berkeley.edu/users/vigre/undergrad/reports/VIGRERintro.pdf

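As @Tommy commented above, it is not clear exactly which TIME values you want. To me, it seems you want the PRICE for the sequence 10, 15, 20, 25, ... If so, you could do that easily using the modulo operator (%%). Note that this only picks times that are exact multiples of 5; it does not fall back to the next larger time when one is missing (25 here):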
TIME <- c(10,11,13,15,16,17,19,20,23,26) # Your times
PRICE <- c(45,89,89,12,9,34,89,90,21,9) # your prices
PRICE[TIME %% 5 == 0] # Get prices from times in sequence 10, 15, 20, ...

Related

R: locate element previous in vector within for loop and report in new column

I've looked through many older posts but nothing really answers what I need. In short: I have a data frame that contains observation data and the time of observation in days.
My goal is to add a column for weeks. I have already subsetted the data so that I only have the time vector at intervals of 7 (t == 7, 14, 21, etc.). I just need to make a for loop that creates a new vector of "weeks" that I can then cbind to my data. I'd prefer it to be a character string so I can use it more easily in ggplot geom_histogram, but that isn't as necessary as just creating the new vector successfully.
The tricky part of the data is that there is not an equal number of observations per time: t = 28 has maybe 5x as many observations as t = 7, etc.
I want to create code that evaluates what t is, then checks whether it is greater than the previous element in the t vector. If it isn't, populate the week vector with the same value as before; if it is, increase it by 1.
I know this is bad from a computer science/R perspective in a lot of ways, but any help would be useful:
#fake data (in reality this is a huge data set with many observations at intervals of 1 for t
L = rnorm(50, mean=10, sd=2)
t = c((rep.int(7,3)), (rep.int(14,6)), rep.int(21,8), rep.int(28,12), (rep.int(31, 5)), (rep.int(36,16)))
fake = cbind(L,t)
#create df that has only the observations that are at weekly time points
dayofweek = seq(7,120,7)
df = subset(fake, t %in% dayofweek)
#create empty week vector
week = c()
#for loop with if-else statement nested to populate the week vector
for (i in 1:length(dayofweek)){
if (t = t[t-1]){
week = i
} else if (t > t[t-1]{
week = i+1
}
}
Thanks!!
I'm not sure I can follow what you want to do. If you want to determine which week the data fall within, why not:
set.seed(1)
L = rnorm(50, mean=10, sd=2)
...
fake <- data.frame(L=L, t=t)
fake$week <- floor(fake$t/7) + 1 # drop the "+ 1" so that t==7 becomes week==1
head(fake)
# L t week
# 1 8.747092 7 2
# 2 10.367287 7 2
# 3 8.328743 7 2
# 4 13.190562 14 3
# 5 10.659016 14 3
# 6 8.359063 14 3
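The loop described in the question (keep the same week number while t is unchanged, add 1 whenever t increases) can also be sketched without a loop. This is only a sketch: it assumes t is sorted in increasing order, uses a shortened version of the question's fake t vector, and the "week N" label format is an arbitrary choice.

```r
# Shortened version of the question's fake time vector (days), already sorted
t <- c(rep.int(7, 3), rep.int(14, 6), rep.int(21, 8), rep.int(28, 12))

# TRUE wherever t is larger than the previous element;
# the first element always starts week 1
is_new_week <- c(TRUE, diff(t) > 0)

# The running count of increases is the week number of each observation
week <- cumsum(is_new_week)

# Character labels, e.g. for use as a grouping variable in ggplot2
week_label <- paste("week", week)

table(week)
```

Unlike floor(t/7), this numbers the distinct time points consecutively, so a missing week in the data would not leave a gap in the numbering. Which behaviour is wanted depends on the analysis.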

All data in one column

I have this soccer data all in one column.
Round 36 # Round of the league------------------------------------
29.07. 20:45 # Date and time of the match
Barcelona # Home Team
4 - 1 # FT result
Getafe # Away team
(2 - 0) # HT result
29.07. 20:45 # *date of the second match of the round*
Valencia
2 - 3
Laci
(1 - 2)
Round 35 # repeating pattern -------------------------------------------------
How can I move all the data for a given round of the league into a new column? E.g. I want all observations from the Round 36 line up to the Round 35 line in a single column, and so on.
I really do not have any idea how to solve this. I tried to transpose the data so that I could work with observations as variables, but still nothing. I am just a beginner in R and would appreciate any help.
Thanks
Assuming your data is in a character vector named lines (e.g. lines[1] = Round 36 is the first entry, lines[2] = 29.07. 20:45 is the next entry, and so forth), we can spot the round lines, split the vector into a list, and finally bind it into a matrix (assuming the rounds have equal length; if not, you will have to do some manual work).
# Figure out where each round starts.
rounds <- grepl('^Round', lines)
# Split into a list with one element per round. cumsum(rounds) is the group index.
data <- split(lines, cumsum(rounds))
# Bind into a matrix with one row per round (assuming all rounds have the same amount of data).
# Use cbind instead of rbind to get one column per round.
bound <- do.call(rbind, data)
Of course, without a reproducible example it is hard to test the final result.
Note that if the rounds do not have equal amounts of data, or if the data does not come in the same order, the result may not make immediate sense (if round 45 has 7 elements but round 46 has 4, round 46 will recycle elements 1, 2 and 3 to fill out the missing values), but it might make follow-up data cleaning simpler.
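When rounds have different numbers of lines, one way to avoid recycling is to pad every round with NA before binding. A minimal sketch: the Round 36 lines come from the question, while the Round 35 lines are invented placeholders, since the question does not show them.

```r
lines <- c("Round 36", "29.07. 20:45", "Barcelona", "4 - 1", "Getafe", "(2 - 0)",
           "29.07. 20:45", "Valencia", "2 - 3", "Laci", "(1 - 2)",
           # Invented placeholder data for illustration only
           "Round 35", "22.07. 18:00", "Team A", "1 - 0", "Team B", "(0 - 0)")

# Group index: increases by 1 at every line that starts a new round
rounds <- grepl("^Round", lines)
groups <- split(lines, cumsum(rounds))

# Pad every round to the length of the longest one with NA
n <- max(lengths(groups))
padded <- lapply(groups, function(x) {
  length(x) <- n  # extending a vector this way fills with NA
  x
})

# One column per round, named after the round header line
wide <- do.call(cbind, padded)
colnames(wide) <- vapply(groups, `[`, character(1), 1)
wide
```

The result is a character matrix with one column per round; shorter rounds end in NA instead of repeating earlier values.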

Why the for loop is not using the 'i' specified in the function

I have a data frame with 25 weeks of observations per animal and 20 animals in total. I am trying to write a function that calculates a linear equation between 2 consecutive points, and to do that for the 25 weeks and the 20 animals.
I want to use a general form of the equation so I can calculate values at any point. In the function, Week=t, Weight=d.
I can't figure out how to make this work. I don't think the loop is using each row of the data frame as the index for the function. My data frame, named growth, looks something like this:
Week Weight Animal
1 50 1
2 60 1
n=25
1 80 2
2 90 2
.
.
20
for (i in growth$Week){
eq<- function(t){
d = growth$BW.Kg
t = growth$Week
(d[i+1]-d[i])/(t[i+1]-t[i])*(t-t[i])+d[i]
return(eq)
}
}
eq(3)
OK, so I think there are a few points of confusion here. The first is writing a function inside a for loop: you are redefining the function on every iteration, and the function doesn't save the values of your equation anywhere. Secondly, you are passing t as your argument but then expecting t to follow the for loop with the i value. Finally, you say that you want this done for each animal, but the animal value does not appear in your code.
So it's a little hard to see what you are trying to achieve here.
Based on your information above, I've rewritten your function into something that will provide a result for your equation.
library(tidyverse)
growth <- tibble(week = 1:5,
animal = 1,
weight = c(50,52,55,54,57))
eq <- function(d,t,i){
z <- (d[i+1]-d[i])/(t[i+1]-t[i])*(t-t[i])+d[i]
return(z)
}
test_result <- eq(growth$weight,growth$week,3)
Results:
[1] 57 56 55 54 53
Is that the kind of result you were expecting? Or did you want just a single result per week per animal? Could you provide a working example of a formula that would produce a single desired result (i.e. a result for animal 1 on week 1)?
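If the goal is a value at any time point from straight lines between consecutive weekly weights, base R's approxfun (linear interpolation) may already cover it. This is a sketch under that assumption, not necessarily what the asker intended. The animal 1 weights are the ones used above; the animal 2 weights after week 2 are invented for illustration.

```r
growth <- data.frame(
  week   = rep(1:5, times = 2),
  animal = rep(1:2, each = 5),
  # Animal 2 values beyond week 2 are made-up placeholders
  weight = c(50, 52, 55, 54, 57,
             80, 90, 97, 103, 110)
)

# One interpolation function per animal: each joins consecutive
# (week, weight) points with straight lines
interp <- lapply(split(growth, growth$animal),
                 function(g) approxfun(g$week, g$weight))

interp[["1"]](3.5)  # halfway between 55 (week 3) and 54 (week 4): 54.5
interp[["2"]](1.5)  # halfway between 80 and 90: 85
interp[["1"]](6)    # outside the observed weeks: NA by default
```

Each element of interp is an ordinary function of time, so it can be called with a single value or a vector of time points.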

Complex data calculation for consecutive zeros at row level in R (lag v/s lead)

I have a complex calculation that needs to be done. It is basically at the row level, and I am not sure how to tackle it.
If you can help me with the approach or any functions, that would be really great.
I will break my problem into two sub-problems for simplicity.
Below is what my data looks like:
Group,Date,Month,Sales,lag7,lag6,lag5,lag4,lag3,lag2,lag1,lag0(reference),lead1,lead2,lead3,lead4,lead5,lead6,lead7
Group1,42005,1,2503,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1
Group1,42036,2,3734,1,1,1,1,1,0,0,0,0,1,1,0,0,0,0
Group1,42064,3,6631,1,0,0,1,0,0,0,0,0,0,1,1,1,1,0
Group1,42095,4,8606,0,1,0,1,1,0,1,0,1,1,1,0,0,0,0
Group1,42125,5,1889,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0
Group1,42156,6,4819,0,1,0,0,0,1,0,0,1,0,1,1,1,1,0
Group1,42186,7,5120,0,0,1,1,1,1,1,0,0,1,1,0,1,1,0
I have data for each Group at Monthly Level.
I would like to capture the below two things.
1. The count of consecutive zeros for each row to-and-fro from lag0(reference)
The cases of interest are the zeros that are consecutive with lag0(reference) in either direction, up to the first 1. I want to capture the count of these zeros at row level, along with the corresponding Sales value.
Below is the output I am looking for in part 1.
Output:
Month,Sales,Count
1,2503,9
2,3734,3
3,6631,5
4,8606,0
5,1889,6
6,4819,1
7,5120,1
2. Identify the consecutive rows (rows 1, 2 and 3, and similarly rows 5 and 6) where any lag or lead that is 0 within the lag0(reference) range overlaps between rows, and capture their Sales and Month values.
For example, for rows 1, 2 and 3 the overlap happens at least at lags 3, 2, 1 and leads 1, 2; this needs to be captured and tagged as case 1 (or 1). Similarly, for rows 5 and 6 at least lag1 overlaps, so this needs to be captured and tagged as case 2 (or 2), along with the Sales and Month values.
Row 7 does not overlap with the previous or following consecutive row, hence it will not be captured.
Below is the result I am looking for in part 2.
Month,Sales,Case
1,2503,1
2,3734,1
3,6631,1
5,1889,2
6,4819,2
I want to run this for multiple groups, so I will use either dplyr or a loop to get the result. Currently, I am simply looking for the approach.
I am not sure how to solve this problem; it is the first time I am trying to capture things at row level in R. I am not looking for a full solution, simply a first step. Would appreciate any leads.
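One option for the first part of the calculation, using rle:
df$count <- apply(df[,-c(1:4)],1,function(x){
# x[1:7] are lag7..lag1, x[8] is lag0(reference), x[9:15] are lead1..lead7
first <- rle(x[1:7])
second <- rle(x[9:15])
count <- 0
# zeros immediately before lag0: the last run of the lags, if it is a run of 0s
if(first$values[length(first$values)] == 0){
count = first$lengths[length(first$values)]
}
# zeros immediately after lag0: the first run of the leads, if it is a run of 0s
if(second$values[1] == 0){
count = count+second$lengths[1]
}
count
})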
df[,c("Month", "Sales", "count")]
# Month Sales count
# 1 1 2503 9
# 2 2 3734 3
# 3 3 6631 5
# 4 4 8606 0
# 5 5 1889 6
# 6 6 4819 1
# 7 7 5120 1
Data:
df <- read.table(text =
"Group,Date,Month,Sales,lag7,lag6,lag5,lag4,lag3,lag2,lag1,lag0(reference),lead1,lead2,lead3,lead4,lead5,lead6,lead7
Group1,42005,1,2503,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1
Group1,42036,2,3734,1,1,1,1,1,0,0,0,0,1,1,0,0,0,0
Group1,42064,3,6631,1,0,0,1,0,0,0,0,0,0,1,1,1,1,0
Group1,42095,4,8606,0,1,0,1,1,0,1,0,1,1,1,0,0,0,0
Group1,42125,5,1889,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0
Group1,42156,6,4819,0,1,0,0,0,1,0,0,1,0,1,1,1,1,0
Group1,42186,7,5120,0,0,1,1,1,1,1,0,0,1,1,0,1,1,0",
header = TRUE, stringsAsFactors = FALSE, sep = ",")
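To see why the rle-based count works, it may help to trace it by hand on the first row of the data. This is only an illustration of the same logic on hard-coded vectors; the variable names are chosen here for readability.

```r
# Row 1 of the data, split around lag0(reference)
lags  <- c(1, 1, 0, 0, 0, 0, 0)  # lag7 .. lag1
leads <- c(0, 0, 0, 0, 1, 0, 1)  # lead1 .. lead7

r_lags  <- rle(lags)   # runs: two 1s, then five 0s
r_leads <- rle(leads)  # runs: four 0s, then 1, 0, 1

# Zeros touching lag0 from the left: the last run of the lags, if it is 0s
trailing <- if (tail(r_lags$values, 1) == 0) tail(r_lags$lengths, 1) else 0

# Zeros touching lag0 from the right: the first run of the leads, if it is 0s
leading <- if (r_leads$values[1] == 0) r_leads$lengths[1] else 0

total <- trailing + leading  # 5 + 4 = 9, matching Count for Month 1
```

Note that lag0(reference) itself is not counted, which is consistent with the expected output in the question.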

Determine when a sequence of numbers has been broken in R

Say I have a series of numbers:
seq1<-c(1:20,25:40,48:60)
How can I return a vector that lists points in which the sequence was broken, like so:
c(21,24)
[1] 21 24
c(41,47)
[1] 41 47
Thanks for any help.
To show my miserably failing attempt:
nums<-min(seq1):max(seq1) %in% seq1
which(nums==F)[1]
res.vec<-vector()
counter<-0
res.vec2<-vector()
counter2<-0
for (i in 2:length(seq1)){
if(nums[i]==F & nums[i-1]!=F){
counter<-counter+1
res.vec[counter]<-seq1[i]
}
if(nums[i]==T & nums[i-1]!=T){
counter2<-counter2+1
res.vec2[counter2]<-seq1[i]
}
}
cbind(res.vec,res.vec2)
I have changed the general function a bit, so I think this should be a separate answer.
You could try
seq1<-c(1:20,25:40,48:60)
myfun<-function(data,threshold){
cut<-which(c(1,diff(data))>threshold)
return(cut)
}
You get the points you have to care about using
myfun(seq1,1)
[1] 21 37
To make the result easier to reuse, it is convenient to store it in an object.
pru<-myfun(seq1,1)
So you can now call
df<-data.frame(pos=pru,value=seq1[pru])
df
pos value
1 21 25
2 37 48
You get a data frame with the position and the value of the breaks for your chosen threshold. If you want a list instead of a data frame, it works like this:
list(pos=pru,value=seq1[pru])
$pos
[1] 21 37
$value
[1] 25 48
Function diff will give you the differences between successive values
> x <- c(1,2,3,5,6,3)
> diff(x)
[1] 1 1 2 1 -3
Now look for those values that are not equal to one for "breakpoints" in your sequence.
Taking into account the comments made here, for general use you could write:
fun<-function(data,threshold){
# prepend threshold so the first element is never flagged as a break
t<-which(c(threshold,diff(data)) != threshold)
return(t)
}
Here data can be any numeric vector (such as a data frame column), and threshold is the expected step between consecutive values. I would also consider using grep with a similar approach, but it all depends on user preference.
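The answers above return the positions where the sequence resumes. To get the missing ranges themselves, as asked in the question (21 to 24 and 41 to 47), the same diff idea can be extended. A minimal sketch, assuming an increasing integer sequence with an expected step of 1; the names brk and gaps are arbitrary:

```r
seq1 <- c(1:20, 25:40, 48:60)

# Positions of the last element before each gap
brk <- which(diff(seq1) > 1)  # 20, 36

# Each gap runs from one past that element to one before the next element
gaps <- data.frame(from = seq1[brk] + 1,
                   to   = seq1[brk + 1] - 1)
gaps
#   from to
# 1   21 24
# 2   41 47
```

Each row of gaps gives the first and last missing value of one break.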
