R restructuring long to wide using for loop

Right now, this working loop pastes the supervisor's scores for a given year into new columns (supervisor.score1:supervisor.score4) on the appropriate employee row. For each employee row i, it finds the row(s) whose employee ID and year equal the supervisor and year listed in row i. It then takes the first of those matching rows and copies its scores (score1:score4) into supervisor.score1:supervisor.score4 on row i.
The columns, in order, are: employeeID, year, supervisor, score1:score4, supervisor.score1:supervisor.score4
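score.cols <- paste0("score", 1:4)             # source columns, named as described above
namevector <- paste0("supervisor.score", 1:4)  # target columns
for (i in seq_len(nrow(data))) {
  # rows whose employeeID equals row i's supervisor, within the same year
  matchvector <- which(data[, 1] == data[i, 3] & data[, 2] == data[i, 2])
  if (length(matchvector) > 0) {
    case <- matchvector[1]  # keep only the first match
    data[i, namevector] <- data[case, score.cols]
  } else {
    # no supervisor row found for that year
    data[i, namevector] <- NA
  }
}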
Is there a way to convert this loop into a function that can be called with apply?
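One possible alternative to an apply-style function is a vectorized lookup with match(), which returns the index of the first matching row just like matchvector[1] in the loop. The following is only a sketch under assumptions: the toy data is hypothetical, it uses two score columns instead of four for brevity, and the helper names (score.cols, sup.cols, own.key, sup.key) are illustrative rather than taken from the question.

```r
# Hypothetical toy data; column names follow the layout described in the question.
data <- data.frame(
  employeeID = c(1, 2, 3, 1, 2),
  year       = c(2010, 2010, 2010, 2011, 2011),
  supervisor = c(2, 3, NA, 2, 9),
  score1     = c(10, 20, 30, 11, 21),
  score2     = c(1, 2, 3, 4, 5)
)
score.cols <- c("score1", "score2")
sup.cols   <- paste0("supervisor.", score.cols)

# Key every row by (employeeID, year) and every lookup by (supervisor, year).
own.key <- paste(data$employeeID, data$year)
sup.key <- paste(data$supervisor, data$year)

# match() gives the first matching row index, or NA when there is no match.
case <- match(sup.key, own.key)

# Indexing rows with NA yields NA scores, so unmatched employees get NA.
data[sup.cols] <- data[case, score.cols]
print(data)
```

Because the whole lookup happens in one match() call, no explicit loop or apply call is needed for this reading of the problem.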

Related

Why is loop adding NA values to the data frame?

I have a basic while loop with a for loop inside it. I iterate through some starting and ending values in a data frame, then go through a list and grab some values with substr.
The problem with the code below is that it adds a lot of NA rows, and I don't understand why or how.
I have an if that uses grepl to look for "TRACK 2 DATA: " and, if it is found, adds a row to the data frame. I don't have an else that adds NA values. So, as I understand it, when the if condition is false the iteration should simply continue without adding anything to the data frame?
What might be wrong?
i=1
fundi <- nrow(find_txn) #get the last record
while(i <=fundi) { # Start while-loop until END OF records
nga <- find_txn[i,1] #FRom record
ne <- find_txn[i,3] #to Records
for (j in nga:ne){ #For J in from:to
if(grepl("TRACK 2 DATA: ",linn[j])) { #If track data found do something
gather_txn[j,1] <- j # add a record for iteration number
gather_txn[j,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[j,3] <- substr(linn[j],34,39) #get some substrings
}
}
i <- i + 1
}
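I was indexing with the wrong variable. Assigning to row j of gather_txn when j is larger than the current number of rows makes R extend the data frame and fill the skipped rows with NA. The assignments inside the if block need to write to row i, not row j: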
gather_txn[i,1] <- j # add a record for iteration number
gather_txn[i,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[i,3] <- substr(linn[j],34,39) #get some substrings
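The NA rows come from how R handles assignment past the last row of a data frame. A minimal sketch of that behaviour, using a hypothetical one-row table (gather and compact are illustrative names, not the original gather_txn):

```r
# Hypothetical one-row table standing in for gather_txn.
gather <- data.frame(id = 1L, txt = "a", stringsAsFactors = FALSE)

# Writing to row 4 of a 1-row data frame extends it to 4 rows;
# rows 2 and 3 are created and filled with NA.
gather[4, "id"] <- 4L
print(gather)

# Appending at nrow + 1 instead keeps the table free of padding rows.
compact <- data.frame(id = 1L, txt = "a", stringsAsFactors = FALSE)
compact[nrow(compact) + 1, "id"] <- 4L
print(compact)
```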

How to drop a buffer of rows in a data frame around rows of a certain condition

I am trying to remove rows in a data frame that are within x rows after rows meeting a certain condition.
I have a data frame with a response variable, a measurement type that represents the condition, and time. Here's a mock data set:
data <- data.frame(rlnorm(45,0,1),
c(rep(1,15),rep(2,15),rep(1,15)),
seq(
from=as.POSIXct("2012-1-1 0:00", tz="EST"),
to=as.POSIXct("2012-1-1 0:44", tz="EST"),
by="min"))
names(data) <- c('Variable','Type','Time')
In this mock case, I want to delete the first 5 rows of condition 1 that follow a block of condition 2.
The way I thought about solving this problem was to generate a separate vector that records, for each observation of Type 1, how many rows it is from the most recent observation of Type 2. Here's the code I wrote:
dist = vector()
for(i in 1:nrow(data)) {
if(data$Type[i] != 1) dist[i] <- 0
else {
position = i
tempcount = 0
while(position > 0 && data$Type[position] == 1){
position = position - 1
tempcount = tempcount + 1
}
dist[i] = tempcount
}
}
This code will do the trick, but it's extremely inefficient. I was wondering if anyone had some cleverer, faster solutions.
If I understand you correctly, this should do the trick:
criteria1 = which(data$Type[2:nrow(data)] == 2 & data$Type[2:nrow(data)] != data$Type[1:(nrow(data)-1)]) + 1
criteria2 = as.vector(sapply(criteria1,function(x) seq(x,x+5)))
data[-criteria2,]
How it works:
criteria1 contains the indices where Type==2 but the previous row has a different type. The strange-looking subsets like 2:nrow(data) are there because we want to compare each row to the previous one, and the first row has no previous row. Since the comparison starts at row 2, we add 1 at the end to get the actual row numbers.
criteria2 contains sequences running from each number in criteria1 to that number + 5 (six rows each).
The third line performs the subset.
This might need a small modification, as I wasn't exactly clear what criteria 1 and criteria 2 were from your code. Let me know if this works or if you need any more advice!
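If the goal is read literally as "drop the first 5 Type 1 rows that follow a Type 2 block", a variation along the following lines might be closer. This is only a sketch of that one reading; the names starts, drop and keep are illustrative, and the mock data omits the Time column for brevity.

```r
# Mock data with the same Type layout as the question.
set.seed(1)
data <- data.frame(Variable = rlnorm(45, 0, 1),
                   Type = c(rep(1, 15), rep(2, 15), rep(1, 15)))
n <- nrow(data)

# Rows where Type 1 begins immediately after a Type 2 row.
starts <- which(data$Type[-1] == 1 & data$Type[-n] == 2) + 1

# Five rows beginning at each start: x, x + 1, ..., x + 4.
drop <- unique(unlist(lapply(starts, function(x) seq(x, length.out = 5))))

# Stay inside the table and never drop a row that is not Type 1.
drop <- drop[drop <= n & data$Type[drop] == 1]

# A logical mask is safe even when nothing needs dropping.
keep <- !(seq_len(n) %in% drop)
result <- data[keep, ]
```

The logical mask avoids negative indexing with an empty vector, which would otherwise select no rows at all.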

R Selecting a Row plus the next 5 rows

In my dataframe I would like to select a row based on some logic, and then return a dataframe with the selected row PLUS the next 'N' rows.
So, I have this: (a generic example)
workingRows <- myData[which(myData$Column1 >= myData$Column2 & myData$Column3 <= myData$Column4), ]
Which returns me the correct "starting values". How can I get the "next" 5 values based on each of the starting values?
We can use rep to build the offsets 0 to 5 for each starting index and add them to the starting indices. Then sort the result, wrap it in unique to remove any duplicates caused by overlapping windows, and use it to subset 'myData'.
i1 <- which(myData$Column1 >= myData$Column2 & myData$Column3 <= myData$Column4)
myData[unique(sort(i1 + rep(0:5, each = length(i1)))),]
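A small runnable variation on the same idea, assuming made-up data and a window size N (both hypothetical). It uses outer() to build the offsets and also clips indices that would run past the last row, which would otherwise come back as NA rows.

```r
# Hypothetical data: rows 1 and 9 satisfy the condition.
myData <- data.frame(Column1 = c(5, 1, 1, 1, 1, 1, 1, 1, 9, 1),
                     Column2 = rep(2, 10),
                     Column3 = rep(0, 10),
                     Column4 = rep(1, 10))
N <- 5  # number of following rows to keep (assumed)

i1 <- which(myData$Column1 >= myData$Column2 & myData$Column3 <= myData$Column4)

# Every start index plus every offset 0..N, deduplicated and ordered.
idx <- sort(unique(as.vector(outer(i1, 0:N, "+"))))

# Drop indices beyond the end of the data frame.
idx <- idx[idx <= nrow(myData)]

result <- myData[idx, ]
print(result)
```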

Finding a missed row in time series in R

I have a daily time series covering 20 years (column 1 holds the dates and the other columns hold different data). One row has been deleted, and I don't know which one.
I want to find that row, insert the corresponding date, and interpolate the other columns for that row.
Is it possible in R?
Thanks
Supposing your date column is of class "Date", here's a way:
# generate sample data
my.df <- data.frame(date=Sys.Date(), other=rnorm(1))
for(i in 2:100) {
my.df[i,] <- list(Sys.Date() + (i-1), rnorm(1))
}
class(my.df$date)
# [1] "Date"
# remove row 71
my.df <- my.df[-71,]
# Iterate to see where there is a gap
for(i in 2:nrow(my.df)) {
if(my.df$date[i] != my.df$date[i-1] + 1) {
cat("missing row:", i)
break
}
}
missing row: 71
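The loop above only locates the gap. A possible vectorized way to locate it, and then to restore the date and linearly interpolate the other column, is sketched below. The sample data is hypothetical (fixed dates and a simple numeric column), and linear interpolation via approx() is an assumption about what "interpolate" should mean here.

```r
# Hypothetical sample: 100 consecutive days, then remove row 71.
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
my.df <- data.frame(date = dates, other = as.numeric(1:100))
my.df <- my.df[-71, ]

# diff() between consecutive dates is 1 everywhere except across the gap.
gap <- which(as.numeric(diff(my.df$date)) != 1) + 1

# Rebuild the complete daily sequence and interpolate linearly onto it.
all.dates <- seq(min(my.df$date), max(my.df$date), by = "day")
filled <- data.frame(
  date  = all.dates,
  other = approx(as.numeric(my.df$date), my.df$other,
                 xout = as.numeric(all.dates))$y
)
print(filled[70:72, ])
```

With several data columns, the approx() call would be repeated per column.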

R: Looping and syntax when grouping rows

My data is 18 rows by 8 columns. It contains both numerical and text data. I want to assign each row an ID number, so that rows with the same values in the first 5 columns share the same ID. For some reason I don't think I am looping properly. Any thoughts?
sampdata<-read.csv("xxx")
sampdata["ID"] <- 0 #ID column
count<-1 #to subtract from 10000
for (p in 1:18) {
if (sampdata[p,9] == 0){
count<-count+1
sampdata[9,p]<-10000-count
for (i in 1:5){ #column index for current check (only check defining info)
for (j in 1:18) { #row index for current check
for (k in 1:18){ #column index for current check against
if (sampdata[i,j]==sampdata[i,k])
sampdata[j,9]<-sampdata[9,p] #assign same ID number
}
}
}
}
}
Assuming your data looks something like this
mm<-matrix(c(
1,1,2,2,3, 1,1,2,2,3,
2,2,2,3,3, 2,2,2,3,3,
4,3,2,1,2, 1,1,2,2,3,
3,1,1,2,2
), byrow=T, ncol=5)
dd<-data.frame(mm[,1:3],
X4=letters[mm[,4]], X5=mm[,5],
matrix(runif(nrow(mm)*(18-ncol(mm))), nrow=nrow(mm)))
Here your data is in dd and the first 5 columns define a group. You can use interaction() to assign a unique ID to each group like this:
dd$ID <- as.numeric(interaction(dd[,1:5], drop=T, lex.order=T))
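To see what interaction() does here, a tiny illustration with made-up data may help. It assumes only two defining columns instead of five, and the values are hypothetical.

```r
# Hypothetical data: rows 1, 2 and 4 agree on the defining columns X1 and X2.
dd <- data.frame(X1 = c(1, 1, 2, 1),
                 X2 = c("a", "a", "b", "a"),
                 X3 = c(0.3, 0.9, 0.5, 0.1))

# interaction() builds one factor from the combination of X1 and X2;
# drop = TRUE removes combinations that never occur, so IDs are consecutive.
dd$ID <- as.numeric(interaction(dd[, 1:2], drop = TRUE, lex.order = TRUE))
print(dd)
```

Rows that share the same values in the defining columns receive the same ID, regardless of what the remaining columns contain.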
