Why is loop adding NA values to the data frame? - r

I have a basic while loop with a for loop inside it. I iterate over the start and end values stored in a data frame, then go through a list and extract some values with substr.
The problem with the code below is that it adds a lot of NA rows, and I don't understand why or how.
I have an if that uses grepl to look for "TRACK 2 DATA: " and, if it is found, adds a row to the data frame. I don't have an else that adds NA values. So, as I understand it, when the if condition is false the iteration should continue without adding anything to the data frame?
What might be wrong?
i=1
fundi <- nrow(find_txn) #get the last record
while(i <=fundi) { # Start while-loop until END OF records
nga <- find_txn[i,1] #FRom record
ne <- find_txn[i,3] #to Records
for (j in nga:ne){ #For J in from:to
if(grepl("TRACK 2 DATA: ",linn[j])) { #If track data found do something
gather_txn[j,1] <- j # add a record for iteration number
gather_txn[j,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[j,3] <- substr(linn[j],34,39) #get some substrings
}
}
i <- i + 1
}

I was indexing with the wrong variable. Writing to row j (a line number in linn) makes R pad every skipped row with NA. The assignments inside the if block need to write to row i of the table, not row j:
gather_txn[i,1] <- j # add a record for iteration number
gather_txn[i,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[i,3] <- substr(linn[j],34,39) #get some substrings
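To see where the NA rows came from: assigning to a row index past the end of a data frame makes R fill the skipped rows with NA. Here is a minimal sketch of that behaviour, plus a loop-free way to build the table. The contents of linn and the column names line and id are made up for illustration; they are not from the original data.

```r
# Hypothetical log lines standing in for `linn` (assumption, not the real data)
linn <- c("header line",
          "123456789 TRACK 2 DATA: abc",
          "noise",
          "987654321 TRACK 2 DATA: xyz")

# Line numbers that contain the marker
hits <- which(grepl("TRACK 2 DATA: ", linn, fixed = TRUE))

# Writing to row j (the line number) pads every skipped row with NA
sparse <- data.frame(line = integer(0), id = character(0),
                     stringsAsFactors = FALSE)
for (j in hits) {
  sparse[j, "line"] <- j
  sparse[j, "id"] <- substr(linn[j], 1, 9)
}

# Building the table from the matching lines only leaves no gaps
dense <- data.frame(line = hits, id = substr(linn[hits], 1, 9),
                    stringsAsFactors = FALSE)
```

Here sparse ends up with one row per line of linn, while dense has one row per match. Note that writing to row i only keeps one match per row of find_txn, so the vectorised version may be the safer choice if a block can contain several matches.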

Related

R - How to write in specific column in a loop : nothing comes out

I'm working on an R program that automatically writes text into a specific column, for specific rows, inside a loop.
Each data frame in d has at most 150 rows.
The aim is to divide each one into 3 sections of about 50 rows and put a name in column 5 (to assign those rows to a specific person).
What we have is:
d: a list of data frames made by splitting a huge database
w: the length of d
for (i in 1:w) {
if (nrow(102<d[[i]])&nrow(d[[i]])<150){
d[[i]][1:50,c("Contact Owner")] <- 'Luc'
d[[i]][51:101,c("Contact Owner")] <- 'Bertha'
d[[i]][102:150,c("Contact Owner")] <- 'Marc'
} else
if (nrow(51<d[[i]])&nrow(d[[i]])<101){
d[[i]] [1:50,c("Contact Owner")] <- 'Luc'
d[[i]] [51:101,c("Contact Owner")] <- 'Bertha'
} else
if (nrow(1<d[[i]])&nrow(d[[i]])<50){
d[[i]] [1:nrow(d[[i]]),c("Contact Owner")] <- 'Luc'}
break
}
I keep getting this error message: argument is of length zero
Thank you in advance for all the help :)
More information
I can't share the files I'm working on as they contain private data, but here are the details I can give.
d is a list of 5 data frames produced by
d <- split(Scrapping,r)
d$1, d$2, d$3, d$4 are 150x16
d$5 is 10x16
In the 5th column of each d$i I want to write names:
rows 1-50: Luc
rows 51-101: Bertha
rows 102-150: Marc
The last d$i data frame will often have fewer than 150 rows, so I only want to put names in rows that contain other data.
Here is an easier option: create a grouping vector with gl, then split on it.
d1 <- transform(d, newcol = paste0('name', as.integer(gl(nrow(d), 50, nrow(d)))))
split(d1, d1$newcol)
In the end I was able to find a way to do it like this:
w <- length(d)
for (i in 1:w) {
d[[i]][1:50,c("Contact Owner")] = "Luc"
d[[i]][51:101,c("Contact Owner")] = "Bertha"
d[[i]][102:150,c("Contact Owner")] = "Marc"
d[[i]] <- with(d[[i]], d[[i]][!(`First Name` == "" | is.na(`First Name`)), ])
}
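Another way to avoid writing past the last row is to derive the owner from the row number, so short data frames simply get fewer names. This is only a sketch: assign_owners is a hypothetical helper, and it assumes even blocks of 50 rows (the loop above gives Bertha rows 51-101, i.e. 51 rows).

```r
# Hypothetical helper: label rows in consecutive blocks of `block` rows.
# Rows beyond the last block are given the last owner.
assign_owners <- function(df, owners = c("Luc", "Bertha", "Marc"), block = 50) {
  n <- nrow(df)
  # Block number of each row: 1 for rows 1-50, 2 for rows 51-100, ...
  grp <- pmin(ceiling(seq_len(n) / block), length(owners))
  df[["Contact Owner"]] <- owners[grp]
  df
}

# Made-up stand-in for the list produced by split()
d <- list(data.frame(x = 1:150), data.frame(x = 1:10))
d <- lapply(d, assign_owners)
```

Because the labels are computed from seq_len(nrow(df)), no rows are added and nothing needs to be filtered out afterwards.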

R function to subset dataframe so that non-adjacent values in a column differ by >= X (starting with the first value)

I am looking for a function that iterates through the rows of a given column ("pos" for position, ascending) in a data frame and keeps only those rows whose values are at least, say, 10 apart, starting with the first row. It would start with the first row (and store it), carry on until it finds a row with a value at least 10 higher than the first, store that row, then start again from that value, looking for the next row at least 10 higher.
So far I have an R for loop that successfully finds adjacent rows at least X values apart, but it does not have the capability of looking any further than one row down, nor of stopping once it has found the given row and starting again from there.
Here is the function I have:
# example data frame
df <- data.frame(x=c(1:1000), pos=sort(sample(1:10000, 1000)))
# prep function (this only checks row above)
library(dplyr)
pos.apart.subset <- function(df, pos.diff) {
# create new dfs to store output
new.df <- list()
new.df1 <- data.frame()
# iterate through each row of df
for (i in 1:nrow(df)) {
# if the value of next row is higher or equal than value or row i+posdiff, keep
# if not ascending, keep
# if first row, keep
if(isTRUE(df$pos[i+1] >= df$pos[i]+pos.diff | df$pos[i+1] < df$pos[i] | i==1 )) {
# add rows that meet conditions to list
new.df[[i]] <- df[i,] }
}
# bind all rows that met conditions
new.df1 <- bind_rows(new.df)
return(new.df1)}
# test run for pos column adjacent values to be at least 10 apart
df1 <- pos.apart.subset(df, 10); head(df1)
Happy to do this in awk or any other language. Many thanks.
It seems I misunderstood the question earlier. Since we don't want the difference between consecutive rows, you can try:
nrows <- 1
previous_match <- 1
for(i in 2:nrow(df)) {
if(df$pos[i] - df$pos[previous_match] > 10) {
nrows <- c(nrows, i)
previous_match <- i
}
}
and then subset the selected rows :
df[nrows, ]
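The same greedy idea can be wrapped in a function with the gap as a parameter. This is a sketch under assumptions: keep_spaced is a hypothetical name, the comparison is >= ("at least" the gap, where the loop above uses > 10), and the test vector is made up.

```r
# Hypothetical helper: indices of values that are at least `min_gap`
# above the previously kept value, always keeping the first one.
keep_spaced <- function(pos, min_gap) {
  if (length(pos) == 0) return(integer(0))
  keep <- logical(length(pos))   # preallocated instead of growing a vector
  keep[1] <- TRUE
  last <- pos[1]                 # value of the most recently kept row
  for (i in seq_along(pos)[-1]) {
    if (pos[i] - last >= min_gap) {
      keep[i] <- TRUE
      last <- pos[i]
    }
  }
  which(keep)
}

pos <- c(1, 5, 11, 12, 20, 22, 35)
idx <- keep_spaced(pos, 10)               # compares against the last kept value
adjacent <- which(c(TRUE, diff(pos) >= 10))  # compares neighbouring rows only
```

On this vector idx picks positions 1, 3, 6 and 7, while the diff-based version only picks 1 and 7, which shows why comparing neighbouring rows is not enough here. With a data frame you would use df[keep_spaced(df$pos, 10), ].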
Earlier answer
We can use diff to get the difference between consecutive rows and select the rows whose difference from the previous row is greater than 10.
head(subset(df, c(TRUE, diff(pos) > 10)))
# x pos
#1 1 1
#2 2 31
#6 6 71
#9 9 134
#10 10 151
#13 13 185
The leading TRUE selects the first row by default.
In dplyr, we can use lag to get the value from the previous row:
library(dplyr)
df %>% filter(pos - lag(pos, default = -Inf) > 10)

Searching for an item with a for loop

I am trying to write a for loop that searches every row in a data frame, but only checks the tag ID in the first column. If a row doesn't match, it should move to the next row, and so on until it finds the value or gets to the end of the data frame.
Then the matching row should be printed.
The purpose is just to check how the for loop works and how "slow" it is (I want to compare it to other ways of searching). I am a bit inexperienced in R and in programming in general.
Progress so far/my code
So far I have written this code; the sticking point is how to make the function check a row and then move on to the next one.
SearchID = function(data,value) {
for(i in 1:nrow(testdata)) {
row <- testdata[i,1]
if("row" == "value") return(row)
#what now?
}
}
This is a reproducible example:
ID=c("ID43","ID23","ID14","ID14")
y=c(23,45,66,76)
k=c("yes","no","yes","no")
testdata= data.frame(ID,y,k)
If I give the ID14 as value, it should return the whole row with the ID14:
ID y k
4 ID14 76 no
Here, we first create an object d1 to hold the rows whose ID column matches the value. We loop through each row of the data with the for loop and check the condition. If it matches, we use rbind to bind that row to d1. You can also initialize d1 as d1 <- data.frame().
SearchID <- function(data,value) {
d1 <- c()
for(i in 1:nrow(data)) {
row <- data[i,1]
if(row==value){
d1 <- rbind(d1,data[i,])
}
}
d1
}
SearchID(testdata, 'ID14')
# ID y k
#3 ID14 66 yes
#4 ID14 76 no
SearchID(testdata, 'ID43')
# ID y k
#1 ID43 23 yes
SearchID(testdata, "ID86")
#NULL
It's not clear what assumptions about R knowledge can be made in trying to answer this question. For instance, if we could assume that we know how to extract rows based on a vector of row indices, the following would seem more natural to me than binding the rows one-by-one:
SearchID = function(data,value) {
getme <- numeric() # Initialize empty vector
for(i in 1:nrow(data)) { # Start the loop
row <- data[i, 1] # Capture the relevant value
# Compare, and If there's a match,
if (row == value) getme <- c(getme, i) # add loop index to "getme" vector
}
if (length(getme) == 0) NULL # If the vector is still empty, NULL
else data[getme, ] # else return the relevant rows
}
SearchID(testdata, "ID14")
# ID y k
# 3 ID14 66 yes
# 4 ID14 76 no
At the very least, this answer should give you something else to benchmark against :-)
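Since the goal is to compare the loop with other ways of searching, a vectorised lookup is the obvious thing to time it against. A minimal sketch, where search_id_vec is a hypothetical name and the data is the example from the question:

```r
testdata <- data.frame(ID = c("ID43", "ID23", "ID14", "ID14"),
                       y  = c(23, 45, 66, 76),
                       k  = c("yes", "no", "yes", "no"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: compare the whole first column at once.
# which() drops NA comparisons; drop = FALSE keeps a data frame result.
search_id_vec <- function(data, value) {
  data[which(data[[1]] == value), , drop = FALSE]
}

hit  <- search_id_vec(testdata, "ID14")
miss <- search_id_vec(testdata, "ID86")
```

Unlike the loop versions above, this returns a data frame with zero rows (not NULL) when nothing matches. system.time() around each version on a larger data frame gives a rough speed comparison.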

R: Looping and syntax when grouping rows

My data is 18 rows by 8 columns. It contains both numeric and text data. I want to assign each row an ID number, so that rows with the same values in the first 5 columns share the same ID. I don't think I am looping properly. Any thoughts?
sampdata<-read.csv("xxx")
sampdata["ID"] <- 0 #ID column
count<-1 #to subtract from 10000
for (p in 1:18) {
if (sampdata[p,9] == 0){
count<-count+1
sampdata[9,p]<-10000-count
for (i in 1:5){ #column index for current check (only check defining info)
for (j in 1:18) { #row index for current check
for (k in 1:18){ #column index for current check against
if (sampdata[i,j]==sampdata[i,k])
sampdata[j,9]<-sampdata[9,p] #assign same ID number
}
}
}
}
}
Assuming your data looks something like this
mm<-matrix(c(
1,1,2,2,3, 1,1,2,2,3,
2,2,2,3,3, 2,2,2,3,3,
4,3,2,1,2, 1,1,2,2,3,
3,1,1,2,2
), byrow=T, ncol=5)
dd<-data.frame(mm[,1:3],
X4=letters[mm[,4]], X5=mm[,5],
matrix(runif(nrow(mm)*(18-ncol(mm))), nrow=nrow(mm)))
where your data is in dd and the first 5 columns define a group, you can use interaction() to assign a unique ID to each group like this:
dd$ID <- as.numeric(interaction(dd[,1:5], drop=T, lex.order=T))
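A smaller check of the same idea, on a made-up data frame where three columns (not five) define the group, shows that identical rows end up with the same ID:

```r
# Made-up data: rows 1 and 2 agree on all three defining columns
small <- data.frame(a = c(1, 1, 2, 1),
                    b = c("x", "x", "y", "x"),
                    c = c(3, 3, 3, 4),
                    stringsAsFactors = FALSE)

# One factor level per observed combination; drop = TRUE discards
# combinations that never occur, so the IDs run 1..number of groups
small$ID <- as.integer(interaction(small[, 1:3], drop = TRUE, lex.order = TRUE))
```

The IDs are just level numbers, so they identify groups but carry no other meaning; map them onto something else afterwards if a specific numbering scheme is needed.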

removing duplicate subsets of rows

I have a list of stocks in an index sorted by date, and I'm trying to remove all rows in which the previous row has the same stock code. This will give a data frame of the initial index plus every date on which the index changed.
In my working example, I'll use names instead of the date column, and some numbers.
At first, I thought I could remove the rows by using subset() and !duplicated
name <- c("Joe","Mary","Sue","Frank","Carol","Bob","Kate","Jay")
num <- c(1,2,2,1,2,2,2,3)
num2 <- c(1,1,1,1,1,1,1,1)
df <- data.frame(name,num,num2)
dfnew <- subset(df, !duplicated(df[,2]))
However, this might not work when a stock is removed from the list and later added back. So, in my working example, the desired output is the rows for Joe, Mary, Frank, Carol and Jay.
Next I created a function to tell if the index changes. The input of the function is row number:
#------ function to tell if there is a change in the row subset-----#
df2 <- as.matrix(df)
ChangeDay <- function(x){
Current <- df2[x,2:3]
Prev <- df2[x-1,2:3]
if (length(Current) != length(Prev))
NewList <- TRUE
else
NewList <- length(which(Current==Prev))!=length(Current)
return(NewList)
}
Finally, I attempted to write a loop to remove the unwanted rows. I'm new to programming, and I struggle with loops. I'm not sure what the best way is to pre-allocate memory when the dimensions of my final output are unknown. All the books I've looked at only give trivial loop examples. Here is my latest attempt:
result <- matrix(data=NA,nrow=nrow(df2),ncol=3) #pre allocate memory
tmp <- as.numeric(df2) #store the original data
changes <- 1
for (i in 2:nrow(df2)){ #always keep row 1, thus the loop starts at row 2
if(ChangeDay(i)==TRUE){
result[i,] <-tmp[i] #store the row in result if ChangeDay(i)==TRUE
changes <- changes + 1 #increment counter
}
}
result <- result[1:changes,]
Thanks for your help, and any additional general advice on loops is appreciated!
It is not clear what you want to do, but I guess:
df[c(1,diff(df$num)) !=0,]
name num num2
1 Joe 1 1
2 Mary 2 1
4 Frank 1 1
5 Carol 2 1
8 Jay 3 1
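The ChangeDay function in the question compares columns 2:3 rather than num alone. A sketch of the same "differs from the previous row" test over several columns, with no loop and no pre-allocation; the "|" separator is an assumption and should be a string that cannot occur in the data:

```r
name <- c("Joe", "Mary", "Sue", "Frank", "Carol", "Bob", "Kate", "Jay")
num  <- c(1, 2, 2, 1, 2, 2, 2, 3)
num2 <- c(1, 1, 1, 1, 1, 1, 1, 1)
df <- data.frame(name, num, num2, stringsAsFactors = FALSE)

# Collapse the compared columns into one key per row
cols <- c("num", "num2")
key <- do.call(paste, c(df[cols], sep = "|"))

# Keep row 1, then every row whose key differs from the row before it
changed <- c(TRUE, tail(key, -1) != head(key, -1))
result <- df[changed, ]
```

This keeps Joe, Mary, Frank, Carol and Jay, matching the desired output, and it still works when a value disappears and later comes back because each row is only compared with its immediate predecessor.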
