R: Looping and syntax when grouping rows

My data is 18 rows by 8 columns and contains both numeric and text values. I want to assign each row an ID number so that rows with the same values in the first 5 columns share the same ID. I don't think I am looping properly. Any thoughts?
sampdata<-read.csv("xxx")
sampdata["ID"] <- 0 #ID column
count<-1 #to subtract from 10000
for (p in 1:18) {
if (sampdata[p,9] == 0){
count<-count+1
sampdata[9,p]<-10000-count
for (i in 1:5){ #column index for current check (only check defining info)
for (j in 1:18) { #row index for current check
for (k in 1:18){ #column index for current check against
if (sampdata[i,j]==sampdata[i,k])
sampdata[j,9]<-sampdata[9,p] #assign same ID number
}
}
}
}
}

Assuming your data looks something like this:
mm <- matrix(c(
  1,1,2,2,3, 1,1,2,2,3,
  2,2,2,3,3, 2,2,2,3,3,
  4,3,2,1,2, 1,1,2,2,3,
  3,1,1,2,2
), byrow=TRUE, ncol=5)
dd <- data.frame(mm[,1:3],
  X4=letters[mm[,4]], X5=mm[,5],
  matrix(runif(nrow(mm)*(18-ncol(mm))), nrow=nrow(mm)))
Here your data is in dd and the first 5 columns define a group. You can use interaction() to assign a unique ID to each group like this:
dd$ID <- as.numeric(interaction(dd[,1:5], drop=TRUE, lex.order=TRUE))
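As a quick check of the interaction() approach, here is a minimal sketch on a made-up data frame (toy and its columns are hypothetical, not part of the question). It uses two key columns instead of five, but the idea is the same: rows that agree on every key column end up with the same ID.

```r
# Hypothetical toy data: columns a and b define a group, v is just a value
toy <- data.frame(
  a = c(1, 1, 2, 1),
  b = c("x", "x", "y", "x"),
  v = c(10, 20, 30, 40)
)

# One factor level per observed combination of the key columns;
# drop = TRUE discards combinations that never occur in the data
toy$ID <- as.numeric(interaction(toy[, 1:2], drop = TRUE, lex.order = TRUE))

print(toy)
```

Rows 1, 2 and 4 share one ID and row 3 gets another. The IDs are consecutive integers starting at 1, so they would need to be remapped if a scheme such as 10000 - count is required.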

Related

Why is loop adding NA values to the data frame?

I have a basic while loop with a nested for loop. I iterate over start and end values stored in a data frame, then walk through a list of lines and extract some values with substr().
The problem with the code below is that it adds a lot of NA rows, and I don't understand why.
The if block uses grepl() to look for "TRACK 2 DATA: " and, when it is found, adds a row to the data frame. There is no else branch that adds NA values, so as I understand it, when the condition is false the iteration should simply continue without adding anything to the data frame.
What might be wrong?
i <- 1
fundi <- nrow(find_txn)                          # get the last record
while (i <= fundi) {                             # loop until the end of the records
  nga <- find_txn[i,1]                           # from record
  ne <- find_txn[i,3]                            # to record
  for (j in nga:ne) {                            # for j in from:to
    if (grepl("TRACK 2 DATA: ", linn[j])) {      # if track data is found, do something
      gather_txn[j,1] <- j                       # add a record for the iteration number
      gather_txn[j,2] <- substr(linn[j],1,9)     # get some substrings
      gather_txn[j,3] <- substr(linn[j],34,39)   # get some substrings
    }
  }
  i <- i + 1
}
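I was indexing with the wrong variable. Because j is a line number in linn, writing to gather_txn[j, ] makes R pad every skipped row with NA. Inside the if block, the rows need to be added to the table using i, not j: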
gather_txn[i,1] <- j # add a record for iteration number
gather_txn[i,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[i,3] <- substr(linn[j],34,39) #get some substrings
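To see where the NA rows come from, here is a minimal sketch with made-up data (the contents of linn and find_txn below are assumptions; the question does not show them). Assigning to a row index past the end of a data frame makes R pad the skipped rows with NA. The second half shows one possible way to avoid row indexing altogether by collecting the matching line numbers first; it is an alternative, not the fix described above.

```r
# Part 1: assigning to row 5 of a 1-row data frame pads rows 2 to 4 with NA
demo <- data.frame(line = 1L, text = "a", stringsAsFactors = FALSE)
demo[5, 1] <- 5L
print(demo)

# Part 2: hypothetical inputs standing in for the question's objects
linn <- c("header",
          "AAAAAAAAA TRACK 2 DATA: 123456",
          "noise",
          "BBBBBBBBB TRACK 2 DATA: 654321",
          "footer")
find_txn <- data.frame(from = c(1, 3), n = c(2, 3), to = c(2, 5))

# All line numbers covered by the from/to ranges (columns 1 and 3)
idx <- unlist(Map(seq, find_txn[, 1], find_txn[, 3]))

# Keep only the lines that contain the marker text
hits <- idx[grepl("TRACK 2 DATA: ", linn[idx], fixed = TRUE)]

# Build the result in one step, one row per matching line, no NA padding
gather_txn <- data.frame(
  line   = hits,
  prefix = substr(linn[hits], 1, 9),
  stringsAsFactors = FALSE
)
print(gather_txn)
```

Unlike indexing by i, this keeps every matching line even when one from/to range contains several of them.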

Counting events in data frame R

I would like to count occurrences within a data frame with 100 rows, one per user, and 5 columns: the user ID, all events the user performed, and one column for each of the three event types.
For each user, I would like columns 3 to 5 to hold the count of each event type. The events are listed together in column 2 as quoted strings separated by commas (for example c("stroke", "mouseclick1", "mouseclick2")).
My code looks like this:
frame <- data.frame(matrix(ncol = 5, nrow = length(my.data)))
x <-c("user","eventsall","mouseclick1","mouseclick2","stroke")
colnames(frame) <- x
frame$user <- c(1:length(my.data))
frame$eventsall <- as.character(frame$workflow)
frame$mouseclick1 <- ?????
frame$mouseclick2 <- ?????
frame$stroke <- ?????
How can I define the three variables (above) so that I am able to count the frequency of each event for each user within the frame?
The first loop below is correct, but the second one is wrong; once it works I could repeat it for mouseclick2 and stroke. Is str_count the right function?
for (i in frame$user) {
if (is.na(my.data[[i]][["scenario1"]]) == TRUE) {
frame$eventsall[i] <- NA
}
else {
frame$eventsall[i] <- list(my.data[[i]][["scenario1"]][["events.all"]])
}
}
for (i in frame$user) {
if (is.na(my.data[[i]][["scenario1"]][["events.all"]]) == TRUE) {
frame$mouseclick1[i] <- NA
}
else {
frame$mouseclick1[i,3] <- str_count(my.data[[i]][["scenario1"]][["events.all", pattern="mouseclick1"]])
}
}
View(frame)
Thanks a lot!
You can split the comma-delimited string using strsplit and then loop through each row of the data:
# Sample data since none was provided
frame <- data.frame(user=c(1:5),
                    eventsall=c('1,2,3',
                                '3,4,6',
                                '5,3,2',
                                '7,4,5',
                                '6,6,5'))
frame$eventsall <- as.character(frame$eventsall)
events.split <- strsplit(frame$eventsall,',')
for(i in 1:nrow(frame)){
  frame$mouseclick1[i] <- events.split[[i]][1]
  frame$mouseclick2[i] <- events.split[[i]][2]
  frame$stroke[i] <- events.split[[i]][3]
}
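The loop above copies the first, second and third element of each split string into the three columns. If the goal is instead the frequency of each event name per user, as the question describes, the same strsplit result can be counted. The following is a minimal sketch under the assumption that eventsall holds event names separated by commas; the sample values are made up.

```r
# Hypothetical sample data: one comma-separated event string per user
frame <- data.frame(
  user = 1:3,
  eventsall = c("stroke,mouseclick1,mouseclick1",
                "mouseclick2",
                "stroke,stroke,mouseclick2"),
  stringsAsFactors = FALSE
)

# One character vector of event names per user
events.split <- strsplit(frame$eventsall, ",", fixed = TRUE)

# For each event type, count how often it occurs in each user's vector
for (ev in c("mouseclick1", "mouseclick2", "stroke")) {
  frame[[ev]] <- vapply(events.split, function(x) sum(x == ev), integer(1))
}

print(frame)
```

Comparing whole event names with == avoids a pitfall of pattern matching: a pattern such as "mouseclick1" would also match a longer name like "mouseclick10".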

R restructuring long to wide using for loop

Right now, this working loop is pasting the supervisor's scores for a given year into new columns (supervisor.score1:supervisor.score4) on the appropriate employee row. It achieves this by looking at an employee row i and finding the employee row(s) for the supervisor and year listed in row i. Then it takes the first row on that list of matching rows, and pastes the scores (score1:score 4) from that row into supervisor.score1:supervisor.score4 for the corresponding employee row i.
The columns are: employeeID, year, supervisor, score1:score4, supervisor.score1:supervisor.score4
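# namevector: names of the supervisor.score1:supervisor.score4 columns
# scorevector (assumed name, not in the original): names of the score1:score4 columns
for (i in (1:nrow(data))){
  # rows whose employeeID equals this row's supervisor, in the same year
  matchvector <- which(data[,1] == data[i,3] & data[,2] == data[i,2])
  if (length(matchvector) > 0) {
    case <- matchvector[1]
    data[i, namevector] <- data[case, scorevector]
  } else {
    # no supervisor row found for that year
    data[i, namevector] <- NA
  }
}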
Is there a way to convert this loop into a function that can be called with apply?
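One possible direction, offered as a sketch and not as an answer from the original thread: the per-row lookup can be replaced by a single call to match(), which returns the first matching position for each key and NA where there is none. The data frame and the column names below are made up, with two score columns instead of four.

```r
# Hypothetical data: employee 3 supervises employees 1 and 2 in 2020
data <- data.frame(
  employeeID = c(1, 2, 3, 1),
  year       = c(2020, 2020, 2020, 2021),
  supervisor = c(3, 3, NA, 2),
  score1     = c(10, 20, 30, 40),
  score2     = c(1, 2, 3, 4)
)

score_cols <- c("score1", "score2")
sup_cols   <- paste0("supervisor.", score_cols)

# Build an "id year" key for each row and for each row's supervisor
key_self <- paste(data$employeeID, data$year)
key_sup  <- paste(data$supervisor, data$year)

# First row whose own key equals the supervisor key; NA when there is none
case <- match(key_sup, key_self)

# Indexing rows with NA yields NA rows, so unmatched employees get NA scores
data[sup_cols] <- data[case, score_cols]

print(data)
```

This assumes employeeID itself is never NA; otherwise a missing supervisor would match a missing employee ID in the same year.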

Finding a missed row in time series in R

I have a daily time series covering 20 years (column 1 holds the dates and the other columns hold different data). One row has been deleted, and I don't know which one.
I want to find that row, insert the missing date, and interpolate the other columns for it.
Is this possible in R?
Thanks
Supposing your date column is of class "Date", here's a way:
# generate sample data
my.df <- data.frame(date=Sys.Date(), other=rnorm(1))
for(i in 2:100) {
  my.df[i,] <- list(Sys.Date() + (i-1), rnorm(1))
}
class(my.df$date)
# [1] "Date"
# remove row 71
my.df <- my.df[-71,]
# Iterate to see where there is a gap
for(i in 2:nrow(my.df)) {
  if(my.df$date[i] != my.df$date[i-1] + 1) {
    cat("missing row:", i)
    break
  }
}
# missing row: 71
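The loop finds the gap but does not fill it. As a minimal sketch of the remaining steps, assuming a single missing day and a numeric column that can reasonably be interpolated linearly, diff() can locate the gap and approx() can fill it. The sample data here is made up and deterministic so the result is easy to check.

```r
# Hypothetical daily series with a known linear column
full_dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
my.df <- data.frame(date = full_dates, other = 2 * seq_along(full_dates))
my.df <- my.df[-71, ]   # remove row 71

# A step larger than one day marks the row just before the gap
gap <- which(as.numeric(diff(my.df$date)) != 1)
missing_date <- my.df$date[gap] + 1

# Rebuild the complete date sequence and interpolate linearly onto it
all_dates <- seq(min(my.df$date), max(my.df$date), by = "day")
filled <- data.frame(
  date  = all_dates,
  other = approx(as.numeric(my.df$date), my.df$other,
                 xout = as.numeric(all_dates))$y
)

print(filled[filled$date == missing_date, ])
```

With several data columns, the approx() call would be repeated for each one.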

Searching for an item with a for loop

I am trying to write a for loop that searches every row of a data frame, checking only the first column, which holds the tag ID. If the row does not match, the loop should move on to the next row, and so on until it finds the value or reaches the end of the data frame.
The matching row should then be printed.
The purpose is just to see how the for loop works and how "slow" it is (I want to compare it with other ways of searching). I am fairly inexperienced in R and in programming in general.
Progress so far/my code
So far I have written this code; the sticking point is how to make the function check one row's value and then move on to check the next.
SearchID = function(data,value) {
for(i in 1:nrow(testdata)) {
row <- testdata[i,1]
if("row" == "value") return(row)
#what now?
}
}
This is a reproducible example:
ID=c("ID43","ID23","ID14","ID14")
y=c(23,45,66,76)
k=c("yes","no","yes","no")
testdata= data.frame(ID,y,k)
If I give ID14 as the value, it should return the whole row with ID14:
ID y k
4 ID14 76 no
Here, first create an object d1 to hold the rows whose ID column matches the value. The for loop goes through each row of the data and checks the condition; if it matches, rbind appends that row to d1. You can also initialize d1 as d1 <- data.frame().
SearchID <- function(data,value) {
  d1 <- c()
  for(i in 1:nrow(data)) {
    row <- data[i,1]
    if(row==value){
      d1 <- rbind(d1,data[i,])
    }
  }
  d1
}
SearchID(testdata, 'ID14')
# ID y k
#3 ID14 66 yes
#4 ID14 76 no
SearchID(testdata, 'ID43')
# ID y k
#1 ID43 23 yes
SearchID(testdata, "ID86")
#NULL
It's not clear what assumptions about R knowledge can be made in trying to answer this question. For instance, if we could assume that we know how to extract rows based on a vector of row indices, the following would seem more natural to me than binding the rows one-by-one:
SearchID = function(data,value) {
  getme <- numeric()                         # Initialize empty vector
  for(i in 1:nrow(data)) {                   # Start the loop
    row <- data[i, 1]                        # Capture the relevant value
                                             # Compare, and if there's a match,
    if (row == value) getme <- c(getme, i)   # add loop index to "getme" vector
  }
  if (length(getme) == 0) NULL               # If the vector is still empty, NULL
  else data[getme, ]                         # else return the relevant rows
}
SearchID(testdata, "ID14")
# ID y k
# 3 ID14 66 yes
# 4 ID14 76 no
At the very least, this answer should give you something else to benchmark against :-)
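Since the goal is to compare the loop against other ways of searching, a vectorized lookup is a natural counterpart. The following is a minimal sketch; SearchIDVec is a hypothetical name, and the data is the reproducible example from the question.

```r
# Reproducible example from the question
testdata <- data.frame(
  ID = c("ID43", "ID23", "ID14", "ID14"),
  y  = c(23, 45, 66, 76),
  k  = c("yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Compare the whole first column at once and keep the matching rows
SearchIDVec <- function(data, value) {
  data[data[[1]] == value, , drop = FALSE]
}

res <- SearchIDVec(testdata, "ID14")
print(res)
```

When nothing matches, this returns a data frame with zero rows instead of NULL. Wrapping each variant in system.time() on a larger data frame is one simple way to compare their speed.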
