Counting events in a data frame in R

I would like to count occurrences within a data frame with 100 rows (one per user) and 5 columns: the user ID, all events the user performed, and one column for each of the three events.
For each user, I would like columns 3 to 5 to hold the separate counts of the events that are listed together in column 2 as quoted, comma-separated values (for example c("stroke", "mouseclick1", "mouseclick2")).
My code looks like this:
frame <- data.frame(matrix(ncol = 5, nrow = length(my.data)))
x <-c("user","eventsall","mouseclick1","mouseclick2","stroke")
colnames(frame) <- x
frame$user <- c(1:length(my.data))
frame$eventsall <- as.character(frame$workflow)
frame$mouseclick1 <- ?????
frame$mouseclick2 <- ?????
frame$stroke <- ?????
How can I define the three columns above so that they contain the frequency of each event for each user?
The first loop below works, but the second one (which I would repeat for mouseclick2 and stroke) does not. Is str_count the right function to use?
for (i in frame$user) {
  if (is.na(my.data[[i]][["scenario1"]]) == TRUE) {
    frame$eventsall[i] <- NA
  }
  else {
    frame$eventsall[i] <- list(my.data[[i]][["scenario1"]][["events.all"]])
  }
}
for (i in frame$user) {
  if (is.na(my.data[[i]][["scenario1"]][["events.all"]]) == TRUE) {
    frame$mouseclick1[i] <- NA
  }
  else {
    frame$mouseclick1[i,3] <- str_count(my.data[[i]][["scenario1"]][["events.all", pattern="mouseclick1"]])
  }
}
View(frame)
Thanks a lot!

You can split the comma-delimited string using strsplit() and then loop through each row of the data:
# Sample data since none was provided
frame <- data.frame(user = c(1:5),
                    eventsall = c('1,2,3',
                                  '3,4,6',
                                  '5,3,2',
                                  '7,4,5',
                                  '6,6,5'))
frame$eventsall <- as.character(frame$eventsall)
events.split <- strsplit(frame$eventsall, ',')
for (i in 1:nrow(frame)) {
  frame$mouseclick1[i] <- events.split[[i]][1]
  frame$mouseclick2[i] <- events.split[[i]][2]
  frame$stroke[i] <- events.split[[i]][3]
}
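The loop above puts the individual split values into separate columns. The question asks for something different: how often each event occurs for each user. A minimal base-R sketch for that splits each string and compares the pieces with each event name. It assumes eventsall holds plain comma-separated strings such as "stroke,mouseclick1,mouseclick2". The sample values below are made up for illustration.

```r
# Hypothetical sample data: each user's events as one comma-separated string
frame <- data.frame(
  user = 1:3,
  eventsall = c("stroke,mouseclick1,mouseclick2",
                "mouseclick1,mouseclick1,stroke",
                NA),
  stringsAsFactors = FALSE
)

events <- c("mouseclick1", "mouseclick2", "stroke")
events.split <- strsplit(frame$eventsall, ",", fixed = TRUE)

# One new column per event: number of exact matches in each user's events;
# users whose eventsall is NA get NA instead of 0
for (ev in events) {
  frame[[ev]] <- vapply(events.split, function(x) {
    if (all(is.na(x))) NA_integer_ else sum(trimws(x) == ev)
  }, integer(1))
}

frame
```

stringr's str_count(string, pattern) could count the names directly in the unsplit string. However, it counts regular-expression matches anywhere in the text, so an event name that is contained in a longer name would be over-counted. Comparing whole pieces after strsplit() avoids that.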

Related

R - How to write in a specific column in a loop: nothing comes out

I'm working on an R program that automatically writes a text into a specific column, for specific rows, inside a loop.
Each data frame in d has at most 150 rows.
The aim is to divide each data frame into 3 sections of 50 rows and put a name in column 5, so that each section is assigned to a specific person.
What we have is:
d: a list of data frames created by splitting a huge database
w: length of d
for (i in 1:w) {
  if (nrow(102<d[[i]])&nrow(d[[i]])<150){
    d[[i]][1:50,c("Contact Owner")] <- 'Luc'
    d[[i]][51:101,c("Contact Owner")] <- 'Bertha'
    d[[i]][102:150,c("Contact Owner")] <- 'Marc'
  } else
  if (nrow(51<d[[i]])&nrow(d[[i]])<101){
    d[[i]] [1:50,c("Contact Owner")] <- 'Luc'
    d[[i]] [51:101,c("Contact Owner")] <- 'Bertha'
  } else
  if (nrow(1<d[[i]])&nrow(d[[i]])<50){
    d[[i]] [1:nrow(d[[i]]),c("Contact Owner")] <- 'Luc'}
  break
}
I keep getting the error message "argument is of length zero".
Thank you in advance for all the help :)
More information
I can't share the files I'm working on because they contain private data, but here are the details I can give:
d is a list of 5 data frames created with
d <- split(Scrapping,r)
d$1, d$2, d$3 and d$4 are 150x16;
d$5 is 10x16.
In the 5th column of each d$i I want to write names:
rows 1-50: Luc
rows 51-101: Bertha
rows 102-150: Marc
However, the last data frame will often have fewer than 150 rows, so I only want to write names in rows that actually contain data.
Here is an easier option: create a grouping vector with gl() and then use split(). This assumes d is the full data frame before it was split.
d1 <- transform(d, newcol = paste0('name', as.integer(gl(nrow(d), 50, nrow(d)))))
split(d1, d1$newcol)
In the end, I found a way to do it like this:
w <- length(d)
for (i in 1:w) {
  d[[i]][1:50,c("Contact Owner")] = "Luc"
  d[[i]][51:101,c("Contact Owner")] = "Bertha"
  d[[i]][102:150,c("Contact Owner")] = "Marc"
  # rows beyond nrow(d[[i]]) are added as NA-filled rows, so drop rows without a First Name
  d[[i]] <- with(d[[i]], d[[i]][!(`First Name` == "" | is.na(`First Name`)), ])
}
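The loop above writes to rows 1:150 even when a data frame is shorter. For a data frame, that appends NA-filled rows, which then have to be removed by the First Name filter. A sketch that only labels rows that already exist can use findInterval(). It assumes the block boundaries from the question (1-50, 51-101, 102-150) and a column literally named "Contact Owner". The helper name assign_owner is hypothetical.

```r
# Hypothetical helper: write an owner name into the "Contact Owner" column
# according to row-number blocks, only for rows that already exist.
assign_owner <- function(df,
                         owners = c("Luc", "Bertha", "Marc"),
                         starts = c(1, 51, 102)) {
  # findInterval() returns, for each row number, how many block starts are
  # <= it: rows 1-50 -> 1, rows 51-101 -> 2, rows 102 and later -> 3
  block <- findInterval(seq_len(nrow(df)), starts)
  df[["Contact Owner"]] <- owners[block]
  df
}

# Usage on a list of data frames such as d from split():
# d <- lapply(d, assign_owner)
```

Because the labels are computed from seq_len(nrow(df)), a 10-row data frame gets 10 labels and no extra rows.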

Why is my loop adding NA values to the data frame?

I have a basic while loop with a for loop inside it. I iterate through some start and end values stored in a data frame, then go through a list of lines and extract (substring) some values.
The problem with the code below is that it adds a lot of NA rows, and I don't understand why.
Inside the loop, an if uses grepl to look for "TRACK 2 DATA: ". Only when that text is found does it add a row to the data frame. I don't have an else that adds NA values, so as I understand it, when the if condition is false the iteration should just continue without adding anything to the data frame.
What might be wrong?
i=1
fundi <- nrow(find_txn) #get the last record
while(i <=fundi) { # Start while-loop until end of records
  nga <- find_txn[i,1] #From record
  ne <- find_txn[i,3] #To record
  for (j in nga:ne){ #For j in from:to
    if(grepl("TRACK 2 DATA: ",linn[j])) { #If track data found do something
      gather_txn[j,1] <- j # add a record for iteration number
      gather_txn[j,2] <- substr(linn[j],1,9) #get some substrings
      gather_txn[j,3] <- substr(linn[j],34,39) #get some substrings
    }
  }
  i <- i + 1
}
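I was indexing with the wrong variable. Inside the if block, the assignments write to row j of gather_txn, but j is a line number in linn, not a row count. Assigning to a row beyond the current last row of a data frame makes R add every row in between, filled with NA. The assignments need to use i instead of j:
gather_txn[i,1] <- j # add a record for iteration number
gather_txn[i,2] <- substr(linn[j],1,9) #get some substrings
gather_txn[i,3] <- substr(linn[j],34,39) #get some substrings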
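Writing to row i gives one row per record. If a record contains more than one matching line, later matches overwrite earlier ones. If every match should be kept, one sketch is to collect the matches in a list and bind them at the end. As in the question, it assumes linn is a character vector of lines and that columns 1 and 3 of find_txn hold the start and end line numbers. The function name and output column names are made up for illustration.

```r
# Hypothetical helper: one output row per matching line, no NA padding rows
collect_track2 <- function(linn, find_txn) {
  rows <- list()
  for (i in seq_len(nrow(find_txn))) {
    for (j in find_txn[i, 1]:find_txn[i, 3]) {
      if (grepl("TRACK 2 DATA: ", linn[j], fixed = TRUE)) {
        # Append a new row instead of writing to row j of a shared table
        rows[[length(rows) + 1]] <- data.frame(
          line = j,
          part1 = substr(linn[j], 1, 9),
          part2 = substr(linn[j], 34, 39)
        )
      }
    }
  }
  do.call(rbind, rows)  # NULL if nothing matched
}
```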

Renaming dataframes of varying number of columns in R

I would like to rename columns sequentially for multiple data frames with varying numbers of columns.
The data frames are imported into R from a PDF that displays a table, and each PDF page is automatically assigned to a column. By default, each column is named with the entire text of its page. I simply want to replace these automatic column names with the page numbers.
For example, a data frame with 4 columns should end up with columns named 1, 2, 3, 4. One with 5 columns should get 1, 2, 3, 4, 5, and so on.
I tried
txt_df1 <- data.frame("page1", "page2", "page3", "page4")
#remember the dataframes might have any number of columns
for (n in (1:ncol(txt_df1))){
colnames(txt_df1[n]) <- n
}
and
txt_df1 <- data.frame("page1", "page2", "page3", "page4")
#remember the dataframes might have any number of columns
for (n in (1:ncol(txt_df1))){
txt_df1 <- rename(txt_df1, n = n)
}
For some reason, nothing happens when either of these is run. Any suggestions on how to do this better or make this code work?
rename_func <- function(df) {
  df_max <- ncol(df)
  names_vec <- c(1:df_max)
  names(df) <- names_vec
  df
}
If we use the following data.frames:
txt_df1 <- data.frame("page1", "page2", "page3", "page4")
txt_df2 <- data.frame("page1", "page2")
We will get:
rename_func(txt_df1)
      1     2     3     4
1 page1 page2 page3 page4
...and
rename_func(txt_df2)
      1     2
1 page1 page2
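The question involves several data frames. One way to apply the same renaming to all of them is to keep them in a list and use lapply(). This sketch uses seq_len(ncol(df)) rather than c(1:df_max), so a data frame with zero columns does not trigger a length mismatch. The list name txt_dfs is hypothetical.

```r
# Rename every column of a data frame to 1, 2, ..., ncol(df)
rename_func <- function(df) {
  names(df) <- seq_len(ncol(df))
  df
}

# Hypothetical list holding the imported data frames
txt_dfs <- list(
  data.frame("page1", "page2", "page3", "page4"),
  data.frame("page1", "page2")
)
txt_dfs <- lapply(txt_dfs, rename_func)
lapply(txt_dfs, names)
```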
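You almost solved the issue yourself. All you need to change is to move the index [n] outside the data frame, i.e. colnames(txt_df1)[n] <- n. With colnames(txt_df1[n]) <- n, only the names of the one-column subset txt_df1[n] are changed, and those names are not carried back into txt_df1.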
It would be better to reference the column names in your data frames like this:
names(txt_df1)[indexNumber]
This way, you can assign names in your for-loop:
for(n in 1:ncol(txt_df1)){
  names(txt_df1)[n] <- new_name  # replace new_name with the name for column n
}
As for how to assign names, you could try this:
for(n in 1:ncol(txt_df1)){
  names(txt_df1)[n] <- paste("page_", n, sep="")
}

Finding a missed row in time series in R

I have a daily time series covering 20 years (column 1 holds the dates and the other columns hold different variables). One row has been deleted, and I don't know which one.
I want to find that row, insert the missing date, and interpolate the other columns for that row.
Is this possible in R?
Thanks
Supposing your date column is of class "Date", here's a way:
# generate sample data
my.df <- data.frame(date=Sys.Date(), other=rnorm(1))
for(i in 2:100) {
  my.df[i,] <- list(Sys.Date() + (i-1), rnorm(1))
}
class(my.df$date)
# [1] "Date"
# remove row 71
my.df <- my.df[-71,]
# Iterate to see where there is a gap
for(i in 2:nrow(my.df)) {
  if(my.df$date[i] != my.df$date[i-1] + 1) {
    cat("missing row:", i)
    break
  }
}
missing row: 71
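The loop only reports where the gap is. The question also asks to insert the missing date and interpolate the other columns. One sketch for that merges the data with a complete daily date sequence, then fills the resulting NA values by linear interpolation with base R's approx(). It assumes a Date column named date, numeric value columns, and at least two non-missing values per column. The function name fill_missing_days is hypothetical.

```r
# Hypothetical helper: insert missing days and linearly interpolate
# the numeric columns. Assumes a Date column called "date".
fill_missing_days <- function(df) {
  full <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
  # all.x = TRUE keeps every date; a missing day becomes a row of NAs
  out <- merge(full, df, by = "date", all.x = TRUE)
  num_cols <- setdiff(names(out)[vapply(out, is.numeric, logical(1))], "date")
  for (col in num_cols) {
    ok <- !is.na(out[[col]])
    # Interpolate over the numeric date values; known points are unchanged
    out[[col]] <- approx(x = as.numeric(out$date[ok]), y = out[[col]][ok],
                         xout = as.numeric(out$date))$y
  }
  out
}

# Usage with the example above:
# my.df <- fill_missing_days(my.df)
```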

R: Looping and syntax when grouping rows

My data is 18 rows by 8 columns and contains both numeric and text data. I want to assign each row an ID number, so that rows with the same values in the first 5 columns get the same ID. I don't think I am looping properly. Any thoughts?
sampdata<-read.csv("xxx")
sampdata["ID"] <- 0 #ID column
count<-1 #to subtract from 10000
for (p in 1:18) {
  if (sampdata[p,9] == 0){
    count<-count+1
    sampdata[9,p]<-10000-count
    for (i in 1:5){ #column index for current check (only check defining info)
      for (j in 1:18) { #row index for current check
        for (k in 1:18){ #column index for current check against
          if (sampdata[i,j]==sampdata[i,k])
            sampdata[j,9]<-sampdata[9,p] #assign same ID number
        }
      }
    }
  }
}
Assuming your data looks something like this:
mm<-matrix(c(
1,1,2,2,3, 1,1,2,2,3,
2,2,2,3,3, 2,2,2,3,3,
4,3,2,1,2, 1,1,2,2,3,
3,1,1,2,2
), byrow=T, ncol=5)
dd<-data.frame(mm[,1:3],
X4=letters[mm[,4]], X5=mm[,5],
matrix(runif(nrow(mm)*(18-ncol(mm))), nrow=nrow(mm)))
Here your data is in dd and the first 5 columns define a group. You can use interaction() to assign a unique ID to each group like this:
dd$ID <- as.numeric(interaction(dd[,1:5], drop=T, lex.order=T))
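With lex.order = TRUE, interaction() numbers the groups in sorted order of their values. If IDs in order of first appearance are preferred, an alternative sketch pastes the defining columns into one key per row and matches each key against the unique keys. It assumes the separator "\r" never occurs inside the data values. The helper name assign_group_id is hypothetical.

```r
# Hypothetical helper: same ID for rows with identical values in `cols`,
# numbered in order of first appearance
assign_group_id <- function(df, cols = 1:5) {
  # One string key per row built from the defining columns
  key <- do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
  match(key, unique(key))  # 1 for the first distinct key, 2 for the next, ...
}

# Usage with the example data:
# dd$ID <- assign_group_id(dd, cols = 1:5)
```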

Resources