R: error handling in finding the row number of a dataframe

I have a dataframe with an ID column and another dummy column. In the first step the user enters a number which should be one of the IDs (ID_edit). Then the respective row index is determined. If the ID is in the dataframe everything works fine. If not (because the user enters a wrong ID or no ID at all) there should be an error message. I tried this:
test_df <- data.frame("ID" = c(1,3,6,8),
                      "char" = c("a","b","c","d"))
ID_edit <- as.integer(2)
row_nr_df <- which(test_df$ID == ID_edit, arr.ind=TRUE)
View(test_df$ID)
row_list <- as.numeric(rownames(test_df))
if(!is.null(row_nr_df %in% row_list)) {
  print("Row number in row list")
} else {
  print("Row number not in row list")
}
View(row_nr_df)
If I change
ID_edit <- as.integer(1)
which works, to
ID_edit <- as.integer(2)
the if-statement is still TRUE, but I expect (and want) the else block to run here.
View(row_nr_df)
then shows the message "No data available in table".
In the end I want to access the dataframe with the row number, e.g.:
char_edit <- test_df$char[[row_nr_df]]
But this does not work if the row number does not exist.
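The if-statement is always TRUE because which() returns integer(0) when nothing matches. integer(0) %in% row_list is then logical(0), and an empty logical vector is not NULL, so !is.null(...) is TRUE whatever the ID is. A minimal sketch, reusing the question's test_df, that shows this and tests the length instead:

```r
test_df <- data.frame(ID = c(1, 3, 6, 8), char = c("a", "b", "c", "d"))
row_list <- as.numeric(rownames(test_df))

row_nr_df <- which(test_df$ID == 2L)  # no match: integer(0)
in_list <- row_nr_df %in% row_list     # logical(0), an empty vector
is.null(in_list)                       # FALSE, so !is.null() is always TRUE
length(row_nr_df) > 0                  # FALSE: the test that was intended

if (length(row_nr_df) > 0) {
  print("Row number in row list")
} else {
  print("Row number not in row list")  # this branch now runs for ID 2
}
```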

test_df <- data.frame("ID" = c(1,3,6,8),
                      "char" = c("a","b","c","d"))
isin <- function(x, data) {
  if (length(which(data$ID == x, arr.ind=TRUE)) >= 1) {
    data[which(data$ID == x, arr.ind=TRUE), ]
  } else {
    "not in list"
  }
}
> isin(x=3,data=test_df)
  ID char
2  3    b
> isin(x=2,data=test_df)
[1] "not in list"
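Since the question asks for an error message, another option is to signal a real error with stop() and catch it with tryCatch() where the user input is handled. This sketch uses match(), which returns the position of the first match or NA, and treats an unknown ID as well as an empty or missing input as errors. The helper name find_row() is hypothetical.

```r
test_df <- data.frame(ID = c(1, 3, 6, 8), char = c("a", "b", "c", "d"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: return the row index for one ID, or stop with an error
find_row <- function(df, id) {
  if (length(id) != 1 || is.na(id)) {
    stop("Please enter exactly one ID")
  }
  row_nr <- match(id, df$ID)  # NA if id is not in df$ID
  if (is.na(row_nr)) {
    stop("ID ", id, " is not in the data frame")
  }
  row_nr
}

ID_edit <- as.integer(2)
char_edit <- tryCatch(
  test_df$char[[find_row(test_df, ID_edit)]],
  error = function(e) {
    message(conditionMessage(e))  # show the error text to the user
    NA_character_                 # fall back to NA instead of aborting
  }
)
```

With ID_edit set to 3 the same call returns "b"; with 2 it prints the message and char_edit is NA.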

Related

sapply as.numeric not working inside a for loop

I'm using a for loop to read every sheet in an Excel file, process it and save it in a list. The first iteration runs without problems, but when it gets to i = 2, the following error shows up:
Error:
! Assigned data `sapply(EstContratos[cols.name], as.numeric)` must be compatible with existing data.
✖ Existing data has 1 row.
✖ Assigned data has 3 rows.
ℹ Row updates require a list value. Do you need `list()` or `as.list()`?
This is the code:
library(readxl)   # excel_sheets(), read_excel()
library(dplyr)    # filter(), select(), row_number(), %>%
library(tidyr)    # drop_na()
library(stringr)  # str_split_fixed()
mysheetlist <- excel_sheets(path="whatever/EstadoContrato.xlsx")
datalist <- list()
i=1
for (i in 1:length(mysheetlist)){
  EstContratos <- read_excel(path="whatever/EstadoContrato.xlsx", sheet = mysheetlist[i])
  EstContratos <- EstContratos %>%
    filter(!row_number() %in% c(1:15))
  EstContratos <- Filter(function(x) !all(is.na(x)), EstContratos)
  colnames(EstContratos) <- c("Cap","UM","CantContr","VrUni","VrUniTot","VrTotalContr",
                              "CantEjec","VrTotalEjec","CantXEjec","VrXEjec")
  EstContratos <- EstContratos %>%
    select(Cap,VrTotalContr,VrTotalEjec,VrXEjec)
  EstContratos$VrTotalContr <- gsub(",", "",EstContratos$VrTotalContr)
  EstContratos$VrTotalEjec <- gsub(",", "",EstContratos$VrTotalEjec)
  EstContratos$VrXEjec <- gsub(",", "",EstContratos$VrXEjec)
  EstContratos <- EstContratos %>% drop_na(Cap)
  cols.name <- c("VrTotalContr","VrTotalEjec","VrXEjec")
  EstContratos[cols.name] <- sapply(EstContratos[cols.name],as.numeric)
  EstContratos[c('CapCod', 'Capítulo')] <- str_split_fixed(EstContratos$Cap, ' ', 2)
  EstContratos <- EstContratos %>%
    select(CapCod,VrTotalContr,VrTotalEjec)
  ResumenContratos <- aggregate(EstContratos[,2:3], by = list(EstContratos$CapCod), sum)
  datalist[[i]] <- ResumenContratos
}
ConsolidadoContratos = do.call(rbind, datalist)
I'm not sure why it says the assigned data has 3 rows when it's the same EstContratos dataframe, which in the second iteration has 1 row. Maybe it's taking the EstContratos dataframe from the first iteration, which does in fact have 3 rows, but I don't know how that's possible.
(Sorry if it's not clear; it's my first time posting a question.)
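If I understand correctly and your intention is to convert those columns to numeric, you can try:
EstContratos[ , cols.name] <- lapply(EstContratos[ , cols.name], as.numeric)
or
EstContratos[ , cols.name] <- apply(EstContratos[ , cols.name], 2, as.numeric)
Note, though, that apply() simplifies a single-row result to a plain vector just as sapply() does, so on a one-row sheet the second version can fail the same way; lapply() always returns a list and is the safer choice.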
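To see why only the single-row sheet fails, compare what sapply() and lapply() return for a three-row and a one-row data frame (the column names x, y and z are made up for the illustration). sapply() gives a 3 x 3 matrix in the first case but only a named vector of length 3 in the second, which is what the tibble rejects as "3 rows"; lapply() returns a list with one element per column either way.

```r
cols <- c("x", "y", "z")
three_rows <- data.frame(x = c("1", "4", "7"), y = c("2", "5", "8"),
                         z = c("3", "6", "9"), stringsAsFactors = FALSE)
one_row <- data.frame(x = "1", y = "2", z = "3", stringsAsFactors = FALSE)

s3 <- sapply(three_rows[cols], as.numeric)  # 3 x 3 matrix
s1 <- sapply(one_row[cols], as.numeric)     # simplified to a named vector
l1 <- lapply(one_row[cols], as.numeric)     # list of three length-1 vectors

dim(s3)        # 3 3
is.matrix(s1)  # FALSE
length(s1)     # 3

# One list element per column, so this works for any number of rows
one_row[cols] <- l1
sapply(one_row, is.numeric)
```

sapply(..., simplify = FALSE) also returns a list, and with dplyr the same conversion could be written as mutate(across(all_of(cols.name), as.numeric)); both are alternatives, not part of the answer above.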

Reverse lookup for loop in R

I have a set of numbers/strings that make up other numbers/strings. I need to create a function that gives me a list of all the numbers/strings needed to create a given number/string.
Consider the following dataset:
ingredients <- c('N/A', 'cat', 'bird')
product <- c('cat', 'bird', 'dog')
data <- data.frame(ingredients, product)
head(data)
If I call the function with dog, I would like it to return bird and then cat. The function should stop when ingredients is N/A (there's nothing more to look up).
It seems like some sort of for loop that appends is the right approach.
needed <- list()
for (product in list){
needed[[product]]<-df
}
df <- dplyr::bind_rows(product)
I amended your initial code to make N/A simply NA, so that I could use the is.na function in my R code. The sample data is now:
ingredients <- c(NA, 'cat', 'bird')
product <- c('cat', 'bird', 'dog')
data <- data.frame(ingredients, product)
Code is below:
ReverseLookup <- function (input) {
  ans <- list()
  while (input %in% data$product) {
    if (!is.na(as.character(data[which(data$product == input),]$ingredients))) {
      ans <- append(ans, as.character(data[which(data$product == input),]$ingredients))
      input <- as.character(data[which(data$product == input),]$ingredients)
    } else {
      break
    }
  }
  print(ans)
}
I create an empty list and then create a while loop that just checks if the input exists in the product column. If so, it then checks to see if the corresponding ingredient to the product input is a non-NA value. If that's the case, the ingredient will be appended to ans and will become the new input. I also added a break statement to get out of the while loop when you reach an NA.
I did a quick test on the case where there is no NA in your dataframe, and it appears to work fine. Maybe someone else here can find a more concise way to write this, but it should work for you.
You could probably find a way to use a tree of some kind to work through the nodes. But using a recursive function in base R, I came up with this.
I have also changed 'N/A' to NA to make life easier, and added stringsAsFactors = F to the data frame.
ingredients <- c(NA, 'cat', 'bird')
product <- c('cat', 'bird', 'dog')
data <- data.frame(ingredients, product, stringsAsFactors = F)
reverse_lookup <- function(data, x, last_result = NULL) {
  if (!is.null(last_result)) {
    x <- data[data$product == last_result[length(last_result)], "ingredients"]
  }
  if (!is.na(x)) {
    last_result <- reverse_lookup(data, x, c(last_result, x))
  }
  last_result
}
This also returns the input, which you can always drop, since it is the first element of the vector.
> reverse_lookup(data, "dog")
[1] "dog" "bird" "cat"
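A loop-based alternative, sketched with match() in place of the repeated which() calls: it follows the chain from the product to its ingredient until it reaches NA, returns only the ingredients (without the input), and also stops if an ingredient repeats, which guards against an infinite loop on circular data. The function name reverse_lookup_iter() is hypothetical.

```r
ingredients <- c(NA, "cat", "bird")
product <- c("cat", "bird", "dog")
data <- data.frame(ingredients, product, stringsAsFactors = FALSE)

reverse_lookup_iter <- function(data, x) {
  needed <- character(0)
  repeat {
    i <- match(x, data$product)             # row of x, or NA if x is not a product
    if (is.na(i)) break
    x <- as.character(data$ingredients[i])  # the ingredient of x
    if (is.na(x) || x %in% needed) break    # stop at NA, or on a cycle
    needed <- c(needed, x)
  }
  needed
}

reverse_lookup_iter(data, "dog")  # [1] "bird" "cat"
```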

R Assigning New Variable based on data

I have data that includes rentals and searches. If a search was made by the same customer who made a rental, and the search was made before the rental, then I want to mark it as a successful search.
Here is part of my data:
time <- c("2019-03-13 14:43:00", "2019-03-13 14:34:00", "2019-03-13 14:23:00")
user <- c("A", "B", "A")
Type <- c("Rental","Search","Search")
data <- cbind(time, user, Type)
I need a new column that marks the third row as successful.
But I have lots of data, so I need to do something like this:
If Type is Search,
and there is a rental up to 2 hours after the search,
and that rental's user name equals the search's user name,
then data$result <- "Successful"
I changed your data because it didn't make sense with your instructions. The time variable you have is a point in time, not a duration, so you either need a duration or two points in time. Also, you said the rental's user name should equal the search's user name, but you only provided one name. Regardless, this is how you would set up an ifelse() as you describe:
time <- c(1:3)
username <- c("A", "B", "A")
rentalname <- c("A", "B", "A")
Type <- c("Rental","Search","Search")
data <- data.frame(time, username, rentalname, Type)
data$result <- ifelse(
data$Type %in% "Search" &
data$time > 2 &
data$username %in% data$rentalname, "Successful" ,"Failure")
If I understand correctly what you want, this should work (it creates a new data frame, success, containing the successful entries):
# assumes `data` is the character matrix built with cbind() in the question
# create new data frame
success <- data.frame(time=character(), user=character(), Type=character(), result=character(), stringsAsFactors=F)
count <- 1
# loop over each user
for(us in unique(data[,"user"])){
  # subset data per user
  subdata <- data[data[,"user"] == us, ]
  # skip the user if there is only one entry for that user or if there is no "Rental" entry in "Type"
  if(is.null(dim(subdata))) next;
  if(!is.null(dim(subdata)) & !any(subdata[,"Type"] == "Rental")) next;
  # sort subdata chronologically
  subdata <- subdata[order(subdata[,"time"]),]
  # loop over rows in the subdata
  for(i in 2:nrow(subdata)){
    # calculate the time difference between entries i and i-1 if i is a rental and i-1 a search
    if(difftime(subdata[i,"time"], subdata[i-1, "time"], units="mins") < 120 & subdata[i-1, "Type"] == "Search" & subdata[i, "Type"] == "Rental"){
      success[count,] <- c(subdata[i,], "success")
      count <- count + 1
    }
  }
}
It works for the small matrix you gave, although you should check that it works properly on a larger one.
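A sketch that applies the three rules from the question directly: a search counts as successful if the same user has a rental between 0 and 120 minutes after it. It uses the question's original data, with two assumptions: the times are parsed as UTC with as.POSIXct(), and data.frame() replaces cbind() so the time column can keep its date-time class. Non-search rows get NA, and the "Failure" label is borrowed from the first answer.

```r
time <- c("2019-03-13 14:43:00", "2019-03-13 14:34:00", "2019-03-13 14:23:00")
user <- c("A", "B", "A")
Type <- c("Rental", "Search", "Search")
# data.frame() instead of cbind(), so time can be a POSIXct column (UTC assumed)
data <- data.frame(time = as.POSIXct(time, tz = "UTC"), user, Type,
                   stringsAsFactors = FALSE)

rentals <- data[data$Type == "Rental", ]

data$result <- vapply(seq_len(nrow(data)), function(i) {
  if (data$Type[i] != "Search") return(NA_character_)
  # minutes from this search to each rental by the same user
  gap <- as.numeric(difftime(rentals$time[rentals$user == data$user[i]],
                             data$time[i], units = "mins"))
  if (any(gap >= 0 & gap <= 120)) "Successful" else "Failure"
}, character(1))

data
```

The per-search loop compares each search against that user's rentals only, so it does not depend on the rows being sorted; for very large data a join-based approach may be faster.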

matching fragment of a column value with another column value in R

I want to match an original ID with a new ID, which is only a fragment of the original ID, and return all of the original IDs. For example, in a data.frame dat, OrigID is a column name. An ID value looks like XXX_X_XXX, and the new ID is only the last portion after the underscore, i.e. XXX. How can I match this?
I'm not sure how to return only the fragment. I think my code returns all hits, not just matches on the portion after the '_', which gives me too many values. I also want to place NA values in the vector wherever the IDs don't match.
Example:
IDdat <- read.csv("OrigID.csv")
data <- read.csv("data.csv")
subjects <- unique(data$ID)
IDlist <- c()
for (i in 1:length(subjects)) {
  OrigID <- grep(subjects[i], IDdat$ID, value = TRUE)
  IDlist <- rbind(IDlist, data.frame(OrigID))
}
Thanks!
We can use grep
grep(new_ID, colnames(dat))
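grep() matches its pattern anywhere in the string, so a fragment such as 101 also hits IDs where those digits appear elsewhere, which would explain the extra values. A sketch of an exact match on the part after the last underscore, with NA where nothing matches; the IDs below are made up, and for the question's data orig_ids and new_ids would presumably be IDdat$ID and unique(data$ID).

```r
orig_ids <- c("ABC_1_101", "DEF_2_102", "GHI_101_103")  # made-up original IDs
new_ids <- c("102", "999", "101")                       # made-up fragments

grep("101", orig_ids, value = TRUE)  # "ABC_1_101" "GHI_101_103": too many hits

# Keep only the part after the last underscore (.* is greedy)
suffix <- sub(".*_", "", orig_ids)   # "101" "102" "103"

# match() returns the position of each fragment among the suffixes, or NA
matched <- orig_ids[match(new_ids, suffix)]
matched                              # "DEF_2_102" NA "ABC_1_101"
```

The result has one entry per new ID, in the same order, so it can be added directly as a column next to the new IDs.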

How to filter for 'any value' in R?

Strange question, but how do I filter a dataframe such that all rows are returned? For example, say you have the following dataframe:
Pts <- floor(runif(20, 0, 4))
Name <- c(rep("Adam",5), rep("Ben",5), rep("Charlie",5), rep("Daisy",5))
df <- data.frame(Pts, Name)
And say you want to set up a predetermined filter for this dataframe, for example:
Ptsfilter <- c("2", "1")
which you then run through the dataframe to get your new, filtered dataframe:
dffil <- df[df$Pts %in% Ptsfilter, ]
At times, however, you don't want the dataframe to be filtered at all, and in the interests of automation and minimising workload, you don't want to have to go back and remove/comment-out every instance of this filter. You just want to be able to adjust the Ptsfilter value such that no rows will be filtered out of the dataframe, when that line of code is run.
I have experimented with guesses like:
Ptsfilter <- c("")
Ptsfilter <- c(" ")
Ptsfilter <- c()
to no avail.
Is there a value I can enter for Ptsfilter that will achieve this goal?
You might need to define a function to do this for you.
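filterDF = function(df, filter){
  if(length(filter) > 0){
    return(df[df$Pts %in% filter, ])
  } else {
    return(df)
  }
}
With Ptsfilter <- c(), which is NULL and has length 0, filterDF(df, Ptsfilter) returns the whole dataframe; any non-empty filter is applied as before.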
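Two other ways to get the "no filter" behaviour without a wrapper function, sketched with the question's df (the set.seed() call is added only to make the example reproducible). A filter that lists every value present keeps all rows; alternatively, NULL can act as a "no filter" marker, because TRUE | x is TRUE for every row.

```r
set.seed(1)  # added only for reproducibility
Pts <- floor(runif(20, 0, 4))
Name <- c(rep("Adam", 5), rep("Ben", 5), rep("Charlie", 5), rep("Daisy", 5))
df <- data.frame(Pts, Name)

# Option 1: a filter containing every value present keeps every row
Ptsfilter <- unique(df$Pts)
dffil <- df[df$Pts %in% Ptsfilter, ]

# Option 2: NULL means "no filter"; is.null(NULL) | x is TRUE for all rows
Ptsfilter <- NULL
dffil2 <- df[is.null(Ptsfilter) | df$Pts %in% Ptsfilter, ]

# The same line still filters when Ptsfilter has values
Ptsfilter <- c("2", "1")
dffil3 <- df[is.null(Ptsfilter) | df$Pts %in% Ptsfilter, ]
```

Option 2 keeps a single filtering line in the script, so switching filtering off only means setting Ptsfilter to NULL.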
