I want to update a column so that only the part of each string after the last '.' is stored. I wrote code that does this, but it only works when it's given a single input. How do I apply it to all the rows of my data frame?
A value in one row looks like this, for example. I only want to store the last part, "gif":
GET /./enviro/gif/emcilogo.gif
I wrote the following code, which successfully does this:
tail(c(do.call(rbind, strsplit(as.character(sapply(strsplit("GET /./enviro/gif/emcilogo.gif", "\\s+"), `[`, 2)),"\\."))), n=1)
Output:
"gif"
However, this uses the literal string "GET /./enviro/gif/emcilogo.gif" as input. As soon as I replace it with the column of my data frame, df$request, I get this error:
Error in strsplit(epa.df$request) :
argument "split" is missing, with no default
I tried writing a function that loops through my column values one by one and updates them. However, I can't seem to get it working.
Any help would be highly appreciated!
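The error message says that strsplit() was called without its split argument. Beyond that, sub() and strsplit() already operate on whole vectors, so an explicit loop should not be needed. Below is a minimal sketch of one vectorized approach, assuming the column is a character vector shaped like the example above. The data frame df, its second row, and the new column ext are made up for illustration.

```r
# Hypothetical data frame standing in for the real one
df <- data.frame(
  request = c("GET /./enviro/gif/emcilogo.gif", "GET /docs/index.html"),
  stringsAsFactors = FALSE
)

# The greedy ".*\\." matches everything up to and including the last dot,
# so sub() leaves only the text that follows it
df$ext <- sub(".*\\.", "", df$request)
df$ext
```

Note that a string containing no dot at all is returned unchanged by sub(), so such rows may need separate handling.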
I am new to R, moving over from Excel VBA. I would like to categorize a final value based on the text provided in multiple columns and 20k+ rows.
I've been semi-successful with "if" and "identical" but have struggled with partial matches using "grep".
I'll share pseudo-code of what I'm trying to achieve:
If d$Removal_Reason_Code contains "SCH" AND
If d$Shop_Action_Code is an exact match to "Test" AND
If d$Repair_Summary contains "No Fault Found"
Then
set d$Category to "NFF"
Else
go back to row 1 and check against other keywords
I can post the working VBA code if that is helpful. I'm just getting my head round how R works, and was hoping it may be a quick and easy answer for one of you gurus!
Much appreciated :)
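We can use grepl for partial matches and == for the exact match:
# TRUE for rows that satisfy all three conditions
i1 <- with(d, grepl("SCH", Removal_Reason_Code) & Shop_Action_Code == "Test" &
           grepl("No Fault Found", Repair_Summary))
# Assign the category only to the matching rows
d$Category[i1] <- "NFF"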
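To make this concrete, here is a small self-contained sketch on made-up data. The column names come from the question, but the row values are invented for illustration. It assumes the keywords are literal strings, hence fixed = TRUE.

```r
# Toy data; the values are hypothetical
d <- data.frame(
  Removal_Reason_Code = c("SCH-01", "UNS-02", "SCH-03"),
  Shop_Action_Code    = c("Test", "Test", "Repair"),
  Repair_Summary      = c("No Fault Found on bench", "No Fault Found", "Replaced seal"),
  stringsAsFactors    = FALSE
)
d$Category <- NA_character_

# Each condition is a logical vector over all rows; & combines them element-wise
nff <- grepl("SCH", d$Removal_Reason_Code, fixed = TRUE) &
  d$Shop_Action_Code == "Test" &
  grepl("No Fault Found", d$Repair_Summary, fixed = TRUE)

d$Category[nff] <- "NFF"
d$Category
```

Only the first row meets all three conditions, so it is the only one labelled. Further keyword rules can be added the same way, each with its own logical index.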
I am new to R and I have trouble understanding how indexing a data frame works.
# Find indices of NAs in Max.Gust.SpeedMPH
ind <- which(is.na(weather6$Max.Gust.SpeedMPH))
# Look at the full rows for records missing Max.Gust.SpeedMPH
weather6[ind, ]
My code here works, no problem but I don't understand why weather6[ind] won't display the same thing as weather6[ind, ] . I got very lucky and mistyped the first time.
I apologize in advance that the question might have been posted somewhere else, I searched and couldn't find a proper answer.
So [ is a function just like any other function in R, but we call it with unusual syntax. Another way to write weather6[ind, ] would be:
`[.data.frame`(weather6, ind, )
and weather6[ind] corresponds to:
`[.data.frame`(weather6, ind)
The first three arguments to the function are named x, i and j. If you look at the code, early on it branches with the line:
if (Narg < 3L)
Putting in the extra comma tells R that you've called the function with 3 arguments, but that the j argument is "missing". Without the comma, you have only 2 arguments, and the function code moves on to the next [ method, the one for lists. Since a data frame is a list of columns, ind then selects columns rather than rows.
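A small sketch of the difference on a made-up data frame (df and its columns are hypothetical, not the weather6 data):

```r
# Toy data frame with one missing value in column a
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))

ind <- which(is.na(df$a))  # position of the NA, here 2

rows <- df[ind, ]  # with the comma: ind is a row index
cols <- df[ind]    # without the comma: ind is a column index (list-style)

rows
cols
```

Here rows holds the second row of df, while cols holds the second column, b, still wrapped in a data frame. With a larger index than there are columns, the comma-less form would fail with an "undefined columns selected" error.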
I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). My next step is to combine them into one big vector that holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
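One possible way to avoid maintaining the c(...) call by hand is to skip assign() and collect the results in a named list from the start. The sketch below assumes each call returns a single value. fetch_result is a hypothetical stand-in for the real function, and only three of the identifiers are used.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical stand-in for the real function
fetch_result <- function(id) nchar(id)

# One list element per identifier, named like the original variables
results <- lapply(url_num, fetch_result)
names(results) <- paste0("test_", url_num)

# Flatten the list into a single named vector
combined <- unlist(results)
combined
```

Adding an identifier to url_num is then the only change needed. If the test_ variables have to exist as separate objects anyway, mget(paste0("test_", url_num)) should gather them into a similar list.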
I am working with the RElasticSearch package in R and can connect to the proper index in ElasticSearch. Suppose my index contains two fields, id and name. Two of my R variables, say rid and rname, contain the values I want to search for. How should I use the searchES method to accomplish this? I have tried:
searchES(server=es.index,query="id":rid & "name":rname)
but it keeps throwing an error! Can someone please help me out?
You need to build your query as a single character value for this to work. To concatenate strings in R, use paste() or paste0(). For example:
searchES(server=es.index,query=paste0("id:", rid, " AND name:", rname))
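To check what string is actually being sent, the query can be built and printed on its own first. This sketch only shows the paste0() step; the values of rid and rname are made up.

```r
# Hypothetical search values
rid <- 42
rname <- "alice"

# paste0() joins its arguments with no separator
query <- paste0("id:", rid, " AND name:", rname)
query
```

With these values the result is the single string "id:42 AND name:alice", which is the form passed to the query argument above.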
I am a new R user and having some difficulty when trying to rename certain records in a column.
My data have columns named classcode and fish_tl, among others. Classcode is a character value, fish_tl is numeric.
When classcode='OCAL' and fish_tl<20, I need to rename that value of classcode so that it is now "OCALYOY". I don't want to change any of the other records in classcode.
I'm running the following code:
data$classcode<-ifelse(data$classcode=='OCAL'& data$fish_tl<20,
'OCALYOY',data$classcode)
My problem seems to be with the "else" aspect: the code runs fine, and returns 'OCALYOY' as expected, but the other values of classcode have now been converted to numeric (although when I look at the mode of that field, it still returns as "character").
What am I doing wrong?
Thanks very much!
You can make the else part as.character(data$classcode). ifelse has some odd semantics with regard to the classes of its arguments, and it is turning your factor into its underlying numeric representation. as.character will keep it as a character value.
You may be getting tripped up in a factor vs character issue, though you point out that R thinks it's character. Regardless, wrapping as.character() around your code seems to fix the problem for me:
> ifelse(data$classcode=='OCAL'& data$fish_tl<20,
+ 'OCALYOY',as.character(data$classcode))
#-----
[1] "BFRE" "BFRE" "BFRE" "HARG" "OCALYOY" "OYT" "OYT" "PFUR"
[9] "SPAU" "BFRE" "OCALYOY" "OCAL"
If this isn't it, can you make your question reproducible by adding the output of dput() to your question instead of the text representation?
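A minimal sketch of the factor behaviour on made-up data, assuming classcode is stored as a factor (the three rows below are hypothetical):

```r
# Toy data with classcode deliberately stored as a factor
data <- data.frame(
  classcode = factor(c("BFRE", "OCAL", "OCAL")),
  fish_tl   = c(30, 10, 25)
)

small_ocal <- data$classcode == "OCAL" & data$fish_tl < 20

# Passing the factor directly: its integer codes leak into the result
bad <- ifelse(small_ocal, "OCALYOY", data$classcode)

# Converting to character first keeps the original labels
good <- ifelse(small_ocal, "OCALYOY", as.character(data$classcode))

bad
good
```

Both results are character vectors, which would explain why the mode still reports "character", but only good keeps the labels "BFRE" and "OCAL" for the unchanged rows.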