Adding incomplete columns to a table

I'm new to R and picking it up pretty quickly, I think, but I've hit a wall and I'm not even sure what to google to figure this out for myself.
In the code excerpt below, I'm adding a few calculated columns to the table ALLDATA. The problem is with the last line. If every entry in ALLDATA has an associated QCAnalysisNumber, the code works fine. If only SOME of the entries have a QCAnalysisNumber, that column doesn't populate at all. I would like it to find the appropriate QCAnalysisNumber and, if it can't, just be NA or let me insert text like "No QCAnalysisNumber".
Can you guys tell me where I'm going wrong or point me in the right direction? Even just appropriate search terms for Google would be a huge help. Thanks!
ALLDATA$IntResult <- round(ALLDATA$Value, 0)
ALLDATA$ComboResult <- ifelse(toupper(ALLDATA$DetectedResult)=="N", ALLDATA$Value/2, round(ALLDATA$Value, 0))
ALLDATA$ND15Result <- ifelse(toupper(ALLDATA$DetectedResult)=="N", ALLDATA$Value/2, ALLDATA$Value)
ALLDATA$LogComboResult <- ifelse(ALLDATA$DetectedResult=="N", log10(abs(ALLDATA$Value/2)), log10(abs(ALLDATA$Value)))
ALLDATA$LogResult <- log10(abs(ALLDATA$Value))
ALLDATA$QCAnalysisNumber <- ALLDATA$AnalysisNumber[ALLDATA$QCSampleCode!="O" &
    ALLDATA$LongName==ALLDATAQC$LongName &
    ALLDATA$SampleDate_D==ALLDATAQC$SampleDate_D]
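
One possible direction, sketched here under assumptions rather than offered as a definitive fix: `==` compares the two tables position by position (recycling the shorter one), so it never actually looks a row up in ALLDATAQC. A keyed lookup with `match()` returns NA wherever no QC row exists. The toy data below is made up, the sketch assumes ALLDATAQC carries its own AnalysisNumber column, and it leaves out the QCSampleCode filter from the original line. Useful search terms are "R match lookup" and "R merge left join".

```r
# Toy stand-ins for the real tables (all values are made up for illustration)
ALLDATA <- data.frame(
  LongName     = c("Site A", "Site B", "Site C"),
  SampleDate_D = c("2015-01-05", "2015-01-06", "2015-01-07"),
  stringsAsFactors = FALSE
)
ALLDATAQC <- data.frame(
  LongName       = c("Site A", "Site C"),
  SampleDate_D   = c("2015-01-05", "2015-01-07"),
  AnalysisNumber = c(101, 103),  # assumed column holding the QC analysis number
  stringsAsFactors = FALSE
)

# Build one lookup key per row from the columns that must agree
key_all <- paste(ALLDATA$LongName, ALLDATA$SampleDate_D, sep = "|")
key_qc  <- paste(ALLDATAQC$LongName, ALLDATAQC$SampleDate_D, sep = "|")

# match() gives the position of the first matching QC row, or NA if there is none
ALLDATA$QCAnalysisNumber <- ALLDATAQC$AnalysisNumber[match(key_all, key_qc)]

# Optional: a text label instead of NA (this makes the column character)
ALLDATA$QCLabel <- ifelse(is.na(ALLDATA$QCAnalysisNumber),
                          "No QCAnalysisNumber",
                          as.character(ALLDATA$QCAnalysisNumber))
```

The result always has one value per row of ALLDATA, so the assignment works whether or not every entry has a QC match.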

Related

How to use partial matches across multiple columns in R to set final value

I am new to R, moving over from Excel VBA. I would like to categorize a final value based on the text provided in multiple columns and 20k+ rows.
I've been semi-successful with "if" and "identical", but have struggled with partial matches using "grep".
I'll share pseudo-code of what I'm trying to achieve:
If d$Removal_Reason_Code contains "SCH" AND
If d$Shop_Action_Code is an exact match to "Test" AND
If d$Repair_Summary contains "No Fault Found"
Then
set d$Category to "NFF"
Else
go back to row 1 and check against other keywords
I can post the working VBA code if that is helpful. I'm just getting my head round how R works, and was hoping it may be a quick and easy answer for one of you gurus!
Much appreciated :)

We can use grepl for partial matches and == for the exact match, then assign only the rows where all three conditions hold:
i1 <- with(d, grepl("SCH", Removal_Reason_Code) & Shop_Action_Code == "Test" &
    grepl("No Fault Found", Repair_Summary))
d$Category[i1] <- "NFF"
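
To see how this extends to "check against other keywords", here is a small sketch on made-up data. The default label "Other" and the second rule ("Repaired") are hypothetical, added only to illustrate chaining rules so that a later rule does not overwrite an earlier one.

```r
# Made-up rows for illustration only
d <- data.frame(
  Removal_Reason_Code = c("SCH-01", "FAIL-02", "SCH-03", "SCH-04"),
  Shop_Action_Code    = c("Test", "Test", "Repair", "Test"),
  Repair_Summary      = c("No Fault Found on bench", "No Fault Found",
                          "No Fault Found", "Replaced seal"),
  stringsAsFactors = FALSE
)

# Start from a default so every row has a value (hypothetical label)
d$Category <- "Other"

# Rule 1: partial match AND exact match AND partial match
i1 <- grepl("SCH", d$Removal_Reason_Code, fixed = TRUE) &
  d$Shop_Action_Code == "Test" &
  grepl("No Fault Found", d$Repair_Summary, fixed = TRUE)
d$Category[i1] <- "NFF"

# Rule 2 (hypothetical): only applied to rows not categorised yet
i2 <- d$Category == "Other" & d$Shop_Action_Code == "Repair"
d$Category[i2] <- "Repaired"
```

Because each rule is a logical vector over all rows, there is no need to loop row by row as in VBA; 20k+ rows are handled in one pass per rule.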

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc, etc
Each of them holds the correct output from the function (I've checked), however my next step is to combine them into one big vector which holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
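
A common way around this, sketched under assumptions: skip assign() and collect the results in a named list with lapply(), then flatten it with unlist(). The function `get_result` below is a hypothetical stand-in for "FUNCTION GOES HERE", and only three of the identifiers are used to keep the example short.

```r
url_num <- c('85054655', '85023543', '85001177')

# Hypothetical stand-in for the real function: returns the last two digits
get_result <- function(id) as.numeric(id) %% 100

# One list element per identifier, named like the original variables
results <- lapply(url_num, get_result)
names(results) <- paste0("test_", url_num)

# Flatten to a single vector; this follows url_num automatically
combined <- unlist(results, use.names = FALSE)
```

Individual results stay reachable as results[["test_85054655"]], and adding an identifier to url_num needs no other change. If the test_* variables already exist in the workspace, unlist(mget(paste0("test_", url_num))) collects them by name.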

Web Scraping with R in ATPWORLDTOUR

I'm trying to scrape whether the player is right-handed or left-handed from this page (http://www.atpworldtour.com/en/players/novak-djokovic/d643/fedex-atp-win-loss). I used the following code to scrape this info (1603.html is the saved page):
library(XML)
y <- htmlParse('1603.html')
x <- xpathApply(y,"//div[@class='player-profile-hero-table']")
sapply(x,xmlValue)
The code returns me the following:
"Age\r\n\t\t\t\t\t\t\t\t\r\n28\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\t\t\t(1987.05.22)\r\n\t\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\tTurned Pro\r\n\t\t\t\t\t\t\t\t\r\n2003\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\tWeight\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\t172lbs(78kg)\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\tHeight\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\t6'2\"(188cm)\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\r\n\t\t\tBirthplace\r\n\t\t\r\n\t\t\r\n\t\t\tBelgrade, Serbia\r\n\t\t\r\n\t\r\n\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\tResidence\r\n\t\t\t\t\t\t\t\tMonte-Carlo, Monaco\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\r\n\t\t\tPlays\r\n\t\t\r\n\t\t\r\nRight-Handed, Two-Handed Backhand\t\t\r\n\t\r\n\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\tCoach\r\n\t\t\r\n\t\t\tBoris Becker, Marian Vajda"
What can I do to remove all these t's and r's in the middle of the result? To find out whether the player is right-handed or left-handed, I think x should be defined as: x <- xpathApply(y,"//table[@width='570']"). What should I do?

One solution is to use the wonderful readHTMLTable() to get all the tables from the page, then select the correct table and cell for the information.
This function takes the URL and returns "R" for right-handed, "L" for left-handed. It does that by selecting the first table, second row of the third column, and then uses substr to grab the 16th character. You can adapt it to whatever you like.
library(XML)
scrapey <- function(URL) {
  x <- readHTMLTable(URL, header = FALSE, stringsAsFactors = FALSE)
  # First table, row 2, column 3; the 16th character is "R" or "L"
  substr(x[[1]][2, 3], 16, 16)
}
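
As for the t's and r's in the question: those are tab (\t) and carriage-return (\r) characters, not letters. A minimal base-R sketch for collapsing them, using a shortened copy of the string from the question (the variable names are made up for illustration):

```r
# Shortened excerpt of the xmlValue() output shown in the question
raw <- "Plays\r\n\t\t\r\nRight-Handed, Two-Handed Backhand\t\t\r\n"

# Collapse every run of whitespace (tabs, CR, LF, spaces) into one space
clean <- trimws(gsub("[[:space:]]+", " ", raw))

# Pull out the word in front of "-Handed" that follows "Plays"
hand <- sub(".*Plays ([A-Za-z]+)-Handed.*", "\\1", clean)
```

Here clean becomes "Plays Right-Handed, Two-Handed Backhand" and hand becomes "Right". Matching on the "Plays" label is less fragile than a fixed character position, which shifts if the surrounding whitespace changes.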

how to prompt user to remove multiple columns using the readline() in R

I am trying to write code that lets the user decide which columns to remove from a table in R. The steps I am trying to perform are as follows:
1) print the column headers of the table
2) ask the user if they want to remove any columns. If the answer is yes, proceed to remove columns. This is in a loop, in case the user wants to remove multiple columns.
3) once the user is done removing columns, I want the modified table (with unwanted columns removed) to be returned so that it can be used later in script.
4) if the user does not want to remove any columns at all, they can just proceed, and the table is returned with no columns missing.
I am having 2 major issues/questions with my code as I currently have it:
1) The loop only works once (only one column is removed). The loop does run (it keeps prompting me if I keep answering "Y"), but in the end the returned object only has 1 column removed (the first column I removed when the loop began). I tried to find out whether there is a way to have the user enter multiple inputs using readline, but the answers I found did not really help me.
2) If I don't want to remove any columns, and I enter "no" the first time I'm prompted for input, something very strange happens: what is returned is a table with the first column removed.
I am still a newbie at coding, and I realize this may not be the best way to do what I want to do. I appreciate any advice/feedback!
my_data<-read.table(file.choose(),header=TRUE)
print(names(my_data))
for (column in my_data) {
  remove_columns<-readline("Would you like to remove any columns? \n")
  if(remove_columns=="Y" || remove_columns=="y") {
    my_data_new<-my_data[,-!names(my_data) %in% c(readline("Which columns would you like to remove? \n"))]
  } else {
    return(my_data_new)
  }
}

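I think you're looking for a while loop. Two other things need fixing: the result has to be assigned back to my_data so each pass builds on the last, and the index should be !names(my_data) %in% ... without the leading minus. With the minus, the logical vector is turned into 0s and -1s, which drops the first column no matter what was typed.
my_data <- read.table(file.choose(), header = TRUE)
print(names(my_data))
while (TRUE) {
  remove_columns <- readline("Would you like to remove any columns? \n")
  if (remove_columns == "Y" || remove_columns == "y") {
    # readline() returns a single string, so one column name per prompt
    drop_col <- readline("Which columns would you like to remove? \n")
    # Keep every column whose name was not entered
    my_data <- my_data[, !names(my_data) %in% drop_col, drop = FALSE]
  } else {
    break
  }
}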
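
If several columns should be removed from one answer, the removal step can be pulled into a small helper that splits the typed text on commas. This is only a sketch: `drop_columns` is a hypothetical name, and the comma-separated input format is an assumption, not something the question specifies.

```r
# Hypothetical helper: remove the columns named in a comma-separated string
drop_columns <- function(df, input) {
  # Split "a, c" into c("a", "c") and discard empty pieces
  cols <- trimws(strsplit(input, ",", fixed = TRUE)[[1]])
  cols <- cols[nzchar(cols)]
  # Logical index keeps columns not listed; drop = FALSE keeps a data frame
  df[, !names(df) %in% cols, drop = FALSE]
}

# Made-up table for illustration
demo <- data.frame(a = 1:2, b = 3:4, c = 5:6)
kept_one  <- drop_columns(demo, "a, c")
untouched <- drop_columns(demo, "")
```

Inside the loop this would be used as my_data <- drop_columns(my_data, readline("Which columns would you like to remove? \n")). Names that do not exist in the table are simply ignored.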

readHTMLTables -- Retrieving Country Names and urls of articles related to the heads of governments

I'd like to make a map of the current world presidents.
For this, I want to scrape the images of each president from wikipedia.
The first step is getting data from the wiki page:
http://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government
I'm having trouble getting the country names and president page URLs because the table has rowspans.
For the moment, my code looks like the below, but it doesn't work because of the row spanning.
library(XML)
u = "http://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"
doc = htmlParse(u)
tb = getNodeSet(doc, "//table")[[3]]
stateNames <- readHTMLTable(tb)$State
presidentUrls <- xpathSApply(tb, "//table/tr/td[2]/a[2]/@href")
Any idea welcome!
Mat

If the table is heterogeneous, I don't think the problem can be handled with a single line of code. In your case, some td elements have colspan=2 while the others don't, so the two kinds of rows can be selected and processed separately with filters like the following:
nations1 <- xpathSApply(tb, "//table/tr[td[@colspan='2']]/td[1]/a/text()")
nations2 <- xpathSApply(tb, "//table/tr[count(td)=3]/td[1]/a/text()")
Should you meet other kinds of rows in the table, keep in mind that XPath offers more predicates and functions than the two used here.
