Extract column value based on row index - r

Please help a newbie here. I want to extract the TOTAL_SITE information from the sites1 data frame. The data frame looks like this:
FACILITY TOTAL_SITE
A 100
B 80
C 20
if(nrow(sites1) > 0){
outStr <- "<ul>"
for(site in sites1$FACILITY){
outStr <- paste0(outStr,"<li>",site,": ", sites1$TOTAL_SITE, "</li>")
}
outStr <- paste0(outStr, "</ul>")
} else {
outStr <- ""
}
However, the result repeats every TOTAL_SITE value on each line, which suggests that I shouldn't loop over sites1$FACILITY while referring to the whole sites1$TOTAL_SITE column. How can I use an index (row number) that corresponds to both columns of the sites1 data frame?

This will get the TOTAL_SITE data from the df (not sure if this is what you mean?)
TotalSiteData<-sites1["TOTAL_SITE"]
This returns a one-column data frame, so it keeps the corresponding row numbers along with the data you're after.

If I understood correctly, you need to make an HTML unordered list from TOTAL_SITE column, correct? Hopefully this solves your problem:
total_site <- sites1[,"TOTAL_SITE"]
outStr <- sapply(total_site, function(value){
paste("<li>", value, "</li>", sep = "")
})
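# collapse the <li> items into one string before wrapping them in a single <ul>
outStr <- paste0("<ul>", paste(outStr, collapse = ""), "</ul>")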

I believe this is what you are asking for:
if(nrow(sites1) > 0)
outStr <- paste0("<ul>",
paste0("<li>", sites1$FACILITY, ": ", sites1$TOTAL_SITE, "</li>",collapse=""),
"</ul>"
)
else
outStr <- ""
This code will take your data.frame's columns and paste them together by rows, assigning outStr the following character vector in your example:
<ul><li>A: 100</li><li>B: 80</li><li>C: 20</li></ul>
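If you would rather keep the explicit loop from the question, a minimal sketch of the index-based version is below. It rebuilds the example data frame from the question (so the values are the ones shown there) and loops over row numbers with seq_len(), which also behaves correctly when the data frame has zero rows.

```r
# Example data copied from the question
sites1 <- data.frame(
  FACILITY   = c("A", "B", "C"),
  TOTAL_SITE = c(100, 80, 20),
  stringsAsFactors = FALSE
)

outStr <- ""
if (nrow(sites1) > 0) {
  outStr <- "<ul>"
  # i is the row number, so both columns are read from the same row
  for (i in seq_len(nrow(sites1))) {
    outStr <- paste0(outStr, "<li>", sites1$FACILITY[i], ": ",
                     sites1$TOTAL_SITE[i], "</li>")
  }
  outStr <- paste0(outStr, "</ul>")
}
outStr
```

The vectorised paste0(..., collapse = "") version above is shorter, but the loop makes the row-by-row pairing explicit.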

Related

Removing a row by string-matching in R regardless of whether it exists or not

I am trying to remove a row in a dataframe based on string matching. I'm using:
data <- data[- grep("my_string", data$field1),]
When there's an actual row with the value "my_string" in data$field1, this works as expected and drops that row. However, if there is no string "my_string", it creates an empty dataframe. How do I write this so that it allows for the possibility of the string not existing, and still keeps my data frame intact?
It may be better to use grepl and negate with !
data[!grepl("my_string", data$field1),]
Or another option is setdiff on grep
data[setdiff(seq_len(nrow(data)), grep("my_string", data$field1)),]
You can use a plain if statement.
df <- data.frame(fieled = c("my_string", "my_string_not", "something", "something_else"),
numbers = 1:4)
result <- grep("gabriel", df$fieled)
if (length(result))
{
df <- df[- result, ]
}
df
result <- grep("my_string", df$fieled)
if (length(result))
{
df <- df[- result, ]
}
df
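To see why the original line empties the data frame, here is a small sketch on made-up data (the column values are hypothetical). When nothing matches, grep() returns integer(0), and negating an empty integer vector is still empty, so the subset selects zero rows instead of dropping zero rows. A logical mask from grepl() does not have this problem.

```r
# Hypothetical data with no "my_string" in field1
data <- data.frame(field1 = c("a", "b", "c"), n = 1:3,
                   stringsAsFactors = FALSE)

hits <- grep("my_string", data$field1)   # no match, so an empty integer vector
dropped <- data[-hits, ]                 # -integer(0) is integer(0): selects no rows
kept <- data[!grepl("my_string", data$field1), ]  # logical mask keeps every row

nrow(dropped)
nrow(kept)
```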

How to spot a phrase/word in a cell of a dataframe, using R

Having a data.frame like 'df', I would like to spot the exact phrase "keratinization [GO:0031424]" in each cell of the column 'bio_process'. Afterwards, I want to create a new vector with the 'ID' of the observations where the match occurred.
ID <- c("Q9BYP8", "Q17RH7", "Q6L8G8", "Q9BYR4")
bio_process <- c("keratinization [GO:0031424]", "NA", "keratinization [GO:0031424]", "aging [GO:0007568]; hair cycle [GO:0042633]; keratinization [GO:0031424]")
df <- as.data.frame(cbind(ID,bio_process))
In order to achieve this, I applied a for loop and used %in% inside the loop, like this:
n <- 4
ids <- vector(mode = "character", length = n)
for (i in 1:n) {
if ("keratinization [GO:0031424]" %in% df$bio_process[i]) {
ids[i] <- df$ID[i]
}
}
As a result I would like the content of 'ids' vector to be like this one below.
"Q9BYP8" "Q6L8G8" "Q9BYR4"
However, %in% does not work for the cells where 'keratinization [GO:0031424]' is not the only content.
Any ideas? Thank you
You can use grepl in base R:
df$ID[grepl("keratinization \\[GO:0031424\\]",df$bio_process)]
[1] Q9BYP8 Q6L8G8 Q9BYR4
Note that I had to escape the [ and ] characters with \\ because square brackets have a special meaning in regex.
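An alternative that avoids escaping altogether is to pass fixed = TRUE, which makes grepl() treat the pattern as a literal string rather than a regular expression. A minimal sketch using the data from the question (built with stringsAsFactors = FALSE here so that ID stays a character vector):

```r
ID <- c("Q9BYP8", "Q17RH7", "Q6L8G8", "Q9BYR4")
bio_process <- c("keratinization [GO:0031424]", "NA",
                 "keratinization [GO:0031424]",
                 "aging [GO:0007568]; hair cycle [GO:0042633]; keratinization [GO:0031424]")
df <- data.frame(ID, bio_process, stringsAsFactors = FALSE)

# fixed = TRUE: literal substring match, so [ and ] need no escaping
ids <- df$ID[grepl("keratinization [GO:0031424]", df$bio_process, fixed = TRUE)]
ids
```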

Grab string from table and append as column in R

I have the following .csv file:
https://drive.google.com/open?id=0Bydt25g6hdY-RDJ4WG41VFpyX1k
I would like to take the date and the agent name (pasting its constituent parts together) and append them as columns to the right of the table, repeating them on each row until a different name and date is found, and then doing the same for the remaining name and date items, to get the following result:
The only thing I have been able to do with the dplyr package is the following:
library(dplyr)
library(stringr)
report <- read.csv(file ="test15.csv", head=TRUE, sep=",")
date_pattern <- "(\\d+/\\d+/\\d+)"
date <- str_extract(report[,2], date_pattern)
report <- mutate(report, date = date)
Which gives me the following result:
The difficulty I am having is probably in using conditionals to make the script pick up the appropriate string and append it as a column at the end of the table.
This might be crude, but I think it illustrates several things: a) setting stringsAsFactors=F; b) "pre-allocating" the columns in the data frame; and c) using the column name instead of column number to set the value.
report<-read.csv('test15.csv', header=T, stringsAsFactors=F)
# first, allocate the two additional columns (with NAs)
report$date <- rep(NA, nrow(report))
report$agent <- rep(NA, nrow(report))
# step through the rows
for (i in 1:nrow(report)) {
# grab current name and date if "Agent:"
if (report[i,1] == 'Agent:') {
currDate <- report[i+1,2]
currName=paste(report[i,2:5], collapse=' ')
# otherwise append the name/date
} else {
report[i,'date'] <- currDate
report[i,'agent'] <- currName
}
}
write.csv(report, 'test15a.csv')
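For comparison, the same fill-down idea can be sketched without a loop. The real CSV is not reproduced here, so the toy data frame below is purely hypothetical (columns V1 to V3, a two-part name, the date on the row after each "Agent:" row); it also assumes the first row is an "Agent:" row. Unlike the loop above, this version fills the header rows as well.

```r
# Hypothetical stand-in for the report: an "Agent:" row, then a date row, then data rows
report <- data.frame(
  V1 = c("Agent:", "Date:", "call", "call", "Agent:", "Date:", "call"),
  V2 = c("Ann", "01/02/2017", "x", "y", "Bob", "03/02/2017", "z"),
  V3 = c("Lee", "", "", "", "Ray", "", ""),
  stringsAsFactors = FALSE
)

is_agent <- report$V1 == "Agent:"
idx <- which(is_agent)       # rows that start a new agent block
block <- cumsum(is_agent)    # block number for every row: 1 1 1 1 2 2 2

# One name and one date per block, then spread them over the rows by block number
agent_names <- unname(apply(report[idx, c("V2", "V3")], 1, paste, collapse = " "))
agent_dates <- report$V2[idx + 1]

report$agent <- agent_names[block]
report$date <- agent_dates[block]
report
```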

Putting a for loop result into Data frame

I am trying to create a list out of the result from a for loop in R.
Then I want to use cbind to add the list into the dataframe.
When I run this code, it does not work.
Can you please help?
GenSpc <- list()
for(i in 1:68) {
paste(NewtableAllLoci$host_genus[i], NewtableAllLoci$host_species[i], collapse = " ")
}
You do not assign anything to the target list GenSpc inside the loop. To get what you want, do:
GenSpc <- rep(0, nrow(NewtableAllLoci))
for(i in 1:nrow(NewtableAllLoci)) {
GenSpc[i] <- paste(NewtableAllLoci$host_genus[i], NewtableAllLoci$host_species[i], collapse = " ")
}
D <- cbind(NewtableAllLoci, GenSpc)
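Since paste() is vectorised, the loop can also be dropped entirely. A minimal sketch, using a small made-up NewtableAllLoci (the two rows are hypothetical; only the column names come from the question):

```r
# Hypothetical stand-in for the real table
NewtableAllLoci <- data.frame(
  host_genus   = c("Homo", "Mus"),
  host_species = c("sapiens", "musculus"),
  stringsAsFactors = FALSE
)

# paste() works element-wise on whole columns; the default separator is a space
NewtableAllLoci$GenSpc <- paste(NewtableAllLoci$host_genus,
                                NewtableAllLoci$host_species)
NewtableAllLoci
```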

How to filter for 'any value' in R?

Strange question, but how do I filter such that all rows of a dataframe are returned? For example, say you have the following dataframe:
Pts <- floor(runif(20, 0, 4))
Name <- c(rep("Adam",5), rep("Ben",5), rep("Charlie",5), rep("Daisy",5))
df <- data.frame(Pts, Name)
And say you want to set up a predetermined filter for this dataframe, for example:
Ptsfilter <- c("2", "1")
Which you will then run through the dataframe, to get your new filtered dataframe
dffil <- df[df$Pts %in% Ptsfilter, ]
At times, however, you don't want the dataframe to be filtered at all, and in the interests of automation and minimising workload, you don't want to have to go back and remove/comment-out every instance of this filter. You just want to be able to adjust the Ptsfilter value such that no rows will be filtered out of the dataframe, when that line of code is run.
I have experimented/guessed with things like:
Ptsfilter <- c("")
Ptsfilter <- c(" ")
Ptsfilter <- c()
to no avail.
Is there a value I can enter for Ptsfilter that will achieve this goal?
You might need to define a function to do this for you.
filterDF = function(df,filter){
if(length(filter)>0){
return(df[df$Pts %in% filter, ])
}
else{
return(df)
}
}
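A short usage sketch of that idea, with the function restated so the example runs on its own and a small fixed data frame in place of the random one from the question (the values are made up). An empty filter such as c() then acts as the "no filtering" setting:

```r
# Same logic as the function above: an empty filter means "keep everything"
filterDF <- function(df, filter) {
  if (length(filter) > 0) df[df$Pts %in% filter, ] else df
}

# Hypothetical fixed data instead of runif(), so the result is reproducible
df <- data.frame(Pts = c(0, 1, 2, 3, 2),
                 Name = c("Adam", "Adam", "Ben", "Ben", "Charlie"))

filtered <- filterDF(df, c("2", "1"))   # keeps rows where Pts is 1 or 2
unfiltered <- filterDF(df, c())         # c() is NULL, length 0: all rows returned

nrow(filtered)
nrow(unfiltered)
```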
