Replace Value in a Column Using a Loop and Custom Function - R

I have a data.frame with a column (named "color") in which every value is "black". I have also written a function that replaces "black" with another color depending on the value in another column (the "growth" column). I need a loop that uses this function to replace the values in the "color" column according to the "growth" value.
# Create a function
check_it <- function(x)
if(x>500){
return("green")
} else if(x<0) {
return("red")
} else {
return("blue")
}
# Create a loop using check_it
for(x in 1:nrow(all_data)) {
...
# Given this hint:
# You can use 1:nrow(all_data) as a set of indices
# to do something like the following inside the loop:
# all_data[i, "color"] <-
# check_it( all_data[i, "growth"] )
Any suggestions?
SAMPLE DATA
| station_id | timestamp | growth.x | growth.y | color |
--------------------------------------------------------
| DB1 | 1/14/01 | 59.916 | 59.9164 | black |
--------------------------------------------------------
| DB1 | 1/14/02 | 316.128 | 316.128 | black |
--------------------------------------------------------
| DB1 | 1/14/03 | -12.456 | -12.456 | black |
--------------------------------------------------------
| DB1 | 1/14/04 | 537.443 | 537.443 | black |
--------------------------------------------------------
Thanks for the help! Thanks to the comments, I understood that my function wouldn't work without the proper arguments (I only had "x"), and that I hadn't told it where to look for the "growth" value.
Here's the code I ended up using:
check_it <- function(x, all_data)
if(all_data[x, "growth.x"] >500){
return("green")
} else if(all_data[x, "growth.x"] <0) {
return("red")
} else {
return("blue")
}
# Create a loop using check_it
for(x in 1:nrow(all_data)) {
all_data[x, "color"] <- check_it(x, all_data)
}

Well, of course there are plenty of solutions to your problem. But since you specifically requested a loop and provided your own function, I tried to stick as closely as possible to what you've done so far. You have two growth columns, however, so I used growth.y.
datf <- read.table(text="
station_id timestamp growth.x growth.y color
DB1 1/14/01 59.916 59.9164 black
DB1 1/14/02 316.128 316.128 black
DB1 1/14/03 12.456 12.456 black
DB1 1/14/04 537.443 537.443 black",
header = TRUE, stringsAsFactors = FALSE)
#I had to change your function a little:
check_it <- function(x, dat)
if(dat[x, "growth.y"] >500){
return("green")
} else if(dat[x, "growth.y"] < 0) {
return("red")
} else {
return("blue")
}
Now your loop variable x corresponds to the row index of the data.frame, and you are looping through the rows. Before, this was not the case: you just passed a number to your function.
#And finally the loop
for(x in 1:nrow(datf)){
datf[x, "color"] <- check_it(x, datf)
}
> datf
station_id timestamp growth.x growth.y color
1 DB1 1/14/01 59.916 59.9164 blue
2 DB1 1/14/02 316.128 316.1280 blue
3 DB1 1/14/03 12.456 12.4560 blue
4 DB1 1/14/04 537.443 537.4430 green
You should, however, consider looking at the *apply family of functions.
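As a rough illustration of that suggestion, here is a minimal sketch that applies a one-argument check_it() over the growth column with vapply(), next to a fully vectorised ifelse() version. It assumes the sample data from the question (including the negative value -12.456) and the growth.y column; the color2 column exists only for comparison.

```r
# Sample data rebuilt from the question (assumption: growth.y is the column of interest)
datf <- data.frame(
  station_id = "DB1",
  timestamp  = c("1/14/01", "1/14/02", "1/14/03", "1/14/04"),
  growth.y   = c(59.9164, 316.128, -12.456, 537.443),
  color      = "black",
  stringsAsFactors = FALSE
)

# check_it() takes a single growth value and returns a single color
check_it <- function(x) {
  if (x > 500) "green" else if (x < 0) "red" else "blue"
}

# vapply() calls check_it() once per value and guarantees a character vector
datf$color <- vapply(datf$growth.y, check_it, character(1))

# Vectorised equivalent without an explicit function call per row
datf$color2 <- ifelse(datf$growth.y > 500, "green",
                      ifelse(datf$growth.y < 0, "red", "blue"))
```

Both approaches avoid indexing the data.frame row by row, which tends to matter once the table grows beyond a handful of rows.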

Related

Saving click coordinates from ggplotly using event_data() and updating multiple plots

I just want to preface this by saying that this is my very first post on Stack Exchange, so apologies ahead of time if I violate any norms or guidelines. I'll do my best to correct or clarify anything in the future. This community has saved me countless hours of frustration in the past. I am hoping you can do your magic once again.
My end goal is to create an R-shiny app where users upload any number of files (in this case .txt), go through a sequential set of inputs that culminates in a peak detection algorithm selecting points along a line plot created from data in each of the uploaded files. No peak detection alg is perfect though (especially with noisy data), so as added functionality I would like users to be able to select OR deselect points on the plot that they believe should OR should not be "peaks". Using various posts, notably here and here I have been able to do this successfully in an app with only one file upload. However, I am having difficulty extending it to the case with multiple file uploads.
I am not going to include the original app code that I mentioned above, as it is lengthy and some parts are unnecessary for this particular request. Instead, as a motivating example, I have included code for an R-shiny app that I think provides the skeleton for what I am looking for, albeit in a much simplified form. In the app below, the user clicks "+Add .txt file(s)" multiple times (in this case just two should suffice; I can extrapolate from there) to upload data. I can't attach data, but I have included two example datasets at the end that you can save as .txt (without column names) to upload to the app.
In the code, 5 random points are given a different color (red; these represent the peaks as determined by the peak detection algorithm). At this point I would like to be able to dynamically select/deselect different points in each plot, record them, and color them. Therefore, I need the set of points already randomly selected, plus any new points selected, minus any points deselected. Hence the use of the outersect() function. Obviously, this code doesn't do what I would like it to do, but I think I am really close, especially since I have a 100% working version with just one file upload. Any help would be much appreciated! Let me know if I need to clarify anything!
library(shiny)
library(plotly)
library(ggplot2)
ui <- fluidPage(
actionButton("addFiles","+Add .txt file(s)"),
uiOutput("dataUploadOption"),
uiOutput("plot1")
)
server <- function(input, output) {
counter <- reactiveValues(countervalue = 0)
observeEvent(input$addFiles,{
counter$countervalue <- counter$countervalue + 1
output$dataUploadOption <- renderUI({
lapply(1:counter$countervalue,function(i){
fileInput(paste0("fileInput",i), "Choose file", accept = c(".txt"))
})
})
})
datList <- reactive({
for(i in 1:counter$countervalue){
req(input[[paste0("fileInput",i)]])
}
datList <- list()
for(i in 1:counter$countervalue){
dat <- read.table(input[[paste0("fileInput",i)]]$datapath,
header = FALSE,
sep = "\t")
datList[[i]] <- dat
}
return(datList)
})
#set.seed(1)
pointData <- reactive({
req(datList())
datList <- datList()
randdatList <- list()
for(i in 1:length(datList)){
dat <- datList[[i]]
set.seed(i)
randpts <- sample(1:nrow(dat), size=5)
randdat <- dat[randpts,]
randdatList[[i]] <- randdat
}
return(randdatList)
})
x_save <- vector()
y_save <- vector()
outersect <- function(x, y) {
sort(c(x[!x%in%y],
y[!y%in%x]))
}
plot1 <- reactive({
req(datList())
req(pointData())
datList <- datList()
pointData <- pointData()
pList <- list()
for(i in 1:length(datList)){
dat <- datList[[i]]
pdat <- pointData[[i]]
s <- paste0("source_", i)
cpoints <- event_data("plotly_click", source = s)
x_save <<- c(x_save, cpoints$x)
y_save <<- c(y_save, cpoints$y)
clickdata <- data.frame(x = x_save, y = y_save)
osx <- outersect(pdat$V1, clickdata$x)
osy <- outersect(pdat$V2, clickdata$y)
clickdata2 <- data.frame(x = osx, y = osy)
p <- ggplot() +
geom_point(data = dat, aes(x = V1, y = V2)) +
geom_point(data = clickdata2, aes(x = x, y = y), color = "red") +
theme_bw()
s <- paste0("source_", i)
p <- ggplotly(p, source = s)
pList[[i]] <- p
}
lapply(1:length(pList), function(levels) {
output[[paste0('p1', levels)]] <- renderPlotly({
pList[[levels]]
})
})
return(pList)
})
output$plot1 <- renderUI({
req(plot1())
tagList(lapply(1:length(plot1()), function(i) {
plotlyOutput(paste0('p1', i))
}))
})
}
shinyApp(ui = ui, server = server)
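To make the role of outersect() in the app above concrete, here is a small standalone sketch of what it computes: the symmetric difference of two vectors, so a value present in both (a clicked existing peak) is dropped and a value present in only one is kept. The vectors and the small data frame are hypothetical and stand in for peak and click data.

```r
# Same helper as in the app: elements in x or y, but not in both, sorted
outersect <- function(x, y) {
  sort(c(x[!x %in% y], y[!y %in% x]))
}

peaks  <- c(1, 4, 7)  # hypothetical x-values chosen by peak detection
clicks <- c(4, 9)     # hypothetical x-values clicked by the user
outersect(peaks, clicks)  # 4 is deselected, 9 is added: 1 7 9

# Toggling row indices instead of coordinates keeps each x with its own y
dat <- data.frame(V1 = c(0.5, -1.2, 2.0, 0.1), V2 = c(3, 8, 1, 5))
selected <- c(1, 3)  # rows currently flagged as peaks
clicked  <- c(3, 4)  # rows the user clicked
selected <- outersect(selected, clicked)  # rows 1 and 4 remain selected
dat[selected, ]
```

Because the helper sorts its result, calling it separately on the x and the y coordinates (as the app does) can pair an x-value with the wrong y-value. Toggling row indices, as in the second half of the sketch, is one possible way around that.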
Dataset 1:
| x | y |
|---------------------|------------------|
| -1.08950003950399 | 3.92061909032194 |
| 0.208720196968725 | 5.67222928235966 |
| -0.263836645668798 | 3.71247675453362 |
| 0.138665250101923 | 7.74906181389072 |
| -2.55225465270668 | 8.70452316801581 |
| 1.57869487229503 | 5.74999341498374 |
| -1.2308772107366 | 3.35530542164603 |
| -0.321971204328944 | 4.80979923193731 |
| -0.0220344748315207 | 5.71594486764801 |
| -0.744744222332549 | 4.55989156213266 |
| 1.04545646933507 | 5.91191284007836 |
| 0.487641404547292 | 5.72802397537141 |
| 0.530577651038453 | 3.09012202763487 |
| 1.66213363574977 | 3.33320134285085 |
| 0.0228775779018585 | 4.53629938354094 |
| 0.248560724286165 | 6.90396647322719 |
| -0.828199737516727 | 2.93261672162585 |
| -0.188767161133671 | 7.05261125820679 |
| -0.0516854640148708 | 6.05915293490151 |
| -1.1222846968583 | 1.39602662887452 |
Dataset 2:
| x | y |
|-------------------|--------------------|
| 1.02949171061974 | -3.01531004873537 |
| 0.142236350026741 | 4.03725957935051 |
| 0.393662548389848 | 0.693063164212043 |
| 0.850695864525208 | -0.817259089585591 |
| 0.415261536856849 | 6.20153263624976 |
| 0.286698530882788 | -1.97242366699712 |
| 0.396735374870177 | -3.49076632915453 |
| 1.31048795207181 | 10.6312475753655 |
| 1.49027462911218 | -2.36815149198265 |
| 0.25593074551849 | -4.20103425686884 |
| 0.141856102265992 | -4.96844566049411 |
| 0.557311276998118 | -2.60340268165709 |
| 1.09676002845372 | -3.13743417501215 |
| 1.3982175261235 | -2.88779134364473 |
| 0.1910142375317 | 3.04928812470083 |
| 0.305993362575559 | 4.45623609398508 |
| 1.20394527560274 | -0.766794097405343 |
| 0.329481916465341 | 8.49273280033692 |
| 0.300228890671757 | -4.72320661206269 |
| 2.07709902138915 | 2.78067540836668 |

Looping with between()

I sort through photos, marking the starting and ending images of photo groups containing animals of interest. The finished product looks something like what's included below. After sorting, I'd normally use the starting and ending photos as markers to move photos of interest from each subfolder into a main folder for later processing.
Primary.Folder | Sub.folder |Start.Image.. |End.Image..
RPU_03262019_05092019 | 100EK113 | 2019-03-26-11-23-46 | 2019-03-26-11-32-02
RPU_03262019_05092019 | 100EK113 | 2019-03-27-08-35-00 | 2019-03-27-08-35-00
RPU_03262019_05092019 | 101EK113 | 2019-03-31-00-29-58 | 2019-03-31-00-59-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-01-44-58 | 2019-03-31-01-59-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-03-14-58 | 2019-03-31-03-44-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-04-34-58 | 2019-03-31-04-39-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-05-04-58 | 2019-03-31-05-14-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-05-44-58 | 2019-03-31-05-44-58
RPU_03262019_05092019 | 101EK113 | 2019-03-31-19-30-58 | 2019-03-31-19-40-58
Given a list of all the images, I'm hoping to loop through each row and build a new list containing only the photos of animal subjects, which I can then file.copy into another folder. I'm hoping between() can help with this.
So far I've removed the .JPG from every file in the total photo list to match what's in the sorted csv, separated the Start.Image.. column into t1 and the End.Image.. column into t2, and tested a for loop to see if they line up.
fn <- photolist %>% str_replace_all('\\.JPG', '')
t1 <- csvfilled[,4]
t2 <- csvfilled[,5]
#test
for (i in t1) for (j in t2) {
print(paste(i,j,sep=","))
}
# using between() function
for (i in t1) {
for (j in t2){
finalsortedlist<- (fn[between(fn,i, j)])
}
}
The test results show that i and j do not advance together. Instead, i waits for j to loop through all of its values before moving on, at which point j loops again.
"2019-05-09-09-24-24, 2019-05-08-18-35-24"
"2019-05-09-09-24-24, 2019-05-08-19-05-24"
"2019-05-09-09-24-24, 2019-05-08-19-50-24"
"2019-05-09-09-24-24, 2019-05-09-00-09-24"
"2019-05-09-09-24-24, 2019-05-09-09-59-24"
"2019-05-09-09-24-24, 2019-05-09-10-49-24"
Is there a way to run them in sequence like below?
"2019-03-26-11-23-46, 2019-03-26-11-32-02"
"2019-03-27-08-35-00, 2019-03-27-08-35-00"
"2019-03-31-00-29-58, 2019-03-31-00-59-58"
"2019-03-31-01-44-58, 2019-03-31-01-59-58"
"2019-03-31-03-14-58, 2019-03-31-03-44-58"
"2019-03-31-04-34-58, 2019-03-31-04-39-58"
"2019-03-31-05-04-58, 2019-03-31-05-14-58"
"2019-03-31-05-44-58, 2019-03-31-05-44-58"
"2019-03-31-19-30-58, 2019-03-31-19-40-58"
I basically want:
"1,1"
"2,2"
"3,3"
"4,4"
instead of
"1,1"
"1,2"
"1,3"
"1,4"
"2,1"
"2,2"
"2,3"
"2,4"
This would be fairly simple using map2() from the purrr package.
finalsortedlist <- map2(t1, t2, ~fn[between(fn, .x, .y)])
Essentially, map2 will take the nth item from both t1 and t2, and pass them as .x and .y respectively to your function. The result will be a list containing the results for every iteration.
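For readers without purrr installed, the same pairwise iteration can be sketched in base R with Map(), which also walks two vectors in parallel. The file names below are hypothetical, and the plain `>=` / `<=` string comparison is a stand-in for between(); it relies on the zero-padded timestamp names sorting in chronological order.

```r
# Hypothetical photo names (extension already removed) and start/end markers
fn <- c("2019-03-26-11-23-46", "2019-03-26-11-27-50", "2019-03-26-11-32-02",
        "2019-03-26-12-00-00", "2019-03-27-08-35-00", "2019-03-27-09-00-00")
t1 <- c("2019-03-26-11-23-46", "2019-03-27-08-35-00")
t2 <- c("2019-03-26-11-32-02", "2019-03-27-08-35-00")

# Map() pairs the 1st start with the 1st end, the 2nd with the 2nd, and so on
per_group <- Map(function(start, end) fn[fn >= start & fn <= end], t1, t2)

# Flatten the per-group results into one character vector of photos to copy
finalsortedlist <- unlist(per_group, use.names = FALSE)
```

An index loop such as `for (k in seq_along(t1))` using `t1[k]` and `t2[k]` would give the same pairing; the nested loops in the question visit every combination instead.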

Building a regression results table

I'm attempting to build a regression results table and I'm stuck. I'm getting the error:
Error in summary(mod)$coefficients[vars, "Estimate"] : subscript out of bounds.
All of these models have already been fitted and are named as shown. This is what I want my table to look like:
| | model1L | model2L | model3L | model1P | model2P | model3P |
|----------|----------|----------|----------|----------|----------|----------|
|price | coef1L | coef2L | coef3L | coef1P | coef2P | coef3P |
| | sd1L | sd2L | sd3L | sd1P | sd2P | sd3P |
|promoflag | coef1L | coef2L | coef3L | coef1P | coef2P | coef3P |
| | sd1L | sd2L | sd3L | sd1P | sd2P | sd3P |
My setup for extracting key regression results from an estimated model:
model_list = c("model1L","model2L","model3L", "model1P", "model2P", "model3P")
vars = c("price","promoflag")
Building the table:
results_table1 = function(model_list, vars) {
# build leftmost column of results table
outrec = c()
for (j in 1:length(vars)) {
outrec = c(outrec,sprintf("%s",vars[j]))
outrec = c(outrec,"")
}
outrec = c(outrec,"R^2")
outrec = c(outrec,"Observations")
outdf = as.data.frame(outrec)
# process each model
for (i in 1:length(model_list)) {
# extract estimates for this model
mod = eval(parse(text=model_list[i]))
estimates = summary(mod)$coefficients[vars,"Estimate"]
ses = summary(mod)$coefficients[vars,"Std. Error"]
pvals = summary(mod)$coefficients[vars,"Pr(>|t|)"]
# process each parameter of interest
outrec = c()
for (j in 1:length(vars)) {
# set significance stars
star = ""
if (pvals[j] <= .05) {star = "*"}
if (pvals[j] <= .01) {star = "**"}
if (pvals[j] <= .001) {star = "***"}
# output estimate and std err
outrec = c(outrec,sprintf("%.4f%s",estimates[j],star))
outrec = c(outrec,sprintf("(%.4f)",ses[j]))
}
# add R^2, # of observations to output
outrec = c(outrec,sprintf("%.4f",summary(mod)$r.squared[1]))
outrec = c(outrec,sprintf("%d",nobs(mod)))
outdf = cbind(outdf,outrec)
}
# set column names to model names
names(outdf) = c("",model_list)
outdf
}
Outputting the sample results table:
model_list = c("model1L", "model2L", "model3L", "model1P", "model2P", "model3P")
vars = c("price", "promoflag")
outdf = results_table1(model_list, vars)
library(knitr)
kable(outdf,align='c')
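A "subscript out of bounds" error on `summary(mod)$coefficients[vars, "Estimate"]` means that one of the requested row or column names is not present in that model's coefficient matrix. The sketch below shows one way to check this. The two models are hypothetical stand-ins fitted on the built-in mtcars data, and treating one of them as a Poisson glm is only an assumption about what the "P" suffix might mean.

```r
# Hypothetical stand-ins for the poster's models, fitted on built-in data
model_lin  <- lm(mpg ~ wt + hp, data = mtcars)
model_pois <- glm(carb ~ wt + hp, family = poisson, data = mtcars)

vars <- c("wt", "hp")
wanted_cols <- c("Estimate", "Std. Error", "Pr(>|t|)")

for (m in list(model_lin, model_pois)) {
  cf <- summary(m)$coefficients
  # Requested terms that are not rows of this model's coefficient matrix
  print(setdiff(vars, rownames(cf)))
  # Requested columns that this model's summary does not provide
  print(setdiff(wanted_cols, colnames(cf)))
}
```

For the Poisson glm the p-value column is named "Pr(>|z|)" rather than "Pr(>|t|)", and a glm summary has no r.squared element. A factor predictor would also appear under a suffixed row name rather than its bare variable name. Any of these would have to be handled per model type before the table function can index by name.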

Function to find the start and end of conditional selection

I have data that looks as follows:
Date | Time | Temperature
16995 | "12:00" | 23
16995 | "12:30" | 24
...
17499 | "23:30" | 23
17500 | "00:00" | 24
I'm writing a function to select a range of cases based on certain start and end time points. To do this I need to determine the start_pt and end_pt indices which should match with a pair of rows in the dataframe.
select_case <- function(df,date,time) {
start_pt = 0
end_pt = 0
for (i in 1:nrow(df)) {
if ((date[i] == 17000) & (time[i] == "12:00")) {
start_pt <- i
return(start_pt)
} else {
next
}
}
for (i in start_pt:nrow(df)) {
if (date[i] == 17500) {
end_pt <- i - 1
return(end_pt)
break
} else {
next
}
}
return(df[start_pt:end_pt,])
}
When I called:
test <- select_case(data,data$Date,data$Time)
test
I expect the following:
Date | Time | Temperature
17000 | "12:00" | 23
17000 | "12:30" | 24
...
17499 | "23:00" | 23
17499 | "23:30" | 23
Instead I got
[1] 1
I'm not sure where I went wrong here. When I ran each of the two for-loops separately from the R console, substituting in the corresponding arguments for each loop, I got the correct indices for both start_pt and end_pt.
I tried putting each loop in a separate function, named sta(date,time) and end(date). Then I combined them in the following function:
binder <- function(date,time) {
return(sta(date,time),end(date))
}
and call
sta_end <- binder(date,time)
I got the error:
Error in return(sta(date, time), end(date)) :
multi-argument returns are not permitted
So I combined them into a vector and it worked:
binder <- function(date,time) {
return(c(sta(date,time),end(date)))
}
sta_end <- binder(date,time)
[1] 1 <an index for end_pt>
So the mistake I made in my original function was using return() three times: a function exits at the first return() it reaches, which here was the one returning start_pt. I took out the first two return() calls and kept only the last one:
return(df[start_pt:end_pt,])
This worked, and I got the expected result.
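For comparison, the same selection can be sketched without explicit loops by using which() to locate the start and end rows. This is a minimal sketch, not the poster's final code: the default marker values mirror the ones hard-coded in the question, the small data frame is made up, and it assumes the rows are ordered by date and time.

```r
select_case <- function(df, start_date = 17000, start_time = "12:00",
                        end_date = 17500) {
  # First row matching the start date and time
  start_pt <- which(df$Date == start_date & df$Time == start_time)[1]
  # Row just before the first row carrying the end date
  end_pt <- which(df$Date == end_date)[1] - 1
  if (is.na(start_pt) || is.na(end_pt)) {
    stop("start or end marker not found")
  }
  # Single return value: the selected block of rows
  df[start_pt:end_pt, ]
}

# Hypothetical data in the same layout as the question
data <- data.frame(
  Date = c(16999, 17000, 17000, 17000, 17499, 17500),
  Time = c("23:30", "11:30", "12:00", "12:30", "23:30", "00:00"),
  Temperature = c(22, 23, 23, 24, 23, 24),
  stringsAsFactors = FALSE
)
test <- select_case(data)
```

The function has only one exit point, so the early-return problem described above cannot occur.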

R apply script output in different formats for similar inputs

I'm using a double apply function to get a list of p-values for cor.test between any two columns of two tables.
hel_plist<-apply(bc, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
The otud data.frame is 90x11 (90 rows, 11 columns, i.e. dim(otud) is 90 11) and will be used with different data.frames.
bc and hel are both 90x2 data.frames, so for each of them I get 2*11 = 22 p-values out of the functions:
bc_plist<-apply(bc, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
hel_plist<-apply(hel, 2, function(x) { apply(otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}}) })
For bc I get an output with dim = NULL: a nested list whose elements are indexed by axis name and then OTU name, each holding a p-value (a format that I have always got from these scripts and am happy with).
But for hel I get an output where dim(hel_plist) is 11 2: an 11x2 table with the p-values written inside.
Shortened examples of output.
hel_plist
+--------+--------------+--------------+
| | axis1 | axis2 |
+--------+--------------+--------------+
| Otu037 | 1.126362e-18 | 0.01158251 |
| Otu005 | 3.017458e-2 | NULL |
| Otu068 | 0.00476002 | NULL |
| Otu070 | 1.27646e-15 | 5.252419e-07 |
+--------+--------------+--------------+
bc_plist
$axis1
$axis1$Otu037
[1] 1.247717e-06
$axis1$Otu005
[1] 1.990313e-05
$axis1$Otu068
[1] 5.664597e-07
Why is it like that when the input formats are all the same? (Shortened examples)
bc
+-------+-----------+-----------+
| group | axis1 | axis2 |
+-------+-----------+-----------+
| 1B041 | 0.125219 | 0.246319 |
| 1B060 | -0.022412 | -0.030227 |
| 1B197 | -0.088005 | -0.305351 |
| 1B222 | -0.119624 | -0.144123 |
| 1B227 | -0.148946 | -0.061741 |
+-------+-----------+-----------+
hel
+-------+---------------+---------------+
| group | axis1 | axis2 |
+-------+---------------+---------------+
| 1B041 | -0.0667782322 | -0.1660606406 |
| 1B060 | 0.0214470932 | -0.0611351008 |
| 1B197 | 0.1761876858 | 0.0927570627 |
| 1B222 | 0.0681058251 | 0.0549292399 |
| 1B227 | 0.0516864361 | 0.0774155225 |
| 1B235 | 0.1205676221 | 0.0181712761 |
+-------+---------------+---------------+
How could I force my scripts to always produce "flat" outputs as in the case of bc?
OK, the different outputs are caused by the NULL results returned by the conditional function in the bc_plist case. If I modify the code to replace possible NULLs with NAs, I get 2-D tables in every case.
So to keep things constant :
bc_nmds_plist<-apply(bc_nmds, 2, function(x) { apply(stoma_otud, 2, function(y) { if (cor.test(x,y,method="spearman", exact=FALSE)$p.value<0.05){cor.test(x,y,method="spearman", exact=FALSE)$p.value}else NA}) })
And I get a 2-D table out for bc_nmds_plist too.
So I guess this can be called solved, as I now have a piece of code that produces predictable output for any correct input.
If anyone has any idea how to force the output to conform to the previous bc_plist format instead, I would still be interested, as I actually prefer that form:
$axis1
$axis1$Otu037
[1] 1.247717e-06
$axis1$Otu005
[1] 1.990313e-05
$axis1$Otu068
[1] 5.664597e-07
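One way to always get the nested-list form is to replace apply() with lapply(), which never simplifies its result into a matrix. The sketch below is an assumption-laden illustration rather than the poster's code: the data are synthetic, sig_pvalues() is a hypothetical helper name, and non-significant results are dropped with Filter() instead of being kept as NULL entries.

```r
set.seed(1)
# Synthetic stand-ins for otud and an ordination table such as bc or hel
otud <- data.frame(Otu1 = rnorm(30), Otu2 = rnorm(30), Otu3 = rnorm(30))
axes <- data.frame(axis1 = otud$Otu1 + rnorm(30, sd = 0.1), axis2 = rnorm(30))

sig_pvalues <- function(axes, otud, alpha = 0.05) {
  # lapply() over a data.frame iterates over its columns and keeps their names
  lapply(axes, function(x) {
    p <- lapply(otud, function(y) {
      # Compute each test once instead of twice
      cor.test(x, y, method = "spearman", exact = FALSE)$p.value
    })
    # Keep only the significant p-values; the result is still a named list
    Filter(function(v) v < alpha, p)
  })
}

plist <- sig_pvalues(axes, otud)
str(plist)
```

The result is a list with one element per axis, each itself a named list of significant p-values, regardless of how many tests pass the threshold for any given column.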

Resources