Use of variable in Unix command line - r

I'm trying to make life a little bit easier for myself but it is not working yet. What I'm trying to do is the following:
NOTE: I'm running R in the unix server, since the rest of my script is in R. That's why there is system(" ")
system("TRAIT=some_trait")
system("grep var.resid.anim rep_model_$TRAIT.out > res_var_anim_$TRAIT'.xout'",wait=T)
When I run the exact same thing in putty (without system(" ") of course), then the right file is read and right output is created. The script also works when I just remove the variable that I created. However, I need to do this many times, so a variable is very convenient for me, but I can't get it to work.

This code prints nothing on the console.
system("xxx=foo")
system("echo $xxx")
But the following does.
system("xxx=foo; echo $xxx")
Each call to system() starts a new shell, so a variable defined in one call is gone by the time the next call runs.
In your case, how about trying:
system("TRAIT=some_trait; grep var.resid.anim rep_model_$TRAIT.out > res_var_anim_$TRAIT'.xout'",wait=T)
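Another option, sketched here under the assumption that the trait names are plain strings without spaces or shell metacharacters, is to build the whole command in R so that no shell variable is needed at all. The helper name build_grep_cmd and the trait names are made up for illustration.

```r
# Hypothetical helper: build the grep command for one trait entirely in R.
build_grep_cmd <- function(trait) {
  sprintf("grep var.resid.anim rep_model_%s.out > res_var_anim_%s.xout",
          trait, trait)
}

traits <- c("some_trait", "another_trait")  # placeholder trait names
cmds <- vapply(traits, build_grep_cmd, character(1), USE.NAMES = FALSE)
print(cmds)

# To actually run the commands on the server, one shell per command:
# for (cmd in cmds) system(cmd, wait = TRUE)
```

Because each command string is complete on its own, it does not matter that every system() call gets a fresh shell.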

You can keep this all in R:
grep_trait <- function(search_for, in_trait, out_trait=in_trait) {
  # Read the whole input file, keep the matching lines, write them out
  l <- readLines(sprintf("rep_model_%s.out", in_trait))
  l <- grep(search_for, l, value=TRUE)
  writeLines(l, sprintf("res_var_anim_%s.xout", out_trait))
}
grep_trait("var.resid.anim", "haptoglobin")
If there's a concern that the files are read into memory first (i.e. if they are huge files), then:
grep_trait <- function(search_for, in_trait, out_trait=in_trait) {
fin <- file(sprintf("rep_model_%s.out", in_trait), "r")
fout <- file(sprintf("res_var_anim_%s.xout", out_trait), "w")
repeat {
l <- readLines(fin, 1)
if (length(l) == 0) break;
if (grepl(search_for, l)[1]) writeLines(l, fout)
}
close(fin)
close(fout)
}


R function stops after system() call

I've written a very easy wrapper around GDAL in R. It utilises a prewritten statement which is passed to system, creating an output, which I then want to read into the R environment again.
It works by creating a temporary directory in the working directory, writing out an ESRI shapefile of our area of interest, and then cutting a raster by this, with some preset information.
My problem: after successfully running the system() call and creating the output file, the function stops. It doesn't execute the next call and read the output into the R environment.
gdalwarpwarp <- function(source_file, source_srs, newfilename, reread=TRUE, clean=TRUE, cutline){
#Create tempfolder if it doesn't exist in the working directory.
if (!dir.exists("./tempfolder")){
dir.create("./tempfolder")
}
#Write temporary shape file
terra::writeVector(cutline, filename = "./tempfolder/outline_AOI.shp" , filetype='ESRI Shapefile',overwrite=TRUE)
#Warp!
if(reread==FALSE){
system(paste0("gdalwarp -cutline ./tempfolder/outline_AOI.shp -dstnodata -9999 -s_srs EPSG:3006 ",source_file, " ",paste0("./tempfolder/",newfilename)))
message('warp warped TRUE')
} else if(reread==TRUE){
system(paste0("gdalwarp -cutline ./tempfolder/outline_AOI.shp -dstnodata -9999 -s_srs EPSG:3006 ",source_file, " ",paste0("./tempfolder/",newfilename)))
newfilename <- terra::rast(paste0("./tempfolder/",newfilename))
}
}
This doesn't run:
newfilename <- terra::rast(paste0("./tempfolder/",newfilename))
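The function did not return anything visible: its last expression is an assignment, so the result is returned invisibly and is lost unless you assign the function's output. Here is a somewhat improved version of your function. If you want to keep the output it would make more sense to provide a full path, rather than saving it to a temp folder. I also note that you are not using the argument source_srs.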
gdalwarpwarp <- function(source_file, source_srs, newfilename, reread=TRUE, clean=TRUE, cutline){
  # Write temporary shape file
  shpf <- file.path(tempdir(), "aoi.shp")
  terra::writeVector(cutline, filename = shpf, filetype='ESRI Shapefile', overwrite=TRUE)
  outf <- file.path(tempdir(), newfilename)
  # Paste the shapefile path into the command; inside the quotes "shpf" would be taken literally
  system(paste0("gdalwarp -cutline ", shpf, " -dstnodata -9999 -s_srs EPSG:3006 ", source_file, " ", outf))
  if (reread) {
    # Last expression of the function, so the raster is returned
    terra::rast(outf)
  } else {
    message('warp warped TRUE')
    invisible(outf)
  }
}
I wonder why you don't use terra::resample or terra::project, perhaps preceded or followed by mask (I may not understand the benefit of using cutline).
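For what it's worth, here is a rough sketch of the in-R route using terra::crop followed by terra::mask. The raster and the square area of interest are synthetic stand-ins, not your data, and no reprojection is done; whether this matches what gdalwarp -cutline gives you on real data is something you would need to check.

```r
library(terra)

# Synthetic 10 x 10 raster standing in for source_file (assumption)
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:3006")
values(r) <- 1:ncell(r)

# Synthetic square polygon standing in for the cutline (assumption)
aoi <- as.polygons(ext(2, 6, 2, 6), crs = "EPSG:3006")

# Crop to the polygon's extent, then set cells outside the polygon to NA
clipped <- mask(crop(r, aoi), aoi)
print(dim(clipped))
```

This keeps everything in one R session, so there is no temporary shapefile and no system() call to worry about.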

While loop for creating multiple resources with capacity

I need to create 52 resources with capacity 2 in the Simmer simulation package. I am trying to do this by using a while loop that creates these resources for me, instead of creating each resource myself.
The idea is that I have a while loop as given below. In each loop, a resource should be created called Transport_vehicle1, Transport_vehicle2, ..., Transport_vehicle52, with capacity 2.
However, I do not know how to insert the number i into the name of the resource that I am trying to create:
i<-1
while (i<=52)
{ env %>%
add_resource("Transport_vehicle"[i],capacity = 2)
i <- i+1
}
Could someone please help me out? Thanks!
You can use the paste function to concatenate the string and the number:
i<-1
while (i<=52)
{ env %>%
add_resource(paste("Transport_vehicle", i),capacity = 2)
i <- i+1
}
If you do not want a space between the string and the number, add the sep="" argument:
paste("Transport_vehicle", i, sep="")
or use
paste0("Transport_vehicle", i)
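Since paste and paste0 are vectorized, all 52 names can also be built in one go. The sketch below only builds the names; the commented-out line assumes env is your existing simmer environment and that the pipe is loaded.

```r
ids <- 1:52

with_space <- paste("Transport_vehicle", ids)   # default separator is a space
no_space   <- paste0("Transport_vehicle", ids)  # no separator

print(head(with_space, 3))
print(head(no_space, 3))

# Assuming 'env' is an existing simmer environment:
# for (nm in no_space) env %>% add_resource(nm, capacity = 2)
```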

Unable to update data in dataframe

I tried updating data in a data frame, but it does not get updated.
# Initialize data and data frame here
user_data=read.csv("train_5.csv")
baskets.df=data.frame(Sequence=character(),
Challenge=character(),
countno=integer(),
stringsAsFactors=FALSE)
# Update data in data frame here
for(i in 1:length((user_data)))
{
for(j in i:length(user_data))
{
if(user_data$challenge_sequence[i]==user_data$challenge_sequence[j]&&user_data$challenge[i]==user_data$challenge[j])
{
writedata(user_data$challenge_sequence[i],user_data$challenge[i])
}
}
}
writedata=function( seqnn,challng)
{
#print(seqnn)
#print(challng)
newRow <- data.frame(Sequence=seqnn,Challenge=challng,countno=1)
baskets.df=rbind(baskets.df,newRow)
}
# View data here
View(baskets.df)
I've modified your code to what I believe will work. You haven't provided sample data, so I can't verify that it works the way you want. I'm basing my attempt here on a couple of common novice mistakes that I'll do my best to explain.
Your writedata function was written to be a little loose with its scope. When you create a new function, what happens in the function technically happens in its own environment. That is, it first looks for things defined within the function, and any new objects it creates are created only within that environment. R also has this neat (and sometimes tricky) feature where, if it can't find an object in an environment, it will look it up in the parent environment.
The impact this has on your writedata function is that when R looks for baskets.df in the function and can't find it, R then turns to the Global Environment, finds baskets.df there, and then uses it in rbind. However, the result of rbind gets saved to a baskets.df in the function environment, and does not update the object of the same name in the global environment.
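Here is a tiny illustration of that behaviour. The names counter, bump_local and bump_return are made up for this example and have nothing to do with your data.

```r
counter <- 0

# Reads the outer 'counter', but the assignment creates a NEW local 'counter'.
bump_local <- function() {
  counter <- counter + 1
  invisible(counter)
}
bump_local()
after_local <- counter
print(after_local)   # the outer value is unchanged

# Returning the new value and assigning it at the call site does update it.
bump_return <- function(x) x + 1
counter <- bump_return(counter)
print(counter)
```

The first print shows 0 and the second shows 1: only the version whose result is assigned back by the caller changes the outer object.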
To address this, I added an argument to writedata that is simply named data. We can then use this argument to pass a data frame to the function's environment and do everything locally. By not making any assignment at the end, we implicitly tell the function to return its result.
Then, in your loop, instead of simply calling writedata, we assign its result back to baskets.df to replace the previous result.
# Define writedata before the loop so it exists when the loop calls it
writedata <- function(data, seqnn, challng)
{
  newRow <- data.frame(Sequence = seqnn,
                       Challenge = challng,
                       countno = 1)
  # Last expression is the return value: data with the new row appended
  rbind(data, newRow)
}
# nrow() counts rows; length() on a data frame counts columns
for (i in seq_len(nrow(user_data)))
{
  for (j in i:nrow(user_data))
  {
    if (user_data$challenge_sequence[i] == user_data$challenge_sequence[j] &&
        user_data$challenge[i] == user_data$challenge[j])
    {
      baskets.df <- writedata(baskets.df,
                              user_data$challenge_sequence[i],
                              user_data$challenge[i])
    }
  }
}
I'm not sure what your programming background is, but your loops will be very slow in R because it's an interpreted language. To get around this, many functions are vectorized (which simply means that you give them more than one data point, and they do the looping inside compiled code where the loops are fast).
With that in mind, here's what I believe will be a much faster implementation of your code
user_data=read.csv("train_5.csv")
# challenge_indices will be a matrix with TRUE at every place "challenge" and "challenge_sequence" is the same
challenge_indices <- outer(user_data$challenge_sequence, user_data$challenge_sequence, "==") &
outer(user_data$challenge, user_data$challenge, "==")
# since you don't want duplicates, get rid of them
challenge_indices[upper.tri(challenge_indices, diag = TRUE)] <- FALSE
# now let's get the indices of interest
index_list <- which(challenge_indices,arr.ind = TRUE)
# now we make the resulting data set all at once
# this is much faster, because it does not require copying the data frame many times - which would be required if you created a new row every time.
baskets.df <- with(user_data, data.frame(
  Sequence = challenge_sequence[index_list[,"row"]],
  challenge = challenge[index_list[,"row"]]
))
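To see what the outer / upper.tri / which steps do, here is the same idea on a three-row toy data frame. The toy values are invented; only the column names follow your code.

```r
toy <- data.frame(challenge_sequence = c("a", "b", "a"),
                  challenge = c(1, 2, 1),
                  stringsAsFactors = FALSE)

# TRUE wherever both columns agree between row i and row j
same <- outer(toy$challenge_sequence, toy$challenge_sequence, "==") &
  outer(toy$challenge, toy$challenge, "==")

# Drop the diagonal (a row matching itself) and the mirrored upper half
same[upper.tri(same, diag = TRUE)] <- FALSE

# Row/column positions of the remaining matches
idx <- which(same, arr.ind = TRUE)
print(idx)
```

Rows 1 and 3 are identical, so the only remaining match is the pair (row 3, column 1).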

Get the URL of an .url (Windows URL shortcut) file

I want to get the URL of an .url shortcut file (made in Windows) in R.
The file format looks like this:
[{000214A0-0000-0000-C000-000000000046}]
Prop4=31,Stack Overflow - Where Developers Learn, Share, & Build Careers
Prop3=19,11
[{A7AF692E-098D-4C08-A225-D433CA835ED0}]
Prop5=3,0
Prop9=19,0
[InternetShortcut]
URL=https://stackoverflow.com/
IDList=
IconFile=https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d
IconIndex=1
[{9F4C2855-9F79-4B39-A8D0-E1D42DE1D5F3}]
Prop5=8,Microsoft.Website.E7533471.CBCA5933
and has some documentation.
I have used file.info(). But it only shows the information of the first properties header, I guess.
I need to do this in R, because I have a long list of .url files whose addresses I need to convert.
Crude way (I'll update this in a sec):
ini::read.ini("https://rud.is/dl/example.url")$InternetShortcut$URL
## [1] "https://rud.is/b/2017/11/11/measuring-monitoring-internet-speed-with-r/"
Made slightly less crude:
read_url_shortcut <- function(x) {
require(ini)
x <- ini::read.ini(x)
x[["InternetShortcut"]][["URL"]]
}
Without the ini package dependency:
read_url_shortcut <- function(x) {
x <- readLines(x)
x <- grep("^URL", x, value=TRUE)
gsub("^URL[[:space:]]*=[[:space:]]*", "", x)
}
More "production-worthy" version:
#' Read in internet shortcuts (.url or .webloc) and extract URL target
#'
#' @param shortcuts character vector of file path+names or web addresses
#' to .url or .webloc files to have URL fields extracted from.
#' @return character vector of URLs
read_shortcut <- function(shortcuts) {
require(ini)
require(xml2)
require(purrr)
purrr::map_chr(shortcuts, ~{
if (!grepl("^https?://", .x)) { # local path unless it starts with http:// or https://
.x <- path.expand(.x)
if (!file.exists(.x)) return(NA_character_)
}
if (grepl("\\.url$", .x)) {
.ini <- suppressWarnings(ini::read.ini(.x)) # get encoding issues otherwise
.ini[["InternetShortcut"]][["URL"]][1] # some evidence multiple are supported but not sure so being safe
} else if (grepl("\\.webloc$", .x)) {
.x <- xml2::read_xml(.x)
xml2::xml_text(xml2::xml_find_first(.x, ".//dict/key[contains(., 'URL')]/../string"))[1] # some evidence multiple are supported but not sure so being safe
} else {
NA_character_
}
})
}
Ideally, such a function would return a single data frame row with all relevant info that could be found (title, URL and icon URL, creation/mod dates, etc). I'd rather not keep my Windows VM up long enough to generate sufficient samples to do that.
NOTE: Said "production"-ready version still doesn't gracefully handle edge cases where the file or web address is not readable/reachable nor does it deal with malformed .url or .webloc files.
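If you only need the URL from local .url files and want to stay in base R, something along these lines may be enough. It is a sketch, not a tested parser: the function name read_url_field is made up, and it assumes the file follows the layout shown in the question, with the URL= line inside the [InternetShortcut] section.

```r
# Hypothetical helper: return the URL= value from the [InternetShortcut] section.
read_url_field <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  headers <- grep("^\\[.*\\]$", lines)              # positions of all [section] lines
  start <- headers[lines[headers] == "[InternetShortcut]"]
  if (length(start) == 0) return(NA_character_)
  start <- start[1]
  later <- headers[headers > start]                 # next section header, if any
  end <- if (length(later) > 0) later[1] - 1 else length(lines)
  if (end <= start) return(NA_character_)           # section is empty
  section <- lines[(start + 1):end]
  hit <- grep("^URL[[:space:]]*=", section, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub("^URL[[:space:]]*=[[:space:]]*", "", hit[1])
}

# Demo on a temporary file that mimics the sample in the question
tmp <- tempfile(fileext = ".url")
writeLines(c("[{000214A0-0000-0000-C000-000000000046}]",
             "Prop3=19,11",
             "[InternetShortcut]",
             "URL=https://stackoverflow.com/",
             "IDList=",
             "IconIndex=1",
             "[{9F4C2855-9F79-4B39-A8D0-E1D42DE1D5F3}]",
             "Prop5=8,Microsoft.Website.E7533471.CBCA5933"), tmp)
result <- read_url_field(tmp)
print(result)
```

For a long list of files, vapply(files, read_url_field, character(1)) would give one URL (or NA) per file, where files is your own character vector of paths.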

dump() in R not source()able- output contains "..."

I'm trying to use dump() to save the settings of my analysis so I can examine them in a text editor or reload them at a later date.
In my code I'm using the command
dump(ls(), settingsOutput, append=TRUE)
The file defined by settingsOutput gets created, but the larger objects and locally defined functions are truncated. Here's an excerpt from such a file. Note these files are generally on the order of a few kB.
createFilePrefix <-
function (runDesc, runID, restartNumber)
{
...
createRunDesc <-
function (genomeName, nGenes, nMix, mixDef, phiFlag)
{
...
datasetID <-
"02"
descriptionPartsList <-
c("genomeNameTest", "nGenesTest", "numMixTest", "mixDefTest",
"phiFlagTest", "runDescTest", "runIDTest", "restartNumberTest"
...
diffTime <-
structure(0.531, units = "hours", class = "difftime")
dissectObjectFileName <-
function (objectFileName)
{
...
divergence <-
0
Just for reference, here's one of the functions defined above
createFilePrefix <- function(runDesc, runID, restartNumber){
paste(runDesc, "_run-", runID, "_restartNumber-", restartNumber, sep="")
}
Right now I'm going back and removing the problematic lines and then loading the files, but I'd prefer to actually have code that works as intended.
Can anyone explain to me why I'm getting this behavior and what to do to fix it?
