I am trying to access R data frame columns in a loop in the following way:
bes2 = data.frame("id"=c(1,2), "generalElectionVoteW1"=c("Labour","Bla"),
"generalElectionVoteW2"=c("x","t"))
general_names <- c("generalElectionVoteW1", "generalElectionVoteW2")
labour_w = bes2[bes2$general_names[1] == "Labour",]
This simply returns an empty result.
general_names just stores generalElectionVoteW1, generalElectionVoteW2 and many more column names for easier access in a loop.
However, if I access a column manually, as in labour_w = bes2[bes2$generalElectionVoteW1 == "Labour",], it works as desired. Where is my mistake?
bes2:
  id generalElectionVoteW1 generalElectionVoteW2
1  1                Labour                     x
2  2                   Bla                     t
general_names:
"generalElectionVoteW1" "generalElectionVoteW2"
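A likely cause is that `$` does not evaluate its right-hand side: `bes2$general_names` looks for a column literally called `general_names`, finds none, and returns NULL, so the comparison is empty. A minimal sketch of the usual workaround, using `[[` with the stored column name (the `labour_by_wave` list is a hypothetical name added for illustration):

```r
bes2 <- data.frame(
  id = c(1, 2),
  generalElectionVoteW1 = c("Labour", "Bla"),
  generalElectionVoteW2 = c("x", "t")
)
general_names <- c("generalElectionVoteW1", "generalElectionVoteW2")

# `$` treats general_names as a literal column name, so this is NULL
is.null(bes2$general_names)

# `[[` evaluates its argument, so the stored column name is looked up
labour_w <- bes2[bes2[[general_names[1]]] == "Labour", ]

# The same lookup in a loop over all stored column names
labour_by_wave <- lapply(general_names, function(col) {
  bes2[bes2[[col]] == "Labour", ]
})
names(labour_by_wave) <- general_names
```

`bes2[, general_names[1]]` would work here as well; `[[` is used because it always returns a single column.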
I have a dataset in which I need to tokenize the words and find the frequency of each word. I can achieve this with a for loop in R:
InputData <- To_Find_Categories
ShtDesc_Token_all <- ""
ShtDesc_Token <- ""
for(i_ID in 1:nrow(InputData))
#for(i_ID in 1:20)
{
ShtDesc_Token <- regmatches(InputData$short_description, gregexpr("((?![0-9]+)[A-Za-z0-9]+)",
InputData$short_description, perl = TRUE))[[i_ID]]
ShtDesc_Token_all <- append(ShtDesc_Token_all, ShtDesc_Token)
}
X<- sort(table(unlist(ShtDesc_Token_all)))
write.csv(X, "temp.csv", row.names=FALSE)
But this takes a long time to process. I want to avoid the for loop; how can I do this?
The data is in .csv format; here are some sample records:
data.table::fread("number,parent , short_description
GECTASK0011264, GECHG0036340 , Restore Request
GECTASK0011265, GECHG0036340 , Restore Request
GECTASK0011748, GECHG0038670, lkj
GECTASK0011797 , GECHG0034985 , vm down-grade
GECTASK0011798, GECHG0034985 , vm down-grade
GECTASK0012252 , GECHG0040437 , remove server from load
GECTASK0012253 , GECHG0040437 , remove server from load
GECTASK0012328 , GECHG0034983 , vm down-grade
GECTASK0012329 , GECHG0034983 , vm down-grade")
Try this. You do not need a for loop in this case.
input <- data.table::fread("number,parent , short_description
GECTASK0011264, GECHG0036340 , Restore Request
GECTASK0011265, GECHG0036340 , Restore Request
GECTASK0011748, GECHG0038670, lkj
GECTASK0011797 , GECHG0034985 , vm down-grade
GECTASK0011798, GECHG0034985 , vm down-grade
GECTASK0012252 , GECHG0040437 , remove server from load
GECTASK0012253 , GECHG0040437 , remove server from load
GECTASK0012328 , GECHG0034983 , vm down-grade
GECTASK0012329 , GECHG0034983 , vm down-grade")
tmp <- paste(input$short_description,collapse = " ")
tmp.splt <- stringr::str_split(tmp, pattern= " ")[[1]]
table(tmp.splt)
#> tmp.splt
#> down-grade from lkj load remove Request
#> 4 2 1 2 2 2
#> Restore server vm
#> 2 2 4
Created on 2018-08-10 by the reprex package (v0.2.0.9000).
Or use this one-liner (from @Onyambu's comment):
sort(table(unlist(strsplit(InputData$short_description,"\\W"))))
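Note that the original loop recomputes the match for every row on each iteration and then keeps only element `i_ID`, which is where most of the time goes. The sketch below keeps the question's own regular expression but calls it once on the whole column. The `short_description` vector is a hypothetical stand-in for `InputData$short_description`.

```r
# Hypothetical stand-in for InputData$short_description
short_description <- c(
  "Restore Request", "Restore Request", "lkj",
  "vm down-grade", "vm down-grade", "remove server from load"
)

# One vectorised call: gregexpr() finds all matches per element,
# regmatches() extracts them as a list of character vectors
pattern <- "((?![0-9]+)[A-Za-z0-9]+)"
matches <- gregexpr(pattern, short_description, perl = TRUE)
tokens <- unlist(regmatches(short_description, matches))

# Frequency of each token, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

With this pattern "down-grade" is counted as the two tokens "down" and "grade", whereas splitting on a single space keeps it as one token.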
Goal
The goal is to create a package that parses R scripts and lists the functions they use (those from the package itself, as mvbutils does, but also imported ones).
Function
The main function relies on parsing R script with
d<-getParseData(x = parse(text = deparse(x)))
Reproducible code
For example, in an interactive R session the output of
x<-test<-function(x){x+1}
d<-getParseData(x = parse(text = deparse(x)))
has these first few lines:
   line1 col1 line2 col2 id parent          token terminal     text
23     1    1     4    1 23      0           expr    FALSE
1      1    1     1    8  1     23       FUNCTION     TRUE function
2      1   10     1   10  2     23            '('     TRUE        (
3      1   11     1   11  3     23 SYMBOL_FORMALS     TRUE        x
4      1   12     1   12  4     23            ')'     TRUE        )
Error
When building a vignette with knitr, either with Knit HTML from RStudio or with devtools::build_vignettes, the output of the previous chunk of code is NULL. On the other hand, calling knitr::knit inside an interactive R session gives the correct output.
Questions:
Is there a reason for the parser to behave differently inside the knit function/environment, and is there a way to bypass this?
Update
Changing code to:
x<-test<-function(x){x+1}
d<-getParseData(x = parse(text = deparse(x),keep.source = TRUE))
Fixes the issue, but this does not answer the question of why the same function behaves differently.
From the help page ?options:
keep.source:
When TRUE, the source code for functions (newly defined or loaded) is stored internally allowing comments to be kept in the right places. Retrieve the source by printing or using deparse(fn, control = "useSource").
The default is interactive(), i.e., TRUE for interactive use.
When building the vignette, you are running a non-interactive R session, so the source code is discarded in parse().
parse(file = "", n = NULL, text = NULL, prompt = "?",
keep.source = getOption("keep.source"), srcfile,
encoding = "unknown")
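The difference can be reproduced in any session by setting keep.source explicitly instead of relying on the option. This is a small sketch; the names `f`, `without_src` and `with_src` are only illustrative.

```r
f <- function(x) { x + 1 }

# Without source references there is no parse data to return
without_src <- getParseData(parse(text = deparse(f), keep.source = FALSE))

# With source references the parse data is available as a data frame
with_src <- getParseData(parse(text = deparse(f), keep.source = TRUE))

is.null(without_src)
head(with_src)
```

Passing keep.source = TRUE explicitly, as in the update above, makes the result independent of whether the session is interactive.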
I think this is a simple problem, but I searched all over the internet and couldn't find anyone else asking this question:
My university has a Condor set-up. I want to run several repetitions of the same code (e.g. 100 times). My R code has a routine to store the results in a file, i.e.:
write.csv(res, file=paste(paste(paste(format(Sys.time(), '%y%m%d'),'res', queue, sep="_"), sep='/'),'.csv',sep='',collapse=''))
res are my results (a data.frame), I indicate that this file contains the results with 'res' and finally I want to add the queue number of this calculation (otherwise files would be replaced, wouldn't they?). It should look like: 140109_res_1.csv, 140109_res_2.csv, ...
My submit file to condor looks like this:
universe = vanilla
executable = /usr/bin/R
arguments = --vanilla
log = testR.log
error = testR.err
input = run_condor.r
output = testR$(Process).txt
requirements = (opsys == "LINUX") && (arch == "X86_64") && (HAS_R_2_13 =?= True)
request_memory = 1000
should_transfer_files = YES
transfer_executable = FALSE
when_to_transfer_output = ON_EXIT
queue 3
How do I get the 'queue' number into my R code? I tried a simple example with
print(queue)
print(Queue)
But there is no object found called queue or Queue. Any suggestions?
Best wishes,
Marco
Okay, I solved the problem. This is how it goes:
I had to change my submit file. I changed the arguments line to:
arguments = --vanilla --args $(Process)
Now the process number is forwarded to the R code, where you retrieve it with the following line. The value is stored as a character string, so you should convert it to a numeric value (also check whether a number like 10 is passed on as '1' and '0', in which case you should collapse the values first).
run <- commandArgs(TRUE)
Here is an example of the code I ran.
> run <- commandArgs(TRUE)
> run
[1] "0"
> class(run)
[1] "character"
> try(as.numeric(run))
[1] 0
> try(run <- as.numeric(paste(run, collapse='')) )
> try(print(run))
[1] 0
> try(write(run, paste(run,'csv', sep='.')))
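Putting the pieces together, the file name from the question could be built roughly as follows. This is only a sketch: `build_result_name` is a hypothetical helper, and it assumes the submit file passes `--args $(Process)` as shown above. Each command-line argument arrives as one string, so a process number such as 10 comes through as "10".

```r
# Hypothetical helper: build the result file name from the process
# number that Condor passes on the command line.
build_result_name <- function(process, date = Sys.Date()) {
  run <- as.integer(process)  # commandArgs() returns character strings
  if (length(run) != 1 || is.na(run)) {
    stop("expected a single integer process number")
  }
  sprintf("%s_res_%d.csv", format(date, "%y%m%d"), run)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1 && grepl("^[0-9]+$", args[1])) {
  out_file <- build_result_name(args[1])
  print(out_file)
  # write.csv(res, file = out_file) would go here in the real script
}
```

Since $(Process) starts at 0, the first job would write a file ending in _res_0.csv.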
You can also find information on how to pass variables/arguments to your code here: http://research.cs.wisc.edu/htcondor/manual/v7.6/condor_submit.html
I hope this helps anyone.
Cheers and thanks for all other commenters!
Marco
I would like to be able to download a .csv file from my Amazon S3 bucket using R.
I have started using the API that is documented here http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html
I am using the httr package to create the GET request; I just need to work out the correct parameters to download the relevant file.
I have set the response-content-type to text/csv, as I know it's a .csv file I hope to download, but the response I get is as follows:
Response [https://s3-zone.amazonaws.com/bucket.name/file.name.csv?response-content-type=text%2Fcsv]
Status: 200
Content-type: text/csv
Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
2007/01/01 22:54:00,5675.00,5676.00,5674.00,5676.00,36
2007/01/01 22:55:00,5675.00,5676.00,5675.00,5676.00,18
2007/01/01 22:56:00,5676.00,5677.00,5674.00,5677.00,64
2007/01/01 22:57:00,5678.00,5678.00,5677.00,5677.00,45
2007/01/01 22:58:00,5679.00,5680.00,5678.00,5680.00,30
.../01/01 22:59:00,5679.00,5679.00,5677.00,5678.00,19
No file is downloaded, and the data seems to be in the response itself. I can extract the character string from the response, and I guess with some effort it can be converted into a data.frame as originally desired. But is there a better way of downloading the data straight from the GET command and then using read.csv to read it? I think it is a parameter issue; I am just not sure which parameters need to be set for the file to be downloaded.
If people suggest converting the string, this is the structure of the string I have. What commands would I need to convert it into a data.frame?
chr "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:52:00,5675."| __truncated__
Thanks
HLM
The answer to your second question:
> chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
> read.csv(text=chr)
Date.and.Time Open High Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673 5673 64
If you want extra speed for the read.csv, try this:
chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
read.csv(text=chr, colClasses=c("POSIXct", rep("numeric", 5) ) )
Assuming the URL is set up properly (and we have nothing to test this on yet), you may want to look at the value of GET(...)$content.
Perhaps:
infile <- read.csv(text=GET(...)$content, colClasses=c("POSIXct", rep("numeric", 5) ) )
Edit:
That was not correct because the data comes across as "raw" format. One needs to convert from raw before it will become encoded as text. I did a quick search of Nabble (it must be good for something after all) to find a csv file that was residing on the Web. This is what finally worked:
read.csv(text=rawToChar(
GET(
"http://nseindia.com/content/equities/scripvol/datafiles/16-11-2012-TO-16-11-2012ACCEQN.csv"
)[["content"]] ) )
Symbol Series Date Prev.Close Open.Price High.Price Low.Price Last.Price Close.Price
1 ACC EQ 16-Nov-2012 1404.4 1410.95 1410.95 1369.45 1374.95 1378.1
Average.Price Total.Traded.Quantity Turnover.in.Lacs Deliverable.Qty X..Dly.Qt.to.Traded.Qty
1 1393.62 132921 1852.41 56899 42.81
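The raw-to-text step can be tried without a network call. In the sketch below, `raw_body` is a hypothetical stand-in for the raw vector that `GET(...)$content` returns, built from the sample string in the question.

```r
csv_text <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"

# Stand-in for the response body: a raw vector, as in GET(...)$content
raw_body <- charToRaw(csv_text)
is.raw(raw_body)

# Convert the raw bytes back to a string, then parse it as CSV
dat <- read.csv(text = rawToChar(raw_body))
str(dat)
```

httr also provides `content(response, as = "text")`, which returns the body as a string and could replace the `rawToChar()` call.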
Here's one way:
library(taRifx) # for stack.list
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
[,1] [,2] [,3] [,4] [,5] [,6]
ret "Date and Time" "Open" "High" "Low" "Close" "Volume\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"
Now convert to a data.frame:
testdat <- stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
rownames(testdat) <- seq(nrow(testdat)) # Because duplicate rownames aren't allowed in data.frames
colnames(testdat) <- testdat[1,]
testdat <- testdat[-1,]
as.data.frame(testdat)
Date and Time Open High Low Close Volume\r
2 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00 64\r
3 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00 64\r
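The trailing "\r" in the last column comes from splitting on "\n" only. A base-R sketch of the same idea that splits on the full "\r\n" line ending instead, and does not use taRifx (a swapped-in approach, not the method above):

```r
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"

# Split into lines on the Windows line ending, then into fields on commas
lines <- strsplit(test, "\r\n", fixed = TRUE)[[1]]
fields <- strsplit(lines, ",", fixed = TRUE)

# First element is the header; bind the remaining rows into a matrix
mat <- do.call(rbind, fields[-1])
colnames(mat) <- fields[[1]]

# All columns are still character at this point
testdat <- as.data.frame(mat, stringsAsFactors = FALSE)
print(testdat)
```

The columns remain character strings here; `read.csv(text = test)` is shorter and converts the types as well.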