Apache log file format analysis by R - r

I was trying to do the analysis of weblog files by R. I am comfortable to deal with the date and bytes, wherever numeric data is present but fail to deal with the strings.
From the log file (log file in CSV format), I want to find out the particular user (with help of IP and Agents) and its total spending on the web page.

There are numurous libraries to do this kind of analysis, although I could find none in R. A google for parse apache logfile yielded a library in Perl, and python parse apache logfile yields the Scratchy library. Both rely on regular expressions to parse the contents of the file.
From here there are two ways to deal with the apache logfile:
Call perl or python from R, either using a direct link, or using a system call (this is simpler).
Take the idea from the perl or python lib and use it to implement R versions of the functions. This will take a lot of time.
You refer to a csv file, but I think the libraries above work with the original text file with the Apache log, so I'd use those, and not your csv file.
In addition, this SO post mentions an answer by #doug (profile) where he states that he has created some functions to create visualizations of apache logfile data, parsed by Python. Maybe you could send him a message or mail and see if he is willing to share the code.

Logfile analysis in R is an interesting topic we had before, you can find our discussion right here. Maybe this discussion might also help you to adjust to the SO etiquette in order to get better feedback (not to take anything away from yours, Paul).

Related

How to capture the names of all files read and written by an R script?

In a project, many separate scripts are executed and they read and write files from / for each other. It has become quite confusing, which file is coming from where. Of course, this is bad software design but that is how this has grown over a long time.
Now, I would to execute all scripts in their proper order and capture which files are read and which ones are written by each script.
Is there, e.g., a way to monitor and log the input and output of the R process while the script is running (from within R)? Or any other ideas for a solution?
I am running R under Windows 10.

Security considerations for read.csv() from web url in R?

A lot of R's read* functions permit urls to external data sources.
Are there any built in protections in R itself or in R's read* functions to protect against malicious files being read in? Or is it a case of 'user-beware'?
What I know so far
We could run some tests - e.g. a file that is not a csv e.g. read.csv(url("http://www.hello.com")) returns the raw HTML for that web page. This seems to indicate that read.csv doesn't validate that the file is a csv, but tries to parse it regardless.
I am not sure what would happen in the hypothetical read.csv(url("www.fake-microsoft.com/nasty-viris.exe")) (where virus.exe is a real virus)? Would read.csv() try to read the asset (in this case an exe file) as a csv, or would it download it successfully first?
Short of setting up a virtual machine and actually testing, I am not sure how to research how much vetting (if any) is done by read* functions on external data sources.

Reading LabVIEW TDMS files with R

As part of a transition from MATLAB to R, I am trying to figure out how to read TDMS files created with National Instruments LabVIEW using R. TDMS is a fairly complex binary file format (http://www.ni.com/white-paper/5696/en/).
Add-ons exist for excel and open-office (http://www.ni.com/white-paper/3727/en/), and I could make something in LabVIEW to make the conversion, but I am looking for a solution that would let me read the TDMS files directly into R. This would allow us to test out the use of R for certain data processing requirements without changing what we do earlier in the data acquisition process. Having a simple process would also reduce the barriers to others trying out R for this purpose.
Does anyone have any experience with reading TDMS files directly into R, that they could share?
This is far from supporting all TDMS specifications but I started a port of a python npTDMS package into R here https://github.com/msuefishlab/tdmsreader and it has been tested out in the context of a shiny app here
You don't say if you need to automate the reading of these files using R, or just convert the data manually. I'm assuming you or your colleagues don't have any access to LabVIEW yourselves otherwise you could just create a LabVIEW tool to do the conversion (and build it as a standalone application or DLL, if you have the professional development system or app builder - you could run the built app from your R code by passing parameters on a command line).
The document on your first link refers to (a) add-ins for OpenOffice Calc and for Excel, which should work for a manual conversion and which you might be able to automate using those programs' respective macro languages, and (b) a C DLL for reading TDMS - would it be possible for you to use one of those?

Can my CGI call R?

I know barely more than zero about R: until yesterday I didn't know how to spell it. But I'm suicidal: for my web site, I'm thinking about letting a visitor type in an R "program" ( is it even called a "program") and then, at submit time, blindly calling the R interpreter from my CGI. I'd then return the interpreter's output to the visitor.
Does this make sense? Or does it amount to useless noise?
If it's workable, what are the pitfalls in this approach? For example, what are the security issues, if any? Is it possible to make R crash, killing my CGI program? Do I have to clean up the R code before calling the interpreter? And the like.
you could take a look to Rserve which allows to execute R scripts via the TCP/IP interface available in PHP for example if I'm not mistaken.
Its just asking for trouble to let people run arbitrary R code on your server. You could try running it in a chroot jail, but these things can be broken out of. Even in a chroot, the R process could delete or alter files, or spawn a long-running process, or download a file to your server, and all manner of nastiness.
You might look at Rweb, which has exactly this behavior: http://www.math.montana.edu/Rweb/
Since you can read and write files in R, it would not be safe to let people run arbitrary R code at your server. I would look if R has something like PHP's safe mode... If not, and if you are root, you can try to run R under user nobody in a chroot (you must also place there packages and libraries - for readonly access, and some temporary directory for RW access).

need help in choosing the right tool

I have a client who has set-up a testing environment in some AI language. It basically runs some predefined test cases and stores the results in as log files (comma separated txt files). My job is to identify and suggest a reporting system and I have these options in mind. either
1. Importing the logs into MSSQL and use the reporting(SSRS) it uses
2. or us import the logs to MySQL and use PHP to develop custom reporting.
I am thinking that going with option2 is better. The reason for this is, the logs are inconsistent and contain unexpected wild characters that normally DB's don't accept. So, I can write some scripts in php before loading them to the database.
Can anyone please suggest if this is your problem what will you suggest to do?
It depends how fancy you need to be. If the data is in CSV files, you could even go so simple as to load it into Excel (or their favorite spreadsheet tool), and use spreadsheet macros to analyze it.

Resources