Maintain data in R package with scheduled sql pulls [closed] - r

I have a personal, non-CRAN R package. Its purpose is to reduce the amount of repeated wrangling I need to do. The data must be pulled from SQL Server, and this pull is done weekly. Where should I put my SQL file, and where should I put the R code that does the scheduled weekly SQL pull?

You will need to create a cron task that loads the package, pulls the data (see here), and rebuilds the package.
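A minimal sketch of what that scheduled job could look like, assuming the package source lives in ~/mypkg, the query is stored in data-raw/pull.sql, and an ODBC DSN called "MyServer" exists (all of these names are hypothetical):
# refresh_data.R -- run weekly, e.g. from crontab:
# 0 6 * * 1 Rscript /home/me/mypkg/data-raw/refresh_data.R
library(RODBC)

pkg_dir <- "~/mypkg"                                  # hypothetical package path
sql     <- paste(readLines(file.path(pkg_dir, "data-raw/pull.sql")), collapse = "\n")

con <- odbcConnect("MyServer")                        # hypothetical DSN
weekly_data <- sqlQuery(con, sql)
close(con)

# store the refreshed data inside the package, then reinstall it
save(weekly_data, file = file.path(pkg_dir, "data/weekly_data.rda"))
install.packages(path.expand(pkg_dir), repos = NULL, type = "source")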

Do you mean you use SQL to pull data from somewhere? What is your data source? Or do you mean you write SQL to create data?
Without knowing the specifics, you have two options:
1. You could use the RODBC package and its sqlQuery() function, e.g.
con <- odbcDriverConnect(...)
Data <- sqlQuery(con, "SELECT * FROM DTtest;")
close(con)
to 'call' your data from your data source using SQL. You can find more information at: https://www.statmethods.net/input/dbinterface.html
This way, you do not need to worry about where to save your SQL file (I assume you mean the data). You only need to make sure your working directory is set correctly for your R script, or at least points to wherever you want your R output to go.
2. Assuming you have a SQL script that pulls data from somewhere else, you can use Windows Task Scheduler (or any other scheduler) to run the script and save its output to a folder, and then use R to grab the data from that directory.
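For example, a minimal sketch of the R side, assuming the scheduled SQL job drops a file called weekly_extract.csv into a shared folder (both names are hypothetical):
extract_dir <- "C:/data/exports"          # hypothetical folder the scheduler writes to
weekly_data <- read.csv(file.path(extract_dir, "weekly_extract.csv"))
str(weekly_data)                          # quick sanity check of the refreshed pull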

Related

Is Git only for Code or also for Data? [closed]

I am running into a bit of a dilemma and I was hoping to be pointed in the right direction.
I have a Git repository with two (self-explanatory) folders: scripts and data. I keep adding new data files to analyze in data, while in scripts I write R scripts to analyze those files.
I track changes in both folders, so I commit additions of new data files to data. This has nothing to do with tracking changes: I just want the scripts and the data to move together, since I work on at least two machines.
I feel like I am using Git improperly, as (with respect to the data folder) I basically use it as a syncing tool.
So my question: is it bad habit to use Git also for data?
I don't think you are doing something particularly awful. Perhaps you could keep data on its own branch and then use it as a submodule or subtree?

How to load my own function automatically, like an R package? [closed]

I built my own function, and each time I need to run it, I have to do so manually. Is there any way that I can load it automatically, like any R package? Or can I build an R package used only by me?
I would recommend Nate Day's solution, but you could also use R's save() and load() functions to do this. They work on all R objects and store them in a binary .rda file. You can store multiple objects as well.
Try:
add <- function(x, y){return(x+y)}
save(add, file = 'add_function.rda')
Whenever you need your function, do:
load('add_function.rda')
And add() will be available in the environment from which you called load() (your global workspace, by default).
There is a package called pkgmaker on CRAN that has a ton of tools and utilities for creating your own packages. As an alternative, you might consider creating a functions.R script to store all of your personally created and often-used functions. You can add the line source('functions.R', local=TRUE) to your programs, scripts or apps, and your functions will be accessible to you (see the sketch below). That's how I handle the issue anyway. Cheers
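A minimal sketch of that pattern, assuming the helper file is called functions.R and sits next to the script that sources it (the file name and helpers are just examples):
# functions.R -- personal helpers collected in one place
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)
se    <- function(x) sd(x) / sqrt(length(x))

# analysis.R -- any script that needs the helpers
source('functions.R', local = TRUE)
clamp(c(-2, 0.5, 3), lo = 0, hi = 1)
se(rnorm(100))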

How R Language works [closed]

I was kind of wondering how the R language works internally when we type some command.
Say license()
As per my understanding, the R language is made of packages: when we execute some command, it invokes the right package. I was not able to find any documentation supporting this.
Research done from my side:
1. The closest I could get is the link below:
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
2. I searched using "How R Language works internally", but I could not get any relevant results.
I have seen documentation of how SQL Server executes a query from start to end; I am looking for a similar kind of documentation or any pointers for R.
Please let me know if you have any pointers.
The notion that the R language is "made of packages" is inaccurate. It is made of commands, operators and functions, like other programming languages. Those commands are grouped into namespaces, which comprise commands that belong to the same topic. A package provides a set of specific commands (and sometimes other objects, like sample data) grouped into a namespace. By loading a package with library() (there are subtle semantic differences between a library and a package), the package's namespace is attached to the search path, thereby making these commands directly accessible.
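A short illustration of the difference between calling through a namespace and attaching a package, assuming the recommended MASS package is installed (as it is in a standard R distribution):
MASS::mvrnorm(n = 3, mu = c(0, 0), Sigma = diag(2))   # use the namespace without attaching it
library(MASS)                                         # attach the package to the search path
mvrnorm(n = 3, mu = c(0, 0), Sigma = diag(2))         # now the function is directly accessible
search()                                              # "package:MASS" appears on the search path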
On the suggestion of @CapelliC, here is a fully typed answer.
The internals of R are included in the document: https://cran.r-project.org/doc/manuals/r-release/R-ints.html
It is not an easy read, but it covers all of the detail. My advice would be to search the document if you have a specific query.

Proper Coding Techniques in R [closed]

I am writing R code that has many different functions that I will eventually want to use all together on different data sets.
As I keep building functions it seems to be getting harder to keep track of everything in my script.
My question is: is it proper R coding practice to break functions into separate R scripts, or should it all be in one massive script?
Thank you for your help. This is my first time trying to code something this large!
-B
Yes, you can store your functions in multiple R scripts.
If you need to call them, you can use source().
For example, say you have func1 and func2 saved in myfunc.R.
To call them:
source('myfunc.R')
# other code
func1()
func2()
Whether this approach is recommended depends on your project requirements.
Alternatively, you can consider packaging them as recommended by Richard.
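If the functions end up spread across many scripts, one common pattern is to source everything in a folder at once. A minimal sketch, assuming the scripts live in a subdirectory called R/ (the folder name is just an example):
for (f in list.files('R', pattern = '\\.R$', full.names = TRUE)) {
  source(f)   # load every helper script in the R/ folder
}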

Bigdata use cases [closed]

We are trying to create a dashboard using Big Data. The data are currently transacted in SQL Server and the front end is in MVC. As the data flow is too high to analyse in SQL Server itself, it was decided to use Big Data tooling. I have chosen Cloudera Manager (CDH), with Sqoop to import data from SQL Server into Hive, and Impala to run the analytics. We decided to serve up the results with MicroStrategy to provide the charts on a mobile platform to the clients.
Any ideas or suggestions to improve the process are welcome.
Looks like you're off to a great start. Remember your analytics can be done with a multitude of tools, not just Impala.
Once you're in Hadoop, Hive and Pig give you a lot of power (more is available with UDFs) with an easy learning curve.
If you eventually want to do some iterative use cases (and exploit machine learning), you might want to check out Spark (those two things are in its wheelhouse), which is not constrained to MapReduce.
Tons of great tools available. Enjoy the journey.
I would consider using two stages: Data Analysis and Data Visualisation.
Using two stages makes the solution more flexible and decouples the responsibilities.
Data Analysis
Ingest the data (including cleaning). Sqoop can handle the ingest step; extra steps might be required to clean the data.
Explore/analyse the data. Apache Spark is a very flexible and powerful tool for this.
Store the analysis results in a specified format.
Data Visualisation
Load the data from the data analysis phase.
Visualise it using Highcharts/Kibana/Dashing, or use D3 to create a customised dashboard.
