I am trying to use the free tier of Amazon Web Services EC2 with Ubuntu and R. I created a simple R script that should read a small CSV input file from one folder, perform a trivial operation, and write the output to a CSV file in a separate folder. However, the output CSV file is not being created.
Here are the contents of the R file:
my.data <- read.csv('/my_cloud_input_file_test/my_input_test_data_Nov22_2019.csv')
my.data$c <- my.data$a + my.data$b
write.csv(my.data, '/my_cloud_output_file_test/my_output_test_data_Nov22_2019.csv', row.names = FALSE, quote = FALSE)
Here are the contents of the input data file:
a,b
100,12
200,22
300,32
400,42
500,52
Here are the only two lines I used in PuTTY after connecting to the instance:
ubuntu@ip-122-31-22-243:~$ sudo su
root@ip-122-31-22-243:/home/ubuntu# R CMD BATCH Cloud_test_R_file_Nov22_2019.R
The R file is located in the ubuntu folder according to FileZilla, as are my input and output folders.
Can someone please point out my mistake? If I put the R file and the input data set both in the ubuntu folder, then the output data set is created in the ubuntu folder without my having to use a setwd statement (after I modify the read.csv and write.csv statements to drop the input and output folder names). So I am not using a setwd statement here. If I need one, what should it be?
Sorry for such a trivial question.
This code worked:
setwd('/home/ubuntu/')
my.data <- read.csv('retest_input_data/my_input_data_Nov24_2019.csv')
my.data$c <- my.data$a + my.data$b
write.csv(my.data, 'retest_output_data/my_output_data_Nov24_2019.csv', row.names = FALSE, quote = FALSE)
PuTTY line:
ubuntu@ip-122-31-22-243:~$ R CMD BATCH retest_R_file_Nov24_2019.R
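For anyone hitting the same problem: the original paths began with `/`, which R treats as absolute paths resolved from the filesystem root, not from `/home/ubuntu`. A minimal base-R sketch of the distinction (the file names are illustrative):

```r
# A path starting with "/" is absolute: it is resolved from the
# filesystem root, so "/my_cloud_input_file_test/..." looks for a
# folder directly under "/", not under /home/ubuntu.
abs_path <- "/my_cloud_input_file_test/my_input_test_data.csv"

# A path without the leading "/" is relative to the working directory
# (shown by getwd()), which is why setwd('/home/ubuntu/') fixed it.
rel_path <- "my_cloud_input_file_test/my_input_test_data.csv"

# file.path() builds the intended absolute path explicitly:
full_path <- file.path("/home/ubuntu", rel_path)
```

Either form works, as long as the path actually points at `/home/ubuntu/...` and not at the root directory.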
I want to read the CSV file "mydata.csv" as input and create the output in the same directory using R. I have hard-coded the paths for the CSV input (Domain_test.csv) and the output (MyData.csv) as shown below. But I will have to share the same Rscript and the corresponding CSV files with one of the users so that he/she can execute it and get the results. I want the user to be able to select whatever path he/she wants, and have the script run without hard-coding the input/output paths.
How it should be done in R?
# read the CSV from the input/output directory
data <- read.csv("C:/Users/Desktop/input_output_directory/Domain_test.csv")
# generate the output in the same directory
write.csv(data, "C:/Users/Desktop/input_output_directory/MyData.csv", row.names = FALSE)
You can use
wd <- choose.dir(default = "", caption = "Select folder")
setwd(wd)
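One caveat: choose.dir() only exists in R for Windows. A hedged cross-platform sketch of the same idea (the function name and the plain-text fallback prompt are just for illustration):

```r
# choose.dir() is available only on Windows builds of R.
# tcltk::tk_choose.dir() offers a similar folder dialog elsewhere.
pick_dir <- function() {
  if (exists("choose.dir")) {
    choose.dir(default = "", caption = "Select folder")
  } else if (requireNamespace("tcltk", quietly = TRUE)) {
    tcltk::tk_choose.dir(caption = "Select folder")
  } else {
    # last resort: ask for the path on the console
    readline("Enter the input/output directory path: ")
  }
}

# wd <- pick_dir()
# setwd(wd)
```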
I am new to using the Jupyter notebook with R kernel.
I have R code written in two files Settings.ipynb and Main_data.ipynb.
My Settings.ipynb file has a lot of details. I am showing sample details below
Schema = "dist"
resultsSchema = "results"
sourceName = "hos"
dbms = "postgresql" #Should be "sql server", "oracle", "postgresql" or "redshift"
user <- "hos"
pw <- "hos"
server <- "localhost/hos"
port <- "9763"
I would like to source the Settings file in the Main_data code file.
When I was using RStudio, it was easy, as I just used the line below:
source('Settings.R')
But now in Main_data Jupyter Notebook with R kernel, when I write the below piece of code
source('Settings.R') # settings file is in same directory as main_data file
I get the below error
Error in source("Settings.R"): Settings.R:2:11: unexpected '['
1: {
2: "cells": [
^
Traceback:
1. source("Settings.R")
When I try the below, I get another error as shown below
source('Settings.ipynb')
Error in source("Settings.ipynb"): Settings.ipynb:2:11: unexpected '['
1: {
2: "cells": [
^
Traceback:
1. source("Settings.ipynb")
How can I source R code, and what is the right way to save it (.ipynb or .R format) in a Jupyter notebook that uses the R kernel? Can you help me with this, please?
We could create a .INI file in the same working directory (or a different one) and use ConfigParser to parse all the elements. The .INI file would be:
Settings.INI
[settings-info]
schema = dist
resultsSchema = results
sourceName = hos
dbms = postgresql
user = hos
pw = hos
server = localhost/hos
Then we initialize a parser object and read the contents from the file. There can be multiple subheadings (here there is only 'settings-info'), and we extract the components using either [[ or $:
library(ConfigParser)
props <- ConfigParser$new()
props <- props$read("Settings.INI")$data
props[["settings-info"]]$schema
Trying to save a Jupyter notebook file in .R format will not work, because the saved file is still notebook JSON (it starts with something like { "cells": [ ... ). You can verify this by opening your .R file in a text editor.
However, you can use a vim editor/RStudio to create a .R file. This gives you the contents as-is, without any of the JSON wrapping such as { "cells": [ ... .
Later, from another Jupyter notebook, you can import/source the .R file created using the vim editor/RStudio. This resolved the issue for me.
In summary, don't use a Jupyter notebook to create .R files and then source them from another Jupyter notebook file.
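Since an .ipynb file is just JSON, another workaround is to extract the code cells and write them to a plain .R file that source() can read. A sketch, assuming the jsonlite package is installed and Settings.ipynb is in the working directory (the function name is my own):

```r
library(jsonlite)

# A notebook is JSON with a "cells" element; each code cell stores its
# lines in "source". Keep only the code cells, joined into one string each.
extract_code <- function(ipynb_path) {
  nb <- fromJSON(ipynb_path, simplifyVector = FALSE)
  chunks <- vapply(nb$cells, function(cell) {
    if (identical(cell$cell_type, "code"))
      paste(unlist(cell$source), collapse = "")
    else ""
  }, character(1))
  chunks[nzchar(chunks)]  # drop markdown/empty cells
}

# writeLines(extract_code("Settings.ipynb"), "Settings.R")
# source("Settings.R")
```

Equivalently, running `jupyter nbconvert --to script Settings.ipynb` from a terminal should produce the same plain script.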
I'm using "studio (preview)" from Microsoft Azure Machine Learning to create a pipeline that applies machine learning to a dataset in a blob storage that is connected to our data warehouse.
In the "Designer", an "Execute R Script" action can be added to the pipeline. I'm using this functionality to execute some of my own machine learning algorithms.
I've got a 'hello world' version of this script working (including using the "script bundle" to load the functions in my own R files). It applies a very simple manipulation (computing the difference in days between the date column and 'today') and stores the output as a new file. Given that the exported file has the correct information, I know that the R script works well.
The script looks like this:
# R version: 3.5.1
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
# The entry point function can contain up to two input arguments:
#   Param<medals>: an R DataFrame
#   Param<matches>: an R DataFrame
azureml_main <- function(dataframe1, dataframe2) {
  message("STARTING R script run.")

  # If a zip file is connected to the third input port, it is
  # unzipped under "./Script Bundle". This directory is added
  # to sys.path.
  message('Adding functions as source...')
  if (FALSE) {
    # This works...
    source("./Script Bundle/first_function_for_script_bundle.R")
  } else {
    # And this works as well!
    message('Sourcing all available functions...')
    functions_folder <- './Script Bundle'
    list_of_R_functions <- list.files(path = functions_folder, pattern = "^.*[Rr]$",
                                      include.dirs = FALSE, full.names = TRUE)
    for (fun in list_of_R_functions) {
      message(sprintf('Sourcing <%s>...', fun))
      source(fun)
    }
  }

  message('Executing R pipeline...')
  dataframe1 <- calculate_days_difference(dataframe = dataframe1)

  # Return datasets as a Named List
  return(list(dataset1 = dataframe1, dataset2 = dataframe2))
}
And although I do print some messages in the R script, I haven't been able to find the "stdoutlogs" or the "stderrlogs" that should contain these printed messages.
I need the printed messages for 1) information on how the analysis went and -most importantly- 2) debugging in case the code failed.
Now, I have found (in multiple locations) the files "stdoutlogs.txt" and "stderrlogs.txt". These can be found under "Logs" when I click on "Execute R Script" in the "Designer".
I can also find "stdoutlogs.txt" and "stderrlogs.txt" files under "Experiments" when I click on a finished "Run" and then both under the tab "Outputs" and under the tab "Logs".
However... all of these files are empty.
Can anyone tell me how I can print messages from my R Script and help me locate where I can find the printed information?
Can you please click on the "Execute R Script" module and download the 70_driver.log? I tried message("STARTING R script run.") in an R sample and found the output there.
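One detail worth knowing when hunting for these logs: message() writes to the stderr stream, while cat() and print() write to stdout, so the two kinds of output can land in different log files. A small base-R illustration (the helper name is just for this example):

```r
# message() goes to stderr (stderrlogs / driver log);
# cat() and print() go to stdout (stdoutlogs).
log_both <- function(txt) {
  message("via stderr: ", txt)
  cat("via stdout:", txt, "\n")
}

# Capturing the message stream separately shows the split:
err <- capture.output(log_both("hello"), type = "message")
```

So if stdoutlogs.txt looks empty, the message() output may simply be sitting in the stderr-side log instead.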
I have a directory of 50 files; here's an excerpt showing how the files are named:
input1.txt
input2.txt
input3.txt
input4.txt
I'm writing the script in R, but I'm running bash commands inside it via system().
I have a system command X that takes one input file and writes one output file.
example :
X input1.txt output1.txt
I want input1.txt to output to output1.txt, input2.txt to output to output2.txt etc..
I've been trying this:
for(i in 1:50)
{
setwd("outputdir");
create.file(paste("output",i,".txt",sep=""));
setwd("homedir");
system(paste("/usr/local/bin/command" , paste("input",i,".txt",sep=""),paste("/outputdir/output",i,".txt",sep="")));
}
What am I doing wrong? I'm getting an error on the system line that says "incorrect string constant", and I don't get it. Did I apply the system command in the wrong manner?
Is there a way to get all the input files and output files without going through the paste command to get them inside system?
There is a pretty easy method in R to copy files to a new directory without using system commands. This also has the benefit of working across operating systems (you just have to change the file paths).
Modified code from: "Copying files with R" by Amy Whitehead
Using your method of running over files 1:50, I have some pseudocode here. You will need to change current.folder and new.folder to match your own directories.
# identify the folders (adjusted to the directory names used in the question)
current.folder <- "homedir"
new.folder <- "outputdir"
# find the files that you want
i <- 1:50
# Instead of looping we can use vector pasting to get multiple results at once!
inputfiles <- paste0(current.folder,"/input",i,".txt")
outputfiles <- paste0(new.folder,"/output",i,".txt")
# copy the files to the new folder
file.copy(inputfiles, outputfiles)
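If the goal is to actually run the external command for each file pair (rather than just copy files), the same vectorized name-building works, and system2() sidesteps the string-quoting trouble in the question's paste() call. A sketch, using the command path from the question:

```r
# sprintf() builds all 50 names at once; file.path() joins directories.
input_files  <- sprintf("input%d.txt", 1:50)
output_files <- file.path("outputdir", sprintf("output%d.txt", 1:50))

cmd <- "/usr/local/bin/command"
if (nzchar(Sys.which(cmd))) {   # run only if the command actually exists
  for (i in seq_along(input_files)) {
    # system2() takes the arguments as a character vector, avoiding the
    # hand-built command string (and its quoting pitfalls) entirely.
    system2(cmd, args = c(input_files[i], output_files[i]))
  }
}
```

There is also no need to pre-create the output files: the command writes them itself.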
I would like to be able to open files quickly in Excel after saving them. I learned how from "R opening a specific worksheet in an excel workbook using shell.exec" on SO.
On my Windows system, I can do so with the following code, which could perhaps be turned into a function along the lines of `saveOpen <- function(...) {...}`. However, I suspect there are better ways to accomplish this modest goal.
I would appreciate any suggestions to improve this multi-step effort.
# create tiny data frame
df <- data.frame(names = c("Alpha", "Baker"), cities = c("NYC", "Rome"))
# save the data frame to an Excel file in the working directory
save.xls(df, filename = "test file.xlsx")
# I have to reenter the file name and add a forward slash for the paste() command below to create a proper file path
name <- "/test file.xlsx"
# add the working directory path to the file name
file <- paste0(getwd(), name)
# with shell and .exec for Windows, open the Excel file
shell.exec(file = file)
Do you just want to create a helper function to make this easier? How about
save.xls.and.open <- function(dataframe, filename, ...) {
  save.xls(dataframe, filename = filename, ...)
  cmd <- file.path(getwd(), filename)
  shell.exec(cmd)
}
then you just run
save.xls.and.open(df, filename = "testfile.xlsx")
I guess it doesn't seem like all that many steps to me.
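One caveat: shell.exec() is Windows-only. A hedged cross-platform sketch of the same "open with the default application" idea (it assumes `open` on macOS and `xdg-open` on Linux are on the PATH):

```r
# Open a file with the system's default application.
open_file <- function(path) {
  path <- normalizePath(path, mustWork = TRUE)  # fail early on a bad path
  if (.Platform$OS.type == "windows") {
    shell.exec(path)
  } else {
    opener <- if (Sys.info()[["sysname"]] == "Darwin") "open" else "xdg-open"
    system2(opener, shQuote(path))
  }
}
```

shQuote() keeps file names with spaces (like "test file.xlsx") intact when they are handed to the shell.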