Passing R script variables into a batch script - r

In an R script, I'm executing a batch file inside of a for loop.
for (i in 1:2) {
  shell(shQuote("\\\\NETWORK\\PATH\\TO\\THE\\FILE.BAT", "cmd"))
}
The batch script creates data and moves it to a SQL table that looks like this:
| Name | Version | Category | Value | Number | Replication |
|:-----:|:-------:|:--------:|:-----:|:------:|:-----------:|
| File1 | 1.0 | Time | 123 | 1 | 1 |
| File1 | 1.0 | Size | 456 | 1 | 1 |
| File2 | 1.0 | Time | 312 | 1 | 1 |
| File2 | 1.0 | Size | 645 | 1 | 1 |
| File1 | 1.0 | Time | 369 | 1 | 2 |
| File1 | 1.0 | Size | 258 | 1 | 2 |
| File2 | 1.0 | Time | 741 | 1 | 2 |
| File2 | 1.0 | Size | 734 | 1 | 2 |
| File1 | 1.1 | Time | 997 | 2 | 1 |
| File1 | 1.1 | Size | 997 | 2 | 1 |
| File2 | 1.1 | Time | 438 | 2 | 1 |
| File2 | 1.1 | Size | 735 | 2 | 1 |
| File1 | 1.1 | Time | 786 | 2 | 2 |
| File1 | 1.1 | Size | 486 | 2 | 2 |
| File2 | 1.1 | Time | 379 | 2 | 2 |
| File2 | 1.1 | Size | 943 | 2 | 2 |
| File1 | 1.2 | Time | 123 | 3 | 1 |
| File1 | 1.2 | Size | 456 | 3 | 1 |
| File2 | 1.2 | Time | 312 | 3 | 1 |
| File2 | 1.2 | Size | 645 | 3 | 1 |
| File1 | 1.2 | Time | 369 | 3 | 2 |
| File1 | 1.2 | Size | 258 | 3 | 2 |
| File2 | 1.2 | Time | 741 | 3 | 2 |
| File2 | 1.2 | Size | 734 | 3 | 2 |
| File1 | 1.3 | Time | 997 | 4 | 1 |
| File1 | 1.3 | Size | 997 | 4 | 1 |
| File2 | 1.3 | Time | 438 | 4 | 1 |
| File2 | 1.3 | Size | 735 | 4 | 1 |
However, I'd like the Number and Replication columns to be set in the R script, not in the batch file.
I know I can do that like this:
Replication <- i
Number <- as.integer(sqlQuery(dbhandle, "select max(Number) from Table"))
Number <- ifelse(is.na(Number), 1, Number + 1)
My question is: how can I pass these variables into the batch script? Can a batch script take parameters?
Ideally, my batch script would contain something similar to this:
set Rep=[Replication variable from R]
set Num=[Number variable from R]
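
One way this could work, as a minimal sketch: batch files read positional arguments as %1, %2, and so on, so the two values can be appended to the command line that R passes to shell(). The helper name build_batch_call is hypothetical, and the snippet only builds and prints the command string; the shell() call is left as a comment because it runs on Windows only.

```r
# Build the command line that runs a batch file with two positional arguments.
# `build_batch_call` is a hypothetical helper name, not part of base R.
build_batch_call <- function(bat_path, replication, number) {
  sprintf("%s %d %d",
          shQuote(bat_path, type = "cmd"),  # wrap the path in double quotes
          as.integer(replication),
          as.integer(number))
}

cmd <- build_batch_call("\\\\NETWORK\\PATH\\TO\\THE\\FILE.BAT", 2, 5)
print(cmd)

# On Windows, the command would then be run from inside the loop with:
#   shell(cmd)
# and the batch file would pick up the values as positional parameters:
#   set Rep=%1
#   set Num=%2
```

Inside the for loop, Replication and Number would simply replace the literal 2 and 5.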

Related

R combine 3 dataframes and perform operations

I have 3 dataframes with different numbers of rows. I want to perform some operations on 2 of the dataframes based on the row values in the third.
dataframe 1:
+--------------------------+
| V1 Particlei Particlej |
+--------------------------+
| <chr> <dbl> <dbl> |
| 1 conf10 6 1829 |
| 2 conf10 6 13928 |
| 3 conf10 8 2875 |
| 4 conf10 8 13765 |
| 5 conf10 9 3184 |
| 6 conf10 9 11139 |
+--------------------------+
dataframe 2
+----------+----------+------------+-------------+
| V1 | cluster | position.x | position.y |
+----------+----------+------------+-------------+
| <chr> | <dbl> | <dbl> | <dbl> |
| 1 conf10 | 6 | 0.000659 | 0.00932 |
| 2 conf10 | 8 | 0.0291 | 0.00922 |
| 3 conf10 | 10 | 0.0101 | 0.00380 |
| 4 conf10 | 12 | -0.0103 | 0.00379 |
| 5 conf10 | 14 | 0.0165 | 0.000900 |
| 6 conf10 | 16 | -0.000554 | 0.0112 |
+----------+----------+------------+-------------+
and dataframe 3
+----------+----------+--------------------+------------+
| V1 | cluster | position.x | position.y |
+----------+----------+--------------------+------------+
| <chr> | <dbl> | <dbl> | <dbl> |
| 1 conf9 | 7 | -0.0104 | 0.000920 |
| 2 conf9 | 9 | -0.00426 | 0.0139 |
| 3 conf9 | 11 | 0.0249 | 0.0164 |
| 4 conf9 | 13 | -0.0146 | 0.00242 |
| 5 conf9 | 15 | -0.0176 | 0.00220 |
| 6 conf9 | 17 | -0.0183 | 0.00620 |
+----------+----------+--------------------+------------+
I want to work row by row through data1. For each row, I want to check whether the values in columns Particlei and Particlej are present in the cluster column of data2 and data3. If they are, I want to perform some operations on the matching rows of data2 and data3. For example, row 1 of data1 holds 6 and 1829, so I want to select the rows of data2 and data3 whose cluster is 6 or 1829 and, for those rows, subtract position.x of data3 from position.x of data2, and likewise for position.y. This should be done for every row of data1. What I have tried so far:
for (i in row_number(data3)) {
  y <- data1 %>% filter(any(data3[, 1:2] == data2$cluster))
  if (any(data2$cluster == data3[, 1:2])) {
    while (any(data2$cluster == data3[, 1])) {
      delta_x = data2$position.x - data1$position.x
      delta_y = data2$position.y - data1$position.y
    }
  }
}
expected output
+---------------+------------+-------------------+-------------------+------------------+------------------+-----------+-------------------------------------------------+-----------+-----------+
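| V1 | cluster | position.x_data3 | position.y_data3 | position.x_data2 | position.y_data2 | delta.x | delta.y | particlei | particlej |
+---------------+------------+-------------------+-------------------+------------------+------------------+-----------+-------------------------------------------------+-----------+-----------+
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | | |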
| 1 conf9,10 | 6 | -0.0104 | 0.000920 | 0.000659 | 0.00932 | -0.011059 | -0.0084 | 6 | 1829 |
| 2 conf9,10 | 1829 | -0.00426 | 0.0139 | 0.000659 | 0.000659 | 0.000659 | 0.000575 | 6 | 1829 |
| 3 conf9,10 | 7 | 0.0249 | 0.0164 | ... | ... | ... | some values subtracted between position columns | 7 | 13928 |
| 4 conf9,10 | 13928 | -0.0146 | 0.00242 | some values | some values | ... | ... | 7 | 13928 |
+---------------+------------+-------------------+-------------------+------------------+------------------+-----------+-------------------------------------------------+-----------+-----------+
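
The lookup described above can be sketched in base R with two merges, under some loud assumptions: the data below are made-up toy values in which every particle id appears in the cluster column of both data2 and data3 (the excerpts in the question do not overlap), and the deltas follow the expected output, which works out to data3 minus data2. The pair column is a hypothetical helper that records which row of data1 each result belongs to.

```r
# Toy data (hypothetical values); cluster ids are shared by data2 and data3
data1 <- data.frame(V1 = "conf10",
                    Particlei = c(6, 8),
                    Particlej = c(1829, 2875))
data2 <- data.frame(cluster = c(6, 8, 1829, 2875),
                    position.x = c(0.10, 0.20, 0.30, 0.40),
                    position.y = c(1.0, 2.0, 3.0, 4.0))
data3 <- data.frame(cluster = c(6, 8, 1829, 2875),
                    position.x = c(0.15, 0.10, 0.35, 0.45),
                    position.y = c(1.5, 2.5, 2.0, 4.0))

# One row per (data1 row, particle): the particle id becomes the lookup key
pair <- seq_len(nrow(data1))
long <- rbind(
  data.frame(pair = pair, particlei = data1$Particlei,
             particlej = data1$Particlej, cluster = data1$Particlei),
  data.frame(pair = pair, particlei = data1$Particlei,
             particlej = data1$Particlej, cluster = data1$Particlej)
)

# Inner joins keep only particles found in both data2 and data3
res <- merge(long, data2, by = "cluster")
res <- merge(res, data3, by = "cluster", suffixes = c("_data2", "_data3"))

# Deltas in the direction used by the expected output (data3 - data2)
res$delta.x <- res$position.x_data3 - res$position.x_data2
res$delta.y <- res$position.y_data3 - res$position.y_data2

res <- res[order(res$pair, res$cluster), ]
print(res)
```

Because merge() does an inner join by default, particles missing from either table drop out silently; all.x = TRUE would keep them with NA positions instead.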

Replacing multiple observations in multiple columns

I have two dataframes: one with the original information and a second one with corrections to some of those observations. I would like to write a function, or find some other way, to replace the values in multiple columns of the first dataframe with the new information I received. I have an ID that identifies the observations to be replaced, but since so many columns will change for certain IDs, I don't know what the appropriate way of changing them is.
My first dataframe has 500 columns and 1000 observations, and my second dataframe has 100 columns and 800 observations that will change the original dataframe. I don't know how to replace those values efficiently according to the ID.
Here is an example of what the two dataframes look like. I need to replace just some values in multiple columns, and a merge is not the most efficient option, since at least 100 columns will need changes in some of the observations.
I just need to insert the new information and keep the rest of the old information.
Dataframe 1
|ID | X1 | X2 | X3 | X4 | XN |
|a1 | 1 | 1 | 1 | 1 | 1 |
|a2 | 2 | 2 | 2 | 2 | 2 |
|a3 | 3 | 3 | 3 | 3 | 3 |
|a4 | 4 | 4 | 4 | 4 | 4 |
|a5 | 5 | 5 | 5 | 5 | 5 |
|an | 6 | 6 | 6 | 6 | 6 |
dataframe 2
|ID | X1 | X2 | X4|
|a1 | 8 | | 4 |
|a3 | | | 2 |
|a4 | 2 | 9 | |
|an | 1 | | 3 |
The outcome should have the old values of dataframe 1 just with the replacements I got from dataframe 2
outcome
|ID | X1 | X2 | X3 | X4 | XN |
|a1 | 8 | 1 | 1 | 4 | 1 |
|a2 | 2 | 2 | 2 | 2 | 2 |
|a3 | 3 | 3 | 3 | 2 | 3 |
|a4 | 2 | 9 | 4 | 4 | 4 |
|a5 | 5 | 5 | 5 | 5 | 5 |
|an | 1 | 6 | 6 | 3 | 6 |
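
A possible approach, sketched in base R: locate each correction row in the original dataframe with match() on the ID, then overwrite only the cells where the correction is not missing. This assumes the blank cells of dataframe 2 are read in as NA; update_by_id is a hypothetical helper name.

```r
df1 <- data.frame(ID = c("a1", "a2", "a3", "a4", "a5", "an"),
                  X1 = 1:6, X2 = 1:6, X3 = 1:6, X4 = 1:6, XN = 1:6)
# Blank cells in the corrections are assumed to be NA
df2 <- data.frame(ID = c("a1", "a3", "a4", "an"),
                  X1 = c(8, NA, 2, 1),
                  X2 = c(NA, NA, 9, NA),
                  X4 = c(4, 2, NA, 3))

# Hypothetical helper: overwrite cells of `original` with the non-NA cells
# of `corrections`, matching rows on the `id` column.
update_by_id <- function(original, corrections, id = "ID") {
  rows <- match(corrections[[id]], original[[id]])
  known <- !is.na(rows)  # ignore correction IDs absent from the original
  shared <- intersect(setdiff(names(corrections), id), names(original))
  for (col in shared) {
    new_values <- corrections[[col]][known]
    has_value <- !is.na(new_values)
    original[[col]][rows[known][has_value]] <- new_values[has_value]
  }
  original
}

outcome <- update_by_id(df1, df2)
print(outcome)
```

The loop runs once per corrected column rather than once per cell, so 100 columns by 800 rows should stay cheap.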

Modifying a function for an EXPSS summary

Hi, I am trying to create a function for an EXPSS table. Sample data is below:
dput( df<-data.frame(
aa = c("q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c","q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c"),
col1=c(1,2,3,2,1,2,3,4,4,4,5,3,4,2,1,2,5,3,2,1,2,4,2,1,3,2,1,2,3,1,2,3,4,4,4,1,2,5,3,5),
col2=c(2,1,1,7,4,1,2,7,5,7,2,6,2,2,6,3,4,3,2,5,7,5,6,4,4,6,5,6,4,1,7,7,2,7,7,2,3,7,2,4)
)
)
The function I created is:
sum1 <- cro_cpct(df1[[1]],df2[[2]])
}
Now I want to add a criterion on the total to this function: if the total of a column falls in (3, 4, 5), then the whole column should be replaced by "--".
Something like this:
library(expss)
dataa<-data.frame(
aa = c("q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c","q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c"),
col1=c(1,2,3,2,1,2,3,4,4,4,5,3,4,2,1,2,5,3,2,1,2,4,2,1,3,2,1,2,3,1,2,3,4,4,4,1,2,5,3,5),
col2=c(2,1,1,7,4,1,2,7,5,7,2,6,2,2,6,3,4,3,2,5,7,5,6,4,4,6,5,6,4,1,7,7,2,7,7,2,3,7,2,4)
)
tab1 <- cro_cpct(dataa$aa,dataa$col1)
total_row = grep("#", tab1[[1]])
tab1[total_row, -1] = ifelse(tab1[total_row, -1]<8, "--", tab1[total_row, -1])
tab1
# | | | dataa$col1 | | | | |
# | | | 1 | 2 | 3 | 4 | 5 |
# | -------- | ------------ | ---------- | ---- | ---- | ---- | -- |
# | dataa$aa | c | 12.5 | | | | 25 |
# | | d | 12.5 | | 37.5 | | |
# | | g | 12.5 | | 12.5 | | |
# | | h | | | 12.5 | | 25 |
# | | k | 12.5 | | | 12.5 | |
# | | l | | 8.3 | | | 25 |
# | | n | 12.5 | | 12.5 | 25.0 | |
# | | q | 12.5 | 8.3 | | | |
# | | r | | 8.3 | | 12.5 | |
# | | s | | 8.3 | | 37.5 | |
# | | t | | 8.3 | | 12.5 | |
# | | u | 12.5 | 8.3 | | | |
# | | v | 12.5 | 8.3 | | | |
# | | x | | 8.3 | 12.5 | | |
# | | y | | 33.3 | 12.5 | | 25 |
# | | #Total cases | 8.0 | 12.0 | 8.0 | 8.0 | -- |

r increment column value based on another column value

I have a datatable x like this
+----+---------------+-------+
| id | arg | value |
+----+---------------+-------+
| 1 | New Day | NA |
| 2 | Eat breakfast | 3 |
| 3 | Bike | 45 |
| 4 | New Day | 0 |
| 5 | Get coffee | 1 |
| 6 | Exercise | 15 |
| 7 | Get beer | NA |
| 8 | New Day | |
| 9 | Pet cat | |
+----+---------------+-------+
I would like to add an incrementing column for every day to get something like this
+----+---------------+-------+-----+
| id | arg | value | day |
+----+---------------+-------+-----+
| 1 | New Day | NA | 1 |
| 2 | Eat breakfast | 3 | 1 |
| 3 | Bike | 45 | 1 |
| 4 | New Day | 0 | 2 |
| 5 | Get coffee | 1 | 2 |
| 6 | Exercise | 15 | 2 |
| 7 | Get beer | NA | 2 |
| 8 | New Day | | 3 |
| 9 | Pet cat | | 3 |
+----+---------------+-------+-----+
I have tried this without much success
x$day <-0
x<-within(x, day<-ifelse(arg == "New day", day+1, day))
As pointed out by @A.Webb:
x$day <- cumsum(x$arg == "New Day")
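
To see why the cumulative sum works, here is a small self-contained sketch on the table above. The blank value cells are assumed to be NA, and the comparison string has to match the capitalisation in the data ("New Day").

```r
x <- data.frame(
  id = 1:9,
  arg = c("New Day", "Eat breakfast", "Bike", "New Day", "Get coffee",
          "Exercise", "Get beer", "New Day", "Pet cat"),
  value = c(NA, 3, 45, 0, 1, 15, NA, NA, NA)  # blanks assumed to be NA
)

# TRUE counts as 1, so the running total goes up by one at every "New Day"
x$day <- cumsum(x$arg == "New Day")
print(x)
```

The ifelse() attempt cannot work because it evaluates each row independently and never carries the counter forward from one row to the next.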

SQLite query select best option depending on a max value

I have a probably pretty hard question/situation:
I have a database for dividing several tasks among some workers.
In the following example I have two tasks (Task 1 and Task 2) and four employees (1, 2, 3 and 4).
At most three employees can work on one task, so I have 3 employee columns to cover all possible options (in this example, not every option is shown!). The last column is a value that indicates how good the option is (the higher the number, the better).
The goal is to find the optimal situation, which means:
Every employee has to do exactly one task (and cannot do 2 tasks)
The sum of the values is as high as possible
+------------+------------+------------+------+--------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+------------+------------+------+--------+
| 1 | | | 1 | 5.0 |
| 2 | | | 1 | -2.5 |
| 3 | | | 1 | 1.0 |
| 4 | | | 1 | 0.5 |
| 1 | 2 | | 1 | 0.5 |
| 1 | 4 | | 1 | 5.0 |
| 1 | 2 | 3 | 1 | 0.33 |
| 2 | 3 | | 1 | -4.5 |
| 2 | 3 | 4 | 1 | -6.5 |
| 3 | 4 | | 1 | 3.0 |
| 1 | | | 2 | 1.0 |
| 2 | | | 2 | 2.0 |
| 3 | | | 2 | -5.0 |
| 4 | | | 2 | 3.0 |
| 1 | 2 | | 2 | -2.0 |
| 1 | 2 | 3 | 2 | -3.5 |
| 2 | 3 | | 2 | 5.0 |
| 2 | 3 | 4 | 2 | 0.5 |
| 3 | 4 | | 2 | 2.0 |
+------------+------------+------------+------+--------+
As you can see, it is sometimes better for productivity to let employees work separately:
Employee 1 gets a value of 5.0 on task 1
Employee 4 gets a value of 0.5 on task 1
Employees 1 and 4 together get a value of 5.0 on task 1
In this situation it is better that employees 1 and 4 work separately (5.5 in total instead of 5.0), and the query should give both lines:
+------------+-------------+------------+-------+---------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+-------------+------------+-------+---------+
| 1 | | | 1 | 5.0 |
| 4 | | | 1 | 0.5 |
+------------+-------------+------------+-------+---------+
The real solution for this example should be:
+------------+-------------+------------+-------+---------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+-------------+------------+-------+---------+
| 1 | | | 1 | 5.0 |
| 2 | 3 | | 2 | 5.0 |
| 4 | | | 2 | 3.0 |
+------------+-------------+------------+-------+---------+
This is because employee 1 has a very high value on his own on task 1.
Employee 3 is really bad on his own, but together with employee 2 they do great on task 2.
Employee 4 is the only one left, and this employee is pretty good at task 2.
The problem is to write the query that returns this result.
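
This is not the SQLite query being asked for, but a brute-force sketch in R that makes the objective concrete and could be used to check a query's result. It encodes the rows of the table above and tries every way of covering employees 1 to 4 with non-overlapping options; the names opts and best_assignment are hypothetical, and the recursion is only practical for a handful of employees.

```r
# The options from the table above (NA = empty employee slot)
opts <- data.frame(
  e1 = c(1, 2, 3, 4, 1, 1, 1, 2, 2, 3,   1, 2, 3, 4, 1, 1, 2, 2, 3),
  e2 = c(NA, NA, NA, NA, 2, 4, 2, 3, 3, 4,   NA, NA, NA, NA, 2, 2, 3, 3, 4),
  e3 = c(NA, NA, NA, NA, NA, NA, 3, NA, 4, NA,
         NA, NA, NA, NA, NA, 3, NA, 4, NA),
  task = c(rep(1, 10), rep(2, 9)),
  value = c(5, -2.5, 1, 0.5, 0.5, 5, 0.33, -4.5, -6.5, 3,
            1, 2, -5, 3, -2, -3.5, 5, 0.5, 2)
)

# Hypothetical helper: exhaustive search for the set of rows that uses every
# employee exactly once and has the highest total value.
best_assignment <- function(opts, employees) {
  members <- lapply(seq_len(nrow(opts)), function(k) {
    m <- unlist(opts[k, c("e1", "e2", "e3")])
    unname(m[!is.na(m)])
  })
  search <- function(remaining) {
    if (length(remaining) == 0) return(list(total = 0, rows = integer(0)))
    first <- remaining[1]  # this employee must be covered by some option
    best <- list(total = -Inf, rows = integer(0))
    for (k in seq_along(members)) {
      m <- members[[k]]
      if (first %in% m && all(m %in% remaining)) {
        sub <- search(setdiff(remaining, m))
        total <- opts$value[k] + sub$total
        if (total > best$total) best <- list(total = total, rows = c(k, sub$rows))
      }
    }
    best
  }
  search(employees)
}

result <- best_assignment(opts, 1:4)
print(opts[result$rows, ])
print(result$total)
```

On this table the search selects the same three rows as the solution above, with a total value of 13.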
