Is it possible to implement the following scenario in Power BI Desktop?
1. Load data from an Excel file into several tables
2. Perform calculations with an R script using several data sources
3. Store the results of the calculation in a new table in Power BI (.pbix)
The idea is to use Power BI Desktop for solving the "transportation problem" with linear programming in R. Before the solver runs, we need to transform data from several data sources. I'm new to Power BI. I see that it is possible to apply R scripts for loading and transforming data, and for visualizations. But I need to save the results of the calculation for subsequent visualization with the regular means of Power BI. Is that possible?
As I mentioned in my comment, this post would have solved most of your challenges. That approach replaces one of the tables with a new one after the R script, but you're specifically asking to produce a new table, presumably leaving the input tables untouched. I've recently written a post where you can do this using Python in the Power Query Editor. The only difference in your case would be the R script itself.
Here's how I would do it with an R script:
Data samples:
Table1
Date,Value1
2108-10-12,1
2108-10-13,2
2108-10-14,3
2108-10-15,4
2108-10-16,5
Table2
Date,Value2
2108-10-12,10
2108-10-13,11
2108-10-14,12
2108-10-15,13
2108-10-16,14
Power Query Editor:
With these tables loaded, either from Excel or CSV files, you have this setup in the Power Query Editor:
Now you can follow these steps to get a new table using an R script:
1. Change the data type of the Date column to Text.
2. Click Enter Data and click OK to get an empty table named Table3 by default.
3. Select the Transform tab and click Run R Script to open the Run R Script editor.
4. Leave it empty and click OK.
5. Remove = R.Execute("# 'dataset' holds the input data for this script",[dataset=#"Changed Type"]) from the Formula Bar and insert this: = R.Execute("# R Script:",[df1=Table1, df2=Table2]).
6. If you're prompted to do so, click Edit Permission and Run.
7. Click the gear symbol next to Run R Script under APPLIED STEPS and insert the following snippet:
R script:
# Left-join df2 onto df1 by the Date column
df3 <- merge(x = df1, y = df2, by = "Date", all.x = TRUE)
# Add a calculated column to the joined result
df3$Value3 <- df3$Value1 + df3$Value2
This snippet produces a new dataframe df3 by joining df1 and df2, and adds a new column Value3. This is a very simple setup, but now you can do pretty much anything by just replacing the join and calculation methods (see the solver sketch after step 9).
8. Click Home > Close & Apply to get back to Power BI Desktop. (Consider changing the data type of the Date column in Table3 from Text to Date before you do that, depending on how you'd like your tables, charts and slicers to behave.)
9. Insert a simple table to make sure everything went smoothly.
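Since the question is ultimately about the transportation problem, here's a minimal sketch of what the solver part of such a script could look like, using the lpSolve package (the cost matrix, supply and demand figures are made-up placeholders, and I'm assuming lpSolve is installed on the machine running the script):

library(lpSolve)

# Hypothetical costs of shipping from 3 origins (rows) to 4 destinations (columns)
costs  <- matrix(c(4, 6, 8, 5,
                   7, 3, 5, 8,
                   6, 5, 4, 7), nrow = 3, byrow = TRUE)
supply <- c(120, 80, 100)    # capacity of each origin
demand <- c(70, 90, 60, 80)  # requirement of each destination

# Solve the transportation problem as a linear program
sol <- lp.transport(costs, direction = "min",
                    row.signs = rep("<=", 3), row.rhs = supply,
                    col.signs = rep(">=", 4), col.rhs = demand)

# The Run R Script step picks up data frames, so expose the shipment plan as one
df_solution <- as.data.frame(sol$solution)

In the Power Query Editor you would then select df_solution (or whichever data frame your script produces) as the table to keep.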
I hope this was exactly what you were looking for. Let me know if not and I'll take another look at it.
I am working in .NET Interactive (aka Polyglot) Notebooks in F# (but I believe the same would apply to C#). In my code, I am running functions that ultimately produce an F# list of floating point values, or alternatively might be an F# list of tuples which contain floating point values.
When I ask the notebook to display the variable, it shows the first 20 values and says ".. (more)." Ideally, I would like to either be able to download this data by pressing a link next to the table that's displayed, or alternatively, run some function that can copy the full data to the clipboard - similar to Pandas' to_clipboard function.
Is there a way to do this?
If you want to create a cell that, when run, copies the contents of a data frame to the clipboard, you can do this using the TextCopy package. For testing, I used the following (including Deedle and an extension for nicely rendering frames):
#i "nuget:https://www.myget.org/F/gregs-experimental-packages/api/v3/index.json"
#r "nuget:Deedle"
#r "nuget:Deedle.DotNet.Interactive.Extension,0.1.0-alpha9"
#r "nuget:TextCopy"
open Deedle
Let's create a sample data frame and a function to get its contents as CSV string:
let df =
    Frame.ofRecords
        [ for i in 0 .. 100 -> {| Name = $"Joe {i}" |} ]

let getFrameAsCsv (df:Frame<_, _>) =
    let sb = System.Text.StringBuilder()
    use sw = new System.IO.StringWriter(sb)
    df.SaveCsv(sw)
    sb.ToString()
To copy df to the clipboard, you can run:
TextCopy.ClipboardService.SetText(getFrameAsCsv df)
If you want to create a download link in the notebook output, that is also possible. You can use the HTML helper to output custom HTML and, inside that, use the data: URI scheme to embed your CSV as a linked file in <a href=...> (as long as it is not too big):
let csv =
    System.Convert.ToBase64String
        (System.Text.UTF8Encoding.UTF8.GetBytes(getFrameAsCsv df))
HTML($"<a href='data:text/csv;name=file.csv;base64,{csv}'>Download CSV</a>")
Using Azure ML Workbench for a class project (the required tool), I coded the desired logic below in an exploration notebook, but I cannot find a way to include it in a Data Prep Transform Dataflow.
all_columns = df.columns
sum_columns = [col_name for col_name in all_columns if col_name not in ['NPI', 'Gender', 'State', 'Credentials', 'Specialty']]
sum_op_columns = list(set(sum_columns) & set(df_op['Drug Name'].values))
The logic uses the column names from one data source, df_op (opioid drugs), to choose which subset of columns to include from another data source, df (all drugs). When adding a Python script/expression Transform Dataflow, I only see the ability to reference the single df. Alternatives?
I may have a way for you to access both data frames.
In Workbench, once you have the data sources that you need loaded, right click on one and select "Generate Data Access Code File".
Once there you're automatically given code to access that specific file. However, you can use the same code to access the other files.
For example, with two data sources loaded, I can use the code below to access them both as pandas data frames and manipulate them as I need.
# `datasource` comes from the generated data access code file
df_salary = datasource.load_datasource('SalaryData.dsource')
df_startup = datasource.load_datasource('50-Startups.dsource')
I believe from there you can save your updated data frame to a CSV and then use that in the train script.
Hope that helps, or at least points you toward another solution.
I can see the entire data frame in the console. Is there any way, or any function, to view a data frame in the R console with editing similar to Excel, so that I can edit the data manually?
You can use edit(). The S3 method for class 'data.frame' is:
edit(name, factor.mode = c("character", "numeric"),
     edit.row.names = any(row.names(name) != 1:nrow(name)), ...)
Example:
edit(your_dataframe)
See ?edit.data.frame for more detail.
You can indeed use edit() or View().
But if your dataset isn't too big and you prefer Excel, you can use the function below:
library(xlsx)

view.excel <- function(inputDF, nrows = 5000) {
  if (!is.data.frame(inputDF)) {
    stop("ERROR: <inputDF> class is not \"data.frame\"")
  }
  # Truncate to nrows rows unless nrows = -1 (show everything)
  if (nrow(inputDF) > nrows && nrows != -1) {
    inputDF <- inputDF[1:nrows, ]
  }
  tempPath <- tempfile(fileext = '.xlsx')
  write.xlsx(inputDF, tempPath)
  # Open the file with the default application for .xlsx
  if (.Platform$OS.type == "windows") {
    shell.exec(tempPath)
  } else if (Sys.info()[["sysname"]] == "Darwin") {
    system(paste('open', tempPath))
  } else {
    system(paste('xdg-open', tempPath))
  }
  return(invisible(tempPath))
}
I've defined this function to help me with some tasks in R...
Basically, you only need to pass a data frame to the function as a parameter. By default the function displays a maximum of 5000 rows (you can set the parameter nrows = -1 to view all the rows, but it may be slow).
This function opens your data frame in Excel and returns the path where the temporary view was saved. If you want to save and reload your temporary view after changing something directly in Excel, you can load the data frame again with:
# Open a view in Excel
tempPath <- view.excel(initialDF, nrows = -1)
# Load the edited Excel view into a new data frame, modifiedDF
modifiedDF <- read.xlsx(tempPath, sheetIndex = 1)
With the platform check above, this function should work on Linux, Windows or macOS.
You can view the dataframe with View():
View(df)
As @David Arenburg says, you can also open your dataframe in an editable view, but be warned that this is slow:
edit(df)
For updates/changes to affect the dataframe use:
df <- edit(df)
Since a lot of people are using (and developing in) RStudio and Shiny nowadays, things have become far more convenient for R users.
You should look at the rhandsontable package.
There is also a very nice Shiny implementation of rhandsontable, from a blog post I stumbled upon: https://stla.github.io/stlapblog/posts/shiny_editTable.html. It doesn't use the console, but it is super slick anyway.
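For reference, a minimal sketch of an editable table in Shiny with rhandsontable could look like this (the app structure and names below are my own illustration, not taken from that blog post):

library(shiny)
library(rhandsontable)

ui <- fluidPage(
  rHandsontableOutput("hot")  # the editable grid
)

server <- function(input, output, session) {
  output$hot <- renderRHandsontable({
    rhandsontable(mtcars)  # render the data frame as an editable table
  })
  # Convert the edited widget state back to a data frame whenever it changes
  observeEvent(input$hot, {
    edited <- hot_to_r(input$hot)
    # ... do something with `edited`, e.g. assign it or save it
  })
}

shinyApp(ui, server)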
(A few years later) This may be worth trying if you use RStudio: it seems to support all data types. I did not use it extensively, but it helped me occasionally:
https://cran.r-project.org/web/packages/editData/README.html
It shows an editing dialog by default. If your dataframe is big, you can browse to http://127.0.0.1:7212 while the dialog is being shown to get a resizable editing view.
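A minimal usage sketch (assuming the package is installed from CRAN):

library(editData)
# Opens the editing dialog and returns the edited data frame
mtcars_edited <- editData(mtcars)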
You can view and edit a dataframe using the fix() function:
# Open the mtcars dataframe for editing:
fix(mtcars)
# Edit and close.
# This produces the same result:
mtcars <- edit(mtcars)
# But it is a longer command to write.
I am creating a way to read in SPSS labels into R. Using library(sjPlot), view_spss(df, useViewer = FALSE) I can create a local html page such as http://localhost:11773/session/file1e0c67270a5.html that shows a nice table with columns for the variable names and the labels I am looking for.
Now I want to use rvest to scrape it but when I start with a command such as page <- rvest::html("http://localhost:11773/session/file1e0c67270a5.html") R just seems to get stuck.
I've tried searching for "connect with local host" but I can't seem to find any questions or answers related to the R package.
This doesn't really answer your specific question, as I think the reason is that R spins up a non-persistent process to serve that HTML view of your data. But your approach seems quite roundabout just to get at variable labels. This is a general way that works quite well:
library(foreign)
d <- read.spss("your_data.sav", use.value.labels=TRUE, to.data.frame=FALSE)
var_labels <- attr(d, "variable.labels")
## To access the label of a variable named 'var_name':
var_labels[["var_name"]]
Here d is a list holding the data, and var_labels is a named list of labels keyed by variable/column name.
If you want to get the variable and/or value labels of SPSS-imported data, you can use get_val_labels and get_var_labels from the sjmisc package.
Both functions accept either a single variable (vector) or a data frame and return the associated variable and value labels.
The sjmisc package supports data frames imported with either the haven or the foreign package.
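A minimal sketch of how that could look (the file name is a placeholder, and I'm assuming the data was imported with haven):

library(haven)
library(sjmisc)

d <- read_sav("your_data.sav")   # placeholder file name

get_var_labels(d)        # variable labels for the whole data frame
get_val_labels(d$gender) # value labels of a single (hypothetical) variable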
When applying the R Transform node (Field Operations) in SPSS Modeler, for every script the system automatically adds the following code at the top of my own script to interface with the R add-on:
while(ibmspsscfdata.HasMoreData()){
modelerDataModel <- ibmspsscfdatamodel.GetDataModel()
modelerData <- ibmspsscfdata.GetData(rowCount=1000,missing=NA,rDate="None",logicalFields=FALSE)
Please note "rowCount=1000". When I process a table with more than 1,000 rows (which is very normal), errors occur.
I'm looking for a way to change the default setting, or any other way to process tables with more than 1,000 rows!
I've tried adding this at the beginning of my code and it works just fine:
while (ibmspsscfdata.HasMoreData()) {
  # Append the next chunk of up to 1000 rows to modelerData
  modelerData <- rbind(modelerData, ibmspsscfdata.GetData(rowCount = 1000, missing = NA, rDate = "None", logicalFields = FALSE))
}
Note that you will consume a lot of memory with "big data", and the parameters of the .GetData() function should be set according to the "Read Data Options" in the node settings.
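If the repeated rbind() gets slow on large tables, a variation that collects the chunks in a list and binds them once at the end might help (same API calls as above, just restructured):

chunks <- list(modelerData)
while (ibmspsscfdata.HasMoreData()) {
  # Read the next chunk of up to 1000 rows
  chunks[[length(chunks) + 1]] <-
    ibmspsscfdata.GetData(rowCount = 1000, missing = NA, rDate = "None", logicalFields = FALSE)
}
# Bind all chunks into one data frame in a single pass
modelerData <- do.call(rbind, chunks)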