Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®
This data function enables users to execute a TIBCO® Data Science - Team Studio workflow from Spotfire.
The following requirements must be met to enable running Team Studio workflows from the data function extension:
1. Spotfire 7.13 or 10.x client and server
2. Latest copy of “TeamStudioDataFunction*.sdn” - part of this release. (The *.sdn file is a Spotfire distribution file that bundles three spk files: TeamStudioCore*.spk, TeamStudioForms*.spk and TeamStudioWeb*.spk).
3. Data Science Team Studio 6.4 or later
4. Data source set up in TIBCO Data Science - Team Studio. Any compatible data source (including TIBCO Data Virtualization)
TIBCO Component Exchange License
Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire® enables users to execute a TIBCO® Data Science - Team Studio workflow from Spotfire. Users can utilize document properties and Team Studio data functions to execute workflows and bring back the results to update the Spotfire visualizations dashboard.
A demo is included and it provides an example showing how this data function can be used.
For more information on TIBCO® Data Science - Team Studio, view this Community Wiki page
Published: February 2022
- Libraries for enabling data function integration with Spotfire
- License information
Changes to previous release:
- Improved logging
- Improved Team Studio to Spotfire data type translation
- Web Player data function support (possibility to execute Team Studio data functions from Spotfire Web Client)
- Included a “skip” button on the credentials screen
The Data Science Team did such a great job with this! This is our first step toward deeper integration with other TIBCO products, and it satisfies a common customer need.
Very exciting to see big data, advanced analytics capabilities tightly integrated with Spotfire like this!
Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®
“Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®” enables users to execute a workflow in TIBCO® Data Science - Team Studio from Spotfire. Users can utilize document properties and Team Studio data functions to execute workflows and bring back the results to update the Spotfire visualizations dashboard.
The following requirements must be met to enable using the Data Function for TIBCO Data Science - Team Studio (the “Team Studio Data Function”):
- Spotfire 7.13 (or later) client and server
- Latest copy of "TeamStudioDataFunction*.sdn", available for download on this Exchange page (the *.sdn file is a Spotfire distribution file that bundles three spk files: TeamStudioCore*.spk, TeamStudioForms*.spk and TeamStudioWeb*.spk)
- TIBCO Data Science - Team Studio (“Team Studio”) version 6.4 or later
- Data Source set up in Team Studio, including TIBCO Data Virtualization data sources
“Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire®” is available from the TIBCO Exchange here.
Installation and configuration
To add the Team Studio Data Function to the client software, upload and install the Spotfire .sdn package detailed in the “Prerequisites” section to a deployment area on the Spotfire server. Add the .sdn file to the deployment area used by the Spotfire clients that you intend to use with the data function; this updates that deployment area and the client configuration. After installation, each client must restart and connect to this area in order to receive the correct packages. Web Player services that will use the data function must also be updated from the updated deployment area.
Click here for details on how to upload the .sdn file to the desired deployment area on the Spotfire Server.
You will also need to have access to (i.e. be a member of) the Team Studio workspace referenced by the Team Studio Data Function.
Running the Team Studio Data Function
The Data Function for TIBCO® Data Science - Team Studio in TIBCO Spotfire® lets Spotfire users execute a workflow in the Team Studio platform and bring back results in the form of data tables. These tables originate directly from workflow operators, or from SBDF (Spotfire Binary Data File) files stored in the workspace. In addition, the Team Studio Data Function can trigger the reloading of tables when workflow results are stored in a database. This happens through Spotfire data connections upon successful execution. Since in this case the Team Studio workflow and Spotfire share a connection to the same data source (for example TIBCO DV), there is minimal data movement during the execution process.
The Team Studio Data Function differs from a typical Spotfire data function (e.g. TERR or Python) in that it is split into two parts, so that reading results from long-running, asynchronous jobs can be resumed.
The first data function (Starter) initiates the job, and the second data function (Result) monitors it until completion. This is implemented automatically. If the Spotfire analysis file is saved after the Starter data function has finished executing, the Result data function will automatically resume polling for the data function results, even if Spotfire is shut down and restarted.
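The Starter/Result split described above can be sketched in plain Python. This is only an illustration of the pattern, not the extension's actual implementation: `FakeTeamStudio`, `starter`, `result` and the `"ProcessId"`/`"Success"` dictionary keys are hypothetical stand-ins for the Team Studio server, the two data functions and the Spotfire document properties.

```python
import datetime
import itertools


class FakeTeamStudio:
    """Hypothetical in-memory stand-in for a Team Studio server."""

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def start_workflow(self, variables):
        # Returns immediately with a process ID; the job runs asynchronously.
        process_id = "proc-%d" % next(self._ids)
        self._remaining[process_id] = self._polls_until_done
        return process_id

    def is_finished(self, process_id):
        # Each poll brings the simulated job one step closer to completion.
        self._remaining[process_id] -= 1
        return self._remaining[process_id] <= 0


def starter(server, properties, variables):
    """Starter role: launch the job and persist its process ID."""
    properties["ProcessId"] = server.start_workflow(variables)


def result(server, properties, max_polls=10):
    """Result role: poll the stored process ID until the job completes."""
    process_id = properties.get("ProcessId")
    if not process_id:
        return False  # nothing to resume
    for _ in range(max_polls):
        if server.is_finished(process_id):
            # Record a completion timestamp to signal success.
            properties["Success"] = datetime.datetime.now().isoformat()
            return True
    return False


server = FakeTeamStudio()
saved_properties = {}
starter(server, saved_properties, {"ntree": 10})

# Simulate saving and reopening the analysis: only the properties survive.
reopened = dict(saved_properties)
done = result(server, reopened)
```

Because the process ID is the only state shared between the two halves, the Result half can pick up where it left off as long as that ID was saved.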
Example: Predicting Adult income class
The input dataset is based on the UCI Adult Income dataset (Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science). It contains 15 columns: 14 potential predictors (both numeric and categorical) and a target (income, with possible values <=50K and >50K).
The predictors describe socio-economic metrics of the adult US population. The target variable indicates whether their income falls above or below $50K per year.
This example builds a very simple data science process that generates a binary classification model to predict the income class.
An initial exploration of the dataset is performed (Summary Statistics operator). The data is then split into a Train and Test dataset. The Train dataset is used as input to a machine learning model (Alpine Forest Classification operator). The generated model and the Test dataset are then fed to a model assessment operator (Goodness of Fit) to test the model’s quality.
There is one input parameter (a workflow variable called @ntree) and there are three output tables returned to Spotfire (the results of Summary Statistics, the Variable Importance from Alpine Forest Classification, and the Goodness of Fit). In the following sections we will guide you through defining the inputs and outputs, and connecting them between Team Studio and Spotfire.
Figure 1: The Team Studio example workflow
Team Studio Steps
The following steps will help you reproduce the Team Studio example workflow. It is assumed that you are familiar with creating and running workflows in TIBCO Data Science - Team Studio.
- Go to Actions > Workflow Variables and create a new variable called @ntree. Set it to 10.
- Use a Hadoop File reader operator to import the Adult income dataset that you will have previously uploaded to your Team Studio instance data source. This example is based on a Hadoop data source.
- Attach to it a Summary Statistics operator. Select all columns and leave other defaults unchanged. In this example we set the Number of Most Common Values to Display to 3, to reduce output, but this is not strictly necessary.
- Attach an Export to SBDF (HD) operator to Summary Statistics. Set the Output File Name to SummaryStats.sbdf.
- Attach a Random Sampling operator to generate two samples containing 70% and 30% of the data, respectively.
- Extract the first sample (Train) using a Sample Selector operator.
- Feed the results into an Alpine Forest Classification operator.
- Set the Dependent Column to income.
- Select all the other columns as predictors, except for fnlwgt, education and native_country.
- Set the Number of Trees to @ntree, the workflow variable you just created.
- Extract the second sample (Test) using a Sample Selector operator.
- Feed the Test sample and the Alpine Forest Classification operator into a Goodness of Fit operator.
Your workflow is now ready to run. Press Run to check it runs and completes successfully.
At the end of the run:
- The Summary Statistics operator should have results similar to Figure 2 below.
Figure 2: Detail of Summary Statistics result table
- The Export to SBDF (HD) operator should have output SummaryStats.sbdf into the workspace.
- The Alpine Forest Classification operator should have results including a Variable Importance tab such as the one in Figure 3 below. The actual results may differ slightly depending on whether the random seed in the Random Sampling operator was set to a specific value.
Figure 3: Variable Importance result tab
- The Goodness of Fit operator should have Results showing a metrics table such as the one in Figure 4 below.
Figure 4: Goodness of Fit results
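To illustrate why results can differ between runs unless the random seed is fixed, here is a minimal sketch of a 70/30 split in plain Python. It is not how the Random Sampling operator is implemented; `split_rows` and its parameters are hypothetical names used only for this example.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=None):
    """Shuffle rows and split them into (train, test) samples."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))
train_a, test_a = split_rows(rows, seed=42)
train_b, test_b = split_rows(rows, seed=42)  # same seed, same split
```

With the same seed, both calls return identical Train and Test samples, so a model trained on them would see the same rows each time.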
Setting up the Team Studio Data Function
Open a Spotfire DXP with a data table in a Spotfire Analyst client. The Spotfire client should be connected to a Spotfire server that has the required packages installed.
Go to the Tools menu and click Team Studio Data Function > Create New Team Studio Data Function to open a new window. This will appear as shown in Figure 5 below.
Figure 5: Initial dialog box
Enter the URL of your Team Studio instance into Team Studio location, and your Team Studio credentials into Login and Password. Press the Login button. Once logged on, select the desired Workspace and Workflow you want to connect to.
Note: It may take a few seconds for the Workspace/Workflow choices to populate.
Go to the Initiating Function Parameters tab, which will initially appear as in Figure 6.
Figure 6: Initiating Function Parameters dialog box
This tab is pre-populated with the Process ID parameter processid, a special variable that will contain the process ID of the Team Studio workflow execution. You don’t need to change anything here.
If the connected Team Studio workflow has Workflow Variables that need associating to input parameters in Spotfire, click to add them via the Add... button. The Name of each input parameter will need to be the same name as the corresponding Workflow Variable in your Team Studio workflow, excluding the “@” prefix.
- In our example, we will add an input parameter called ntree.
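The naming rule for input parameters can be expressed as a one-line conversion. The helper below is a hypothetical illustration, not part of the extension.

```python
def input_parameter_name(workflow_variable):
    """Return the Spotfire input parameter name for a workflow variable."""
    if not workflow_variable.startswith("@"):
        raise ValueError("expected a workflow variable such as '@ntree'")
    return workflow_variable[1:]  # drop the "@" prefix


name = input_parameter_name("@ntree")
```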
Once done, proceed to the Result Function Parameters tab, which will initially appear as in Figure 7.
Figure 7: Result Function Parameters dialog box
This tab is pre-populated with the success parameter, a special variable that will contain the timestamp of the successful Team Studio workflow execution. Its main purpose is to signal completion of the execution of the Team Studio workflow. You don’t need to change anything here, unless you want to use the success parameter to signal the refresh of a data table not directly returned by the Team Studio Data Function (this will be described in Section “Reading of Results outside of the Data Function”). Note that the processid parameter also appears here. It is used to automatically connect the Starter and Result data functions.
Click the Add... button, and add as many output parameters as there are data tables to be returned by the Team Studio workflow. These need to be defined as Type: Table. In order to map your Team Studio workflow output tables, you have three possible choices:
- Connect directly to an operator’s results. The Name of the parameter will need to reflect the exact label of the operator as it appears on the Team Studio workflow canvas. If the data table is taken directly from the Results, the operator’s label will be sufficient. If it is taken from a specific tab within the Results, the name of the output parameter will need to be the operator label plus a pipe (|) separator followed by the exact name of the tab.
- In our example, the output parameter from the Variable Importance tab of the Alpine Forest Classification operator will be called Alpine Forest Classification|Variable Importance. This is because there are multiple tabs in the Results, as shown in Figure 8.
- The output parameter from Goodness of Fit will simply be called Goodness of Fit, as there are no further tabs in the results.
Figure 8: The three tabs from the Alpine Forest Classification Results
Note: The length of the table is limited to 999 rows (i.e. the maximum row display limit set in Team Studio) if extracted through this method. This option is recommended for small tables.
- Connect to SBDF Files – Workflow operator results that are exported as “.sbdf” into the workspace can be returned to Spotfire as tables. The Name of the output parameter will need to be the exact name of the generated file, including the .sbdf extension.
- In our example, the file generated by exporting the Summary Statistics results is called SummaryStats.sbdf.
- External Table Refresh. The Team Studio workflow may read or write from/to databases - including TIBCO DV. Using the external table refresh mechanism, it is possible to refresh already preloaded external data locations such as Hadoop tables at the end of a successful workflow execution. See Section “Reading of Results outside of the Data Function” for details.
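The naming conventions for the first two choices can be summarised with a small classifier. This is a sketch of the rules as described above, not the extension's own parsing logic; `describe_output_parameter` and the returned dictionary keys are hypothetical.

```python
def describe_output_parameter(name):
    """Classify an output parameter name by the convention it follows."""
    if name.lower().endswith(".sbdf"):
        # Exact file name of an SBDF file exported into the workspace.
        return {"kind": "sbdf_file", "file": name}
    operator, separator, tab = name.partition("|")
    if separator:
        # Operator label, a pipe, then the exact name of the result tab.
        return {"kind": "operator_tab", "operator": operator, "tab": tab}
    # Operator label only: the table is taken directly from the Results.
    return {"kind": "operator_result", "operator": name}


examples = [
    describe_output_parameter("Alpine Forest Classification|Variable Importance"),
    describe_output_parameter("Goodness of Fit"),
    describe_output_parameter("SummaryStats.sbdf"),
]
```

External table refreshes are not named this way; they are configured separately through the Success signal.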
After all the inputs and outputs have been defined, press OK and proceed to map these to the appropriate objects within Spotfire. This part of the process is similar to mapping a traditional Spotfire data function, but you will need to take the processid and success parameters into account.
Starter function – Input Mapping
- map the specific input parameters you added, e.g. ntree in the example.
Starter function – Output Mapping
- map processid to the predefined document property ProcessId.
Result function – Input Mapping
- map processid again to the predefined document property ProcessId.
Result function – Output Mapping
- map success to the predefined document property Success.
- map the specific output parameters you added, e.g. Alpine Forest Classification|Variable Importance, Goodness of Fit and SummaryStats.sbdf in the example, to the desired output table names.
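The four mappings for the example can be written down as a plain data structure, which makes it easy to see how processid links the two functions. This is only a way of summarising the dialog settings: the dictionary layout and `check_mapping` are hypothetical, the output table names on the right-hand side are arbitrary, and num_trees is the document property used for ntree in this example.

```python
mapping = {
    "starter": {
        "inputs": {"ntree": "num_trees"},
        "outputs": {"processid": "ProcessId"},
    },
    "result": {
        "inputs": {"processid": "ProcessId"},
        "outputs": {
            "success": "Success",
            "Alpine Forest Classification|Variable Importance": "Variable Importance",
            "Goodness of Fit": "Goodness of Fit",
            "SummaryStats.sbdf": "Summary Statistics",
        },
    },
}


def check_mapping(m):
    """Return a list of problems with the processid/success wiring."""
    problems = []
    started = m["starter"]["outputs"].get("processid")
    resumed = m["result"]["inputs"].get("processid")
    if started is None or started != resumed:
        problems.append("processid must map to the same document property in both functions")
    if "success" not in m["result"]["outputs"]:
        problems.append("success is not mapped in the Result function")
    return problems


problems = check_mapping(mapping)
```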
To make it easier for a user to run the Team Studio Data Function, it is a good idea to make the input parameter and the data function refresh action dynamic. To this end, you might set up a Textarea configured as in Figure 9 below.
- The input field (here shown set to 40) writes into a Document Property called num_trees (the name is arbitrary; the property will need to be associated with ntree in the Starter Function - Input mapping tab).
- The Execute Data Function button is set to trigger the Starter function.
Figure 9: Invoking the Team Studio Data Function in a Spotfire Textarea
The three resulting data tables could be displayed in Spotfire as in Figure 10 below.
Figure 10: Display of results in Spotfire
Additional Notes and Use Cases
Using the Web Player
TIBCO Spotfire Web Player can execute Team Studio Data Functions. In order to accomplish this, you will need to author a data function using TIBCO Spotfire Client Analyst and save the DXP file to either the TIBCO Spotfire localhost server or to a team server which has TIBCO Spotfire Web Player installed.
Once saved to the server location, you can then open the DXP file from the server location in a web browser, where you will be prompted to enter your Team Studio credentials and initiate execution of the data function call. See Figure 11 below.
Figure 11: Enter credentials for Web Player
Note: Users cannot create or edit the Team Studio Data Function from within the TIBCO Spotfire Web Player interface. You can author the Team Studio Data Function from within the TIBCO Spotfire Client Analyst.
Multiple data functions
It is possible to create more than one Team Studio Data Function, each pointing to a different Team Studio workflow. The only requirement is to keep separate Document Properties mapped to the processid and success parameters of each data function.
Also, when opening an analysis containing multiple data functions, Spotfire may ask you to log in multiple times. This is because the data functions execute in parallel, so a queue of login prompts is created before Spotfire has a chance to cache the login credentials. Once every data function has been executed once, the credentials are cached for subsequent runs.
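The requirement to keep separate Document Properties per data function can be checked with a few lines of Python. The function and the property names below are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def shared_properties(data_functions):
    """Return document property names used by more than one mapping."""
    counts = Counter()
    for props in data_functions.values():
        counts.update([props["processid"], props["success"]])
    return sorted(name for name, n in counts.items() if n > 1)


separate = {
    "workflow_a": {"processid": "ProcessId_A", "success": "Success_A"},
    "workflow_b": {"processid": "ProcessId_B", "success": "Success_B"},
}
clashing = {
    "workflow_a": {"processid": "ProcessId", "success": "Success_A"},
    "workflow_b": {"processid": "ProcessId", "success": "Success_B"},
}

ok = shared_properties(separate)
conflicts = shared_properties(clashing)
```

In the clashing case both data functions would write their process ID into the same property, so one Result function could end up polling the other workflow's run.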
Reading of Results outside of the Data Function
One limitation with the Team Studio Data Function framework is that it can only return data into the Spotfire in-memory data engine: if the Team Studio workflow writes results to an external data source, Spotfire will not automatically be aware that this data has changed.
If the Spotfire Analysis already contains a data table that points (or links) to an external table that is not directly populated by the data function, i.e. added separately through a data connection, Spotfire will not automatically know if data has been refreshed at the source location (the Team Studio workflow).
If this is the case, you can add a mapping between the Team Studio Data Function’s Success document property and a refresh trigger. This trigger, when fired at the change of the Success value, will result in a reload of a specific data table in Spotfire as mapped in the Manage External Table section of the Result Function Parameters tab.
Figure 12: The Manage External Table dialogue
The Signal property will be mapped to the Success document property (the actual value of the success parameter) and the External data table is the specific data table we are interested in, as shown in Figure 13: in this image the data table is not selected yet, so the OK button is greyed out.
Figure 13: Setting up the External Table
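The signal mechanism can be pictured as a simple change listener: when the Success value changes, the mapped external table is reloaded. The sketch below is a conceptual model only, not Spotfire's API. `DocumentProperty`, the table name and the timestamps are hypothetical, and the choice not to fire when the value is unchanged is an assumption.

```python
class DocumentProperty:
    """Hypothetical property that notifies listeners when its value changes."""

    def __init__(self, value=None):
        self._value = value
        self._listeners = []

    def on_change(self, callback):
        self._listeners.append(callback)

    def set(self, value):
        if value == self._value:
            return  # assumption: an unchanged value does not fire the trigger
        self._value = value
        for callback in self._listeners:
            callback(value)


refreshed = []  # records each simulated reload of the external table
success = DocumentProperty()
success.on_change(lambda timestamp: refreshed.append(("external_table", timestamp)))

success.set("2022-02-01T10:00:00")  # first successful run: reload
success.set("2022-02-01T10:00:00")  # same value: no reload
success.set("2022-02-01T11:30:00")  # second successful run: reload
```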
- The document property used to store the processid parameter contains the cached process ID of the last run at the time the Spotfire analysis file was saved. When the Spotfire analysis is opened, if this property contains a value, the Result data function will try to use it to run automatically, and might throw an error if the process ID is no longer present on the Team Studio server. Try re-executing the data function to resolve this error. Another solution is to remove values from these document properties before saving the Spotfire DXP. The same symptom (empty results because of a stale process ID) may also occur when a user does not have permission to execute the workflow. Please ensure the user is a member of the Team Studio workspace.
- If a data function is linked to an Action button in a Textarea, and there are no inputs for the data function, then it may not re-execute when the button is pressed. The workaround is to use IronPython to directly execute the data function.
- If a data function appears not to run, this may be because the results of the workflow were empty. Please ensure the Team Studio workflow executes without errors by executing a test run directly in Team Studio.
- If Team Studio does not execute the data function, it may be due to a login timeout. The workaround is to trigger a re-execution of the data function, for instance by toggling an input parameter.
- A data function is linked to a Team Studio workflow by its unique workflow ID. If a copy of this workflow is created, even if it is renamed to the same original name, the data function will still be pointing to the old workflow and will need to be edited to point to the updated workflow.
- If a workflow has been copied between Team Studio instances, any existing related data function instances must be edited and re-pointed to the new workflow location, even if the names of the workspace and workflows are the same across Team Studio installations. An HTTP 422 error in the log files when running a copied DXP analysis file may be an indication of this.