bioprocessnexus package
Submodules
bioprocessnexus.data_managment module
- bioprocessnexus.data_managment.bool_switch(parent, name)
Sets the feature selection for a specific feature to “0” in the application’s selection interface.
- Parameters:
parent – The main application instance
name – The name of the feature whose selection is being modified.
This function directly modifies the “feature_selection” attribute in “parent”.
- bioprocessnexus.data_managment.choose_model(parent)
Prompts the user to select a .nexus model file and loads the specified model into the application.
- Parameters:
parent – The main application instance
This function validates the selected file format and path, loads feature selections, and stores relevant data within the “parent” instance for use in subsequent analysis.
- bioprocessnexus.data_managment.choose_y(parent)
Opens a window to allow the user to select response and feature variables from the loaded data.
- Parameters:
parent – The main application instance
This function checks if data is loaded, then generates a new window where the user can specify which columns to use as responses or features. This selection is stored within the application and influences further analysis or model building.
- bioprocessnexus.data_managment.fix_selection(parent)
Finalizes the selected features and responses for the application, applying user choices to the main instance.
- Parameters:
parent – The main application instance
The function converts selected features and responses into boolean arrays and stores them in the parent instance for further processing. It checks that at least one response and one feature are selected, and shows an error message otherwise.
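As a rough illustration of the conversion described above, the following sketch builds boolean masks from a list of column names. The function name `selection_masks` and its arguments are hypothetical, and raising `ValueError` stands in for the error message the application shows; the package's actual data structures are not documented here.

```python
def selection_masks(columns, features, responses):
    """Return (feature_mask, response_mask) aligned with `columns`.

    Hypothetical helper: `features` and `responses` are collections of
    column names chosen by the user.
    """
    feature_mask = [name in features for name in columns]
    response_mask = [name in responses for name in columns]
    # Mirror the documented check: at least one of each must be selected.
    if not any(feature_mask) or not any(response_mask):
        raise ValueError("Select at least one feature and one response.")
    return feature_mask, response_mask


columns = ["temperature", "pH", "titer"]
feature_mask, response_mask = selection_masks(
    columns, features={"temperature", "pH"}, responses={"titer"}
)
```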
- bioprocessnexus.data_managment.mix_models(parent)
Allows the user to generate a mixture of experts model by combining multiple models from a specified directory.
- Parameters:
parent – The main application instance
This function prompts the user to select a directory containing model links, reads the available models and responses, and populates a GUI window where the user can specify which models to include in the mixture. It finalizes the model combination by asking the user for a model name.
- bioprocessnexus.data_managment.save_mixture_model(parent)
Saves the configured mixture model, verifying model compatibility and storing it in a specified directory.
- Parameters:
parent – The main application instance
This function checks feature compatibility across selected models, creates a directory for the new mixture model, and saves model-specific data in a consistent format. It then loads the new model for immediate use in the application.
- bioprocessnexus.data_managment.set_file_dir(parent)
Prompts the user to select a data file (CSV or Excel) and loads it into the application.
- Parameters:
parent – The main application instance
This function displays an information box, resets some attributes in parent, opens a file dialog for selecting a CSV or Excel file, reads the file, and processes it into the application’s data structure. It performs checks to ensure data format and compatibility.
bioprocessnexus.explanations module
- bioprocessnexus.explanations.denormalized_prediction(data_in, parent)
Makes predictions on normalized input data and denormalizes the output.
- Parameters:
data_in (numpy array) – Normalized input data for the model.
parent – The main application instance
- Returns:
A numpy array of denormalized predictions.
- Return type:
numpy array
- bioprocessnexus.explanations.make_explanation(parent)
Initializes the SHAP explanation window, prompting the user to enter a fraction of observations to analyze.
- Parameters:
parent – The main application instance
- Raises:
Displays an error message if no model has been loaded.
- bioprocessnexus.explanations.plot_explanation(parent)
Generates and displays SHAP explanations for model predictions on a fraction of the test dataset.
- Parameters:
parent – The main application instance
This function loads the model, computes SHAP values using KernelExplainer, and plots the explanations for each response variable. It handles memory errors gracefully by prompting the user to reduce the number of observations if necessary.
bioprocessnexus.helpers module
- bioprocessnexus.helpers.check_dir(parent, y_dir, dir_type, central_log=0)
Verifies and creates the necessary directory structure for storing logs or images.
- Parameters:
parent – The main application instance
y_dir (str) – Directory name for a specific response variable.
dir_type (str) – Type of directory to create (e.g., “logs”, “images”).
central_log (int, optional) – If set to 1, creates only the main directory without nested folders.
- Returns:
Path to the created directory, if central_log is 0.
- Return type:
str
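The behaviour of the two `central_log` modes can be sketched as follows. The layout `<model_dir>/<dir_type>/<y_dir>`, the `model_dir` argument, and the function name `ensure_output_dir` are assumptions made for illustration; this reference does not state the real directory layout.

```python
import os
import tempfile


def ensure_output_dir(model_dir, y_dir, dir_type, central_log=0):
    """Create the output directory and return its path (assumed layout)."""
    base = os.path.join(model_dir, dir_type)
    if central_log == 1:
        # Only the main directory is created; nothing is returned here,
        # which is an assumption based on the documented return value.
        os.makedirs(base, exist_ok=True)
        return None
    target = os.path.join(base, y_dir)
    os.makedirs(target, exist_ok=True)  # no error if it already exists
    return target
```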
- bioprocessnexus.helpers.denormalize(array, mus, stds)
Reverts normalization by applying mean and standard deviation scaling.
- Parameters:
array (numpy array) – Array to be denormalized.
mus (numpy array) – Array of means used for normalization.
stds (numpy array) – Array of standard deviations used for normalization.
- Returns:
Denormalized input array
- Return type:
numpy array
- bioprocessnexus.helpers.nice_round(num)
Rounds a number based on its magnitude to provide a concise output.
- Parameters:
num (float) – Number to be rounded.
- Returns:
Rounded number with appropriate precision.
- Return type:
float
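One common way to round by magnitude is to keep a fixed number of significant figures. The sketch below does that; the name `round_by_magnitude` and the choice of three significant figures are assumptions, and the rule used by `nice_round` itself may differ.

```python
import math


def round_by_magnitude(num, sig=3):
    """Round `num` to `sig` significant figures (illustrative rule only)."""
    if num == 0:
        return 0.0
    # Position of the leading digit, e.g. 3 for 1234.5 and -3 for 0.0012.
    magnitude = math.floor(math.log10(abs(num)))
    return round(num, sig - 1 - magnitude)


large = round_by_magnitude(1234.567)
small = round_by_magnitude(0.0012345)
```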
- bioprocessnexus.helpers.normalize(array, mus, stds)
Normalizes the input array by subtracting the mean and dividing by the standard deviation.
- Parameters:
array (numpy array) – Array to be normalized.
mus (numpy array) – Array of means for each feature in “array”.
stds (numpy array) – Array of standard deviations for each feature in “array”.
- Returns:
Normalized input array
- Return type:
numpy array
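The relationship between `normalize` and `denormalize` can be sketched as a z-score transform and its inverse. The functions below are stand-ins named `zscore` and `inverse_zscore` to make clear they are not the package's code; the inverse formula is inferred from the description of `normalize`.

```python
import numpy as np


def zscore(array, mus, stds):
    """Subtract the per-feature mean and divide by the standard deviation."""
    return (array - mus) / stds


def inverse_zscore(array, mus, stds):
    """Undo zscore: multiply by the standard deviation and add the mean."""
    return array * stds + mus


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mus = np.array([2.0, 20.0])    # per-column means (example values)
stds = np.array([0.5, 5.0])    # per-column standard deviations (example values)

scaled = zscore(data, mus, stds)
restored = inverse_zscore(scaled, mus, stds)
```

Broadcasting applies `mus` and `stds` column by column, so the same arrays can be reused for any batch with the same feature order.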
- bioprocessnexus.helpers.open_help()
Opens a web browser to the help tutorial URL.
- bioprocessnexus.helpers.unzip_dir(parent)
Extracts a zip file selected by the user into a new directory.
- Parameters:
parent – The main application instance
This function prompts the user to select a zip file, creates a directory with the same name, and extracts the zip file contents into this directory.
- bioprocessnexus.helpers.zip_dir(parent)
Compresses a directory selected by the user into a zip file.
- Parameters:
parent – The main application instance
This function prompts the user to select a directory, creates a zip file with the same name, and saves it in the same location.
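Leaving out the file dialogs, the archive handling described for `zip_dir` and `unzip_dir` can be approximated with the standard library. The function names below are hypothetical and the paths are passed in directly; whether the package uses `shutil` or `zipfile` internally is not stated here.

```python
import os
import shutil
import tempfile
import zipfile


def zip_directory(dir_path):
    """Create <dir_path>.zip next to the directory and return its path."""
    dir_path = os.path.normpath(dir_path)
    return shutil.make_archive(dir_path, "zip", root_dir=dir_path)


def unzip_to_directory(zip_path):
    """Extract <name>.zip into a sibling directory called <name>."""
    target = os.path.splitext(zip_path)[0]
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```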
bioprocessnexus.hist module
- bioprocessnexus.hist.check_hist_queue(parent)
Monitors the plotting queue for completed histograms and displays them in a new window.
- Parameters:
parent – The main application instance
This function continues to check the queue until all histograms are plotted.
- bioprocessnexus.hist.init_hist(parent)
Initializes the histogram plotting process, including checking for loaded models and initiating plot threads.
- Parameters:
parent – The main application instance
- Raises:
Displays an error message if no model has been loaded or if responses and features are not selected.
- bioprocessnexus.hist.plot_hist(parent)
Generates and saves histograms of response data with fitted probability distributions.
- Parameters:
parent – The main application instance
This function fits a probability distribution to each response, plots the histogram with fitted distribution, and saves both the plot and distribution parameters.
bioprocessnexus.interact_hist module
- bioprocessnexus.interact_hist.clear_dist(parent)
Clears the fitted distribution and associated annotations from the histogram display.
- Parameters:
parent – The main application instance
- bioprocessnexus.interact_hist.fit_dist(parent)
Fits a probability distribution to the current histogram data and displays the PDF on the histogram.
- Parameters:
parent – The main application instance
- bioprocessnexus.interact_hist.get_probability(parent)
Calculates the probability within specified bounds on the fitted distribution and updates the UI.
- Parameters:
parent – The main application instance
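The probability between two bounds on a fitted distribution is the difference of its cumulative distribution function at those bounds. The sketch below assumes a normal distribution fitted with the standard library; the distribution families actually offered by the application are not listed in this reference, and `probability_between` is a hypothetical name.

```python
from statistics import NormalDist


def probability_between(dist, lower, upper):
    """P(lower <= X <= upper) for a fitted distribution with a cdf method."""
    return dist.cdf(upper) - dist.cdf(lower)


# Fit a normal distribution to example data, then query an interval.
samples = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
fitted = NormalDist.from_samples(samples)
p_within = probability_between(fitted, 4.9, 5.1)

# Sanity check on a standard normal: about 68% lies within one sigma.
p_one_sigma = probability_between(NormalDist(0.0, 1.0), -1.0, 1.0)
```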
- bioprocessnexus.interact_hist.interactive_hist(parent, y_dir)
Opens the interactive histogram window for the selected response, displaying histograms with interactive sliders.
- Parameters:
parent – The main application instance
y_dir (str) – Directory or identifier for the selected response variable.
- bioprocessnexus.interact_hist.interactive_hist_welcome(parent)
Opens the interactive histogram welcome window, displaying buttons to choose a response for analysis.
- Parameters:
parent – The main application instance
- bioprocessnexus.interact_hist.save_inter_hist(parent)
Saves the current interactive histogram as an image and logs distribution fit information.
- Parameters:
parent – The main application instance
- bioprocessnexus.interact_hist.update_hist(val, parent)
Updates the histogram display based on slider ranges, filtering data and adjusting the histogram view.
- Parameters:
val – Current slider values (not directly used).
parent – The main application instance
bioprocessnexus.main module
- class bioprocessnexus.main.App
Bases:
CTk
Class that handles launching the disclaimer and then the GUI
- launch_nexus()
- class bioprocessnexus.main.disclaimer(master)
Bases:
CTkToplevel
Class that handles the disclaimer
- bioprocessnexus.main.launch_nexus()
Function to launch the GUI
bioprocessnexus.mc_subsampling module
- bioprocessnexus.mc_subsampling.generate_data_interface(parent)
Initializes the data generation interface, allowing users to define distributions for features.
- Parameters:
parent – The main application instance
This function sets up UI elements for configuring the data generation process, including feature boundaries, distribution types, and parameter inputs.
- bioprocessnexus.mc_subsampling.generate_dataset(parent)
Generates a synthetic dataset based on user-defined feature distributions and saves it to an Excel file.
- Parameters:
parent – The main application instance
- Raises:
Shows error messages for incorrect parameter entries or distribution boundaries.
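The sampling step behind a synthetic dataset can be sketched with the standard library. The specification format, the distribution keys `"fixed"` and `"gaussian"`, the clamping of samples to the feature bounds, and both function names are assumptions for illustration; writing the result to Excel is omitted.

```python
import random


def sample_feature(spec, n, rng):
    """Draw n values for one feature from a hypothetical spec dictionary."""
    kind = spec["dist"]
    if kind == "fixed":
        return [spec["value"]] * n
    if kind == "gaussian":
        lo, hi = spec["bounds"]
        # Clamp to the bounds; the application may handle bounds differently.
        return [min(max(rng.gauss(spec["mean"], spec["std"]), lo), hi)
                for _ in range(n)]
    raise ValueError(f"Unknown distribution type: {kind}")


def generate_rows(specs, n, seed=0):
    """Return n rows (dicts) with one sampled value per feature."""
    rng = random.Random(seed)
    columns = {name: sample_feature(spec, n, rng) for name, spec in specs.items()}
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


specs = {
    "pH": {"dist": "fixed", "value": 7.0},
    "temperature": {"dist": "gaussian", "mean": 30.0, "std": 5.0,
                    "bounds": (20.0, 40.0)},
}
rows = generate_rows(specs, 50, seed=1)
```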
- bioprocessnexus.mc_subsampling.handle_focus_in(_, parent, feature, box_identity)
Handles focus events for entry boxes, resetting placeholder text on focus.
- Parameters:
_ – Ignored positional argument for event handling.
parent – The main application instance
feature (str) – Feature name associated with the entry box.
box_identity (int) – Identifier for the specific parameter entry box.
- bioprocessnexus.mc_subsampling.update_dist_params(dist, feature, parent)
Updates distribution parameters and entry box states based on the selected distribution type.
- Parameters:
dist (str) – Selected distribution type (e.g., Fixed value, Gaussian).
feature (str) – Feature name for which the distribution is being set.
parent – The main application instance
This function sets placeholder text and enables or disables specific entry fields according to the requirements of the selected distribution.
bioprocessnexus.model_training module
- bioprocessnexus.model_training.PLS_optimize_train(X_train, y_train, splitting_ratio=5)
Trains a Partial Least Squares (PLS) regression model by finding the optimal number of components.
- Parameters:
X_train (array-like) – Training feature data.
y_train (array-like) – Training target data.
splitting_ratio (int) – Ratio used to split data into training and validation subsets.
- Returns:
Trained PLS model with the optimal number of components.
- Return type:
PLSRegression
- bioprocessnexus.model_training.initial_normalize(array)
Normalizes data by subtracting the mean and dividing by the standard deviation.
- Parameters:
array (numpy array) – Data to be normalized.
- Returns:
Normalized array, means, and standard deviations.
- Return type:
tuple
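A function with this return shape might look like the sketch below, which computes per-column statistics and returns them alongside the scaled data. The name `fit_zscore` is hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np


def fit_zscore(array):
    """Return (normalized, means, stds) computed per column."""
    array = np.asarray(array, dtype=float)
    mus = array.mean(axis=0)
    stds = array.std(axis=0)  # population standard deviation (ddof=0)
    return (array - mus) / stds, mus, stds


normalized, mus, stds = fit_zscore([[1.0, 2.0], [3.0, 6.0]])
```

Returning the means and standard deviations lets the same statistics be applied later to new inputs and to reverse the scaling of predictions.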
- bioprocessnexus.model_training.train_GP(parent, splitting_ratio=5)
Trains and saves a Gaussian Process (GP) model, with stratified sampling for train-test split.
- Parameters:
parent – The main application instance
splitting_ratio (int) – Ratio used to split data into training and test subsets.
- bioprocessnexus.model_training.train_PLS(parent, splitting_ratio=5)
Trains and saves a Partial Least Squares (PLS) model, using stratified sampling for train-test split.
- Parameters:
parent – The main application instance
splitting_ratio (int) – Ratio used to split data into training and test subsets.
- bioprocessnexus.model_training.train_RF(parent, splitting_ratio=5)
Trains and saves a Random Forest (RF) model, with stratified sampling for train-test split.
- Parameters:
parent – The main application instance
splitting_ratio (int) – Ratio used to split data into training and test subsets.
- bioprocessnexus.model_training.train_models(parent)
Initiates the model training interface, allowing the user to select model types and provide names.
- Parameters:
parent – The main application instance
- Raises:
Displays an error message if data or response features are not loaded.
- bioprocessnexus.model_training.train_save_params(parent, mother_dir, model_type)
Saves model and feature selection parameters in the specified directories.
- Parameters:
parent – The main application instance
mother_dir (str) – Directory path where model and data are saved.
model_type (str) – Type of model to train (PLS, RF or GP).
- Returns:
Path to the save folder where model parameters are stored.
- Return type:
str
bioprocessnexus.optimizer module
- bioprocessnexus.optimizer.check_hbo_queue(parent)
Checks the HBO queue for completed optimization results and updates the interface.
- Parameters:
parent – The main application instance containing the optimization settings and inputs.
When results are found in the queue, this function applies the optimized parameters to the relevant UI fields and saves a log of the optimization process.
- bioprocessnexus.optimizer.hoptr(parent)
Starts a new thread for the hyperparameter optimization (HBO) and monitors the queue.
- Parameters:
parent – The main application instance where results will be displayed.
This function initiates a separate thread to perform optimization while monitoring the queue for completed results, allowing for non-blocking UI updates.
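The thread-plus-queue pattern described above can be sketched without any GUI code. The worker function, the result dictionary, and the polling loop are illustrative stand-ins; in a Tk-based interface the repeated check would typically be scheduled as a callback rather than run in a loop with `sleep`, which is an assumption about how the package does it.

```python
import queue
import threading
import time


def slow_job(result_queue):
    """Stand-in for a long-running optimization; puts its result in the queue."""
    time.sleep(0.05)
    result_queue.put({"best": 42})


def poll_once(result_queue):
    """Return a finished result, or None if nothing is ready yet."""
    try:
        return result_queue.get_nowait()
    except queue.Empty:
        return None


results = queue.Queue()
worker = threading.Thread(target=slow_job, args=(results,), daemon=True)
worker.start()

# Poll without blocking until a result arrives or a safety deadline passes.
deadline = time.monotonic() + 5.0
outcome = None
while outcome is None and time.monotonic() < deadline:
    outcome = poll_once(results)
    if outcome is None:
        time.sleep(0.01)
worker.join()
```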
- bioprocessnexus.optimizer.intiate_optimizer_main(parent, param_space)
Checks if all optimization weights and iteration settings are valid before starting optimization.
- Parameters:
parent – The main application instance
param_space – Dictionary of parameters to be passed to the optimization process.
Displays errors if weights are missing, out of bounds, or if iterations are not set. Otherwise, begins the optimization process and provides feedback to the user.
- bioprocessnexus.optimizer.optimize(parent)
Sets up and displays the optimization interface, allowing users to specify parameters.
- Parameters:
parent – The main application instance
Creates sliders and entry fields for setting weights and iteration limits for the optimization, and triggers optimization when all inputs are correctly configured.
- bioprocessnexus.optimizer.run_hopt(param_space, queue)
Runs the hyperparameter optimization using TPE and stores results in a queue.
- Parameters:
param_space – Dictionary containing search space and optimization settings.
queue – Queue object to hold the results for later retrieval.
The function uses “fmin” to perform optimization, evaluating the target_function.
- bioprocessnexus.optimizer.target_function(param_space)
Defines the target function for optimization based on model predictions and normalized output.
- Parameters:
param_space – Dictionary containing feature values, model directory, optimization ratios, and bounds.
- Returns:
Target value to minimize during optimization.
This function evaluates models based on normalized predictions, applies optimization ratios, and returns a target value to minimize during optimization.
bioprocessnexus.performance_eval module
- bioprocessnexus.performance_eval.check_plot_predictions_queue(parent)
Continuously checks the plotting queue and updates the UI with completed prediction plots.
- Parameters:
parent – The main application instance
This function retrieves completed plots from the queue and displays them in a new window. It repeats the check until the number of displayed plots matches the expected number of plots.
- bioprocessnexus.performance_eval.init_plot_predictions(parent)
Initializes plotting predictions vs. observations by checking if a model is loaded and starting the plotting thread.
- Parameters:
parent – The main application instance
This function displays an error if no model is loaded; otherwise, it initializes the number of plots and starts the plot prediction process in a separate thread.
- bioprocessnexus.performance_eval.plot_predictions(parent)
Generates and saves predictions vs. observations plots for model evaluation.
- Parameters:
parent – The main application instance
This function loads test data and models, makes predictions, denormalizes them, and calculates performance metrics (RMSE, NRMSE). The results are plotted and saved as images, either in a single consolidated image or individually per response variable.
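For reference, the two metrics can be computed as in the sketch below. RMSE is standard; NRMSE has several conventions, and dividing by the range of the observed values is an assumption here, since this reference does not say which normalization the package uses.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(observed, predicted):
    """RMSE divided by the range of the observations (assumed convention)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(observed, predicted)
relative_error = nrmse(observed, predicted)
```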
bioprocessnexus.prediction_making module
- bioprocessnexus.prediction_making.batch_prediction_template(parent)
Generates a template Excel file for batch predictions with required feature columns.
- Parameters:
parent – The main application instance.
Creates a template with the column names matching the features of the loaded model and saves it in the model directory.
- bioprocessnexus.prediction_making.batch_prediction_welcome(parent)
Opens the batch prediction welcome window for template generation or batch prediction.
- Parameters:
parent – The main application instance
Provides options for generating a template or making batch predictions through appropriate buttons and handlers.
- bioprocessnexus.prediction_making.make_batch_predictions(parent)
Loads a batch of inputs, makes predictions for each, and saves the results.
- Parameters:
parent – The main application instance.
This function prompts the user to load a batch input file (Excel/CSV), normalizes data, runs predictions across all models, and saves the outputs along with original inputs. Shows an error if the batch file format is incorrect.
- bioprocessnexus.prediction_making.make_predictions(parent)
Opens a prediction window and initializes features, boundaries, and response variables.
- Parameters:
parent – The main application instance
If a model is loaded, this function fetches feature and response names, sets boundaries, and prepares the prediction window. It also enables buttons for calculating outputs and optimizing inputs based on user entries.
- bioprocessnexus.prediction_making.predict(parent)
Executes prediction based on user-entered features and updates the output display.
- Parameters:
parent – The main application instance
This function collects user inputs, normalizes them, predicts outcomes using the selected model, and displays results. If inputs are incorrect, an error message is shown.
bioprocessnexus.scaling_performance module
- bioprocessnexus.scaling_performance.check_data_scaling_queue(parent)
Checks the data scaling queue for completed scaling evaluations and updates the UI with results.
- Parameters:
parent – The main application instance
This function continuously monitors the queue for any completed scaling evaluations, displaying results in a new window as they are available.
- bioprocessnexus.scaling_performance.data_scaling(parent)
Performs data scaling evaluation by subsampling, training models, and plotting performance metrics.
- Parameters:
parent – The main application instance
The function guides users through scaling evaluation, including subsampling of the data, training of different model types, RMSE/NRMSE calculation, and saving the performance plots. All plots and logs are saved in specified directories, and results are shown in the UI.
- bioprocessnexus.scaling_performance.init_data_scaling(parent)
Initializes data scaling by checking model loading status and starting the scaling process in a new thread.
- Parameters:
parent – The main application instance
If no model is loaded, displays an error message. Otherwise, sets up the necessary attributes and starts a new thread for data scaling and evaluation.
Module contents
bioprocessnexus package
This package provides tools and modules for processing, analyzing, and managing techno-economic bioprocess data. It includes functions and classes for data management, model training, prediction, performance evaluation, and SHAP explanations.
- Modules:
data_managment: Functions for managing and processing data.
explanations: Tools for generating SHAP explanations.
helpers: Utility functions for various tasks.
hist: Functions related to histogram processing and analysis.
interact_hist: Interactive histogram functionalities.
main: The main execution script.
mc_subsampling: Monte Carlo subsampling utilities.
model_training: Tools for training predictive models.
optimizer: Optimization of model responses.
performance_eval: Functions for evaluating model performance.
prediction_making: Functions for making predictions with trained models.
scaling_performance: Utilities for data scaling and performance testing.