API
This page details the methods and classes provided by the memocell module. Some
plotting helper methods and input validation methods are flagged as private and will
not appear below; please see the raw code on github for these.
Networks
The network module contains the Network class for the symbolic handling of network topologies and waiting time structures, to be used for downstream simulation and estimation/selection methods.
- class memocell.network.Network(net_name)
Bases:
objectClass for defining a symbolic network with main and hidden layer.
Main method is structure which requires a specific net_structure input. For a typical use case, the structure method is the only method to call; structure is a wrapper method for the other class methods, see there for more documentation.
- Parameters:
net_name (str) – A name for the network object.
- Returns:
net – Initialised memocell network object. Typically, continue with the net.structure method to further specify the network.
- Return type:
Examples
>>> import memocell as me >>> import numpy as np >>> # network with symmetric division reaction >>> # of a 5-step Erlang waiting time and mean >>> # division time 1/l >>> t = [{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', 'reaction_steps': 5}] >>> net = me.Network('net_div_erl5') >>> net.structure(t)
- structure(net_structure)
Main method of the network class to define the network structure with main and hidden layer. Updates a network object in place, that can then be passed on to simulation or estimation methods.
Note: The basic reaction module specifies an Erlang-type waiting time (reaction_steps as shape parameter and 1/rate_symbol as mean waiting time). The standard Markov model is obtained when reaction_steps is set to 1 for all reactions of the network. More advanced phase-type waiting times can be generated by specifying two or more Erlang-type channels in parallel (define two or more reactions of the same type between the same start and end main nodes). Alternatively/in addition, one can use simulation variables (in simulate and estimate methods) to sum up a set of main nodes to a quantity of interest; this allows to specify arbitrary transition schemes.
Note: net.structure fills many attributes of a net instance that can be accessed afterwards. A lot of information is also contained in the edges of the networkx MultiDiGraph objects (main and hidden layer), accessible via net.net_main.edges(data=True) and net.net_hidden.edges(data=True).
Note: The reaction types below are interpreted in the context of cell differentiation pathways; of course, these reactions (and MemoCell as a whole) can be equally applied to molecular reactions etc.
- Parameters:
net_structure (list of dict) –
Network structure specifying the topology of interlinked reaction modules of different types and their waiting times. Each reaction is specified as a dict in the list, connecting main nodes with start and end, and identified with a symbolic rate_symbol and the reaction_steps on the hidden Markov layer (where reaction_steps is the shape parameter and 1/rate is the mean waiting time for isolated Erlang-type reactions). Available reaction types are
’S -> E’ (cell differentiation),
’S -> S + S’ (symmetric self-renewing division),
’S -> S + E’ (asymmetric division),
’S -> E + E’ (symmetric differentiating division),
’S ->’ (efflux or cell death) and
’-> E’ (influx or birth),
where S corresponds to the start node and E corresponds to the end node. If a reaction does not have S or E (influx and efflux reactions), use ‘env’ (the general environment node) for start and end node, respectively. If multiple reactions shall share the same reaction rate, use the same rate_symbol.
- Return type:
None
Examples
>>> import memocell as me >>> import numpy as np >>> # network with symmetric division reaction >>> # of a 5-step Erlang waiting time and mean >>> # division time 1/l >>> t = [{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', 'reaction_steps': 5}] >>> net = me.Network('net_div_erl5') >>> net.structure(t)
- create_net_modules_and_identifiers(net_structure)
Creates network modules and identifiers for nodes and rates.
Note: After defining a network via the net.structure method, one can simply access the output at net.net_modules, net.net_nodes_identifier and net.net_rates_identifier for the network instance net.
- Parameters:
net_structure (list of dict) – Network structure specifying the topology and waiting times on the hidden Markov layer. See structure method for more information.
- Returns:
net_modules (list of dict) – Reaction modules of a network with identifier.
net_nodes_identifier (dict) – Z-Identifier for the main nodes.
net_rates_identifier (dict) – theta-Identifier for the symbolic rate parameters.
Examples
>>> import memocell as me >>> # network with symmetric division reaction >>> # of a 5-step Erlang waiting time and mean >>> # division time 1/l >>> t = [{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', 'reaction_steps': 5}] >>> net = me.Network('net_div_erl5') >>> net.create_net_modules_and_identifiers(t) ([{'module': 'module_0', 'start-end': ('X_t', 'X_t'), 'start-end_ident': ('Z_0', 'Z_0'), 'sym_rate': 'l', 'sym_rate_ident': 'theta_0', 'type': 'S -> S + S', 'module_steps': 5}], {'Z_0': 'X_t'}, {'theta_0': 'l'})
- structure_net_main(net, net_modules)
Updates a networkx MultiDiGraph for the main (observable) layer of a memocell network.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_main for the network instance net.
- Parameters:
net (networkx.classes.multidigraph.MultiDiGraph) – An (empty) initialised networkx MultiDiGraph object.
net_modules (list of dict) – Reaction modules of a network with identifier. As generated by the create_net_modules_and_identifiers method and available at net.net_modules.
- Returns:
net_main – The main (observable) layer of a memocell network. Typically available at net.net_main.
- Return type:
networkx.classes.multidigraph.MultiDiGraph
Updates a networkx MultiDiGraph for the hidden (Markov) layer of a memocell network.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_hidden for the network instance net.
- Parameters:
net (networkx.classes.multidigraph.MultiDiGraph) – An (empty) initialised networkx MultiDiGraph object.
net_modules (list of dict) – Reaction modules of a network with identifier. As generated by the create_net_modules_and_identifiers method and available at net.net_modules.
- Returns:
net_hidden – The hidden (Markov) layer of a memocell network. Typically available at net.net_hidden.
- Return type:
networkx.classes.multidigraph.MultiDiGraph
- static create_theta_order(net_modules)
Creates an order of the symbolic rate parameters in their theta-identifier form.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_theta_symbolic.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_theta_symbolic ['theta_0', 'theta_1']
- static create_node_order(network_nodes)
Creates an order of the symbolic nodes in the main or hidden layer in their Z-identifier form; includes univariate order and order of pairs.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_main_node_order and net.net_hidden_node_order. These attributes are later used to define the moment and variable order for moment and Gillespie simulations, respectively.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_main_node_order [[('Z_0',), ('Z_1',)], [('Z_0', 'Z_0'), ('Z_0', 'Z_1'), ('Z_1', 'Z_1')]] >>> net.net_hidden_node_order [[('Z_0__centric',), ('Z_0__module_0__0',), ('Z_1__centric',)], [('Z_0__centric', 'Z_0__centric'), ('Z_0__centric', 'Z_0__module_0__0'), ('Z_0__centric', 'Z_1__centric'), ('Z_0__module_0__0', 'Z_0__module_0__0'), ('Z_0__module_0__0', 'Z_1__centric'), ('Z_1__centric', 'Z_1__centric')]]
- static count_net_main_nodes(net_main_nodes)
Counts the number of main nodes for all main node identifiers; identifier for the environment node is always included.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_main_node_numbers.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_main_node_numbers {'Z_env': 0, 'Z_0': 1, 'Z_1': 1}
Counts the number of hidden nodes for all main node identifiers; identifier for the environment node is always included.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_hidden_node_numbers.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_hidden_node_numbers {'Z_env': 0, 'Z_0': 2, 'Z_1': 1}
- static create_module_nodes(start_main_node, end_main_node, module_ident, module_steps)
Creates the hidden layer nodes in Z-identifier form for an Erlang-type reaction/module channel (with module_steps as shape parameter).
Note: There is no need to run this manually, this method is run automatically during the main method structure (as part of structure_net_hidden).
Examples
>>> net = me.Network('my_net') >>> net.create_module_nodes('Z_1', 'Z_2', 'module_2', 5) ['Z_1__centric', 'Z_1__module_2__0', 'Z_1__module_2__1', 'Z_1__module_2__2', 'Z_1__module_2__3', 'Z_2__centric']
- static create_module_reaction_types(module_type, module_steps)
Creates the hidden layer reaction types (Markovian reaction with exponential waiting times) for an Erlang-type reaction/module channel of type module_type (on the non-Markovian observable layer; module_steps as Erlang shape parameter).
Note: For Erlang-type reactions typically all but the last transition are unobservable transitions, as they do not affect the variable state on the observable layer; only the last transition into the absorbing state realises the desired state change on the observable layer.
Note: There is no need to run this manually, this method is run automatically during the main method structure (as part of structure_net_hidden).
Examples
>>> net = me.Network('my_net') >>> net.create_module_reaction_types('S -> S + S', 5) ['S -> E', 'S -> E', 'S -> E', 'S -> E', 'S -> E + E']
- static create_rate_identifiers(net_structure)
Creates the unique theta-identifiers and mapping for the rate parameters.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_rates_identifier.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_rates_identifier {'theta_0': 'd_xy', 'theta_1': 'l_y'}
- static create_node_identifiers(net_structure)
Creates the unique Z-identifiers and mapping for the main nodes.
Note: After defining a network via the net.structure method, there is no need to run this. One can simply access the output at net.net_nodes_identifier.
Examples
>>> # after defining some network by >>> # net.structure(...) >>> net.net_nodes_identifier {'Z_0': 'X_t', 'Z_1': 'Y_t'}
Simulations
The simulation module contains the Simulation class for moment (mean, variance, covariance) and stochastic simulations of any MemoCell network.
- class memocell.simulation.Simulation(network)
Bases:
objectClass for moment (mean, variance, covariance) or stochastic simulations of a given network.
Main method is simulate which requires a network input and additional simulation parameters. For a typical use case, the simulate method is the only method to call; it is a wrapper for the other class methods, see there for more documentation.
- Parameters:
network (memocell.network.Network) – A memocell network object.
- Returns:
sim – Initialised memocell simulation object. Typically, continue with the sim.simulate method to compute moment or stochastic simulations.
- Return type:
- simulate(simulation_type, simulation_variables, theta_values, time_values, initial_values_type, initial_gillespie=None, initial_moments=None, sim_mean_only=False)
Main method of the simulation class. Computes moment (mean, variance, covariance) or stochastic simulations of a given simulation object.
Note: simulation_type specifies whether moment or stochastic; one can also compute mean solutions only (use sim_mean_only=True). Downstream this method uses objects of MomentsSim and GillespieSim classes; available at sim.sim_moments and sim.sim_gillespie, respectively. Results of the simulations are returned and also available at sim.sim_moments_res and sim.sim_gillespie_res. Results can be plotted by sim_mean_plot, sim_variance_plot, sim_covariance_plot and sim_counts_plot plotting methods.
Note: For a given model (network + parameters) moment and stochastic simulations directly correspond to each other (of course, for consistent/identical parameter choices), i.e. the mean, variance and covariance statistics for a set of \(N\) stochastic simulations converge to the exact moment simulations with \(N \rightarrow \infty\).
- Parameters:
simulation_type (str) – Type of the simulation; use ‘moments’ for moment and ‘gillespie’ for stochastic simulations, respectively.
simulation_variables (dict) – Information for simulation variables with key:value=simulation variable:tuple of network main nodes dictionary pairs. The tuple of network main nodes can be used to sum up multiple network nodes to one simulation variable.
theta_values (dict) – Numerical values for the symbolic rate parameters as key:value=parameter:value dictionary pairs.
time_values (1d numpy.ndarray) – Time values to simulate the model with. Moment and stochastic simulations will be returned at these time values. Note that too coarse time values might make some stochastic simulation events invisible; so use a dense array if all events should be seen.
initial_values_type (str) – Initial value type to specify the multinomial distribution scheme of observable variables to the hidden variables (‘synchronous’ or ‘uniform’).
initial_gillespie (dict, optional) – Initial distribution specifying the counts of the main nodes for the stochastic simuation (as one concrete realisation). One might want to change this input, drawing from a distribution, over many successive simulation runs.
initial_moments (dict, optional) – Initial values for the moments of network main nodes for the moment simulation. Means are specified as key:value=(node, ):initial value (float), variances are specified as key:value=(node, node):initial value (float) and covariances are specified as key:value=(node1, node2):initial value (float) dictionary pairs.
sim_mean_only (bool, optional) – Only relevant for the moment simulations (is automatically set to True for stochastic simulations). Specify sim_mean_only=True, if the moment simulations shall be computed for the first moment (means) only. If the moment simulations shall be computed for the first and second moments, specify sim_mean_only=False (default).
- Returns:
sim_moments_res (tuple of numpy.ndarray) – Returned when simulation_type=’moments’. First, second and third tuple indices contain the results for mean, variance and covariance values, respectively, each with shape (#moments, #time_values). The moment order for the variables corresponds to sim.sim_variables_order with identifier mapping as in sim.sim_variables_identifier. Also available at sim.sim_moments_res.
sim_gillespie_res (list of numpy.ndarray) – Returned when simulation_type=’gillespie’. First element of the list contains the time values with shape (#time_values, ). Second element are the simulated counts with shape (#simulation_variables, #time_values); the order of the variables corresponds to sim.sim_variables_identifier. Also available at sim.sim_gillespie_res.
Examples
>>> ### moment simulation >>> import memocell as me >>> import numpy as np >>> # network with symmetric division reaction >>> # of a 5-step Erlang waiting time and mean >>> # division time 1/l >>> net = me.Network('net_div_erl5') >>> net.structure([{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', 'reaction_steps': 5}]) >>> >>> # all cells start in the first node on the hidden layer >>> initial_values_type = 'synchronous' >>> # mean and variance at t=0 (initial_moments) >>> # (consistent with the distribution for stochastic simulations below) >>> initial_values = {('X_t',): 1.0, ('X_t','X_t'): 0.0} >>> # rate of 0.2 implies 1/0.2=5 mean Erlang waiting time >>> theta_values = {'l': 0.2} >>> time_values = np.linspace(0.0, 10.0, endpoint=True, num=11) >>> # our main node is also our output simulation variable >>> variables = {'X_t': ('X_t', )} >>> >>> sim = me.Simulation(net) >>> sim.simulate('moments', variables, theta_values, time_values, >>> initial_values_type, initial_moments=initial_values) >>> # plots >>> me.plots.sim_mean_plot(sim) >>> me.plots.sim_variance_plot(sim)
>>> ### stochastic simulation >>> import memocell as me >>> import numpy as np >>> # network with symmetric division reaction >>> # of a 5-step Erlang waiting time and mean >>> # division time 1/l >>> net = me.Network('net_div_erl5') >>> net.structure([{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', 'reaction_steps': 5}]) >>> >>> initial_values_type = 'synchronous' >>> # we draw from a deterministic initial distribution of cells >>> # (consistent with mean and variances from above) >>> initial_values = {'X_t': 1} >>> theta_values = {'l': 0.2} >>> time_values = np.linspace(0.0, 10.0, endpoint=True, num=11) >>> variables = {'X_t': ('X_t', )} >>> >>> sim = me.Simulation(net) >>> sim.simulate('gillespie', variables, theta_values, time_values, >>> initial_values_type, initial_gillespie=initial_values) >>> # plot (one stochastic realisation) >>> me.plots.sim_counts_plot(sim)
- prepare_simulation(simulation_type, simulation_variables, initial_values_type, initial_gillespie, initial_moments, sim_mean_only)
Validates user provided input and triggers preparation steps for moment or stochastic simulations.
Note: This method is automatically run during sim.simulate.
- prepare_time_values(time_values)
Validates user provided time_values input and assigns it to sim.sim_time_values.
Note: This method is automatically run during sim.simulate.
- prepare_theta_values(theta_values_dict)
Validates the user input for the dict of theta_values and returns the ordered array of theta_values (output of process_theta_values, see there).
Note: This method is automatically run during sim.simulate.
- static process_theta_values(theta_values_dict, net_rates_identifier, net_theta_symbolic)
Processes user provided dictionary-type theta_values to an ordered array given the ordering of the symbolic theta parameters in the network.
Note: This method is automatically run during sim.simulate.
Examples
>>> import memocell as me >>> theta_values_dict = {'l': 0.06, 'd': 0.04} >>> net_rates_identifier = {'theta_0': 'd', 'theta_1': 'l'} >>> net_theta_symbolic = ['theta_0', 'theta_1'] >>> me.simulation.Simulation.process_theta_values(theta_values_dict, net_rates_identifier, net_theta_symbolic) array([0.04, 0.06])
- static process_sim_mean_only(simulation_type, mean_only)
Updates the mean_only mode (whether simulation only requires univariate variable identifiers and order) depending on the simulation_type; i.e. for stochastic simulations (‘gillespie’) this returns True by construction.
Note: This method is automatically run during sim.simulate. Afterwards one can simply access the output at sim.sim_mean_only.
- prepare_simulation_variables(simulation_variables)
Validates the user input for the simulation_variables and is a wrapper for create_variables_identifiers and create_variables_order methods (see there).
Note: This method is automatically run during sim.simulate. Afterwards one can simply access the output at sim.sim_variables, sim.sim_variables_identifier and sim.sim_variables_order.
- static create_variables_identifiers(variables)
Creates V-identifiers for the (sorted) user provided simuation variables.
Note: This method is automatically run during sim.simulate. Afterwards one can simply access the output at sim.sim_variables_identifier.
Examples
>>> import memocell as me >>> variables = {'X_t': ('X_t', ), 'Y_t': ('Y_t', )} >>> me.simulation.Simulation.create_variables_identifiers(variables) {'V_0': ('X_t', ('X_t',)), 'V_1': ('Y_t', ('Y_t',))}
- static create_variables_order(variables_identifier, mean_only)
Creates a (sorted) order of the simulation variables in their V-identifier form; univariate only (for first moments, stochastic simulation variables) or with pairs (for second moments) depending on mean_only.
Note: This method is automatically run during sim.simulate. Afterwards one can simply access the output at sim.sim_variables_order.
Examples
>>> import memocell as me >>> variables_identifier = {'V_0': ('X_t', ('X_t',)), 'V_1': ('Y_t', ('Y_t',))} >>> me.simulation.Simulation.create_variables_order(variables_identifier, mean_only=True) [[('V_0',), ('V_1',)], []] >>> me.simulation.Simulation.create_variables_order(variables_identifier, mean_only=False) [[('V_0',), ('V_1',)], [('V_0', 'V_0'), ('V_0', 'V_1'), ('V_1', 'V_1')]]
Data
The data module contains the Data class to handle and load data which subsequently can be used for statistical inference.
- class memocell.data.Data(data_name)
Bases:
objectClass to handle and load data.
Main method is load which will create a ready-to-use data instance with dynamic summary statistics that can subsequently be used for the statistical inference; for the typical use case this is the only method to call. load is a wrapper method for most of the other class methods, so more documentation can be found in the respective individual methods. Further functionalities of this class include stochastic event analysis and Gamma/Erlang fitting to waiting time distributions.
- Parameters:
data_name (str) – A name for the data object.
- Returns:
data – Initialised memocell data object. Typically, continue with the data.load method to add the actual data information.
- Return type:
Examples
>>> # initialise data >>> import memocell as me >>> import numpy as np >>> data = me.Data('my_data') >>> # use load() to fill it >>> # data.load(...)
- load(variables, time_values, count_data, data_type='counts', mean_data=None, var_data=None, cov_data=None, bootstrap_samples=10000, basic_sigma=0.0)
Main method of the data class. Method will load data either from dynamic count data (data_type=’counts’, default) or from already summarised data as mean, variance and covariance statistics (data_type=’summary’). Depending on the selected data type, different inputs are used: 1) variables, time_values and basic_sigma in both cases 2) data_type=’counts’ additionally uses count_data and bootstrap_samples 3) data_type=’summary’ additionally uses mean_data, var_data and cov_data (to load mean_data only is also supported). load defines many data attributes, the resulting summary statistics particularly are accessible via data.data_mean, data.data_variance and data.data_covariance. For more information on how the order of the summary statistics corresponds to variables, see method data.create_data_variable_order.
- Parameters:
variables (list of str) – A list of strings for the data variables.
time_values (1d numpy.ndarray) – Time values corresponding to time dimension of count_data (in case of data_type=’counts’, default) or of mean_data, var_data and cov_data (in case of data_type=’summary’).
count_data (numpy.ndarray or None) – Required input for data_type=’counts’ (default); data object will be computed based on count_data to get summary statistics data_mean, data_variance and data_covariance including standard errors by bootstrapping. Shape of count_data has to be (n, m, t), with the number of repeats n, the number of variables m, the number of time points t. Order of the variables should match with variables.
data_type (str, optional) – String to define the mode how to create the data object; either ‘counts’ (default) or ‘summary’.
mean_data (numpy.ndarray or None, optional) – Required input for data_type=’summary’; mean_data will be directly loaded into data.data_mean. mean_data contains the dynamic mean statistics and standard errors with shape (2, len(data_mean_order), len(time_values)). mean_data[0, :, :] contains the statistics; mean_data[1, :, :] contains the standard errors.
var_data (numpy.ndarray or None, optional) – Optional input for data_type=’summary’; var_data will be directly loaded into data.data_variance. var_data contains the dynamic variance statistics and standard errors with shape (2, len(data_variance_order), len(time_values)). var_data[0, :, :] contains the statistics; var_data[1, :, :] contains the standard errors.
cov_data (numpy.ndarray or None, optional) – Optional input for data_type=’summary’; cov_data will be directly loaded into data.data_covariance. cov_data contains the dynamic covariance statistics and standard errors with shape (2, len(data_covariance_order), len(time_values)). cov_data[0, :, :] contains the statistics; cov_data[1, :, :] contains the standard errors.
bootstrap_samples (int, optional) – Integer to set the number of bootstrap samples (in case of data_type=’counts’, default). Number between bootstrap_samples=10000 (default) and bootstrap_samples=100000 is typically sufficient. For more information, see method data.bootstrap_count_data_to_summary_stats and methods therein.
basic_sigma (float, optional) – Non-negative float value for basic_sigma. For more information, see method data.introduce_basic_sigma.
- Return type:
None
Examples
>>> import memocell as me >>> import numpy as np >>> data = me.Data('my_data') >>> variables = ['A', 'B'] >>> time_values = np.linspace(0.0, 4.0, num=5) >>> count_data = np.array([[[0.0, 0.0, 2.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0, 0.0]], >>> [[0.0, 1.0, 2.0, 4.0, 4.0], [1.0, 1.0, 0.0, 0.0, 0.0]], >>> [[0.0, 1.0, 1.0, 4.0, 4.0], [1.0, 1.0, 0.0, 0.0, 0.0]], >>> [[0.0, 0.0, 0.0, 2.0, 4.0], [1.0, 0.0, 0.0, 0.0, 0.0]]]) >>> data.load(variables, time_values, count_data) >>> data.data_mean [[[0. 0.5 1.25 3. 4. ] [1. 0.75 0.25 0.25 0. ]] [[0. 0.25096378 0.41313568 0.50182694 0. ] [0. 0.21847682 0.21654396 0.21624184 0. ]]] >>> data.data_variance [[[0. 0.33333333 0.91666667 1.33333333 0. ] [0. 0.25 0.25 0.25 0. ]] [[0. 0.10247419 0.39239737 0.406103 0. ] [0. 0.13197367 0.13279328 0.13272021 0. ]]] >>> data.data_covariance [[[ 0. 0.16666667 0.25 -0.33333333 0. ]] [[ 0. 0.11379549 0.18195518 0.22722941 0. ]]]
- static process_mean_exist_only(data_type, var_data, cov_data)
Initialise the data_mean_exists_only data attribute. The data_mean_exists_only attribute indicates whether a data object contains potentially higher summary moments (variance and covariance) or the first moments only (means).
- Parameters:
data_type (str) – Type of the data object; either ‘counts’ or ‘summary’.
var_data (numpy.ndarray or None) – Dynamic variance statistics used in data_type=’summary’ mode.
cov_data (numpy.ndarray or None) – Dynamic covariance statistics used in data_type=’summary’ mode.
- Returns:
data_mean_exists_only – Bool to indicate whether data contains mean information only or also higher moments (variance and covariance). Typically available at data.data_mean_exists_only for a data object data.
- Return type:
bool
Examples
>>> me.Data.process_mean_exist_only('counts', None, None) False >>> me.Data.process_mean_exist_only('summary', None, None) True >>> # with some data arrays for variance and covariance (!=None) >>> me.Data.process_mean_exist_only('summary', var_data, cov_data) False
- static convert_none_data_to_empty_array(count_data, mean_data, var_data, cov_data, num_variables, num_time_values)
Convert None-type data input into empty numpy arrays.
Returned data arrays have a zero-sized data repeats or variable order dimension but are otherwise shaped as needed for further internal computations.
- Parameters:
count_data (numpy.ndarray or None) – Dynamic count data used in data_type=’counts’ mode. If None, an empty array will be constructed with shape (0, number of variables, number of time points); if numpy array already, it will be left unchanged.
mean_data (numpy.ndarray or None) – Dynamic mean statistics used in data_type=’summary’ mode. If None, an empty array will be constructed with shape (2, 0, number of time points); if numpy array already, it will be left unchanged.
var_data (numpy.ndarray or None) – Dynamic variance statistics used in data_type=’summary’ mode. If None, an empty array will be constructed with shape (2, 0, number of time points); if numpy array already, it will be left unchanged.
cov_data (numpy.ndarray or None) – Dynamic covariance statistics used in data_type=’summary’ mode. If None, an empty array will be constructed with shape (2, 0, number of time points); if numpy array already, it will be left unchanged.
num_variables (int) – Number of data variables.
num_time_values (int) – Number of time points.
- Returns:
count_data (numpy.ndarray) – Dynamic count data used in data_type=’counts’ mode, possibly an empty array.
mean_data (numpy.ndarray) – Dynamic mean statistics used in data_type=’summary’ mode, possibly an empty array.
var_data (numpy.ndarray) – Dynamic variance statistics used in data_type=’summary’ mode, possibly an empty array.
cov_data (numpy.ndarray) – Dynamic covariance statistics used in data_type=’summary’ mode, possibly an empty array.
- static create_data_variable_order(data_variables, data_mean_exists_only)
Creates objects to define the order of data variables for mean, variance and covariance. Mean and variance order follows the order of the input; covariances are ordered with a priority for what comes first in the input.
- Parameters:
data_variables (list of str) – A list of strings for the data variables.
data_mean_exists_only (bool) – Bool to indicate whether data contains mean information only or also higher moments (variance and covariance). If False, variable order will be computed for mean, variance and covariance; if True, variance and covariance order will be empty.
- Returns:
data_mean_order (list of dict) – Variable order for the data means.
data_variance_order (list of dict) – Variable order for the data variances.
data_covariance_order (list of dict) – Variable order for the data covariances.
Examples
>>> mean_only = False >>> me.Data.create_data_variable_order(['A', 'B', 'C'], mean_only) ([{'variables': 'A', 'summary_indices': 0, 'count_indices': (0,)}, {'variables': 'B', 'summary_indices': 1, 'count_indices': (1,)}, {'variables': 'C', 'summary_indices': 2, 'count_indices': (2,)}], [{'variables': ('A', 'A'), 'summary_indices': 0, 'count_indices': (0, 0)}, {'variables': ('B', 'B'), 'summary_indices': 1, 'count_indices': (1, 1)}, {'variables': ('C', 'C'), 'summary_indices': 2, 'count_indices': (2, 2)}], [{'variables': ('A', 'B'), 'summary_indices': 0, 'count_indices': (0, 1)}, {'variables': ('A', 'C'), 'summary_indices': 1, 'count_indices': (0, 2)}, {'variables': ('B', 'C'), 'summary_indices': 2, 'count_indices': (1, 2)}])
- bootstrap_count_data_to_summary_stats(data_num_time_values, data_mean_order, data_variance_order, data_covariance_order, count_data, bootstrap_samples)
Bootstraps dynamic count data to mean, variance and covariance statistics over time with associated standard errors.
- Parameters:
data_num_time_values (int) – Number of time values; typical usage with data_num_time_values=data.data_num_time_values of a memocell data object.
data_mean_order (list of dict) – Data mean order of a memocell data object; typical usage with data_mean_order=data.data_mean_order.
data_variance_order (list of dict) – Data variance order of a memocell data object; typical usage with data_variance_order=data.data_variance_order.
data_covariance_order (list of dict) – Data covariance order of a memocell data object; typical usage with data_covariance_order=data.data_covariance_order.
count_data (numpy.ndarray) – Numpy array of the count data to bootstrap with shape (n, m, t); the number of repeats n, the number of variables m, the number of time points t. Typical usage with count_data=data.data_counts of a memocell data object. Order of the variables should match with data_mean_order, data_variance_order and data_covariance_order. Time points t should be equal to data_num_time_values.
bootstrap_samples (int) – Integer to set the number of bootstrap samples. Typically bootstrap_samples=100000 is sufficient; typical usage with data.data_bootstrap_samples.
- Returns:
data_mean (numpy.ndarray) – Numpy array of the bootstrapped dynamic mean statistics and standard errors with shape (2, len(data_mean_order), data_num_time_values). data_mean[0, :, :] contains the statistics; data_mean[1, :, :] contains the standard errors.
data_var (numpy.ndarray) – Numpy array of the bootstrapped dynamic variance statistics and standard errors with shape (2, len(data_variance_order), data_num_time_values). data_var[0, :, :] contains the statistics; data_var[1, :, :] contains the standard errors.
data_cov (numpy.ndarray) – Numpy array of the bootstrapped dynamic covariance statistics and standard errors with shape (2, len(data_covariance_order), data_num_time_values). data_cov[0, :, :] contains the statistics; data_cov[1, :, :] contains the standard errors.
- static bootstrapping_mean(sample, num_resamples)
Compute mean and associated standard error of the mean of a given 1-dimensional sample. Standard error is obtained by bootstrapping.
- Parameters:
sample (1d numpy.ndarray) – Sample used to compute mean and standard error of the mean.
num_resamples (int) – Integer to set the number of bootstrap samples. Typically num_resamples=100000 is sufficient.
- Returns:
stat_sample (numpy.float64) – Mean of the sample.
se_stat_sample (numpy.float64) – Standard error of the mean of the sample obtained by bootstrapping.
Examples
>>> me.Data.bootstrapping_mean(np.array([1.0, 2.0, 3.0]), 100000) (2.0, 0.4705758644290084)
- static bootstrapping_variance(sample, num_resamples)
Compute variance and associated standard error of the variance of a given 1-dimensional sample. Standard error is obtained by bootstrapping.
- Parameters:
sample (1d numpy.ndarray) – Sample used to compute variance and standard error of the variance.
num_resamples (int) – Integer to set the number of bootstrap samples. Typically num_resamples=100000 is sufficient.
- Returns:
stat_sample (numpy.float64) – Variance of the sample (ddof=1).
se_stat_sample (numpy.float64) – Standard error of the variance of the sample obtained by bootstrapping.
Examples
>>> me.Data.bootstrapping_variance(np.array([1.0, 2.0, 3.0]), 100000) (1.0, 0.4707057737121149)
- static bootstrapping_covariance(sample1, sample2, num_resamples)
Compute covariance and associated standard error of the covariance of two given 1-dimensional samples. Standard error is obtained by bootstrapping.
- Parameters:
sample1 (1d numpy.ndarray) – Sample used to compute covariance and standard error of the covariance. Same length as sample2.
sample2 (1d numpy.ndarray) – Sample used to compute covariance and standard error of the covariance. Same length as sample1.
num_resamples (int) – Integer to set the number of bootstrap samples. Typically num_resamples=100000 is sufficient.
- Returns:
stat_sample (numpy.float64) – Covariance of the sample (ddof=1).
se_stat_sample (numpy.float64) – Standard error of the covariance of the sample obtained by bootstrapping.
Examples
>>> me.Data.bootstrapping_covariance(np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0]), 10000) (-1.0, 0.47186781968816127)
- static introduce_basic_sigma(basic_sigma, data)
Handles zero-valued or almost zero-valued standard errors by introducing a basic_sigma standard error value for those entries; this is required for numerical stability of the likelihood calculation and allows to remove too strong weights on certain data points. Zero or almost-zero standard errors can occur when bootstrapping data features that are not adequately represented by the actual data sample used in the bootstrap. Important note: In such a situation a basic_sigma can be introduced but the specific value has to be chosen very carefully, as its choice can strongly influence model selection results. One can test different choices on artificial in silico data; or use guidelines based on additive smoothing and the rule of succession.
- Parameters:
basic_sigma (float) – Non-negative float value for basic_sigma.
data (numpy.ndarray) – Data to introduce a basic_sigma to; more specifically, the standard errors (in data[1, :, :]) will be replaced by basic_sigma if they are smaller than basic_sigma.
- Returns:
data – Returns data with standard errors larger or equal to basic_sigma.
- Return type:
numpy.ndarray
Examples
>>> data = np.array([[[0. , 0.02272727, 2.93181818], >>> [1. , 0.97727273, 0.15909091]], >>> [[0. , 0.01998622, 0.40653029], >>> [0. , 0.02294546, 0.05969529]]]) >>> me.Data.introduce_basic_sigma(0.01, data) array([[[0. , 0.02272727, 2.93181818], [1. , 0.97727273, 0.15909091]], [[0.01 , 0.01998622, 0.40653029], [0.01 , 0.02294546, 0.05969529]]])
- static get_number_data_points(data_mean, data_var, data_cov)
Get the number of total data points n used in the inference. This is the number of data points of mean, variance and covariance statistics (default inference) or the number of mean data points only (mean-only modes).
- Parameters:
data_mean (numpy.ndarray) – Numpy array of the dynamic mean statistics and standard errors with shape (2, len(data_mean_order), data_num_time_values).
data_var (numpy.ndarray) – Numpy array of the dynamic variance statistics and standard errors with shape (2, len(data_variance_order), data_num_time_values).
data_cov (numpy.ndarray) – Numpy array of the dynamic covariance statistics and standard errors with shape (2, len(data_covariance_order), data_num_time_values).
- Returns:
data_num_values (int) – Number of data points of mean, variance and covariance statistics.
data_num_values_mean_only (int) – Number of data points of mean statistics.
- events_find_all()
Wrapper method to call a set of event functions on dynamic count data in data.data_counts. Results are accessible via data event attributes; for example data.event_all_first_cell_type_conversion.
- event_find_first_change_from_inital_conditions(well_trace, time_values)
Find events of first change from initial conditions (state at first time point).
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_first_change_from_inital_conditions(data, >>> np.array([[0.0, 0.0, 1.0, 2.0], >>> [0.0, 0.0, 0.0, 1.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0])) (True, 2.0)
- event_find_first_cell_count_increase(well_trace, time_values)
Find events of first cell count increase; calculation based on total cell count compared to initial condition (state at first time point).
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_first_cell_count_increase(data, >>> np.array([[4.0, 4.0, 4.0, 4.0], >>> [1.0, 1.0, 2.0, 3.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0])) (True, 2.0)
- event_find_first_cell_type_conversion(well_trace, time_values)
Find events of first cell type conversion (e.g., differentiation between different cell types / count variables); calculation looks for change of cell state despite maintenance of total cell counts.
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_first_cell_type_conversion(data, >>> np.array([[4.0, 4.0, 3.0, 3.0], >>> [1.0, 1.0, 2.0, 2.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0])) (True, 2.0)
- event_find_first_cell_count_increase_after_cell_type_conversion(well_trace, time_values, diff=True)
Find events of first cell count increase after cell type conversion. Calculation looks first for the conditioned event (cell type conversion, based on event_find_first_cell_type_conversion method). If this event exists, the well_trace and time_values are shortened and then searched for a first cell count increase (based on event_find_first_cell_count_increase method).
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
diff (bool) – If diff=True, event_tau will provide the waiting time starting at the conditioned event; otherwise the waiting time is provided starting at the first time point.
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_first_cell_count_increase_after_cell_type_conversion(data, >>> np.array([[4.0, 3.0, 3.0, 4.0], >>> [1.0, 2.0, 2.0, 2.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0])) (True, 2.0)
>>> me.Data.event_find_first_cell_count_increase_after_cell_type_conversion(data, >>> np.array([[4.0, 3.0, 3.0, 4.0], >>> [1.0, 2.0, 2.0, 2.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0]), diff=False) (True, 3.0)
- event_find_second_cell_count_increase_after_first_cell_count_increase_after_cell_type_conversion(well_trace, time_values, diff=True)
Find events of a second cell count increase after a first cell count increase and (even more before that) a cell type conversion. Calculation looks first for the conditioned event (first increase and cell type conversion, based on event_find_first_cell_count_increase_after_cell_type_conversion method). If this event exists, the well_trace and time_values are shortened and then searched for another “first” cell count increase (based on event_find_first_cell_count_increase method).
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
diff (bool) – If diff=True, event_tau will provide the waiting time starting at the conditioned event; otherwise the waiting time is provided starting at the first time point.
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_second_cell_count_increase_after_first_cell_count_increase_after_cell_type_conversion( >>> data, >>> np.array([[4.0, 3.0, 3.0, 4.0, 5.0], >>> [1.0, 2.0, 2.0, 2.0, 2.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0, 4.0])) (True, 1.0)
>>> me.Data.event_find_second_cell_count_increase_after_first_cell_count_increase_after_cell_type_conversion( >>> data, >>> np.array([[4.0, 3.0, 3.0, 4.0, 5.0], >>> [1.0, 2.0, 2.0, 2.0, 2.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0, 4.0]), diff=False) (True, 4.0)
- event_find_third_cell_count_increase_after_first_and_second_cell_count_increase_after_cell_type_conversion(well_trace, time_values, diff=True)
Find events of a third cell count increase after a first and second cell count increase and (even more before that) a cell type conversion. Calculation looks first for the conditioned event (first and second increase and cell type conversion, based on event_find_second_cell_count_increase_after_first_cell_count_increase_after_cell_type_conversion method). If this event exists, the well_trace and time_values are shortened and then searched for another “first” cell count increase (based on event_find_first_cell_count_increase method).
- Parameters:
well_trace (numpy.ndarray) – Dynamic variable counts of one well / one stochastic realisation with shape (number of variables, len(time_values)).
time_values (1d numpy.ndarray) – Time values corresponding to variable counts (len(time_values) should match well_trace.shape[1]).
diff (bool) – If diff=True, event_tau will provide the waiting time starting at the conditioned event; otherwise the waiting time is provided starting at the first time point.
- Returns:
event_bool (bool) – True if an event occurs in this well_trace, False otherwise.
event_tau (float or None) – If event_bool=True, event_tau provides the waiting time when the event occured (according to time_values); None otherwise.
Examples
>>> me.Data.event_find_third_cell_count_increase_after_first_and_second_cell_count_increase_after_cell_type_conversion( >>> data, >>> np.array([[4.0, 3.0, 3.0, 4.0, 5.0, 5.0], >>> [1.0, 2.0, 2.0, 2.0, 2.0, 3.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])) (True, 1.0)
>>> me.Data.event_find_third_cell_count_increase_after_first_and_second_cell_count_increase_after_cell_type_conversion( >>> data, >>> np.array([[4.0, 3.0, 3.0, 4.0, 5.0, 5.0], >>> [1.0, 2.0, 2.0, 2.0, 2.0, 3.0]]), >>> np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]), diff=False) (True, 5.0)
- gamma_fit_binned_waiting_times(waiting_times_arr)
Fit a Gamma distribution with parameter theta (theta[0] shape parameter and theta[1] scale parameter) to a data sample of waiting times (waiting_times_arr). Fitting is based on binning the waiting time data first, so this method works equally well for continuous and discrete waiting times (when continuous observation is experimentally not accessible). The bin edges are given by [-np.inf, data.data_time_values, np.inf] where individual bins are left-open, right-closed intervals of the bins edges (of form (a, b]). See utility methods data.gamma_compute_bin_probabilities and data.gamma_negative_multinomial_log_likelihood for more information of code specifics.
- Parameters:
waiting_times_arr (1d numpy.ndarray or list of float or int) – Continuous or discrete waiting times to fit the Gamma distribution to.
- Returns:
theta – Result of the fit of a Gamma(theta) distribution to the data of waiting_times_arr; with theta[0] shape parameter and theta[1] scale parameter of the Gamma distribution.
- Return type:
1d numpy.ndarray
Examples
>>> data = me.Data('data_init') >>> data.data_time_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0] >>> theta = [4.0, 0.5] >>> waiting_times_arr = np.random.gamma(theta[0], theta[1], 100000) >>> data.gamma_fit_binned_waiting_times(waiting_times_arr) >>> data.gamma_fit_theta array([4.01966125, 0.49769529])
- gamma_compute_bin_probabilities(theta)
Utility method for fitting Gamma/Erlang distribution to waiting time histogram. This method computes bin probabilities for bins in data.gamma_fit_bins for a given theta, where theta[0] and theta[1] are the shape and scale parameter of the Gamma distribution, respectively. More specifically, the probability of a continuous waiting time tau to be in the bin (a, b] is calculated by p(tau in (a, b]) = F_theta(b) - F_theta(a), with F_theta being the cumulative density function of the Gamma distribution for parameters theta.
- Parameters:
theta (list of float) – Shape (theta[0]) and scale (theta[1]) parameter for the Gamma distribution.
- Returns:
bins_probs – Probabilities to find a continuous waiting time following a Gamma(theta) distribution in the bins given by data.gamma_fit_bins.
- Return type:
1d numpy.ndarray
Examples
>>> data = me.Data('data_init') >>> data_time_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0] >>> data.gamma_fit_bins = np.concatenate(([-np.inf], data_time_values, [np.inf])) >>> data.gamma_compute_bin_probabilities([4.0, 0.5]) np.array([0. , 0.14287654, 0.42365334, 0.28226624, 0.10882377, 0.03204406, 0.01033605])
- gamma_negative_multinomial_log_likelihood(theta)
Utility method for fitting Gamma/Erlang distribution to waiting time histogram. This method computes the negative logarithmic likelihood to see a certain waiting time histogram (in the form of data.gamma_fit_bin_inds_all_occ) for given bin probabilities based on a Gamma waiting time distribution with parameters theta. Likelihood calculation is based on a multinomial model.
- Parameters:
theta (list of float) – Shape (theta[0]) and scale (theta[1]) parameter for the Gamma distribution.
- Returns:
neg_log_likelihood – Negative logarithmic likelihood p(D | Gamma(theta)) to see waiting time histogram data D given waiting time distribution Gamma(theta).
- Return type:
float
Estimation
The estimation module contains the Estimation class for statistical inference of models given data based on nested sampling.
- class memocell.estimation.Estimation(est_name, network, data, est_iter=None)
Bases:
objectClass for statistical inference of a model given data.
Main method is estimate which requires model and data input and then computes model and parameter estimates based on nested sampling. For the nested sampling the dynesty package is used. For a typical use case, the estimate method is the only method to call. estimate is a wrapper method for the other class methods, so more documentation for the computations in the background and also how the class attributes are obtained can be found in the respective individual methods.
- Parameters:
est_name (str) – A name for the estimation object.
network (memocell.network.Network) – A memocell network object.
data (memocell.data.Data) – A memocell data object.
est_iter (None or int, optional) – Number to indicate iteration of a set of estimations; default is None.
- Returns:
est – Initialised memocell estimation object. Typically, continue with the est.estimate method to run the actual estimation.
- Return type:
Examples
>>> import memocell as me >>> import numpy as np >>> # initialise some data and a network >>> data = me.Data('my_data') >>> # data.load(...) >>> net = me.Network('my_net') >>> # net.structure(...) >>> # an estimation can then look like this >>> variables = {'X_t': ('X_t', ), 'Y_t': ('Y_t', )} >>> init_val_type = 'synchronous' >>> initial_values = {('X_t',): 1.0, ('Y_t',): 0.0, >>> ('X_t','X_t'): 0.0, ('Y_t','Y_t'): 0.0, ('X_t','Y_t'): 0.0} >>> theta_bounds = {'d': (0.0, 0.15), 'l': (0.0, 0.15)} >>> est = me.Estimation('my_est', net, data) >>> est.estimate(variables, init_val_type, initial_values, theta_bounds)
- estimate(variables, initial_values_type, initial_values, theta_bounds, time_values=None, sim_mean_only=False, fit_mean_only=False, nlive=1000, tolerance=0.01, bound='multi', sample='unif')
Main method of the estimation class. This method computes model estimates and θ parameter estimates of a specified model M and given data D based on nested sampling; main results are the model estimate as logarithmic evidence value p(D | M) and the parameter posterior distribution p(θ | M, D).
These and other results of the estimation run can be accessed through various class attributes, automatically computed by the other methods of this class (see more info at the respective methods). For an estimation object est, the logarithmic model evidence can be accessed at est.bay_est_log_evidence and the estimated parameter posterior at est.bay_est_samples_weighted. Specific point estimates of the parameters are at est.bay_est_params_median and 95% credible intervals are at est.bay_est_params_cred.
If multiple models shall be compared, please use the select_models function of the selection module, which is a wrapper of the estimate method for multiple models. The selection module also contains further model comparison measures such as the model posterior distribution p(M | D) and Bayes factors.
- Parameters:
variables (dict) – Information for simulation variables with key:value=simulation variable:tuple of network main nodes dictionary pairs. The simulation variables have to correspond to the data variables. The tuple of network main nodes can be used to sum up multiple network nodes to one simulation variable.
initial_values_type (str) – Initial value type to specify the multinomial distribution scheme of observable variables to the hidden variables (‘synchronous’ or ‘uniform’).
initial_values (dict) – Initial values for the moments of network main nodes for the simulations during the estimation. Means are specified as key:value=(node, ):initial value (float), variances are specified as key:value=(node, node):initial value (float) and covariances are specified as key:value=(node1, node2):initial value (float) dictionary pairs.
theta_bounds (dict) – Uniform prior bounds of the parameters as key:value=parameter:tuple of (lower bound, upper bound) dictionary pairs.
time_values (None or 1d numpy.ndarray, optional) – Time values to simulate the model with. If None (default), the time values of the data will be used (data.data_time_values). If specified, time_values has to contain at least all time values of the data, but can have more.
sim_mean_only (bool, optional) – If the model simulations shall be computed for the first moment (means) only, specify sim_mean_only=True. If the model simulations shall be computed for the first and second moments, specify sim_mean_only=False (default). If sim_mean_only=True, fit_mean_only is overwritten with True in any case (when higher order moments are not computed, they cannot be fitted).
fit_mean_only (bool, optional) – If the inference shall be based on the first moment (means) only, specify fit_mean_only=True. If the inference shall be based on information from the first and second moments, specify fit_mean_only=False (default). If sim_mean_only=True, fit_mean_only cannot be False and will be overwritten with True (when higher order moments are not computed, they cannot be fitted).
nlive (int, optional) –
Number of live points used for the nested sampling; default is 1000. Passed to dynesty’s top-level NestedSampler; see there for more info.
tolerance (float, optional) –
Tolerance to define the stopping criterion for the nested sampling; default is 0.01. Passed to dynesty’s dlogz argument in the run_nested method; see there for more info.
bound (str, optional) –
Method used to approximately bound the prior for the nested sampling; default is ‘multi’. Passed to dynesty’s top-level NestedSampler; see there for more info.
sample (str, optional) –
Method used to sample uniformly within the likelihood constraint for the nested sampling; default is ‘unif’. Passed to dynesty’s top-level NestedSampler; see there for more info.
- Return type:
None
Examples
>>> # net is a memocell network object >>> # data is a memocell data object >>> # with this, an estimation can look like this >>> variables = {'X_t': ('X_t', ), 'Y_t': ('Y_t', )} >>> init_val_type = 'synchronous' >>> initial_values = {('X_t',): 1.0, ('Y_t',): 0.0, >>> ('X_t','X_t'): 0.0, ('Y_t','Y_t'): 0.0, ('X_t','Y_t'): 0.0} >>> theta_bounds = {'d': (0.0, 0.15), 'l': (0.0, 0.15)} >>> est = me.Estimation('my_est', net, data) >>> est.estimate(variables, init_val_type, initial_values, theta_bounds)
- initialise_estimation(variables, initial_values_type, initial_values, theta_bounds, time_values, sim_mean_only, fit_mean_only, nlive, tolerance, bound, sample)
Initialise and prepare an estimation.
Helper function for the estimate method, arguments are passed over from there (see there for more info).
- run_estimation()
Run the estimation based on nested sampling.
Helper function for the estimate method, which handles the initialisation of the estimation before running it (see there for more info).
- static get_posterior_samples(sampler_result)
Obtain samples for the parameters θ according to the estimated posterior p(θ | M, D), where M is the corresponding model and D is the data. Parameter samples from nested sampling have to be weighted to represent the posterior, so either use the weighted samples directly (samples_weighted) or use the original samples and the weights to weight them manually (samples and weights).
Note: After running a memocell estimation there is no need to run this method, one can simply access the output at est.bay_est_samples, est.bay_est_samples_weighted and est.bay_est_weights for the estimation instance est.
- Parameters:
sampler_result (dynesty.results.Results) – Nested sampling result of a memocell estimation. Typically available at est.bay_nested_sampler_res.
- Returns:
samples (numpy.ndarray) – Unweighted parameter samples from nested sampling with shape (number of samples, number of parameters). Typically available at est.bay_est_samples.
samples_weighted (numpy.ndarray) – Weighted parameter posterior samples from nested sampling with shape (number of samples, number of parameters). Typically available at est.bay_est_samples_weighted.
weights (numpy.ndarray) – Weights to weight unweighted samples to get posterior samples with shape (number of samples,). Typically available at est.bay_est_weights.
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.get_posterior_samples(est.bay_nested_sampler_res) (array([[0.14791189, 0.14857705], [0.12864247, 0.14920066], [0.13767801, 0.14860021], ..., [0.02801861, 0.07502377], [0.02801369, 0.07502103], [0.02802205, 0.07499796]]), array([[0.02426309, 0.07119228], [0.03200628, 0.06646134], [0.03220339, 0.06802025], ..., [0.02802894, 0.07496195], [0.02800815, 0.07502692], [0.02801369, 0.07502103]]), array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ..., 9.93638169e-06, 9.93651290e-06, 9.93662331e-06]))
- static get_credible_interval(samples)
Get the median value and a 95% credible interval for each parameter based on an estimated posterior distribution given by samples. Median is obtained as 50-th percentile and the interval bounds are obtained by 2.5-th and 97.5-th percentiles, respectively.
Note: After running a memocell estimation there is no need to run this method, one can simply access the medians and credible intervals for the parameters with est.bay_est_params_cred for the estimation instance est.
- Parameters:
samples (numpy.ndarray) – Parameter posterior samples with shape (number of samples, number of parameters). A typical choice are the weighted parameter samples from nested sampling at est.bay_est_samples_weighted.
- Returns:
params_cred – Median values and 95% credible interval for each parameter (with the inner tuple order median, 2.5-th perc, 97.5-th perc). Typically available at est.bay_est_params_cred.
- Return type:
tuple of tuple
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> samples = est.bay_est_samples_weighted >>> samples.shape (12962, 2) >>> est.get_credible_interval(samples) ((0.0280347396878894, 0.02594988652911552, 0.030144084514338445), (0.07470536719462752, 0.06919784056074933, 0.07955644971380103)) >>> est.bay_est_params_cred ((0.0280347396878894, 0.02594988652911552, 0.030144084514338445), (0.07470536719462752, 0.06919784056074933, 0.07955644971380103))
>>> # minimal example on concrete values >>> samples = np.array([[0.2, 3.4], [0.4, 3.2], [0.25, 3.65]]) >>> samples.shape (3, 2) >>> est.get_credible_interval(samples) ((0.25, 0.2025, 0.3925), (3.4, 3.21, 3.6375))
- static get_model_evidence(sampler_result)
Obtain logarithmic evidence value and its error estimate from the nested sampling result.
Note: After running a memocell estimation there is no need to run this method, one can simply access the logarithmic model evidence and its error with est.bay_est_log_evidence and est.bay_est_log_evidence_error for the estimation instance est.
- Parameters:
sampler_result (dynesty.results.Results) – Nested sampling result of a memocell estimation. Typically available at est.bay_nested_sampler_res.
- Returns:
log_evid_dynesty (float) – Logarithmic evidence of the estimated model. Typically available at est.bay_est_log_evidence.
log_evid_err_dynesty (float) – Error of the logarithmic evidence of the estimated model. Typically available at est.bay_est_log_evidence_error.
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.get_model_evidence(est.bay_nested_sampler_res) (28.139812540432732, 0.11225503808864087) >>> est.bay_est_log_evidence 28.139812540432732 >>> est.bay_est_log_evidence_error 0.11225503808864087
- static get_maximal_log_likelihood(sampler_result)
Obtain the maximal logarithmic likelihood value from the nested sampling result.
Note: After running a memocell estimation there is no need to run this method, one can simply access the maximal log-likelihood with est.bay_est_log_likelihood_max for the estimation instance est.
- Parameters:
sampler_result (dynesty.results.Results) – Nested sampling result of a memocell estimation. Typically available at est.bay_nested_sampler_res.
- Returns:
logl_max – Maximal logarithmic likelihood value of the estimated model. Typically available at est.bay_est_log_likelihood_max.
- Return type:
float
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.get_maximal_log_likelihood(est.bay_nested_sampler_res) 35.48531419345989 >>> est.bay_est_log_likelihood_max 35.48531419345989
- static compute_bayesian_information_criterion(num_data, num_params, log_likelihood_max)
Compute the Bayesian information criterion (BIC). Calculation is based on \(\mathrm{BIC} = k \cdot \mathrm{ln}(n) - 2 \, \mathrm{ln}(L_{max})\) where \(k\) is the number of parameters (num_params), \(n\) is the number of data points (num_data) and \(\mathrm{ln}(L_{max})\) is the maximal log-likelihood value (log_likelihood_max).
Note: After running a memocell estimation there is no need to run this method, one can simply access the BIC with est.bay_est_bayesian_information_criterion for the estimation instance est.
- Parameters:
num_data (int or float) – Number of data points. Typically available at est.data_num_values.
num_params (int or float) – Number of estimated parameters. Typically available at est.bay_nested_ndims.
log_likelihood_max (float) – Maximal logarithmic likelihood value of the estimated model. Typically available at est.bay_est_log_likelihood_max.
- Returns:
bic – Bayesian information criterion of the estimated model. Typically available at est.bay_est_bayesian_information_criterion.
- Return type:
float
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.compute_bayesian_information_criterion( >>> est.data_num_values, >>> est.bay_nested_ndims, >>> est.bay_est_log_likelihood_max) -65.55452798471536 >>> est.bay_est_bayesian_information_criterion -65.55452798471536 >>> est.compute_bayesian_information_criterion(15, 2, -35.49) -65.56389959779558
- static compute_log_evidence_from_bic(bic)
Under certain assumptions one can approximate the logarithmic evidence value with \(\mathrm{ln}(p(D | M)) \approx -\frac{1}{2} \mathrm{BIC}\) where \(M\) is the model, \(D\) is the data and \(\mathrm{BIC}\) is the Bayesian information criterion, see BIC (wiki).
Note: This calculation is more a consistency check and can be accessed with est.bay_est_log_evidence_from_bic after a memocell estimation for est. The more accurate value of the logarithmic evidence from the nested sampling should be preferred for serious tasks (at est.bay_est_log_evidence).
- Parameters:
bic (float) – Bayesian information criterion of the estimated model. Typically available at est.bay_est_bayesian_information_criterion.
- Returns:
log_evidence_from_bic – Logarithmic evidence of the estimated model, approximated from the BIC. Typically available at est.log_evidence_from_bic.
- Return type:
float
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.compute_log_evidence_from_bic(est.bay_est_bayesian_information_criterion) 32.77726399235768 >>> est.bay_est_log_evidence_from_bic 32.77726399235768 >>> # compare with the more accurate log evid from nested sampling >>> est.bay_est_log_evidence 28.139812540432732
- prior_transform(theta_unit)
Transform parameter values \(\theta\) from the unit hypercube form (as used in the nested sampling) to the original prior space.
For uniform parameter priors (as generally used) this transformation is achieved with the respective lower and upper parameter bounds \([b_l, b_u]\) as \(\theta_{\mathrm{orig}} = \theta_{\mathrm{unit}} (b_u - b_l) + b_l\). Parameter bounds can be accessed with est.net_theta_bounds.
- Parameters:
theta_unit (1d numpy.ndarray) – Values for parameters \(\theta\) in unit hypercube space.
- Returns:
theta_orig – Values for parameters \(\theta\) in original prior space.
- Return type:
1d numpy.ndarray
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.net_theta_bounds array([[0. , 0.15], [0. , 0.15]]) >>> theta_unit = np.array([0.2, 0.5]) >>> est.prior_transform(theta_unit) array([0.03 , 0.075])
- log_likelihood(theta_values, moment_initial_values, time_values, time_ind, mean_data, var_data, cov_data)
Compute the logarithmic likelihood \(\mathrm{ln}(\mathcal{L(\theta)}) = \mathrm{ln}(p(D | \theta, M))\) for parameter values \(\theta\) of a given model \(M\) and given data \(D\). This method is used in the nested sampling.
The computation is based on the following formula. Under the assumption of \(r\) independent and normally distributed errors, the likelihood function is given by \(\mathcal{L(\theta)} = p(D | \theta, M) = \prod_{i=1}^{r} f_{\mu_i, \sigma_i}(x_i)\), where
\(D = (x_1,\,..., x_r)\) are the data points,
\(\Sigma = (\sigma_1,\,..., \sigma_r)\) are the data standard errors,
\(M_{\theta} = (\mu_1,\,..., \mu_r)\) are the model evaluations and
\(f_{\mu, \sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\mathrm{exp}\big(-\frac{1}{2} \big( \frac{x-\mu}{\sigma} \big)^2\big)\) is the normal density.
The log-likelihood is then given by \(\mathrm{ln}(\mathcal{L(\theta)}) =-\tfrac{1}{2} \sum_{i=1}^{r} \big( \frac{x_i - \mu_i}{\sigma_i} \big)^2 \,+\, \eta\), where \(\eta\) is the model-independent normalisation term \(\eta\) computed as \(\eta = -\tfrac{1}{2} \sum_{i=1}^{r} \mathrm{ln}(2 \pi \sigma_{i}^{2})\); also see at the compute_log_likelihood_norm method.
- Parameters:
theta_values (1d numpy.ndarray) – Values for parameters \(\theta\) in the model order (according to net.net_theta_symbolic via net.net_rates_identifier); passed to a moment simulation method.
moment_initial_values (dict) – Initial values for all moments of the hidden network layer; passed to a moment simulation method. Typically available at est.net_simulation.sim_moments.moment_initial_values; order of the moments corresponds to est.net_simulation.sim_moments.moment_order_hidden.
time_values (1d numpy.ndarray) – Time values for which model simulations are solved; passed to a moment simulation method. After estimation initialisation available at est.net_time_values.
time_ind (slice or tuple of int) – Indexing information to read out model simulations at the time points of the data to allow comparison. After estimation initialisation available at est.net_time_ind.
mean_data (numpy.ndarray) – Data mean statistics and standard errors with shape (2, number of means, number of time points) that have been matched to the model order. mean_data[0, :, :] contains the statistics; mean_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_mean_values.
var_data (numpy.ndarray) – Data variance statistics and standard errors with shape (2, number of variances, number of time points) that have been matched to the model order. var_data[0, :, :] contains the statistics; var_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_var_values.
cov_data (numpy.ndarray) – Data covariance statistics and standard errors with shape (2, number of covariances, number of time points) that have been matched to the model order. cov_data[0, :, :] contains the statistics; cov_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_cov_values.
- Returns:
logl – Computed value of the logarithmic likelihood.
- Return type:
numpy.float64
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> theta_values = np.array([0.03, 0.07]) >>> est.log_likelihood(theta_values, >>> est.net_simulation.sim_moments.moment_initial_values, >>> est.net_time_values, est.net_time_ind, >>> est.data_mean_values, est.data_var_values, >>> est.data_cov_values) 32.823084036435795
- static compute_log_likelihood_norm(mean_data, var_data, cov_data, fit_mean_only)
Compute the model-independent normalisation term of the logarithmic likelihood. This value can be computed once and then used for all subsequent evaluations of the log-likelihood.
With data points \(D = (x_1,\,..., x_r)\) and data standard errors \(\Sigma = (\sigma_1,\,..., \sigma_r)\) the normalisation term \(\eta\) is computed as \(\eta = -\tfrac{1}{2} \sum_{i=1}^{r} \mathrm{ln}(2 \pi \sigma_{i}^{2})\); also see more info at the log_likelihood method.
Note: This method will be automatically called during estimation initialisation (est.initialise_estimation).
- Parameters:
mean_data (numpy.ndarray) – Data mean statistics and standard errors with shape (2, number of means, number of time points) that have been matched to the model order. mean_data[0, :, :] contains the statistics; mean_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_mean_values.
var_data (numpy.ndarray) – Data variance statistics and standard errors with shape (2, number of variances, number of time points) that have been matched to the model order. var_data[0, :, :] contains the statistics; var_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_var_values.
cov_data (numpy.ndarray) – Data covariance statistics and standard errors with shape (2, number of covariances, number of time points) that have been matched to the model order. cov_data[0, :, :] contains the statistics; cov_data[1, :, :] contains the standard errors. After estimation initialisation available at est.data_cov_values.
fit_mean_only (bool) – Calculate the normalisation for an estimation in fit_mean_only=False or fit_mean_only=True mode.
- Returns:
norm – Model-independent normalisation term of the logarithmic likelihood. Typically available at est.bay_log_likelihood_norm.
- Return type:
numpy.float64
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.compute_log_likelihood_norm(est.data_mean_values, >>> est.data_var_values, >>> est.data_cov_values, >>> False) 37.04057852140377 >>> est.bay_log_likelihood_norm 37.04057852140377
>>> # example with concrete values >>> mean_data = np.array([[[1., 0.67, 0.37], >>> [0., 0.45, 1.74]], >>> [[0.01, 0.0469473, 0.04838822], >>> [0.01, 0.07188642, 0.1995514]]]) >>> var_data = np.array([[[0., 0.22333333, 0.23545455], >>> [0., 0.51262626, 4.03272727]], >>> [[0.01, 0.01631605, 0.01293869], >>> [0.01, 0.08878719, 0.68612036]]]) >>> cov_data = np.array([[[ 0., -0.30454545, -0.65030303]], >>> [[ 0.01, 0.0303608, 0.06645246]]]) >>> est.compute_log_likelihood_norm(mean_data, var_data, cov_data, False) 37.04057852140377 >>> est.compute_log_likelihood_norm(mean_data, var_data, cov_data, True) 14.028288976285737
- static initialise_time_values(time_values, data_time_values)
Obtain time values that can be used for model simulations (net_time_values).
If time_values=None, the data_time_values are used to obtain net_time_values. If time_values are specified explicitly as array, net_time_values is referenced to them. This method also returns a dense version (net_time_values_dense) and tuple of indices to read out model simulations for time points of the data.
Before this method is used the input should be checked with _validate_time_values_input.
- Parameters:
time_values (None or 1d numpy.ndarray) – Information for time values for model simulations. If None, net_time_values will be referenced to data time values (data.data_time_values). If specified as array, net_time_values=time_values. Note in this case that time_values has to contain at least all time values of the data, but can have more.
data_time_values (1d numpy.ndarray) – Time values of the data. Typically at data.data_time_values of a memocell data object data.
- Returns:
net_time_values (1d numpy.ndarray) – Time values for the model simulations.
net_time_values_dense (1d numpy.ndarray) – A dense version of net_time_values with the same minimal and maximal values but 1000 equally spaced total values.
net_time_ind (slice or tuple of int) – Index information that can be used for numpy array indexing to read out model simulations at the data time points (data_time_values).
Examples
>>> import memocell as me >>> time_values = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) >>> data_time_values = np.array([1.0, 3.0, 5.0]) >>> net_time_values, __, net_time_ind = me.Estimation.initialise_time_values(time_values, data_time_values) >>> net_time_values np.array([0., 1., 2., 3., 4., 5., 6.]) >>> net_time_ind (1, 3, 5)
- static initialise_net_theta_bounds(theta_symbolic, theta_identifier, theta_bounds)
Initialise uniform prior bounds of parameters \(\theta\).
Note: This method will be automatically called during estimation initialisation (est.initialise_estimation).
- Parameters:
theta_symbolic (list of str) – List of theta identifiers. Each identifier represents a parameter. Typically available at est.net.net_theta_symbolic.
theta_identifier (dict) – Map between parameter and their theta identifiers with key:value=theta identifier:parameter pairs. Typically available at est.net.net_rates_identifier.
theta_bounds (numpy.ndarray) – Uniform prior bounds for the parameters with key:value=parameter:tuple of bounds pairs.
- Returns:
net_theta_bounds – Bounds of the uniform parameter prior with shape (number of parameters, 2). Typically available at est.net_theta_bounds. Lower bounds are at net_theta_bounds[:, 0] and upper bounds are at net_theta_bounds[:, 1].
- Return type:
numpy.ndarray
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.net.net_theta_symbolic ['theta_0', 'theta_1'] >>> est.net.net_rates_identifier {'theta_0': 'd', 'theta_1': 'l'} >>> theta_bounds = {'d': (0.0, 0.15), 'l': (0.0, 0.15)} >>> est.initialise_net_theta_bounds(est.net.net_theta_symbolic, >>> est.net.net_rates_identifier, theta_bounds) array([[0. , 0.15], [0. , 0.15]])
- static match_data_to_network(sim_variables_order, sim_variables_identifier, data_mean, data_var, data_cov, data_mean_order, data_variance_order, data_covariance_order)
Simulation and data variables have to be one-to-one/bijectively mappable in general. This method then sorts the data in a way that is given by the order of the simulation variables to simplify model/data comparison in the estimation. Sorted/ordered data arrays are typically available at est.data_mean_values, est.data_var_values and est.data_cov_values.
Note: This method will be automatically called during estimation initialisation (est.initialise_estimation).
- Parameters:
sim_variables_order (list of list) – Simulation variable order in terms of the variable identifiers; first element contains the first moment order (means), the second element the second moments order (variances and covariances).
sim_variables_identifier (dict) – Information of mapping between simulation variables and their identifiers.
data_mean (numpy.ndarray) – Dynamic data mean statistics and standard errors with shape (2, len(data_mean_order), len(time_values)). mean_data[0, :, :] contains the statistics; mean_data[1, :, :] contains the standard errors.
data_var (numpy.ndarray) – Dynamic data variance statistics and standard errors with shape (2, len(data_variance_order), len(time_values)). var_data[0, :, :] contains the statistics; var_data[1, :, :] contains the standard errors.
data_cov (numpy.ndarray) – Dynamic data covariance statistics and standard errors with shape (2, len(data_covariance_order), len(time_values)). cov_data[0, :, :] contains the statistics; cov_data[1, :, :] contains the standard errors.
data_mean_order (list of dict) – Variable order for the data means.
data_variance_order (list of dict) – Variable order for the data variances.
data_covariance_order (list of dict) – Variable order for the data covariances.
- Returns:
data_mean_ordered (numpy.ndarray) – Dynamic data mean statistics and standard errors matched to the simulation variable order with shape (2, len(data_mean_order), len(time_values)).
data_var_ordered (numpy.ndarray) – Dynamic data variance statistics and standard errors matched to the simulation variable order with shape (2, len(data_variance_order), len(time_values)).
data_cov_ordered (numpy.ndarray) – Dynamic data covariance statistics and standard errors matched to the simulation variable order with shape (2, len(data_covariance_order), len(time_values)).
Examples
>>> # est is a memocell estimation instance obtained by est.estimate(...) >>> est.net_simulation.sim_variables_order [[('V_0',), ('V_1',)], [('V_0', 'V_0'), ('V_0', 'V_1'), ('V_1', 'V_1')]] >>> est.net_simulation.sim_variables_identifier {'V_0': ('X_t', ('X_t',)), 'V_1': ('Y_t', ('Y_t',))} >>> est.data.data_mean_order [{'variables': 'X_t', 'summary_indices': 0, 'count_indices': (0,)}, {'variables': 'Y_t', 'summary_indices': 1, 'count_indices': (1,)}] >>> est.data.data_variance_order [{'variables': ('X_t', 'X_t'), 'summary_indices': 0, 'count_indices': (0, 0)}, {'variables': ('Y_t', 'Y_t'), 'summary_indices': 1, 'count_indices': (1, 1)}] >>> est.data.data_covariance_order [{'variables': ('X_t', 'Y_t'), 'summary_indices': 0, 'count_indices': (0, 1)}] >>> est.match_data_to_network(est.net_simulation.sim_variables_order, >>> est.net_simulation.sim_variables_identifier, >>> est.data.data_mean, >>> est.data.data_variance, >>> est.data.data_covariance, >>> est.data.data_mean_order, >>> est.data.data_variance_order, >>> est.data.data_covariance_order) (array([[[1. , 0.67 , 0.37 ], [0. , 0.45 , 1.74 ]], [[0.01 , 0.0469473 , 0.04838822], [0.01 , 0.07188642, 0.1995514 ]]]), array([[[0. , 0.22333333, 0.23545455], [0. , 0.51262626, 4.03272727]], [[0.01 , 0.01631605, 0.01293869], [0.01 , 0.08878719, 0.68612036]]]), array([[[ 0. , -0.30454545, -0.65030303]], [[ 0.01 , 0.0303608 , 0.06645246]]]))
- compute_bestfit_simulation()
Compute a simulation of the estimated model with ‘best-fit’ \(\theta\) parameters. The 50-th percentiles (i.e., medians) of the respective 1d marginal posterior distributions are taken as best-fit parameter values.
The simulation result is available at est.net_simulation_bestfit for an estimation instance est. This method is used as plotting helper function; see plots.est_bestfit_mean_plot, plots.est_bestfit_variance_plot and plots.est_bestfit_covariance_plot for more info.
- Return type:
None
- compute_simulation_credible_band(num_sim_ensemble=5000)
Compute a simulation of the estimated model with ‘best-fit’ \(\theta\) parameters and 95% credible band. Credible band and best-fit simulation are computed as percentiles (2.5-th, 50-th and 97.5-th) from an ensemble of simulations with parameter values sampled from the complete parameter posterior.
The best-fit simulation result is available at est.net_simulation_credible_band_bestfit for an estimation instance est; the 95% credible band at est.net_simulation_credible_band. This method is used as plotting helper function; see plots.est_bestfit_mean_plot, plots.est_bestfit_variance_plot and plots.est_bestfit_covariance_plot for more info.
- Parameters:
num_sim_ensemble (int, optional) – Ensemble size (number of simulations) to compute percentiles from.
- Return type:
None
Model Selection
The selection module provides methods to run the statistical inference for a set of models on given data (model selection and parameter estimation). Use the top-level function select_models; select_models then calls required helper functions (such as net_estimation) automatically.
- memocell.selection.select_models(networks, variables, initial_values_types, initial_values, theta_bounds, data, time_values=None, sim_mean_only=False, fit_mean_only=False, nlive=1000, tolerance=0.01, bound='multi', sample='unif', parallel=True, processes=None)
Main function of the selection module for statistical inference of models M given data D (model selection and parameter estimation). Main results are model estimates, in terms of evidence values p(D | M), and parameter estimates p(θ | M, D) for each model. The resulting output can be further processed to obtain model probabilities p(M | D) and model Bayes factors by the compute_model_probabilities and compute_model_bayes_factors functions, respectively (see there).
- Parameters:
networks (list of memocell.Network.network) – A list of memocell network objects to run the statistical inference with. Each network is associated with further information given by variables, initial_values_types, initial_values and theta_bounds to specify a complete model (all lists must have the same length).
variables (list of dict) – List of simulation variables for each network as dictionary. Each network dictionary requires key:value=simulation variable:tuple of network main nodes pairs. The simulation variables have to correspond to the data variables. The tuple of network main nodes can be used to sum up multiple network nodes to one simulation variable.
initial_values_types (list of str) – List of initial value types to specify the multinomial distribution scheme of observable variables to the hidden variables for each network (‘synchronous’ or ‘uniform’).
initial_values (list of dict) – List of initial values for the moments of each network’s main nodes for the simulations during the estimation. For each network dict, means are specified as key:value=(node, ):initial value (float), variances are specified as key:value=(node, node):initial value (float) and covariances are specified as key:value=(node1, node2):initial value (float) dictionary pairs.
theta_bounds (list of dict) – List of uniform prior bounds of the parameters for each network as dictionary. Each network dictionary requires key:value=parameter:tuple of (lower bound, upper bound) pairs.
data (memocell.Data.data) – A memocell data object used in the statistical inference.
time_values (None or list of (1d numpy.ndarray or None), optional) – List of time values to simulate each model with. If None (default), the time values of the data will be used (data.data_time_values). If specified, time_values has to contain at least all time values of the data, but can have more.
sim_mean_only (bool, optional) – If the model simulations shall be computed for the first moment (means) only, specify sim_mean_only=True. If the model simulations shall be computed for the first and second moments, specify sim_mean_only=False (default). If sim_mean_only=True, fit_mean_only is overwritten with True in any case (when higher order moments are not computed, they cannot be fitted).
fit_mean_only (bool, optional) – If the inference shall be based on the first moment (means) only, specify fit_mean_only=True. If the inference shall be based on information from the first and second moments, specify fit_mean_only=False (default). If sim_mean_only=True, fit_mean_only cannot be False and will be overwritten with True (when higher order moments are not computed, they cannot be fitted).
nlive (int, optional) –
Number of live points used for the nested sampling; default is 1000. Passed to dynesty’s top-level NestedSampler; see there for more info.
tolerance (float, optional) –
Tolerance to define the stopping criterion for the nested sampling; default is 0.01. Passed to dynesty’s dlogz argument in the run_nested method; see there for more info.
bound (str, optional) –
Method used to approximately bound the prior for the nested sampling; default is ‘multi’. Passed to dynesty’s top-level NestedSampler; see there for more info.
sample (str, optional) –
Method used to sample uniformly within the likelihood constraint for the nested sampling; default is ‘unif’. Passed to dynesty’s top-level NestedSampler; see there for more info.
parallel (bool, optional) – Run select_models in parallel based on Python’s multiprocessing module with parallel=True (default); the number of parallel processes can be specified with the processes argument. Use parallel=False to run select_models sequentially.
processes (None or int, optional) – If parallel=True, the number of parallel processes used for multiprocessing can be specified here. If parallel=True and processes=None (default) the number of processes is set to the number of physical cores of the system. If parallel=False, the processes argument will be ignored.
- Returns:
est_res – A list of memocell estimation objects for the models.
- Return type:
list of memocell.estimation.Estimation
Examples
>>> # given some memocell data object, models are defined as follows >>> # (here a simple model with symmetric division of one cell type) >>> net1 = me.Network('net_l2') >>> net1.structure([{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', >>> 'reaction_steps': 2}]) >>> net2 = me.Network('net_l5') >>> net2.structure([{'start': 'X_t', 'end': 'X_t', >>> 'rate_symbol': 'l', >>> 'type': 'S -> S + S', >>> 'reaction_steps': 5}]) >>> nets = [net1, net2] >>> variables = [{'X_t': ('X_t', )}]*2 >>> initial_values_types = ['synchronous']*2 >>> initial_values = [{('X_t', ): 1.0, ('X_t', 'X_t'): 0.0}]*2 >>> theta_bounds = [{'l': (0.0, 0.5)}]*2 >>> # then the inference is started with >>> est_res = me.selection.select_models(nets, variables, initial_values_types, >>> initial_values, theta_bounds, data)
- memocell.selection.net_estimation(input_var)
Helper function to handle the parameter and evidence estimation of a single model as specified by input_var.
Note: Please use the top-level select_models function for all inference tasks; if you wish to estimate a single model only, you can just apply select_models on a one-model list or use estimate from the memocell estimation class.
- Parameters:
input_var (tuple) – Internal tuple structure with various information passed over from select_models function.
- Returns:
est – A memocell estimation object.
- Return type:
- memocell.selection.compute_model_probabilities(estimation_instances, mprior=None)
Compute the posterior probability distribution p(M | D) (probabilities of the models M given the data D) for a list of estimated models. This function is a wrapper of the compute_model_probabilities_from_log_evidences function; see there for more info.
- Parameters:
estimation_instances (list of memocell.estimation.Estimation) – A list of memocell estimation objects for the models; for example obtained by the select_models function.
mprior (None or 1d numpy.ndarray, optional) – Array of prior model probabilities; if mprior=None (default) an uniform model prior will be used: p(M)=1/n where n is the total number of models. If a custom prior is specified, it has to have the same length n as estimation_instances.
- Returns:
probs – Array of posterior model probabilities p(M | D).
- Return type:
1d numpy.ndarray
Examples
>>> # estimation_instances for example by est_res = me.select_models(...) >>> # with estimated log evidences = [4.1, 1.8, 4.4, -1.6] >>> # (see compute_model_probabilities_from_log_evidences for more examples) >>> me.selection.compute_model_probabilities(est_res) array([0.40758705, 0.04086421, 0.55018497, 0.00136377])
- memocell.selection.compute_model_probabilities_from_log_evidences(logevids, mprior=None)
Compute the posterior probability distribution p(M | D) (probabilities of the models M given the data D), assuming uniform model prior (mprior=None, default) or assuming a specified custom model prior mprior. Computation is based on the Bayes theorem \(p(M | D) = \frac{p(D | M) \cdot p(M)}{p(D)}\) where p(M) is the model prior and p(D | M) are the evidence values obtained from the estimation.
Note: While evidence values are model-instrinsic (do not depend on the considered set of models), posterior model probabilities depend on the context of the overall considered set of models. A ‘high’ model probability might not mean much, if all models are bad or if the model space is not exhaustive.
- Parameters:
logevids (1d numpy.ndarray) – Array of logarithmic evidence values of the models.
mprior (None or 1d numpy.ndarray, optional) – Array of prior model probabilities; if mprior=None (default) an uniform model prior will be used: p(M)=1/n where n is the total number of models. If a custom prior is specified, it has to have the same length n as logevids.
- Returns:
probs – Array of posterior model probabilities p(M | D).
- Return type:
1d numpy.ndarray
Examples
>>> # calculation of model probabilities; note that they sum to 1 >>> logevids = np.array([4.1, 1.8, 4.4, -1.6]) >>> probs = me.selection.compute_model_probabilities_from_log_evidences(logevids) >>> probs array([0.40758705, 0.04086421, 0.55018497, 0.00136377]) >>> np.sum(probs) 1.0
>>> # calculation of model probabilities with an uniform model prior >>> # this is the default option (thus we get the same result) >>> mprior = np.array([0.25, 0.25, 0.25, 0.25]) >>> me.selection.compute_model_probabilities_from_log_evidences(logevids, mprior=mprior) array([0.40758705, 0.04086421, 0.55018497, 0.00136377])
>>> # calculation of model probabilities with a non-uniform model prior >>> # (here, the first two models are given a bit more prior probability) >>> mprior = np.array([0.3, 0.3, 0.2, 0.2]) >>> me.selection.compute_model_probabilities_from_log_evidences(logevids, mprior=mprior) array([0.49940188, 0.05006945, 0.44941468, 0.00111399])
- memocell.selection.compute_model_bayes_factors(estimation_instances)
Compute Bayes factors for a list of estimated models. This function is a wrapper of the compute_model_bayes_factors_from_log_evidences function; see there for more info.
- Parameters:
estimation_instances (list of memocell.estimation.Estimation) – A list of memocell estimation objects for the models; for example obtained by the select_models function.
- Returns:
bayesf – Array of Bayes factors of the estimated models.
- Return type:
1d numpy.ndarray
Examples
>>> # estimation_instances for example by est_res = me.select_models(...) >>> # with estimated log evidences = [4.1, 1.8, 4.4, -1.6] >>> me.selection.compute_model_bayes_factors(est_res) array([ 1.34985881, 13.46373804, 1. , 403.42879349])
- memocell.selection.compute_model_bayes_factors_from_log_evidences(logevids)
Compute Bayes factors from an array of logarithmic evidences. A Bayes factor K for a model M is computed by an evidence ratio as \(K = p(D | M_b) / p(D | M)\) where D is the data and Mb is the overall best model of the given evidences. Bayes factors can be used to classify models into different levels of support given by the data (e.g., see further info here).
Note: While evidence values are model-instrinsic (do not depend on the considered set of models), a model Bayes factor depends on the context of the overall considered set of models. A ‘good’ Bayes factor might not mean much, if all models are bad or if the model space is not exhaustive.
- Parameters:
logevids (1d numpy.ndarray) – Array of logarithmic evidence values of the models.
- Returns:
bayesf – Array of Bayes factors of the models.
- Return type:
1d numpy.ndarray
Examples
>>> logevids = np.array([4.1, 1.8, 4.4, -1.6]) >>> me.selection.compute_model_bayes_factors_from_log_evidences(logevids) array([ 1.34985881, 13.46373804, 1. , 403.42879349])
Simulation Library
The simulation library contains the GillespieSim and MomentsSim helper classes for stochastic and moment (mean, variance, covariance) simulations, respectively.
- class memocell.simulation_lib.sim_gillespie.GillespieSim(net)
Bases:
objectHelper class for stochastic simulations.
In the typical situation, use the top-level Simulation class with its main method simulate (simulation_type=’gillespie’). The GillespieSim class and its methods are then called automatically.
Note: The stochastic simulations in MemoCell are based on a Gillespie simulation (first-reaction method) on the hidden layer and produce correct stochastic realisations of the (possibly non-Markovian) stochastic process on the observable/main layer.
- prepare_gillespie_simulation(variables_order, variables_identifier)
Prepares Gillespie simulation by creating symbolic attributes for the propensity and node state update schemes and the summation indices. Specifically define_gill_fct, create_node_summation_indices and create_variables_summation_indices downstream methods are called, see there for more info.
- gillespie_simulation(theta_values_order, time_values, initial_values_main, initial_values_type)
Top-level method in the GillespieSim class to produce one stochastic simulation for a memocell model.
This method wraps downstream methods to update the user provided theta rate parameters and initial values, compute a stochastic simulation on the hidden layer (by run_gillespie_first_reaction_method), and sum up the hidden layer numbers to obtain observable/main node and simulation variable numbers.
Note: In the typical situation, use the top-level Simulation class with its main method simulate; this method is then run automatically.
- static run_gillespie_first_reaction_method(time_arr_expl, initial_values, reac_rates_exec, prop_arr_eval, num_reacs, reac_event_fct)
Runs the Gillespie first-reaction method on the hidden Markov layer. Makes use of metaprogramming to be applicaple for any memocell network, see also define_gill_fct and related methods.
Note: The current implementation is not yet designed for performance. Current main purpose is to obtain correct stochastic realisations of the process, in match with the exact moment simulations. Faster Gillespie methods exists and might be integrated in the future.
- static create_theta_numeric_exec(net_theta_symbolic, theta_values_order)
Creates an executable string to assign numerical values for theta rate parameters; used in run_gillespie_first_reaction_method and top-level gillespie_simulation.
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’. Afterwards one can access the output at sim.sim_gillespie.sim_gill_theta_numeric_exec.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_gillespie.sim_gill_theta_numeric_exec 'theta_0, theta_1 = (0.04, 0.06)'
- process_initial_values(initial_values_main, initial_values_type)
Processes the user provided initial distribution (for the main node numbers) to obtain an initial distribution on the hidden layer, depending on the multinomial schemes initial_values_type=’synchronous’ or initial_values_type=’uniform’.
Note: Synchronous initial distribution type means that main node numbers are placed into the each main node’s ‘centric’ hidden layer node. Uniform initial distribution types means that main node numbers are distributed randomly (uniform) among all its hidden layer nodes. For this the respective helper methods process_initial_values_synchronous or process_initial_values_uniform are called.
Note: The distribution types have their moment simulation equivalents, see there also for background theory on the employed multinomial sampling scheme.
- static process_initial_values_synchronous(hidden_node_order, initial_values_main, net_nodes_identifier)
Helper method for process_initial_values; returns the hidden layer initial distribution under ‘synchronous’ initial_values_type.
- static process_initial_values_uniform(main_node_order, hidden_node_order, initial_values_main, net_nodes_identifier, summation_indices_nodes)
Helper method for process_initial_values; returns the hidden layer initial distribution under ‘uniform’ initial_values_type.
- static define_gill_fct(node_order, net_hidden_edges, create_propensities_update_str, create_node_state_update_str)
Defines update schemes for the propensities and node states for the Gillespie simulation on the hidden layer.
Note: Based on metaprogramming to construct schemes for any user provided network, using eval() and exec() methods. Uses create_propensities_update_str and create_node_state_update_str downstream methods; outputs are then used in run_gillespie_first_reaction_method and top-level gillespie_simulation.
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’. Afterwards one can access the outputs at sim_gill_propensities_eval, sim_gill_reaction_update_exec, sim_gill_reaction_update_str and sim_gill_reaction_number.
- static create_propensities_update_str(reaction_rate_symbolic, start_node_ind)
Helper method for define_gill_fct; returns propensity update scheme for a single hidden layer reaction.
- static create_node_state_update_str(reac_ind_count, start_node_ind, end_node_ind, reaction_type, start_module_centric_node_ind, end_module_centric_node_ind)
Helper method for define_gill_fct; returns node update scheme for a single hidden layer reaction.
- static exact_interpolation(simulation, time_array_explicit)
Reads out a stochastic simulation at explicitly given time values (time_array_explicit).
Note: The Gillespie simulation provides a time array of random numbers at which the reactions happened; this method reads out the state of the system for an explicitly given time array of fixed values. This means: If time_array_explicit is too coarse, one might miss some reaction events and/or a reaction is seen much later than it actually occured; this behavior is desired, when the biological experiments are also restricted to read-outs at certain time points only. In contrast, one can choose a very dense time array to report all reaction events.
Examples
>>> import memocell as me >>> import numpy as np >>> # made-up stochastic simulation times and numbers >>> time_array_gill = np.array([0.00, 0.12, 4.67, 8.01, 10.00]) >>> nodes_array_gill = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]]) >>> simulation = [time_array_gill, nodes_array_gill] >>> # explicit times to read-out the simulation >>> time_array_explicit = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0]) >>> me.simulation_lib.sim_gillespie.GillespieSim.exact_interpolation(simulation, time_array_explicit) [array([ 0., 2., 4., 6., 8., 10.]), array([[1., 2., 2., 3., 3., 5.]])]
- static create_node_summation_indices(main_node_order, hidden_node_order)
Creates a list of tuples with hidden node indices that are needed to sum up each main node; index ordering as in sim.sim_gillespie.net_main_node_order_without_env and sim.sim_gillespie.net_hidden_node_order_without_env with Z-identifier sim.net.net_nodes_identifier.
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’. Afterwards one can access the output at sim.sim_gillespie.summation_indices_nodes.
Examples
>>> # with a memocell simulation instance sim >>> # e.g., the first four hidden nodes provide the first main node >>> sim.sim_gillespie.summation_indices_nodes [(0, 1, 2, 3), (4, 5, 6)]
- static summation_main_nodes(sum_tuple_main, sim_sol_expl_time)
Computes the stochastic simulation numbers for the main nodes from the Gillespie-simulated numbers of the hidden nodes; returns list of time values and simulated number of the main nodes (with order as in sim.sim_gillespie.net_main_node_order_without_env and sim.net.net_nodes_identifier).
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’.
- static create_variables_summation_indices(variables_order, variables_identifier, net_main_node_order_without_env, net_nodes_identifier)
Creates a list of tuples with main node indices that are needed to sum up each simulation variable; index ordering as in sim.sim.sim_variables_order and sim.sim_gillespie.net_main_node_order_without_env with V-identifier sim.sim.sim_variables_identifier and Z-identifier sim.net.net_nodes_identifier, respectively.
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’. Afterwards one can access the output at sim.sim_gillespie.summation_indices_variables.
Examples
>>> # with a memocell simulation instance sim >>> # e.g., main nodes and simulation variables are the same >>> sim.sim_gillespie.summation_indices_variables [(0,), (1,)]
- static summation_simulation_variables(sum_tuple_variables, sim_sol_expl_time_main_nodes)
Computes the stochastic simulation numbers for the simulation variables from the simulated numbers of the main nodes; returns list of time values and simulated number of the variables (with order as in sim.sim.sim_variables_order and sim.sim.sim_variables_identifier).
Note: This method is automatically run during sim.simulate in simulation_type=’gillespie’.
The simulation library contains the GillespieSim and MomentsSim class for stochastic and moment (mean, variance, covariance) simulations, respectively.
- class memocell.simulation_lib.sim_moments.MomentsSim(net)
Bases:
objectHelper class for moment (mean, variance, covariance) simulations.
In the typical situation, use the top-level Simulation class with its main method simulate (simulation_type=’moments’). The MomentsSim class and its methods are then called automatically.
Note: The moment simulations in MemoCell are obtained as solutions of a differential equation system for the moments of all hidden Markov layer variables. They are the exact counterpart to mean, variance and covariance statistics as computed approximately from a set of stochastic simulations.
- prepare_moment_simulation(variables_order, variables_identifier, mean_only)
Prepares the moment simulation by automatic symbolic derivation of the differential equations for the moments on the hidden Markov layer and the summation indices to assemble them to mean, variance and covariance solutions for the main/observable layer and simulation variables. See the called downstream methods for more info.
- moment_simulation(theta_values_order, time_values, initial_values_main, initial_values_type)
Top-level method in the MomentsSim class to compute moment (mean, variance, covariance) simulations.
This method wraps downstream methods to update the user provided theta rate parameters and initial values, compute the moment simulation on the hidden layer and sum up the hidden layer moments to obtain mean, variance and covariance solutions on the observable/main layer and for the simulation variables (by run_moment_ode_system).
Note: In the typical situation, use the top-level Simulation class with its main method simulate; this method is then run automatically.
- run_moment_ode_system(moment_initial_values, time_values, theta_values)
Integrates the differential equation system for the hidden layer moments and sums them up to obtain mean, variance and covariance solutions for the simulation variables (and main/observable layer nodes).
Note: Based on the moment_system and summation indices, obtained by automatic symbolic derivation and metaprogramming for any MemoCell model (e.g., as in executed in prepare_moment_simulation); see downstream methods for more info. Integration itself is done numerically by scipy’s odeint method.
- process_initial_values(initial_values_main, initial_values_type)
Processes the user provided initial values for the moments (mean, variance, covariance for the main nodes) to obtain the initial moments on the hidden layer, depending on the multinomial schemes initial_values_type=’synchronous’ or initial_values_type=’uniform’.
Note: Synchronous initial distribution type means that main node numbers are placed into the each main node’s ‘centric’ hidden layer node. Uniform initial distribution types means that main node numbers are distributed randomly (uniform) among all its hidden layer nodes. For this the respective helper methods process_initial_values_synchronous or process_initial_values_uniform are called.
Note: The distribution types have their stochastic simulation equivalents, see there.
Note: Below we summarise the theory for distributing the hidden layer from main layer moments + initial value type. In MemoCell the stochastic processes on the main/observable layer are the sum of their stochastic processes on the hidden Markov layer. For each cell type \(i\) its stochastic cell numbers follow \(W^{(i)}_t = \sum_{j \in \{1,...,u_i\} } W^{(i,j)}_t\), where \(u_i\) is the number of all hidden variables for that cell type. For the initial distribution (\(t=0\)) we have \(N=W^{(i)}_0\) (random) cells to distribute for each cell type and hence sample the hidden variables from a multinomial distribution, i.e. \((..., W^{(i,j)}_0,...) \sim \mathrm{MultiNomial}(p_1,...,p_j,...,p_{u_i}; N)\), where the \(p_j\) probabilities allow to encode any hidden layer distribution scheme. Using theorems of conditional and total expectation, variance and covariance one can then obtain the following relations, connecting the main/observable and the hidden layer:
The mean of the \(j\)-th hidden variable of cell type \(i\) is \(\mathrm{E}(W^{(i,j)}_0) = p_j\,\mathrm{E}(W^{(i)}_0)\),
the variance of the \(j\)-th hidden variable of cell type \(i\) is \(\mathrm{Var}(W^{(i,j)}_0) = p_j (1-p_j)\,\mathrm{E}(W^{(i)}_0) + p_j^2 \,\mathrm{Var}(W^{(i)}_0)\),
the covariance between the \(j\)-th and \(l\)-th hidden variables (\(j≠l\)) of cell type \(i\) is \(\mathrm{Cov}(W^{(i,j)}_0, W^{(i,l)}_0) = - p_j p_l \,\mathrm{E}(W^{(i)}_0) + p_j p_l \,\mathrm{Var}(W^{(i)}_0)\) and
the covariance between the \(j\)-th hidden variable of cell type \(i\) and the \(l\)-th hidden variable of cell type \(k\) (\(i≠k\)) is \(\mathrm{Cov}(W^{(i,j)}_0, W^{(k,l)}_0) = p_j p_l \, \mathrm{Cov}(W^{(i)}_0, W^{(k)}_0)\).
As MemoCell works with (mixed/factorial) moments one readily rephrases the above relations and obtains
The mean remains \(\mathrm{E}(W^{(i,j)}_0) = p_j\,\mathrm{E}(W^{(i)}_0)\),
the second factorial moment is \(\mathrm{E}\big(W^{(i,j)}_0 (W^{(i,j)}_0-1)\big) = p_j^2\,\big(\mathrm{Var}(W^{(i)}_0)+ \mathrm{E}(W^{(i)}_0)^2 - \mathrm{E}(W^{(i)}_0)\big)\),
the second mixed moment within cell type \(i\) (\(j≠l\)) is \(\mathrm{E}(W^{(i,j)}_0 W^{(i,l)}_0) = p_j p_l\big(\mathrm{Var}(W^{(i)}_0)+ \mathrm{E}(W^{(i)}_0)^2 - \mathrm{E}(W^{(i)}_0)\big)\) and
the second mixed moment for different cell types \(i≠k\) is \(\mathrm{E}(W^{(i,j)}_0 W^{(k,l)}_0) = p_j p_l\big(\mathrm{Cov}(W^{(i)}_0, W^{(k)}_0) + \mathrm{E}(W^{(i)}_0) \, \mathrm{E}(W^{(k)}_0) \big)\).
These ideas allow to implement any distribution scheme for the hidden layer from observable information and the given multinomial type (\(p_j\) parameters). Specifically, MemoCell currently implements a uniform and a synchronous type, i.e.
uniform initial value type: \(p_j=1/u_i\) (for each cell type \(i\)) and
synchronous initial value type: \(p_1=1\) (‘centric’ node), else \(p_j=0\), \(j>1\) (for each cell type).
- process_initial_values_uniform(moment_order_hidden, initial_values_main, net_nodes_identifier, net_hidden_node_numbers)
Helper method for process_initial_values (see there also); returns the hidden layer initial moment values under ‘uniform’ initial_values_type. Order of the moments follows sim.sim_moments.moment_order_hidden in their Z-identifier form.
- process_initial_values_synchronous(moment_order_hidden, initial_values_main, net_nodes_identifier)
Helper method for process_initial_values (see there also); returns the hidden layer initial moment values under ‘synchronous’ initial_values_type. Order of the moments follows sim.sim_moments.moment_order_hidden in their Z-identifier form.
- static compute_initial_moment_first(p_j, mean_i)
Helper method for process_initial_values (see there and related); computes mean of a hidden variable with multinomial parameter p_j and main/observable mean mean_i for cell type i.
- static compute_initial_moment_second_factorial(p_j, mean_i, var_i)
Helper method for process_initial_values (see there and related); computes second factorial moment of a hidden variable with multinomial parameter p_j and main/observable mean mean_i and variance var_i for cell type i.
- static compute_initial_moment_second_mixed_ii(p_j, p_l, mean_i, var_i)
Helper method for process_initial_values (see there and related); computes second mixed moment of hidden variables with multinomial parameters p_j, p_l and main/observable mean mean_i and variance var_i of the same cell type i.
- static compute_initial_moment_second_mixed_ik(p_j, p_l, mean_i, mean_k, cov_ik)
Helper method for process_initial_values (see there and related); computes second mixed moment of hidden variables with multinomial parameters p_j, p_l and main/observable means mean_i, mean_k and covariance cov_ik for different cell types i, k.
- static derive_moment_order_main(node_order, mean_only)
Derives the order of the moments for the main/observable nodes in their Z-identifier form. Contains two lists, with the first moments (means) and second moments (for variance, covariance), respectively; the second moments are left out if mean_only=True.
Note: Based on net.net_main_node_order, with the difference that the environmental node is removed for the moments. Hence original names are also available via sim.net.net_nodes_identifier.
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available via sim_moments.moment_order_main.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_moments.moment_order_main [[('Z_0',), ('Z_1',)], [('Z_0', 'Z_0'), ('Z_0', 'Z_1'), ('Z_1', 'Z_1')]]
Derives the order of the moments for the hidden nodes in their Z-identifier form. Contains two lists, with the first moments (means) and second moments (for variance, covariance), respectively; the second moments are left out if mean_only=True.
Note: Based on net.net_hidden_node_order, with the difference that the environmental node is removed for the moments. Hence original names are also available via sim.net.net_nodes_identifier.
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available via sim_moments.moment_order_hidden.
Note: The order for the hidden moments defines the order of the initial values (moment_initial_values) and the differential equation system (moment_eqs and moment_system). The tuples below correspond to means E(X) (e.g., (‘Z_0__centric’,)), second factorial moments E(X(X-1)) (e.g., (‘Z_0__centric’, ‘Z_0__centric’)) and second mixed moments E(XY) (e.g., (‘Z_0__centric’, ‘Z_1__centric’)), respectively.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_moments.moment_order_hidden [[('Z_0__centric',), ('Z_1__centric',), ('Z_1__module_1__0',)], [('Z_0__centric', 'Z_0__centric'), ('Z_0__centric', 'Z_1__centric'), ('Z_0__centric', 'Z_1__module_1__0'), ('Z_1__centric', 'Z_1__centric'), ('Z_1__centric', 'Z_1__module_1__0'), ('Z_1__module_1__0', 'Z_1__module_1__0')]]
- derive_moment_pde(net_edges, z_aux_vars, z_aux_vars_dict, theta_repl_dict)
Derives the partial differential equation (PDE) for the probability generating function G, providing a complete description of the stochastic process on the hidden Markov layer.
Note: This method goes over all edges (net_edges) to accumulate the overall PDE from the single-reaction building blocks (see helper methods below). The PDE description is equivalent to (and can be derived from) the description in terms of the master equation. Taking derivatives for the auxiliary z-variables and applying the limit operator provide the differential equation system for the moments (see derive_moment_eqs method).
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available via sim_moments.moment_pde.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_moments.moment_pde '1.0 * theta_0_q * (z_1__centric_q - z_0__centric_q) * diff(G(z_0__centric_q, z_1__centric_q), z_0__centric_q)'
- static derive_moment_eqs(moment_pde, moment_order_hidden, moment_aux_vars, moment_aux_vars_dict, theta_replaceables)
Derives the ordinary differential equation (ODE) system for the moments on the hidden Markov layer in its symbolic form.
Note: This is applied theory surrounding the probability generating function G and Markov jump processes described by a PDE in G (or equivalent a master equation). Derivatives of the PDE (as in moment_pde) in the auxiliary variables z and application of the limit z→1 lead to linear differential equations for the moments (mean/first moment and second factorial and mixed moments). Also, this ODE system is closed for the linear reaction types available in MemoCell, i.e. the resulting equations are exact. These operations are automatically conducted for any MemoCell model, making use of sympy; downstream, they are processed to a callable class method (moment_system), available for numerical integration (run_moment_ode_system and top-level moment_simulation).
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available via sim_moments.moment_eqs; the order of the equations corresponds to sim_moments.moment_order_hidden.
Examples
>>> # with a memocell simulation instance sim >>> # theta rate parameters and moment vector m >>> sim.sim_moments.moment_eqs ['-1.0*m[0]*theta[0]', '1.0*m[0]*theta[0]', '-2.0*m[2]*theta[0]', '1.0*m[2]*theta[0] - 1.0*m[3]*theta[0]', '2.0*m[3]*theta[0]']
- static get_indices_for_solution_readout(moment_order_main, moment_order_hidden)
Creates array objects with indices for the hidden layer moments (first, second mixed and factorial) that allow to sum them up for solutions of mean, variance and covariance of the main/observable nodes.
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available at moment_mean_ind, moment_var_ind_intra, moment_var_ind_inter and moment_cov_ind and used in run_moment_ode_system and top-level moment_simulation; index values correspond to moment_order_hidden.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_moments.moment_mean_ind array([[(0,)], [(1, 2)]], dtype=object) >>> sim.sim_moments.moment_var_ind_intra array([[(3,), (0,)], [(6, 8), (1, 2)]], dtype=object) >>> sim.sim_moments.moment_var_ind_inter array([[(), (), ()], [(7,), (1,), (2,)]], dtype=object) >>> sim.sim_moments.moment_cov_ind array([[(4, 5), (0, 0), (1, 2)]], dtype=object)
- static get_indices_for_moment_readout(variables_order, variables_identifier, moment_order_main, net_nodes_identifier)
Creates array objects with indices for the mean, variance and covariance solutions on the main/observable layer that allow to sum them up for mean, variance and covariance solutions for the simulation variables.
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available at variables_mean_ind, variables_var_ind and variables_cov_ind and used in run_moment_ode_system and top-level moment_simulation.
Examples
>>> # with a memocell simulation instance sim >>> sim.sim_moments.variables_mean_ind array([[(0,)], [(1,)]], dtype=object) >>> sim.sim_moments.variables_var_ind array([[(0,), ()], [(1,), ()]], dtype=object) >>> sim.sim_moments.variables_cov_ind array([[(), (0,)]], dtype=object)
- setup_executable_moment_eqs_template(moment_eqs, use_jit=True)
Converts the user-specific moment_eqs (list of str) to a callable class method, applying the metaprogramming principle via eval() and exec() methods.
Note: This method is automatically run during sim.simulate in simulation_type=’moments’ and during estimate and select_models methods. The output is typically available via the moment_system method.
Note: Numba’s @jit (just-in-time compilation) decorator is added (default) to allow fast computation of the differential moment equation system during simulation and estimation runs; the first moment_system call might then take a bit longer due to the compilation.
- set_moment_eqs_from_template_after_reset()
Reevaluates the differential moment equation system when it is in ‘reset’ mode. The moment_system is typically overwritten with ‘reset’ when the selection module was used to be able to save and load objects with the pickle package.
- static reac_type_to_end(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘-> E’ reaction (e.g., cell influx or birth) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(∅ → W^{(i,j)}\) with the hidden variable \(W^{(i,j)}\) in state \(w^{(i,j)}\) and auxiliary variable \(z_{(i,j)}\) we have the master equation
\(\partial_t \, p(w^{(i,j)}, t) = \lambda \, p(w^{(i,j)}-1, t) - \lambda \, p(w^{(i,j)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(i,j)} - 1) \, G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the transition rate.
- static reac_type_start_to(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S ->’ reaction (e.g., efflux or cell death) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → ∅\) with the hidden variable \(W^{(i,j)}\) in state \(w^{(i,j)}\) and auxiliary variable \(z_{(i,j)}\) we have the master equation
\(\partial_t \, p(w^{(i,j)}, t) = \lambda \, (w^{(i,j)}+1) \, p(w^{(i,j)}+1, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (1 - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate.
- static reac_type_start_to_end(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S -> E’ reaction (e.g., cell differentiation or hidden transitions) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → W^{(k,l)}\) with different hidden variables \(W^{(i,j)}\), \(W^{(k,l)}\) in states \(w^{(i,j)}\), \(w^{(k,l)}\) and auxiliary variables \(z_{(i,j)}\), \(z_{(k,l)}\), respectively, we have the master equation
\(\partial_t \, p(w^{(i,j)}, w^{(k,l)}, t) = \lambda \, (w^{(i,j)}+1) \, p(w^{(i,j)}+1, w^{(k,l)}-1, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, w^{(k,l)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(k,l)} - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate. The reaction is hidden for \(i=k\) (same cell type) and realises a differentiation event for \(i≠k\) (different cell types).
- static reac_type_start_to_start_end(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S -> S + E’ reaction (e.g., asymmetric cell division) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → W^{(i,j)} + W^{(k,l)}\) with different hidden variables \(W^{(i,j)}\), \(W^{(k,l)}\) in states \(w^{(i,j)}\), \(w^{(k,l)}\) and auxiliary variables \(z_{(i,j)}\), \(z_{(k,l)}\), respectively, we have the master equation
\(\partial_t \, p(w^{(i,j)}, w^{(k,l)}, t) = \lambda \, w^{(i,j)} \, p(w^{(i,j)}, w^{(k,l)}-1, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, w^{(k,l)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(k,l)}\,z_{(i,j)} - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate.
- static reac_type_start_to_start_start(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S -> S + S’ reaction (e.g., symmetric self-renewing cell division) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → W^{(i,j)} + W^{(i,j)}\) with the hidden variable \(W^{(i,j)}\) in state \(w^{(i,j)}\) and auxiliary variable \(z_{(i,j)}\) we have the master equation
\(\partial_t \, p(w^{(i,j)}, t) = \lambda \, (w^{(i,j)}-1) \, p(w^{(i,j)}-1, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(i,j)}^2 - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate.
- static reac_type_start_to_end_end(z_start, z_end, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S -> E + E’ reaction (e.g., symmetric differentiating cell division) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → W^{(k,l)} + W^{(k,l)}\) with different hidden variables \(W^{(i,j)}\), \(W^{(k,l)}\) in states \(w^{(i,j)}\), \(w^{(k,l)}\) and auxiliary variables \(z_{(i,j)}\), \(z_{(k,l)}\), respectively, we have the master equation
\(\partial_t \, p(w^{(i,j)}, w^{(k,l)}, t) = \lambda \, (w^{(i,j)}+1) \, p(w^{(i,j)}+1, w^{(k,l)}-2, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, w^{(k,l)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(k,l)}^2 - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate.
- static reac_type_start_to_end1_end2(z_start, z_end_1, z_end_2, rate, z_vars)
Returns the PDE building block of the probability generating function \(G\) for a ‘S -> E1 + E2’ reaction (e.g., asymmetric differentiating cell division) on the hidden Markov layer.
Note: The PDE building block can be derived from (and is equivalent to) the master equation for this hidden layer reaction. For the reaction \(W^{(i,j)} → W^{(k,l)} + W^{(r,s)}\) with different hidden variables \(W^{(i,j)}\), \(W^{(k,l)}\), \(W^{(r,s)}\) in states \(w^{(i,j)}\), \(w^{(k,l)}\), \(w^{(r,s)}\) and auxiliary variables \(z_{(i,j)}\), \(z_{(k,l)}\), \(z_{(r,s)}\), respectively, we have the master equation
\(\partial_t \, p(w^{(i,j)}, w^{(k,l)}, w^{(r,s)}, t) = \lambda \, (w^{(i,j)}+1) \, p(w^{(i,j)}+1, w^{(k,l)}-1, w^{(r,s)}-1, t) - \lambda \, w^{(i,j)} \, p(w^{(i,j)}, w^{(k,l)}, w^{(r,s)}, t)\)
which is equivalent to the PDE for \(G\)
\(\partial_t \, G(z,t) = \lambda \, (z_{(k,l)} \, z_{(r,s)} - z_{(i,j)}) \, \partial_{z_{(i,j)}} G(z,t)\),
where \(z\) is representative for all auxiliary variables and \(\lambda\) is the single-cell transition rate.
Plotting
- memocell.plots.net_main_plot(net, node_settings=None, edge_settings=None, layout=None, show=True, save=None)
Plot the main layer of a network.
- Parameters:
net (memocell.network.Network) – A memocell network object.
node_settings (dict of dict, optional) – Optional label and color settings for the network nodes. Colors require hex literals and labels require html (not latex) format.
edge_settings (dict of dict, optional) – Optional label and color settings for the network edges. Colors require hex literals and labels require html (not latex) format.
layout (None or str, optional) – Specify layout engine for computing node positions; e.g. ‘dot’, ‘neato’, ‘circo’.
show (bool, optional) – Network plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
pdot – Pydot network main layer object.
- Return type:
pydot.Dot
Plot the hidden layer of a network.
- Parameters:
net (memocell.network.Network) – A memocell network object.
node_settings (dict of dict, optional) – Optional label and color settings for the network nodes. Colors require hex literals and labels require html (not latex) format.
edge_settings (dict of dict, optional) – Optional label and color settings for the network edges. Colors require hex literals and labels require html (not latex) format.
layout (None or str, optional) – Specify layout engine for computing node positions; e.g. ‘dot’, ‘neato’, ‘circo’.
show (bool, optional) – Network plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
pdot – Pydot network hidden layer object.
- Return type:
pydot.Dot
- memocell.plots.sim_counts_plot(sim, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Counts', ylim=None, ylog=False, show=True, save=None)
Plot the variable count for a given simulation.
- Parameters:
sim (memocell.simulation.Simulation) – A memocell simulation object that contains a Gillespie simulation.
settings (dict of dict, optional) – Optional label and color settings for the simulation variables.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.sim_mean_plot(sim, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Mean', ylim=None, ylog=False, show=True, save=None)
Plot the mean (or expectation) dynamics of a given simulation.
- Parameters:
sim (memocell.simulation.Simulation) – A memocell simulation object that contains a moment simulation.
settings (dict of dict, optional) – Optional label and color settings for the mean traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.sim_variance_plot(sim, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Variance', ylim=None, ylog=False, show=True, save=None)
Plot the variance dynamics of a given simulation.
- Parameters:
sim (memocell.simulation.Simulation) – A memocell simulation object that contains a moment simulation.
settings (dict of dict, optional) – Optional label and color settings for the variance traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.sim_covariance_plot(sim, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Covariance', ylim=None, ylog=False, show=True, save=None)
Plot the covariance dynamics of a given simulation.
- Parameters:
sim (memocell.simulation.Simulation) – A memocell simulation object that contains a moment simulation.
settings (dict of dict, optional) – Optional label and color settings for the covariance traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_mean_plot(data, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Mean', ylim=None, ylog=False, show=True, save=None)
Plot the mean statistics with standard errors of a data object.
- Parameters:
data (memocell.data.Data) – A memocell data object.
settings (dict of dict, optional) – Optional label and color settings for the mean traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_variance_plot(data, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Variance', ylim=None, ylog=False, show=True, save=None)
Plot the variance statistics with standard errors of a data object (if variance data are available).
- Parameters:
data (memocell.data.Data) – A memocell data object.
settings (dict of dict, optional) – Optional label and color settings for the variance traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_covariance_plot(data, settings=None, xlabel='Time', xlim=None, xlog=False, ylabel='Covariance', ylim=None, ylog=False, show=True, save=None)
Plot the covariance statistics with standard errors of a data object (if covariance data are available).
- Parameters:
data (memocell.data.Data) – A memocell data object.
settings (dict of dict, optional) – Optional label and color settings for the covariance traces.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_hist_variables_plot(data, time_ind, normalised=False, settings=None, xlabel='Variable counts', xlim=None, xlog=False, ylabel='Frequency', ylim=None, ylog=False, show=True, save=None)
Histograms of variable counts of a data object at a given time point.
- Parameters:
data (memocell.data.Data) – A memocell data object.
time_ind (int) – Time index to take variable counts from; the time point is then given by data.data_time_values[time_ind].
normalised (bool, optional) – Histograms are normalised if normalised=True.
settings (dict of dict, optional) – Optional label, color and opacity settings for histogram variables.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_hist_waiting_times_plot(data, data_events, normalised=True, gamma_fit=True, settings=None, xlabel='Waiting time', xlim=None, xlog=False, ylabel='Frequency', ylim=None, ylog=False, show=True, save=None)
Histogram of waiting times of a data object for a given event.
- Parameters:
data (memocell.data.Data) – A memocell data object.
data_events (list of tuples with (bool, float or None)) – Data events to take waiting times from; e.g. data.event_all_first_cell_type_conversion. The boolean value of the tuples indicates whether an event happened or not. If not, the second value should be None. If yes, the second value should be a float of the waiting time for that event to happen.
normalised (bool, optional) – Histograms are normalised if normalised=True.
gamma_fit (bool, optional) – If gamma_fit=True, fit a Gamma distribution to binned waiting time data. Fitted shape and scale parameters are printed; fitted shape (=steps n) and mean waiting time (theta = shape * scale) are shown in the legend label.
settings (dict of dict, optional) – Optional label, color, opacity and gamma_color settings for the histogram.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.data_variable_scatter_plot(data, time_ind, variable1, variable2, settings=None, xlabel=None, xlim=None, xlog=False, ylabel=None, ylim=None, ylog=False, show=True, save=None)
Scatter plot between counts of two variables of a data object at a given time point.
- Parameters:
data (memocell.data.Data) – A memocell data object.
time_ind (int) – Time index to take variable counts from; the time point is then given by data.data_time_values[time_ind].
variable1 (str) – First variable of the scatter plot (x-axis).
variable2 (str) – Second variable of the scatter plot (y-axis).
settings (dict, optional) – Optional label, color and opacity settings for the scatter plot.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.selection_plot(estimation_instances, est_type='evidence', settings=None, show_errorbar=True, xlabel=None, xlim=None, xlog=False, ylabel=None, ylim=None, ylog=False, show=True, save=None)
Scatter plot between counts of two variables of a data object at a given time point.
- Parameters:
estimation_instances (list of memocell.estimation.Estimation) – A list of memocell estimation objects.
est_type (str, optional) – Specify type of selection plot. est_type=’evidence’ (default) to plot logarithmic evidence values from nested sampling. est_type=’likelihood’ to plot the maximal logarithmic likelihood value from nested sampling. est_type=’bic’ to plot the Bayesian information criterion based on the maximal logarithmic likelihood (from nested sampling), number of data points n and number of parameters k (BIC = ln(n) k - 2 ln(Lmax)). est_type=’evidence_from_bic’ to plot logarithmic evidence values (approximated from the BIC values via evidence ≈ exp(-BIC / 2)).
settings (dict of dict, optional) – Optional label and color for the estimation instances.
show_errorbar (bool, optional) – Error bars are shown if show_errorbar=True (default). Only available for est_type=’evidence’.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_runplot(estimation, color='limegreen', show=True, save=None)
Wrapper to runplot of dynesty nested sampling module.
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
color (str, optional) – Optional color passed to dynesty’s runplot.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_traceplot(estimation, settings=None, show=True, save=None)
Wrapper to traceplot of dynesty nested sampling module.
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional labels for parameters.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_parameter_plot(estimation, settings=None, show_errorbar=True, xlabel=None, xlim=None, ylabel='Parameter values', ylim=None, ylog=False, show=True, save=None)
Plot summary of 1-dimensional marginal parameter posteriors as median and 95% credible interval (2.5th and 97.5th percentiles; error bars).
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional label and color for parameters.
show_errorbar (bool, optional) – Error bars are shown if show_errorbar=True (default). Error bars show 95% credible intervals based on 2.5th and 97.5th percentiles for each parameter’s 1-dimensional marginal posterior distribution.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_corner_plot(estimation, settings=None, show=True, save=None)
Wrapper to corner plot of corner module; visualisation of the parameter posterior distribution by all 2-dimensional and 1-dimensional marginals.
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional labels for parameters.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_corner_kernel_plot(estimation, settings=None, show=True, save=None)
Wrapper to corner plot of dynesty nested sampling module; visualisation of the parameter posterior distribution by all 2-dimensional and 1-dimensional marginals (kernel smoothed).
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional labels for parameters.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes ((list or array of) matplotlib.axes)
- memocell.plots.est_corner_weight_plot(estimation, settings=None, show=True, save=None)
Wrapper to cornerpoints plot of dynesty nested sampling module.
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional labels for parameters.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes ((list or array of) matplotlib.axes)
- memocell.plots.est_chains_plot(estimation, weighted=True, xlabel='Sample iteration', xlim=None, xlog=False, ylabel='Parameter values', ylim=None, ylog=False, show=True, save=None)
Plot (weighted) parameter samples over the iterations of the nested sampling run.
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
weighted (bool, optional) – Plot parameter samples of the nested sampling run weighted (default) or unweighted. Samples need to be weighted to correspond to the actual parameter posterior.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_bestfit_mean_plot(estimation, settings=None, data=True, cred=True, xlabel='Time', xlim=None, xlog=False, ylabel='Mean', ylim=None, ylog=False, show=True, save=None)
Plot the model mean trajectories based on the estimated parameter posterior distribution. Note: A summary model trajectory is shown based on the median of the individual 1-dimensional marginal parameter posteriors (cred=False) or based on the 50th percentile of model trajectory samples from the complete parameter posterior; for multimodal parameter posteriors the cred=False option can yield an inaccurate summary (here, it is advised to use cred=True).
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional label and color settings for the mean traces.
data (bool, optional) – If data=True (default), plot the data mean statistics with standard errors in the background (grey color).
cred (bool, optional) – If cred=True (default), a median trajectory with 95% credible bands of the model means are shown; based on 2.5th, 50th and 97.5th percentiles for each time point of model trajectories drawn from the estimated parameter posterior distribution. Note: The computation of credible bands can be expensive for complex models; in this case you might want to start plotting with cred=False. If cred=False, no band is shown and the summary trajectory is computed by taking median parameters of the individual 1-dimensional posterior marginals.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_bestfit_variance_plot(estimation, settings=None, data=True, cred=True, xlabel='Time', xlim=None, xlog=False, ylabel='Variance', ylim=None, ylog=False, show=True, save=None)
Plot the model variance trajectories based on the estimated parameter posterior distribution. Note: A summary model trajectory is shown based on the median of the individual 1-dimensional marginal parameter posteriors (cred=False) or based on the 50th percentile of model trajectory samples from the complete parameter posterior; for multimodal parameter posteriors the cred=False option can yield an inaccurate summary (here, it is advised to use cred=True).
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional label and color settings for the variance traces.
data (bool, optional) – If data=True (default), plot the data variance statistics with standard errors in the background (grey color), if these data are available.
cred (bool, optional) – If cred=True (default), a median trajectory with 95% credible bands of the model variances are shown; based on 2.5th, 50th and 97.5th percentiles for each time point of model trajectories drawn from the estimated parameter posterior distribution. Note: The computation of credible bands can be expensive for complex models; in this case you might want to start plotting with cred=False. If cred=False, no band is shown and the summary trajectory is computed by taking median parameters of the individual 1-dimensional posterior marginals.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
- memocell.plots.est_bestfit_covariance_plot(estimation, settings=None, data=True, cred=True, xlabel='Time', xlim=None, xlog=False, ylabel='Covariance', ylim=None, ylog=False, show=True, save=None)
Plot the model covariance trajectories based on the estimated parameter posterior distribution. Note: A summary model trajectory is shown based on the median of the individual 1-dimensional marginal parameter posteriors (cred=False) or based on the 50th percentile of model trajectory samples from the complete parameter posterior; for multimodal parameter posteriors the cred=False option can yield an inaccurate summary (here, it is advised to use cred=True).
- Parameters:
estimation (memocell.estimation.Estimation) – A memocell estimation object.
settings (dict of dict, optional) – Optional label and color settings for the covariance traces.
data (bool, optional) – If data=True (default), plot the data covariance statistics with standard errors in the background (grey color), if these data are available.
cred (bool, optional) – If cred=True (default), a median trajectory with 95% credible bands of the model covariances are shown; based on 2.5th, 50th and 97.5th percentiles for each time point of model trajectories drawn from the estimated parameter posterior distribution. Note: The computation of credible bands can be expensive for complex models; in this case you might want to start plotting with cred=False. If cred=False, no band is shown and the summary trajectory is computed by taking median parameters of the individual 1-dimensional posterior marginals.
xlabel (str, optional) – Label for x-axis.
xlim (None or tuple of floats, optional) – Specify x-axis limits.
xlog (bool, optional) – Logarithmic x-axis if xlog=True.
ylabel (str, optional) – Label for y-axis.
ylim (None or tuple of floats, optional) – Specify y-axis limits.
ylog (bool, optional) – Logarithmic y-axis if ylog=True.
show (bool, optional) – Plot is shown if show=True.
save (None or str, optional) – Provide a path to save the plot.
- Returns:
fig (matplotlib.figure.Figure)
axes (list or array of matplotlib.axes)
Utility Methods
Some memocell utility functions are listed below. Particularly, methods to compute the phase-type distribution for a hidden layer from (multiple, parallel) Erlang channels. The methods were partially adapted from butools, please see there for more waiting time and phase-type distribution related helper methods.
- memocell.utils.phase_type_pdf(alpha, S, x)
Returns the probability density function of a continuous phase-type distribution.
Note: Method adapted from butools, see there for further phase-type related methods.
- Parameters:
alpha (1d numpy.ndarray) – The initial probability vector of the phase-type distribution (with shape (1,m)).
S (2d numpy.ndarray) – The transient generator matrix of the phase-type distribution (with shape (m,m)).
x (1d numpy.ndarray or array-like) – The density function will be computed at these points.
- Returns:
ph_pdf – The values of the phase-type density function at the corresponding x values.
- Return type:
1d numpy.ndarray
- memocell.utils.phase_type_from_erlang(theta, n)
Returns initial probabilities \(\alpha\) and generator matrix \(S\) for phase-type representation of an Erlang waiting time distribution with \(n\) steps and rate \(\theta\) (mean waiting time \(1/\theta\)).
Note: To obtain a phase-type density pass the results of this method into the method utils.phase_type_pdf.
Note: The parametrisation implies the rate \(n\cdot\theta\) on the individual exponentially-distributed substeps.
Note: This method is more a complement for other phase-type representations. For computational tasks one can simply use the faster scipy.stats version:
>>> erlang_pdf = stats.gamma.pdf(x, a=n, scale=1/(theta*n))
- Parameters:
theta (float) – Rate parameter of the complete Erlang channel (inverse of the mean Erlang waiting time).
n (int or float) – Number of steps of the Erlang channel (shape parameter).
- Returns:
alpha (1d numpy.ndarray) – The initial probability vector of the phase-type distribution (with shape (1,n)).
S (2d numpy.ndarray) – The transient generator matrix of the phase-type distribution (with shape (n,n)).
- memocell.utils.phase_type_from_parallel_erlang2(theta1, theta2, n1, n2)
Returns initial probabilities \(\alpha\) and generator matrix \(S\) for a phase-type representation of two parallel Erlang channels with parametrisation \((\theta_1, n_1)\) and \((\theta_2, n_2)\) (rate and steps of Erlang channels).
Note: To obtain a phase-type density pass the results of this method into the method utils.phase_type_pdf.
Note: The two Erlang channels split at the first substep into each channel. The parametrisation implies the rate \(n\cdot\theta\) on the individual exponentially-distributed substeps for the respective channel.
- Parameters:
theta1 (float) – Rate parameter of the first complete Erlang channel (inverse of the mean Erlang waiting time).
theta2 (float) – Rate parameter of the second complete Erlang channel (inverse of the mean Erlang waiting time).
n1 (int or float) – Number of steps of the first Erlang channel (shape parameter).
n2 (int or float) – Number of steps of the second Erlang channel (shape parameter).
- Returns:
alpha (1d numpy.ndarray) – The initial probability vector of the phase-type distribution (with shape (1,m) where \(m=n_1+n_2-1\)).
S (2d numpy.ndarray) – The transient generator matrix of the phase-type distribution (with shape (m,m) where \(m=n_1+n_2-1\)).
- memocell.utils.phase_type_from_parallel_erlang3(theta1, theta2, theta3, n1, n2, n3)
Returns initial probabilities \(\alpha\) and generator matrix \(S\) for a phase-type representation of three parallel Erlang channels with parametrisation \((\theta_1, n_1)\), \((\theta_2, n_2)\) and \((\theta_3, n_3)\) (rate and steps of Erlang channels).
Note: To obtain a phase-type density pass the results of this method into the method utils.phase_type_pdf.
Note: The three Erlang channels split at the first substep into each channel. The parametrisation implies the rate \(n\cdot\theta\) on the individual exponentially-distributed substeps for the respective channel.
- Parameters:
theta1 (float) – Rate parameter of the first complete Erlang channel (inverse of the mean Erlang waiting time).
theta2 (float) – Rate parameter of the second complete Erlang channel (inverse of the mean Erlang waiting time).
theta3 (float) – Rate parameter of the third complete Erlang channel (inverse of the mean Erlang waiting time).
n1 (int or float) – Number of steps of the first Erlang channel (shape parameter).
n2 (int or float) – Number of steps of the second Erlang channel (shape parameter).
n3 (int or float) – Number of steps of the third Erlang channel (shape parameter).
- Returns:
alpha (1d numpy.ndarray) – The initial probability vector of the phase-type distribution (with shape (1,m) where \(m=n_1+n_2+n_3-2\)).
S (2d numpy.ndarray) – The transient generator matrix of the phase-type distribution (with shape (m,m) where \(m=n_1+n_2+n_3-2\)).