bioflow.algorithms_bank package¶
Submodules¶
bioflow.algorithms_bank.clustering_routines module¶
bioflow.algorithms_bank.conduction_routines module¶
bioflow.algorithms_bank.flow_calculation_methods module¶
These methods are responsible for generating the pairs of nodes between which the flow will be calculated and summed.

bioflow.algorithms_bank.flow_calculation_methods.evaluate_ops(prim_len: int, sec_len: int, sparse_rounds: int = 1) → float[source]¶
Evaluates the total number of node pair flow computations needed to calculate the complete flow in the sample according to the general_flow policy.
Parameters:  prim_len – length of the primary set
 sec_len – length of the secondary set
 sparse_rounds – number of sparse rounds to account for
Returns: the total number of node pair flow computations

bioflow.algorithms_bank.flow_calculation_methods.general_flow(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None] = None, sparse_rounds: int = 1) → List[Tuple[Tuple[int, float], Tuple[int, float]]][source]¶
Performs the information flow computation best matching the provided parameters.
Parameters:  sample – primary sample of nodes
 secondary_sample – secondary sample of nodes
 sparse_rounds – number of sparse rounds to use, in case the samples are too big
Returns: the list of weighted node pairs, as ((node_id, weight), (node_id, weight)) tuples, for which the flow is to be computed
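The return type above suggests how the pairs could be generated. Below is a minimal sketch under the assumption (not confirmed by the source) that a single set yields all unordered internal pairs, two sets yield the cross product, and bare ids are promoted to weight-1 tuples; `pair_nodes` is an illustrative name, not BioFlow's actual implementation:

```python
from itertools import combinations, product
from typing import List, Optional, Tuple, Union

Node = Union[int, Tuple[int, float]]

def pair_nodes(sample: List[Node],
               secondary_sample: Optional[List[Node]] = None
               ) -> List[Tuple[Tuple[int, float], Tuple[int, float]]]:
    # Promote bare node ids to (id, 1.0) weighted tuples.
    prim = [n if isinstance(n, tuple) else (n, 1.0) for n in sample]
    if secondary_sample is None:
        # One set: flow between all unordered pairs within it.
        return list(combinations(prim, 2))
    sec = [n if isinstance(n, tuple) else (n, 1.0) for n in secondary_sample]
    # Two sets: flow between every primary/secondary cross pair.
    return list(product(prim, sec))
```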

bioflow.algorithms_bank.flow_calculation_methods.reduce_and_deduplicate_sample(sample: Union[List[int], List[Tuple[int, float]]]) → List[Tuple[int, float]][source]¶
Deduplicates the nodes found in the sample by adding the weights of duplicated nodes. If a list of node ids only is provided, transforms it into a weighted list with all weights set to 1.
Parameters: sample – sample to deduplicate and/or add weights to
Returns: deduplicated, weighted sample
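The deduplication step described above can be sketched as follows; `dedupe_sample` is a hypothetical stand-in for the documented behavior, not the actual BioFlow implementation:

```python
from collections import defaultdict
from typing import List, Tuple, Union

def dedupe_sample(sample: List[Union[int, Tuple[int, float]]]
                  ) -> List[Tuple[int, float]]:
    # Promote bare node ids to weight-1 tuples, then sum weights per id.
    totals = defaultdict(float)
    for node in sample:
        node_id, weight = node if isinstance(node, tuple) else (node, 1.0)
        totals[node_id] += weight
    return list(totals.items())
```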

bioflow.algorithms_bank.flow_calculation_methods.reduce_ops(prim_len, sec_len, max_ops) → int[source]¶
Determines the sparse_rounds parameter that needs to be used in order to keep the total number of node pairs needed to calculate the complete flow in the sample under max_ops, according to the general_flow policy.
Parameters:  prim_len – length of the primary set
 sec_len – length of the secondary set
 max_ops – maximum allowed number of node pairs
Returns: the sparse_rounds parameter to use
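The interplay between evaluate_ops and reduce_ops can be illustrated with a hypothetical cost model (the real general_flow policy may count pairs differently); `est_ops` and `pick_sparse_rounds` are illustrative names under that assumption:

```python
def est_ops(prim_len: int, sec_len: int, sparse_rounds: int = 1) -> float:
    # Hypothetical cost model, for illustration only: dense mode pairs
    # everything; sparse mode does sparse_rounds passes of prim_len
    # pairings each.
    if sparse_rounds > 1:
        return prim_len * sparse_rounds
    if sec_len:
        return prim_len * sec_len
    return prim_len * (prim_len - 1) / 2

def pick_sparse_rounds(prim_len: int, sec_len: int, max_ops: int) -> int:
    # Keep the dense computation if it already fits under the budget...
    if est_ops(prim_len, sec_len) <= max_ops:
        return 1
    # ...otherwise cap the number of sparse passes so that the
    # estimated cost stays within max_ops.
    return max(2, max_ops // prim_len)
```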
bioflow.algorithms_bank.flow_significance_evaluation module¶

bioflow.algorithms_bank.flow_significance_evaluation.get_neighboring_degrees()[source]¶
Recovers the maximum flow achieved by nodes of a given degree for each run. In case the user requests it with the nearest_degrees or min_nodes parameters, also recovers the maximum flow values for nodes of similar degrees, or looks for flow values in the nearest degrees until at least min_nodes are found.
Parameters:  degree – degree of the nodes
 max_array – maximum flow values achieved by nodes of a given degree in each run
 nearest_degrees – the minimum number of the nearest degrees to look through
 min_nodes – the minimum number of nodes to collect before the search through neighbouring degrees stops
Returns: maximum flow values for nodes of the matching degrees
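The widening search described above can be sketched as follows, assuming a hypothetical `flows_by_degree` mapping from node degree to per-run maximum flow values; this is an illustration of the idea, not BioFlow's actual code:

```python
from typing import Dict, List

def flows_for_degree(degree: int,
                     flows_by_degree: Dict[int, List[float]],
                     min_nodes: int = 1) -> List[float]:
    # Start with the flows recorded for the exact degree, then widen
    # the degree window symmetrically until min_nodes values are found.
    collected = list(flows_by_degree.get(degree, []))
    span = max(flows_by_degree) - min(flows_by_degree)
    radius = 0
    while len(collected) < min_nodes and radius <= span:
        radius += 1
        for d in (degree - radius, degree + radius):
            collected.extend(flows_by_degree.get(d, []))
    return collected
```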

bioflow.algorithms_bank.flow_significance_evaluation.get_p_val_by_gumbel()[source]¶
Recovers the statistical significance (p-value equivalent) by performing a Gumbel test.
Parameters:  entry – the values achieved in the real hits information flow computation
 max_set_red – background set of maximum values achieved during blank sampling runs
Returns: the p-value equivalent
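A Gumbel test of this kind can be sketched with a method-of-moments fit of the extreme-value distribution to the background maxima; this is a generic illustration, not necessarily BioFlow's exact estimator:

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_p_value(entry: float, background_maxima) -> float:
    # Method-of-moments fit of a right-skewed Gumbel distribution:
    #   scale = std * sqrt(6) / pi,  loc = mean - gamma * scale
    arr = np.asarray(background_maxima, dtype=float)
    scale = arr.std() * math.sqrt(6) / math.pi
    loc = arr.mean() - EULER_GAMMA * scale
    # Survival function: probability a background maximum exceeds `entry`.
    return 1.0 - math.exp(-math.exp(-(entry - loc) / scale))
```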
bioflow.algorithms_bank.model_assumptions module¶
bioflow.algorithms_bank.sampling_policies module¶
This module defines the policies that will be used in order to sample the information flow patterns to compare with.
The general approach is a function that takes in any relevant parameters and outputs a list of pairs of DB_Ids for which the flow will be calculated.

bioflow.algorithms_bank.sampling_policies.characterize_flow_parameters(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None], sparse_rounds: int)[source]¶
Characterizes the primary and secondary sets and computes their hash, which can be used to match similar samples for random sampling.
Parameters:  sample – primary set
 secondary_sample – secondary set
 sparse_rounds – the number of sparse rounds, if sparse rounds are to be performed
Returns: first set length, shape, hist, second set length, shape, hist, sparse rounds, hash
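The hashing idea can be sketched as follows; `sample_hash` is a hypothetical reduction of the documented characterization to the part that makes equivalent samples map to the same key:

```python
import hashlib

def sample_hash(sample, secondary_sample=None, sparse_rounds=1) -> str:
    # Hash the sorted node ids of both sets together with the
    # sparse_rounds setting; ordering of the input no longer matters,
    # so equivalent samples produce identical digests.
    def ids(s):
        return sorted(n[0] if isinstance(n, tuple) else n for n in (s or []))
    payload = repr((ids(sample), ids(secondary_sample), sparse_rounds))
    return hashlib.md5(payload.encode()).hexdigest()
```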

bioflow.algorithms_bank.sampling_policies.matched_sample_distribution()[source]¶
Tries to guess the distribution of the floats and sample from it. Uses np.histogram with the number of bins equal to the granularity parameter. For each sample, selects which bin to sample and then picks a float from that bin according to a uniform distribution. If logmode is enabled, both the histogram and the sampling are performed in logspace.
Parameters:  floats_arr – array of floats for which to match the distribution
 samples_no – number of random samples to retrieve
 granularity – granularity at which to operate
 logmode – whether to sample in logspace
Returns: samples drawn from the empirically matched distribution
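A minimal sketch of this histogram-matching scheme, assuming the parameter semantics described above (`sample_matched` is an illustrative name):

```python
import numpy as np

def sample_matched(floats_arr, samples_no: int,
                   granularity: int = 100, logmode: bool = False):
    # Approximate the empirical distribution with a histogram, pick a
    # bin per sample in proportion to its mass, then draw uniformly
    # within that bin; with logmode, bin and draw in log-space.
    arr = np.log(floats_arr) if logmode else np.asarray(floats_arr, dtype=float)
    counts, edges = np.histogram(arr, bins=granularity)
    bins = np.random.choice(granularity, size=samples_no,
                            p=counts / counts.sum())
    draws = np.random.uniform(edges[bins], edges[bins + 1])
    return np.exp(draws) if logmode else draws
```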

bioflow.algorithms_bank.sampling_policies.matched_sampling(sample, secondary_sample, background, samples, float_sampling_method='exact')[source]¶
The general random sampling strategy: samples sets of the same size and shape as the primary and secondary sample sets and, if they are weighted, tries to match the random sample weights to the distribution of the original weights.
Parameters:  sample – primary sample set
 secondary_sample – secondary sample_set
 background – background of ids (and potentially weights) from which to sample
 samples – random samples wanted
 float_sampling_method – exact/distro/logdistro; the sampling parametrization method, ingesting all the parameters in a single string argument in the general case. Here, a pass-through parameter for the _sample_floats function if the samples are weighted and the distribution of weights is being matched.
Returns:
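Stripped of weight matching, the core of the strategy can be sketched as simple size-matched draws from the background; `draw_matched_samples` is an illustrative simplification, not the full documented behavior:

```python
import random

def draw_matched_samples(sample, background, samples_no: int):
    # For each requested round, draw a random subset of the background
    # with the same size as the real sample set.
    size = len(sample)
    return [random.sample(background, size) for _ in range(samples_no)]
```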