bioflow.algorithms_bank package

Submodules

bioflow.algorithms_bank.clustering_routines module

bioflow.algorithms_bank.conduction_routines module

bioflow.algorithms_bank.flow_calculation_methods module

These methods are responsible for generating the pairs of nodes for which the flow will be calculated and summed.

bioflow.algorithms_bank.flow_calculation_methods.evaluate_ops(prim_len: int, sec_len: int, sparse_rounds: int = -1) → float[source]

Evaluates the total number of node pair flow computations needed to calculate the complete flow in the sample according to the general_flow policy.

Parameters:
  • prim_len – length of the primary set
  • sec_len – length of the secondary set
  • sparse_rounds – number of sparse rounds to be performed (-1 if no sparse rounds are used)
Returns:

the total number of node pair flow computations needed

bioflow.algorithms_bank.flow_calculation_methods.general_flow(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None] = None, sparse_rounds: int = -1) → List[Tuple[Tuple[int, float], Tuple[int, float]]][source]

Performs the information flow computation best matching the provided parameters.

Parameters:
  • sample – primary sample of nodes
  • secondary_sample – secondary sample of nodes, if any
  • sparse_rounds – number of sparse rounds to perform, in case the samples are too big
Returns:

list of pairs of weighted nodes ((id, weight) tuples) for which the flow will be calculated and summed
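
As a usage sketch (the node ids and weights below are hypothetical, not real DB_Ids), general_flow accepts either bare node ids or (id, weight) tuples:

    from bioflow.algorithms_bank.flow_calculation_methods import general_flow

    primary = [101, 102, 103, 104]        # unweighted primary sample
    secondary = [(201, 1.0), (202, 0.5)]  # weighted secondary sample

    # sparse_rounds=-1 requests the complete (dense) pairing between the sets
    pairs = general_flow(primary, secondary_sample=secondary, sparse_rounds=-1)
    for (id_a, w_a), (id_b, w_b) in pairs:
        print(id_a, w_a, id_b, w_b)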

bioflow.algorithms_bank.flow_calculation_methods.reduce_and_deduplicate_sample(sample: Union[List[int], List[Tuple[int, float]]]) → List[Tuple[int, float]][source]

Deduplicates the nodes found in the sample by adding the weights of duplicated nodes. If a list of only node ids is provided, transforms it into a weighted list with all weights set to 1.

Parameters:
  • sample – sample to deduplicate and/or add weights to
Returns:

deduplicated weighted sample, as a list of (node id, weight) tuples
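
The described behaviour is simple enough to sketch directly; the function below is an illustration, not the library implementation:

    from collections import defaultdict

    def dedup_sketch(sample):
        # bare node ids are promoted to (id, 1.0); weights of duplicated
        # ids are summed together
        accumulator = defaultdict(float)
        for entry in sample:
            node_id, weight = entry if isinstance(entry, tuple) else (entry, 1.0)
            accumulator[node_id] += weight
        return list(accumulator.items())

    assert sorted(dedup_sketch([1, 2, 2])) == [(1, 1.0), (2, 2.0)]
    assert sorted(dedup_sketch([(1, 0.5), (1, 0.25)])) == [(1, 0.75)]
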
bioflow.algorithms_bank.flow_calculation_methods.reduce_ops(prim_len, sec_len, max_ops) → int[source]

Determines the sparse_rounds parameter to use in order to keep the total number of node pairs needed to calculate the complete flow in the sample (according to the general_flow policy) below max_ops.

Parameters:
  • prim_len – length of the primary set
  • sec_len – length of the secondary set
  • max_ops – maximum allowed number of node pairs
Returns:

the sparse_rounds value to use
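
A hypothetical usage sketch tying the two helpers together (the sample sizes and the pair budget below are made up):

    from bioflow.algorithms_bank.flow_calculation_methods import evaluate_ops, reduce_ops

    prim_len, sec_len = 500, 0   # hypothetical sample sizes
    max_ops = 10000              # hypothetical budget of node pairs

    # cost of the complete (dense) computation under the general_flow policy
    dense_ops = evaluate_ops(prim_len, sec_len, sparse_rounds=-1)

    # if the dense computation is too expensive, pick a sparse_rounds value
    # that keeps the number of node pairs within the budget
    sparse_rounds = reduce_ops(prim_len, sec_len, max_ops) if dense_ops > max_ops else -1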

bioflow.algorithms_bank.flow_significance_evaluation module

bioflow.algorithms_bank.flow_significance_evaluation.get_neighboring_degrees(degree, max_array, nearest_degrees, min_nodes)[source]

Recovers the maximum flow achieved by nodes of a given degree for each run. In case the user requests it with the nearest_degrees or min_nodes parameters, also recovers maximum flow values for nodes of similar degrees, or looks for flow values in the nearest degrees until at least min_nodes nodes are found.

Parameters:
  • degree – degree of the nodes
  • max_array – maximum flow values achieved by the nodes of a given degree in each run
  • nearest_degrees – the minimum number of nearest degrees to look into
  • min_nodes – the minimum number of nodes to find before the search over neighbouring degrees stops
Returns:

maximum flow values achieved by nodes of the given and neighbouring degrees
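
A minimal sketch of the degree-widening logic described above; the per-degree data layout is an assumption made for illustration, not the library's internal structure:

    def neighboring_max_flows(degree, max_flows_by_degree, min_nodes=10):
        # max_flows_by_degree: dict mapping a node degree to the list of
        # maximum flow values achieved by nodes of that degree
        collected = list(max_flows_by_degree.get(degree, []))
        offset = 0
        # widen the degree window symmetrically until enough nodes are found
        while len(collected) < min_nodes and offset < max(max_flows_by_degree, default=0):
            offset += 1
            collected += max_flows_by_degree.get(degree - offset, [])
            collected += max_flows_by_degree.get(degree + offset, [])
        return collected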

bioflow.algorithms_bank.flow_significance_evaluation.get_p_val_by_gumbel(entry, max_set_red)[source]

Recovers the statistical significance (p-value equivalent) by performing a Gumbel test.

Parameters:
  • entry – the values achieved in the real hits information flow computation
  • max_set_red – background set of maximum values achieved during blank sparse-sampling runs
Returns:

the p-value equivalents of the statistical significance for the entries
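
The Gumbel test is standard extreme-value statistics: fit a Gumbel distribution to the maxima from the blank runs and read the p-value off its survival function. The scipy-based sketch below illustrates the technique and is not BioFlow's exact implementation:

    import numpy as np
    from scipy.stats import gumbel_r

    def gumbel_p_val_sketch(entry, max_set_red):
        # fit a Gumbel distribution to the blank-run maxima...
        loc, scale = gumbel_r.fit(max_set_red)
        # ...then the p-value is the probability that a blank-run maximum
        # exceeds the flow value actually achieved by the real hits
        return gumbel_r.sf(entry, loc=loc, scale=scale)

    rng = np.random.default_rng(0)
    blank_maxima = rng.gumbel(loc=1.0, scale=0.3, size=100)
    print(gumbel_p_val_sketch(2.5, blank_maxima))  # small p-value: likely significant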

bioflow.algorithms_bank.sampling_policies module

This module defines the policies used to sample the background information flow patterns against which the real flow patterns will be compared.

The general approach is a function that takes in the relevant parameters and outputs a list of pairs of DB_Ids for which the flow will be calculated.

bioflow.algorithms_bank.sampling_policies.characterize_flow_parameters(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None], sparse_rounds: int)[source]

Characterizes the primary and secondary sets and computes their hash, which can be used to match similar samples for random sampling.

Parameters:
  • sample – primary set
  • secondary_sample – secondary set
  • sparse_rounds – number of sparse rounds to be performed, if any
Returns:

first set length, shape, hist, second set length, shape, hist, sparse rounds, hash
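
One way such a hash could be computed is sketched below; the exact hashing scheme is an assumption, not BioFlow's actual one:

    import hashlib

    def sample_hash_sketch(sample, secondary_sample, sparse_rounds):
        # order-independent fingerprint of the weighted samples plus the
        # sparse_rounds setting, usable to match equivalent sampling setups
        canonical = (sorted(sample),
                     sorted(secondary_sample) if secondary_sample else None,
                     sparse_rounds)
        return hashlib.md5(repr(canonical).encode()).hexdigest()

    print(sample_hash_sketch([(1, 1.0), (2, 0.5)], None, -1))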

bioflow.algorithms_bank.sampling_policies.matched_sample_distribution(floats_arr, samples_no, granularity, logmode)[source]

Tries to guess the distribution of the provided floats and sample from it. Uses np.histogram with the number of bins equal to the granularity parameter. For each sample, selects which bin to sample from and then picks a float from that bin according to a uniform distribution. If logmode is enabled, both the histogram and the sampling are performed in log-space.

Parameters:
  • floats_arr – array of floats for which to match the distribution
  • samples_no – number of random samples to retrieve
  • granularity – granularity at which to operate
  • logmode – whether to sample in log-space
Returns:

samples drawn from the empirically matched distribution
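
The described procedure maps directly onto numpy; the function below is an illustrative reimplementation, not the library code:

    import numpy as np

    def matched_distribution_sketch(floats_arr, samples_no, granularity=100, logmode=False):
        # optionally move to log-space before building the histogram
        values = np.log(floats_arr) if logmode else np.asarray(floats_arr, dtype=float)
        counts, bin_edges = np.histogram(values, bins=granularity)
        rng = np.random.default_rng()
        # pick a bin for each sample according to the empirical frequencies...
        bins = rng.choice(granularity, size=samples_no, p=counts / counts.sum())
        # ...then draw uniformly within each selected bin
        draws = rng.uniform(bin_edges[bins], bin_edges[bins + 1])
        return np.exp(draws) if logmode else draws

    matched = matched_distribution_sketch(np.random.exponential(2.0, 1000), samples_no=50)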

bioflow.algorithms_bank.sampling_policies.matched_sampling(sample, secondary_sample, background, samples, float_sampling_method='exact')[source]

The general random sampling strategy: samples sets of the same size and shape as the primary and secondary sample sets and, if they are weighted, tries to match the weights of the random samples according to the float_sampling_method.

Parameters:
  • sample – primary sample set
  • secondary_sample – secondary sample set
  • background – background of ids (and potentially weights) from which to sample
  • samples – number of random samples wanted
  • float_sampling_method – exact/distro/logdistro; the sampling parametrization method, ingesting all the parameters in a single string argument in the general case; here, a pass-through parameter for the _sample_floats function if the samples are weighted and the distribution of weights is being matched
Returns:
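
A minimal sketch of the unweighted ("exact") case, assuming the background is a plain list of ids; the weighted modes would additionally redraw weights, e.g. via matched_sample_distribution:

    import random

    def matched_sampling_sketch(sample, background, samples):
        # draw `samples` random id sets of the same size as the real sample
        for _ in range(samples):
            yield random.sample(background, len(sample))

    for mock in matched_sampling_sketch([1, 2, 3], list(range(1000)), samples=5):
        print(mock)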

bioflow.algorithms_bank.weigting_policies module

Module contents