bioflow.algorithms_bank package

Submodules

bioflow.algorithms_bank.clustering_routines module

bioflow.algorithms_bank.conduction_routines module

bioflow.algorithms_bank.flow_calculation_methods module

These methods are responsible for generating the pairs of nodes for which the flow will be calculated and summed.

bioflow.algorithms_bank.flow_calculation_methods.evaluate_ops(prim_len: int, sec_len: int, sparse_rounds: int = -1) → float[source]

Evaluates the total number of node pair flow computations needed to calculate the complete flow in the sample according to the general_flow policy.

Parameters:
  • prim_len – length of the primary set
  • sec_len – length of the secondary set
  • sparse_rounds – number of sparse rounds to be performed (-1 if no sparse rounds are used)
Returns:

the total number of node pair flow computations needed

bioflow.algorithms_bank.flow_calculation_methods.general_flow(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None] = None, sparse_rounds: int = -1) → List[Tuple[Tuple[int, float], Tuple[int, float]]][source]

Performs the information flow computation best matching the provided parameters.

Parameters:
  • sample – primary sample of nodes
  • secondary_sample – secondary sample of nodes, if any
  • sparse_rounds – number of sparse rounds to perform, in case the samples are too big
Returns:

list of pairs of weighted nodes ((id, weight) tuples) for which the flow will be calculated and summed
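
As a usage sketch (the node ids and weights below are hypothetical, not real DB_Ids), general_flow accepts either bare node ids or (id, weight) tuples:

    from bioflow.algorithms_bank.flow_calculation_methods import general_flow

    primary = [101, 102, 103, 104]        # unweighted primary sample
    secondary = [(201, 1.0), (202, 0.5)]  # weighted secondary sample

    # sparse_rounds=-1 requests the complete (dense) pairing between the sets
    pairs = general_flow(primary, secondary_sample=secondary, sparse_rounds=-1)
    for (id_a, w_a), (id_b, w_b) in pairs:
        print(id_a, w_a, id_b, w_b)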

bioflow.algorithms_bank.flow_calculation_methods.reduce_and_deduplicate_sample(sample: Union[List[int], List[Tuple[int, float]]]) → List[Tuple[int, float]][source]

Deduplicates the nodes found in the sample by adding the weights of duplicated nodes. If a list of only node ids is provided, transforms it into a weighted list with all weights set to 1.

Parameters:
  • sample – sample to deduplicate and/or add weights to
Returns:

deduplicated weighted sample, as a list of (node id, weight) tuples
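
The described behaviour is simple enough to sketch directly; the function below is an illustration, not the library implementation:

    from collections import defaultdict

    def dedup_sketch(sample):
        # bare node ids are promoted to (id, 1.0); weights of duplicated
        # ids are summed together
        accumulator = defaultdict(float)
        for entry in sample:
            node_id, weight = entry if isinstance(entry, tuple) else (entry, 1.0)
            accumulator[node_id] += weight
        return list(accumulator.items())

    assert sorted(dedup_sketch([1, 2, 2])) == [(1, 1.0), (2, 2.0)]
    assert sorted(dedup_sketch([(1, 0.5), (1, 0.25)])) == [(1, 0.75)]
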
bioflow.algorithms_bank.flow_calculation_methods.reduce_ops(prim_len, sec_len, max_ops) → int[source]

Determines the sparse_rounds parameter to use in order to keep the total number of node pairs needed to calculate the complete flow in the sample (according to the general_flow policy) below max_ops.

Parameters:
  • prim_len – length of the primary set
  • sec_len – length of the secondary set
  • max_ops – maximum allowed number of node pairs
Returns:

the sparse_rounds value to use
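
A hypothetical usage sketch tying the two helpers together (the sample sizes and the pair budget below are made up):

    from bioflow.algorithms_bank.flow_calculation_methods import evaluate_ops, reduce_ops

    prim_len, sec_len = 500, 0   # hypothetical sample sizes
    max_ops = 10000              # hypothetical budget of node pairs

    # cost of the complete (dense) computation under the general_flow policy
    dense_ops = evaluate_ops(prim_len, sec_len, sparse_rounds=-1)

    # if the dense computation is too expensive, pick a sparse_rounds value
    # that keeps the number of node pairs within the budget
    sparse_rounds = reduce_ops(prim_len, sec_len, max_ops) if dense_ops > max_ops else -1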

bioflow.algorithms_bank.flow_significance_evaluation module

bioflow.algorithms_bank.flow_significance_evaluation.get_neighboring_degrees(degree, max_array, nearest_degrees, min_nodes)[source]

Recovers the maximum flow achieved by nodes of a given degree for each run. In case the user requests it with the nearest_degrees or min_nodes parameters, also recovers maximum flow values for nodes of similar degrees, or looks for flow values in the nearest degrees until at least min_nodes nodes are found.

Parameters:
  • degree – degree of the nodes
  • max_array – maximum flow values achieved by the nodes of a given degree in each run
  • nearest_degrees – the minimum number of nearest degrees to look into
  • min_nodes – the minimum number of nodes to find before the search over neighbouring degrees stops
Returns:

maximum flow values achieved by nodes of the given and neighbouring degrees
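
A minimal sketch of the degree-widening logic described above; the per-degree data layout is an assumption made for illustration, not the library's internal structure:

    def neighboring_max_flows(degree, max_flows_by_degree, min_nodes=10):
        # max_flows_by_degree: dict mapping a node degree to the list of
        # maximum flow values achieved by nodes of that degree
        collected = list(max_flows_by_degree.get(degree, []))
        offset = 0
        # widen the degree window symmetrically until enough nodes are found
        while len(collected) < min_nodes and offset < max(max_flows_by_degree, default=0):
            offset += 1
            collected += max_flows_by_degree.get(degree - offset, [])
            collected += max_flows_by_degree.get(degree + offset, [])
        return collected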

bioflow.algorithms_bank.flow_significance_evaluation.get_p_val_by_gumbel(entry, max_set_red)[source]

Recovers the statistical significance (p-value equivalent) by performing a Gumbel test.

Parameters:
  • entry – the values achieved in the real hits information flow computation
  • max_set_red – background set of maximum values achieved during blank sparse-sampling runs
Returns:

the p-value equivalents of the statistical significance for the entries
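
The Gumbel test is standard extreme-value statistics: fit a Gumbel distribution to the maxima from the blank runs and read the p-value off its survival function. The scipy-based sketch below illustrates the technique and is not BioFlow's exact implementation:

    import numpy as np
    from scipy.stats import gumbel_r

    def gumbel_p_val_sketch(entry, max_set_red):
        # fit a Gumbel distribution to the blank-run maxima...
        loc, scale = gumbel_r.fit(max_set_red)
        # ...then the p-value is the probability that a blank-run maximum
        # exceeds the flow value actually achieved by the real hits
        return gumbel_r.sf(entry, loc=loc, scale=scale)

    rng = np.random.default_rng(0)
    blank_maxima = rng.gumbel(loc=1.0, scale=0.3, size=100)
    print(gumbel_p_val_sketch(2.5, blank_maxima))  # small p-value: likely significant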

bioflow.algorithms_bank.sampling_policies module

This module defines the policies used to sample the background information flow patterns against which the real flow patterns will be compared.

The general approach is a function that takes in the relevant parameters and outputs a list of pairs of DB_Ids for which the flow will be calculated.

bioflow.algorithms_bank.sampling_policies.characterize_flow_parameters(sample: Union[List[int], List[Tuple[int, float]]], secondary_sample: Union[List[int], List[Tuple[int, float]], None], sparse_rounds: int)[source]

Characterizes the primary and secondary sets and computes their hash, which can be used to match similar samples for random sampling.

Parameters:
  • sample – primary set
  • secondary_sample – secondary set
  • sparse_rounds – number of sparse rounds to be performed, if any
Returns:

first set length, shape, hist, second set length, shape, hist, sparse rounds, hash
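
One way such a hash could be computed is sketched below; the exact hashing scheme is an assumption, not BioFlow's actual one:

    import hashlib

    def sample_hash_sketch(sample, secondary_sample, sparse_rounds):
        # order-independent fingerprint of the weighted samples plus the
        # sparse_rounds setting, usable to match equivalent sampling setups
        canonical = (sorted(sample),
                     sorted(secondary_sample) if secondary_sample else None,
                     sparse_rounds)
        return hashlib.md5(repr(canonical).encode()).hexdigest()

    print(sample_hash_sketch([(1, 1.0), (2, 0.5)], None, -1))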

bioflow.algorithms_bank.sampling_policies.matched_sample_distribution(floats_arr, samples_no, granularity, logmode)[source]

Tries to guess the distribution of the provided floats and sample from it. Uses np.histogram with the number of bins equal to the granularity parameter. For each sample, selects which bin to sample from and then picks a float from that bin according to a uniform distribution. If logmode is enabled, both the histogram and the sampling are performed in log-space.

Parameters:
  • floats_arr – array of floats for which to match the distribution
  • samples_no – number of random samples to retrieve
  • granularity – granularity at which to operate
  • logmode – whether to sample in log-space
Returns:

samples drawn from the empirically matched distribution
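
The described procedure maps directly onto numpy; the function below is an illustrative reimplementation, not the library code:

    import numpy as np

    def matched_distribution_sketch(floats_arr, samples_no, granularity=100, logmode=False):
        # optionally move to log-space before building the histogram
        values = np.log(floats_arr) if logmode else np.asarray(floats_arr, dtype=float)
        counts, bin_edges = np.histogram(values, bins=granularity)
        rng = np.random.default_rng()
        # pick a bin for each sample according to the empirical frequencies...
        bins = rng.choice(granularity, size=samples_no, p=counts / counts.sum())
        # ...then draw uniformly within each selected bin
        draws = rng.uniform(bin_edges[bins], bin_edges[bins + 1])
        return np.exp(draws) if logmode else draws

    matched = matched_distribution_sketch(np.random.exponential(2.0, 1000), samples_no=50)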

bioflow.algorithms_bank.sampling_policies.matched_sampling(sample, secondary_sample, background, samples, float_sampling_method='exact')[source]

The general random sampling strategy: samples sets of the same size and shape as the primary and secondary sample sets and, if they are weighted, tries to match the weights of the random samples according to the float_sampling_method.

Parameters:
  • sample – primary sample set
  • secondary_sample – secondary sample set
  • background – background of ids (and potentially weights) from which to sample
  • samples – number of random samples wanted
  • float_sampling_method – exact/distro/logdistro; the sampling parametrization method, ingesting all the parameters in a single string argument in the general case; here, a pass-through parameter for the _sample_floats function if the samples are weighted and the distribution of weights is being matched
Returns:
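
A minimal sketch of the unweighted ("exact") case, assuming the background is a plain list of ids; the weighted modes would additionally redraw weights, e.g. via matched_sample_distribution:

    import random

    def matched_sampling_sketch(sample, background, samples):
        # draw `samples` random id sets of the same size as the real sample
        for _ in range(samples):
            yield random.sample(background, len(sample))

    for mock in matched_sampling_sketch([1, 2, 3], list(range(1000)), samples=5):
        print(mock)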

bioflow.algorithms_bank.weigting_policies module

Module contents