bioflow.bio_db_parsers package

Submodules

bioflow.bio_db_parsers.ComplexPortalParser module

bioflow.bio_db_parsers.ComplexPortalParser.parse_complex_portal(complex_portal_file)

bioflow.bio_db_parsers.PhosphositeParser module

bioflow.bio_db_parsers.PhosphositeParser.parse_phosphosite(phoshosite_file, organism)

Parses the PhosphoSite TSV file

Parameters:
  • phoshosite_file
  • organism
Returns:

bioflow.bio_db_parsers.geneOntologyParser module

Contains the functions responsible for the parsing of the GO terms

class bioflow.bio_db_parsers.geneOntologyParser.GOTermsParser

Bases: object

Wrapper object for a parser of GO terms.

flush_block()

Flushes all temporary term stores to the main data stores

parse_go_terms(source_file_path)

Takes the path to the Gene Ontology .obo file and returns the parse results as a dict and a list

Parameters:source_file_path – gene ontology .obo file
Returns:dict containing term parse, list containing inter-term relationship (turtle) triplets

parse_line_in_block(header, payload)

Parses a line within GO term parameters block

Parameters:
  • header – GO term parameter name
  • payload – GO term parameter value
start_block()

Resets the temporary stores so that a new term can be loaded
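
The start/parse/flush cycle described above can be illustrated with a minimal, self-contained sketch. It is not the GOTermsParser implementation: the function name parse_obo_terms, the shape of the returned dict, and the choice to record only is_a relations as triplets are all assumptions made for illustration.

```python
def parse_obo_terms(lines):
    """Parse OBO-style lines into (terms, triplets).

    Hypothetical helper for illustration; not bioflow's implementation.
    """
    terms = {}      # term id -> {header: [payload, ...]}
    triplets = []   # (subject id, relation, object id)
    block = None    # temporary store for the stanza being read

    def flush(current):
        # Move the temporary store into the main stores.
        if not current or "id" not in current:
            return
        term_id = current["id"][0]
        terms[term_id] = current
        for parent in current.get("is_a", []):
            triplets.append((term_id, "is_a", parent))

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins: flush the previous one first.
            flush(block)
            # Only [Term] stanzas are kept; others (e.g. [Typedef]) are skipped.
            block = {} if line == "[Term]" else None
        elif not line:
            continue
        elif block is not None and ": " in line:
            header, payload = line.split(": ", 1)
            # Drop the trailing "! human-readable name" comment, if any.
            payload = payload.split(" ! ", 1)[0].strip()
            block.setdefault(header, []).append(payload)

    flush(block)  # the last stanza has no following "[" line to trigger a flush
    return terms, triplets


sample = [
    "format-version: 1.2",
    "",
    "[Term]",
    "id: GO:0000001",
    "name: mitochondrion inheritance",
    "is_a: GO:0048308 ! organelle inheritance",
    "",
    "[Term]",
    "id: GO:0048308",
    "name: organelle inheritance",
    "",
    "[Typedef]",
    "id: part_of",
]
terms, triplets = parse_obo_terms(sample)
```

Keeping a temporary per-stanza store and flushing it only when the next stanza starts means a half-read term never reaches the main stores.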

bioflow.bio_db_parsers.proteinRelParsers module

Protein relationships parser

bioflow.bio_db_parsers.proteinRelParsers.parse_bio_grid(bio_grid)

Parses the given file as a BioGrid file and returns as

Parameters:bio_grid – the location of the BioGrid file that needs to be parsed
Returns:
bioflow.bio_db_parsers.proteinRelParsers.parse_hint(_hint_csv)

Reads protein-protein relationships from a HiNT database file

Parameters:_hint_csv – location of the HiNT database tsv file
Returns:{UP_Identifier:[UP_ID1, UP_ID2, …]}
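
To make the documented return shape concrete, the sketch below builds a {UP_Identifier: [UP_ID1, UP_ID2, …]} mapping from tab-separated interaction pairs. It is not parse_hint itself. The function name read_pairs, the assumption that the first two columns hold UniProt identifiers under a single header row, the symmetric treatment of interactions, and the skipping of self-interactions are all illustrative assumptions.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle, skip_header=True):
    """Build an adjacency mapping from a tab-separated pair file.

    Hypothetical helper; column layout is an assumption, not the HiNT spec.
    """
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the header row if present
    neighbours = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        a, b = row[0], row[1]
        if a == b:
            continue  # assumption: self-interactions are not recorded
        # Record the interaction in both directions, without duplicates.
        if b not in neighbours[a]:
            neighbours[a].append(b)
        if a not in neighbours[b]:
            neighbours[b].append(a)
    return dict(neighbours)


sample = io.StringIO(
    "Uniprot_A\tUniprot_B\n"
    "P04637\tQ00987\n"
    "P04637\tP38398\n"
    "Q00987\tP04637\n"
)
interactions = read_pairs(sample)
```

For large files, storing neighbours in sets and converting to lists at the end would avoid the linear membership checks.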

bioflow.bio_db_parsers.reactomeParser module

bioflow.bio_db_parsers.tfParsers module

bioflow.bio_db_parsers.uniprotParser module

The module responsible for parsing of the Uniprot SWISSPROT .dat file for a subset of cross-references that are useful in our database.

Once the UniProt file is parsed, the result is returned as a dictionary with the following structure:

Uniprot = {SWISSPROT_ID: {
    'Acnum': [],
    'Names': {'Full': '', 'AltNames': []},
    'GeneRefs': {'Names': [], 'OrderedLocusNames': [], 'ORFNames': []},
    'TaxID': '',
    'Ensembl': [],
    'KEGG': [],
    'EMBL': [],
    'GO': [],
    'Pfam': [],
    'SUPFAM': [],
    'PDB': [],
    'GeneID': [],
}}

class bioflow.bio_db_parsers.uniprotParser.UniProtParser(tax_ids_to_parse)

Bases: object

Wraps the Uniprot parser

end_block()

Manages the behavior of the end of a parse block

Returns:
get_access_dicts()

Returns an access dictionary that maps gene names, AcNums or EMBL identifiers to the Swissprot IDs

Returns:dictionary mapping all the external database identifiers to UniProt IDs
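
As an illustration of what such an access dictionary could look like, the sketch below inverts a parse dictionary of the shape given in the module description. It is not the get_access_dicts implementation: the function name build_access_dict is hypothetical, the choice of which fields to index is taken from the description above, and the EMBL value in the sample is a placeholder.

```python
def build_access_dict(uniprot):
    """Map gene names, AcNums and EMBL identifiers to Swissprot IDs.

    Hypothetical helper for illustration; not bioflow's implementation.
    """
    access = {}
    for swissprot_id, entry in uniprot.items():
        keys = list(entry.get("Acnum", []))
        keys += entry.get("GeneRefs", {}).get("Names", [])
        keys += entry.get("EMBL", [])
        for key in keys:
            # Assumption: on a collision, the entry seen last wins.
            access[key] = swissprot_id
    return access


# A single illustrative entry; 'EMBL_PLACEHOLDER' is not a real accession.
example_parse = {
    "P53_HUMAN": {
        "Acnum": ["P04637"],
        "Names": {"Full": "Cellular tumor antigen p53", "AltNames": []},
        "GeneRefs": {"Names": ["TP53"], "OrderedLocusNames": [], "ORFNames": []},
        "TaxID": "9606",
        "EMBL": ["EMBL_PLACEHOLDER"],
    }
}
access = build_access_dict(example_parse)
```

Gene names are not unique across organisms, so a sketch like this is only unambiguous when the parse dictionary has been restricted to one taxon.
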
parse_gene_references(line)

Parses gene names and references from the UNIPROT text file

Parameters:line
parse_name(line)

Parses a line that contains a name associated to the entry we are trying to load

Parameters:line
Returns:
parse_uniprot(source_path)

Performs the entire uniprot file parsing and importing

Parameters:source_path – path to the UniProt text file
Returns:uniprot parse dictionary
parse_xref(line)

Parses an xref line from the Uniprot text file and updates the provided dictionary with the results of parsing

Parameters:line
process_line(line, keyword)

A function that processes a line parsed from the UNIPROT database file

Parameters:
  • line
  • keyword

Module contents