seed.lib.mappings package

Submodules

seed.lib.mappings.mapper module

seed.lib.mappings.mapper.create_column_regexes(raw_columns)

Take the raw column names, sanitize the keys, and add a regex for each one.

Parameters:raw_columns – list of strings (column names from the imported file)
Returns:list of dict
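The sanitize-then-regex idea can be sketched as follows. This is an illustrative stand-in, not the SEED implementation: the function name sketch_column_regexes, the "raw" and "regex" keys, and the whitespace-tolerant pattern are all assumptions made for the example.

```python
import re


def sketch_column_regexes(raw_columns):
    """Hypothetical stand-in: build one dict per raw column name."""
    result = []
    for raw in raw_columns:
        # Sanitize: trim the name and collapse runs of whitespace.
        cleaned = " ".join(raw.split())
        # Escape regex metacharacters in each word, then allow any
        # run of whitespace between words.
        pattern = r"\s+".join(re.escape(part) for part in cleaned.split(" "))
        result.append({"raw": raw, "regex": "^" + pattern + "$"})
    return result


columns = sketch_column_regexes(["Gross  Floor Area (ft2)"])
# The generated pattern tolerates differences in spacing.
matched = re.match(columns[0]["regex"], "Gross Floor Area (ft2)") is not None
```

The list-of-dict shape mirrors the documented return type; the real keys may differ.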
seed.lib.mappings.mapper.get_pm_mapping(raw_columns, mapping_data=None, resolve_duplicates=True)

Create and return the Portfolio Manager (PM) mapping for a given version of PM and the given list of column names.

The function takes raw_columns (from the CSV/XLSX file) and attempts to normalize the column names so that they can be matched against the ‘from_field’ entries in pm-mapping.json.
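A minimal sketch of the normalize-and-match step, assuming each mapping entry is a dict with a "from_field" key (named in the text above) and a "to_field" key (an assumption for this example). The helper names and the normalization rules are hypothetical and are not taken from the SEED source.

```python
def normalize(name):
    """Hypothetical normalization: lowercase, trim, collapse whitespace."""
    return " ".join(name.strip().lower().split())


def sketch_pm_lookup(raw_columns, mapping_data):
    """Match each raw column to the mapping entry with the same normalized from_field."""
    by_from_field = {normalize(entry["from_field"]): entry for entry in mapping_data}
    # Columns without a match map to None.
    return {raw: by_from_field.get(normalize(raw)) for raw in raw_columns}


# Hypothetical mapping data standing in for the contents of pm-mapping.json.
mapping_data = [{"from_field": "Property Name", "to_field": "property_name"}]
lookup = sketch_pm_lookup(["  property   NAME ", "Unknown Column"], mapping_data)
```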

seed.lib.mappings.mapping_columns module

class seed.lib.mappings.mapping_columns.MappingColumns(raw_columns, dest_columns, previous_mapping=None, map_args=None, default_mappings=None, threshold=0)

Bases: object

This class handles the probabilistic mapping of unknown columns to defined fields. This is mainly used in the build_column_mapping API endpoint.

add_mappings(raw_column, mappings, previous_mapping=False)

Add mappings to the data structure for later processing.

Parameters:
  • raw_column – list of strings
  • mappings – list of tuples of potential mappings and confidences
  • previous_mapping – boolean; if true, these mappings will take precedence
Returns:

Bool, whether or not the mapping was added

apply_threshold(threshold)

Remove mapping suggestions that do not meet the defined threshold.

This method currently runs as a required part of the workflow, but it could easily be made a separate call.

Parameters:threshold – int, minimum value; a suggestion is kept only if its confidence is greater than or equal to it.
Returns:None
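The thresholding rule can be illustrated with a small standalone sketch. It assumes suggestions are stored as a dict of raw column name to a list of (table, field, confidence) tuples; that layout and the function name are assumptions, and unlike the documented method (which returns None) this version returns a new dict to keep the example self-contained.

```python
def sketch_apply_threshold(suggestions, threshold):
    """Keep only suggestions whose confidence is >= threshold."""
    return {
        raw: [m for m in mappings if m[2] >= threshold]
        for raw, mappings in suggestions.items()
    }


suggestions = {
    "Bldg Name": [("PropertyState", "property_name", 92),
                  ("PropertyState", "address_line_1", 41)],
    "Misc": [("PropertyState", "custom_id_1", 30)],
}
kept = sketch_apply_threshold(suggestions, 80)
```

A raw column whose suggestions all fall below the threshold ends up with an empty list rather than disappearing, which keeps the set of raw columns stable.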
duplicates

Check for duplicate initial mapping results.

Returns:list of raw columns
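One way to detect duplicate initial mappings is to group raw columns by their first suggested (table, field) pair. The sketch below assumes that representation; the function name and input layout are hypothetical and do not reflect the class's internal data structure.

```python
from collections import defaultdict


def sketch_find_duplicates(first_mappings):
    """Return {(table, field): [raw columns]} for targets hit more than once."""
    grouped = defaultdict(list)
    for raw, (table, field, _confidence) in first_mappings.items():
        grouped[(table, field)].append(raw)
    return {target: raws for target, raws in grouped.items() if len(raws) > 1}


first_mappings = {
    "Name": ("PropertyState", "property_name", 95),
    "Bldg Name": ("PropertyState", "property_name", 72),
    "City": ("PropertyState", "city", 100),
}
dups = sketch_find_duplicates(first_mappings)
```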
final_mappings

Return the final mappings in a format that can be used downstream from this method:

{
    "raw_column_1": ('table', 'db_column_1', confidence),
    "raw_column_2": ('table', 'db_column_1', confidence),
}

first_suggested_mapping(raw_column)

Return the first suggested mapping for a raw column.

Parameters:raw_column – String
Returns:tuple of the mapping (‘table’, ‘field’, confidence), or ()

resolve_duplicate(dup_map_field, raw_columns)

Parameters:
  • dup_map_field – String, name of the field that is a duplicate
  • raw_columns – list, raw columns that mapped to the same result
Returns:

None

set_initial_mapping_cmp(raw_column)

Set the initial_mapping_cmp helper item in the self.data hash. This is used to detect if there are any duplicates.

Parameters:raw_column – String, name of the raw column to set the initial_mapping_cmp
Returns:None

seed.lib.mappings.mapping_columns.sort_duplicates(a, b)

Custom sort for the duplicate hash to decide which raw column will get the mapping suggestion.
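A two-argument comparison function like this follows the old cmp convention, so in Python 3 it has to be wrapped with functools.cmp_to_key before being passed to sorted. The sketch below shows that pattern with an assumed ordering (higher confidence first, ties broken alphabetically). The item layout and the ordering rule are illustrative guesses, not the criteria SEED actually applies.

```python
from functools import cmp_to_key


def sketch_sort_duplicates(a, b):
    """Hypothetical comparator: negative if a should come before b."""
    if a["confidence"] != b["confidence"]:
        # Higher confidence sorts first.
        return b["confidence"] - a["confidence"]
    # Tie: fall back to alphabetical order of the raw column name.
    return (a["raw_column"] > b["raw_column"]) - (a["raw_column"] < b["raw_column"])


candidates = [
    {"raw_column": "Bldg Name", "confidence": 72},
    {"raw_column": "Name", "confidence": 95},
    {"raw_column": "Alt Name", "confidence": 72},
]
ranked = sorted(candidates, key=cmp_to_key(sketch_sort_duplicates))
# The first entry would be the raw column that keeps the mapping suggestion.
winner = ranked[0]["raw_column"]
```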

seed.lib.mappings.mapping_data module

seed.lib.mappings.test_mapper module

seed.lib.mappings.test_mapping_columns module

seed.lib.mappings.test_mapping_data module

Module contents