seed.lib.mappings package

Submodules

seed.lib.mappings.mapper module

seed.lib.mappings.mapper.create_column_regexes(raw_columns)

Take the raw column names, sanitize them, and add a regex for each one.

Parameters: raw_columns – list of strings (column names from the imported file)
Returns: list of dict

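As an illustration only, the sanitize-and-add-regex step might look like the following sketch. The function name sketch_create_column_regexes, the output keys raw, sanitized and regex, and the sanitizing rules are all assumptions, not the module's actual output format.

```python
import re


def sketch_create_column_regexes(raw_columns):
    """Hypothetical sketch: sanitize each raw column name and attach a regex.

    The dict keys ('raw', 'sanitized', 'regex') are assumptions, not the
    documented output of create_column_regexes.
    """
    results = []
    for raw in raw_columns:
        # Lower-case, trim, and collapse runs of non-alphanumerics into "_"
        sanitized = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
        # Match the raw name literally, case-insensitively, ignoring outer whitespace
        regex = r"(?i)^\s*" + re.escape(raw.strip()) + r"\s*$"
        results.append({"raw": raw, "sanitized": sanitized, "regex": regex})
    return results
```

For example, sketch_create_column_regexes(["Property Id"]) yields a sanitized name of property_id and a regex that also matches "  property id ".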
seed.lib.mappings.mapper.get_pm_mapping(raw_columns, mapping_data=None, resolve_duplicates=True)

Create and return a Portfolio Manager (PM) mapping for a given version of PM and the given list of column names.

The method takes the raw_columns (from the CSV/XLSX file) and attempts to normalize the column names so that they can be mapped to the data in pm-mapping.json['from_field'].
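A minimal sketch of the normalize-then-look-up idea, assuming simple normalization rules (lower-case, punctuation removed, whitespace collapsed) and a hypothetical to_field key on each pm-mapping.json entry. The names normalize_name and sketch_pm_mapping are illustrative; get_pm_mapping's real rules and return value may differ.

```python
import re


def normalize_name(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def sketch_pm_mapping(raw_columns, pm_entries):
    """Map raw columns onto entries keyed by their normalized 'from_field'.

    pm_entries mimics rows of pm-mapping.json; the 'to_field' key is a
    hypothetical stand-in for whatever the real file stores.
    """
    lookup = {normalize_name(entry["from_field"]): entry for entry in pm_entries}
    result = {}
    for raw in raw_columns:
        entry = lookup.get(normalize_name(raw))
        if entry is not None:
            result[raw] = entry["to_field"]
    return result
```

Columns with no normalized match are simply left out of the result in this sketch.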

seed.lib.mappings.mapping_columns module

class seed.lib.mappings.mapping_columns.MappingColumns(raw_columns, dest_columns, previous_mapping=None, map_args=None, threshold=0)

Bases: object

This class handles the probabilistic mapping of unknown columns to defined fields. It is mainly used by the build_column_mapping API endpoint.
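To make "probabilistic mapping" concrete, the sketch below ranks candidate fields for one raw column by string similarity. It swaps in difflib.SequenceMatcher from the Python standard library as the scoring method and scales the ratio to 0-100; this is not necessarily how MappingColumns scores candidates, and suggest_mappings is a hypothetical name.

```python
from difflib import SequenceMatcher


def suggest_mappings(raw_column, dest_columns):
    """Rank (table, field, confidence) suggestions for one raw column.

    dest_columns is a list of (table, field) pairs. The difflib ratio,
    scaled to 0-100, stands in for the real confidence score.
    """
    suggestions = []
    for table, field in dest_columns:
        ratio = SequenceMatcher(None, raw_column.lower(), field.lower()).ratio()
        suggestions.append((table, field, round(ratio * 100)))
    # Highest confidence first; ties broken by table, then field name
    suggestions.sort(key=lambda s: (-s[2], s[0], s[1]))
    return suggestions
```

A list ordered best-first like this is the kind of "list of tuples of potential mappings and confidences" that the methods below operate on.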

add_mappings(raw_column, mappings, previous_mapping=False)

Add mappings to the data structure for later processing.

Parameters:
  • raw_column – list of strings
  • mappings – list of tuples of potential mappings and confidences
  • previous_mapping – boolean, if true these mappings will take precedence
Returns:

Bool, whether or not the mapping was added
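One way the per-column data structure could behave is sketched below. The data layout (a dict keyed by raw column with previous_mapping and mappings entries) and the precedence rule are assumptions drawn from the parameter descriptions above, not the class's actual code.

```python
def add_mappings(data, raw_column, mappings, previous_mapping=False):
    """Hypothetical sketch of storing suggestions per raw column.

    The first call for a raw column wins, unless a later call carries
    previous_mapping=True and the stored entry does not (assumed rule).
    Returns True if the mappings were stored.
    """
    existing = data.get(raw_column)
    if existing is not None and (existing["previous_mapping"] or not previous_mapping):
        return False
    data[raw_column] = {
        "previous_mapping": previous_mapping,
        # Keep the best suggestion first so later steps can read mappings[0]
        "mappings": sorted(mappings, key=lambda m: m[2], reverse=True),
    }
    return True
```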

apply_threshold(threshold)

Remove mapping suggestions that do not meet the defined threshold.

This method is currently always run as part of the workflow, but it could easily be made a separate call.

Parameters: threshold – int, minimum value; suggestions are kept only if their confidence is greater than or equal to it.
Returns: None

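Assuming each raw column's suggestions are stored as a list of (table, field, confidence) tuples under a mappings key (a hypothetical layout), the threshold filter reduces to a comprehension that keeps suggestions whose confidence is greater than or equal to the threshold:

```python
def apply_threshold(data, threshold):
    """Drop suggestions below threshold in place; a confidence equal to it is kept."""
    for entry in data.values():
        entry["mappings"] = [m for m in entry["mappings"] if m[2] >= threshold]
```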
duplicates

Check for duplicate initial mapping results.

Returns: List of raw columns

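A sketch of duplicate detection, assuming a hypothetical dict that maps each raw column to its initial (table, field) mapping, or None when it has no suggestion. It returns the raw columns whose initial mapping is shared with at least one other column.

```python
from collections import Counter


def find_duplicates(initial_mappings):
    """Return sorted raw columns whose initial (table, field) target is shared."""
    counts = Counter(t for t in initial_mappings.values() if t is not None)
    return sorted(
        raw
        for raw, target in initial_mappings.items()
        if target is not None and counts[target] > 1
    )
```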
final_mappings

Return the final mappings in a format that can be used downstream from this method:

    {
        "raw_column_1": ("table", "db_column_1", confidence),
        "raw_column_2": ("table", "db_column_2", confidence),
    }

first_suggested_mapping(raw_column)

Return the first suggested mapping for a raw column.

Parameters: raw_column – String
Returns: tuple of the mapping ('table', 'field', confidence), or ()

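A minimal sketch, assuming suggestions are stored per raw column as a mappings list of (table, field, confidence) tuples with the best one first (a hypothetical layout). An unknown column or an empty list yields the empty tuple.

```python
def first_suggested_mapping(data, raw_column):
    """Return the first (table, field, confidence) suggestion, or ()."""
    mappings = data.get(raw_column, {}).get("mappings", [])
    return mappings[0] if mappings else ()
```

Under that assumption, a final_mappings-style dictionary could be assembled by calling this for every raw column and skipping the empty tuples.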
resolve_duplicate(dup_map_field, raw_columns)
Parameters:
  • dup_map_field – String, name of the field that is a duplicate
  • raw_columns – list, raw columns that mapped to the same result
Returns:

None

set_initial_mapping_cmp(raw_column)

Set the initial_mapping_cmp helper item in the self.data hash. It is used to detect whether there are any duplicates.

Parameters: raw_column – String, name of the raw column for which to set initial_mapping_cmp
Returns: None

seed.lib.mappings.mapping_columns.sort_duplicates(a, b)

Custom sort for the duplicate hash that decides which raw column gets the mapping suggestion.
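The interplay between sort_duplicates and resolve_duplicate can be sketched with functools.cmp_to_key, since sort_duplicates(a, b) takes two arguments like an old-style comparison function. The candidate shape (raw_column, confidence), the tie-break on raw column name, and the data layout (a mappings list of (table, field, confidence) tuples per raw column) are assumptions for illustration.

```python
from functools import cmp_to_key


def sort_duplicates(a, b):
    """Hypothetical cmp: higher confidence first, then raw column name."""
    # a and b are assumed to be (raw_column, confidence) pairs
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return (a[0] > b[0]) - (a[0] < b[0])


def resolve_duplicate(data, dup_map_field, raw_columns):
    """Keep dup_map_field for the winning raw column; drop it from the others."""
    candidates = [
        (raw, max(m[2] for m in data[raw]["mappings"] if m[1] == dup_map_field))
        for raw in raw_columns
    ]
    ranked = sorted(candidates, key=cmp_to_key(sort_duplicates))
    # Every raw column after the winner loses the duplicated field
    for raw, _ in ranked[1:]:
        data[raw]["mappings"] = [m for m in data[raw]["mappings"] if m[1] != dup_map_field]
```

In this sketch the losing column falls back to its next-best suggestion, if it has one.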

seed.lib.mappings.mapping_data module

class seed.lib.mappings.mapping_data.MappingData(exclude_fields=None)

Bases: object

New format for managing and looking up mapping data. It includes a more comprehensive set of data fields with type and schema information.

Makes a dictionary of the column names and their respective types.

The MappingData data property contains the list of fields in the database, each with its table name.

add_extra_data(columns)

Add in the unit types from a columns queryset.

Args:
columns: list of columns from the Column table

Returns: None

building_columns

Return a set of the keys that are the possible columns.

Returns: set of keys

extra_data

List only the extra_data columns, that is, the columns that are not database fields.

Returns: set of keys of the extra_data columns

find_column(table_name, column_name)

Args:
table_name: name of the table to find
column_name: name of the column to find within that table

Returns: None or Dict of found column
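Assuming data is a list of dicts with table and name keys (hypothetical key names), the lookup is a linear scan that must match both keys, since the same column name can appear in more than one table:

```python
def find_column(data, table_name, column_name):
    """Return the first column dict matching both table and name, else None."""
    for column in data:
        if column["table"] == table_name and column["name"] == column_name:
            return column
    return None
```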

keys

Flatten the data set to a list of unique names independent of the table.

Returns: List of keys

keys_with_table_names

Similar to keys, except it returns a list of tuples that pair each key with its table name.

Returns: list of tuples
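A sketch of the difference between keys and keys_with_table_names, assuming data is a list of dicts with table and name keys (hypothetical names). keys collapses names shared across tables (sorted here for stable output, which is an assumption), while keys_with_table_names keeps one (table, name) pair per field; the tuple order is also assumed.

```python
def keys(data):
    """Unique column names regardless of table, sorted for stable output."""
    return sorted({column["name"] for column in data})


def keys_with_table_names(data):
    """One (table, name) pair per field, in the original order."""
    return [(column["table"], column["name"]) for column in data]
```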

normalize_mappable_type(in_str)

Normalize the data types used when the fields are communicated to JavaScript, so that the data types are consistent.

Args:
in_str: string to normalize

Returns: normalized string with JavaScript data types
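A sketch of such normalization, assuming Django-style type names such as FloatField come in and JavaScript-facing names such as number and string go out. The JS_TYPE_MAP table and its entries are hypothetical; the actual type vocabulary is not documented here.

```python
# Hypothetical lookup table; the real type names may differ.
JS_TYPE_MAP = {
    "float": "number",
    "integer": "number",
    "decimal": "number",
    "char": "string",
    "text": "string",
    "boolean": "boolean",
    "date": "date",
    "datetime": "datetime",
}


def normalize_mappable_type(in_str):
    """Map a database/field type name onto a JavaScript-facing type name."""
    key = in_str.strip().lower()
    # Strip a Django-style "Field" suffix, e.g. "FloatField" -> "float"
    if key.endswith("field"):
        key = key[: -len("field")]
    # Unknown types pass through in their lower-cased form
    return JS_TYPE_MAP.get(key, key)
```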

sort_data()

Sort the objects by table, then by name.

Returns: None; updates the member variable in place
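Sorting by table and then by name maps directly onto a tuple sort key. This sketch assumes data is a list of dicts with table and name keys (hypothetical names) and, like the method, sorts in place and returns None:

```python
from operator import itemgetter


def sort_data(data):
    """Sort field dicts in place by table, then by name."""
    data.sort(key=itemgetter("table", "name"))
```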

seed.lib.mappings.test_mapper module

class seed.lib.mappings.test_mapper.TestMapper(methodName='runTest')

Bases: django.test.testcases.TestCase

Test mapping methods.

setUp()
test_column_regexes()
test_mapping()
test_mapping_pm_to_seed()

seed.lib.mappings.test_mapping_columns module

Unit tests for map.py

class seed.lib.mappings.test_mapping_columns.TestMappingColumns(methodName='runTest')

Bases: django.test.testcases.TestCase

Test mapping data methods.

setUp()
test_add_mappings()
test_duplicate_fields_across_models()

Test to make sure that similar fields have different targets

test_excluded_fields()

Test to make sure excluded fields are not mapped to

test_first_suggested_mapping()
test_mapping_columns()
test_mapping_columns_with_threshold()
test_no_more_matches()

Test to make sure that similar fields have different targets

test_sort_duplicates()

seed.lib.mappings.test_mapping_data module

Unit tests for map.py

class seed.lib.mappings.test_mapping_data.TestMappingData(methodName='runTest')

Bases: django.test.testcases.TestCase

Test mapping data methods.

setUp()
test_extra_data()
test_find_column()
test_keys()
test_keys_with_table_names()
test_mapping_data_init()
test_null_extra_data()

Module contents