seed.lib.mappings package
Submodules
seed.lib.mappings.mapper module
-
seed.lib.mappings.mapper.create_column_regexes(raw_columns)
Take the raw column names, sanitize them into keys, and add a regex for each.
Parameters: raw_columns – list of strings (column names from the imported file) Returns: list of dict
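The docstring does not spell out the output format. The following is therefore a hypothetical sketch of the idea: each raw column name is sanitized (whitespace collapsed, lowercased) and paired with a case-insensitive regex. The dict keys `raw`, `sanitized`, and `regex`, along with the sanitizing rule, are assumptions and not SEED's actual format.

```python
import re


def create_column_regexes(raw_columns):
    """Hypothetical sketch: sanitize each raw column name and attach a regex.

    The output keys ("raw", "sanitized", "regex") are assumed for illustration.
    """
    result = []
    for raw in raw_columns:
        # Assumed sanitizing rule: trim, collapse internal whitespace, lowercase.
        sanitized = re.sub(r"\s+", " ", raw.strip()).lower()
        # Match the sanitized words in order, allowing any run of whitespace between them.
        pattern = r"\s+".join(re.escape(word) for word in sanitized.split(" "))
        result.append({
            "raw": raw,
            "sanitized": sanitized,
            "regex": re.compile(r"^\s*" + pattern + r"\s*$", re.IGNORECASE),
        })
    return result
```

For example, `"  Property  Name "` is sanitized to `"property name"`, and its regex also matches `"PROPERTY   NAME"`.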
-
seed.lib.mappings.mapper.get_pm_mapping(raw_columns, mapping_data=None, resolve_duplicates=True)
Create and return the Portfolio Manager (PM) mapping for a given version of PM and the given list of column names.
The method takes the raw_columns (from the CSV/XLSX file) and attempts to normalize the column names so that they can be mapped to the data in pm-mapping.json['from_field'].
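The following is a minimal sketch of the normalize-then-look-up step. It assumes pm-mapping.json has been loaded as a list of entries with 'from_field' and 'to_field' keys. The 'to_field' key, the normalization rule, and the function name `get_pm_mapping_sketch` are assumptions. The sketch does not model the mapping_data or resolve_duplicates parameters.

```python
import re


def normalize_name(name):
    # Hypothetical normalization: lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def get_pm_mapping_sketch(raw_columns, pm_entries):
    """Hypothetical sketch: map raw columns to PM fields by normalized name.

    pm_entries is assumed to be a list of dicts with 'from_field' and 'to_field'.
    Raw columns with no normalized match are left out of the result.
    """
    lookup = {normalize_name(entry["from_field"]): entry for entry in pm_entries}
    mapping = {}
    for raw in raw_columns:
        entry = lookup.get(normalize_name(raw))
        if entry is not None:
            mapping[raw] = entry["to_field"]
    return mapping
```

Normalizing both sides means that `"YEAR-BUILT"` in a spreadsheet still finds a `"Year Built"` entry.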
seed.lib.mappings.mapping_columns module
-
class seed.lib.mappings.mapping_columns.MappingColumns(raw_columns, dest_columns, previous_mapping=None, map_args=None, threshold=0)
Bases: object
This class handles the probabilistic mapping of unknown columns to defined fields. This is mainly used in the build_column_mapping API endpoint.
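The docstring does not say how the confidences are computed. As a stand-in, this sketch scores every raw column against each destination field using Python's `difflib.SequenceMatcher`. That is a swapped-in similarity measure and not necessarily what SEED uses. Representing dest_columns as `(table, field)` tuples and scaling confidence to an integer from 0 to 100 are also assumptions.

```python
from difflib import SequenceMatcher


def suggest_mappings(raw_columns, dest_columns):
    """Hypothetical sketch: rank (table, field, confidence) candidates per raw column.

    dest_columns is assumed to be a list of (table, field) tuples; confidence is
    SequenceMatcher.ratio() on lowercased names, scaled to an int in 0..100.
    """
    suggestions = {}
    for raw in raw_columns:
        scored = []
        for table, field in dest_columns:
            ratio = SequenceMatcher(None, raw.lower(), field.lower()).ratio()
            scored.append((table, field, round(ratio * 100)))
        # Highest confidence first; the sort is stable, so ties keep dest_columns order.
        scored.sort(key=lambda item: item[2], reverse=True)
        suggestions[raw] = scored
    return suggestions
```

Keeping every candidate with its score, rather than only the best one, leaves room for later steps such as thresholding and duplicate resolution.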
-
add_mappings(raw_column, mappings, previous_mapping=False)
Add mappings to the data structure for later processing.
Parameters: - raw_column – string, name of the raw column
- mappings – list of tuples of potential mappings and confidences
- previous_mapping – boolean; if true, these mappings take precedence
Returns: Bool, whether or not the mapping was added
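To make the precedence rule concrete, here is a hypothetical stand-in for the data structure that add_mappings fills. The class name `MappingStore` and the layout of `self.data` (raw column → `{"mappings": [(table, field, confidence), ...]}`) are assumptions.

```python
class MappingStore:
    """Hypothetical stand-in for the structure add_mappings writes to."""

    def __init__(self):
        # Assumed layout: raw column name -> {"mappings": [(table, field, confidence), ...]}
        self.data = {}

    def add_mappings(self, raw_column, mappings, previous_mapping=False):
        """Record candidate mappings; return True if anything was added."""
        if not mappings:
            return False
        entry = self.data.setdefault(raw_column, {"mappings": []})
        if previous_mapping:
            # Previously saved mappings go to the front so they take precedence.
            entry["mappings"] = list(mappings) + entry["mappings"]
        else:
            entry["mappings"].extend(mappings)
        return True
```

Prepending previous mappings means that any step that reads the first suggestion picks up the user's earlier choice.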
-
apply_threshold(threshold)
Remove mapping suggestions that do not meet the defined threshold.
This method currently always runs as part of the workflow, but it could easily be made a separate call.
Parameters: threshold – int, the minimum confidence a suggestion must reach (greater than or equal to) to be kept. Returns: None
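Below is a minimal sketch of the filter. It assumes a hypothetical layout in which each raw column maps to `{"mappings": [(table, field, confidence), ...]}`. The standalone function form is an assumption, since the documented method is an instance method.

```python
def apply_threshold(data, threshold):
    """Hypothetical sketch: keep only suggestions with confidence >= threshold.

    data is assumed to map raw column -> {"mappings": [(table, field, confidence), ...]}
    and is modified in place; the function returns None, like the documented method.
    """
    for entry in data.values():
        entry["mappings"] = [m for m in entry["mappings"] if m[2] >= threshold]
```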
-
duplicates
Check for duplicate initial mapping results.
Returns: list of raw columns
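One way to detect duplicates is to group the raw columns by their top `(table, field)` suggestion and return every column in a group with more than one member. This sketch treats each raw column's first suggestion as its initial mapping. That assumption, the data layout, and the function name `find_duplicates` are illustrative only.

```python
from collections import defaultdict


def find_duplicates(data):
    """Hypothetical sketch: list raw columns whose top suggestion collides with another's.

    data is assumed to map raw column -> {"mappings": [(table, field, confidence), ...]},
    best suggestion first. Columns with no suggestions are ignored.
    """
    claimed = defaultdict(list)
    for raw_column, entry in data.items():
        if entry["mappings"]:
            table, field, _confidence = entry["mappings"][0]
            claimed[(table, field)].append(raw_column)
    # Flatten every group that more than one raw column claimed.
    return [col for cols in claimed.values() if len(cols) > 1 for col in cols]
```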
-
final_mappings
Return the final mappings in a format that can be used downstream from this method:
{
    'raw_column_1': ('table', 'db_column_1', confidence),
    'raw_column_2': ('table', 'db_column_1', confidence),
}
-
first_suggested_mapping(raw_column)
Return the first suggested mapping for a raw column.
Parameters: raw_column – string, name of the raw column Returns: tuple of the mapping ('table', 'field', confidence), or () if there is no suggestion
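The sketch below shows first_suggested_mapping and how a final_mappings-style dict could be built from it. It assumes a hypothetical layout with the best suggestion stored first. Mapping an unmatched column to `()` in the final dict is also an assumption.

```python
def first_suggested_mapping(data, raw_column):
    """Hypothetical sketch: return the top (table, field, confidence) tuple, or ()."""
    # data is assumed to map raw column -> {"mappings": [...]}, best suggestion first.
    mappings = data.get(raw_column, {}).get("mappings", [])
    return mappings[0] if mappings else ()


def final_mappings(data):
    """Hypothetical sketch: {raw_column: (table, field, confidence)} for every column."""
    return {raw_column: first_suggested_mapping(data, raw_column) for raw_column in data}
```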
-
resolve_duplicate(dup_map_field, raw_columns)
Resolve a field that more than one raw column mapped to.
Parameters: - dup_map_field – string, name of the field that is a duplicate
- raw_columns – list, raw columns that mapped to the same result
Returns: None
-
set_initial_mapping_cmp(raw_column)
Set the initial_mapping_cmp helper item in the self.data dict. It is used to detect duplicates.
Parameters: raw_column – string, name of the raw column for which to set initial_mapping_cmp Returns: None
-
seed.lib.mappings.mapping_columns.sort_duplicates(a, b)
Custom comparison function for sorting the duplicates hash; it decides which raw column gets the mapping suggestion.
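The `(a, b)` signature suggests an old-style comparison function. In Python 3, such a function is passed to `sorted` through `functools.cmp_to_key`. The sketch below pairs a hypothetical comparator with a resolve_duplicate-style step. It assumes that `a` and `b` are `(raw_column, confidence)` pairs, that higher confidence wins, that ties fall back to the column name, and that losing columns simply drop the duplicated field. None of these rules is confirmed by the docstrings.

```python
from functools import cmp_to_key


def sort_duplicates(a, b):
    """Hypothetical comparator: a and b are assumed to be (raw_column, confidence) pairs.

    Higher confidence sorts first; ties fall back to alphabetical raw column name.
    """
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return (a[0] > b[0]) - (a[0] < b[0])


def resolve_duplicate(data, dup_map_field, raw_columns):
    """Hypothetical sketch: only the winning raw column keeps dup_map_field.

    data is assumed to map raw column -> {"mappings": [(table, field, confidence), ...]}
    with the duplicated field as each column's first suggestion. Returns None.
    """
    candidates = [(col, data[col]["mappings"][0][2]) for col in raw_columns]
    winner = sorted(candidates, key=cmp_to_key(sort_duplicates))[0][0]
    for col in raw_columns:
        if col != winner:
            # Losers fall back to their next-best suggestion.
            data[col]["mappings"] = [m for m in data[col]["mappings"] if m[1] != dup_map_field]
```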
seed.lib.mappings.mapping_data module
-
class seed.lib.mappings.mapping_data.MappingData(exclude_fields=None)
Bases: object
New format for managing and looking up mapping data. It includes a more comprehensive set of data fields, with type and schema information.
It makes a dictionary of the column names and their respective types.
The MappingData data property contains the list of fields in the database, along with their table names.
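To illustrate what the data property might hold, here is a hypothetical sketch that builds a list of field entries and honors exclude_fields. The input as `(table, name, type)` tuples, the entry keys `table`, `name`, and `type`, and the function name `build_mapping_data` are all assumptions.

```python
def build_mapping_data(fields, exclude_fields=None):
    """Hypothetical sketch of a MappingData.data-style list.

    fields is assumed to be an iterable of (table, name, type) tuples; names
    listed in exclude_fields are skipped.
    """
    exclude = set(exclude_fields or [])
    return [
        {"table": table, "name": name, "type": type_}
        for table, name, type_ in fields
        if name not in exclude
    ]
```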
-
add_extra_data(columns)
Add in the unit types from a columns queryset.
Parameters: columns – list of columns from the Column table Returns: None
-
building_columns
Return the keys of the possible columns as a set.
Returns: set of keys
-
extra_data
List only the extra_data columns, that is, the columns that are not database fields.
Returns: set of keys of the extra_data columns
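The difference between building_columns and extra_data can be sketched as two set comprehensions over a hypothetical list of field entries. The `"extra_data"` flag on each entry is an assumption used to mark columns that are not database fields.

```python
def building_columns(data):
    """Hypothetical sketch: names of all possible columns, as a set."""
    return {field["name"] for field in data}


def extra_data(data):
    """Hypothetical sketch: names of columns that are not database fields.

    Each entry is assumed to carry an optional boolean "extra_data" flag.
    """
    return {field["name"] for field in data if field.get("extra_data")}
```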
-
find_column(table_name, column_name)
Find a column by its table name and column name.
Parameters: - table_name – name of the table to search
- column_name – name of the column to find within that table
Returns: None, or a dict of the found column
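Because the same column name can appear in more than one table, the lookup has to match on both names. Here is a minimal sketch over a hypothetical list of `{"table": ..., "name": ...}` entries (the entry shape is an assumption):

```python
def find_column(data, table_name, column_name):
    """Hypothetical sketch: return the first entry matching both names, else None."""
    return next(
        (f for f in data if f["table"] == table_name and f["name"] == column_name),
        None,
    )
```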
-
keys
Flatten the data set to a list of unique names, independent of the table.
Returns: list of keys
-
keys_with_table_names
Similar to keys, except it returns a list of tuples that include the table name.
Returns: list of tuples
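The contrast between keys and keys_with_table_names can be sketched as follows. It assumes hypothetical `{"table", "name"}` entries and a `(table, name)` tuple order, which is not confirmed by the docstring. `dict.fromkeys` deduplicates while keeping first-seen order.

```python
def keys(data):
    """Hypothetical sketch: unique field names, independent of table, in first-seen order."""
    return list(dict.fromkeys(field["name"] for field in data))


def keys_with_table_names(data):
    """Hypothetical sketch: (table, name) tuples; the tuple order is an assumption."""
    return [(field["table"], field["name"]) for field in data]
```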
-
normalize_mappable_type(in_str)
Normalize the data types for when the fields are communicated to JavaScript, ensuring that the data types are consistent.
Parameters: in_str – string to normalize Returns: normalized string with JavaScript data types
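The docstring does not list the actual type names. This sketch uses a hypothetical lookup table from Django-style field types to JavaScript-friendly names. Both the table entries and the fallback behavior (unknown types returned lowercased) are assumptions.

```python
# Hypothetical lookup; SEED's real type names and targets may differ.
TYPE_MAP = {
    "charfield": "string",
    "textfield": "string",
    "integerfield": "number",
    "floatfield": "number",
    "decimalfield": "number",
    "booleanfield": "boolean",
    "datefield": "date",
    "datetimefield": "datetime",
}


def normalize_mappable_type(in_str):
    """Hypothetical sketch: case-insensitive lookup; unknown types come back lowercased."""
    key = in_str.strip().lower()
    return TYPE_MAP.get(key, key)
```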
-
sort_data()
Sort the objects by table, then by name.
Returns: None; updates the member variable
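An in-place sort by table, then by name, can be written with a two-key `itemgetter`. This minimal sketch assumes the hypothetical `{"table", "name"}` entry shape. Like the documented method, it returns None.

```python
from operator import itemgetter


def sort_data(data):
    """Hypothetical sketch: sort entries in place by table, then by name."""
    data.sort(key=itemgetter("table", "name"))
```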
seed.lib.mappings.test_mapper module
seed.lib.mappings.test_mapping_columns module
Unit tests for map.py
-
class seed.lib.mappings.test_mapping_columns.TestMappingColumns(methodName='runTest')
Bases: django.test.testcases.TestCase
Test mapping data methods.
-
setUp()
-
test_add_mappings()
-
test_duplicate_fields_across_models()
Test to make sure that similar fields have different targets.
-
test_excluded_fields()
Test to make sure excluded fields are not mapped to.
-
test_first_suggested_mapping()
-
test_mapping_columns()
-
test_mapping_columns_with_threshold()
-
test_no_more_matches()
Test to make sure that similar fields have different targets.
-
test_sort_duplicates()
seed.lib.mappings.test_mapping_data module
Unit tests for map.py