Data Quality Package
Inheritance
Submodules
Models
-
exception
seed.models.data_quality.
ComparisonError
Bases:
exceptions.Exception
-
class
seed.models.data_quality.
DataQualityCheck
(*args, **kwargs) Bases:
django.db.models.base.Model
Object that stores the high-level, per-organization configuration of the data quality checks.
-
exception
DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
-
exception
DataQualityCheck.
MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
-
DataQualityCheck.
REQUIRED_FIELDS
= {'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'], 'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id']}
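As a rough illustration of how a mapping like REQUIRED_FIELDS can drive a missing-field check, the sketch below copies the dictionary verbatim and reports which listed fields are absent or blank in a row. The helper name missing_required_fields and the plain-dict row format are assumptions for illustration, and the sketch treats every listed field as individually required; whether SEED requires all of them or only one of them is not stated here.

```python
# Copied verbatim from DataQualityCheck.REQUIRED_FIELDS above.
REQUIRED_FIELDS = {
    'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'],
    'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id'],
}


def missing_required_fields(record_type, row):
    """Return the required fields that are absent, None, or blank in `row`.

    Hypothetical helper: `row` is assumed to be a plain dict of field -> value.
    """
    missing = []
    for field in REQUIRED_FIELDS[record_type]:
        value = row.get(field)
        # Treat None and whitespace-only strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


row = {'address_line_1': '123 Main St', 'custom_id_1': '', 'pm_property_id': 42}
print(missing_required_fields('PropertyState', row))  # ['custom_id_1']
```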
-
DataQualityCheck.
add_result_comparison_error
(row_id, rule, display_name, value, rule_check)
-
DataQualityCheck.
add_result_is_null
(row_id, rule, display_name, value)
-
DataQualityCheck.
add_result_max_error
(row_id, rule, display_name, value, rule_max)
-
DataQualityCheck.
add_result_min_error
(row_id, rule, display_name, value, rule_min)
-
DataQualityCheck.
add_result_missing_and_none
(row_id, rule, display_name, value)
-
DataQualityCheck.
add_result_missing_req
(row_id, rule, display_name, value)
-
DataQualityCheck.
add_result_string_error
(row_id, rule, display_name, value)
-
DataQualityCheck.
add_rule
(rule) Add a new rule to the data quality checks.
Parameters: rule – dict to be added as a new rule Returns: None
-
static
DataQualityCheck.
cache_key
(identifier) Static method that returns the location (cache key) of the data quality results in Redis.
Parameters: identifier – Import file primary key Returns: string, cache key
-
DataQualityCheck.
check_data
(record_type, rows) Check the given rows for data quality issues. The data are sent in as a queryset built from the Property/TaxLot IDs.
Parameters: - record_type – one of property/taxlot
- rows – rows of data to be checked for data quality
Returns: None
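The loop below sketches the general shape of a check like check_data: each row is tested against a list of rules, and failures are collected in a dict of dicts keyed by row ID (the in-class structure described under save_to_cache). The SimpleRule tuple, the 'data_quality_results' key, and the message strings are hypothetical stand-ins, not SEED's actual rule or result schema.

```python
from collections import namedtuple

# Hypothetical, simplified rule: a field name plus optional numeric bounds.
SimpleRule = namedtuple('SimpleRule', ['field', 'min', 'max'])


def check_rows(rows, rules):
    """Return {row_id: {'id': row_id, 'data_quality_results': [...]}} for failing rows."""
    results = {}
    for row in rows:
        failures = []
        for rule in rules:
            value = row.get(rule.field)
            if value is None:
                failures.append({'field': rule.field, 'message': 'is null'})
            elif rule.min is not None and value < rule.min:
                failures.append({'field': rule.field, 'message': 'below minimum'})
            elif rule.max is not None and value > rule.max:
                failures.append({'field': rule.field, 'message': 'above maximum'})
        # Only rows with at least one failure get an entry.
        if failures:
            results[row['id']] = {'id': row['id'], 'data_quality_results': failures}
    return results


rules = [SimpleRule('year_built', 1700, 2100)]
rows = [{'id': 1, 'year_built': 1650}, {'id': 2, 'year_built': 1990}, {'id': 3}]
print(check_rows(rows, rules))
```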
-
DataQualityCheck.
get_fieldnames
(record_type) Get the field names to apply to the results.
-
static
DataQualityCheck.
initialize_cache
(identifier) Initialize the cache for storing the results. This is called before the Celery tasks are split into chunks.
Parameters: identifier – Import file primary key Returns: string, cache key
-
DataQualityCheck.
initialize_rules
() Initialize the default rules for a DataQualityCheck object.
Returns: None
-
DataQualityCheck.
objects
= <django.db.models.manager.Manager object>
-
DataQualityCheck.
organization
Accessor to the related object on the forward side of a many-to-one or one-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
-
DataQualityCheck.
remove_all_rules
() Removes all the rules associated with this DataQualityCheck instance.
Returns: None
-
DataQualityCheck.
remove_status_label
(label_class, rule, linked_id) Remove the label because the record did not match any of the range exceptions.
Parameters: - label_class – statuslabel object, either property label or taxlot label
- rule – rule object
- linked_id – id of propertystate or taxlotstate object
Returns: boolean, whether the label was applied
-
DataQualityCheck.
reset_all_rules
() Delete all rules and reinitialize the default set of rules.
Returns: None
-
DataQualityCheck.
reset_default_rules
() Reset only the default rules.
Returns:
-
DataQualityCheck.
reset_results
()
-
classmethod
DataQualityCheck.
retrieve
(organization) DataQualityCheck was previously a plain object but has since been migrated to a Django model. This method keeps the data quality model backward compatible.
This is the preferred method to initialize a new object.
Parameters: organization – int or instance of Organization Returns: obj, DataQualityCheck
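The backward-compatible lookup can be pictured as "accept an int or an Organization, then fetch or create the per-organization check". The sketch below models that with plain Python classes and an in-memory registry standing in for the database table; Organization, DataQualityCheckSketch, and _registry are hypothetical names, and the real method may do more on creation (for example, initializing the default rules).

```python
class Organization:
    """Hypothetical stand-in for SEED's Organization model."""

    def __init__(self, id):
        self.id = id


class DataQualityCheckSketch:
    # org_id -> instance; stands in for the DataQualityCheck table.
    _registry = {}

    def __init__(self, organization_id):
        self.organization_id = organization_id

    @classmethod
    def retrieve(cls, organization):
        # Accept either a bare int ID or an Organization-like object with `id`.
        org_id = organization if isinstance(organization, int) else organization.id
        if org_id not in cls._registry:
            # Create on first use, similar in spirit to get_or_create().
            cls._registry[org_id] = cls(org_id)
        return cls._registry[org_id]


a = DataQualityCheckSketch.retrieve(7)
b = DataQualityCheckSketch.retrieve(Organization(7))
print(a is b)  # True
```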
-
DataQualityCheck.
retrieve_result_by_address
(address) Retrieve the results of the data quality checks for a specific address.
Parameters: address – string, address to find the result for Returns: dict, results of data quality check for specific building
-
DataQualityCheck.
retrieve_result_by_tax_lot_id
(tax_lot_id) Retrieve the results of the data quality checks by jurisdiction tax lot ID.
Parameters: tax_lot_id – string, jurisdiction tax lot id Returns: dict, results of data quality check for specific building
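Both lookups amount to scanning the stored results for a matching field. A minimal sketch, assuming the dict-of-dicts layout mentioned under save_to_cache and that each result carries the record's address_line_1 or jurisdiction_tax_lot_id; the helper name and the first-match behavior are assumptions, since the documentation does not say what happens when several records share an address.

```python
def retrieve_result_by_field(results, field, value):
    """Return the first result whose `field` equals `value`, or None.

    `results` is assumed to be a dict of dicts keyed by row ID.
    """
    for result in results.values():
        if result.get(field) == value:
            return result
    return None


results = {
    11: {'id': 11, 'address_line_1': '123 Main St', 'data_quality_results': []},
    12: {'id': 12, 'jurisdiction_tax_lot_id': 'TL-9', 'data_quality_results': []},
}
print(retrieve_result_by_field(results, 'address_line_1', '123 Main St')['id'])  # 11
```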
-
DataQualityCheck.
rules
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_reverse_many_to_one_manager() in django.db.models.fields.related_descriptors.
-
DataQualityCheck.
save_to_cache
(identifier) Save the results to the cache database. The cache stores the data as a list of dictionaries, whereas this class stores them as a dict of dicts. This matters because data read back from the cache cannot simply be loaded into the class's dict-of-dicts structure.
Parameters: identifier – Import file primary key Returns: None
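The note above means a conversion step is needed in each direction. A minimal sketch, assuming each result dict carries its own 'id' so the outer keys can be rebuilt after the list is read back; the function names are hypothetical and the actual Redis read/write is omitted.

```python
def to_cache_format(results):
    """Flatten the in-class dict of dicts into the list of dicts stored in the cache."""
    return list(results.values())


def from_cache_format(cached):
    """Rebuild a dict of dicts keyed by each entry's 'id'.

    The list form drops the outer keys, so this only works if every entry
    carries its own 'id' (an assumption of this sketch).
    """
    return {entry['id']: entry for entry in cached}


results = {
    5: {'id': 5, 'data_quality_results': []},
    9: {'id': 9, 'data_quality_results': [{'field': 'year_built'}]},
}
cached = to_cache_format(results)
print(from_cache_format(cached) == results)  # True
```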
-
DataQualityCheck.
update_status_label
(label_class, rule, linked_id) Parameters: - label_class – statuslabel object, either property label or taxlot label
- rule – rule object
- linked_id – id of propertystate or taxlotstate object
Returns: boolean, whether the label was applied
-
class
seed.models.data_quality.
Rule
(*args, **kwargs) Bases:
django.db.models.base.Model
Rules for DataQualityCheck
-
exception
DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
-
exception
Rule.
MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
-
Rule.
data_quality_check
Accessor to the related object on the forward side of a many-to-one or one-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
-
Rule.
format_strings
(value)
-
Rule.
get_data_type_display
(*moreargs, **morekwargs)
-
Rule.
get_rule_type_display
(*moreargs, **morekwargs)
-
Rule.
get_severity_display
(*moreargs, **morekwargs)
-
Rule.
maximum_valid
(value) Validate that the value is not greater than the maximum specified by the rule.
Parameters: value – Value to validate rule against Returns: bool, True if valid, False if the value is out of range
-
Rule.
minimum_valid
(value) Validate that the value is not less than the minimum specified by the rule.
Parameters: value – Value to validate rule against Returns: bool, True if valid, False if the value is out of range
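The two range checks described above can be sketched as follows. Both bounds are treated as inclusive, matching "not greater than" and "not less than"; treating a missing bound (None) as "no limit" is an assumption of this sketch, and RangeRule is a hypothetical stand-in for Rule.

```python
class RangeRule:
    """Hypothetical stand-in for Rule with optional numeric bounds."""

    def __init__(self, minimum=None, maximum=None):
        self.minimum = minimum
        self.maximum = maximum

    def minimum_valid(self, value):
        # Valid when no minimum is set or the value is not below it.
        return self.minimum is None or value >= self.minimum

    def maximum_valid(self, value):
        # Valid when no maximum is set or the value is not above it.
        return self.maximum is None or value <= self.maximum


rule = RangeRule(minimum=0, maximum=100)
print(rule.minimum_valid(0), rule.maximum_valid(101))  # True False
```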
-
Rule.
objects
= <django.db.models.manager.Manager object>
-
Rule.
status_label
Accessor to the related object on the forward side of a many-to-one or one-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
-
Rule.
str_to_data_type
(value) Values that come from a database field are already typed correctly; values from extra_data, however, are typically strings or unicode. Therefore, each value is converted using the rule’s data type definition before it is checked.
Parameters: value – variant, value to type Returns: typed value
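A sketch of the typing step, assuming string labels 'number', 'year', 'date', and 'string' for the rule's data type and an ISO YYYY-MM-DD date format; these labels, the comma stripping, and the blank-to-None rule are assumptions, not SEED's actual constants or parsing rules. Non-string values pass through unchanged because database fields are already typed; unparseable strings raise ValueError here, and how SEED handles them is not described above.

```python
from datetime import datetime


def str_to_data_type(value, data_type):
    """Convert a string (e.g. from extra_data) to the rule's data type.

    `data_type` is a hypothetical string label.
    """
    if not isinstance(value, str):
        return value  # already typed, e.g. a real database field
    value = value.strip()
    if value == '':
        return None
    if data_type == 'number':
        return float(value.replace(',', ''))  # allow thousands separators
    if data_type == 'year':
        return int(value)
    if data_type == 'date':
        return datetime.strptime(value, '%Y-%m-%d').date()
    return value  # 'string' and anything else: leave as text


print(str_to_data_type('1,234.5', 'number'))  # 1234.5
```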
-
Rule.
valid_text
(value) Validate that the value matches the text specified by the rule. Text is matched by regular expression.
Parameters: value – Value to validate rule against Returns: bool, True if valid, False if the value does not match
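A minimal sketch of regex-based text validation using Python's re module. Whether SEED anchors the pattern (re.match or re.fullmatch) or searches anywhere in the text (re.search), and whether matching is case-sensitive, is not stated above; this sketch uses re.search, so anchors must be written into the pattern when a full match is wanted. Treating None as invalid is also an assumption.

```python
import re


def valid_text(pattern, value):
    """Return True if the regex `pattern` is found anywhere in `value`."""
    if value is None:
        return False
    return re.search(pattern, str(value)) is not None


print(valid_text(r'^\d{5}$', '94103'))  # True
```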
-
Tests
Views
-
class
seed.views.data_quality.
DataQualityViews
(**kwargs) Bases:
rest_framework.viewsets.ViewSet
Handles data quality API operations within the Inventory backend: (1) post, wait, get… (2) respond with what changed.
-
authentication_classes
= (<class 'rest_framework.authentication.SessionAuthentication'>, <class 'seed.authentication.SEEDAuthentication'>)
-
create
(request) This API endpoint starts a new cleansing (data quality) process in the background, optionally on a subset of properties/tax lots, and returns a query key.
parameters:
- name: organization_id description: Organization ID type: integer required: true paramType: query
- name: data_quality_ids description: An object containing the IDs of the records to run the data quality checks on. It should contain two keys, property_state_ids and taxlot_state_ids, each of which is an array of the corresponding IDs. required: true paramType: body
type:
- status: type: string description: success or error required: true
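A client-side sketch of the create call using only the standard library: it builds, but does not send, a POST whose query string carries organization_id and whose JSON body carries the two ID arrays. The endpoint URL is a hypothetical placeholder, the assumption that the body is the data_quality_ids object itself (rather than nested under that key) follows the usual django-rest-swagger meaning of a named body parameter, and authentication headers are omitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; check your SEED deployment's API routes.
ENDPOINT = 'https://seed.example.com/api/data_quality_checks/'


def build_create_request(organization_id, property_state_ids, taxlot_state_ids):
    """Build (but do not send) a POST request for the `create` endpoint."""
    url = ENDPOINT + '?' + urlencode({'organization_id': organization_id})
    body = {
        'property_state_ids': list(property_state_ids),
        'taxlot_state_ids': list(taxlot_state_ids),
    }
    return Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_create_request(1, [10, 11], [])
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to urllib.request.urlopen with whatever authentication the server expects.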
-
data_quality_rules
(request, *args, **kwargs) Returns the data quality rules for an organization.
parameters:
- name: organization_id description: Organization ID type: integer required: true paramType: query
type:
- status: type: string required: true description: success or error
- rules: type: object required: true description: An object containing ‘properties’ and ‘taxlots’ arrays of rules
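A sketch of consuming this response on the client, based only on the status and rules fields documented above: check the status, then split the rules object into its properties and taxlots arrays. The 'message' key read on error and the example rule contents are assumptions.

```python
import json


def split_rules(response_text):
    """Return (property_rules, taxlot_rules) from a data_quality_rules response.

    Raises ValueError when the response does not report success.
    """
    payload = json.loads(response_text)
    if payload.get('status') != 'success':
        # 'message' is an assumed key; the documented response only has status and rules.
        raise ValueError('data_quality_rules failed: %s' % payload.get('message', 'unknown error'))
    rules = payload['rules']
    return rules.get('properties', []), rules.get('taxlots', [])


text = json.dumps({'status': 'success',
                   'rules': {'properties': [{'field': 'year_built'}], 'taxlots': []}})
props, lots = split_rules(text)
print(len(props), len(lots))  # 1 0
```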
-
reset_all_data_quality_rules
(request, *args, **kwargs) Resets all of an organization’s data quality rules.
parameters:
- name: organization_id description: Organization ID type: integer required: true paramType: query
type:
- status: type: string description: success or error required: true
- in_range_checking: type: array[string] required: true description: An array of in-range error rules
- missing_matching_field: type: array[string] required: true description: An array of fields whose existence is verified
- missing_values: type: array[string] required: true description: An array of fields for which missing values are ignored
-
reset_default_data_quality_rules
(request, *args, **kwargs) Resets an organization’s default data quality rules.
parameters:
- name: organization_id description: Organization ID type: integer required: true paramType: query
type:
- status: type: string description: success or error required: true
- in_range_checking: type: array[string] required: true description: An array of in-range error rules
- missing_matching_field: type: array[string] required: true description: An array of fields whose existence is verified
- missing_values: type: array[string] required: true description: An array of fields for which missing values are ignored
-
save_data_quality_rules
(request, *args, **kwargs) Saves an organization’s settings: name, query threshold, shared fields. Because the request passes in all the fields again, it is acceptable to remove all the rules from the database and simply recreate them (albeit inefficiently).
parameter_strategy: replace
parameters:
- name: organization_id description: Organization ID type: integer required: true paramType: query
- name: body description: JSON body containing organization rules information paramType: body pytype: RulesSerializer required: true
type:
- status: type: string description: success or error required: true
- message: type: string description: error message, if any required: true
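The "delete everything, then recreate" strategy described above works because the request carries the complete rule set, so nothing that should survive is lost. A minimal sketch with an in-memory list standing in for the rules table; the row layout, the org_id column, and the counter used for new IDs are hypothetical (a real database would assign IDs itself).

```python
from itertools import count

# Hypothetical ID source; a real database would assign primary keys.
_next_id = count(1)


def replace_rules(table, org_id, submitted_rules):
    """Replace every rule belonging to `org_id` with `submitted_rules`."""
    # Delete every existing rule that belongs to this organization...
    table[:] = [row for row in table if row['org_id'] != org_id]
    # ...then recreate them all from the submitted payload, with fresh IDs.
    for rule in submitted_rules:
        table.append(dict(rule, id=next(_next_id), org_id=org_id))
    return table


table = [{'id': 100, 'org_id': 1, 'field': 'old'},
         {'id': 101, 'org_id': 2, 'field': 'keep'}]
replace_rules(table, 1, [{'field': 'a'}, {'field': 'b'}])
print([row['field'] for row in table])  # ['keep', 'a', 'b']
```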
-
suffix
= None
-
class
seed.views.data_quality.
RulesIntermediateSerializer
(instance=None, data=<class rest_framework.fields.empty>, **kwargs) Bases:
rest_framework.serializers.Serializer
-
class
seed.views.data_quality.
RulesSerializer
(instance=None, data=<class rest_framework.fields.empty>, **kwargs) Bases:
rest_framework.serializers.Serializer
-
class
seed.views.data_quality.
RulesSubSerializer
(instance=None, data=<class rest_framework.fields.empty>, **kwargs) Bases:
rest_framework.serializers.Serializer
-
class
seed.views.data_quality.
RulesSubSerializerB
(instance=None, data=<class rest_framework.fields.empty>, **kwargs) Bases:
rest_framework.serializers.Serializer