Data Quality Package

Inheritance

Submodules

Models

exception seed.models.data_quality.ComparisonError

Bases: exceptions.Exception

class seed.models.data_quality.DataQualityCheck(*args, **kwargs)

Bases: django.db.models.base.Model

Object that stores the high-level DataQualityCheck configuration for each organization.

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

REQUIRED_FIELDS = {'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'], 'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id']}
add_result_comparison_error(row_id, rule, display_name, value, rule_check)
add_result_is_null(row_id, rule, display_name, value)
add_result_max_error(row_id, rule, display_name, value, rule_max)
add_result_min_error(row_id, rule, display_name, value, rule_min)
add_result_missing_and_none(row_id, rule, display_name, value)
add_result_missing_req(row_id, rule, display_name, value)
add_result_string_error(row_id, rule, display_name, value)
add_rule(rule)

Add a new rule to the Data Quality Checks

Parameters:rule – dict to be added as a new rule
Returns:None
static cache_key(identifier)

Static method that returns the location of the data_quality results in redis.

Parameters:identifier – Import file primary key
Returns:
check_data(record_type, rows)

Send in data as a queryset from the Property/Taxlot ids.

Parameters:
  • record_type – one of PropertyState | TaxLotState
  • rows – rows of data to be checked for data quality
Returns:

None

get_fieldnames(record_type)

Get fieldnames to apply to results.

static initialize_cache(identifier)

Initialize the cache for storing the results. This is called before the celery tasks are chunked up.

Parameters:identifier – Import file primary key
Returns:string, cache key
initialize_rules()

Initialize the default rules for a DataQualityCheck object

Returns:None
objects = <django.db.models.manager.Manager object>
organization

Accessor to the related object on the forward side of a many-to-one or one-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

child.parent is a ForwardManyToOneDescriptor instance.

remove_all_rules()

Removes all the rules associated with this DataQualityCheck instance.

Returns:None
remove_status_label(label_class, rule, linked_id)

Remove label because it did not match any of the range exceptions

Parameters:
  • label_class – statuslabel object, either property label or taxlot label
  • rule – rule object
  • linked_id – id of propertystate or taxlotstate object
Returns:

boolean, whether the label was applied

reset_all_rules()

Delete all rules and reinitialize the default set of rules

Returns:None
reset_default_rules()

Reset only the default rules

Returns:
reset_results()
classmethod retrieve(organization)

DataQualityCheck was previously a simple object but has been migrated to a Django model. This method keeps the data quality model backwards compatible.

This is the preferred method to initialize a new object.

Parameters:organization – int or instance of Organization
Returns:obj, DataQualityCheck
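
A rough sketch of the calling pattern that retrieve supports, accepting either an int or an Organization instance. The classes below are hypothetical stand-ins rather than the Django models, and the reuse-or-create behaviour is an assumption about how a per-organization object might be looked up.

```python
class Organization:
    """Hypothetical stand-in for the Organization model."""

    def __init__(self, pk):
        self.pk = pk


class DataQualityCheckSketch:
    """Hypothetical stand-in that keeps one instance per organization id."""

    _by_org = {}

    def __init__(self, organization_id):
        self.organization_id = organization_id

    @classmethod
    def retrieve(cls, organization):
        # Accept either a primary key or an Organization-like instance.
        org_id = organization if isinstance(organization, int) else organization.pk
        # Reuse the existing object for this organization, or create it once (assumed).
        if org_id not in cls._by_org:
            cls._by_org[org_id] = cls(org_id)
        return cls._by_org[org_id]


a = DataQualityCheckSketch.retrieve(1)
b = DataQualityCheckSketch.retrieve(Organization(pk=1))
print(a is b)
```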
retrieve_result_by_address(address)

Retrieve the results of the data quality checks for a specific address.

Parameters:address – string, address to find the result for
Returns:dict, results of data quality check for specific building
retrieve_result_by_tax_lot_id(tax_lot_id)

Retrieve the results of the data quality checks by the jurisdiction ID.

Parameters:tax_lot_id – string, jurisdiction tax lot id
Returns:dict, results of data quality check for specific building
rules

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

save_to_cache(identifier)

Save the results to the cache database. The data in the cache are stored as a list of dictionaries, whereas the data in this class are stored as a dict of dicts. This is important to remember because data read from the cache cannot simply be loaded back into this class's structure.

Parameters:identifier – Import file primary key
Returns:None
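
The difference between the two shapes can be illustrated with a small sketch. The "id" key used to carry the row id into the list form, and the data_quality_results field, are assumptions chosen for the example; the actual cached layout is not specified here.

```python
def results_to_cache_rows(results):
    """Flatten {row_id: {...}} into [{"id": row_id, ...}, ...] (the "id" key is assumed)."""
    rows = []
    for row_id, result in results.items():
        row = {"id": row_id}
        row.update(result)
        rows.append(row)
    return rows


def cache_rows_to_results(rows):
    """Rebuild the dict-of-dicts shape from the cached list."""
    return {
        row["id"]: {k: v for k, v in row.items() if k != "id"}
        for row in rows
    }


# Hypothetical in-memory results keyed by row id.
results = {
    10: {"address_line_1": "1 Main St", "data_quality_results": []},
    11: {"address_line_1": "2 Main St", "data_quality_results": []},
}
rows = results_to_cache_rows(results)
print(rows)
```

Reading from the cache therefore needs an explicit conversion step such as cache_rows_to_results before the data can be used in the dict-of-dicts form.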
update_status_label(label_class, rule, linked_id)
Parameters:
  • label_class – statuslabel object, either property label or taxlot label
  • rule – rule object
  • linked_id – id of propertystate or taxlotstate object
Returns:

boolean, whether the label was applied

class seed.models.data_quality.Rule(*args, **kwargs)

Bases: django.db.models.base.Model

Rules for DataQualityCheck

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

data_quality_check

Accessor to the related object on the forward side of a many-to-one or one-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

child.parent is a ForwardManyToOneDescriptor instance.

format_strings(value)
get_data_type_display(*moreargs, **morekwargs)
get_rule_type_display(*moreargs, **morekwargs)
get_severity_display(*moreargs, **morekwargs)
maximum_valid(value)

Validate that the value is not greater than the maximum specified by the rule.

Parameters:value – Value to validate rule against
Returns:bool, True if valid, False if the value is out of range
minimum_valid(value)

Validate that the value is not less than the minimum specified by the rule.

Parameters:value – Value to validate rule against
Returns:bool, True if valid, False if the value is out of range
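
The two range checks, maximum_valid and minimum_valid, can be sketched as inclusive comparisons. The class below is a hypothetical stand-in for Rule, and treating a missing bound as "always valid" is an assumption.

```python
class RangeRuleSketch:
    """Hypothetical stand-in for a Rule with optional min/max bounds."""

    def __init__(self, minimum=None, maximum=None):
        self.min = minimum
        self.max = maximum

    def minimum_valid(self, value):
        # Valid when the value is not less than the minimum; no bound means valid (assumed).
        return self.min is None or value >= self.min

    def maximum_valid(self, value):
        # Valid when the value is not greater than the maximum; no bound means valid (assumed).
        return self.max is None or value <= self.max


rule = RangeRuleSketch(minimum=0, maximum=100)
print(rule.minimum_valid(-1), rule.maximum_valid(100))
```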
objects = <django.db.models.manager.Manager object>
status_label

Accessor to the related object on the forward side of a many-to-one or one-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

child.parent is a ForwardManyToOneDescriptor instance.

str_to_data_type(value)

If the check is coming from a field in the database then it will be typed correctly; however, for extra_data, the values are typically strings or unicode. Therefore, the values are typed before they are checked using the rule’s data type definition.

Parameters:value – variant, value to type
Returns:typed value
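
A minimal sketch of typing a value before it is checked. The type names and the date format are hypothetical, and the function takes the data type as an explicit second argument only to stay self-contained, whereas the method above reads it from the rule.

```python
from datetime import datetime

# Hypothetical type names; the real rule data types are not listed in this reference.
CONVERTERS = {
    "number": float,
    "string": str,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}


def str_to_data_type(value, data_type):
    """Coerce a string to the given type; values that are already typed pass through."""
    if not isinstance(value, str):
        return value
    return CONVERTERS[data_type](value.strip())


print(str_to_data_type(" 12.5 ", "number"))
```

Typing first means that a range check compares numbers with numbers instead of comparing a string such as "9" with a numeric bound.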
valid_text(value)

Validate that the value matches the text specified by the rule. Text is matched by regex.

Parameters:value – Value to validate rule against
Returns:bool, True if valid, False if the value does not match
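
Regex-based text validation can be sketched as follows. The use of re.search, the explicit pattern argument, and the choice to treat None as a non-match are all assumptions for illustration.

```python
import re


def valid_text(pattern, value):
    """Return True when the value matches the pattern (re.search is an assumption)."""
    if value is None:
        # Treating a missing value as a non-match is an assumption.
        return False
    return re.search(pattern, str(value)) is not None


print(valid_text(r"^\d{5}$", "80401"))
```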

Tests

Views

class seed.views.data_quality.DataQualityViews(**kwargs)

Bases: rest_framework.viewsets.ViewSet

Handles Data Quality API operations within Inventory backend. (1) Post, wait, get… (2) Respond with what changed

authentication_classes = (<class 'rest_framework.authentication.SessionAuthentication'>, <class 'seed.authentication.SEEDAuthentication'>)
create(request)

This API endpoint creates a new cleansing operation process in the background, on potentially a subset of properties/taxlots, and returns a query key.

Parameters:
  • organization_id – Organization ID (type: integer, required: true, paramType: query)
  • data_quality_ids – An object containing IDs of the records to perform data quality checks on. Should contain two keys, property_state_ids and taxlot_state_ids, each of which is an array of appropriate IDs (required: true, paramType: body)
Returns:
  • status – string, success or error (required: true)
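
A hedged sketch of how a client might assemble the inputs for create. The helper name is hypothetical, and it assumes the request body is itself the object holding property_state_ids and taxlot_state_ids; the endpoint URL and the sending of the request are deliberately left out.

```python
import json


def build_data_quality_request(organization_id, property_state_ids, taxlot_state_ids):
    """Return (query_params, json_body) for the create operation (body layout is assumed)."""
    query = {"organization_id": organization_id}
    body = {
        "property_state_ids": list(property_state_ids),
        "taxlot_state_ids": list(taxlot_state_ids),
    }
    return query, json.dumps(body)


query, body = build_data_quality_request(1, [101, 102], [201])
print(query, body)
```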
data_quality_rules(request, *args, **kwargs)

Returns the data_quality rules for an org.

Parameters:
  • organization_id – Organization ID (type: integer, required: true, paramType: query)
Returns:
  • status – string, success or error (required: true)
  • rules – object, an object containing 'properties' and 'taxlots' arrays of rules (required: true)
reset_all_data_quality_rules(request, *args, **kwargs)

Resets all of an organization’s data_quality rules.

Parameters:
  • organization_id – Organization ID (type: integer, required: true, paramType: query)
Returns:
  • status – string, success or error (required: true)
  • in_range_checking – array[string], an array of in-range error rules (required: true)
  • missing_matching_field – array[string], an array of fields to verify existence (required: true)
  • missing_values – array[string], an array of fields to ignore missing values (required: true)
reset_default_data_quality_rules(request, *args, **kwargs)

Resets an organization’s default data_quality rules.

Parameters:
  • organization_id – Organization ID (type: integer, required: true, paramType: query)
Returns:
  • status – string, success or error (required: true)
  • in_range_checking – array[string], an array of in-range error rules (required: true)
  • missing_matching_field – array[string], an array of fields to verify existence (required: true)
  • missing_values – array[string], an array of fields to ignore missing values (required: true)
save_data_quality_rules(request, *args, **kwargs)

Saves an organization’s settings: name, query threshold, shared fields. The request passes in every field again, so it is acceptable, although inefficient, to delete all of the rules in the database and recreate them.

parameter_strategy: replace

Parameters:
  • organization_id – Organization ID (type: integer, required: true, paramType: query)
  • body – JSON body containing organization rules information (paramType: body, pytype: RulesSerializer, required: true)
Returns:
  • status – string, success or error (required: true)
  • message – string, error message, if any (required: true)
suffix = None
class seed.views.data_quality.RulesIntermediateSerializer(instance=None, data=<class rest_framework.fields.empty>, **kwargs)

Bases: rest_framework.serializers.Serializer

class seed.views.data_quality.RulesSerializer(instance=None, data=<class rest_framework.fields.empty>, **kwargs)

Bases: rest_framework.serializers.Serializer

class seed.views.data_quality.RulesSubSerializer(instance=None, data=<class rest_framework.fields.empty>, **kwargs)

Bases: rest_framework.serializers.Serializer

class seed.views.data_quality.RulesSubSerializerB(instance=None, data=<class rest_framework.fields.empty>, **kwargs)

Bases: rest_framework.serializers.Serializer