Data Studio Dataset Versioning and Best Practices

Overview

Data Studio datasets may become outdated for various reasons (e.g. change in schema).  As client integrations have dependencies on these datasets, PrecisionLender maintains a dataset version and deprecation process.

 

In this Article

 

Dataset Versioning 

  • Version definition: v{major}.{minor}  -  for example, v2.1 is major version 2 and minor version 1
  • Major version changes signify breaking changes - changes that make the dataset non-backwards compatible (or non-resolvable over the entire time series).  These may require changes to client processes since they may include removal of fields, renamed fields, reordering of fields, or changes to the data type or meaning of fields.
  • Minor version changes signify non-breaking changes.  These typically do not require changes to client processes.  A common minor version update is the addition of a field to the end of the dataset.

When there is a version update (major or minor), there is the possibility for the old version to continue populating for a period of time while the new version is already active.  This means there can be duplicate entries across versions.  Typically, this overlap is expected for major version changes but not common for minor version changes.

For some major version updates, PrecisionLender will backfill the entire time series with the updated data.  This makes it possible to work with the entire time series without having to handle differences in the dataset across time.  When we backfill the time series in this manner, we'll communicate that situation directly and via Release Notes.

 

Best Practices 

  • Specify your major version in your code, and either only read in that major version or filter to that major version after loading all versions.  By being explicit about the major version, your code won't start reading in a new major version without your awareness- this would be undesirable as new major versions introduce breaking changes.  Instead, when there is a new major version, you should make any necessary code changes and then manually update the major version you're loading.

  • Load all minor versions for your specified major version, and filter for the latest minor version for each set of unique IDs for that dataset.  For example, if AccountId and DatePartition are the fields that uniquely identify records in a dataset, you can filter for the latest minor version for each combination of (AccountId, DatePartition) - this ensures you won't have duplicates and you'll be working with the latest data.