QUA-997: Refactor "Container Userguide" as per your suggestions. #880
base: main
Changes from all commits
54e088b
513714f
6f92f98
1d88bf6
4e61ce3
fa2cd6b
0f02305
c1f3715
15d5784
|
Contributor
I found the following issues:
Overall note: The document is well-written and clear. The only issue is a missing article in one sentence. The rest of the content is grammatically correct and properly formatted.
Collaborator
Author
@RafaelOsiro skip this one |
|
Contributor
Looks like we are not using this screenshot here. Let's check and then remove (if necessary)
Collaborator
Author
@RafaelOsiro Done. This one is in use; the other one has been deleted. |
|
Contributor
In general, this page could have links that redirect users to pages where they can perform actions. If the content is about adding a computed field, then it could have a link to the page explaining how to add it. And so on.
Collaborator
Author
@RafaelOsiro done
Contributor
I found the following issues:
Overall consistency note: The tables have inconsistent spacing in the "Options" column with extra spaces before the text. Consider standardizing the spacing across all table rows for better readability and consistency.
Collaborator
Author
@RafaelOsiro I left out some suggestions that do not make sense |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| # Actions on Container | ||
|
|
||
| Users can perform various operations on containers to manage datasets effectively. The actions are divided into three main sections: **Settings**, **Add**, and **Run**. Each section contains specific options to perform different tasks. | ||
|
|
||
|  | ||
|
|
||
| ## Settings | ||
|
|
||
| The **Settings** button allows users to configure the container. By clicking on the **Settings** button, users can access the following options: | ||
|
|
||
|  | ||
|
|
||
| | No | Options | Description | | ||
| | :---- | :---- | :---- | | ||
| | **1.** | Settings | Configure incremental strategy, partitioning fields, and exclude specific fields from analysis. | | ||
| | **2.** | Score | Allows you to adjust the decay period and factor weights for metrics like completeness, accuracy, and consistency.<br>**Note:** To understand how each score metric works in detail (completeness, accuracy, consistency, decay period, and weights), please refer to the [Quality Score Page](../quality-scores/what-are-quality-scores.md){target="_blank"}.<br>Score settings modified here apply **only to this container** and do not affect any other container in the datastore. | | ||
| | **3.** | Observability | Enables or disables tracking for data volume and freshness.<br>**Volume Tracking:** Monitors daily volume metrics to identify trends and detect anomalies over time.<br>**Freshness Tracking:** Records the last update timestamp to ensure data timeliness and detect pipeline delays. | | ||
| | **4.** | Migrate | Migrate authored quality checks from one container to another (even across datastores) to quickly reuse, standardize, and avoid recreating rules. | | ||
| | **5.** | Export | Export quality checks, field profiles, and anomalies to an enrichment datastore for further action or analysis. | | ||
| | **6.** | Materialize | Captures snapshots of data from a source datastore and exports them to an enrichment datastore for faster access and analysis. | | ||
| | **7.** | Delete | Delete the selected container from the system. | | ||
|
|
||
| ## Add | ||
|
|
||
| The **Add** button allows users to add checks or computed fields. By clicking on the **Add** button, users can access the following options: | ||
|
|
||
|  | ||
|
|
||
| | No. | Options | Description | | ||
| | :---- | :---- | :---- | | ||
| | **1.** | Checks | Checks allow you to add new checks or validation rules for the container.<br>**Note:** To learn how to add checks, refer to the [Check Templates documentation](../checks/checks-template.md){target="_blank"}.| | ||
| | **2.** | Computed Field | Allows you to add a computed field.<br>**Note:** To learn how to create a computed field, refer to the [Computed Field Guide](../container/computed-fields/add-computed-fields.md){target="_blank"}.| | ||
|
|
||
| ## Run | ||
|
|
||
| The **Run** button provides options to execute operations on datasets, such as profiling, scanning, and external scans. By clicking on the **Run** button, users can access the following options: | ||
|
|
||
|  | ||
|
|
||
| | No. | Options | Description | | ||
| | :---- | :---- | :---- | | ||
| | **1.** | Profile | **Profile** allows you to run a profiling operation to analyze the data structure, gather metadata, set thresholds, and define record limits for comprehensive dataset profiling.<br>**Note:** For details on the profile operation, please refer to the [Profile Operation documentation](../source-datastore/profile.md){target="_blank"}. | | ||
| | **2.** | Scan | **Scan** allows you to perform data quality checks, configure scan strategies, and detect anomalies in the dataset.<br>**Note:** For details on the scan operation, please refer to the [Scan Operation documentation](../source-datastore/scan.md){target="_blank"}. | | ||
| | **3.** | External Scan | **External Scan** allows you to upload a file and validate its data against predefined checks in the selected table.<br>**Note:** For details on the external scan, please refer to the [External Scan documentation](../source-datastore/external-scan.md){target="_blank"}. | |
|
Contributor
I found the following issues:
Overall consistency note: The main issues are periods after numbers in the table and inconsistent table header naming. Consider standardizing to "DESCRIPTION" or "ACTIONS" (plural) for the third column header to match other documentation files.
Collaborator
Author
@RafaelOsiro Done; I skipped the ones that do not make sense. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| # Add Computed Fields | ||
|
|
||
| **Step 1:** Log in to your Qualytics account, navigate to the side menu, and select the **source datastore** where you want to create a computed field. | ||
|
|
||
|  | ||
|
|
||
| **Step 2:** Select the **Container** within the chosen datastore where you want to create the computed field. This container holds the data to which the new computed field will be applied, enabling you to enhance your data analysis within that specific datastore. | ||
|
|
||
| For demonstration purposes, we have selected the **Bank Dataset-Staging** source datastore and the **bank_transactions_.csv** container within it to create a computed field. | ||
|
|
||
|  | ||
|
|
||
| **Step 3:** After selecting the container, click on the **Add** button and select **Computed Field** from the dropdown menu to create a new computed field. | ||
|
|
||
|  | ||
|
|
||
| A modal window will appear, allowing you to enter the details for your computed field. | ||
|
|
||
|  | ||
|
|
||
| **Step 4:** Enter the **Name** for the computed field, select **Transformation Type** from the dropdown menu, and optionally add **Additional Metadata**. | ||
|
|
||
| | REF. | FIELDS | ACTION | | ||
| |------|--------|--------| | ||
| | 1 | Field Name (Required) | Add a unique name for your computed field. | | ||
| | 2 | Transformation Type (Required) | Select the type of transformation you want to apply from the available options. | | ||
| | 3 | Additional Metadata (Optional) | Enhance the computed field definition by setting custom metadata. Click the plus icon **(+)** to open the metadata input form and add key-value pairs. | | ||
|
|
||
|  | ||
|
|
||
| !!! info | ||
| Transformations are changes made to data, like converting formats, doing calculations, or cleaning up fields. In Qualytics, you can use transformations to meet specific needs, such as cleaning entity names, converting formatted numbers, or applying custom expressions. With various transformation types available, Qualytics enables you to customize your data directly within the platform, ensuring it’s accurate and ready for analysis. | ||
|
|
||
| | Transformation Types | Purpose | Reference | | ||
| |------|--------|---------| | ||
| | Cleaned Entity Name | Removes business signifiers (such as 'Inc.' or 'Corp') from an entity name. | For more information, please refer to the [cleaned entity name section](../computed-fields/transformation-types.md#cleaned-entity-name){target="_blank"}. | | ||
| | Convert Formatted Numeric | Removes formatting (such as parentheses for denoting negatives or commas as delimiters) from values that represent numeric data, converting them into a numerically typed field. | For more information, please refer to the [convert formatted numeric section](../computed-fields/transformation-types.md#convert-formatted-numeric){target="_blank"}. | | ||
| | Custom Expression | Allows you to create a new field by applying any valid Spark SQL expression to one or more existing fields. | For more information, please refer to the [custom expression section](../computed-fields/transformation-types.md#custom-expression){target="_blank"}. | | ||
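The first two transformation types in the table can be illustrated with a minimal Python sketch. This is not how Qualytics implements them: the function names, the `SIGNIFIERS` list, and the matching rules are all hypothetical and exist only to show the kind of input and output each transformation describes. Custom Expression is omitted because it depends on a Spark SQL runtime.

```python
import re

# Hypothetical list of business signifiers, for illustration only.
# The actual list used by the platform is not documented here.
SIGNIFIERS = ("inc", "corp", "llc", "ltd", "co")


def cleaned_entity_name(name: str) -> str:
    """Strip trailing business signifiers such as 'Inc.' or 'Corp'."""
    pattern = r"[,\s]+(?:" + "|".join(SIGNIFIERS) + r")\.?$"
    cleaned = name.strip()
    # Repeat so that stacked suffixes ("Acme Co., Inc.") are all removed.
    while True:
        stripped = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
        if stripped == cleaned:
            return cleaned
        cleaned = stripped


def convert_formatted_numeric(value: str) -> float:
    """Remove comma delimiters and treat surrounding parentheses as a negative sign."""
    text = value.strip()
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    number = float(text.replace(",", ""))
    return -number if negative else number
```

Under these assumptions, `cleaned_entity_name("Acme Inc.")` yields `"Acme"` and `convert_formatted_numeric("(1,234.50)")` yields `-1234.5`.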
|
|
||
|  | ||
|
|
||
| **Step 5:** After selecting the appropriate **Transformation Type**, click the **Save** button. | ||
|
|
||
|  | ||
|
|
||
| **Step 6:** After you click the **Save** button, the computed field is created and a success flash message appears stating **The computed field has been successfully created**. | ||
|
|
||
|  | ||
|
|
||
| You can find your computed field by clicking on the dropdown arrow next to the container you selected when creating the computed field. | ||
|
|
||
|  | ||
|
Contributor
This image looks strange. In the tree view, the computed fields icon doesn't look right.
Collaborator
Author
@RafaelOsiro, this icon indicates a field created using expressions or formulas, so it is correct and does not need to change.
Contributor
@iammuze we need to change this screenshot. It may cause confusion to users and customers.
|
||
|
Contributor
I found the following issues:
Overall consistency note: The numbered items in the Totals section use periods after the numbers (1., 2., 3., etc.), which should be standardized across the documentation to match the preferred format without periods. The table format in the Profile section is correct without periods.
Collaborator
Author
@RafaelOsiro Done; I skipped the ones that do not make sense. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| # Computed Fields Details | ||
|
|
||
| Computed Field Details provides a quick overview of the metrics generated from a computed field. The **Totals** section summarizes the results produced by this computed field and displays a report that reflects only the data output of this specific computed field. | ||
|
|
||
| ### Totals | ||
|
Contributor
Below the title, you could include a brief explanation of what Totals does. And remember that it shows the report for that computed field.
Collaborator
Author
@RafaelOsiro done |
||
|
|
||
| **1 Quality Score**: This provides a comprehensive assessment of the overall health of the data, factoring in multiple checks for accuracy, consistency, and completeness. A higher score, closer to 100, indicates optimal data quality with minimal issues or errors detected. A lower score may highlight areas that require attention and improvement. | ||
|
|
||
| **2 Sampling**: This shows the percentage of data that was evaluated during profiling. A sampling rate of 100% indicates that the entire dataset was analyzed, ensuring a complete and accurate representation of the data’s quality across all records, rather than just a partial sample. | ||
|
|
||
| **3 Completeness**: This metric measures how fully the data is populated without missing or null values. A higher completeness percentage means that most fields contain the necessary information, while a lower percentage indicates data gaps that could negatively impact downstream processes or analysis. | ||
|
|
||
| **4 Active Checks**: This refers to the number of ongoing quality checks being applied to the dataset. These checks monitor aspects such as format consistency, uniqueness, and logical correctness. Active checks help maintain data integrity and provide real-time alerts about potential issues that may arise. | ||
|
|
||
| **5 Active Anomalies**: This tracks the number of anomalies or irregularities detected in the data. These could include outliers, duplicates, or inconsistencies that deviate from expected patterns. A count of zero indicates no anomalies, while a higher count suggests that further investigation is needed to resolve potential data quality issues. | ||
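As a rough illustration of the Sampling and Completeness percentages described above, the following Python sketch computes both from raw counts. The exact formulas the platform uses are not stated in this page, so the function names and the choice to count only `None` as missing are assumptions.

```python
from typing import Optional, Sequence


def completeness_pct(values: Sequence[Optional[object]]) -> float:
    """Percentage of values that are populated (assumption: only None counts as missing)."""
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None)
    return 100.0 * populated / len(values)


def sampling_pct(records_evaluated: int, records_total: int) -> float:
    """Percentage of the dataset's records that were evaluated during profiling."""
    if records_total == 0:
        return 0.0
    return 100.0 * records_evaluated / records_total
```

For example, a field with one null out of four records gives `completeness_pct([1, None, 3, 4]) == 75.0`, and profiling 500 of 1,000 records gives `sampling_pct(500, 1000) == 50.0`.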
|
|
||
|  | ||
|
|
||
| ### Profile | ||
|
|
||
| This provides detailed insights into the characteristics of the field, including its type, distinct values, and length. You can use this information to evaluate the data's uniqueness, length consistency, and complexity. | ||
|
|
||
| | **No** | **Profile** | **Description** | | ||
| |--------|-----------------------|---------------------------------------------------------------------------------| | ||
| | 1 | Declared Type | Indicates whether the type is declared by the source or inferred. | | ||
| | 2 | Distinct Values | Count of distinct values observed in the dataset. | | ||
| | 3 | Min Length | Shortest length of the observed string values or lowest value for numerics. | | ||
| | 4 | Max Length | Greatest length of the observed string values or highest value for numerics. | | ||
| | 5 | Mean | Mathematical average of the observed numeric values. | | ||
| | 6 | Median | The median of the observed numeric values. | | ||
| | 7 | Standard Deviation | Measure of the amount of variation in observed numeric values. | | ||
| | 8 | Kurtosis | Measure of the ‘tailedness’ of the distribution of observed numeric values. | | ||
| | 9 | Skewness | Measure of the asymmetry of the distribution of observed numeric values. | | ||
| | 10 | Q1 | The first quartile; the central point between the minimum and the median. | | ||
| | 11 | Q3 | The third quartile; the central point between the median and the maximum. | | ||
| | 12 | Sum | Total sum of all observed numeric values. | | ||
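Several of the metrics in the table can be reproduced with Python's standard `statistics` module. This is a minimal sketch for understanding the definitions, not the platform's implementation: the function names are hypothetical, the population standard deviation and the "inclusive" quartile method are assumptions, and kurtosis and skewness are left out because the standard library does not provide them.

```python
import statistics
from typing import Sequence


def profile_numeric(values: Sequence[float]) -> dict:
    """Summarize numeric values with a few of the metrics from the Profile table."""
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "distinct_values": len(set(values)),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": median,
        # Assumption: population standard deviation; a sample estimate would use stdev().
        "std_dev": statistics.pstdev(values),
        "q1": q1,
        "q3": q3,
        "sum": sum(values),
    }


def profile_strings(values: Sequence[str]) -> dict:
    """For string fields, Min Length and Max Length refer to character counts."""
    lengths = [len(v) for v in values]
    return {
        "distinct_values": len(set(values)),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }
```

Calling `profile_numeric([1, 2, 3, 4, 5])` gives a mean of `3.0`, a median of `3.0`, `q1` of `2.0`, `q3` of `4.0`, and a sum of `15`.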
|
|
||
|  | ||
|
|
||
| You can hover over the **(i)** button to view the native field properties, which provide detailed information such as the field's type (numeric), size, decimal digits, and whether it allows null values. | ||
|
|
||
|  | ||
|
|
||
| #### Last Profile | ||
|
|
||
| The **Last Profile** timestamp helps users understand how up to date the field is. When you hover over the time indicator shown on the right side of the Last Profile label (e.g., "8 months ago"), a tooltip displays the complete date and time the field was last profiled. | ||
|
|
||
|  | ||
|
|
||
| This visibility ensures better context for interpreting profile metrics like mean, completeness, and anomalies. | ||
|
Contributor
I found the following issues:
Overall consistency note: All the "For more information please refer" phrases throughout the document are missing commas after "information". This should be standardized across all instances to match the pattern established in other documentation files.
Collaborator
Author
@RafaelOsiro done |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| # Computed Fields | ||
|
|
||
| Computed Fields allow you to enhance data analysis by applying dynamic transformations directly to your data. These fields let you create new data points, perform calculations, and customize data views based on your specific needs, ensuring your data is both accurate and actionable. | ||
|
|
||
| Let's get started 🚀 | ||
|
|
||
| ## Add Computed Fields | ||
|
|
||
| **Step 1:** Log in to your Qualytics account, navigate to the side menu, and select the **source datastore** where you want to add a computed field. | ||
|
|
||
| !!! note | ||
| For the next steps, please refer to the [add computed field documentation](../computed-fields/add-computed-fields.md){target="_blank"}. | ||
|
|
||
| ## Computed Fields Details | ||
|
|
||
| ### Totals | ||
|
|
||
| !!! note | ||
| For more information, please refer to the [computed fields details documentation](../computed-fields/computed-fields-details.md){target="_blank"}. | ||
|
|
||
| ## Types of Transformations | ||
|
|
||
| !!! note | ||
| For more information, please refer to the [types of transformations](../computed-fields/transformation-types.md){target="_blank"}. | ||
|
|

I found the following issues:

Line 30:
| No. | File Format | Extension Example |
The table header uses "No." with a period, which should be standardized. For consistency across documentation, consider using "No" without a period.

Line 32:
| 1 | Avro |.avro|
Through line 45, the table rows don't use periods after numbers, which is correct and consistent. This is fine as-is.

Line 105:
!!! example "Begin by creating a new folder in your distributed filesystem."
The word "filesystem" should be two words for consistency with usage elsewhere in the document. Should be:
!!! example "Begin by creating a new folder in your distributed file system."

Line 141:
This option leverages filename conventions that align with POSIX globs, allowing our system to automatically organize files for you.
The phrase "our system" shifts to first-person perspective, which is inconsistent with the rest of the documentation's third-person tone. Should be:
This option leverages filename conventions that align with POSIX globs, allowing the system to automatically organize files for you.

Line 143:
The system intelligently analyzes filename patterns, making the process seamless and efficient.
This is correct as written.

Line 164:
!!! example " Our system will automatically detect and analyze the filename conventions, creating appropriate glob patterns."
Extra space at the beginning of the quote. Also, "Our system" should be "The system" for consistency. Should be:
!!! example "The system will automatically detect and analyze the filename conventions, creating appropriate glob patterns."

Line 178:
While our system offers powerful features to automate file organization, we strongly discourage manually creating globs.
Again, "our system" and "we strongly discourage" uses first-person perspective. Should be:
While the system offers powerful features to automate file organization, manually creating globs is strongly discouraged.

Line 180:
This option may lead to errors, inconsistencies, and hinder the efficiency of our system.
"our system" should be "the system". Should be:
This option may lead to errors, inconsistencies, and hinder the efficiency of the system.

Line 182:
We recommend leveraging our automated tools for a seamless and error-free experience.
"We recommend" and "our automated tools" should be rephrased. Should be:
It is recommended to leverage the automated tools for a seamless and error-free experience.

Overall consistency note: The document shifts between first-person ("our system", "we recommend") and third-person perspective. For consistency with technical documentation standards, use third-person throughout ("the system", "it is recommended"). Also, standardize "filesystem" as "file system" (two words).
@RafaelOsiro skip this one