Commit 8b8a36f (1 parent: 456b26d) — Update README.md
1 file changed: +12 −17 lines


README.md

Lines changed: 12 additions & 17 deletions
```diff
@@ -22,10 +22,9 @@ The fixed structure must be technology agnostic.
 
 ### General
 
-* `ID: [String]` UUID
-* `Name: [String]` the identifier of the Data Product
+* `ID: [String]` the unique identifier of the Data Product --> this will never change in the life of a DP
+* `Name: [String]` the name of the DP
 * `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
-* `DisplayName: [String]` Optional name used for display purposes
 * `Domain: [String]` the identifier of the domain this DP belongs to
 * `Description: [String]` detailed description of what functional area this DP represents, what purpose it has, and business-related information
 * `Version: [String]` the version of the DP; because we consider the DP an independent unit of deployment, if a breaking change is needed we create a brand new version of the DP
```
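To make the General section concrete, here is a hypothetical sketch of how these fields could look in a DP descriptor file; every value below is illustrative and not defined by this README:

```yaml
# Hypothetical General section of a Data Product descriptor.
# Field names come from the spec above; all values are made up.
ID: "a1b2c3d4-e5f6-7890-abcd-1234567890ab"   # never changes in the life of the DP
Name: "customer-360"
FullyQualifiedName: "marketing.customer-360"
Domain: "marketing"
Description: >
  Consolidated view of customer data for the marketing domain,
  used for segmentation and campaign reporting.
Version: "1.0.0"   # a breaking change means a brand new version of the DP
Tags:
  - tagFQN: "PII.Sensitive"   # OpenMetadata-style tag reference
Specific:
  resourceGroup: "rg-marketing-customer-360"   # e.g. one ResourceGroup per DP
```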
```diff
@@ -38,17 +37,16 @@ The fixed structure must be technology agnostic.
 * `Tags: [Array[Yaml]]` Free tags at DP level (please refer to OpenMetadata https://docs.open-metadata.org/openmetadata/schemas/entities/tagcategory)
 * `Specific: [Yaml]` a custom section where we can put all the information strictly related to a specific execution environment. It can also refer to an additional file. At this level we also embed all the information needed to provision the general infrastructure (resource groups, networking, etc.) for a specific Data Product. For example, if a company decides to create a ResourceGroup for each data product and has a subscription reference for each domain and environment, it will be specified at this level. It is also recommended to put general security here: Azure Policy or IAM policies, VPC/VNet, Subnet. This will be filled by merging data from
 
-The **unique identifier** of a DataProduct is the concatenation of Domain, Name and Version. So we will refer to the `DP_UK` as a string composed in the following way: `$DPDomain.$DPName.$DPVersion`
+The **unique identifier** of a DataProduct is the concatenation of Domain, Name and Version. So we will refer to the `DP_UK` as a string composed in the following way: `$DPDomain.$DPID.$DPVersion`
 
 
 
 
 ### Output Ports
 
-* `ID: [String]` UUID
-* `Name: [String]` the identifier of the output port
+* `ID: [String]` the unique identifier of the output port --> not modifiable
+* `Name: [String]` the name of the output port
 * `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
-* `DisplayName: [String]` Optional name used for display purposes
 * `ResourceType: [String]` the kind of output port: Files - SQL - Events. This should be extensible with GraphQL or others.
 * `Technology: [String]` the underlying technology; useful for the consumer to better understand how to consume the output port, and also needed for self-serve provisioning of technology-specific resources
 * `Description: [String]` detailed explanation of the function and the meaning of the output port
```
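Under the same hypothetical DP, a single output port entry might be sketched like this (values are invented; the owning product's `DP_UK` would then presumably read something like `marketing.customer-360.1.0.0`):

```yaml
# Hypothetical Output Port entry; only the field names are from the spec.
ID: "9f8e7d6c-0000-4000-8000-000000000000"   # not modifiable
Name: "customer-profile"
FullyQualifiedName: "marketing.customer-360.customer-profile"
ResourceType: "SQL"        # Files - SQL - Events (extensible, e.g. GraphQL)
Technology: "Snowflake"    # tells consumers how the port can be consumed
Description: "Curated, query-ready table of unified customer profiles."
```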
```diff
@@ -74,25 +72,23 @@ The **unique identifier** of a DataProduct is the concatenation of Domain, Name
 
 ### Workloads
 
-* `ID: [String]` UUID
-* `Name: [String]` the identifier of the workload
+* `ID: [String]` the unique identifier of the workload
+* `Name: [String]` the name of the workload
 * `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
-* `DisplayName: [String]` Optional name used for display purposes
 * `Description: [String]` detailed description of the process, its purpose and characteristics
 * `ResourceType: [String]` explains what type of workload this is: Ingestion ETL, Streaming, Internal Process, etc.
 * `Technology: [String]` a list of technologies: Airflow, Spark, Scala. It is a free field, but it is useful to better understand how the workload behaves
 * `Description: [String]` detailed explanation of the purpose of the workload: what sources it reads from, what business logic it applies, etc.
 * `Tags: [Array[Yaml]]` Free tags at Workload level (please refer to OpenMetadata https://docs.open-metadata.org/openmetadata/schemas/entities/tagcategory)
-* `DependsOn: [Array[String]]` This is filled only for `DataPipeline` workloads and represents the list of output ports or external systems it is reading from. Output Ports are identified with `DP_UK.OutputPort_Name`, while external systems are defined by a string `EX_$systemdescription`. Here we can elaborate a bit more and create a more semantic struct.
+* `ReadsFrom: [Array[String]]` This is filled only for `DataPipeline` workloads and represents the list of output ports or external systems it is reading from. Output Ports are identified with `DP_UK.OutputPort_ID`, while external systems are defined by a string `EX_$systemdescription`. Here we can elaborate a bit more and create a more semantic struct.
 * `Specific: [Yaml]` a custom section where we can put all the information strictly related to a specific technology or dependent on a standard/policy defined in the federated governance
 
 
 ### Storage Area
 
-* `ID: [String]` UUID
-* `Name: [String]` the identifier of the Storage Area
+* `ID: [String]` the unique identifier of the Storage Area
+* `Name: [String]` the name of the Storage Area
 * `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
-* `DisplayName: [String]` Optional name used for display purposes
 * `ResourceType: [String]` explains what type of workload this is; at the moment: batch or streaming
 * `Type: [String]` This is an enum `[HouseKeeping|DataPipeline]`. `HouseKeeping` is for all the workloads acting on internal data without any external dependency; `DataPipeline` is for workloads reading from output ports of other DPs or from external systems.
 * `Technology: [String]` a list of technologies: S3, ADLS, GFS.
```
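A sketch of a `DataPipeline` workload and a storage area under the same hypothetical DP may help; the `ReadsFrom` entries illustrate the two reference styles described above, and all names and IDs are invented:

```yaml
# Hypothetical Workload entry (a DataPipeline); all values invented.
ID: "11111111-2222-4333-8444-555555555555"
Name: "ingest-crm-events"
FullyQualifiedName: "marketing.customer-360.ingest-crm-events"
Description: "Nightly ingestion of CRM events feeding the internal storage area."
ResourceType: "Ingestion ETL"
Technology: "Airflow, Spark"
Tags: []
ReadsFrom:
  - "sales.orders.1.0.0.op-customer-orders"   # DP_UK.OutputPort_ID of another DP
  - "EX_legacy-crm"                           # external system: EX_$systemdescription
Specific:
  schedule: "0 2 * * *"   # technology-specific details live here
---
# Hypothetical Storage Area entry.
ID: "66666666-7777-4888-8999-aaaaaaaaaaaa"
Name: "raw-events"
FullyQualifiedName: "marketing.customer-360.raw-events"
ResourceType: "batch"
Technology: "ADLS"
```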
```diff
@@ -106,10 +102,9 @@ The **unique identifier** of a DataProduct is the concatenation of Domain, Name
 Observability should be applied to each OutputPort, and it is better to represent it as the Swagger of an API rather than as something declarative like a Yaml, because it will expose runtime metrics and statistics.
 Anyway, it is good to formalize what kind of information should be included and verified at deploy time for the observability API:
 
-* `ID: [String]` UUID
-* `Name: [String]` the identifier of the observability API
+* `ID: [String]` the unique identifier of the observability API
+* `Name: [String]` the name of the observability API
 * `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
-* `DisplayName: [String]` Optional name used for display purposes
 * `Description: [String]` detailed explanation of what this observability API exposes
 * `Endpoint: [URL]` the API endpoint that will expose the observability for each OutputPort
 
```
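Since the README recommends representing observability as the Swagger of an API, a minimal OpenAPI-style sketch might look as follows. The path and response shape are pure assumptions; only `Name`, `Description` and `Endpoint` map to fields defined above:

```yaml
# Minimal, hypothetical OpenAPI sketch of an observability endpoint.
openapi: "3.0.3"
info:
  title: "customer-profile observability API"   # the Name field
  description: "Runtime metrics and statistics for the customer-profile output port."
  version: "1.0.0"
paths:
  /observability/customer-profile:   # the Endpoint exposed for this OutputPort
    get:
      summary: "Return the current metrics snapshot"
      responses:
        "200":
          description: "Current runtime metrics for the output port"
```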
