README.md: 12 additions & 17 deletions
@@ -22,10 +22,9 @@ The fixed structure must be technology agnostic.
### General
- * `ID: [String]` UUID
- * `Name: [String]` the identifier of the Data Product
+ * `ID: [String]` the unique identifier of the Data Product; this will never change during the life of the DP
+ * `Name: [String]` the name of the DP
* `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
- * `DisplayName: [String]` Optional name used for display purposes
* `Domain: [String]` the identifier of the domain this DP belongs to
* `Description: [String]` detailed description of the functional area this DP represents, its purpose, and related business information.
* `Version: [String]` the version of the DP; because we consider the DP an independent unit of deployment, a breaking change means creating a brand new version of the DP
@@ -38,17 +37,16 @@ The fixed structure must be technology agnostic.
* `Tags: [Array[Yaml]]` Free tags at DP level (please refer to OpenMetadata: https://docs.open-metadata.org/openmetadata/schemas/entities/tagcategory)
* `Specific: [Yaml]` this is a custom section where we can put all the information strictly related to a specific execution environment. It can also refer to an additional file. At this level we also embed all the information needed to provision the general infrastructure (resource groups, networking, etc.) for a specific Data Product. For example, if a company decides to create a ResourceGroup for each data product and to have a subscription reference for each domain and environment, that will be specified at this level. It is also recommended to put general security here: Azure Policy or IAM policies, VPC/VNet, Subnet. This will be filled by merging data from
- The **unique identifier** of a DataProduct is the concatenation of Domain, Name and Version. So we will refer to the `DP_UK` as a string composed in the following way: `$DPDomain.$DPName.$DPVersion`
+ The **unique identifier** of a DataProduct is the concatenation of Domain, ID and Version. So we will refer to the `DP_UK` as a string composed in the following way: `$DPDomain.$DPID.$DPVersion`
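
For illustration, a minimal YAML sketch of how the general section of a Data Product descriptor could look after this change. The overall layout and every value below (names, domain, version, tag, `Specific` keys) are invented assumptions, not part of the specification.

```yaml
# Hypothetical general section of a Data Product descriptor (illustrative values only).
ID: customer-360                  # unique identifier; never changes during the life of the DP
Name: Customer 360
FullyQualifiedName: marketing.customer-360
Domain: marketing
Description: Consolidated view of customer interactions for the marketing domain.
Version: 1.0.0                    # a breaking change means a brand new version of the DP
Tags:
  - tagFQN: PII.Sensitive         # OpenMetadata-style tag reference
Specific:
  resourceGroup: rg-marketing-customer-360-dev   # environment-specific provisioning details
```

Under this sketch, the `DP_UK` would be the string `marketing.customer-360.1.0.0`, following `$DPDomain.$DPID.$DPVersion`.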
### Output Ports
- * `ID: [String]` UUID
- * `Name: [String]` the identifier of the output port
+ * `ID: [String]` the unique identifier of the output port; not modifiable
+ * `Name: [String]` the name of the output port
* `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
- * `DisplayName: [String]` Optional name used for display purposes
* `ResourceType: [String]` the kind of output port: Files, SQL, Events. This should be extensible with GraphQL or others.
* `Technology: [String]` the underlying technology; useful for the consumer to better understand how to consume the output port, and also needed for technology-specific self-serve provisioning.
* `Description: [String]` detailed explanation about the function and the meaning of the output port
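
As an illustration only, a possible YAML sketch of an output port covering just the fields visible in this hunk (the full specification defines additional fields not shown in this diff); all names and values are assumptions.

```yaml
# Hypothetical output port entry (illustrative values; only the fields listed above).
ID: customer-profile-view         # unique identifier; not modifiable
Name: Customer Profile View
FullyQualifiedName: marketing.customer-360.customer-profile-view
ResourceType: SQL                 # Files, SQL or Events
Technology: Snowflake             # helps the consumer and drives self-serve provisioning
Description: Curated, query-ready view of the unified customer profile.
```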
@@ -74,25 +72,23 @@ The **unique identifier** of a DataProduct is the concatenation of Domain, Name
### Workloads
- * `ID: [String]` UUID
- * `Name: [String]` the identifier of the workload
+ * `ID: [String]` the unique identifier of the workload
+ * `Name: [String]` the name of the workload
* `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
- * `DisplayName: [String]` Optional name used for display purposes
* `Description: [String]` detailed description about the process, its purpose and characteristics
* `ResourceType: [String]` explains what type of workload this is: Ingestion ETL, Streaming, Internal Process, etc.
* `Technology: [String]` this is a list of technologies: Airflow, Spark, Scala. It is a free field, but it is useful to better understand how the workload behaves
* `Description: [String]` detailed explanation of the purpose of the workload, what sources it reads from, what business logic it applies, etc.
* `Tags: [Array[Yaml]]` Free tags at Workload level (please refer to OpenMetadata: https://docs.open-metadata.org/openmetadata/schemas/entities/tagcategory)
- * `DependsOn: [Array[String]]` This is filled only for `DataPipeline` workloads and it represents the list of output ports or external systems that the workload is reading. Output Ports are identified with `DP_UK.OutputPort_Name`, while external systems will be identified by a string `EX_$systemdescription`. Here we could elaborate a bit more and create a more semantic structure.
+ * `ReadsFrom: [Array[String]]` This is filled only for `DataPipeline` workloads and it represents the list of output ports or external systems that the workload is reading from. Output Ports are identified with `DP_UK.OutputPort_ID`, while external systems will be identified by a string `EX_$systemdescription`. Here we could elaborate a bit more and create a more semantic structure.
* `Specific: [Yaml]` this is a custom section where we can put all the information strictly related to a specific technology or dependent on a standard/policy defined in the federated governance.
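
A minimal, purely illustrative YAML sketch of a `DataPipeline`-style workload using the new `ReadsFrom` field; every identifier and value below is invented.

```yaml
# Hypothetical workload entry (illustrative values only).
ID: customer-ingestion
Name: Customer Ingestion
FullyQualifiedName: marketing.customer-360.customer-ingestion
Description: Ingests CRM exports and builds the unified customer profile.
ResourceType: Ingestion ETL
Technology: Airflow, Spark
Tags: []
ReadsFrom:
  - sales.orders.2.0.0.order-events   # DP_UK.OutputPort_ID of another Data Product
  - EX_legacy-crm                      # external system, prefixed with EX_
Specific:
  schedule: "0 * * * *"                # technology-specific configuration lives here
```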
### Storage Area
- * `ID: [String]` UUID
- * `Name: [String]` the identifier of the Storage Area
+ * `ID: [String]` the unique identifier of the Storage Area
+ * `Name: [String]` the name of the Storage Area
* `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
- * `DisplayName: [String]` Optional name used for display purposes
* `ResourceType: [String]` explains what type of workload it is; at the moment: batch or streaming
* `Type: [String]` This is an enum `[HouseKeeping|DataPipeline]`: `HouseKeeping` is for all the workloads that act on internal data without any external dependency, while `DataPipeline` is for workloads that read from output ports of other DPs or from external systems.
* `Technology: [String]` this is a list of technologies: S3, ADLS, GFS.
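
Again purely as an illustration, a possible YAML sketch of a Storage Area entry built from the fields visible in this hunk; all values are assumptions.

```yaml
# Hypothetical storage area entry (illustrative values only).
ID: raw-landing-zone
Name: Raw Landing Zone
FullyQualifiedName: marketing.customer-360.raw-landing-zone
ResourceType: batch
Type: HouseKeeping               # acts on internal data, no external dependency
Technology: ADLS
```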
@@ -106,10 +102,9 @@ The **unique identifier** of a DataProduct is the concatenation of Domain, Name
Observability should be applied to each OutputPort, and it is better to represent it as the Swagger of an API rather than as something declarative like a Yaml, because it will expose runtime metrics and statistics.
Anyway, it is good to formalize what kind of information should be included and verified at deploy time for the observability API:
- * `ID: [String]` UUID
- * `Name: [String]` the identifier of the observability API
+ * `ID: [String]` the unique identifier of the observability API
+ * `Name: [String]` the name of the observability API
* `FullyQualifiedName: [String]` Human-readable name that uniquely identifies an entity
- * `DisplayName: [String]` Optional name used for display purposes
* `Description: [String]` detailed explanation about what this observability API is exposing
* `Endpoint: [URL]` this is the API endpoint that will expose the observability for each OutputPort
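
To make the shape concrete, here is an illustrative YAML sketch of an observability entry with the fields listed above; the endpoint URL and all names are invented examples rather than part of the specification.

```yaml
# Hypothetical observability entry (illustrative values only).
ID: customer-profile-observability
Name: Customer Profile Observability
FullyQualifiedName: marketing.customer-360.customer-profile-view.observability
Description: Runtime metrics and data quality statistics for the Customer Profile View output port.
Endpoint: https://observability.example.com/marketing/customer-360/customer-profile-view
```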